diff mbox

[RFC,v2,1/2] drivers: pci: fix pci_mmap_fits() implementation for procfs mmap

Message ID 1414168089-8130-2-git-send-email-lorenzo.pieralisi@arm.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Lorenzo Pieralisi Oct. 24, 2014, 4:28 p.m. UTC
The addresses stored in PCI device resources for memory spaces
correspond to CPU physical addresses, which do not necessarily
map 1:1 to PCI bus addresses as programmed in PCI devices configuration
spaces.

Therefore, the changes applied by commits:

8c05cd08a7504b855c26526
3b519e4ea618b6943a82931

imply that the sanity checks carried out in pci_mmap_fits() to
ensure that the user executes an mmap of a "real" pci resource are
erroneous when executed through procfs. Some platforms (ie SPARC)
expect the offset value to be passed in (procfs mapping) to be the
PCI BAR configuration value as read from the PCI device configuration
space, not the fixed-up CPU physical address that is present in PCI
device resources.

The required pgoff (offset in mmap syscall) value passed from userspace
is supposed to represent the resource value exported through
/proc/bus/pci/devices when the resource is mmapped though procfs (and 0
when the mapping is carried out through sysfs resource files), which
corresponds to the PCI resource filtered through the pci_resource_to_user()
API.

This patch converts the PCI resource to the expected "user visible"
value through pci_resource_to_user() before carrying out sanity checks
in pci_mmap_fits() so that the check is carried out on the resource
values as expected from the userspace mmap API.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Martin Wilck <martin.wilck@ts.fujitsu.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 drivers/pci/pci-sysfs.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Bjorn Helgaas Nov. 10, 2014, 11:04 p.m. UTC | #1
[+cc Michael, since he merged 2311b1f2bbd3, which added
pci_resource_to_user()]

On Fri, Oct 24, 2014 at 05:28:05PM +0100, Lorenzo Pieralisi wrote:
> The addresses stored in PCI device resources for memory spaces
> correspond to CPU physical addresses, which do not necessarily
> map 1:1 to PCI bus addresses as programmed in PCI devices configuration
> spaces.
> 
> Therefore, the changes applied by commits:
> 
> 8c05cd08a7504b855c26526
> 3b519e4ea618b6943a82931
> 
> imply that the sanity checks carried out in pci_mmap_fits() to
> ensure that the user executes an mmap of a "real" pci resource are
> erroneous when executed through procfs. Some platforms (ie SPARC)
> expect the offset value to be passed in (procfs mapping) to be the
> PCI BAR configuration value as read from the PCI device configuration
> space, not the fixed-up CPU physical address that is present in PCI
> device resources.
> 
> The required pgoff (offset in mmap syscall) value passed from userspace
> is supposed to represent the resource value exported through
> /proc/bus/pci/devices when the resource is mmapped though procfs (and 0
> when the mapping is carried out through sysfs resource files), which
> corresponds to the PCI resource filtered through the pci_resource_to_user()
> API.
> 
> This patch converts the PCI resource to the expected "user visible"
> value through pci_resource_to_user() before carrying out sanity checks
> in pci_mmap_fits() so that the check is carried out on the resource
> values as expected from the userspace mmap API.

I'm trying to figure out what's going on here.  I think this fix is
correct, but it seems like there might be some additional simplification we
could do.

This patch is apparently a bug fix for mmap via procfs.  And the bug
apparently affects platforms where pci_resource_to_user() applies a
non-zero offset, i.e., microblaze, mips, power, and sparc.  It would be
helpful to have a bug report or an example of something that doesn't work.

The second patch fixes a bug on ARM.  How does that patch depend on this
one?  Since ARM doesn't implement pci_resource_to_user(), I wouldn't think
this first patch would change anything on ARM.

Here's what I think I understand so far:

  Applications can mmap PCI memory space via either sysfs or procfs (the
  procfs method is deprecated but still supported):

    - In sysfs, there's a separate /sys/devices/pci*/.../resource* file
      for each device BAR, and the application opens the appropriate
      file and supplies the offset from the beginning of the BAR as the
      mmap(2) offset.

    - In procfs, the application opens the single /proc/bus/pci/... file
      for the device.  On most platforms, it supplies the CPU physical
      address as the mmap(2) offset.  On a few platforms, such as SPARC,
      it supplies the bus address, i.e., a BAR value, instead.

But I'm not sure I have this right.  If the procfs offset is either the
CPU physical address or the BAR value, then pci_resource_to_user()
should be (depending on the arch) either a no-op or use
pci_resource_to_bus().

But that's not how it's implemented.  Maybe it *could* be?  If
pci_resource_to_user() gives you something that's not a CPU physical
address and not a bus address, what *does* it give you, and why would we
need this third kind of thing?

FWIW, I think the discussion leading up to pci_resource_to_user() is here:
http://lkml.iu.edu/hypermail/linux/kernel/0504.3/0467.html

Bjorn

> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Michal Simek <monstr@monstr.eu>
> Cc: Martin Wilck <martin.wilck@ts.fujitsu.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  drivers/pci/pci-sysfs.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 92b6d9a..777d8bc 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -963,17 +963,20 @@ void pci_remove_legacy_files(struct pci_bus *b)
>  int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
>  		  enum pci_mmap_api mmap_api)
>  {
> -	unsigned long nr, start, size, pci_start;
> +	unsigned long nr, start, size, pci_offset;
> +	resource_size_t pci_start, pci_end;
>  
>  	if (pci_resource_len(pdev, resno) == 0)
>  		return 0;
>  	nr = vma_pages(vma);
>  	start = vma->vm_pgoff;
> +	pci_resource_to_user(pdev, resno, &pdev->resource[resno],
> +			     &pci_start, &pci_end);
>  	size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
> -	pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
> -			pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
> -	if (start >= pci_start && start < pci_start + size &&
> -			start + nr <= pci_start + size)
> +	pci_offset = (mmap_api == PCI_MMAP_PROCFS) ?
> +			pci_start >> PAGE_SHIFT : 0;
> +	if (start >= pci_offset && start < pci_offset + size &&
> +			start + nr <= pci_offset + size)
>  		return 1;
>  	return 0;
>  }
> -- 
> 2.1.2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Nov. 11, 2014, 11:48 a.m. UTC | #2
On Mon, Nov 10, 2014 at 11:04:54PM +0000, Bjorn Helgaas wrote:
> [+cc Michael, since he merged 2311b1f2bbd3, which added
> pci_resource_to_user()]
> 
> On Fri, Oct 24, 2014 at 05:28:05PM +0100, Lorenzo Pieralisi wrote:
> > The addresses stored in PCI device resources for memory spaces
> > correspond to CPU physical addresses, which do not necessarily
> > map 1:1 to PCI bus addresses as programmed in PCI devices configuration
> > spaces.
> > 
> > Therefore, the changes applied by commits:
> > 
> > 8c05cd08a7504b855c26526
> > 3b519e4ea618b6943a82931
> > 
> > imply that the sanity checks carried out in pci_mmap_fits() to
> > ensure that the user executes an mmap of a "real" pci resource are
> > erroneous when executed through procfs. Some platforms (ie SPARC)
> > expect the offset value to be passed in (procfs mapping) to be the
> > PCI BAR configuration value as read from the PCI device configuration
> > space, not the fixed-up CPU physical address that is present in PCI
> > device resources.
> > 
> > The required pgoff (offset in mmap syscall) value passed from userspace
> > is supposed to represent the resource value exported through
> > /proc/bus/pci/devices when the resource is mmapped though procfs (and 0
> > when the mapping is carried out through sysfs resource files), which
> > corresponds to the PCI resource filtered through the pci_resource_to_user()
> > API.
> > 
> > This patch converts the PCI resource to the expected "user visible"
> > value through pci_resource_to_user() before carrying out sanity checks
> > in pci_mmap_fits() so that the check is carried out on the resource
> > values as expected from the userspace mmap API.
> 
> I'm trying to figure out what's going on here.  I think this fix is
> correct, but it seems like there might be some additional simplification we
> could do.
> 
> This patch is apparently a bug fix for mmap via procfs.  And the bug
> apparently affects platforms where pci_resource_to_user() applies a
> non-zero offset, i.e., microblaze, mips, power, and sparc.  It would be
> helpful to have a bug report or an example of something that doesn't work.

For ARM it is currently impossible to mmap resources to userspace on
eg integrator platform (where CPU <-> PCI bus addresses are not a 1:1
mapping).

> The second patch fixes a bug on ARM.  How does that patch depend on this
> one?  Since ARM doesn't implement pci_resource_to_user(), I wouldn't think
> this first patch would change anything on ARM.

It does not depend on patch 1, but patch 2 is a fix as long as we keep the
*current* mmap implementation (and in particular the expectations about
the offset mmap parameter), which is still uncertain given that patch
1 is just an RFC and the procfs mmap fix can be implemented differently (ie
we might decide that all arch should pass the PCI BAR value in the procfs
mmap offset parameter, if that's the case patch 2 might need to change or
become useless/wrong).

The gist of what we are discussing (and the reason why I posted this set) is
that we have to agree on the procfs and sysfs resource mmap interface, in
particular what the mmap offset parameter is meant to be.

If pci_resource_to_user() is there to make a resource value visible to
the user and that's the value that should be used as offset in the
procfs mmap call, fine by me, as long as we agree on that.

Current code is extremely confusing, that's certain.

> Here's what I think I understand so far:
> 
>   Applications can mmap PCI memory space via either sysfs or procfs (the
>   procfs method is deprecated but still supported):
> 
>     - In sysfs, there's a separate /sys/devices/pci*/.../resource* file
>       for each device BAR, and the application opens the appropriate
>       file and supplies the offset from the beginning of the BAR as the
>       mmap(2) offset.
> 
>     - In procfs, the application opens the single /proc/bus/pci/... file
>       for the device.  On most platforms, it supplies the CPU physical
>       address as the mmap(2) offset.  On a few platforms, such as SPARC,
>       it supplies the bus address, i.e., a BAR value, instead.
> 
> But I'm not sure I have this right.  If the procfs offset is either the
> CPU physical address or the BAR value, then pci_resource_to_user()
> should be (depending on the arch) either a no-op or use
> pci_resource_to_bus().

Exactly (pcibios_resource_to_bus() ?).

> But that's not how it's implemented.  Maybe it *could* be?  If
> pci_resource_to_user() gives you something that's not a CPU physical
> address and not a bus address, what *does* it give you, and why would we
> need this third kind of thing?

Well, you need a per arch function implementation where to define if
the conversion from CPU physical address to PCI bus should take place
or not right ? As you mentioned above, if that should be a per-arch
decision, there has to be a per-arch function to filter the resource
in question, I guess that's my understanding behind pci_resource_to_user(),
but I am not sure either, and understanding that was the primary reason
for this patchset so comments are welcome.

Thanks,
Lorenzo

> FWIW, I think the discussion leading up to pci_resource_to_user() is here:
> http://lkml.iu.edu/hypermail/linux/kernel/0504.3/0467.html
> 
> Bjorn
> 
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: David S. Miller <davem@davemloft.net>
> > Cc: Michal Simek <monstr@monstr.eu>
> > Cc: Martin Wilck <martin.wilck@ts.fujitsu.com>
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > ---
> >  drivers/pci/pci-sysfs.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index 92b6d9a..777d8bc 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -963,17 +963,20 @@ void pci_remove_legacy_files(struct pci_bus *b)
> >  int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
> >  		  enum pci_mmap_api mmap_api)
> >  {
> > -	unsigned long nr, start, size, pci_start;
> > +	unsigned long nr, start, size, pci_offset;
> > +	resource_size_t pci_start, pci_end;
> >  
> >  	if (pci_resource_len(pdev, resno) == 0)
> >  		return 0;
> >  	nr = vma_pages(vma);
> >  	start = vma->vm_pgoff;
> > +	pci_resource_to_user(pdev, resno, &pdev->resource[resno],
> > +			     &pci_start, &pci_end);
> >  	size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
> > -	pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
> > -			pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
> > -	if (start >= pci_start && start < pci_start + size &&
> > -			start + nr <= pci_start + size)
> > +	pci_offset = (mmap_api == PCI_MMAP_PROCFS) ?
> > +			pci_start >> PAGE_SHIFT : 0;
> > +	if (start >= pci_offset && start < pci_offset + size &&
> > +			start + nr <= pci_offset + size)
> >  		return 1;
> >  	return 0;
> >  }
> > -- 
> > 2.1.2
> > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 11, 2014, 2:20 p.m. UTC | #3
On Tue, Nov 11, 2014 at 4:48 AM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> On Mon, Nov 10, 2014 at 11:04:54PM +0000, Bjorn Helgaas wrote:
>> ...
>> Here's what I think I understand so far:
>>
>>   Applications can mmap PCI memory space via either sysfs or procfs (the
>>   procfs method is deprecated but still supported):
>>
>>     - In sysfs, there's a separate /sys/devices/pci*/.../resource* file
>>       for each device BAR, and the application opens the appropriate
>>       file and supplies the offset from the beginning of the BAR as the
>>       mmap(2) offset.
>>
>>     - In procfs, the application opens the single /proc/bus/pci/... file
>>       for the device.  On most platforms, it supplies the CPU physical
>>       address as the mmap(2) offset.  On a few platforms, such as SPARC,
>>       it supplies the bus address, i.e., a BAR value, instead.
>>
>> But I'm not sure I have this right.  If the procfs offset is either the
>> CPU physical address or the BAR value, then pci_resource_to_user()
>> should be (depending on the arch) either a no-op or use
>> pci_resource_to_bus().
>
> Exactly (pcibios_resource_to_bus() ?).
>
>> But that's not how it's implemented.  Maybe it *could* be?  If
>> pci_resource_to_user() gives you something that's not a CPU physical
>> address and not a bus address, what *does* it give you, and why would we
>> need this third kind of thing?
>
> Well, you need a per arch function implementation where to define if
> the conversion from CPU physical address to PCI bus should take place
> or not right ? As you mentioned above, if that should be a per-arch
> decision, there has to be a per-arch function to filter the resource
> in question, I guess that's my understanding behind pci_resource_to_user(),
> but I am not sure either, and understanding that was the primary reason
> for this patchset so comments are welcome.

I agree that we need pci_resource_to_user() because arches do
different things, so we can't just remove pci_resource_to_user() and
replace it with pci_resource_to_bus().  My point is that we have a
generic pci_resource_to_user() implementation that does nothing, and
if an arch *does* implement its own pci_resource_to_user(), it seems
like it should simply call pci_resource_to_user().

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Nov. 11, 2014, 3:57 p.m. UTC | #4
On Tue, Nov 11, 2014 at 02:20:31PM +0000, Bjorn Helgaas wrote:
> On Tue, Nov 11, 2014 at 4:48 AM, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
> > On Mon, Nov 10, 2014 at 11:04:54PM +0000, Bjorn Helgaas wrote:
> >> ...
> >> Here's what I think I understand so far:
> >>
> >>   Applications can mmap PCI memory space via either sysfs or procfs (the
> >>   procfs method is deprecated but still supported):
> >>
> >>     - In sysfs, there's a separate /sys/devices/pci*/.../resource* file
> >>       for each device BAR, and the application opens the appropriate
> >>       file and supplies the offset from the beginning of the BAR as the
> >>       mmap(2) offset.
> >>
> >>     - In procfs, the application opens the single /proc/bus/pci/... file
> >>       for the device.  On most platforms, it supplies the CPU physical
> >>       address as the mmap(2) offset.  On a few platforms, such as SPARC,
> >>       it supplies the bus address, i.e., a BAR value, instead.
> >>
> >> But I'm not sure I have this right.  If the procfs offset is either the
> >> CPU physical address or the BAR value, then pci_resource_to_user()
> >> should be (depending on the arch) either a no-op or use
> >> pci_resource_to_bus().
> >
> > Exactly (pcibios_resource_to_bus() ?).
> >
> >> But that's not how it's implemented.  Maybe it *could* be?  If
> >> pci_resource_to_user() gives you something that's not a CPU physical
> >> address and not a bus address, what *does* it give you, and why would we
> >> need this third kind of thing?
> >
> > Well, you need a per arch function implementation where to define if
> > the conversion from CPU physical address to PCI bus should take place
> > or not right ? As you mentioned above, if that should be a per-arch
> > decision, there has to be a per-arch function to filter the resource
> > in question, I guess that's my understanding behind pci_resource_to_user(),
> > but I am not sure either, and understanding that was the primary reason
> > for this patchset so comments are welcome.
> 
> I agree that we need pci_resource_to_user() because arches do
> different things, so we can't just remove pci_resource_to_user() and
> replace it with pci_resource_to_bus().  My point is that we have a
> generic pci_resource_to_user() implementation that does nothing, and
> if an arch *does* implement its own pci_resource_to_user(), it seems
> like it should simply call pci_resource_to_user().

to_bus() you mean. Well, I agree, but I am not sure it would work on all
arches that deviate from the generic implementation, I can't speak for other
architectures since I do not have an in-depth knowledge of their PCI
internal implementations, in particular in relation to CPU <-> PCI
address map conversions/mappings.

I read your comment as an agreement on the approach I took in my patch,
except for the current pci_resource_to_user() implementation(s), which I did
not touch.

Thanks,
Lorenzo
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 11, 2014, 5:19 p.m. UTC | #5
On Tue, Nov 11, 2014 at 8:57 AM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> On Tue, Nov 11, 2014 at 02:20:31PM +0000, Bjorn Helgaas wrote:
>> On Tue, Nov 11, 2014 at 4:48 AM, Lorenzo Pieralisi
>> <lorenzo.pieralisi@arm.com> wrote:
>> > On Mon, Nov 10, 2014 at 11:04:54PM +0000, Bjorn Helgaas wrote:
>> >> ...
>> >> Here's what I think I understand so far:
>> >>
>> >>   Applications can mmap PCI memory space via either sysfs or procfs (the
>> >>   procfs method is deprecated but still supported):
>> >>
>> >>     - In sysfs, there's a separate /sys/devices/pci*/.../resource* file
>> >>       for each device BAR, and the application opens the appropriate
>> >>       file and supplies the offset from the beginning of the BAR as the
>> >>       mmap(2) offset.
>> >>
>> >>     - In procfs, the application opens the single /proc/bus/pci/... file
>> >>       for the device.  On most platforms, it supplies the CPU physical
>> >>       address as the mmap(2) offset.  On a few platforms, such as SPARC,
>> >>       it supplies the bus address, i.e., a BAR value, instead.
>> >>
>> >> But I'm not sure I have this right.  If the procfs offset is either the
>> >> CPU physical address or the BAR value, then pci_resource_to_user()
>> >> should be (depending on the arch) either a no-op or use
>> >> pci_resource_to_bus().
>> >
>> > Exactly (pcibios_resource_to_bus() ?).
>> >
>> >> But that's not how it's implemented.  Maybe it *could* be?  If
>> >> pci_resource_to_user() gives you something that's not a CPU physical
>> >> address and not a bus address, what *does* it give you, and why would we
>> >> need this third kind of thing?
>> >
>> > Well, you need a per arch function implementation where to define if
>> > the conversion from CPU physical address to PCI bus should take place
>> > or not right ? As you mentioned above, if that should be a per-arch
>> > decision, there has to be a per-arch function to filter the resource
>> > in question, I guess that's my understanding behind pci_resource_to_user(),
>> > but I am not sure either, and understanding that was the primary reason
>> > for this patchset so comments are welcome.
>>
>> I agree that we need pci_resource_to_user() because arches do
>> different things, so we can't just remove pci_resource_to_user() and
>> replace it with pci_resource_to_bus().  My point is that we have a
>> generic pci_resource_to_user() implementation that does nothing, and
>> if an arch *does* implement its own pci_resource_to_user(), it seems
>> like it should simply call pci_resource_to_user().
>
> to_bus() you mean.

Yes, exactly, sorry for my typo.  (And earlier, when I said
pci_resource_to_bus() instead of pcibios_resource_to_bus().
pcibios_resource_to_bus() *used* to be an arch-specific function with
many implementations, which is why it's named pcibios_*.  Now that the
PCI core supports address translation, it's no longer arch-specific.
We haven't changed the name to reflect that, but I don't think of it
along with the rest of the pcibios_*() functions anymore.  Hmm, and
now we have this pci_resource_to_user() thing, which *is*
arch-specific, but doesn't have a pcibios_* name.  No wonder this is
confusing :))

> Well, I agree, but I am not sure it would work on all
> arches that deviate from the generic implementation, I can't speak for other
> architectures since I do not have an in-depth knowledge of their PCI
> internal implementations, in particular in relation to CPU <-> PCI
> address map conversions/mappings.
>
> I read your comment as an agreement on the approach I took in my patch,
> except for the current pci_resource_to_user() implementation(s), which I did
> not touch.

Yes.  I have two things I'd like to clear up:

  1) Your patch changes behavior on platforms that implement their own
pci_resource_to_user().  So I'd like to mention the details of that in
the changelog, e.g., "procfs mmap on arches X, Y, Z has been broken
since commit C, and this change fixes them."  ARM doesn't implement
pci_resource_to_user(), so I don't think ARM is one of those arches.
But I'd really like to include specifics on what those arches are, and
what we think is currently broken, so their maintainers at least get a
heads-up and can look for that  breakage.

  2) I'd like to take this opportunity to analyze the current
pci_resource_to_user() implementations and see if we can reduce them
to calling pcibios_resource_to_bus().  This is obviously separable,
and I'll apply these patches anyway, but this does seem like a perfect
opportunity for someone (not necessarily you) to clean this stuff up,
since it's fresh in our minds.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Nov. 12, 2014, 7:23 a.m. UTC | #6
On Mon, 2014-11-10 at 16:04 -0700, Bjorn Helgaas wrote:
> But I'm not sure I have this right.  If the procfs offset is either
> the
> CPU physical address or the BAR value, then pci_resource_to_user()
> should be (depending on the arch) either a no-op or use
> pci_resource_to_bus().
> 
> But that's not how it's implemented.  Maybe it *could* be?  If
> pci_resource_to_user() gives you something that's not a CPU physical
> address and not a bus address, what *does* it give you, and why would
> we
> need this third kind of thing?
> 
> FWIW, I think the discussion leading up to pci_resource_to_user() is
> here:
> http://lkml.iu.edu/hypermail/linux/kernel/0504.3/0467.html

Oh, man... I remember that was all a giant trainwreck and some stuff
just couldn't be made completely right due to broken assumptions by
the proc code and users of it... but I don't remember all the details.

I think /proc users don't necessarily pass a BAR value but something
they try to somewhat translates themselves via the "resources" file,
which ends up working ... or not, depending on various factors such
as 32 vs 64 bit etc...

I wonder who still uses this interface....

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Nov. 12, 2014, 10:27 a.m. UTC | #7
On Wed, Nov 12, 2014 at 07:23:49AM +0000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-11-10 at 16:04 -0700, Bjorn Helgaas wrote:
> > But I'm not sure I have this right.  If the procfs offset is either
> > the
> > CPU physical address or the BAR value, then pci_resource_to_user()
> > should be (depending on the arch) either a no-op or use
> > pci_resource_to_bus().
> > 
> > But that's not how it's implemented.  Maybe it *could* be?  If
> > pci_resource_to_user() gives you something that's not a CPU physical
> > address and not a bus address, what *does* it give you, and why would
> > we
> > need this third kind of thing?
> > 
> > FWIW, I think the discussion leading up to pci_resource_to_user() is
> > here:
> > http://lkml.iu.edu/hypermail/linux/kernel/0504.3/0467.html
> 
> Oh, man... I remember that was all a giant trainwreck and some stuff
> just couldn't be made completely right due to broken assumptions by
> the proc code and users of it... but I don't remember all the details.
> 
> I think /proc users don't necessarily pass a BAR value but something
> they try to somewhat translates themselves via the "resources" file,
> which ends up working ... or not, depending on various factors such
> as 32 vs 64 bit etc...
> 
> I wonder who still uses this interface....

+1, even though I do not think that leaving it as it is is a good idea,
hence I posted this series.

I tried to fix it while fixing the way ARM pcibios code handles
pci_mmap_page_range() (for both procfs and sysfs mappings).

I will do what Bjorn suggested, more comments from arches maintainers
are welcome.

Lorenzo
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Nov. 13, 2014, 11:32 a.m. UTC | #8
On Tue, Nov 11, 2014 at 05:19:31PM +0000, Bjorn Helgaas wrote:

[...]

> > I read your comment as an agreement on the approach I took in my patch,
> > except for the current pci_resource_to_user() implementation(s), which I did
> > not touch.
> 
> Yes.  I have two things I'd like to clear up:
> 
>   1) Your patch changes behavior on platforms that implement their own
> pci_resource_to_user().  So I'd like to mention the details of that in
> the changelog, e.g., "procfs mmap on arches X, Y, Z has been broken
> since commit C, and this change fixes them."  ARM doesn't implement
> pci_resource_to_user(), so I don't think ARM is one of those arches.
> But I'd really like to include specifics on what those arches are, and
> what we think is currently broken, so their maintainers at least get a
> heads-up and can look for that  breakage.

I posted a v3, where I *tried* to bisect the commits that actually broke
the procfs interface and I added a commit log to explain why, it is very
hard to bisect a specific commit (given the dependency on the arch code)
and I do not have HW to test the fix on apart from ARM machines so there
is not much I can do on that side.

Please let me know what you think, thanks for having a look.

Lorenzo
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 92b6d9a..777d8bc 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -963,17 +963,20 @@  void pci_remove_legacy_files(struct pci_bus *b)
 int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
 		  enum pci_mmap_api mmap_api)
 {
-	unsigned long nr, start, size, pci_start;
+	unsigned long nr, start, size, pci_offset;
+	resource_size_t pci_start, pci_end;
 
 	if (pci_resource_len(pdev, resno) == 0)
 		return 0;
 	nr = vma_pages(vma);
 	start = vma->vm_pgoff;
+	pci_resource_to_user(pdev, resno, &pdev->resource[resno],
+			     &pci_start, &pci_end);
 	size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
-	pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
-			pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
-	if (start >= pci_start && start < pci_start + size &&
-			start + nr <= pci_start + size)
+	pci_offset = (mmap_api == PCI_MMAP_PROCFS) ?
+			pci_start >> PAGE_SHIFT : 0;
+	if (start >= pci_offset && start < pci_offset + size &&
+			start + nr <= pci_offset + size)
 		return 1;
 	return 0;
 }