diff mbox series

[v5,11/15] PCI: Obey iomem restrictions for procfs mmap

Message ID 20201030100815.2269-12-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show
Series follow_pfn and other iomap races | expand

Commit Message

Daniel Vetter Oct. 30, 2020, 10:08 a.m. UTC
There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two check against
iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using it.

Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
--
v2: Improve commit message (Bjorn)
---
 drivers/pci/proc.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Bjorn Helgaas Nov. 3, 2020, 9:28 p.m. UTC | #1
On Fri, Oct 30, 2020 at 11:08:11AM +0100, Daniel Vetter wrote:
> There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> files, and the old proc interface. Two check against
> iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> this starts to matter, since we don't want random userspace having
> access to PCI BARs while a driver is loaded and using it.
> 
> Fix this by adding the same iomem_is_exclusive() check we already have
> on the sysfs side in pci_mmap_resource().
> 
> References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
only used in a few places:

  e1000_probe() calls pci_request_selected_regions_exclusive(),
  ne_pci_probe() calls pci_request_regions_exclusive(),
  vmbus_allocate_mmio() calls request_mem_region_exclusive()

which raises the question of whether it's worth keeping
IORESOURCE_EXCLUSIVE at all.  I'm totally fine with removing it
completely.

But if you want it,

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> --
> v2: Improve commit message (Bjorn)
> ---
>  drivers/pci/proc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index d35186b01d98..3a2f90beb4cb 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
>  		else
>  			return -EINVAL;
>  	}
> +
> +	if (dev->resource[i].flags & IORESOURCE_MEM &&
> +	    iomem_is_exclusive(dev->resource[i].start))
> +		return -EINVAL;
> +
>  	ret = pci_mmap_page_range(dev, i, vma,
>  				  fpriv->mmap_state, write_combine);
>  	if (ret < 0)
> -- 
> 2.28.0
>
Dan Williams Nov. 3, 2020, 10:09 p.m. UTC | #2
On Tue, Nov 3, 2020 at 1:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Oct 30, 2020 at 11:08:11AM +0100, Daniel Vetter wrote:
> > There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> > files, and the old proc interface. Two check against
> > iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> > this starts to matter, since we don't want random userspace having
> > access to PCI BARs while a driver is loaded and using it.
> >
> > Fix this by adding the same iomem_is_exclusive() check we already have
> > on the sysfs side in pci_mmap_resource().
> >
> > References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>
> This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
> only used in a few places:
>
>   e1000_probe() calls pci_request_selected_regions_exclusive(),
>   ne_pci_probe() calls pci_request_regions_exclusive(),
>   vmbus_allocate_mmio() calls request_mem_region_exclusive()
>
> which raises the question of whether it's worth keeping
> IORESOURCE_EXCLUSIVE at all.  I'm totally fine with removing it
> completely.

Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning so I'd
be in favor of removing it as well.
Daniel Vetter Nov. 4, 2020, 8:44 a.m. UTC | #3
On Tue, Nov 3, 2020 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Nov 3, 2020 at 1:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Oct 30, 2020 at 11:08:11AM +0100, Daniel Vetter wrote:
> > > There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> > > files, and the old proc interface. Two check against
> > > iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> > > this starts to matter, since we don't want random userspace having
> > > access to PCI BARs while a driver is loaded and using it.
> > >
> > > Fix this by adding the same iomem_is_exclusive() check we already have
> > > on the sysfs side in pci_mmap_resource().
> > >
> > > References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >
> > This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
> > only used in a few places:
> >
> >   e1000_probe() calls pci_request_selected_regions_exclusive(),
> >   ne_pci_probe() calls pci_request_regions_exclusive(),
> >   vmbus_allocate_mmio() calls request_mem_region_exclusive()
> >
> > which raises the question of whether it's worth keeping
> > IORESOURCE_EXCLUSIVE at all.  I'm totally fine with removing it
> > completely.
>
> Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
> IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning so I'd
> be in favor of removing it as well.

Still has some value since it enforces exclusive access even if the
config isn't enabled, and iirc e1000 had some fun with userspace tools
clobbering the firmware and bricking the chip.

Another thing I kinda wondered, since pci maintainer is here: At least
in drivers/gpu I see very few drivers explicitly requestion regions
(this might be a historical artifact due to the shadow attach stuff
before we had real modesetting drivers). And pci core doesn't do that
either, even when a driver is bound. Is this intentional, or
should/could we do better? Since drivers work happily without
reserving regions I don't think "the drivers need to remember to do
this" will ever really work out well.
-Daniel
Bjorn Helgaas Nov. 4, 2020, 4:50 p.m. UTC | #4
On Wed, Nov 04, 2020 at 09:44:04AM +0100, Daniel Vetter wrote:
> On Tue, Nov 3, 2020 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > On Tue, Nov 3, 2020 at 1:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, Oct 30, 2020 at 11:08:11AM +0100, Daniel Vetter wrote:
> > > > There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> > > > files, and the old proc interface. Two check against
> > > > iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> > > > this starts to matter, since we don't want random userspace having
> > > > access to PCI BARs while a driver is loaded and using it.
> > > >
> > > > Fix this by adding the same iomem_is_exclusive() check we already have
> > > > on the sysfs side in pci_mmap_resource().
> > > >
> > > > References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > >
> > > This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
> > > only used in a few places:
> > >
> > >   e1000_probe() calls pci_request_selected_regions_exclusive(),
> > >   ne_pci_probe() calls pci_request_regions_exclusive(),
> > >   vmbus_allocate_mmio() calls request_mem_region_exclusive()
> > >
> > > which raises the question of whether it's worth keeping
> > > IORESOURCE_EXCLUSIVE at all.  I'm totally fine with removing it
> > > completely.
> >
> > Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
> > IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning so I'd
> > be in favor of removing it as well.
> 
> Still has some value since it enforces exclusive access even if the
> config isn't enabled, and iirc e1000 had some fun with userspace tools
> clobbering the firmware and bricking the chip.

There's *some* value; I'm just skeptical since only three drivers use
it.

IORESOURCE_EXCLUSIVE is from e8de1481fd71 ("resource: allow MMIO
exclusivity for device drivers"), and the commit message says this is
only active when CONFIG_STRICT_DEVMEM is set.  I didn't check to see
whether that's still true.

That commit adds a bunch of wrappers and "__"-prefixed functions to
pass the IORESOURCE_EXCLUSIVE flag around.  That's a fair bit of
uglification for three drivers.

> Another thing I kinda wondered, since pci maintainer is here: At least
> in drivers/gpu I see very few drivers explicitly requestion regions
> (this might be a historical artifact due to the shadow attach stuff
> before we had real modesetting drivers). And pci core doesn't do that
> either, even when a driver is bound. Is this intentional, or
> should/could we do better? Since drivers work happily without
> reserving regions I don't think "the drivers need to remember to do
> this" will ever really work out well.

You're right, many drivers don't call pci_request_regions().  Maybe we
could do better, but I haven't looked into that recently.  There is a
related note in Documentation/PCI/pci.rst that's been there for a long
time (it refers to "pci_request_resources()", which has never existed
AFAICT).  I'm certainly open to proposals.
Dan Williams Nov. 4, 2020, 8:12 p.m. UTC | #5
On Wed, Nov 4, 2020 at 8:50 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Nov 04, 2020 at 09:44:04AM +0100, Daniel Vetter wrote:
> > On Tue, Nov 3, 2020 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > On Tue, Nov 3, 2020 at 1:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Fri, Oct 30, 2020 at 11:08:11AM +0100, Daniel Vetter wrote:
> > > > > There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> > > > > files, and the old proc interface. Two check against
> > > > > iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> > > > > this starts to matter, since we don't want random userspace having
> > > > > access to PCI BARs while a driver is loaded and using it.
> > > > >
> > > > > Fix this by adding the same iomem_is_exclusive() check we already have
> > > > > on the sysfs side in pci_mmap_resource().
> > > > >
> > > > > References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > >
> > > > This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
> > > > only used in a few places:
> > > >
> > > >   e1000_probe() calls pci_request_selected_regions_exclusive(),
> > > >   ne_pci_probe() calls pci_request_regions_exclusive(),
> > > >   vmbus_allocate_mmio() calls request_mem_region_exclusive()
> > > >
> > > > which raises the question of whether it's worth keeping
> > > > IORESOURCE_EXCLUSIVE at all.  I'm totally fine with removing it
> > > > completely.
> > >
> > > Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
> > > IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning so I'd
> > > be in favor of removing it as well.
> >
> > Still has some value since it enforces exclusive access even if the
> > config isn't enabled, and iirc e1000 had some fun with userspace tools
> > clobbering the firmware and bricking the chip.
>
> There's *some* value; I'm just skeptical since only three drivers use
> it.
>
> IORESOURCE_EXCLUSIVE is from e8de1481fd71 ("resource: allow MMIO
> exclusivity for device drivers"), and the commit message says this is
> only active when CONFIG_STRICT_DEVMEM is set.  I didn't check to see
> whether that's still true.
>
> That commit adds a bunch of wrappers and "__"-prefixed functions to
> pass the IORESOURCE_EXCLUSIVE flag around.  That's a fair bit of
> uglification for three drivers.
>
> > Another thing I kinda wondered, since pci maintainer is here: At least
> > in drivers/gpu I see very few drivers explicitly requestion regions
> > (this might be a historical artifact due to the shadow attach stuff
> > before we had real modesetting drivers). And pci core doesn't do that
> > either, even when a driver is bound. Is this intentional, or
> > should/could we do better? Since drivers work happily without
> > reserving regions I don't think "the drivers need to remember to do
> > this" will ever really work out well.
>
> You're right, many drivers don't call pci_request_regions().  Maybe we
> could do better, but I haven't looked into that recently.  There is a
> related note in Documentation/PCI/pci.rst that's been there for a long
> time (it refers to "pci_request_resources()", which has never existed
> AFAICT).  I'm certainly open to proposals.

It seems a bug that the kernel permits MMIO regions with side effects
to be ioremap()'ed without request_mem_region() on the resource. I
wonder how much log spam would happen if ioremap() reported whenever a
non-IORESOURE_BUSY range was passed to it? The current state of
affairs to trust *remap users to have claimed their remap target seems
too ingrained to unwind now.
Daniel Vetter Nov. 5, 2020, 9:34 a.m. UTC | #6
On Wed, Nov 04, 2020 at 12:12:15PM -0800, Dan Williams wrote:
> On Wed, Nov 4, 2020 at 8:50 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, Nov 04, 2020 at 09:44:04AM +0100, Daniel Vetter wrote:
> > > On Tue, Nov 3, 2020 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > On Tue, Nov 3, 2020 at 1:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Fri, Oct 30, 2020 at 11:08:11AM +0100, Daniel Vetter wrote:
> > > > > > There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> > > > > > files, and the old proc interface. Two check against
> > > > > > iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> > > > > > this starts to matter, since we don't want random userspace having
> > > > > > access to PCI BARs while a driver is loaded and using it.
> > > > > >
> > > > > > Fix this by adding the same iomem_is_exclusive() check we already have
> > > > > > on the sysfs side in pci_mmap_resource().
> > > > > >
> > > > > > References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > >
> > > > > This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
> > > > > only used in a few places:
> > > > >
> > > > >   e1000_probe() calls pci_request_selected_regions_exclusive(),
> > > > >   ne_pci_probe() calls pci_request_regions_exclusive(),
> > > > >   vmbus_allocate_mmio() calls request_mem_region_exclusive()
> > > > >
> > > > > which raises the question of whether it's worth keeping
> > > > > IORESOURCE_EXCLUSIVE at all.  I'm totally fine with removing it
> > > > > completely.
> > > >
> > > > Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
> > > > IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning so I'd
> > > > be in favor of removing it as well.
> > >
> > > Still has some value since it enforces exclusive access even if the
> > > config isn't enabled, and iirc e1000 had some fun with userspace tools
> > > clobbering the firmware and bricking the chip.
> >
> > There's *some* value; I'm just skeptical since only three drivers use
> > it.
> >
> > IORESOURCE_EXCLUSIVE is from e8de1481fd71 ("resource: allow MMIO
> > exclusivity for device drivers"), and the commit message says this is
> > only active when CONFIG_STRICT_DEVMEM is set.  I didn't check to see
> > whether that's still true.
> >
> > That commit adds a bunch of wrappers and "__"-prefixed functions to
> > pass the IORESOURCE_EXCLUSIVE flag around.  That's a fair bit of
> > uglification for three drivers.
> >
> > > Another thing I kinda wondered, since pci maintainer is here: At least
> > > in drivers/gpu I see very few drivers explicitly requestion regions
> > > (this might be a historical artifact due to the shadow attach stuff
> > > before we had real modesetting drivers). And pci core doesn't do that
> > > either, even when a driver is bound. Is this intentional, or
> > > should/could we do better? Since drivers work happily without
> > > reserving regions I don't think "the drivers need to remember to do
> > > this" will ever really work out well.
> >
> > You're right, many drivers don't call pci_request_regions().  Maybe we
> > could do better, but I haven't looked into that recently.  There is a
> > related note in Documentation/PCI/pci.rst that's been there for a long
> > time (it refers to "pci_request_resources()", which has never existed
> > AFAICT).  I'm certainly open to proposals.
> 
> It seems a bug that the kernel permits MMIO regions with side effects
> to be ioremap()'ed without request_mem_region() on the resource. I
> wonder how much log spam would happen if ioremap() reported whenever a
> non-IORESOURE_BUSY range was passed to it? The current state of
> affairs to trust *remap users to have claimed their remap target seems
> too ingrained to unwind now.

Yeah I think that's hopeless. I think the only feasible approach is if bus
drivers claim resources by default when a driver is bound (it should nest,
so if the driver claims again, I think that should all keep working), just
using the driver name. Probably with some special casing for legacy io
(vgaarb.c should claim these I guess). Still probably tons of fallout.

Once that's rolled out to all bus drivers we could perhaps add the ioremap
check without drowning in log spam. Still a multi-year project I think :-/
-Daniel
diff mbox series

Patch

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@  static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)