Message ID | 20210204165831.2703772-3-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | pci sysfs file iomem revoke support | expand |
I see I already acked this, but if you haven't merged it yet there are a few typos in the commit log: On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote: > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims > the region") /dev/kmem zaps ptes when the kernel requests exclusive > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is > the default for all driver uses. s/ptes/PTEs/ > Except there's two more ways to access PCI BARs: sysfs and proc mmap > support. Let's plug that hole. s/there's two/there are two/ > For revoke_devmem() to work we need to link our vma into the same > address_space, with consistent vma->vm_pgoff. ->pgoff is already > adjusted, because that's how (io_)remap_pfn_range works, but for the > mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is > to adjust this at at ->open time: > > - for sysfs this is easy, now that binary attributes support this. We > just set bin_attr->mapping when mmap is supported > - for procfs it's a bit more tricky, since procfs pci access has only > one file per device, and access to a specific resources first needs > to be set up with some ioctl calls. But mmap is only supported for > the same resources as sysfs exposes with mmap support, and otherwise > rejected, so we can set the mapping unconditionally at open time > without harm. s/pci access/PCI access/ s/a specific resources/a specific resource/ > A special consideration is for arch_can_pci_mmap_io() - we need to > make sure that the ->f_mapping doesn't alias between ioport and iomem > space. There's only 2 ways in-tree to support mmap of ioports: generic > pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single > architecture hand-rolling. Both approach support ioport mmap through a > special pfn range and not through magic pte attributes. Aliasing is > therefore not a problem. s/There's only 2/There are only two/ s/pci mmap/PCI mmap/ s/Both approach/Both approaches/ s/pfn/PFN/ s/pte/PTE/ > The only difference in access checks left is that sysfs PCI mmap does > not check for CAP_RAWIO. I'm not really sure whether that should be > added or not. > > Acked-by: Bjorn Helgaas <bhelgaas@google.com> > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Cc: Stephen Rothwell <sfr@canb.auug.org.au> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Kees Cook <keescook@chromium.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Jérôme Glisse <jglisse@redhat.com> > Cc: Jan Kara <jack@suse.cz> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: linux-mm@kvack.org > Cc: linux-arm-kernel@lists.infradead.org > Cc: linux-samsung-soc@vger.kernel.org > Cc: linux-media@vger.kernel.org > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pci-sysfs.c | 4 ++++ > drivers/pci/proc.c | 1 + > 2 files changed, 5 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 0c45b4f7b214..f8afd54ca3e1 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -942,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b) > b->legacy_io->read = pci_read_legacy_io; > b->legacy_io->write = pci_write_legacy_io; > b->legacy_io->mmap = pci_mmap_legacy_io; > + b->legacy_io->mapping = iomem_get_mapping(); > pci_adjust_legacy_attr(b, pci_mmap_io); > error = device_create_bin_file(&b->dev, b->legacy_io); > if (error) > @@ -954,6 +955,7 @@ void pci_create_legacy_files(struct pci_bus *b) > b->legacy_mem->size = 1024*1024; > b->legacy_mem->attr.mode = 0600; > b->legacy_mem->mmap = pci_mmap_legacy_mem; > + b->legacy_io->mapping = iomem_get_mapping(); > pci_adjust_legacy_attr(b, pci_mmap_mem); > error = device_create_bin_file(&b->dev, b->legacy_mem); > if (error) > @@ -1169,6 +1171,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine) > res_attr->mmap = pci_mmap_resource_uc; > } > } > + if (res_attr->mmap) > + res_attr->mapping = iomem_get_mapping(); > res_attr->attr.name = res_attr_name; > res_attr->attr.mode = 0600; > res_attr->size = pci_resource_len(pdev, num); > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c > index 3a2f90beb4cb..9bab07302bbf 100644 > --- a/drivers/pci/proc.c > +++ b/drivers/pci/proc.c > @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file) > fpriv->write_combine = 0; > > file->private_data = fpriv; > + file->f_mapping = iomem_get_mapping(); > > return 0; > } > -- > 2.30.0 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
[+cc Krzysztof, Pali, Oliver] On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote: > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims > the region") /dev/kmem zaps ptes when the kernel requests exclusive > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is > the default for all driver uses. > > Except there's two more ways to access PCI BARs: sysfs and proc mmap > support. Let's plug that hole. IIUC, the idea is that if a driver calls request_mem_region() on a PCI BAR, we prevent access to the BAR via sysfs. I guess I'm OK with that if it's a real security improvement or something. But the downside of this implementation is that it depends on iomem_get_mapping(), which doesn't work until after fs_initcalls, which means the sysfs files cannot be static attributes of devices added before that. PCI devices are typically enumerated in subsys_initcall. Krzysztof is converting PCI sysfs files (config, rom, reset, vpd, etc) to static attributes. This is a major improvement that could get rid of pci_create_sysfs_dev_files(), the late_initcall pci_sysfs_init(), and the "sysfs_initialized" hack. This would fix a race reported by Pali [1] (thanks to Oliver for the idea [2]). EXCEPT that this revoke change means the "resource%d", "legacy_io", and "legacy_mem" files cannot be static attributes because of iomem_get_mapping(). Any ideas on how to deal with this? Having to keep the pci_sysfs_init() initcall just for these few files seems like the tail wagging the dog. [1] https://lore.kernel.org/r/20200716110423.xtfyb3n6tn5ixedh@pali [2] https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqaA@mail.gmail.com > For revoke_devmem() to work we need to link our vma into the same > address_space, with consistent vma->vm_pgoff. ->pgoff is already > adjusted, because that's how (io_)remap_pfn_range works, but for the > mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is > to adjust this at at ->open time: > > - for sysfs this is easy, now that binary attributes support this. We > just set bin_attr->mapping when mmap is supported > - for procfs it's a bit more tricky, since procfs pci access has only > one file per device, and access to a specific resources first needs > to be set up with some ioctl calls. But mmap is only supported for > the same resources as sysfs exposes with mmap support, and otherwise > rejected, so we can set the mapping unconditionally at open time > without harm. > > A special consideration is for arch_can_pci_mmap_io() - we need to > make sure that the ->f_mapping doesn't alias between ioport and iomem > space. There's only 2 ways in-tree to support mmap of ioports: generic > pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single > architecture hand-rolling. Both approach support ioport mmap through a > special pfn range and not through magic pte attributes. Aliasing is > therefore not a problem. > > The only difference in access checks left is that sysfs PCI mmap does > not check for CAP_RAWIO. I'm not really sure whether that should be > added or not. > > Acked-by: Bjorn Helgaas <bhelgaas@google.com> > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Cc: Stephen Rothwell <sfr@canb.auug.org.au> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Kees Cook <keescook@chromium.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Jérôme Glisse <jglisse@redhat.com> > Cc: Jan Kara <jack@suse.cz> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: linux-mm@kvack.org > Cc: linux-arm-kernel@lists.infradead.org > Cc: linux-samsung-soc@vger.kernel.org > Cc: linux-media@vger.kernel.org > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pci-sysfs.c | 4 ++++ > drivers/pci/proc.c | 1 + > 2 files changed, 5 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 0c45b4f7b214..f8afd54ca3e1 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -942,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b) > b->legacy_io->read = pci_read_legacy_io; > b->legacy_io->write = pci_write_legacy_io; > b->legacy_io->mmap = pci_mmap_legacy_io; > + b->legacy_io->mapping = iomem_get_mapping(); > pci_adjust_legacy_attr(b, pci_mmap_io); > error = device_create_bin_file(&b->dev, b->legacy_io); > if (error) > @@ -954,6 +955,7 @@ void pci_create_legacy_files(struct pci_bus *b) > b->legacy_mem->size = 1024*1024; > b->legacy_mem->attr.mode = 0600; > b->legacy_mem->mmap = pci_mmap_legacy_mem; > + b->legacy_io->mapping = iomem_get_mapping(); > pci_adjust_legacy_attr(b, pci_mmap_mem); > error = device_create_bin_file(&b->dev, b->legacy_mem); > if (error) > @@ -1169,6 +1171,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine) > res_attr->mmap = pci_mmap_resource_uc; > } > } > + if (res_attr->mmap) > + res_attr->mapping = iomem_get_mapping(); > res_attr->attr.name = res_attr_name; > res_attr->attr.mode = 0600; > res_attr->size = pci_resource_len(pdev, num); > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c > index 3a2f90beb4cb..9bab07302bbf 100644 > --- a/drivers/pci/proc.c > +++ b/drivers/pci/proc.c > @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file) > fpriv->write_combine = 0; > > file->private_data = fpriv; > + file->f_mapping = iomem_get_mapping(); > > return 0; > } > -- > 2.30.0 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Sat, Mar 13, 2021 at 10:57 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > [+cc Krzysztof, Pali, Oliver] > > On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote: > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims > > the region") /dev/kmem zaps ptes when the kernel requests exclusive > > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is > > the default for all driver uses. > > > > Except there's two more ways to access PCI BARs: sysfs and proc mmap > > support. Let's plug that hole. > > IIUC, the idea is that if a driver calls request_mem_region() on a PCI > BAR, we prevent access to the BAR via sysfs. I guess I'm OK with that > if it's a real security improvement or something. Yup. > But the downside of this implementation is that it depends on > iomem_get_mapping(), which doesn't work until after fs_initcalls, > which means the sysfs files cannot be static attributes of devices > added before that. PCI devices are typically enumerated in > subsys_initcall. > > Krzysztof is converting PCI sysfs files (config, rom, reset, vpd, etc) > to static attributes. This is a major improvement that could get rid > of pci_create_sysfs_dev_files(), the late_initcall pci_sysfs_init(), > and the "sysfs_initialized" hack. This would fix a race reported by > Pali [1] (thanks to Oliver for the idea [2]). > > EXCEPT that this revoke change means the "resource%d", "legacy_io", > and "legacy_mem" files cannot be static attributes because of > iomem_get_mapping(). > > Any ideas on how to deal with this? Having to keep the > pci_sysfs_init() initcall just for these few files seems like the tail > wagging the dog. It's a bit "pick your ugly". Either we have the late init call (not pretty), or the sysfs side needs a callback to fish out the address_space for the mmap at open() time, which didn't stir up much enthusiams with Greg because we need a new callback just for these mmio files. Either approach works. -Daniel > [1] https://lore.kernel.org/r/20200716110423.xtfyb3n6tn5ixedh@pali > [2] https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqaA@mail.gmail.com > > > For revoke_devmem() to work we need to link our vma into the same > > address_space, with consistent vma->vm_pgoff. ->pgoff is already > > adjusted, because that's how (io_)remap_pfn_range works, but for the > > mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is > > to adjust this at at ->open time: > > > > - for sysfs this is easy, now that binary attributes support this. We > > just set bin_attr->mapping when mmap is supported > > - for procfs it's a bit more tricky, since procfs pci access has only > > one file per device, and access to a specific resources first needs > > to be set up with some ioctl calls. But mmap is only supported for > > the same resources as sysfs exposes with mmap support, and otherwise > > rejected, so we can set the mapping unconditionally at open time > > without harm. > > > > A special consideration is for arch_can_pci_mmap_io() - we need to > > make sure that the ->f_mapping doesn't alias between ioport and iomem > > space. There's only 2 ways in-tree to support mmap of ioports: generic > > pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single > > architecture hand-rolling. Both approach support ioport mmap through a > > special pfn range and not through magic pte attributes. Aliasing is > > therefore not a problem. > > > > The only difference in access checks left is that sysfs PCI mmap does > > not check for CAP_RAWIO. I'm not really sure whether that should be > > added or not. > > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com> > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > Cc: Stephen Rothwell <sfr@canb.auug.org.au> > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Dan Williams <dan.j.williams@intel.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: John Hubbard <jhubbard@nvidia.com> > > Cc: Jérôme Glisse <jglisse@redhat.com> > > Cc: Jan Kara <jack@suse.cz> > > Cc: Dan Williams <dan.j.williams@intel.com> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > Cc: linux-mm@kvack.org > > Cc: linux-arm-kernel@lists.infradead.org > > Cc: linux-samsung-soc@vger.kernel.org > > Cc: linux-media@vger.kernel.org > > Cc: Bjorn Helgaas <bhelgaas@google.com> > > Cc: linux-pci@vger.kernel.org > > --- > > drivers/pci/pci-sysfs.c | 4 ++++ > > drivers/pci/proc.c | 1 + > > 2 files changed, 5 insertions(+) > > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > > index 0c45b4f7b214..f8afd54ca3e1 100644 > > --- a/drivers/pci/pci-sysfs.c > > +++ b/drivers/pci/pci-sysfs.c > > @@ -942,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b) > > b->legacy_io->read = pci_read_legacy_io; > > b->legacy_io->write = pci_write_legacy_io; > > b->legacy_io->mmap = pci_mmap_legacy_io; > > + b->legacy_io->mapping = iomem_get_mapping(); > > pci_adjust_legacy_attr(b, pci_mmap_io); > > error = device_create_bin_file(&b->dev, b->legacy_io); > > if (error) > > @@ -954,6 +955,7 @@ void pci_create_legacy_files(struct pci_bus *b) > > b->legacy_mem->size = 1024*1024; > > b->legacy_mem->attr.mode = 0600; > > b->legacy_mem->mmap = pci_mmap_legacy_mem; > > + b->legacy_io->mapping = iomem_get_mapping(); > > pci_adjust_legacy_attr(b, pci_mmap_mem); > > error = device_create_bin_file(&b->dev, b->legacy_mem); > > if (error) > > @@ -1169,6 +1171,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine) > > res_attr->mmap = pci_mmap_resource_uc; > > } > > } > > + if (res_attr->mmap) > > + res_attr->mapping = iomem_get_mapping(); > > res_attr->attr.name = res_attr_name; > > res_attr->attr.mode = 0600; > > res_attr->size = pci_resource_len(pdev, num); > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c > > index 3a2f90beb4cb..9bab07302bbf 100644 > > --- a/drivers/pci/proc.c > > +++ b/drivers/pci/proc.c > > @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file) > > fpriv->write_combine = 0; > > > > file->private_data = fpriv; > > + file->f_mapping = iomem_get_mapping(); > > > > return 0; > > } > > -- > > 2.30.0 > > > > > > _______________________________________________ > > linux-arm-kernel mailing list > > linux-arm-kernel@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 0c45b4f7b214..f8afd54ca3e1 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -942,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_io->read = pci_read_legacy_io; b->legacy_io->write = pci_write_legacy_io; b->legacy_io->mmap = pci_mmap_legacy_io; + b->legacy_io->mapping = iomem_get_mapping(); pci_adjust_legacy_attr(b, pci_mmap_io); error = device_create_bin_file(&b->dev, b->legacy_io); if (error) @@ -954,6 +955,7 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_mem->size = 1024*1024; b->legacy_mem->attr.mode = 0600; b->legacy_mem->mmap = pci_mmap_legacy_mem; + b->legacy_io->mapping = iomem_get_mapping(); pci_adjust_legacy_attr(b, pci_mmap_mem); error = device_create_bin_file(&b->dev, b->legacy_mem); if (error) @@ -1169,6 +1171,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine) res_attr->mmap = pci_mmap_resource_uc; } } + if (res_attr->mmap) + res_attr->mapping = iomem_get_mapping(); res_attr->attr.name = res_attr_name; res_attr->attr.mode = 0600; res_attr->size = pci_resource_len(pdev, num); diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index 3a2f90beb4cb..9bab07302bbf 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file) fpriv->write_combine = 0; file->private_data = fpriv; + file->f_mapping = iomem_get_mapping(); return 0; }