mbox series

[v6,0/3] vfio/pci: s390: Fix issues preventing VFIO_PCI_MMAP=y for s390 and enable it

Message ID 20250214-vfio_pci_mmap-v6-0-6f300cb63a7e@linux.ibm.com (mailing list archive)
Headers show
Series vfio/pci: s390: Fix issues preventing VFIO_PCI_MMAP=y for s390 and enable it | expand

Message

Niklas Schnelle Feb. 14, 2025, 1:10 p.m. UTC
With the introduction of memory I/O (MIO) instructions enbaled in commit
71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
gained support for direct user-space access to mapped PCI resources.
Even without those however user-space can access mapped PCI resources
via the s390 specific MMIO syscalls. There is thus nothing fundamentally
preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
drivers to access PCI resources without going through the pread()
interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
however.

Firstly the s390 MMIO syscalls do not cause a page fault when
follow_pte() fails due to the page not being present. This breaks
vfio-pci's mmap() handling which lazily maps on first access.

Secondly on s390 there is a virtual PCI device called ISM which has
a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
which leads to any attempt to mmap() it fail with the following message:

    vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size

Even if one tried to map this BAR only partially the mapping would not
be usable on systems with MIO support enabled. So just block mapping
BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
by keeping the vfio-pci mmap() blocking behavior around for this
specific device via a PCI quirk and new pdev->non_mappable_bars
flag.

As noted by Alex Williamson With mmap() enabled in vfio-pci it makes
sense to also enable HAVE_PCI_MMAP with the same restriction for pdev->
non_mappable_bars. So this is added in patch 3 and I tested this with
another small test program.

Note:
For your convenience the code is also available in the tagged
b4/vfio_pci_mmap branch on my git.kernel.org site below:
https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/

Thanks,
Niklas

Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@linux.ibm.com/
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
Changes in v6:
- Add a patch to also enable PCI resource mmap() via sysfs and proc
  exlcluding pdev->non_mappable_bars devices (Alex Williamson)
- Added Acks
- Link to v5: https://lore.kernel.org/r/20250212-vfio_pci_mmap-v5-0-633ca5e056da@linux.ibm.com

Changes in v5:
- Instead of relying on the existing pdev->non_compliant_bars introduce
  a new pdev->non_mappable_bars flag. This replaces the VFIO_PCI_MMAP
  Kconfig option and makes it per-device. This is necessary to not break
  upcoming vfio-pci use of ISM devices (Julian Ruess)
- Squash the removal of VFIO_PCI_MMAP into the second commit as this
  is now where its only use goes away.
- Switch to using follow_pfnmap_start() in MMIO syscall page fault
  handling to match upstream changes
- Dropped R-b's because the changes are significant
- Link to v4: https://lore.kernel.org/r/20240626-vfio_pci_mmap-v4-0-7f038870f022@linux.ibm.com

Changes in v4:
- Overhauled and split up patch 2 which caused errors on ppc due to
  unexported __kernel_io_end. Replaced it with a minimal s390 PCI fixup
  harness to set pdev->non_compliant_bars for ISM plus ignoring devices
  with this flag in vfio-pci. Idea for using PCI quirks came from
  Christoph Hellwig, thanks. Dropped R-bs for patch 2 accordingly.
- Rebased on v6.10-rc5 which includes the vfio-pci mmap fault handler
  fix to the issue I stumbled over independently in v3
- Link to v3: https://lore.kernel.org/r/20240529-vfio_pci_mmap-v3-0-cd217d019218@linux.ibm.com

Changes in v3:
- Rebased on v6.10-rc1 requiring change to follow_pte() call
- Use current->mm for fixup_user_fault() as seems more common
- Collected new trailers
- Link to v2: https://lore.kernel.org/r/20240523-vfio_pci_mmap-v2-0-0dc6c139a4f1@linux.ibm.com

Changes in v2:
- Changed last patch to remove VFIO_PCI_MMAP instead of just enabling it
  for s390 as it is unconditionally true with s390 supporting PCI resource mmap() (Jason)
- Collected R-bs from Jason
- Link to v1: https://lore.kernel.org/r/20240521-vfio_pci_mmap-v1-0-2f6315e0054e@linux.ibm.com

---
Niklas Schnelle (3):
      s390/pci: Fix s390_mmio_read/write syscall page fault handling
      PCI: s390: Support mmap() of BARs and replace VFIO_PCI_MMAP by a device flag
      PCI: s390: Enable HAVE_PCI_MMAP on s390 and restrict mmap() of resources to mappable BARs

 arch/s390/Kconfig                |  4 +---
 arch/s390/include/asm/pci.h      |  4 ++++
 arch/s390/pci/Makefile           |  2 +-
 arch/s390/pci/pci_fixup.c        | 23 +++++++++++++++++++++++
 arch/s390/pci/pci_mmio.c         | 18 +++++++++++++-----
 drivers/pci/pci-sysfs.c          |  4 ++++
 drivers/pci/proc.c               |  4 ++++
 drivers/s390/net/ism_drv.c       |  1 -
 drivers/vfio/pci/Kconfig         |  4 ----
 drivers/vfio/pci/vfio_pci_core.c |  2 +-
 include/linux/pci.h              |  1 +
 include/linux/pci_ids.h          |  1 +
 12 files changed, 53 insertions(+), 15 deletions(-)
---
base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
change-id: 20240503-vfio_pci_mmap-1549e3d02ca7

Best regards,

Comments

Bjorn Helgaas Feb. 24, 2025, 8:53 p.m. UTC | #1
On Fri, Feb 14, 2025 at 02:10:51PM +0100, Niklas Schnelle wrote:
> With the introduction of memory I/O (MIO) instructions enbaled in commit
> 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> gained support for direct user-space access to mapped PCI resources.
> Even without those however user-space can access mapped PCI resources
> via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
> drivers to access PCI resources without going through the pread()
> interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
> however.
> 
> Firstly the s390 MMIO syscalls do not cause a page fault when
> follow_pte() fails due to the page not being present. This breaks
> vfio-pci's mmap() handling which lazily maps on first access.
> 
> Secondly on s390 there is a virtual PCI device called ISM which has
> a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> which leads to any attempt to mmap() it fail with the following message:
> 
>     vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> 
> Even if one tried to map this BAR only partially the mapping would not
> be usable on systems with MIO support enabled. So just block mapping
> BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
> by keeping the vfio-pci mmap() blocking behavior around for this
> specific device via a PCI quirk and new pdev->non_mappable_bars
> flag.
> 
> As noted by Alex Williamson With mmap() enabled in vfio-pci it makes
> sense to also enable HAVE_PCI_MMAP with the same restriction for pdev->
> non_mappable_bars. So this is added in patch 3 and I tested this with
> another small test program.
> 
> Note:
> For your convenience the code is also available in the tagged
> b4/vfio_pci_mmap branch on my git.kernel.org site below:
> https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> 
> Thanks,
> Niklas
> 
> Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@linux.ibm.com/
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
> Changes in v6:
> - Add a patch to also enable PCI resource mmap() via sysfs and proc
>   exlcluding pdev->non_mappable_bars devices (Alex Williamson)
> - Added Acks
> - Link to v5: https://lore.kernel.org/r/20250212-vfio_pci_mmap-v5-0-633ca5e056da@linux.ibm.com

I think the series would be more readable if patch 2/3 included all
the core changes (adding pci_dev.non_mappable_bars, the 3/3
pci-sysfs.c and proc.c changes to test it, and I suppose the similar
vfio_pci_core.c change), and we moved all the s390 content from 2/3 to
3/3.
Niklas Schnelle Feb. 25, 2025, 8:59 a.m. UTC | #2
On Mon, 2025-02-24 at 14:53 -0600, Bjorn Helgaas wrote:
> On Fri, Feb 14, 2025 at 02:10:51PM +0100, Niklas Schnelle wrote:
> > With the introduction of memory I/O (MIO) instructions enbaled in commit
> > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > gained support for direct user-space access to mapped PCI resources.
> > Even without those however user-space can access mapped PCI resources
> > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
> > drivers to access PCI resources without going through the pread()
> > interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
> > however.
> > 
> > Firstly the s390 MMIO syscalls do not cause a page fault when
> > follow_pte() fails due to the page not being present. This breaks
> > vfio-pci's mmap() handling which lazily maps on first access.
> > 
> > Secondly on s390 there is a virtual PCI device called ISM which has
> > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > which leads to any attempt to mmap() it fail with the following message:
> > 
> >     vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > 
> > Even if one tried to map this BAR only partially the mapping would not
> > be usable on systems with MIO support enabled. So just block mapping
> > BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
> > by keeping the vfio-pci mmap() blocking behavior around for this
> > specific device via a PCI quirk and new pdev->non_mappable_bars
> > flag.
> > 
> > As noted by Alex Williamson With mmap() enabled in vfio-pci it makes
> > sense to also enable HAVE_PCI_MMAP with the same restriction for pdev->
> > non_mappable_bars. So this is added in patch 3 and I tested this with
> > another small test program.
> > 
> > Note:
> > For your convenience the code is also available in the tagged
> > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> > 
> > Thanks,
> > Niklas
> > 
> > Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@linux.ibm.com/
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > ---
> > Changes in v6:
> > - Add a patch to also enable PCI resource mmap() via sysfs and proc
> >   exlcluding pdev->non_mappable_bars devices (Alex Williamson)
> > - Added Acks
> > - Link to v5: https://lore.kernel.org/r/20250212-vfio_pci_mmap-v5-0-633ca5e056da@linux.ibm.com
> 
> I think the series would be more readable if patch 2/3 included all
> the core changes (adding pci_dev.non_mappable_bars, the 3/3
> pci-sysfs.c and proc.c changes to test it, and I suppose the similar
> vfio_pci_core.c change), and we moved all the s390 content from 2/3 to
> 3/3.

Maybe we could do the following:

1/3: As is

2/3: Introduces pdev->non_mappable_bars and the checks in vfio and
proc.c/pci-sysfs.c. To make the flag handle the vfio case with
VFIO_PCI_MMAP gone, a one-line change in s390 will set pdev-
>non_mappable_bars = 1 for all PCI devices.

3/3: Changes setting pdev->non_mappable_bars = 1 in s390 to only the
ISM device using the quirk handling and adds HAVE_PCI_MMAP.

Thanks,
Niklas
Bjorn Helgaas Feb. 25, 2025, 8:35 p.m. UTC | #3
On Tue, Feb 25, 2025 at 09:59:13AM +0100, Niklas Schnelle wrote:
> On Mon, 2025-02-24 at 14:53 -0600, Bjorn Helgaas wrote:
> > On Fri, Feb 14, 2025 at 02:10:51PM +0100, Niklas Schnelle wrote:
> > > With the introduction of memory I/O (MIO) instructions enbaled in commit
> > > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > > gained support for direct user-space access to mapped PCI resources.
> > > Even without those however user-space can access mapped PCI resources
> > > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > > preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
> > > drivers to access PCI resources without going through the pread()
> > > interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
> > > however.
> > > 
> > > Firstly the s390 MMIO syscalls do not cause a page fault when
> > > follow_pte() fails due to the page not being present. This breaks
> > > vfio-pci's mmap() handling which lazily maps on first access.
> > > 
> > > Secondly on s390 there is a virtual PCI device called ISM which has
> > > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > > which leads to any attempt to mmap() it fail with the following message:
> > > 
> > >     vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > > 
> > > Even if one tried to map this BAR only partially the mapping would not
> > > be usable on systems with MIO support enabled. So just block mapping
> > > BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
> > > by keeping the vfio-pci mmap() blocking behavior around for this
> > > specific device via a PCI quirk and new pdev->non_mappable_bars
> > > flag.
> > > 
> > > As noted by Alex Williamson With mmap() enabled in vfio-pci it makes
> > > sense to also enable HAVE_PCI_MMAP with the same restriction for pdev->
> > > non_mappable_bars. So this is added in patch 3 and I tested this with
> > > another small test program.
> > > 
> > > Note:
> > > For your convenience the code is also available in the tagged
> > > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> > > 
> > > Thanks,
> > > Niklas
> > > 
> > > Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@linux.ibm.com/
> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > ---
> > > Changes in v6:
> > > - Add a patch to also enable PCI resource mmap() via sysfs and proc
> > >   exlcluding pdev->non_mappable_bars devices (Alex Williamson)
> > > - Added Acks
> > > - Link to v5: https://lore.kernel.org/r/20250212-vfio_pci_mmap-v5-0-633ca5e056da@linux.ibm.com
> > 
> > I think the series would be more readable if patch 2/3 included all
> > the core changes (adding pci_dev.non_mappable_bars, the 3/3
> > pci-sysfs.c and proc.c changes to test it, and I suppose the similar
> > vfio_pci_core.c change), and we moved all the s390 content from 2/3 to
> > 3/3.
> 
> Maybe we could do the following:
> 
> 1/3: As is
> 
> 2/3: Introduces pdev->non_mappable_bars and the checks in vfio and
> proc.c/pci-sysfs.c. To make the flag handle the vfio case with
> VFIO_PCI_MMAP gone, a one-line change in s390 will set pdev-
> >non_mappable_bars = 1 for all PCI devices.

What if you moved the vfio_pci_core.c change to patch 3?  Then I think
patch 2 would do nothing at all (since there's nothing that sets
non_mappable_bars), and all the s390 stuff would be in patch 3?

Not sure if that's possible, but I think it's a little confusing to
have the s390 changes split across patch 2 and 3.

> 3/3: Changes setting pdev->non_mappable_bars = 1 in s390 to only the
> ISM device using the quirk handling and adds HAVE_PCI_MMAP.
> 
> Thanks,
> Niklas
Niklas Schnelle Feb. 26, 2025, 8:28 a.m. UTC | #4
On Tue, 2025-02-25 at 14:35 -0600, Bjorn Helgaas wrote:
> On Tue, Feb 25, 2025 at 09:59:13AM +0100, Niklas Schnelle wrote:
> > On Mon, 2025-02-24 at 14:53 -0600, Bjorn Helgaas wrote:
> > > On Fri, Feb 14, 2025 at 02:10:51PM +0100, Niklas Schnelle wrote:
> > > > With the introduction of memory I/O (MIO) instructions enbaled in commit
> > > > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > > > gained support for direct user-space access to mapped PCI resources.
> > > > Even without those however user-space can access mapped PCI resources
> > > > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > > > preventing s390 from supporting VFIO_PCI_MMAP, allowing user-space
> > > > drivers to access PCI resources without going through the pread()
> > > > interface. To actually enable VFIO_PCI_MMAP a few issues need fixing
> > > > however.
> > > > 
> > > > Firstly the s390 MMIO syscalls do not cause a page fault when
> > > > follow_pte() fails due to the page not being present. This breaks
> > > > vfio-pci's mmap() handling which lazily maps on first access.
> > > > 
> > > > Secondly on s390 there is a virtual PCI device called ISM which has
> > > > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > > > which leads to any attempt to mmap() it fail with the following message:
> > > > 
> > > >     vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > > > 
> > > > Even if one tried to map this BAR only partially the mapping would not
> > > > be usable on systems with MIO support enabled. So just block mapping
> > > > BARs which don't fit between IOREMAP_START and IOREMAP_END. Solve this
> > > > by keeping the vfio-pci mmap() blocking behavior around for this
> > > > specific device via a PCI quirk and new pdev->non_mappable_bars
> > > > flag.
> > > > 
> > > > As noted by Alex Williamson With mmap() enabled in vfio-pci it makes
> > > > sense to also enable HAVE_PCI_MMAP with the same restriction for pdev->
> > > > non_mappable_bars. So this is added in patch 3 and I tested this with
> > > > another small test program.
> > > > 
> > > > Note:
> > > > For your convenience the code is also available in the tagged
> > > > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> > > > 
> > > > Thanks,
> > > > Niklas
> > > > 
> > > > Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@linux.ibm.com/
> > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > > ---
> > > > Changes in v6:
> > > > - Add a patch to also enable PCI resource mmap() via sysfs and proc
> > > >   exlcluding pdev->non_mappable_bars devices (Alex Williamson)
> > > > - Added Acks
> > > > - Link to v5: https://lore.kernel.org/r/20250212-vfio_pci_mmap-v5-0-633ca5e056da@linux.ibm.com
> > > 
> > > I think the series would be more readable if patch 2/3 included all
> > > the core changes (adding pci_dev.non_mappable_bars, the 3/3
> > > pci-sysfs.c and proc.c changes to test it, and I suppose the similar
> > > vfio_pci_core.c change), and we moved all the s390 content from 2/3 to
> > > 3/3.
> > 
> > Maybe we could do the following:
> > 
> > 1/3: As is
> > 
> > 2/3: Introduces pdev->non_mappable_bars and the checks in vfio and
> > proc.c/pci-sysfs.c. To make the flag handle the vfio case with
> > VFIO_PCI_MMAP gone, a one-line change in s390 will set pdev-
> > > non_mappable_bars = 1 for all PCI devices.
> 
> What if you moved the vfio_pci_core.c change to patch 3?  Then I think
> patch 2 would do nothing at all (since there's nothing that sets
> non_mappable_bars), and all the s390 stuff would be in patch 3?
> 
> Not sure if that's possible, but I think it's a little confusing to
> have the s390 changes split across patch 2 and 3.

I'm not really a fan of having a completely unused flag, even in an
intermediate commit. I've edited the commits yesterday and with this
approach the s390 specific part of 2/3 really is just the below hunk: 

diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 88f72745fa59..d14b8605a32c 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -590,6 +590,7 @@ int pcibios_device_add(struct pci_dev *pdev)
        zpci_zdev_get(zdev);
        if (pdev->is_physfn)
                pdev->no_vf_scan = 1;
+       pdev->non_mappable_bars = 1;

        zpci_map_resources(pdev);


That added line then gets deleted again in 3/3. I think this makes it
pretty logical, with patch 2/3 we set it for all PCI devices giving the
existing behavior and by pdev->non_mappable_bars replacing the "y if
S390" of VFIO_PCI_MMAP, then 3/3 narrows it down to just the ISM
device.

Thanks,
Niklas