mbox series

[v3,0/3] vfio/pci: s390: Fix issues preventing VFIO_PCI_MMAP=y for s390 and enable it

Message ID 20240529-vfio_pci_mmap-v3-0-cd217d019218@linux.ibm.com (mailing list archive)
Headers show
Series vfio/pci: s390: Fix issues preventing VFIO_PCI_MMAP=y for s390 and enable it | expand

Message

Niklas Schnelle May 29, 2024, 11:36 a.m. UTC
With the introduction of memory I/O (MIO) instructions enbaled in commit
71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
gained support for direct user-space access to mapped PCI resources.
Even without those however user-space can access mapped PCI resources
via the s390 specific MMIO syscalls. There is thus nothing fundamentally
preventing s390 from supporting VFIO_PCI_MMAP allowing user-space drivers
to access PCI resources without going through the pread() interface.
To actually enable VFIO_PCI_MMAP a few issues need fixing however.

Firstly the s390 MMIO syscalls do not cause a page fault when
follow_pte() fails due to the page not being present. This breaks
vfio-pci's mmap() handling which lazily maps on first access.

Secondly on s390 there is a virtual PCI device called ISM which has
a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
which leads to any attempt to mmap() it fail with the following message:

    vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size

Even if one tried to map this BAR only partially the mapping would not
be usable on systems with MIO support enabled. So just block mapping
BARs which don't fit between IOREMAP_START and IOREMAP_END.

Note:
For your convenience the code is also available in the tagged
b4/vfio_pci_mmap branch on my git.kernel.org site below:
https: //git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/

Thanks,
Niklas

Link: https://lore.kernel.org/all/c5ba134a1d4f4465b5956027e6a4ea6f6beff969.camel@linux.ibm.com/
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
Changes in v3:
- Rebased on v6.10-rc1 requiring change to follow_pte() call
- Use current->mm for fixup_user_fault() as seems more common
- Collected new trailers
- Link to v2: https://lore.kernel.org/r/20240523-vfio_pci_mmap-v2-0-0dc6c139a4f1@linux.ibm.com

Changes in v2:
- Changed last patch to remove VFIO_PCI_MMAP instead of just enabling it
  for s390 as it is unconditionally true with s390 supporting PCI resource mmap() (Jason)
- Collected R-bs from Jason
- Link to v1: https://lore.kernel.org/r/20240521-vfio_pci_mmap-v1-0-2f6315e0054e@linux.ibm.com

---
Niklas Schnelle (3):
      s390/pci: Fix s390_mmio_read/write syscall page fault handling
      vfio/pci: Tolerate oversized BARs by disallowing mmap
      vfio/pci: Enable PCI resource mmap() on s390 and remove VFIO_PCI_MMAP

 arch/s390/pci/pci_mmio.c         | 18 +++++++++++++-----
 drivers/vfio/pci/Kconfig         |  4 ----
 drivers/vfio/pci/vfio_pci_core.c | 11 ++++++-----
 3 files changed, 19 insertions(+), 14 deletions(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240503-vfio_pci_mmap-1549e3d02ca7

Best regards,

Comments

Christian Borntraeger June 3, 2024, 3:50 p.m. UTC | #1
Am 29.05.24 um 13:36 schrieb Niklas Schnelle:
> With the introduction of memory I/O (MIO) instructions enbaled in commit
> 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> gained support for direct user-space access to mapped PCI resources.
> Even without those however user-space can access mapped PCI resources
> via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> preventing s390 from supporting VFIO_PCI_MMAP allowing user-space drivers
> to access PCI resources without going through the pread() interface.
> To actually enable VFIO_PCI_MMAP a few issues need fixing however.
> 
> Firstly the s390 MMIO syscalls do not cause a page fault when
> follow_pte() fails due to the page not being present. This breaks
> vfio-pci's mmap() handling which lazily maps on first access.
> 
> Secondly on s390 there is a virtual PCI device called ISM which has
> a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> which leads to any attempt to mmap() it fail with the following message:
> 
>      vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> 
> Even if one tried to map this BAR only partially the mapping would not
> be usable on systems with MIO support enabled. So just block mapping
> BARs which don't fit between IOREMAP_START and IOREMAP_END.
> 
> Note:
> For your convenience the code is also available in the tagged
> b4/vfio_pci_mmap branch on my git.kernel.org site below:
> https: //git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/


I guess its now mostly a question of who picks those patches? Alex?

Any patch suitable for stable?
Niklas Schnelle June 4, 2024, 9:27 a.m. UTC | #2
On Mon, 2024-06-03 at 17:50 +0200, Christian Borntraeger wrote:
> Am 29.05.24 um 13:36 schrieb Niklas Schnelle:
> > With the introduction of memory I/O (MIO) instructions enbaled in commit
> > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > gained support for direct user-space access to mapped PCI resources.
> > Even without those however user-space can access mapped PCI resources
> > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > preventing s390 from supporting VFIO_PCI_MMAP allowing user-space drivers
> > to access PCI resources without going through the pread() interface.
> > To actually enable VFIO_PCI_MMAP a few issues need fixing however.
> > 
> > Firstly the s390 MMIO syscalls do not cause a page fault when
> > follow_pte() fails due to the page not being present. This breaks
> > vfio-pci's mmap() handling which lazily maps on first access.
> > 
> > Secondly on s390 there is a virtual PCI device called ISM which has
> > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > which leads to any attempt to mmap() it fail with the following message:
> > 
> >      vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > 
> > Even if one tried to map this BAR only partially the mapping would not
> > be usable on systems with MIO support enabled. So just block mapping
> > BARs which don't fit between IOREMAP_START and IOREMAP_END.
> > 
> > Note:
> > For your convenience the code is also available in the tagged
> > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > https: //git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> 
> 
> I guess its now mostly a question of who picks those patches? Alex?

That matches my understanding as well.

> 
> Any patch suitable for stable?

I'd almost say all but the last one may be candidates for stable. I
found it hard to pinpoint a specific commit they fix though, hence the
lack of Fixes tag. For the first one I'm actually not sure if e.g.
rdma-core users could also run into this problem when they get swapped
out as I'm not sure if the mapping is pinned there.
Niklas Schnelle June 5, 2024, 7:49 a.m. UTC | #3
On Tue, 2024-06-04 at 11:27 +0200, Niklas Schnelle wrote:
> On Mon, 2024-06-03 at 17:50 +0200, Christian Borntraeger wrote:
> > Am 29.05.24 um 13:36 schrieb Niklas Schnelle:
> > > With the introduction of memory I/O (MIO) instructions enbaled in commit
> > > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > > gained support for direct user-space access to mapped PCI resources.
> > > Even without those however user-space can access mapped PCI resources
> > > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > > preventing s390 from supporting VFIO_PCI_MMAP allowing user-space drivers
> > > to access PCI resources without going through the pread() interface.
> > > To actually enable VFIO_PCI_MMAP a few issues need fixing however.
> > > 
> > > Firstly the s390 MMIO syscalls do not cause a page fault when
> > > follow_pte() fails due to the page not being present. This breaks
> > > vfio-pci's mmap() handling which lazily maps on first access.
> > > 
> > > Secondly on s390 there is a virtual PCI device called ISM which has
> > > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > > which leads to any attempt to mmap() it fail with the following message:
> > > 
> > >      vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > > 
> > > Even if one tried to map this BAR only partially the mapping would not
> > > be usable on systems with MIO support enabled. So just block mapping
> > > BARs which don't fit between IOREMAP_START and IOREMAP_END.
> > > 
> > > Note:
> > > For your convenience the code is also available in the tagged
> > > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > > https: //git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
> > 
> > 
> > I guess its now mostly a question of who picks those patches? Alex?
> 
> That matches my understanding as well.
> 
> > 
> > Any patch suitable for stable?
> 
> I'd almost say all but the last one may be candidates for stable. I
> found it hard to pinpoint a specific commit they fix though, hence the
> lack of Fixes tag. For the first one I'm actually not sure if e.g.
> rdma-core users could also run into this problem when they get swapped
> out as I'm not sure if the mapping is pinned there.
> 

Was a bit unclear/wrong above. Obviously MMIO mappings can't be
"swapped out" I should have said "are subject to page faults".
Alex Williamson June 6, 2024, 5:27 p.m. UTC | #4
On Mon, 3 Jun 2024 17:50:13 +0200
Christian Borntraeger <borntraeger@linux.ibm.com> wrote:

> Am 29.05.24 um 13:36 schrieb Niklas Schnelle:
> > With the introduction of memory I/O (MIO) instructions enbaled in commit
> > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > gained support for direct user-space access to mapped PCI resources.
> > Even without those however user-space can access mapped PCI resources
> > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > preventing s390 from supporting VFIO_PCI_MMAP allowing user-space drivers
> > to access PCI resources without going through the pread() interface.
> > To actually enable VFIO_PCI_MMAP a few issues need fixing however.
> > 
> > Firstly the s390 MMIO syscalls do not cause a page fault when
> > follow_pte() fails due to the page not being present. This breaks
> > vfio-pci's mmap() handling which lazily maps on first access.
> > 
> > Secondly on s390 there is a virtual PCI device called ISM which has
> > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > which leads to any attempt to mmap() it fail with the following message:
> > 
> >      vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > 
> > Even if one tried to map this BAR only partially the mapping would not
> > be usable on systems with MIO support enabled. So just block mapping
> > BARs which don't fit between IOREMAP_START and IOREMAP_END.
> > 
> > Note:
> > For your convenience the code is also available in the tagged
> > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > https: //git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/  
> 
> 
> I guess its now mostly a question of who picks those patches? Alex?
> 
> Any patch suitable for stable?

Nothing here looks like stable material to me.  1/ only becomes an
issue when mmap of MMIO is allowed on s390 (ie. 3/), 2/ is generic, but
only really targets a device found on s390, and finally 3/ is
essentially enabling a new feature.

If we expect any conflicts with 1/ in the next merge window I can take
a branch for it and apply 2/ and 3/ through the vfio tree, otherwise I
can bring them all through the vfio tree if the s390 folks agree.
Thanks,

Alex
Alexander Gordeev June 7, 2024, 7:38 a.m. UTC | #5
On Thu, Jun 06, 2024 at 11:27:18AM -0600, Alex Williamson wrote:

Hi Alex,

> If we expect any conflicts with 1/ in the next merge window I can take
> a branch for it and apply 2/ and 3/ through the vfio tree, otherwise I
> can bring them all through the vfio tree if the s390 folks agree.

Yes. Pull it via the vfio tree, please.

> Thanks,
> 
> Alex

Thanks!
Niklas Schnelle June 7, 2024, 7:47 a.m. UTC | #6
On Thu, 2024-06-06 at 11:27 -0600, Alex Williamson wrote:
> On Mon, 3 Jun 2024 17:50:13 +0200
> Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
> 
> > Am 29.05.24 um 13:36 schrieb Niklas Schnelle:
> > > With the introduction of memory I/O (MIO) instructions enbaled in commit
> > > 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") s390
> > > gained support for direct user-space access to mapped PCI resources.
> > > Even without those however user-space can access mapped PCI resources
> > > via the s390 specific MMIO syscalls. There is thus nothing fundamentally
> > > preventing s390 from supporting VFIO_PCI_MMAP allowing user-space drivers
> > > to access PCI resources without going through the pread() interface.
> > > To actually enable VFIO_PCI_MMAP a few issues need fixing however.
> > > 
> > > Firstly the s390 MMIO syscalls do not cause a page fault when
> > > follow_pte() fails due to the page not being present. This breaks
> > > vfio-pci's mmap() handling which lazily maps on first access.
> > > 
> > > Secondly on s390 there is a virtual PCI device called ISM which has
> > > a few oddities. For one it claims to have a 256 TiB PCI BAR (not a typo)
> > > which leads to any attempt to mmap() it fail with the following message:
> > > 
> > >      vmap allocation for size 281474976714752 failed: use vmalloc=<size> to increase size
> > > 
> > > Even if one tried to map this BAR only partially the mapping would not
> > > be usable on systems with MIO support enabled. So just block mapping
> > > BARs which don't fit between IOREMAP_START and IOREMAP_END.
> > > 
> > > Note:
> > > For your convenience the code is also available in the tagged
> > > b4/vfio_pci_mmap branch on my git.kernel.org site below:
> > > https: //git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/  
> > 
> > 
> > I guess its now mostly a question of who picks those patches? Alex?
> > 
> > Any patch suitable for stable?
> 
> Nothing here looks like stable material to me.  1/ only becomes an
> issue when mmap of MMIO is allowed on s390 (ie. 3/), 2/ is generic, but
> only really targets a device found on s390, and finally 3/ is
> essentially enabling a new feature.

I trust your judgement and was unsure too. I think for the
s390_mmio_read/write syscalls the only existing users out there are via
rdma-core, so unless Jason tells us that he thinks they could also be
affected by the lack of page fault handling I see no problem in going
upstream only.

> 
> If we expect any conflicts with 1/ in the next merge window I can take
> a branch for it and apply 2/ and 3/ through the vfio tree, otherwise I
> can bring them all through the vfio tree if the s390 folks agree.
> Thanks,
> 
> Alex
> 

I also agree with this going via the vfio tree. I don't forsee
conflicts with 1.

Thanks,
Niklas
Jason Gunthorpe June 7, 2024, 2:23 p.m. UTC | #7
On Fri, Jun 07, 2024 at 09:47:08AM +0200, Niklas Schnelle wrote:
> I trust your judgement and was unsure too. I think for the
> s390_mmio_read/write syscalls the only existing users out there are via
> rdma-core, so unless Jason tells us that he thinks they could also be
> affected by the lack of page fault handling I see no problem in going
> upstream only.

rdma doesn't use the fault path for the doorbell mmio.

Jason