
[v4,0/9] vhost-user: Add SHMEM_MAP/UNMAP requests

Message ID 20250217164012.246727-1-aesteve@redhat.com (mailing list archive)

Message

Albert Esteve Feb. 17, 2025, 4:40 p.m. UTC
Hi all,

v3->v4
- Change mmap strategy to use RAM blocks
  and subregions.
- Add new bitfield to qmp feature map
- Followed most review comments from
  last iteration.
- Merged documentation patch again with
  this one. Makes more sense to
  review them together after all.
- Add documentation for MEM_READ/WRITE
  messages.

The goal of this patch is to support
dynamic fd-backed memory maps initiated
from vhost-user backends.
There are many devices that could already
benefit from this feature, e.g.,
virtiofs or virtio-gpu.

After receiving the SHMEM_MAP/UNMAP request,
the frontend creates the RAMBlock from the
fd and maps it by adding it as a subregion
of the shared memory region container.
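
As an illustration, here is a minimal sketch of what that frontend-side
handling could look like, assuming QEMU's memory_region_init_ram_from_fd()
and memory_region_add_subregion() APIs; the handler name, arguments and
error handling below are illustrative, not the exact code in this series:

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "qapi/error.h"

/* Hypothetical handler: map `fd` at `shm_offset` inside the device's
 * VIRTIO Shared Memory Region container (sketch only). */
static int shmem_map_sketch(MemoryRegion *shmem_container, int fd,
                            uint64_t fd_offset, uint64_t shm_offset,
                            uint64_t len, Error **errp)
{
    MemoryRegion *mr = g_new0(MemoryRegion, 1);

    /* Create a RAMBlock backed by the fd sent by the backend. */
    if (!memory_region_init_ram_from_fd(mr, NULL, "vhost-user-shmem-map",
                                        len, RAM_SHARED, fd, fd_offset,
                                        errp)) {
        g_free(mr);
        return -1;
    }

    /* Expose it by adding it as a subregion of the shared memory region. */
    memory_region_add_subregion(shmem_container, shm_offset, mr);
    return 0;
}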

The VIRTIO Shared Memory Region list is
declared in the `VirtIODevice` struct
to make it generic.
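
For reference, a hypothetical shape of that per-device list; field and
type names are assumptions for illustration, not the series' actual
declarations:

#include "qemu/osdep.h"
#include "qemu/queue.h"
#include "exec/memory.h"

/* One entry per VIRTIO Shared Memory Region (illustrative only). */
typedef struct VirtioSharedMemory {
    uint8_t shmid;           /* region ID exposed to the driver */
    MemoryRegion mr;         /* container; SHMEM_MAP adds subregions to it */
    QSIMPLEQ_ENTRY(VirtioSharedMemory) entry;
} VirtioSharedMemory;

/* Inside VirtIODevice (sketch), the list that keeps this generic:
 *
 *     QSIMPLEQ_HEAD(, VirtioSharedMemory) shmem_list;
 */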

TODO: There was a conversation on the
previous version around adding tests
to the patch (which I have acknowledged).
However, given the numerous changes
that the patch already has, I have
decided to send it early and collect
some feedback while I work on the
tests for the next iteration.
Given that I have been able to
test the implementation with
my local setup, I am more or less
confident that, at least, the code
is in a relatively sane state
so that no reviewing time is
wasted on broken patches.

This patch also includes:
- SHMEM_CONFIG frontend request, which is
specifically meant to allow the generic
vhost-user-device frontend to query VIRTIO
Shared Memory settings from the backend (as
this device is generic and agnostic of the
actual backend configuration); see the first
sketch after this list.

- MEM_READ/WRITE backend requests, added to
deal with a potential issue when multiple
backends share a file descriptor. When a
backend calls SHMEM_MAP, accesses to the
region fail for the other backends, as it is
missing from their translation tables. These
requests are a fallback for when vhost-user
memory translation fails; see the second
sketch after this list.
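
A sketch of a possible reply payload for the SHMEM_CONFIG request; field
names and the region bound are assumptions for illustration, not the wire
format defined in the series:

#include <stdint.h>

#define VU_SHMEM_MAX_REGIONS 256   /* assumed bound, sketch only */

/* The backend reports how many VIRTIO Shared Memory Regions it needs and
 * the size of each one, indexed by shmid. */
typedef struct VhostUserShMemConfigSketch {
    uint32_t nregions;
    uint32_t padding;
    uint64_t memory_sizes[VU_SHMEM_MAX_REGIONS];
} VhostUserShMemConfigSketch;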
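
And a sketch of a possible payload for the MEM_READ/MEM_WRITE fallback,
where the backend asks the frontend to perform the access on its behalf
when it cannot translate a guest address itself; again, field names and
the inline buffer size are assumptions for illustration:

#include <stdint.h>

typedef struct VhostUserMemRWMsgSketch {
    uint64_t guest_address;  /* guest address the backend failed to translate */
    uint32_t size;           /* number of bytes to read or write */
    uint8_t  data[256];      /* filled by the frontend for MEM_READ,
                                provided by the backend for MEM_WRITE */
} VhostUserMemRWMsgSketch;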

Albert Esteve (9):
  vhost-user: Add VirtIO Shared Memory map request
  vhost_user.rst: Align VhostUserMsg excerpt members
  vhost_user.rst: Add SHMEM_MAP/_UNMAP to spec
  vhost_user: Add frontend get_shmem_config command
  vhost_user.rst: Add GET_SHMEM_CONFIG message
  qmp: add shmem feature map
  vhost-user-device: Add shmem BAR
  vhost_user: Add mem_read/write backend requests
  vhost_user.rst: Add MEM_READ/WRITE messages

 docs/interop/vhost-user.rst               | 110 +++++++++
 hw/virtio/vhost-user-base.c               |  47 +++-
 hw/virtio/vhost-user-device-pci.c         |  36 ++-
 hw/virtio/vhost-user.c                    | 272 ++++++++++++++++++++--
 hw/virtio/virtio-qmp.c                    |   3 +
 hw/virtio/virtio.c                        |  81 +++++++
 include/hw/virtio/vhost-backend.h         |   9 +
 include/hw/virtio/vhost-user.h            |   1 +
 include/hw/virtio/virtio.h                |  29 +++
 subprojects/libvhost-user/libvhost-user.c | 160 +++++++++++++
 subprojects/libvhost-user/libvhost-user.h |  92 ++++++++
 11 files changed, 813 insertions(+), 27 deletions(-)

Comments

David Hildenbrand Feb. 17, 2025, 8:01 p.m. UTC | #1
On 17.02.25 17:40, Albert Esteve wrote:
> Hi all,
> 

Hi,

looks like our debugging session was successful :)

One question below.

> v3->v4
> - Change mmap strategy to use RAM blocks
>    and subregions.
> - Add new bitfield to qmp feature map
> - Followed most review comments from
>    last iteration.
> - Merged documentation patch again with
>    this one. Makes more sense to
>    review them together after all.
> - Add documentation for MEM_READ/WRITE
>    messages.
> 
> The goal of this patch is to support
> dynamic fd-backed memory maps initiated
> from vhost-user backends.
> There are many devices that could already
> benefit of this feature, e.g.,
> virtiofs or virtio-gpu.
> 
> After receiving the SHMEM_MAP/UNMAP request,
> the frontend creates the RAMBlock form the
> fd and maps it by adding it as a subregion
> of the shared memory region container.
> 
> The VIRTIO Shared Memory Region list is
> declared in the `VirtIODevice` struct
> to make it generic.
> 
> TODO: There was a conversation on the
> previous version around adding tests
> to the patch (which I have acknowledged).
> However, given the numerous changes
> that the patch already has, I have
> decided to send it early and collect
> some feedback while I work on the
> tests for the next iteration.
> Given that I have been able to
> test the implementation with
> my local setup, I am more or less
> confident that, at least, the code
> is in a relatively sane state
> so that no reviewing time is
> wasted on broken patches.
> 
> This patch also includes:
> - SHMEM_CONFIG frontend request that is
> specifically meant to allow generic
> vhost-user-device frontend to be able to
> query VIRTIO Shared Memory settings from the
> backend (as this device is generic and agnostic
> of the actual backend configuration).
> 
> - MEM_READ/WRITE backend requests are
> added to deal with a potential issue when having
> multiple backends sharing a file descriptor.
> When a backend calls SHMEM_MAP it makes
> accessing to the region fail for other
> backend as it is missing from their translation
> table. So these requests are a fallback
> for vhost-user memory translation fails.

Can you elaborate what the issue here is?

Why would SHMEM_MAP make accessing the region fail for other backends -- 
what makes this missing from their translation?
Albert Esteve Feb. 24, 2025, 8:54 a.m. UTC | #2
On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 17.02.25 17:40, Albert Esteve wrote:
> > Hi all,
> >
>
> Hi,
>
> looks like our debugging session was successfu :)
>
> One question below.
>
> > v3->v4
> > - Change mmap strategy to use RAM blocks
> >    and subregions.
> > - Add new bitfield to qmp feature map
> > - Followed most review comments from
> >    last iteration.
> > - Merged documentation patch again with
> >    this one. Makes more sense to
> >    review them together after all.
> > - Add documentation for MEM_READ/WRITE
> >    messages.
> >
> > The goal of this patch is to support
> > dynamic fd-backed memory maps initiated
> > from vhost-user backends.
> > There are many devices that could already
> > benefit of this feature, e.g.,
> > virtiofs or virtio-gpu.
> >
> > After receiving the SHMEM_MAP/UNMAP request,
> > the frontend creates the RAMBlock form the
> > fd and maps it by adding it as a subregion
> > of the shared memory region container.
> >
> > The VIRTIO Shared Memory Region list is
> > declared in the `VirtIODevice` struct
> > to make it generic.
> >
> > TODO: There was a conversation on the
> > previous version around adding tests
> > to the patch (which I have acknowledged).
> > However, given the numerous changes
> > that the patch already has, I have
> > decided to send it early and collect
> > some feedback while I work on the
> > tests for the next iteration.
> > Given that I have been able to
> > test the implementation with
> > my local setup, I am more or less
> > confident that, at least, the code
> > is in a relatively sane state
> > so that no reviewing time is
> > wasted on broken patches.
> >
> > This patch also includes:
> > - SHMEM_CONFIG frontend request that is
> > specifically meant to allow generic
> > vhost-user-device frontend to be able to
> > query VIRTIO Shared Memory settings from the
> > backend (as this device is generic and agnostic
> > of the actual backend configuration).
> >
> > - MEM_READ/WRITE backend requests are
> > added to deal with a potential issue when having
> > multiple backends sharing a file descriptor.
> > When a backend calls SHMEM_MAP it makes
> > accessing to the region fail for other
> > backend as it is missing from their translation
> > table. So these requests are a fallback
> > for vhost-user memory translation fails.
>
> Can you elaborate what the issue here is?
>
> Why would SHMEM_MAP make accessing the region fail for other backends --
> what makes this missing from their translation?

This issue was raised by Stefan Hajnoczi in one of the first
iterations of this patchset, based upon David Gilbert's previous work
on the virtiofs DAX Window.

Let me paste here some of his remarks:

"""
Other backends don't see these mappings. If the guest submits a vring
descriptor referencing a mapping to another backend, then that backend
won't be able to access this memory.
"""
[...]
"""
A bit more detail:

Device A has a VIRTIO Shared Memory Region. An application mmaps that
memory (examples: guest userspace driver using Linux VFIO, a guest
kernel driver that exposes the memory to userspace via mmap, or guest
kernel DAX). The application passes that memory as an I/O buffer to
device B (e.g. O_DIRECT disk I/O).

The result is that device B's vhost-user backend receives a vring
descriptor that points to a guest memory address in device A's VIRTIO
Shared Memory Region. Since device B does not have this memory in its
table, it cannot translate the address and the device breaks.
"""

I have not triggered the issue myself. So the idea is that the next
version will *definitely* include some testing for the commits that I
cannot verify with my local setup.

BR,
Albert.

>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 24, 2025, 9:16 a.m. UTC | #3
On 24.02.25 09:54, Albert Esteve wrote:
> On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 17.02.25 17:40, Albert Esteve wrote:
>>> Hi all,
>>>
>>
>> Hi,
>>
>> looks like our debugging session was successfu :)
>>
>> One question below.
>>
>>> v3->v4
>>> - Change mmap strategy to use RAM blocks
>>>     and subregions.
>>> - Add new bitfield to qmp feature map
>>> - Followed most review comments from
>>>     last iteration.
>>> - Merged documentation patch again with
>>>     this one. Makes more sense to
>>>     review them together after all.
>>> - Add documentation for MEM_READ/WRITE
>>>     messages.
>>>
>>> The goal of this patch is to support
>>> dynamic fd-backed memory maps initiated
>>> from vhost-user backends.
>>> There are many devices that could already
>>> benefit of this feature, e.g.,
>>> virtiofs or virtio-gpu.
>>>
>>> After receiving the SHMEM_MAP/UNMAP request,
>>> the frontend creates the RAMBlock form the
>>> fd and maps it by adding it as a subregion
>>> of the shared memory region container.
>>>
>>> The VIRTIO Shared Memory Region list is
>>> declared in the `VirtIODevice` struct
>>> to make it generic.
>>>
>>> TODO: There was a conversation on the
>>> previous version around adding tests
>>> to the patch (which I have acknowledged).
>>> However, given the numerous changes
>>> that the patch already has, I have
>>> decided to send it early and collect
>>> some feedback while I work on the
>>> tests for the next iteration.
>>> Given that I have been able to
>>> test the implementation with
>>> my local setup, I am more or less
>>> confident that, at least, the code
>>> is in a relatively sane state
>>> so that no reviewing time is
>>> wasted on broken patches.
>>>
>>> This patch also includes:
>>> - SHMEM_CONFIG frontend request that is
>>> specifically meant to allow generic
>>> vhost-user-device frontend to be able to
>>> query VIRTIO Shared Memory settings from the
>>> backend (as this device is generic and agnostic
>>> of the actual backend configuration).
>>>
>>> - MEM_READ/WRITE backend requests are
>>> added to deal with a potential issue when having
>>> multiple backends sharing a file descriptor.
>>> When a backend calls SHMEM_MAP it makes
>>> accessing to the region fail for other
>>> backend as it is missing from their translation
>>> table. So these requests are a fallback
>>> for vhost-user memory translation fails.
>>
>> Can you elaborate what the issue here is?
>>
>> Why would SHMEM_MAP make accessing the region fail for other backends --
>> what makes this missing from their translation?
> 
> This issue was raised by Stefan Hajnoczi in one of the first
> iterations of this patchset, based upon previous David Gilbert's work
> on the virtiofs DAX Window.
> 
> Let me paste here some of his remarks:
> 
> """
> Other backends don't see these mappings. If the guest submits a vring
> descriptor referencing a mapping to another backend, then that backend
> won't be able to access this memory.
> """
> [...]
> """
> A bit more detail:
> 
> Device A has a VIRTIO Shared Memory Region. An application mmaps that
> memory (examples: guest userspace driver using Linux VFIO, a guest
> kernel driver that exposes the memory to userspace via mmap, or guest
> kernel DAX). The application passes that memory as an I/O buffer to
> device B (e.g. O_DIRECT disk I/O).
> 
> The result is that device B's vhost-user backend receives a vring
> descriptor that points to a guest memory address in device A's VIRTIO
> Shared Memory Region. Since device B does not have this memory in its
> table, it cannot translate the address and the device breaks.
> """
> 
> I have not triggered the issue myself. So the idea is that the next
> patch will *definitively* include some testing for the commits that I
> cannot verify with my local setup.

Hah! But isn't that exactly the problem which is now solved by our rework?

Whatever is mapped in the VIRTIO Shared Memory Region will be 
communicated to all other vhost-user devices. So they should have that 
memory in their map and should be able to access it.

The only thing vhost-user devices cannot access are, IIRC, ram_device_ptr 
memory regions (e.g., from vfio devices). But that is independent of 
shared memory regions.
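
For context, the mechanism being referred to, heavily simplified from
QEMU's existing vhost memory listener (hw/virtio/vhost.c); this is a
sketch of current behavior, not code added by this series. Each vhost
device registers a MemoryListener, so an address space change -- such as
a new subregion in a VIRTIO Shared Memory Region container -- ends with
every vhost-user backend receiving an updated memory table:

#include "qemu/osdep.h"
#include "hw/virtio/vhost.h"

static void vhost_commit_sketch(MemoryListener *listener)
{
    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                         memory_listener);

    if (!dev->started) {
        return;
    }

    /* For vhost-user backends this results in VHOST_USER_SET_MEM_TABLE. */
    dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
}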
Albert Esteve Feb. 24, 2025, 9:35 a.m. UTC | #4
On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.02.25 09:54, Albert Esteve wrote:
> > On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 17.02.25 17:40, Albert Esteve wrote:
> >>> Hi all,
> >>>
> >>
> >> Hi,
> >>
> >> looks like our debugging session was successfu :)
> >>
> >> One question below.
> >>
> >>> v3->v4
> >>> - Change mmap strategy to use RAM blocks
> >>>     and subregions.
> >>> - Add new bitfield to qmp feature map
> >>> - Followed most review comments from
> >>>     last iteration.
> >>> - Merged documentation patch again with
> >>>     this one. Makes more sense to
> >>>     review them together after all.
> >>> - Add documentation for MEM_READ/WRITE
> >>>     messages.
> >>>
> >>> The goal of this patch is to support
> >>> dynamic fd-backed memory maps initiated
> >>> from vhost-user backends.
> >>> There are many devices that could already
> >>> benefit of this feature, e.g.,
> >>> virtiofs or virtio-gpu.
> >>>
> >>> After receiving the SHMEM_MAP/UNMAP request,
> >>> the frontend creates the RAMBlock form the
> >>> fd and maps it by adding it as a subregion
> >>> of the shared memory region container.
> >>>
> >>> The VIRTIO Shared Memory Region list is
> >>> declared in the `VirtIODevice` struct
> >>> to make it generic.
> >>>
> >>> TODO: There was a conversation on the
> >>> previous version around adding tests
> >>> to the patch (which I have acknowledged).
> >>> However, given the numerous changes
> >>> that the patch already has, I have
> >>> decided to send it early and collect
> >>> some feedback while I work on the
> >>> tests for the next iteration.
> >>> Given that I have been able to
> >>> test the implementation with
> >>> my local setup, I am more or less
> >>> confident that, at least, the code
> >>> is in a relatively sane state
> >>> so that no reviewing time is
> >>> wasted on broken patches.
> >>>
> >>> This patch also includes:
> >>> - SHMEM_CONFIG frontend request that is
> >>> specifically meant to allow generic
> >>> vhost-user-device frontend to be able to
> >>> query VIRTIO Shared Memory settings from the
> >>> backend (as this device is generic and agnostic
> >>> of the actual backend configuration).
> >>>
> >>> - MEM_READ/WRITE backend requests are
> >>> added to deal with a potential issue when having
> >>> multiple backends sharing a file descriptor.
> >>> When a backend calls SHMEM_MAP it makes
> >>> accessing to the region fail for other
> >>> backend as it is missing from their translation
> >>> table. So these requests are a fallback
> >>> for vhost-user memory translation fails.
> >>
> >> Can you elaborate what the issue here is?
> >>
> >> Why would SHMEM_MAP make accessing the region fail for other backends --
> >> what makes this missing from their translation?
> >
> > This issue was raised by Stefan Hajnoczi in one of the first
> > iterations of this patchset, based upon previous David Gilbert's work
> > on the virtiofs DAX Window.
> >
> > Let me paste here some of his remarks:
> >
> > """
> > Other backends don't see these mappings. If the guest submits a vring
> > descriptor referencing a mapping to another backend, then that backend
> > won't be able to access this memory.
> > """
> > [...]
> > """
> > A bit more detail:
> >
> > Device A has a VIRTIO Shared Memory Region. An application mmaps that
> > memory (examples: guest userspace driver using Linux VFIO, a guest
> > kernel driver that exposes the memory to userspace via mmap, or guest
> > kernel DAX). The application passes that memory as an I/O buffer to
> > device B (e.g. O_DIRECT disk I/O).
> >
> > The result is that device B's vhost-user backend receives a vring
> > descriptor that points to a guest memory address in device A's VIRTIO
> > Shared Memory Region. Since device B does not have this memory in its
> > table, it cannot translate the address and the device breaks.
> > """
> >
> > I have not triggered the issue myself. So the idea is that the next
> > patch will *definitively* include some testing for the commits that I
> > cannot verify with my local setup.
>
> Hah! But isn't that exact problem which is now solved by our rework?
>
> Whatever is mapped in the VIRTIO Shared Memory Region will be
> communicated to all other vhost-user devices. So they should have that
> memory in their map and should be able to access it.

You mean the SET_MEM_TABLE message that is sent to all vhost-user
devices after the vhost_commit? I was not sure, as I was testing with a
single device. That would be great, and it would simplify the patch a lot.

>
> The only thing vhost-user devices cannot access are IIRC ram_device_ptr
> memory regions (e.g., from vfio devices). But that is independent shared
> memory regions.
>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 24, 2025, 9:49 a.m. UTC | #5
On 24.02.25 10:35, Albert Esteve wrote:
> On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 24.02.25 09:54, Albert Esteve wrote:
>>> On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 17.02.25 17:40, Albert Esteve wrote:
>>>>> Hi all,
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> looks like our debugging session was successfu :)
>>>>
>>>> One question below.
>>>>
>>>>> v3->v4
>>>>> - Change mmap strategy to use RAM blocks
>>>>>      and subregions.
>>>>> - Add new bitfield to qmp feature map
>>>>> - Followed most review comments from
>>>>>      last iteration.
>>>>> - Merged documentation patch again with
>>>>>      this one. Makes more sense to
>>>>>      review them together after all.
>>>>> - Add documentation for MEM_READ/WRITE
>>>>>      messages.
>>>>>
>>>>> The goal of this patch is to support
>>>>> dynamic fd-backed memory maps initiated
>>>>> from vhost-user backends.
>>>>> There are many devices that could already
>>>>> benefit of this feature, e.g.,
>>>>> virtiofs or virtio-gpu.
>>>>>
>>>>> After receiving the SHMEM_MAP/UNMAP request,
>>>>> the frontend creates the RAMBlock form the
>>>>> fd and maps it by adding it as a subregion
>>>>> of the shared memory region container.
>>>>>
>>>>> The VIRTIO Shared Memory Region list is
>>>>> declared in the `VirtIODevice` struct
>>>>> to make it generic.
>>>>>
>>>>> TODO: There was a conversation on the
>>>>> previous version around adding tests
>>>>> to the patch (which I have acknowledged).
>>>>> However, given the numerous changes
>>>>> that the patch already has, I have
>>>>> decided to send it early and collect
>>>>> some feedback while I work on the
>>>>> tests for the next iteration.
>>>>> Given that I have been able to
>>>>> test the implementation with
>>>>> my local setup, I am more or less
>>>>> confident that, at least, the code
>>>>> is in a relatively sane state
>>>>> so that no reviewing time is
>>>>> wasted on broken patches.
>>>>>
>>>>> This patch also includes:
>>>>> - SHMEM_CONFIG frontend request that is
>>>>> specifically meant to allow generic
>>>>> vhost-user-device frontend to be able to
>>>>> query VIRTIO Shared Memory settings from the
>>>>> backend (as this device is generic and agnostic
>>>>> of the actual backend configuration).
>>>>>
>>>>> - MEM_READ/WRITE backend requests are
>>>>> added to deal with a potential issue when having
>>>>> multiple backends sharing a file descriptor.
>>>>> When a backend calls SHMEM_MAP it makes
>>>>> accessing to the region fail for other
>>>>> backend as it is missing from their translation
>>>>> table. So these requests are a fallback
>>>>> for vhost-user memory translation fails.
>>>>
>>>> Can you elaborate what the issue here is?
>>>>
>>>> Why would SHMEM_MAP make accessing the region fail for other backends --
>>>> what makes this missing from their translation?
>>>
>>> This issue was raised by Stefan Hajnoczi in one of the first
>>> iterations of this patchset, based upon previous David Gilbert's work
>>> on the virtiofs DAX Window.
>>>
>>> Let me paste here some of his remarks:
>>>
>>> """
>>> Other backends don't see these mappings. If the guest submits a vring
>>> descriptor referencing a mapping to another backend, then that backend
>>> won't be able to access this memory.
>>> """
>>> [...]
>>> """
>>> A bit more detail:
>>>
>>> Device A has a VIRTIO Shared Memory Region. An application mmaps that
>>> memory (examples: guest userspace driver using Linux VFIO, a guest
>>> kernel driver that exposes the memory to userspace via mmap, or guest
>>> kernel DAX). The application passes that memory as an I/O buffer to
>>> device B (e.g. O_DIRECT disk I/O).
>>>
>>> The result is that device B's vhost-user backend receives a vring
>>> descriptor that points to a guest memory address in device A's VIRTIO
>>> Shared Memory Region. Since device B does not have this memory in its
>>> table, it cannot translate the address and the device breaks.
>>> """
>>>
>>> I have not triggered the issue myself. So the idea is that the next
>>> patch will *definitively* include some testing for the commits that I
>>> cannot verify with my local setup.
>>
>> Hah! But isn't that exact problem which is now solved by our rework?
>>
>> Whatever is mapped in the VIRTIO Shared Memory Region will be
>> communicated to all other vhost-user devices. So they should have that
>> memory in their map and should be able to access it.
> 
> You mean the SET_MEM_TABLE message after the vhost_commit is sent to
> all vhost-user devices? I was not sure, as I was testing with a single
> device, that would be great, and simplify the patch a lot.

Yes, all vhost-user devices should be updated.
Albert Esteve Feb. 24, 2025, 1:41 p.m. UTC | #6
On Mon, Feb 24, 2025 at 10:49 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.02.25 10:35, Albert Esteve wrote:
> > On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 24.02.25 09:54, Albert Esteve wrote:
> >>> On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 17.02.25 17:40, Albert Esteve wrote:
> >>>>> Hi all,
> >>>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> looks like our debugging session was successfu :)
> >>>>
> >>>> One question below.
> >>>>
> >>>>> v3->v4
> >>>>> - Change mmap strategy to use RAM blocks
> >>>>>      and subregions.
> >>>>> - Add new bitfield to qmp feature map
> >>>>> - Followed most review comments from
> >>>>>      last iteration.
> >>>>> - Merged documentation patch again with
> >>>>>      this one. Makes more sense to
> >>>>>      review them together after all.
> >>>>> - Add documentation for MEM_READ/WRITE
> >>>>>      messages.
> >>>>>
> >>>>> The goal of this patch is to support
> >>>>> dynamic fd-backed memory maps initiated
> >>>>> from vhost-user backends.
> >>>>> There are many devices that could already
> >>>>> benefit of this feature, e.g.,
> >>>>> virtiofs or virtio-gpu.
> >>>>>
> >>>>> After receiving the SHMEM_MAP/UNMAP request,
> >>>>> the frontend creates the RAMBlock form the
> >>>>> fd and maps it by adding it as a subregion
> >>>>> of the shared memory region container.
> >>>>>
> >>>>> The VIRTIO Shared Memory Region list is
> >>>>> declared in the `VirtIODevice` struct
> >>>>> to make it generic.
> >>>>>
> >>>>> TODO: There was a conversation on the
> >>>>> previous version around adding tests
> >>>>> to the patch (which I have acknowledged).
> >>>>> However, given the numerous changes
> >>>>> that the patch already has, I have
> >>>>> decided to send it early and collect
> >>>>> some feedback while I work on the
> >>>>> tests for the next iteration.
> >>>>> Given that I have been able to
> >>>>> test the implementation with
> >>>>> my local setup, I am more or less
> >>>>> confident that, at least, the code
> >>>>> is in a relatively sane state
> >>>>> so that no reviewing time is
> >>>>> wasted on broken patches.
> >>>>>
> >>>>> This patch also includes:
> >>>>> - SHMEM_CONFIG frontend request that is
> >>>>> specifically meant to allow generic
> >>>>> vhost-user-device frontend to be able to
> >>>>> query VIRTIO Shared Memory settings from the
> >>>>> backend (as this device is generic and agnostic
> >>>>> of the actual backend configuration).
> >>>>>
> >>>>> - MEM_READ/WRITE backend requests are
> >>>>> added to deal with a potential issue when having
> >>>>> multiple backends sharing a file descriptor.
> >>>>> When a backend calls SHMEM_MAP it makes
> >>>>> accessing to the region fail for other
> >>>>> backend as it is missing from their translation
> >>>>> table. So these requests are a fallback
> >>>>> for vhost-user memory translation fails.
> >>>>
> >>>> Can you elaborate what the issue here is?
> >>>>
> >>>> Why would SHMEM_MAP make accessing the region fail for other backends --
> >>>> what makes this missing from their translation?
> >>>
> >>> This issue was raised by Stefan Hajnoczi in one of the first
> >>> iterations of this patchset, based upon previous David Gilbert's work
> >>> on the virtiofs DAX Window.
> >>>
> >>> Let me paste here some of his remarks:
> >>>
> >>> """
> >>> Other backends don't see these mappings. If the guest submits a vring
> >>> descriptor referencing a mapping to another backend, then that backend
> >>> won't be able to access this memory.
> >>> """
> >>> [...]
> >>> """
> >>> A bit more detail:
> >>>
> >>> Device A has a VIRTIO Shared Memory Region. An application mmaps that
> >>> memory (examples: guest userspace driver using Linux VFIO, a guest
> >>> kernel driver that exposes the memory to userspace via mmap, or guest
> >>> kernel DAX). The application passes that memory as an I/O buffer to
> >>> device B (e.g. O_DIRECT disk I/O).
> >>>
> >>> The result is that device B's vhost-user backend receives a vring
> >>> descriptor that points to a guest memory address in device A's VIRTIO
> >>> Shared Memory Region. Since device B does not have this memory in its
> >>> table, it cannot translate the address and the device breaks.
> >>> """
> >>>
> >>> I have not triggered the issue myself. So the idea is that the next
> >>> patch will *definitively* include some testing for the commits that I
> >>> cannot verify with my local setup.
> >>
> >> Hah! But isn't that exact problem which is now solved by our rework?
> >>
> >> Whatever is mapped in the VIRTIO Shared Memory Region will be
> >> communicated to all other vhost-user devices. So they should have that
> >> memory in their map and should be able to access it.
> >
> > You mean the SET_MEM_TABLE message after the vhost_commit is sent to
> > all vhost-user devices? I was not sure, as I was testing with a single
> > device, that would be great, and simplify the patch a lot.
>
> Yes, all vhost-user devices should be updated.

Then, I think I agree with you: it would seem that this approach
naturally solves the issue with address translation among different
devices, as they all get the most up-to-date memory table after each
mmap.

WDYT, @Stefan Hajnoczi ?
If we are unsure, maybe we can leave the MEM_READ/WRITE support as a
later extension, and try to integrate the rest of this patch first.

>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 24, 2025, 1:57 p.m. UTC | #7
On 24.02.25 14:41, Albert Esteve wrote:
> On Mon, Feb 24, 2025 at 10:49 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 24.02.25 10:35, Albert Esteve wrote:
>>> On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 24.02.25 09:54, Albert Esteve wrote:
>>>>> On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 17.02.25 17:40, Albert Esteve wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> looks like our debugging session was successfu :)
>>>>>>
>>>>>> One question below.
>>>>>>
>>>>>>> v3->v4
>>>>>>> - Change mmap strategy to use RAM blocks
>>>>>>>       and subregions.
>>>>>>> - Add new bitfield to qmp feature map
>>>>>>> - Followed most review comments from
>>>>>>>       last iteration.
>>>>>>> - Merged documentation patch again with
>>>>>>>       this one. Makes more sense to
>>>>>>>       review them together after all.
>>>>>>> - Add documentation for MEM_READ/WRITE
>>>>>>>       messages.
>>>>>>>
>>>>>>> The goal of this patch is to support
>>>>>>> dynamic fd-backed memory maps initiated
>>>>>>> from vhost-user backends.
>>>>>>> There are many devices that could already
>>>>>>> benefit of this feature, e.g.,
>>>>>>> virtiofs or virtio-gpu.
>>>>>>>
>>>>>>> After receiving the SHMEM_MAP/UNMAP request,
>>>>>>> the frontend creates the RAMBlock form the
>>>>>>> fd and maps it by adding it as a subregion
>>>>>>> of the shared memory region container.
>>>>>>>
>>>>>>> The VIRTIO Shared Memory Region list is
>>>>>>> declared in the `VirtIODevice` struct
>>>>>>> to make it generic.
>>>>>>>
>>>>>>> TODO: There was a conversation on the
>>>>>>> previous version around adding tests
>>>>>>> to the patch (which I have acknowledged).
>>>>>>> However, given the numerous changes
>>>>>>> that the patch already has, I have
>>>>>>> decided to send it early and collect
>>>>>>> some feedback while I work on the
>>>>>>> tests for the next iteration.
>>>>>>> Given that I have been able to
>>>>>>> test the implementation with
>>>>>>> my local setup, I am more or less
>>>>>>> confident that, at least, the code
>>>>>>> is in a relatively sane state
>>>>>>> so that no reviewing time is
>>>>>>> wasted on broken patches.
>>>>>>>
>>>>>>> This patch also includes:
>>>>>>> - SHMEM_CONFIG frontend request that is
>>>>>>> specifically meant to allow generic
>>>>>>> vhost-user-device frontend to be able to
>>>>>>> query VIRTIO Shared Memory settings from the
>>>>>>> backend (as this device is generic and agnostic
>>>>>>> of the actual backend configuration).
>>>>>>>
>>>>>>> - MEM_READ/WRITE backend requests are
>>>>>>> added to deal with a potential issue when having
>>>>>>> multiple backends sharing a file descriptor.
>>>>>>> When a backend calls SHMEM_MAP it makes
>>>>>>> accessing to the region fail for other
>>>>>>> backend as it is missing from their translation
>>>>>>> table. So these requests are a fallback
>>>>>>> for vhost-user memory translation fails.
>>>>>>
>>>>>> Can you elaborate what the issue here is?
>>>>>>
>>>>>> Why would SHMEM_MAP make accessing the region fail for other backends --
>>>>>> what makes this missing from their translation?
>>>>>
>>>>> This issue was raised by Stefan Hajnoczi in one of the first
>>>>> iterations of this patchset, based upon previous David Gilbert's work
>>>>> on the virtiofs DAX Window.
>>>>>
>>>>> Let me paste here some of his remarks:
>>>>>
>>>>> """
>>>>> Other backends don't see these mappings. If the guest submits a vring
>>>>> descriptor referencing a mapping to another backend, then that backend
>>>>> won't be able to access this memory.
>>>>> """
>>>>> [...]
>>>>> """
>>>>> A bit more detail:
>>>>>
>>>>> Device A has a VIRTIO Shared Memory Region. An application mmaps that
>>>>> memory (examples: guest userspace driver using Linux VFIO, a guest
>>>>> kernel driver that exposes the memory to userspace via mmap, or guest
>>>>> kernel DAX). The application passes that memory as an I/O buffer to
>>>>> device B (e.g. O_DIRECT disk I/O).
>>>>>
>>>>> The result is that device B's vhost-user backend receives a vring
>>>>> descriptor that points to a guest memory address in device A's VIRTIO
>>>>> Shared Memory Region. Since device B does not have this memory in its
>>>>> table, it cannot translate the address and the device breaks.
>>>>> """
>>>>>
>>>>> I have not triggered the issue myself. So the idea is that the next
>>>>> patch will *definitively* include some testing for the commits that I
>>>>> cannot verify with my local setup.
>>>>
>>>> Hah! But isn't that exact problem which is now solved by our rework?
>>>>
>>>> Whatever is mapped in the VIRTIO Shared Memory Region will be
>>>> communicated to all other vhost-user devices. So they should have that
>>>> memory in their map and should be able to access it.
>>>
>>> You mean the SET_MEM_TABLE message after the vhost_commit is sent to
>>> all vhost-user devices? I was not sure, as I was testing with a single
>>> device, that would be great, and simplify the patch a lot.
>>
>> Yes, all vhost-user devices should be updated.
> 
> Then, I think I agree with you, it would seem that this approach
> naturally solved the issue with address translation among different
> devices, as they all get the most up-to-date memory table after each
> mmap.
> 
> WDYT, @Stefan Hajnoczi ?
> If we are unsure, maybe we can leave the MEM_READ/WRITE support as a
> later extension, and try to integrate the rest of this patch first.

As commented offline, maybe one would want the option to enable the 
alternative mode, where such updates (in the SHM region) are not sent to 
vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE 
would be unavoidable.

What comes to mind are vhost-user devices with a limited number of 
supported memslots.

No idea how relevant that really is, and how many SHM regions we will 
see in practice.

I recently increased the number of supported memslots for rust-vmm and 
libvhost-user, but I am not sure about other devices (in particular, dpdk, 
spdk, and whether we really care about that here).
Albert Esteve Feb. 24, 2025, 3:15 p.m. UTC | #8
On Mon, Feb 24, 2025 at 2:57 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.02.25 14:41, Albert Esteve wrote:
> > On Mon, Feb 24, 2025 at 10:49 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 24.02.25 10:35, Albert Esteve wrote:
> >>> On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 24.02.25 09:54, Albert Esteve wrote:
> >>>>> On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>>>
> >>>>>> On 17.02.25 17:40, Albert Esteve wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> looks like our debugging session was successfu :)
> >>>>>>
> >>>>>> One question below.
> >>>>>>
> >>>>>>> v3->v4
> >>>>>>> - Change mmap strategy to use RAM blocks
> >>>>>>>       and subregions.
> >>>>>>> - Add new bitfield to qmp feature map
> >>>>>>> - Followed most review comments from
> >>>>>>>       last iteration.
> >>>>>>> - Merged documentation patch again with
> >>>>>>>       this one. Makes more sense to
> >>>>>>>       review them together after all.
> >>>>>>> - Add documentation for MEM_READ/WRITE
> >>>>>>>       messages.
> >>>>>>>
> >>>>>>> The goal of this patch is to support
> >>>>>>> dynamic fd-backed memory maps initiated
> >>>>>>> from vhost-user backends.
> >>>>>>> There are many devices that could already
> >>>>>>> benefit of this feature, e.g.,
> >>>>>>> virtiofs or virtio-gpu.
> >>>>>>>
> >>>>>>> After receiving the SHMEM_MAP/UNMAP request,
> >>>>>>> the frontend creates the RAMBlock form the
> >>>>>>> fd and maps it by adding it as a subregion
> >>>>>>> of the shared memory region container.
> >>>>>>>
> >>>>>>> The VIRTIO Shared Memory Region list is
> >>>>>>> declared in the `VirtIODevice` struct
> >>>>>>> to make it generic.
> >>>>>>>
> >>>>>>> TODO: There was a conversation on the
> >>>>>>> previous version around adding tests
> >>>>>>> to the patch (which I have acknowledged).
> >>>>>>> However, given the numerous changes
> >>>>>>> that the patch already has, I have
> >>>>>>> decided to send it early and collect
> >>>>>>> some feedback while I work on the
> >>>>>>> tests for the next iteration.
> >>>>>>> Given that I have been able to
> >>>>>>> test the implementation with
> >>>>>>> my local setup, I am more or less
> >>>>>>> confident that, at least, the code
> >>>>>>> is in a relatively sane state
> >>>>>>> so that no reviewing time is
> >>>>>>> wasted on broken patches.
> >>>>>>>
> >>>>>>> This patch also includes:
> >>>>>>> - SHMEM_CONFIG frontend request that is
> >>>>>>> specifically meant to allow generic
> >>>>>>> vhost-user-device frontend to be able to
> >>>>>>> query VIRTIO Shared Memory settings from the
> >>>>>>> backend (as this device is generic and agnostic
> >>>>>>> of the actual backend configuration).
> >>>>>>>
> >>>>>>> - MEM_READ/WRITE backend requests are
> >>>>>>> added to deal with a potential issue when having
> >>>>>>> multiple backends sharing a file descriptor.
> >>>>>>> When a backend calls SHMEM_MAP it makes
> >>>>>>> accessing to the region fail for other
> >>>>>>> backend as it is missing from their translation
> >>>>>>> table. So these requests are a fallback
> >>>>>>> for vhost-user memory translation fails.
> >>>>>>
> >>>>>> Can you elaborate what the issue here is?
> >>>>>>
> >>>>>> Why would SHMEM_MAP make accessing the region fail for other backends --
> >>>>>> what makes this missing from their translation?
> >>>>>
> >>>>> This issue was raised by Stefan Hajnoczi in one of the first
> >>>>> iterations of this patchset, based upon previous David Gilbert's work
> >>>>> on the virtiofs DAX Window.
> >>>>>
> >>>>> Let me paste here some of his remarks:
> >>>>>
> >>>>> """
> >>>>> Other backends don't see these mappings. If the guest submits a vring
> >>>>> descriptor referencing a mapping to another backend, then that backend
> >>>>> won't be able to access this memory.
> >>>>> """
> >>>>> [...]
> >>>>> """
> >>>>> A bit more detail:
> >>>>>
> >>>>> Device A has a VIRTIO Shared Memory Region. An application mmaps that
> >>>>> memory (examples: guest userspace driver using Linux VFIO, a guest
> >>>>> kernel driver that exposes the memory to userspace via mmap, or guest
> >>>>> kernel DAX). The application passes that memory as an I/O buffer to
> >>>>> device B (e.g. O_DIRECT disk I/O).
> >>>>>
> >>>>> The result is that device B's vhost-user backend receives a vring
> >>>>> descriptor that points to a guest memory address in device A's VIRTIO
> >>>>> Shared Memory Region. Since device B does not have this memory in its
> >>>>> table, it cannot translate the address and the device breaks.
> >>>>> """
> >>>>>
> >>>>> I have not triggered the issue myself. So the idea is that the next
> >>>>> patch will *definitively* include some testing for the commits that I
> >>>>> cannot verify with my local setup.
> >>>>
> >>>> Hah! But isn't that exact problem which is now solved by our rework?
> >>>>
> >>>> Whatever is mapped in the VIRTIO Shared Memory Region will be
> >>>> communicated to all other vhost-user devices. So they should have that
> >>>> memory in their map and should be able to access it.
> >>>
> >>> You mean the SET_MEM_TABLE message after the vhost_commit is sent to
> >>> all vhost-user devices? I was not sure, as I was testing with a single
> >>> device, that would be great, and simplify the patch a lot.
> >>
> >> Yes, all vhost-user devices should be updated.
> >
> > Then, I think I agree with you, it would seem that this approach
> > naturally solved the issue with address translation among different
> > devices, as they all get the most up-to-date memory table after each
> > mmap.
> >
> > WDYT, @Stefan Hajnoczi ?
> > If we are unsure, maybe we can leave the MEM_READ/WRITE support as a
> > later extension, and try to integrate the rest of this patch first.
>
> As commented offline, maybe one would want the option to enable the
> alternative mode, where such updates (in the SHM region) are not sent to
> vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE
> would be unavoidable.

At first, I remember we discussed two options: having update messages
sent to all devices (which was deemed potentially racy), or using
MEM_READ / MEM_WRITE messages. With this version of the patch there
is no option to avoid the mem_table update messages, which brings me
to my point in the previous message: it may make sense to continue
with this patch without MEM_READ/WRITE support, and leave that, and the
option to make mem_table updates optional, for a followup patch?

>
> What comes to mind are vhost-user devices with limited number of
> supported memslots.
>
> No idea how relevant that really is, and how many SHM regions we will
> see in practice.

In general, from what I see they usually require 1 or 2 regions,
except for virtio-scmi which requires >256.

>
> I recently increased the number of supported memslots for rust-vmm and
> libvhost-user, but not sure about other devices (in particular, dpdk,
> spdk, and whether we really care about that here).
>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 26, 2025, 9:53 a.m. UTC | #9
>> As commented offline, maybe one would want the option to enable the
>> alternative mode, where such updates (in the SHM region) are not sent to
>> vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE
>> would be unavoidable.
> 
> At first, I remember we discussed two options, having update messages
> sent to all devices (which was deemed as potentially racy), or using
> MEM_READ / MEM _WRITE messages. With this version of the patch there
> is no option to avoid the mem_table update messages, which brings me
> to my point in the previous message: it may make sense to continue
> with this patch without MEM_READ/WRITE support, and leave that and the
> option to make mem_table updates optional for a followup patch?

IMHO that would work for me.

> 
>>
>> What comes to mind are vhost-user devices with limited number of
>> supported memslots.
>>
>> No idea how relevant that really is, and how many SHM regions we will
>> see in practice.
> 
> In general, from what I see they usually require 1 or 2 regions,
> except for virtio-scmi which requires >256.

1/2 regions are not a problem. Once we're in the hundreds for a single
device, it will likely start being a problem, especially when you have more
such devices.

BUT, it would likely be a problem even with the alternative approach where
we don't communicate these regions to vhost-user: IIRC, vhost-net in
the kernel is usually limited to a maximum of 509 memslots by default as
well. Similarly, older KVM only supports a total of 509 memslots.

See https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html
"Compatibility with vhost-net and vhost-user".

In libvhost-user, and rust-vmm, we have a similar limit of ~509.


Note that for memory devices (DIMMs, virtio-mem), we'll use up to 256
memslots in case all devices support 509 memslots.
See MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT:

/*
  * Traditionally, KVM/vhost in many setups supported 509 memslots, whereby
  * 253 memslots were "reserved" for boot memory and other devices (such
  * as PCI BARs, which can get mapped dynamically) and 256 memslots were
  * dedicated for DIMMs. These magic numbers worked reliably in the past.
  *
  * Further, using many memslots can negatively affect performance, so setting
  * the soft-limit of memslots used by memory devices to the traditional
  * DIMM limit of 256 sounds reasonable.
  *
  * If we have less than 509 memslots, we will instruct memory devices that
  * support automatically deciding how many memslots to use to only use a single
  * one.
  *
  * Hotplugging vhost devices with at least 509 memslots is not expected to
  * cause problems, not even when devices automatically decided how many memslots
  * to use.
  */
#define MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT 256
#define MEMORY_DEVICES_SAFE_MAX_MEMSLOTS 509


That changes once some vhost-user devices, combined with boot memory,
consume more than 253 memslots.
Stefan Hajnoczi Feb. 27, 2025, 7:10 a.m. UTC | #10
On Wed, Feb 26, 2025 at 10:53:01AM +0100, David Hildenbrand wrote:
> > > As commented offline, maybe one would want the option to enable the
> > > alternative mode, where such updates (in the SHM region) are not sent to
> > > vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE
> > > would be unavoidable.
> > 
> > At first, I remember we discussed two options, having update messages
> > sent to all devices (which was deemed as potentially racy), or using
> > MEM_READ / MEM _WRITE messages. With this version of the patch there
> > is no option to avoid the mem_table update messages, which brings me
> > to my point in the previous message: it may make sense to continue
> > with this patch without MEM_READ/WRITE support, and leave that and the
> > option to make mem_table updates optional for a followup patch?
> 
> IMHO that would work for me.

I'm happy with dropping MEM_READ/WRITE. If the memslots limit becomes a
problem then it will be necessary to think about handling things
differently, but there are many possible uses of VIRTIO Shared Memory
Regions that will not hit the limit and I don't see a need to hold them
back.

Stefan

> 
> > 
> > > 
> > > What comes to mind are vhost-user devices with limited number of
> > > supported memslots.
> > > 
> > > No idea how relevant that really is, and how many SHM regions we will
> > > see in practice.
> > 
> > In general, from what I see they usually require 1 or 2 regions,
> > except for virtio-scmi which requires >256.
> 
> 1/2 regions are not a problem. Once we're in the hundreds for a single
> device, it will likely start being a problem, especially when you have more
> such devices.
> 
> BUT, it would likely be a problem even with the alternative approach where
> we don't communicate these regions to vhost-user: IIRC, vhost-net in
> the kernel is usually limited to a maximum of 509 memslots as well as
> default. Similarly, older KVM only supports a total of 509 memslots.
> 
> See https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html
> "Compatibility with vhost-net and vhost-user".
> 
> In libvhost-user, and rust-vmm, we have a similar limit of ~509.
> 
> 
> Note that for memory devices (DIMMs, virtio-mem), we'll use up to 256
> memslots in case all devices support 509 memslots.
> See MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT:
> 
> /*
>  * Traditionally, KVM/vhost in many setups supported 509 memslots, whereby
>  * 253 memslots were "reserved" for boot memory and other devices (such
>  * as PCI BARs, which can get mapped dynamically) and 256 memslots were
>  * dedicated for DIMMs. These magic numbers worked reliably in the past.
>  *
>  * Further, using many memslots can negatively affect performance, so setting
>  * the soft-limit of memslots used by memory devices to the traditional
>  * DIMM limit of 256 sounds reasonable.
>  *
>  * If we have less than 509 memslots, we will instruct memory devices that
>  * support automatically deciding how many memslots to use to only use a single
>  * one.
>  *
>  * Hotplugging vhost devices with at least 509 memslots is not expected to
>  * cause problems, not even when devices automatically decided how many memslots
>  * to use.
>  */
> #define MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT 256
> #define MEMORY_DEVICES_SAFE_MAX_MEMSLOTS 509
> 
> 
> That changes once you have some vhost-user devices consume combined with boot
> memory more than 253 memslots.
> 
> -- 
> Cheers,
> 
> David / dhildenb
>