| Message ID | 20250217164012.246727-1-aesteve@redhat.com (mailing list archive) |
|---|---|
| Series | vhost-user: Add SHMEM_MAP/UNMAP requests |
On 17.02.25 17:40, Albert Esteve wrote:
> Hi all,

Hi,

looks like our debugging session was successful :)

One question below.

> v3->v4
> - Change mmap strategy to use RAM blocks and subregions.
> - Add new bitfield to qmp feature map
> - Followed most review comments from last iteration.
> - Merged documentation patch again with this one. Makes more sense to
>   review them together after all.
> - Add documentation for MEM_READ/WRITE messages.
>
> The goal of this patch is to support dynamic fd-backed memory maps
> initiated from vhost-user backends. There are many devices that could
> already benefit from this feature, e.g., virtiofs or virtio-gpu.
>
> After receiving the SHMEM_MAP/UNMAP request, the frontend creates the
> RAMBlock from the fd and maps it by adding it as a subregion of the
> shared memory region container.
>
> The VIRTIO Shared Memory Region list is declared in the `VirtIODevice`
> struct to make it generic.
>
> TODO: There was a conversation on the previous version around adding
> tests to the patch (which I have acknowledged). However, given the
> numerous changes that the patch already has, I have decided to send it
> early and collect some feedback while I work on the tests for the next
> iteration. Given that I have been able to test the implementation with
> my local setup, I am more or less confident that, at least, the code is
> in a relatively sane state so that no reviewing time is wasted on
> broken patches.
>
> This patch also includes:
> - SHMEM_CONFIG frontend request, specifically meant to allow the
>   generic vhost-user-device frontend to query VIRTIO Shared Memory
>   settings from the backend (as this device is generic and agnostic of
>   the actual backend configuration).
>
> - MEM_READ/WRITE backend requests, added to deal with a potential issue
>   when multiple backends share a file descriptor. When a backend calls
>   SHMEM_MAP, accessing the region fails for other backends because it
>   is missing from their translation tables. So these requests are a
>   fallback for when vhost-user memory translation fails.

Can you elaborate what the issue here is?

Why would SHMEM_MAP make accessing the region fail for other backends --
what makes this missing from their translation?
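[Editor's note: as a rough illustration of the strategy described in the
cover letter (a RAM block created from the fd, then added as a subregion
of the shared memory container), here is a minimal sketch of what the
frontend handling of SHMEM_MAP could look like. memory_region_init_ram_from_fd()
and memory_region_add_subregion() are existing QEMU APIs (their exact
signatures differ between QEMU versions); virtio_get_shmem_region() and the
parameter names are assumed for illustration and do not claim to match the
actual patch.]

/* Hypothetical SHMEM_MAP handling on the QEMU (frontend) side: map the
 * backend-provided fd into the device's VIRTIO Shared Memory Region
 * container as a new subregion.  Illustrative sketch only; assumes the
 * usual QEMU headers (qemu/osdep.h, exec/memory.h, qapi/error.h). */
static int vhost_user_backend_handle_shmem_map(VirtIODevice *vdev,
                                               int fd, uint8_t shmid,
                                               uint64_t shm_offset,
                                               uint64_t fd_offset,
                                               uint64_t len)
{
    Error *local_err = NULL;
    /* Assumed helper returning the container MemoryRegion for shmid. */
    MemoryRegion *container = virtio_get_shmem_region(vdev, shmid);
    MemoryRegion *mr = g_new0(MemoryRegion, 1);

    /* Create a RAMBlock backed by the fd sent by the backend. */
    if (!memory_region_init_ram_from_fd(mr, OBJECT(vdev), "vu-shmem-map",
                                        len, RAM_SHARED, fd, fd_offset,
                                        &local_err)) {
        error_report_err(local_err);
        g_free(mr);
        return -EFAULT;
    }

    /* Expose it at the requested offset inside the shared memory region;
     * the memory listener then propagates the new layout to the
     * vhost-user backends via SET_MEM_TABLE. */
    memory_region_add_subregion(container, shm_offset, mr);
    return 0;
}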
On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 17.02.25 17:40, Albert Esteve wrote:
> > [...]
> > - MEM_READ/WRITE backend requests, added to deal with a potential issue
> >   when multiple backends share a file descriptor. When a backend calls
> >   SHMEM_MAP, accessing the region fails for other backends because it
> >   is missing from their translation tables. So these requests are a
> >   fallback for when vhost-user memory translation fails.
>
> Can you elaborate what the issue here is?
>
> Why would SHMEM_MAP make accessing the region fail for other backends --
> what makes this missing from their translation?

This issue was raised by Stefan Hajnoczi in one of the first iterations
of this patchset, based upon David Gilbert's previous work on the
virtiofs DAX window.

Let me paste here some of his remarks:

"""
Other backends don't see these mappings. If the guest submits a vring
descriptor referencing a mapping to another backend, then that backend
won't be able to access this memory.
"""
[...]
"""
A bit more detail:

Device A has a VIRTIO Shared Memory Region. An application mmaps that
memory (examples: guest userspace driver using Linux VFIO, a guest
kernel driver that exposes the memory to userspace via mmap, or guest
kernel DAX). The application passes that memory as an I/O buffer to
device B (e.g. O_DIRECT disk I/O).

The result is that device B's vhost-user backend receives a vring
descriptor that points to a guest memory address in device A's VIRTIO
Shared Memory Region. Since device B does not have this memory in its
table, it cannot translate the address and the device breaks.
"""

I have not triggered the issue myself. So the idea is that the next
iteration will *definitely* include some testing for the commits that I
cannot verify with my local setup.

BR,
Albert.
On 24.02.25 09:54, Albert Esteve wrote:
> On Mon, Feb 17, 2025 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
>> [...]
>> Can you elaborate what the issue here is?
>>
>> Why would SHMEM_MAP make accessing the region fail for other backends --
>> what makes this missing from their translation?
>
> This issue was raised by Stefan Hajnoczi in one of the first iterations
> of this patchset, based upon David Gilbert's previous work on the
> virtiofs DAX window.
> [...]
> """
> The result is that device B's vhost-user backend receives a vring
> descriptor that points to a guest memory address in device A's VIRTIO
> Shared Memory Region. Since device B does not have this memory in its
> table, it cannot translate the address and the device breaks.
> """
>
> I have not triggered the issue myself. So the idea is that the next
> iteration will *definitely* include some testing for the commits that I
> cannot verify with my local setup.

Hah! But isn't that exact problem which is now solved by our rework?

Whatever is mapped in the VIRTIO Shared Memory Region will be
communicated to all other vhost-user devices. So they should have that
memory in their map and should be able to access it.

The only thing vhost-user devices cannot access are IIRC ram_device_ptr
memory regions (e.g., from vfio devices). But that is independent of
shared memory regions.
On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <david@redhat.com> wrote:
> [...]
> Hah! But isn't that exact problem which is now solved by our rework?
>
> Whatever is mapped in the VIRTIO Shared Memory Region will be
> communicated to all other vhost-user devices. So they should have that
> memory in their map and should be able to access it.

You mean the SET_MEM_TABLE message after the vhost_commit is sent to all
vhost-user devices? I was not sure, as I was testing with a single
device. That would be great, and it would simplify the patch a lot.

> The only thing vhost-user devices cannot access are IIRC ram_device_ptr
> memory regions (e.g., from vfio devices). But that is independent of
> shared memory regions.
On 24.02.25 10:35, Albert Esteve wrote:
> On Mon, Feb 24, 2025 at 10:16 AM David Hildenbrand <david@redhat.com> wrote:
>> [...]
>> Whatever is mapped in the VIRTIO Shared Memory Region will be
>> communicated to all other vhost-user devices. So they should have that
>> memory in their map and should be able to access it.
>
> You mean the SET_MEM_TABLE message after the vhost_commit is sent to all
> vhost-user devices? I was not sure, as I was testing with a single
> device. That would be great, and it would simplify the patch a lot.

Yes, all vhost-user devices should be updated.
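[Editor's note: as background for why the SET_MEM_TABLE propagation
resolves the scenario Stefan described, a vhost-user backend typically
translates guest physical addresses by walking the regions from the last
memory table it received. A simplified, self-contained sketch follows;
the MemRegion type and gpa_to_hva name are illustrative, not
libvhost-user's actual definitions.]

#include <stdint.h>
#include <stddef.h>

/* Simplified view of one entry of a backend's vhost-user memory table. */
typedef struct MemRegion {
    uint64_t guest_addr;   /* guest physical address of the region   */
    uint64_t size;         /* region size in bytes                   */
    uint64_t mmap_addr;    /* where the region's fd is mmap'ed here  */
} MemRegion;

/* Translate a guest physical address to a local pointer.  Once the
 * frontend re-sends SET_MEM_TABLE including the SHMEM_MAP'ed subregion,
 * addresses inside another device's VIRTIO Shared Memory Region resolve
 * here too; before that, this lookup would simply fail. */
static void *gpa_to_hva(const MemRegion *regions, size_t n,
                        uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        const MemRegion *r = &regions[i];
        if (gpa >= r->guest_addr &&
            gpa + len <= r->guest_addr + r->size) {
            return (void *)(uintptr_t)(r->mmap_addr + (gpa - r->guest_addr));
        }
    }
    return NULL;   /* not covered by this backend's table */
}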
On Mon, Feb 24, 2025 at 10:49 AM David Hildenbrand <david@redhat.com> wrote:
> [...]
> > You mean the SET_MEM_TABLE message after the vhost_commit is sent to all
> > vhost-user devices? I was not sure, as I was testing with a single
> > device. That would be great, and it would simplify the patch a lot.
>
> Yes, all vhost-user devices should be updated.

Then, I think I agree with you. It would seem that this approach
naturally solves the issue with address translation among different
devices, as they all get the most up-to-date memory table after each
mmap.

WDYT, @Stefan Hajnoczi?
If we are unsure, maybe we can leave the MEM_READ/WRITE support as a
later extension, and try to integrate the rest of this patch first.
On 24.02.25 14:41, Albert Esteve wrote:
> On Mon, Feb 24, 2025 at 10:49 AM David Hildenbrand <david@redhat.com> wrote:
>> [...]
>> Yes, all vhost-user devices should be updated.
>
> Then, I think I agree with you. It would seem that this approach
> naturally solves the issue with address translation among different
> devices, as they all get the most up-to-date memory table after each
> mmap.
>
> WDYT, @Stefan Hajnoczi?
> If we are unsure, maybe we can leave the MEM_READ/WRITE support as a
> later extension, and try to integrate the rest of this patch first.

As commented offline, maybe one would want the option to enable the
alternative mode, where such updates (in the SHM region) are not sent to
vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE
requests would be unavoidable.

What comes to mind are vhost-user devices with a limited number of
supported memslots.

No idea how relevant that really is, and how many SHM regions we will
see in practice.

I recently increased the number of supported memslots for rust-vmm and
libvhost-user, but I am not sure about other devices (in particular,
dpdk, spdk, and whether we really care about that here).
On Mon, Feb 24, 2025 at 2:57 PM David Hildenbrand <david@redhat.com> wrote:
> [...]
> As commented offline, maybe one would want the option to enable the
> alternative mode, where such updates (in the SHM region) are not sent to
> vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE
> requests would be unavoidable.

At first, I remember we discussed two options: having update messages
sent to all devices (which was deemed potentially racy), or using
MEM_READ / MEM_WRITE messages. With this version of the patch there is
no option to avoid the mem_table update messages, which brings me to my
point in the previous message: it may make sense to continue with this
patch without MEM_READ/WRITE support, and leave that, plus the option to
make mem_table updates optional, for a follow-up patch?

> What comes to mind are vhost-user devices with a limited number of
> supported memslots.
>
> No idea how relevant that really is, and how many SHM regions we will
> see in practice.

In general, from what I see they usually require 1 or 2 regions, except
for virtio-scmi, which requires >256.
>> As commented offline, maybe one would want the option to enable the
>> alternative mode, where such updates (in the SHM region) are not sent to
>> vhost-user devices. In such a configuration, the MEM_READ / MEM_WRITE
>> requests would be unavoidable.
>
> At first, I remember we discussed two options: having update messages
> sent to all devices (which was deemed potentially racy), or using
> MEM_READ / MEM_WRITE messages. With this version of the patch there is
> no option to avoid the mem_table update messages, which brings me to my
> point in the previous message: it may make sense to continue with this
> patch without MEM_READ/WRITE support, and leave that, plus the option to
> make mem_table updates optional, for a follow-up patch?

IMHO that would work for me.

>> What comes to mind are vhost-user devices with a limited number of
>> supported memslots.
>>
>> No idea how relevant that really is, and how many SHM regions we will
>> see in practice.
>
> In general, from what I see they usually require 1 or 2 regions, except
> for virtio-scmi, which requires >256.

1/2 regions are not a problem. Once we're in the hundreds for a single
device, it will likely start being a problem, especially when you have
more such devices.

BUT, it would likely be a problem even with the alternative approach
where we don't communicate these regions to vhost-user: IIRC, vhost-net
in the kernel is usually limited to a maximum of 509 memslots by
default. Similarly, older KVM only supports a total of 509 memslots.

See https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html
"Compatibility with vhost-net and vhost-user".

In libvhost-user and rust-vmm, we have a similar limit of ~509.

Note that for memory devices (DIMMs, virtio-mem), we'll use up to 256
memslots in case all devices support 509 memslots.
See MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT:

/*
 * Traditionally, KVM/vhost in many setups supported 509 memslots, whereby
 * 253 memslots were "reserved" for boot memory and other devices (such
 * as PCI BARs, which can get mapped dynamically) and 256 memslots were
 * dedicated for DIMMs. These magic numbers worked reliably in the past.
 *
 * Further, using many memslots can negatively affect performance, so setting
 * the soft-limit of memslots used by memory devices to the traditional
 * DIMM limit of 256 sounds reasonable.
 *
 * If we have less than 509 memslots, we will instruct memory devices that
 * support automatically deciding how many memslots to use to only use a single
 * one.
 *
 * Hotplugging vhost devices with at least 509 memslots is not expected to
 * cause problems, not even when devices automatically decided how many memslots
 * to use.
 */
#define MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT 256
#define MEMORY_DEVICES_SAFE_MAX_MEMSLOTS 509

That changes once you have some vhost-user devices that, combined with
boot memory, consume more than 253 memslots.
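[Editor's note: to put the 509/253/256 figures above into perspective,
here is a toy budget check. The constants come from the comment quoted
above; the per-device counts are made-up example inputs, not
measurements.]

#include <stdio.h>

/* Rough memslot budget check based on the traditional 509-slot limit:
 * ~253 slots for boot memory and other devices, 256 reserved for memory
 * devices (DIMMs, virtio-mem).  Example inputs are illustrative only. */
int main(void)
{
    const int safe_max   = 509;  /* MEMORY_DEVICES_SAFE_MAX_MEMSLOTS    */
    const int dimm_limit = 256;  /* MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT   */

    int boot_and_misc = 20;      /* boot RAM, PCI BARs, ...   (example) */
    int shmem_maps    = 2;       /* dynamic SHMEM_MAP regions (example) */
    int scmi_like     = 256;     /* a virtio-scmi-style device (example)*/

    int used = boot_and_misc + shmem_maps + scmi_like + dimm_limit;
    printf("memslots used: %d / %d -> %s\n", used, safe_max,
           used <= safe_max ? "fits" : "exceeds the traditional limit");
    return 0;
}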
On Wed, Feb 26, 2025 at 10:53:01AM +0100, David Hildenbrand wrote:
> > At first, I remember we discussed two options: having update messages
> > sent to all devices (which was deemed potentially racy), or using
> > MEM_READ / MEM_WRITE messages. With this version of the patch there is
> > no option to avoid the mem_table update messages, which brings me to my
> > point in the previous message: it may make sense to continue with this
> > patch without MEM_READ/WRITE support, and leave that, plus the option to
> > make mem_table updates optional, for a follow-up patch?
>
> IMHO that would work for me.

I'm happy with dropping MEM_READ/WRITE. If the memslots limit becomes a
problem then it will be necessary to think about handling things
differently, but there are many possible uses of VIRTIO Shared Memory
Regions that will not hit the limit and I don't see a need to hold them
back.

Stefan