Message ID: 20250203223916.1064540-1-almasrymina@google.com (mailing list archive)
Series: Device memory TCP TX
On 2/3/25 11:39 PM, Mina Almasry wrote:
> The TX path had been dropped from the Device Memory TCP patch series
> post RFCv1 [1], to make that series slightly easier to review. This
> series rebases the implementation of the TX path on top of the
> net_iov/netmem framework agreed upon and merged. The motivation for
> the feature is thoroughly described in the docs & cover letter of the
> original proposal, so I don't repeat the lengthy descriptions here, but
> they are available in [1].
>
> Sending this series as RFC as the winter closure is imminent. I plan on
> reposting as non-RFC once the tree re-opens, addressing any feedback
> I receive in the meantime.

I guess you should drop this paragraph.

> Full outline on usage of the TX path is detailed in the documentation
> added in the first patch.
>
> Test example is available via the kselftest included in the series as well.
>
> The series is relatively small, as the TX path for this feature largely
> piggybacks on the existing MSG_ZEROCOPY implementation.

It looks like no additional device-level support is required. That is
IMHO so good it's almost suspicious :)

> Patch Overview:
> ---------------
>
> 1. Documentation & tests to give a high-level overview of the feature
>    being added.
>
> 2. Add netmem refcounting needed for the TX path.
>
> 3. Devmem TX netlink API.
>
> 4. Devmem TX net stack implementation.

It looks like even the above section needs some update.

/P
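[Editor's note: for orientation, a minimal sketch of the userspace side the quoted cover letter alludes to, assuming the TX cmsg mirrors the existing RX SCM_DEVMEM_DMABUF interface. The helper name, the placeholder constant values, the dmabuf id plumbing, and the offset-in-iov_base convention are illustrative assumptions, not the series' authoritative API.]

```c
/*
 * Hedged sketch of a devmem TX send, loosely following the series'
 * documentation. Assumptions (not authoritative): the dma-buf has already
 * been bound to the netdevice for TX via the netlink API, tx_dmabuf_id is
 * the id returned by that bind, SO_ZEROCOPY is already set on the socket,
 * and iov_base is interpreted as an offset into the bound dma-buf rather
 * than a host pointer.
 */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SCM_DEVMEM_DMABUF
#define SCM_DEVMEM_DMABUF 79		/* placeholder value for illustration */
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000		/* fallback for older userspace headers */
#endif

static ssize_t devmem_send(int sock, size_t dmabuf_off, size_t len,
			   uint32_t tx_dmabuf_id)
{
	struct iovec iov = {
		.iov_base = (void *)dmabuf_off,	/* offset into the dma-buf */
		.iov_len = len,
	};
	char ctrl[CMSG_SPACE(sizeof(uint32_t))] = {};
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl,
		.msg_controllen = sizeof(ctrl),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_DEVMEM_DMABUF;
	cm->cmsg_len = CMSG_LEN(sizeof(uint32_t));
	memcpy(CMSG_DATA(cm), &tx_dmabuf_id, sizeof(tx_dmabuf_id));

	/* Devmem TX piggybacks on MSG_ZEROCOPY: completions are read off
	 * the socket error queue just like regular zerocopy sends. */
	return sendmsg(sock, &msg, MSG_ZEROCOPY);
}
```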
On Tue, Feb 4, 2025 at 4:32 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 2/3/25 11:39 PM, Mina Almasry wrote:
> > [...]
> >
> > The series is relatively small, as the TX path for this feature largely
> > piggybacks on the existing MSG_ZEROCOPY implementation.
>
> It looks like no additional device-level support is required. That is
> IMHO so good it's almost suspicious :)
>

It is correct that no additional device-level support is required. I don't
have any local changes to my driver to make this work. I think Stan
on-list was able to run the TX path (he commented on fixes to the test
but didn't say it doesn't work :D) and one other person was able to
run it offlist.

> > Patch Overview:
> > ---------------
> > [...]
>
> It looks like even the above section needs some update.
>

Ah, I usually keep the original cover letter untouched and put the updates
under the version labels. Looks like you expect the full cover letter to be
updated. Will do. Thanks for looking.
On 02/04, Mina Almasry wrote:
> On Tue, Feb 4, 2025 at 4:32 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > [...]
> > It looks like no additional device-level support is required. That is
> > IMHO so good it's almost suspicious :)
>
> It is correct that no additional device-level support is required. I don't
> have any local changes to my driver to make this work. I think Stan
> on-list was able to run the TX path (he commented on fixes to the test
> but didn't say it doesn't work :D) and one other person was able to
> run it offlist.

For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
I have a similar internal patch for mlx5 (will share after the RX part gets
in). I agree that it seems like gve_unmap_packet needs some work to be more
careful to not unmap NIOVs (if you were testing against gve).
On 2/4/25 7:06 PM, Stanislav Fomichev wrote:
> On 02/04, Mina Almasry wrote:
>> [...]
>> It is correct that no additional device-level support is required. I don't
>> have any local changes to my driver to make this work.
>
> For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
> I have a similar internal patch for mlx5 (will share after the RX part gets
> in). I agree that it seems like gve_unmap_packet needs some work to be more
> careful to not unmap NIOVs (if you were testing against gve).

What happens if a user tries to use devmem TX on a device that doesn't
really support it? Silent data corruption?

Don't we need some way for the device to opt in (or opt out) and avoid
such issues?

/P
On Tue, Feb 4, 2025 at 10:06 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 02/04, Mina Almasry wrote:
> > [...]
> > It is correct that no additional device-level support is required. I don't
> > have any local changes to my driver to make this work.
>
> For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
> I have a similar internal patch for mlx5 (will share after the RX part gets
> in). I agree that it seems like gve_unmap_packet needs some work to be more
> careful to not unmap NIOVs (if you were testing against gve).

Hmm, I think you're right. We ran into a similar issue with the RX path.
The RX path worked 'fine' on initial merge, but it was passing dmabuf
dma-addrs to the dma-mapping API, which Jason later called out as unsafe.
The dma-mapping API calls with dmabuf dma-addrs will boil down to no-ops
for a lot of setups, I think, which is why I'm not running into any issues
in testing. But upon closer look, I think yes, we need to make sure the
driver doesn't end up passing these niov dma-addrs to functions like
dma_unmap_*() and dma_sync_*().

Stan, do you run into issues (crashes/warnings/bugs) in your setup when
the driver tries to unmap niovs? Or did you implement these changes purely
for safety?

Let me take a deeper look here and suggest something for the next version.
I think we may indeed need the driver to declare that it can handle niovs
in the TX path correctly (i.e. not accidentally pass niov dma-addrs to the
dma-mapping API).

--
Thanks,
Mina
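[Editor's note: to make the gve_unmap_packet concern concrete, here is roughly the shape of the guard Stan's bnxt patch adds, sketched against a made-up driver helper. skb_frag_is_net_iov() is the existing netmem helper from the RX series; the surrounding function and its signature are illustrative, not the actual gve or bnxt code.]

```c
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

/*
 * Hedged sketch, not the actual gve/bnxt code: on TX completion, skip the
 * dma-mapping API for net_iov (devmem) frags. Their dma-addrs come from the
 * dma-buf binding, which owns the mapping lifetime, so they must never be
 * fed back into dma_unmap_*()/dma_sync_*().
 */
static void drv_unmap_tx_frag(struct device *dev, const skb_frag_t *frag,
			      dma_addr_t dma)
{
	if (skb_frag_is_net_iov(frag))
		return;	/* dma-buf address: leave it to the binding */

	dma_unmap_page(dev, dma, skb_frag_size(frag), DMA_TO_DEVICE);
}
```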
On Tue, Feb 4, 2025 at 10:32 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 2/4/25 7:06 PM, Stanislav Fomichev wrote:
> > [...]
> > For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
> > I have a similar internal patch for mlx5 (will share after the RX part gets
> > in). I agree that it seems like gve_unmap_packet needs some work to be more
> > careful to not unmap NIOVs (if you were testing against gve).
>
> What happens if a user tries to use devmem TX on a device that doesn't
> really support it? Silent data corruption?

So the tx dma-buf binding netlink API will bind the dma-buf to the
netdevice. If that fails, the uapi will return failure and devmem tx
will not be enabled.

If the dma-binding succeeds, then the NIC can indeed DMA to and from those
dma-addrs, which live on the device. The TX path will DMA from them just
fine; it need not be aware that the dma-addrs point at device memory rather
than host memory.

The only issue that Stan's patches are pointing to is that the driver
will likely be passing these dma-buf addresses into dma-mapping APIs
like the dma_unmap_*() and dma_sync_*() functions. Those, AFAIU, will be
no-ops with dma-buf addresses in most setups, but it's not 100% safe
to pass those dma-buf addresses to these dma-mapping APIs, so we
should avoid these calls entirely.

> Don't we need some way for the device to opt in (or opt out) and avoid
> such issues?

Yeah, I think likely the driver needs to declare support (i.e. that it's
not using the dma-mapping API with dma-buf addresses).

--
Thanks,
Mina
On 02/04, Mina Almasry wrote:
> On Tue, Feb 4, 2025 at 10:32 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > [...]
> > Don't we need some way for the device to opt in (or opt out) and avoid
> > such issues?
>
> Yeah, I think likely the driver needs to declare support (i.e. that it's
> not using the dma-mapping API with dma-buf addresses).

netif_skb_features/ndo_features_check seems like a good fit?
On 02/04, Mina Almasry wrote:
> On Tue, Feb 4, 2025 at 10:06 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > [...]
> > For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
> > I have a similar internal patch for mlx5 (will share after the RX part gets
> > in). I agree that it seems like gve_unmap_packet needs some work to be more
> > careful to not unmap NIOVs (if you were testing against gve).
>
> [...]
>
> Stan, do you run into issues (crashes/warnings/bugs) in your setup when
> the driver tries to unmap niovs? Or did you implement these changes purely
> for safety?

I don't run into any issues with those unmaps in place, but I'm running x86
with iommu bypass (and as you mention in the other thread, those calls are
no-ops in this case).
On Tue, Feb 4, 2025 at 11:43 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 02/04, Mina Almasry wrote:
> > [...]
> > Stan, do you run into issues (crashes/warnings/bugs) in your setup when
> > the driver tries to unmap niovs? Or did you implement these changes purely
> > for safety?
>
> I don't run into any issues with those unmaps in place, but I'm running x86
> with iommu bypass (and as you mention in the other thread, those calls are
> no-ops in this case).

The dma_addr from dma-buf should never enter the dma_* APIs. dma-buf
exporters have their own implementation of these ops, and they could be
no-ops for identity mappings or when the iommu is disabled (in a VM? with
no IOMMU enabled, GPA=IOVA). So if we really want to map/unmap/sync these
addresses, the dma-buf APIs should be used to do that. Maybe some glue with
a memory provider is required for these net_iovs? I think the safest option
is that mappings are never unmapped manually by the driver until
dma_buf_unmap_attachment is called during unbinding? But maybe that
complicates things for io_uring?
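[Editor's note: for reference, the mapping lifetime described above, sketched with the core dma-buf API. Error handling is trimmed, and the function names, direction argument, and structure are illustrative assumptions rather than the binding code from the series.]

```c
#include <linux/dma-buf.h>
#include <linux/err.h>

/*
 * Hedged sketch of the mapping lifetime: the sg_table obtained at bind time
 * via dma_buf_map_attachment() stays valid until dma_buf_unmap_attachment()
 * at unbind time. Per-packet TX completions never unmap these addresses.
 */
static struct sg_table *devmem_bind_dmabuf(struct dma_buf *dbuf,
					   struct device *dev,
					   struct dma_buf_attachment **att)
{
	*att = dma_buf_attach(dbuf, dev);
	if (IS_ERR(*att))
		return ERR_CAST(*att);

	/* The exporter implements map/unmap; on some setups this is an
	 * identity (no-op) mapping, which is exporter policy. */
	return dma_buf_map_attachment(*att, DMA_TO_DEVICE);
}

static void devmem_unbind_dmabuf(struct dma_buf *dbuf,
				 struct dma_buf_attachment *att,
				 struct sg_table *sgt)
{
	dma_buf_unmap_attachment(att, sgt, DMA_TO_DEVICE);
	dma_buf_detach(dbuf, att);
}
```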
On 02/04, Samiullah Khawaja wrote:
> On Tue, Feb 4, 2025 at 11:43 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > [...]
> > I don't run into any issues with those unmaps in place, but I'm running x86
> > with iommu bypass (and as you mention in the other thread, those calls are
> > no-ops in this case).
>
> The dma_addr from dma-buf should never enter the dma_* APIs. dma-buf
> exporters have their own implementation of these ops, and they could be
> no-ops for identity mappings or when the iommu is disabled (in a VM? with
> no IOMMU enabled, GPA=IOVA). So if we really want to map/unmap/sync these
> addresses, the dma-buf APIs should be used to do that. Maybe some glue with
> a memory provider is required for these net_iovs? I think the safest option
> is that mappings are never unmapped manually by the driver until
> dma_buf_unmap_attachment is called during unbinding? But maybe that
> complicates things for io_uring?

Correct, we don't want to call dma_* APIs on NIOVs, but currently we do
(unmap on tx completion). I mentioned [0] in another thread; we need
something similar for gve (and eventually mlx5). skb_frag_dma_map hides
the mapping, but the unmapping unconditionally calls the dma_* APIs
explicitly (in most drivers I've looked at).

0: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
On Tue, 4 Feb 2025 11:41:09 -0800 Stanislav Fomichev wrote:
> > > Don't we need some way for the device to opt in (or opt out) and avoid
> > > such issues?
> >
> > Yeah, I think likely the driver needs to declare support (i.e. that it's
> > not using the dma-mapping API with dma-buf addresses).
>
> netif_skb_features/ndo_features_check seems like a good fit?

validate_xmit_skb()
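[Editor's note: to make the two suggestions concrete, a rough sketch of what such an opt-in check could evaluate, whether it ends up in ndo_features_check or in validate_xmit_skb(). No capability flag for devmem TX exists at this point, so it is passed in explicitly here; the helper name and placement are illustrative assumptions.]

```c
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/*
 * Hedged sketch: reject devmem (net_iov) skbs on devices that have not
 * opted in. "netmem_tx_ok" stands in for whatever capability flag the
 * non-RFC series ends up defining; the caller (ndo_features_check or
 * validate_xmit_skb()) would drop the skb when this returns false.
 */
static bool skb_netmem_tx_ok(const struct sk_buff *skb, bool netmem_tx_ok)
{
	const struct skb_shared_info *shinfo = skb_shinfo(skb);
	int i;

	if (netmem_tx_ok || !skb_is_nonlinear(skb))
		return true;

	for (i = 0; i < shinfo->nr_frags; i++)
		if (skb_frag_is_net_iov(&shinfo->frags[i]))
			return false;

	return true;
}
```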
On Mon, 3 Feb 2025 22:39:10 +0000 Mina Almasry wrote:
> v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
> ===
>
> RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*

nit: lore links are better

Please stick to RFC until a driver implementation is ready and included.