Message ID: 168451505181.3702129.3429054159823295146.stgit@awfm-02.cornelisnetworks.com
Series: Updates for 6.5
On Fri, May 19, 2023 at 12:54:14PM -0400, Dennis Dalessandro wrote:
> Here are 3 more patches related/spawned out of the work to scale back the
> page pinning re-work. This series depends on [1] which was submitted for RC
> recently.
>
> [1] https://patchwork.kernel.org/project/linux-rdma/patch/168451393605.3700681.13493776139032178861.stgit@awfm-02.cornelisnetworks.com/
>
> ---
>
> Brendan Cunningham (2):
>       IB/hfi1: Add mmu_rb_node refcount to hfi1_mmu_rb_template tracepoints
>       IB/hfi1: Remove unused struct mmu_rb_ops fields .insert, .invalidate

I took these two

> Patrick Kelsey (1):
>       IB/hfi1: Separate user SDMA page-pinning from memory type

Don't know what to do with this one

Maybe send a complete feature.

Jason
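For context on the first patch: adding a refcount field to a tracepoint event
class generally follows the shape sketched below. This is only an
illustration; the real hfi1_mmu_rb_template and struct mmu_rb_node have more,
and possibly different, fields than assumed here.

    #include <linux/tracepoint.h>
    #include <linux/kref.h>

    /* Hedged sketch (belongs in a trace header in real code); the node
     * fields (addr, len, refcount) are assumed for illustration.
     */
    DECLARE_EVENT_CLASS(hfi1_mmu_rb_template,
            TP_PROTO(struct mmu_rb_node *node),
            TP_ARGS(node),
            TP_STRUCT__entry(
                    __field(unsigned long, addr)
                    __field(unsigned long, len)
                    __field(unsigned int, refcount)  /* newly added field */
            ),
            TP_fast_assign(
                    __entry->addr = node->addr;
                    __entry->len = node->len;
                    __entry->refcount = kref_read(&node->refcount);
            ),
            TP_printk("[%lx,%lx) refcount %u",
                      __entry->addr, __entry->addr + __entry->len,
                      __entry->refcount)
    );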
On 6/1/23 6:54 PM, Jason Gunthorpe wrote:
> On Fri, May 19, 2023 at 12:54:14PM -0400, Dennis Dalessandro wrote:
>> Here are 3 more patches related/spawned out of the work to scale back the
>> page pinning re-work. This series depends on [1] which was submitted for RC
>> recently.
>>
>> [1] https://patchwork.kernel.org/project/linux-rdma/patch/168451393605.3700681.13493776139032178861.stgit@awfm-02.cornelisnetworks.com/
>>
>> ---
>>
>> Brendan Cunningham (2):
>>       IB/hfi1: Add mmu_rb_node refcount to hfi1_mmu_rb_template tracepoints
>>       IB/hfi1: Remove unused struct mmu_rb_ops fields .insert, .invalidate
>
> I took these two
>
>> Patrick Kelsey (1):
>>       IB/hfi1: Separate user SDMA page-pinning from memory type
>
> Don't know what to do with this one

I'm not seeing why it's a problem. It improves our existing page pinning by
making the code easier to follow. It also makes the code easier to maintain
by centralizing the pinning code.

> Maybe send a complete feature.

It is a complete feature. It's really just a refactoring of an existing
feature. It makes it more flexible to extend in the future and is the current
interface for psm2 and libfabric. We are clearly not throwing some uAPI
changes over the fence that have no user backing. We have existing use cases
already.

Now, all of this being said, this is starting to concern me. We plan to start
sending patches for supporting new HW soon. We were going to do this
incrementally. Is that going to be considered an incomplete feature? Should
we wait until it's all done and ship it all at once? I was envisioning doing
things the way we did for rdmavt, showing our work so to speak, making
incremental changes over time as opposed to how we submitted the original
hfi1 code in a giant blob.

For instance, we are going to add a field to a data structure to hold
parameters that differ between multiple chips. The first patch was going to
be adding that infrastructure and moving the WFR (the existing chip) values
into it. In my opinion that's basically the same thing we are doing here with
the pinning interface.

This is all in an effort to be open and involve the community as much as
possible along the way, so we avoid a fiasco like we had years ago with
hfi1's introduction.

-Denny
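The per-chip parameter table Dennis describes might look roughly like the
following sketch, where every struct name, field, and value is hypothetical
rather than taken from the hfi1 source:

    #include <linux/types.h>

    /* Hypothetical per-chip parameter table; names and values are
     * invented for illustration only.
     */
    struct hfi1_chip_params {
            u32 max_send_contexts;          /* differs per chip generation */
            u32 num_sdma_engines;
            u32 credit_return_threshold;
    };

    /* WFR (the existing chip) keeps its current values, now table-driven
     * so a future chip can supply its own table.
     */
    static const struct hfi1_chip_params wfr_params = {
            .max_send_contexts       = 160,
            .num_sdma_engines        = 16,
            .credit_return_threshold = 32,
    };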
On Thu, Jun 01, 2023 at 11:41:24PM -0400, Dennis Dalessandro wrote:
> On 6/1/23 6:54 PM, Jason Gunthorpe wrote:
> > On Fri, May 19, 2023 at 12:54:14PM -0400, Dennis Dalessandro wrote:
> >> Here are 3 more patches related/spawned out of the work to scale back the
> >> page pinning re-work. This series depends on [1] which was submitted for
> >> RC recently.
> >>
> >> [1] https://patchwork.kernel.org/project/linux-rdma/patch/168451393605.3700681.13493776139032178861.stgit@awfm-02.cornelisnetworks.com/
> >>
> >> ---
> >>
> >> Brendan Cunningham (2):
> >>       IB/hfi1: Add mmu_rb_node refcount to hfi1_mmu_rb_template tracepoints
> >>       IB/hfi1: Remove unused struct mmu_rb_ops fields .insert, .invalidate
> >
> > I took these two
> >
> >> Patrick Kelsey (1):
> >>       IB/hfi1: Separate user SDMA page-pinning from memory type
> >
> > Don't know what to do with this one
>
> I'm not seeing why it's a problem. It improves our existing page pinning by
> making the code easier to follow. It also makes the code easier to maintain
> by centralizing the pinning code.
>
> > Maybe send a complete feature.
>
> It is a complete feature. It's really just a refactoring of an existing
> feature. It makes it more flexible to extend in the future and is the
> current interface for psm2 and libfabric.

If it was refactoring it would not add new uAPI. The commit message said this
was done for 'other than system memory', which the driver doesn't support. So
it is all very confusing what this is for.

> Now, all of this being said, this is starting to concern me. We plan to
> start sending patches for supporting new HW soon. We were going to do this
> incrementally.

I think we said long ago that this was the last HW that can use the hfi1
cdev. So you will have to think carefully about what is needed. It is part of
why I don't want to take uAPI changes for incomplete features. I'm wondering
how you will fit dmabuf into hfi1, since I won't be happy if it is done by
adding dmabuf support to the cdev.

> Is that going to be considered an incomplete feature? Should we wait until
> it's all done and ship it all at once? I was envisioning doing things the
> way we did for rdmavt, showing our work so to speak, making incremental
> changes over time as opposed to how we submitted the original hfi1 code in
> a giant blob.

I think it depends; stay away from the uAPI and things are easier.

Jason
On 6/2/23 9:42 AM, Jason Gunthorpe wrote:
> I think we said long ago that this was the last HW that can use the
> hfi1 cdev.

Yeah. I don't want to conflate the two things though. This patch is
specifically for the older HW.

> So you will have to think carefully about what is needed. It is part of why
> I don't want to take uAPI changes for incomplete features. I'm wondering
> how you will fit dmabuf into hfi1, since I won't be happy if it is done by
> adding dmabuf support to the cdev.

I think at one point you said it was OK to have the fast datapath go through
the cdev. We want the core to own the configuration/etc. This is the fast
datapath.

The high level idea in my mind is that we do basically what is done with our
control IOCTLs through the core, setting up contexts, etc. However, we would
still create a cdev for sending in the lists of pages to pin down, since that
is the hot path. I assume that cdev could be created and owned by the core
rather than by the driver directly.

>> Is that going to be considered an incomplete feature? Should we wait until
>> it's all done and ship it all at once? I was envisioning doing things the
>> way we did for rdmavt, showing our work so to speak, making incremental
>> changes over time as opposed to how we submitted the original hfi1 code in
>> a giant blob.
>
> I think it depends; stay away from the uAPI and things are easier

The new chip doesn't require any changes to uAPI. We'll start sending patches
soon then.

-Denny
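A minimal sketch of the split Dennis outlines, assuming an invented structure
and ioctl number (this is not the actual hfi1 uAPI), with context setup left
to the core and only the hot-path pin request on the cdev:

    #include <linux/ioctl.h>
    #include <linux/types.h>

    /* Hypothetical hot-path request: userspace hands the cdev an array
     * of iovecs to pin for SDMA, while contexts are set up through the
     * core. Nothing here is the real hfi1 interface.
     */
    struct sdma_pin_req {
            __u64 iovecs;          /* user pointer to struct iovec array */
            __u32 niovecs;
            __u32 flags;
    };

    #define SDMA_IOC_MAGIC      'H'
    #define SDMA_IOC_PIN_PAGES  _IOW(SDMA_IOC_MAGIC, 0x01, struct sdma_pin_req)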
On Fri, Jun 02, 2023 at 01:15:47PM -0400, Dennis Dalessandro wrote:
> > So you will have to think carefully about what is needed. It is part of
> > why I don't want to take uAPI changes for incomplete features. I'm
> > wondering how you will fit dmabuf into hfi1, since I won't be happy if it
> > is done by adding dmabuf support to the cdev.
>
> I think at one point you said it was OK to have the fast datapath go
> through the cdev. We want the core to own the configuration/etc. This is
> the fast datapath.

Maybe, but dmabuf binding and page extraction really can't be fast datapath;
it just isn't fast to start with. You should be crystallizing the dmabuf into
an MR and caching the DMA addrs like everything else, and referring to the MR
on your fast path.

> The high level idea in my mind is that we do basically what is done with
> our control IOCTLs through the core, setting up contexts, etc. However, we
> would still create a cdev for sending in the lists of pages to pin down,
> since that is the hot path. I assume that cdev could be created and owned
> by the core rather than by the driver directly.

I never really liked that HFI has effectively its own weird implementation of
MRs and ODP hidden in the cdev.

Jason
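The pattern Jason points at, resolving the dmabuf once at registration and
caching the mapping so the fast path never calls into dma-buf, might be
sketched as follows; struct my_mr and the surrounding glue are invented,
while the dma-buf importer calls are the kernel's real API:

    #include <linux/dma-buf.h>
    #include <linux/dma-direction.h>
    #include <linux/err.h>

    /* Invented MR-like object caching the dmabuf's DMA addresses. */
    struct my_mr {
            struct dma_buf *dmabuf;
            struct dma_buf_attachment *attach;
            struct sg_table *sgt;           /* cached DMA address list */
    };

    /* Slow path, done once at registration time. */
    static int my_mr_init(struct my_mr *mr, int fd, struct device *dev)
    {
            int ret;

            mr->dmabuf = dma_buf_get(fd);
            if (IS_ERR(mr->dmabuf))
                    return PTR_ERR(mr->dmabuf);

            mr->attach = dma_buf_attach(mr->dmabuf, dev);
            if (IS_ERR(mr->attach)) {
                    ret = PTR_ERR(mr->attach);
                    goto err_put;
            }

            mr->sgt = dma_buf_map_attachment_unlocked(mr->attach,
                                                      DMA_BIDIRECTIONAL);
            if (IS_ERR(mr->sgt)) {
                    ret = PTR_ERR(mr->sgt);
                    goto err_detach;
            }
            return 0;       /* the fast path now reads mr->sgt only */

    err_detach:
            dma_buf_detach(mr->dmabuf, mr->attach);
    err_put:
            dma_buf_put(mr->dmabuf);
            return ret;
    }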
On 6/2/23 1:23 PM, Jason Gunthorpe wrote:
> On Fri, Jun 02, 2023 at 01:15:47PM -0400, Dennis Dalessandro wrote:
>>> So you will have to think carefully about what is needed. It is part of
>>> why I don't want to take uAPI changes for incomplete features. I'm
>>> wondering how you will fit dmabuf into hfi1, since I won't be happy if it
>>> is done by adding dmabuf support to the cdev.

The cdev just needs to know what type of memory it's dealing with. We expect
the dmabuf to be allocated and ready to use, just like we would a GPU buffer.
So would you still reject the patch if we sent support for AMD's GPUs instead
of dmabuf, if it's all in-tree and upstream and we have the user code to go
with it?

To me, enabling this support for our cdev shouldn't be that big of a deal.
It's in support of our existing HW. I'm still committed to fixing the cdev
once and for all. However, I'm planning that as part of the next gen HW
release. I just don't see why we need to tie the two things together,
especially when the only uAPI change is to have a way to pass in the type of
buffer that is backing the iovec (and, I guess, to bump the version number as
well). We aren't adding any new IOCTLs or anything like that.

Regardless, specific to this patch, Brendan has reworked it to not touch uAPI
and just do the refactoring of splitting out the page pinning from the rest
of the SDMA stuff.

-Denny
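For concreteness, the uAPI change described, one memory-type field plus a
version bump and no new IOCTLs, might look like this sketch; the struct
layout and names are assumptions modeled on a generic SDMA request header,
not the actual hfi1 definitions:

    #include <linux/types.h>

    /* Hypothetical memory-type tag for the buffer backing each iovec. */
    enum sdma_mem_type {
            SDMA_MEM_SYSTEM = 0,    /* ordinary user pages */
            SDMA_MEM_DMABUF = 1,    /* buffer imported via a dma-buf fd */
    };

    /* Invented v2 request header: one new field plus a version bump,
     * no new IOCTLs, mirroring the change described above.
     */
    struct sdma_req_info_v2 {
            __u16 ctrl;
            __u16 npkts;
            __u16 fragsize;
            __u16 comp_idx;
            __u16 mem_type;         /* new: enum sdma_mem_type */
            __u16 reserved;
    };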
On Fri, Jun 16, 2023 at 12:33:40PM -0400, Dennis Dalessandro wrote:
> On 6/2/23 1:23 PM, Jason Gunthorpe wrote:
> > On Fri, Jun 02, 2023 at 01:15:47PM -0400, Dennis Dalessandro wrote:
> >>> So you will have to think carefully about what is needed. It is part of
> >>> why I don't want to take uAPI changes for incomplete features. I'm
> >>> wondering how you will fit dmabuf into hfi1, since I won't be happy if
> >>> it is done by adding dmabuf support to the cdev.
>
> The cdev just needs to know what type of memory it's dealing with. We
> expect the dmabuf to be allocated and ready to use, just like we would a
> GPU buffer. So would you still reject the patch if we sent support for
> AMD's GPUs instead of dmabuf, if it's all in-tree and upstream and we have
> the user code to go with it?

I don't understand what that even means; how can you support an AMD GPU
without also using DMABUF?

My big problem with this patch is that I can't understand what it really even
does, because it is somehow tied to the HW functionality. Which is also very
confusing, because DMABUF is supposed to have general MR-based code for
processing MRs.

We are not able to support DMABUF in "ODP"-like situations, and AFAIK this
hfi stuff is basically a weird version of ODP / some kernel support for
registration caching.

So maybe if you explain it better and more carefully, IDK.

Jason