Message ID | 0-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com |
---|---|
Headers | show |
Series | Introduce fwctl subystem | expand |
On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote: > v5: > - Move hunks between patches to make more sense > - Rename ucmd_buffer to fwctl_ucmd_buffer > - Update comments and commit messages > - Copyright to 2025 > - Drop bxnt WIP patches > - Allow a NULL ops->info > - Decode more op codes for mlx5 and the sub-operation for > MLX5_CMD_OP_ACCESS_REG/_USER Did you address my feedback? I asked for the mlx5 support to only be enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this patchset clearly admitted on v4: As explained above, netdev doesn't need it https://lore.kernel.org/all/Z6ZsOMLq7tt3ijX_@x130/ Greg, I've been asking for this interface to be scoped to when RDMA (/CXL/storage) is in enabled on these NICs since pretty much the first RFC. Do I need to keep responding?
On Mon, Mar 03, 2025 at 05:53:58PM -0800, Jakub Kicinski wrote: > On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote: > > v5: > > - Move hunks between patches to make more sense > > - Rename ucmd_buffer to fwctl_ucmd_buffer > > - Update comments and commit messages > > - Copyright to 2025 > > - Drop bxnt WIP patches > > - Allow a NULL ops->info > > - Decode more op codes for mlx5 and the sub-operation for > > MLX5_CMD_OP_ACCESS_REG/_USER > > Did you address my feedback? I asked for the mlx5 support to only be > enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this > patchset clearly admitted on v4: I never agreed to that formulation. I suggested that perhaps runtime configurations where netdev is the only driver using the HW could be disabled (ie a netdev exclusion, not a rdma inclusion). However, there is not agreement on this from Saeed who is responsible for mlx5: https://lore.kernel.org/all/Z7z0ADkimCkhr7Xz@x130/ I also surveyed other stakeholders on a netdev-exclusion proposal and did not hear support. You need to convince people this is a good idea. However, I would agree fwctl should not accept any fwctl drivers for simple networking devices. However, "smart nics" and RDMA capable devices are in-scope. I could also probably agree to using kconfig to disable fwctl drivers on kernels that statically compile out rdma, vdpa, nvme and related, though I agree with Saeed that it seems to lack technical merit. > Greg, I've been asking for this interface to be scoped to when RDMA > (/CXL/storage) is in enabled on these NICs since pretty much the first > RFC. You only started asking for this more limited approach in v4. All your previous arguments were that fwctl should be entirely killed for any networking HW. Jason
On 04 Mar 10:00, Jason Gunthorpe wrote: >On Mon, Mar 03, 2025 at 05:53:58PM -0800, Jakub Kicinski wrote: >> On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote: >> > v5: >> > - Move hunks between patches to make more sense >> > - Rename ucmd_buffer to fwctl_ucmd_buffer >> > - Update comments and commit messages >> > - Copyright to 2025 >> > - Drop bxnt WIP patches >> > - Allow a NULL ops->info >> > - Decode more op codes for mlx5 and the sub-operation for >> > MLX5_CMD_OP_ACCESS_REG/_USER >> >> Did you address my feedback? I asked for the mlx5 support to only be >> enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this >> patchset clearly admitted on v4: When I said fwctl is not needed for netdev, I meant that it will not be used for netdev object configuration and as I said before FW will block that anyways. fwctl in mlx5 is not only for RDMA, So I don't know how to address your comment. Not to mention that fwctl is a very great tool to debug netdev problems. > >I never agreed to that formulation. I suggested that perhaps runtime >configurations where netdev is the only driver using the HW could be >disabled (ie a netdev exclusion, not a rdma inclusion). > >However, there is not agreement on this from Saeed who is responsible >for mlx5: > > https://lore.kernel.org/all/Z7z0ADkimCkhr7Xz@x130/ > >I also surveyed other stakeholders on a netdev-exclusion proposal and >did not hear support. You need to convince people this is a good idea. > >However, I would agree fwctl should not accept any fwctl drivers for >simple networking devices. However, "smart nics" and RDMA capable >devices are in-scope. > >I could also probably agree to using kconfig to disable fwctl drivers >on kernels that statically compile out rdma, vdpa, nvme and related, >though I agree with Saeed that it seems to lack technical merit. > >> Greg, I've been asking for this interface to be scoped to when RDMA >> (/CXL/storage) is in enabled on these NICs since pretty much the first >> RFC. > >You only started asking for this more limited approach in v4. All your >previous arguments were that fwctl should be entirely killed for any >networking HW. > >Jason >
On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > I never agreed to that formulation. I suggested that perhaps runtime > configurations where netdev is the only driver using the HW could be > disabled (ie a netdev exclusion, not a rdma inclusion). I thought you were arguing that me opposing the addition was "maintainer overreach". As in me telling other parts of the kernel what is and isn't allowed. Do I not get a say what gets merged under drivers/net/ now? > However, I would agree fwctl should not accept any fwctl drivers for > simple networking devices. However, "smart nics" and RDMA capable > devices are in-scope. Please stop with the fake technical arguments. I worked on SmartNICs for a decade.
On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > > I never agreed to that formulation. I suggested that perhaps runtime > > configurations where netdev is the only driver using the HW could be > > disabled (ie a netdev exclusion, not a rdma inclusion). > > I thought you were arguing that me opposing the addition was > "maintainer overreach". As in me telling other parts of the kernel > what is and isn't allowed. Do I not get a say what gets merged under > drivers/net/ now? The PCI core drivers are a shared resource jointly maintained by all the subsytems that use them. They are maintained by their respective maintainers. Saeed/etc in this case. It would be inappropriate for your preferences to supersede Saeed's when he is a maintainer of the mlx5_core driver and fwctl. Please try and get Saeed on board with your plan. If the placement under drivers/net makes this confusing then we can certainly change the directory names. Jason
On Wed, Mar 05, 2025 at 09:32:54AM -0400, Jason Gunthorpe wrote: > On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > > On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > > > I never agreed to that formulation. I suggested that perhaps runtime > > > configurations where netdev is the only driver using the HW could be > > > disabled (ie a netdev exclusion, not a rdma inclusion). > > > > I thought you were arguing that me opposing the addition was > > "maintainer overreach". As in me telling other parts of the kernel > > what is and isn't allowed. Do I not get a say what gets merged under > > drivers/net/ now? > > The PCI core drivers are a shared resource jointly maintained by all > the subsytems that use them. They are maintained by their respective > maintainers. Saeed/etc in this case. > > It would be inappropriate for your preferences to supersede Saeed's > when he is a maintainer of the mlx5_core driver and fwctl. Please try > and get Saeed on board with your plan. > > If the placement under drivers/net makes this confusing then we can > certainly change the directory names. I suggested it before to Jakub and it looks like he is not opposing to it. https://lore.kernel.org/all/20250211075553.GF17863@unreal/ Thanks > > Jason >
Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: >> > I never agreed to that formulation. I suggested that perhaps runtime >> > configurations where netdev is the only driver using the HW could be >> > disabled (ie a netdev exclusion, not a rdma inclusion). >> >> I thought you were arguing that me opposing the addition was >> "maintainer overreach". As in me telling other parts of the kernel >> what is and isn't allowed. Do I not get a say what gets merged under >> drivers/net/ now? > >The PCI core drivers are a shared resource jointly maintained by all >the subsytems that use them. They are maintained by their respective >maintainers. Saeed/etc in this case. > >It would be inappropriate for your preferences to supersede Saeed's >when he is a maintainer of the mlx5_core driver and fwctl. Please try >and get Saeed on board with your plan. > >If the placement under drivers/net makes this confusing then we can >certainly change the directory names. According to how mlx5 driver is structured, and the rest of the advanced drivers in the same area are becoming as well, it would make sense to me to have mlx5 core in separate core directory, maintained directly by driver maintainer: drivers/core/mlx5/ then each of the protocol auxiliary device lands in appropriate subsystem directory. It's not always simple to find clear cut though. Like for example devlink and representors code related to eswitch orchestration. The benefit would be that lots of core-related non-netdev patch-trafic would go directly, which would ease the netdev maintainership burden and would make things more flexible. This per-driver-serialization for patchset processing bottleneck would become much more bareable.
On Wed, Mar 05, 2025 at 04:08:19PM +0100, Jiri Pirko wrote: > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: > >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > >> > I never agreed to that formulation. I suggested that perhaps runtime > >> > configurations where netdev is the only driver using the HW could be > >> > disabled (ie a netdev exclusion, not a rdma inclusion). > >> > >> I thought you were arguing that me opposing the addition was > >> "maintainer overreach". As in me telling other parts of the kernel > >> what is and isn't allowed. Do I not get a say what gets merged under > >> drivers/net/ now? > > > >The PCI core drivers are a shared resource jointly maintained by all > >the subsytems that use them. They are maintained by their respective > >maintainers. Saeed/etc in this case. > > > >It would be inappropriate for your preferences to supersede Saeed's > >when he is a maintainer of the mlx5_core driver and fwctl. Please try > >and get Saeed on board with your plan. > > > >If the placement under drivers/net makes this confusing then we can > >certainly change the directory names. > > According to how mlx5 driver is structured, and the rest of the advanced > drivers in the same area are becoming as well, it would make sense to me > to have mlx5 core in separate core directory, maintained directly by driver > maintainer: > drivers/core/mlx5/ > then each of the protocol auxiliary device lands in appropriate > subsystem directory. In my vision, the write access to that drivers/core/ will be given to all relevant subsystem maintainers, so it will operate like shared branch, but foe everyone. It means that series for netdev that changes mlx5_core and netdev code will be sent to netdev and applied by netdev maintainers. In similar way, series which targets RDMA will be handled by RDMA crew. It will allow us to make sure that every piece of code in shared repository is actually used. Thanks
Wed, Mar 05, 2025 at 04:22:46PM +0100, leon@kernel.org wrote: >On Wed, Mar 05, 2025 at 04:08:19PM +0100, Jiri Pirko wrote: >> Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >> >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >> >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: >> >> > I never agreed to that formulation. I suggested that perhaps runtime >> >> > configurations where netdev is the only driver using the HW could be >> >> > disabled (ie a netdev exclusion, not a rdma inclusion). >> >> >> >> I thought you were arguing that me opposing the addition was >> >> "maintainer overreach". As in me telling other parts of the kernel >> >> what is and isn't allowed. Do I not get a say what gets merged under >> >> drivers/net/ now? >> > >> >The PCI core drivers are a shared resource jointly maintained by all >> >the subsytems that use them. They are maintained by their respective >> >maintainers. Saeed/etc in this case. >> > >> >It would be inappropriate for your preferences to supersede Saeed's >> >when he is a maintainer of the mlx5_core driver and fwctl. Please try >> >and get Saeed on board with your plan. >> > >> >If the placement under drivers/net makes this confusing then we can >> >certainly change the directory names. >> >> According to how mlx5 driver is structured, and the rest of the advanced >> drivers in the same area are becoming as well, it would make sense to me >> to have mlx5 core in separate core directory, maintained directly by driver >> maintainer: >> drivers/core/mlx5/ >> then each of the protocol auxiliary device lands in appropriate >> subsystem directory. > >In my vision, the write access to that drivers/core/ will be given to all >relevant subsystem maintainers, so it will operate like shared branch, but >foe everyone. > >It means that series for netdev that changes mlx5_core and netdev code >will be sent to netdev and applied by netdev maintainers. In similar >way, series which targets RDMA will be handled by RDMA crew. > >It will allow us to make sure that every piece of code in shared >repository is actually used. Makes perfect sense to me.
On 3/5/25 8:08 AM, Jiri Pirko wrote: > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >>> I thought you were arguing that me opposing the addition was >>> "maintainer overreach". As in me telling other parts of the kernel >>> what is and isn't allowed. Do I not get a say what gets merged under >>> drivers/net/ now? >> >> The PCI core drivers are a shared resource jointly maintained by all >> the subsytems that use them. They are maintained by their respective >> maintainers. Saeed/etc in this case. >> >> It would be inappropriate for your preferences to supersede Saeed's >> when he is a maintainer of the mlx5_core driver and fwctl. Please try >> and get Saeed on board with your plan. >> >> If the placement under drivers/net makes this confusing then we can >> certainly change the directory names. > > According to how mlx5 driver is structured, and the rest of the advanced > drivers in the same area are becoming as well, it would make sense to me > to have mlx5 core in separate core directory, maintained directly by driver > maintainer: > drivers/core/mlx5/ > then each of the protocol auxiliary device lands in appropriate > subsystem directory. +1 This is how I have structured our drivers -- core driver for owning the PCI device and hosting the APIs to communicate with hardware, an aux bus and then smaller subsystem focused drivers for the aux devices that make the device usable from different contexts. I think we are ready to start upstreaming, but I am waiting to see how this falls out - to see if our core driver can land in a non-subsystem specific location (e.g., drivers/core) or if it needs to go with fwctl as a generic location.
On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote: > On 3/5/25 8:08 AM, Jiri Pirko wrote: > > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: > >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > >>> I thought you were arguing that me opposing the addition was > >>> "maintainer overreach". As in me telling other parts of the kernel > >>> what is and isn't allowed. Do I not get a say what gets merged under > >>> drivers/net/ now? > >> > >> The PCI core drivers are a shared resource jointly maintained by all > >> the subsytems that use them. They are maintained by their respective > >> maintainers. Saeed/etc in this case. > >> > >> It would be inappropriate for your preferences to supersede Saeed's > >> when he is a maintainer of the mlx5_core driver and fwctl. Please try > >> and get Saeed on board with your plan. > >> > >> If the placement under drivers/net makes this confusing then we can > >> certainly change the directory names. > > > > According to how mlx5 driver is structured, and the rest of the advanced > > drivers in the same area are becoming as well, it would make sense to me > > to have mlx5 core in separate core directory, maintained directly by driver > > maintainer: > > drivers/core/mlx5/ > > then each of the protocol auxiliary device lands in appropriate > > subsystem directory. > > +1 > > This is how I have structured our drivers -- core driver for owning the > PCI device and hosting the APIs to communicate with hardware, an aux bus > and then smaller subsystem focused drivers for the aux devices that make > the device usable from different contexts. > > I think we are ready to start upstreaming, but I am waiting to see how > this falls out - to see if our core driver can land in a non-subsystem > specific location (e.g., drivers/core) or if it needs to go with fwctl > as a generic location. Do it right, and push it to drivers/core. I'm aware of at least one driver from huge company (not Nvidia) which is in preparation phase before upstreaming, and will fit nicely into this model. They have separated blocks for PCI, eth, RDMA and GPU. Thanks > > > >
On 05 Mar 20:28, Leon Romanovsky wrote: >On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote: >> On 3/5/25 8:08 AM, Jiri Pirko wrote: >> > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >> >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >> >>> I thought you were arguing that me opposing the addition was >> >>> "maintainer overreach". As in me telling other parts of the kernel >> >>> what is and isn't allowed. Do I not get a say what gets merged under >> >>> drivers/net/ now? >> >> >> >> The PCI core drivers are a shared resource jointly maintained by all >> >> the subsytems that use them. They are maintained by their respective >> >> maintainers. Saeed/etc in this case. >> >> >> >> It would be inappropriate for your preferences to supersede Saeed's >> >> when he is a maintainer of the mlx5_core driver and fwctl. Please try >> >> and get Saeed on board with your plan. >> >> >> >> If the placement under drivers/net makes this confusing then we can >> >> certainly change the directory names. >> > >> > According to how mlx5 driver is structured, and the rest of the advanced >> > drivers in the same area are becoming as well, it would make sense to me >> > to have mlx5 core in separate core directory, maintained directly by driver >> > maintainer: >> > drivers/core/mlx5/ >> > then each of the protocol auxiliary device lands in appropriate >> > subsystem directory. >> >> +1 >> >> This is how I have structured our drivers -- core driver for owning the >> PCI device and hosting the APIs to communicate with hardware, an aux bus >> and then smaller subsystem focused drivers for the aux devices that make >> the device usable from different contexts. >> >> I think we are ready to start upstreaming, but I am waiting to see how >> this falls out - to see if our core driver can land in a non-subsystem >> specific location (e.g., drivers/core) or if it needs to go with fwctl >> as a generic location. > >Do it right, and push it to drivers/core. I'm aware of at least one >driver from huge company (not Nvidia) which is in preparation phase >before upstreaming, and will fit nicely into this model. > How do you imagine this driver/core structure should look like? Who will be the top dir maintainer? It should be something that is tightly coupled with aux, currently aux is under drivers/base/auxiliary.c I think it should move to drivers/aux/auxiliary.c and device drivers should implement their own aux buses, WH access APIs and probing/init logic under that directory e.g: drivers/aux/mlx5/..
On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > How do you imagine this driver/core structure should look like? Who > will be the top dir maintainer? I would set something like this up more like DRM. Every driver maintainer gets commit rights, some rules about no uAPIs, or at least other acks before merging uAPI. Use the tree for staging shared branches. Driver maintainers with the most commits per cycle does the PR or something like that. There is no subsystem or cross-driver entanglement so there is no real need for gatekeeping. It would be a good opportunity to help more people engage with the kernel process and learn the full maintainer flow. > It should be something that is tightly coupled with aux, currently > aux is under drivers/base/auxiliary.c I think it should move to > drivers/aux/auxiliary.c and device drivers should implement their > own aux buses, WH access APIs and probing/init logic under that > directory e.g: drivers/aux/mlx5/.. That makes sense to me. I would expect everything in this collection to be PCI drivers spawing aux devices. drivers/aux_core/ or something like that, perhaps? Jason
On Wed, 5 Mar 2025 09:32:54 -0400 Jason Gunthorpe wrote: > > I thought you were arguing that me opposing the addition was > > "maintainer overreach". As in me telling other parts of the kernel > > what is and isn't allowed. Do I not get a say what gets merged under > > drivers/net/ now? > > The PCI core drivers are a shared resource jointly maintained by all > the subsytems that use them. They are maintained by their respective > maintainers. Saeed/etc in this case. PCI core driver? You need it for RDMA. Whether you move the code around or spell it backwards, I don't care, as long as vast majority of the users of this **NIC**, who only use TCP/IP do not have fwctl access.
On Wed, Mar 05, 2025 at 07:21:54PM -0400, Jason Gunthorpe wrote: > On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > > > How do you imagine this driver/core structure should look like? Who > > will be the top dir maintainer? > > I would set something like this up more like DRM. Every driver > maintainer gets commit rights, some rules about no uAPIs, or at least > other acks before merging uAPI. Use the tree for staging shared > branches. > > Driver maintainers with the most commits per cycle does the PR or > something like that. > > There is no subsystem or cross-driver entanglement so there is no real > need for gatekeeping. Yes, it can be structured like you proposed too or/and combined with my idea https://lore.kernel.org/netdev/20250303150015.GA1926949@unreal/ The most important part is that it needs to be group of maintainers. > > It would be a good opportunity to help more people engage with the > kernel process and learn the full maintainer flow. > > > It should be something that is tightly coupled with aux, currently > > aux is under drivers/base/auxiliary.c I think it should move to > > drivers/aux/auxiliary.c and device drivers should implement their > > own aux buses, WH access APIs and probing/init logic under that > > directory e.g: drivers/aux/mlx5/.. > > That makes sense to me. I would expect everything in this collection > to be PCI drivers spawing aux devices. > > drivers/aux_core/ or something like that, perhaps? I like Saeed's proposal "drivers/aux/", it is more short and catchy. Thanks > > Jason >
On 3/6/25 12:21 AM, Jason Gunthorpe wrote: > On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > >> How do you imagine this driver/core structure should look like? Who >> will be the top dir maintainer? > > I would set something like this up more like DRM. Every driver > maintainer gets commit rights, some rules about no uAPIs, or at least > other acks before merging uAPI. Use the tree for staging shared > branches. why no uapi? Core driver can have knowledge of h/w resources across all use cases. For example, our core driver supports a generid netlink based dump (no set operations; get and dump only so maybe that should be the restriction?) of all objects regardless of how created -- netdev, ib, etc. -- and with much more detail. > > Driver maintainers with the most commits per cycle does the PR or > something like that. > > There is no subsystem or cross-driver entanglement so there is no real > need for gatekeeping. > > It would be a good opportunity to help more people engage with the > kernel process and learn the full maintainer flow. > >> It should be something that is tightly coupled with aux, currently >> aux is under drivers/base/auxiliary.c I think it should move to >> drivers/aux/auxiliary.c and device drivers should implement their >> own aux buses, WH access APIs and probing/init logic under that >> directory e.g: drivers/aux/mlx5/.. > > That makes sense to me. I would expect everything in this collection > to be PCI drivers spawing aux devices. > > drivers/aux_core/ or something like that, perhaps? > drivers/aux_core works for me; removes the 'pci' assumption and makes it clear the real attribute here is use of the aux bus with subsystem specific devices. I am still not clear on how such a branch will work - e.g. We will want multi-vendor review, not just merge the PR and go.
On Tue, Mar 11, 2025 at 12:23:19PM +0100, David Ahern wrote: > On 3/6/25 12:21 AM, Jason Gunthorpe wrote: > > On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > > > >> How do you imagine this driver/core structure should look like? Who > >> will be the top dir maintainer? > > > > I would set something like this up more like DRM. Every driver > > maintainer gets commit rights, some rules about no uAPIs, or at least > > other acks before merging uAPI. Use the tree for staging shared > > branches. > > why no uapi? Core driver can have knowledge of h/w resources across all > use cases. For example, our core driver supports a generid netlink based > dump (no set operations; get and dump only so maybe that should be the > restriction?) of all objects regardless of how created -- netdev, ib, > etc. -- and with much more detail. Because, we want to make sure that UAPI will be aligned with relevant subsystems without any way to bypass them. Thanks
On 3/11/2025 4:23 AM, David Ahern wrote: > On 3/6/25 12:21 AM, Jason Gunthorpe wrote: >> >>> It should be something that is tightly coupled with aux, currently >>> aux is under drivers/base/auxiliary.c I think it should move to >>> drivers/aux/auxiliary.c and device drivers should implement their >>> own aux buses, WH access APIs and probing/init logic under that >>> directory e.g: drivers/aux/mlx5/.. >> >> That makes sense to me. I would expect everything in this collection >> to be PCI drivers spawing aux devices. >> >> drivers/aux_core/ or something like that, perhaps? >> > > drivers/aux_core works for me; removes the 'pci' assumption and makes it > clear the real attribute here is use of the aux bus with subsystem > specific devices. I am still not clear on how such a branch will work - > e.g. We will want multi-vendor review, not just merge the PR and go. > At the risk of getting too far into the naming-choosing rat hole... note that even "aux_core" has the assumption that the thing is auxiliary_bus oriented, which I don't think is a hard truth. As an example, yes pds_core supports pds_vdpa through an auxiliary_device, but it also supports pds_vfio_pci with no aux dev involved. I could live with drivers/aux_core, but I'd prefer drivers/core. sln
On 3/11/25 2:59 PM, Leon Romanovsky wrote: > On Tue, Mar 11, 2025 at 12:23:19PM +0100, David Ahern wrote: >> On 3/6/25 12:21 AM, Jason Gunthorpe wrote: >>> On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: >>> >>>> How do you imagine this driver/core structure should look like? Who >>>> will be the top dir maintainer? >>> >>> I would set something like this up more like DRM. Every driver >>> maintainer gets commit rights, some rules about no uAPIs, or at least >>> other acks before merging uAPI. Use the tree for staging shared >>> branches. >> >> why no uapi? Core driver can have knowledge of h/w resources across all >> use cases. For example, our core driver supports a generid netlink based >> dump (no set operations; get and dump only so maybe that should be the >> restriction?) of all objects regardless of how created -- netdev, ib, >> etc. -- and with much more detail. > > Because, we want to make sure that UAPI will be aligned with relevant > subsystems without any way to bypass them. > > Thanks I hope there will be an open mind on get / dump style introspection apis here. Devices can work support and work within limited subsystem APIs and also allow the dumping of full essential and relevant contexts for a device. More specifically, I do not see netdev APIs ever recognizing RDMA concepts like domains and memory regions. For us, everything is relative to a domain and a region - e.g., whether a queue is created for a netdev device or an IB QP both use the same common internal APIs. I would prefer not to use fwctl for something so basic.
On 03/12, David Ahern wrote: > On 3/11/25 2:59 PM, Leon Romanovsky wrote: > > On Tue, Mar 11, 2025 at 12:23:19PM +0100, David Ahern wrote: > >> On 3/6/25 12:21 AM, Jason Gunthorpe wrote: > >>> On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > >>> > >>>> How do you imagine this driver/core structure should look like? Who > >>>> will be the top dir maintainer? > >>> > >>> I would set something like this up more like DRM. Every driver > >>> maintainer gets commit rights, some rules about no uAPIs, or at least > >>> other acks before merging uAPI. Use the tree for staging shared > >>> branches. > >> > >> why no uapi? Core driver can have knowledge of h/w resources across all > >> use cases. For example, our core driver supports a generid netlink based > >> dump (no set operations; get and dump only so maybe that should be the > >> restriction?) of all objects regardless of how created -- netdev, ib, > >> etc. -- and with much more detail. > > > > Because, we want to make sure that UAPI will be aligned with relevant > > subsystems without any way to bypass them. > > > > Thanks > > I hope there will be an open mind on get / dump style introspection apis > here. Devices can work support and work within limited subsystem APIs > and also allow the dumping of full essential and relevant contexts for a > device. [..] > More specifically, I do not see netdev APIs ever recognizing RDMA > concepts like domains and memory regions. For us, everything is relative > to a domain and a region - e.g., whether a queue is created for a netdev > device or an IB QP both use the same common internal APIs. I would > prefer not to use fwctl for something so basic. What specifically do you mean here by 'memory regions'? Netdev does recognize (as of recently, devmem) the concept of memory regions in the form of dmabufs.
On 3/5/25 7:28 PM, Leon Romanovsky wrote: > On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote: >> On 3/5/25 8:08 AM, Jiri Pirko wrote: >>> Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >>>> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >>>>> I thought you were arguing that me opposing the addition was >>>>> "maintainer overreach". As in me telling other parts of the kernel >>>>> what is and isn't allowed. Do I not get a say what gets merged under >>>>> drivers/net/ now? >>>> >>>> The PCI core drivers are a shared resource jointly maintained by all >>>> the subsytems that use them. They are maintained by their respective >>>> maintainers. Saeed/etc in this case. >>>> >>>> It would be inappropriate for your preferences to supersede Saeed's >>>> when he is a maintainer of the mlx5_core driver and fwctl. Please try >>>> and get Saeed on board with your plan. >>>> >>>> If the placement under drivers/net makes this confusing then we can >>>> certainly change the directory names. >>> >>> According to how mlx5 driver is structured, and the rest of the advanced >>> drivers in the same area are becoming as well, it would make sense to me >>> to have mlx5 core in separate core directory, maintained directly by driver >>> maintainer: >>> drivers/core/mlx5/ >>> then each of the protocol auxiliary device lands in appropriate >>> subsystem directory. >> >> +1 >> >> This is how I have structured our drivers -- core driver for owning the >> PCI device and hosting the APIs to communicate with hardware, an aux bus >> and then smaller subsystem focused drivers for the aux devices that make >> the device usable from different contexts. >> >> I think we are ready to start upstreaming, but I am waiting to see how >> this falls out - to see if our core driver can land in a non-subsystem >> specific location (e.g., drivers/core) or if it needs to go with fwctl >> as a generic location. > > Do it right, and push it to drivers/core. I'm aware of at least one > driver from huge company (not Nvidia) which is in preparation phase > before upstreaming, and will fit nicely into this model. > > They have separated blocks for PCI, eth, RDMA and GPU. > Adding that group here after an offlist discussion with that team. If I understand correctly, their preferred driver breakout is: ┌───────────────────────────────────────────────────────────┐ │ Platform Driver │ │ habanalabs │ └────────▲───────────────────▲────────────────────▲─────────┘ │AUX │AUX │AUX ┌────────┴────────┐ ┌────────┴────────┐ ┌─────────┴─────────┐ │ Accel Driver │ │ Ethernet Driver │ │ InfiniBand Driver │ │ habanalabs_accel│ │ habanalabs_en │ │ habanalabs_ib │ └─────────────────┘ └─────────────────┘ └───────────────────┘ So that means 3 different vendors and 3 different devices looking for a similar auxbus based hierarchy with a core driver not buried within one of the subsystems. I guess at this point we just need to move forward with the proposal and start sending patches. Seems like drivers/core is the consensus for the core driver?
On Thu, Mar 13, 2025 at 01:30:52PM +0100, David Ahern wrote: > On 3/5/25 7:28 PM, Leon Romanovsky wrote: > > On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote: > >> On 3/5/25 8:08 AM, Jiri Pirko wrote: > >>> Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: > >>>> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > >>>>> I thought you were arguing that me opposing the addition was > >>>>> "maintainer overreach". As in me telling other parts of the kernel > >>>>> what is and isn't allowed. Do I not get a say what gets merged under > >>>>> drivers/net/ now? > >>>> > >>>> The PCI core drivers are a shared resource jointly maintained by all > >>>> the subsytems that use them. They are maintained by their respective > >>>> maintainers. Saeed/etc in this case. > >>>> > >>>> It would be inappropriate for your preferences to supersede Saeed's > >>>> when he is a maintainer of the mlx5_core driver and fwctl. Please try > >>>> and get Saeed on board with your plan. > >>>> > >>>> If the placement under drivers/net makes this confusing then we can > >>>> certainly change the directory names. > >>> > >>> According to how mlx5 driver is structured, and the rest of the advanced > >>> drivers in the same area are becoming as well, it would make sense to me > >>> to have mlx5 core in separate core directory, maintained directly by driver > >>> maintainer: > >>> drivers/core/mlx5/ > >>> then each of the protocol auxiliary device lands in appropriate > >>> subsystem directory. > >> > >> +1 > >> > >> This is how I have structured our drivers -- core driver for owning the > >> PCI device and hosting the APIs to communicate with hardware, an aux bus > >> and then smaller subsystem focused drivers for the aux devices that make > >> the device usable from different contexts. > >> > >> I think we are ready to start upstreaming, but I am waiting to see how > >> this falls out - to see if our core driver can land in a non-subsystem > >> specific location (e.g., drivers/core) or if it needs to go with fwctl > >> as a generic location. > > > > Do it right, and push it to drivers/core. I'm aware of at least one > > driver from huge company (not Nvidia) which is in preparation phase > > before upstreaming, and will fit nicely into this model. > > > > They have separated blocks for PCI, eth, RDMA and GPU. > > > > Adding that group here after an offlist discussion with that team. If I > understand correctly, their preferred driver breakout is: > > > ┌───────────────────────────────────────────────────────────┐ > │ Platform Driver │ > │ habanalabs │ > └────────▲───────────────────▲────────────────────▲─────────┘ > │AUX │AUX │AUX > ┌────────┴────────┐ ┌────────┴────────┐ ┌─────────┴─────────┐ > │ Accel Driver │ │ Ethernet Driver │ │ InfiniBand Driver │ > │ habanalabs_accel│ │ habanalabs_en │ │ habanalabs_ib │ > └─────────────────┘ └─────────────────┘ └───────────────────┘ You got it right. > > So that means 3 different vendors and 3 different devices looking for a > similar auxbus based hierarchy with a core driver not buried within one > of the subsystems. > > I guess at this point we just need to move forward with the proposal and > start sending patches. > > Seems like drivers/core is the consensus for the core driver? Yes, anything that is not aux_core is fine by me. drivers/core or drivers/aux. Thanks >
On 3/13/2025 5:48 AM, Leon Romanovsky wrote: > On Thu, Mar 13, 2025 at 01:30:52PM +0100, David Ahern wrote: >> So that means 3 different vendors and 3 different devices looking for a >> similar auxbus based hierarchy with a core driver not buried within one >> of the subsystems. >> >> I guess at this point we just need to move forward with the proposal and >> start sending patches. >> >> Seems like drivers/core is the consensus for the core driver? > > Yes, anything that is not aux_core is fine by me. > > drivers/core or drivers/aux. Between the two of these I prefer drivers/core - I don't want to see this tied to aux for the same reasons we don't want it tied to pci or net. sln
On Thu, Mar 13, 2025 at 12:59:16PM -0700, Nelson, Shannon wrote: > On 3/13/2025 5:48 AM, Leon Romanovsky wrote: > > On Thu, Mar 13, 2025 at 01:30:52PM +0100, David Ahern wrote: > > > > > So that means 3 different vendors and 3 different devices looking for a > > > similar auxbus based hierarchy with a core driver not buried within one > > > of the subsystems. > > > > > > I guess at this point we just need to move forward with the proposal and > > > start sending patches. > > > > > > Seems like drivers/core is the consensus for the core driver? > > > > Yes, anything that is not aux_core is fine by me. > > > > drivers/core or drivers/aux. > > Between the two of these I prefer drivers/core - I don't want to see this > tied to aux for the same reasons we don't want it tied to pci or net. Decades ago we tried to add drivers/core/ but I think tools really didn't like to see "core" as a directory name. Hopefully you all don't run into that issue here as well :) Anyway, if you all want me to run that tree as a "neutral" third-party, I'll be glad to do so. thanks, greg k-h
On 3/13/2025 12:59 PM, Nelson, Shannon wrote: > On 3/13/2025 5:48 AM, Leon Romanovsky wrote: >> On Thu, Mar 13, 2025 at 01:30:52PM +0100, David Ahern wrote: > > >>> So that means 3 different vendors and 3 different devices looking for a >>> similar auxbus based hierarchy with a core driver not buried within one >>> of the subsystems. >>> >>> I guess at this point we just need to move forward with the proposal and >>> start sending patches. >>> >>> Seems like drivers/core is the consensus for the core driver? >> >> Yes, anything that is not aux_core is fine by me. >> >> drivers/core or drivers/aux. > > Between the two of these I prefer drivers/core - I don't want to see > this tied to aux for the same reasons we don't want it tied to pci or net. > > sln > > I'd throw my hat in for drivers/core as well, I think it makes the most sense and is the most subsystem neutral name. Hopefully any issue with tooling can be solved relatively easily.
On Fri, Mar 14, 2025 at 06:37:11AM +0100, Greg Kroah-Hartman wrote: > On Thu, Mar 13, 2025 at 12:59:16PM -0700, Nelson, Shannon wrote: > > On 3/13/2025 5:48 AM, Leon Romanovsky wrote: > > > On Thu, Mar 13, 2025 at 01:30:52PM +0100, David Ahern wrote: > > > > > > > > So that means 3 different vendors and 3 different devices looking for a > > > > similar auxbus based hierarchy with a core driver not buried within one > > > > of the subsystems. > > > > > > > > I guess at this point we just need to move forward with the proposal and > > > > start sending patches. > > > > > > > > Seems like drivers/core is the consensus for the core driver? > > > > > > Yes, anything that is not aux_core is fine by me. > > > > > > drivers/core or drivers/aux. > > > > Between the two of these I prefer drivers/core - I don't want to see this > > tied to aux for the same reasons we don't want it tied to pci or net. > > Decades ago we tried to add drivers/core/ but I think tools really > didn't like to see "core" as a directory name. Hopefully you all don't > run into that issue here as well :) > > Anyway, if you all want me to run that tree as a "neutral" third-party, > I'll be glad to do so. Thanks for readiness, we will definitely be glad to see you on board. Thanks > > thanks, > > greg k-h
On 3/12/25 11:34 AM, Stanislav Fomichev wrote: >> More specifically, I do not see netdev APIs ever recognizing RDMA >> concepts like domains and memory regions. For us, everything is relative >> to a domain and a region - e.g., whether a queue is created for a netdev >> device or an IB QP both use the same common internal APIs. I would >> prefer not to use fwctl for something so basic. > > What specifically do you mean here by 'memory regions'? Ne netdev queues and flows are a subset of RDMA operations, so I mean MRs as in: IBV_REG_MR(3) Libibverbs Programmer's Manual NAME ibv_reg_mr, ibv_reg_mr_iova, ibv_reg_dmabuf_mr, ibv_dereg_mr - register or deregister a memory region (MR) SYNOPSIS #include <infiniband/verbs.h> struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access); struct ibv_mr *ibv_reg_mr_iova(struct ibv_pd *pd, void *addr, size_t length, uint64_t hca_va, int access); struct ibv_mr *ibv_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length, uint64_t iova, int fd, int access); int ibv_dereg_mr(struct ibv_mr *mr); DESCRIPTION ibv_reg_mr() registers a memory region (MR) associated with the protection domain pd. The MR's starting address is addr and its size is length. The argument access describes the desired mem‐ ory protection attributes; it is either 0 or the bitwise OR of one or more of the following flags: ...
On 03/14, David Ahern wrote: > On 3/12/25 11:34 AM, Stanislav Fomichev wrote: > >> More specifically, I do not see netdev APIs ever recognizing RDMA > >> concepts like domains and memory regions. For us, everything is relative > >> to a domain and a region - e.g., whether a queue is created for a netdev > >> device or an IB QP both use the same common internal APIs. I would > >> prefer not to use fwctl for something so basic. > > > > What specifically do you mean here by 'memory regions'? Ne > > netdev queues and flows are a subset of RDMA operations, so I mean MRs > as in: > > IBV_REG_MR(3) Libibverbs Programmer's Manual > > > NAME > ibv_reg_mr, ibv_reg_mr_iova, ibv_reg_dmabuf_mr, ibv_dereg_mr - > register or deregister a memory region (MR) > > SYNOPSIS > #include <infiniband/verbs.h> > > struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, > size_t length, int access); > > struct ibv_mr *ibv_reg_mr_iova(struct ibv_pd *pd, void *addr, > size_t length, uint64_t hca_va, > int access); > > struct ibv_mr *ibv_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, > size_t length, uint64_t iova, > int fd, int access); > > int ibv_dereg_mr(struct ibv_mr *mr); > > DESCRIPTION > ibv_reg_mr() registers a memory region (MR) associated with the > protection domain pd. The MR's starting address is addr and its size is > length. The argument access describes the desired mem‐ > ory protection attributes; it is either 0 or the bitwise OR of > one or more of the following flags: > > ... Sure, and netdev has: - name: bind-rx doc: Bind dmabuf to netdev attribute-set: dmabuf flags: [ admin-perm ] do: request: attributes: - ifindex - fd - queues reply: attributes: - id Which accepts dmabuf fd. Looks very similar to ibv_reg_dmabuf_mr to me. And I think we can safely ignore ibv_reg_mr_iova which needs things like proprietary nvidia-peermem to function. My point is: I don't think netdev is as opposed to memory regions as you think it is. As long as you come up with sensible new UAPI and as long as everything is in the open, it's up for discussion.
On Wed, Mar 12, 2025 at 10:31:51AM +0100, David Ahern wrote: > I hope there will be an open mind on get / dump style introspection apis > here. Devices can work support and work within limited subsystem APIs > and also allow the dumping of full essential and relevant contexts for a > device. We have three places to put dumpables like you are talking about, fwctl, netdev and rdma rdma already has a robust dumper for these kinds of objects. netdev drivers have a few options, and people use debugfs there. fwctl is supposed to let you dump the FW side view of the sytem. Do you *really* need another one? It sounds excessive to me. > More specifically, I do not see netdev APIs ever recognizing RDMA > concepts like domains and memory regions. Why not? If netdev is going to provide the same detailed view of the HW state we expect from fwctl it will need to provide ways to dump the actualy underlying HW objects, whatever they may be. It is not like building a netdev driver on top of RDMA is anything new, mlx5 does it as well. > For us, everything is relative to a domain and a region - e.g., > whether a queue is created for a netdev device or an IB QP both use > the same common internal APIs. I would prefer not to use fwctl for > something so basic. It is probably the right answer though. Jason
On Fri, Mar 14, 2025 at 11:09:47AM -0700, Jacob Keller wrote: > I'd throw my hat in for drivers/core as well, I think it makes the most > sense and is the most subsystem neutral name. Hopefully any issue with > tooling can be solved relatively easily. Given Greg's point about core dump files sometimes being in .gitignore maybe "shared_core", or something along that path is a better name? Jason
On 3/17/25 1:33 PM, Jason Gunthorpe wrote: > On Fri, Mar 14, 2025 at 11:09:47AM -0700, Jacob Keller wrote: > >> I'd throw my hat in for drivers/core as well, I think it makes the most >> sense and is the most subsystem neutral name. Hopefully any issue with >> tooling can be solved relatively easily. > > Given Greg's point about core dump files sometimes being in .gitignore > maybe "shared_core", or something along that path is a better name? > Not seeing the conflict on drivers/core: $ find . -name .gitignore | xargs grep core ./tools/testing/selftests/powerpc/ptrace/.gitignore:core-pkey ./tools/testing/selftests/cgroup/.gitignore:test_core ./tools/testing/selftests/mincore/.gitignore:mincore_selftest ./arch/mips/crypto/.gitignore:poly1305-core.S ./arch/arm/crypto/.gitignore:aesbs-core.S ./arch/arm/crypto/.gitignore:sha256-core.S ./arch/arm/crypto/.gitignore:sha512-core.S ./arch/arm/crypto/.gitignore:poly1305-core.S ./arch/arm64/crypto/.gitignore:sha256-core.S ./arch/arm64/crypto/.gitignore:sha512-core.S ./arch/arm64/crypto/.gitignore:poly1305-core.S
> -----Original Message----- > From: David Ahern <dsahern@kernel.org> > Sent: Monday, March 17, 2025 12:01 PM > To: Jason Gunthorpe <jgg@nvidia.com>; Keller, Jacob E > <jacob.e.keller@intel.com> > Cc: Nelson, Shannon <shannon.nelson@amd.com>; Leon Romanovsky > <leon@kernel.org>; Jiri Pirko <jiri@resnulli.us>; Jakub Kicinski > <kuba@kernel.org>; Greg Kroah-Hartman <gregkh@linuxfoundation.org>; Andy > Gospodarek <andrew.gospodarek@broadcom.com>; Aron Silverton > <aron.silverton@oracle.com>; Williams, Dan J <dan.j.williams@intel.com>; Daniel > Vetter <daniel.vetter@ffwll.ch>; Jiang, Dave <dave.jiang@intel.com>; Christoph > Hellwig <hch@infradead.org>; Itay Avraham <itayavr@nvidia.com>; Jiri Pirko > <jiri@nvidia.com>; Jonathan Cameron <Jonathan.Cameron@huawei.com>; > Leonid Bloch <lbloch@nvidia.com>; linux-cxl@vger.kernel.org; linux- > rdma@vger.kernel.org; netdev@vger.kernel.org; Saeed Mahameed > <saeedm@nvidia.com>; Sinyuk, Konstantin <konstantin.sinyuk@intel.com> > Subject: Re: [PATCH v5 0/8] Introduce fwctl subystem > > On 3/17/25 1:33 PM, Jason Gunthorpe wrote: > > On Fri, Mar 14, 2025 at 11:09:47AM -0700, Jacob Keller wrote: > > > >> I'd throw my hat in for drivers/core as well, I think it makes the most > >> sense and is the most subsystem neutral name. Hopefully any issue with > >> tooling can be solved relatively easily. > > > > Given Greg's point about core dump files sometimes being in .gitignore > > maybe "shared_core", or something along that path is a better name? > > > > Not seeing the conflict on drivers/core: > > $ find . -name .gitignore | xargs grep core > ./tools/testing/selftests/powerpc/ptrace/.gitignore:core-pkey > ./tools/testing/selftests/cgroup/.gitignore:test_core > ./tools/testing/selftests/mincore/.gitignore:mincore_selftest > ./arch/mips/crypto/.gitignore:poly1305-core.S > ./arch/arm/crypto/.gitignore:aesbs-core.S > ./arch/arm/crypto/.gitignore:sha256-core.S > ./arch/arm/crypto/.gitignore:sha512-core.S > ./arch/arm/crypto/.gitignore:poly1305-core.S > ./arch/arm64/crypto/.gitignore:sha256-core.S > ./arch/arm64/crypto/.gitignore:sha512-core.S > ./arch/arm64/crypto/.gitignore:poly1305-core.S I would guess its people putting core in their own ignore lists, not necessarily what we commit to the kernel tree. Thanks, Jake
On Mon, Mar 17, 2025 at 08:33:04PM +0000, Keller, Jacob E wrote: > > > > -----Original Message----- > > From: David Ahern <dsahern@kernel.org> > > Sent: Monday, March 17, 2025 12:01 PM > > To: Jason Gunthorpe <jgg@nvidia.com>; Keller, Jacob E > > <jacob.e.keller@intel.com> > > Cc: Nelson, Shannon <shannon.nelson@amd.com>; Leon Romanovsky > > <leon@kernel.org>; Jiri Pirko <jiri@resnulli.us>; Jakub Kicinski > > <kuba@kernel.org>; Greg Kroah-Hartman <gregkh@linuxfoundation.org>; Andy > > Gospodarek <andrew.gospodarek@broadcom.com>; Aron Silverton > > <aron.silverton@oracle.com>; Williams, Dan J <dan.j.williams@intel.com>; Daniel > > Vetter <daniel.vetter@ffwll.ch>; Jiang, Dave <dave.jiang@intel.com>; Christoph > > Hellwig <hch@infradead.org>; Itay Avraham <itayavr@nvidia.com>; Jiri Pirko > > <jiri@nvidia.com>; Jonathan Cameron <Jonathan.Cameron@huawei.com>; > > Leonid Bloch <lbloch@nvidia.com>; linux-cxl@vger.kernel.org; linux- > > rdma@vger.kernel.org; netdev@vger.kernel.org; Saeed Mahameed > > <saeedm@nvidia.com>; Sinyuk, Konstantin <konstantin.sinyuk@intel.com> > > Subject: Re: [PATCH v5 0/8] Introduce fwctl subystem > > > > On 3/17/25 1:33 PM, Jason Gunthorpe wrote: > > > On Fri, Mar 14, 2025 at 11:09:47AM -0700, Jacob Keller wrote: > > > > > >> I'd throw my hat in for drivers/core as well, I think it makes the most > > >> sense and is the most subsystem neutral name. Hopefully any issue with > > >> tooling can be solved relatively easily. > > > > > > Given Greg's point about core dump files sometimes being in .gitignore > > > maybe "shared_core", or something along that path is a better name? > > > > > > > Not seeing the conflict on drivers/core: > > > > $ find . -name .gitignore | xargs grep core > > ./tools/testing/selftests/powerpc/ptrace/.gitignore:core-pkey > > ./tools/testing/selftests/cgroup/.gitignore:test_core > > ./tools/testing/selftests/mincore/.gitignore:mincore_selftest > > ./arch/mips/crypto/.gitignore:poly1305-core.S > > ./arch/arm/crypto/.gitignore:aesbs-core.S > > ./arch/arm/crypto/.gitignore:sha256-core.S > > ./arch/arm/crypto/.gitignore:sha512-core.S > > ./arch/arm/crypto/.gitignore:poly1305-core.S > > ./arch/arm64/crypto/.gitignore:sha256-core.S > > ./arch/arm64/crypto/.gitignore:sha512-core.S > > ./arch/arm64/crypto/.gitignore:poly1305-core.S > > I would guess its people putting core in their own ignore lists, not necessarily what we commit to the kernel tree. Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we had git, so this wasn't a git issue. I'm all for "drivers/core/" but note, that really looks like "the driver core" area of the kernel, so maybe pick a different name? Maybe drivers/lib/ as this is common stuff for a variety of different drivers? I don't know, naming is hard, sorry. I've been defaulting to just using dutch words for projects these days as they don't seem to be used much. Hey, that might work here, drivers/kern/ perhaps? Nah... thanks, greg k-h
On Tue, Mar 18, 2025 at 02:20:45PM +0100, Greg Kroah-Hartman wrote: > Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we > had git, so this wasn't a git issue. I'm all for "drivers/core/" but > note, that really looks like "the driver core" area of the kernel, so > maybe pick a different name? Yeah, +1. We have lots of places calling what is in drivers/base 'core'. Jason
On 3/18/25 6:25 AM, Jason Gunthorpe wrote: > On Tue, Mar 18, 2025 at 02:20:45PM +0100, Greg Kroah-Hartman wrote: > >> Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we >> had git, so this wasn't a git issue. I'm all for "drivers/core/" but >> note, that really looks like "the driver core" area of the kernel, so >> maybe pick a different name? > > Yeah, +1. We have lots of places calling what is in drivers/base 'core'. just throwing in my 2c drivers/main drivers/common drivers/primary > > Jason
On Tue, Mar 18, 2025 at 08:39:50AM -0700, Dave Jiang wrote: > > > On 3/18/25 6:25 AM, Jason Gunthorpe wrote: > > On Tue, Mar 18, 2025 at 02:20:45PM +0100, Greg Kroah-Hartman wrote: > > > >> Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we > >> had git, so this wasn't a git issue. I'm all for "drivers/core/" but > >> note, that really looks like "the driver core" area of the kernel, so > >> maybe pick a different name? > > > > Yeah, +1. We have lots of places calling what is in drivers/base 'core'. > > just throwing in my 2c > > drivers/main Implies the "driver core" > drivers/common lib/ maybe? > drivers/primary It's not going to be the primary drivers for my laptop :) Naming is hard. Let's see some code first... greg k-h
> -----Original Message----- > From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Tuesday, March 18, 2025 6:25 AM > To: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Keller, Jacob E <jacob.e.keller@intel.com>; David Ahern > <dsahern@kernel.org>; Nelson, Shannon <shannon.nelson@amd.com>; Leon > Romanovsky <leon@kernel.org>; Jiri Pirko <jiri@resnulli.us>; Jakub Kicinski > <kuba@kernel.org>; Andy Gospodarek <andrew.gospodarek@broadcom.com>; > Aron Silverton <aron.silverton@oracle.com>; Williams, Dan J > <dan.j.williams@intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Jiang, Dave > <dave.jiang@intel.com>; Christoph Hellwig <hch@infradead.org>; Itay Avraham > <itayavr@nvidia.com>; Jiri Pirko <jiri@nvidia.com>; Jonathan Cameron > <Jonathan.Cameron@huawei.com>; Leonid Bloch <lbloch@nvidia.com>; linux- > cxl@vger.kernel.org; linux-rdma@vger.kernel.org; netdev@vger.kernel.org; Saeed > Mahameed <saeedm@nvidia.com>; Sinyuk, Konstantin > <konstantin.sinyuk@intel.com> > Subject: Re: [PATCH v5 0/8] Introduce fwctl subystem > > On Tue, Mar 18, 2025 at 02:20:45PM +0100, Greg Kroah-Hartman wrote: > > > Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we > > had git, so this wasn't a git issue. I'm all for "drivers/core/" but > > note, that really looks like "the driver core" area of the kernel, so > > maybe pick a different name? > > Yeah, +1. We have lots of places calling what is in drivers/base 'core'. > > Jason Core is quite often used like "the core of the kernel that is non-driver stuff", i.e. "the networking core", "the PTP core", etc. Naming things is indeed hard.
On 3/18/25 17:06, Greg Kroah-Hartman wrote: > On Tue, Mar 18, 2025 at 08:39:50AM -0700, Dave Jiang wrote: >> >> >> On 3/18/25 6:25 AM, Jason Gunthorpe wrote: >>> On Tue, Mar 18, 2025 at 02:20:45PM +0100, Greg Kroah-Hartman wrote: >>> >>>> Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we >>>> had git, so this wasn't a git issue. I'm all for "drivers/core/" but >>>> note, that really looks like "the driver core" area of the kernel, so >>>> maybe pick a different name? >>> >>> Yeah, +1. We have lots of places calling what is in drivers/base 'core'. >> >> just throwing in my 2c >> >> drivers/main > > Implies the "driver core" > >> drivers/common > > lib/ maybe? > >> drivers/primary > > It's not going to be the primary drivers for my laptop :) > > Naming is hard. Let's see some code first... > > greg k-h > "platfrom" would also suggest a wider thing, so: "complex"? anyway, I don't like fwctl, so maybe "fwctl" to at least reveal the real reason :P
On Wed, Mar 19, 2025 at 06:48:48AM +0100, Przemek Kitszel wrote: > On 3/18/25 17:06, Greg Kroah-Hartman wrote: > > On Tue, Mar 18, 2025 at 08:39:50AM -0700, Dave Jiang wrote: > > > > > > > > > On 3/18/25 6:25 AM, Jason Gunthorpe wrote: > > > > On Tue, Mar 18, 2025 at 02:20:45PM +0100, Greg Kroah-Hartman wrote: > > > > > > > > > Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we > > > > > had git, so this wasn't a git issue. I'm all for "drivers/core/" but > > > > > note, that really looks like "the driver core" area of the kernel, so > > > > > maybe pick a different name? > > > > > > > > Yeah, +1. We have lots of places calling what is in drivers/base 'core'. > > > > > > just throwing in my 2c > > > > > > drivers/main > > > > Implies the "driver core" > > > > > drivers/common > > > > lib/ maybe? > > > > > drivers/primary > > > > It's not going to be the primary drivers for my laptop :) > > > > Naming is hard. Let's see some code first... > > > > greg k-h > > > > "platfrom" would also suggest a wider thing, so: > "complex"? > > anyway, I don't like fwctl, so maybe "fwctl" to at least reveal the real > reason :P We are discussing where to move XXX_core drivers which historically were located in drivers/net/ethernet/XXX/, see this idea https://lore.kernel.org/all/20250211075553.GF17863@unreal/ FWCTL is unrelated to this discussion and you are not forced to use it if you don't like it. Thanks
On Tue, Mar 18, 2025 at 05:06:17PM +0100, Greg Kroah-Hartman wrote: > On Tue, Mar 18, 2025 at 08:39:50AM -0700, Dave Jiang wrote: > > > > > > On 3/18/25 6:25 AM, Jason Gunthorpe wrote: > > > On Tue, Mar 18, 2025 at 02:20:45PM +0100, Greg Kroah-Hartman wrote: > > > > > >> Yes, note, the issue came up in the 2.5.x kernel days, _WAY_ before we > > >> had git, so this wasn't a git issue. I'm all for "drivers/core/" but > > >> note, that really looks like "the driver core" area of the kernel, so > > >> maybe pick a different name? > > > > > > Yeah, +1. We have lots of places calling what is in drivers/base 'core'. > > > > just throwing in my 2c > > > > drivers/main > > Implies the "driver core" > > > drivers/common > > lib/ maybe? > > > drivers/primary > > It's not going to be the primary drivers for my laptop :) > > Naming is hard. Let's see some code first... Yes, let's do name contest later when code will come. There are multiple companies already started to work on it and my hope that it will be ready for next merge cycle. Thanks > > greg k-h
> We are discussing where to move XXX_core drivers which historically were > located in drivers/net/ethernet/XXX/, see this idea https://lore.kernel.org/all/20250211075553.GF17863@unreal/ yes, I know > FWCTL is unrelated to this discussion and it is related, I see the move as a workaround for "netdev maintainer complains more than I would like", sorry :P > you are not forced to use it > if you don't like it. I know, time will tell if I would be forced to use it to don't let competitors to make a short term advantage by going by the "who cares, we have fwctl" route.
On Wed, Mar 19, 2025 at 11:46:15AM +0100, Przemek Kitszel wrote: > > > We are discussing where to move XXX_core drivers which historically were > > located in drivers/net/ethernet/XXX/, see this idea https://lore.kernel.org/all/20250211075553.GF17863@unreal/ > > yes, I know > > > FWCTL is unrelated to this discussion and > > it is related, > I see the move as a workaround for "netdev maintainer complains more > than I would like", sorry :P No, it is not. There is no technical reason to keep non-netdev code in netdev subsystem. If you have an explanation why PCI-to-AUX, VDPA, VFIO and RDMA logic MUST be under netdev subsystem, please speak it now. Otherwise please keep brigading out of this discussion. Thanks
On Thu, Feb 27, 2025 at 08:26:28PM -0400, Jason Gunthorpe wrote: > Jason Gunthorpe (6): > fwctl: Add basic structure for a class subsystem with a cdev > fwctl: Basic ioctl dispatch for the character device > fwctl: FWCTL_INFO to return basic information about the device > taint: Add TAINT_FWCTL > fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware > fwctl: Add documentation > > Saeed Mahameed (2): > fwctl/mlx5: Support for communicating with mlx5 fw > mlx5: Create an auxiliary device for fwctl_mlx5 Applied - it has been in linux-next for two weeks already. Jason
[ We now have the required three drivers on the list so things are looking probable for reaching this merge window. I will work out some shared branches with CXL and get it into linux-next once all three drivers can be assembled and reviews seem to be concluding. There are couple open notes - Greg was interested in a new name, Dan offered auxctl ] fwctl is a new subsystem intended to bring some common rules and order to the growing pattern of exposing a secure FW interface directly to userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are exposing a device for datapath operations fwctl is focused on debugging, configuration and provisioning of the device. It will not have the necessary features like interrupt delivery to support a datapath. This concept is similar to the long standing practice in the "HW" RAID space of having a device specific misc device to manage the RAID controller FW. fwctl generalizes this notion of a companion debug and management interface that goes along with a dataplane implemented in an appropriate subsystem. The need for this has reached a critical point as many users are moving to run lockdown enabled kernels. Several existing devices have had long standing tooling for management that relied on /sys/../resource0 or PCI config space access which is not permitted in lockdown. A major point of fwctl is to define and document the rules that a device must follow to expose a lockdown compatible RPC. Based on some discussion fwctl splits the RPCs into four categories FWCTL_RPC_CONFIGURATION FWCTL_RPC_DEBUG_READ_ONLY FWCTL_RPC_DEBUG_WRITE FWCTL_RPC_DEBUG_WRITE_FULL Where the latter two trigger a new TAINT_FWCTL, and the final one requires CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW would be responsible to restrict RPCs to the requested security scope, while the core code handles the tainting and CAP checks. For details see the final patch which introduces the documentation. The CXL FWCTL driver is now in it own series on v7: https://lore.kernel.org/r/20250220194438.2281088-1-dave.jiang@intel.com And a driver for the Pensando DSC (A smart NIC): https://lore.kernel.org/r/20250211234854.52277-1-shannon.nelson@amd.com I've got soft commitments for about 7 drivers in total now. There have been three LWN articles written discussing various aspects of this proposal: https://lwn.net/Articles/955001/ https://lwn.net/Articles/969383/ https://lwn.net/Articles/990802/ A really giant ksummit thread preceding a discussion at the Maintainer Summit: https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/ Several have expressed general support for this concept: AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org NVIDIA Networking Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay. Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com Work is ongoing for userspace, currently the mellanox tool suite has been ported over: https://github.com/Mellanox/mstflint And a more simplified example how to use it: https://github.com/jgunthorpe/mlx5ctl.git This is on github: https://github.com/jgunthorpe/linux/commits/fwctl v5: - Move hunks between patches to make more sense - Rename ucmd_buffer to fwctl_ucmd_buffer - Update comments and commit messages - Copyright to 2025 - Drop bxnt WIP patches - Allow a NULL ops->info - Decode more op codes for mlx5 and the sub-operation for MLX5_CMD_OP_ACCESS_REG/_USER v4: https://patch.msgid.link/r/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com - Rebase to v6.14-rc1 - Fine tune comments and rst documentatin - Adjust cleanup.h usage - remove places that add more ofuscation than value - CXL is back to its own independent series - Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit - Fix mlx5ctl_validate_rpc() logic around scope checking - Disable mlx5ctl on SFs v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com - Rebase to v6.11-rc4 - Add a squashed version of David's CXL series as the 2nd driver - Add missing includes - Improve comments based on feedback - Use the kdoc format that puts the member docs inside the struct - Rewrite fwctl_alloc_device() to be clearer - Incorporate all remarks for the documentation v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com - Rebase to v6.10-rc5 - Minor style changes - Follow the style consensus for the guard stuff - Documentation grammer/spelling - Add missed length output for mlx5 get_info - Add two more missed MLX5 CMD's - Collect tags v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com> Cc: Aron Silverton <aron.silverton@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Ahern <dsahern@kernel.org> Cc: Itay Avraham <itayavr@nvidia.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: Jiri Pirko <jiri@nvidia.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Leonid Bloch <lbloch@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-cxl@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: "Nelson, Shannon" <shannon.nelson@amd.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Jason Gunthorpe (6): fwctl: Add basic structure for a class subsystem with a cdev fwctl: Basic ioctl dispatch for the character device fwctl: FWCTL_INFO to return basic information about the device taint: Add TAINT_FWCTL fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware fwctl: Add documentation Saeed Mahameed (2): fwctl/mlx5: Support for communicating with mlx5 fw mlx5: Create an auxiliary device for fwctl_mlx5 Documentation/admin-guide/tainted-kernels.rst | 5 + Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++ Documentation/userspace-api/fwctl/index.rst | 12 + Documentation/userspace-api/index.rst | 1 + .../userspace-api/ioctl/ioctl-number.rst | 1 + MAINTAINERS | 18 + drivers/Kconfig | 2 + drivers/Makefile | 1 + drivers/fwctl/Kconfig | 23 + drivers/fwctl/Makefile | 5 + drivers/fwctl/main.c | 421 ++++++++++++++++++ drivers/fwctl/mlx5/Makefile | 4 + drivers/fwctl/mlx5/main.c | 411 +++++++++++++++++ drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 + include/linux/fwctl.h | 135 ++++++ include/linux/panic.h | 3 +- include/uapi/fwctl/fwctl.h | 139 ++++++ include/uapi/fwctl/mlx5.h | 36 ++ kernel/panic.c | 1 + tools/debugging/kernel-chktaint | 8 + 20 files changed, 1519 insertions(+), 1 deletion(-) create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst create mode 100644 Documentation/userspace-api/fwctl/index.rst create mode 100644 drivers/fwctl/Kconfig create mode 100644 drivers/fwctl/Makefile create mode 100644 drivers/fwctl/main.c create mode 100644 drivers/fwctl/mlx5/Makefile create mode 100644 drivers/fwctl/mlx5/main.c create mode 100644 include/linux/fwctl.h create mode 100644 include/uapi/fwctl/fwctl.h create mode 100644 include/uapi/fwctl/mlx5.h base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b