Message ID | 0-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com (mailing list archive) |
---|---|
Headers | show |
Series | Introduce fwctl subystem | expand |
On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote: > v5: > - Move hunks between patches to make more sense > - Rename ucmd_buffer to fwctl_ucmd_buffer > - Update comments and commit messages > - Copyright to 2025 > - Drop bxnt WIP patches > - Allow a NULL ops->info > - Decode more op codes for mlx5 and the sub-operation for > MLX5_CMD_OP_ACCESS_REG/_USER Did you address my feedback? I asked for the mlx5 support to only be enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this patchset clearly admitted on v4: As explained above, netdev doesn't need it https://lore.kernel.org/all/Z6ZsOMLq7tt3ijX_@x130/ Greg, I've been asking for this interface to be scoped to when RDMA (/CXL/storage) is in enabled on these NICs since pretty much the first RFC. Do I need to keep responding?
On Mon, Mar 03, 2025 at 05:53:58PM -0800, Jakub Kicinski wrote: > On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote: > > v5: > > - Move hunks between patches to make more sense > > - Rename ucmd_buffer to fwctl_ucmd_buffer > > - Update comments and commit messages > > - Copyright to 2025 > > - Drop bxnt WIP patches > > - Allow a NULL ops->info > > - Decode more op codes for mlx5 and the sub-operation for > > MLX5_CMD_OP_ACCESS_REG/_USER > > Did you address my feedback? I asked for the mlx5 support to only be > enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this > patchset clearly admitted on v4: I never agreed to that formulation. I suggested that perhaps runtime configurations where netdev is the only driver using the HW could be disabled (ie a netdev exclusion, not a rdma inclusion). However, there is not agreement on this from Saeed who is responsible for mlx5: https://lore.kernel.org/all/Z7z0ADkimCkhr7Xz@x130/ I also surveyed other stakeholders on a netdev-exclusion proposal and did not hear support. You need to convince people this is a good idea. However, I would agree fwctl should not accept any fwctl drivers for simple networking devices. However, "smart nics" and RDMA capable devices are in-scope. I could also probably agree to using kconfig to disable fwctl drivers on kernels that statically compile out rdma, vdpa, nvme and related, though I agree with Saeed that it seems to lack technical merit. > Greg, I've been asking for this interface to be scoped to when RDMA > (/CXL/storage) is in enabled on these NICs since pretty much the first > RFC. You only started asking for this more limited approach in v4. All your previous arguments were that fwctl should be entirely killed for any networking HW. Jason
On 04 Mar 10:00, Jason Gunthorpe wrote: >On Mon, Mar 03, 2025 at 05:53:58PM -0800, Jakub Kicinski wrote: >> On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote: >> > v5: >> > - Move hunks between patches to make more sense >> > - Rename ucmd_buffer to fwctl_ucmd_buffer >> > - Update comments and commit messages >> > - Copyright to 2025 >> > - Drop bxnt WIP patches >> > - Allow a NULL ops->info >> > - Decode more op codes for mlx5 and the sub-operation for >> > MLX5_CMD_OP_ACCESS_REG/_USER >> >> Did you address my feedback? I asked for the mlx5 support to only be >> enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this >> patchset clearly admitted on v4: When I said fwctl is not needed for netdev, I meant that it will not be used for netdev object configuration and as I said before FW will block that anyways. fwctl in mlx5 is not only for RDMA, So I don't know how to address your comment. Not to mention that fwctl is a very great tool to debug netdev problems. > >I never agreed to that formulation. I suggested that perhaps runtime >configurations where netdev is the only driver using the HW could be >disabled (ie a netdev exclusion, not a rdma inclusion). > >However, there is not agreement on this from Saeed who is responsible >for mlx5: > > https://lore.kernel.org/all/Z7z0ADkimCkhr7Xz@x130/ > >I also surveyed other stakeholders on a netdev-exclusion proposal and >did not hear support. You need to convince people this is a good idea. > >However, I would agree fwctl should not accept any fwctl drivers for >simple networking devices. However, "smart nics" and RDMA capable >devices are in-scope. > >I could also probably agree to using kconfig to disable fwctl drivers >on kernels that statically compile out rdma, vdpa, nvme and related, >though I agree with Saeed that it seems to lack technical merit. > >> Greg, I've been asking for this interface to be scoped to when RDMA >> (/CXL/storage) is in enabled on these NICs since pretty much the first >> RFC. > >You only started asking for this more limited approach in v4. All your >previous arguments were that fwctl should be entirely killed for any >networking HW. > >Jason >
On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > I never agreed to that formulation. I suggested that perhaps runtime > configurations where netdev is the only driver using the HW could be > disabled (ie a netdev exclusion, not a rdma inclusion). I thought you were arguing that me opposing the addition was "maintainer overreach". As in me telling other parts of the kernel what is and isn't allowed. Do I not get a say what gets merged under drivers/net/ now? > However, I would agree fwctl should not accept any fwctl drivers for > simple networking devices. However, "smart nics" and RDMA capable > devices are in-scope. Please stop with the fake technical arguments. I worked on SmartNICs for a decade.
On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > > I never agreed to that formulation. I suggested that perhaps runtime > > configurations where netdev is the only driver using the HW could be > > disabled (ie a netdev exclusion, not a rdma inclusion). > > I thought you were arguing that me opposing the addition was > "maintainer overreach". As in me telling other parts of the kernel > what is and isn't allowed. Do I not get a say what gets merged under > drivers/net/ now? The PCI core drivers are a shared resource jointly maintained by all the subsytems that use them. They are maintained by their respective maintainers. Saeed/etc in this case. It would be inappropriate for your preferences to supersede Saeed's when he is a maintainer of the mlx5_core driver and fwctl. Please try and get Saeed on board with your plan. If the placement under drivers/net makes this confusing then we can certainly change the directory names. Jason
On Wed, Mar 05, 2025 at 09:32:54AM -0400, Jason Gunthorpe wrote: > On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > > On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > > > I never agreed to that formulation. I suggested that perhaps runtime > > > configurations where netdev is the only driver using the HW could be > > > disabled (ie a netdev exclusion, not a rdma inclusion). > > > > I thought you were arguing that me opposing the addition was > > "maintainer overreach". As in me telling other parts of the kernel > > what is and isn't allowed. Do I not get a say what gets merged under > > drivers/net/ now? > > The PCI core drivers are a shared resource jointly maintained by all > the subsytems that use them. They are maintained by their respective > maintainers. Saeed/etc in this case. > > It would be inappropriate for your preferences to supersede Saeed's > when he is a maintainer of the mlx5_core driver and fwctl. Please try > and get Saeed on board with your plan. > > If the placement under drivers/net makes this confusing then we can > certainly change the directory names. I suggested it before to Jakub and it looks like he is not opposing to it. https://lore.kernel.org/all/20250211075553.GF17863@unreal/ Thanks > > Jason >
Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: >> > I never agreed to that formulation. I suggested that perhaps runtime >> > configurations where netdev is the only driver using the HW could be >> > disabled (ie a netdev exclusion, not a rdma inclusion). >> >> I thought you were arguing that me opposing the addition was >> "maintainer overreach". As in me telling other parts of the kernel >> what is and isn't allowed. Do I not get a say what gets merged under >> drivers/net/ now? > >The PCI core drivers are a shared resource jointly maintained by all >the subsytems that use them. They are maintained by their respective >maintainers. Saeed/etc in this case. > >It would be inappropriate for your preferences to supersede Saeed's >when he is a maintainer of the mlx5_core driver and fwctl. Please try >and get Saeed on board with your plan. > >If the placement under drivers/net makes this confusing then we can >certainly change the directory names. According to how mlx5 driver is structured, and the rest of the advanced drivers in the same area are becoming as well, it would make sense to me to have mlx5 core in separate core directory, maintained directly by driver maintainer: drivers/core/mlx5/ then each of the protocol auxiliary device lands in appropriate subsystem directory. It's not always simple to find clear cut though. Like for example devlink and representors code related to eswitch orchestration. The benefit would be that lots of core-related non-netdev patch-trafic would go directly, which would ease the netdev maintainership burden and would make things more flexible. This per-driver-serialization for patchset processing bottleneck would become much more bareable.
On Wed, Mar 05, 2025 at 04:08:19PM +0100, Jiri Pirko wrote: > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: > >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: > >> > I never agreed to that formulation. I suggested that perhaps runtime > >> > configurations where netdev is the only driver using the HW could be > >> > disabled (ie a netdev exclusion, not a rdma inclusion). > >> > >> I thought you were arguing that me opposing the addition was > >> "maintainer overreach". As in me telling other parts of the kernel > >> what is and isn't allowed. Do I not get a say what gets merged under > >> drivers/net/ now? > > > >The PCI core drivers are a shared resource jointly maintained by all > >the subsytems that use them. They are maintained by their respective > >maintainers. Saeed/etc in this case. > > > >It would be inappropriate for your preferences to supersede Saeed's > >when he is a maintainer of the mlx5_core driver and fwctl. Please try > >and get Saeed on board with your plan. > > > >If the placement under drivers/net makes this confusing then we can > >certainly change the directory names. > > According to how mlx5 driver is structured, and the rest of the advanced > drivers in the same area are becoming as well, it would make sense to me > to have mlx5 core in separate core directory, maintained directly by driver > maintainer: > drivers/core/mlx5/ > then each of the protocol auxiliary device lands in appropriate > subsystem directory. In my vision, the write access to that drivers/core/ will be given to all relevant subsystem maintainers, so it will operate like shared branch, but foe everyone. It means that series for netdev that changes mlx5_core and netdev code will be sent to netdev and applied by netdev maintainers. In similar way, series which targets RDMA will be handled by RDMA crew. It will allow us to make sure that every piece of code in shared repository is actually used. Thanks
Wed, Mar 05, 2025 at 04:22:46PM +0100, leon@kernel.org wrote: >On Wed, Mar 05, 2025 at 04:08:19PM +0100, Jiri Pirko wrote: >> Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >> >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >> >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote: >> >> > I never agreed to that formulation. I suggested that perhaps runtime >> >> > configurations where netdev is the only driver using the HW could be >> >> > disabled (ie a netdev exclusion, not a rdma inclusion). >> >> >> >> I thought you were arguing that me opposing the addition was >> >> "maintainer overreach". As in me telling other parts of the kernel >> >> what is and isn't allowed. Do I not get a say what gets merged under >> >> drivers/net/ now? >> > >> >The PCI core drivers are a shared resource jointly maintained by all >> >the subsytems that use them. They are maintained by their respective >> >maintainers. Saeed/etc in this case. >> > >> >It would be inappropriate for your preferences to supersede Saeed's >> >when he is a maintainer of the mlx5_core driver and fwctl. Please try >> >and get Saeed on board with your plan. >> > >> >If the placement under drivers/net makes this confusing then we can >> >certainly change the directory names. >> >> According to how mlx5 driver is structured, and the rest of the advanced >> drivers in the same area are becoming as well, it would make sense to me >> to have mlx5 core in separate core directory, maintained directly by driver >> maintainer: >> drivers/core/mlx5/ >> then each of the protocol auxiliary device lands in appropriate >> subsystem directory. > >In my vision, the write access to that drivers/core/ will be given to all >relevant subsystem maintainers, so it will operate like shared branch, but >foe everyone. > >It means that series for netdev that changes mlx5_core and netdev code >will be sent to netdev and applied by netdev maintainers. In similar >way, series which targets RDMA will be handled by RDMA crew. > >It will allow us to make sure that every piece of code in shared >repository is actually used. Makes perfect sense to me.
On 3/5/25 8:08 AM, Jiri Pirko wrote: > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >>> I thought you were arguing that me opposing the addition was >>> "maintainer overreach". As in me telling other parts of the kernel >>> what is and isn't allowed. Do I not get a say what gets merged under >>> drivers/net/ now? >> >> The PCI core drivers are a shared resource jointly maintained by all >> the subsytems that use them. They are maintained by their respective >> maintainers. Saeed/etc in this case. >> >> It would be inappropriate for your preferences to supersede Saeed's >> when he is a maintainer of the mlx5_core driver and fwctl. Please try >> and get Saeed on board with your plan. >> >> If the placement under drivers/net makes this confusing then we can >> certainly change the directory names. > > According to how mlx5 driver is structured, and the rest of the advanced > drivers in the same area are becoming as well, it would make sense to me > to have mlx5 core in separate core directory, maintained directly by driver > maintainer: > drivers/core/mlx5/ > then each of the protocol auxiliary device lands in appropriate > subsystem directory. +1 This is how I have structured our drivers -- core driver for owning the PCI device and hosting the APIs to communicate with hardware, an aux bus and then smaller subsystem focused drivers for the aux devices that make the device usable from different contexts. I think we are ready to start upstreaming, but I am waiting to see how this falls out - to see if our core driver can land in a non-subsystem specific location (e.g., drivers/core) or if it needs to go with fwctl as a generic location.
On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote: > On 3/5/25 8:08 AM, Jiri Pirko wrote: > > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: > >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: > >>> I thought you were arguing that me opposing the addition was > >>> "maintainer overreach". As in me telling other parts of the kernel > >>> what is and isn't allowed. Do I not get a say what gets merged under > >>> drivers/net/ now? > >> > >> The PCI core drivers are a shared resource jointly maintained by all > >> the subsytems that use them. They are maintained by their respective > >> maintainers. Saeed/etc in this case. > >> > >> It would be inappropriate for your preferences to supersede Saeed's > >> when he is a maintainer of the mlx5_core driver and fwctl. Please try > >> and get Saeed on board with your plan. > >> > >> If the placement under drivers/net makes this confusing then we can > >> certainly change the directory names. > > > > According to how mlx5 driver is structured, and the rest of the advanced > > drivers in the same area are becoming as well, it would make sense to me > > to have mlx5 core in separate core directory, maintained directly by driver > > maintainer: > > drivers/core/mlx5/ > > then each of the protocol auxiliary device lands in appropriate > > subsystem directory. > > +1 > > This is how I have structured our drivers -- core driver for owning the > PCI device and hosting the APIs to communicate with hardware, an aux bus > and then smaller subsystem focused drivers for the aux devices that make > the device usable from different contexts. > > I think we are ready to start upstreaming, but I am waiting to see how > this falls out - to see if our core driver can land in a non-subsystem > specific location (e.g., drivers/core) or if it needs to go with fwctl > as a generic location. Do it right, and push it to drivers/core. I'm aware of at least one driver from huge company (not Nvidia) which is in preparation phase before upstreaming, and will fit nicely into this model. They have separated blocks for PCI, eth, RDMA and GPU. Thanks > > > >
On 05 Mar 20:28, Leon Romanovsky wrote: >On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote: >> On 3/5/25 8:08 AM, Jiri Pirko wrote: >> > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote: >> >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote: >> >>> I thought you were arguing that me opposing the addition was >> >>> "maintainer overreach". As in me telling other parts of the kernel >> >>> what is and isn't allowed. Do I not get a say what gets merged under >> >>> drivers/net/ now? >> >> >> >> The PCI core drivers are a shared resource jointly maintained by all >> >> the subsytems that use them. They are maintained by their respective >> >> maintainers. Saeed/etc in this case. >> >> >> >> It would be inappropriate for your preferences to supersede Saeed's >> >> when he is a maintainer of the mlx5_core driver and fwctl. Please try >> >> and get Saeed on board with your plan. >> >> >> >> If the placement under drivers/net makes this confusing then we can >> >> certainly change the directory names. >> > >> > According to how mlx5 driver is structured, and the rest of the advanced >> > drivers in the same area are becoming as well, it would make sense to me >> > to have mlx5 core in separate core directory, maintained directly by driver >> > maintainer: >> > drivers/core/mlx5/ >> > then each of the protocol auxiliary device lands in appropriate >> > subsystem directory. >> >> +1 >> >> This is how I have structured our drivers -- core driver for owning the >> PCI device and hosting the APIs to communicate with hardware, an aux bus >> and then smaller subsystem focused drivers for the aux devices that make >> the device usable from different contexts. >> >> I think we are ready to start upstreaming, but I am waiting to see how >> this falls out - to see if our core driver can land in a non-subsystem >> specific location (e.g., drivers/core) or if it needs to go with fwctl >> as a generic location. > >Do it right, and push it to drivers/core. I'm aware of at least one >driver from huge company (not Nvidia) which is in preparation phase >before upstreaming, and will fit nicely into this model. > How do you imagine this driver/core structure should look like? Who will be the top dir maintainer? It should be something that is tightly coupled with aux, currently aux is under drivers/base/auxiliary.c I think it should move to drivers/aux/auxiliary.c and device drivers should implement their own aux buses, WH access APIs and probing/init logic under that directory e.g: drivers/aux/mlx5/..
On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > How do you imagine this driver/core structure should look like? Who > will be the top dir maintainer? I would set something like this up more like DRM. Every driver maintainer gets commit rights, some rules about no uAPIs, or at least other acks before merging uAPI. Use the tree for staging shared branches. Driver maintainers with the most commits per cycle does the PR or something like that. There is no subsystem or cross-driver entanglement so there is no real need for gatekeeping. It would be a good opportunity to help more people engage with the kernel process and learn the full maintainer flow. > It should be something that is tightly coupled with aux, currently > aux is under drivers/base/auxiliary.c I think it should move to > drivers/aux/auxiliary.c and device drivers should implement their > own aux buses, WH access APIs and probing/init logic under that > directory e.g: drivers/aux/mlx5/.. That makes sense to me. I would expect everything in this collection to be PCI drivers spawing aux devices. drivers/aux_core/ or something like that, perhaps? Jason
On Wed, 5 Mar 2025 09:32:54 -0400 Jason Gunthorpe wrote: > > I thought you were arguing that me opposing the addition was > > "maintainer overreach". As in me telling other parts of the kernel > > what is and isn't allowed. Do I not get a say what gets merged under > > drivers/net/ now? > > The PCI core drivers are a shared resource jointly maintained by all > the subsytems that use them. They are maintained by their respective > maintainers. Saeed/etc in this case. PCI core driver? You need it for RDMA. Whether you move the code around or spell it backwards, I don't care, as long as vast majority of the users of this **NIC**, who only use TCP/IP do not have fwctl access.
On Wed, Mar 05, 2025 at 07:21:54PM -0400, Jason Gunthorpe wrote: > On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > > > How do you imagine this driver/core structure should look like? Who > > will be the top dir maintainer? > > I would set something like this up more like DRM. Every driver > maintainer gets commit rights, some rules about no uAPIs, or at least > other acks before merging uAPI. Use the tree for staging shared > branches. > > Driver maintainers with the most commits per cycle does the PR or > something like that. > > There is no subsystem or cross-driver entanglement so there is no real > need for gatekeeping. Yes, it can be structured like you proposed too or/and combined with my idea https://lore.kernel.org/netdev/20250303150015.GA1926949@unreal/ The most important part is that it needs to be group of maintainers. > > It would be a good opportunity to help more people engage with the > kernel process and learn the full maintainer flow. > > > It should be something that is tightly coupled with aux, currently > > aux is under drivers/base/auxiliary.c I think it should move to > > drivers/aux/auxiliary.c and device drivers should implement their > > own aux buses, WH access APIs and probing/init logic under that > > directory e.g: drivers/aux/mlx5/.. > > That makes sense to me. I would expect everything in this collection > to be PCI drivers spawing aux devices. > > drivers/aux_core/ or something like that, perhaps? I like Saeed's proposal "drivers/aux/", it is more short and catchy. Thanks > > Jason >
On 3/6/25 12:21 AM, Jason Gunthorpe wrote: > On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > >> How do you imagine this driver/core structure should look like? Who >> will be the top dir maintainer? > > I would set something like this up more like DRM. Every driver > maintainer gets commit rights, some rules about no uAPIs, or at least > other acks before merging uAPI. Use the tree for staging shared > branches. why no uapi? Core driver can have knowledge of h/w resources across all use cases. For example, our core driver supports a generid netlink based dump (no set operations; get and dump only so maybe that should be the restriction?) of all objects regardless of how created -- netdev, ib, etc. -- and with much more detail. > > Driver maintainers with the most commits per cycle does the PR or > something like that. > > There is no subsystem or cross-driver entanglement so there is no real > need for gatekeeping. > > It would be a good opportunity to help more people engage with the > kernel process and learn the full maintainer flow. > >> It should be something that is tightly coupled with aux, currently >> aux is under drivers/base/auxiliary.c I think it should move to >> drivers/aux/auxiliary.c and device drivers should implement their >> own aux buses, WH access APIs and probing/init logic under that >> directory e.g: drivers/aux/mlx5/.. > > That makes sense to me. I would expect everything in this collection > to be PCI drivers spawing aux devices. > > drivers/aux_core/ or something like that, perhaps? > drivers/aux_core works for me; removes the 'pci' assumption and makes it clear the real attribute here is use of the aux bus with subsystem specific devices. I am still not clear on how such a branch will work - e.g. We will want multi-vendor review, not just merge the PR and go.
On Tue, Mar 11, 2025 at 12:23:19PM +0100, David Ahern wrote: > On 3/6/25 12:21 AM, Jason Gunthorpe wrote: > > On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > > > >> How do you imagine this driver/core structure should look like? Who > >> will be the top dir maintainer? > > > > I would set something like this up more like DRM. Every driver > > maintainer gets commit rights, some rules about no uAPIs, or at least > > other acks before merging uAPI. Use the tree for staging shared > > branches. > > why no uapi? Core driver can have knowledge of h/w resources across all > use cases. For example, our core driver supports a generid netlink based > dump (no set operations; get and dump only so maybe that should be the > restriction?) of all objects regardless of how created -- netdev, ib, > etc. -- and with much more detail. Because, we want to make sure that UAPI will be aligned with relevant subsystems without any way to bypass them. Thanks
On 3/11/2025 4:23 AM, David Ahern wrote: > On 3/6/25 12:21 AM, Jason Gunthorpe wrote: >> >>> It should be something that is tightly coupled with aux, currently >>> aux is under drivers/base/auxiliary.c I think it should move to >>> drivers/aux/auxiliary.c and device drivers should implement their >>> own aux buses, WH access APIs and probing/init logic under that >>> directory e.g: drivers/aux/mlx5/.. >> >> That makes sense to me. I would expect everything in this collection >> to be PCI drivers spawing aux devices. >> >> drivers/aux_core/ or something like that, perhaps? >> > > drivers/aux_core works for me; removes the 'pci' assumption and makes it > clear the real attribute here is use of the aux bus with subsystem > specific devices. I am still not clear on how such a branch will work - > e.g. We will want multi-vendor review, not just merge the PR and go. > At the risk of getting too far into the naming-choosing rat hole... note that even "aux_core" has the assumption that the thing is auxiliary_bus oriented, which I don't think is a hard truth. As an example, yes pds_core supports pds_vdpa through an auxiliary_device, but it also supports pds_vfio_pci with no aux dev involved. I could live with drivers/aux_core, but I'd prefer drivers/core. sln
On 3/11/25 2:59 PM, Leon Romanovsky wrote: > On Tue, Mar 11, 2025 at 12:23:19PM +0100, David Ahern wrote: >> On 3/6/25 12:21 AM, Jason Gunthorpe wrote: >>> On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: >>> >>>> How do you imagine this driver/core structure should look like? Who >>>> will be the top dir maintainer? >>> >>> I would set something like this up more like DRM. Every driver >>> maintainer gets commit rights, some rules about no uAPIs, or at least >>> other acks before merging uAPI. Use the tree for staging shared >>> branches. >> >> why no uapi? Core driver can have knowledge of h/w resources across all >> use cases. For example, our core driver supports a generid netlink based >> dump (no set operations; get and dump only so maybe that should be the >> restriction?) of all objects regardless of how created -- netdev, ib, >> etc. -- and with much more detail. > > Because, we want to make sure that UAPI will be aligned with relevant > subsystems without any way to bypass them. > > Thanks I hope there will be an open mind on get / dump style introspection apis here. Devices can work support and work within limited subsystem APIs and also allow the dumping of full essential and relevant contexts for a device. More specifically, I do not see netdev APIs ever recognizing RDMA concepts like domains and memory regions. For us, everything is relative to a domain and a region - e.g., whether a queue is created for a netdev device or an IB QP both use the same common internal APIs. I would prefer not to use fwctl for something so basic.
On 03/12, David Ahern wrote: > On 3/11/25 2:59 PM, Leon Romanovsky wrote: > > On Tue, Mar 11, 2025 at 12:23:19PM +0100, David Ahern wrote: > >> On 3/6/25 12:21 AM, Jason Gunthorpe wrote: > >>> On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote: > >>> > >>>> How do you imagine this driver/core structure should look like? Who > >>>> will be the top dir maintainer? > >>> > >>> I would set something like this up more like DRM. Every driver > >>> maintainer gets commit rights, some rules about no uAPIs, or at least > >>> other acks before merging uAPI. Use the tree for staging shared > >>> branches. > >> > >> why no uapi? Core driver can have knowledge of h/w resources across all > >> use cases. For example, our core driver supports a generid netlink based > >> dump (no set operations; get and dump only so maybe that should be the > >> restriction?) of all objects regardless of how created -- netdev, ib, > >> etc. -- and with much more detail. > > > > Because, we want to make sure that UAPI will be aligned with relevant > > subsystems without any way to bypass them. > > > > Thanks > > I hope there will be an open mind on get / dump style introspection apis > here. Devices can work support and work within limited subsystem APIs > and also allow the dumping of full essential and relevant contexts for a > device. [..] > More specifically, I do not see netdev APIs ever recognizing RDMA > concepts like domains and memory regions. For us, everything is relative > to a domain and a region - e.g., whether a queue is created for a netdev > device or an IB QP both use the same common internal APIs. I would > prefer not to use fwctl for something so basic. What specifically do you mean here by 'memory regions'? Netdev does recognize (as of recently, devmem) the concept of memory regions in the form of dmabufs.
[ We now have the required three drivers on the list so things are looking probable for reaching this merge window. I will work out some shared branches with CXL and get it into linux-next once all three drivers can be assembled and reviews seem to be concluding. There are couple open notes - Greg was interested in a new name, Dan offered auxctl ] fwctl is a new subsystem intended to bring some common rules and order to the growing pattern of exposing a secure FW interface directly to userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are exposing a device for datapath operations fwctl is focused on debugging, configuration and provisioning of the device. It will not have the necessary features like interrupt delivery to support a datapath. This concept is similar to the long standing practice in the "HW" RAID space of having a device specific misc device to manage the RAID controller FW. fwctl generalizes this notion of a companion debug and management interface that goes along with a dataplane implemented in an appropriate subsystem. The need for this has reached a critical point as many users are moving to run lockdown enabled kernels. Several existing devices have had long standing tooling for management that relied on /sys/../resource0 or PCI config space access which is not permitted in lockdown. A major point of fwctl is to define and document the rules that a device must follow to expose a lockdown compatible RPC. Based on some discussion fwctl splits the RPCs into four categories FWCTL_RPC_CONFIGURATION FWCTL_RPC_DEBUG_READ_ONLY FWCTL_RPC_DEBUG_WRITE FWCTL_RPC_DEBUG_WRITE_FULL Where the latter two trigger a new TAINT_FWCTL, and the final one requires CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW would be responsible to restrict RPCs to the requested security scope, while the core code handles the tainting and CAP checks. For details see the final patch which introduces the documentation. The CXL FWCTL driver is now in it own series on v7: https://lore.kernel.org/r/20250220194438.2281088-1-dave.jiang@intel.com And a driver for the Pensando DSC (A smart NIC): https://lore.kernel.org/r/20250211234854.52277-1-shannon.nelson@amd.com I've got soft commitments for about 7 drivers in total now. There have been three LWN articles written discussing various aspects of this proposal: https://lwn.net/Articles/955001/ https://lwn.net/Articles/969383/ https://lwn.net/Articles/990802/ A really giant ksummit thread preceding a discussion at the Maintainer Summit: https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/ Several have expressed general support for this concept: AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org NVIDIA Networking Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay. Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com Work is ongoing for userspace, currently the mellanox tool suite has been ported over: https://github.com/Mellanox/mstflint And a more simplified example how to use it: https://github.com/jgunthorpe/mlx5ctl.git This is on github: https://github.com/jgunthorpe/linux/commits/fwctl v5: - Move hunks between patches to make more sense - Rename ucmd_buffer to fwctl_ucmd_buffer - Update comments and commit messages - Copyright to 2025 - Drop bxnt WIP patches - Allow a NULL ops->info - Decode more op codes for mlx5 and the sub-operation for MLX5_CMD_OP_ACCESS_REG/_USER v4: https://patch.msgid.link/r/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com - Rebase to v6.14-rc1 - Fine tune comments and rst documentatin - Adjust cleanup.h usage - remove places that add more ofuscation than value - CXL is back to its own independent series - Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit - Fix mlx5ctl_validate_rpc() logic around scope checking - Disable mlx5ctl on SFs v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com - Rebase to v6.11-rc4 - Add a squashed version of David's CXL series as the 2nd driver - Add missing includes - Improve comments based on feedback - Use the kdoc format that puts the member docs inside the struct - Rewrite fwctl_alloc_device() to be clearer - Incorporate all remarks for the documentation v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com - Rebase to v6.10-rc5 - Minor style changes - Follow the style consensus for the guard stuff - Documentation grammer/spelling - Add missed length output for mlx5 get_info - Add two more missed MLX5 CMD's - Collect tags v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com> Cc: Aron Silverton <aron.silverton@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Ahern <dsahern@kernel.org> Cc: Itay Avraham <itayavr@nvidia.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: Jiri Pirko <jiri@nvidia.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Leonid Bloch <lbloch@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-cxl@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: "Nelson, Shannon" <shannon.nelson@amd.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Jason Gunthorpe (6): fwctl: Add basic structure for a class subsystem with a cdev fwctl: Basic ioctl dispatch for the character device fwctl: FWCTL_INFO to return basic information about the device taint: Add TAINT_FWCTL fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware fwctl: Add documentation Saeed Mahameed (2): fwctl/mlx5: Support for communicating with mlx5 fw mlx5: Create an auxiliary device for fwctl_mlx5 Documentation/admin-guide/tainted-kernels.rst | 5 + Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++ Documentation/userspace-api/fwctl/index.rst | 12 + Documentation/userspace-api/index.rst | 1 + .../userspace-api/ioctl/ioctl-number.rst | 1 + MAINTAINERS | 18 + drivers/Kconfig | 2 + drivers/Makefile | 1 + drivers/fwctl/Kconfig | 23 + drivers/fwctl/Makefile | 5 + drivers/fwctl/main.c | 421 ++++++++++++++++++ drivers/fwctl/mlx5/Makefile | 4 + drivers/fwctl/mlx5/main.c | 411 +++++++++++++++++ drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 + include/linux/fwctl.h | 135 ++++++ include/linux/panic.h | 3 +- include/uapi/fwctl/fwctl.h | 139 ++++++ include/uapi/fwctl/mlx5.h | 36 ++ kernel/panic.c | 1 + tools/debugging/kernel-chktaint | 8 + 20 files changed, 1519 insertions(+), 1 deletion(-) create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst create mode 100644 Documentation/userspace-api/fwctl/index.rst create mode 100644 drivers/fwctl/Kconfig create mode 100644 drivers/fwctl/Makefile create mode 100644 drivers/fwctl/main.c create mode 100644 drivers/fwctl/mlx5/Makefile create mode 100644 drivers/fwctl/mlx5/main.c create mode 100644 include/linux/fwctl.h create mode 100644 include/uapi/fwctl/fwctl.h create mode 100644 include/uapi/fwctl/mlx5.h base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b