mbox series

[v5,0/8] Introduce fwctl subystem

Message ID 0-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com (mailing list archive)
Headers show
Series Introduce fwctl subystem | expand

Message

Jason Gunthorpe Feb. 28, 2025, 12:26 a.m. UTC
[
We now have the required three drivers on the list so things are
looking probable for reaching this merge window. I will work out some
shared branches with CXL and get it into linux-next once all three drivers
can be assembled and reviews seem to be concluding.

There are couple open notes
 - Greg was interested in a new name, Dan offered auxctl
]

fwctl is a new subsystem intended to bring some common rules and order to
the growing pattern of exposing a secure FW interface directly to
userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
exposing a device for datapath operations fwctl is focused on debugging,
configuration and provisioning of the device. It will not have the
necessary features like interrupt delivery to support a datapath.

This concept is similar to the long standing practice in the "HW" RAID
space of having a device specific misc device to manage the RAID
controller FW. fwctl generalizes this notion of a companion debug and
management interface that goes along with a dataplane implemented in an
appropriate subsystem.

The need for this has reached a critical point as many users are moving to
run lockdown enabled kernels. Several existing devices have had long
standing tooling for management that relied on /sys/../resource0 or PCI
config space access which is not permitted in lockdown. A major point of
fwctl is to define and document the rules that a device must follow to
expose a lockdown compatible RPC.

Based on some discussion fwctl splits the RPCs into four categories

	FWCTL_RPC_CONFIGURATION
	FWCTL_RPC_DEBUG_READ_ONLY
	FWCTL_RPC_DEBUG_WRITE
	FWCTL_RPC_DEBUG_WRITE_FULL

Where the latter two trigger a new TAINT_FWCTL, and the final one requires
CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
would be responsible to restrict RPCs to the requested security scope,
while the core code handles the tainting and CAP checks.

For details see the final patch which introduces the documentation.

The CXL FWCTL driver is now in it own series on v7:
 https://lore.kernel.org/r/20250220194438.2281088-1-dave.jiang@intel.com
And a driver for the Pensando DSC (A smart NIC):
 https://lore.kernel.org/r/20250211234854.52277-1-shannon.nelson@amd.com

I've got soft commitments for about 7 drivers in total now.

There have been three LWN articles written discussing various aspects of
this proposal:

 https://lwn.net/Articles/955001/
 https://lwn.net/Articles/969383/
 https://lwn.net/Articles/990802/

A really giant ksummit thread preceding a discussion at the Maintainer
Summit:

 https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/

Several have expressed general support for this concept:

 AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com
 Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
 Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org
 Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
 Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org
 NVIDIA Networking
 Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
 Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
 SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com

Work is ongoing for userspace, currently the mellanox tool suite has been
ported over:
  https://github.com/Mellanox/mstflint

And a more simplified example how to use it:
  https://github.com/jgunthorpe/mlx5ctl.git

This is on github: https://github.com/jgunthorpe/linux/commits/fwctl

v5:
 - Move hunks between patches to make more sense
 - Rename ucmd_buffer to fwctl_ucmd_buffer
 - Update comments and commit messages
 - Copyright to 2025
 - Drop bxnt WIP patches
 - Allow a NULL ops->info
 - Decode more op codes for mlx5 and the sub-operation for
   MLX5_CMD_OP_ACCESS_REG/_USER
v4: https://patch.msgid.link/r/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com
 - Rebase to v6.14-rc1
 - Fine tune comments and rst documentatin
 - Adjust cleanup.h usage - remove places that add more ofuscation than
   value
 - CXL is back to its own independent series
 - Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit
 - Fix mlx5ctl_validate_rpc() logic around scope checking
 - Disable mlx5ctl on SFs
v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com
 - Rebase to v6.11-rc4
 - Add a squashed version of David's CXL series as the 2nd driver
 - Add missing includes
 - Improve comments based on feedback
 - Use the kdoc format that puts the member docs inside the struct
 - Rewrite fwctl_alloc_device() to be clearer
 - Incorporate all remarks for the documentation
v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
 - Rebase to v6.10-rc5
 - Minor style changes
 - Follow the style consensus for the guard stuff
 - Documentation grammer/spelling
 - Add missed length output for mlx5 get_info
 - Add two more missed MLX5 CMD's
 - Collect tags
v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com

Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Cc: Aron Silverton <aron.silverton@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Itay Avraham <itayavr@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Leonid Bloch <lbloch@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: "Nelson, Shannon" <shannon.nelson@amd.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (6):
  fwctl: Add basic structure for a class subsystem with a cdev
  fwctl: Basic ioctl dispatch for the character device
  fwctl: FWCTL_INFO to return basic information about the device
  taint: Add TAINT_FWCTL
  fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  fwctl: Add documentation

Saeed Mahameed (2):
  fwctl/mlx5: Support for communicating with mlx5 fw
  mlx5: Create an auxiliary device for fwctl_mlx5

 Documentation/admin-guide/tainted-kernels.rst |   5 +
 Documentation/userspace-api/fwctl/fwctl.rst   | 285 ++++++++++++
 Documentation/userspace-api/fwctl/index.rst   |  12 +
 Documentation/userspace-api/index.rst         |   1 +
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 MAINTAINERS                                   |  18 +
 drivers/Kconfig                               |   2 +
 drivers/Makefile                              |   1 +
 drivers/fwctl/Kconfig                         |  23 +
 drivers/fwctl/Makefile                        |   5 +
 drivers/fwctl/main.c                          | 421 ++++++++++++++++++
 drivers/fwctl/mlx5/Makefile                   |   4 +
 drivers/fwctl/mlx5/main.c                     | 411 +++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |   9 +
 include/linux/fwctl.h                         | 135 ++++++
 include/linux/panic.h                         |   3 +-
 include/uapi/fwctl/fwctl.h                    | 139 ++++++
 include/uapi/fwctl/mlx5.h                     |  36 ++
 kernel/panic.c                                |   1 +
 tools/debugging/kernel-chktaint               |   8 +
 20 files changed, 1519 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
 create mode 100644 Documentation/userspace-api/fwctl/index.rst
 create mode 100644 drivers/fwctl/Kconfig
 create mode 100644 drivers/fwctl/Makefile
 create mode 100644 drivers/fwctl/main.c
 create mode 100644 drivers/fwctl/mlx5/Makefile
 create mode 100644 drivers/fwctl/mlx5/main.c
 create mode 100644 include/linux/fwctl.h
 create mode 100644 include/uapi/fwctl/fwctl.h
 create mode 100644 include/uapi/fwctl/mlx5.h


base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b

Comments

Jakub Kicinski March 4, 2025, 1:53 a.m. UTC | #1
On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote:
> v5:
>  - Move hunks between patches to make more sense
>  - Rename ucmd_buffer to fwctl_ucmd_buffer
>  - Update comments and commit messages
>  - Copyright to 2025
>  - Drop bxnt WIP patches
>  - Allow a NULL ops->info
>  - Decode more op codes for mlx5 and the sub-operation for
>    MLX5_CMD_OP_ACCESS_REG/_USER

Did you address my feedback? I asked for the mlx5 support to only be
enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this
patchset clearly admitted on v4:

  As explained above, netdev doesn't need it

https://lore.kernel.org/all/Z6ZsOMLq7tt3ijX_@x130/

Greg, I've been asking for this interface to be scoped to when RDMA
(/CXL/storage) is in enabled on these NICs since pretty much the first
RFC. Do I need to keep responding?
Jason Gunthorpe March 4, 2025, 2 p.m. UTC | #2
On Mon, Mar 03, 2025 at 05:53:58PM -0800, Jakub Kicinski wrote:
> On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote:
> > v5:
> >  - Move hunks between patches to make more sense
> >  - Rename ucmd_buffer to fwctl_ucmd_buffer
> >  - Update comments and commit messages
> >  - Copyright to 2025
> >  - Drop bxnt WIP patches
> >  - Allow a NULL ops->info
> >  - Decode more op codes for mlx5 and the sub-operation for
> >    MLX5_CMD_OP_ACCESS_REG/_USER
> 
> Did you address my feedback? I asked for the mlx5 support to only be
> enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this
> patchset clearly admitted on v4:

I never agreed to that formulation. I suggested that perhaps runtime
configurations where netdev is the only driver using the HW could be
disabled (ie a netdev exclusion, not a rdma inclusion).

However, there is not agreement on this from Saeed who is responsible
for mlx5:

 https://lore.kernel.org/all/Z7z0ADkimCkhr7Xz@x130/

I also surveyed other stakeholders on a netdev-exclusion proposal and
did not hear support. You need to convince people this is a good idea.

However, I would agree fwctl should not accept any fwctl drivers for
simple networking devices. However, "smart nics" and RDMA capable
devices are in-scope.

I could also probably agree to using kconfig to disable fwctl drivers
on kernels that statically compile out rdma, vdpa, nvme and related,
though I agree with Saeed that it seems to lack technical merit.

> Greg, I've been asking for this interface to be scoped to when RDMA
> (/CXL/storage) is in enabled on these NICs since pretty much the first
> RFC. 

You only started asking for this more limited approach in v4. All your
previous arguments were that fwctl should be entirely killed for any
networking HW.

Jason
Saeed Mahameed March 4, 2025, 5:59 p.m. UTC | #3
On 04 Mar 10:00, Jason Gunthorpe wrote:
>On Mon, Mar 03, 2025 at 05:53:58PM -0800, Jakub Kicinski wrote:
>> On Thu, 27 Feb 2025 20:26:28 -0400 Jason Gunthorpe wrote:
>> > v5:
>> >  - Move hunks between patches to make more sense
>> >  - Rename ucmd_buffer to fwctl_ucmd_buffer
>> >  - Update comments and commit messages
>> >  - Copyright to 2025
>> >  - Drop bxnt WIP patches
>> >  - Allow a NULL ops->info
>> >  - Decode more op codes for mlx5 and the sub-operation for
>> >    MLX5_CMD_OP_ACCESS_REG/_USER
>>
>> Did you address my feedback? I asked for the mlx5 support to only be
>> enabled in RDMA is in use. Saeed who wrote the mlx5 parts of this
>> patchset clearly admitted on v4:

When I said fwctl is not needed for netdev, I meant that it will not be used
for netdev object configuration and as I said before FW will block that
anyways. fwctl in mlx5 is not only for RDMA, So I don't know how to address
your comment. 

Not to mention that fwctl is a very great tool to debug netdev problems.

>
>I never agreed to that formulation. I suggested that perhaps runtime
>configurations where netdev is the only driver using the HW could be
>disabled (ie a netdev exclusion, not a rdma inclusion).
>
>However, there is not agreement on this from Saeed who is responsible
>for mlx5:
>
> https://lore.kernel.org/all/Z7z0ADkimCkhr7Xz@x130/
>
>I also surveyed other stakeholders on a netdev-exclusion proposal and
>did not hear support. You need to convince people this is a good idea.
>
>However, I would agree fwctl should not accept any fwctl drivers for
>simple networking devices. However, "smart nics" and RDMA capable
>devices are in-scope.
>

>I could also probably agree to using kconfig to disable fwctl drivers
>on kernels that statically compile out rdma, vdpa, nvme and related,
>though I agree with Saeed that it seems to lack technical merit.
>
>> Greg, I've been asking for this interface to be scoped to when RDMA
>> (/CXL/storage) is in enabled on these NICs since pretty much the first
>> RFC.
>
>You only started asking for this more limited approach in v4. All your
>previous arguments were that fwctl should be entirely killed for any
>networking HW.
>
>Jason
>
Jakub Kicinski March 5, 2025, 12:42 a.m. UTC | #4
On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote:
> I never agreed to that formulation. I suggested that perhaps runtime
> configurations where netdev is the only driver using the HW could be
> disabled (ie a netdev exclusion, not a rdma inclusion).

I thought you were arguing that me opposing the addition was
"maintainer overreach". As in me telling other parts of the kernel
what is and isn't allowed. Do I not get a say what gets merged under
drivers/net/ now?

> However, I would agree fwctl should not accept any fwctl drivers for
> simple networking devices. However, "smart nics" and RDMA capable
> devices are in-scope.

Please stop with the fake technical arguments.
I worked on SmartNICs for a decade.
Jason Gunthorpe March 5, 2025, 1:32 p.m. UTC | #5
On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote:
> > I never agreed to that formulation. I suggested that perhaps runtime
> > configurations where netdev is the only driver using the HW could be
> > disabled (ie a netdev exclusion, not a rdma inclusion).
> 
> I thought you were arguing that me opposing the addition was
> "maintainer overreach". As in me telling other parts of the kernel
> what is and isn't allowed. Do I not get a say what gets merged under
> drivers/net/ now?

The PCI core drivers are a shared resource jointly maintained by all
the subsytems that use them. They are maintained by their respective
maintainers. Saeed/etc in this case.

It would be inappropriate for your preferences to supersede Saeed's
when he is a maintainer of the mlx5_core driver and fwctl. Please try
and get Saeed on board with your plan.

If the placement under drivers/net makes this confusing then we can
certainly change the directory names.

Jason
Leon Romanovsky March 5, 2025, 1:43 p.m. UTC | #6
On Wed, Mar 05, 2025 at 09:32:54AM -0400, Jason Gunthorpe wrote:
> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
> > On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote:
> > > I never agreed to that formulation. I suggested that perhaps runtime
> > > configurations where netdev is the only driver using the HW could be
> > > disabled (ie a netdev exclusion, not a rdma inclusion).
> > 
> > I thought you were arguing that me opposing the addition was
> > "maintainer overreach". As in me telling other parts of the kernel
> > what is and isn't allowed. Do I not get a say what gets merged under
> > drivers/net/ now?
> 
> The PCI core drivers are a shared resource jointly maintained by all
> the subsytems that use them. They are maintained by their respective
> maintainers. Saeed/etc in this case.
> 
> It would be inappropriate for your preferences to supersede Saeed's
> when he is a maintainer of the mlx5_core driver and fwctl. Please try
> and get Saeed on board with your plan.
> 
> If the placement under drivers/net makes this confusing then we can
> certainly change the directory names.

I suggested it before to Jakub and it looks like he is not opposing to it.
https://lore.kernel.org/all/20250211075553.GF17863@unreal/

Thanks

> 
> Jason
>
Jiri Pirko March 5, 2025, 3:08 p.m. UTC | #7
Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote:
>On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
>> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote:
>> > I never agreed to that formulation. I suggested that perhaps runtime
>> > configurations where netdev is the only driver using the HW could be
>> > disabled (ie a netdev exclusion, not a rdma inclusion).
>> 
>> I thought you were arguing that me opposing the addition was
>> "maintainer overreach". As in me telling other parts of the kernel
>> what is and isn't allowed. Do I not get a say what gets merged under
>> drivers/net/ now?
>
>The PCI core drivers are a shared resource jointly maintained by all
>the subsytems that use them. They are maintained by their respective
>maintainers. Saeed/etc in this case.
>
>It would be inappropriate for your preferences to supersede Saeed's
>when he is a maintainer of the mlx5_core driver and fwctl. Please try
>and get Saeed on board with your plan.
>
>If the placement under drivers/net makes this confusing then we can
>certainly change the directory names.

According to how mlx5 driver is structured, and the rest of the advanced
drivers in the same area are becoming as well, it would make sense to me
to have mlx5 core in separate core directory, maintained directly by driver
maintainer:
drivers/core/mlx5/
then each of the protocol auxiliary device lands in appropriate
subsystem directory.
It's not always simple to find clear cut though. Like for example
devlink and representors code related to eswitch orchestration.

The benefit would be that lots of core-related non-netdev patch-trafic
would go directly, which would ease the netdev maintainership burden and
would make things more flexible. This per-driver-serialization for
patchset processing bottleneck would become much more bareable.
Leon Romanovsky March 5, 2025, 3:22 p.m. UTC | #8
On Wed, Mar 05, 2025 at 04:08:19PM +0100, Jiri Pirko wrote:
> Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote:
> >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
> >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote:
> >> > I never agreed to that formulation. I suggested that perhaps runtime
> >> > configurations where netdev is the only driver using the HW could be
> >> > disabled (ie a netdev exclusion, not a rdma inclusion).
> >> 
> >> I thought you were arguing that me opposing the addition was
> >> "maintainer overreach". As in me telling other parts of the kernel
> >> what is and isn't allowed. Do I not get a say what gets merged under
> >> drivers/net/ now?
> >
> >The PCI core drivers are a shared resource jointly maintained by all
> >the subsytems that use them. They are maintained by their respective
> >maintainers. Saeed/etc in this case.
> >
> >It would be inappropriate for your preferences to supersede Saeed's
> >when he is a maintainer of the mlx5_core driver and fwctl. Please try
> >and get Saeed on board with your plan.
> >
> >If the placement under drivers/net makes this confusing then we can
> >certainly change the directory names.
> 
> According to how mlx5 driver is structured, and the rest of the advanced
> drivers in the same area are becoming as well, it would make sense to me
> to have mlx5 core in separate core directory, maintained directly by driver
> maintainer:
> drivers/core/mlx5/
> then each of the protocol auxiliary device lands in appropriate
> subsystem directory.

In my vision, the write access to that drivers/core/ will be given to all
relevant subsystem maintainers, so it will operate like shared branch, but
foe everyone.

It means that series for netdev that changes mlx5_core and netdev code
will be sent to netdev and applied by netdev maintainers. In similar
way, series which targets RDMA will be handled by RDMA crew.

It will allow us to make sure that every piece of code in shared
repository is actually used.

Thanks
Jiri Pirko March 5, 2025, 3:56 p.m. UTC | #9
Wed, Mar 05, 2025 at 04:22:46PM +0100, leon@kernel.org wrote:
>On Wed, Mar 05, 2025 at 04:08:19PM +0100, Jiri Pirko wrote:
>> Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote:
>> >On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
>> >> On Tue, 4 Mar 2025 10:00:36 -0400 Jason Gunthorpe wrote:
>> >> > I never agreed to that formulation. I suggested that perhaps runtime
>> >> > configurations where netdev is the only driver using the HW could be
>> >> > disabled (ie a netdev exclusion, not a rdma inclusion).
>> >> 
>> >> I thought you were arguing that me opposing the addition was
>> >> "maintainer overreach". As in me telling other parts of the kernel
>> >> what is and isn't allowed. Do I not get a say what gets merged under
>> >> drivers/net/ now?
>> >
>> >The PCI core drivers are a shared resource jointly maintained by all
>> >the subsytems that use them. They are maintained by their respective
>> >maintainers. Saeed/etc in this case.
>> >
>> >It would be inappropriate for your preferences to supersede Saeed's
>> >when he is a maintainer of the mlx5_core driver and fwctl. Please try
>> >and get Saeed on board with your plan.
>> >
>> >If the placement under drivers/net makes this confusing then we can
>> >certainly change the directory names.
>> 
>> According to how mlx5 driver is structured, and the rest of the advanced
>> drivers in the same area are becoming as well, it would make sense to me
>> to have mlx5 core in separate core directory, maintained directly by driver
>> maintainer:
>> drivers/core/mlx5/
>> then each of the protocol auxiliary device lands in appropriate
>> subsystem directory.
>
>In my vision, the write access to that drivers/core/ will be given to all
>relevant subsystem maintainers, so it will operate like shared branch, but
>foe everyone.
>
>It means that series for netdev that changes mlx5_core and netdev code
>will be sent to netdev and applied by netdev maintainers. In similar
>way, series which targets RDMA will be handled by RDMA crew.
>
>It will allow us to make sure that every piece of code in shared
>repository is actually used.

Makes perfect sense to me.
David Ahern March 5, 2025, 6:17 p.m. UTC | #10
On 3/5/25 8:08 AM, Jiri Pirko wrote:
> Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote:
>> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
>>> I thought you were arguing that me opposing the addition was
>>> "maintainer overreach". As in me telling other parts of the kernel
>>> what is and isn't allowed. Do I not get a say what gets merged under
>>> drivers/net/ now?
>>
>> The PCI core drivers are a shared resource jointly maintained by all
>> the subsytems that use them. They are maintained by their respective
>> maintainers. Saeed/etc in this case.
>>
>> It would be inappropriate for your preferences to supersede Saeed's
>> when he is a maintainer of the mlx5_core driver and fwctl. Please try
>> and get Saeed on board with your plan.
>>
>> If the placement under drivers/net makes this confusing then we can
>> certainly change the directory names.
> 
> According to how mlx5 driver is structured, and the rest of the advanced
> drivers in the same area are becoming as well, it would make sense to me
> to have mlx5 core in separate core directory, maintained directly by driver
> maintainer:
> drivers/core/mlx5/
> then each of the protocol auxiliary device lands in appropriate
> subsystem directory.

+1

This is how I have structured our drivers -- core driver for owning the
PCI device and hosting the APIs to communicate with hardware, an aux bus
and then smaller subsystem focused drivers for the aux devices that make
the device usable from different contexts.

I think we are ready to start upstreaming, but I am waiting to see how
this falls out - to see if our core driver can land in a non-subsystem
specific location (e.g., drivers/core) or if it needs to go with fwctl
as a generic location.
Leon Romanovsky March 5, 2025, 6:28 p.m. UTC | #11
On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote:
> On 3/5/25 8:08 AM, Jiri Pirko wrote:
> > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote:
> >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
> >>> I thought you were arguing that me opposing the addition was
> >>> "maintainer overreach". As in me telling other parts of the kernel
> >>> what is and isn't allowed. Do I not get a say what gets merged under
> >>> drivers/net/ now?
> >>
> >> The PCI core drivers are a shared resource jointly maintained by all
> >> the subsytems that use them. They are maintained by their respective
> >> maintainers. Saeed/etc in this case.
> >>
> >> It would be inappropriate for your preferences to supersede Saeed's
> >> when he is a maintainer of the mlx5_core driver and fwctl. Please try
> >> and get Saeed on board with your plan.
> >>
> >> If the placement under drivers/net makes this confusing then we can
> >> certainly change the directory names.
> > 
> > According to how mlx5 driver is structured, and the rest of the advanced
> > drivers in the same area are becoming as well, it would make sense to me
> > to have mlx5 core in separate core directory, maintained directly by driver
> > maintainer:
> > drivers/core/mlx5/
> > then each of the protocol auxiliary device lands in appropriate
> > subsystem directory.
> 
> +1
> 
> This is how I have structured our drivers -- core driver for owning the
> PCI device and hosting the APIs to communicate with hardware, an aux bus
> and then smaller subsystem focused drivers for the aux devices that make
> the device usable from different contexts.
> 
> I think we are ready to start upstreaming, but I am waiting to see how
> this falls out - to see if our core driver can land in a non-subsystem
> specific location (e.g., drivers/core) or if it needs to go with fwctl
> as a generic location.

Do it right, and push it to drivers/core. I'm aware of at least one
driver from huge company (not Nvidia) which is in preparation phase
before upstreaming, and will fit nicely into this model.

They have separated blocks for PCI, eth, RDMA and GPU.

Thanks

> 
> 
> 
>
Saeed Mahameed March 5, 2025, 8:41 p.m. UTC | #12
On 05 Mar 20:28, Leon Romanovsky wrote:
>On Wed, Mar 05, 2025 at 11:17:19AM -0700, David Ahern wrote:
>> On 3/5/25 8:08 AM, Jiri Pirko wrote:
>> > Wed, Mar 05, 2025 at 02:32:54PM +0100, jgg@nvidia.com wrote:
>> >> On Tue, Mar 04, 2025 at 04:42:03PM -0800, Jakub Kicinski wrote:
>> >>> I thought you were arguing that me opposing the addition was
>> >>> "maintainer overreach". As in me telling other parts of the kernel
>> >>> what is and isn't allowed. Do I not get a say what gets merged under
>> >>> drivers/net/ now?
>> >>
>> >> The PCI core drivers are a shared resource jointly maintained by all
>> >> the subsytems that use them. They are maintained by their respective
>> >> maintainers. Saeed/etc in this case.
>> >>
>> >> It would be inappropriate for your preferences to supersede Saeed's
>> >> when he is a maintainer of the mlx5_core driver and fwctl. Please try
>> >> and get Saeed on board with your plan.
>> >>
>> >> If the placement under drivers/net makes this confusing then we can
>> >> certainly change the directory names.
>> >
>> > According to how mlx5 driver is structured, and the rest of the advanced
>> > drivers in the same area are becoming as well, it would make sense to me
>> > to have mlx5 core in separate core directory, maintained directly by driver
>> > maintainer:
>> > drivers/core/mlx5/
>> > then each of the protocol auxiliary device lands in appropriate
>> > subsystem directory.
>>
>> +1
>>
>> This is how I have structured our drivers -- core driver for owning the
>> PCI device and hosting the APIs to communicate with hardware, an aux bus
>> and then smaller subsystem focused drivers for the aux devices that make
>> the device usable from different contexts.
>>
>> I think we are ready to start upstreaming, but I am waiting to see how
>> this falls out - to see if our core driver can land in a non-subsystem
>> specific location (e.g., drivers/core) or if it needs to go with fwctl
>> as a generic location.
>
>Do it right, and push it to drivers/core. I'm aware of at least one
>driver from huge company (not Nvidia) which is in preparation phase
>before upstreaming, and will fit nicely into this model.
>

How do you imagine this driver/core structure should look like? Who will be
the top dir maintainer? It should be something that is tightly coupled with
aux, currently aux is under drivers/base/auxiliary.c I think it should move 
to drivers/aux/auxiliary.c and device drivers should implement their own
aux buses, WH access APIs and probing/init logic under that directory
e.g: drivers/aux/mlx5/..
Jason Gunthorpe March 5, 2025, 11:21 p.m. UTC | #13
On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote:

> How do you imagine this driver/core structure should look like? Who
> will be the top dir maintainer?

I would set something like this up more like DRM. Every driver
maintainer gets commit rights, some rules about no uAPIs, or at least
other acks before merging uAPI. Use the tree for staging shared
branches.

Driver maintainers with the most commits per cycle does the PR or
something like that.

There is no subsystem or cross-driver entanglement so there is no real
need for gatekeeping.

It would be a good opportunity to help more people engage with the
kernel process and learn the full maintainer flow.

> It should be something that is tightly coupled with aux, currently
> aux is under drivers/base/auxiliary.c I think it should move to
> drivers/aux/auxiliary.c and device drivers should implement their
> own aux buses, WH access APIs and probing/init logic under that
> directory e.g: drivers/aux/mlx5/..

That makes sense to me. I would expect everything in this collection
to be PCI drivers spawing aux devices.

drivers/aux_core/ or something like that, perhaps?

Jason
Jakub Kicinski March 6, 2025, 2:16 a.m. UTC | #14
On Wed, 5 Mar 2025 09:32:54 -0400 Jason Gunthorpe wrote:
> > I thought you were arguing that me opposing the addition was
> > "maintainer overreach". As in me telling other parts of the kernel
> > what is and isn't allowed. Do I not get a say what gets merged under
> > drivers/net/ now?  
> 
> The PCI core drivers are a shared resource jointly maintained by all
> the subsytems that use them. They are maintained by their respective
> maintainers. Saeed/etc in this case.

PCI core driver? You need it for RDMA.

Whether you move the code around or spell it backwards, I don't care,
as long as vast majority of the users of this **NIC**, who only use
TCP/IP do not have fwctl access.
Leon Romanovsky March 6, 2025, 7:29 a.m. UTC | #15
On Wed, Mar 05, 2025 at 07:21:54PM -0400, Jason Gunthorpe wrote:
> On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote:
> 
> > How do you imagine this driver/core structure should look like? Who
> > will be the top dir maintainer?
> 
> I would set something like this up more like DRM. Every driver
> maintainer gets commit rights, some rules about no uAPIs, or at least
> other acks before merging uAPI. Use the tree for staging shared
> branches.
> 
> Driver maintainers with the most commits per cycle does the PR or
> something like that.
> 
> There is no subsystem or cross-driver entanglement so there is no real
> need for gatekeeping.

Yes, it can be structured like you proposed too or/and combined with my
idea https://lore.kernel.org/netdev/20250303150015.GA1926949@unreal/

The most important part is that it needs to be group of maintainers.

> 
> It would be a good opportunity to help more people engage with the
> kernel process and learn the full maintainer flow.
> 
> > It should be something that is tightly coupled with aux, currently
> > aux is under drivers/base/auxiliary.c I think it should move to
> > drivers/aux/auxiliary.c and device drivers should implement their
> > own aux buses, WH access APIs and probing/init logic under that
> > directory e.g: drivers/aux/mlx5/..
> 
> That makes sense to me. I would expect everything in this collection
> to be PCI drivers spawing aux devices.
> 
> drivers/aux_core/ or something like that, perhaps?

I like Saeed's proposal "drivers/aux/", it is more short and catchy.

Thanks

> 
> Jason
>
David Ahern March 11, 2025, 11:23 a.m. UTC | #16
On 3/6/25 12:21 AM, Jason Gunthorpe wrote:
> On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote:
> 
>> How do you imagine this driver/core structure should look like? Who
>> will be the top dir maintainer?
> 
> I would set something like this up more like DRM. Every driver
> maintainer gets commit rights, some rules about no uAPIs, or at least
> other acks before merging uAPI. Use the tree for staging shared
> branches.

why no uapi? Core driver can have knowledge of h/w resources across all
use cases. For example, our core driver supports a generid netlink based
dump (no set operations; get and dump only so maybe that should be the
restriction?) of all objects regardless of how created -- netdev, ib,
etc. -- and with much more detail.

> 
> Driver maintainers with the most commits per cycle does the PR or
> something like that.
> 
> There is no subsystem or cross-driver entanglement so there is no real
> need for gatekeeping.
> 
> It would be a good opportunity to help more people engage with the
> kernel process and learn the full maintainer flow.
> 
>> It should be something that is tightly coupled with aux, currently
>> aux is under drivers/base/auxiliary.c I think it should move to
>> drivers/aux/auxiliary.c and device drivers should implement their
>> own aux buses, WH access APIs and probing/init logic under that
>> directory e.g: drivers/aux/mlx5/..
> 
> That makes sense to me. I would expect everything in this collection
> to be PCI drivers spawing aux devices.
> 
> drivers/aux_core/ or something like that, perhaps?
> 

drivers/aux_core works for me; removes the 'pci' assumption and makes it
clear the real attribute here is use of the aux bus with subsystem
specific devices. I am still not clear on how such a branch will work -
e.g. We will want multi-vendor review, not just merge the PR and go.
Leon Romanovsky March 11, 2025, 1:59 p.m. UTC | #17
On Tue, Mar 11, 2025 at 12:23:19PM +0100, David Ahern wrote:
> On 3/6/25 12:21 AM, Jason Gunthorpe wrote:
> > On Wed, Mar 05, 2025 at 12:41:35PM -0800, Saeed Mahameed wrote:
> > 
> >> How do you imagine this driver/core structure should look like? Who
> >> will be the top dir maintainer?
> > 
> > I would set something like this up more like DRM. Every driver
> > maintainer gets commit rights, some rules about no uAPIs, or at least
> > other acks before merging uAPI. Use the tree for staging shared
> > branches.
> 
> why no uapi? Core driver can have knowledge of h/w resources across all
> use cases. For example, our core driver supports a generid netlink based
> dump (no set operations; get and dump only so maybe that should be the
> restriction?) of all objects regardless of how created -- netdev, ib,
> etc. -- and with much more detail.

Because, we want to make sure that UAPI will be aligned with relevant
subsystems without any way to bypass them.

Thanks
Nelson, Shannon March 11, 2025, 2:27 p.m. UTC | #18
On 3/11/2025 4:23 AM, David Ahern wrote:
> On 3/6/25 12:21 AM, Jason Gunthorpe wrote:

>>
>>> It should be something that is tightly coupled with aux, currently
>>> aux is under drivers/base/auxiliary.c I think it should move to
>>> drivers/aux/auxiliary.c and device drivers should implement their
>>> own aux buses, WH access APIs and probing/init logic under that
>>> directory e.g: drivers/aux/mlx5/..
>>
>> That makes sense to me. I would expect everything in this collection
>> to be PCI drivers spawing aux devices.
>>
>> drivers/aux_core/ or something like that, perhaps?
>>
> 
> drivers/aux_core works for me; removes the 'pci' assumption and makes it
> clear the real attribute here is use of the aux bus with subsystem
> specific devices. I am still not clear on how such a branch will work -
> e.g. We will want multi-vendor review, not just merge the PR and go.
> 

At the risk of getting too far into the naming-choosing rat hole... note 
that even "aux_core" has the assumption that the thing is auxiliary_bus 
oriented, which I don't think is a hard truth.  As an example, yes 
pds_core supports pds_vdpa through an auxiliary_device, but it also 
supports pds_vfio_pci with no aux dev involved.

I could live with drivers/aux_core, but I'd prefer drivers/core.

sln