mbox series

[net-next,00/11] mlxsw: extend line card model by devices and info

Message ID 20220425034431.3161260-1-idosch@nvidia.com (mailing list archive)
Headers show
Series mlxsw: extend line card model by devices and info | expand

Message

Ido Schimmel April 25, 2022, 3:44 a.m. UTC
Jiri says:

This patchset is extending the line card model by three items:
1) line card devices
2) line card info
3) line card device info

First three patches are introducing the necessary changes in devlink
core.

Then, all three extensions are implemented in mlxsw alongside with
selftest.

Examples:
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8 state active type 16x100G
    supported_types:
      16x100G
    devices:
      device 0
      device 1
      device 2
      device 3
$ devlink lc info pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8
    versions:
        fixed:
          hw.revision 0
        running:
          ini.version 4
    devices:
      device 0
        versions:
            running:
              fw 19.2010.1310
      device 1
        versions:
            running:
              fw 19.2010.1310
      device 2
        versions:
            running:
              fw 19.2010.1310
      device 3
        versions:
            running:
              fw 19.2010.1310

Note that device FW flashing is going to be implemented in the follow-up
patchset.

Jiri Pirko (11):
  devlink: introduce line card devices support
  devlink: introduce line card info get message
  devlink: introduce line card device info infrastructure
  mlxsw: reg: Extend MDDQ by device_info
  mlxsw: core_linecards: Probe provisioned line cards for devices and
    attach them
  selftests: mlxsw: Check devices on provisioned line card
  mlxsw: core_linecards: Expose HW revision and INI version
  selftests: mlxsw: Check line card info on provisioned line card
  mlxsw: reg: Extend MDDQ device_info by FW version fields
  mlxsw: core_linecards: Expose device FW version over device info
  selftests: mlxsw: Check device info on activated line card

 .../networking/devlink/devlink-linecard.rst   |   4 +
 Documentation/networking/devlink/mlxsw.rst    |  33 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h    |   1 +
 .../ethernet/mellanox/mlxsw/core_linecards.c  | 237 +++++++++++++-
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |  87 ++++-
 include/net/devlink.h                         |  18 +-
 include/uapi/linux/devlink.h                  |   5 +
 net/core/devlink.c                            | 303 +++++++++++++++++-
 .../drivers/net/mlxsw/devlink_linecard.sh     |  61 ++++
 9 files changed, 739 insertions(+), 10 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org April 25, 2022, 9:50 a.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Mon, 25 Apr 2022 06:44:20 +0300 you wrote:
> Jiri says:
> 
> This patchset is extending the line card model by three items:
> 1) line card devices
> 2) line card info
> 3) line card device info
> 
> [...]

Here is the summary with links:
  - [net-next,01/11] devlink: introduce line card devices support
    https://git.kernel.org/netdev/net-next/c/8d92e4fbcf0f
  - [net-next,02/11] devlink: introduce line card info get message
    https://git.kernel.org/netdev/net-next/c/276910aecc6a
  - [net-next,03/11] devlink: introduce line card device info infrastructure
    https://git.kernel.org/netdev/net-next/c/28b2d1f1ac41
  - [net-next,04/11] mlxsw: reg: Extend MDDQ by device_info
    https://git.kernel.org/netdev/net-next/c/798e2df5067c
  - [net-next,05/11] mlxsw: core_linecards: Probe provisioned line cards for devices and attach them
    https://git.kernel.org/netdev/net-next/c/8e2e10f65112
  - [net-next,06/11] selftests: mlxsw: Check devices on provisioned line card
    https://git.kernel.org/netdev/net-next/c/5e2229891825
  - [net-next,07/11] mlxsw: core_linecards: Expose HW revision and INI version
    https://git.kernel.org/netdev/net-next/c/3b37130f4855
  - [net-next,08/11] selftests: mlxsw: Check line card info on provisioned line card
    https://git.kernel.org/netdev/net-next/c/08682c9e58cd
  - [net-next,09/11] mlxsw: reg: Extend MDDQ device_info by FW version fields
    https://git.kernel.org/netdev/net-next/c/c38e9bf33812
  - [net-next,10/11] mlxsw: core_linecards: Expose device FW version over device info
    https://git.kernel.org/netdev/net-next/c/e932b4bdbd7c
  - [net-next,11/11] selftests: mlxsw: Check device info on activated line card
    https://git.kernel.org/netdev/net-next/c/002defd576a3

You are awesome, thank you!
Jakub Kicinski April 25, 2022, 4 p.m. UTC | #2
On Mon, 25 Apr 2022 06:44:20 +0300 Ido Schimmel wrote:
> This patchset is extending the line card model by three items:
> 1) line card devices
> 2) line card info
> 3) line card device info
> 
> First three patches are introducing the necessary changes in devlink
> core.
> 
> Then, all three extensions are implemented in mlxsw alongside with
> selftest.

:/ what is a line card device? You must provide document what you're
doing, this:

 .../networking/devlink/devlink-linecard.rst   |   4 +

is not enough.

How many operations and attributes are you going to copy&paste?

Is linking devlink instances into a hierarchy a better approach?

Would you mind if I revert this? I don't understand why the line card
patches are applied in 6h on Sunday night, especially that RFCv1 had
quite a long discussion. But really any uAPI additions should warrant
longer review time, IMHO.
Ido Schimmel April 25, 2022, 7:39 p.m. UTC | #3
On Mon, Apr 25, 2022 at 09:00:21AM -0700, Jakub Kicinski wrote:
> On Mon, 25 Apr 2022 06:44:20 +0300 Ido Schimmel wrote:
> > This patchset is extending the line card model by three items:
> > 1) line card devices
> > 2) line card info
> > 3) line card device info
> > 
> > First three patches are introducing the necessary changes in devlink
> > core.
> > 
> > Then, all three extensions are implemented in mlxsw alongside with
> > selftest.
> 
> :/ what is a line card device? You must provide document what you're
> doing, this:
> 
>  .../networking/devlink/devlink-linecard.rst   |   4 +
> 
> is not enough.
> 
> How many operations and attributes are you going to copy&paste?
> 
> Is linking devlink instances into a hierarchy a better approach?

In this particular case, these devices are gearboxes. They are running
their own firmware and we want user space to be able to query and update
the running firmware version.

The idea (implemented in the next patchset) is to let these devices
expose their own "component name", which can then be plugged into the
existing flash command:

    $ devlink lc show pci/0000:01:00.0 lc 8
    pci/0000:01:00.0:
      lc 8 state active type 16x100G
        supported_types:
           16x100G
        devices:
          device 0 flashable true component lc8_dev0
          device 1 flashable false
          device 2 flashable false
          device 3 flashable false
    $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0

Registering a separate devlink instance for these devices sounds like an
overkill to me. If you are not OK with a separate command (e.g.,
DEVLINK_CMD_LINECARD_INFO_GET), then extending DEVLINK_CMD_INFO_GET is
also an option. We discussed this during internal review, but felt that
the current approach is cleaner.

> Would you mind if I revert this?

I can't stop you, but keep in mind that it's already late here and that
tomorrow I'm AFK (reserve duty) and won't be able to tag it. Jiri should
be available to continue this discussion tomorrow morning, so probably
best to wait for his feedback.
Jakub Kicinski April 25, 2022, 7:52 p.m. UTC | #4
On Mon, 25 Apr 2022 22:39:57 +0300 Ido Schimmel wrote:
> > :/ what is a line card device? You must provide document what you're
> > doing, this:
> > 
> >  .../networking/devlink/devlink-linecard.rst   |   4 +
> > 
> > is not enough.
> > 
> > How many operations and attributes are you going to copy&paste?
> > 
> > Is linking devlink instances into a hierarchy a better approach?  
> 
> In this particular case, these devices are gearboxes. They are running
> their own firmware and we want user space to be able to query and update
> the running firmware version.

Nothing too special, then, we don't create "devices" for every
component of the system which can have a separate FW. That's where
"components" are intended to be used..

> The idea (implemented in the next patchset) is to let these devices
> expose their own "component name", which can then be plugged into the
> existing flash command:
> 
>     $ devlink lc show pci/0000:01:00.0 lc 8
>     pci/0000:01:00.0:
>       lc 8 state active type 16x100G
>         supported_types:
>            16x100G
>         devices:
>           device 0 flashable true component lc8_dev0
>           device 1 flashable false
>           device 2 flashable false
>           device 3 flashable false
>     $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0

IDK if it's just me or this assumes deep knowledge of the system.
I don't understand why we need to list devices 1-3 at all. And they
don't even have names. No information is exposed. 

There are many components on any networking device, including plenty
40G-R4 -> 25G-R1 gearboxes out there.

> Registering a separate devlink instance for these devices sounds like an
> overkill to me. If you are not OK with a separate command (e.g.,
> DEVLINK_CMD_LINECARD_INFO_GET), then extending DEVLINK_CMD_INFO_GET is
> also an option. We discussed this during internal review, but felt that
> the current approach is cleaner.

I don't know what you have queued, so if you don't need a full devlink
instance (IOW line cards won't need more individual config) that's fine.
For just FW flashing you can list many devices and update the
components... no need to introduce new objects or uAPI.

> > Would you mind if I revert this?  
> 
> I can't stop you, but keep in mind that it's already late here and that
> tomorrow I'm AFK (reserve duty) and won't be able to tag it. Jiri should
> be available to continue this discussion tomorrow morning, so probably
> best to wait for his feedback.

Sure, no rush.
Jiri Pirko April 26, 2022, 6:47 a.m. UTC | #5
Mon, Apr 25, 2022 at 09:39:57PM CEST, idosch@idosch.org wrote:
>On Mon, Apr 25, 2022 at 09:00:21AM -0700, Jakub Kicinski wrote:
>> On Mon, 25 Apr 2022 06:44:20 +0300 Ido Schimmel wrote:
>> > This patchset is extending the line card model by three items:
>> > 1) line card devices
>> > 2) line card info
>> > 3) line card device info
>> > 
>> > First three patches are introducing the necessary changes in devlink
>> > core.
>> > 
>> > Then, all three extensions are implemented in mlxsw alongside with
>> > selftest.
>> 
>> :/ what is a line card device? You must provide document what you're
>> doing, this:
>> 
>>  .../networking/devlink/devlink-linecard.rst   |   4 +
>> 
>> is not enough.
>> 
>> How many operations and attributes are you going to copy&paste?
>> 
>> Is linking devlink instances into a hierarchy a better approach?
>
>In this particular case, these devices are gearboxes. They are running
>their own firmware and we want user space to be able to query and update
>the running firmware version.
>
>The idea (implemented in the next patchset) is to let these devices
>expose their own "component name", which can then be plugged into the
>existing flash command:
>
>    $ devlink lc show pci/0000:01:00.0 lc 8
>    pci/0000:01:00.0:
>      lc 8 state active type 16x100G
>        supported_types:
>           16x100G
>        devices:
>          device 0 flashable true component lc8_dev0
>          device 1 flashable false
>          device 2 flashable false
>          device 3 flashable false
>    $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
>
>Registering a separate devlink instance for these devices sounds like an

It is not a separate devlink device, not removetely. A devlink device is
attached to some bus on the host, it contains entities like netdevices,
etc.

Line card devices, on contrary, are accessible over ASIC FW interface,
they reside on line cards. ASIC FW is using build-in SDK to communicate
with them. There is really nothing to expose, except for the face they
are there, with some FW version and later on (follow-up patchset) to be
able to flash FW on them.

It's a gearbox. I found it odd to name it gearbox as in theory, there
might be some other single purpose device on the line card of other type
than gearbox. Therefore, "device". I admit it is a bit misleading. Idea
for a better name?


>overkill to me. If you are not OK with a separate command (e.g.,
>DEVLINK_CMD_LINECARD_INFO_GET), then extending DEVLINK_CMD_INFO_GET is
>also an option. We discussed this during internal review, but felt that

We would need to add all the line card hierarchy into info_get. That
would look a bit odd to me.


>the current approach is cleaner.
>
>> Would you mind if I revert this?

Let's see what you need to change and change it in place, if possible.


>
>I can't stop you, but keep in mind that it's already late here and that
>tomorrow I'm AFK (reserve duty) and won't be able to tag it. Jiri should
>be available to continue this discussion tomorrow morning, so probably
>best to wait for his feedback.
Jiri Pirko April 26, 2022, 6:57 a.m. UTC | #6
Mon, Apr 25, 2022 at 09:52:18PM CEST, kuba@kernel.org wrote:
>On Mon, 25 Apr 2022 22:39:57 +0300 Ido Schimmel wrote:
>> > :/ what is a line card device? You must provide document what you're
>> > doing, this:
>> > 
>> >  .../networking/devlink/devlink-linecard.rst   |   4 +
>> > 
>> > is not enough.
>> > 
>> > How many operations and attributes are you going to copy&paste?
>> > 
>> > Is linking devlink instances into a hierarchy a better approach?  
>> 
>> In this particular case, these devices are gearboxes. They are running
>> their own firmware and we want user space to be able to query and update
>> the running firmware version.
>
>Nothing too special, then, we don't create "devices" for every
>component of the system which can have a separate FW. That's where
>"components" are intended to be used..

*
Sure, that is why I re-used components :)
But you have to somehow link the component to the particular gearbox on
particular line card. Say, you need to flash GB on line card 8. This is
basically providing a way to expose this relationship to user. Also, the
"lc info" shows the FW version for gearboxes. As Ido mentioned, the GB
versions could be listed in "devlink dev info" in theory. But then, you
need to somehow expose the relationship with line card as well.

I don't see any simpler iface than this.


>
>> The idea (implemented in the next patchset) is to let these devices
>> expose their own "component name", which can then be plugged into the
>> existing flash command:
>> 
>>     $ devlink lc show pci/0000:01:00.0 lc 8
>>     pci/0000:01:00.0:
>>       lc 8 state active type 16x100G
>>         supported_types:
>>            16x100G
>>         devices:
>>           device 0 flashable true component lc8_dev0
>>           device 1 flashable false
>>           device 2 flashable false
>>           device 3 flashable false
>>     $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
>
>IDK if it's just me or this assumes deep knowledge of the system.
>I don't understand why we need to list devices 1-3 at all. And they
>don't even have names. No information is exposed. 

There are 4 gearboxes on the line card. They share the same flash. So if
you flash gearbox 0, the rest will use the same FW.
I'm exposing them for the sake of completeness. Also, the interface
needs to be designed as a list anyway, as different line cards may have
separate flash per gearbox.

What's is the harm in exposing devices 1-3? If you insist, we can hide
them.


>
>There are many components on any networking device, including plenty
>40G-R4 -> 25G-R1 gearboxes out there.
>
>> Registering a separate devlink instance for these devices sounds like an
>> overkill to me. If you are not OK with a separate command (e.g.,
>> DEVLINK_CMD_LINECARD_INFO_GET), then extending DEVLINK_CMD_INFO_GET is
>> also an option. We discussed this during internal review, but felt that
>> the current approach is cleaner.
>
>I don't know what you have queued, so if you don't need a full devlink
>instance (IOW line cards won't need more individual config) that's fine.

Yeah, incoparable, the devlink dev and line card device - gearbox.


>For just FW flashing you can list many devices and update the
>components... no need to introduce new objects or uAPI.

Please see * above.


>
>> > Would you mind if I revert this?  
>> 
>> I can't stop you, but keep in mind that it's already late here and that
>> tomorrow I'm AFK (reserve duty) and won't be able to tag it. Jiri should
>> be available to continue this discussion tomorrow morning, so probably
>> best to wait for his feedback.
>
>Sure, no rush.
Andrew Lunn April 26, 2022, 12:11 p.m. UTC | #7
> >> The idea (implemented in the next patchset) is to let these devices
> >> expose their own "component name", which can then be plugged into the
> >> existing flash command:
> >> 
> >>     $ devlink lc show pci/0000:01:00.0 lc 8
> >>     pci/0000:01:00.0:
> >>       lc 8 state active type 16x100G
> >>         supported_types:
> >>            16x100G
> >>         devices:
> >>           device 0 flashable true component lc8_dev0
> >>           device 1 flashable false
> >>           device 2 flashable false
> >>           device 3 flashable false
> >>     $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
> >
> >IDK if it's just me or this assumes deep knowledge of the system.
> >I don't understand why we need to list devices 1-3 at all. And they
> >don't even have names. No information is exposed. 
> 
> There are 4 gearboxes on the line card. They share the same flash. So if
> you flash gearbox 0, the rest will use the same FW.

Is the flash big enough to hold four copies of the firmware? Listing
all four devices gives you a path forward to allowing each device to
have its own firmware.

On the flip side, flashable false is not really true. I don't know
this API at all, but are there any flags? Can you at least indicate it
is shared? You could then have an output something like:

           device 0 flashable true shared component lc8_dev0
           device 1 flashable true shared component lc8_dev0
           device 2 flashable true shared component lc8_dev0
           device 3 flashable true shared component lc8_dev0

so you get to see the sharing relationship.

   Andrew
Andrew Lunn April 26, 2022, 12:27 p.m. UTC | #8
> It is not a separate devlink device, not removetely. A devlink device is
> attached to some bus on the host, it contains entities like netdevices,
> etc.
> 
> Line card devices, on contrary, are accessible over ASIC FW interface,
> they reside on line cards. ASIC FW is using build-in SDK to communicate
> with them. There is really nothing to expose, except for the face they
> are there, with some FW version and later on (follow-up patchset) to be
> able to flash FW on them.

But isn't this just an implementation detail?

Say the flash was directly accessible to the host? It is just another
mtd devices? The gearbox is just another bunch of MMIO registers. You
can access the SFP socket via a host i2c bus, etc. More of a SoC like
implementation, which the enterprise routers are like.

This is a completely different set of implementation details, but i
still have the same basic building blocks. Should it look the same,
and the implementation details are hidden, or do you want to expose
your implementation details?

	Andrew
Jiri Pirko April 26, 2022, 12:36 p.m. UTC | #9
Tue, Apr 26, 2022 at 02:11:43PM CEST, andrew@lunn.ch wrote:
>> >> The idea (implemented in the next patchset) is to let these devices
>> >> expose their own "component name", which can then be plugged into the
>> >> existing flash command:
>> >> 
>> >>     $ devlink lc show pci/0000:01:00.0 lc 8
>> >>     pci/0000:01:00.0:
>> >>       lc 8 state active type 16x100G
>> >>         supported_types:
>> >>            16x100G
>> >>         devices:
>> >>           device 0 flashable true component lc8_dev0
>> >>           device 1 flashable false
>> >>           device 2 flashable false
>> >>           device 3 flashable false
>> >>     $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
>> >
>> >IDK if it's just me or this assumes deep knowledge of the system.
>> >I don't understand why we need to list devices 1-3 at all. And they
>> >don't even have names. No information is exposed. 
>> 
>> There are 4 gearboxes on the line card. They share the same flash. So if
>> you flash gearbox 0, the rest will use the same FW.
>
>Is the flash big enough to hold four copies of the firmware? Listing

One copy, all 4 are using it.


>all four devices gives you a path forward to allowing each device to
>have its own firmware.

Yes.

>
>On the flip side, flashable false is not really true. I don't know
>this API at all, but are there any flags? Can you at least indicate it
>is shared? You could then have an output something like:

Yes, that I can do.

>
>           device 0 flashable true shared component lc8_dev0
>           device 1 flashable true shared component lc8_dev0
>           device 2 flashable true shared component lc8_dev0
>           device 3 flashable true shared component lc8_dev0
>
>so you get to see the sharing relationship.

Agreed.

>
>   Andrew
Jiri Pirko April 26, 2022, 12:41 p.m. UTC | #10
Tue, Apr 26, 2022 at 02:27:54PM CEST, andrew@lunn.ch wrote:
>> It is not a separate devlink device, not removetely. A devlink device is
>> attached to some bus on the host, it contains entities like netdevices,
>> etc.
>> 
>> Line card devices, on contrary, are accessible over ASIC FW interface,
>> they reside on line cards. ASIC FW is using build-in SDK to communicate
>> with them. There is really nothing to expose, except for the face they
>> are there, with some FW version and later on (follow-up patchset) to be
>> able to flash FW on them.
>
>But isn't this just an implementation detail?
>
>Say the flash was directly accessible to the host? It is just another
>mtd devices? The gearbox is just another bunch of MMIO registers. You

Not really, the ASIC FW is also not mtd device. ASIC FW exposes set of
registers to work with the ASIC FW flash. The linecard gearbox
implements the same set of registers, tunneled over MDDT register.

Also, the gearbox is not accessable from the host. The ASIC FW is the
only one able to access it.


>can access the SFP socket via a host i2c bus, etc. More of a SoC like
>implementation, which the enterprise routers are like.
>
>This is a completely different set of implementation details, but i
>still have the same basic building blocks. Should it look the same,
>and the implementation details are hidden, or do you want to expose
>your implementation details?

Well, I got your point. If the HW would be designed in the way the
building blocks are exposed to the host, that would work. However, that
is not the case here, unfortunatelly.


>
>	Andrew
Jakub Kicinski April 26, 2022, 12:41 p.m. UTC | #11
On Tue, 26 Apr 2022 08:57:15 +0200 Jiri Pirko wrote:
> >> In this particular case, these devices are gearboxes. They are running
> >> their own firmware and we want user space to be able to query and update
> >> the running firmware version.  
> >
> >Nothing too special, then, we don't create "devices" for every
> >component of the system which can have a separate FW. That's where
> >"components" are intended to be used..  
> 
> *
> Sure, that is why I re-used components :)

Well, right, I guess you did reuse them a little :)

> But you have to somehow link the component to the particular gearbox on
> particular line card. Say, you need to flash GB on line card 8. This is
> basically providing a way to expose this relationship to user.
> 
> Also, the "lc info" shows the FW version for gearboxes. As Ido
> mentioned, the GB versions could be listed in "devlink dev info" in
> theory. But then, you need to somehow expose the relationship with
> line card as well.

Why would the automation which comes to update the firmware care 
at all where the component is? Humans can see what the component 
is by looking at the name.

If we do need to know (*if*!) you can list FW components as a lc
attribute, no need for new commands and objects.

IMHO we should either keep lc objects simple and self contained or 
give them a devlink instance. Creating sub-objects from them is very
worrying. If there is _any_ chance we'll need per-lc health reporters 
or sbs or params(
Andrew Lunn April 26, 2022, 1:45 p.m. UTC | #12
> Well, I got your point. If the HW would be designed in the way the
> building blocks are exposed to the host, that would work. However, that
> is not the case here, unfortunatelly.

I'm with Jakub. It is the uAPI which matters here. It should look the
same for a SoC style enterprise router and your discombobulated TOR
router. How you talk to the different building blocks is an
implementation detail.

	Andrew
Jiri Pirko April 26, 2022, 2 p.m. UTC | #13
Tue, Apr 26, 2022 at 02:41:30PM CEST, kuba@kernel.org wrote:
>On Tue, 26 Apr 2022 08:57:15 +0200 Jiri Pirko wrote:
>> >> In this particular case, these devices are gearboxes. They are running
>> >> their own firmware and we want user space to be able to query and update
>> >> the running firmware version.  
>> >
>> >Nothing too special, then, we don't create "devices" for every
>> >component of the system which can have a separate FW. That's where
>> >"components" are intended to be used..  
>> 
>> *
>> Sure, that is why I re-used components :)
>
>Well, right, I guess you did reuse them a little :)

I use them a lot. It is not visible in this patchset, but in the
flashing follow-up patchset.


>
>> But you have to somehow link the component to the particular gearbox on
>> particular line card. Say, you need to flash GB on line card 8. This is
>> basically providing a way to expose this relationship to user.
>> 
>> Also, the "lc info" shows the FW version for gearboxes. As Ido
>> mentioned, the GB versions could be listed in "devlink dev info" in
>> theory. But then, you need to somehow expose the relationship with
>> line card as well.
>
>Why would the automation which comes to update the firmware care 
>at all where the component is? Humans can see what the component 
>is by looking at the name.

The relationship-by-name sounds a bit fragile to me. The names of
components are up to the individual drivers.


>
>If we do need to know (*if*!) you can list FW components as a lc
>attribute, no need for new commands and objects.

There is no new command for that, only one nested attribute which
carries the device list added to the existing command. They are no new
objects, they are just few nested values.


>
>IMHO we should either keep lc objects simple and self contained or 
>give them a devlink instance. Creating sub-objects from them is very

Give them a devlink instance? I don't understand how. LC is not a
separate device, far from that. That does not make any sense to me.


>worrying. If there is _any_ chance we'll need per-lc health reporters 
>or sbs or params(
Jiri Pirko April 26, 2022, 2:05 p.m. UTC | #14
Tue, Apr 26, 2022 at 03:45:48PM CEST, andrew@lunn.ch wrote:
>> Well, I got your point. If the HW would be designed in the way the
>> building blocks are exposed to the host, that would work. However, that
>> is not the case here, unfortunatelly.
>
>I'm with Jakub. It is the uAPI which matters here. It should look the
>same for a SoC style enterprise router and your discombobulated TOR
>router. How you talk to the different building blocks is an
>implementation detail.

It's not that simple. Take the gearbox for example. You say bunch of
MDIO registers. ASIC FW has a custom SDK internally that is used to
talk to the gearbox.

The flash, you say expose by MTD, but there is no access to it directly
from host. Can't be done. There are HW design limitations that are
blocking your concept.
Jakub Kicinski April 26, 2022, 2:51 p.m. UTC | #15
On Tue, 26 Apr 2022 16:00:10 +0200 Jiri Pirko wrote:
> >> But you have to somehow link the component to the particular gearbox on
> >> particular line card. Say, you need to flash GB on line card 8. This is
> >> basically providing a way to expose this relationship to user.
> >> 
> >> Also, the "lc info" shows the FW version for gearboxes. As Ido
> >> mentioned, the GB versions could be listed in "devlink dev info" in
> >> theory. But then, you need to somehow expose the relationship with
> >> line card as well.  
> >
> >Why would the automation which comes to update the firmware care 
> >at all where the component is? Humans can see what the component 
> >is by looking at the name.  
> 
> The relationship-by-name sounds a bit fragile to me. The names of
> components are up to the individual drivers.

I asked you how the automation will operate. You must answer questions
if you want to have a discussion. Automation is the relevant part.
You're not designing an interface for SDK users but for end users.

> >If we do need to know (*if*!) you can list FW components as a lc
> >attribute, no need for new commands and objects.  
> 
> There is no new command for that, only one nested attribute which
> carries the device list added to the existing command. They are no new
> objects, they are just few nested values.

DEVLINK_CMD_LINECARD_INFO_GET

> >IMHO we should either keep lc objects simple and self contained or 
> >give them a devlink instance. Creating sub-objects from them is very  
> 
> Give them a devlink instance? I don't understand how. LC is not a
> separate device, far from that. That does not make any sense to me.

You can put a name of another devlink instance as an attribute of a lc.
See below.

> >worrying. If there is _any_ chance we'll need per-lc health reporters 
> >or sbs or params(
Andrew Lunn April 26, 2022, 3:24 p.m. UTC | #16
> Does not make sense to me at all. Line cards are detachable PHY sets in
> essence, very basic functionality. They does not have buffers, health
> and params, I don't think so. 

Ido recently added support to ethtool to upgrade the flash in SFPs.
They are far from simple devices. Some of the GPON ones have linux
running in them, that you can log in to.

The real question is, can you do everything you need to do via
existing uAPIs like ethtool and hwmon.

	Andrew
Andrew Lunn April 26, 2022, 3:36 p.m. UTC | #17
On Tue, Apr 26, 2022 at 04:05:43PM +0200, Jiri Pirko wrote:
> Tue, Apr 26, 2022 at 03:45:48PM CEST, andrew@lunn.ch wrote:
> >> Well, I got your point. If the HW would be designed in the way the
> >> building blocks are exposed to the host, that would work. However, that
> >> is not the case here, unfortunatelly.
> >
> >I'm with Jakub. It is the uAPI which matters here. It should look the
> >same for a SoC style enterprise router and your discombobulated TOR
> >router. How you talk to the different building blocks is an
> >implementation detail.
> 
> It's not that simple. Take the gearbox for example. You say bunch of
> MDIO registers. ASIC FW has a custom SDK internally that is used to
> talk to the gearbox.
> 
> The flash, you say expose by MTD, but there is no access to it directly
> from host. Can't be done. There are HW design limitations that are
> blocking your concept.

The MTD API and your SDK API are abstractions. You give it a blob of
data and ask it to write it to the storage. Somehow that happens. Does
user space need to know MTD or an SDK is being used? Does user space
care? I would expect the same uAPI for both, here is a firmware blob,
write it to storage. The driver knows if it needs to use the MTD API
or the SDK API, it is all abstracted away in the driver.

	Andrew
Jiri Pirko April 27, 2022, 7:35 a.m. UTC | #18
Tue, Apr 26, 2022 at 04:51:33PM CEST, kuba@kernel.org wrote:
>On Tue, 26 Apr 2022 16:00:10 +0200 Jiri Pirko wrote:
>> >> But you have to somehow link the component to the particular gearbox on
>> >> particular line card. Say, you need to flash GB on line card 8. This is
>> >> basically providing a way to expose this relationship to user.
>> >> 
>> >> Also, the "lc info" shows the FW version for gearboxes. As Ido
>> >> mentioned, the GB versions could be listed in "devlink dev info" in
>> >> theory. But then, you need to somehow expose the relationship with
>> >> line card as well.  
>> >
>> >Why would the automation which comes to update the firmware care 
>> >at all where the component is? Humans can see what the component 
>> >is by looking at the name.  
>> 
>> The relationship-by-name sounds a bit fragile to me. The names of
>> components are up to the individual drivers.
>
>I asked you how the automation will operate. You must answer questions
>if you want to have a discussion. Automation is the relevant part.

Automation, not sure. It would probably just see type of gearbox and
flash it. Not sure I understand the question, perhaps you could explain?
Plus, the possibility is to auto-flash the GB from driver directly.


>You're not designing an interface for SDK users but for end users.

Sure, that is the aim of this API. Human end user. That is why I wanted
the user to see the relationships between devlink dev, line cards and
the gearboxes on them. If you want to limit the visibility, sure, just
tell me how.


>
>> >If we do need to know (*if*!) you can list FW components as a lc
>> >attribute, no need for new commands and objects.  
>> 
>> There is no new command for that, only one nested attribute which
>> carries the device list added to the existing command. They are no new
>> objects, they are just few nested values.
>
>DEVLINK_CMD_LINECARD_INFO_GET

Okay, that is not only to expose devices. That is also to expose info
about linecards, like HW revision, INI version etc. Where else to put
it? I can perhaps embed it into devlink dev info, but I thought separate
command would be more suitable. object cmd, object info cmd. It is
more clear I believe.


>
>> >IMHO we should either keep lc objects simple and self contained or 
>> >give them a devlink instance. Creating sub-objects from them is very  
>> 
>> Give them a devlink instance? I don't understand how. LC is not a
>> separate device, far from that. That does not make any sense to me.
>
>You can put a name of another devlink instance as an attribute of a lc.
>See below.
>
>> >worrying. If there is _any_ chance we'll need per-lc health reporters 
>> >or sbs or params(
Jiri Pirko April 27, 2022, 7:37 a.m. UTC | #19
Tue, Apr 26, 2022 at 05:24:41PM CEST, andrew@lunn.ch wrote:
>> Does not make sense to me at all. Line cards are detachable PHY sets in
>> essence, very basic functionality. They does not have buffers, health
>> and params, I don't think so. 
>
>Ido recently added support to ethtool to upgrade the flash in SFPs.
>They are far from simple devices. Some of the GPON ones have linux
>running in them, that you can log in to.
>
>The real question is, can you do everything you need to do via
>existing uAPIs like ethtool and hwmon.

No, it does not fit. Ethtool works with netdevices which are
instantiated for separate port. The disconnection between what we can do
with netdev as a handle and how the devices are modeled was the reason
for devlink introduction in the past.

>
>	Andrew
Jakub Kicinski April 27, 2022, 2:14 p.m. UTC | #20
On Wed, 27 Apr 2022 09:35:34 +0200 Jiri Pirko wrote:
> >> The relationship-by-name sounds a bit fragile to me. The names of
> >> components are up to the individual drivers.  
> >
> >I asked you how the automation will operate. You must answer questions
> >if you want to have a discussion. Automation is the relevant part.  
> 
> Automation, not sure. It would probably just see type of gearbox and
> flash it. Not sure I understand the question, perhaps you could explain?
> Plus, the possibility is to auto-flash the GB from driver directly.
> 
> 
> >You're not designing an interface for SDK users but for end users.  
> 
> Sure, that is the aim of this API. Human end user. That is why I wanted
> the user to see the relationships between devlink dev, line cards and
> the gearboxes on them. If you want to limit the visibility, sure, just
> tell me how.

Okay, we have completely different views on what the goals should be.
Perhaps that explains the differences in the design.

Of the three API levels (SDK, automation, human) I think automation
is the only one that's interesting to us in Linux. SDK interfaces are
necessarily too low level as they expose too much of internal details
to standardize. Humans are good with dealing with uncertainty and
diverse so there's no a good benchmark.

The benchmark for automation is - can a machine use this API across
different vendors to reliably achieve its goals. For FW info/flashing
the goal is keeping the FW versions up to date. This is documented:

https://www.kernel.org/doc/html/latest/networking/devlink/devlink-flash.html#firmware-version-management

What would the pseudo code look like with "line cards" in the picture?
Apply RFC1925 truth 12.

> >> There is no new command for that, only one nested attribute which
> >> carries the device list added to the existing command. They are no new
> >> objects, they are just few nested values.  
> >
> >DEVLINK_CMD_LINECARD_INFO_GET  
> 
> Okay, that is not only to expose devices. That is also to expose info
> about linecards, like HW revision, INI version etc. Where else to put
> it? I can perhaps embed it into devlink dev info, but I thought separate
> command would be more suitable. object cmd, object info cmd. It is
> more clear I believe.

> >> If so, how does the user know if/when to flash it?
> >> If not, where would you list it if devices nest is not the correct place?  
> >
> >Let me mock up what I had in mind for you since it did not come thru 
> >in the explanation:
> >
> >$ devlink dev info show pci/0000:01:00.0
> >    versions:
> >        fixed:
> >          hw.revision 0
> >          lc2.hw.revision a
> >          lc8.hw.revision b
> >        running:
> >          ini.version 4
> >          lc2.gearbox 1.1.3
> >          lc8.gearbox 1.2.3  
> 
> Would be rather:
> 
>           lc2.gearbox0 1.1.3
>           lc2.gearbox1 1.2.4

I thought you said your gearboxes all the the same FW? 
Theoretically, yes. Theoretically, I can also have nested "line cards".

>           lc8.gearbox0 1.2.3
> 
> Okay, I see. So instead of having clear api with relationships and
> clear human+machine readability we have squahed indexes into strings.
> I fail to see the benefit, other than no-api-extension :/ On contrary.

Show me the real life use for all the "clear api with relationships"
and I'll shut up.

I would not take falling back to physical (HW) hierarchy for the API
design as a point of pride. Seems lazy if I'm completely honest.
Someone else's HW may have a different hierarchy, and you're just
forcing the automation engineer iterate over irrelevant structures
("devices").

My hunch is that automation will not want to deal with line cards
separately, and flash the entire devices in one go to a tested and
verified bundle blob provided by the vendor. If they do want to poke 
at line cards - the information is still there in what I described.

> >$ devlink lc show pci/0000:01:00.0 lc 8
> >pci/0000:01:00.0:
> >  lc 8 state active type 16x100G
> >    supported_types:
> >      16x100G
> >    versions: 
> >      lc8.hw.revision (a) 
> >      lc8.gearbox (1.2.3)
> >
> >Where the data in the brackets is optionally fetched thru the existing
> >"dev info" API, but rendered together by the user space.  
> 
> Quite odd. I find it questionable to say at least to mix multiple
> command netlink outputs into one output.

Really? So we're going to be designing kernel APIs so that each message
contains complete information and can't contain references now?

> The processing of it would be a small nightmare considering the way
> how the netlink message processing works in iproute2 :/
Jiri Pirko April 29, 2022, 11:51 a.m. UTC | #21
Wed, Apr 27, 2022 at 04:14:47PM CEST, kuba@kernel.org wrote:
>On Wed, 27 Apr 2022 09:35:34 +0200 Jiri Pirko wrote:
>> >> The relationship-by-name sounds a bit fragile to me. The names of
>> >> components are up to the individual drivers.  
>> >
>> >I asked you how the automation will operate. You must answer questions
>> >if you want to have a discussion. Automation is the relevant part.  
>> 
>> Automation, not sure. It would probably just see type of gearbox and
>> flash it. Not sure I understand the question, perhaps you could explain?
>> Plus, the possibility is to auto-flash the GB from driver directly.
>> 
>> 
>> >You're not designing an interface for SDK users but for end users.  
>> 
>> Sure, that is the aim of this API. Human end user. That is why I wanted
>> the user to see the relationships between devlink dev, line cards and
>> the gearboxes on them. If you want to limit the visibility, sure, just
>> tell me how.
>
>Okay, we have completely different views on what the goals should be.
>Perhaps that explains the differences in the design.

I don't think so. I'm just a bit confused that you say "You're not designing
an interface for SDK users but for *end users*" and now you say that
primary is automation. Nevertheless, both are equally important and the
iface should work for both or them, I believe.


>
>Of the three API levels (SDK, automation, human) I think automation
>is the only one that's interesting to us in Linux. SDK interfaces are
>necessarily too low level as they expose too much of internal details
>to standardize. Humans are good with dealing with uncertainty and
>diverse so there's no a good benchmark.
>
>The benchmark for automation is - can a machine use this API across
>different vendors to reliably achieve its goals. For FW info/flashing
>the goal is keeping the FW versions up to date. This is documented:
>
>https://www.kernel.org/doc/html/latest/networking/devlink/devlink-flash.html#firmware-version-management
>
>What would the pseudo code look like with "line cards" in the picture?
>Apply RFC1925 truth 12.

Something like this:

$lc_count = array_size(devlink-lc-info[$handle])

for ($lcnum = 0; $lcnum < $lc_count; lcnum++):
    $dev_count = array_size(devlink-lc-info[$handle][$lcnum])

    for ($devnum = 0; $devnum < $dev_count; $devnum++):
    
        # Get unique HW design identifier (gearbox id)
        $hw_id = devlink-lc-info[$handle][$lcnum][$devnum]['fw.psid']

        # Find out which FW flash we want to use for this device
        $want_flash_vers = some-db-backed.lookup($hw_id, 'flash')

        # Update flash if necessary
        if $want_flash_vers != devlink-lc-info[$handle][$lcnum][$devnum]['fw']:
            $file = some-db-backed.download($hw_id, 'flash')
            $component = devlink-lc[$handle][$lcnum][$devnum]['component']
            devlink-dev-flash($handle, $component, $file)

devlink-reload()

Clear indexes, not squashed somewhere in middle of string.


>
>> >> There is no new command for that, only one nested attribute which
>> >> carries the device list added to the existing command. They are no new
>> >> objects, they are just few nested values.  
>> >
>> >DEVLINK_CMD_LINECARD_INFO_GET  
>> 
>> Okay, that is not only to expose devices. That is also to expose info
>> about linecards, like HW revision, INI version etc. Where else to put
>> it? I can perhaps embed it into devlink dev info, but I thought separate
>> command would be more suitable. object cmd, object info cmd. It is
>> more clear I believe.
>
>> >> If so, how does the user know if/when to flash it?
>> >> If not, where would you list it if devices nest is not the correct place?  
>> >
>> >Let me mock up what I had in mind for you since it did not come thru 
>> >in the explanation:
>> >
>> >$ devlink dev info show pci/0000:01:00.0
>> >    versions:
>> >        fixed:
>> >          hw.revision 0
>> >          lc2.hw.revision a
>> >          lc8.hw.revision b
>> >        running:
>> >          ini.version 4
>> >          lc2.gearbox 1.1.3
>> >          lc8.gearbox 1.2.3  
>> 
>> Would be rather:
>> 
>>           lc2.gearbox0 1.1.3
>>           lc2.gearbox1 1.2.4
>
>I thought you said your gearboxes all the the same FW? 
>Theoretically, yes. Theoretically, I can also have nested "line cards".

Well, yeah. I was under impresion that possibility of having multiple
devices on the same LC is not close to 0. But I get your point.

Let's try to figure out he iface as you want it:
We will have devlink dev info extended to be able to provide info
about the LC/gearbox. Let's work with same prefix "lcX." for all
info related to line card X.

First problem is, who is going to enforce this prefix. It is driver's
responsibility to provide the string which is put out. The solution
would be to have a helper similar to devlink_info_version_*_put()
called devlink_info_lc_version_*_put() that would accept lc pointer and
add the prefix. Does it make sense to you?

We need 3 things:
1) current version of gearbox FW 
   That is easy, we have it - "versions"
   (DEVLINK_ATTR_INFO_VERSION_* nested attrs). We can have multiple
   nests that would carry the versions for individiual line cards.
   Example:
       versions:
           fixed:
             hw.revision 0
             lc2.hw.revision a
             lc8.hw.revision b
           running:
             ini.version 4
             lc2.gearbox.fw.version 1.1.3
             lc8.gearbox.fw.version 1.2.3
2) HW id (as you have it in your pseudocode), it is PSID in case of
   mlxsw. We already have PSID for ASIC:
   ....
   This should be also easy, as this is exposed as "fixed version" in a
   same way as previous example.
   Example:
       versions:
           fixed:
             lc2.gearbox.fw.psid XXX
             lc8.gearbox.fw.psid YYY
3) Component name
   This one is a bit tricky. It is not a version, so put it under
   "versions" does not make much sense.
   Also, there are more than one. Looking at how versions are done,
   multiple nests of attr type DEVLINK_ATTR_INFO_VERSION_* are put to
   the message. So we can introduce new attr DEVLINK_ATTR_INFO_FLASH_COMPONENT
   and put one per linecard/gearbox.
   Here arain we need some kind of helper to driver to call to put this
   into msg: devlink_info_lc_flash_component_put()
   Example:
     pci/0000:01:00.0:
       driver mlxsw_spectrum3
       flash_components:
         lc2.gearbox.fw
         lc8.gearbox.fw

    Then the flashing of the gearbox will be done by:
    # devlink dev flash pci/0000:01:00.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2 component lc2.gearbox.fw

    Maybe this would call for some sort of API between driver and devlink to
    register individial components. devlink.c can then sanitize the
    input value of the component according to the registered list.

    Even withot that I think this would be valuable anyway to let the
    user know what are the supported component names.

What do you think? If you agree, I can jump into implementing this and
you can feel free to revert this patchset.



>
>>           lc8.gearbox0 1.2.3
>> 
>> Okay, I see. So instead of having clear api with relationships and
>> clear human+machine readability we have squahed indexes into strings.
>> I fail to see the benefit, other than no-api-extension :/ On contrary.
>
>Show me the real life use for all the "clear api with relationships"
>and I'll shut up.

See the pseudo-code above.


>
>I would not take falling back to physical (HW) hierarchy for the API
>design as a point of pride. Seems lazy if I'm completely honest.

I got that. I spent a lot of time to find a good abstraction though.


>Someone else's HW may have a different hierarchy, and you're just
>forcing the automation engineer iterate over irrelevant structures
>("devices").

Well, "linecard device" could be anything on th LC, from gearbox to
whatever. It should fit. But I get your point.


>
>My hunch is that automation will not want to deal with line cards
>separately, and flash the entire devices in one go to a tested and
>verified bundle blob provided by the vendor. If they do want to poke 
>at line cards - the information is still there in what I described.
>
>> >$ devlink lc show pci/0000:01:00.0 lc 8
>> >pci/0000:01:00.0:
>> >  lc 8 state active type 16x100G
>> >    supported_types:
>> >      16x100G
>> >    versions: 
>> >      lc8.hw.revision (a) 
>> >      lc8.gearbox (1.2.3)
>> >
>> >Where the data in the brackets is optionally fetched thru the existing
>> >"dev info" API, but rendered together by the user space.  
>> 
>> Quite odd. I find it questionable to say at least to mix multiple
>> command netlink outputs into one output.
>
>Really? So we're going to be designing kernel APIs so that each message
>contains complete information and can't contain references now?

Can you give me an exapmple of devlink or any other iproute2 app cmd
that actually does something similar to this?



>
>> The processing of it would be a small nightmare considering the way
>> how the netlink message processing works in iproute2 :/
Jakub Kicinski April 29, 2022, 6:45 p.m. UTC | #22
On Fri, 29 Apr 2022 13:51:33 +0200 Jiri Pirko wrote:
> >Of the three API levels (SDK, automation, human) I think automation
> >is the only one that's interesting to us in Linux. SDK interfaces are
> >necessarily too low level as they expose too much of internal details
> >to standardize. Humans are good with dealing with uncertainty and
> >diverse so there's no a good benchmark.
> >
> >The benchmark for automation is - can a machine use this API across
> >different vendors to reliably achieve its goals. For FW info/flashing
> >the goal is keeping the FW versions up to date. This is documented:
> >
> >https://www.kernel.org/doc/html/latest/networking/devlink/devlink-flash.html#firmware-version-management
> >
> >What would the pseudo code look like with "line cards" in the picture?
> >Apply RFC1925 truth 12.  
> 
> Something like this:
> 
> $lc_count = array_size(devlink-lc-info[$handle])
> 
> for ($lcnum = 0; $lcnum < $lc_count; lcnum++):
>     $dev_count = array_size(devlink-lc-info[$handle][$lcnum])
> 
>     for ($devnum = 0; $devnum < $dev_count; $devnum++):

Here goes the iteration I complained about in my previous message.
Tracking FW versions makes most sense at the level of a product (as 
in the unit of HW one can purchase from the system vendor). That
integrates well with system tracking HW in the fleet. Product in your
case will be a line card or populated chassis, I believe.

>         # Get unique HW design identifier (gearbox id)
>         $hw_id = devlink-lc-info[$handle][$lcnum][$devnum]['fw.psid']

1) you can't use 'fw.psid' in generic logic, it's a Melvidia's invention
2) looking at your cover letter there's no fw.psid for the device
   reported, the automation will not work, Q.E.D.

>         # Find out which FW flash we want to use for this device
>         $want_flash_vers = some-db-backed.lookup($hw_id, 'flash')
> 
>         # Update flash if necessary
>         if $want_flash_vers != devlink-lc-info[$handle][$lcnum][$devnum]['fw']:
>             $file = some-db-backed.download($hw_id, 'flash')
>             $component = devlink-lc[$handle][$lcnum][$devnum]['component']
>             devlink-dev-flash($handle, $component, $file)
> 
> devlink-reload()
> 
> Clear indexes, not squashed somewhere in middle of string.
> 
> >I thought you said your gearboxes all the the same FW? 
> >Theoretically, yes. Theoretically, I can also have nested "line cards".  
> 
> Well, yeah. I was under impresion that possibility of having multiple
> devices on the same LC is not close to 0. But I get your point.
> 
> Let's try to figure out he iface as you want it:
> We will have devlink dev info extended to be able to provide info
> about the LC/gearbox. Let's work with same prefix "lcX." for all
> info related to line card X.
> 
> First problem is, who is going to enforce this prefix. It is driver's
> responsibility to provide the string which is put out. The solution
> would be to have a helper similar to devlink_info_version_*_put()
> called devlink_info_lc_version_*_put() that would accept lc pointer and
> add the prefix. Does it make sense to you?
> 
> We need 3 things:
> 1) current version of gearbox FW 
>    That is easy, we have it - "versions"
>    (DEVLINK_ATTR_INFO_VERSION_* nested attrs). We can have multiple
>    nests that would carry the versions for individiual line cards.
>    Example:
>        versions:
>            fixed:
>              hw.revision 0
>              lc2.hw.revision a
>              lc8.hw.revision b
>            running:
>              ini.version 4
>              lc2.gearbox.fw.version 1.1.3
>              lc8.gearbox.fw.version 1.2.3
> 2) HW id (as you have it in your pseudocode), it is PSID in case of
>    mlxsw. We already have PSID for ASIC:
>    ....
>    This should be also easy, as this is exposed as "fixed version" in a
>    same way as previous example.
>    Example:
>        versions:
>            fixed:
>              lc2.gearbox.fw.psid XXX
>              lc8.gearbox.fw.psid YYY
> 3) Component name
>    This one is a bit tricky. It is not a version, so put it under
>    "versions" does not make much sense.
>    Also, there are more than one. Looking at how versions are done,
>    multiple nests of attr type DEVLINK_ATTR_INFO_VERSION_* are put to
>    the message. So we can introduce new attr DEVLINK_ATTR_INFO_FLASH_COMPONENT
>    and put one per linecard/gearbox.
>    Here arain we need some kind of helper to driver to call to put this
>    into msg: devlink_info_lc_flash_component_put()
>    Example:
>      pci/0000:01:00.0:
>        driver mlxsw_spectrum3
>        flash_components:
>          lc2.gearbox.fw
>          lc8.gearbox.fw
> 
>     Then the flashing of the gearbox will be done by:
>     # devlink dev flash pci/0000:01:00.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2 component lc2.gearbox.fw

The main question to me is whether users will want to flash the entire
device, or update line cards individually.

What's inside mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2? Doesn't
sound like it's FW just for a single gearbox?

> >Really? So we're going to be designing kernel APIs so that each message
> >contains complete information and can't contain references now?  
> 
> Can you give me an exapmple of devlink or any other iproute2 app cmd
> that actually does something similar to this?

Off the top of my head - doesn't ip has some caches for name resolution
etc.? Either way current code in iproute2.git is hardly written on the
stone tablets.
Jiri Pirko April 29, 2022, 7:29 p.m. UTC | #23
Fri, Apr 29, 2022 at 08:45:35PM CEST, kuba@kernel.org wrote:
>On Fri, 29 Apr 2022 13:51:33 +0200 Jiri Pirko wrote:
>> >Of the three API levels (SDK, automation, human) I think automation
>> >is the only one that's interesting to us in Linux. SDK interfaces are
>> >necessarily too low level as they expose too much of internal details
>> >to standardize. Humans are good with dealing with uncertainty and
>> >diverse so there's no a good benchmark.
>> >
>> >The benchmark for automation is - can a machine use this API across
>> >different vendors to reliably achieve its goals. For FW info/flashing
>> >the goal is keeping the FW versions up to date. This is documented:
>> >
>> >https://www.kernel.org/doc/html/latest/networking/devlink/devlink-flash.html#firmware-version-management
>> >
>> >What would the pseudo code look like with "line cards" in the picture?
>> >Apply RFC1925 truth 12.  
>> 
>> Something like this:
>> 
>> $lc_count = array_size(devlink-lc-info[$handle])
>> 
>> for ($lcnum = 0; $lcnum < $lc_count; lcnum++):
>>     $dev_count = array_size(devlink-lc-info[$handle][$lcnum])
>> 
>>     for ($devnum = 0; $devnum < $dev_count; $devnum++):
>
>Here goes the iteration I complained about in my previous message.
>Tracking FW versions makes most sense at the level of a product (as 
>in the unit of HW one can purchase from the system vendor). That
>integrates well with system tracking HW in the fleet. Product in your
>case will be a line card or populated chassis, I believe.

Well, most probably, you are right. In theory, there migth de 2 types of
devices/gearboxes on a single line card. I admit I find it unlikely now.


>
>>         # Get unique HW design identifier (gearbox id)
>>         $hw_id = devlink-lc-info[$handle][$lcnum][$devnum]['fw.psid']
>
>1) you can't use 'fw.psid' in generic logic, it's a Melvidia's invention
>2) looking at your cover letter there's no fw.psid for the device
>   reported, the automation will not work, Q.E.D.

We can make is a "symlink" to fw.hw_id or whatever. But that is not the
point here. For ASIC FW, we currently have also fw.psid.


>
>>         # Find out which FW flash we want to use for this device
>>         $want_flash_vers = some-db-backed.lookup($hw_id, 'flash')
>> 
>>         # Update flash if necessary
>>         if $want_flash_vers != devlink-lc-info[$handle][$lcnum][$devnum]['fw']:
>>             $file = some-db-backed.download($hw_id, 'flash')
>>             $component = devlink-lc[$handle][$lcnum][$devnum]['component']
>>             devlink-dev-flash($handle, $component, $file)
>> 
>> devlink-reload()
>> 
>> Clear indexes, not squashed somewhere in middle of string.
>> 
>> >I thought you said your gearboxes all the the same FW? 
>> >Theoretically, yes. Theoretically, I can also have nested "line cards".  
>> 
>> Well, yeah. I was under impresion that possibility of having multiple
>> devices on the same LC is not close to 0. But I get your point.
>> 
>> Let's try to figure out he iface as you want it:
>> We will have devlink dev info extended to be able to provide info
>> about the LC/gearbox. Let's work with same prefix "lcX." for all
>> info related to line card X.
>> 
>> First problem is, who is going to enforce this prefix. It is driver's
>> responsibility to provide the string which is put out. The solution
>> would be to have a helper similar to devlink_info_version_*_put()
>> called devlink_info_lc_version_*_put() that would accept lc pointer and
>> add the prefix. Does it make sense to you?
>> 
>> We need 3 things:
>> 1) current version of gearbox FW 
>>    That is easy, we have it - "versions"
>>    (DEVLINK_ATTR_INFO_VERSION_* nested attrs). We can have multiple
>>    nests that would carry the versions for individiual line cards.
>>    Example:
>>        versions:
>>            fixed:
>>              hw.revision 0
>>              lc2.hw.revision a
>>              lc8.hw.revision b
>>            running:
>>              ini.version 4
>>              lc2.gearbox.fw.version 1.1.3
>>              lc8.gearbox.fw.version 1.2.3
>> 2) HW id (as you have it in your pseudocode), it is PSID in case of
>>    mlxsw. We already have PSID for ASIC:
>>    ....
>>    This should be also easy, as this is exposed as "fixed version" in a
>>    same way as previous example.
>>    Example:
>>        versions:
>>            fixed:
>>              lc2.gearbox.fw.psid XXX
>>              lc8.gearbox.fw.psid YYY
>> 3) Component name
>>    This one is a bit tricky. It is not a version, so put it under
>>    "versions" does not make much sense.
>>    Also, there are more than one. Looking at how versions are done,
>>    multiple nests of attr type DEVLINK_ATTR_INFO_VERSION_* are put to
>>    the message. So we can introduce new attr DEVLINK_ATTR_INFO_FLASH_COMPONENT
>>    and put one per linecard/gearbox.
>>    Here arain we need some kind of helper to driver to call to put this
>>    into msg: devlink_info_lc_flash_component_put()
>>    Example:
>>      pci/0000:01:00.0:
>>        driver mlxsw_spectrum3
>>        flash_components:
>>          lc2.gearbox.fw
>>          lc8.gearbox.fw
>> 
>>     Then the flashing of the gearbox will be done by:
>>     # devlink dev flash pci/0000:01:00.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2 component lc2.gearbox.fw
>
>The main question to me is whether users will want to flash the entire
>device, or update line cards individually.

I think it makes sense to update them individually. The versions are
also reported individually. What's the benefit of not doing that. Also,
how would you name the "group" component. Sounds odd to me.


>
>What's inside mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2? Doesn't
>sound like it's FW just for a single gearbox?
>
>> >Really? So we're going to be designing kernel APIs so that each message
>> >contains complete information and can't contain references now?  
>> 
>> Can you give me an exapmple of devlink or any other iproute2 app cmd
>> that actually does something similar to this?
>
>Off the top of my head - doesn't ip has some caches for name resolution
>etc.? Either way current code in iproute2.git is hardly written on the
>stone tablets.

Not sure, that is why I thought you might have an example. Anyway, I
don't think it is important. I think that all 3 values exposed I
described above can be just in devlink dev info.
Jakub Kicinski April 29, 2022, 10:38 p.m. UTC | #24
On Fri, 29 Apr 2022 21:29:16 +0200 Jiri Pirko wrote:
> >The main question to me is whether users will want to flash the entire
> >device, or update line cards individually.  
> 
> I think it makes sense to update them individually. The versions are
> also reported individually.

Okay, but neither I want that, nor does it match what Ido described as
the direction for mlxsw, quoting:

  The idea (implemented in the next patchset) is to let these devices
  expose their own "component name", which can then be plugged into the
  existing flash command:

    $ devlink lc show pci/0000:01:00.0 lc 8
    pci/0000:01:00.0:
      lc 8 state active type 16x100G
        supported_types:
           16x100G
        devices:
          device 0 flashable true component lc8_dev0
          device 1 flashable false
          device 2 flashable false
          device 3 flashable false
    $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0

Your "devices" are _not_ individually flashable. It seems natural for
single-board devices like a NIC or a line card to have a single flash
with all the images burned together.

> What's the benefit of not doing that.

As already mentioned in my previous reply the user will likely have 
a database of all their networking assets, and having to break them
up further than the physical piece of gear they order from the supplier
is a pain. Plus the vendor will likely also prefer to ship a single
validated image rather than a blob for every board component with FW.

> Also, how would you name the "group" component. Sounds odd to me.

To flash the whole device we skip the component.

> >What's inside mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2? Doesn't
> >sound like it's FW just for a single gearbox?

Please answer questions. I already complained about this once in 
this thread.
Jiri Pirko April 30, 2022, 6:27 a.m. UTC | #25
Sat, Apr 30, 2022 at 12:38:45AM CEST, kuba@kernel.org wrote:
>On Fri, 29 Apr 2022 21:29:16 +0200 Jiri Pirko wrote:
>> >The main question to me is whether users will want to flash the entire
>> >device, or update line cards individually.  
>> 
>> I think it makes sense to update them individually. The versions are
>> also reported individually.
>
>Okay, but neither I want that, nor does it match what Ido described as
>the direction for mlxsw, quoting:
>
>  The idea (implemented in the next patchset) is to let these devices
>  expose their own "component name", which can then be plugged into the
>  existing flash command:
>
>    $ devlink lc show pci/0000:01:00.0 lc 8
>    pci/0000:01:00.0:
>      lc 8 state active type 16x100G
>        supported_types:
>           16x100G
>        devices:
>          device 0 flashable true component lc8_dev0
>          device 1 flashable false
>          device 2 flashable false
>          device 3 flashable false
>    $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
>
>Your "devices" are _not_ individually flashable. It seems natural for
>single-board devices like a NIC or a line card to have a single flash
>with all the images burned together.

Wait a second. I think that we don't understand each other. Currently,
we have a single device to flash on a linecard, the gearbox.
There is one file to flash it. So 1:1 between line card and file to
flash. That is exactly as I described in the proposal. 1 component name
per line card.


>
>> What's the benefit of not doing that.
>
>As already mentioned in my previous reply the user will likely have 
>a database of all their networking assets, and having to break them
>up further than the physical piece of gear they order from the supplier
>is a pain. Plus the vendor will likely also prefer to ship a single
>validated image rather than a blob for every board component with FW.

Depends on the vendor :) But it is hypothetical, let's see what the
future brings. But I agree with you.


>
>> Also, how would you name the "group" component. Sounds odd to me.
>
>To flash the whole device we skip the component.

Sec. I think these is a misunderstanding here. The component it what we
already have in devlink dev flash. Quoting devlink-dev man page:
   devlink dev flash - write device's non-volatile memory.
       DEV - specifies the devlink device to write to.

       file PATH - Path to the file which will be written into device's flash.
       The path needs to be relative to one of the directories searched by the
       kernel firmware loaded, such as /lib/firmware.

---->  component NAME - If device stores multiple firmware images in non-
       volatile memory, this parameter may be used to indicate which firmware
       image should be written.  The value of NAME should match the component
       names from devlink dev info and may be driver-dependent.

This is currently not used in devlink capable drivers. It is a concept
taken from ethtool (I think you were the one that requested to take it,
but my memory could be wrong).
Anyway, the default is component NULL. In case of mlxsw, in that case
the target is ASIC FW.

Now I just want to use this component name to target individual line
cards. I see it is a nice fit. Don't you think?

I see that the manpage is mentioning "the component names from devlink dev info"
which is not actually implemented, but exactly what I proposed.


>
>> >What's inside mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2? Doesn't
>> >sound like it's FW just for a single gearbox?
>
>Please answer questions. I already complained about this once in 
>this thread.

Sorry, I missed this one. The file IS a FW just for a SINGLE gearbox.
Jakub Kicinski May 2, 2022, 2:39 p.m. UTC | #26
On Sat, 30 Apr 2022 08:27:35 +0200 Jiri Pirko wrote:
> Now I just want to use this component name to target individual line
> cards. I see it is a nice fit. Don't you think?

Still on the fence.

> I see that the manpage is mentioning "the component names from devlink dev info"
> which is not actually implemented, but exactly what I proposed.

How do you tie the line card to the component name? lc8_dev0 from 
the flashing example is not present in the lc info output.

> >Please answer questions. I already complained about this once in 
> >this thread.  
> 
> Sorry, I missed this one. The file IS a FW just for a SINGLE gearbox.

I see.
Jiri Pirko May 23, 2022, 9:42 a.m. UTC | #27
First of all, sorry for delay :/


Mon, May 02, 2022 at 04:39:33PM CEST, kuba@kernel.org wrote:
>On Sat, 30 Apr 2022 08:27:35 +0200 Jiri Pirko wrote:
>> Now I just want to use this component name to target individual line
>> cards. I see it is a nice fit. Don't you think?
>
>Still on the fence.

Why?

>
>> I see that the manpage is mentioning "the component names from devlink dev info"
>> which is not actually implemented, but exactly what I proposed.
>
>How do you tie the line card to the component name? lc8_dev0 from 
>the flashing example is not present in the lc info output.

Okay, I will move it there. Makes sense.


>
>> >Please answer questions. I already complained about this once in 
>> >this thread.  
>> 
>> Sorry, I missed this one. The file IS a FW just for a SINGLE gearbox.
>
>I see.
Jakub Kicinski May 23, 2022, 5:56 p.m. UTC | #28
On Mon, 23 May 2022 11:42:07 +0200 Jiri Pirko wrote:
> Mon, May 02, 2022 at 04:39:33PM CEST, kuba@kernel.org wrote:
> >On Sat, 30 Apr 2022 08:27:35 +0200 Jiri Pirko wrote:  
> >> Now I just want to use this component name to target individual line
> >> cards. I see it is a nice fit. Don't you think?  
> >
> >Still on the fence.  
> 
> Why?

IIRC my concern was mixing objects. We have component name coming from
lc info, but then use it in dev flash.

> >> I see that the manpage is mentioning "the component names from devlink dev info"
> >> which is not actually implemented, but exactly what I proposed.  
> >
> >How do you tie the line card to the component name? lc8_dev0 from 
> >the flashing example is not present in the lc info output.  
> 
> Okay, I will move it there. Makes sense.

FWIW I think I meant my comment as a way to underline that what you
argue for is not what's implemented (assuming your "not actually
implemented" referred to the flashing). I was trying to send you back 
to the drawing board rather than break open a box of band-aides.
Jiri Pirko May 24, 2022, 6:46 a.m. UTC | #29
Mon, May 23, 2022 at 07:56:40PM CEST, kuba@kernel.org wrote:
>On Mon, 23 May 2022 11:42:07 +0200 Jiri Pirko wrote:
>> Mon, May 02, 2022 at 04:39:33PM CEST, kuba@kernel.org wrote:
>> >On Sat, 30 Apr 2022 08:27:35 +0200 Jiri Pirko wrote:  
>> >> Now I just want to use this component name to target individual line
>> >> cards. I see it is a nice fit. Don't you think?  
>> >
>> >Still on the fence.  
>> 
>> Why?
>
>IIRC my concern was mixing objects. We have component name coming from
>lc info, but then use it in dev flash.

Sure. I considered that. The thing is, even if you put the lc component
names to output of "devlink dev info", you would need to provide lc
objects as well (somehow) - to contain the versions.

But the component name is related to lc object listed in "devlink lc",
so "devlink lc info" sounds line the correct place to put it.

If you are concern about "devlink dev flash" using component name from
"devlink lc info", I would rather introduce "devlink lc flash" so you
have a match. But from what I see, I don't really see the necessity for
this match. Do you?


>
>> >> I see that the manpage is mentioning "the component names from devlink dev info"
>> >> which is not actually implemented, but exactly what I proposed.  
>> >
>> >How do you tie the line card to the component name? lc8_dev0 from 
>> >the flashing example is not present in the lc info output.  
>> 
>> Okay, I will move it there. Makes sense.
>
>FWIW I think I meant my comment as a way to underline that what you
>argue for is not what's implemented (assuming your "not actually
>implemented" referred to the flashing). I was trying to send you back 
>to the drawing board rather than break open a box of band-aides.

Sure, lets do this right, I don't want to band-aide anything...
Jiri Pirko May 24, 2022, 2:31 p.m. UTC | #30
Tue, May 24, 2022 at 08:46:38AM CEST, jiri@resnulli.us wrote:
>Mon, May 23, 2022 at 07:56:40PM CEST, kuba@kernel.org wrote:
>>On Mon, 23 May 2022 11:42:07 +0200 Jiri Pirko wrote:
>>> Mon, May 02, 2022 at 04:39:33PM CEST, kuba@kernel.org wrote:
>>> >On Sat, 30 Apr 2022 08:27:35 +0200 Jiri Pirko wrote:  
>>> >> Now I just want to use this component name to target individual line
>>> >> cards. I see it is a nice fit. Don't you think?  
>>> >
>>> >Still on the fence.  
>>> 
>>> Why?
>>
>>IIRC my concern was mixing objects. We have component name coming from
>>lc info, but then use it in dev flash.
>
>Sure. I considered that. The thing is, even if you put the lc component
>names to output of "devlink dev info", you would need to provide lc
>objects as well (somehow) - to contain the versions.
>
>But the component name is related to lc object listed in "devlink lc",
>so "devlink lc info" sounds line the correct place to put it.
>
>If you are concern about "devlink dev flash" using component name from
>"devlink lc info", I would rather introduce "devlink lc flash" so you
>have a match. But from what I see, I don't really see the necessity for
>this match. Do you?

Okay, we can eventually avoid using component name at all for now,
considering one flash object per linecard (with possibility to extend by
component later on). This would look like:

$ devlink lc info pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8
    versions:
        fixed:
          hw.revision 0
	  fw.psid MT_0000000749
        running:
          ini.version 4
          fw 19.2010.1310

$ devlink lc flash pci/0000:01:00.0 lc 8 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2

I have to admit I like this.
We would reuse the existing DEVLINK_CMD_FLASH_UPDATE cmd and when
DEVLINK_ATTR_LINECARD_INDEX attribute is present, we call the lc-flash
op. How does this sound?


>
>
>>
>>> >> I see that the manpage is mentioning "the component names from devlink dev info"
>>> >> which is not actually implemented, but exactly what I proposed.  
>>> >
>>> >How do you tie the line card to the component name? lc8_dev0 from 
>>> >the flashing example is not present in the lc info output.  
>>> 
>>> Okay, I will move it there. Makes sense.
>>
>>FWIW I think I meant my comment as a way to underline that what you
>>argue for is not what's implemented (assuming your "not actually
>>implemented" referred to the flashing). I was trying to send you back 
>>to the drawing board rather than break open a box of band-aides.
>
>Sure, lets do this right, I don't want to band-aide anything...
Jakub Kicinski May 24, 2022, 6 p.m. UTC | #31
On Tue, 24 May 2022 16:31:45 +0200 Jiri Pirko wrote:
> >Sure. I considered that. The thing is, even if you put the lc component
> >names to output of "devlink dev info", you would need to provide lc
> >objects as well (somehow) - to contain the versions.
> >
> >But the component name is related to lc object listed in "devlink lc",
> >so "devlink lc info" sounds line the correct place to put it.
> >
> >If you are concern about "devlink dev flash" using component name from
> >"devlink lc info", I would rather introduce "devlink lc flash" so you
> >have a match. But from what I see, I don't really see the necessity for
> >this match. Do you?  
> 
> Okay, we can eventually avoid using component name at all for now,
> considering one flash object per linecard (with possibility to extend by
> component later on). This would look like:
> 
> $ devlink lc info pci/0000:01:00.0 lc 8
> pci/0000:01:00.0:
>   lc 8
>     versions:
>         fixed:
>           hw.revision 0
> 	  fw.psid MT_0000000749
>         running:
>           ini.version 4
>           fw 19.2010.1310
> 
> $ devlink lc flash pci/0000:01:00.0 lc 8 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
> 
> I have to admit I like this.
> We would reuse the existing DEVLINK_CMD_FLASH_UPDATE cmd and when
> DEVLINK_ATTR_LINECARD_INDEX attribute is present, we call the lc-flash
> op. How does this sound?

We talked about this earlier in the thread, I think. If you need both
info and flash per LC just make them a separate devlink instance and
let them have all the objects they need. Then just put the instance
name under lc info.
Jiri Pirko May 25, 2022, 6:20 a.m. UTC | #32
Tue, May 24, 2022 at 08:00:57PM CEST, kuba@kernel.org wrote:
>On Tue, 24 May 2022 16:31:45 +0200 Jiri Pirko wrote:
>> >Sure. I considered that. The thing is, even if you put the lc component
>> >names to output of "devlink dev info", you would need to provide lc
>> >objects as well (somehow) - to contain the versions.
>> >
>> >But the component name is related to lc object listed in "devlink lc",
>> >so "devlink lc info" sounds line the correct place to put it.
>> >
>> >If you are concern about "devlink dev flash" using component name from
>> >"devlink lc info", I would rather introduce "devlink lc flash" so you
>> >have a match. But from what I see, I don't really see the necessity for
>> >this match. Do you?  
>> 
>> Okay, we can eventually avoid using component name at all for now,
>> considering one flash object per linecard (with possibility to extend by
>> component later on). This would look like:
>> 
>> $ devlink lc info pci/0000:01:00.0 lc 8
>> pci/0000:01:00.0:
>>   lc 8
>>     versions:
>>         fixed:
>>           hw.revision 0
>> 	  fw.psid MT_0000000749
>>         running:
>>           ini.version 4
>>           fw 19.2010.1310
>> 
>> $ devlink lc flash pci/0000:01:00.0 lc 8 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>> 
>> I have to admit I like this.
>> We would reuse the existing DEVLINK_CMD_FLASH_UPDATE cmd and when
>> DEVLINK_ATTR_LINECARD_INDEX attribute is present, we call the lc-flash
>> op. How does this sound?
>
>We talked about this earlier in the thread, I think. If you need both
>info and flash per LC just make them a separate devlink instance and
>let them have all the objects they need. Then just put the instance
>name under lc info.

I don't follow :/ What do you mean be "separate devlink instance" here?
Could you draw me an example?
Jakub Kicinski May 25, 2022, 3:50 p.m. UTC | #33
On Wed, 25 May 2022 08:20:45 +0200 Jiri Pirko wrote:
> >We talked about this earlier in the thread, I think. If you need both
> >info and flash per LC just make them a separate devlink instance and
> >let them have all the objects they need. Then just put the instance
> >name under lc info.  
> 
> I don't follow :/ What do you mean be "separate devlink instance" here?
> Could you draw me an example?

Separate instance:

	for (i = 0; i < sw->num_lcs; i++) {
		devlink_register(&sw->lc_dl[i]);
		devlink_line_card_link(&sw->lc[i], &sw->lc_dl[i]);
	}

then report that under the linecard

	nla_nest_start(msg, DEVLINK_SUBORDINATE_INSTANCE);
	devlink_nl_put_handle(msg, lc->devlink);
	nla_nest_end(msg...)

then user can update the linecard like any devlink instance, switch,
NIC etc. It's better code reuse and I don't see any downside, TBH.
Jiri Pirko May 26, 2022, 9:05 a.m. UTC | #34
Wed, May 25, 2022 at 05:50:54PM CEST, kuba@kernel.org wrote:
>On Wed, 25 May 2022 08:20:45 +0200 Jiri Pirko wrote:
>> >We talked about this earlier in the thread, I think. If you need both
>> >info and flash per LC just make them a separate devlink instance and
>> >let them have all the objects they need. Then just put the instance
>> >name under lc info.  
>> 
>> I don't follow :/ What do you mean be "separate devlink instance" here?
>> Could you draw me an example?
>
>Separate instance:
>
>	for (i = 0; i < sw->num_lcs; i++) {
>		devlink_register(&sw->lc_dl[i]);
>		devlink_line_card_link(&sw->lc[i], &sw->lc_dl[i]);
>	}
>
>then report that under the linecard
>
>	nla_nest_start(msg, DEVLINK_SUBORDINATE_INSTANCE);
>	devlink_nl_put_handle(msg, lc->devlink);
>	nla_nest_end(msg...)
>
>then user can update the linecard like any devlink instance, switch,
>NIC etc. It's better code reuse and I don't see any downside, TBH.

I think you already suggested something like that. What you mean is that
linecard would be modeled as any other devlink instance. Sorry, but that
does not make any sense to me at all :/ Should devlink port not be
devlink port and rather modeled as a separate devlink instance too? Same
to the rest of the objects we already have. Should we have a tree of
same devlink objects each overloaded to some specific type. If yes, that
is completely changing the devlink modelling.

The handle of the devlink object is bus_name/dev_name. In other words,
there is a device on a bus and for each you can create devlink instance.
Linecards is not on a any bus, is is not modeled in /sys. What would be
the handle?

What you suggest seems to me like completely broken :/

I don't understand the motivation :/ The only I can think of that you
will have the same "devlink dev flash" mechanism, but that's it.
I'm not sure I understand why we cannot resolve the flashing otherwise,
for example as I suggested with "devlink lc info" and "devlink lc
flash". What is wrong with that?
Jiri Pirko May 26, 2022, 10:47 a.m. UTC | #35
Thu, May 26, 2022 at 11:05:53AM CEST, jiri@resnulli.us wrote:
>Wed, May 25, 2022 at 05:50:54PM CEST, kuba@kernel.org wrote:
>>On Wed, 25 May 2022 08:20:45 +0200 Jiri Pirko wrote:
>>> >We talked about this earlier in the thread, I think. If you need both
>>> >info and flash per LC just make them a separate devlink instance and
>>> >let them have all the objects they need. Then just put the instance
>>> >name under lc info.  
>>> 
>>> I don't follow :/ What do you mean be "separate devlink instance" here?
>>> Could you draw me an example?
>>
>>Separate instance:
>>
>>	for (i = 0; i < sw->num_lcs; i++) {
>>		devlink_register(&sw->lc_dl[i]);
>>		devlink_line_card_link(&sw->lc[i], &sw->lc_dl[i]);
>>	}
>>
>>then report that under the linecard
>>
>>	nla_nest_start(msg, DEVLINK_SUBORDINATE_INSTANCE);
>>	devlink_nl_put_handle(msg, lc->devlink);
>>	nla_nest_end(msg...)
>>
>>then user can update the linecard like any devlink instance, switch,
>>NIC etc. It's better code reuse and I don't see any downside, TBH.
>
>I think you already suggested something like that. What you mean is that
>linecard would be modeled as any other devlink instance. Sorry, but that
>does not make any sense to me at all :/ Should devlink port not be
>devlink port and rather modeled as a separate devlink instance too? Same
>to the rest of the objects we already have. Should we have a tree of
>same devlink objects each overloaded to some specific type. If yes, that
>is completely changing the devlink modelling.
>
>The handle of the devlink object is bus_name/dev_name. In other words,
>there is a device on a bus and for each you can create devlink instance.
>Linecards is not on a any bus, is is not modeled in /sys. What would be
>the handle?
>
>What you suggest seems to me like completely broken :/

After some thinking I think I understand why you suggest this.
The versions in dev info and dev flash is there on devlink instance level,
so you would like to reuse it.

But it does not click. Having devlink instance just for this feels very
artificial, unfitting and heavy. IDK.


>
>I don't understand the motivation :/ The only I can think of that you
>will have the same "devlink dev flash" mechanism, but that's it.
>I'm not sure I understand why we cannot resolve the flashing otherwise,
>for example as I suggested with "devlink lc info" and "devlink lc
>flash". What is wrong with that?
Jiri Pirko May 26, 2022, 11:45 a.m. UTC | #36
Wed, May 25, 2022 at 05:50:54PM CEST, kuba@kernel.org wrote:
>On Wed, 25 May 2022 08:20:45 +0200 Jiri Pirko wrote:
>> >We talked about this earlier in the thread, I think. If you need both
>> >info and flash per LC just make them a separate devlink instance and
>> >let them have all the objects they need. Then just put the instance
>> >name under lc info.  
>> 
>> I don't follow :/ What do you mean be "separate devlink instance" here?
>> Could you draw me an example?
>
>Separate instance:
>
>	for (i = 0; i < sw->num_lcs; i++) {
>		devlink_register(&sw->lc_dl[i]);
>		devlink_line_card_link(&sw->lc[i], &sw->lc_dl[i]);
>	}
>
>then report that under the linecard
>
>	nla_nest_start(msg, DEVLINK_SUBORDINATE_INSTANCE);
>	devlink_nl_put_handle(msg, lc->devlink);
>	nla_nest_end(msg...)
>
>then user can update the linecard like any devlink instance, switch,
>NIC etc. It's better code reuse and I don't see any downside, TBH.

Okay, I was thinking about this a litle bit more, and I would like to
explore extending the components path. Exposing the components in
"devlink dev info" and then using them in "devlink dev flash". LC could
be just one of multiple potential users of components. Will send RFC
soon.
Jakub Kicinski May 26, 2022, 5:35 p.m. UTC | #37
On Thu, 26 May 2022 13:45:49 +0200 Jiri Pirko wrote:
> >Separate instance:
> >
> >	for (i = 0; i < sw->num_lcs; i++) {
> >		devlink_register(&sw->lc_dl[i]);
> >		devlink_line_card_link(&sw->lc[i], &sw->lc_dl[i]);
> >	}
> >
> >then report that under the linecard
> >
> >	nla_nest_start(msg, DEVLINK_SUBORDINATE_INSTANCE);
> >	devlink_nl_put_handle(msg, lc->devlink);
> >	nla_nest_end(msg...)
> >
> >then user can update the linecard like any devlink instance, switch,
> >NIC etc. It's better code reuse and I don't see any downside, TBH.  
> 
> Okay, I was thinking about this a litle bit more, and I would like to
> explore extending the components path. Exposing the components in
> "devlink dev info" and then using them in "devlink dev flash". LC could
> be just one of multiple potential users of components. Will send RFC
> soon.

Feel free to send a mockup of the devlink user space outputs.
The core code for devlink is just meaningless marshaling anyway.
Jiri Pirko May 27, 2022, 7:27 a.m. UTC | #38
Thu, May 26, 2022 at 07:35:39PM CEST, kuba@kernel.org wrote:
>On Thu, 26 May 2022 13:45:49 +0200 Jiri Pirko wrote:
>> >Separate instance:
>> >
>> >	for (i = 0; i < sw->num_lcs; i++) {
>> >		devlink_register(&sw->lc_dl[i]);
>> >		devlink_line_card_link(&sw->lc[i], &sw->lc_dl[i]);
>> >	}
>> >
>> >then report that under the linecard
>> >
>> >	nla_nest_start(msg, DEVLINK_SUBORDINATE_INSTANCE);
>> >	devlink_nl_put_handle(msg, lc->devlink);
>> >	nla_nest_end(msg...)
>> >
>> >then user can update the linecard like any devlink instance, switch,
>> >NIC etc. It's better code reuse and I don't see any downside, TBH.  
>> 
>> Okay, I was thinking about this a litle bit more, and I would like to
>> explore extending the components path. Exposing the components in
>> "devlink dev info" and then using them in "devlink dev flash". LC could
>> be just one of multiple potential users of components. Will send RFC
>> soon.
>
>Feel free to send a mockup of the devlink user space outputs.
>The core code for devlink is just meaningless marshaling anyway.

Okay. So the output of devlink dev info would be extended by
"components" nest. This nest would carry array of components which
contain versions. The name of the component is openin each array member
nest:

$ devlink dev info
pci/0000:01:00.0:
  driver mlxsw_spectrum2
  versions:
      fixed:
        hw.revision A0
        fw.psid MT_0000000199
      running:
        fw.version 29.2010.2302
        fw 29.2010.2302
  components:
    lc1:
      versions:
        fixed:
          hw.revision 0
          fw.psid MT_0000000111
        running:
          fw 19.2010.1310
          ini.version 4
    lc2:
      versions:
        fixed:
          hw.revision 0
          fw.psid MT_0000000111
        running:
          fw 19.2010.1310
          ini.version 4
    someothercomponentname:
      versions:
        running:
	   fw: 888

Now on top of exsisting "devlink dev flash" cmd without component, user
may specify the component name from the array above:

$ devlink dev flash pci/0000:01:00.0 component lc1 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2

$ devlink dev flash pci/0000:01:00.0 component someothercomponentname file foo.bin

Note this is generic vehicle, line cards would benefit but it is usable
for multiple ASIC FW partitions for example.

Note that on "devlink dev flash" there is no change. This is implemented
currently. Only "devlink dev info" is extended to show the component
list.
Jakub Kicinski May 28, 2022, 12:10 a.m. UTC | #39
On Fri, 27 May 2022 09:27:47 +0200 Jiri Pirko wrote:
> Okay. So the output of devlink dev info would be extended by
> "components" nest. This nest would carry array of components which
> contain versions. The name of the component is openin each array member
> nest:
> 
> $ devlink dev info
> pci/0000:01:00.0:
>   driver mlxsw_spectrum2
>   versions:
>       fixed:
>         hw.revision A0
>         fw.psid MT_0000000199
>       running:
>         fw.version 29.2010.2302
>         fw 29.2010.2302
>   components:
>     lc1:

Is the "lc1" free-form or generated by the core based on subobjects?
Is it carried as a string or object type + id?

I guess my suggestion of a CLI mockup has proven its weakness :)

>       versions:
>         fixed:
>           hw.revision 0
>           fw.psid MT_0000000111
>         running:
>           fw 19.2010.1310
>           ini.version 4
>     lc2:
>       versions:
>         fixed:
>           hw.revision 0
>           fw.psid MT_0000000111
>         running:
>           fw 19.2010.1310
>           ini.version 4
>     someothercomponentname:
>       versions:
>         running:
> 	   fw: 888
> 
> Now on top of exsisting "devlink dev flash" cmd without component, user
> may specify the component name from the array above:
> 
> $ devlink dev flash pci/0000:01:00.0 component lc1 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
> 
> $ devlink dev flash pci/0000:01:00.0 component someothercomponentname file foo.bin
> 
> Note this is generic vehicle, line cards would benefit but it is usable
> for multiple ASIC FW partitions for example.
> 
> Note that on "devlink dev flash" there is no change. This is implemented
> currently. Only "devlink dev info" is extended to show the component
> list.

I sort of assumed that the DEVLINK_ATTR_INFO_VERSION_NAME is the
component, the docs also use the word "component" for it. 

For the nfp for instance we had "fw.app" for the datapath microcode and
"fw.mgmt" for the control processor. These are separate partitions on
the flash. I don't think we ever implemented writing them separately
but it's certainly was our internal plan at some point.
Jiri Pirko May 28, 2022, 9:09 a.m. UTC | #40
Sat, May 28, 2022 at 02:10:38AM CEST, kuba@kernel.org wrote:
>On Fri, 27 May 2022 09:27:47 +0200 Jiri Pirko wrote:
>> Okay. So the output of devlink dev info would be extended by
>> "components" nest. This nest would carry array of components which
>> contain versions. The name of the component is openin each array member
>> nest:
>> 
>> $ devlink dev info
>> pci/0000:01:00.0:
>>   driver mlxsw_spectrum2
>>   versions:
>>       fixed:
>>         hw.revision A0
>>         fw.psid MT_0000000199
>>       running:
>>         fw.version 29.2010.2302
>>         fw 29.2010.2302
>>   components:
>>     lc1:
>
>Is the "lc1" free-form or generated by the core based on subobjects?
>Is it carried as a string or object type + id?

It could be both:
1) for line cards I plan to have a helper to have this generated by core
2) for other FW objects, it is up to the driver.


>
>I guess my suggestion of a CLI mockup has proven its weakness :)

I'm not sure I understand what you mean by this sentence. Could you
please be more blunt? You know, my english is not so good to understand
some hidden meanings :)



>
>>       versions:
>>         fixed:
>>           hw.revision 0
>>           fw.psid MT_0000000111
>>         running:
>>           fw 19.2010.1310
>>           ini.version 4
>>     lc2:
>>       versions:
>>         fixed:
>>           hw.revision 0
>>           fw.psid MT_0000000111
>>         running:
>>           fw 19.2010.1310
>>           ini.version 4
>>     someothercomponentname:
>>       versions:
>>         running:
>> 	   fw: 888
>> 
>> Now on top of exsisting "devlink dev flash" cmd without component, user
>> may specify the component name from the array above:
>> 
>> $ devlink dev flash pci/0000:01:00.0 component lc1 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>> 
>> $ devlink dev flash pci/0000:01:00.0 component someothercomponentname file foo.bin
>> 
>> Note this is generic vehicle, line cards would benefit but it is usable
>> for multiple ASIC FW partitions for example.
>> 
>> Note that on "devlink dev flash" there is no change. This is implemented
>> currently. Only "devlink dev info" is extended to show the component
>> list.
>
>I sort of assumed that the DEVLINK_ATTR_INFO_VERSION_NAME is the
>component, the docs also use the word "component" for it. 

Okay, that I didn't see.

>
>For the nfp for instance we had "fw.app" for the datapath microcode and
>"fw.mgmt" for the control processor. These are separate partitions on
>the flash. I don't think we ever implemented writing them separately
>but it's certainly was our internal plan at some point.

Okay, so what you say it, we already have components in "devlink dev
info". Like you pointed out as an example:
  fw.app
  fw.mgmt
so the flash comment would be:
  devlink dev flash pci/0000:01:00.0 component fw.app file foo.bin
  devlink dev flash pci/0000:01:00.0 component fw.mgmt file bar.bin
?

If yes, what should be the default in case component is not defined? Do
we need to expose it in "devlink dev info"? How?

So to extend this existing facility with my line card example, we would
have:

$ devlink dev info
pci/0000:01:00.0:
   driver mlxsw_spectrum2
   versions:
       fixed:
         hw.revision A0
         fw.psid MT_0000000199
	 lc1.hw.revision 0
	 lc1.fw.psid MT_0000000111
	 lc2.hw.revision 0
	 lc2.fw.psid MT_0000000111
       running:
         fw.version 29.2010.2302
         fw 29.2010.2302
	 lc1.fw 19.2010.1310
	 lc1.ini.version 4
	 lc2.fw 19.2010.1310
	 lc2.ini.version 4

And then:
devlink dev flash pci/0000:01:00.0 component lc1.fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2

Does this sound correct?

Also, to avoid free-form, I can imagine to have per-linecard info_get() op
which would be called for each line card from devlink_nl_info_fill() and
prefix the "lcX" automatically without driver being involved.

Sounds good?

Thanks!
David Ahern May 28, 2022, 3:58 p.m. UTC | #41
On 5/24/22 8:31 AM, Jiri Pirko wrote:
> 
> $ devlink lc info pci/0000:01:00.0 lc 8
> pci/0000:01:00.0:

...

> 
> $ devlink lc flash pci/0000:01:00.0 lc 8 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2


A lot of your proposed syntax for devlink commands has 'lc' twice. If
'lc' is the subcommand, then you are managing a linecard making 'lc'
before the '8' redundant. How about 'slot 8' or something along those lines?
Jakub Kicinski May 28, 2022, 7:02 p.m. UTC | #42
On Sat, 28 May 2022 11:09:01 +0200 Jiri Pirko wrote:
> Sat, May 28, 2022 at 02:10:38AM CEST, kuba@kernel.org wrote:
> >
> >Is the "lc1" free-form or generated by the core based on subobjects?
> >Is it carried as a string or object type + id?  
> 
> It could be both:
> 1) for line cards I plan to have a helper to have this generated by core
> 2) for other FW objects, it is up to the driver.

Did you mean "either" or "both"?

> >I guess my suggestion of a CLI mockup has proven its weakness :)  
> 
> I'm not sure I understand what you mean by this sentence. Could you
> please be more blunt? You know, my english is not so good to understand
> some hidden meanings :)

The question of what kind of attribute "lc1" is carried in would had
been answered in posting of a code, while CLI mockup doesn't provide
such detail.

> >
> >I sort of assumed that the DEVLINK_ATTR_INFO_VERSION_NAME is the
> >component, the docs also use the word "component" for it.   
> 
> Okay, that I didn't see.
> 
> >
> >For the nfp for instance we had "fw.app" for the datapath microcode and
> >"fw.mgmt" for the control processor. These are separate partitions on
> >the flash. I don't think we ever implemented writing them separately
> >but it's certainly was our internal plan at some point.  
> 
> Okay, so what you say it, we already have components in "devlink dev
> info". Like you pointed out as an example:
>   fw.app
>   fw.mgmt
> so the flash comment would be:
>   devlink dev flash pci/0000:01:00.0 component fw.app file foo.bin
>   devlink dev flash pci/0000:01:00.0 component fw.mgmt file bar.bin
> ?

Correct.

> If yes, what should be the default in case component is not defined? Do
> we need to expose it in "devlink dev info"? How?

Not defined as in someone tries to flash component X but there is no
version for X in info?

> So to extend this existing facility with my line card example, we would
> have:
> 
> $ devlink dev info
> pci/0000:01:00.0:
>    driver mlxsw_spectrum2
>    versions:
>        fixed:
>          hw.revision A0
>          fw.psid MT_0000000199
> 	 lc1.hw.revision 0
> 	 lc1.fw.psid MT_0000000111
> 	 lc2.hw.revision 0
> 	 lc2.fw.psid MT_0000000111
>        running:
>          fw.version 29.2010.2302
>          fw 29.2010.2302
> 	 lc1.fw 19.2010.1310
> 	 lc1.ini.version 4
> 	 lc2.fw 19.2010.1310
> 	 lc2.ini.version 4
> 
> And then:
> devlink dev flash pci/0000:01:00.0 component lc1.fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
> 
> Does this sound correct?

I think I suggested something like that in the past, but back then 
I was assuming that lc FW would come from the same large FW bundle
file as the control plan FW, and we would not have to use the component.

Let's step back and look from the automation perspective again.
Assuming we don't want to hardcode matching "lc$i" there how can 
a generic FW update service scan the dev info and decide on what
dev flash command to fire off?

> Also, to avoid free-form, I can imagine to have per-linecard info_get() op
> which would be called for each line card from devlink_nl_info_fill() and
> prefix the "lcX" automatically without driver being involved.
> 
> Sounds good?

Hm. That's moving the matryoshka-ing of the objects from the uAPI level
to the internals. 

If we don't do the string prefix but instead pass the subobject info to
the user space as an attribute per version we can at least avoid
per-subobject commands (DEVLINK_CMD_LINECARD_INFO_GET). Much closer to
how health reporters are implemented than how params are done, so I
think it is a good direction.

We still need to iron out how the automation can go over the main FW
and sub-objects in a generic way.

I still think full devlink sub-instance is better because we will end
up needing params or health. Fake devices can be made with auxbus or
otherwise. But if you really don't want sub-instances we can explore 
the above.
Jiri Pirko May 29, 2022, 9:23 a.m. UTC | #43
Sat, May 28, 2022 at 09:02:53PM CEST, kuba@kernel.org wrote:
>On Sat, 28 May 2022 11:09:01 +0200 Jiri Pirko wrote:
>> Sat, May 28, 2022 at 02:10:38AM CEST, kuba@kernel.org wrote:
>> >
>> >Is the "lc1" free-form or generated by the core based on subobjects?
>> >Is it carried as a string or object type + id?  
>> 
>> It could be both:
>> 1) for line cards I plan to have a helper to have this generated by core
>> 2) for other FW objects, it is up to the driver.
>
>Did you mean "either" or "both"?

Both.


>
>> >I guess my suggestion of a CLI mockup has proven its weakness :)  
>> 
>> I'm not sure I understand what you mean by this sentence. Could you
>> please be more blunt? You know, my english is not so good to understand
>> some hidden meanings :)
>
>The question of what kind of attribute "lc1" is carried in would had
>been answered in posting of a code, while CLI mockup doesn't provide
>such detail.
>
>> >
>> >I sort of assumed that the DEVLINK_ATTR_INFO_VERSION_NAME is the
>> >component, the docs also use the word "component" for it.   
>> 
>> Okay, that I didn't see.
>> 
>> >
>> >For the nfp for instance we had "fw.app" for the datapath microcode and
>> >"fw.mgmt" for the control processor. These are separate partitions on
>> >the flash. I don't think we ever implemented writing them separately
>> >but it's certainly was our internal plan at some point.  
>> 
>> Okay, so what you say it, we already have components in "devlink dev
>> info". Like you pointed out as an example:
>>   fw.app
>>   fw.mgmt
>> so the flash comment would be:
>>   devlink dev flash pci/0000:01:00.0 component fw.app file foo.bin
>>   devlink dev flash pci/0000:01:00.0 component fw.mgmt file bar.bin
>> ?
>
>Correct.
>
>> If yes, what should be the default in case component is not defined? Do
>> we need to expose it in "devlink dev info"? How?
>
>Not defined as in someone tries to flash component X but there is no
>version for X in info?
>
>> So to extend this existing facility with my line card example, we would
>> have:
>> 
>> $ devlink dev info
>> pci/0000:01:00.0:
>>    driver mlxsw_spectrum2
>>    versions:
>>        fixed:
>>          hw.revision A0
>>          fw.psid MT_0000000199
>> 	 lc1.hw.revision 0
>> 	 lc1.fw.psid MT_0000000111
>> 	 lc2.hw.revision 0
>> 	 lc2.fw.psid MT_0000000111
>>        running:
>>          fw.version 29.2010.2302
>>          fw 29.2010.2302
>> 	 lc1.fw 19.2010.1310
>> 	 lc1.ini.version 4
>> 	 lc2.fw 19.2010.1310
>> 	 lc2.ini.version 4
>> 
>> And then:
>> devlink dev flash pci/0000:01:00.0 component lc1.fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>> 
>> Does this sound correct?
>
>I think I suggested something like that in the past, but back then 

Yes, you did.


>I was assuming that lc FW would come from the same large FW bundle
>file as the control plan FW, and we would not have to use the component.
>
>Let's step back and look from the automation perspective again.
>Assuming we don't want to hardcode matching "lc$i" there how can 
>a generic FW update service scan the dev info and decide on what
>dev flash command to fire off?

Hardcode matching lc$i? I don't follow. It is a part of the
version/component name.
So if devlink dev info outputs:
lc2.fw 19.2010.1310
then you use for devlink dev flash:
devlink dev flash pci/0000:01:00.0 component lc2.fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
Same name, same string.

What am I missing?



>
>> Also, to avoid free-form, I can imagine to have per-linecard info_get() op
>> which would be called for each line card from devlink_nl_info_fill() and
>> prefix the "lcX" automatically without driver being involved.
>> 
>> Sounds good?
>
>Hm. That's moving the matryoshka-ing of the objects from the uAPI level
>to the internals. 
>
>If we don't do the string prefix but instead pass the subobject info to
>the user space as an attribute per version we can at least avoid
>per-subobject commands (DEVLINK_CMD_LINECARD_INFO_GET). Much closer to
>how health reporters are implemented than how params are done, so I
>think it is a good direction.

Sorry, I'm a bit lost. Could you please provide some example about how
you envision it? For me it is a guessing game :/
My guess is you would like to add to the version nest where
DEVLINK_ATTR_INFO_VERSION_NAME resides for example
DEVLINK_ATTR_LINECARD_INDEX?

Correct?


>
>We still need to iron out how the automation can go over the main FW
>and sub-objects in a generic way.
>
>I still think full devlink sub-instance is better because we will end
>up needing params or health. Fake devices can be made with auxbus or
>otherwise. But if you really don't want sub-instances we can explore 
>the above.

I really don't.
Jiri Pirko May 29, 2022, 9:24 a.m. UTC | #44
Sat, May 28, 2022 at 05:58:56PM CEST, dsahern@gmail.com wrote:
>On 5/24/22 8:31 AM, Jiri Pirko wrote:
>> 
>> $ devlink lc info pci/0000:01:00.0 lc 8
>> pci/0000:01:00.0:
>
>...
>
>> 
>> $ devlink lc flash pci/0000:01:00.0 lc 8 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>
>
>A lot of your proposed syntax for devlink commands has 'lc' twice. If
>'lc' is the subcommand, then you are managing a linecard making 'lc'
>before the '8' redundant. How about 'slot 8' or something along those lines?

Well, there is 1:1 match between cmd line options and output, as always.

Object name is one thing, the option name is different. It is quite
common to name them both the same. I'm not sure I understand why it
would be an issue.
Jakub Kicinski May 30, 2022, 7:54 p.m. UTC | #45
On Sun, 29 May 2022 11:23:01 +0200 Jiri Pirko wrote:
> >Let's step back and look from the automation perspective again.
> >Assuming we don't want to hardcode matching "lc$i" there how can 
> >a generic FW update service scan the dev info and decide on what
> >dev flash command to fire off?  
> 
> Hardcode matching lc$i? I don't follow. It is a part of the
> version/component name.
> So if devlink dev info outputs:
> lc2.fw 19.2010.1310
> then you use for devlink dev flash:
> devlink dev flash pci/0000:01:00.0 component lc2.fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
> Same name, same string.
> 
> What am I missing?

Nevermind, I think we can iterate over all the groupings.
Since I hope you agreed that component has an established
meaning can we use group instead?

> >> Also, to avoid free-form, I can imagine to have per-linecard info_get() op
> >> which would be called for each line card from devlink_nl_info_fill() and
> >> prefix the "lcX" automatically without driver being involved.
> >> 
> >> Sounds good?  
> >
> >Hm. That's moving the matryoshka-ing of the objects from the uAPI level
> >to the internals. 
> >
> >If we don't do the string prefix but instead pass the subobject info to
> >the user space as an attribute per version we can at least avoid
> >per-subobject commands (DEVLINK_CMD_LINECARD_INFO_GET). Much closer to
> >how health reporters are implemented than how params are done, so I
> >think it is a good direction.  
> 
> Sorry, I'm a bit lost. Could you please provide some example about how
> you envision it? For me it is a guessing game :/
> My guess is you would like to add to the version nest where
> DEVLINK_ATTR_INFO_VERSION_NAME resides for example
> DEVLINK_ATTR_LINECARD_INDEX?
> 
> Correct?

Yup.
David Ahern May 31, 2022, 2:11 a.m. UTC | #46
On 5/29/22 3:24 AM, Jiri Pirko wrote:
> Sat, May 28, 2022 at 05:58:56PM CEST, dsahern@gmail.com wrote:
>> On 5/24/22 8:31 AM, Jiri Pirko wrote:
>>>
>>> $ devlink lc info pci/0000:01:00.0 lc 8
>>> pci/0000:01:00.0:
>>
>> ...
>>
>>>
>>> $ devlink lc flash pci/0000:01:00.0 lc 8 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>>
>>
>> A lot of your proposed syntax for devlink commands has 'lc' twice. If
>> 'lc' is the subcommand, then you are managing a linecard making 'lc'
>> before the '8' redundant. How about 'slot 8' or something along those lines?
> 
> Well, there is 1:1 match between cmd line options and output, as always.
> 
> Object name is one thing, the option name is different. It is quite
> common to name them both the same. I'm not sure I understand why it
> would be an issue.
> 

example? To me it says something is off in your model when you want to
use the same keyword twice in a command line.
Jiri Pirko May 31, 2022, 7:05 a.m. UTC | #47
Tue, May 31, 2022 at 04:11:16AM CEST, dsahern@gmail.com wrote:
>On 5/29/22 3:24 AM, Jiri Pirko wrote:
>> Sat, May 28, 2022 at 05:58:56PM CEST, dsahern@gmail.com wrote:
>>> On 5/24/22 8:31 AM, Jiri Pirko wrote:
>>>>
>>>> $ devlink lc info pci/0000:01:00.0 lc 8
>>>> pci/0000:01:00.0:
>>>
>>> ...
>>>
>>>>
>>>> $ devlink lc flash pci/0000:01:00.0 lc 8 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>>>
>>>
>>> A lot of your proposed syntax for devlink commands has 'lc' twice. If
>>> 'lc' is the subcommand, then you are managing a linecard making 'lc'
>>> before the '8' redundant. How about 'slot 8' or something along those lines?
>> 
>> Well, there is 1:1 match between cmd line options and output, as always.
>> 
>> Object name is one thing, the option name is different. It is quite
>> common to name them both the same. I'm not sure I understand why it
>> would be an issue.
>> 
>
>example? To me it says something is off in your model when you want to
>use the same keyword twice in a command line.

man devlink-trap
man devlink-sb
Jiri Pirko May 31, 2022, 7:11 a.m. UTC | #48
Mon, May 30, 2022 at 09:54:08PM CEST, kuba@kernel.org wrote:
>On Sun, 29 May 2022 11:23:01 +0200 Jiri Pirko wrote:
>> >Let's step back and look from the automation perspective again.
>> >Assuming we don't want to hardcode matching "lc$i" there how can 
>> >a generic FW update service scan the dev info and decide on what
>> >dev flash command to fire off?  
>> 
>> Hardcode matching lc$i? I don't follow. It is a part of the
>> version/component name.
>> So if devlink dev info outputs:
>> lc2.fw 19.2010.1310
>> then you use for devlink dev flash:
>> devlink dev flash pci/0000:01:00.0 component lc2.fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>> Same name, same string.
>> 
>> What am I missing?
>
>Nevermind, I think we can iterate over all the groupings.
>Since I hope you agreed that component has an established

Yeah, component=version. I will send a RFC soon that tights it together.

>meaning can we use group instead?

Group of what? Could you provide me example what you mean?


>
>> >> Also, to avoid free-form, I can imagine to have per-linecard info_get() op
>> >> which would be called for each line card from devlink_nl_info_fill() and
>> >> prefix the "lcX" automatically without driver being involved.
>> >> 
>> >> Sounds good?  
>> >
>> >Hm. That's moving the matryoshka-ing of the objects from the uAPI level
>> >to the internals. 
>> >
>> >If we don't do the string prefix but instead pass the subobject info to
>> >the user space as an attribute per version we can at least avoid
>> >per-subobject commands (DEVLINK_CMD_LINECARD_INFO_GET). Much closer to
>> >how health reporters are implemented than how params are done, so I
>> >think it is a good direction.  
>> 
>> Sorry, I'm a bit lost. Could you please provide some example about how
>> you envision it? For me it is a guessing game :/
>> My guess is you would like to add to the version nest where
>> DEVLINK_ATTR_INFO_VERSION_NAME resides for example
>> DEVLINK_ATTR_LINECARD_INDEX?
>> 
>> Correct?
>
>Yup.

Hmm, in that case, I'm not sure how to do this. As cmd options and       
outputs should match, we would have:                                     
                                                                         
devlink dev info                                                         
lc2.fw 19.2010.1310                                                      
                                                                         
here lc2 and fw are concatenated from DEVLINK_ATTR_LINECARD_INDEX and DEVLINK_ATTR_INFO_VERSION_NAME
                                                                         
Now on devlink dev flash side, when I pass "component lc2.fw", how could 
the "devlink dev flash" know to divide it to DEVLINK_ATTR_LINECARD_INDEX 
and FLASH_COMPONENT? Should I parse the cmd line option and figure the
"lcX." prefix into an attribute?
                                                                         
Or, we would have to have something like:                                    
devlink dev flash pci/0000:01:00.0 lc 2 component fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
                                                                         
But to be consistent with the output, we would have to change "devlink   
dev info" to something like:                                             
pci/0000:01:00.0:                                                        
  versions:                                                              
      running:                                                           
        fw 1.2.3                                                         
        fw.mgmt 10.20.30                                                 
        lc 2 fw 19.2010.1310                                             
                                                                         
But that would break the existing JSON output, because "running" is an array:
                "running": {                                             
                    "fw": "1.2.3",                                       
                    "fw.mgmt": "10.20.30"                                
                },                                                       

So probably better to stick to "lcx.y" notation in both devlink dev info
and flash and split/squash to attributes internally. What do you think?
Jakub Kicinski May 31, 2022, 3:05 p.m. UTC | #49
On Tue, 31 May 2022 09:11:27 +0200 Jiri Pirko wrote:
> >Nevermind, I think we can iterate over all the groupings.
> >Since I hope you agreed that component has an established  
> 
> Yeah, component=version. I will send a RFC soon that tights it together.
> 
> >meaning can we use group instead?  
> 
> Group of what? Could you provide me example what you mean?

Group of components. As explained component has an existing meaning,
we can't reuse the term with a different one now.

> >> Sorry, I'm a bit lost. Could you please provide some example about how
> >> you envision it? For me it is a guessing game :/
> >> My guess is you would like to add to the version nest where
> >> DEVLINK_ATTR_INFO_VERSION_NAME resides for example
> >> DEVLINK_ATTR_LINECARD_INDEX?
> >> 
> >> Correct?  
> >
> >Yup.  
> 
> Hmm, in that case, I'm not sure how to do this. As cmd options and       
> outputs should match, we would have:                                     
>                                                           
> devlink dev info                                                         
> lc2.fw 19.2010.1310                                                      
>                                                                          
> here lc2 and fw are concatenated from DEVLINK_ATTR_LINECARD_INDEX and DEVLINK_ATTR_INFO_VERSION_NAME

lc2 is the group name.
                                                     
> Now on devlink dev flash side, when I pass "component lc2.fw", how could 
> the "devlink dev flash" know to divide it to DEVLINK_ATTR_LINECARD_INDEX 
> and FLASH_COMPONENT? Should I parse the cmd line option and figure the
> "lcX." prefix into an attribute?
>                                                        
> Or, we would have to have something like:                                    
> devlink dev flash pci/0000:01:00.0 lc 2 component fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2

Yup, it'll make DaveA happy as well.

> But to be consistent with the output, we would have to change "devlink   
> dev info" to something like:                                             
> pci/0000:01:00.0:                                                        
>   versions:                                                              
>       running:                                                           
>         fw 1.2.3                                                         
>         fw.mgmt 10.20.30                                                 
>         lc 2 fw 19.2010.1310                                             

Yup.
                                                            
> But that would break the existing JSON output, because "running" is an array:
>                 "running": {                                             
>                     "fw": "1.2.3",                                       
>                     "fw.mgmt": "10.20.30"                                
>                 },                                                       

No, the lc versions should be in separate nests. Since they are not
updated when flashing main FW mixing them into existing versions would
break uAPI.

> So probably better to stick to "lcx.y" notation in both devlink dev info
> and flash and split/squash to attributes internally. What do you think?

BTW how do you intend to activate the new FW? Extend the reload command?
Jiri Pirko May 31, 2022, 3:51 p.m. UTC | #50
Tue, May 31, 2022 at 05:05:55PM CEST, kuba@kernel.org wrote:
>On Tue, 31 May 2022 09:11:27 +0200 Jiri Pirko wrote:
>> >Nevermind, I think we can iterate over all the groupings.
>> >Since I hope you agreed that component has an established  
>> 
>> Yeah, component=version. I will send a RFC soon that tights it together.
>> 
>> >meaning can we use group instead?  
>> 
>> Group of what? Could you provide me example what you mean?
>
>Group of components. As explained component has an existing meaning,
>we can't reuse the term with a different one now.

I still don't follow. I don't want to reuse it.
Really, could you be more specific and show examples, please?


>
>> >> Sorry, I'm a bit lost. Could you please provide some example about how
>> >> you envision it? For me it is a guessing game :/
>> >> My guess is you would like to add to the version nest where
>> >> DEVLINK_ATTR_INFO_VERSION_NAME resides for example
>> >> DEVLINK_ATTR_LINECARD_INDEX?
>> >> 
>> >> Correct?  
>> >
>> >Yup.  
>> 
>> Hmm, in that case, I'm not sure how to do this. As cmd options and       
>> outputs should match, we would have:                                     
>>                                                           
>> devlink dev info                                                         
>> lc2.fw 19.2010.1310                                                      
>>                                                                          
>> here lc2 and fw are concatenated from DEVLINK_ATTR_LINECARD_INDEX and DEVLINK_ATTR_INFO_VERSION_NAME
>
>lc2 is the group name.
>                                                     
>> Now on devlink dev flash side, when I pass "component lc2.fw", how could 
>> the "devlink dev flash" know to divide it to DEVLINK_ATTR_LINECARD_INDEX 
>> and FLASH_COMPONENT? Should I parse the cmd line option and figure the
>> "lcX." prefix into an attribute?
>>                                                        
>> Or, we would have to have something like:                                    
>> devlink dev flash pci/0000:01:00.0 lc 2 component fw file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
>
>Yup, it'll make DaveA happy as well.
>
>> But to be consistent with the output, we would have to change "devlink   
>> dev info" to something like:                                             
>> pci/0000:01:00.0:                                                        
>>   versions:                                                              
>>       running:                                                           
>>         fw 1.2.3                                                         
>>         fw.mgmt 10.20.30                                                 
>>         lc 2 fw 19.2010.1310                                             
>
>Yup.

Set, you say "yup" but below you say it should be in a separate nest.
That is confusing me.


>                                                            
>> But that would break the existing JSON output, because "running" is an array:
>>                 "running": {                                             
>>                     "fw": "1.2.3",                                       
>>                     "fw.mgmt": "10.20.30"                                
>>                 },                                                       
>
>No, the lc versions should be in separate nests. Since they are not
>updated when flashing main FW mixing them into existing versions would
>break uAPI.

Could you please draw it out for me exacly as you envision it? We are
dancing around it, I can't really understand what exactly do you mean.


>
>> So probably better to stick to "lcx.y" notation in both devlink dev info
>> and flash and split/squash to attributes internally. What do you think?
>
>BTW how do you intend to activate the new FW? Extend the reload command?

Not sure now. Extending reload is an option. Have to think about it.
Jakub Kicinski May 31, 2022, 4:08 p.m. UTC | #51
On Tue, 31 May 2022 17:51:36 +0200 Jiri Pirko wrote:
> Tue, May 31, 2022 at 05:05:55PM CEST, kuba@kernel.org wrote:
> >> Group of what? Could you provide me example what you mean?  
> >
> >Group of components. As explained component has an existing meaning,
> >we can't reuse the term with a different one now.  
> 
> I still don't follow. I don't want to reuse it.
> Really, could you be more specific and show examples, please?

What you had in your previous examples, just don't call it components
but come up with a new term:

$ devlink dev info
pci/0000:01:00.0:
  driver mlxsw_spectrum2
  versions:
      fixed:
        hw.revision A0
        fw.psid MT_0000000199
      running:
        fw.version 29.2010.2302
        fw 29.2010.2302
  groups? sections? subordinates? :                         <= here
    lc1:
      versions:
        fixed:
          hw.revision 0
          fw.psid MT_0000000111
        running:
          fw 19.2010.1310
          ini.version 4

Note that lc1 is not a nest at netlink level but user space can group
the attrs pretty trivially.

> >> But to be consistent with the output, we would have to change "devlink   
> >> dev info" to something like:                                             
> >> pci/0000:01:00.0:                                                        
> >>   versions:                                                              
> >>       running:                                                           
> >>         fw 1.2.3                                                         
> >>         fw.mgmt 10.20.30                                                 
> >>         lc 2 fw 19.2010.1310                                               
> >
> >Yup.  
> 
> Set, you say "yup" but below you say it should be in a separate nest.
> That is confusing me.

Ah, sorry. I hope the above is clear.
                                                  
> >> But that would break the existing JSON output, because "running" is an array:
> >>                 "running": {                                             
> >>                     "fw": "1.2.3",                                       
> >>                     "fw.mgmt": "10.20.30"                                
> >>                 },                                                         
> >
> >No, the lc versions should be in separate nests. Since they are not
> >updated when flashing main FW mixing them into existing versions would
> >break uAPI.  
> 
> Could you please draw it out for me exacly as you envision it? We are
> dancing around it, I can't really understand what exactly do you mean.

Why would I prototype your feature for you? I prefer a separate dl
instance. If you want to explore other options the "drawing out" is
on you :/

> >> So probably better to stick to "lcx.y" notation in both devlink dev info
> >> and flash and split/squash to attributes internally. What do you think?  
> >
> >BTW how do you intend to activate the new FW? Extend the reload command?  
> 
> Not sure now. Extending reload is an option. Have to think about it.


Jiri Pirko May 31, 2022, 7:34 p.m. UTC | #52
Tue, May 31, 2022 at 06:08:52PM CEST, kuba@kernel.org wrote:
>On Tue, 31 May 2022 17:51:36 +0200 Jiri Pirko wrote:
>> Tue, May 31, 2022 at 05:05:55PM CEST, kuba@kernel.org wrote:
>> >> Group of what? Could you provide me example what you mean?  
>> >
>> >Group of components. As explained component has an existing meaning,
>> >we can't reuse the term with a different one now.  
>> 
>> I still don't follow. I don't want to reuse it.
>> Really, could you be more specific and show examples, please?
>
>What you had in your previous examples, just don't call it components
>but come up with a new term:
>
>$ devlink dev info
>pci/0000:01:00.0:
>  driver mlxsw_spectrum2
>  versions:
>      fixed:
>        hw.revision A0
>        fw.psid MT_0000000199
>      running:
>        fw.version 29.2010.2302
>        fw 29.2010.2302
>  groups? sections? subordinates? :                         <= here
>    lc1:
>      versions:
>        fixed:
>          hw.revision 0
>          fw.psid MT_0000000111
>        running:
>          fw 19.2010.1310
>          ini.version 4
>
>Note that lc1 is not a nest at netlink level but user space can group
>the attrs pretty trivially.

Awesome! I think now it is clearer. To be in sync with devlink dev
flash cmd option, we have to have "lc 1" here, have to think how that
can be done.


>
>> >> But to be consistent with the output, we would have to change "devlink   
>> >> dev info" to something like:                                             
>> >> pci/0000:01:00.0:                                                        
>> >>   versions:                                                              
>> >>       running:                                                           
>> >>         fw 1.2.3                                                         
>> >>         fw.mgmt 10.20.30                                                 
>> >>         lc 2 fw 19.2010.1310                                               
>> >
>> >Yup.  
>> 
>> Set, you say "yup" but below you say it should be in a separate nest.
>> That is confusing me.
>
>Ah, sorry. I hope the above is clear.
>                                                  
>> >> But that would break the existing JSON output, because "running" is an array:
>> >>                 "running": {                                             
>> >>                     "fw": "1.2.3",                                       
>> >>                     "fw.mgmt": "10.20.30"                                
>> >>                 },                                                         
>> >
>> >No, the lc versions should be in separate nests. Since they are not
>> >updated when flashing main FW mixing them into existing versions would
>> >break uAPI.  
>> 
>> Could you please draw it out for me exacly as you envision it? We are
>> dancing around it, I can't really understand what exactly do you mean.
>
>Why would I prototype your feature for you? I prefer a separate dl
>instance. If you want to explore other options the "drawing out" is
>on you :/

Well, you are basically leading my arm when I'm drawing the thing. Looks
like you exactly know what you are looking for. That is why.
If you let me to the stuff my way, we would be already done.
You have to decide which way you want this.

And again, for the record, I strongly believe that a separate dl
instance for this does not make any sense at all :/ I wonder why you
still think it does.


>
>> >> So probably better to stick to "lcx.y" notation in both devlink dev info
>> >> and flash and split/squash to attributes internally. What do you think?  
>> >
>> >BTW how do you intend to activate the new FW? Extend the reload command?  
>> 
>> Not sure now. Extending reload is an option. Have to think about it.
>
>
Jakub Kicinski May 31, 2022, 10:41 p.m. UTC | #53
On Tue, 31 May 2022 21:34:42 +0200 Jiri Pirko wrote:
> And again, for the record, I strongly believe that a separate dl
> instance for this does not make any sense at all :/ I wonder why you
> still think it does.

For purely software reuse reasons. I think the line cards will require
a lot of the same attributes as the full devlink instance, so making
them a subobject which can have all the same attributes is poor SW arch.
Think about it from OOP perspective, you'd definitely factor all that
stuff out to an abstract class. We can't do that in netlink but whatever
just make it a full dl instance and describe the link between the two.

Most NIC vendors (everyone excluding Netronome?) decided that devlink
instance is equivalent to a bus device which IIUC it was not supposed
to be. It was supposed to be the whole ASIC. If we're okay to stretch
the definition of a dl instance to be "any independently controllable
unit of HW" for NICs then IDK why we can't make a line card a dl
instance.

Are you afraid of hiding dependencies?
Jiri Pirko June 1, 2022, 7:35 a.m. UTC | #54
Wed, Jun 01, 2022 at 12:41:59AM CEST, kuba@kernel.org wrote:
>On Tue, 31 May 2022 21:34:42 +0200 Jiri Pirko wrote:
>> And again, for the record, I strongly believe that a separate dl
>> instance for this does not make any sense at all :/ I wonder why you
>> still think it does.
>
>For purely software reuse reasons. I think the line cards will require
>a lot of the same attributes as the full devlink instance, so making
>them a subobject which can have all the same attributes is poor SW arch.

Sure, I understand the motivation.


>Think about it from OOP perspective, you'd definitely factor all that
>stuff out to an abstract class. We can't do that in netlink but whatever
>just make it a full dl instance and describe the link between the two.
>
>Most NIC vendors (everyone excluding Netronome?) decided that devlink
>instance is equivalent to a bus device which IIUC it was not supposed
>to be. It was supposed to be the whole ASIC. If we're okay to stretch

I agree, that is incorrect. That is why I was thinking about sort of
"alias" to make it right (2 PF devlink instances would be one connected
by alias). Not implemented yet though :/


>the definition of a dl instance to be "any independently controllable
>unit of HW" for NICs then IDK why we can't make a line card a dl
>instance.

Well, it is not independently controllable. Well, truth is, that in our
current implementation, there is one independent "configuration", and
that is flash burn of the gearbox. It is done using a "tunnelling"
register which encapsulates register communication what is done during
flash burning.


>
>Are you afraid of hiding dependencies?

Not really, I'm just not sure I see it is worth the excercise.

In czech, we have this saying: "kanon na vrabce". I think that the
following picture is better than any translation :)
https://i.iinfo.cz/images/72/shutterstock-com-kanon-delo-ptak-vrabec-strilet-1.jpg

Will think about it some more.