mbox series

[net-next,0/3] ethtool: Max power support

Message ID 20240329092321.16843-1-wojciech.drewek@intel.com (mailing list archive)
Headers show
Series ethtool: Max power support | expand

Message

Wojciech Drewek March 29, 2024, 9:23 a.m. UTC
Some ethernet modules use nonstandard power levels [1]. Extend ethtool
module implementation to support new attributes that will allow user
to change maximum power. Rename structures and functions to be more
generic. Introduce an example of the new API in ice driver.

Ethtool examples:
$ ethtool --show-module enp1s0f0np0
Module parameters for enp1s0f0np0:
power-min-allowed: 1000 mW
power-max-allowed: 3000 mW
power-max-set: 1500 mW

$ ethtool --set-module enp1s0f0np0 power-max-set 4000

This idea was originally discussed here [2]

[1] https://www.fs.com/de-en/products/69111.html
[2] https://lore.kernel.org/netdev/MW4PR11MB57768054635E8DEF841BB2A9FDE3A@MW4PR11MB5776.namprd11.prod.outlook.com/

Wojciech Drewek (3):
  ethtool: Make module API more generic
  ethtool: Introduce max power support
  ice: Implement ethtool max power configuration

 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  21 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  46 ++
 drivers/net/ethernet/intel/ice/ice_common.h   |   3 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  |  14 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  | 461 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_nvm.c      |   2 +-
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   3 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   4 +
 .../net/ethernet/mellanox/mlxsw/core_env.c    |   2 +-
 .../net/ethernet/mellanox/mlxsw/core_env.h    |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/minimal.c |   8 +-
 .../mellanox/mlxsw/spectrum_ethtool.c         |   8 +-
 include/linux/ethtool.h                       |  35 +-
 include/uapi/linux/ethtool_netlink.h          |   4 +
 net/ethtool/module.c                          | 102 +++-
 net/ethtool/netlink.h                         |   2 +-
 17 files changed, 669 insertions(+), 50 deletions(-)

Comments

Jakub Kicinski March 29, 2024, 10:16 p.m. UTC | #1
On Fri, 29 Mar 2024 10:23:18 +0100 Wojciech Drewek wrote:
> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
> module implementation to support new attributes that will allow user
> to change maximum power. Rename structures and functions to be more
> generic. Introduce an example of the new API in ice driver.

I'm no SFP expert but seems reasonable.

Would be good to insert more references to the SFP / CMIS specs
which describe the standard registers.

Also the series is suffering from lack of docs and spec, please
update both:

  Documentation/networking/ethtool-netlink.rst
  Documentation/netlink/specs/ethtool.yaml
Andrew Lunn March 30, 2024, 9:57 p.m. UTC | #2
On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote:
> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
> module implementation to support new attributes that will allow user
> to change maximum power. Rename structures and functions to be more
> generic. Introduce an example of the new API in ice driver.
> 
> Ethtool examples:
> $ ethtool --show-module enp1s0f0np0
> Module parameters for enp1s0f0np0:
> power-min-allowed: 1000 mW
> power-max-allowed: 3000 mW
> power-max-set: 1500 mW
> 
> $ ethtool --set-module enp1s0f0np0 power-max-set 4000

We have had a device tree property for a long time:

  maximum-power-milliwatt:
    minimum: 1000
    default: 1000
    description:
      Maximum module power consumption Specifies the maximum power consumption
      allowable by a module in the slot, in milli-Watts. Presently, modules can
      be up to 1W, 1.5W or 2W.

Could you flip the name around to be consistent with DT?

> minimum-power-allowed: 1000 mW
> maximum-power-allowed: 3000 mW
> maximum-power-set: 1500 mW

Also, what does minimum-power-allowed actually tell us? Do you imagine
it will ever be below 1W because of bad board design? Do you have a
bad board design which does not allow 1W?

Also, this is about the board, the SFP cage, not the actual SFP
module?  Maybe the word cage needs to be in these names?

Do we want to be able to enumerate what the module itself supports?
If so, we need to include module in the name, to identify the numbers
are about the module, not the cage.

    Andrew
Wojciech Drewek April 2, 2024, 9:58 a.m. UTC | #3
On 29.03.2024 23:16, Jakub Kicinski wrote:
> On Fri, 29 Mar 2024 10:23:18 +0100 Wojciech Drewek wrote:
>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
>> module implementation to support new attributes that will allow user
>> to change maximum power. Rename structures and functions to be more
>> generic. Introduce an example of the new API in ice driver.
> 
> I'm no SFP expert but seems reasonable.
> 
> Would be good to insert more references to the SFP / CMIS specs
> which describe the standard registers.

Sure, I'll search for additional references.

> 
> Also the series is suffering from lack of docs and spec, please
> update both:
> 
>   Documentation/networking/ethtool-netlink.rst
>   Documentation/netlink/specs/ethtool.yaml


Sure thing, I'll write the documentation.
Wojciech Drewek April 2, 2024, 11:38 a.m. UTC | #4
On 30.03.2024 22:57, Andrew Lunn wrote:
> On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote:
>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
>> module implementation to support new attributes that will allow user
>> to change maximum power. Rename structures and functions to be more
>> generic. Introduce an example of the new API in ice driver.
>>
>> Ethtool examples:
>> $ ethtool --show-module enp1s0f0np0
>> Module parameters for enp1s0f0np0:
>> power-min-allowed: 1000 mW
>> power-max-allowed: 3000 mW
>> power-max-set: 1500 mW
>>
>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
> 
> We have had a device tree property for a long time:
> 
>   maximum-power-milliwatt:
>     minimum: 1000
>     default: 1000
>     description:
>       Maximum module power consumption Specifies the maximum power consumption
>       allowable by a module in the slot, in milli-Watts. Presently, modules can
>       be up to 1W, 1.5W or 2W.
> 
> Could you flip the name around to be consistent with DT?

Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) 

> 
>> minimum-power-allowed: 1000 mW
>> maximum-power-allowed: 3000 mW
>> maximum-power-set: 1500 mW
> 
> Also, what does minimum-power-allowed actually tell us? Do you imagine
> it will ever be below 1W because of bad board design? Do you have a
> bad board design which does not allow 1W?

Yes. in case of QSFP we don't support 1W, 1.5W is the minimum.
This parameter tells the user what is the lowest limit he can set.

> 
> Also, this is about the board, the SFP cage, not the actual SFP
> module?  Maybe the word cage needs to be in these names?

It's about cage. Thanks for bringing it to my attention because now I
see it might be misleading. I'm extending {set|show}-module command
but the changes are about max power in the cage. With that in mind
I agree that adding 'cage' to the names makes sense.

> 
> Do we want to be able to enumerate what the module itself supports?
> If so, we need to include module in the name, to identify the numbers
> are about the module, not the cage.

I hope that my previous paragraph answers this as well.

> 
>     Andrew
Jakub Kicinski April 2, 2024, 2:25 p.m. UTC | #5
On Tue, 2 Apr 2024 13:38:59 +0200 Wojciech Drewek wrote:
> > Also, this is about the board, the SFP cage, not the actual SFP
> > module?  Maybe the word cage needs to be in these names?  
> 
> It's about cage. Thanks for bringing it to my attention because now I
> see it might be misleading. I'm extending {set|show}-module command
> but the changes are about max power in the cage. With that in mind
> I agree that adding 'cage' to the names makes sense.

Noob question, what happens if you plug a module with higher power
needs into the cage?
Andrew Lunn April 2, 2024, 2:46 p.m. UTC | #6
On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote:
> 
> 
> On 30.03.2024 22:57, Andrew Lunn wrote:
> > On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote:
> >> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
> >> module implementation to support new attributes that will allow user
> >> to change maximum power. Rename structures and functions to be more
> >> generic. Introduce an example of the new API in ice driver.
> >>
> >> Ethtool examples:
> >> $ ethtool --show-module enp1s0f0np0
> >> Module parameters for enp1s0f0np0:
> >> power-min-allowed: 1000 mW
> >> power-max-allowed: 3000 mW
> >> power-max-set: 1500 mW
> >>
> >> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
> > 
> > We have had a device tree property for a long time:
> > 
> >   maximum-power-milliwatt:
> >     minimum: 1000
> >     default: 1000
> >     description:
> >       Maximum module power consumption Specifies the maximum power consumption
> >       allowable by a module in the slot, in milli-Watts. Presently, modules can
> >       be up to 1W, 1.5W or 2W.
> > 
> > Could you flip the name around to be consistent with DT?
> 
> Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) 

That is a DT thing. Helps make the units of an ABI obvious. However,
milliwatts is pretty standard with the kernel of user APIs, e.g. all
hwmon calls use milliwatts.

> >> minimum-power-allowed: 1000 mW
> >> maximum-power-allowed: 3000 mW
> >> maximum-power-set: 1500 mW
> > 
> > Also, what does minimum-power-allowed actually tell us? Do you imagine
> > it will ever be below 1W because of bad board design? Do you have a
> > bad board design which does not allow 1W?
> 
> Yes. in case of QSFP we don't support 1W, 1.5W is the minimum.

So if i plug in a 1W QSFP device, it will let the magic smoke out
because it is force fed 1.5W?

Looking at
https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table
7 it indicates different power budget classifications. Power level 1
is a Maximum power of 1.5W. So does your parameter represent this?  It
is the minimum maximum power? And your other parameter is the maximum
maximum power?

I agree with Jakub here, there needs to be documentation added
explaining in detail what these parameters mean, and ideally,
references to the specification.

Does

$ ethtool --set-module enp1s0f0np0 power-max-set 4000

actually talk to the SFP module and tell it the maximum power it can
consume. So in this case, it is not the cage, but the module?

Or is it talking to some entity which is managing the overall power
consumption of a number of cages, and asking it to allocate a maximum
of 4W to this cage. It might return an error message saying there is
no power budget left?

Or is it doing both?

Sorry to be picky, but at some point, somebody is going to want to
implement this in the Linux SFP driver, and we want a consistent
implementation cross different implementations.

	Andrew
Andrew Lunn April 2, 2024, 2:53 p.m. UTC | #7
On Tue, Apr 02, 2024 at 07:25:47AM -0700, Jakub Kicinski wrote:
> On Tue, 2 Apr 2024 13:38:59 +0200 Wojciech Drewek wrote:
> > > Also, this is about the board, the SFP cage, not the actual SFP
> > > module?  Maybe the word cage needs to be in these names?  
> > 
> > It's about cage. Thanks for bringing it to my attention because now I
> > see it might be misleading. I'm extending {set|show}-module command
> > but the changes are about max power in the cage. With that in mind
> > I agree that adding 'cage' to the names makes sense.
> 
> Noob question, what happens if you plug a module with higher power
> needs into the cage?

https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf

Section 3.2:

 It is recommended that the host, through the management interface,
 identify the power consumption class of the module before allowing the
 module to go into high power mode.

So it should start in lower power mode. Table 7 suggests the module
can assume 1.5W, since that is the lowest power level.

   Andrew
Jakub Kicinski April 2, 2024, 2:57 p.m. UTC | #8
On Tue, 2 Apr 2024 16:46:54 +0200 Andrew Lunn wrote:
> Looking at
> https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table
> 7 it indicates different power budget classifications. Power level 1
> is a Maximum power of 1.5W. So does your parameter represent this?  It
> is the minimum maximum power? And your other parameter is the maximum
> maximum power?
> 
> I agree with Jakub here, there needs to be documentation added
> explaining in detail what these parameters mean, and ideally,
> references to the specification.
> 
> Does
> 
> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
> 
> actually talk to the SFP module and tell it the maximum power it can
> consume. So in this case, it is not the cage, but the module?
> 
> Or is it talking to some entity which is managing the overall power
> consumption of a number of cages, and asking it to allocate a maximum
> of 4W to this cage. It might return an error message saying there is
> no power budget left?
> 
> Or is it doing both?
> 
> Sorry to be picky, but at some point, somebody is going to want to
> implement this in the Linux SFP driver, and we want a consistent
> implementation cross different implementations.

Or "guessing how things work" another way of putting this would be -
please go investigate what exactly the FW will do with these values.
Wojciech Drewek April 3, 2024, 1:18 p.m. UTC | #9
On 02.04.2024 16:46, Andrew Lunn wrote:
> On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote:
>>
>>
>> On 30.03.2024 22:57, Andrew Lunn wrote:
>>> On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote:
>>>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
>>>> module implementation to support new attributes that will allow user
>>>> to change maximum power. Rename structures and functions to be more
>>>> generic. Introduce an example of the new API in ice driver.
>>>>
>>>> Ethtool examples:
>>>> $ ethtool --show-module enp1s0f0np0
>>>> Module parameters for enp1s0f0np0:
>>>> power-min-allowed: 1000 mW
>>>> power-max-allowed: 3000 mW
>>>> power-max-set: 1500 mW
>>>>
>>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
>>>
>>> We have had a device tree property for a long time:
>>>
>>>   maximum-power-milliwatt:
>>>     minimum: 1000
>>>     default: 1000
>>>     description:
>>>       Maximum module power consumption Specifies the maximum power consumption
>>>       allowable by a module in the slot, in milli-Watts. Presently, modules can
>>>       be up to 1W, 1.5W or 2W.
>>>
>>> Could you flip the name around to be consistent with DT?
>>
>> Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) 
> 
> That is a DT thing. Helps make the units of an ABI obvious. However,
> milliwatts is pretty standard with the kernel of user APIs, e.g. all
> hwmon calls use milliwatts.
> 
>>>> minimum-power-allowed: 1000 mW
>>>> maximum-power-allowed: 3000 mW
>>>> maximum-power-set: 1500 mW
>>>
>>> Also, what does minimum-power-allowed actually tell us? Do you imagine
>>> it will ever be below 1W because of bad board design? Do you have a
>>> bad board design which does not allow 1W?
>>
>> Yes. in case of QSFP we don't support 1W, 1.5W is the minimum.
> 
> So if i plug in a 1W QSFP device, it will let the magic smoke out
> because it is force fed 1.5W?
> 
> Looking at
> https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table
> 7 it indicates different power budget classifications. Power level 1
> is a Maximum power of 1.5W. So does your parameter represent this?  It
> is the minimum maximum power? And your other parameter is the maximum
> maximum power?

Exactly as you described, minimum-power-allowed is in fact minimum value
which maximum-power-set can be set to (so minimum maximum). the other
parameter is maximim maximum.

> 
> I agree with Jakub here, there needs to be documentation added
> explaining in detail what these parameters mean, and ideally,
> references to the specification.

I completely agree, I'll include documentation in the next version.
I see now that those parameters might look confusing, minimum-power-allowed
is not true minimum in fact. We can try to came up with better names
but it might get silly (minimum-maximum-power) XD.

> 
> Does
> 
> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
> 
> actually talk to the SFP module and tell it the maximum power it can
> consume. So in this case, it is not the cage, but the module?

It does not work that way in ice example.

> 
> Or is it talking to some entity which is managing the overall power
> consumption of a number of cages, and asking it to allocate a maximum
> of 4W to this cage. It might return an error message saying there is
> no power budget left?

That's right, we talk to firmware to set those restrictions.
In the ice implementation, the driver is responsible for checking if the
overall board budget is not exceeded.

> 
> Or is it doing both?
> 
> Sorry to be picky, but at some point, somebody is going to want to
> implement this in the Linux SFP driver, and we want a consistent
> implementation cross different implementations.

No problem, I see that your points are totally valid.

> 
> 	Andrew
Andrew Lunn April 3, 2024, 1:40 p.m. UTC | #10
On Wed, Apr 03, 2024 at 03:18:44PM +0200, Wojciech Drewek wrote:
> 
> 
> On 02.04.2024 16:46, Andrew Lunn wrote:
> > On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote:
> >>
> >>
> >> On 30.03.2024 22:57, Andrew Lunn wrote:
> >>> On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote:
> >>>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
> >>>> module implementation to support new attributes that will allow user
> >>>> to change maximum power. Rename structures and functions to be more
> >>>> generic. Introduce an example of the new API in ice driver.
> >>>>
> >>>> Ethtool examples:
> >>>> $ ethtool --show-module enp1s0f0np0
> >>>> Module parameters for enp1s0f0np0:
> >>>> power-min-allowed: 1000 mW
> >>>> power-max-allowed: 3000 mW
> >>>> power-max-set: 1500 mW
> >>>>
> >>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
> >>>
> >>> We have had a device tree property for a long time:
> >>>
> >>>   maximum-power-milliwatt:
> >>>     minimum: 1000
> >>>     default: 1000
> >>>     description:
> >>>       Maximum module power consumption Specifies the maximum power consumption
> >>>       allowable by a module in the slot, in milli-Watts. Presently, modules can
> >>>       be up to 1W, 1.5W or 2W.
> >>>
> >>> Could you flip the name around to be consistent with DT?
> >>
> >> Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) 
> > 
> > That is a DT thing. Helps make the units of an ABI obvious. However,
> > milliwatts is pretty standard with the kernel of user APIs, e.g. all
> > hwmon calls use milliwatts.
> > 
> >>>> minimum-power-allowed: 1000 mW
> >>>> maximum-power-allowed: 3000 mW
> >>>> maximum-power-set: 1500 mW
> >>>
> >>> Also, what does minimum-power-allowed actually tell us? Do you imagine
> >>> it will ever be below 1W because of bad board design? Do you have a
> >>> bad board design which does not allow 1W?
> >>
> >> Yes. in case of QSFP we don't support 1W, 1.5W is the minimum.
> > 
> > So if i plug in a 1W QSFP device, it will let the magic smoke out
> > because it is force fed 1.5W?
> > 
> > Looking at
> > https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table
> > 7 it indicates different power budget classifications. Power level 1
> > is a Maximum power of 1.5W. So does your parameter represent this?  It
> > is the minimum maximum power? And your other parameter is the maximum
> > maximum power?
> 
> Exactly as you described, minimum-power-allowed is in fact minimum value
> which maximum-power-set can be set to (so minimum maximum). the other
> parameter is maximim maximum.

Table 7 in that document is titled "Power Budget Classification". So
how about

minimum-power-class-allowed: 1000 mW
maximum-power-class-allowed: 3000 mW

	Andrew
Andrew Lunn April 3, 2024, 1:49 p.m. UTC | #11
> > $ ethtool --set-module enp1s0f0np0 power-max-set 4000
> > 
> > actually talk to the SFP module and tell it the maximum power it can
> > consume. So in this case, it is not the cage, but the module?
> 
> It does not work that way in ice example.
> > 
> > Or is it talking to some entity which is managing the overall power
> > consumption of a number of cages, and asking it to allocate a maximum
> > of 4W to this cage. It might return an error message saying there is
> > no power budget left?
> 
> That's right, we talk to firmware to set those restrictions.
> In the ice implementation, the driver is responsible for checking if the
> overall board budget is not exceeded.

So i can get the board to agree that the cage can supply 3W to the
module, but how do i then tell the module this?

I would also suggest you don't focus too much on ICE. I find it better
to think about an abstract system. A board with a power supply to a
number of SFP cages, and some cages have modules in them. What does
the kAPI look like, the use cases for this abstract system.

    Andrew
Wojciech Drewek April 4, 2024, 12:21 p.m. UTC | #12
On 03.04.2024 15:40, Andrew Lunn wrote:
> On Wed, Apr 03, 2024 at 03:18:44PM +0200, Wojciech Drewek wrote:
>>
>>
>> On 02.04.2024 16:46, Andrew Lunn wrote:
>>> On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote:
>>>>
>>>>
>>>> On 30.03.2024 22:57, Andrew Lunn wrote:
>>>>> On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote:
>>>>>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool
>>>>>> module implementation to support new attributes that will allow user
>>>>>> to change maximum power. Rename structures and functions to be more
>>>>>> generic. Introduce an example of the new API in ice driver.
>>>>>>
>>>>>> Ethtool examples:
>>>>>> $ ethtool --show-module enp1s0f0np0
>>>>>> Module parameters for enp1s0f0np0:
>>>>>> power-min-allowed: 1000 mW
>>>>>> power-max-allowed: 3000 mW
>>>>>> power-max-set: 1500 mW
>>>>>>
>>>>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
>>>>>
>>>>> We have had a device tree property for a long time:
>>>>>
>>>>>   maximum-power-milliwatt:
>>>>>     minimum: 1000
>>>>>     default: 1000
>>>>>     description:
>>>>>       Maximum module power consumption Specifies the maximum power consumption
>>>>>       allowable by a module in the slot, in milli-Watts. Presently, modules can
>>>>>       be up to 1W, 1.5W or 2W.
>>>>>
>>>>> Could you flip the name around to be consistent with DT?
>>>>
>>>> Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) 
>>>
>>> That is a DT thing. Helps make the units of an ABI obvious. However,
>>> milliwatts is pretty standard with the kernel of user APIs, e.g. all
>>> hwmon calls use milliwatts.
>>>
>>>>>> minimum-power-allowed: 1000 mW
>>>>>> maximum-power-allowed: 3000 mW
>>>>>> maximum-power-set: 1500 mW
>>>>>
>>>>> Also, what does minimum-power-allowed actually tell us? Do you imagine
>>>>> it will ever be below 1W because of bad board design? Do you have a
>>>>> bad board design which does not allow 1W?
>>>>
>>>> Yes. in case of QSFP we don't support 1W, 1.5W is the minimum.
>>>
>>> So if i plug in a 1W QSFP device, it will let the magic smoke out
>>> because it is force fed 1.5W?
>>>
>>> Looking at
>>> https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table
>>> 7 it indicates different power budget classifications. Power level 1
>>> is a Maximum power of 1.5W. So does your parameter represent this?  It
>>> is the minimum maximum power? And your other parameter is the maximum
>>> maximum power?
>>
>> Exactly as you described, minimum-power-allowed is in fact minimum value
>> which maximum-power-set can be set to (so minimum maximum). the other
>> parameter is maximim maximum.
> 
> Table 7 in that document is titled "Power Budget Classification". So
> how about
> 
> minimum-power-class-allowed: 1000 mW
> maximum-power-class-allowed: 3000 mW

Sounds good

> 
> 	Andrew
Wojciech Drewek April 4, 2024, 12:45 p.m. UTC | #13
On 03.04.2024 15:49, Andrew Lunn wrote:
>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
>>>
>>> actually talk to the SFP module and tell it the maximum power it can
>>> consume. So in this case, it is not the cage, but the module?
>>
>> It does not work that way in ice example.
>>>
>>> Or is it talking to some entity which is managing the overall power
>>> consumption of a number of cages, and asking it to allocate a maximum
>>> of 4W to this cage. It might return an error message saying there is
>>> no power budget left?
>>
>> That's right, we talk to firmware to set those restrictions.
>> In the ice implementation, the driver is responsible for checking if the
>> overall board budget is not exceeded.
> 
> So i can get the board to agree that the cage can supply 3W to the
> module, but how do i then tell the module this?

I'd assume it is not possible, if the module consumes more power
than maximum than the link will not come up and error will be printed.

> 
> I would also suggest you don't focus too much on ICE. I find it better
> to think about an abstract system. A board with a power supply to a
> number of SFP cages, and some cages have modules in them. What does
> the kAPI look like, the use cases for this abstract system.

My design for this API is to have an option to get and set maximum
power that the module in the cage can consume. It's not about modifying
module's power consumption, it's about setting restrictions for it.

The use case is to let the user change maximum power in the given cage
(so he can plug in the module with higher power consumption). Before that
he will lower maximum power in different cage. Thanks to that the overall
budget for the board won't be exceeded. Does it make sense for the abstract
system you described?

> 
>     Andrew
> 
> 
>
Andrew Lunn April 4, 2024, 1:53 p.m. UTC | #14
On Thu, Apr 04, 2024 at 02:45:43PM +0200, Wojciech Drewek wrote:
> 
> 
> On 03.04.2024 15:49, Andrew Lunn wrote:
> >>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
> >>>
> >>> actually talk to the SFP module and tell it the maximum power it can
> >>> consume. So in this case, it is not the cage, but the module?
> >>
> >> It does not work that way in ice example.
> >>>
> >>> Or is it talking to some entity which is managing the overall power
> >>> consumption of a number of cages, and asking it to allocate a maximum
> >>> of 4W to this cage. It might return an error message saying there is
> >>> no power budget left?
> >>
> >> That's right, we talk to firmware to set those restrictions.
> >> In the ice implementation, the driver is responsible for checking if the
> >> overall board budget is not exceeded.
> > 
> > So i can get the board to agree that the cage can supply 3W to the
> > module, but how do i then tell the module this?
> 
> I'd assume it is not possible, if the module consumes more power
> than maximum than the link will not come up and error will be printed.

Take a look at the Linux SFP driver. In sfp_probe() is reads the DT
property maximum-power-milliwatt:

https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L3030

When the module is inserted and probed, the modules power capabilities
are read from the EEPROM:

https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L2320

https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1929

The code looks to see what conformance level the module has, so to
know if it even supports different power levels, and the registers
needed have been implemented.

Later, the SFP state machine will transition the module to higher
power:
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1995

by writing a register in the SFP.

> > I would also suggest you don't focus too much on ICE. I find it better
> > to think about an abstract system. A board with a power supply to a
> > number of SFP cages, and some cages have modules in them. What does
> > the kAPI look like, the use cases for this abstract system.
> 
> My design for this API is to have an option to get and set maximum
> power that the module in the cage can consume. It's not about modifying
> module's power consumption, it's about setting restrictions for it.
> 
> The use case is to let the user change maximum power in the given cage
> (so he can plug in the module with higher power consumption). Before that
> he will lower maximum power in different cage. Thanks to that the overall
> budget for the board won't be exceeded. Does it make sense for the abstract
> system you described?

So there are a few different phases here. The standard says the module
start up in low power mode.

Something needs to enumerate what the module actually supports in
terms of different power modes.

Something then needs to determine if the board/cage can support higher
power operation, that there is sufficient power budged. Budget then
needs to be allocated to the cage.

Lastly, the module needs to be told it can go to a higher power level.

I would say the current Linux SFP code is overly simple, by design. It
supports the concept of cage supplied by a dedicated regulator. There
is no sharing of power across a number of cages. Hence it just ups the
power if the board indicates the power is available and the module
support a higher power level.

However, you have a more complex setup, shared power supplies,
etc. The policy of what modules gets how much power should come from
user space. So we need user space APIs for this, and a clear
understanding of how they work. Please could you describe how i would
use ethtool to go through these phases. And how do i down grade a
modules power consumption to make more power available to another
module.

	Andrew
Wojciech Drewek April 9, 2024, 12:20 p.m. UTC | #15
On 04.04.2024 15:53, Andrew Lunn wrote:
> On Thu, Apr 04, 2024 at 02:45:43PM +0200, Wojciech Drewek wrote:
>>
>>
>> On 03.04.2024 15:49, Andrew Lunn wrote:
>>>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000
>>>>>
>>>>> actually talk to the SFP module and tell it the maximum power it can
>>>>> consume. So in this case, it is not the cage, but the module?
>>>>
>>>> It does not work that way in ice example.
>>>>>
>>>>> Or is it talking to some entity which is managing the overall power
>>>>> consumption of a number of cages, and asking it to allocate a maximum
>>>>> of 4W to this cage. It might return an error message saying there is
>>>>> no power budget left?
>>>>
>>>> That's right, we talk to firmware to set those restrictions.
>>>> In the ice implementation, the driver is responsible for checking if the
>>>> overall board budget is not exceeded.
>>>
>>> So i can get the board to agree that the cage can supply 3W to the
>>> module, but how do i then tell the module this?
>>
>> I'd assume it is not possible, if the module consumes more power
>> than maximum than the link will not come up and error will be printed.
> 
> Take a look at the Linux SFP driver. In sfp_probe() is reads the DT
> property maximum-power-milliwatt:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L3030
> 
> When the module is inserted and probed, the modules power capabilities
> are read from the EEPROM:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L2320
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1929
> 
> The code looks to see what conformance level the module has, so to
> know if it even supports different power levels, and the registers
> needed have been implemented.
> 
> Later, the SFP state machine will transition the module to higher
> power:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1995
> 
> by writing a register in the SFP.
> 
>>> I would also suggest you don't focus too much on ICE. I find it better
>>> to think about an abstract system. A board with a power supply to a
>>> number of SFP cages, and some cages have modules in them. What does
>>> the kAPI look like, the use cases for this abstract system.
>>
>> My design for this API is to have an option to get and set maximum
>> power that the module in the cage can consume. It's not about modifying
>> module's power consumption, it's about setting restrictions for it.
>>
>> The use case is to let the user change maximum power in the given cage
>> (so he can plug in the module with higher power consumption). Before that
>> he will lower maximum power in different cage. Thanks to that the overall
>> budget for the board won't be exceeded. Does it make sense for the abstract
>> system you described?
> 
> So there are a few different phases here. The standard says the module
> start up in low power mode.
> 
> Something needs to enumerate what the module actually supports in
> terms of different power modes.

So, this is something that would require new netlink attribute.
From what you described, SFP driver is capable of reading module's
power capabilities so it could be implemented there.

> 
> Something then needs to determine if the board/cage can support higher
> power operation, that there is sufficient power budged. Budget then
> needs to be allocated to the cage.

This is something my current design supports I think. Using
ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports
and change it.

> 
> Lastly, the module needs to be told it can go to a higher power level.

This could be done using ethtool_module_power_mode_policy I think.

> 
> I would say the current Linux SFP code is overly simple, by design. It
> supports the concept of cage supplied by a dedicated regulator. There
> is no sharing of power across a number of cages. Hence it just ups the
> power if the board indicates the power is available and the module
> support a higher power level.
> 
> However, you have a more complex setup, shared power supplies,
> etc. The policy of what modules gets how much power should come from
> user space. So we need user space APIs for this, and a clear
> understanding of how they work. Please could you describe how i would
> use ethtool to go through these phases. And how do i down grade a
> modules power consumption to make more power available to another
> module.

I described how I see it for each phase above. 
Downgrading module's power consumption is not possible with my initial
design. You can downgrade max power for the cage and if the module
plugged in to it consumes more power than error will be printed and
link will not be established.
In the end it looks like we would need new netlink attribute to support
modification of module's power consumption.

> 
> 	Andrew
> 
>
Andrew Lunn April 9, 2024, 1:39 p.m. UTC | #16
> This is something my current design supports I think. Using
> ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports
> and change it.
 
> This could be done using ethtool_module_power_mode_policy I think.

All these 'I think' don't give me a warm fuzzy feeling this is a well
thought out and designed uAPI.

I assume you have ethtool patches for your new netlink attributes. So
show us the real usage. Start with an SFP in its default lower power
mode. Show us the commands to display the current status. Allocate it
more power, tell the module it can use more power, and then show us
the status after the change has been made.

Then lower the power to that cage and assign the power to a different
cage.

This is something you can later reuse in the 0/X patch describing the
big picture of what the patchset does, and it will guide others who
want to implement the same API in the Linux SFP driver, or other MAC
drivers.

	Andrew
Wojciech Drewek April 12, 2024, 1:21 p.m. UTC | #17
On 09.04.2024 15:39, Andrew Lunn wrote:
>> This is something my current design supports I think. Using
>> ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports
>> and change it.
>  
>> This could be done using ethtool_module_power_mode_policy I think.
> 
> All these 'I think' don't give me a warm fuzzy feeling this is a well
> thought out and designed uAPI.
> 
> I assume you have ethtool patches for your new netlink attributes. So
> show us the real usage. Start with an SFP in its default lower power
> mode. Show us the commands to display the current status. Allocate it
> more power, tell the module it can use more power, and then show us
> the status after the change has been made.

Ok, but do we really need an API to switch the module between high/low power mode?
I'd assume this is done automatically e.g. when we lower max power in the cage below
module's high power mode than it should imply that the module should go to low power mode.
Same with the high power mode, if enough power is assigned to the cage than module
should go to the high power mode.

Regarding the current status and what module supports, there is -m option:
$ ethtool -m ens801f0np0
        Identifier                                : 0x0d (QSFP+)
        Extended identifier                       : 0x00
        Extended identifier description           : 1.5W max. Power consumption
        Extended identifier description           : No CDR in TX, No CDR in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled

> 
> Then lower the power to that cage and assign the power to a different
> cage.
> 
> This is something you can later reuse in the 0/X patch describing the
> big picture of what the patchset does, and it will guide others who
> want to implement the same API in the Linux SFP driver, or other MAC
> drivers.
> 
> 	Andrew
Andrew Lunn April 15, 2024, 10:03 p.m. UTC | #18
On Fri, Apr 12, 2024 at 03:21:24PM +0200, Wojciech Drewek wrote:
> 
> 
> On 09.04.2024 15:39, Andrew Lunn wrote:
> >> This is something my current design supports I think. Using
> >> ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports
> >> and change it.
> >  
> >> This could be done using ethtool_module_power_mode_policy I think.
> > 
> > All these 'I think' don't give me a warm fuzzy feeling this is a well
> > thought out and designed uAPI.
> > 
> > I assume you have ethtool patches for your new netlink attributes. So
> > show us the real usage. Start with an SFP in its default lower power
> > mode. Show us the commands to display the current status. Allocate it
> > more power, tell the module it can use more power, and then show us
> > the status after the change has been made.
> 
> Ok, but do we really need an API to switch the module between high/low power mode?

Probably not. But you need to document that the API you are adding is
also expected to talk to the module and tell it to use more/less
power.

> Regarding the current status and what module supports, there is -m option:
> $ ethtool -m ens801f0np0
>         Identifier                                : 0x0d (QSFP+)
>         Extended identifier                       : 0x00
>         Extended identifier description           : 1.5W max. Power consumption
>         Extended identifier description           : No CDR in TX, No CDR in RX
>         Extended identifier description           : High Power Class (> 3.5 W) not enabled

So you can make this part of your commit message. Show this. Invoke
your new ethtool option, then show this again with the module
reporting a higher power consumption. The reduce the power using
ethtool and show the power consumption has reduced.

Also, in the ethtool-netlink.rst file, clearly document what the API
is doing, so that somebody else can implement it for another device.

Please also document hotplug behaviour. Say I use your new API to
increase the power to 3.5W. I then eject the module. Does the
available power automatically get put back into the pool? When i
reinsert the module, it will be in low power class, and i need to
issue the ethtool command again to increase its power?

   Andrew
Wojciech Drewek April 18, 2024, 11:48 a.m. UTC | #19
On 16.04.2024 00:03, Andrew Lunn wrote:
> On Fri, Apr 12, 2024 at 03:21:24PM +0200, Wojciech Drewek wrote:
>>
>>
>> On 09.04.2024 15:39, Andrew Lunn wrote:
>>>> This is something my current design supports I think. Using
>>>> ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports
>>>> and change it.
>>>  
>>>> This could be done using ethtool_module_power_mode_policy I think.
>>>
>>> All these 'I think' don't give me a warm fuzzy feeling this is a well
>>> thought out and designed uAPI.
>>>
>>> I assume you have ethtool patches for your new netlink attributes. So
>>> show us the real usage. Start with an SFP in its default lower power
>>> mode. Show us the commands to display the current status. Allocate it
>>> more power, tell the module it can use more power, and then show us
>>> the status after the change has been made.
>>
>> Ok, but do we really need an API to switch the module between high/low power mode?
> 
> Probably not. But you need to document that the API you are adding is
> also expected to talk to the module and tell it to use more/less
> power.
> 
>> Regarding the current status and what module supports, there is -m option:
>> $ ethtool -m ens801f0np0
>>         Identifier                                : 0x0d (QSFP+)
>>         Extended identifier                       : 0x00
>>         Extended identifier description           : 1.5W max. Power consumption
>>         Extended identifier description           : No CDR in TX, No CDR in RX
>>         Extended identifier description           : High Power Class (> 3.5 W) not enabled
> 
> So you can make this part of your commit message. Show this. Invoke
> your new ethtool option, then show this again with the module
> reporting a higher power consumption. The reduce the power using
> ethtool and show the power consumption has reduced.
> 
> Also, in the ethtool-netlink.rst file, clearly document what the API
> is doing, so that somebody else can implement it for another device.
> 
> Please also document hotplug behaviour. Say I use your new API to
> increase the power to 3.5W. I then eject the module. Does the
> available power automatically get put back into the pool? When i
> reinsert the module, it will be in low power class, and i need to
> issue the ethtool command again to increase its power?

Ok, I'll answer all of those questions in the documentation included
in the 2nd version of the patchset.

> 
>    Andrew
>