Message ID | 20240329092321.16843-1-wojciech.drewek@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | ethtool: Max power support | expand |
On Fri, 29 Mar 2024 10:23:18 +0100 Wojciech Drewek wrote: > Some ethernet modules use nonstandard power levels [1]. Extend ethtool > module implementation to support new attributes that will allow user > to change maximum power. Rename structures and functions to be more > generic. Introduce an example of the new API in ice driver. I'm no SFP expert but seems reasonable. Would be good to insert more references to the SFP / CMIS specs which describe the standard registers. Also the series is suffering from lack of docs and spec, please update both: Documentation/networking/ethtool-netlink.rst Documentation/netlink/specs/ethtool.yaml
On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote: > Some ethernet modules use nonstandard power levels [1]. Extend ethtool > module implementation to support new attributes that will allow user > to change maximum power. Rename structures and functions to be more > generic. Introduce an example of the new API in ice driver. > > Ethtool examples: > $ ethtool --show-module enp1s0f0np0 > Module parameters for enp1s0f0np0: > power-min-allowed: 1000 mW > power-max-allowed: 3000 mW > power-max-set: 1500 mW > > $ ethtool --set-module enp1s0f0np0 power-max-set 4000 We have had a device tree property for a long time: maximum-power-milliwatt: minimum: 1000 default: 1000 description: Maximum module power consumption Specifies the maximum power consumption allowable by a module in the slot, in milli-Watts. Presently, modules can be up to 1W, 1.5W or 2W. Could you flip the name around to be consistent with DT? > minimum-power-allowed: 1000 mW > maximum-power-allowed: 3000 mW > maximum-power-set: 1500 mW Also, what does minimum-power-allowed actually tell us? Do you imagine it will ever be below 1W because of bad board design? Do you have a bad board design which does not allow 1W? Also, this is about the board, the SFP cage, not the actual SFP module? Maybe the word cage needs to be in these names? Do we want to be able to enumerate what the module itself supports? If so, we need to include module in the name, to identify the numbers are about the module, not the cage. Andrew
On 29.03.2024 23:16, Jakub Kicinski wrote: > On Fri, 29 Mar 2024 10:23:18 +0100 Wojciech Drewek wrote: >> Some ethernet modules use nonstandard power levels [1]. Extend ethtool >> module implementation to support new attributes that will allow user >> to change maximum power. Rename structures and functions to be more >> generic. Introduce an example of the new API in ice driver. > > I'm no SFP expert but seems reasonable. > > Would be good to insert more references to the SFP / CMIS specs > which describe the standard registers. Sure, I'll search for additional references. > > Also the series is suffering from lack of docs and spec, please > update both: > > Documentation/networking/ethtool-netlink.rst > Documentation/netlink/specs/ethtool.yaml Sure thing, I'll write the documentation.
On 30.03.2024 22:57, Andrew Lunn wrote: > On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote: >> Some ethernet modules use nonstandard power levels [1]. Extend ethtool >> module implementation to support new attributes that will allow user >> to change maximum power. Rename structures and functions to be more >> generic. Introduce an example of the new API in ice driver. >> >> Ethtool examples: >> $ ethtool --show-module enp1s0f0np0 >> Module parameters for enp1s0f0np0: >> power-min-allowed: 1000 mW >> power-max-allowed: 3000 mW >> power-max-set: 1500 mW >> >> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 > > We have had a device tree property for a long time: > > maximum-power-milliwatt: > minimum: 1000 > default: 1000 > description: > Maximum module power consumption Specifies the maximum power consumption > allowable by a module in the slot, in milli-Watts. Presently, modules can > be up to 1W, 1.5W or 2W. > > Could you flip the name around to be consistent with DT? Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) > >> minimum-power-allowed: 1000 mW >> maximum-power-allowed: 3000 mW >> maximum-power-set: 1500 mW > > Also, what does minimum-power-allowed actually tell us? Do you imagine > it will ever be below 1W because of bad board design? Do you have a > bad board design which does not allow 1W? Yes. in case of QSFP we don't support 1W, 1.5W is the minimum. This parameter tells the user what is the lowest limit he can set. > > Also, this is about the board, the SFP cage, not the actual SFP > module? Maybe the word cage needs to be in these names? It's about cage. Thanks for bringing it to my attention because now I see it might be misleading. I'm extending {set|show}-module command but the changes are about max power in the cage. With that in mind I agree that adding 'cage' to the names makes sense. > > Do we want to be able to enumerate what the module itself supports? > If so, we need to include module in the name, to identify the numbers > are about the module, not the cage. I hope that my previous paragraph answers this as well. > > Andrew
On Tue, 2 Apr 2024 13:38:59 +0200 Wojciech Drewek wrote: > > Also, this is about the board, the SFP cage, not the actual SFP > > module? Maybe the word cage needs to be in these names? > > It's about cage. Thanks for bringing it to my attention because now I > see it might be misleading. I'm extending {set|show}-module command > but the changes are about max power in the cage. With that in mind > I agree that adding 'cage' to the names makes sense. Noob question, what happens if you plug a module with higher power needs into the cage?
On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote: > > > On 30.03.2024 22:57, Andrew Lunn wrote: > > On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote: > >> Some ethernet modules use nonstandard power levels [1]. Extend ethtool > >> module implementation to support new attributes that will allow user > >> to change maximum power. Rename structures and functions to be more > >> generic. Introduce an example of the new API in ice driver. > >> > >> Ethtool examples: > >> $ ethtool --show-module enp1s0f0np0 > >> Module parameters for enp1s0f0np0: > >> power-min-allowed: 1000 mW > >> power-max-allowed: 3000 mW > >> power-max-set: 1500 mW > >> > >> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 > > > > We have had a device tree property for a long time: > > > > maximum-power-milliwatt: > > minimum: 1000 > > default: 1000 > > description: > > Maximum module power consumption Specifies the maximum power consumption > > allowable by a module in the slot, in milli-Watts. Presently, modules can > > be up to 1W, 1.5W or 2W. > > > > Could you flip the name around to be consistent with DT? > > Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) That is a DT thing. Helps make the units of an ABI obvious. However, milliwatts is pretty standard with the kernel of user APIs, e.g. all hwmon calls use milliwatts. > >> minimum-power-allowed: 1000 mW > >> maximum-power-allowed: 3000 mW > >> maximum-power-set: 1500 mW > > > > Also, what does minimum-power-allowed actually tell us? Do you imagine > > it will ever be below 1W because of bad board design? Do you have a > > bad board design which does not allow 1W? > > Yes. in case of QSFP we don't support 1W, 1.5W is the minimum. So if i plug in a 1W QSFP device, it will let the magic smoke out because it is force fed 1.5W? Looking at https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table 7 it indicates different power budget classifications. Power level 1 is a Maximum power of 1.5W. So does your parameter represent this? It is the minimum maximum power? And your other parameter is the maximum maximum power? I agree with Jakub here, there needs to be documentation added explaining in detail what these parameters mean, and ideally, references to the specification. Does $ ethtool --set-module enp1s0f0np0 power-max-set 4000 actually talk to the SFP module and tell it the maximum power it can consume. So in this case, it is not the cage, but the module? Or is it talking to some entity which is managing the overall power consumption of a number of cages, and asking it to allocate a maximum of 4W to this cage. It might return an error message saying there is no power budget left? Or is it doing both? Sorry to be picky, but at some point, somebody is going to want to implement this in the Linux SFP driver, and we want a consistent implementation cross different implementations. Andrew
On Tue, Apr 02, 2024 at 07:25:47AM -0700, Jakub Kicinski wrote: > On Tue, 2 Apr 2024 13:38:59 +0200 Wojciech Drewek wrote: > > > Also, this is about the board, the SFP cage, not the actual SFP > > > module? Maybe the word cage needs to be in these names? > > > > It's about cage. Thanks for bringing it to my attention because now I > > see it might be misleading. I'm extending {set|show}-module command > > but the changes are about max power in the cage. With that in mind > > I agree that adding 'cage' to the names makes sense. > > Noob question, what happens if you plug a module with higher power > needs into the cage? https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf Section 3.2: It is recommended that the host, through the management interface, identify the power consumption class of the module before allowing the module to go into high power mode. So it should start in lower power mode. Table 7 suggests the module can assume 1.5W, since that is the lowest power level. Andrew
On Tue, 2 Apr 2024 16:46:54 +0200 Andrew Lunn wrote: > Looking at > https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table > 7 it indicates different power budget classifications. Power level 1 > is a Maximum power of 1.5W. So does your parameter represent this? It > is the minimum maximum power? And your other parameter is the maximum > maximum power? > > I agree with Jakub here, there needs to be documentation added > explaining in detail what these parameters mean, and ideally, > references to the specification. > > Does > > $ ethtool --set-module enp1s0f0np0 power-max-set 4000 > > actually talk to the SFP module and tell it the maximum power it can > consume. So in this case, it is not the cage, but the module? > > Or is it talking to some entity which is managing the overall power > consumption of a number of cages, and asking it to allocate a maximum > of 4W to this cage. It might return an error message saying there is > no power budget left? > > Or is it doing both? > > Sorry to be picky, but at some point, somebody is going to want to > implement this in the Linux SFP driver, and we want a consistent > implementation cross different implementations. Or "guessing how things work" another way of putting this would be - please go investigate what exactly the FW will do with these values.
On 02.04.2024 16:46, Andrew Lunn wrote: > On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote: >> >> >> On 30.03.2024 22:57, Andrew Lunn wrote: >>> On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote: >>>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool >>>> module implementation to support new attributes that will allow user >>>> to change maximum power. Rename structures and functions to be more >>>> generic. Introduce an example of the new API in ice driver. >>>> >>>> Ethtool examples: >>>> $ ethtool --show-module enp1s0f0np0 >>>> Module parameters for enp1s0f0np0: >>>> power-min-allowed: 1000 mW >>>> power-max-allowed: 3000 mW >>>> power-max-set: 1500 mW >>>> >>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 >>> >>> We have had a device tree property for a long time: >>> >>> maximum-power-milliwatt: >>> minimum: 1000 >>> default: 1000 >>> description: >>> Maximum module power consumption Specifies the maximum power consumption >>> allowable by a module in the slot, in milli-Watts. Presently, modules can >>> be up to 1W, 1.5W or 2W. >>> >>> Could you flip the name around to be consistent with DT? >> >> Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) > > That is a DT thing. Helps make the units of an ABI obvious. However, > milliwatts is pretty standard with the kernel of user APIs, e.g. all > hwmon calls use milliwatts. > >>>> minimum-power-allowed: 1000 mW >>>> maximum-power-allowed: 3000 mW >>>> maximum-power-set: 1500 mW >>> >>> Also, what does minimum-power-allowed actually tell us? Do you imagine >>> it will ever be below 1W because of bad board design? Do you have a >>> bad board design which does not allow 1W? >> >> Yes. in case of QSFP we don't support 1W, 1.5W is the minimum. > > So if i plug in a 1W QSFP device, it will let the magic smoke out > because it is force fed 1.5W? > > Looking at > https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table > 7 it indicates different power budget classifications. Power level 1 > is a Maximum power of 1.5W. So does your parameter represent this? It > is the minimum maximum power? And your other parameter is the maximum > maximum power? Exactly as you described, minimum-power-allowed is in fact minimum value which maximum-power-set can be set to (so minimum maximum). the other parameter is maximim maximum. > > I agree with Jakub here, there needs to be documentation added > explaining in detail what these parameters mean, and ideally, > references to the specification. I completely agree, I'll include documentation in the next version. I see now that those parameters might look confusing, minimum-power-allowed is not true minimum in fact. We can try to came up with better names but it might get silly (minimum-maximum-power) XD. > > Does > > $ ethtool --set-module enp1s0f0np0 power-max-set 4000 > > actually talk to the SFP module and tell it the maximum power it can > consume. So in this case, it is not the cage, but the module? It does not work that way in ice example. > > Or is it talking to some entity which is managing the overall power > consumption of a number of cages, and asking it to allocate a maximum > of 4W to this cage. It might return an error message saying there is > no power budget left? That's right, we talk to firmware to set those restrictions. In the ice implementation, the driver is responsible for checking if the overall board budget is not exceeded. > > Or is it doing both? > > Sorry to be picky, but at some point, somebody is going to want to > implement this in the Linux SFP driver, and we want a consistent > implementation cross different implementations. No problem, I see that your points are totally valid. > > Andrew
On Wed, Apr 03, 2024 at 03:18:44PM +0200, Wojciech Drewek wrote: > > > On 02.04.2024 16:46, Andrew Lunn wrote: > > On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote: > >> > >> > >> On 30.03.2024 22:57, Andrew Lunn wrote: > >>> On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote: > >>>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool > >>>> module implementation to support new attributes that will allow user > >>>> to change maximum power. Rename structures and functions to be more > >>>> generic. Introduce an example of the new API in ice driver. > >>>> > >>>> Ethtool examples: > >>>> $ ethtool --show-module enp1s0f0np0 > >>>> Module parameters for enp1s0f0np0: > >>>> power-min-allowed: 1000 mW > >>>> power-max-allowed: 3000 mW > >>>> power-max-set: 1500 mW > >>>> > >>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 > >>> > >>> We have had a device tree property for a long time: > >>> > >>> maximum-power-milliwatt: > >>> minimum: 1000 > >>> default: 1000 > >>> description: > >>> Maximum module power consumption Specifies the maximum power consumption > >>> allowable by a module in the slot, in milli-Watts. Presently, modules can > >>> be up to 1W, 1.5W or 2W. > >>> > >>> Could you flip the name around to be consistent with DT? > >> > >> Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) > > > > That is a DT thing. Helps make the units of an ABI obvious. However, > > milliwatts is pretty standard with the kernel of user APIs, e.g. all > > hwmon calls use milliwatts. > > > >>>> minimum-power-allowed: 1000 mW > >>>> maximum-power-allowed: 3000 mW > >>>> maximum-power-set: 1500 mW > >>> > >>> Also, what does minimum-power-allowed actually tell us? Do you imagine > >>> it will ever be below 1W because of bad board design? Do you have a > >>> bad board design which does not allow 1W? > >> > >> Yes. in case of QSFP we don't support 1W, 1.5W is the minimum. > > > > So if i plug in a 1W QSFP device, it will let the magic smoke out > > because it is force fed 1.5W? > > > > Looking at > > https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table > > 7 it indicates different power budget classifications. Power level 1 > > is a Maximum power of 1.5W. So does your parameter represent this? It > > is the minimum maximum power? And your other parameter is the maximum > > maximum power? > > Exactly as you described, minimum-power-allowed is in fact minimum value > which maximum-power-set can be set to (so minimum maximum). the other > parameter is maximim maximum. Table 7 in that document is titled "Power Budget Classification". So how about minimum-power-class-allowed: 1000 mW maximum-power-class-allowed: 3000 mW Andrew
> > $ ethtool --set-module enp1s0f0np0 power-max-set 4000 > > > > actually talk to the SFP module and tell it the maximum power it can > > consume. So in this case, it is not the cage, but the module? > > It does not work that way in ice example. > > > > Or is it talking to some entity which is managing the overall power > > consumption of a number of cages, and asking it to allocate a maximum > > of 4W to this cage. It might return an error message saying there is > > no power budget left? > > That's right, we talk to firmware to set those restrictions. > In the ice implementation, the driver is responsible for checking if the > overall board budget is not exceeded. So i can get the board to agree that the cage can supply 3W to the module, but how do i then tell the module this? I would also suggest you don't focus too much on ICE. I find it better to think about an abstract system. A board with a power supply to a number of SFP cages, and some cages have modules in them. What does the kAPI look like, the use cases for this abstract system. Andrew
On 03.04.2024 15:40, Andrew Lunn wrote: > On Wed, Apr 03, 2024 at 03:18:44PM +0200, Wojciech Drewek wrote: >> >> >> On 02.04.2024 16:46, Andrew Lunn wrote: >>> On Tue, Apr 02, 2024 at 01:38:59PM +0200, Wojciech Drewek wrote: >>>> >>>> >>>> On 30.03.2024 22:57, Andrew Lunn wrote: >>>>> On Fri, Mar 29, 2024 at 10:23:18AM +0100, Wojciech Drewek wrote: >>>>>> Some ethernet modules use nonstandard power levels [1]. Extend ethtool >>>>>> module implementation to support new attributes that will allow user >>>>>> to change maximum power. Rename structures and functions to be more >>>>>> generic. Introduce an example of the new API in ice driver. >>>>>> >>>>>> Ethtool examples: >>>>>> $ ethtool --show-module enp1s0f0np0 >>>>>> Module parameters for enp1s0f0np0: >>>>>> power-min-allowed: 1000 mW >>>>>> power-max-allowed: 3000 mW >>>>>> power-max-set: 1500 mW >>>>>> >>>>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 >>>>> >>>>> We have had a device tree property for a long time: >>>>> >>>>> maximum-power-milliwatt: >>>>> minimum: 1000 >>>>> default: 1000 >>>>> description: >>>>> Maximum module power consumption Specifies the maximum power consumption >>>>> allowable by a module in the slot, in milli-Watts. Presently, modules can >>>>> be up to 1W, 1.5W or 2W. >>>>> >>>>> Could you flip the name around to be consistent with DT? >>>> >>>> Yea, I'm open to any name suggestion although I don't like the unit in the parameter name :) >>> >>> That is a DT thing. Helps make the units of an ABI obvious. However, >>> milliwatts is pretty standard with the kernel of user APIs, e.g. all >>> hwmon calls use milliwatts. >>> >>>>>> minimum-power-allowed: 1000 mW >>>>>> maximum-power-allowed: 3000 mW >>>>>> maximum-power-set: 1500 mW >>>>> >>>>> Also, what does minimum-power-allowed actually tell us? Do you imagine >>>>> it will ever be below 1W because of bad board design? Do you have a >>>>> bad board design which does not allow 1W? >>>> >>>> Yes. in case of QSFP we don't support 1W, 1.5W is the minimum. >>> >>> So if i plug in a 1W QSFP device, it will let the magic smoke out >>> because it is force fed 1.5W? >>> >>> Looking at >>> https://www.optcore.net/wp-content/uploads/2017/04/QSFP-MSA.pdf table >>> 7 it indicates different power budget classifications. Power level 1 >>> is a Maximum power of 1.5W. So does your parameter represent this? It >>> is the minimum maximum power? And your other parameter is the maximum >>> maximum power? >> >> Exactly as you described, minimum-power-allowed is in fact minimum value >> which maximum-power-set can be set to (so minimum maximum). the other >> parameter is maximim maximum. > > Table 7 in that document is titled "Power Budget Classification". So > how about > > minimum-power-class-allowed: 1000 mW > maximum-power-class-allowed: 3000 mW Sounds good > > Andrew
On 03.04.2024 15:49, Andrew Lunn wrote: >>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 >>> >>> actually talk to the SFP module and tell it the maximum power it can >>> consume. So in this case, it is not the cage, but the module? >> >> It does not work that way in ice example. >>> >>> Or is it talking to some entity which is managing the overall power >>> consumption of a number of cages, and asking it to allocate a maximum >>> of 4W to this cage. It might return an error message saying there is >>> no power budget left? >> >> That's right, we talk to firmware to set those restrictions. >> In the ice implementation, the driver is responsible for checking if the >> overall board budget is not exceeded. > > So i can get the board to agree that the cage can supply 3W to the > module, but how do i then tell the module this? I'd assume it is not possible, if the module consumes more power than maximum than the link will not come up and error will be printed. > > I would also suggest you don't focus too much on ICE. I find it better > to think about an abstract system. A board with a power supply to a > number of SFP cages, and some cages have modules in them. What does > the kAPI look like, the use cases for this abstract system. My design for this API is to have an option to get and set maximum power that the module in the cage can consume. It's not about modifying module's power consumption, it's about setting restrictions for it. The use case is to let the user change maximum power in the given cage (so he can plug in the module with higher power consumption). Before that he will lower maximum power in different cage. Thanks to that the overall budget for the board won't be exceeded. Does it make sense for the abstract system you described? > > Andrew > > >
On Thu, Apr 04, 2024 at 02:45:43PM +0200, Wojciech Drewek wrote: > > > On 03.04.2024 15:49, Andrew Lunn wrote: > >>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 > >>> > >>> actually talk to the SFP module and tell it the maximum power it can > >>> consume. So in this case, it is not the cage, but the module? > >> > >> It does not work that way in ice example. > >>> > >>> Or is it talking to some entity which is managing the overall power > >>> consumption of a number of cages, and asking it to allocate a maximum > >>> of 4W to this cage. It might return an error message saying there is > >>> no power budget left? > >> > >> That's right, we talk to firmware to set those restrictions. > >> In the ice implementation, the driver is responsible for checking if the > >> overall board budget is not exceeded. > > > > So i can get the board to agree that the cage can supply 3W to the > > module, but how do i then tell the module this? > > I'd assume it is not possible, if the module consumes more power > than maximum than the link will not come up and error will be printed. Take a look at the Linux SFP driver. In sfp_probe() is reads the DT property maximum-power-milliwatt: https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L3030 When the module is inserted and probed, the modules power capabilities are read from the EEPROM: https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L2320 https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1929 The code looks to see what conformance level the module has, so to know if it even supports different power levels, and the registers needed have been implemented. Later, the SFP state machine will transition the module to higher power: https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1995 by writing a register in the SFP. > > I would also suggest you don't focus too much on ICE. I find it better > > to think about an abstract system. A board with a power supply to a > > number of SFP cages, and some cages have modules in them. What does > > the kAPI look like, the use cases for this abstract system. > > My design for this API is to have an option to get and set maximum > power that the module in the cage can consume. It's not about modifying > module's power consumption, it's about setting restrictions for it. > > The use case is to let the user change maximum power in the given cage > (so he can plug in the module with higher power consumption). Before that > he will lower maximum power in different cage. Thanks to that the overall > budget for the board won't be exceeded. Does it make sense for the abstract > system you described? So there are a few different phases here. The standard says the module start up in low power mode. Something needs to enumerate what the module actually supports in terms of different power modes. Something then needs to determine if the board/cage can support higher power operation, that there is sufficient power budged. Budget then needs to be allocated to the cage. Lastly, the module needs to be told it can go to a higher power level. I would say the current Linux SFP code is overly simple, by design. It supports the concept of cage supplied by a dedicated regulator. There is no sharing of power across a number of cages. Hence it just ups the power if the board indicates the power is available and the module support a higher power level. However, you have a more complex setup, shared power supplies, etc. The policy of what modules gets how much power should come from user space. So we need user space APIs for this, and a clear understanding of how they work. Please could you describe how i would use ethtool to go through these phases. And how do i down grade a modules power consumption to make more power available to another module. Andrew
On 04.04.2024 15:53, Andrew Lunn wrote: > On Thu, Apr 04, 2024 at 02:45:43PM +0200, Wojciech Drewek wrote: >> >> >> On 03.04.2024 15:49, Andrew Lunn wrote: >>>>> $ ethtool --set-module enp1s0f0np0 power-max-set 4000 >>>>> >>>>> actually talk to the SFP module and tell it the maximum power it can >>>>> consume. So in this case, it is not the cage, but the module? >>>> >>>> It does not work that way in ice example. >>>>> >>>>> Or is it talking to some entity which is managing the overall power >>>>> consumption of a number of cages, and asking it to allocate a maximum >>>>> of 4W to this cage. It might return an error message saying there is >>>>> no power budget left? >>>> >>>> That's right, we talk to firmware to set those restrictions. >>>> In the ice implementation, the driver is responsible for checking if the >>>> overall board budget is not exceeded. >>> >>> So i can get the board to agree that the cage can supply 3W to the >>> module, but how do i then tell the module this? >> >> I'd assume it is not possible, if the module consumes more power >> than maximum than the link will not come up and error will be printed. > > Take a look at the Linux SFP driver. In sfp_probe() is reads the DT > property maximum-power-milliwatt: > > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L3030 > > When the module is inserted and probed, the modules power capabilities > are read from the EEPROM: > > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L2320 > > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1929 > > The code looks to see what conformance level the module has, so to > know if it even supports different power levels, and the registers > needed have been implemented. > > Later, the SFP state machine will transition the module to higher > power: > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/sfp.c#L1995 > > by writing a register in the SFP. > >>> I would also suggest you don't focus too much on ICE. I find it better >>> to think about an abstract system. A board with a power supply to a >>> number of SFP cages, and some cages have modules in them. What does >>> the kAPI look like, the use cases for this abstract system. >> >> My design for this API is to have an option to get and set maximum >> power that the module in the cage can consume. It's not about modifying >> module's power consumption, it's about setting restrictions for it. >> >> The use case is to let the user change maximum power in the given cage >> (so he can plug in the module with higher power consumption). Before that >> he will lower maximum power in different cage. Thanks to that the overall >> budget for the board won't be exceeded. Does it make sense for the abstract >> system you described? > > So there are a few different phases here. The standard says the module > start up in low power mode. > > Something needs to enumerate what the module actually supports in > terms of different power modes. So, this is something that would require new netlink attribute. From what you described, SFP driver is capable of reading module's power capabilities so it could be implemented there. > > Something then needs to determine if the board/cage can support higher > power operation, that there is sufficient power budged. Budget then > needs to be allocated to the cage. This is something my current design supports I think. Using ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports and change it. > > Lastly, the module needs to be told it can go to a higher power level. This could be done using ethtool_module_power_mode_policy I think. > > I would say the current Linux SFP code is overly simple, by design. It > supports the concept of cage supplied by a dedicated regulator. There > is no sharing of power across a number of cages. Hence it just ups the > power if the board indicates the power is available and the module > support a higher power level. > > However, you have a more complex setup, shared power supplies, > etc. The policy of what modules gets how much power should come from > user space. So we need user space APIs for this, and a clear > understanding of how they work. Please could you describe how i would > use ethtool to go through these phases. And how do i down grade a > modules power consumption to make more power available to another > module. I described how I see it for each phase above. Downgrading module's power consumption is not possible with my initial design. You can downgrade max power for the cage and if the module plugged in to it consumes more power than error will be printed and link will not be established. In the end it looks like we would need new netlink attribute to support modification of module's power consumption. > > Andrew > >
> This is something my current design supports I think. Using > ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports > and change it. > This could be done using ethtool_module_power_mode_policy I think. All these 'I think' don't give me a warm fuzzy feeling this is a well thought out and designed uAPI. I assume you have ethtool patches for your new netlink attributes. So show us the real usage. Start with an SFP in its default lower power mode. Show us the commands to display the current status. Allocate it more power, tell the module it can use more power, and then show us the status after the change has been made. Then lower the power to that cage and assign the power to a different cage. This is something you can later reuse in the 0/X patch describing the big picture of what the patchset does, and it will guide others who want to implement the same API in the Linux SFP driver, or other MAC drivers. Andrew
On 09.04.2024 15:39, Andrew Lunn wrote: >> This is something my current design supports I think. Using >> ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports >> and change it. > >> This could be done using ethtool_module_power_mode_policy I think. > > All these 'I think' don't give me a warm fuzzy feeling this is a well > thought out and designed uAPI. > > I assume you have ethtool patches for your new netlink attributes. So > show us the real usage. Start with an SFP in its default lower power > mode. Show us the commands to display the current status. Allocate it > more power, tell the module it can use more power, and then show us > the status after the change has been made. Ok, but do we really need an API to switch the module between high/low power mode? I'd assume this is done automatically e.g. when we lower max power in the cage below module's high power mode than it should imply that the module should go to low power mode. Same with the high power mode, if enough power is assigned to the cage than module should go to the high power mode. Regarding the current status and what module supports, there is -m option: $ ethtool -m ens801f0np0 Identifier : 0x0d (QSFP+) Extended identifier : 0x00 Extended identifier description : 1.5W max. Power consumption Extended identifier description : No CDR in TX, No CDR in RX Extended identifier description : High Power Class (> 3.5 W) not enabled > > Then lower the power to that cage and assign the power to a different > cage. > > This is something you can later reuse in the 0/X patch describing the > big picture of what the patchset does, and it will guide others who > want to implement the same API in the Linux SFP driver, or other MAC > drivers. > > Andrew
On Fri, Apr 12, 2024 at 03:21:24PM +0200, Wojciech Drewek wrote: > > > On 09.04.2024 15:39, Andrew Lunn wrote: > >> This is something my current design supports I think. Using > >> ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports > >> and change it. > > > >> This could be done using ethtool_module_power_mode_policy I think. > > > > All these 'I think' don't give me a warm fuzzy feeling this is a well > > thought out and designed uAPI. > > > > I assume you have ethtool patches for your new netlink attributes. So > > show us the real usage. Start with an SFP in its default lower power > > mode. Show us the commands to display the current status. Allocate it > > more power, tell the module it can use more power, and then show us > > the status after the change has been made. > > Ok, but do we really need an API to switch the module between high/low power mode? Probably not. But you need to document that the API you are adding is also expected to talk to the module and tell it to use more/less power. > Regarding the current status and what module supports, there is -m option: > $ ethtool -m ens801f0np0 > Identifier : 0x0d (QSFP+) > Extended identifier : 0x00 > Extended identifier description : 1.5W max. Power consumption > Extended identifier description : No CDR in TX, No CDR in RX > Extended identifier description : High Power Class (> 3.5 W) not enabled So you can make this part of your commit message. Show this. Invoke your new ethtool option, then show this again with the module reporting a higher power consumption. The reduce the power using ethtool and show the power consumption has reduced. Also, in the ethtool-netlink.rst file, clearly document what the API is doing, so that somebody else can implement it for another device. Please also document hotplug behaviour. Say I use your new API to increase the power to 3.5W. I then eject the module. Does the available power automatically get put back into the pool? When i reinsert the module, it will be in low power class, and i need to issue the ethtool command again to increase its power? Andrew
On 16.04.2024 00:03, Andrew Lunn wrote: > On Fri, Apr 12, 2024 at 03:21:24PM +0200, Wojciech Drewek wrote: >> >> >> On 09.04.2024 15:39, Andrew Lunn wrote: >>>> This is something my current design supports I think. Using >>>> ETHTOOL_A_MODULE_MAX_POWER_SET user can get what cage supports >>>> and change it. >>> >>>> This could be done using ethtool_module_power_mode_policy I think. >>> >>> All these 'I think' don't give me a warm fuzzy feeling this is a well >>> thought out and designed uAPI. >>> >>> I assume you have ethtool patches for your new netlink attributes. So >>> show us the real usage. Start with an SFP in its default lower power >>> mode. Show us the commands to display the current status. Allocate it >>> more power, tell the module it can use more power, and then show us >>> the status after the change has been made. >> >> Ok, but do we really need an API to switch the module between high/low power mode? > > Probably not. But you need to document that the API you are adding is > also expected to talk to the module and tell it to use more/less > power. > >> Regarding the current status and what module supports, there is -m option: >> $ ethtool -m ens801f0np0 >> Identifier : 0x0d (QSFP+) >> Extended identifier : 0x00 >> Extended identifier description : 1.5W max. Power consumption >> Extended identifier description : No CDR in TX, No CDR in RX >> Extended identifier description : High Power Class (> 3.5 W) not enabled > > So you can make this part of your commit message. Show this. Invoke > your new ethtool option, then show this again with the module > reporting a higher power consumption. The reduce the power using > ethtool and show the power consumption has reduced. > > Also, in the ethtool-netlink.rst file, clearly document what the API > is doing, so that somebody else can implement it for another device. > > Please also document hotplug behaviour. Say I use your new API to > increase the power to 3.5W. I then eject the module. Does the > available power automatically get put back into the pool? When i > reinsert the module, it will be in low power class, and i need to > issue the ethtool command again to increase its power? Ok, I'll answer all of those questions in the documentation included in the 2nd version of the patchset. > > Andrew >