
[Intel-wired-lan,iwl-next,v4,4/5] ice: Add tx_scheduling_layers devlink param

Message ID 20240219100555.7220-5-mateusz.polchlopek@intel.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series ice: Support 5 layer Tx scheduler topology


Commit Message

Mateusz Polchlopek Feb. 19, 2024, 10:05 a.m. UTC
From: Lukasz Czapnik <lukasz.czapnik@intel.com>

It was observed that Tx performance was inconsistent across all queues
and/or VSIs and that it was directly connected to the existing 9-layer
topology of the Tx scheduler.

Introduce a new private devlink param - tx_scheduling_layers. This
parameter gives the user the flexibility to choose the 5-layer transmit
scheduler topology, which helps to smooth out the transmit performance.

Allowed parameter values are 5 and 9.

Example usage:

Show:
devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
pci/0000:4b:00.0:
  name tx_scheduling_layers type driver-specific
    values:
      cmode permanent value 9

Set:
devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
cmode permanent

devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 9
cmode permanent
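A sketch of the full workflow (the reboot note reflects devlink's documented semantics for `cmode permanent` values stored in NVM; the exact activation mechanism may vary by device):

```shell
# Query the current value (persisted in the device NVM)
devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers

# Switch to the 5-layer topology; with cmode permanent the new value is
# written to non-volatile memory and only takes effect on the next
# device initialization (e.g. after a reboot)
devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers \
    value 5 cmode permanent
```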

Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 169 ++++++++++++++++++
 .../net/ethernet/intel/ice/ice_fw_update.c    |   7 +-
 .../net/ethernet/intel/ice/ice_fw_update.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_nvm.c      |   7 +-
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   3 +
 6 files changed, 189 insertions(+), 8 deletions(-)

Comments

Jiri Pirko Feb. 19, 2024, 12:37 p.m. UTC | #1
Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>
>It was observed that Tx performance was inconsistent across all queues
>and/or VSIs and that it was directly connected to existing 9-layer
>topology of the Tx scheduler.
>
>Introduce new private devlink param - tx_scheduling_layers. This parameter
>gives user flexibility to choose the 5-layer transmit scheduler topology
>which helps to smooth out the transmit performance.
>
>Allowed parameter values are 5 and 9.
>
>Example usage:
>
>Show:
>devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>pci/0000:4b:00.0:
>  name tx_scheduling_layers type driver-specific
>    values:
>      cmode permanent value 9
>
>Set:
>devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>cmode permanent

This is a kind of proprietary param, similar to a number of those that
were shot down for mlx5 in the past. Jakub?

Also, given this is apparently nvconfig configuration, it would probably
be more suitable to use some provisioning tool. This is related to
the mlx5 misc driver.

Until we figure out the plan, this has my nack:

Nacked-by: Jiri Pirko <jiri@nvidia.com>
Przemek Kitszel Feb. 19, 2024, 1:33 p.m. UTC | #2
On 2/19/24 13:37, Jiri Pirko wrote:
> Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>> From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>>
>> It was observed that Tx performance was inconsistent across all queues
>> and/or VSIs and that it was directly connected to existing 9-layer
>> topology of the Tx scheduler.
>>
>> Introduce new private devlink param - tx_scheduling_layers. This parameter
>> gives user flexibility to choose the 5-layer transmit scheduler topology
>> which helps to smooth out the transmit performance.
>>
>> Allowed parameter values are 5 and 9.
>>
>> Example usage:
>>
>> Show:
>> devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>> pci/0000:4b:00.0:
>>   name tx_scheduling_layers type driver-specific
>>     values:
>>       cmode permanent value 9
>>
>> Set:
>> devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>> cmode permanent
> 
> This is kind of proprietary param similar to number of which were shot

not sure if this is the same kind of param, but it is for sure a proprietary one

> down for mlx5 in past. Jakub?

I'm not that familiar with the history/ies around mlx5, but this case is
somewhat different, at least for me:
we have a performance fix for the tree inside the FW/HW, while you
(IIRC) were about to introduce some nice and general abstraction layer,
which could be used by other HW vendors too, but instead it was mlx-only

> 
> Also, given this is apparently nvconfig configuration, there could be
> probably more suitable to use some provisioning tool. 

TBH, we will want to add some other NVM related params, but that does
not justify yet another tool to configure PF. (And then there would be
a big debate if FW update should be moved there too for consistency).

> This is related to the mlx5 misc driver.
> 
> Until we figure out the plan, this has my nack:
> 
> Nacked-by: Jiri Pirko <jiri@nvidia.com>

IMO this is an easy case, but would like to hear from netdev maintainers
Jiri Pirko Feb. 19, 2024, 5:15 p.m. UTC | #3
Mon, Feb 19, 2024 at 02:33:54PM CET, przemyslaw.kitszel@intel.com wrote:
>On 2/19/24 13:37, Jiri Pirko wrote:
>> Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>> > From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>> > 
>> > It was observed that Tx performance was inconsistent across all queues
>> > and/or VSIs and that it was directly connected to existing 9-layer
>> > topology of the Tx scheduler.
>> > 
>> > Introduce new private devlink param - tx_scheduling_layers. This parameter
>> > gives user flexibility to choose the 5-layer transmit scheduler topology
>> > which helps to smooth out the transmit performance.
>> > 
>> > Allowed parameter values are 5 and 9.
>> > 
>> > Example usage:
>> > 
>> > Show:
>> > devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>> > pci/0000:4b:00.0:
>> >   name tx_scheduling_layers type driver-specific
>> >     values:
>> >       cmode permanent value 9
>> > 
>> > Set:
>> > devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>> > cmode permanent
>> 
>> This is kind of proprietary param similar to number of which were shot
>
>not sure if this is the same kind of param, but for sure proprietary one
>
>> down for mlx5 in past. Jakub?
>
>I'm not that familiar with the history/ies around mlx5, but this case is
>somewhat different, at least for me:
>we have a performance fix for the tree inside the FW/HW, while you
>(IIRC) were about to introduce some nice and general abstraction layer,
>which could be used by other HW vendors too, but instead it was mlx-only

Nope. Same thing. Vendor/device specific FW/HW knob. Nothing to
abstract.


>
>> 
>> Also, given this is apparently nvconfig configuration, there could be
>> probably more suitable to use some provisioning tool.
>
>TBH, we will want to add some other NVM related params, but that does
>not justify yet another tool to configure PF. (And then there would be
>a big debate if FW update should be moved there too for consistency).
>
>> This is related to the mlx5 misc driver.
>> 
>> Until we figure out the plan, this has my nack:
>> 
>> Nacked-by: Jiri Pirko <jiri@nvidia.com>
>
>IMO this is an easy case, but would like to hear from netdev maintainers
>
>
Jakub Kicinski Feb. 21, 2024, 11:38 p.m. UTC | #4
On Mon, 19 Feb 2024 13:37:36 +0100 Jiri Pirko wrote:
> >devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
> >pci/0000:4b:00.0:
> >  name tx_scheduling_layers type driver-specific
> >    values:
> >      cmode permanent value 9
> >
> >Set:
> >devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
> >cmode permanent  

It's been almost a year since v1 was posted, could you iterate quicker? :( 

> This is kind of proprietary param similar to number of which were shot
> down for mlx5 in past. Jakub?

I remain somewhat confused about what this does.
Specifically IIUC the problem is that the radix of each node is
limited, so we need to start creating multi-layer hierarchies 
if we want a higher radix. Or in the "5-layer mode" the radix 
is automatically higher?
Mateusz Polchlopek Feb. 22, 2024, 1:25 p.m. UTC | #5
On 2/22/2024 12:38 AM, Jakub Kicinski wrote:
> On Mon, 19 Feb 2024 13:37:36 +0100 Jiri Pirko wrote:
>>> devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>>> pci/0000:4b:00.0:
>>>   name tx_scheduling_layers type driver-specific
>>>     values:
>>>       cmode permanent value 9
>>>
>>> Set:
>>> devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>>> cmode permanent
> 
> It's been almost a year since v1 was posted, could you iterate quicker? :(
> 

Yes... Sorry for iterating so slowly, I promise to do it quicker in the 
future. That issue was kind of problematic for us, so it took quite a 
long time.

>> This is kind of proprietary param similar to number of which were shot
>> down for mlx5 in past. Jakub?
> 
> I remain somewhat confused about what this does.
> Specifically IIUC the problem is that the radix of each node is
> limited, so we need to start creating multi-layer hierarchies
> if we want a higher radix. Or in the "5-layer mode" the radix
> is automatically higher?

Basically, switching from the 9-layer to the 5-layer topology allows us
to have 512 leaves instead of 8 leaves, which improves performance. I
will add this information to the commit message and Documentation too,
when we get an ACK for the devlink parameter.

In the future we plan to extend this feature to support other layer 
counts too.
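To illustrate the arithmetic behind the radix trade-off, a minimal sketch (the per-layer radix values below are hypothetical, chosen only to reproduce the 8-vs-512 leaf contrast stated above; the real per-layer fan-out of the ice scheduler is hardware-defined):

```python
def max_leaves(radix: int, depth: int) -> int:
    """Maximum leaf nodes reachable through `depth` branching levels
    when every node may have at most `radix` children."""
    return radix ** depth

# A deeper tree spends its levels on intermediate nodes, so the subtree
# below a given branch point stays narrow; a flatter tree can fan out
# wider at each remaining level.  Illustrative shapes only:
nine_layer_subtree = max_leaves(radix=2, depth=3)  # narrow fan-out
five_layer_subtree = max_leaves(radix=8, depth=3)  # wider fan-out

print(nine_layer_subtree, five_layer_subtree)
```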
Jakub Kicinski Feb. 22, 2024, 11:07 p.m. UTC | #6
On Thu, 22 Feb 2024 14:25:21 +0100 Mateusz Polchlopek wrote:
> >> This is kind of proprietary param similar to number of which were shot
> >> down for mlx5 in past. Jakub?  
> > 
> > I remain somewhat confused about what this does.
> > Specifically IIUC the problem is that the radix of each node is
> > limited, so we need to start creating multi-layer hierarchies
> > if we want a higher radix. Or in the "5-layer mode" the radix
> > is automatically higher?  
> 
> Basically, switching from 9 to 5 layers topology allows us to have 512 
> leaves instead of 8 leaves which improves performance. I will add this 
> information to the commit message and Documentation too, when we get an 
> ACK for devlink parameter.

Sounds fine. Please update the doc to focus on the radix, rather than
the layers. Layers are not so important to the user. And maybe give an
example of things which won't be possible with 5-layer config.

Jiri, I'm not aware of any other devices with this sort of trade off.
We shouldn't add the param if either:
 - this can be changed dynamically as user instantiates rate limiters;
 - we know other devices have similar needs.
If neither of those is true, param seems fine to me..
Jiri Pirko Feb. 23, 2024, 9:45 a.m. UTC | #7
Fri, Feb 23, 2024 at 12:07:17AM CET, kuba@kernel.org wrote:
>On Thu, 22 Feb 2024 14:25:21 +0100 Mateusz Polchlopek wrote:
>> >> This is kind of proprietary param similar to number of which were shot
>> >> down for mlx5 in past. Jakub?  
>> > 
>> > I remain somewhat confused about what this does.
>> > Specifically IIUC the problem is that the radix of each node is
>> > limited, so we need to start creating multi-layer hierarchies
>> > if we want a higher radix. Or in the "5-layer mode" the radix
>> > is automatically higher?  
>> 
>> Basically, switching from 9 to 5 layers topology allows us to have 512 
>> leaves instead of 8 leaves which improves performance. I will add this 
>> information to the commit message and Documentation too, when we get an 
>> ACK for devlink parameter.
>
>Sounds fine. Please update the doc to focus on the radix, rather than
>the layers. Layers are not so important to the user. And maybe give an
>example of things which won't be possible with 5-layer config.
>
>Jiri, I'm not aware of any other devices with this sort of trade off.
>We shouldn't add the param if either:
> - this can be changed dynamically as user instantiates rate limiters;
> - we know other devices have similar needs.
>If neither of those is true, param seems fine to me..

Where is this policy documented? If not, could you please? Let's make
this policy clear for now and for the future.
Jakub Kicinski Feb. 23, 2024, 2:27 p.m. UTC | #8
On Fri, 23 Feb 2024 10:45:01 +0100 Jiri Pirko wrote:
>> Jiri, I'm not aware of any other devices with this sort of trade off.
>> We shouldn't add the param if either:
>>  - this can be changed dynamically as user instantiates rate limiters;
>>  - we know other devices have similar needs.
>> If neither of those is true, param seems fine to me..  
> 
> Where is this policy documented? If not, could you please? Let's make
> this policy clear for now and for the future.

Because you think it's good as a policy or because not so much?
Both of the points are a judgment call, at least from upstream
perspective since we're working with very limited information.
So enshrining this as a "policy" is not very practical.

Do you recall any specific param that got rejected from mlx5?
Y'all were allowed to add the eq sizing params, which I think
is not going to be mlx5-only for long. Otherwise I only remember
cases where I'd try to push people to use the resource API, which
IMO is better for setting limits and delegating resources.
Jiri Pirko Feb. 25, 2024, 7:18 a.m. UTC | #9
Fri, Feb 23, 2024 at 03:27:57PM CET, kuba@kernel.org wrote:
>On Fri, 23 Feb 2024 10:45:01 +0100 Jiri Pirko wrote:
>>> Jiri, I'm not aware of any other devices with this sort of trade off.
>>> We shouldn't add the param if either:
>>>  - this can be changed dynamically as user instantiates rate limiters;
>>>  - we know other devices have similar needs.
>>> If neither of those is true, param seems fine to me..  
>> 
>> Where is this policy documented? If not, could you please? Let's make
>> this policy clear for now and for the future.
>
>Because you think it's good as a policy or because not so much?
>Both of the points are a judgment call, at least from upstream
>perspective since we're working with very limited information.
>So enshrining this as a "policy" is not very practical.

No, I don't mind the policy. Up to you. Makes sense to me. I'm just
saying it would be great to have this written down so everyone can
easily tell which kind of param is and is not acceptable.


>
>Do you recall any specific param that got rejected from mlx5?
>Y'all were allowed to add the eq sizing params, which I think
>is not going to be mlx5-only for long. Otherwise I only remember
>cases where I'd try to push people to use the resource API, which
>IMO is better for setting limits and delegating resources.

I don't have anything solid in mind, I would have to look it up. But
there is certainly quite a big amount of uncertainty among my
colleagues to judge if some param would or would not be acceptable to
you. That's why I believe it would save a lot of people's time to write
the policy down in detail, with examples, etc. Could you please?

Thanks!
Jakub Kicinski Feb. 27, 2024, 2:37 a.m. UTC | #10
On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
> >Do you recall any specific param that got rejected from mlx5?
> >Y'all were allowed to add the eq sizing params, which I think
> >is not going to be mlx5-only for long. Otherwise I only remember
> >cases where I'd try to push people to use the resource API, which
> >IMO is better for setting limits and delegating resources.  
> 
> I don't have anything solid in mind, I would have to look it up. But
> there is certainly quite big amount of uncertainties among my
> colleagues to judge if some param would or would not be acceptable to
> you. That's why I believe it would save a lot of people time to write
> the policy down in details, with examples, etc. Could you please?

How about this? (BTW took me half an hour to write, just in case 
you're wondering)

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 4e01dc32bc08..f1eef6d065be 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
 level, it can be used to provide configuration that may affect multiple
 ports on a single device.
 
-This document describes a number of generic parameters that are supported
-across multiple drivers. Each driver is also free to add their own
-parameters. Each driver must document the specific parameters they support,
-whether generic or not.
+There are two categories of devlink parameters - generic parameters
+and device-specific quirks. Generic devlink parameters are configuration
+knobs which don't fit into any larger API, but are supported across multiple
+drivers. This document describes a number of generic parameters.
+Each driver can also add its own parameters, which are documented in driver
+specific files.
 
 Configuration modes
 ===================
@@ -137,3 +139,32 @@ own name.
    * - ``event_eq_size``
      - u32
      - Control the size of asynchronous control events EQ.
+
+Adding new params
+=================
+
+Addition of new devlink params is carefully scrutinized upstream.
+More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
+devlink params should never be used in their place e.g. to allow easier
+delivery via out-of-tree modules, or to save development time.
+
+devlink parameters must always be thoroughly documented, both from technical
+perspective (to allow meaningful upstream review), and from user perspective
+(to allow users to make informed decisions).
+
+The requirements above should make it obvious that any "automatic" /
+"pass-through" registration of devlink parameters, based on strings
+read from the device, will not be accepted.
+
+There are two broad categories of devlink params which had been accepted
+in the past:
+
+ - device-specific configuration knobs, which cannot be inferred from
+   other device configuration. Note that the author is expected to study
+   other drivers to make sure that the configuration is in fact unique
+   to the implementation.
+
+ - configuration which must be set at device initialization time.
+   Allowing user to enable features at runtime is always preferable
+   but in reality most devices allow certain features to be enabled/disabled
+   only by changing configuration stored in NVM.
Jiri Pirko Feb. 27, 2024, 12:17 p.m. UTC | #11
Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>> >Do you recall any specific param that got rejected from mlx5?
>> >Y'all were allowed to add the eq sizing params, which I think
>> >is not going to be mlx5-only for long. Otherwise I only remember
>> >cases where I'd try to push people to use the resource API, which
>> >IMO is better for setting limits and delegating resources.  
>> 
>> I don't have anything solid in mind, I would have to look it up. But
>> there is certainly quite big amount of uncertainties among my
>> colleagues to judge if some param would or would not be acceptable to
>> you. That's why I believe it would save a lot of people time to write
>> the policy down in details, with examples, etc. Could you please?
>
>How about this? (BTW took me half an hour to write, just in case 
>you're wondering)
>
>diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>index 4e01dc32bc08..f1eef6d065be 100644
>--- a/Documentation/networking/devlink/devlink-params.rst
>+++ b/Documentation/networking/devlink/devlink-params.rst
>@@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
> level, it can be used to provide configuration that may affect multiple
> ports on a single device.
> 
>-This document describes a number of generic parameters that are supported
>-across multiple drivers. Each driver is also free to add their own
>-parameters. Each driver must document the specific parameters they support,
>-whether generic or not.
>+There are two categories of devlink parameters - generic parameters
>+and device-specific quirks. Generic devlink parameters are configuration
>+knobs which don't fit into any larger API, but are supported across multiple
>+drivers. This document describes a number of generic parameters.
>+Each driver can also add its own parameters, which are documented in driver
>+specific files.
> 
> Configuration modes
> ===================
>@@ -137,3 +139,32 @@ own name.
>    * - ``event_eq_size``
>      - u32
>      - Control the size of asynchronous control events EQ.
>+
>+Adding new params
>+=================
>+
>+Addition of new devlink params is carefully scrutinized upstream.
>+More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>+devlink params should never be used in their place e.g. to allow easier
>+delivery via out-of-tree modules, or to save development time.
>+
>+devlink parameters must always be thoroughly documented, both from technical
>+perspective (to allow meaningful upstream review), and from user perspective
>+(to allow users to make informed decisions).
>+
>+The requirements above should make it obvious that any "automatic" /
>+"pass-through" registration of devlink parameters, based on strings
>+read from the device, will not be accepted.
>+
>+There are two broad categories of devlink params which had been accepted
>+in the past:
>+
>+ - device-specific configuration knobs, which cannot be inferred from
>+   other device configuration. Note that the author is expected to study
>+   other drivers to make sure that the configuration is in fact unique
>+   to the implementation.
>+
>+ - configuration which must be set at device initialization time.
>+   Allowing user to enable features at runtime is always preferable
>+   but in reality most devices allow certain features to be enabled/disabled
>+   only by changing configuration stored in NVM.

Looks like most of the generic params do not fit either of these 2
categories. Did you mean these 2 categories for driver-specific ones?
Przemek Kitszel Feb. 27, 2024, 1:05 p.m. UTC | #12
On 2/27/24 13:17, Jiri Pirko wrote:
> Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>> On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>>>> Do you recall any specific param that got rejected from mlx5?
>>>> Y'all were allowed to add the eq sizing params, which I think
>>>> is not going to be mlx5-only for long. Otherwise I only remember
>>>> cases where I'd try to push people to use the resource API, which
>>>> IMO is better for setting limits and delegating resources.
>>>
>>> I don't have anything solid in mind, I would have to look it up. But
>>> there is certainly quite big amount of uncertainties among my
>>> colleagues to judge if some param would or would not be acceptable to
>>> you. That's why I believe it would save a lot of people time to write
>>> the policy down in details, with examples, etc. Could you please?
>>
>> How about this? (BTW took me half an hour to write, just in case
>> you're wondering)

Thank you!

>>
>> diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>> index 4e01dc32bc08..f1eef6d065be 100644
>> --- a/Documentation/networking/devlink/devlink-params.rst
>> +++ b/Documentation/networking/devlink/devlink-params.rst
>> @@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
>> level, it can be used to provide configuration that may affect multiple
>> ports on a single device.
>>
>> -This document describes a number of generic parameters that are supported
>> -across multiple drivers. Each driver is also free to add their own
>> -parameters. Each driver must document the specific parameters they support,
>> -whether generic or not.
>> +There are two categories of devlink parameters - generic parameters
>> +and device-specific quirks. Generic devlink parameters are configuration
>> +knobs which don't fit into any larger API, but are supported across multiple

re Jiri: Generic ones are described here.

>> +drivers. This document describes a number of generic parameters.
>> +Each driver can also add its own parameters, which are documented in driver
>> +specific files.
>>
>> Configuration modes
>> ===================
>> @@ -137,3 +139,32 @@ own name.
>>     * - ``event_eq_size``
>>       - u32
>>       - Control the size of asynchronous control events EQ.
>> +
>> +Adding new params
>> +=================
>> +
>> +Addition of new devlink params is carefully scrutinized upstream.
>> +More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>> +devlink params should never be used in their place e.g. to allow easier
>> +delivery via out-of-tree modules, or to save development time.
>> +
>> +devlink parameters must always be thoroughly documented, both from technical
>> +perspective (to allow meaningful upstream review), and from user perspective
>> +(to allow users to make informed decisions).
>> +
>> +The requirements above should make it obvious that any "automatic" /
>> +"pass-through" registration of devlink parameters, based on strings
>> +read from the device, will not be accepted.
>> +
>> +There are two broad categories of devlink params which had been accepted
>> +in the past:
>> +
>> + - device-specific configuration knobs, which cannot be inferred from
>> +   other device configuration. Note that the author is expected to study
>> +   other drivers to make sure that the configuration is in fact unique
>> +   to the implementation.

What if it would not be unique, should they then proceed to add a generic
(another word would be "common") param, and make the other driver/s use
it? Without deprecating the old method ofc.

What about knob being vendor specific, but given vendor has multiple,
very similar drivers? (ugh)

>> +
>> + - configuration which must be set at device initialization time.
>> +   Allowing user to enable features at runtime is always preferable
>> +   but in reality most devices allow certain features to be enabled/disabled
>> +   only by changing configuration stored in NVM.
> 
> Looks like most of the generic params does not fit either of these 2
> categories. Did you mean these 2 categories for driver specific?

If you mean the two paragraphs above (both started by "-"), this is for
vendor specific knobs, and reads fine.
Jiri Pirko Feb. 27, 2024, 3:39 p.m. UTC | #13
Tue, Feb 27, 2024 at 02:05:45PM CET, przemyslaw.kitszel@intel.com wrote:
>On 2/27/24 13:17, Jiri Pirko wrote:
>> Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>> > On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>> > > > Do you recall any specific param that got rejected from mlx5?
>> > > > Y'all were allowed to add the eq sizing params, which I think
>> > > > is not going to be mlx5-only for long. Otherwise I only remember
>> > > > cases where I'd try to push people to use the resource API, which
>> > > > IMO is better for setting limits and delegating resources.
>> > > 
>> > > I don't have anything solid in mind, I would have to look it up. But
>> > > there is certainly quite big amount of uncertainties among my
>> > > colleagues to judge if some param would or would not be acceptable to
>> > > you. That's why I believe it would save a lot of people time to write
>> > > the policy down in details, with examples, etc. Could you please?
>> > 
>> > How about this? (BTW took me half an hour to write, just in case
>> > you're wondering)
>
>Thank you!
>
>> > 
>> > diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>> > index 4e01dc32bc08..f1eef6d065be 100644
>> > --- a/Documentation/networking/devlink/devlink-params.rst
>> > +++ b/Documentation/networking/devlink/devlink-params.rst
>> > @@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
>> > level, it can be used to provide configuration that may affect multiple
>> > ports on a single device.
>> > 
>> > -This document describes a number of generic parameters that are supported
>> > -across multiple drivers. Each driver is also free to add their own
>> > -parameters. Each driver must document the specific parameters they support,
>> > -whether generic or not.
>> > +There are two categories of devlink parameters - generic parameters
>> > +and device-specific quirks. Generic devlink parameters are configuration
>> > +knobs which don't fit into any larger API, but are supported across multiple
>
>re Jiri: Generic ones are described here.
>
>> > +drivers. This document describes a number of generic parameters.
>> > +Each driver can also add its own parameters, which are documented in driver
>> > +specific files.
>> > 
>> > Configuration modes
>> > ===================
>> > @@ -137,3 +139,32 @@ own name.
>> >     * - ``event_eq_size``
>> >       - u32
>> >       - Control the size of asynchronous control events EQ.
>> > +
>> > +Adding new params
>> > +=================
>> > +
>> > +Addition of new devlink params is carefully scrutinized upstream.
>> > +More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>> > +devlink params should never be used in their place e.g. to allow easier
>> > +delivery via out-of-tree modules, or to save development time.
>> > +
>> > +devlink parameters must always be thoroughly documented, both from technical
>> > +perspective (to allow meaningful upstream review), and from user perspective
>> > +(to allow users to make informed decisions).
>> > +
>> > +The requirements above should make it obvious that any "automatic" /
>> > +"pass-through" registration of devlink parameters, based on strings
>> > +read from the device, will not be accepted.
>> > +
>> > +There are two broad categories of devlink params which had been accepted
>> > +in the past:
>> > +
>> > + - device-specific configuration knobs, which cannot be inferred from
>> > +   other device configuration. Note that the author is expected to study
>> > +   other drivers to make sure that the configuration is in fact unique
>> > +   to the implementation.
>
>What if it would not be unique, should they then proceed to add generic
>(other word would be "common") param, and make the other driver/s use
>it? Without deprecating the old method ofc.
>
>What about knob being vendor specific, but given vendor has multiple,
>very similar drivers? (ugh)
>
>> > +
>> > + - configuration which must be set at device initialization time.
>> > +   Allowing user to enable features at runtime is always preferable
>> > +   but in reality most devices allow certain features to be enabled/disabled
>> > +   only by changing configuration stored in NVM.
>> 
>> Looks like most of the generic params does not fit either of these 2
>> categories. Did you mean these 2 categories for driver specific?
>
>If you mean the two paragraphs above (both started by "-"), this is for
>vendor specific knobs, and reads fine.

Do you assume that, or did you read it somewhere? I don't see it. I have
the same assumption though :)
Andrew Lunn Feb. 27, 2024, 3:41 p.m. UTC | #14
> What if it would not be unique, should they then proceed to add generic
> (other word would be "common") param, and make the other driver/s use
> it? Without deprecating the old method ofc.

If it is useful, somebody else will copy it and it will become
common. If nobody copies it, it's probably not useful :-)

A lot of what we do in networking comes from standards. It's the
standards which give us interoperability. Also, there is the saying,
form follows function. There are only so many ways you can implement
the same thing.

Is anybody truly building unique hardware, whose form somehow does not
follow function and is yet still standards compliant? More likely,
they are just the first, and others will copy or re-invent it sooner
or later.

So for me, unique is a pretty high bar to reach.

	Andrew
Jiri Pirko Feb. 27, 2024, 4:04 p.m. UTC | #15
Tue, Feb 27, 2024 at 04:41:52PM CET, andrew@lunn.ch wrote:
>> What if it is not unique? Should they then proceed to add a generic
>> (in other words, "common") param and make the other driver(s) use
>> it? Without deprecating the old method, of course.
>
>If it is useful, somebody else will copy it and it will become
>common. If nobody copies it, it's probably not useful :-)
>
>A lot of what we do in networking comes from standards. It's the
>standards that give us interoperability. Also, there is the saying:
>form follows function. There are only so many ways you can implement
>the same thing.
>
>Is anybody truly building unique hardware, whose form somehow does not
>follow function and is yet still standards-compliant? More likely,
>they are just the first, and others will copy or re-invent it sooner
>or later.

Wait, a standard in the protocol sense is completely parallel to the hw/fw
implementations. There may be (and in reality there are) lots of
tunables to tweak specific hw/fw internals. So modern NICs are very
unique, while still providing the same inputs and outputs, protocol-wise.


>
>So for me, unique is a pretty high bar to reach.
>
>	Andrew
>
Andrew Lunn Feb. 27, 2024, 8:38 p.m. UTC | #16
On Tue, Feb 27, 2024 at 05:04:47PM +0100, Jiri Pirko wrote:
> Tue, Feb 27, 2024 at 04:41:52PM CET, andrew@lunn.ch wrote:
> >> What if it is not unique? Should they then proceed to add a generic
> >> (in other words, "common") param and make the other driver(s) use
> >> it? Without deprecating the old method, of course.
> >
> >If it is useful, somebody else will copy it and it will become
> >common. If nobody copies it, it's probably not useful :-)
> >
> >A lot of what we do in networking comes from standards. It's the
> >standards that give us interoperability. Also, there is the saying:
> >form follows function. There are only so many ways you can implement
> >the same thing.
> >
> >Is anybody truly building unique hardware, whose form somehow does not
> >follow function and is yet still standards-compliant? More likely,
> >they are just the first, and others will copy or re-invent it sooner
> >or later.
> 
> Wait, a standard in the protocol sense is completely parallel to the hw/fw
> implementations. There may be (and in reality there are) lots of
> tunables to tweak specific hw/fw internals. So modern NICs are very
> unique, while still providing the same inputs and outputs, protocol-wise.

I think there is a general trickle-down of technologies. What is top
of the line now becomes normal and everyday in 5-10 years' time. Think
of a top-of-the-line 10G Ethernet NIC from 10 years ago. Is its design
that different from what you get integrated into today's SoC? Are the
same or similar tunables from 10-year-old top-of-the-line NICs also in
today's everyday SoCs?

Every PC is going to be an AI PC, if you believe the market hype at
the moment. But don't you think every PC will also have a network
processor of some sort in 5-10 years, derived from today's network
processors? It will just be another tile in the SoC, next to the AI
tile, the GPU tile, and the CPU tiles. My guess would be that those
tunables in today's hardware will trickle down into those tiles in 5-10
years, because they have been shown to be useful and they work, so let's
re-use what we have.

       Andrew
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 02102e937b30..4143b50bd15d 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1692,6 +1692,14 @@  struct ice_aqc_nvm {
 };
 
 #define ICE_AQC_NVM_START_POINT			0
+#define ICE_AQC_NVM_TX_TOPO_MOD_ID		0x14B
+
+struct ice_aqc_nvm_tx_topo_user_sel {
+	__le16 length;
+	u8 data;
+#define ICE_AQC_NVM_TX_TOPO_USER_SEL	BIT(4)
+	u8 reserved;
+};
 
 /* NVM Checksum Command (direct, 0x0706) */
 struct ice_aqc_nvm_checksum {
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index cc717175178b..db4872990e51 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -770,6 +770,167 @@  ice_devlink_port_unsplit(struct devlink *devlink, struct devlink_port *port,
 	return ice_devlink_port_split(devlink, port, 1, extack);
 }
 
+/**
+ * ice_get_tx_topo_user_sel - Read user's choice from flash
+ * @pf: pointer to pf structure
+ * @layers: value read from flash will be saved here
+ *
+ * Reads user's preference for Tx Scheduler Topology Tree from PFA TLV.
+ *
+ * Returns zero when read was successful, negative values otherwise.
+ */
+static int ice_get_tx_topo_user_sel(struct ice_pf *pf, u8 *layers)
+{
+	struct ice_aqc_nvm_tx_topo_user_sel usr_sel = {};
+	struct ice_hw *hw = &pf->hw;
+	int err;
+
+	err = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (err)
+		return err;
+
+	err = ice_aq_read_nvm(hw, ICE_AQC_NVM_TX_TOPO_MOD_ID, 0,
+			     sizeof(usr_sel), &usr_sel, true, true, NULL);
+	if (err)
+		goto exit_release_res;
+
+	if (usr_sel.data & ICE_AQC_NVM_TX_TOPO_USER_SEL)
+		*layers = ICE_SCHED_5_LAYERS;
+	else
+		*layers = ICE_SCHED_9_LAYERS;
+
+exit_release_res:
+	ice_release_nvm(hw);
+
+	return err;
+}
+
+/**
+ * ice_update_tx_topo_user_sel - Save user's preference in flash
+ * @pf: pointer to pf structure
+ * @layers: value to be saved in flash
+ *
+ * The "layers" argument defines the user's preference for the number of
+ * layers in the Tx Scheduler Topology Tree. This choice is stored in the
+ * PFA TLV field and is picked up by the driver on the next initialization.
+ *
+ * Returns zero when save was successful, negative values otherwise.
+ */
+static int ice_update_tx_topo_user_sel(struct ice_pf *pf, int layers)
+{
+	struct ice_aqc_nvm_tx_topo_user_sel usr_sel = {};
+	struct ice_hw *hw = &pf->hw;
+	int err;
+
+	err = ice_acquire_nvm(hw, ICE_RES_WRITE);
+	if (err)
+		return err;
+
+	err = ice_aq_read_nvm(hw, ICE_AQC_NVM_TX_TOPO_MOD_ID, 0,
+			      sizeof(usr_sel), &usr_sel, true, true, NULL);
+	if (err)
+		goto exit_release_res;
+
+	if (layers == ICE_SCHED_5_LAYERS)
+		usr_sel.data |= ICE_AQC_NVM_TX_TOPO_USER_SEL;
+	else
+		usr_sel.data &= ~ICE_AQC_NVM_TX_TOPO_USER_SEL;
+
+	err = ice_write_one_nvm_block(pf, ICE_AQC_NVM_TX_TOPO_MOD_ID, 2,
+				      sizeof(usr_sel.data), &usr_sel.data,
+				      true, NULL, NULL);
+	if (err)
+		err = -EIO;
+
+exit_release_res:
+	ice_release_nvm(hw);
+
+	return err;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_get - Get tx_scheduling_layers parameter
+ * @devlink: pointer to the devlink instance
+ * @id: the parameter ID
+ * @ctx: context to store the parameter value
+ *
+ * Returns zero on success and negative value on failure.
+ */
+static int ice_devlink_tx_sched_layers_get(struct devlink *devlink, u32 id,
+					   struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = ice_get_tx_topo_user_sel(pf, &ctx->val.vu8);
+	if (err) {
+		dev_warn(dev, "Failed to read Tx Scheduler Tree - User Selection data from flash\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_set - Set tx_scheduling_layers parameter
+ * @devlink: pointer to the devlink instance
+ * @id: the parameter ID to set
+ * @ctx: context to get the parameter value
+ *
+ * Returns zero on success and negative value on failure.
+ */
+static int ice_devlink_tx_sched_layers_set(struct devlink *devlink, u32 id,
+					   struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = ice_update_tx_topo_user_sel(pf, ctx->val.vu8);
+	if (err)
+		return -EIO;
+
+	dev_warn(dev, "Tx scheduling layers have been changed on this device. You must reboot the system for the change to take effect.\n");
+
+	return 0;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_validate - Validate passed tx_scheduling_layers
+ *                                       parameter value
+ * @devlink: unused pointer to devlink instance
+ * @id: the parameter ID to validate
+ * @val: value to validate
+ * @extack: netlink extended ACK structure
+ *
+ * Supported values are:
+ * - 5 - five-layer Tx Scheduler Topology Tree
+ * - 9 - nine-layer Tx Scheduler Topology Tree
+ *
+ * Returns zero when passed parameter value is supported. Negative value on
+ * error.
+ */
+static int ice_devlink_tx_sched_layers_validate(struct devlink *devlink, u32 id,
+					        union devlink_param_value val,
+					        struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct ice_hw *hw = &pf->hw;
+
+	if (!hw->func_caps.common_cap.tx_sched_topo_comp_mode_en) {
+		NL_SET_ERR_MSG_MOD(extack, "Error: Requested feature is not supported by the FW on this device. Update the FW and run this command again.");
+		return -EOPNOTSUPP;
+	}
+
+	if (val.vu8 != ICE_SCHED_5_LAYERS && val.vu8 != ICE_SCHED_9_LAYERS) {
+		NL_SET_ERR_MSG_MOD(extack, "Error: Wrong number of tx scheduler layers provided.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * ice_tear_down_devlink_rate_tree - removes devlink-rate exported tree
  * @pf: pf struct
@@ -1601,6 +1762,7 @@  static int ice_devlink_loopback_validate(struct devlink *devlink, u32 id,
 enum ice_param_id {
 	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	ICE_DEVLINK_PARAM_ID_LOOPBACK,
+	ICE_DEVLINK_PARAM_ID_TX_BALANCE,
 };
 
 static const struct devlink_param ice_devlink_params[] = {
@@ -1618,6 +1780,13 @@  static const struct devlink_param ice_devlink_params[] = {
 			     ice_devlink_loopback_get,
 			     ice_devlink_loopback_set,
 			     ice_devlink_loopback_validate),
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_TX_BALANCE,
+			     "tx_scheduling_layers",
+			     DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+			     ice_devlink_tx_sched_layers_get,
+			     ice_devlink_tx_sched_layers_set,
+			     ice_devlink_tx_sched_layers_validate),
 };
 
 static void ice_devlink_free(void *devlink_ptr)
diff --git a/drivers/net/ethernet/intel/ice/ice_fw_update.c b/drivers/net/ethernet/intel/ice/ice_fw_update.c
index 319a2d6fe26c..f81db6c107c8 100644
--- a/drivers/net/ethernet/intel/ice/ice_fw_update.c
+++ b/drivers/net/ethernet/intel/ice/ice_fw_update.c
@@ -286,10 +286,9 @@  ice_send_component_table(struct pldmfw *context, struct pldmfw_component *compon
  *
  * Returns: zero on success, or a negative error code on failure.
  */
-static int
-ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
-			u16 block_size, u8 *block, bool last_cmd,
-			u8 *reset_level, struct netlink_ext_ack *extack)
+int ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
+			    u16 block_size, u8 *block, bool last_cmd,
+			    u8 *reset_level, struct netlink_ext_ack *extack)
 {
 	u16 completion_module, completion_retval;
 	struct device *dev = ice_pf_to_dev(pf);
diff --git a/drivers/net/ethernet/intel/ice/ice_fw_update.h b/drivers/net/ethernet/intel/ice/ice_fw_update.h
index 750574885716..04b200462757 100644
--- a/drivers/net/ethernet/intel/ice/ice_fw_update.h
+++ b/drivers/net/ethernet/intel/ice/ice_fw_update.h
@@ -9,5 +9,8 @@  int ice_devlink_flash_update(struct devlink *devlink,
 			     struct netlink_ext_ack *extack);
 int ice_get_pending_updates(struct ice_pf *pf, u8 *pending,
 			    struct netlink_ext_ack *extack);
+int ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
+			    u16 block_size, u8 *block, bool last_cmd,
+			    u8 *reset_level, struct netlink_ext_ack *extack);
 
 #endif
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index d4e05d2cb30c..84eab92dc03c 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -18,10 +18,9 @@ 
  *
  * Read the NVM using the admin queue commands (0x0701)
  */
-static int
-ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
-		void *data, bool last_command, bool read_shadow_ram,
-		struct ice_sq_cd *cd)
+int ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset,
+		    u16 length, void *data, bool last_command,
+		    bool read_shadow_ram, struct ice_sq_cd *cd)
 {
 	struct ice_aq_desc desc;
 	struct ice_aqc_nvm *cmd;
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index 774c2317967d..63cdc6bdac58 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -14,6 +14,9 @@  struct ice_orom_civd_info {
 
 int ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access);
 void ice_release_nvm(struct ice_hw *hw);
+int ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset,
+		    u16 length, void *data, bool last_command,
+		    bool read_shadow_ram, struct ice_sq_cd *cd);
 int
 ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 		  bool read_shadow_ram);
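For reference, the user-selection logic in the patch above boils down to a single bit, ICE_AQC_NVM_TX_TOPO_USER_SEL (BIT(4)), in the TLV's `data` byte. The following Python sketch (illustrative only, not driver code; the constants mirror the patch, the function names are made up) shows the mapping that ice_get_tx_topo_user_sel() and ice_update_tx_topo_user_sel() implement:

```python
# Constants taken from the patch (ice_adminq_cmd.h / ice_sched defines).
ICE_AQC_NVM_TX_TOPO_USER_SEL = 1 << 4  # BIT(4) in the TLV's data byte
ICE_SCHED_5_LAYERS = 5
ICE_SCHED_9_LAYERS = 9

def layers_from_tlv(data: int) -> int:
    """Mirror of ice_get_tx_topo_user_sel(): bit set -> 5 layers, clear -> 9."""
    if data & ICE_AQC_NVM_TX_TOPO_USER_SEL:
        return ICE_SCHED_5_LAYERS
    return ICE_SCHED_9_LAYERS

def tlv_from_layers(data: int, layers: int) -> int:
    """Mirror of ice_update_tx_topo_user_sel(): read-modify-write of the
    selection bit, preserving the other bits of the data byte."""
    if layers == ICE_SCHED_5_LAYERS:
        return data | ICE_AQC_NVM_TX_TOPO_USER_SEL
    return data & ~ICE_AQC_NVM_TX_TOPO_USER_SEL
```

As in the driver, the set path first reads the current byte, flips only the selection bit, and writes the byte back, so unrelated bits in the TLV survive the update; the new value then takes effect on the next init, matching the permanent cmode of the devlink param.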