[net-next,0/2] ice: support FEC automatic disable

Message ID	20220823150438.3613327-1-jacob.e.keller@intel.com (mailing list archive)
Headers	show Return-Path: <netdev-owner@kernel.org> From: Jacob Keller <jacob.e.keller@intel.com> To: netdev@vger.kernel.org Cc: Jacob Keller <jacob.e.keller@intel.com> Subject: [PATCH net-next 0/2] ice: support FEC automatic disable Date: Tue, 23 Aug 2022 08:04:36 -0700 Message-Id: <20220823150438.3613327-1-jacob.e.keller@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	ice: support FEC automatic disable \| expand [net-next,0/2] ice: support FEC automatic disable [net-next,1/2] ethtool: pass netlink extended ACK to .set_fecparam [net-next,2/2] ice: add support for Auto FEC with FEC disabled via ETHTOOL_SFECPARAM

Jacob Keller Aug. 23, 2022, 3:04 p.m. UTC

This series implements support for users to configure automatic FEC
selection including the option of disabling FEC. It implements similar
behavior as a previous submission we made at

https://lore.kernel.org/netdev/20220714180311.933648-1-anthony.l.nguyen@intel.com/

This implementation varies, in that we now honor ETHTOOL_FEC_AUTO |
ETHTOOL_FEC_OFF as the new automatic plus disable mode.

This is in line with the request Jakub made to avoid using a new private
flag. I opted to use a bit-wise or of the two already supported flags rather
than trying to introduce a new flag.

I think this makes sense as essentially this is a request to automatically
select but also include "off" as a possible option. I'm not sure if this is
the best approach, but it seemed better than trying to add a new
ETHTOOL_FEC_AUTO_DISABLE or similarly confusing option. The need for this is
due to the quirk of how the ice firmware Link Establishment State Machine
works to decide what FEC mode to use.

Current userspace and the API already support multiple bit selection, though
it does have the downside of not guaranteeing consistency across drivers...
I'm open to alternative suggestions and implementations if someone has a
better suggestion.

Some alternatives we've considered already:

1) use a private flag

  Rejected for good reason, as private flags are difficult to discover and
  vary wildly across drivers. It also makes the driver behave differently to
  the same userspace request which may not be obvious to applications.

2) always treat ETHTOOL_FEC_AUTO as "automatic + allow disable"

  This could work, but it means that behavior will differ depending on the
  firmware version. Users have no way to know that and might be surprised to
  find the behavior differ across devices which have different firmware
  which do or don't support this variation of automatic selection.

2) introduce a new FEC mode to the ETHTOOL interface

  I considered just adding a brand new flag, but choosing a name here is
  relatively difficult. Most names read as some sort of "disable automatic
  selection" which isn't the best meaning.

3) use combined ETHTOOL_FEC_AUTO | ETHTOOL_FEC_OFF (this series)

  This version simply accepts a combined bitwise OR of ETHTOOL_FEC_AUTO and
  ETHTOOL_FEC_OFF. This was previously rejected by ice so it should not
  cause compatibility issues. The API already supports it, though it was
  noted the semantics of this combination are not well defined and could
  behave differently across drivers.

  This version has the downside of not being explicit in the API now since
  drivers may not all interpret this combination the same way. Thats
  understandably a concern, but I'm not sure what the best approach to avoid
  that here.

  This version does allow users to explicitly request the new behavior, and
  allows reporting an error when firmware can't support it.

To aid in reporting errors to userspace, I also extended the .set_fecparam
to take the netlink extended ACK struct. This allows directly reporting why
the option didn't take when using the netlink backed interface for ethtool.


Jacob Keller (2):
  ethtool: pass netlink extended ACK to .set_fecparam
  ice: add support for Auto FEC with FEC disabled via ETHTOOL_SFECPARAM

 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  3 +-
 .../ethernet/cavium/liquidio/lio_ethtool.c    |  3 +-
 .../ethernet/chelsio/cxgb4/cxgb4_ethtool.c    |  3 +-
 .../ethernet/fungible/funeth/funeth_ethtool.c |  3 +-
 .../ethernet/hisilicon/hns3/hns3_ethtool.c    |  3 +-
 .../net/ethernet/intel/i40e/i40e_ethtool.c    |  3 +-
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  1 +
 drivers/net/ethernet/intel/ice/ice_common.c   | 54 ++++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_common.h   |  1 +
 drivers/net/ethernet/intel/ice/ice_ethtool.c  | 15 ++++--
 drivers/net/ethernet/intel/ice/ice_main.c     |  3 +-
 drivers/net/ethernet/intel/ice/ice_type.h     |  9 +++-
 .../marvell/octeontx2/nic/otx2_ethtool.c      |  3 +-
 .../marvell/prestera/prestera_ethtool.c       |  3 +-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  3 +-
 .../ethernet/netronome/nfp/nfp_net_ethtool.c  |  3 +-
 .../ethernet/pensando/ionic/ionic_ethtool.c   |  3 +-
 .../net/ethernet/qlogic/qede/qede_ethtool.c   |  3 +-
 drivers/net/ethernet/sfc/ethtool_common.c     |  3 +-
 drivers/net/ethernet/sfc/ethtool_common.h     |  3 +-
 .../net/ethernet/sfc/siena/ethtool_common.c   |  3 +-
 .../net/ethernet/sfc/siena/ethtool_common.h   |  3 +-
 drivers/net/netdevsim/ethtool.c               |  3 +-
 include/linux/ethtool.h                       |  3 +-
 net/ethtool/fec.c                             |  2 +-
 net/ethtool/ioctl.c                           |  2 +-
 26 files changed, 115 insertions(+), 26 deletions(-)


base-commit: 90b3bee3a23249977852079b908270afc6ee03bb

Gal Pressman Aug. 24, 2022, 1:35 p.m. UTC | #1

On 23/08/2022 18:04, Jacob Keller wrote:
> 2) always treat ETHTOOL_FEC_AUTO as "automatic + allow disable"
>
>   This could work, but it means that behavior will differ depending on the
>   firmware version. Users have no way to know that and might be surprised to
>   find the behavior differ across devices which have different firmware
>   which do or don't support this variation of automatic selection.

Hi Jacob,
This is exactly how it's already implemented in mlx5, and I don't really
understand how firmware version is related? Is it specific to your
device firmware?
Maybe you can workaround that in the driver?

I feel like we're going the wrong way here having different flags
interpretations by different drivers.

Jakub Kicinski Aug. 24, 2022, 4:25 p.m. UTC | #2

On Wed, 24 Aug 2022 16:35:41 +0300 Gal Pressman wrote:
> On 23/08/2022 18:04, Jacob Keller wrote:
> > 2) always treat ETHTOOL_FEC_AUTO as "automatic + allow disable"
> >
> >   This could work, but it means that behavior will differ depending on the
> >   firmware version. Users have no way to know that and might be surprised to
> >   find the behavior differ across devices which have different firmware
> >   which do or don't support this variation of automatic selection.  
> 
> Hi Jacob,
> This is exactly how it's already implemented in mlx5, and I don't really
> understand how firmware version is related? Is it specific to your
> device firmware?
> Maybe you can workaround that in the driver?
> 
> I feel like we're going the wrong way here having different flags
> interpretations by different drivers.

Hm, according to my notes the drivers supporting a single bit and
multiple bits were evenly split when I wrote the docs. Either way
we're going to make someone unhappy?

While we're talking about mlx5 FEC, what does it report on get?
I wasn't sure if it reports the supported or configured mode.

Jacob Keller Aug. 24, 2022, 5:40 p.m. UTC | #3

> -----Original Message-----
> From: Gal Pressman <gal@nvidia.com>
> Sent: Wednesday, August 24, 2022 6:36 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
> 
> On 23/08/2022 18:04, Jacob Keller wrote:
> > 2) always treat ETHTOOL_FEC_AUTO as "automatic + allow disable"
> >
> >   This could work, but it means that behavior will differ depending on the
> >   firmware version. Users have no way to know that and might be surprised to
> >   find the behavior differ across devices which have different firmware
> >   which do or don't support this variation of automatic selection.
> 
> Hi Jacob,
> This is exactly how it's already implemented in mlx5, and I don't really
> understand how firmware version is related? Is it specific to your
> device firmware?
> Maybe you can workaround that in the driver?

For ice, the original "auto" implementation (which is handled by firmware) will automatically select an FEC mode based on the media type and using a state machine to go through options until it finds a valid link.

This implementation would never select "No FEC" (i.e. FEC_OFF) for certain module types which do not list "No FEC" as part of their auto negotiation supported list. (Despite this not actually being auto negotiation). Some of our customers were surprised by this and asked if we could change it, so new firmware has an option to allow choosing "No FEC". This is an "opt-in" that the driver must tell firmware when setting up FEC mode. This obviously is only available on newer firmware. Going with option 2 would result in differing behavior depending on what firmware and driver you're using.

I thought that was a bit confusing since userspace/users would not know which variant is in use, and my understanding from our customer engineers is that we don't want to change the behavior without an explicit request of some kind. That's where the original private flag came from.

As for working around this in the driver, I am not sure how feasible that is.

> 
> I feel like we're going the wrong way here having different flags
> interpretations by different drivers.

I would prefer to avoid that as well

Thanks,
Jake

Gal Pressman Aug. 25, 2022, 7:08 a.m. UTC | #4

On 24/08/2022 20:40, Keller, Jacob E wrote:
>
>> -----Original Message-----
>> From: Gal Pressman <gal@nvidia.com>
>> Sent: Wednesday, August 24, 2022 6:36 AM
>> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org
>> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
>>
>> On 23/08/2022 18:04, Jacob Keller wrote:
>>> 2) always treat ETHTOOL_FEC_AUTO as "automatic + allow disable"
>>>
>>>   This could work, but it means that behavior will differ depending on the
>>>   firmware version. Users have no way to know that and might be surprised to
>>>   find the behavior differ across devices which have different firmware
>>>   which do or don't support this variation of automatic selection.
>> Hi Jacob,
>> This is exactly how it's already implemented in mlx5, and I don't really
>> understand how firmware version is related? Is it specific to your
>> device firmware?
>> Maybe you can workaround that in the driver?
> For ice, the original "auto" implementation (which is handled by firmware) will automatically select an FEC mode based on the media type and using a state machine to go through options until it finds a valid link.
>
> This implementation would never select "No FEC" (i.e. FEC_OFF) for certain module types which do not list "No FEC" as part of their auto negotiation supported list. (Despite this not actually being auto negotiation). Some of our customers were surprised by this and asked if we could change it, so new firmware has an option to allow choosing "No FEC". This is an "opt-in" that the driver must tell firmware when setting up FEC mode. This obviously is only available on newer firmware. Going with option 2 would result in differing behavior depending on what firmware and driver you're using.
>
> I thought that was a bit confusing since userspace/users would not know which variant is in use, and my understanding from our customer engineers is that we don't want to change the behavior without an explicit request of some kind. That's where the original private flag came from.
>
> As for working around this in the driver, I am not sure how feasible that is.
>
>> I feel like we're going the wrong way here having different flags
>> interpretations by different drivers.
> I would prefer to avoid that as well

Then maybe adding a new flag is the right thing to do here.

That way the existing auto mode will keep its current meaning (all modes
including off), which you'll be able to support on newer firmware
versions, and the new auto flag (all modes excluding off) will be
supported on all firmware versions.
Then maybe we can even add support for the new flag in mlx5 (I need to
check whether that's feasible with our hardware).

Gal Pressman Aug. 25, 2022, 7:08 a.m. UTC | #5

On 24/08/2022 19:25, Jakub Kicinski wrote:
> On Wed, 24 Aug 2022 16:35:41 +0300 Gal Pressman wrote:
>> On 23/08/2022 18:04, Jacob Keller wrote:
>>> 2) always treat ETHTOOL_FEC_AUTO as "automatic + allow disable"
>>>
>>>   This could work, but it means that behavior will differ depending on the
>>>   firmware version. Users have no way to know that and might be surprised to
>>>   find the behavior differ across devices which have different firmware
>>>   which do or don't support this variation of automatic selection.  
>> Hi Jacob,
>> This is exactly how it's already implemented in mlx5, and I don't really
>> understand how firmware version is related? Is it specific to your
>> device firmware?
>> Maybe you can workaround that in the driver?
>>
>> I feel like we're going the wrong way here having different flags
>> interpretations by different drivers.
> Hm, according to my notes the drivers supporting a single bit and
> multiple bits were evenly split when I wrote the docs. Either way
> we're going to make someone unhappy?

IMHO, passing both auto and off is confusing.
BTW, I think this is also inconsistent with the direction SONiC took (or
is taking?) with the auto fec mode interpretation?

> While we're talking about mlx5 FEC, what does it report on get?
> I wasn't sure if it reports the supported or configured mode.

I think it's the configured mode and the active mode.

Jakub Kicinski Aug. 25, 2022, 4:29 p.m. UTC | #6

On Thu, 25 Aug 2022 10:08:05 +0300 Gal Pressman wrote:
> Then maybe adding a new flag is the right thing to do here.
> 
> That way the existing auto mode will keep its current meaning (all modes
> including off), which you'll be able to support on newer firmware
> versions, and the new auto flag (all modes excluding off) will be
> supported on all firmware versions.
> Then maybe we can even add support for the new flag in mlx5 (I need to
> check whether that's feasible with our hardware).

Sorry, I misinterpreted your previous reply, somehow I thought you
quoted option (3), because my fallible reading of mlx5 was that it
accepts multiple flags.

(First) option 2 is fine.

Do you happen to have a link to what SONiC defined?
We really need to establish some expectations before we start extending
the API. Naively I thought the IEEE spec was more prescriptive :(

Jacob Keller Aug. 25, 2022, 4:57 p.m. UTC | #7

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 25, 2022 9:30 AM
> To: Gal Pressman <gal@nvidia.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Saeed Mahameed
> <saeedm@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
> 
> On Thu, 25 Aug 2022 10:08:05 +0300 Gal Pressman wrote:
> > Then maybe adding a new flag is the right thing to do here.
> >
> > That way the existing auto mode will keep its current meaning (all modes
> > including off), which you'll be able to support on newer firmware
> > versions, and the new auto flag (all modes excluding off) will be
> > supported on all firmware versions.
> > Then maybe we can even add support for the new flag in mlx5 (I need to
> > check whether that's feasible with our hardware).
> 
> Sorry, I misinterpreted your previous reply, somehow I thought you
> quoted option (3), because my fallible reading of mlx5 was that it
> accepts multiple flags.
> 
> (First) option 2 is fine.
> 


Even though existing behavior doesn't do that for ice right now and wouldn't be able to do that properly with old firmware?

Thanks,
Jake

> Do you happen to have a link to what SONiC defined?
> We really need to establish some expectations before we start extending
> the API. Naively I thought the IEEE spec was more prescriptive :(

Yea its a bit unfortunate :(

Jakub Kicinski Aug. 25, 2022, 5:30 p.m. UTC | #8

On Thu, 25 Aug 2022 16:57:50 +0000 Keller, Jacob E wrote:
> > Sorry, I misinterpreted your previous reply, somehow I thought you
> > quoted option (3), because my fallible reading of mlx5 was that it
> > accepts multiple flags.
> > 
> > (First) option 2 is fine.
> 
> Even though existing behavior doesn't do that for ice right now and
> wouldn't be able to do that properly with old firmware?

Update your FW seems like a reasonable thing to ask customers to do.
Are there cables already qualified for the old FW which would switch
to No FEC under new rules?

Can you share how your FW picks the mode exactly?

There must be _some_ standardization here, because we're talking about
<5m cables, so we can safely assume it's linking to a ToR switch.

Jacob Keller Aug. 25, 2022, 5:51 p.m. UTC | #9

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 25, 2022 10:30 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Gal Pressman <gal@nvidia.com>; Saeed Mahameed <saeedm@nvidia.com>;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
> 
> On Thu, 25 Aug 2022 16:57:50 +0000 Keller, Jacob E wrote:
> > > Sorry, I misinterpreted your previous reply, somehow I thought you
> > > quoted option (3), because my fallible reading of mlx5 was that it
> > > accepts multiple flags.
> > >
> > > (First) option 2 is fine.
> >
> > Even though existing behavior doesn't do that for ice right now and
> > wouldn't be able to do that properly with old firmware?
> 
> Update your FW seems like a reasonable thing to ask customers to do.
> Are there cables already qualified for the old FW which would switch
> to No FEC under new rules?

Not sure I follow what you're asking here.

> 
> Can you share how your FW picks the mode exactly?
> 

I'm not entirely sure how it selects, other than it selects from the table of supported modes. It uses some state machine to go through options until a suitable link is made, but the details of how exactly it does this I'm not quite sure.

The old firmware never picks "No FEC" for some media types, because the standard was that those types would always require FEC of some kind (R or RS). However in practice the modules can work with no FEC anyways, and according to customer reports enabling this has helped with linking issues. That's the sum of my understanding thus far.

I would prefer this option of just "auto means also possibly disable", but I wasn't sure how other devices worked and we had some concerns about the change in behavior. Going with this option would significantly simplify things since I could bury the "set the auto disable flag if necessary" into a lower level of the code and only expose a single ICE_FEC_AUTO instead of ICE_FEC_AUTO_DIS...

Thus, I'm happy to respin this, as the new behavior when supported by firmware if we have some consensus there. I am happy to drop or respin the extack changes if you think thats still valuable as well in the event we need to extend that API in the future.

> There must be _some_ standardization here, because we're talking about
> <5m cables, so we can safely assume it's linking to a ToR switch.

I believe so.

Jakub Kicinski Aug. 25, 2022, 8:34 p.m. UTC | #10

On Thu, 25 Aug 2022 17:51:01 +0000 Keller, Jacob E wrote:
> > Update your FW seems like a reasonable thing to ask customers to do.
> > Are there cables already qualified for the old FW which would switch
> > to No FEC under new rules?  
> 
> Not sure I follow what you're asking here.

IIRC your NICs only work with qualified cables. I was asking if any of
the qualified cables would actually negotiate to No FEC under new rules.
Maybe such cables are very rare and there's no need to be super careful?

> > Can you share how your FW picks the mode exactly?
> 
> I'm not entirely sure how it selects, other than it selects from the
> table of supported modes. It uses some state machine to go through
> options until a suitable link is made, but the details of how exactly
> it does this I'm not quite sure.

State machine? So you're trying different FEC modes or reading module
data and picking one?

Various NICs do either or both, but I believe AUTO means the former.

> The old firmware never picks "No FEC" for some media types, because
> the standard was that those types would always require FEC of some
> kind (R or RS). However in practice the modules can work with no FEC
> anyways, and according to customer reports enabling this has helped
> with linking issues. That's the sum of my understanding thus far.

Well, according to the IEEE standard there sure are cables which don't
need FEC. Maybe your customers had problems linking up because switch
had a different selection algo?

> I would prefer this option of just "auto means also possibly
> disable", but I wasn't sure how other devices worked and we had some
> concerns about the change in behavior. Going with this option would
> significantly simplify things since I could bury the "set the auto
> disable flag if necessary" into a lower level of the code and only
> expose a single ICE_FEC_AUTO instead of ICE_FEC_AUTO_DIS...
> 
> Thus, I'm happy to respin this, as the new behavior when supported by
> firmware if we have some consensus there.

Hard to get consensus if we still don't know what the FW does...
But if there's no new uAPI, just always enabling OFF with AUTO
then I guess I'd have nothing to complain about as I don't know
what other drivers do either.

> I am happy to drop or
> respin the extack changes if you think thats still valuable as well
> in the event we need to extend that API in the future.

Your call. May be useful to do it sooner rather than later, but 
we should find at least one use for the extack as a justification.

> > There must be _some_ standardization here, because we're talking
> > about <5m cables, so we can safely assume it's linking to a ToR
> > switch.  
> 
> I believe so.

Jacob Keller Aug. 25, 2022, 9:04 p.m. UTC | #11

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 25, 2022 1:34 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Gal Pressman <gal@nvidia.com>; Saeed Mahameed <saeedm@nvidia.com>;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
> 
> On Thu, 25 Aug 2022 17:51:01 +0000 Keller, Jacob E wrote:
> > > Update your FW seems like a reasonable thing to ask customers to do.
> > > Are there cables already qualified for the old FW which would switch
> > > to No FEC under new rules?
> >
> > Not sure I follow what you're asking here.
> 
> IIRC your NICs only work with qualified cables. I was asking if any of
> the qualified cables would actually negotiate to No FEC under new rules.
> Maybe such cables are very rare and there's no need to be super careful?
> 

I am not sure if that would be related

> > > Can you share how your FW picks the mode exactly?
> >
> > I'm not entirely sure how it selects, other than it selects from the
> > table of supported modes. It uses some state machine to go through
> > options until a suitable link is made, but the details of how exactly
> > it does this I'm not quite sure.
> 
> State machine? So you're trying different FEC modes or reading module
> data and picking one?
> 

I believe it is trying different modes, but it is selecting from module data for what to try.

> Various NICs do either or both, but I believe AUTO means the former.
> 
> > The old firmware never picks "No FEC" for some media types, because
> > the standard was that those types would always require FEC of some
> > kind (R or RS). However in practice the modules can work with no FEC
> > anyways, and according to customer reports enabling this has helped
> > with linking issues. That's the sum of my understanding thus far.
> 
> Well, according to the IEEE standard there sure are cables which don't
> need FEC. Maybe your customers had problems linking up because switch
> had a different selection algo?
> 

Right. I believe its because some combinations in their module data don't list No FEC, but still work with No FEC. The understanding I got from the firmware person I've asked was that the list of whats supported is recorded in the module somehow.

> > I would prefer this option of just "auto means also possibly
> > disable", but I wasn't sure how other devices worked and we had some
> > concerns about the change in behavior. Going with this option would
> > significantly simplify things since I could bury the "set the auto
> > disable flag if necessary" into a lower level of the code and only
> > expose a single ICE_FEC_AUTO instead of ICE_FEC_AUTO_DIS...
> >
> > Thus, I'm happy to respin this, as the new behavior when supported by
> > firmware if we have some consensus there.
> 
> Hard to get consensus if we still don't know what the FW does...
> But if there's no new uAPI, just always enabling OFF with AUTO
> then I guess I'd have nothing to complain about as I don't know
> what other drivers do either.
> 

It's been frustratingly difficult to get real answers here.

> > I am happy to drop or
> > respin the extack changes if you think thats still valuable as well
> > in the event we need to extend that API in the future.
> 
> Your call. May be useful to do it sooner rather than later, but
> we should find at least one use for the extack as a justification.
> 

There is at least one error in ice, and I'll update other drivers that print error messages too if I find any.

Jacob Keller Aug. 26, 2022, 12:38 a.m. UTC | #12

On 8/25/2022 1:34 PM, Jakub Kicinski wrote:
> Hard to get consensus if we still don't know what the FW does...
> But if there's no new uAPI, just always enabling OFF with AUTO
> then I guess I'd have nothing to complain about as I don't know
> what other drivers do either.
> 
Ok. I think I have a basic summary of the situation and whats going on
in the firmware. I'll try to summarize here:

Firmware has a state machine that we call the Link Establishment State
Machine. This is the state machine which will attempt to establish link.
This state machine only applies when auto-negotiation is not used. If
auto-negotation is used it will perform the standard auto-negotiation
flow and set FEC through that method.

The way this works follows this flow:

1) driver sends a set PHY capabilities request. This includes various
bits including whether to automatically select FEC, and which FEC modes
to select from. When we enable ETHTOOL_FEC_AUTO, the driver always sets
all FEC modes with the auto FEC bit.

2) the firmware receives this request and begins the LESM. This starts
with the firmware generating a list of configurations to attempt. Each
configuration is a possible link mode combined with a bitwise AND of the
FEC modes requested above in set PHY capabiltiies and the set of FEC
modes supported by that link mode. The example I gave was if you plugged
in a CA-L cable, it would try:

  100G-CAUI4
  50G-LAUI2
  25G-AUI-C2C
  10G-SFI-DA

I'm still not 100% sure how it decides which link modes to choose for
which cable, but I believe this is in a table stored within the firmware
module we call the netlist.

2a) for older firmware, the set PHY capabiltiies interface does not have
a bit to set for requesting No FEC. Instead, each media type has a
determination made about whether it needed FEC of not. I was told for
example that CA-N cables would enable No FEC as an option, but CA-L
cables would not (even though No FEC is supported for the link modes in
question).

2b) on newer firmware, the set PHY capabilities interface does have a
bit to request No FEC. In this case, if we set the No FEC bit, then the
firmware will be able to select No FEC as an option for cables that
otherwise wouldn't have selected it in the old firmware (such as CA-L
cables mentioned above).

3) once the firmware has generated the list of possible configurations,
it will iterate through them in a loop. Each configuration is applied,
and then we wait some time (the timeout is also stored in the netlist
module). If link establishes at one of these phases, we stop and use
that configuration. Otherwise we move to the next configuration and try
that. Each FEC mode is tried in sequence. (Unless the automatic FEC
selection bit is *not* set. In that case, only one of the FEC modes is
tried instead, and it is expected that software only set one bit to try.
That would perform forced FEC selection instead).

This process will repeat as it iterates through the configurations until
link is established.

As a side note, the first stage is to try auto-negotiation if enabled.
So in the case where auto-negotiation is enabled it will first try
auto-negotiation, then the set of forced configurations, and then loop
back to trying auto-negotiation before trying the forced configs again.

So from the software programming state, we currently translate
ETHTOOL_FEC_AUTO by setting the automatic bit as well as setting every
FEC mode bit, except the "No FEC" bit. This is a new bit which is only
available on newer firmware.

With the proposed change, we would add the "No FEC" bit when user
requested both ETHTOOL_FEC_AUTO and ETHTOOL_FEC_OFF simultaneously.

From reading your previous replies, you would prefer to just have the
driver set the "No FEC" bit always for ETHTOOL_FEC_AUTO when its
available/supported by firmware?

Jakub Kicinski Aug. 26, 2022, 1:01 a.m. UTC | #13

On Thu, 25 Aug 2022 17:38:14 -0700 Jacob Keller wrote:
> On 8/25/2022 1:34 PM, Jakub Kicinski wrote:
> > Hard to get consensus if we still don't know what the FW does...
> > But if there's no new uAPI, just always enabling OFF with AUTO
> > then I guess I'd have nothing to complain about as I don't know
> > what other drivers do either.
> >   
> Ok. I think I have a basic summary of the situation and whats going on
> in the firmware. I'll try to summarize here:
> 
> Firmware has a state machine that we call the Link Establishment State
> Machine. This is the state machine which will attempt to establish link.
> This state machine only applies when auto-negotiation is not used. If
> auto-negotation is used it will perform the standard auto-negotiation
> flow and set FEC through that method.
> 
> The way this works follows this flow:
> 
> 1) driver sends a set PHY capabilities request. This includes various
> bits including whether to automatically select FEC, and which FEC modes
> to select from. When we enable ETHTOOL_FEC_AUTO, the driver always sets
> all FEC modes with the auto FEC bit.
> 
> 2) the firmware receives this request and begins the LESM. This starts
> with the firmware generating a list of configurations to attempt. Each
> configuration is a possible link mode combined with a bitwise AND of the
> FEC modes requested above in set PHY capabiltiies and the set of FEC
> modes supported by that link mode. The example I gave was if you plugged
> in a CA-L cable, it would try:
> 
>   100G-CAUI4
>   50G-LAUI2
>   25G-AUI-C2C
>   10G-SFI-DA
> 
> I'm still not 100% sure how it decides which link modes to choose for
> which cable, but I believe this is in a table stored within the firmware
> module we call the netlist.
> 
> 2a) for older firmware, the set PHY capabiltiies interface does not have
> a bit to set for requesting No FEC. Instead, each media type has a
> determination made about whether it needed FEC of not. I was told for
> example that CA-N cables would enable No FEC as an option, but CA-L
> cables would not (even though No FEC is supported for the link modes in
> question).
> 
> 2b) on newer firmware, the set PHY capabilities interface does have a
> bit to request No FEC. In this case, if we set the No FEC bit, then the
> firmware will be able to select No FEC as an option for cables that
> otherwise wouldn't have selected it in the old firmware (such as CA-L
> cables mentioned above).

Oh, but per the IEEE standard No FEC is _not_ an option for CA-L.
From the initial reading of your series I thought that Intel NICs 
would _never_ pick No FEC.

Sounds like we need a bit for "ignore the standard and try everything".

What about BASE-R FEC? Is the FW going to try it on the CA-L cable?

> 3) once the firmware has generated the list of possible configurations,
> it will iterate through them in a loop. Each configuration is applied,
> and then we wait some time (the timeout is also stored in the netlist
> module). If link establishes at one of these phases, we stop and use
> that configuration. Otherwise we move to the next configuration and try
> that. Each FEC mode is tried in sequence. (Unless the automatic FEC
> selection bit is *not* set. In that case, only one of the FEC modes is
> tried instead, and it is expected that software only set one bit to try.
> That would perform forced FEC selection instead).
> 
> This process will repeat as it iterates through the configurations until
> link is established.
> 
> As a side note, the first stage is to try auto-negotiation if enabled.
> So in the case where auto-negotiation is enabled it will first try
> auto-negotiation, then the set of forced configurations, and then loop
> back to trying auto-negotiation before trying the forced configs again.
> 
> So from the software programming state, we currently translate
> ETHTOOL_FEC_AUTO by setting the automatic bit as well as setting every
> FEC mode bit, except the "No FEC" bit. This is a new bit which is only
> available on newer firmware.
> 
> With the proposed change, we would add the "No FEC" bit when user
> requested both ETHTOOL_FEC_AUTO and ETHTOOL_FEC_OFF simultaneously.
> 
> From reading your previous replies, you would prefer to just have the
> driver set the "No FEC" bit always for ETHTOOL_FEC_AUTO when its
> available/supported by firmware?

Jacob Keller Aug. 26, 2022, 5:51 p.m. UTC | #14

On 8/25/2022 6:01 PM, Jakub Kicinski wrote:
> On Thu, 25 Aug 2022 17:38:14 -0700 Jacob Keller wrote:
>> 2b) on newer firmware, the set PHY capabilities interface does have a
>> bit to request No FEC. In this case, if we set the No FEC bit, then the
>> firmware will be able to select No FEC as an option for cables that
>> otherwise wouldn't have selected it in the old firmware (such as CA-L
>> cables mentioned above).
> 
> Oh, but per the IEEE standard No FEC is _not_ an option for CA-L.
> From the initial reading of your series I thought that Intel NICs 
> would _never_ pick No FEC.
> 

That was my original interpretation when I was first introduced to this
problem but I was mistaken, hence why the commit message wasn't clear :(

This is rather more complicated than I originally understood and the
names for various bits have not been named very well so their behavior
isn't exactly obvious...

> Sounds like we need a bit for "ignore the standard and try everything".
> 
> What about BASE-R FEC? Is the FW going to try it on the CA-L cable?
> 

Ok I got further clarification on this. We have a bit, "Auto FEC
enable", as well as a bitmask for which FEC modes to try.

If "Auto FEC En" is set, then the Link Establishment State Machine will
try all of the FEC options we list in that bitmask, as long as we can
theoretically support them even if they aren't spec compliant.

For old firmware the bitmask didn't include a bit for "No FEC", where as
the new firmware has a bit for "No FEC".

We were always setting "Auto FEC En" so currently we try all FEC modes
we could theoretically support.

If "Auto FEC En" is disabled, then we only try FEC modes which are spec
compliant. Additionally, only a single FEC mode is tried based on a
priority and the bitmask.

Currently and historically the driver has always set "Auto FEC En", so
we were enabling non-spec compliant FEC modes, but "No FEC" was only
based on spec compliance with the media type.

From this, I think I agree the correct behavior is to add a bit for
"override the spec and try everything", and then on new firmware we'd
set the "No FEC" while on old firmware we'd be limited to only trying
FEC modes.

Does that make sense?

So yea I think we do probably need a "ignore the standard" bit.. but
currently that appears to already be what ice does (excepting No FEC
which didn't previously have a bit to set for it)

Thanks,
Jake

Jakub Kicinski Aug. 26, 2022, 11:57 p.m. UTC | #15

On Fri, 26 Aug 2022 10:51:21 -0700 Jacob Keller wrote:
> On 8/25/2022 6:01 PM, Jakub Kicinski wrote:
> > Oh, but per the IEEE standard No FEC is _not_ an option for CA-L.
> > From the initial reading of your series I thought that Intel NICs 
> > would _never_ pick No FEC.
> 
> That was my original interpretation when I was first introduced to this
> problem but I was mistaken, hence why the commit message wasn't clear :(
> 
> This is rather more complicated than I originally understood and the
> names for various bits have not been named very well so their behavior
> isn't exactly obvious...
> 
> > Sounds like we need a bit for "ignore the standard and try everything".
> > 
> > What about BASE-R FEC? Is the FW going to try it on the CA-L cable?
> 
> Ok I got further clarification on this. We have a bit, "Auto FEC
> enable", as well as a bitmask for which FEC modes to try.
> 
> If "Auto FEC En" is set, then the Link Establishment State Machine will
> try all of the FEC options we list in that bitmask, as long as we can
> theoretically support them even if they aren't spec compliant.
> 
> For old firmware the bitmask didn't include a bit for "No FEC", where as
> the new firmware has a bit for "No FEC".
> 
> We were always setting "Auto FEC En" so currently we try all FEC modes
> we could theoretically support.
> 
> If "Auto FEC En" is disabled, then we only try FEC modes which are spec
> compliant. Additionally, only a single FEC mode is tried based on a
> priority and the bitmask.
> 
> Currently and historically the driver has always set "Auto FEC En", so
> we were enabling non-spec compliant FEC modes, but "No FEC" was only
> based on spec compliance with the media type.
> 
> From this, I think I agree the correct behavior is to add a bit for
> "override the spec and try everything", and then on new firmware we'd
> set the "No FEC" while on old firmware we'd be limited to only trying
> FEC modes.
> 
> Does that make sense?
> 
> So yea I think we do probably need a "ignore the standard" bit.. but
> currently that appears to already be what ice does (excepting No FEC
> which didn't previously have a bit to set for it)

Thanks for getting to the bottom of this :)

The "override spec modes" bit sounds like a reasonable addition,
since we possibly have different behavior between vendors letting 
the user know if the device will follow the rules can save someone
debugging time.

But it does sound orthogonal to you adding the No FEC bit to the mask
for ice.

Let me add Simon and Andy so that we have the major vendors on the CC.
(tl;dr the question is whether your FW follows the guidance of 
'Table 110C–1—Host and cable assembly combinations' in AUTO FEC mode).

If all the vendors already ignore the standard (like Intel and AFAIU
nVidia) then we just need to document, no point adding the bit...

Gal Pressman Aug. 28, 2022, 10:42 a.m. UTC | #16

On 27/08/2022 02:57, Jakub Kicinski wrote:
> On Fri, 26 Aug 2022 10:51:21 -0700 Jacob Keller wrote:
>> On 8/25/2022 6:01 PM, Jakub Kicinski wrote:
>>> Oh, but per the IEEE standard No FEC is _not_ an option for CA-L.
>>> From the initial reading of your series I thought that Intel NICs 
>>> would _never_ pick No FEC.
>> That was my original interpretation when I was first introduced to this
>> problem but I was mistaken, hence why the commit message wasn't clear :(
>>
>> This is rather more complicated than I originally understood and the
>> names for various bits have not been named very well so their behavior
>> isn't exactly obvious...
>>
>>> Sounds like we need a bit for "ignore the standard and try everything".
>>>
>>> What about BASE-R FEC? Is the FW going to try it on the CA-L cable?
>> Ok I got further clarification on this. We have a bit, "Auto FEC
>> enable", as well as a bitmask for which FEC modes to try.
>>
>> If "Auto FEC En" is set, then the Link Establishment State Machine will
>> try all of the FEC options we list in that bitmask, as long as we can
>> theoretically support them even if they aren't spec compliant.
>>
>> For old firmware the bitmask didn't include a bit for "No FEC", where as
>> the new firmware has a bit for "No FEC".
>>
>> We were always setting "Auto FEC En" so currently we try all FEC modes
>> we could theoretically support.
>>
>> If "Auto FEC En" is disabled, then we only try FEC modes which are spec
>> compliant. Additionally, only a single FEC mode is tried based on a
>> priority and the bitmask.
>>
>> Currently and historically the driver has always set "Auto FEC En", so
>> we were enabling non-spec compliant FEC modes, but "No FEC" was only
>> based on spec compliance with the media type.
>>
>> From this, I think I agree the correct behavior is to add a bit for
>> "override the spec and try everything", and then on new firmware we'd
>> set the "No FEC" while on old firmware we'd be limited to only trying
>> FEC modes.
>>
>> Does that make sense?
>>
>> So yea I think we do probably need a "ignore the standard" bit.. but
>> currently that appears to already be what ice does (excepting No FEC
>> which didn't previously have a bit to set for it)
> Thanks for getting to the bottom of this :)
>
> The "override spec modes" bit sounds like a reasonable addition,
> since we possibly have different behavior between vendors letting 
> the user know if the device will follow the rules can save someone
> debugging time.
>
> But it does sound orthogonal to you adding the No FEC bit to the mask
> for ice.
>
> Let me add Simon and Andy so that we have the major vendors on the CC.
> (tl;dr the question is whether your FW follows the guidance of 
> 'Table 110C–1—Host and cable assembly combinations' in AUTO FEC mode).
>
> If all the vendors already ignore the standard (like Intel and AFAIU
> nVidia) then we just need to document, no point adding the bit...

I think we misunderstood each other :).
Our implementation definitely *does not* ignore the standard.
When autoneg is disabled, and auto fec is enabled, the firmware chooses
a fec mode according to the spec. If "no FEC" is not in the spec, we
will not pick it (nor do we want to).
It sounds like you're not happy with the spec, then why not change it?
This doesn't sound like an area where we want to be non-compliant.

Regardless, changing our interface because of one device' firmware
bug/behavior change doesn't make any sense. This interface affects all
consumers, and is going to stick with us forever. Firmware will
eventually get updated and it only affects one driver.

Jacob Keller Aug. 29, 2022, 7:11 a.m. UTC | #17

> -----Original Message-----
> From: Gal Pressman <gal@nvidia.com>
> Sent: Sunday, August 28, 2022 3:43 AM
> To: Jakub Kicinski <kuba@kernel.org>; Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org; Simon
> Horman <horms@verge.net.au>; Andy Gospodarek <andy@greyhouse.net>
> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
> 
> On 27/08/2022 02:57, Jakub Kicinski wrote:
> > On Fri, 26 Aug 2022 10:51:21 -0700 Jacob Keller wrote:
> >> On 8/25/2022 6:01 PM, Jakub Kicinski wrote:
> >>> Oh, but per the IEEE standard No FEC is _not_ an option for CA-L.
> >>> From the initial reading of your series I thought that Intel NICs
> >>> would _never_ pick No FEC.
> >> That was my original interpretation when I was first introduced to this
> >> problem but I was mistaken, hence why the commit message wasn't clear :(
> >>
> >> This is rather more complicated than I originally understood and the
> >> names for various bits have not been named very well so their behavior
> >> isn't exactly obvious...
> >>
> >>> Sounds like we need a bit for "ignore the standard and try everything".
> >>>
> >>> What about BASE-R FEC? Is the FW going to try it on the CA-L cable?
> >> Ok I got further clarification on this. We have a bit, "Auto FEC
> >> enable", as well as a bitmask for which FEC modes to try.
> >>
> >> If "Auto FEC En" is set, then the Link Establishment State Machine will
> >> try all of the FEC options we list in that bitmask, as long as we can
> >> theoretically support them even if they aren't spec compliant.
> >>
> >> For old firmware the bitmask didn't include a bit for "No FEC", where as
> >> the new firmware has a bit for "No FEC".
> >>
> >> We were always setting "Auto FEC En" so currently we try all FEC modes
> >> we could theoretically support.
> >>
> >> If "Auto FEC En" is disabled, then we only try FEC modes which are spec
> >> compliant. Additionally, only a single FEC mode is tried based on a
> >> priority and the bitmask.
> >>
> >> Currently and historically the driver has always set "Auto FEC En", so
> >> we were enabling non-spec compliant FEC modes, but "No FEC" was only
> >> based on spec compliance with the media type.
> >>
> >> From this, I think I agree the correct behavior is to add a bit for
> >> "override the spec and try everything", and then on new firmware we'd
> >> set the "No FEC" while on old firmware we'd be limited to only trying
> >> FEC modes.
> >>
> >> Does that make sense?
> >>
> >> So yea I think we do probably need a "ignore the standard" bit.. but
> >> currently that appears to already be what ice does (excepting No FEC
> >> which didn't previously have a bit to set for it)
> > Thanks for getting to the bottom of this :)
> >
> > The "override spec modes" bit sounds like a reasonable addition,
> > since we possibly have different behavior between vendors letting
> > the user know if the device will follow the rules can save someone
> > debugging time.
> >
> > But it does sound orthogonal to you adding the No FEC bit to the mask
> > for ice.
> >
> > Let me add Simon and Andy so that we have the major vendors on the CC.
> > (tl;dr the question is whether your FW follows the guidance of
> > 'Table 110C–1—Host and cable assembly combinations' in AUTO FEC mode).
> >

The other engineers I spoke to also wanted to mention that 110C-1 is only a small subset of all of the various link types. They also mentioned something about an SFF standard which describes many more types.

> > If all the vendors already ignore the standard (like Intel and AFAIU
> > nVidia) then we just need to document, no point adding the bit...
> 
> I think we misunderstood each other :).
> Our implementation definitely *does not* ignore the standard.
> When autoneg is disabled, and auto fec is enabled, the firmware chooses
> a fec mode according to the spec. If "no FEC" is not in the spec, we
> will not pick it (nor do we want to).
> It sounds like you're not happy with the spec, then why not change it?
> This doesn't sound like an area where we want to be non-compliant.
> 
> Regardless, changing our interface because of one device' firmware
> bug/behavior change doesn't make any sense. This interface affects all
> consumers, and is going to stick with us forever. Firmware will
> eventually get updated and it only affects one driver.

Well, the current ice behavior for every FEC mode *except* No FEC, we try modes which may be supportable even though they're outside the spec. As far as I understand, the reason we attempt these is because it allows linking in more scenarios, presumably because some combination of things is not fully spec compliant? I don't really know for sure.

For future firmware, this would include No FEC as well. Thus, even with future firmware we'd still be trying some modes outside of the spec. I can try to get some more answers tomorrow about the reasoning and justification for this behavior.

Thanks,
Jake

Gal Pressman Aug. 29, 2022, 11:21 a.m. UTC | #18

On 29/08/2022 10:11, Keller, Jacob E wrote:
>
>> -----Original Message-----
>> From: Gal Pressman <gal@nvidia.com>
>> Sent: Sunday, August 28, 2022 3:43 AM
>> To: Jakub Kicinski <kuba@kernel.org>; Keller, Jacob E <jacob.e.keller@intel.com>
>> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org; Simon
>> Horman <horms@verge.net.au>; Andy Gospodarek <andy@greyhouse.net>
>> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
>>
>> On 27/08/2022 02:57, Jakub Kicinski wrote:
>>> On Fri, 26 Aug 2022 10:51:21 -0700 Jacob Keller wrote:
>>>> On 8/25/2022 6:01 PM, Jakub Kicinski wrote:
>>>>> Oh, but per the IEEE standard No FEC is _not_ an option for CA-L.
>>>>> From the initial reading of your series I thought that Intel NICs
>>>>> would _never_ pick No FEC.
>>>> That was my original interpretation when I was first introduced to this
>>>> problem but I was mistaken, hence why the commit message wasn't clear :(
>>>>
>>>> This is rather more complicated than I originally understood and the
>>>> names for various bits have not been named very well so their behavior
>>>> isn't exactly obvious...
>>>>
>>>>> Sounds like we need a bit for "ignore the standard and try everything".
>>>>>
>>>>> What about BASE-R FEC? Is the FW going to try it on the CA-L cable?
>>>> Ok I got further clarification on this. We have a bit, "Auto FEC
>>>> enable", as well as a bitmask for which FEC modes to try.
>>>>
>>>> If "Auto FEC En" is set, then the Link Establishment State Machine will
>>>> try all of the FEC options we list in that bitmask, as long as we can
>>>> theoretically support them even if they aren't spec compliant.
>>>>
>>>> For old firmware the bitmask didn't include a bit for "No FEC", where as
>>>> the new firmware has a bit for "No FEC".
>>>>
>>>> We were always setting "Auto FEC En" so currently we try all FEC modes
>>>> we could theoretically support.
>>>>
>>>> If "Auto FEC En" is disabled, then we only try FEC modes which are spec
>>>> compliant. Additionally, only a single FEC mode is tried based on a
>>>> priority and the bitmask.
>>>>
>>>> Currently and historically the driver has always set "Auto FEC En", so
>>>> we were enabling non-spec compliant FEC modes, but "No FEC" was only
>>>> based on spec compliance with the media type.
>>>>
>>>> From this, I think I agree the correct behavior is to add a bit for
>>>> "override the spec and try everything", and then on new firmware we'd
>>>> set the "No FEC" while on old firmware we'd be limited to only trying
>>>> FEC modes.
>>>>
>>>> Does that make sense?
>>>>
>>>> So yea I think we do probably need a "ignore the standard" bit.. but
>>>> currently that appears to already be what ice does (excepting No FEC
>>>> which didn't previously have a bit to set for it)
>>> Thanks for getting to the bottom of this :)
>>>
>>> The "override spec modes" bit sounds like a reasonable addition,
>>> since we possibly have different behavior between vendors letting
>>> the user know if the device will follow the rules can save someone
>>> debugging time.
>>>
>>> But it does sound orthogonal to you adding the No FEC bit to the mask
>>> for ice.
>>>
>>> Let me add Simon and Andy so that we have the major vendors on the CC.
>>> (tl;dr the question is whether your FW follows the guidance of
>>> 'Table 110C–1—Host and cable assembly combinations' in AUTO FEC mode).
>>>
> The other engineers I spoke to also wanted to mention that 110C-1 is only a small subset of all of the various link types. They also mentioned something about an SFF standard which describes many more types.

The firmware folks here claim there's also a difference between TX and RX.

>
>>> If all the vendors already ignore the standard (like Intel and AFAIU
>>> nVidia) then we just need to document, no point adding the bit...
>> I think we misunderstood each other :).
>> Our implementation definitely *does not* ignore the standard.
>> When autoneg is disabled, and auto fec is enabled, the firmware chooses
>> a fec mode according to the spec. If "no FEC" is not in the spec, we
>> will not pick it (nor do we want to).
>> It sounds like you're not happy with the spec, then why not change it?
>> This doesn't sound like an area where we want to be non-compliant.
>>
>> Regardless, changing our interface because of one device' firmware
>> bug/behavior change doesn't make any sense. This interface affects all
>> consumers, and is going to stick with us forever. Firmware will
>> eventually get updated and it only affects one driver.
> Well, the current ice behavior for every FEC mode *except* No FEC, we try modes which may be supportable even though they're outside the spec. As far as I understand, the reason we attempt these is because it allows linking in more scenarios, presumably because some combination of things is not fully spec compliant? I don't really know for sure.
>
> For future firmware, this would include No FEC as well. Thus, even with future firmware we'd still be trying some modes outside of the spec. I can try to get some more answers tomorrow about the reasoning and justification for this behavior.
>

Yea, understood, but respectfully, I don't understand why we should go
along with your requirement to support this non-spec behavior.

Jacob Keller Aug. 29, 2022, 6:10 p.m. UTC | #19

On 8/29/2022 4:21 AM, Gal Pressman wrote:
> On 29/08/2022 10:11, Keller, Jacob E wrote:
>>> Regardless, changing our interface because of one device' firmware
>>> bug/behavior change doesn't make any sense. This interface affects all
>>> consumers, and is going to stick with us forever. Firmware will
>>> eventually get updated and it only affects one driver.
>> Well, the current ice behavior for every FEC mode *except* No FEC, we try modes which may be supportable even though they're outside the spec. As far as I understand, the reason we attempt these is because it allows linking in more scenarios, presumably because some combination of things is not fully spec compliant? I don't really know for sure.
>>
>> For future firmware, this would include No FEC as well. Thus, even with future firmware we'd still be trying some modes outside of the spec. I can try to get some more answers tomorrow about the reasoning and justification for this behavior.
>>
> 
> Yea, understood, but respectfully, I don't understand why we should go
> along with your requirement to support this non-spec behavior.

My understanding is that this is requested by customers for a few reasons:

1) interopability with legacy switches

2) interopability with modules which don't follow spec

3) low latency applications for which disabling FEC can improve latency
if the module is able to achieve a low enough error rate.

We have a fair number of customer requests to support these
non-compliant modules and modes, including both enabling certain FEC
modes or disabling FEC.

We already have this enabled with existing drivers. Of course, part of
that was caused by confusion due to poor naming scheme and lack of clear
communication to us about what the real behavior was. (Thanks Kuba for
pushing on that...) It probably comes across as a bit disingenuous
because we've implemented and enabled this support without being clear
about the behavior.

I haven't 100% confirmed, but I would be surprised if this only affects
ice. Its likely something that behaves similarly for other Intel products.

The ability to go outside the spec enables some of our customers and
solves real problems. The reality is that we don't always have perfect
hardware, and we want to inter-operate with the existing hardware. Some
switches were designed and built while the standards were still being
developed, and they don't 100% follow the spec because of this.

By extending the interface it becomes clear and obvious that we're going
outside the spec. If this hadn't been brought up this would have more or
less hidden behind a binary firmware blob with almost no way to notice
it, and no way to communicate that is whats happening.

I'm frustrated by the poor communication here because it was not at all
obvious to me until the last week that this is what we were doing.
However, I do see value in supporting the existing hardware available
even when its not quite spec compliant.

Thanks,
Jake

Jacob Keller Aug. 30, 2022, 8:09 p.m. UTC | #20

On 8/29/2022 11:10 AM, Jacob Keller wrote:
> 
> 
> On 8/29/2022 4:21 AM, Gal Pressman wrote:
>> On 29/08/2022 10:11, Keller, Jacob E wrote:
>>>> Regardless, changing our interface because of one device' firmware
>>>> bug/behavior change doesn't make any sense. This interface affects all
>>>> consumers, and is going to stick with us forever. Firmware will
>>>> eventually get updated and it only affects one driver.
>>> Well, the current ice behavior for every FEC mode *except* No FEC, we try modes which may be supportable even though they're outside the spec. As far as I understand, the reason we attempt these is because it allows linking in more scenarios, presumably because some combination of things is not fully spec compliant? I don't really know for sure.
>>>
>>> For future firmware, this would include No FEC as well. Thus, even with future firmware we'd still be trying some modes outside of the spec. I can try to get some more answers tomorrow about the reasoning and justification for this behavior.
>>>
>>
>> Yea, understood, but respectfully, I don't understand why we should go
>> along with your requirement to support this non-spec behavior.
> 
> My understanding is that this is requested by customers for a few reasons:
> 
> 1) interopability with legacy switches
> 
> 2) interopability with modules which don't follow spec
> 
> 3) low latency applications for which disabling FEC can improve latency
> if the module is able to achieve a low enough error rate.
> 
> We have a fair number of customer requests to support these
> non-compliant modules and modes, including both enabling certain FEC
> modes or disabling FEC.
> 
> We already have this enabled with existing drivers. Of course, part of
> that was caused by confusion due to poor naming scheme and lack of clear
> communication to us about what the real behavior was. (Thanks Kuba for
> pushing on that...) It probably comes across as a bit disingenuous
> because we've implemented and enabled this support without being clear
> about the behavior.
> 
> I haven't 100% confirmed, but I would be surprised if this only affects
> ice. Its likely something that behaves similarly for other Intel products.
> 
> The ability to go outside the spec enables some of our customers and
> solves real problems. The reality is that we don't always have perfect
> hardware, and we want to inter-operate with the existing hardware. Some
> switches were designed and built while the standards were still being
> developed, and they don't 100% follow the spec because of this.
> 
> By extending the interface it becomes clear and obvious that we're going
> outside the spec. If this hadn't been brought up this would have more or
> less hidden behind a binary firmware blob with almost no way to notice
> it, and no way to communicate that is whats happening.
> 
> I'm frustrated by the poor communication here because it was not at all
> obvious to me until the last week that this is what we were doing.
> However, I do see value in supporting the existing hardware available
> even when its not quite spec compliant.
> 
> Thanks,
> Jake

I'm trying to figure out what my next steps are here.

Jakub, from earlier discussion it sounded like you are ok with accepting
patch to include "No FEC" into our auto override behavior, with no uAPI
changes. Is that still ok given the recent dicussion regarding going
beyond the spec?

I'm also happy to rename the flag in ice so that its not misnamed and
clearly indicates its behavior.

Gal seems against extending uAPI to indicate or support "ignore spec".
To be properly correct that would mean changing ice to stop setting the
AUTO_FEC flag. As explained above, I believe this will lead to breakage
in situations where we used to link and function properly.

I have no way to verify whether other vendors actually follow this or
not, as it essentially requires checking with modules that wouldn't link
otherwise and likely requires a lot of trial and error.

Jakub Kicinski Aug. 30, 2022, 9:44 p.m. UTC | #21

On Tue, 30 Aug 2022 13:09:20 -0700 Jacob Keller wrote:
> I'm trying to figure out what my next steps are here.
> 
> Jakub, from earlier discussion it sounded like you are ok with accepting
> patch to include "No FEC" into our auto override behavior, with no uAPI
> changes. Is that still ok given the recent dicussion regarding going
> beyond the spec?

Yes, I reserve the right to change my mind :) but AFAIU it doesn't make
things worse, so fine by me.

> I'm also happy to rename the flag in ice so that its not misnamed and
> clearly indicates its behavior.

Which flag? A new ethtool priv flag?

> Gal seems against extending uAPI to indicate or support "ignore spec".
> To be properly correct that would mean changing ice to stop setting the
> AUTO_FEC flag. As explained above, I believe this will lead to breakage
> in situations where we used to link and function properly.

Stop setting the AUTO_FEC flag or start using a new standard compliant
AUTO flag?

Gal, within the spec do you iterate over modes or pick one mode somehow
(the spec gives a set, AFAICT)?

> I have no way to verify whether other vendors actually follow this or
> not, as it essentially requires checking with modules that wouldn't link
> otherwise and likely requires a lot of trial and error.

Getting some input from Broadcom or Netronome would be useful, yes :(

Jacob Keller Aug. 30, 2022, 11:09 p.m. UTC | #22

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, August 30, 2022 2:45 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Gal Pressman <gal@nvidia.com>; Saeed Mahameed <saeedm@nvidia.com>;
> netdev@vger.kernel.org; Simon Horman <horms@verge.net.au>; Andy
> Gospodarek <andy@greyhouse.net>
> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
> 
> On Tue, 30 Aug 2022 13:09:20 -0700 Jacob Keller wrote:
> > I'm trying to figure out what my next steps are here.
> >
> > Jakub, from earlier discussion it sounded like you are ok with accepting
> > patch to include "No FEC" into our auto override behavior, with no uAPI
> > changes. Is that still ok given the recent dicussion regarding going
> > beyond the spec?
> 
> Yes, I reserve the right to change my mind :) but AFAIU it doesn't make
> things worse, so fine by me.
>

Ok.
 
> > I'm also happy to rename the flag in ice so that its not misnamed and
> > clearly indicates its behavior.
> 
> Which flag? A new ethtool priv flag?
> 

No the flag I am referring to here is for the bit we pass to firmware from the driver. This is confusingly named "EN_AUTO_FEC" but it really means something like "override spec and try all FEC modes".

> > Gal seems against extending uAPI to indicate or support "ignore spec".
> > To be properly correct that would mean changing ice to stop setting the
> > AUTO_FEC flag. As explained above, I believe this will lead to breakage
> > in situations where we used to link and function properly.
> 
> Stop setting the AUTO_FEC flag or start using a new standard compliant
> AUTO flag?

There is only "EN_AUTO_FEC" which means both "try multiple FEC modes, including ones outside the spec". If we disable this flag, then I believe it will only try the highest priority FEC mode for each link type. This is the flag which I think is poorly named and led me to misunderstand the whole behavior.

> 
> Gal, within the spec do you iterate over modes or pick one mode somehow
> (the spec gives a set, AFAICT)?
> 
> > I have no way to verify whether other vendors actually follow this or
> > not, as it essentially requires checking with modules that wouldn't link
> > otherwise and likely requires a lot of trial and error.
> 
> Getting some input from Broadcom or Netronome would be useful, yes :(

Gal Pressman Aug. 31, 2022, 11:01 a.m. UTC | #23

On 31/08/2022 00:44, Jakub Kicinski wrote:
> On Tue, 30 Aug 2022 13:09:20 -0700 Jacob Keller wrote:
>> Gal seems against extending uAPI to indicate or support "ignore spec".
>> To be properly correct that would mean changing ice to stop setting the
>> AUTO_FEC flag. As explained above, I believe this will lead to breakage
>> in situations where we used to link and function properly.
> Stop setting the AUTO_FEC flag or start using a new standard compliant
> AUTO flag?
>
> Gal, within the spec do you iterate over modes or pick one mode somehow
> (the spec gives a set, AFAICT)?

When autoneg is enabled (and auto fec enabled), auto negotiation is done
from the link modes according to spec.
When autoneg is disabled (and auto fec enabled), the firmware chooses
one of the supported modes according to the spec. As far as I
understand, it doesn't try anything, just picks a supported mode.

This whole thing revolves around customers who don't use auto
negotiation, but it sounds like ice is still trying to do auto
negotiation for fec under the hood.
Looking at the reasons Jacob listed here (unspec switches/modules), it
seems like all this can be resolved by simply setting the fec mode
explicitly, and to me it sounds like a reasonable solution for these
special cases.

Jakub Kicinski Aug. 31, 2022, 5:36 p.m. UTC | #24

On Wed, 31 Aug 2022 14:01:10 +0300 Gal Pressman wrote:
> When autoneg is disabled (and auto fec enabled), the firmware chooses
> one of the supported modes according to the spec. 

Would you be able to provide the pointer in the spec / section which
defines the behavior?  It may be helpful to add that to the kdoc for 
ETHTOOL_FEC_AUTO_BIT.

Jacob Keller Aug. 31, 2022, 8:15 p.m. UTC | #25

On 8/31/2022 4:01 AM, Gal Pressman wrote:
> On 31/08/2022 00:44, Jakub Kicinski wrote:
>> On Tue, 30 Aug 2022 13:09:20 -0700 Jacob Keller wrote:
>>> Gal seems against extending uAPI to indicate or support "ignore spec".
>>> To be properly correct that would mean changing ice to stop setting the
>>> AUTO_FEC flag. As explained above, I believe this will lead to breakage
>>> in situations where we used to link and function properly.
>> Stop setting the AUTO_FEC flag or start using a new standard compliant
>> AUTO flag?
>>
>> Gal, within the spec do you iterate over modes or pick one mode somehow
>> (the spec gives a set, AFAICT)?
>> When autoneg is enabled (and auto fec enabled), auto negotiation is done
> from the link modes according to spec.

This is the same for ice, except that the ETHTOOL_FEC_* configuration is
 ignored. The documentation indicates its only intended for when
autonegotiation is disabled.

> When autoneg is disabled (and auto fec enabled), the firmware chooses
> one of the supported modes according to the spec. As far as I
> understand, it doesn't try anything, just picks a supported mode.
> 

This is how ice works if we don't set the ICE_AQC_PHY_EN_FEC_AUTO flag
when configuring our firmware.

> This whole thing revolves around customers who don't use auto
> negotiation, but it sounds like ice is still trying to do auto
> negotiation for fec under the hood.

It's not really auto negotiation as it is more like auto-retry, its a
simple state machine that iterates through a series of possible
configurations. The goal being to reduce cognitive burden on users and
just try to establish link.

I believe that sigh ICE_AQC_PHY_EN_FEC_AUTO set, we still try different
media type settings but only one FEC mode. The exact way it picks such a
FEC mode is unclear to me in this case. With ICE_AQC_PHY_EN_FEC_AUTO,
each media type is re-tried with each FEC mode that is theoretically
possible even if its not in spec.

> Looking at the reasons Jacob listed here (unspec switches/modules), it
> seems like all this can be resolved by simply setting the fec mode
> explicitly, and to me it sounds like a reasonable solution for these
> special cases.

Gal Pressman Sept. 1, 2022, 11:51 a.m. UTC | #26

On 31/08/2022 23:15, Jacob Keller wrote:
> On 8/31/2022 4:01 AM, Gal Pressman wrote:
>> When autoneg is disabled (and auto fec enabled), the firmware chooses
>> one of the supported modes according to the spec. As far as I
>> understand, it doesn't try anything, just picks a supported mode.
>>
> This is how ice works if we don't set the ICE_AQC_PHY_EN_FEC_AUTO flag
> when configuring our firmware.

If auto fec is off, whatever mode the user chose explicitly should be used.
What I'm referring to is when auto fec is on, then our firmware picks
one of the spec modes it sees fit (according to spec).

>> This whole thing revolves around customers who don't use auto
>> negotiation, but it sounds like ice is still trying to do auto
>> negotiation for fec under the hood.
> It's not really auto negotiation as it is more like auto-retry, its a
> simple state machine that iterates through a series of possible
> configurations. The goal being to reduce cognitive burden on users and
> just try to establish link.

Cognitive burden can be reduced by using auto negotiation?

Gal Pressman Sept. 1, 2022, 11:52 a.m. UTC | #27

On 31/08/2022 20:36, Jakub Kicinski wrote:
> On Wed, 31 Aug 2022 14:01:10 +0300 Gal Pressman wrote:
>> When autoneg is disabled (and auto fec enabled), the firmware chooses
>> one of the supported modes according to the spec. 
> Would you be able to provide the pointer in the spec / section which
> defines the behavior?  It may be helpful to add that to the kdoc for 
> ETHTOOL_FEC_AUTO_BIT.

Asked our architect for a pointer, hope I'll get it soon (he's OOO,
might take a few days).

Jacob Keller Sept. 1, 2022, 5:59 p.m. UTC | #28

> -----Original Message-----
> From: Gal Pressman <gal@nvidia.com>
> Sent: Thursday, September 01, 2022 4:52 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>
> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org; Simon
> Horman <horms@verge.net.au>; Andy Gospodarek <andy@greyhouse.net>
> Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable
> 
> On 31/08/2022 23:15, Jacob Keller wrote:
> > On 8/31/2022 4:01 AM, Gal Pressman wrote:
> >> When autoneg is disabled (and auto fec enabled), the firmware chooses
> >> one of the supported modes according to the spec. As far as I
> >> understand, it doesn't try anything, just picks a supported mode.
> >>
> > This is how ice works if we don't set the ICE_AQC_PHY_EN_FEC_AUTO flag
> > when configuring our firmware.
> 
> If auto fec is off, whatever mode the user chose explicitly should be used.
> What I'm referring to is when auto fec is on, then our firmware picks
> one of the spec modes it sees fit (according to spec).

Correct. I'm referring to a firwamre bit we set, not the ETHTOOL_FEC_AUTO here. Sorry for the confusion, the names are similar but not strictly related. That is definitely a point of confusion here :(

Currently, when ETHTOOL_FEC_AUTO is  set, we always set ICE_AQC_PHY_EN_FEC_AUTO, but if we opted not to do that, then we would have similar behavior to what you describe.

We would choose a single FEC mode to try for the various media types that get tried. We still perform the link state machine, but with fewer FEC options and none of them outside spec. We don't currently have a driver implemented this way.

> 
> >> This whole thing revolves around customers who don't use auto
> >> negotiation, but it sounds like ice is still trying to do auto
> >> negotiation for fec under the hood.
> > It's not really auto negotiation as it is more like auto-retry, its a
> > simple state machine that iterates through a series of possible
> > configurations. The goal being to reduce cognitive burden on users and
> > just try to establish link.
> 
> Cognitive burden can be reduced by using auto negotiation?

I'm not sure what situations result in autonegotiation vs when we have it disabled. For example, if autonegotiation fails we then the link state machine falls back to the series of forced configurations it tries.

The auto selection makes the device attempt to get link, and by enabling more modes we are more likely to achieve link. By having this happen without need for significant user interaction means that users simply see the device link up and function without the need to make a choice.

Thanks,
Jake

Gal Pressman Sept. 5, 2022, 11:25 a.m. UTC | #29

On 31/08/2022 20:36, Jakub Kicinski wrote:
> On Wed, 31 Aug 2022 14:01:10 +0300 Gal Pressman wrote:
>> When autoneg is disabled (and auto fec enabled), the firmware chooses
>> one of the supported modes according to the spec. 
> Would you be able to provide the pointer in the spec / section which
> defines the behavior?  It may be helpful to add that to the kdoc for 
> ETHTOOL_FEC_AUTO_BIT.

I have been directed to:
https://www.snia.org/technology-communities/sff/specifications?field_doc_status_value=All&combine=8024&field_release_date_value_2%5Bvalue%5D%5Bdate%5D=&field_release_date_value%5Bvalue%5D%5Bdate%5D=&items_per_page=20

SFF-8024 SFF Module Management Reference Code Tables -> Table 4-4
Extended Specification Compliance Codes.

Michael Chan Sept. 7, 2022, 3:44 a.m. UTC | #30

On Tue, Aug 30, 2022 at 2:45 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 30 Aug 2022 13:09:20 -0700 Jacob Keller wrote:
> > I have no way to verify whether other vendors actually follow this or
> > not, as it essentially requires checking with modules that wouldn't link
> > otherwise and likely requires a lot of trial and error.
>
> Getting some input from Broadcom or Netronome would be useful, yes :(

Sorry for the late comment.  I needed to consult with the firmware
engineer first.  The bnxt driver does not support auto FEC if
auto-negotiation is disabled.  But firmware has a "media detect"
fallback mode if auto-negotiation fails to link up.  It is similar to
the parallel detect mode but done in firmware.  I think this fallback
mode may be relevant to the discussion here.  In this fallback mode,
firmware will cycle through all the supported speeds and supported FEC
combinations (including no FEC) and try to link up.

[net-next,0/2] ice: support FEC automatic disable

Message

Comments