mbox series

[net-next,v2,0/7] drivers: Fix drivers doing TX csum offload with EH

Message ID 20240701195507.256374-1-tom@herbertland.com (mailing list archive)
Headers show
Series drivers: Fix drivers doing TX csum offload with EH | expand

Message

Tom Herbert July 1, 2024, 7:55 p.m. UTC
Several NICs would seem to support protocol specific TX checksum offload
and allow for cases where an IPv6 packet contains extension headers.
When deciding whether to offload a packet, ipv6_skip_exthdr is called
to skip extension headers. The problem is that if a packet contains an
IPv6 Routing Header then protocol specific checksum offload can't work,
the destination IP address in the IPv6 header is not the same one that
is used in the pseudo header for TCP or UDP. The correct address is
derived from the last segment in the routing list (which itself might
be obfuscated so that a device could even read it).

This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be
called in lieu of ipv6_skip_exthdr. If a routing header is present in
a packet then ipv6_skip_exthdr_no_rthdr returns a value less than
zero, this is an indication to the driver that TX checksum offload
is not viable and it should call skb_checksum_help instead of
offloading the checksum.

The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly
to call ipv6_skip_exthdr_no_rthdr.

Testing: The code compiles, but is otherwise untested due to lack of
NIC hardware. It would be appreciated if someone with access to the
hardware could test.

v2: Fixed uninitialized variable in exthdrs_core.c

Tom Herbert (7):
  ipv6: Add ipv6_skip_exthdr_no_rthdr
  i40e: Don't do TX csum offload with routing header present
  iavf: Don't do TX csum offload with routing header present
  ice: Don't do TX csum offload with routing header present
  idpf: Don't do TX csum offload with routing header present
  hinic: Don't do TX csum offload with routing header present
  fm10k: Don't do TX csum offload with routing header present

 drivers/net/ethernet/huawei/hinic/hinic_tx.c  | 23 +++++++++++----
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |  9 ++++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 ++++++---------
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 20 ++++++-------
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 22 ++++++---------
 .../ethernet/intel/idpf/idpf_singleq_txrx.c   | 28 +++++++++----------
 include/net/ipv6.h                            | 17 +++++++++--
 net/ipv6/exthdrs_core.c                       | 25 ++++++++++++-----
 8 files changed, 98 insertions(+), 68 deletions(-)

Comments

Przemek Kitszel July 2, 2024, 10:31 a.m. UTC | #1
On 7/1/24 21:55, Tom Herbert wrote:
> Several NICs would seem to support protocol specific TX checksum offload
> and allow for cases where an IPv6 packet contains extension headers.
> When deciding whether to offload a packet, ipv6_skip_exthdr is called
> to skip extension headers. The problem is that if a packet contains an
> IPv6 Routing Header then protocol specific checksum offload can't work,
> the destination IP address in the IPv6 header is not the same one that
> is used in the pseudo header for TCP or UDP. The correct address is
> derived from the last segment in the routing list (which itself might
> be obfuscated so that a device could even read it).

feels like there is a missing "not" after "could" - with it added, reads
fine (not a request to change, just being verbose about assumptions)

> 
> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be
> called in lieu of ipv6_skip_exthdr. If a routing header is present in
> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than
> zero, this is an indication to the driver that TX checksum offload
> is not viable and it should call skb_checksum_help instead of
> offloading the checksum.
> 
> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly
> to call ipv6_skip_exthdr_no_rthdr.
> 
> Testing: The code compiles, but is otherwise untested due to lack of
> NIC hardware. It would be appreciated if someone with access to the
> hardware could test.

we could test intel ones (except fm10k) via @Tony's tree

> 
> v2: Fixed uninitialized variable in exthdrs_core.c
> 
> Tom Herbert (7):
>    ipv6: Add ipv6_skip_exthdr_no_rthdr
>    i40e: Don't do TX csum offload with routing header present
>    iavf: Don't do TX csum offload with routing header present
>    ice: Don't do TX csum offload with routing header present

sidenote:
our HW is supporting (among others) a GCO check-summing mode described
as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not
yet provided patches for that, and I don't even know if this mode
will be used (CC @Paul)

>    idpf: Don't do TX csum offload with routing header present
>    hinic: Don't do TX csum offload with routing header present
>    fm10k: Don't do TX csum offload with routing header present
> 
>   drivers/net/ethernet/huawei/hinic/hinic_tx.c  | 23 +++++++++++----
>   drivers/net/ethernet/intel/fm10k/fm10k_main.c |  9 ++++--
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 ++++++---------
>   drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 20 ++++++-------
>   drivers/net/ethernet/intel/ice/ice_txrx.c     | 22 ++++++---------
>   .../ethernet/intel/idpf/idpf_singleq_txrx.c   | 28 +++++++++----------
>   include/net/ipv6.h                            | 17 +++++++++--
>   net/ipv6/exthdrs_core.c                       | 25 ++++++++++++-----
>   8 files changed, 98 insertions(+), 68 deletions(-)
> 

I have reviewed the patches and they conform to commit message/intent,
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
(for the series)
Jakub Kicinski July 3, 2024, 1:46 a.m. UTC | #2
On Mon,  1 Jul 2024 12:55:00 -0700 Tom Herbert wrote:
> Testing: The code compiles, but is otherwise untested due to lack of
> NIC hardware. It would be appreciated if someone with access to the
> hardware could test.

Could you pop a script under tools/testing/selftests/drivers/net/
that'd exercise this?

This will hopefully guarantee good coverage soon, due to:
https://netdev.bots.linux.dev/devices.html
Greenwalt, Paul July 3, 2024, 2:20 p.m. UTC | #3
On 7/2/2024 3:31 AM, Przemek Kitszel wrote:
> On 7/1/24 21:55, Tom Herbert wrote:
>> Several NICs would seem to support protocol specific TX checksum offload
>> and allow for cases where an IPv6 packet contains extension headers.
>> When deciding whether to offload a packet, ipv6_skip_exthdr is called
>> to skip extension headers. The problem is that if a packet contains an
>> IPv6 Routing Header then protocol specific checksum offload can't work,
>> the destination IP address in the IPv6 header is not the same one that
>> is used in the pseudo header for TCP or UDP. The correct address is
>> derived from the last segment in the routing list (which itself might
>> be obfuscated so that a device could even read it).
> 
> feels like there is a missing "not" after "could" - with it added, reads
> fine (not a request to change, just being verbose about assumptions)
> 
>>
>> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be
>> called in lieu of ipv6_skip_exthdr. If a routing header is present in
>> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than
>> zero, this is an indication to the driver that TX checksum offload
>> is not viable and it should call skb_checksum_help instead of
>> offloading the checksum.
>>
>> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly
>> to call ipv6_skip_exthdr_no_rthdr.
>>
>> Testing: The code compiles, but is otherwise untested due to lack of
>> NIC hardware. It would be appreciated if someone with access to the
>> hardware could test.
> 
> we could test intel ones (except fm10k) via @Tony's tree
> 
>>
>> v2: Fixed uninitialized variable in exthdrs_core.c
>>
>> Tom Herbert (7):
>>    ipv6: Add ipv6_skip_exthdr_no_rthdr
>>    i40e: Don't do TX csum offload with routing header present
>>    iavf: Don't do TX csum offload with routing header present
>>    ice: Don't do TX csum offload with routing header present
> 
> sidenote:
> our HW is supporting (among others) a GCO check-summing mode described
> as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not
> yet provided patches for that, and I don't even know if this mode
> will be used (CC @Paul)
> 

We will be adding support for GCO "Checksum 16 with pseudo Headers" to
the ice driver. It will be off by default.

>>    idpf: Don't do TX csum offload with routing header present
>>    hinic: Don't do TX csum offload with routing header present
>>    fm10k: Don't do TX csum offload with routing header present
>>
>>   drivers/net/ethernet/huawei/hinic/hinic_tx.c  | 23 +++++++++++----
>>   drivers/net/ethernet/intel/fm10k/fm10k_main.c |  9 ++++--
>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 ++++++---------
>>   drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 20 ++++++-------
>>   drivers/net/ethernet/intel/ice/ice_txrx.c     | 22 ++++++---------
>>   .../ethernet/intel/idpf/idpf_singleq_txrx.c   | 28 +++++++++----------
>>   include/net/ipv6.h                            | 17 +++++++++--
>>   net/ipv6/exthdrs_core.c                       | 25 ++++++++++++-----
>>   8 files changed, 98 insertions(+), 68 deletions(-)
>>
> 
> I have reviewed the patches and they conform to commit message/intent,
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> (for the series)
Tom Herbert July 3, 2024, 2:39 p.m. UTC | #4
On Tue, Jul 2, 2024 at 6:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  1 Jul 2024 12:55:00 -0700 Tom Herbert wrote:
> > Testing: The code compiles, but is otherwise untested due to lack of
> > NIC hardware. It would be appreciated if someone with access to the
> > hardware could test.
>
> Could you pop a script under tools/testing/selftests/drivers/net/
> that'd exercise this?
>
> This will hopefully guarantee good coverage soon, due to:
> https://netdev.bots.linux.dev/devices.html

Sure. Thanks for the pointer.

Tom
Przemek Kitszel July 3, 2024, 3:02 p.m. UTC | #5
On 7/3/24 16:38, Tom Herbert wrote:
> 
> 
> On Wed, Jul 3, 2024, 7:20 AM Greenwalt, Paul <paul.greenwalt@intel.com 
> <mailto:paul.greenwalt@intel.com>> wrote:
> 
> 
> 
>     On 7/2/2024 3:31 AM, Przemek Kitszel wrote:
>      > On 7/1/24 21:55, Tom Herbert wrote:
>      >> Several NICs would seem to support protocol specific TX checksum
>     offload
>      >> and allow for cases where an IPv6 packet contains extension headers.
>      >> When deciding whether to offload a packet, ipv6_skip_exthdr is
>     called
>      >> to skip extension headers. The problem is that if a packet
>     contains an
>      >> IPv6 Routing Header then protocol specific checksum offload
>     can't work,
>      >> the destination IP address in the IPv6 header is not the same
>     one that
>      >> is used in the pseudo header for TCP or UDP. The correct address is
>      >> derived from the last segment in the routing list (which itself
>     might
>      >> be obfuscated so that a device could even read it).
>      >
>      > feels like there is a missing "not" after "could" - with it
>     added, reads
>      > fine (not a request to change, just being verbose about assumptions)
>      >
>      >>
>      >> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be
>      >> called in lieu of ipv6_skip_exthdr. If a routing header is
>     present in
>      >> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than
>      >> zero, this is an indication to the driver that TX checksum offload
>      >> is not viable and it should call skb_checksum_help instead of
>      >> offloading the checksum.
>      >>
>      >> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly
>      >> to call ipv6_skip_exthdr_no_rthdr.
>      >>
>      >> Testing: The code compiles, but is otherwise untested due to lack of
>      >> NIC hardware. It would be appreciated if someone with access to the
>      >> hardware could test.
>      >
>      > we could test intel ones (except fm10k) via @Tony's tree
> 
> 
> Awesome! If you need any help let me know.
> 
>      >
>      >>
>      >> v2: Fixed uninitialized variable in exthdrs_core.c
>      >>
>      >> Tom Herbert (7):
>      >>    ipv6: Add ipv6_skip_exthdr_no_rthdr
>      >>    i40e: Don't do TX csum offload with routing header present
>      >>    iavf: Don't do TX csum offload with routing header present
>      >>    ice: Don't do TX csum offload with routing header present
>      >
>      > sidenote:
>      > our HW is supporting (among others) a GCO check-summing mode
>     described
>      > as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not
>      > yet provided patches for that, and I don't even know if this mode
>      > will be used (CC @Paul)
>      >
> 
>     We will be adding support for GCO "Checksum 16 with pseudo Headers" to
>     the ice driver. It will be off by default.
> 
> 
> I'm not sure what that means. 

IPv6 Routing Headers render (simple?) HW-offloaded checksumming wrong,
but not for the "no pseudo Header"-checksum. I have no idea how such
checksum will be useful, and we don't have plans to implement it, so
this is not much relevant. But that is just one mode that you could 
config our (new) HW.

> Can ICE just provide checksum-complete? 
> It's by far the simplest, most robust method with the most flexibility 
> for users. :-)

sorry, could you please elaborate?

Paul will implement GCO for ice and otherwise my understanding was that
our checksum is fine. Is there a room for improvement?

> 
> Tom
> 
> 
>      >>    idpf: Don't do TX csum offload with routing header present
>      >>    hinic: Don't do TX csum offload with routing header present
>      >>    fm10k: Don't do TX csum offload with routing header present
>      >>
>      >>   drivers/net/ethernet/huawei/hinic/hinic_tx.c  | 23 +++++++++++----
>      >>   drivers/net/ethernet/intel/fm10k/fm10k_main.c |  9 ++++--
>      >>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 ++++++---------
>      >>   drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 20 ++++++-------
>      >>   drivers/net/ethernet/intel/ice/ice_txrx.c     | 22 ++++++---------
>      >>   .../ethernet/intel/idpf/idpf_singleq_txrx.c   | 28
>     +++++++++----------
>      >>   include/net/ipv6.h                            | 17 +++++++++--
>      >>   net/ipv6/exthdrs_core.c                       | 25
>     ++++++++++++-----
>      >>   8 files changed, 98 insertions(+), 68 deletions(-)
>      >>
>      >
>      > I have reviewed the patches and they conform to commit
>     message/intent,
>      > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com
>     <mailto:przemyslaw.kitszel@intel.com>>
>      > (for the series)
>
Tom Herbert July 3, 2024, 3:56 p.m. UTC | #6
On Wed, Jul 3, 2024 at 8:03 AM Przemek Kitszel
<przemyslaw.kitszel@intel.com> wrote:
>
> On 7/3/24 16:38, Tom Herbert wrote:
> >
> >
> > On Wed, Jul 3, 2024, 7:20 AM Greenwalt, Paul <paul.greenwalt@intel.com
> > <mailto:paul.greenwalt@intel.com>> wrote:
> >
> >
> >
> >     On 7/2/2024 3:31 AM, Przemek Kitszel wrote:
> >      > On 7/1/24 21:55, Tom Herbert wrote:
> >      >> Several NICs would seem to support protocol specific TX checksum
> >     offload
> >      >> and allow for cases where an IPv6 packet contains extension headers.
> >      >> When deciding whether to offload a packet, ipv6_skip_exthdr is
> >     called
> >      >> to skip extension headers. The problem is that if a packet
> >     contains an
> >      >> IPv6 Routing Header then protocol specific checksum offload
> >     can't work,
> >      >> the destination IP address in the IPv6 header is not the same
> >     one that
> >      >> is used in the pseudo header for TCP or UDP. The correct address is
> >      >> derived from the last segment in the routing list (which itself
> >     might
> >      >> be obfuscated so that a device could even read it).
> >      >
> >      > feels like there is a missing "not" after "could" - with it
> >     added, reads
> >      > fine (not a request to change, just being verbose about assumptions)
> >      >
> >      >>
> >      >> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be
> >      >> called in lieu of ipv6_skip_exthdr. If a routing header is
> >     present in
> >      >> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than
> >      >> zero, this is an indication to the driver that TX checksum offload
> >      >> is not viable and it should call skb_checksum_help instead of
> >      >> offloading the checksum.
> >      >>
> >      >> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly
> >      >> to call ipv6_skip_exthdr_no_rthdr.
> >      >>
> >      >> Testing: The code compiles, but is otherwise untested due to lack of
> >      >> NIC hardware. It would be appreciated if someone with access to the
> >      >> hardware could test.
> >      >
> >      > we could test intel ones (except fm10k) via @Tony's tree
> >
> >
> > Awesome! If you need any help let me know.
> >
> >      >
> >      >>
> >      >> v2: Fixed uninitialized variable in exthdrs_core.c
> >      >>
> >      >> Tom Herbert (7):
> >      >>    ipv6: Add ipv6_skip_exthdr_no_rthdr
> >      >>    i40e: Don't do TX csum offload with routing header present
> >      >>    iavf: Don't do TX csum offload with routing header present
> >      >>    ice: Don't do TX csum offload with routing header present
> >      >
> >      > sidenote:
> >      > our HW is supporting (among others) a GCO check-summing mode
> >     described
> >      > as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not
> >      > yet provided patches for that, and I don't even know if this mode
> >      > will be used (CC @Paul)
> >      >
> >
> >     We will be adding support for GCO "Checksum 16 with pseudo Headers" to
> >     the ice driver. It will be off by default.
> >
> >
> > I'm not sure what that means.
>
> IPv6 Routing Headers render (simple?) HW-offloaded checksumming wrong,
> but not for the "no pseudo Header"-checksum. I have no idea how such
> checksum will be useful, and we don't have plans to implement it, so
> this is not much relevant. But that is just one mode that you could
> config our (new) HW.
>
> > Can ICE just provide checksum-complete?
> > It's by far the simplest, most robust method with the most flexibility
> > for users. :-)
>
> sorry, could you please elaborate?
>
> Paul will implement GCO for ice and otherwise my understanding was that
> our checksum is fine. Is there a room for improvement?

Przemek,

No, there's plenty of room for improvement :-). This is protocol
specific checksum offload versus protocol agnostic checksum offload,
and the opinion of the community has been clear for a long time:
protocol agnostic checksum offload is the preferred method and
protocol specific checksum offload is deprecated. This is true for
both transmit and receive checksum offload. For background see Dave
Miller's presentation on this from 2016:
https://www.youtube.com/watch?v=6VgmazGwL_Y.

Protocol agnostic checksum offload isn't just a to have, it addresses
many bugs in protocol specific checksum offload. This patch set
addresses one obvious bug, but there are others. For instance, in IETF
there is a proposal in spring WG to do SRv6 without a routing header
that would make the checksum incorrect on the wire. This will break
protocol specific TX checksum offload and there's nothing to key on in
the packet like an RH  so that a driver would know the offload will
fail. I'm really not sure how we could fix this without major surgery
in the stack. Use protocol agnostic checksum offload and it "just
works" (the proposal to purposely send a bad checksum on the wire
without a RH is a bad idea in general, but I'm not sure we'll be able
to stop it in IETF).

And not to pick on the ICE driver, but please take a look at the
function ice_tx_csum. This function is called on every packet just to
evaluate whether the device is going to be able to offload the packet.
Basically, it parses the packet on transmit to make sure that the
device will be able to parse the packet (this is where we need to call
ipv6_skip_exthdr_no_rthdr). This function is 180 LOC! If the device
properly supports protocol agnostic checksum offload all that is
needed is to write the start offset and checksum offset into the
receive descriptor. Maybe there's some checks on the offset values,
but I can't see that needing more than ten LOC--  it's much less
susceptible to bugs, more robust, and works with a much wider set of
protocol combinations.

BTW, this patch set is the first in a series to formally deprecate and
remove protocol specific checksum offloads from the core kernel. IMO,
we've waited long enough! My intent is to remove CHECKSUM_UNNECESSARY,
NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM. (note comment in skbuff.h "New
devices should use %NETIF_F_HW_CSUM to indicate checksum offload
capability."). Helper functions will be provided to support legacy
devices.

Tom

>
> >
> > Tom
> >
> >
> >      >>    idpf: Don't do TX csum offload with routing header present
> >      >>    hinic: Don't do TX csum offload with routing header present
> >      >>    fm10k: Don't do TX csum offload with routing header present
> >      >>
> >      >>   drivers/net/ethernet/huawei/hinic/hinic_tx.c  | 23 +++++++++++----
> >      >>   drivers/net/ethernet/intel/fm10k/fm10k_main.c |  9 ++++--
> >      >>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 ++++++---------
> >      >>   drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 20 ++++++-------
> >      >>   drivers/net/ethernet/intel/ice/ice_txrx.c     | 22 ++++++---------
> >      >>   .../ethernet/intel/idpf/idpf_singleq_txrx.c   | 28
> >     +++++++++----------
> >      >>   include/net/ipv6.h                            | 17 +++++++++--
> >      >>   net/ipv6/exthdrs_core.c                       | 25
> >     ++++++++++++-----
> >      >>   8 files changed, 98 insertions(+), 68 deletions(-)
> >      >>
> >      >
> >      > I have reviewed the patches and they conform to commit
> >     message/intent,
> >      > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com
> >     <mailto:przemyslaw.kitszel@intel.com>>
> >      > (for the series)
> >
>