diff mbox series

[bpf-next,v3,12/21] xdp: Add checksum hint

Message ID 20230719183734.21681-13-larysa.zaremba@intel.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series XDP metadata via kfuncs for ice | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/apply fail Patch does not apply to bpf-next
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat

Commit Message

Larysa Zaremba July 19, 2023, 6:37 p.m. UTC
Implement functionality that enables drivers to expose to XDP code checksum
information that consists of:

- Checksum status - bitfield that consists of
  - number of consecutive validated checksums. This is almost the same as
    csum_level in skb, but starts with 1. Enum names for those bits still
    use checksum level concept, so it is less confusing for driver
    developers.
  - Is checksum partial? This bit cannot coexist with any other
  - Is there a complete checksum available?
- Additional checksum data, a union of:
  - checksum start and offset, if checksum is partial
  - complete checksum, if available

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 Documentation/networking/xdp-rx-metadata.rst |  3 ++
 include/linux/netdevice.h                    |  3 ++
 include/net/xdp.h                            | 46 ++++++++++++++++++++
 kernel/bpf/offload.c                         |  2 +
 net/core/xdp.c                               | 23 ++++++++++
 5 files changed, 77 insertions(+)

Comments

Willem de Bruijn July 19, 2023, 9:42 p.m. UTC | #1
Larysa Zaremba wrote:
> Implement functionality that enables drivers to expose to XDP code checksum
> information that consists of:
> 
> - Checksum status - bitfield that consists of
>   - number of consecutive validated checksums. This is almost the same as
>     csum_level in skb, but starts with 1. Enum names for those bits still
>     use checksum level concept, so it is less confusing for driver
>     developers.
>   - Is checksum partial? This bit cannot coexist with any other
>   - Is there a complete checksum available?
> - Additional checksum data, a union of:
>   - checksum start and offset, if checksum is partial
>   - complete checksum, if available
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  Documentation/networking/xdp-rx-metadata.rst |  3 ++
>  include/linux/netdevice.h                    |  3 ++
>  include/net/xdp.h                            | 46 ++++++++++++++++++++
>  kernel/bpf/offload.c                         |  2 +
>  net/core/xdp.c                               | 23 ++++++++++
>  5 files changed, 77 insertions(+)
> 
> diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> index ea6dd79a21d3..7f056a44f682 100644
> --- a/Documentation/networking/xdp-rx-metadata.rst
> +++ b/Documentation/networking/xdp-rx-metadata.rst
> @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
>  .. kernel-doc:: net/core/xdp.c
>     :identifiers: bpf_xdp_metadata_rx_vlan_tag
>  
> +.. kernel-doc:: net/core/xdp.c
> +   :identifiers: bpf_xdp_metadata_rx_csum
> +
>  An XDP program can use these kfuncs to read the metadata into stack
>  variables for its own consumption. Or, to pass the metadata on to other
>  consumers, an XDP program can store it into the metadata area carried
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 1749f4f75c64..4f6da36ac123 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops {
>  			       enum xdp_rss_hash_type *rss_type);
>  	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
>  				   __be16 *vlan_proto);
> +	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
> +			       enum xdp_csum_status *csum_status,
> +			       union xdp_csum_info *csum_info);
>  };
>  
>  /**
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 89c58f56ffc6..2b7a7d678ff4 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
>  			   bpf_xdp_metadata_rx_hash) \
>  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
>  			   bpf_xdp_metadata_rx_vlan_tag) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
> +			   bpf_xdp_metadata_rx_csum) \
>  
>  enum {
>  #define XDP_METADATA_KFUNC(name, _) name,
> @@ -448,6 +450,50 @@ enum xdp_rss_hash_type {
>  	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
>  };
>  
> +union xdp_csum_info {
> +	/* Checksum referred to by ``csum_start + csum_offset`` is considered
> +	 * valid, but was never calculated, TX device has to do this,
> +	 * starting from csum_start packet byte.
> +	 * Any preceding checksums are also considered valid.
> +	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
> +	 */
> +	struct {
> +		u16 csum_start;
> +		u16 csum_offset;
> +	};
> +
> +	/* Checksum, calculated over the whole packet.
> +	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
> +	 */
> +	u32 checksum;
> +};
> +
> +enum xdp_csum_status {
> +	/* HW had parsed several transport headers and validated their
> +	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
> +	 * 3 least significat bytes contain number of consecutive checksum,

typo: significant

(I did not scan for typos, just came across this when trying to understand
the skb->csum_level + 1 trick. Probably good to run a spell check).

> +	 * starting with the outermost, reported by hardware as valid.
> +	 * ``sk_buff`` checksum level (``csum_level``) notation is provided
> +	 * for driver developers.
> +	 */
> +	XDP_CHECKSUM_VALID_LVL0		= 1,	/* 1 outermost checksum */
> +	XDP_CHECKSUM_VALID_LVL1		= 2,	/* 2 outermost checksums */
> +	XDP_CHECKSUM_VALID_LVL2		= 3,	/* 3 outermost checksums */
> +	XDP_CHECKSUM_VALID_LVL3		= 4,	/* 4 outermost checksums */
> +	XDP_CHECKSUM_VALID_NUM_MASK	= GENMASK(2, 0),
> +	XDP_CHECKSUM_VALID		= XDP_CHECKSUM_VALID_NUM_MASK,
> +
> +	/* Occurs if packet is sent virtually (between Linux VMs / containers)
> +	 * This status cannot coexist with any other.
> +	 * Refer to ``csum_start`` and ``csum_offset`` in ``xdp_csum_info``
> +	 * for more information.
> +	 */
> +	XDP_CHECKSUM_PARTIAL	= BIT(3),
> +
> +	/* Checksum, calculated over the entire packet is provided */
> +	XDP_CHECKSUM_COMPLETE	= BIT(4),
> +};
> +
>  #ifdef CONFIG_NET
>  u32 bpf_xdp_metadata_kfunc_id(int id);
>  bool bpf_dev_bound_kfunc_id(u32 btf_id);
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index 986e7becfd42..f60a6add5273 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -850,6 +850,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
>  		p = ops->xmo_rx_hash;
>  	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_VLAN_TAG))
>  		p = ops->xmo_rx_vlan_tag;
> +	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM))
> +		p = ops->xmo_rx_csum;
>  out:
>  	up_read(&bpf_devs_lock);
>  
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 8b55419d332e..d4ea54046afc 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -772,6 +772,29 @@ __bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
>  	return -EOPNOTSUPP;
>  }
>  
> +/**
> + * bpf_xdp_metadata_rx_csum - Get checksum status with additional info.
> + * @ctx: XDP context pointer.
> + * @csum_status: Destination for checksum status.
> + * @csum_info: Destination for complete checksum or partial checksum offset.
> + *
> + * Status (@csum_status) is a bitfield that informs, what checksum
> + * processing was performed. Additional results of such processing,
> + * such as complete checksum or partial checksum offsets,
> + * are passed as info (@csum_info).
> + *
> + * Return:
> + * * Returns 0 on success or ``-errno`` on error.
> + * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc
> + * * ``-ENODATA``    : Checksum status is unknown
> + */
> +__bpf_kfunc int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
> +					 enum xdp_csum_status *csum_status,
> +					 union xdp_csum_info *csum_info)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  __diag_pop();
>  
>  BTF_SET8_START(xdp_metadata_kfunc_ids)
> -- 
> 2.41.0
>
Larysa Zaremba July 20, 2023, 9:57 a.m. UTC | #2
On Wed, Jul 19, 2023 at 05:42:04PM -0400, Willem de Bruijn wrote:
> Larysa Zaremba wrote:
> > Implement functionality that enables drivers to expose to XDP code checksum
> > information that consists of:
> > 
> > - Checksum status - bitfield that consists of
> >   - number of consecutive validated checksums. This is almost the same as
> >     csum_level in skb, but starts with 1. Enum names for those bits still
> >     use checksum level concept, so it is less confusing for driver
> >     developers.
> >   - Is checksum partial? This bit cannot coexist with any other
> >   - Is there a complete checksum available?
> > - Additional checksum data, a union of:
> >   - checksum start and offset, if checksum is partial
> >   - complete checksum, if available
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  Documentation/networking/xdp-rx-metadata.rst |  3 ++
> >  include/linux/netdevice.h                    |  3 ++
> >  include/net/xdp.h                            | 46 ++++++++++++++++++++
> >  kernel/bpf/offload.c                         |  2 +
> >  net/core/xdp.c                               | 23 ++++++++++
> >  5 files changed, 77 insertions(+)
> > 
> > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > index ea6dd79a21d3..7f056a44f682 100644
> > --- a/Documentation/networking/xdp-rx-metadata.rst
> > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
> >  .. kernel-doc:: net/core/xdp.c
> >     :identifiers: bpf_xdp_metadata_rx_vlan_tag
> >  
> > +.. kernel-doc:: net/core/xdp.c
> > +   :identifiers: bpf_xdp_metadata_rx_csum
> > +
> >  An XDP program can use these kfuncs to read the metadata into stack
> >  variables for its own consumption. Or, to pass the metadata on to other
> >  consumers, an XDP program can store it into the metadata area carried
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 1749f4f75c64..4f6da36ac123 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops {
> >  			       enum xdp_rss_hash_type *rss_type);
> >  	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
> >  				   __be16 *vlan_proto);
> > +	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
> > +			       enum xdp_csum_status *csum_status,
> > +			       union xdp_csum_info *csum_info);
> >  };
> >  
> >  /**
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 89c58f56ffc6..2b7a7d678ff4 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> >  			   bpf_xdp_metadata_rx_hash) \
> >  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
> >  			   bpf_xdp_metadata_rx_vlan_tag) \
> > +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
> > +			   bpf_xdp_metadata_rx_csum) \
> >  
> >  enum {
> >  #define XDP_METADATA_KFUNC(name, _) name,
> > @@ -448,6 +450,50 @@ enum xdp_rss_hash_type {
> >  	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
> >  };
> >  
> > +union xdp_csum_info {
> > +	/* Checksum referred to by ``csum_start + csum_offset`` is considered
> > +	 * valid, but was never calculated, TX device has to do this,
> > +	 * starting from csum_start packet byte.
> > +	 * Any preceding checksums are also considered valid.
> > +	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
> > +	 */
> > +	struct {
> > +		u16 csum_start;
> > +		u16 csum_offset;
> > +	};
> > +
> > +	/* Checksum, calculated over the whole packet.
> > +	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
> > +	 */
> > +	u32 checksum;
> > +};
> > +
> > +enum xdp_csum_status {
> > +	/* HW had parsed several transport headers and validated their
> > +	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
> > +	 * 3 least significat bytes contain number of consecutive checksum,
> 
> typo: significant
> 
> (I did not scan for typos, just came across this when trying to understand
> the skb->csum_level + 1 trick. Probably good to run a spell check).
> 

Thanks! Will fix.

> > +	 * starting with the outermost, reported by hardware as valid.
> > +	 * ``sk_buff`` checksum level (``csum_level``) notation is provided
> > +	 * for driver developers.
> > +	 */
> > +	XDP_CHECKSUM_VALID_LVL0		= 1,	/* 1 outermost checksum */
> > +	XDP_CHECKSUM_VALID_LVL1		= 2,	/* 2 outermost checksums */
> > +	XDP_CHECKSUM_VALID_LVL2		= 3,	/* 3 outermost checksums */
> > +	XDP_CHECKSUM_VALID_LVL3		= 4,	/* 4 outermost checksums */
> > +	XDP_CHECKSUM_VALID_NUM_MASK	= GENMASK(2, 0),
> > +	XDP_CHECKSUM_VALID		= XDP_CHECKSUM_VALID_NUM_MASK,
> > +
> > +	/* Occurs if packet is sent virtually (between Linux VMs / containers)
> > +	 * This status cannot coexist with any other.
> > +	 * Refer to ``csum_start`` and ``csum_offset`` in ``xdp_csum_info``
> > +	 * for more information.
> > +	 */
> > +	XDP_CHECKSUM_PARTIAL	= BIT(3),
> > +
> > +	/* Checksum, calculated over the entire packet is provided */
> > +	XDP_CHECKSUM_COMPLETE	= BIT(4),
> > +};
> > +
> >  #ifdef CONFIG_NET
> >  u32 bpf_xdp_metadata_kfunc_id(int id);
> >  bool bpf_dev_bound_kfunc_id(u32 btf_id);
> > diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> > index 986e7becfd42..f60a6add5273 100644
> > --- a/kernel/bpf/offload.c
> > +++ b/kernel/bpf/offload.c
> > @@ -850,6 +850,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> >  		p = ops->xmo_rx_hash;
> >  	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_VLAN_TAG))
> >  		p = ops->xmo_rx_vlan_tag;
> > +	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM))
> > +		p = ops->xmo_rx_csum;
> >  out:
> >  	up_read(&bpf_devs_lock);
> >  
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 8b55419d332e..d4ea54046afc 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -772,6 +772,29 @@ __bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
> >  	return -EOPNOTSUPP;
> >  }
> >  
> > +/**
> > + * bpf_xdp_metadata_rx_csum - Get checksum status with additional info.
> > + * @ctx: XDP context pointer.
> > + * @csum_status: Destination for checksum status.
> > + * @csum_info: Destination for complete checksum or partial checksum offset.
> > + *
> > + * Status (@csum_status) is a bitfield that informs, what checksum
> > + * processing was performed. Additional results of such processing,
> > + * such as complete checksum or partial checksum offsets,
> > + * are passed as info (@csum_info).
> > + *
> > + * Return:
> > + * * Returns 0 on success or ``-errno`` on error.
> > + * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc
> > + * * ``-ENODATA``    : Checksum status is unknown
> > + */
> > +__bpf_kfunc int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
> > +					 enum xdp_csum_status *csum_status,
> > +					 union xdp_csum_info *csum_info)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> >  __diag_pop();
> >  
> >  BTF_SET8_START(xdp_metadata_kfunc_ids)
> > -- 
> > 2.41.0
> > 
> 
>
Larysa Zaremba July 20, 2023, 10:10 a.m. UTC | #3
On Thu, Jul 20, 2023 at 09:57:05AM +0000, Zaremba, Larysa wrote:
> On Wed, Jul 19, 2023 at 05:42:04PM -0400, Willem de Bruijn wrote:
> > Larysa Zaremba wrote:
> > > Implement functionality that enables drivers to expose to XDP code checksum
> > > information that consists of:
> > > 
> > > - Checksum status - bitfield that consists of
> > >   - number of consecutive validated checksums. This is almost the same as
> > >     csum_level in skb, but starts with 1. Enum names for those bits still
> > >     use checksum level concept, so it is less confusing for driver
> > >     developers.
> > >   - Is checksum partial? This bit cannot coexist with any other
> > >   - Is there a complete checksum available?
> > > - Additional checksum data, a union of:
> > >   - checksum start and offset, if checksum is partial
> > >   - complete checksum, if available
> > > 
> > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > ---
> > >  Documentation/networking/xdp-rx-metadata.rst |  3 ++
> > >  include/linux/netdevice.h                    |  3 ++
> > >  include/net/xdp.h                            | 46 ++++++++++++++++++++
> > >  kernel/bpf/offload.c                         |  2 +
> > >  net/core/xdp.c                               | 23 ++++++++++
> > >  5 files changed, 77 insertions(+)
> > > 
> > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > index ea6dd79a21d3..7f056a44f682 100644
> > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
> > >  .. kernel-doc:: net/core/xdp.c
> > >     :identifiers: bpf_xdp_metadata_rx_vlan_tag
> > >  
> > > +.. kernel-doc:: net/core/xdp.c
> > > +   :identifiers: bpf_xdp_metadata_rx_csum
> > > +
> > >  An XDP program can use these kfuncs to read the metadata into stack
> > >  variables for its own consumption. Or, to pass the metadata on to other
> > >  consumers, an XDP program can store it into the metadata area carried
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 1749f4f75c64..4f6da36ac123 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops {
> > >  			       enum xdp_rss_hash_type *rss_type);
> > >  	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
> > >  				   __be16 *vlan_proto);
> > > +	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
> > > +			       enum xdp_csum_status *csum_status,
> > > +			       union xdp_csum_info *csum_info);
> > >  };
> > >  
> > >  /**
> > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > index 89c58f56ffc6..2b7a7d678ff4 100644
> > > --- a/include/net/xdp.h
> > > +++ b/include/net/xdp.h
> > > @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> > >  			   bpf_xdp_metadata_rx_hash) \
> > >  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
> > >  			   bpf_xdp_metadata_rx_vlan_tag) \
> > > +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
> > > +			   bpf_xdp_metadata_rx_csum) \
> > >  
> > >  enum {
> > >  #define XDP_METADATA_KFUNC(name, _) name,
> > > @@ -448,6 +450,50 @@ enum xdp_rss_hash_type {
> > >  	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
> > >  };
> > >  
> > > +union xdp_csum_info {
> > > +	/* Checksum referred to by ``csum_start + csum_offset`` is considered
> > > +	 * valid, but was never calculated, TX device has to do this,
> > > +	 * starting from csum_start packet byte.
> > > +	 * Any preceding checksums are also considered valid.
> > > +	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
> > > +	 */
> > > +	struct {
> > > +		u16 csum_start;
> > > +		u16 csum_offset;
> > > +	};
> > > +
> > > +	/* Checksum, calculated over the whole packet.
> > > +	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
> > > +	 */
> > > +	u32 checksum;
> > > +};
> > > +
> > > +enum xdp_csum_status {
> > > +	/* HW had parsed several transport headers and validated their
> > > +	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
> > > +	 * 3 least significat bytes contain number of consecutive checksum,
> > 
> > typo: significant
> > 
> > (I did not scan for typos, just came across this when trying to understand
> > the skb->csum_level + 1 trick. Probably good to run a spell check).
> >

Oh, and about skb->csum_level + 1, maybe this way it would be more 
understandable: XDP_CHECKSUM_VALID_LVL0 + skb->csum_level?

Using number of valid checksums (starts with 1) instead of checksum level 
(starts with 0) is a debatable decision, but I have decided to go with it under 
2 assumptions:

- the only reason checksum level in skb starts with 0 is to use less bits
- checksum number would be more intuitive for XDP/AF_XDP application developers

I encourage everyone to share their opinion on that.
 
> 
> Thanks! Will fix.
> 
> > > +	 * starting with the outermost, reported by hardware as valid.
> > > +	 * ``sk_buff`` checksum level (``csum_level``) notation is provided
> > > +	 * for driver developers.
> > > +	 */
> > > +	XDP_CHECKSUM_VALID_LVL0		= 1,	/* 1 outermost checksum */
> > > +	XDP_CHECKSUM_VALID_LVL1		= 2,	/* 2 outermost checksums */
> > > +	XDP_CHECKSUM_VALID_LVL2		= 3,	/* 3 outermost checksums */
> > > +	XDP_CHECKSUM_VALID_LVL3		= 4,	/* 4 outermost checksums */
> > > +	XDP_CHECKSUM_VALID_NUM_MASK	= GENMASK(2, 0),
> > > +	XDP_CHECKSUM_VALID		= XDP_CHECKSUM_VALID_NUM_MASK,
> > > +
> > > +	/* Occurs if packet is sent virtually (between Linux VMs / containers)
> > > +	 * This status cannot coexist with any other.
> > > +	 * Refer to ``csum_start`` and ``csum_offset`` in ``xdp_csum_info``
> > > +	 * for more information.
> > > +	 */
> > > +	XDP_CHECKSUM_PARTIAL	= BIT(3),
> > > +
> > > +	/* Checksum, calculated over the entire packet is provided */
> > > +	XDP_CHECKSUM_COMPLETE	= BIT(4),
> > > +};
> > > +
> > >  #ifdef CONFIG_NET
> > >  u32 bpf_xdp_metadata_kfunc_id(int id);
> > >  bool bpf_dev_bound_kfunc_id(u32 btf_id);
> > > diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> > > index 986e7becfd42..f60a6add5273 100644
> > > --- a/kernel/bpf/offload.c
> > > +++ b/kernel/bpf/offload.c
> > > @@ -850,6 +850,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> > >  		p = ops->xmo_rx_hash;
> > >  	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_VLAN_TAG))
> > >  		p = ops->xmo_rx_vlan_tag;
> > > +	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM))
> > > +		p = ops->xmo_rx_csum;
> > >  out:
> > >  	up_read(&bpf_devs_lock);
> > >  
> > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > index 8b55419d332e..d4ea54046afc 100644
> > > --- a/net/core/xdp.c
> > > +++ b/net/core/xdp.c
> > > @@ -772,6 +772,29 @@ __bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
> > >  	return -EOPNOTSUPP;
> > >  }
> > >  
> > > +/**
> > > + * bpf_xdp_metadata_rx_csum - Get checksum status with additional info.
> > > + * @ctx: XDP context pointer.
> > > + * @csum_status: Destination for checksum status.
> > > + * @csum_info: Destination for complete checksum or partial checksum offset.
> > > + *
> > > + * Status (@csum_status) is a bitfield that informs, what checksum
> > > + * processing was performed. Additional results of such processing,
> > > + * such as complete checksum or partial checksum offsets,
> > > + * are passed as info (@csum_info).
> > > + *
> > > + * Return:
> > > + * * Returns 0 on success or ``-errno`` on error.
> > > + * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc
> > > + * * ``-ENODATA``    : Checksum status is unknown
> > > + */
> > > +__bpf_kfunc int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
> > > +					 enum xdp_csum_status *csum_status,
> > > +					 union xdp_csum_info *csum_info)
> > > +{
> > > +	return -EOPNOTSUPP;
> > > +}
> > > +
> > >  __diag_pop();
> > >  
> > >  BTF_SET8_START(xdp_metadata_kfunc_ids)
> > > -- 
> > > 2.41.0
> > > 
> > 
> > 
>
Willem de Bruijn July 20, 2023, 1:55 p.m. UTC | #4
Zaremba, Larysa wrote:
> On Thu, Jul 20, 2023 at 09:57:05AM +0000, Zaremba, Larysa wrote:
> > On Wed, Jul 19, 2023 at 05:42:04PM -0400, Willem de Bruijn wrote:
> > > Larysa Zaremba wrote:
> > > > Implement functionality that enables drivers to expose to XDP code checksum
> > > > information that consists of:
> > > > 
> > > > - Checksum status - bitfield that consists of
> > > >   - number of consecutive validated checksums. This is almost the same as
> > > >     csum_level in skb, but starts with 1. Enum names for those bits still
> > > >     use checksum level concept, so it is less confusing for driver
> > > >     developers.
> > > >   - Is checksum partial? This bit cannot coexist with any other
> > > >   - Is there a complete checksum available?
> > > > - Additional checksum data, a union of:
> > > >   - checksum start and offset, if checksum is partial
> > > >   - complete checksum, if available
> > > > 
> > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > ---
> > > >  Documentation/networking/xdp-rx-metadata.rst |  3 ++
> > > >  include/linux/netdevice.h                    |  3 ++
> > > >  include/net/xdp.h                            | 46 ++++++++++++++++++++
> > > >  kernel/bpf/offload.c                         |  2 +
> > > >  net/core/xdp.c                               | 23 ++++++++++
> > > >  5 files changed, 77 insertions(+)
> > > > 
> > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > > index ea6dd79a21d3..7f056a44f682 100644
> > > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > > @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
> > > >  .. kernel-doc:: net/core/xdp.c
> > > >     :identifiers: bpf_xdp_metadata_rx_vlan_tag
> > > >  
> > > > +.. kernel-doc:: net/core/xdp.c
> > > > +   :identifiers: bpf_xdp_metadata_rx_csum
> > > > +
> > > >  An XDP program can use these kfuncs to read the metadata into stack
> > > >  variables for its own consumption. Or, to pass the metadata on to other
> > > >  consumers, an XDP program can store it into the metadata area carried
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index 1749f4f75c64..4f6da36ac123 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops {
> > > >  			       enum xdp_rss_hash_type *rss_type);
> > > >  	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
> > > >  				   __be16 *vlan_proto);
> > > > +	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
> > > > +			       enum xdp_csum_status *csum_status,
> > > > +			       union xdp_csum_info *csum_info);
> > > >  };
> > > >  
> > > >  /**
> > > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > > index 89c58f56ffc6..2b7a7d678ff4 100644
> > > > --- a/include/net/xdp.h
> > > > +++ b/include/net/xdp.h
> > > > @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> > > >  			   bpf_xdp_metadata_rx_hash) \
> > > >  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
> > > >  			   bpf_xdp_metadata_rx_vlan_tag) \
> > > > +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
> > > > +			   bpf_xdp_metadata_rx_csum) \
> > > >  
> > > >  enum {
> > > >  #define XDP_METADATA_KFUNC(name, _) name,
> > > > @@ -448,6 +450,50 @@ enum xdp_rss_hash_type {
> > > >  	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
> > > >  };
> > > >  
> > > > +union xdp_csum_info {
> > > > +	/* Checksum referred to by ``csum_start + csum_offset`` is considered
> > > > +	 * valid, but was never calculated, TX device has to do this,
> > > > +	 * starting from csum_start packet byte.
> > > > +	 * Any preceding checksums are also considered valid.
> > > > +	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
> > > > +	 */
> > > > +	struct {
> > > > +		u16 csum_start;
> > > > +		u16 csum_offset;
> > > > +	};
> > > > +
> > > > +	/* Checksum, calculated over the whole packet.
> > > > +	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
> > > > +	 */
> > > > +	u32 checksum;
> > > > +};
> > > > +
> > > > +enum xdp_csum_status {
> > > > +	/* HW had parsed several transport headers and validated their
> > > > +	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
> > > > +	 * 3 least significat bytes contain number of consecutive checksum,
> > > 
> > > typo: significant
> > > 
> > > (I did not scan for typos, just came across this when trying to understand
> > > the skb->csum_level + 1 trick. Probably good to run a spell check).
> > >
> 
> Oh, and about skb->csum_level + 1, maybe this way it would be more 
> understandable: XDP_CHECKSUM_VALID_LVL0 + skb->csum_level?

Agreed, that would help document the intent.
 
> Using number of valid checksums (starts with 1) instead of checksum level 
> (starts with 0) is a debatable decision, but I have decided to go with it under 
> 2 assumptions:
> 
> - the only reason checksum level in skb starts with 0 is to use less bits
> - checksum number would be more intuitive for XDP/AF_XDP application developers
> 
> I encourage everyone to share their opinion on that.

I assumed this offset by one was because csum_status zero implicitly
meant XDP_CHECKSUM_NONE. Is that not correct? That should probably
get an explicit name.
Larysa Zaremba July 20, 2023, 4:03 p.m. UTC | #5
On Thu, Jul 20, 2023 at 09:55:16AM -0400, Willem de Bruijn wrote:
> Zaremba, Larysa wrote:
> > On Thu, Jul 20, 2023 at 09:57:05AM +0000, Zaremba, Larysa wrote:
> > > On Wed, Jul 19, 2023 at 05:42:04PM -0400, Willem de Bruijn wrote:
> > > > Larysa Zaremba wrote:
> > > > > Implement functionality that enables drivers to expose to XDP code checksum
> > > > > information that consists of:
> > > > > 
> > > > > - Checksum status - bitfield that consists of
> > > > >   - number of consecutive validated checksums. This is almost the same as
> > > > >     csum_level in skb, but starts with 1. Enum names for those bits still
> > > > >     use checksum level concept, so it is less confusing for driver
> > > > >     developers.
> > > > >   - Is checksum partial? This bit cannot coexist with any other
> > > > >   - Is there a complete checksum available?
> > > > > - Additional checksum data, a union of:
> > > > >   - checksum start and offset, if checksum is partial
> > > > >   - complete checksum, if available
> > > > > 
> > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > > ---
> > > > >  Documentation/networking/xdp-rx-metadata.rst |  3 ++
> > > > >  include/linux/netdevice.h                    |  3 ++
> > > > >  include/net/xdp.h                            | 46 ++++++++++++++++++++
> > > > >  kernel/bpf/offload.c                         |  2 +
> > > > >  net/core/xdp.c                               | 23 ++++++++++
> > > > >  5 files changed, 77 insertions(+)
> > > > > 
> > > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > > > index ea6dd79a21d3..7f056a44f682 100644
> > > > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > > > @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
> > > > >  .. kernel-doc:: net/core/xdp.c
> > > > >     :identifiers: bpf_xdp_metadata_rx_vlan_tag
> > > > >  
> > > > > +.. kernel-doc:: net/core/xdp.c
> > > > > +   :identifiers: bpf_xdp_metadata_rx_csum
> > > > > +
> > > > >  An XDP program can use these kfuncs to read the metadata into stack
> > > > >  variables for its own consumption. Or, to pass the metadata on to other
> > > > >  consumers, an XDP program can store it into the metadata area carried
> > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > > index 1749f4f75c64..4f6da36ac123 100644
> > > > > --- a/include/linux/netdevice.h
> > > > > +++ b/include/linux/netdevice.h
> > > > > @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops {
> > > > >  			       enum xdp_rss_hash_type *rss_type);
> > > > >  	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
> > > > >  				   __be16 *vlan_proto);
> > > > > +	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
> > > > > +			       enum xdp_csum_status *csum_status,
> > > > > +			       union xdp_csum_info *csum_info);
> > > > >  };
> > > > >  
> > > > >  /**
> > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > > > index 89c58f56ffc6..2b7a7d678ff4 100644
> > > > > --- a/include/net/xdp.h
> > > > > +++ b/include/net/xdp.h
> > > > > @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> > > > >  			   bpf_xdp_metadata_rx_hash) \
> > > > >  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
> > > > >  			   bpf_xdp_metadata_rx_vlan_tag) \
> > > > > +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
> > > > > +			   bpf_xdp_metadata_rx_csum) \
> > > > >  
> > > > >  enum {
> > > > >  #define XDP_METADATA_KFUNC(name, _) name,
> > > > > @@ -448,6 +450,50 @@ enum xdp_rss_hash_type {
> > > > >  	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
> > > > >  };
> > > > >  
> > > > > +union xdp_csum_info {
> > > > > +	/* Checksum referred to by ``csum_start + csum_offset`` is considered
> > > > > +	 * valid, but was never calculated, TX device has to do this,
> > > > > +	 * starting from csum_start packet byte.
> > > > > +	 * Any preceding checksums are also considered valid.
> > > > > +	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
> > > > > +	 */
> > > > > +	struct {
> > > > > +		u16 csum_start;
> > > > > +		u16 csum_offset;
> > > > > +	};
> > > > > +
> > > > > +	/* Checksum, calculated over the whole packet.
> > > > > +	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
> > > > > +	 */
> > > > > +	u32 checksum;
> > > > > +};
> > > > > +
> > > > > +enum xdp_csum_status {
> > > > > +	/* HW had parsed several transport headers and validated their
> > > > > +	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
> > > > > +	 * 3 least significat bytes contain number of consecutive checksum,
> > > > 
> > > > typo: significant
> > > > 
> > > > (I did not scan for typos, just came across this when trying to understand
> > > > the skb->csum_level + 1 trick. Probably good to run a spell check).
> > > >
> > 
> > Oh, and about skb->csum_level + 1, maybe this way it would be more 
> > understandable: XDP_CHECKSUM_VALID_LVL0 + skb->csum_level?
> 
> Agreed, that would help document the intent.
>  
> > Using number of valid checksums (starts with 1) instead of checksum level 
> > (starts with 0) is a debatable decision, but I have decided to go with it under 
> > 2 assumptions:
> > 
> > - the only reason checksum level in skb starts with 0 is to use less bits
> > - checksum number would be more intuitive for XDP/AF_XDP application developers
> > 
> > I encourage everyone to share their opinion on that.
> 
> I assumed this offset by one was because csum_status zero implicitly
> meant XDP_CHECKSUM_NONE. Is that not correct? That should probably
> get an explicit name.
> 

Well, I was not sure, whether I should add XDP_CHECKSUM_NONE, because it would 
be equal to returning -ENODATA from kfunc, but after giving it some thought now, 
it is worth to have XDP_CHECKSUM_NONE for packets that have no checksum to 
check, like for hash there is XDP_RSS_TYPE_L2.
Willem de Bruijn July 20, 2023, 10:27 p.m. UTC | #6
Zaremba, Larysa wrote:
> On Thu, Jul 20, 2023 at 09:55:16AM -0400, Willem de Bruijn wrote:
> > Zaremba, Larysa wrote:
> > > On Thu, Jul 20, 2023 at 09:57:05AM +0000, Zaremba, Larysa wrote:
> > > > On Wed, Jul 19, 2023 at 05:42:04PM -0400, Willem de Bruijn wrote:
> > > > > Larysa Zaremba wrote:
> > > > > > Implement functionality that enables drivers to expose to XDP code checksum
> > > > > > information that consists of:
> > > > > > 
> > > > > > - Checksum status - bitfield that consists of
> > > > > >   - number of consecutive validated checksums. This is almost the same as
> > > > > >     csum_level in skb, but starts with 1. Enum names for those bits still
> > > > > >     use checksum level concept, so it is less confusing for driver
> > > > > >     developers.
> > > > > >   - Is checksum partial? This bit cannot coexist with any other
> > > > > >   - Is there a complete checksum available?
> > > > > > - Additional checksum data, a union of:
> > > > > >   - checksum start and offset, if checksum is partial
> > > > > >   - complete checksum, if available
> > > > > > 
> > > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > > > ---
> > > > > >  Documentation/networking/xdp-rx-metadata.rst |  3 ++
> > > > > >  include/linux/netdevice.h                    |  3 ++
> > > > > >  include/net/xdp.h                            | 46 ++++++++++++++++++++
> > > > > >  kernel/bpf/offload.c                         |  2 +
> > > > > >  net/core/xdp.c                               | 23 ++++++++++
> > > > > >  5 files changed, 77 insertions(+)
> > > > > > 
> > > > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > index ea6dd79a21d3..7f056a44f682 100644
> > > > > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > > > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
> > > > > >  .. kernel-doc:: net/core/xdp.c
> > > > > >     :identifiers: bpf_xdp_metadata_rx_vlan_tag
> > > > > >  
> > > > > > +.. kernel-doc:: net/core/xdp.c
> > > > > > +   :identifiers: bpf_xdp_metadata_rx_csum
> > > > > > +
> > > > > >  An XDP program can use these kfuncs to read the metadata into stack
> > > > > >  variables for its own consumption. Or, to pass the metadata on to other
> > > > > >  consumers, an XDP program can store it into the metadata area carried
> > > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > > > index 1749f4f75c64..4f6da36ac123 100644
> > > > > > --- a/include/linux/netdevice.h
> > > > > > +++ b/include/linux/netdevice.h
> > > > > > @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops {
> > > > > >  			       enum xdp_rss_hash_type *rss_type);
> > > > > >  	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
> > > > > >  				   __be16 *vlan_proto);
> > > > > > +	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
> > > > > > +			       enum xdp_csum_status *csum_status,
> > > > > > +			       union xdp_csum_info *csum_info);
> > > > > >  };
> > > > > >  
> > > > > >  /**
> > > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > > > > index 89c58f56ffc6..2b7a7d678ff4 100644
> > > > > > --- a/include/net/xdp.h
> > > > > > +++ b/include/net/xdp.h
> > > > > > @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> > > > > >  			   bpf_xdp_metadata_rx_hash) \
> > > > > >  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
> > > > > >  			   bpf_xdp_metadata_rx_vlan_tag) \
> > > > > > +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
> > > > > > +			   bpf_xdp_metadata_rx_csum) \
> > > > > >  
> > > > > >  enum {
> > > > > >  #define XDP_METADATA_KFUNC(name, _) name,
> > > > > > @@ -448,6 +450,50 @@ enum xdp_rss_hash_type {
> > > > > >  	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
> > > > > >  };
> > > > > >  
> > > > > > +union xdp_csum_info {
> > > > > > +	/* Checksum referred to by ``csum_start + csum_offset`` is considered
> > > > > > +	 * valid, but was never calculated, TX device has to do this,
> > > > > > +	 * starting from csum_start packet byte.
> > > > > > +	 * Any preceding checksums are also considered valid.
> > > > > > +	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
> > > > > > +	 */
> > > > > > +	struct {
> > > > > > +		u16 csum_start;
> > > > > > +		u16 csum_offset;
> > > > > > +	};
> > > > > > +
> > > > > > +	/* Checksum, calculated over the whole packet.
> > > > > > +	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
> > > > > > +	 */
> > > > > > +	u32 checksum;
> > > > > > +};
> > > > > > +
> > > > > > +enum xdp_csum_status {
> > > > > > +	/* HW had parsed several transport headers and validated their
> > > > > > +	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
> > > > > > +	 * 3 least significat bytes contain number of consecutive checksum,
> > > > > 
> > > > > typo: significant
> > > > > 
> > > > > (I did not scan for typos, just came across this when trying to understand
> > > > > the skb->csum_level + 1 trick. Probably good to run a spell check).
> > > > >
> > > 
> > > Oh, and about skb->csum_level + 1, maybe this way it would be more 
> > > understandable: XDP_CHECKSUM_VALID_LVL0 + skb->csum_level?
> > 
> > Agreed, that would help document the intent.
> >  
> > > Using number of valid checksums (starts with 1) instead of checksum level 
> > > (starts with 0) is a debatable decision, but I have decided to go with it under 
> > > 2 assumptions:
> > > 
> > > - the only reason checksum level in skb starts with 0 is to use less bits
> > > - checksum number would be more intuitive for XDP/AF_XDP application developers
> > > 
> > > I encourage everyone to share their opinion on that.
> > 
> > I assumed this offset by one was because csum_status zero implicitly
> > meant XDP_CHECKSUM_NONE. Is that not correct? That should probably
> > get an explicit name.
> > 
> 
> Well, I was not sure, whether I should add XDP_CHECKSUM_NONE, because it would 
> be equal to returning -ENODATA from kfunc, but after giving it some thought now, 
> it is worth to have XDP_CHECKSUM_NONE for packets that have no checksum to 
> check, like for hash there is XDP_RSS_TYPE_L2.

On receive, CHECKSUM_NONE means that the packet has not been checked, not
necessarily that it has no checksum. Perhaps the device was unable to
parse the protocol.

(on transmit, it conveys that a transmit checksum is not required.)
Larysa Zaremba July 21, 2023, 8:01 a.m. UTC | #7
On Thu, Jul 20, 2023 at 06:27:41PM -0400, Willem de Bruijn wrote:
> Zaremba, Larysa wrote:
> > On Thu, Jul 20, 2023 at 09:55:16AM -0400, Willem de Bruijn wrote:
> > > Zaremba, Larysa wrote:
> > > > On Thu, Jul 20, 2023 at 09:57:05AM +0000, Zaremba, Larysa wrote:
> > > > > On Wed, Jul 19, 2023 at 05:42:04PM -0400, Willem de Bruijn wrote:
> > > > > > Larysa Zaremba wrote:
> > > > > > > Implement functionality that enables drivers to expose to XDP code checksum
> > > > > > > information that consists of:
> > > > > > > 
> > > > > > > - Checksum status - bitfield that consists of
> > > > > > >   - number of consecutive validated checksums. This is almost the same as
> > > > > > >     csum_level in skb, but starts with 1. Enum names for those bits still
> > > > > > >     use checksum level concept, so it is less confusing for driver
> > > > > > >     developers.
> > > > > > >   - Is checksum partial? This bit cannot coexist with any other
> > > > > > >   - Is there a complete checksum available?
> > > > > > > - Additional checksum data, a union of:
> > > > > > >   - checksum start and offset, if checksum is partial
> > > > > > >   - complete checksum, if available
> > > > > > > 
> > > > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > > > > ---
> > > > > > >  Documentation/networking/xdp-rx-metadata.rst |  3 ++
> > > > > > >  include/linux/netdevice.h                    |  3 ++
> > > > > > >  include/net/xdp.h                            | 46 ++++++++++++++++++++
> > > > > > >  kernel/bpf/offload.c                         |  2 +
> > > > > > >  net/core/xdp.c                               | 23 ++++++++++
> > > > > > >  5 files changed, 77 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > > index ea6dd79a21d3..7f056a44f682 100644
> > > > > > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > > > > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > > @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
> > > > > > >  .. kernel-doc:: net/core/xdp.c
> > > > > > >     :identifiers: bpf_xdp_metadata_rx_vlan_tag
> > > > > > >  
> > > > > > > +.. kernel-doc:: net/core/xdp.c
> > > > > > > +   :identifiers: bpf_xdp_metadata_rx_csum
> > > > > > > +
> > > > > > >  An XDP program can use these kfuncs to read the metadata into stack
> > > > > > >  variables for its own consumption. Or, to pass the metadata on to other
> > > > > > >  consumers, an XDP program can store it into the metadata area carried
> > > > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > > > > index 1749f4f75c64..4f6da36ac123 100644
> > > > > > > --- a/include/linux/netdevice.h
> > > > > > > +++ b/include/linux/netdevice.h
> > > > > > > @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops {
> > > > > > >  			       enum xdp_rss_hash_type *rss_type);
> > > > > > >  	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
> > > > > > >  				   __be16 *vlan_proto);
> > > > > > > +	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
> > > > > > > +			       enum xdp_csum_status *csum_status,
> > > > > > > +			       union xdp_csum_info *csum_info);
> > > > > > >  };
> > > > > > >  
> > > > > > >  /**
> > > > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > > > > > index 89c58f56ffc6..2b7a7d678ff4 100644
> > > > > > > --- a/include/net/xdp.h
> > > > > > > +++ b/include/net/xdp.h
> > > > > > > @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> > > > > > >  			   bpf_xdp_metadata_rx_hash) \
> > > > > > >  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
> > > > > > >  			   bpf_xdp_metadata_rx_vlan_tag) \
> > > > > > > +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
> > > > > > > +			   bpf_xdp_metadata_rx_csum) \
> > > > > > >  
> > > > > > >  enum {
> > > > > > >  #define XDP_METADATA_KFUNC(name, _) name,
> > > > > > > @@ -448,6 +450,50 @@ enum xdp_rss_hash_type {
> > > > > > >  	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
> > > > > > >  };
> > > > > > >  
> > > > > > > +union xdp_csum_info {
> > > > > > > +	/* Checksum referred to by ``csum_start + csum_offset`` is considered
> > > > > > > +	 * valid, but was never calculated, TX device has to do this,
> > > > > > > +	 * starting from csum_start packet byte.
> > > > > > > +	 * Any preceding checksums are also considered valid.
> > > > > > > +	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
> > > > > > > +	 */
> > > > > > > +	struct {
> > > > > > > +		u16 csum_start;
> > > > > > > +		u16 csum_offset;
> > > > > > > +	};
> > > > > > > +
> > > > > > > +	/* Checksum, calculated over the whole packet.
> > > > > > > +	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
> > > > > > > +	 */
> > > > > > > +	u32 checksum;
> > > > > > > +};
> > > > > > > +
> > > > > > > +enum xdp_csum_status {
> > > > > > > +	/* HW had parsed several transport headers and validated their
> > > > > > > +	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
> > > > > > > +	 * 3 least significat bytes contain number of consecutive checksum,
> > > > > > 
> > > > > > typo: significant
> > > > > > 
> > > > > > (I did not scan for typos, just came across this when trying to understand
> > > > > > the skb->csum_level + 1 trick. Probably good to run a spell check).
> > > > > >
> > > > 
> > > > Oh, and about skb->csum_level + 1, maybe this way it would be more 
> > > > understandable: XDP_CHECKSUM_VALID_LVL0 + skb->csum_level?
> > > 
> > > Agreed, that would help document the intent.
> > >  
> > > > Using number of valid checksums (starts with 1) instead of checksum level 
> > > > (starts with 0) is a debatable decision, but I have decided to go with it under 
> > > > 2 assumptions:
> > > > 
> > > > - the only reason checksum level in skb starts with 0 is to use less bits
> > > > - checksum number would be more intuitive for XDP/AF_XDP application developers
> > > > 
> > > > I encourage everyone to share their opinion on that.
> > > 
> > > I assumed this offset by one was because csum_status zero implicitly
> > > meant XDP_CHECKSUM_NONE. Is that not correct? That should probably
> > > get an explicit name.
> > > 
> > 
> > Well, I was not sure, whether I should add XDP_CHECKSUM_NONE, because it would 
> > be equal to returning -ENODATA from kfunc, but after giving it some thought now, 
> > it is worth to have XDP_CHECKSUM_NONE for packets that have no checksum to 
> > check, like for hash there is XDP_RSS_TYPE_L2.
> 
> On receive, CHECKSUM_NONE means that the packet has not been checked, not
> necessarily that it has no checksum. Perhaps the device was unable to
> parse the protocol.
> 
> (on transmit, it conveys that a transmit checksum is not required.)

Oh, my bad, I have re-read the docs and for packets without checksum, 
CHECKSUM_UNNECESSARY instead conveys "CRC in OK". In such case, 
XDP_CHECKSUM_NONE becomes a full equivalent of returning -ENODATA from kfunc, so 
I do not think XDP_CHECKSUM_NONE enum is worth including, because it coud lead 
to new people writing programs in such way:

if (bpf_xdp_metadata_rx_csum(ctx, &csum_status, &rx_csum_info))
	fallback();

if (csum_status == XDP_CHECKSUM_NONE)
	fallback();
[...]
diff mbox series

Patch

diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index ea6dd79a21d3..7f056a44f682 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -26,6 +26,9 @@  metadata is supported, this set will grow:
 .. kernel-doc:: net/core/xdp.c
    :identifiers: bpf_xdp_metadata_rx_vlan_tag
 
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_csum
+
 An XDP program can use these kfuncs to read the metadata into stack
 variables for its own consumption. Or, to pass the metadata on to other
 consumers, an XDP program can store it into the metadata area carried
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1749f4f75c64..4f6da36ac123 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1660,6 +1660,9 @@  struct xdp_metadata_ops {
 			       enum xdp_rss_hash_type *rss_type);
 	int	(*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci,
 				   __be16 *vlan_proto);
+	int	(*xmo_rx_csum)(const struct xdp_md *ctx,
+			       enum xdp_csum_status *csum_status,
+			       union xdp_csum_info *csum_info);
 };
 
 /**
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 89c58f56ffc6..2b7a7d678ff4 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -391,6 +391,8 @@  void xdp_attachment_setup(struct xdp_attachment_info *info,
 			   bpf_xdp_metadata_rx_hash) \
 	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
 			   bpf_xdp_metadata_rx_vlan_tag) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \
+			   bpf_xdp_metadata_rx_csum) \
 
 enum {
 #define XDP_METADATA_KFUNC(name, _) name,
@@ -448,6 +450,50 @@  enum xdp_rss_hash_type {
 	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
 };
 
+union xdp_csum_info {
+	/* Checksum referred to by ``csum_start + csum_offset`` is considered
+	 * valid, but was never calculated, TX device has to do this,
+	 * starting from csum_start packet byte.
+	 * Any preceding checksums are also considered valid.
+	 * Available, if ``status == XDP_CHECKSUM_PARTIAL``.
+	 */
+	struct {
+		u16 csum_start;
+		u16 csum_offset;
+	};
+
+	/* Checksum, calculated over the whole packet.
+	 * Available, if ``status & XDP_CHECKSUM_COMPLETE``.
+	 */
+	u32 checksum;
+};
+
+enum xdp_csum_status {
+	/* HW had parsed several transport headers and validated their
+	 * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``.
+	 * 3 least significat bytes contain number of consecutive checksum,
+	 * starting with the outermost, reported by hardware as valid.
+	 * ``sk_buff`` checksum level (``csum_level``) notation is provided
+	 * for driver developers.
+	 */
+	XDP_CHECKSUM_VALID_LVL0		= 1,	/* 1 outermost checksum */
+	XDP_CHECKSUM_VALID_LVL1		= 2,	/* 2 outermost checksums */
+	XDP_CHECKSUM_VALID_LVL2		= 3,	/* 3 outermost checksums */
+	XDP_CHECKSUM_VALID_LVL3		= 4,	/* 4 outermost checksums */
+	XDP_CHECKSUM_VALID_NUM_MASK	= GENMASK(2, 0),
+	XDP_CHECKSUM_VALID		= XDP_CHECKSUM_VALID_NUM_MASK,
+
+	/* Occurs if packet is sent virtually (between Linux VMs / containers)
+	 * This status cannot coexist with any other.
+	 * Refer to ``csum_start`` and ``csum_offset`` in ``xdp_csum_info``
+	 * for more information.
+	 */
+	XDP_CHECKSUM_PARTIAL	= BIT(3),
+
+	/* Checksum, calculated over the entire packet is provided */
+	XDP_CHECKSUM_COMPLETE	= BIT(4),
+};
+
 #ifdef CONFIG_NET
 u32 bpf_xdp_metadata_kfunc_id(int id);
 bool bpf_dev_bound_kfunc_id(u32 btf_id);
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 986e7becfd42..f60a6add5273 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -850,6 +850,8 @@  void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
 		p = ops->xmo_rx_hash;
 	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_VLAN_TAG))
 		p = ops->xmo_rx_vlan_tag;
+	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM))
+		p = ops->xmo_rx_csum;
 out:
 	up_read(&bpf_devs_lock);
 
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8b55419d332e..d4ea54046afc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -772,6 +772,29 @@  __bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
 	return -EOPNOTSUPP;
 }
 
+/**
+ * bpf_xdp_metadata_rx_csum - Get checksum status with additional info.
+ * @ctx: XDP context pointer.
+ * @csum_status: Destination for checksum status.
+ * @csum_info: Destination for complete checksum or partial checksum offset.
+ *
+ * Status (@csum_status) is a bitfield that informs, what checksum
+ * processing was performed. Additional results of such processing,
+ * such as complete checksum or partial checksum offsets,
+ * are passed as info (@csum_info).
+ *
+ * Return:
+ * * Returns 0 on success or ``-errno`` on error.
+ * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc
+ * * ``-ENODATA``    : Checksum status is unknown
+ */
+__bpf_kfunc int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
+					 enum xdp_csum_status *csum_status,
+					 union xdp_csum_info *csum_info)
+{
+	return -EOPNOTSUPP;
+}
+
 __diag_pop();
 
 BTF_SET8_START(xdp_metadata_kfunc_ids)