diff mbox series

[net-next] net: sched: extend flow action with RSS

Message ID 20231020061158.6716-1-hkelam@marvell.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: sched: extend flow action with RSS | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3178 this patch: 3178
netdev/cc_maintainers warning 1 maintainers not CCed: davem@davemloft.net
netdev/build_clang success Errors and warnings before: 1525 this patch: 1525
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3428 this patch: 3428
netdev/checkpatch warning CHECK: Comparison to NULL could be written "tb[TCA_SKBEDIT_RSS_GROUP_ID]" WARNING: line length of 86 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hariprasad Kelam Oct. 20, 2023, 6:11 a.m. UTC
This patch extends current flow action with RSS, such that
the user can install flower offloads with action RSS followed
by a group id. Since this is done in hardware skip_sw flag
is enforced.

Example:
In a multi rss group supported NIC,

rss group #1 flow hash indirection table populated with rx queues 1 to 4
rss group #2 flow hash indirection table populated with rx queues 5 to 9

$tc filter add dev eth1 ingress protocol ip flower ip_proto tcp dst_port
443 action skbedit rss_group 1 skip_sw

Packets destined to tcp port 443 will be distributed among rx queues 1 to 4

$tc filter add dev eth1 ingress protocol ip flower ip_proto udp dst_port
8080 action skbedit rss_group 2 skip_sw

Packets destined to udp port 8080 will be distributed among rx queues
5 to 9

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
---
 include/net/flow_offload.h             |  2 ++
 include/net/tc_act/tc_skbedit.h        | 18 ++++++++++++++++++
 include/uapi/linux/tc_act/tc_skbedit.h |  2 ++
 net/sched/act_skbedit.c                | 26 +++++++++++++++++++++++++-
 4 files changed, 47 insertions(+), 1 deletion(-)

Comments

Jamal Hadi Salim Oct. 20, 2023, 12:15 p.m. UTC | #1
On Fri, Oct 20, 2023 at 2:12 AM Hariprasad Kelam <hkelam@marvell.com> wrote:
>
> This patch extends current flow action with RSS, such that
> the user can install flower offloads with action RSS followed
> by a group id. Since this is done in hardware skip_sw flag
> is enforced.

Our typical rule for TC is we need s/w equivalence for offloads. How
would this work in absence of offload?

cheers,
jamal

> Example:
> In a multi rss group supported NIC,
>
> rss group #1 flow hash indirection table populated with rx queues 1 to 4
> rss group #2 flow hash indirection table populated with rx queues 5 to 9
>
> $tc filter add dev eth1 ingress protocol ip flower ip_proto tcp dst_port
> 443 action skbedit rss_group 1 skip_sw
>
> Packets destined to tcp port 443 will be distributed among rx queues 1 to 4
>
> $tc filter add dev eth1 ingress protocol ip flower ip_proto udp dst_port
> 8080 action skbedit rss_group 2 skip_sw
>
> Packets destined to udp port 8080 will be distributed among rx queues
> 5 to 9
Hariprasad Kelam Oct. 20, 2023, 4:35 p.m. UTC | #2
> On Fri, Oct 20, 2023 at 2:12 AM Hariprasad Kelam <hkelam@marvell.com>
> wrote:
> >
> > This patch extends current flow action with RSS, such that the user
> > can install flower offloads with action RSS followed by a group id.
> > Since this is done in hardware skip_sw flag is enforced.
> 
> Our typical rule for TC is we need s/w equivalence for offloads. How would
> this work in absence of offload?
> 
[Hari]  
Our typical rule for TC is we need s/w equivalence for offloads. How would this work in absence of offload?

This patch we added as an extension to receive queue selection in hardware.
This patch "act_skbedit: skbedit queue mapping for receive queue" enabled receive queue selection in hardware
and skip_sw is enforced.

Adding stakeholders of this patch, to get their opinion.
sridhar.samudrala@intel.com  amritha.nambiar@intel.com

incase of RSS, hardware makes decisions about incoming packets before they are even received in the queue.

Thanks,
Hariprasad k




> cheers,
> jamal
> 
> > Example:
> > In a multi rss group supported NIC,
> >
> > rss group #1 flow hash indirection table populated with rx queues 1 to
> > 4 rss group #2 flow hash indirection table populated with rx queues 5
> > to 9
> >
> > $tc filter add dev eth1 ingress protocol ip flower ip_proto tcp
> > dst_port
> > 443 action skbedit rss_group 1 skip_sw
> >
> > Packets destined to tcp port 443 will be distributed among rx queues 1
> > to 4
> >
> > $tc filter add dev eth1 ingress protocol ip flower ip_proto udp
> > dst_port
> > 8080 action skbedit rss_group 2 skip_sw
> >
> > Packets destined to udp port 8080 will be distributed among rx queues
> > 5 to 9
Pedro Tammela Oct. 20, 2023, 5:41 p.m. UTC | #3
On 20/10/2023 03:11, Hariprasad Kelam wrote:
> This patch extends current flow action with RSS, such that

..current skbedit action..

> the user can install flower offloads with action RSS followed
> by a group id. Since this is done in hardware skip_sw flag
> is enforced.
> 
> Example:
> In a multi rss group supported NIC,
> 
> rss group #1 flow hash indirection table populated with rx queues 1 to 4
> rss group #2 flow hash indirection table populated with rx queues 5 to 9
> 
> $tc filter add dev eth1 ingress protocol ip flower ip_proto tcp dst_port
> 443 action skbedit rss_group 1 skip_sw
> 
> Packets destined to tcp port 443 will be distributed among rx queues 1 to 4
> 
> $tc filter add dev eth1 ingress protocol ip flower ip_proto udp dst_port
> 8080 action skbedit rss_group 2 skip_sw
> 
> Packets destined to udp port 8080 will be distributed among rx queues
> 5 to 9
> 
> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
> ---
>   include/net/flow_offload.h             |  2 ++
>   include/net/tc_act/tc_skbedit.h        | 18 ++++++++++++++++++
>   include/uapi/linux/tc_act/tc_skbedit.h |  2 ++
>   net/sched/act_skbedit.c                | 26 +++++++++++++++++++++++++-
>   4 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 314087a5e181..efa45ed87e69 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -168,6 +168,7 @@ enum flow_action_id {
>   	FLOW_ACTION_PTYPE,
>   	FLOW_ACTION_PRIORITY,
>   	FLOW_ACTION_RX_QUEUE_MAPPING,
> +	FLOW_ACTION_RSS,
>   	FLOW_ACTION_WAKE,
>   	FLOW_ACTION_QUEUE,
>   	FLOW_ACTION_SAMPLE,
> @@ -264,6 +265,7 @@ struct flow_action_entry {
>   		u16                     ptype;          /* FLOW_ACTION_PTYPE */
>   		u16			rx_queue;	/* FLOW_ACTION_RX_QUEUE_MAPPING */
>   		u32			priority;	/* FLOW_ACTION_PRIORITY */
> +		u16			rss_group_id;	/* FLOW_ACTION_RSS_GROUP_ID */
>   		struct {				/* FLOW_ACTION_QUEUE */
>   			u32		ctx;
>   			u32		index;
> diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> index 9649600fb3dc..c4a122b49e94 100644
> --- a/include/net/tc_act/tc_skbedit.h
> +++ b/include/net/tc_act/tc_skbedit.h
> @@ -19,6 +19,7 @@ struct tcf_skbedit_params {
>   	u16 queue_mapping;
>   	u16 mapping_mod;
>   	u16 ptype;
> +	u16 rss_group_id;
>   	struct rcu_head rcu;
>   };
>   
> @@ -106,6 +107,17 @@ static inline u16 tcf_skbedit_rx_queue_mapping(const struct tc_action *a)
>   	return rx_queue;
>   }
>   
> +static inline u16 tcf_skbedit_rss_group_id(const struct tc_action *a)
> +{
> +	u16 rss_group_id;
> +
> +	rcu_read_lock();
> +	rss_group_id = rcu_dereference(to_skbedit(a)->params)->rss_group_id;
> +	rcu_read_unlock();
> +
> +	return rss_group_id;
> +}
> +
>   /* Return true iff action is queue_mapping */
>   static inline bool is_tcf_skbedit_queue_mapping(const struct tc_action *a)
>   {
> @@ -136,4 +148,10 @@ static inline bool is_tcf_skbedit_inheritdsfield(const struct tc_action *a)
>   	return is_tcf_skbedit_with_flag(a, SKBEDIT_F_INHERITDSFIELD);
>   }
>   
> +static inline bool is_tcf_skbedit_rss_group_id(const struct tc_action *a)
> +{
> +	return is_tcf_skbedit_ingress(a->tcfa_flags) &&
> +	       is_tcf_skbedit_with_flag(a, SKBEDIT_F_RSS_GROUP_ID);
> +}
> +
>   #endif /* __NET_TC_SKBEDIT_H */
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index 64032513cc4c..83f53550bb88 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -17,6 +17,7 @@
>   #define SKBEDIT_F_MASK			0x10
>   #define SKBEDIT_F_INHERITDSFIELD	0x20
>   #define SKBEDIT_F_TXQ_SKBHASH		0x40
> +#define SKBEDIT_F_RSS_GROUP_ID		0x80
>   
>   struct tc_skbedit {
>   	tc_gen;
> @@ -34,6 +35,7 @@ enum {
>   	TCA_SKBEDIT_MASK,
>   	TCA_SKBEDIT_FLAGS,
>   	TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> +	TCA_SKBEDIT_RSS_GROUP_ID,
>   	__TCA_SKBEDIT_MAX
>   };
>   #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index ce7008cf291c..127198239ac7 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -112,6 +112,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>   	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
>   	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
>   	[TCA_SKBEDIT_QUEUE_MAPPING_MAX]	= { .len = sizeof(u16) },
> +	[TCA_SKBEDIT_RSS_GROUP_ID]	= { .len = sizeof(u16) },
>   };
>   
>   static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> @@ -126,8 +127,8 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>   	struct tcf_chain *goto_ch = NULL;
>   	struct tc_skbedit *parm;
>   	struct tcf_skbedit *d;
> +	u16 *queue_mapping = NULL, *ptype = NULL, *rss_group_id = NULL;
>   	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
> -	u16 *queue_mapping = NULL, *ptype = NULL;
>   	u16 mapping_mod = 1;
>   	bool exists = false;
>   	int ret = 0, err;
> @@ -176,6 +177,17 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>   		mask = nla_data(tb[TCA_SKBEDIT_MASK]);
>   	}
>   
> +	if (tb[TCA_SKBEDIT_RSS_GROUP_ID] != NULL) {
> +		if (!is_tcf_skbedit_ingress(act_flags) ||
> +		    !(act_flags & TCA_ACT_FLAGS_SKIP_SW)) {

nit: use tc_skip_sw()

> +			NL_SET_ERR_MSG_MOD(extack,
> +					   "\"rss_group_id\" option allowed only on receive side with hardware only, use skip_sw");
> +			return -EOPNOTSUPP;
> +		}
> +		flags |= SKBEDIT_F_RSS_GROUP_ID;
> +		rss_group_id = nla_data(tb[TCA_SKBEDIT_RSS_GROUP_ID]);
> +	}
> +
>   	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
>   		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
>   
> @@ -262,6 +274,9 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>   	if (flags & SKBEDIT_F_MASK)
>   		params_new->mask = *mask;
>   
> +	if (flags & SKBEDIT_F_RSS_GROUP_ID)
> +		params_new->rss_group_id = *rss_group_id;
> +
>   	spin_lock_bh(&d->tcf_lock);
>   	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
>   	params_new = rcu_replace_pointer(d->params, params_new,
> @@ -326,6 +341,9 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
>   
>   		pure_flags |= SKBEDIT_F_TXQ_SKBHASH;
>   	}
> +	if ((params->flags & SKBEDIT_F_RSS_GROUP_ID) &&
> +	    nla_put_u16(skb, TCA_SKBEDIT_RSS_GROUP_ID, params->rss_group_id))
> +		goto nla_put_failure;
>   	if (pure_flags != 0 &&
>   	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
>   		goto nla_put_failure;
> @@ -362,6 +380,7 @@ static size_t tcf_skbedit_get_fill_size(const struct tc_action *act)
>   		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MARK */
>   		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_PTYPE */
>   		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MASK */
> +		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_RSS_GROUP_ID */
>   		+ nla_total_size_64bit(sizeof(u64)); /* TCA_SKBEDIT_FLAGS */
>   }
>   
> @@ -390,6 +409,9 @@ static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
>   		} else if (is_tcf_skbedit_inheritdsfield(act)) {
>   			NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"inheritdsfield\" option is used");
>   			return -EOPNOTSUPP;
> +		} else if (is_tcf_skbedit_rss_group_id(act)) {
> +			entry->id = FLOW_ACTION_RSS;
> +			entry->rss_group_id = tcf_skbedit_rss_group_id(act);
>   		} else {
>   			NL_SET_ERR_MSG_MOD(extack, "Unsupported skbedit option offload");
>   			return -EOPNOTSUPP;
> @@ -406,6 +428,8 @@ static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
>   			fl_action->id = FLOW_ACTION_PRIORITY;
>   		else if (is_tcf_skbedit_rx_queue_mapping(act))
>   			fl_action->id = FLOW_ACTION_RX_QUEUE_MAPPING;
> +		else if (is_tcf_skbedit_rss_group_id(act))
> +			fl_action->id = FLOW_ACTION_RSS;
>   		else
>   			return -EOPNOTSUPP;
>   	}
Nambiar, Amritha Oct. 20, 2023, 8:13 p.m. UTC | #4
On 10/20/2023 9:35 AM, Hariprasad Kelam wrote:
> 
> 
> 
> 
>> On Fri, Oct 20, 2023 at 2:12 AM Hariprasad Kelam <hkelam@marvell.com>
>> wrote:
>>>
>>> This patch extends current flow action with RSS, such that the user
>>> can install flower offloads with action RSS followed by a group id.
>>> Since this is done in hardware skip_sw flag is enforced.
>>
>> Our typical rule for TC is we need s/w equivalence for offloads. How would
>> this work in absence of offload?
>>
> [Hari]
> Our typical rule for TC is we need s/w equivalence for offloads. How would this work in absence of offload?
> 
> This patch we added as an extension to receive queue selection in hardware.
> This patch "act_skbedit: skbedit queue mapping for receive queue" enabled receive queue selection in hardware
> and skip_sw is enforced.
> 
> Adding stakeholders of this patch, to get their opinion.
> sridhar.samudrala@intel.com  amritha.nambiar@intel.com
> 
> incase of RSS, hardware makes decisions about incoming packets before they are even received in the queue.
> 

The skip_sw for skbedit receive queue action was enforced as the only 
other alternative was a new hw-only action, or changing the action 
mirred. See discussion at 
https://lore.kernel.org/netdev/20220921132929.3f4ca04d@kernel.org/

Few questions WRT this patch:
How are the rss groups created? ethtool rss contexts? Any reason to use 
TC to direct to rss contexts over using ethtool context ids?

IIUC, skbedit is meant to only edit skb metadata such as mark, packet 
type, queue mapping, priority etc. Even if this is a HW only action and 
has no use in the stack, would skbedit be the right fit here?

> Thanks,
> Hariprasad k
> 
> 
> 
> 
>> cheers,
>> jamal
>>
>>> Example:
>>> In a multi rss group supported NIC,
>>>
>>> rss group #1 flow hash indirection table populated with rx queues 1 to
>>> 4 rss group #2 flow hash indirection table populated with rx queues 5
>>> to 9
>>>
>>> $tc filter add dev eth1 ingress protocol ip flower ip_proto tcp
>>> dst_port
>>> 443 action skbedit rss_group 1 skip_sw
>>>
>>> Packets destined to tcp port 443 will be distributed among rx queues 1
>>> to 4
>>>
>>> $tc filter add dev eth1 ingress protocol ip flower ip_proto udp
>>> dst_port
>>> 8080 action skbedit rss_group 2 skip_sw
>>>
>>> Packets destined to udp port 8080 will be distributed among rx queues
>>> 5 to 9
Sunil Kovvuri Oct. 22, 2023, 5:53 a.m. UTC | #5
On Sat, Oct 21, 2023 at 1:43 AM Nambiar, Amritha
<amritha.nambiar@intel.com> wrote:
>
> >> On Fri, Oct 20, 2023 at 2:12 AM Hariprasad Kelam <hkelam@marvell.com>
> >> wrote:
> >>>
> >>> This patch extends current flow action with RSS, such that the user
> >>> can install flower offloads with action RSS followed by a group id.
> >>> Since this is done in hardware skip_sw flag is enforced.
> >>
> >> Our typical rule for TC is we need s/w equivalence for offloads. How would
> >> this work in absence of offload?
> >>
> > [Hari]
> > Our typical rule for TC is we need s/w equivalence for offloads. How would this work in absence of offload?
> >
> > This patch we added as an extension to receive queue selection in hardware.
> > This patch "act_skbedit: skbedit queue mapping for receive queue" enabled receive queue selection in hardware
> > and skip_sw is enforced.
> >
> > Adding stakeholders of this patch, to get their opinion.
> > sridhar.samudrala@intel.com  amritha.nambiar@intel.com
> >
> > incase of RSS, hardware makes decisions about incoming packets before they are even received in the queue.
> >
>
> The skip_sw for skbedit receive queue action was enforced as the only
> other alternative was a new hw-only action, or changing the action
> mirred. See discussion at
> https://lore.kernel.org/netdev/20220921132929.3f4ca04d@kernel.org/
>
> Few questions WRT this patch:
> How are the rss groups created? ethtool rss contexts? Any reason to use
> TC to direct to rss contexts over using ethtool context ids?
>

Yes, RSS groups are created using ethtool.
Ethtool ntuple is very limited in expressing flow rules and since the
general direction
is to use 'TC', we are attempting to add RSS action to 'TC'.


> IIUC, skbedit is meant to only edit skb metadata such as mark, packet
> type, queue mapping, priority etc. Even if this is a HW only action and
> has no use in the stack, would skbedit be the right fit here?
>

The thought was to keep related actions like RQ, RSS group etc under
one action ie skbedit.
If that's not the right place we can add a separate action.

Thanks,
Sunil.
Dave Taht Oct. 22, 2023, 7:20 a.m. UTC | #6
On Sat, Oct 21, 2023 at 10:53 PM Sunil Kovvuri <sunil.kovvuri@gmail.com> wrote:
>
> On Sat, Oct 21, 2023 at 1:43 AM Nambiar, Amritha
> <amritha.nambiar@intel.com> wrote:
> >
> > >> On Fri, Oct 20, 2023 at 2:12 AM Hariprasad Kelam <hkelam@marvell.com>
> > >> wrote:
> > >>>
> > >>> This patch extends current flow action with RSS, such that the user
> > >>> can install flower offloads with action RSS followed by a group id.
> > >>> Since this is done in hardware skip_sw flag is enforced.
> > >>
> > >> Our typical rule for TC is we need s/w equivalence for offloads. How would
> > >> this work in absence of offload?
> > >>
> > > [Hari]
> > > Our typical rule for TC is we need s/w equivalence for offloads. How would this work in absence of offload?
> > >
> > > This patch we added as an extension to receive queue selection in hardware.
> > > This patch "act_skbedit: skbedit queue mapping for receive queue" enabled receive queue selection in hardware
> > > and skip_sw is enforced.
> > >
> > > Adding stakeholders of this patch, to get their opinion.
> > > sridhar.samudrala@intel.com  amritha.nambiar@intel.com
> > >
> > > incase of RSS, hardware makes decisions about incoming packets before they are even received in the queue.

Is there any way to do a LPM to queue match on inbound?

> > >
> >
> > The skip_sw for skbedit receive queue action was enforced as the only
> > other alternative was a new hw-only action, or changing the action
> > mirred. See discussion at
> > https://lore.kernel.org/netdev/20220921132929.3f4ca04d@kernel.org/
> >
> > Few questions WRT this patch:
> > How are the rss groups created? ethtool rss contexts? Any reason to use
> > TC to direct to rss contexts over using ethtool context ids?
> >
>
> Yes, RSS groups are created using ethtool.
> Ethtool ntuple is very limited in expressing flow rules and since the
> general direction
> is to use 'TC', we are attempting to add RSS action to 'TC'.
>
>
> > IIUC, skbedit is meant to only edit skb metadata such as mark, packet
> > type, queue mapping, priority etc. Even if this is a HW only action and
> > has no use in the stack, would skbedit be the right fit here?
> >
>
> The thought was to keep related actions like RQ, RSS group etc under
> one action ie skbedit.
> If that's not the right place we can add a separate action.
>
> Thanks,
> Sunil.
>
Hariprasad Kelam Oct. 23, 2023, 4:04 a.m. UTC | #7
> -----Original Message-----
> From: Dave Taht <dave.taht@gmail.com>
> ----------------------------------------------------------------------
> On Sat, Oct 21, 2023 at 10:53 PM Sunil Kovvuri <sunil.kovvuri@gmail.com>
> wrote:
> >
> > On Sat, Oct 21, 2023 at 1:43 AM Nambiar, Amritha
> > <amritha.nambiar@intel.com> wrote:
> > >
> > > >> On Fri, Oct 20, 2023 at 2:12 AM Hariprasad Kelam
> > > >> <hkelam@marvell.com>
> > > >> wrote:
> > > >>>
> > > >>> This patch extends current flow action with RSS, such that the
> > > >>> user can install flower offloads with action RSS followed by a group
> id.
> > > >>> Since this is done in hardware skip_sw flag is enforced.
> > > >>
> > > >> Our typical rule for TC is we need s/w equivalence for offloads.
> > > >> How would this work in absence of offload?
> > > >>
> > > > [Hari]
> > > > Our typical rule for TC is we need s/w equivalence for offloads. How
> would this work in absence of offload?
> > > >
> > > > This patch we added as an extension to receive queue selection in
> hardware.
> > > > This patch "act_skbedit: skbedit queue mapping for receive queue"
> > > > enabled receive queue selection in hardware and skip_sw is enforced.
> > > >
> > > > Adding stakeholders of this patch, to get their opinion.
> > > > sridhar.samudrala@intel.com  amritha.nambiar@intel.com
> > > >
> > > > incase of RSS, hardware makes decisions about incoming packets
> before they are even received in the queue.
> 
> Is there any way to do a LPM to queue match on inbound?
> 
AFAIK, LPM (longest prefix match) is used to make routing decisions by comparing the destination ip address bit-by-bit with prefixes in the routing table.
Where on a typice UDP/TCP packet, RSS considers Source IP/Destination IP and Sport and Dport these fields to make decision on the queue mapping.



> > > >
> > >
> > > The skip_sw for skbedit receive queue action was enforced as the
> > > only other alternative was a new hw-only action, or changing the
> > > action mirred. See discussion at
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org
> > > _netdev_20220921132929.3f4ca04d-
> 40kernel.org_&d=DwIFaQ&c=nKjWec2b6R0
> > > mOyPaz7xtfQ&r=2bd4kP44ECYFgf-
> KoNSJWqEipEtpxXnNBKy0vyoJJ8A&m=_GDltrUw
> > > pNUkk5hvYOtlzFeW-rAeZy4_1bSWUdGVen-
> sJDjMWkmLw7pVfDH7OHBX&s=SGwkF4_m8
> > > 5hkScL8rNnbVPvDwEnNalysi6x6ual5NHY&e=
> > >
> > > Few questions WRT this patch:
> > > How are the rss groups created? ethtool rss contexts? Any reason to
> > > use TC to direct to rss contexts over using ethtool context ids?
> > >
> >
> > Yes, RSS groups are created using ethtool.
> > Ethtool ntuple is very limited in expressing flow rules and since the
> > general direction is to use 'TC', we are attempting to add RSS action
> > to 'TC'.
> >
> >
> > > IIUC, skbedit is meant to only edit skb metadata such as mark,
> > > packet type, queue mapping, priority etc. Even if this is a HW only
> > > action and has no use in the stack, would skbedit be the right fit here?
> > >
> >
> > The thought was to keep related actions like RQ, RSS group etc under
> > one action ie skbedit.
> > If that's not the right place we can add a separate action.
> >
> > Thanks,
> > Sunil.
> >
> 
> 
> --
> Oct 30: https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__netdevconf.info_0x17_news_the-2Dmaestro-2Dand-2Dthe-2Dmusic-
> 2Dbof.html&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=2bd4kP44ECYFgf-
> KoNSJWqEipEtpxXnNBKy0vyoJJ8A&m=_GDltrUwpNUkk5hvYOtlzFeW-
> rAeZy4_1bSWUdGVen-
> sJDjMWkmLw7pVfDH7OHBX&s=gKGp2Lqrx7O6ZAfoWGSQeb__T6PYk8SzKHSf
> SxtoNLU&e=
> Dave Täht CSO, LibreQos
diff mbox series

Patch

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 314087a5e181..efa45ed87e69 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -168,6 +168,7 @@  enum flow_action_id {
 	FLOW_ACTION_PTYPE,
 	FLOW_ACTION_PRIORITY,
 	FLOW_ACTION_RX_QUEUE_MAPPING,
+	FLOW_ACTION_RSS,
 	FLOW_ACTION_WAKE,
 	FLOW_ACTION_QUEUE,
 	FLOW_ACTION_SAMPLE,
@@ -264,6 +265,7 @@  struct flow_action_entry {
 		u16                     ptype;          /* FLOW_ACTION_PTYPE */
 		u16			rx_queue;	/* FLOW_ACTION_RX_QUEUE_MAPPING */
 		u32			priority;	/* FLOW_ACTION_PRIORITY */
+		u16			rss_group_id;	/* FLOW_ACTION_RSS_GROUP_ID */
 		struct {				/* FLOW_ACTION_QUEUE */
 			u32		ctx;
 			u32		index;
diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 9649600fb3dc..c4a122b49e94 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -19,6 +19,7 @@  struct tcf_skbedit_params {
 	u16 queue_mapping;
 	u16 mapping_mod;
 	u16 ptype;
+	u16 rss_group_id;
 	struct rcu_head rcu;
 };
 
@@ -106,6 +107,17 @@  static inline u16 tcf_skbedit_rx_queue_mapping(const struct tc_action *a)
 	return rx_queue;
 }
 
+static inline u16 tcf_skbedit_rss_group_id(const struct tc_action *a)
+{
+	u16 rss_group_id;
+
+	rcu_read_lock();
+	rss_group_id = rcu_dereference(to_skbedit(a)->params)->rss_group_id;
+	rcu_read_unlock();
+
+	return rss_group_id;
+}
+
 /* Return true iff action is queue_mapping */
 static inline bool is_tcf_skbedit_queue_mapping(const struct tc_action *a)
 {
@@ -136,4 +148,10 @@  static inline bool is_tcf_skbedit_inheritdsfield(const struct tc_action *a)
 	return is_tcf_skbedit_with_flag(a, SKBEDIT_F_INHERITDSFIELD);
 }
 
+static inline bool is_tcf_skbedit_rss_group_id(const struct tc_action *a)
+{
+	return is_tcf_skbedit_ingress(a->tcfa_flags) &&
+	       is_tcf_skbedit_with_flag(a, SKBEDIT_F_RSS_GROUP_ID);
+}
+
 #endif /* __NET_TC_SKBEDIT_H */
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 64032513cc4c..83f53550bb88 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -17,6 +17,7 @@ 
 #define SKBEDIT_F_MASK			0x10
 #define SKBEDIT_F_INHERITDSFIELD	0x20
 #define SKBEDIT_F_TXQ_SKBHASH		0x40
+#define SKBEDIT_F_RSS_GROUP_ID		0x80
 
 struct tc_skbedit {
 	tc_gen;
@@ -34,6 +35,7 @@  enum {
 	TCA_SKBEDIT_MASK,
 	TCA_SKBEDIT_FLAGS,
 	TCA_SKBEDIT_QUEUE_MAPPING_MAX,
+	TCA_SKBEDIT_RSS_GROUP_ID,
 	__TCA_SKBEDIT_MAX
 };
 #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index ce7008cf291c..127198239ac7 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -112,6 +112,7 @@  static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
 	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
 	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
 	[TCA_SKBEDIT_QUEUE_MAPPING_MAX]	= { .len = sizeof(u16) },
+	[TCA_SKBEDIT_RSS_GROUP_ID]	= { .len = sizeof(u16) },
 };
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -126,8 +127,8 @@  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tcf_chain *goto_ch = NULL;
 	struct tc_skbedit *parm;
 	struct tcf_skbedit *d;
+	u16 *queue_mapping = NULL, *ptype = NULL, *rss_group_id = NULL;
 	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
-	u16 *queue_mapping = NULL, *ptype = NULL;
 	u16 mapping_mod = 1;
 	bool exists = false;
 	int ret = 0, err;
@@ -176,6 +177,17 @@  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 		mask = nla_data(tb[TCA_SKBEDIT_MASK]);
 	}
 
+	if (tb[TCA_SKBEDIT_RSS_GROUP_ID] != NULL) {
+		if (!is_tcf_skbedit_ingress(act_flags) ||
+		    !(act_flags & TCA_ACT_FLAGS_SKIP_SW)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "\"rss_group_id\" option allowed only on receive side with hardware only, use skip_sw");
+			return -EOPNOTSUPP;
+		}
+		flags |= SKBEDIT_F_RSS_GROUP_ID;
+		rss_group_id = nla_data(tb[TCA_SKBEDIT_RSS_GROUP_ID]);
+	}
+
 	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
 		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
 
@@ -262,6 +274,9 @@  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	if (flags & SKBEDIT_F_MASK)
 		params_new->mask = *mask;
 
+	if (flags & SKBEDIT_F_RSS_GROUP_ID)
+		params_new->rss_group_id = *rss_group_id;
+
 	spin_lock_bh(&d->tcf_lock);
 	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
 	params_new = rcu_replace_pointer(d->params, params_new,
@@ -326,6 +341,9 @@  static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 
 		pure_flags |= SKBEDIT_F_TXQ_SKBHASH;
 	}
+	if ((params->flags & SKBEDIT_F_RSS_GROUP_ID) &&
+	    nla_put_u16(skb, TCA_SKBEDIT_RSS_GROUP_ID, params->rss_group_id))
+		goto nla_put_failure;
 	if (pure_flags != 0 &&
 	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
 		goto nla_put_failure;
@@ -362,6 +380,7 @@  static size_t tcf_skbedit_get_fill_size(const struct tc_action *act)
 		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MARK */
 		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_PTYPE */
 		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MASK */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_RSS_GROUP_ID */
 		+ nla_total_size_64bit(sizeof(u64)); /* TCA_SKBEDIT_FLAGS */
 }
 
@@ -390,6 +409,9 @@  static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
 		} else if (is_tcf_skbedit_inheritdsfield(act)) {
 			NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"inheritdsfield\" option is used");
 			return -EOPNOTSUPP;
+		} else if (is_tcf_skbedit_rss_group_id(act)) {
+			entry->id = FLOW_ACTION_RSS;
+			entry->rss_group_id = tcf_skbedit_rss_group_id(act);
 		} else {
 			NL_SET_ERR_MSG_MOD(extack, "Unsupported skbedit option offload");
 			return -EOPNOTSUPP;
@@ -406,6 +428,8 @@  static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
 			fl_action->id = FLOW_ACTION_PRIORITY;
 		else if (is_tcf_skbedit_rx_queue_mapping(act))
 			fl_action->id = FLOW_ACTION_RX_QUEUE_MAPPING;
+		else if (is_tcf_skbedit_rss_group_id(act))
+			fl_action->id = FLOW_ACTION_RSS;
 		else
 			return -EOPNOTSUPP;
 	}