
[net-next,11/18] openvswitch: Use nested-BH locking for ovs_actions.

Message ID 20250309144653.825351-12-bigeasy@linutronix.de (mailing list archive)
State New
Delegated to: Netdev Maintainers
Series: net: Cover more per-CPU storage with local nested BH locking.

Commit Message

Sebastian Andrzej Siewior March 9, 2025, 2:46 p.m. UTC
ovs_actions is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
The data structure can be referenced recursively, and there is a recursion
counter to limit the recursion depth.

Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. Add an owner of the struct which is
the current task and acquire the lock only if the structure is not owned
by the current task.

Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: dev@openvswitch.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/openvswitch/actions.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Ilya Maximets March 10, 2025, 2:17 p.m. UTC | #1
On 3/9/25 15:46, Sebastian Andrzej Siewior wrote:
> ovs_actions is a per-CPU variable and relies on disabled BH for its
> locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
> this data structure requires explicit locking.
> The data structure can be referenced recursively, and there is a recursion
> counter to limit the recursion depth.
> 
> Add a local_lock_t to the data structure and use
> local_lock_nested_bh() for locking. Add an owner of the struct which is
> the current task and acquire the lock only if the structure is not owned
> by the current task.
> 
> Cc: Pravin B Shelar <pshelar@ovn.org>
> Cc: dev@openvswitch.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  net/openvswitch/actions.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 322ca7b30c3bc..c4131e04c1284 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -82,6 +82,8 @@ struct ovs_action {
>  	struct action_fifo action_fifos;
>  	struct action_flow_keys flow_keys;
>  	int exec_level;
> +	struct task_struct *owner;
> +	local_lock_t bh_lock;
>  };
>  
>  static DEFINE_PER_CPU(struct ovs_action, ovs_actions);
> @@ -1690,8 +1692,14 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>  			const struct sw_flow_actions *acts,
>  			struct sw_flow_key *key)
>  {
> +	struct ovs_action *ovs_act = this_cpu_ptr(&ovs_actions);
>  	int err, level;
>  
> +	if (ovs_act->owner != current) {
> +		local_lock_nested_bh(&ovs_actions.bh_lock);

Wouldn't this cause a warning when we're in a syscall/process context?

We will also be taking a spinlock in a general case here, which doesn't
sound particularly great, since we can potentially be holding it for a
long time and it's also not free to take/release on this hot path.
Is there a version of this lock that's a no-op on non-RT?

> +		ovs_act->owner = current;
> +	}
> +
>  	level = __this_cpu_inc_return(ovs_actions.exec_level);
>  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
>  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>  
>  out:
>  	__this_cpu_dec(ovs_actions.exec_level);
> +
> +	if (level == 1) {
> +		ovs_act->owner = NULL;
> +		local_unlock_nested_bh(&ovs_actions.bh_lock);
> +	}

Seems dangerous to lock every time the owner changes but unlock only
once on level 1.  Even if this works fine, it seems unnecessarily
complicated.  Maybe it's better to just lock once before calling
ovs_execute_actions() instead?

Also, the name of the struct ovs_action doesn't make a lot of sense,
I'd suggest to call it pcpu_storage or something like that instead.
I.e. have a more generic name as the fields inside are not directly
related to each other.

Best regards, Ilya Maximets.
Sebastian Andrzej Siewior March 10, 2025, 2:44 p.m. UTC | #2
On 2025-03-10 15:17:17 [+0100], Ilya Maximets wrote:
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -82,6 +82,8 @@ struct ovs_action {
> >  	struct action_fifo action_fifos;
> >  	struct action_flow_keys flow_keys;
> >  	int exec_level;
> > +	struct task_struct *owner;
> > +	local_lock_t bh_lock;
> >  };
> >  
> >  static DEFINE_PER_CPU(struct ovs_action, ovs_actions);
> > @@ -1690,8 +1692,14 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >  			const struct sw_flow_actions *acts,
> >  			struct sw_flow_key *key)
> >  {
> > +	struct ovs_action *ovs_act = this_cpu_ptr(&ovs_actions);
> >  	int err, level;
> >  
> > +	if (ovs_act->owner != current) {
> > +		local_lock_nested_bh(&ovs_actions.bh_lock);
> 
> Wouldn't this cause a warning when we're in a syscall/process context?

My understanding is that it is only invoked in softirq context. Did I
misunderstand it? Otherwise the this_cpu_ptr() above should complain
that preemption is not disabled, and if preemption is indeed not
disabled, how do you ensure that you don't get preempted after the
__this_cpu_inc_return() in several tasks (at the same time), leading to
exceeding the OVS_RECURSION_LIMIT?

> We will also be taking a spinlock in a general case here, which doesn't
> sound particularly great, since we can potentially be holding it for a
> long time and it's also not free to take/release on this hot path.
> Is there a version of this lock that's a no-op on non-RT?

local_lock_nested_bh() does not acquire any lock on !PREEMPT_RT. It only
verifies that in_softirq() is true.

> > +		ovs_act->owner = current;
> > +	}
> > +
> >  	level = __this_cpu_inc_return(ovs_actions.exec_level);
> >  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
> >  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
> > @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >  
> >  out:
> >  	__this_cpu_dec(ovs_actions.exec_level);
> > +
> > +	if (level == 1) {
> > +		ovs_act->owner = NULL;
> > +		local_unlock_nested_bh(&ovs_actions.bh_lock);
> > +	}
> 
> Seems dangerous to lock every time the owner changes but unlock only
> once on level 1.  Even if this works fine, it seems unnecessarily
> complicated.  Maybe it's better to just lock once before calling
> ovs_execute_actions() instead?

My understanding is this can be invoked recursively. That means on the
first invocation owner == NULL, so you acquire the lock, at which point
exec_level goes 0 -> 1. On the recursive invocation owner == current,
so you skip the lock, but exec_level goes 1 -> 2.
On the return path, once level is 1 it means the decrement took
exec_level from 1 -> 0, so you unlock the lock.
The locking part happens only on PREEMPT_RT, because !PREEMPT_RT runs
this with softirqs disabled, which guarantees that there will be no
preemption.
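The 0 -> 1 -> 2 -> 1 -> 0 walkthrough above can be checked with a small
userspace model (plain C, not kernel code: `lock_flag` stands in for the
`local_lock_t`, a pointer identity stands in for `current`, and the
function names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the owner-tracked locking in ovs_execute_actions(). */
struct pcpu_model {
	int exec_level;
	const void *owner;	/* stands in for struct task_struct *current */
	int lock_flag;		/* stands in for local_lock_t bh_lock */
};

static struct pcpu_model pcpu;

/* Entry side of the patch: take the lock only on the first entry by
 * this task, then bump the recursion level. */
static int model_enter(const void *task)
{
	if (pcpu.owner != task) {
		assert(!pcpu.lock_flag);	/* local_lock_nested_bh() */
		pcpu.lock_flag = 1;
		pcpu.owner = task;
	}
	return ++pcpu.exec_level;		/* __this_cpu_inc_return() */
}

/* Exit side: drop the level; release the lock only when the outermost
 * invocation (the one that saw level == 1 at entry) unwinds. */
static void model_exit(int level)
{
	pcpu.exec_level--;			/* __this_cpu_dec() */
	if (level == 1) {
		pcpu.owner = NULL;
		pcpu.lock_flag = 0;		/* local_unlock_nested_bh() */
	}
}
```

Running the nested sequence shows the lock is taken once on the outer
entry, skipped on the recursive entry, and released only when exec_level
drops back to 0.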

tools/testing/selftests/net/openvswitch should cover this?

> Also, the name of the struct ovs_action doesn't make a lot of sense,
> I'd suggest to call it pcpu_storage or something like that instead.
> I.e. have a more generic name as the fields inside are not directly
> related to each other.

Understood. ovs_pcpu_storage maybe?

> Best regards, Ilya Maximets.

Sebastian
Ilya Maximets March 10, 2025, 4:56 p.m. UTC | #3
On 3/10/25 15:44, Sebastian Andrzej Siewior wrote:
> On 2025-03-10 15:17:17 [+0100], Ilya Maximets wrote:
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -82,6 +82,8 @@ struct ovs_action {
>>>  	struct action_fifo action_fifos;
>>>  	struct action_flow_keys flow_keys;
>>>  	int exec_level;
>>> +	struct task_struct *owner;
>>> +	local_lock_t bh_lock;
>>>  };
>>>  
>>>  static DEFINE_PER_CPU(struct ovs_action, ovs_actions);
>>> @@ -1690,8 +1692,14 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>>  			const struct sw_flow_actions *acts,
>>>  			struct sw_flow_key *key)
>>>  {
>>> +	struct ovs_action *ovs_act = this_cpu_ptr(&ovs_actions);
>>>  	int err, level;
>>>  
>>> +	if (ovs_act->owner != current) {
>>> +		local_lock_nested_bh(&ovs_actions.bh_lock);
>>
>> Wouldn't this cause a warning when we're in a syscall/process context?
> 
> My understanding is that is only invoked in softirq context. Did I
> misunderstood it?

It can be called from the syscall/process context while processing
OVS_PACKET_CMD_EXECUTE request.

> Otherwise that this_cpu_ptr() above should complain
> that preemption is not disabled and if preemption is indeed not disabled
> how do you ensure that you don't get preempted after the
> __this_cpu_inc_return() in several tasks (at the same time) leading to
> exceeding the OVS_RECURSION_LIMIT?

We disable BH in this case, so it should be safe (on non-RT).  See the
ovs_packet_cmd_execute() for more details.

> 
>> We will also be taking a spinlock in a general case here, which doesn't
>> sound particularly great, since we can potentially be holding it for a
>> long time and it's also not free to take/release on this hot path.
>> Is there a version of this lock that's a no-op on non-RT?
> 
> local_lock_nested_bh() does not acquire any lock on !PREEMPT_RT. It only
> verifies that in_softirq() is true.

Ah, you're right.  It does more things, but only under lock debug.

> 
>>> +		ovs_act->owner = current;
>>> +	}
>>> +
>>>  	level = __this_cpu_inc_return(ovs_actions.exec_level);
>>>  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
>>>  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
>>> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>>  
>>>  out:
>>>  	__this_cpu_dec(ovs_actions.exec_level);
>>> +
>>> +	if (level == 1) {
>>> +		ovs_act->owner = NULL;
>>> +		local_unlock_nested_bh(&ovs_actions.bh_lock);
>>> +	}
>>
>> Seems dangerous to lock every time the owner changes but unlock only
>> once on level 1.  Even if this works fine, it seems unnecessarily
>> complicated.  Maybe it's better to just lock once before calling
>> ovs_execute_actions() instead?
> 
> My understanding is this can be invoked recursively. That means on first
> invocation owner == NULL and then you acquire the lock at which point
> exec_level goes 0->1. On the recursive invocation owner == current and
> you skip the lock but exec_level goes 1 -> 2.
> On your return path once level becomes 1, then it means that dec made it
> go 1 -> 0, you unlock the lock.

My point is: why locking here with some extra non-obvious logic of owner
tracking if we can lock (unconditionally?) in ovs_packet_cmd_execute() and
ovs_dp_process_packet() instead?  We already disable BH in one of those
and take appropriate RCU locks in both.  So, feels like a better place
for the extra locking if necessary.  We will also not need to move around
any code in actions.c if the code there is guaranteed to be safe by holding
locks outside of it.

> The locking part happens only on PREEMPT_RT because !PREEMPT_RT has
> softirqs disabled which guarantee that there will be no preemption.
> 
> tools/testing/selftests/net/openvswitch should cover this?

It's not a comprehensive test suite, it covers some cases, but it
doesn't test anything related to preemptions specifically.

> 
>> Also, the name of the struct ovs_action doesn't make a lot of sense,
>> I'd suggest to call it pcpu_storage or something like that instead.
>> I.e. have a more generic name as the fields inside are not directly
>> related to each other.
> 
> Understood. ovs_pcpu_storage maybe?

It's OK, I guess, but see also a point about locking inside datapath.c
instead and probably not needing to change anything in actions.c.

> 
>> Best regards, Ilya Maximets.
> 
> Sebastian
Sebastian Andrzej Siewior March 13, 2025, 11:50 a.m. UTC | #4
On 2025-03-10 17:56:09 [+0100], Ilya Maximets wrote:
> >>> +		local_lock_nested_bh(&ovs_actions.bh_lock);
> >>
> >> Wouldn't this cause a warning when we're in a syscall/process context?
> > 
> > My understanding is that is only invoked in softirq context. Did I
> > misunderstood it?
> 
> It can be called from the syscall/process context while processing
> OVS_PACKET_CMD_EXECUTE request.
> 
> > Otherwise that this_cpu_ptr() above should complain
> > that preemption is not disabled and if preemption is indeed not disabled
> > how do you ensure that you don't get preempted after the
> > __this_cpu_inc_return() in several tasks (at the same time) leading to
> > exceeding the OVS_RECURSION_LIMIT?
> 
> We disable BH in this case, so it should be safe (on non-RT).  See the
> ovs_packet_cmd_execute() for more details.

Yes, exactly. So if BH is disabled, then local_lock_nested_bh() can
safely acquire a per-CPU spinlock on PREEMPT_RT here. This basically
mimics the local_bh_disable() behaviour in terms of exclusive access to
data structures, on a smaller scope.

> >>> +		ovs_act->owner = current;
> >>> +	}
> >>> +
> >>>  	level = __this_cpu_inc_return(ovs_actions.exec_level);
> >>>  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
> >>>  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
> >>> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >>>  
> >>>  out:
> >>>  	__this_cpu_dec(ovs_actions.exec_level);
> >>> +
> >>> +	if (level == 1) {
> >>> +		ovs_act->owner = NULL;
> >>> +		local_unlock_nested_bh(&ovs_actions.bh_lock);
> >>> +	}
> >>
> >> Seems dangerous to lock every time the owner changes but unlock only
> >> once on level 1.  Even if this works fine, it seems unnecessarily
> >> complicated.  Maybe it's better to just lock once before calling
> >> ovs_execute_actions() instead?
> > 
> > My understanding is this can be invoked recursively. That means on first
> > invocation owner == NULL and then you acquire the lock at which point
> > exec_level goes 0->1. On the recursive invocation owner == current and
> > you skip the lock but exec_level goes 1 -> 2.
> > On your return path once level becomes 1, then it means that dec made it
> > go 1 -> 0, you unlock the lock.
> 
> My point is: why locking here with some extra non-obvious logic of owner
> tracking if we can lock (unconditionally?) in ovs_packet_cmd_execute() and
> ovs_dp_process_packet() instead?  We already disable BH in one of those
> and take appropriate RCU locks in both.  So, feels like a better place
> for the extra locking if necessary.  We will also not need to move around
> any code in actions.c if the code there is guaranteed to be safe by holding
> locks outside of it.

I think I was considering it but dropped it because it looks like one
can call the other.
ovs_packet_cmd_execute() is a unique entry point to
ovs_execute_actions(). This one could take the lock unconditionally.
Then we have ovs_dp_process_packet() as the second entry point towards
ovs_execute_actions(), and it is the tricky one. It originates from
netdev_frame_hook(), which is the "normal" packet receive path.
Then within ovs_execute_actions() there is ovs_vport_send(), which
could use internal_dev_recv() for forwarding. This one throws the
packet into the networking stack, so it could come back via
netdev_frame_hook().
Then there is the internal forwarding via internal_dev_xmit(), which
also ends up in ovs_execute_actions(). Here I don't know if this can
originate from within the recursion.

After looking at this and seeing the internal_dev_recv() I decided to
move it to within ovs_execute_actions() where the recursion check itself
is.
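As I read it, the entry points enumerated above form roughly this call
graph (a sketch assembled from this discussion only, not verified
against the sources):

```
  OVS_PACKET_CMD_EXECUTE (syscall/process context)
    ovs_packet_cmd_execute() ---------------------.
                                                  v
  netdev_frame_hook()                     ovs_execute_actions()
    ovs_dp_process_packet() ----------------------^      |
              ^                                          v
              |                                   ovs_vport_send()
              |                                          |
    networking stack <----- internal_dev_recv() <--------'

  internal_dev_xmit() --> ... --> ovs_execute_actions()  (internal port TX)
```

The cycle through internal_dev_recv() and the networking stack is what
makes ovs_dp_process_packet() re-entrant.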

> > The locking part happens only on PREEMPT_RT because !PREEMPT_RT has
> > softirqs disabled which guarantee that there will be no preemption.
> > 
> > tools/testing/selftests/net/openvswitch should cover this?
> 
> It's not a comprehensive test suite, it covers some cases, but it
> doesn't test anything related to preemptions specifically.

From looking at the traces, everything originates from
netdev_frame_hook() and there is sometimes one recursion from within
ovs_execute_actions(). I haven't seen anything else.

> >> Also, the name of the struct ovs_action doesn't make a lot of sense,
> >> I'd suggest to call it pcpu_storage or something like that instead.
> >> I.e. have a more generic name as the fields inside are not directly
> >> related to each other.
> > 
> > Understood. ovs_pcpu_storage maybe?
> 
> It's OK, I guess, but see also a point about locking inside datapath.c
> instead and probably not needing to change anything in actions.c.

If you are saying to add a lock to ovs_dp_process_packet() and another
to ovs_packet_cmd_execute(), then I can certainly update. However,
based on what I wrote above, I am not sure.

> >> Best regards, Ilya Maximets.

Sebastian
Ilya Maximets March 13, 2025, 1:23 p.m. UTC | #5
On 3/13/25 12:50, Sebastian Andrzej Siewior wrote:
> On 2025-03-10 17:56:09 [+0100], Ilya Maximets wrote:
>>>>> +		local_lock_nested_bh(&ovs_actions.bh_lock);
>>>>
>>>> Wouldn't this cause a warning when we're in a syscall/process context?
>>>
>>> My understanding is that is only invoked in softirq context. Did I
>>> misunderstood it?
>>
>> It can be called from the syscall/process context while processing
>> OVS_PACKET_CMD_EXECUTE request.
>>
>>> Otherwise that this_cpu_ptr() above should complain
>>> that preemption is not disabled and if preemption is indeed not disabled
>>> how do you ensure that you don't get preempted after the
>>> __this_cpu_inc_return() in several tasks (at the same time) leading to
>>> exceeding the OVS_RECURSION_LIMIT?
>>
>> We disable BH in this case, so it should be safe (on non-RT).  See the
>> ovs_packet_cmd_execute() for more details.
> 
> Yes, exactly. So if BH is disabled then local_lock_nested_bh() can
> safely acquire a per-CPU spinlock on PREEMPT_RT here. This basically
> mimics the local_bh_disable() behaviour in terms of exclusive data
> structures on a smaller scope.

OK.  I missed that in_softirq() returns true when BH is disabled manually.

> 
>>>>> +		ovs_act->owner = current;
>>>>> +	}
>>>>> +
>>>>>  	level = __this_cpu_inc_return(ovs_actions.exec_level);
>>>>>  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
>>>>>  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
>>>>> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>>>>  
>>>>>  out:
>>>>>  	__this_cpu_dec(ovs_actions.exec_level);
>>>>> +
>>>>> +	if (level == 1) {
>>>>> +		ovs_act->owner = NULL;
>>>>> +		local_unlock_nested_bh(&ovs_actions.bh_lock);
>>>>> +	}
>>>>
>>>> Seems dangerous to lock every time the owner changes but unlock only
>>>> once on level 1.  Even if this works fine, it seems unnecessarily
>>>> complicated.  Maybe it's better to just lock once before calling
>>>> ovs_execute_actions() instead?
>>>
>>> My understanding is this can be invoked recursively. That means on first
>>> invocation owner == NULL and then you acquire the lock at which point
>>> exec_level goes 0->1. On the recursive invocation owner == current and
>>> you skip the lock but exec_level goes 1 -> 2.
>>> On your return path once level becomes 1, then it means that dec made it
>>> go 1 -> 0, you unlock the lock.
>>
>> My point is: why locking here with some extra non-obvious logic of owner
>> tracking if we can lock (unconditionally?) in ovs_packet_cmd_execute() and
>> ovs_dp_process_packet() instead?  We already disable BH in one of those
>> and take appropriate RCU locks in both.  So, feels like a better place
>> for the extra locking if necessary.  We will also not need to move around
>> any code in actions.c if the code there is guaranteed to be safe by holding
>> locks outside of it.
> 
> I think I was considering it but dropped it because it looks like one
> can call the other.
> ovs_packet_cmd_execute() is an unique entry to ovs_execute_actions().
> This could the lock unconditionally.
> Then we have ovs_dp_process_packet() as the second entry point towards
> ovs_execute_actions() and is the tricky one. One originates from
> netdev_frame_hook() which the "normal" packet receiving.
> Then within ovs_execute_actions() there is ovs_vport_send() which could
> use internal_dev_recv() for forwarding. This one throws the packet into
> the networking stack so it could come back via netdev_frame_hook().
> Then there is this internal forwarding via internal_dev_xmit() which
> also ends up in ovs_execute_actions(). Here I don't know if this can
> originate from within the recursion.

It's true that ovs_packet_cmd_execute() cannot be re-entered, while
ovs_dp_process_packet() can be re-entered if the packet leaves OVS and
then comes back from another port.  It's still better to handle all the
locking within datapath.c and not lock for RT in actions.c and for non-RT
in datapath.c.
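Concretely, the suggestion would look roughly like this (pseudocode over
the kernel APIs, eliding everything else; not the actual patch, and the
per-CPU variable name follows the ovs_pcpu_storage rename discussed in
this thread):

```c
/* Pseudocode sketch: move the locking into the two datapath.c entry
 * points.  ovs_packet_cmd_execute() cannot be re-entered, so it locks
 * unconditionally; ovs_dp_process_packet() can be re-entered, so it
 * still needs the owner check. */

static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
{
	/* ... */
	local_bh_disable();
	local_lock_nested_bh(&ovs_pcpu_storage.bh_lock);
	err = ovs_execute_actions(dp, packet, acts, &flow->key);
	local_unlock_nested_bh(&ovs_pcpu_storage.bh_lock);
	local_bh_enable();
	/* ... */
}

void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
{
	bool locked = false;
	/* ... */
	if (this_cpu_read(ovs_pcpu_storage.owner) != current) {
		local_lock_nested_bh(&ovs_pcpu_storage.bh_lock);
		this_cpu_write(ovs_pcpu_storage.owner, current);
		locked = true;
	}
	ovs_execute_actions(dp, skb, sf_acts, key);
	if (locked) {
		this_cpu_write(ovs_pcpu_storage.owner, NULL);
		local_unlock_nested_bh(&ovs_pcpu_storage.bh_lock);
	}
	/* ... */
}
```

This keeps actions.c free of locking while preserving the owner check on
the one re-entrant path.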

> 
> After looking at this and seeing the internal_dev_recv() I decided to
> move it to within ovs_execute_actions() where the recursion check itself
> is.
> 
>>> The locking part happens only on PREEMPT_RT because !PREEMPT_RT has
>>> softirqs disabled which guarantee that there will be no preemption.
>>>
>>> tools/testing/selftests/net/openvswitch should cover this?
>>
>> It's not a comprehensive test suite, it covers some cases, but it
>> doesn't test anything related to preemptions specifically.
> 
> From looking at the traces, everything originates from
> netdev_frame_hook() and there is sometimes one recursion from within
> ovs_execute_actions(). I haven't seen anything else.
> 
>>>> Also, the name of the struct ovs_action doesn't make a lot of sense,
>>>> I'd suggest to call it pcpu_storage or something like that instead.
>>>> I.e. have a more generic name as the fields inside are not directly
>>>> related to each other.
>>>
>>> Understood. ovs_pcpu_storage maybe?
>>
>> It's OK, I guess, but see also a point about locking inside datapath.c
>> instead and probably not needing to change anything in actions.c.
> 
> If you say that adding a lock to ovs_dp_process_packet() and another to
> ovs_packet_cmd_execute() then I can certainly update. However based on
> what I wrote above, I am not sure.

I think it's better if we keep all the locks in datapath.c and let
actions.c assume that all the operations are always safe as it was
originally intended.

Cc: Aaron and Eelco, in case they have some thoughts on this as well.

Best regards, Ilya Maximets.
Sebastian Andrzej Siewior March 13, 2025, 2:02 p.m. UTC | #6
On 2025-03-13 14:23:16 [+0100], Ilya Maximets wrote:
> > originate from within the recursion.
> 
> It's true that ovs_packet_cmd_execute() can not be re-intered, while
> ovs_dp_process_packet() can be re-entered if the packet leaves OVS and
> then comes back from another port.  It's still better to handle all the
> locking within datapath.c and not lock for RT in actions.c and for non-RT
> in datapath.c.

Okay.

> >>>> Also, the name of the struct ovs_action doesn't make a lot of sense,
> >>>> I'd suggest to call it pcpu_storage or something like that instead.
> >>>> I.e. have a more generic name as the fields inside are not directly
> >>>> related to each other.
> >>>
> >>> Understood. ovs_pcpu_storage maybe?
> >>
> >> It's OK, I guess, but see also a point about locking inside datapath.c
> >> instead and probably not needing to change anything in actions.c.
> > 
> > If you say that adding a lock to ovs_dp_process_packet() and another to
> > ovs_packet_cmd_execute() then I can certainly update. However based on
> > what I wrote above, I am not sure.
> 
> I think, it's better if we keep all the locks in datapath.c and let
> actions.c assume that all the operations are always safe as it was
> originally intended.

If you say so, then I'll move the locking logic to the two callers in
datapath.c. But I would need the same recursive lock detection as I
currently have in ovs_dp_process_packet(). That means we would have the
lock in datapath.c and the data structure it protects in actions.c.

> Cc: Aaron and Eelco, in case they have some thoughts on this as well.

While at it, I would keep "openvswitch: Merge three per-CPU structures
into one." since it looks like a nice clean up.

> Best regards, Ilya Maximets.

Sebastian
Aaron Conole March 13, 2025, 2:11 p.m. UTC | #7
Ilya Maximets <i.maximets@ovn.org> writes:

> On 3/13/25 12:50, Sebastian Andrzej Siewior wrote:
>> On 2025-03-10 17:56:09 [+0100], Ilya Maximets wrote:
>>>>>> +		local_lock_nested_bh(&ovs_actions.bh_lock);
>>>>>
>>>>> Wouldn't this cause a warning when we're in a syscall/process context?
>>>>
>>>> My understanding is that is only invoked in softirq context. Did I
>>>> misunderstood it?
>>>
>>> It can be called from the syscall/process context while processing
>>> OVS_PACKET_CMD_EXECUTE request.
>>>
>>>> Otherwise that this_cpu_ptr() above should complain
>>>> that preemption is not disabled and if preemption is indeed not disabled
>>>> how do you ensure that you don't get preempted after the
>>>> __this_cpu_inc_return() in several tasks (at the same time) leading to
>>>> exceeding the OVS_RECURSION_LIMIT?
>>>
>>> We disable BH in this case, so it should be safe (on non-RT).  See the
>>> ovs_packet_cmd_execute() for more details.
>> 
>> Yes, exactly. So if BH is disabled then local_lock_nested_bh() can
>> safely acquire a per-CPU spinlock on PREEMPT_RT here. This basically
>> mimics the local_bh_disable() behaviour in terms of exclusive data
>> structures on a smaller scope.
>
> OK.  I missed that is_softirq() returns true when BH is disabled manually.
>
>> 
>>>>>> +		ovs_act->owner = current;
>>>>>> +	}
>>>>>> +
>>>>>>  	level = __this_cpu_inc_return(ovs_actions.exec_level);
>>>>>>  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
>>>>>>  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
>>>>>> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>>>>>  
>>>>>>  out:
>>>>>>  	__this_cpu_dec(ovs_actions.exec_level);
>>>>>> +
>>>>>> +	if (level == 1) {
>>>>>> +		ovs_act->owner = NULL;
>>>>>> +		local_unlock_nested_bh(&ovs_actions.bh_lock);
>>>>>> +	}
>>>>>
>>>>> Seems dangerous to lock every time the owner changes but unlock only
>>>>> once on level 1.  Even if this works fine, it seems unnecessarily
>>>>> complicated.  Maybe it's better to just lock once before calling
>>>>> ovs_execute_actions() instead?
>>>>
>>>> My understanding is this can be invoked recursively. That means on first
>>>> invocation owner == NULL and then you acquire the lock at which point
>>>> exec_level goes 0->1. On the recursive invocation owner == current and
>>>> you skip the lock but exec_level goes 1 -> 2.
>>>> On your return path once level becomes 1, then it means that dec made it
>>>> go 1 -> 0, you unlock the lock.
>>>
>>> My point is: why locking here with some extra non-obvious logic of owner
>>> tracking if we can lock (unconditionally?) in ovs_packet_cmd_execute() and
>>> ovs_dp_process_packet() instead?  We already disable BH in one of those
>>> and take appropriate RCU locks in both.  So, feels like a better place
>>> for the extra locking if necessary.  We will also not need to move around
>>> any code in actions.c if the code there is guaranteed to be safe by holding
>>> locks outside of it.
>> 
>> I think I was considering it but dropped it because it looks like one
>> can call the other.
>> ovs_packet_cmd_execute() is an unique entry to ovs_execute_actions().
>> This could the lock unconditionally.
>> Then we have ovs_dp_process_packet() as the second entry point towards
>> ovs_execute_actions() and is the tricky one. One originates from
>> netdev_frame_hook() which the "normal" packet receiving.
>> Then within ovs_execute_actions() there is ovs_vport_send() which could
>> use internal_dev_recv() for forwarding. This one throws the packet into
>> the networking stack so it could come back via netdev_frame_hook().
>> Then there is this internal forwarding via internal_dev_xmit() which
>> also ends up in ovs_execute_actions(). Here I don't know if this can
>> originate from within the recursion.

Just a note that it still needs to go through `ovs_dp_process_packet()`
before it will enter the `ovs_execute_actions()` call by way of the
`ovs_vport_receive()` call.  So keeping the locking in datapath.c should
not be a complex task.

> It's true that ovs_packet_cmd_execute() can not be re-intered, while
> ovs_dp_process_packet() can be re-entered if the packet leaves OVS and
> then comes back from another port.  It's still better to handle all the
> locking within datapath.c and not lock for RT in actions.c and for non-RT
> in datapath.c.

+1

>> 
>> After looking at this and seeing the internal_dev_recv() I decided to
>> move it to within ovs_execute_actions() where the recursion check itself
>> is.
>> 
>>>> The locking part happens only on PREEMPT_RT because !PREEMPT_RT has
>>>> softirqs disabled which guarantee that there will be no preemption.
>>>>
>>>> tools/testing/selftests/net/openvswitch should cover this?
>>>
>>> It's not a comprehensive test suite, it covers some cases, but it
>>> doesn't test anything related to preemptions specifically.

Yes, this would be good to enhance, and the plan is to improve it as we
can.  If during the course of this work you identify a nice test case,
please do feel empowered to add it.

>> From looking at the traces, everything originates from
>> netdev_frame_hook() and there is sometimes one recursion from within
>> ovs_execute_actions(). I haven't seen anything else.
>> 
>>>>> Also, the name of the struct ovs_action doesn't make a lot of sense,
>>>>> I'd suggest to call it pcpu_storage or something like that instead.
>>>>> I.e. have a more generic name as the fields inside are not directly
>>>>> related to each other.
>>>>
>>>> Understood. ovs_pcpu_storage maybe?
>>>
>>> It's OK, I guess, but see also a point about locking inside datapath.c
>>> instead and probably not needing to change anything in actions.c.
>> 
>> If you say that adding a lock to ovs_dp_process_packet() and another to
>> ovs_packet_cmd_execute() then I can certainly update. However based on
>> what I wrote above, I am not sure.
>
> I think, it's better if we keep all the locks in datapath.c and let
> actions.c assume that all the operations are always safe as it was
> originally intended.

Agreed - reading through the actions code can be complex enough without
needing to remember to do recursion checks there.

> Cc: Aaron and Eelco, in case they have some thoughts on this as well.

Thanks Ilya - I think you covered the major points.

> Best regards, Ilya Maximets.

Patch

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 322ca7b30c3bc..c4131e04c1284 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -82,6 +82,8 @@  struct ovs_action {
 	struct action_fifo action_fifos;
 	struct action_flow_keys flow_keys;
 	int exec_level;
+	struct task_struct *owner;
+	local_lock_t bh_lock;
 };
 
 static DEFINE_PER_CPU(struct ovs_action, ovs_actions);
@@ -1690,8 +1692,14 @@  int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			const struct sw_flow_actions *acts,
 			struct sw_flow_key *key)
 {
+	struct ovs_action *ovs_act = this_cpu_ptr(&ovs_actions);
 	int err, level;
 
+	if (ovs_act->owner != current) {
+		local_lock_nested_bh(&ovs_actions.bh_lock);
+		ovs_act->owner = current;
+	}
+
 	level = __this_cpu_inc_return(ovs_actions.exec_level);
 	if (unlikely(level > OVS_RECURSION_LIMIT)) {
 		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
@@ -1710,5 +1718,10 @@  int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
 
 out:
 	__this_cpu_dec(ovs_actions.exec_level);
+
+	if (level == 1) {
+		ovs_act->owner = NULL;
+		local_unlock_nested_bh(&ovs_actions.bh_lock);
+	}
 	return err;
 }