mbox series

[bpf-next,v2,0/4] Remove unnecessary RCU grace period chaining

Message ID 20221014113946.965131-1-houtao@huaweicloud.com (mailing list archive)
Headers show
Series Remove unnecessary RCU grace period chaining | expand

Message

Hou Tao Oct. 14, 2022, 11:39 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

Hi,

Now bpf uses RCU grace period chaining to wait for the completion of
access from both sleepable and non-sleepable bpf program: calling
call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace
period, then in its callback calls call_rcu() or kfree_rcu() to wait for
a normal RCU grace period.

According to the implementation of RCU Tasks Trace, it inovkes
->postscan_func() to wait for one RCU-tasks-trace grace period and
rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one
normal RCU grace period in turn, so one RCU-tasks-trace grace period
will imply one normal RCU grace period. To codify the implication,
introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch
#2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem.
Other two uses of call_rcu_tasks_trace() are unchanged: for
__bpf_prog_put_rcu() there is no gp chain and for
__bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU
tasks GP.

An alternative way to remove these unnecessary RCU grace period
chainings is using the RCU polling API to check whether or not a normal
RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it
needs an unsigned long space for each free element or each call, and
it is not affordable for local storage element, so as for now always
rcu_trace_implies_rcu_gp().

Comments are always welcome.

Change Log:

v2:
 * codify the implication of RCU Tasks Trace grace period instead of
   assuming for it

v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com

Hou Tao (3):
  bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
  bpf: Use rcu_trace_implies_rcu_gp() in local storage map
  bpf: Use rcu_trace_implies_rcu_gp() for program array freeing

Paul E. McKenney (1):
  rcu-tasks: Provide rcu_trace_implies_rcu_gp()

 include/linux/rcupdate.h       | 12 ++++++++++++
 kernel/bpf/bpf_local_storage.c | 13 +++++++++++--
 kernel/bpf/core.c              |  8 +++++++-
 kernel/bpf/memalloc.c          | 15 ++++++++++-----
 kernel/rcu/tasks.h             |  2 ++
 5 files changed, 42 insertions(+), 8 deletions(-)

Comments

Paul E. McKenney Oct. 17, 2022, 1:39 p.m. UTC | #1
On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> Now bpf uses RCU grace period chaining to wait for the completion of
> access from both sleepable and non-sleepable bpf program: calling
> call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace
> period, then in its callback calls call_rcu() or kfree_rcu() to wait for
> a normal RCU grace period.
> 
> According to the implementation of RCU Tasks Trace, it inovkes
> ->postscan_func() to wait for one RCU-tasks-trace grace period and
> rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one
> normal RCU grace period in turn, so one RCU-tasks-trace grace period
> will imply one normal RCU grace period. To codify the implication,
> introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch
> #2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem.
> Other two uses of call_rcu_tasks_trace() are unchanged: for
> __bpf_prog_put_rcu() there is no gp chain and for
> __bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU
> tasks GP.
> 
> An alternative way to remove these unnecessary RCU grace period
> chainings is using the RCU polling API to check whether or not a normal
> RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it
> needs an unsigned long space for each free element or each call, and
> it is not affordable for local storage element, so as for now always
> rcu_trace_implies_rcu_gp().
> 
> Comments are always welcome.

For #2-#4:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

(#1 already has my Signed-off-by, in case anyone was wondering.)

> Change Log:
> 
> v2:
>  * codify the implication of RCU Tasks Trace grace period instead of
>    assuming for it
> 
> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com
> 
> Hou Tao (3):
>   bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
>   bpf: Use rcu_trace_implies_rcu_gp() in local storage map
>   bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
> 
> Paul E. McKenney (1):
>   rcu-tasks: Provide rcu_trace_implies_rcu_gp()
> 
>  include/linux/rcupdate.h       | 12 ++++++++++++
>  kernel/bpf/bpf_local_storage.c | 13 +++++++++++--
>  kernel/bpf/core.c              |  8 +++++++-
>  kernel/bpf/memalloc.c          | 15 ++++++++++-----
>  kernel/rcu/tasks.h             |  2 ++
>  5 files changed, 42 insertions(+), 8 deletions(-)
> 
> -- 
> 2.29.2
>
Hou Tao Oct. 18, 2022, 7:31 a.m. UTC | #2
Hi,

On 10/17/2022 9:39 PM, Paul E. McKenney wrote:
> On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
>>
>> Now bpf uses RCU grace period chaining to wait for the completion of
>> access from both sleepable and non-sleepable bpf program: calling
>> call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace
>> period, then in its callback calls call_rcu() or kfree_rcu() to wait for
>> a normal RCU grace period.
>>
>> According to the implementation of RCU Tasks Trace, it inovkes
>> ->postscan_func() to wait for one RCU-tasks-trace grace period and
>> rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one
>> normal RCU grace period in turn, so one RCU-tasks-trace grace period
>> will imply one normal RCU grace period. To codify the implication,
>> introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch
>> #2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem.
>> Other two uses of call_rcu_tasks_trace() are unchanged: for
>> __bpf_prog_put_rcu() there is no gp chain and for
>> __bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU
>> tasks GP.
>>
>> An alternative way to remove these unnecessary RCU grace period
>> chainings is using the RCU polling API to check whether or not a normal
>> RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it
>> needs an unsigned long space for each free element or each call, and
>> it is not affordable for local storage element, so as for now always
>> rcu_trace_implies_rcu_gp().
>>
>> Comments are always welcome.
> For #2-#4:
>
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>
> (#1 already has my Signed-off-by, in case anyone was wondering.)
Thanks for the review. But it seems I missed another possible use case for
rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for
free_mem_alloc() is as following:

static void free_mem_alloc(struct bpf_mem_alloc *ma)
{
        /* waiting_for_gp lists was drained, but __free_rcu might
         * still execute. Wait for it now before we freeing percpu caches.
         */
        rcu_barrier_tasks_trace();
        rcu_barrier();
        free_mem_alloc_no_barrier(ma);
}

It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion
of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to
check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is
no need to call rcu_barrier().

static void free_mem_alloc(struct bpf_mem_alloc *ma)
{
        /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace()
         * or __free_rcu() might still execute. Wait for it now before we
         * freeing percpu caches.
         */
        rcu_barrier_tasks_trace();
        if (!rcu_trace_implies_rcu_gp())
                rcu_barrier();
        free_mem_alloc_no_barrier(ma);
}

Does the above change look good to you ? If it is, I will post v3 to include the
above change and add your Reviewed-by tag.
>
>> Change Log:
>>
>> v2:
>>  * codify the implication of RCU Tasks Trace grace period instead of
>>    assuming for it
>>
>> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com
>>
>> Hou Tao (3):
>>   bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
>>   bpf: Use rcu_trace_implies_rcu_gp() in local storage map
>>   bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
>>
>> Paul E. McKenney (1):
>>   rcu-tasks: Provide rcu_trace_implies_rcu_gp()
>>
>>  include/linux/rcupdate.h       | 12 ++++++++++++
>>  kernel/bpf/bpf_local_storage.c | 13 +++++++++++--
>>  kernel/bpf/core.c              |  8 +++++++-
>>  kernel/bpf/memalloc.c          | 15 ++++++++++-----
>>  kernel/rcu/tasks.h             |  2 ++
>>  5 files changed, 42 insertions(+), 8 deletions(-)
>>
>> -- 
>> 2.29.2
>>
> .
Paul E. McKenney Oct. 18, 2022, 3:08 p.m. UTC | #3
On Tue, Oct 18, 2022 at 03:31:20PM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/17/2022 9:39 PM, Paul E. McKenney wrote:
> > On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Hi,
> >>
> >> Now bpf uses RCU grace period chaining to wait for the completion of
> >> access from both sleepable and non-sleepable bpf program: calling
> >> call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace
> >> period, then in its callback calls call_rcu() or kfree_rcu() to wait for
> >> a normal RCU grace period.
> >>
> >> According to the implementation of RCU Tasks Trace, it inovkes
> >> ->postscan_func() to wait for one RCU-tasks-trace grace period and
> >> rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one
> >> normal RCU grace period in turn, so one RCU-tasks-trace grace period
> >> will imply one normal RCU grace period. To codify the implication,
> >> introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch
> >> #2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem.
> >> Other two uses of call_rcu_tasks_trace() are unchanged: for
> >> __bpf_prog_put_rcu() there is no gp chain and for
> >> __bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU
> >> tasks GP.
> >>
> >> An alternative way to remove these unnecessary RCU grace period
> >> chainings is using the RCU polling API to check whether or not a normal
> >> RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it
> >> needs an unsigned long space for each free element or each call, and
> >> it is not affordable for local storage element, so as for now always
> >> rcu_trace_implies_rcu_gp().
> >>
> >> Comments are always welcome.
> > For #2-#4:
> >
> > Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > (#1 already has my Signed-off-by, in case anyone was wondering.)
> Thanks for the review. But it seems I missed another possible use case for
> rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for
> free_mem_alloc() is as following:
> 
> static void free_mem_alloc(struct bpf_mem_alloc *ma)
> {
>         /* waiting_for_gp lists was drained, but __free_rcu might
>          * still execute. Wait for it now before we freeing percpu caches.
>          */
>         rcu_barrier_tasks_trace();
>         rcu_barrier();
>         free_mem_alloc_no_barrier(ma);
> }
> 
> It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion
> of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to
> check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is
> no need to call rcu_barrier().
> 
> static void free_mem_alloc(struct bpf_mem_alloc *ma)
> {
>         /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace()
>          * or __free_rcu() might still execute. Wait for it now before we
>          * freeing percpu caches.
>          */
>         rcu_barrier_tasks_trace();
>         if (!rcu_trace_implies_rcu_gp())
>                 rcu_barrier();
>         free_mem_alloc_no_barrier(ma);
> }
> 
> Does the above change look good to you ? If it is, I will post v3 to include the
> above change and add your Reviewed-by tag.

Unfortunately, although synchronize_rcu_tasks_trace() implies
that synchronize_rcu(), there is no relationship between the
callbacks.  Furthermore, rcu_barrier_tasks_trace() does not imply
synchronize_rcu_tasks_trace().

So the above change really would break things.  Please do not do it.

You could use workqueues or similar to make the rcu_barrier_tasks_trace()
and the rcu_barrier() wait concurrently, though.  This would of course
require some synchronization.

							Thanx, Paul

> >> Change Log:
> >>
> >> v2:
> >>  * codify the implication of RCU Tasks Trace grace period instead of
> >>    assuming for it
> >>
> >> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com
> >>
> >> Hou Tao (3):
> >>   bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
> >>   bpf: Use rcu_trace_implies_rcu_gp() in local storage map
> >>   bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
> >>
> >> Paul E. McKenney (1):
> >>   rcu-tasks: Provide rcu_trace_implies_rcu_gp()
> >>
> >>  include/linux/rcupdate.h       | 12 ++++++++++++
> >>  kernel/bpf/bpf_local_storage.c | 13 +++++++++++--
> >>  kernel/bpf/core.c              |  8 +++++++-
> >>  kernel/bpf/memalloc.c          | 15 ++++++++++-----
> >>  kernel/rcu/tasks.h             |  2 ++
> >>  5 files changed, 42 insertions(+), 8 deletions(-)
> >>
> >> -- 
> >> 2.29.2
> >>
> > .
>
patchwork-bot+netdevbpf@kernel.org Oct. 18, 2022, 5:40 p.m. UTC | #4
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri, 14 Oct 2022 19:39:42 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> Now bpf uses RCU grace period chaining to wait for the completion of
> access from both sleepable and non-sleepable bpf program: calling
> call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace
> period, then in its callback calls call_rcu() or kfree_rcu() to wait for
> a normal RCU grace period.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v2,1/4] rcu-tasks: Provide rcu_trace_implies_rcu_gp()
    https://git.kernel.org/bpf/bpf-next/c/e6c86c513f44
  - [bpf-next,v2,2/4] bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
    https://git.kernel.org/bpf/bpf-next/c/59be91e5e70a
  - [bpf-next,v2,3/4] bpf: Use rcu_trace_implies_rcu_gp() in local storage map
    https://git.kernel.org/bpf/bpf-next/c/d39d1445d377
  - [bpf-next,v2,4/4] bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
    https://git.kernel.org/bpf/bpf-next/c/4835f9ee980c

You are awesome, thank you!
Hou Tao Oct. 21, 2022, 7:08 a.m. UTC | #5
Hi,

On 10/18/2022 11:08 PM, Paul E. McKenney wrote:
> On Tue, Oct 18, 2022 at 03:31:20PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 10/17/2022 9:39 PM, Paul E. McKenney wrote:
>>> On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote:
SNIP
>>>
>> Thanks for the review. But it seems I missed another possible use case for
>> rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for
>> free_mem_alloc() is as following:
>>
>> static void free_mem_alloc(struct bpf_mem_alloc *ma)
>> {
>>         /* waiting_for_gp lists was drained, but __free_rcu might
>>          * still execute. Wait for it now before we freeing percpu caches.
>>          */
>>         rcu_barrier_tasks_trace();
>>         rcu_barrier();
>>         free_mem_alloc_no_barrier(ma);
>> }
>>
>> It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion
>> of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to
>> check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is
>> no need to call rcu_barrier().
>>
>> static void free_mem_alloc(struct bpf_mem_alloc *ma)
>> {
>>         /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace()
>>          * or __free_rcu() might still execute. Wait for it now before we
>>          * freeing percpu caches.
>>          */
>>         rcu_barrier_tasks_trace();
>>         if (!rcu_trace_implies_rcu_gp())
>>                 rcu_barrier();
>>         free_mem_alloc_no_barrier(ma);
>> }
>>
>> Does the above change look good to you ? If it is, I will post v3 to include the
>> above change and add your Reviewed-by tag.
> Unfortunately, although synchronize_rcu_tasks_trace() implies
> that synchronize_rcu(), there is no relationship between the
> callbacks.  Furthermore, rcu_barrier_tasks_trace() does not imply
> synchronize_rcu_tasks_trace().
Yes. I see. And according to the code, if there is not pending cb,
rcu_barrier_tasks_trace() will returned immediately. It is also possible
rcu_tasks_trace kthread is in the middle of grace period waiting when invoking
rcu_barrier_task_trace(), so rcu_barrier_task_trace() does not imply
synchronize_rcu_tasks_trace().
>
> So the above change really would break things.  Please do not do it.
However I am a little confused about the conclusion. If only considering the
invocations of call_rcu() and call_rcu_tasks_trace() in kernel/bpf/memalloc.c, I
think it is safe to do so, right ? Because if  rcu_trace_implies_rcu_gp() is
true, there will be no invocation of call_rcu() and rcu_barrier_tasks_trace()
will wait for the completion of pending call_rcu_tasks_trace(). If
rcu_trace_implies_rcu_gp(), rcu_barrier_tasks_trace() and rcu_barrier() will do
the job. If considering the invocations of call_rcu() in other places, I think
it is definitely unsafe to do that, right ?
>
> You could use workqueues or similar to make the rcu_barrier_tasks_trace()
> and the rcu_barrier() wait concurrently, though.  This would of course
> require some synchronization.
Thanks for the suggestion. Will check it later.
>
> 							Thanx, Paul
>
>>>> Change Log:
>>>>
>>>> v2:
>>>>  * codify the implication of RCU Tasks Trace grace period instead of
>>>>    assuming for it
>>>>
>>>> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com
>>>>
>>>> Hou Tao (3):
>>>>   bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
>>>>   bpf: Use rcu_trace_implies_rcu_gp() in local storage map
>>>>   bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
>>>>
>>>> Paul E. McKenney (1):
>>>>   rcu-tasks: Provide rcu_trace_implies_rcu_gp()
>>>>
>>>>  include/linux/rcupdate.h       | 12 ++++++++++++
>>>>  kernel/bpf/bpf_local_storage.c | 13 +++++++++++--
>>>>  kernel/bpf/core.c              |  8 +++++++-
>>>>  kernel/bpf/memalloc.c          | 15 ++++++++++-----
>>>>  kernel/rcu/tasks.h             |  2 ++
>>>>  5 files changed, 42 insertions(+), 8 deletions(-)
>>>>
>>>> -- 
>>>> 2.29.2
>>>>
>>> .
Paul E. McKenney Oct. 21, 2022, 6:50 p.m. UTC | #6
On Fri, Oct 21, 2022 at 03:08:21PM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/18/2022 11:08 PM, Paul E. McKenney wrote:
> > On Tue, Oct 18, 2022 at 03:31:20PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 10/17/2022 9:39 PM, Paul E. McKenney wrote:
> >>> On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote:
> SNIP
> >>>
> >> Thanks for the review. But it seems I missed another possible use case for
> >> rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for
> >> free_mem_alloc() is as following:
> >>
> >> static void free_mem_alloc(struct bpf_mem_alloc *ma)
> >> {
> >>         /* waiting_for_gp lists was drained, but __free_rcu might
> >>          * still execute. Wait for it now before we freeing percpu caches.
> >>          */
> >>         rcu_barrier_tasks_trace();
> >>         rcu_barrier();
> >>         free_mem_alloc_no_barrier(ma);
> >> }
> >>
> >> It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion
> >> of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to
> >> check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is
> >> no need to call rcu_barrier().
> >>
> >> static void free_mem_alloc(struct bpf_mem_alloc *ma)
> >> {
> >>         /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace()
> >>          * or __free_rcu() might still execute. Wait for it now before we
> >>          * freeing percpu caches.
> >>          */
> >>         rcu_barrier_tasks_trace();
> >>         if (!rcu_trace_implies_rcu_gp())
> >>                 rcu_barrier();
> >>         free_mem_alloc_no_barrier(ma);
> >> }
> >>
> >> Does the above change look good to you ? If it is, I will post v3 to include the
> >> above change and add your Reviewed-by tag.
> > Unfortunately, although synchronize_rcu_tasks_trace() implies
> > that synchronize_rcu(), there is no relationship between the
> > callbacks.  Furthermore, rcu_barrier_tasks_trace() does not imply
> > synchronize_rcu_tasks_trace().
> 
> Yes. I see. And according to the code, if there is not pending cb,
> rcu_barrier_tasks_trace() will returned immediately. It is also possible
> rcu_tasks_trace kthread is in the middle of grace period waiting when invoking
> rcu_barrier_task_trace(), so rcu_barrier_task_trace() does not imply
> synchronize_rcu_tasks_trace().

Very good!

> > So the above change really would break things.  Please do not do it.
> 
> However I am a little confused about the conclusion. If only considering the
> invocations of call_rcu() and call_rcu_tasks_trace() in kernel/bpf/memalloc.c, I
> think it is safe to do so, right ? Because if  rcu_trace_implies_rcu_gp() is
> true, there will be no invocation of call_rcu() and rcu_barrier_tasks_trace()
> will wait for the completion of pending call_rcu_tasks_trace(). If
> rcu_trace_implies_rcu_gp(), rcu_barrier_tasks_trace() and rcu_barrier() will do
> the job. If considering the invocations of call_rcu() in other places, I think
> it is definitely unsafe to do that, right ?

Agreed, I am being cautious and pessimistic in assuming that there are
other call_rcu() invocations.  On the other hand, my caution and pessimism
is based on their having been other call_rcu() invocations in the past.
So please verify that there are none before making this sort of change.

							Thanx, Paul

> > You could use workqueues or similar to make the rcu_barrier_tasks_trace()
> > and the rcu_barrier() wait concurrently, though.  This would of course
> > require some synchronization.
> Thanks for the suggestion. Will check it later.
> >
> > 							Thanx, Paul
> >
> >>>> Change Log:
> >>>>
> >>>> v2:
> >>>>  * codify the implication of RCU Tasks Trace grace period instead of
> >>>>    assuming for it
> >>>>
> >>>> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com
> >>>>
> >>>> Hou Tao (3):
> >>>>   bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
> >>>>   bpf: Use rcu_trace_implies_rcu_gp() in local storage map
> >>>>   bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
> >>>>
> >>>> Paul E. McKenney (1):
> >>>>   rcu-tasks: Provide rcu_trace_implies_rcu_gp()
> >>>>
> >>>>  include/linux/rcupdate.h       | 12 ++++++++++++
> >>>>  kernel/bpf/bpf_local_storage.c | 13 +++++++++++--
> >>>>  kernel/bpf/core.c              |  8 +++++++-
> >>>>  kernel/bpf/memalloc.c          | 15 ++++++++++-----
> >>>>  kernel/rcu/tasks.h             |  2 ++
> >>>>  5 files changed, 42 insertions(+), 8 deletions(-)
> >>>>
> >>>> -- 
> >>>> 2.29.2
> >>>>
> >>> .
>