Message ID | 20221014113946.965131-1-houtao@huaweicloud.com (mailing list archive) |
---|---|
Headers | show |
Series | Remove unnecessary RCU grace period chaining | expand |
On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote: > From: Hou Tao <houtao1@huawei.com> > > Hi, > > Now bpf uses RCU grace period chaining to wait for the completion of > access from both sleepable and non-sleepable bpf program: calling > call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace > period, then in its callback calls call_rcu() or kfree_rcu() to wait for > a normal RCU grace period. > > According to the implementation of RCU Tasks Trace, it inovkes > ->postscan_func() to wait for one RCU-tasks-trace grace period and > rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one > normal RCU grace period in turn, so one RCU-tasks-trace grace period > will imply one normal RCU grace period. To codify the implication, > introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch > #2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem. > Other two uses of call_rcu_tasks_trace() are unchanged: for > __bpf_prog_put_rcu() there is no gp chain and for > __bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU > tasks GP. > > An alternative way to remove these unnecessary RCU grace period > chainings is using the RCU polling API to check whether or not a normal > RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it > needs an unsigned long space for each free element or each call, and > it is not affordable for local storage element, so as for now always > rcu_trace_implies_rcu_gp(). > > Comments are always welcome. For #2-#4: Reviewed-by: Paul E. McKenney <paulmck@kernel.org> (#1 already has my Signed-off-by, in case anyone was wondering.) > Change Log: > > v2: > * codify the implication of RCU Tasks Trace grace period instead of > assuming for it > > v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com > > Hou Tao (3): > bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator > bpf: Use rcu_trace_implies_rcu_gp() in local storage map > bpf: Use rcu_trace_implies_rcu_gp() for program array freeing > > Paul E. McKenney (1): > rcu-tasks: Provide rcu_trace_implies_rcu_gp() > > include/linux/rcupdate.h | 12 ++++++++++++ > kernel/bpf/bpf_local_storage.c | 13 +++++++++++-- > kernel/bpf/core.c | 8 +++++++- > kernel/bpf/memalloc.c | 15 ++++++++++----- > kernel/rcu/tasks.h | 2 ++ > 5 files changed, 42 insertions(+), 8 deletions(-) > > -- > 2.29.2 >
Hi, On 10/17/2022 9:39 PM, Paul E. McKenney wrote: > On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote: >> From: Hou Tao <houtao1@huawei.com> >> >> Hi, >> >> Now bpf uses RCU grace period chaining to wait for the completion of >> access from both sleepable and non-sleepable bpf program: calling >> call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace >> period, then in its callback calls call_rcu() or kfree_rcu() to wait for >> a normal RCU grace period. >> >> According to the implementation of RCU Tasks Trace, it inovkes >> ->postscan_func() to wait for one RCU-tasks-trace grace period and >> rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one >> normal RCU grace period in turn, so one RCU-tasks-trace grace period >> will imply one normal RCU grace period. To codify the implication, >> introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch >> #2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem. >> Other two uses of call_rcu_tasks_trace() are unchanged: for >> __bpf_prog_put_rcu() there is no gp chain and for >> __bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU >> tasks GP. >> >> An alternative way to remove these unnecessary RCU grace period >> chainings is using the RCU polling API to check whether or not a normal >> RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it >> needs an unsigned long space for each free element or each call, and >> it is not affordable for local storage element, so as for now always >> rcu_trace_implies_rcu_gp(). >> >> Comments are always welcome. > For #2-#4: > > Reviewed-by: Paul E. McKenney <paulmck@kernel.org> > > (#1 already has my Signed-off-by, in case anyone was wondering.) Thanks for the review. But it seems I missed another possible use case for rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for free_mem_alloc() is as following: static void free_mem_alloc(struct bpf_mem_alloc *ma) { /* waiting_for_gp lists was drained, but __free_rcu might * still execute. Wait for it now before we freeing percpu caches. */ rcu_barrier_tasks_trace(); rcu_barrier(); free_mem_alloc_no_barrier(ma); } It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is no need to call rcu_barrier(). static void free_mem_alloc(struct bpf_mem_alloc *ma) { /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace() * or __free_rcu() might still execute. Wait for it now before we * freeing percpu caches. */ rcu_barrier_tasks_trace(); if (!rcu_trace_implies_rcu_gp()) rcu_barrier(); free_mem_alloc_no_barrier(ma); } Does the above change look good to you ? If it is, I will post v3 to include the above change and add your Reviewed-by tag. > >> Change Log: >> >> v2: >> * codify the implication of RCU Tasks Trace grace period instead of >> assuming for it >> >> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com >> >> Hou Tao (3): >> bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator >> bpf: Use rcu_trace_implies_rcu_gp() in local storage map >> bpf: Use rcu_trace_implies_rcu_gp() for program array freeing >> >> Paul E. McKenney (1): >> rcu-tasks: Provide rcu_trace_implies_rcu_gp() >> >> include/linux/rcupdate.h | 12 ++++++++++++ >> kernel/bpf/bpf_local_storage.c | 13 +++++++++++-- >> kernel/bpf/core.c | 8 +++++++- >> kernel/bpf/memalloc.c | 15 ++++++++++----- >> kernel/rcu/tasks.h | 2 ++ >> 5 files changed, 42 insertions(+), 8 deletions(-) >> >> -- >> 2.29.2 >> > .
On Tue, Oct 18, 2022 at 03:31:20PM +0800, Hou Tao wrote: > Hi, > > On 10/17/2022 9:39 PM, Paul E. McKenney wrote: > > On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote: > >> From: Hou Tao <houtao1@huawei.com> > >> > >> Hi, > >> > >> Now bpf uses RCU grace period chaining to wait for the completion of > >> access from both sleepable and non-sleepable bpf program: calling > >> call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace > >> period, then in its callback calls call_rcu() or kfree_rcu() to wait for > >> a normal RCU grace period. > >> > >> According to the implementation of RCU Tasks Trace, it inovkes > >> ->postscan_func() to wait for one RCU-tasks-trace grace period and > >> rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one > >> normal RCU grace period in turn, so one RCU-tasks-trace grace period > >> will imply one normal RCU grace period. To codify the implication, > >> introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch > >> #2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem. > >> Other two uses of call_rcu_tasks_trace() are unchanged: for > >> __bpf_prog_put_rcu() there is no gp chain and for > >> __bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU > >> tasks GP. > >> > >> An alternative way to remove these unnecessary RCU grace period > >> chainings is using the RCU polling API to check whether or not a normal > >> RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it > >> needs an unsigned long space for each free element or each call, and > >> it is not affordable for local storage element, so as for now always > >> rcu_trace_implies_rcu_gp(). > >> > >> Comments are always welcome. > > For #2-#4: > > > > Reviewed-by: Paul E. McKenney <paulmck@kernel.org> > > > > (#1 already has my Signed-off-by, in case anyone was wondering.) > Thanks for the review. But it seems I missed another possible use case for > rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for > free_mem_alloc() is as following: > > static void free_mem_alloc(struct bpf_mem_alloc *ma) > { > /* waiting_for_gp lists was drained, but __free_rcu might > * still execute. Wait for it now before we freeing percpu caches. > */ > rcu_barrier_tasks_trace(); > rcu_barrier(); > free_mem_alloc_no_barrier(ma); > } > > It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion > of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to > check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is > no need to call rcu_barrier(). > > static void free_mem_alloc(struct bpf_mem_alloc *ma) > { > /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace() > * or __free_rcu() might still execute. Wait for it now before we > * freeing percpu caches. > */ > rcu_barrier_tasks_trace(); > if (!rcu_trace_implies_rcu_gp()) > rcu_barrier(); > free_mem_alloc_no_barrier(ma); > } > > Does the above change look good to you ? If it is, I will post v3 to include the > above change and add your Reviewed-by tag. Unfortunately, although synchronize_rcu_tasks_trace() implies that synchronize_rcu(), there is no relationship between the callbacks. Furthermore, rcu_barrier_tasks_trace() does not imply synchronize_rcu_tasks_trace(). So the above change really would break things. Please do not do it. You could use workqueues or similar to make the rcu_barrier_tasks_trace() and the rcu_barrier() wait concurrently, though. This would of course require some synchronization. Thanx, Paul > >> Change Log: > >> > >> v2: > >> * codify the implication of RCU Tasks Trace grace period instead of > >> assuming for it > >> > >> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com > >> > >> Hou Tao (3): > >> bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator > >> bpf: Use rcu_trace_implies_rcu_gp() in local storage map > >> bpf: Use rcu_trace_implies_rcu_gp() for program array freeing > >> > >> Paul E. McKenney (1): > >> rcu-tasks: Provide rcu_trace_implies_rcu_gp() > >> > >> include/linux/rcupdate.h | 12 ++++++++++++ > >> kernel/bpf/bpf_local_storage.c | 13 +++++++++++-- > >> kernel/bpf/core.c | 8 +++++++- > >> kernel/bpf/memalloc.c | 15 ++++++++++----- > >> kernel/rcu/tasks.h | 2 ++ > >> 5 files changed, 42 insertions(+), 8 deletions(-) > >> > >> -- > >> 2.29.2 > >> > > . >
Hello: This series was applied to bpf/bpf-next.git (master) by Alexei Starovoitov <ast@kernel.org>: On Fri, 14 Oct 2022 19:39:42 +0800 you wrote: > From: Hou Tao <houtao1@huawei.com> > > Hi, > > Now bpf uses RCU grace period chaining to wait for the completion of > access from both sleepable and non-sleepable bpf program: calling > call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace > period, then in its callback calls call_rcu() or kfree_rcu() to wait for > a normal RCU grace period. > > [...] Here is the summary with links: - [bpf-next,v2,1/4] rcu-tasks: Provide rcu_trace_implies_rcu_gp() https://git.kernel.org/bpf/bpf-next/c/e6c86c513f44 - [bpf-next,v2,2/4] bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator https://git.kernel.org/bpf/bpf-next/c/59be91e5e70a - [bpf-next,v2,3/4] bpf: Use rcu_trace_implies_rcu_gp() in local storage map https://git.kernel.org/bpf/bpf-next/c/d39d1445d377 - [bpf-next,v2,4/4] bpf: Use rcu_trace_implies_rcu_gp() for program array freeing https://git.kernel.org/bpf/bpf-next/c/4835f9ee980c You are awesome, thank you!
Hi, On 10/18/2022 11:08 PM, Paul E. McKenney wrote: > On Tue, Oct 18, 2022 at 03:31:20PM +0800, Hou Tao wrote: >> Hi, >> >> On 10/17/2022 9:39 PM, Paul E. McKenney wrote: >>> On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote: SNIP >>> >> Thanks for the review. But it seems I missed another possible use case for >> rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for >> free_mem_alloc() is as following: >> >> static void free_mem_alloc(struct bpf_mem_alloc *ma) >> { >> /* waiting_for_gp lists was drained, but __free_rcu might >> * still execute. Wait for it now before we freeing percpu caches. >> */ >> rcu_barrier_tasks_trace(); >> rcu_barrier(); >> free_mem_alloc_no_barrier(ma); >> } >> >> It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion >> of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to >> check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is >> no need to call rcu_barrier(). >> >> static void free_mem_alloc(struct bpf_mem_alloc *ma) >> { >> /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace() >> * or __free_rcu() might still execute. Wait for it now before we >> * freeing percpu caches. >> */ >> rcu_barrier_tasks_trace(); >> if (!rcu_trace_implies_rcu_gp()) >> rcu_barrier(); >> free_mem_alloc_no_barrier(ma); >> } >> >> Does the above change look good to you ? If it is, I will post v3 to include the >> above change and add your Reviewed-by tag. > Unfortunately, although synchronize_rcu_tasks_trace() implies > that synchronize_rcu(), there is no relationship between the > callbacks. Furthermore, rcu_barrier_tasks_trace() does not imply > synchronize_rcu_tasks_trace(). Yes. I see. And according to the code, if there is not pending cb, rcu_barrier_tasks_trace() will returned immediately. It is also possible rcu_tasks_trace kthread is in the middle of grace period waiting when invoking rcu_barrier_task_trace(), so rcu_barrier_task_trace() does not imply synchronize_rcu_tasks_trace(). > > So the above change really would break things. Please do not do it. However I am a little confused about the conclusion. If only considering the invocations of call_rcu() and call_rcu_tasks_trace() in kernel/bpf/memalloc.c, I think it is safe to do so, right ? Because if rcu_trace_implies_rcu_gp() is true, there will be no invocation of call_rcu() and rcu_barrier_tasks_trace() will wait for the completion of pending call_rcu_tasks_trace(). If rcu_trace_implies_rcu_gp(), rcu_barrier_tasks_trace() and rcu_barrier() will do the job. If considering the invocations of call_rcu() in other places, I think it is definitely unsafe to do that, right ? > > You could use workqueues or similar to make the rcu_barrier_tasks_trace() > and the rcu_barrier() wait concurrently, though. This would of course > require some synchronization. Thanks for the suggestion. Will check it later. > > Thanx, Paul > >>>> Change Log: >>>> >>>> v2: >>>> * codify the implication of RCU Tasks Trace grace period instead of >>>> assuming for it >>>> >>>> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com >>>> >>>> Hou Tao (3): >>>> bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator >>>> bpf: Use rcu_trace_implies_rcu_gp() in local storage map >>>> bpf: Use rcu_trace_implies_rcu_gp() for program array freeing >>>> >>>> Paul E. McKenney (1): >>>> rcu-tasks: Provide rcu_trace_implies_rcu_gp() >>>> >>>> include/linux/rcupdate.h | 12 ++++++++++++ >>>> kernel/bpf/bpf_local_storage.c | 13 +++++++++++-- >>>> kernel/bpf/core.c | 8 +++++++- >>>> kernel/bpf/memalloc.c | 15 ++++++++++----- >>>> kernel/rcu/tasks.h | 2 ++ >>>> 5 files changed, 42 insertions(+), 8 deletions(-) >>>> >>>> -- >>>> 2.29.2 >>>> >>> .
On Fri, Oct 21, 2022 at 03:08:21PM +0800, Hou Tao wrote: > Hi, > > On 10/18/2022 11:08 PM, Paul E. McKenney wrote: > > On Tue, Oct 18, 2022 at 03:31:20PM +0800, Hou Tao wrote: > >> Hi, > >> > >> On 10/17/2022 9:39 PM, Paul E. McKenney wrote: > >>> On Fri, Oct 14, 2022 at 07:39:42PM +0800, Hou Tao wrote: > SNIP > >>> > >> Thanks for the review. But it seems I missed another possible use case for > >> rcu_trace_implies_rcu_gp() in bpf memory allocator. The code snippet for > >> free_mem_alloc() is as following: > >> > >> static void free_mem_alloc(struct bpf_mem_alloc *ma) > >> { > >> /* waiting_for_gp lists was drained, but __free_rcu might > >> * still execute. Wait for it now before we freeing percpu caches. > >> */ > >> rcu_barrier_tasks_trace(); > >> rcu_barrier(); > >> free_mem_alloc_no_barrier(ma); > >> } > >> > >> It uses rcu_barrier_tasks_trace() and rcu_barrier() to wait for the completion > >> of pending call_rcu_tasks_trace()s and call_rcu()s. I think it is also safe to > >> check rcu_trace_implies_rcu_gp() in free_mem_alloc() and if it is true, there is > >> no need to call rcu_barrier(). > >> > >> static void free_mem_alloc(struct bpf_mem_alloc *ma) > >> { > >> /* waiting_for_gp lists was drained, but __free_rcu_tasks_trace() > >> * or __free_rcu() might still execute. Wait for it now before we > >> * freeing percpu caches. > >> */ > >> rcu_barrier_tasks_trace(); > >> if (!rcu_trace_implies_rcu_gp()) > >> rcu_barrier(); > >> free_mem_alloc_no_barrier(ma); > >> } > >> > >> Does the above change look good to you ? If it is, I will post v3 to include the > >> above change and add your Reviewed-by tag. > > Unfortunately, although synchronize_rcu_tasks_trace() implies > > that synchronize_rcu(), there is no relationship between the > > callbacks. Furthermore, rcu_barrier_tasks_trace() does not imply > > synchronize_rcu_tasks_trace(). > > Yes. I see. And according to the code, if there is not pending cb, > rcu_barrier_tasks_trace() will returned immediately. It is also possible > rcu_tasks_trace kthread is in the middle of grace period waiting when invoking > rcu_barrier_task_trace(), so rcu_barrier_task_trace() does not imply > synchronize_rcu_tasks_trace(). Very good! > > So the above change really would break things. Please do not do it. > > However I am a little confused about the conclusion. If only considering the > invocations of call_rcu() and call_rcu_tasks_trace() in kernel/bpf/memalloc.c, I > think it is safe to do so, right ? Because if rcu_trace_implies_rcu_gp() is > true, there will be no invocation of call_rcu() and rcu_barrier_tasks_trace() > will wait for the completion of pending call_rcu_tasks_trace(). If > rcu_trace_implies_rcu_gp(), rcu_barrier_tasks_trace() and rcu_barrier() will do > the job. If considering the invocations of call_rcu() in other places, I think > it is definitely unsafe to do that, right ? Agreed, I am being cautious and pessimistic in assuming that there are other call_rcu() invocations. On the other hand, my caution and pessimism is based on their having been other call_rcu() invocations in the past. So please verify that there are none before making this sort of change. Thanx, Paul > > You could use workqueues or similar to make the rcu_barrier_tasks_trace() > > and the rcu_barrier() wait concurrently, though. This would of course > > require some synchronization. > Thanks for the suggestion. Will check it later. > > > > Thanx, Paul > > > >>>> Change Log: > >>>> > >>>> v2: > >>>> * codify the implication of RCU Tasks Trace grace period instead of > >>>> assuming for it > >>>> > >>>> v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com > >>>> > >>>> Hou Tao (3): > >>>> bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator > >>>> bpf: Use rcu_trace_implies_rcu_gp() in local storage map > >>>> bpf: Use rcu_trace_implies_rcu_gp() for program array freeing > >>>> > >>>> Paul E. McKenney (1): > >>>> rcu-tasks: Provide rcu_trace_implies_rcu_gp() > >>>> > >>>> include/linux/rcupdate.h | 12 ++++++++++++ > >>>> kernel/bpf/bpf_local_storage.c | 13 +++++++++++-- > >>>> kernel/bpf/core.c | 8 +++++++- > >>>> kernel/bpf/memalloc.c | 15 ++++++++++----- > >>>> kernel/rcu/tasks.h | 2 ++ > >>>> 5 files changed, 42 insertions(+), 8 deletions(-) > >>>> > >>>> -- > >>>> 2.29.2 > >>>> > >>> . >
From: Hou Tao <houtao1@huawei.com> Hi, Now bpf uses RCU grace period chaining to wait for the completion of access from both sleepable and non-sleepable bpf program: calling call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace period, then in its callback calls call_rcu() or kfree_rcu() to wait for a normal RCU grace period. According to the implementation of RCU Tasks Trace, it inovkes ->postscan_func() to wait for one RCU-tasks-trace grace period and rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one normal RCU grace period in turn, so one RCU-tasks-trace grace period will imply one normal RCU grace period. To codify the implication, introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch #2~#4 to remove unnecessary call_rcu() or kfree_rcu() in bpf subsystem. Other two uses of call_rcu_tasks_trace() are unchanged: for __bpf_prog_put_rcu() there is no gp chain and for __bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU tasks GP. An alternative way to remove these unnecessary RCU grace period chainings is using the RCU polling API to check whether or not a normal RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it needs an unsigned long space for each free element or each call, and it is not affordable for local storage element, so as for now always rcu_trace_implies_rcu_gp(). Comments are always welcome. Change Log: v2: * codify the implication of RCU Tasks Trace grace period instead of assuming for it v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com Hou Tao (3): bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator bpf: Use rcu_trace_implies_rcu_gp() in local storage map bpf: Use rcu_trace_implies_rcu_gp() for program array freeing Paul E. McKenney (1): rcu-tasks: Provide rcu_trace_implies_rcu_gp() include/linux/rcupdate.h | 12 ++++++++++++ kernel/bpf/bpf_local_storage.c | 13 +++++++++++-- kernel/bpf/core.c | 8 +++++++- kernel/bpf/memalloc.c | 15 ++++++++++----- kernel/rcu/tasks.h | 2 ++ 5 files changed, 42 insertions(+), 8 deletions(-)