Message ID | 20240903174603.3554182-5-andrii@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | uprobes: RCU-protected hot path optimizations | expand |
Hello Andrii, On Tue, Sep 03, 2024 at 10:45:59AM -0700, Andrii Nakryiko wrote: > uprobe->register_rwsem is one of a few big bottlenecks to scalability of > uprobes, so we need to get rid of it to improve uprobe performance and > multi-CPU scalability. > > First, we turn uprobe's consumer list to a typical doubly-linked list > and utilize existing RCU-aware helpers for traversing such lists, as > well as adding and removing elements from it. > > For entry uprobes we already have SRCU protection active since before > uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe > won't go away from under us, but we add SRCU protection around consumer > list traversal. I am seeing the following message in a kernel with RCU_PROVE_LOCKING: kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! It seems the SRCU is not held, when coming from mmap_region -> uprobe_mmap. Here is the message I got in my debug kernel. (sorry for not decoding it, but, the stack trace is clear enough). WARNING: suspicious RCU usage 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 Tainted: G W E N ----------------------------- kernel/events/uprobes.c:938 RCU-list traversed without holding the required lock!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by env/441330: #0: ffff00021c1bc508 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x1d0 #1: ffff800089f3ab48 (&uprobes_mmap_mutex[i]){+.+.}-{3:3}, at: uprobe_mmap+0x20c/0x548 #2: ffff0004e564c528 (&uprobe->consumer_rwsem){++++}-{3:3}, at: filter_chain+0x30/0xe8 stack backtrace: CPU: 4 UID: 34133 PID: 441330 Comm: env Kdump: loaded Tainted: G W E N 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 Tainted: [W]=WARN, [E]=UNSIGNED_MODULE, [N]=TEST Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024 Call trace: dump_backtrace+0x10c/0x198 show_stack+0x24/0x38 __dump_stack+0x28/0x38 dump_stack_lvl+0x74/0xa8 dump_stack+0x18/0x28 lockdep_rcu_suspicious+0x178/0x2c8 filter_chain+0xdc/0xe8 uprobe_mmap+0x2e0/0x548 mmap_region+0x510/0x988 do_mmap+0x444/0x528 vm_mmap_pgoff+0xf8/0x1d0 ksys_mmap_pgoff+0x184/0x2d8 That said, it seems we want to hold the SRCU, before reaching the filter_chain(). I hacked a bit, and adding the lock in uprobe_mmap() solves the problem, but, I might be missing something, since I am not familiar with this code. How does the following patch look like? commit 1bd7bcf03031ceca86fdddd8be2e5500497db29f Author: Breno Leitao <leitao@debian.org> Date: Mon Nov 4 06:53:31 2024 -0800 uprobes: Get SRCU lock before traverseing the list list_for_each_entry_srcu() is being called without holding the lock, which causes LOCKDEP (when enabled with RCU_PROVING) to complain such as: kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! Get the SRCU uprobes_srcu lock before calling filter_chain(), which needs to have the SRCU lock hold, since it is going to call list_for_each_entry_srcu(). Signed-off-by: Breno Leitao <leitao@debian.org> Fixes: cc01bd044e6a ("uprobes: travers uprobe's consumer list locklessly under SRCU protection") diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 4b52cb2ae6d62..cc9d4ddeea9a6 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1391,6 +1391,7 @@ int uprobe_mmap(struct vm_area_struct *vma) struct list_head tmp_list; struct uprobe *uprobe, *u; struct inode *inode; + int srcu_idx; if (no_uprobe_events()) return 0; @@ -1409,6 +1410,7 @@ int uprobe_mmap(struct vm_area_struct *vma) mutex_lock(uprobes_mmap_hash(inode)); build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list); + srcu_idx = srcu_read_lock(&uprobes_srcu); /* * We can race with uprobe_unregister(), this uprobe can be already * removed. But in this case filter_chain() must return false, all @@ -1422,6 +1424,7 @@ int uprobe_mmap(struct vm_area_struct *vma) } put_uprobe(uprobe); } + srcu_read_unlock(&uprobes_srcu, srcu_idx); mutex_unlock(uprobes_mmap_hash(inode)); return 0;
On Wed, Nov 6, 2024 at 4:03 AM Breno Leitao <leitao@debian.org> wrote: > > Hello Andrii, > > On Tue, Sep 03, 2024 at 10:45:59AM -0700, Andrii Nakryiko wrote: > > uprobe->register_rwsem is one of a few big bottlenecks to scalability of > > uprobes, so we need to get rid of it to improve uprobe performance and > > multi-CPU scalability. > > > > First, we turn uprobe's consumer list to a typical doubly-linked list > > and utilize existing RCU-aware helpers for traversing such lists, as > > well as adding and removing elements from it. > > > > For entry uprobes we already have SRCU protection active since before > > uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe > > won't go away from under us, but we add SRCU protection around consumer > > list traversal. > > I am seeing the following message in a kernel with RCU_PROVE_LOCKING: > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > It seems the SRCU is not held, when coming from mmap_region -> > uprobe_mmap. Here is the message I got in my debug kernel. (sorry for > not decoding it, but, the stack trace is clear enough). > > WARNING: suspicious RCU usage > 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 Tainted: G W E N > ----------------------------- > kernel/events/uprobes.c:938 RCU-list traversed without holding the required lock!! > > other info that might help us debug this: > > rcu_scheduler_active = 2, debug_locks = 1 > 3 locks held by env/441330: > #0: ffff00021c1bc508 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x1d0 > #1: ffff800089f3ab48 (&uprobes_mmap_mutex[i]){+.+.}-{3:3}, at: uprobe_mmap+0x20c/0x548 > #2: ffff0004e564c528 (&uprobe->consumer_rwsem){++++}-{3:3}, at: filter_chain+0x30/0xe8 > > stack backtrace: > CPU: 4 UID: 34133 PID: 441330 Comm: env Kdump: loaded Tainted: G W E N 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 > Tainted: [W]=WARN, [E]=UNSIGNED_MODULE, [N]=TEST > Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024 > Call trace: > dump_backtrace+0x10c/0x198 > show_stack+0x24/0x38 > __dump_stack+0x28/0x38 > dump_stack_lvl+0x74/0xa8 > dump_stack+0x18/0x28 > lockdep_rcu_suspicious+0x178/0x2c8 > filter_chain+0xdc/0xe8 > uprobe_mmap+0x2e0/0x548 > mmap_region+0x510/0x988 > do_mmap+0x444/0x528 > vm_mmap_pgoff+0xf8/0x1d0 > ksys_mmap_pgoff+0x184/0x2d8 > > > That said, it seems we want to hold the SRCU, before reaching the > filter_chain(). I hacked a bit, and adding the lock in uprobe_mmap() > solves the problem, but, I might be missing something, since I am not familiar > with this code. > > How does the following patch look like? > > commit 1bd7bcf03031ceca86fdddd8be2e5500497db29f > Author: Breno Leitao <leitao@debian.org> > Date: Mon Nov 4 06:53:31 2024 -0800 > > uprobes: Get SRCU lock before traverseing the list > > list_for_each_entry_srcu() is being called without holding the lock, > which causes LOCKDEP (when enabled with RCU_PROVING) to complain such > as: > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > Get the SRCU uprobes_srcu lock before calling filter_chain(), which > needs to have the SRCU lock hold, since it is going to call > list_for_each_entry_srcu(). > > Signed-off-by: Breno Leitao <leitao@debian.org> > Fixes: cc01bd044e6a ("uprobes: travers uprobe's consumer list locklessly under SRCU protection") > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 4b52cb2ae6d62..cc9d4ddeea9a6 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -1391,6 +1391,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > struct list_head tmp_list; > struct uprobe *uprobe, *u; > struct inode *inode; > + int srcu_idx; > > if (no_uprobe_events()) > return 0; > @@ -1409,6 +1410,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > mutex_lock(uprobes_mmap_hash(inode)); > build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list); > + srcu_idx = srcu_read_lock(&uprobes_srcu); Hey Breno, Thanks for catching that (production testing FTW, right?!). But I think you a) adding wrong RCU protection flavor (it has to be rcu_read_lock_trace()/rcu_read_unlock_trace(), see uprobe_apply() for an example) and b) I think this is the wrong place to add it. We should add it inside filter_chain(). filter_chain() is called from three places, only one of which is already RCU protected (that's the handler_chain() case). But there is also register_for_each_vma(), which needs RCU protection as well. So can you resend the patch as a stand-alone patch, switch to RCU Tasks Trace flavor, and add the protection inside filter_chain()? Thank you! P.S. pending_list traversal that you (accidentally) protect as well in your patch doesn't need RCU protection, so there is no problem with moving into filter_chain() for RCU stuff. > /* > * We can race with uprobe_unregister(), this uprobe can be already > * removed. But in this case filter_chain() must return false, all > @@ -1422,6 +1424,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > } > put_uprobe(uprobe); > } > + srcu_read_unlock(&uprobes_srcu, srcu_idx); > mutex_unlock(uprobes_mmap_hash(inode)); > > return 0; >
Hello Andrii, On Wed, Nov 06, 2024 at 08:25:25AM -0800, Andrii Nakryiko wrote: > On Wed, Nov 6, 2024 at 4:03 AM Breno Leitao <leitao@debian.org> wrote: > > On Tue, Sep 03, 2024 at 10:45:59AM -0700, Andrii Nakryiko wrote: > > > uprobe->register_rwsem is one of a few big bottlenecks to scalability of > > > uprobes, so we need to get rid of it to improve uprobe performance and > > > multi-CPU scalability. > > > > > > First, we turn uprobe's consumer list to a typical doubly-linked list > > > and utilize existing RCU-aware helpers for traversing such lists, as > > > well as adding and removing elements from it. > > > > > > For entry uprobes we already have SRCU protection active since before > > > uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe > > > won't go away from under us, but we add SRCU protection around consumer > > > list traversal. > > > > I am seeing the following message in a kernel with RCU_PROVE_LOCKING: > > > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > > > It seems the SRCU is not held, when coming from mmap_region -> > > uprobe_mmap. Here is the message I got in my debug kernel. (sorry for > > not decoding it, but, the stack trace is clear enough). > > > > WARNING: suspicious RCU usage > > 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 Tainted: G W E N > > ----------------------------- > > kernel/events/uprobes.c:938 RCU-list traversed without holding the required lock!! > > > > other info that might help us debug this: > > > > rcu_scheduler_active = 2, debug_locks = 1 > > 3 locks held by env/441330: > > #0: ffff00021c1bc508 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x1d0 > > #1: ffff800089f3ab48 (&uprobes_mmap_mutex[i]){+.+.}-{3:3}, at: uprobe_mmap+0x20c/0x548 > > #2: ffff0004e564c528 (&uprobe->consumer_rwsem){++++}-{3:3}, at: filter_chain+0x30/0xe8 > > > > stack backtrace: > > CPU: 4 UID: 34133 PID: 441330 Comm: env Kdump: loaded Tainted: G W E N 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 > > Tainted: [W]=WARN, [E]=UNSIGNED_MODULE, [N]=TEST > > Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024 > > Call trace: > > dump_backtrace+0x10c/0x198 > > show_stack+0x24/0x38 > > __dump_stack+0x28/0x38 > > dump_stack_lvl+0x74/0xa8 > > dump_stack+0x18/0x28 > > lockdep_rcu_suspicious+0x178/0x2c8 > > filter_chain+0xdc/0xe8 > > uprobe_mmap+0x2e0/0x548 > > mmap_region+0x510/0x988 > > do_mmap+0x444/0x528 > > vm_mmap_pgoff+0xf8/0x1d0 > > ksys_mmap_pgoff+0x184/0x2d8 > > > > > > That said, it seems we want to hold the SRCU, before reaching the > > filter_chain(). I hacked a bit, and adding the lock in uprobe_mmap() > > solves the problem, but, I might be missing something, since I am not familiar > > with this code. > > > > How does the following patch look like? > > > > commit 1bd7bcf03031ceca86fdddd8be2e5500497db29f > > Author: Breno Leitao <leitao@debian.org> > > Date: Mon Nov 4 06:53:31 2024 -0800 > > > > uprobes: Get SRCU lock before traverseing the list > > > > list_for_each_entry_srcu() is being called without holding the lock, > > which causes LOCKDEP (when enabled with RCU_PROVING) to complain such > > as: > > > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > > > Get the SRCU uprobes_srcu lock before calling filter_chain(), which > > needs to have the SRCU lock hold, since it is going to call > > list_for_each_entry_srcu(). > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > Fixes: cc01bd044e6a ("uprobes: travers uprobe's consumer list locklessly under SRCU protection") > > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > > index 4b52cb2ae6d62..cc9d4ddeea9a6 100644 > > --- a/kernel/events/uprobes.c > > +++ b/kernel/events/uprobes.c > > @@ -1391,6 +1391,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > struct list_head tmp_list; > > struct uprobe *uprobe, *u; > > struct inode *inode; > > + int srcu_idx; > > > > if (no_uprobe_events()) > > return 0; > > @@ -1409,6 +1410,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > > > mutex_lock(uprobes_mmap_hash(inode)); > > build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list); > > + srcu_idx = srcu_read_lock(&uprobes_srcu); > > Thanks for catching that (production testing FTW, right?!). Correct. I am running some hosts with RCU_PROVING and I am finding some cases where RCU protected areas are touched without holding the RCU read lock. > But I think you a) adding wrong RCU protection flavor (it has to be > rcu_read_lock_trace()/rcu_read_unlock_trace(), see uprobe_apply() for > an example) and b) I think this is the wrong place to add it. We > should add it inside filter_chain(). filter_chain() is called from > three places, only one of which is already RCU protected (that's the > handler_chain() case). But there is also register_for_each_vma(), > which needs RCU protection as well. Thanks for the guidance! My initial plan was to protect filter_chain(), but, handler_chain() already has the lock. Is it OK to get into a critical section in a nested form? The code will be something like: handle_swbp() { rcu_read_lock_trace(); handler_chain() { filter_chain() { rcu_read_lock_trace(); list_for_each_entry_rcu() rcu_read_lock_trace(); } } rcu_read_lock_trace(); } Is this nested locking fine? Thanks --breno
On Thu, Nov 7, 2024 at 3:35 AM Breno Leitao <leitao@debian.org> wrote: > > Hello Andrii, > > On Wed, Nov 06, 2024 at 08:25:25AM -0800, Andrii Nakryiko wrote: > > On Wed, Nov 6, 2024 at 4:03 AM Breno Leitao <leitao@debian.org> wrote: > > > On Tue, Sep 03, 2024 at 10:45:59AM -0700, Andrii Nakryiko wrote: > > > > uprobe->register_rwsem is one of a few big bottlenecks to scalability of > > > > uprobes, so we need to get rid of it to improve uprobe performance and > > > > multi-CPU scalability. > > > > > > > > First, we turn uprobe's consumer list to a typical doubly-linked list > > > > and utilize existing RCU-aware helpers for traversing such lists, as > > > > well as adding and removing elements from it. > > > > > > > > For entry uprobes we already have SRCU protection active since before > > > > uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe > > > > won't go away from under us, but we add SRCU protection around consumer > > > > list traversal. > > > > > > I am seeing the following message in a kernel with RCU_PROVE_LOCKING: > > > > > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > > > > > It seems the SRCU is not held, when coming from mmap_region -> > > > uprobe_mmap. Here is the message I got in my debug kernel. (sorry for > > > not decoding it, but, the stack trace is clear enough). > > > > > > WARNING: suspicious RCU usage > > > 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 Tainted: G W E N > > > ----------------------------- > > > kernel/events/uprobes.c:938 RCU-list traversed without holding the required lock!! > > > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > 3 locks held by env/441330: > > > #0: ffff00021c1bc508 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x1d0 > > > #1: ffff800089f3ab48 (&uprobes_mmap_mutex[i]){+.+.}-{3:3}, at: uprobe_mmap+0x20c/0x548 > > > #2: ffff0004e564c528 (&uprobe->consumer_rwsem){++++}-{3:3}, at: filter_chain+0x30/0xe8 > > > > > > stack backtrace: > > > CPU: 4 UID: 34133 PID: 441330 Comm: env Kdump: loaded Tainted: G W E N 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 > > > Tainted: [W]=WARN, [E]=UNSIGNED_MODULE, [N]=TEST > > > Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024 > > > Call trace: > > > dump_backtrace+0x10c/0x198 > > > show_stack+0x24/0x38 > > > __dump_stack+0x28/0x38 > > > dump_stack_lvl+0x74/0xa8 > > > dump_stack+0x18/0x28 > > > lockdep_rcu_suspicious+0x178/0x2c8 > > > filter_chain+0xdc/0xe8 > > > uprobe_mmap+0x2e0/0x548 > > > mmap_region+0x510/0x988 > > > do_mmap+0x444/0x528 > > > vm_mmap_pgoff+0xf8/0x1d0 > > > ksys_mmap_pgoff+0x184/0x2d8 > > > > > > > > > That said, it seems we want to hold the SRCU, before reaching the > > > filter_chain(). I hacked a bit, and adding the lock in uprobe_mmap() > > > solves the problem, but, I might be missing something, since I am not familiar > > > with this code. > > > > > > How does the following patch look like? > > > > > > commit 1bd7bcf03031ceca86fdddd8be2e5500497db29f > > > Author: Breno Leitao <leitao@debian.org> > > > Date: Mon Nov 4 06:53:31 2024 -0800 > > > > > > uprobes: Get SRCU lock before traverseing the list > > > > > > list_for_each_entry_srcu() is being called without holding the lock, > > > which causes LOCKDEP (when enabled with RCU_PROVING) to complain such > > > as: > > > > > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > > > > > Get the SRCU uprobes_srcu lock before calling filter_chain(), which > > > needs to have the SRCU lock hold, since it is going to call > > > list_for_each_entry_srcu(). > > > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > Fixes: cc01bd044e6a ("uprobes: travers uprobe's consumer list locklessly under SRCU protection") > > > > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > > > index 4b52cb2ae6d62..cc9d4ddeea9a6 100644 > > > --- a/kernel/events/uprobes.c > > > +++ b/kernel/events/uprobes.c > > > @@ -1391,6 +1391,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > > struct list_head tmp_list; > > > struct uprobe *uprobe, *u; > > > struct inode *inode; > > > + int srcu_idx; > > > > > > if (no_uprobe_events()) > > > return 0; > > > @@ -1409,6 +1410,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > > > > > mutex_lock(uprobes_mmap_hash(inode)); > > > build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list); > > > + srcu_idx = srcu_read_lock(&uprobes_srcu); > > > > Thanks for catching that (production testing FTW, right?!). > > Correct. I am running some hosts with RCU_PROVING and I am finding some > cases where RCU protected areas are touched without holding the RCU read > lock. > > > But I think you a) adding wrong RCU protection flavor (it has to be > > rcu_read_lock_trace()/rcu_read_unlock_trace(), see uprobe_apply() for > > an example) and b) I think this is the wrong place to add it. We > > should add it inside filter_chain(). filter_chain() is called from > > three places, only one of which is already RCU protected (that's the > > handler_chain() case). But there is also register_for_each_vma(), > > which needs RCU protection as well. > > Thanks for the guidance! > > My initial plan was to protect filter_chain(), but, handler_chain() > already has the lock. Is it OK to get into a critical section in a > nested form? > > The code will be something like: > > handle_swbp() { > rcu_read_lock_trace(); > handler_chain() { > filter_chain() { > rcu_read_lock_trace(); > list_for_each_entry_rcu() > rcu_read_lock_trace(); > } > } > rcu_read_lock_trace(); > } > > Is this nested locking fine? > Yes, it's totally fine to nest RCU lock regions. > Thanks > --breno
oN tHU, nov 07, 2024 at 08:01:05AM -0800, Andrii Nakryiko wrote: > On Thu, Nov 7, 2024 at 3:35 AM Breno Leitao <leitao@debian.org> wrote: > > > > Hello Andrii, > > > > On Wed, Nov 06, 2024 at 08:25:25AM -0800, Andrii Nakryiko wrote: > > > On Wed, Nov 6, 2024 at 4:03 AM Breno Leitao <leitao@debian.org> wrote: > > > > On Tue, Sep 03, 2024 at 10:45:59AM -0700, Andrii Nakryiko wrote: > > > > > uprobe->register_rwsem is one of a few big bottlenecks to scalability of > > > > > uprobes, so we need to get rid of it to improve uprobe performance and > > > > > multi-CPU scalability. > > > > > > > > > > First, we turn uprobe's consumer list to a typical doubly-linked list > > > > > and utilize existing RCU-aware helpers for traversing such lists, as > > > > > well as adding and removing elements from it. > > > > > > > > > > For entry uprobes we already have SRCU protection active since before > > > > > uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe > > > > > won't go away from under us, but we add SRCU protection around consumer > > > > > list traversal. > > > > > > > > I am seeing the following message in a kernel with RCU_PROVE_LOCKING: > > > > > > > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > > > > > > > It seems the SRCU is not held, when coming from mmap_region -> > > > > uprobe_mmap. Here is the message I got in my debug kernel. (sorry for > > > > not decoding it, but, the stack trace is clear enough). > > > > > > > > WARNING: suspicious RCU usage > > > > 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 Tainted: G W E N > > > > ----------------------------- > > > > kernel/events/uprobes.c:938 RCU-list traversed without holding the required lock!! > > > > > > > > other info that might help us debug this: > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > 3 locks held by env/441330: > > > > #0: ffff00021c1bc508 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x1d0 > > > > #1: ffff800089f3ab48 (&uprobes_mmap_mutex[i]){+.+.}-{3:3}, at: uprobe_mmap+0x20c/0x548 > > > > #2: ffff0004e564c528 (&uprobe->consumer_rwsem){++++}-{3:3}, at: filter_chain+0x30/0xe8 > > > > > > > > stack backtrace: > > > > CPU: 4 UID: 34133 PID: 441330 Comm: env Kdump: loaded Tainted: G W E N 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 > > > > Tainted: [W]=WARN, [E]=UNSIGNED_MODULE, [N]=TEST > > > > Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024 > > > > Call trace: > > > > dump_backtrace+0x10c/0x198 > > > > show_stack+0x24/0x38 > > > > __dump_stack+0x28/0x38 > > > > dump_stack_lvl+0x74/0xa8 > > > > dump_stack+0x18/0x28 > > > > lockdep_rcu_suspicious+0x178/0x2c8 > > > > filter_chain+0xdc/0xe8 > > > > uprobe_mmap+0x2e0/0x548 > > > > mmap_region+0x510/0x988 > > > > do_mmap+0x444/0x528 > > > > vm_mmap_pgoff+0xf8/0x1d0 > > > > ksys_mmap_pgoff+0x184/0x2d8 > > > > > > > > > > > > That said, it seems we want to hold the SRCU, before reaching the > > > > filter_chain(). I hacked a bit, and adding the lock in uprobe_mmap() > > > > solves the problem, but, I might be missing something, since I am not familiar > > > > with this code. > > > > > > > > How does the following patch look like? > > > > > > > > commit 1bd7bcf03031ceca86fdddd8be2e5500497db29f > > > > Author: Breno Leitao <leitao@debian.org> > > > > Date: Mon Nov 4 06:53:31 2024 -0800 > > > > > > > > uprobes: Get SRCU lock before traverseing the list > > > > > > > > list_for_each_entry_srcu() is being called without holding the lock, > > > > which causes LOCKDEP (when enabled with RCU_PROVING) to complain such > > > > as: > > > > > > > > kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!! > > > > > > > > Get the SRCU uprobes_srcu lock before calling filter_chain(), which > > > > needs to have the SRCU lock hold, since it is going to call > > > > list_for_each_entry_srcu(). > > > > > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > > Fixes: cc01bd044e6a ("uprobes: travers uprobe's consumer list locklessly under SRCU protection") > > > > > > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > > > > index 4b52cb2ae6d62..cc9d4ddeea9a6 100644 > > > > --- a/kernel/events/uprobes.c > > > > +++ b/kernel/events/uprobes.c > > > > @@ -1391,6 +1391,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > > > struct list_head tmp_list; > > > > struct uprobe *uprobe, *u; > > > > struct inode *inode; > > > > + int srcu_idx; > > > > > > > > if (no_uprobe_events()) > > > > return 0; > > > > @@ -1409,6 +1410,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > > > > > > > mutex_lock(uprobes_mmap_hash(inode)); > > > > build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list); > > > > + srcu_idx = srcu_read_lock(&uprobes_srcu); > > > > > > Thanks for catching that (production testing FTW, right?!). > > > > Correct. I am running some hosts with RCU_PROVING and I am finding some > > cases where RCU protected areas are touched without holding the RCU read > > lock. > > > > > But I think you a) adding wrong RCU protection flavor (it has to be > > > rcu_read_lock_trace()/rcu_read_unlock_trace(), see uprobe_apply() for > > > an example) and b) I think this is the wrong place to add it. We > > > should add it inside filter_chain(). filter_chain() is called from > > > three places, only one of which is already RCU protected (that's the > > > handler_chain() case). But there is also register_for_each_vma(), > > > which needs RCU protection as well. > > > > Thanks for the guidance! > > > > My initial plan was to protect filter_chain(), but, handler_chain() > > already has the lock. Is it OK to get into a critical section in a > > nested form? > > > > The code will be something like: > > > > handle_swbp() { > > rcu_read_lock_trace(); > > handler_chain() { > > filter_chain() { > > rcu_read_lock_trace(); > > list_for_each_entry_rcu() > > rcu_read_lock_trace(); > > } > > } > > rcu_read_lock_trace(); > > } > > > > Is this nested locking fine? > > Yes, it's totally fine to nest RCU lock regions. As long as you don't nest them more than 255 deep in CONFIG_PREEMPT=n kernels that also have CONFIG_PREEMPT_COUNT=y, or more than 2G deep in CONFIG_PREEMPT=y kernels. For a limited time only, in CONFIG_PREEMPT=n kernels that also have CONFIG_PREEMPT_COUNT=n, you can nest as deeply as you want. ;-) Sorry, couldn't resist... Thanx, Paul
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 9cf0dce62e4c..2785d1bedb74 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -29,13 +29,21 @@ struct page; #define MAX_URETPROBE_DEPTH 64 struct uprobe_consumer { + /* + * handler() can return UPROBE_HANDLER_REMOVE to signal the need to + * unregister uprobe for current process. If UPROBE_HANDLER_REMOVE is + * returned, filter() callback has to be implemented as well and it + * should return false to "confirm" the decision to uninstall uprobe + * for the current process. If filter() is omitted or returns true, + * UPROBE_HANDLER_REMOVE is effectively ignored. + */ int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs); int (*ret_handler)(struct uprobe_consumer *self, unsigned long func, struct pt_regs *regs); bool (*filter)(struct uprobe_consumer *self, struct mm_struct *mm); - struct uprobe_consumer *next; + struct list_head cons_node; }; #ifdef CONFIG_UPROBES diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 8bdcdc6901b2..97e58d160647 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -59,7 +59,7 @@ struct uprobe { struct rw_semaphore register_rwsem; struct rw_semaphore consumer_rwsem; struct list_head pending_list; - struct uprobe_consumer *consumers; + struct list_head consumers; struct inode *inode; /* Also hold a ref to inode */ struct rcu_head rcu; loff_t offset; @@ -783,6 +783,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, uprobe->inode = inode; uprobe->offset = offset; uprobe->ref_ctr_offset = ref_ctr_offset; + INIT_LIST_HEAD(&uprobe->consumers); init_rwsem(&uprobe->register_rwsem); init_rwsem(&uprobe->consumer_rwsem); RB_CLEAR_NODE(&uprobe->rb_node); @@ -808,32 +809,19 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc) { down_write(&uprobe->consumer_rwsem); - uc->next = uprobe->consumers; - uprobe->consumers = uc; + list_add_rcu(&uc->cons_node, &uprobe->consumers); up_write(&uprobe->consumer_rwsem); } /* * For uprobe @uprobe, delete the consumer @uc. - * Return true if the @uc is deleted successfully - * or return false. + * Should never be called with consumer that's not part of @uprobe->consumers. */ -static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc) { - struct uprobe_consumer **con; - bool ret = false; - down_write(&uprobe->consumer_rwsem); - for (con = &uprobe->consumers; *con; con = &(*con)->next) { - if (*con == uc) { - *con = uc->next; - ret = true; - break; - } - } + list_del_rcu(&uc->cons_node); up_write(&uprobe->consumer_rwsem); - - return ret; } static int __copy_insn(struct address_space *mapping, struct file *filp, @@ -929,7 +917,8 @@ static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm) bool ret = false; down_read(&uprobe->consumer_rwsem); - for (uc = uprobe->consumers; uc; uc = uc->next) { + list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { ret = consumer_filter(uc, mm); if (ret) break; @@ -1125,18 +1114,29 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) int err; down_write(&uprobe->register_rwsem); - if (WARN_ON(!consumer_del(uprobe, uc))) { - err = -ENOENT; - } else { - err = register_for_each_vma(uprobe, NULL); - /* TODO : cant unregister? schedule a worker thread */ - if (unlikely(err)) - uprobe_warn(current, "unregister, leaking uprobe"); - } + consumer_del(uprobe, uc); + err = register_for_each_vma(uprobe, NULL); up_write(&uprobe->register_rwsem); - if (!err) - put_uprobe(uprobe); + /* TODO : cant unregister? schedule a worker thread */ + if (unlikely(err)) { + uprobe_warn(current, "unregister, leaking uprobe"); + goto out_sync; + } + + put_uprobe(uprobe); + +out_sync: + /* + * Now that handler_chain() and handle_uretprobe_chain() iterate over + * uprobe->consumers list under RCU protection without holding + * uprobe->register_rwsem, we need to wait for RCU grace period to + * make sure that we can't call into just unregistered + * uprobe_consumer's callbacks anymore. If we don't do that, fast and + * unlucky enough caller can free consumer's memory and cause + * handler_chain() or handle_uretprobe_chain() to do an use-after-free. + */ + synchronize_srcu(&uprobes_srcu); } EXPORT_SYMBOL_GPL(uprobe_unregister); @@ -1214,13 +1214,20 @@ EXPORT_SYMBOL_GPL(uprobe_register); int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add) { struct uprobe_consumer *con; - int ret = -ENOENT; + int ret = -ENOENT, srcu_idx; down_write(&uprobe->register_rwsem); - for (con = uprobe->consumers; con && con != uc ; con = con->next) - ; - if (con) - ret = register_for_each_vma(uprobe, add ? uc : NULL); + + srcu_idx = srcu_read_lock(&uprobes_srcu); + list_for_each_entry_srcu(con, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { + if (con == uc) { + ret = register_for_each_vma(uprobe, add ? uc : NULL); + break; + } + } + srcu_read_unlock(&uprobes_srcu, srcu_idx); + up_write(&uprobe->register_rwsem); return ret; @@ -2085,10 +2092,12 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) struct uprobe_consumer *uc; int remove = UPROBE_HANDLER_REMOVE; bool need_prep = false; /* prepare return uprobe, when needed */ + bool has_consumers = false; - down_read(&uprobe->register_rwsem); current->utask->auprobe = &uprobe->arch; - for (uc = uprobe->consumers; uc; uc = uc->next) { + + list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { int rc = 0; if (uc->handler) { @@ -2101,17 +2110,24 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) need_prep = true; remove &= rc; + has_consumers = true; } current->utask->auprobe = NULL; if (need_prep && !remove) prepare_uretprobe(uprobe, regs); /* put bp at return */ - if (remove && uprobe->consumers) { - WARN_ON(!uprobe_is_active(uprobe)); - unapply_uprobe(uprobe, current->mm); + if (remove && has_consumers) { + down_read(&uprobe->register_rwsem); + + /* re-check that removal is still required, this time under lock */ + if (!filter_chain(uprobe, current->mm)) { + WARN_ON(!uprobe_is_active(uprobe)); + unapply_uprobe(uprobe, current->mm); + } + + up_read(&uprobe->register_rwsem); } - up_read(&uprobe->register_rwsem); } static void @@ -2119,13 +2135,15 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs) { struct uprobe *uprobe = ri->uprobe; struct uprobe_consumer *uc; + int srcu_idx; - down_read(&uprobe->register_rwsem); - for (uc = uprobe->consumers; uc; uc = uc->next) { + srcu_idx = srcu_read_lock(&uprobes_srcu); + list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { if (uc->ret_handler) uc->ret_handler(uc, ri->func, regs); } - up_read(&uprobe->register_rwsem); + srcu_read_unlock(&uprobes_srcu, srcu_idx); } static struct return_instance *find_next_ret_chain(struct return_instance *ri)