diff mbox series

[02/12] uprobes: grab write mmap lock in unapply_uprobe()

Message ID 20240625002144.3485799-3-andrii@kernel.org (mailing list archive)
State Superseded
Headers show
Series uprobes: add batched register/unregister APIs and per-CPU RW semaphore | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Andrii Nakryiko June 25, 2024, 12:21 a.m. UTC
Given unapply_uprobe() can call remove_breakpoint() which eventually
calls uprobe_write_opcode(), which can modify a set of memory pages and
expects mm->mmap_lock held for write, it needs to have writer lock.

Fix this by switching to mmap_write_lock()/mmap_write_unlock().

Fixes: da1816b1caec ("uprobes: Teach handler_chain() to filter out the probed task")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/uprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Masami Hiramatsu (Google) June 25, 2024, 1:29 a.m. UTC | #1
On Mon, 24 Jun 2024 17:21:34 -0700
Andrii Nakryiko <andrii@kernel.org> wrote:

> Given unapply_uprobe() can call remove_breakpoint() which eventually
> calls uprobe_write_opcode(), which can modify a set of memory pages and
> expects mm->mmap_lock held for write, it needs to have writer lock.
> 
> Fix this by switching to mmap_write_lock()/mmap_write_unlock().
> 

Oops, it is an actual bug, right?

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> Fixes: da1816b1caec ("uprobes: Teach handler_chain() to filter out the probed task")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/events/uprobes.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 197fbe4663b5..e896eeecb091 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1235,7 +1235,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
>  	struct vm_area_struct *vma;
>  	int err = 0;
>  
> -	mmap_read_lock(mm);
> +	mmap_write_lock(mm);
>  	for_each_vma(vmi, vma) {
>  		unsigned long vaddr;
>  		loff_t offset;
> @@ -1252,7 +1252,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
>  		vaddr = offset_to_vaddr(vma, uprobe->offset);
>  		err |= remove_breakpoint(uprobe, mm, vaddr);
>  	}
> -	mmap_read_unlock(mm);
> +	mmap_write_unlock(mm);
>  
>  	return err;
>  }
> -- 
> 2.43.0
>
Oleg Nesterov June 25, 2024, 10:50 a.m. UTC | #2
I don't think I can review, I forgot everything, but I'll try to read this
series when I have time to (try to) understand what it does...

On 06/24, Andrii Nakryiko wrote:
>
> Given unapply_uprobe() can call remove_breakpoint() which eventually
> calls uprobe_write_opcode(), which can modify a set of memory pages and
> expects mm->mmap_lock held for write, it needs to have writer lock.
>
> Fix this by switching to mmap_write_lock()/mmap_write_unlock().
>
> Fixes: da1816b1caec ("uprobes: Teach handler_chain() to filter out the probed task")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/events/uprobes.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 197fbe4663b5..e896eeecb091 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1235,7 +1235,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
>  	struct vm_area_struct *vma;
>  	int err = 0;
>
> -	mmap_read_lock(mm);
> +	mmap_write_lock(mm);

Can you explain what exactly is wrong?

FOLL_FORCE/etc is fine under mmap_read_lock(), __replace_page() seems too...

I recall that initially uprobes.c always took mmap_sem for reading, then
register_for_each_vma() was changed by 77fc4af1b59d12 but there was other
reasons for this change...

Again, I don't understand this code today, quite possibly I missed something,
I am just trying to understand.

Well, it seems there is another reason for this change... Currently 2
unapply_uprobe()'s can race with each other if they try to update the same
page. But in this case we can rely on -EAGAIN from __replace_page() ?

Oleg.
Oleg Nesterov June 25, 2024, 2:49 p.m. UTC | #3
On 06/25, Masami Hiramatsu wrote:
>
> On Mon, 24 Jun 2024 17:21:34 -0700
> Andrii Nakryiko <andrii@kernel.org> wrote:
>
> > Given unapply_uprobe() can call remove_breakpoint() which eventually
> > calls uprobe_write_opcode(), which can modify a set of memory pages and
> > expects mm->mmap_lock held for write, it needs to have writer lock.
>
> Oops, it is an actual bug, right?

Why?

So far I don't understand this change. Quite possibly I missed something,
but in this case the changelog should explain the problem more clearly.

Oleg.
Andrii Nakryiko June 25, 2024, 5:37 p.m. UTC | #4
On Tue, Jun 25, 2024 at 7:51 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/25, Masami Hiramatsu wrote:
> >
> > On Mon, 24 Jun 2024 17:21:34 -0700
> > Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > > Given unapply_uprobe() can call remove_breakpoint() which eventually
> > > calls uprobe_write_opcode(), which can modify a set of memory pages and
> > > expects mm->mmap_lock held for write, it needs to have writer lock.
> >
> > Oops, it is an actual bug, right?
>
> Why?
>
> So far I don't understand this change. Quite possibly I missed something,
> but in this case the changelog should explain the problem more clearly.
>

I just went off of "Called with mm->mmap_lock held for write." comment
in uprobe_write_opcode(), tbh. If we don't actually need writer
mmap_lock, we should probably update at least that comment. There is a
lot going on in uprobe_write_opcode(), and I don't understand all the
requirements there.


> Oleg.
>
Oleg Nesterov June 25, 2024, 7:07 p.m. UTC | #5
On 06/25, Andrii Nakryiko wrote:
>
> On Tue, Jun 25, 2024 at 7:51 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Why?
> >
> > So far I don't understand this change. Quite possibly I missed something,
> > but in this case the changelog should explain the problem more clearly.
> >
>
> I just went off of "Called with mm->mmap_lock held for write." comment
> in uprobe_write_opcode(), tbh.

Ah, indeed... and git blame makes me sad ;)

I _think_ that 29dedee0e693a updated this comment without any thinking,
but today I can't recall. In any case, today this nothing to do with
mem_cgroup_charge(). Not sure __replace_page() is correct (in this respect)
when it returns -EAGAIN but this is another story.

> If we don't actually need writer
> mmap_lock, we should probably update at least that comment.

Agreed.

> There is a
> lot going on in uprobe_write_opcode(), and I don't understand all the
> requirements there.

Heh. Neither me today ;)

Oleg.
Andrii Nakryiko June 26, 2024, 4:38 p.m. UTC | #6
On Tue, Jun 25, 2024 at 12:09 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/25, Andrii Nakryiko wrote:
> >
> > On Tue, Jun 25, 2024 at 7:51 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > Why?
> > >
> > > So far I don't understand this change. Quite possibly I missed something,
> > > but in this case the changelog should explain the problem more clearly.
> > >
> >
> > I just went off of "Called with mm->mmap_lock held for write." comment
> > in uprobe_write_opcode(), tbh.
>
> Ah, indeed... and git blame makes me sad ;)
>
> I _think_ that 29dedee0e693a updated this comment without any thinking,
> but today I can't recall. In any case, today this nothing to do with
> mem_cgroup_charge(). Not sure __replace_page() is correct (in this respect)
> when it returns -EAGAIN but this is another story.
>
> > If we don't actually need writer
> > mmap_lock, we should probably update at least that comment.
>
> Agreed.

Ok, I'll drop this change and will just update the comment.

>
> > There is a
> > lot going on in uprobe_write_opcode(), and I don't understand all the
> > requirements there.
>
> Heh. Neither me today ;)
>
> Oleg.
>
diff mbox series

Patch

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 197fbe4663b5..e896eeecb091 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1235,7 +1235,7 @@  static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	int err = 0;
 
-	mmap_read_lock(mm);
+	mmap_write_lock(mm);
 	for_each_vma(vmi, vma) {
 		unsigned long vaddr;
 		loff_t offset;
@@ -1252,7 +1252,7 @@  static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 		vaddr = offset_to_vaddr(vma, uprobe->offset);
 		err |= remove_breakpoint(uprobe, mm, vaddr);
 	}
-	mmap_read_unlock(mm);
+	mmap_write_unlock(mm);
 
 	return err;
 }