[v3,1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug

Message ID 20230904180806.1002832-1-joel@joelfernandes.org (mailing list archive)
State New
Series [v3,1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug

Commit Message

Joel Fernandes Sept. 4, 2023, 6:08 p.m. UTC
It is unsafe to dump vmalloc area information when trying to do so from
some contexts. Add a safer trylock version of the same function to do a
best-effort VMA finding and use it from vmalloc_dump_obj().

[applied test robot feedback on unused function fix.]
[applied Uladzislau feedback on locking.]

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Cc: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/vmalloc.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)
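
For readers without the kernel tree at hand, the pattern the patch applies (try the lock, copy the fields out, unlock, then print from the copy) can be modelled in user space. This is only a sketch under assumptions: `struct area_info`, `area_lock`, `area_trylock()`, `area_unlock()` and `dump_area_best_effort()` are hypothetical names, and a C11 `atomic_flag` stands in for `vmap_area_lock`. It is not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields the patch copies out of struct vm_struct. */
struct area_info {
	unsigned long addr;
	unsigned int nr_pages;
};

/* A one-flag lock modelling vmap_area_lock; a set flag means "held". */
static atomic_flag area_lock = ATOMIC_FLAG_INIT;
static struct area_info the_area = { 0x1000UL, 4 };

/* Returns true if the lock was taken, false if it was already held. */
bool area_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&area_lock, memory_order_acquire);
}

void area_unlock(void)
{
	atomic_flag_clear_explicit(&area_lock, memory_order_release);
}

/* Best-effort dump: never waits for the lock, never reads the_area unlocked. */
bool dump_area_best_effort(void)
{
	struct area_info snap;

	if (!area_trylock())
		return false;	/* lock busy: give up instead of spinning or sleeping */

	snap = the_area;	/* copy the fields while the lock is held */
	area_unlock();

	/* Print from the private copy only. */
	printf(" %u-page region starting at %#lx\n", snap.nr_pages, snap.addr);
	return true;
}
```

If the caller already holds the lock (the model's equivalent of an interrupt arriving inside a lock-held region), `dump_area_best_effort()` returns false instead of deadlocking, and printing from `snap` avoids touching the shared record after the unlock.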

Comments

Lorenzo Stoakes Sept. 5, 2023, 7:09 a.m. UTC | #1
On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> It is unsafe to dump vmalloc area information when trying to do so from
> some contexts. Add a safer trylock version of the same function to do a
> best-effort VMA finding and use it from vmalloc_dump_obj().

It'd be nice to have more details as to precisely which contexts and what this
resolves.

>
> [applied test robot feedback on unused function fix.]
> [applied Uladzislau feedback on locking.]
>
> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: rcu@vger.kernel.org
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> Cc: stable@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  mm/vmalloc.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..2c6a0e2ff404 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  #ifdef CONFIG_PRINTK
>  bool vmalloc_dump_obj(void *object)
>  {
> -	struct vm_struct *vm;
>  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> +	const void *caller;
> +	struct vm_struct *vm;
> +	struct vmap_area *va;
> +	unsigned long addr;
> +	unsigned int nr_pages;
>
> -	vm = find_vm_area(objp);
> -	if (!vm)
> +	if (!spin_trylock(&vmap_area_lock))
> +		return false;

It'd be good to have a comment here explaining why we must trylock here. I am
also concerned that in the past this function would return false only if the
address was not a vmalloc one, but now it might just return false due to lock
contention and the user has no idea which it is?

I'd want to at least output "vmalloc region cannot lookup lock contention"
vs. the below cannot find case.

Under heavy lock contention aren't you potentially breaking the ability to
introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
contexts under which acquiring this spinlock is not appropriate?

> +	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> +	if (!va) {
> +		spin_unlock(&vmap_area_lock);
>  		return false;
> +	}
> +
> +	vm = va->vm;
> +	if (!vm) {
> +		spin_unlock(&vmap_area_lock);
> +		return false;
> +	}
> +	addr = (unsigned long)vm->addr;
> +	caller = vm->caller;
> +	nr_pages = vm->nr_pages;
> +	spin_unlock(&vmap_area_lock);
>  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> +		nr_pages, addr, caller);
>  	return true;
>  }
>  #endif
> --
> 2.42.0.283.g2d96d420d3-goog
>
Joel Fernandes Sept. 5, 2023, 11:47 a.m. UTC | #2
On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > It is unsafe to dump vmalloc area information when trying to do so from
> > some contexts. Add a safer trylock version of the same function to do a
> > best-effort VMA finding and use it from vmalloc_dump_obj().
> 
> It'd be nice to have more details as to precisely which contexts and what this
> resolves.

True. I was hoping the 'trylock' mention would be sufficient (for example, a
hardirq context interrupting a region that holds the lock), but you're right.

> > [applied test robot feedback on unused function fix.]
> > [applied Uladzislau feedback on locking.]
> >
> > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Cc: rcu@vger.kernel.org
> > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 93cf99aba335..2c6a0e2ff404 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  #ifdef CONFIG_PRINTK
> >  bool vmalloc_dump_obj(void *object)
> >  {
> > -	struct vm_struct *vm;
> >  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > +	const void *caller;
> > +	struct vm_struct *vm;
> > +	struct vmap_area *va;
> > +	unsigned long addr;
> > +	unsigned int nr_pages;
> >
> > -	vm = find_vm_area(objp);
> > -	if (!vm)
> > +	if (!spin_trylock(&vmap_area_lock))
> > +		return false;
> 
> It'd be good to have a comment here explaining why we must trylock here. I am
> also concerned that in the past this function would return false only if the
> address was not a vmalloc one, but now it might just return false due to lock
> contention and the user has no idea which it is?
> 
> I'd want to at least output "vmalloc region cannot lookup lock contention"
> vs. the below cannot find case.

In patch 2/2 we do print that the address looks like a vmalloc address, even
if the vmalloc lookup fails.

Also, the reporter's use case is not a common one. We only attempt to dump
information if there was a debug objects failure (for example, if somebody did
a double call_rcu()). In such a situation, the patch will prevent a deadlock
and still print something about the address.

> Under heavy lock contention aren't you potentially breaking the ability to
> introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> contexts under which acquiring this spinlock is not appropriate?

Yes, this is a good point, but there's another case as well: on PREEMPT_RT a
contended spinlock can sleep (spinlocks are sleeping locks there), and we can't
sleep from call_rcu() as it may be called in contexts that cannot sleep. So we
handle that with the trylock as well.

Thanks for the review!

 - Joel


> 
> > +	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> > +	if (!va) {
> > +		spin_unlock(&vmap_area_lock);
> >  		return false;
> > +	}
> > +
> > +	vm = va->vm;
> > +	if (!vm) {
> > +		spin_unlock(&vmap_area_lock);
> > +		return false;
> > +	}
> > +	addr = (unsigned long)vm->addr;
> > +	caller = vm->caller;
> > +	nr_pages = vm->nr_pages;
> > +	spin_unlock(&vmap_area_lock);
> >  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> > +		nr_pages, addr, caller);
> >  	return true;
> >  }
> >  #endif
> > --
> > 2.42.0.283.g2d96d420d3-goog
> >
Lorenzo Stoakes Sept. 6, 2023, 7:23 p.m. UTC | #3
On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > It is unsafe to dump vmalloc area information when trying to do so from
> > > some contexts. Add a safer trylock version of the same function to do a
> > > best-effort VMA finding and use it from vmalloc_dump_obj().
> >
> > It'd be nice to have more details as to precisely which contexts and what this
> > resolves.
>
> True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> context interrupting a lock-held region) but you're right.
>
> > > [applied test robot feedback on unused function fix.]
> > > [applied Uladzislau feedback on locking.]
> > >
> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: rcu@vger.kernel.org
> > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 93cf99aba335..2c6a0e2ff404 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > >  #ifdef CONFIG_PRINTK
> > >  bool vmalloc_dump_obj(void *object)
> > >  {
> > > -   struct vm_struct *vm;
> > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > +   const void *caller;
> > > +   struct vm_struct *vm;
> > > +   struct vmap_area *va;
> > > +   unsigned long addr;
> > > +   unsigned int nr_pages;
> > >
> > > -   vm = find_vm_area(objp);
> > > -   if (!vm)
> > > +   if (!spin_trylock(&vmap_area_lock))
> > > +           return false;
> >
> > It'd be good to have a comment here explaining why we must trylock here. I am
> > also concerned that in the past this function would return false only if the
> > address was not a vmalloc one, but now it might just return false due to lock
> > contention and the user has no idea which it is?
> >
> > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > vs. the below cannot find case.
>
> In the patch 2/2 we do print if the address looks like a vmalloc address even
> if the vmalloc look up fails.

No, you output exactly what was output before, only changing what it
means and in no way differentiating between couldn't find vmalloc
area/couldn't get lock.

>
> Also the reporter's usecase is not a common one. We only attempt to dump
> information if there was a debug objects failure (example if somebody did a
> double call_rcu). In such a situation, the patch will prevent a deadlock and
> still print something about the address.

Right, but the function still purports to do X but does Y.

>
> > Under heavy lock contention aren't you potentially breaking the ability to
> > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > contexts under which acquiring this spinlock is not appropriate?
>
> Yes this is a good point, but there's another case as well: PREEMPT_RT can
> sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> call_rcu() as it may be called in contexts that cannot sleep. So we handle
> that also using trylock.

Right so somebody now has to find this email to realise that. I hate
implicit knowledge like this, it needs a comment. It also furthers the
point that it'd be useful to differentiate between the two.

>
> Thanks for the review!

This got merged despite my outstanding comments so I guess I'll have
to follow up with a patch.

>
>  - Joel
>
>
> >
> > > +   va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> > > +   if (!va) {
> > > +           spin_unlock(&vmap_area_lock);
> > >             return false;
> > > +   }
> > > +
> > > +   vm = va->vm;
> > > +   if (!vm) {
> > > +           spin_unlock(&vmap_area_lock);
> > > +           return false;
> > > +   }
> > > +   addr = (unsigned long)vm->addr;
> > > +   caller = vm->caller;
> > > +   nr_pages = vm->nr_pages;
> > > +   spin_unlock(&vmap_area_lock);
> > >     pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > > -           vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> > > +           nr_pages, addr, caller);
> > >     return true;
> > >  }
> > >  #endif
> > > --
> > > 2.42.0.283.g2d96d420d3-goog
> > >

This reads like another 'nice review and I agree but I won't change
anything!'...
Lorenzo Stoakes Sept. 6, 2023, 7:46 p.m. UTC | #4
On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > some contexts. Add a safer trylock version of the same function to do a
> > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > >
> > > It'd be nice to have more details as to precisely which contexts and what this
> > > resolves.
> >
> > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > context interrupting a lock-held region) but you're right.
> >
> > > > [applied test robot feedback on unused function fix.]
> > > > [applied Uladzislau feedback on locking.]
> > > >
> > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: rcu@vger.kernel.org
> > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > >  #ifdef CONFIG_PRINTK
> > > >  bool vmalloc_dump_obj(void *object)
> > > >  {
> > > > -   struct vm_struct *vm;
> > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > +   const void *caller;
> > > > +   struct vm_struct *vm;
> > > > +   struct vmap_area *va;
> > > > +   unsigned long addr;
> > > > +   unsigned int nr_pages;
> > > >
> > > > -   vm = find_vm_area(objp);
> > > > -   if (!vm)
> > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > +           return false;
> > >
> > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > also concerned that in the past this function would return false only if the
> > > address was not a vmalloc one, but now it might just return false due to lock
> > > contention and the user has no idea which it is?
> > >
> > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > vs. the below cannot find case.
> >
> > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > if the vmalloc look up fails.
>
> No, you output exactly what was output before, only changing what it
> means and in no way differentiating between couldn't find vmalloc
> area/couldn't get lock.
>
> >
> > Also the reporter's usecase is not a common one. We only attempt to dump
> > information if there was a debug objects failure (example if somebody did a
> > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > still print something about the address.
>
> Right, but the function still purports to do X but does Y.
>
> >
> > > Under heavy lock contention aren't you potentially breaking the ability to
> > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > contexts under which acquiring this spinlock is not appropriate?
> >
> > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > that also using trylock.
>
> Right so somebody now has to find this email to realise that. I hate
> implicit knowledge like this, it needs a comment. It also furthers the
> point that it'd be useful to differentiate between the two.
>
> >
> > Thanks for the review!
>
> This got merged despite my outstanding comments so I guess I'll have
> to follow up with a patch.
>
> >
> >  - Joel
> >
> >
> > >
> > > > +   va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> > > > +   if (!va) {
> > > > +           spin_unlock(&vmap_area_lock);
> > > >             return false;
> > > > +   }
> > > > +
> > > > +   vm = va->vm;
> > > > +   if (!vm) {
> > > > +           spin_unlock(&vmap_area_lock);
> > > > +           return false;
> > > > +   }
> > > > +   addr = (unsigned long)vm->addr;
> > > > +   caller = vm->caller;
> > > > +   nr_pages = vm->nr_pages;
> > > > +   spin_unlock(&vmap_area_lock);
> > > >     pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > > > -           vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> > > > +           nr_pages, addr, caller);
> > > >     return true;
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.42.0.283.g2d96d420d3-goog
> > > >
>
> This reads like another 'nice review and I agree but I won't change
> anything!'...
>

Sorry I actually wrote this unkind comment in a moment of annoyance then
meant to delete it but of course forgot to :>) Disregard this bit.

Happy for pushback/disagreement, just feel like a few little touchups would
have helped improve documentation/clarity of what this series does.

Obviously stability matters more so perhaps touch-ups best as a follow up
series... though would be nice to have a comment to that effect.

Thanks!
Joel Fernandes Sept. 6, 2023, 10:46 p.m. UTC | #5
On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > some contexts. Add a safer trylock version of the same function to do a
> > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > >
> > > It'd be nice to have more details as to precisely which contexts and what this
> > > resolves.
> >
> > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > context interrupting a lock-held region) but you're right.
> >
> > > > [applied test robot feedback on unused function fix.]
> > > > [applied Uladzislau feedback on locking.]
> > > >
> > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: rcu@vger.kernel.org
> > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > >  #ifdef CONFIG_PRINTK
> > > >  bool vmalloc_dump_obj(void *object)
> > > >  {
> > > > -   struct vm_struct *vm;
> > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > +   const void *caller;
> > > > +   struct vm_struct *vm;
> > > > +   struct vmap_area *va;
> > > > +   unsigned long addr;
> > > > +   unsigned int nr_pages;
> > > >
> > > > -   vm = find_vm_area(objp);
> > > > -   if (!vm)
> > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > +           return false;
> > >
> > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > also concerned that in the past this function would return false only if the
> > > address was not a vmalloc one, but now it might just return false due to lock
> > > contention and the user has no idea which it is?
> > >
> > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > vs. the below cannot find case.
> >
> > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > if the vmalloc look up fails.
> 
> No, you output exactly what was output before, only changing what it
> means and in no way differentiating between couldn't find vmalloc
> area/couldn't get lock.

2/2 does this:
                         -     if (virt_addr_valid(object))
                         +     if (is_vmalloc_addr(object))
                         +             type = "vmalloc memory";
                         +     else if (virt_addr_valid(object))
                                       type = "non-slab/vmalloc memory";

This code is executed only if vmalloc_dump_obj() returns false. The
is_vmalloc_addr() check is newly added by 2/2, right?

You are right we are not differentiating between trylock failure and failure to
find the vmalloc area. I was just saying, even though we don't differentiate,
we do print "vmalloc memory" right? That wasn't being printed before.
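
The branch order in that hunk can be modelled in a few lines. This is a sketch under assumptions: `struct addr_facts` and `fallback_type()` are hypothetical, the two booleans stand in for the results of is_vmalloc_addr() and virt_addr_valid(), and the final return value is a placeholder because the thread does not show the remaining branches.

```c
#include <stdbool.h>

/* Hypothetical inputs: precomputed answers of the two kernel predicates. */
struct addr_facts {
	bool is_vmalloc;	/* models is_vmalloc_addr(object) */
	bool is_valid;		/* models virt_addr_valid(object) */
};

/*
 * Models the type selection that runs only after vmalloc_dump_obj()
 * has returned false, whether from a failed lookup or a busy lock.
 */
const char *fallback_type(const struct addr_facts *f)
{
	if (f->is_vmalloc)
		return "vmalloc memory";		/* branch added by 2/2 */
	if (f->is_valid)
		return "non-slab/vmalloc memory";	/* pre-existing wording */
	return "unclassified (placeholder)";		/* later branches not shown in the thread */
}
```

Because the vmalloc check comes first, an address in the vmalloc range is still labelled "vmalloc memory" when the detailed dump was skipped, which is the behaviour described above.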

> > Also the reporter's usecase is not a common one. We only attempt to dump
> > information if there was a debug objects failure (example if somebody did a
> > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > still print something about the address.
> 
> Right, but the function still purports to do X but does Y.
> 
> >
> > > Under heavy lock contention aren't you potentially breaking the ability to
> > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > contexts under which acquiring this spinlock is not appropriate?
> >
> > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > that also using trylock.
> 
> Right so somebody now has to find this email to realise that. I hate
> implicit knowledge like this, it needs a comment. It also furthers the
> point that it'd be useful to differentiate between the two.

This is a valid point, and I acknowledged it in my last email. A code comment
could indeed be useful.

So I guess from an agreement standpoint, I agree:

1/2 could use an additional comment explaining why we need trylock (citing
the RT sleeping lock issue).

2/2 could update the existing code to convert "non-slab/vmalloc" to
"non-slab/non-vmalloc". Note: that's an *existing* issue.

The issue in 2/2 is not a new one so that can certainly be a separate patch.
And while at it, we could update the comment in that patch as well.

But the whole differentiating between trylock vs vmalloc area lookup failure
is not that useful -- just my opinion fwiw! I honestly feel differentiating
between trylock vs vmalloc area lookup failure complicates the code because
it will require passing this information down from vmalloc_dump_obj() to the
caller AFAICS and I am not sure if the person reading the debug will really
care much. But I am OK with whatever the -mm community wants and I am happy
to send out a new patch on top with the above that I agree on since Andrew
took these 2 (but for the stuff I don't agree, I would appreciate if you
could send a patch for review and I am happy to review it!).
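
To make the trade-off concrete, here is a user-space sketch of what distinguishing the two failures could look like. Everything in it is an assumption for illustration: `enum lookup_result`, `struct region`, `table_trylock()`, `table_unlock()` and `lookup_region()` are hypothetical names, the lock is a C11 `atomic_flag`, and the table is a fixed array rather than the vmap area tree. It is not a proposed patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Three outcomes instead of the bool that vmalloc_dump_obj() returns. */
enum lookup_result {
	LOOKUP_FOUND,
	LOOKUP_NOT_FOUND,
	LOOKUP_LOCK_BUSY,
};

/* Hypothetical region record covering [start, end). */
struct region {
	unsigned long start;
	unsigned long end;
	unsigned int nr_pages;
};

static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static const struct region table[] = {
	{ 0x1000UL, 0x5000UL, 4 },
	{ 0x8000UL, 0xa000UL, 2 },
};

bool table_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&table_lock, memory_order_acquire);
}

void table_unlock(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Copies the matching region into *out while the lock is held. */
enum lookup_result lookup_region(unsigned long addr, struct region *out)
{
	enum lookup_result res = LOOKUP_NOT_FOUND;
	size_t i;

	if (!table_trylock())
		return LOOKUP_LOCK_BUSY;	/* the caller can now tell this case apart */

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (addr >= table[i].start && addr < table[i].end) {
			*out = table[i];
			res = LOOKUP_FOUND;
			break;
		}
	}
	table_unlock();
	return res;
}
```

The cost is the one noted above: the caller has to understand three states rather than two, so the extra result must be threaded through to wherever the message is printed.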

As you mentioned, this series is a stability fix and we can put touch-ups on
top of it if needed, and there is also plenty of time till the next merge
window. Allow me a few days and I'll do the new patch on top (I'd say don't
bother to spend your time on it, I'll do it).

thanks,

 - Joel


> 
> 
> -- 
> Lorenzo Stoakes
> https://ljs.io
Vlastimil Babka Sept. 7, 2023, 6:53 a.m. UTC | #6
Hi,

On 9/4/23 20:08, Joel Fernandes (Google) wrote:
> It is unsafe to dump vmalloc area information when trying to do so from
> some contexts. Add a safer trylock version of the same function to do a
> best-effort VMA finding and use it from vmalloc_dump_obj().

I was a bit confused by the subject which suggests a new function is added,
but it seems open-coded in its only caller. I assume it's due to evolution
of the series. Something like:

mm/vmalloc: use trylock for vmap_area_lock in vmalloc_dump_obj()

?

I also notice it's trying hard to copy everything from "vm" to temporary
variables before unlocking, presumably to prevent use-after-free, so should
that be also mentioned in the changelog?

> [applied test robot feedback on unused function fix.]
> [applied Uladzislau feedback on locking.]
> 
> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: rcu@vger.kernel.org
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> Cc: stable@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  mm/vmalloc.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..2c6a0e2ff404 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  #ifdef CONFIG_PRINTK
>  bool vmalloc_dump_obj(void *object)
>  {
> -	struct vm_struct *vm;
>  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> +	const void *caller;
> +	struct vm_struct *vm;
> +	struct vmap_area *va;
> +	unsigned long addr;
> +	unsigned int nr_pages;
>  
> -	vm = find_vm_area(objp);
> -	if (!vm)
> +	if (!spin_trylock(&vmap_area_lock))
> +		return false;
> +	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> +	if (!va) {
> +		spin_unlock(&vmap_area_lock);
>  		return false;
> +	}
> +
> +	vm = va->vm;
> +	if (!vm) {
> +		spin_unlock(&vmap_area_lock);
> +		return false;
> +	}
> +	addr = (unsigned long)vm->addr;
> +	caller = vm->caller;
> +	nr_pages = vm->nr_pages;
> +	spin_unlock(&vmap_area_lock);
>  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> +		nr_pages, addr, caller);
>  	return true;
>  }
>  #endif
Lorenzo Stoakes Sept. 7, 2023, 7:11 a.m. UTC | #7
On Wed, Sep 06, 2023 at 10:46:08PM +0000, Joel Fernandes wrote:
> On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> > On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > > some contexts. Add a safer trylock version of the same function to do a
> > > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > > >
> > > > It'd be nice to have more details as to precisely which contexts and what this
> > > > resolves.
> > >
> > > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > > context interrupting a lock-held region) but you're right.
> > >
> > > > > [applied test robot feedback on unused function fix.]
> > > > > [applied Uladzislau feedback on locking.]
> > > > >
> > > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > > Cc: rcu@vger.kernel.org
> > > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > > >  #ifdef CONFIG_PRINTK
> > > > >  bool vmalloc_dump_obj(void *object)
> > > > >  {
> > > > > -   struct vm_struct *vm;
> > > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > > +   const void *caller;
> > > > > +   struct vm_struct *vm;
> > > > > +   struct vmap_area *va;
> > > > > +   unsigned long addr;
> > > > > +   unsigned int nr_pages;
> > > > >
> > > > > -   vm = find_vm_area(objp);
> > > > > -   if (!vm)
> > > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > > +           return false;
> > > >
> > > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > > also concerned that in the past this function would return false only if the
> > > > address was not a vmalloc one, but now it might just return false due to lock
> > > > contention and the user has no idea which it is?
> > > >
> > > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > > vs. the below cannot find case.
> > >
> > > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > > if the vmalloc look up fails.
> >
> > No, you output exactly what was output before, only changing what it
> > means and in no way differentiating between couldn't find vmalloc
> > area/couldn't get lock.
>
> 2/2 does this:
>                          -     if (virt_addr_valid(object))
>                          +     if (is_vmalloc_addr(object))
>                          +             type = "vmalloc memory";
>                          +     else if (virt_addr_valid(object))
>                                        type = "non-slab/vmalloc memory";
>
> This code is executed only if vmalloc_dump_obj() returns false. The
> is_vmalloc_addr() was added by 2/2 which is newly added right?
>
> You are right we are not differentiating between trylock failure and failure to
> find the vmalloc area. I was just saying, even though we don't differentiate,
> we do print "vmalloc memory" right? That wasn't being printed before.
>
> > > Also the reporter's usecase is not a common one. We only attempt to dump
> > > information if there was a debug objects failure (example if somebody did a
> > > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > > still print something about the address.
> >
> > Right, but the function still purports to do X but does Y.
> >
> > >
> > > > Under heavy lock contention aren't you potentially breaking the ability to
> > > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > > contexts under which acquiring this spinlock is not appropriate?
> > >
> > > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > > that also using trylock.
> >
> > Right so somebody now has to find this email to realise that. I hate
> > implicit knowledge like this, it needs a comment. It also furthers the
> > point that it'd be useful to differentiate between the two.
>
> This is a valid point, and I acknowledged it in last email. A code comment could
> indeed be useful.

Thanks, yeah this may seem trivial, but I am quite sensitive about things
being added to the code base that are neither described in commit msg nor
in a comment or elsewhere and become 'implicit' in a sense.

So just a simple comment here would be helpful, and I'm glad we're in
agreement on that, will leave to you to do a follow up patch.

>
> So I guess from an agreement standpoint, I agree:
>
> 1/2 could use an additional comment explaining why we need trylock (citing
> the RT sleeping lock issue).
>
> 2/2 could update the existing code to convert "non-slab/vmalloc" to
> "non-slab/non-vmalloc". Note: that's an *existing* issue.

Yeah sorry this whole thing was rather confusing, it did indeed (unclearly)
specify non-/non- in the past (on assumption dumping function would work),
addition of vmalloc check now makes that correct again, the phrasing is the
issue.

You can leave this as-is as yeah, you're right, this was a pre-existing issue.

virt_addr_valid() returns true for a slab addr, but kmem_valid_obj() is
checked above so that has already been ruled out, and now you have ruled out
vmalloc too.

Just a bit tricksy.
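
The ordering described above can be sketched in plain userspace C. This is only
an illustration, not the kernel's mem_dump_obj(): struct addr_facts and
classify() are hypothetical names, each flag stands in for what the matching
kernel helper would report, and the final "other" label is a placeholder for
the remaining cases, which are not discussed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: one flag per kernel predicate result. */
struct addr_facts {
	bool slab_obj;       /* what kmem_valid_obj() would report */
	bool vmalloc_dumped; /* vmalloc_dump_obj() found and printed the area */
	bool vmalloc_addr;   /* what is_vmalloc_addr() would report */
	bool virt_valid;     /* what virt_addr_valid() would report */
};

/*
 * Returns the fallback label for the address, or NULL when an earlier,
 * more detailed dump has already described it.
 */
const char *classify(const struct addr_facts *f)
{
	if (f->slab_obj)
		return NULL;	/* slab ruled out first */
	if (f->vmalloc_dumped)
		return NULL;	/* full vmalloc details were printed */
	if (f->vmalloc_addr)
		return "vmalloc memory";	/* lookup or trylock failed */
	if (f->virt_valid)
		return "non-slab/vmalloc memory"; /* i.e. neither slab nor vmalloc */
	return "other";	/* placeholder for the remaining cases */
}
```

By the time the virt_addr_valid() branch is reached, both slab and vmalloc have
been excluded, which is why the existing label reads as "non-slab, non-vmalloc"
even though it is spelled "non-slab/vmalloc".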

>
> The issue in 2/2 is not a new one so that can certainly be a separate patch.
> And while at it, we could update the comment in that patch as well.
>
> But the whole differentiating between trylock vs vmalloc area lookup failure
> is not that useful -- just my opinion fwiw! I honestly feel differentiating
> between trylock vs vmalloc area lookup failure complicates the code because
> it will require passing this information down from vmalloc_dump_obj() to the
> caller AFAICS and I am not sure if the person reading the debug will really
> care much. But I am OK with whatever the -mm community wants and I am happy
> to send out a new patch on top with the above that I agree on since Andrew
> took these 2 (but for the stuff I don't agree, I would appreciate if you
> could send a patch for review and I am happy to review it!).

Ah right, I think maybe I wasn't clear, all I meant to suggest is to output
log output rather than feed anything back to caller, something like:-

if (!spin_trylock(&vmap_area_lock)) {
	pr_cont(" [couldn't acquire vmap lock]\n");
	...
}

My concern is that in the past this function would only return false if it
couldn't find the address in a VA, now it returns false also if you happen
to call it when the spinlock is locked, which might be confusing for
somebody debugging this.

HOWEVER, since you now indicate that the address is vmalloc anyway, and you
_absolutely cannot_ give any further details safely, perhaps this
additional information is indeed not that useful.

My concern was just feeling antsy that we suddenly don't do something
because a lock happens to be applied but as you say that cannot be helped
in certain contexts.

So actually, leave this.

>
> As you mentioned, this series is a stability fix and we can put touch-ups on
> top of it if needed, and there is also plenty of time till the next merge
> window. Allow me a few days and I'll do the new patch on top (I'd say don't
> bother to spend your time on it, I'll do it).

Ack, I was just a little frustrated we didn't reach a resolution on review
(either deciding things could be deferred or having changes) before
merge. Obviously fine to prioritise, but would be good to have that
explicitly stated.

>
> thanks,
>
>  - Joel
>
>
> >
> >

Anyway, so TL;DR:-

1. As we both agree, add a comment to explain why you need the spin trylock.
(there are no further steps :P)

And I don't believe this actually needs any further changes after this
discussion*, so if you fancy doing a follow up to that effect that will
suffice for me thanks!

* Though I strongly feel vmalloc as a whole needs top-to-bottom
  refactoring, but that's another story...
Uladzislau Rezki Sept. 7, 2023, 9:23 a.m. UTC | #8
On Thu, Sep 07, 2023 at 08:11:48AM +0100, Lorenzo Stoakes wrote:
> On Wed, Sep 06, 2023 at 10:46:08PM +0000, Joel Fernandes wrote:
> > On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> > > On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > > > some contexts. Add a safer trylock version of the same function to do a
> > > > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > > > >
> > > > > It'd be nice to have more details as to precisely which contexts and what this
> > > > > resolves.
> > > >
> > > > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > > > context interrupting a lock-held region) but you're right.
> > > >
> > > > > > [applied test robot feedback on unused function fix.]
> > > > > > [applied Uladzislau feedback on locking.]
> > > > > >
> > > > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > > > Cc: rcu@vger.kernel.org
> > > > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > > > >  #ifdef CONFIG_PRINTK
> > > > > >  bool vmalloc_dump_obj(void *object)
> > > > > >  {
> > > > > > -   struct vm_struct *vm;
> > > > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > > > +   const void *caller;
> > > > > > +   struct vm_struct *vm;
> > > > > > +   struct vmap_area *va;
> > > > > > +   unsigned long addr;
> > > > > > +   unsigned int nr_pages;
> > > > > >
> > > > > > -   vm = find_vm_area(objp);
> > > > > > -   if (!vm)
> > > > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > > > +           return false;
> > > > >
> > > > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > > > also concerned that in the past this function would return false only if the
> > > > > address was not a vmalloc one, but now it might just return false due to lock
> > > > > contention and the user has no idea which it is?
> > > > >
> > > > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > > > vs. the below cannot find case.
> > > >
> > > > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > > > if the vmalloc look up fails.
> > >
> > > No, you output exactly what was output before, only changing what it
> > > means and in no way differentiating between couldn't find vmalloc
> > > area/couldn't get lock.
> >
> > 2/2 does this:
> >                          -     if (virt_addr_valid(object))
> >                          +     if (is_vmalloc_addr(object))
> >                          +             type = "vmalloc memory";
> >                          +     else if (virt_addr_valid(object))
> >                                        type = "non-slab/vmalloc memory";
> >
> > This code is executed only if vmalloc_dump_obj() returns false. The
> > is_vmalloc_addr() was added by 2/2 which is newly added right?
> >
> > You are right we are not differentiating between trylock failure and failure to
> > find the vmalloc area. I was just saying, even though we don't differentiate,
> > we do print "vmalloc memory" right? That wasn't being printed before.
> >
> > > > Also the reporter's usecase is not a common one. We only attempt to dump
> > > > information if there was a debug objects failure (example if somebody did a
> > > > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > > > still print something about the address.
> > >
> > > Right, but the function still purports to do X but does Y.
> > >
> > > >
> > > > > Under heavy lock contention aren't you potentially breaking the ability to
> > > > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > > > contexts under which acquiring this spinlock is not appropriate?
> > > >
> > > > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > > > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > > > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > > > that also using trylock.
> > >
> > > Right so somebody now has to find this email to realise that. I hate
> > > implicit knowledge like this, it needs a comment. It also furthers the
> > > point that it'd be useful to differentiate between the two.
> >
> > This is a valid point, and I acknowledged it in last email. A code comment could
> > indeed be useful.
> 
> Thanks, yeah this may seem trivial, but I am quite sensitive about things
> being added to the code base that are neither described in commit msg nor
> in a comment or elsewhere and become 'implicit' in a sense.
> 
> So just a simple comment here would be helpful, and I'm glad we're in
> agreement on that, will leave to you to do a follow up patch.
> 
> >
> > So I guess from an agreement standpoint, I agree:
> >
> > 1/2 could use an additional comment explaining why we need trylock (citing
> > the RT sleeping lock issue).
> >
> > 2/2 could update the existing code to convert "non-slab/vmalloc" to
> > "non-slab/non-vmalloc". Note: that's an *existing* issue.
> 
> Yeah sorry this whole thing was rather confusing, it did indeed (unclearly)
> specify non-/non- in the past (on assumption dumping function would work),
> addition of vmalloc check now makes that correct again, the phrasing is the
> issue.
> 
> You can leave this as-is as yeah, you're right, this was a pre-existing issue.
> 
> virt_addr_valid() returns true for a slab addr, but kmem_valid_obj() is
> checked above so already been ruled out, now you ruled out vmalloc.
> 
> Just a bit tricksy.
> 
> >
> > The issue in 2/2 is not a new one so that can certainly be a separate patch.
> > And while at it, we could update the comment in that patch as well.
> >
> > But the whole differentiating between trylock vs vmalloc area lookup failure
> > is not that useful -- just my opinion fwiw! I honestly feel differentiating
> > between trylock vs vmalloc area lookup failure complicates the code because
> > it will require passing this information down from vmalloc_dump_obj() to the
> > caller AFAICS and I am not sure if the person reading the debug will really
> > care much. But I am OK with whatever the -mm community wants and I am happy
> > to send out a new patch on top with the above that I agree on since Andrew
> > took these 2 (but for the stuff I don't agree, I would appreciate if you
> > could send a patch for review and I am happy to review it!).
> 
> Ah right, I think maybe I wasn't clear, all I meant to suggest is to output
> log output rather than feed anything back to caller, something like:-
> 
> if (!spin_trylock(&vmap_area_lock)) {
>         pr_cont(" [couldn't acquire vmap lock]\n");
> 	...
> }
> 
> My concern is that in the past this function would only return false if it
> couldn't find the address in a VA, now it returns false also if you happen
> to call it when the spinlock is locked, which might be confusing for
> somebody debugging this.
> 
> HOWEVER, since you now indicate that the address is vmalloc anyway, and you
> _absolutely cannot_ give any further details safely, perhaps this
> additional information is indeed not that useful.
> 
> My concern was just feeling antsy that we suddenly don't do something
> because a lock happens to be applied but as you say that cannot be helped
> in certain contexts.
> 
> So actually, leave this.
> 
> >
> > As you mentioned, this series is a stability fix and we can put touch-ups on
> > top of it if needed, and there is also plenty of time till the next merge
> > window. Allow me a few days and I'll do the new patch on top (I'd say don't
> > bother to spend your time on it, I'll do it).
> 
> Ack, I was just a little frustrated we didn't reach a resolution on review
> (either deciding things could be deferred or having changes) before
> merge. Obviously fine to prioritise, but would be good to have that
> explicitly stated.
> 
> >
> > thanks,
> >
> >  - Joel
> >
> >
> > >
> > >
> 
> Anyway, so TL;DR:-
> 
> 1. As we both agree, add a comment to explain why you need the spin trylock.
> (there are no further steps :P)
> 
> And I don't believe this actually needs any further changes after this
> discussion*, so if you fancy doing a follow up to that effect that will
> suffice for me thanks!
> 
For PREEMPT_RT kernels we are not allowed to use "vmap parts" in a
non-sleepable context. That is just reality, because there the spinlock
is a sleeping lock.

I am not sure how urgently we need this fix. To me it looks like a
debugging corner case. Probably I am wrong and missing something.
But if that is correct, I would just bail out for RT kernels and rework
it later in a more proper way, for example by implementing a safe RCU
scan, but that is another story.

--
Uladzislau Rezki
Joel Fernandes Sept. 8, 2023, 12:18 a.m. UTC | #9
On Thu, Sep 07, 2023 at 11:23:40AM +0200, Uladzislau Rezki wrote:
> On Thu, Sep 07, 2023 at 08:11:48AM +0100, Lorenzo Stoakes wrote:
[..]
> > Anyway, so TL;DR:-
> > 
> > 1. As we both agree, add a comment to explain why you need the spin trylock.
> > (there are no further steps :P)
> > 
> > And I don't believe this actually needs any further changes after this
> > discussion*, so if you fancy doing a follow up to that effect that will
> > suffice for me thanks!

Thanks.

> For PREEMPT_RT kernels we are not allowed to use "vmap parts" in a
> non-sleepable context. That is just reality, because there the spinlock
> is a sleeping lock.
> 
> I am not sure how urgently we need this fix. To me it looks like a
> debugging corner case. Probably I am wrong and missing something.
> But if that is correct, I would just bail out for RT kernels and rework
> it later in a more proper way, for example by implementing a safe RCU
> scan, but that is another story.

Bailing out for RT kernels is insufficient, as we need the trylock() to avoid
self-deadlock on !PREEMPT_RT as well. Plus IIRC there was opposition in the
past to special-casing PREEMPT_RT in code. Admittedly those PREEMPT_RT cases
were about detecting a preempt-disabled section rather than a lock-held
section though.

We could optionally do a trylock() loop and bail out after a certain number of
tries as well, but that would complicate the code a bit more and I am not
sure it is worth it. Still, if you guys feel strongly about doing something
like that, let me know and I can give it a try :).

thanks,

 - Joel
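
For illustration, the "trylock() loop + bail out" idea mentioned above might
look roughly like the following. This is a userspace sketch using a pthread
mutex in place of vmap_area_lock, not kernel code; trylock_bounded() and
MAX_TRIES are hypothetical names, and the retry budget of 3 is an arbitrary
assumption since no number was proposed in the thread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical retry budget; the thread does not propose a specific value. */
#define MAX_TRIES 3

/*
 * Attempt to take the lock up to MAX_TRIES times without ever blocking.
 * Returns true with the lock held, or false if every attempt failed.
 */
bool trylock_bounded(pthread_mutex_t *lock)
{
	for (int i = 0; i < MAX_TRIES; i++) {
		/* pthread_mutex_trylock() returns 0 on success, EBUSY if held. */
		if (pthread_mutex_trylock(lock) == 0)
			return true;
	}
	return false;
}
```

Note that retrying can only help with contention from another CPU. In the
self-deadlock case (an IRQ handler interrupting the lock holder on the same
CPU) the holder cannot make progress while the handler spins, so every
attempt would fail and the bail-out path is what keeps it safe.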
Joel Fernandes Sept. 8, 2023, 12:47 a.m. UTC | #10
On Thu, Sep 07, 2023 at 08:53:14AM +0200, Vlastimil Babka wrote:
> Hi,
> 
> On 9/4/23 20:08, Joel Fernandes (Google) wrote:
> > It is unsafe to dump vmalloc area information when trying to do so from
> > some contexts. Add a safer trylock version of the same function to do a
> > best-effort VMA finding and use it from vmalloc_dump_obj().
> 
> I was a bit confused by the subject which suggests a new function is added,
> but it seems open-coded in its only caller. I assume it's due to evolution
> of the series. Something like:
> 
> mm/vmalloc: use trylock for vmap_area_lock in vmalloc_dump_obj()
> 
> ?
> 
> I also notice it's trying hard to copy everything from "vm" to temporary
> variables before unlocking, presumably to prevent use-after-free, so should
> that be also mentioned in the changelog?

Apologies for the less-than-ideal changelog. Andrew would you mind replacing
the merged patch with the below one instead? It just contains non-functional
changes to change log and an additional code comment/print. Thanks!

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH v3.1] mm/vmalloc: Add a safer inlined version of find_vm_area() for
 debug

It is unsafe to dump vmalloc area information when trying to do so from
some contexts such as PREEMPT_RT or from an IRQ handler that interrupted
a vmap_area_lock-held region. Add a safer and inlined trylock version of
find_vm_area() to do a best-effort VMA finding and use it from
vmalloc_dump_obj().

While the vmap_area_lock is held, copy interesting attributes from the
vm_struct before unlocking.

[applied test robot feedback on unused function fix.]
[applied Uladzislau feedback on locking.]
[applied Vlastimil and Lorenzo feedback on changelog, comment and print
improvements]

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Cc: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/vmalloc.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..990a0d5efba8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,40 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;
+
+	/*
+	 * Use trylock as we don't want to contend since this is debug code and
+	 * we might run this code in contexts like PREEMPT_RT where spinlock
+	 * contention may result in sleeping, or from an IRQ handler which
+	 * might interrupt a vmap_area_lock-held critical section.
+	 */
+	if (!spin_trylock(&vmap_area_lock)) {
+		pr_cont(" [couldn't acquire vmap_area_lock]\n");
+		return false;
+	}
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
 
-	vm = find_vm_area(objp);
-	if (!vm)
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif
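
The pattern used in the patch above (trylock, copy the interesting fields while
the lock is held, unlock, then print from the copies) can be shown in a small
userspace analogy. Everything here is an assumption made for illustration:
struct region, table_lock, current_region and dump_region() are hypothetical
names, and a pthread mutex stands in for vmap_area_lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields copied out of vm_struct. */
struct region {
	unsigned long addr;
	unsigned int nr_pages;
	const char *caller;
};

pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
struct region *current_region;	/* protected by table_lock */

/* Best-effort dump: never blocks, never touches the region after unlock. */
bool dump_region(void)
{
	struct region snap;

	if (pthread_mutex_trylock(&table_lock) != 0) {
		printf(" [couldn't acquire table_lock]\n");
		return false;
	}
	if (!current_region) {
		pthread_mutex_unlock(&table_lock);
		return false;
	}
	/* Copy while the lock is held so a later free cannot be observed. */
	snap = *current_region;
	pthread_mutex_unlock(&table_lock);

	printf(" %u-page region starting at %#lx allocated at %s\n",
	       snap.nr_pages, snap.addr, snap.caller);
	return true;
}
```

Printing from the snapshot rather than through the pointer is what avoids a
use-after-free if the region is released right after the unlock.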

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..2c6a0e2ff404 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,32 @@  void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;
 
-	vm = find_vm_area(objp);
-	if (!vm)
+	if (!spin_trylock(&vmap_area_lock))
+		return false;
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif