diff mbox series

mm: ratelimit oversized kvmalloc warnings instead of once

Message ID 20240618213421.282381-1-shakeel.butt@linux.dev (mailing list archive)
State New
Headers show
Series mm: ratelimit oversized kvmalloc warnings instead of once | expand

Commit Message

Shakeel Butt June 18, 2024, 9:34 p.m. UTC
At the moment oversize kvmalloc warnings are triggered once using
WARN_ON_ONCE() macro. One issue with this approach is that it only
detects the first abuser and then ignores the remaining abusers which
complicates detecting all such abusers in a timely manner. The situation
becomes worse when the repro has low probability and requires production
traffic and thus require large set of machines to find such abusers. In
Mera production, this warn once is slowing down the detection of these
abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.

Reported-by: Kyle McMartin <kyle@infradead.org>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/util.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Linus Torvalds June 18, 2024, 9:40 p.m. UTC | #1
On Tue, 18 Jun 2024 at 14:34, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> Simply replace WARN_ON_ONCE with WARN_RATELIMIT.

NAK.

Sadly, the RATELIMIT cases are useless.

The normal rate limiting is basically "burst of up to ten, every five seconds".

That's going to completely swamp things and hide any other issue.

If we ratelimit it to "at most 1 per hour", maybe something like that
would be acceptable.

But honestly, I do not understand your "first abuser only" complaint.
There should not be *any* abusers. So just fix that first one already.

If you have more than one, you have bigger issues. So what is the real
reason for this broken patch? Why didn't you fix the first one?

            Linus
Shakeel Butt June 18, 2024, 9:44 p.m. UTC | #2
On Tue, Jun 18, 2024 at 02:40:10PM GMT, Linus Torvalds wrote:
> On Tue, 18 Jun 2024 at 14:34, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> 
> NAK.
> 
> Sadly, the RATELIMIT cases are useless.
> 
> The normal rate limiting is basically "burst of up to ten, every five seconds".
> 
> That's going to completely swamp things and hide any other issue.
> 
> If we ratelimit it to "at most 1 per hour", maybe something like that
> would be acceptable.

"at most 1 per hour" sounds good.

> 
> But honestly, I do not understand your "first abuser only" complaint.
> There should not be *any* abusers. So just fix that first one already.
> 
> If you have more than one, you have bigger issues. So what is the real
> reason for this broken patch? Why didn't you fix the first one?
> 

The issue is the turnaround time to fix the first abuser and deploy the
fixed kernel to big enough fleet to find the remaining abusers. Usually
this turnaround time is in months.
Michal Hocko June 19, 2024, 7:19 a.m. UTC | #3
On Tue 18-06-24 14:34:21, Shakeel Butt wrote:
> At the moment oversize kvmalloc warnings are triggered once using
> WARN_ON_ONCE() macro. One issue with this approach is that it only
> detects the first abuser and then ignores the remaining abusers which
> complicates detecting all such abusers in a timely manner. The situation
> becomes worse when the repro has low probability and requires production
> traffic and thus require large set of machines to find such abusers. In
> Mera production, this warn once is slowing down the detection of these
> abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.

Long time ago, I've had a patch to do the once_per_callsite WARN. I
cannot find reference at the moment but it used stack depot to note
stacks that have already triggered. Back then there was no reponse on
the ML. Should I try to dig deep and recover it from my archives? I
think this is exactly kind of usecase where it would fit.

> Reported-by: Kyle McMartin <kyle@infradead.org>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  mm/util.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 10f215985fe5..de36344e8d53 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -649,7 +649,8 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
>  
>  	/* Don't even allow crazy sizes */
>  	if (unlikely(size > INT_MAX)) {
> -		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
> +		WARN_RATELIMIT(!(flags & __GFP_NOWARN), "size = %zu > INT_MAX",
> +			       size);
>  		return NULL;
>  	}
>  
> -- 
> 2.43.0
Shakeel Butt June 19, 2024, 8:03 a.m. UTC | #4
On Wed, Jun 19, 2024 at 09:19:41AM GMT, Michal Hocko wrote:
> On Tue 18-06-24 14:34:21, Shakeel Butt wrote:
> > At the moment oversize kvmalloc warnings are triggered once using
> > WARN_ON_ONCE() macro. One issue with this approach is that it only
> > detects the first abuser and then ignores the remaining abusers which
> > complicates detecting all such abusers in a timely manner. The situation
> > becomes worse when the repro has low probability and requires production
> > traffic and thus require large set of machines to find such abusers. In
> > Mera production, this warn once is slowing down the detection of these
> > abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> 
> Long time ago, I've had a patch to do the once_per_callsite WARN. I
> cannot find reference at the moment but it used stack depot to note
> stacks that have already triggered. Back then there was no reponse on
> the ML. Should I try to dig deep and recover it from my archives? I
> think this is exactly kind of usecase where it would fit.
> 

Do you mean something like warn once per unique call stack? If yes then
I think that is better than the simple ratelimiting version as
ratelimiting one may still miss some abusers and also may keep warning
about the same abuser. Please do share your patch.

Thanks,
Shakeel
Michal Hocko June 19, 2024, 8:30 a.m. UTC | #5
On Wed 19-06-24 01:03:16, Shakeel Butt wrote:
> On Wed, Jun 19, 2024 at 09:19:41AM GMT, Michal Hocko wrote:
> > On Tue 18-06-24 14:34:21, Shakeel Butt wrote:
> > > At the moment oversize kvmalloc warnings are triggered once using
> > > WARN_ON_ONCE() macro. One issue with this approach is that it only
> > > detects the first abuser and then ignores the remaining abusers which
> > > complicates detecting all such abusers in a timely manner. The situation
> > > becomes worse when the repro has low probability and requires production
> > > traffic and thus require large set of machines to find such abusers. In
> > > Mera production, this warn once is slowing down the detection of these
> > > abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> > 
> > Long time ago, I've had a patch to do the once_per_callsite WARN. I
> > cannot find reference at the moment but it used stack depot to note
> > stacks that have already triggered. Back then there was no reponse on
> > the ML. Should I try to dig deep and recover it from my archives? I
> > think this is exactly kind of usecase where it would fit.
> > 
> 
> Do you mean something like warn once per unique call stack?

Exactly!

> If yes then
> I think that is better than the simple ratelimiting version as
> ratelimiting one may still miss some abusers and also may keep warning
> about the same abuser. Please do share your patch.

https://lore.kernel.org/all/20170103134424.28123-1-mhocko@kernel.org/
Shakeel Butt June 19, 2024, 8:37 a.m. UTC | #6
On Wed, Jun 19, 2024 at 10:30:46AM GMT, Michal Hocko wrote:
> On Wed 19-06-24 01:03:16, Shakeel Butt wrote:
> > On Wed, Jun 19, 2024 at 09:19:41AM GMT, Michal Hocko wrote:
> > > On Tue 18-06-24 14:34:21, Shakeel Butt wrote:
> > > > At the moment oversize kvmalloc warnings are triggered once using
> > > > WARN_ON_ONCE() macro. One issue with this approach is that it only
> > > > detects the first abuser and then ignores the remaining abusers which
> > > > complicates detecting all such abusers in a timely manner. The situation
> > > > becomes worse when the repro has low probability and requires production
> > > > traffic and thus require large set of machines to find such abusers. In
> > > > Mera production, this warn once is slowing down the detection of these
> > > > abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> > > 
> > > Long time ago, I've had a patch to do the once_per_callsite WARN. I
> > > cannot find reference at the moment but it used stack depot to note
> > > stacks that have already triggered. Back then there was no reponse on
> > > the ML. Should I try to dig deep and recover it from my archives? I
> > > think this is exactly kind of usecase where it would fit.
> > > 
> > 
> > Do you mean something like warn once per unique call stack?
> 
> Exactly!
> 
> > If yes then
> > I think that is better than the simple ratelimiting version as
> > ratelimiting one may still miss some abusers and also may keep warning
> > about the same abuser. Please do share your patch.
> 
> https://lore.kernel.org/all/20170103134424.28123-1-mhocko@kernel.org/

Do you want to propose this patch again (after rebase to latest) or
should I take over?
Michal Hocko June 19, 2024, 8:48 a.m. UTC | #7
On Wed 19-06-24 10:30:46, Michal Hocko wrote:
> On Wed 19-06-24 01:03:16, Shakeel Butt wrote:
> > On Wed, Jun 19, 2024 at 09:19:41AM GMT, Michal Hocko wrote:
> > > On Tue 18-06-24 14:34:21, Shakeel Butt wrote:
> > > > At the moment oversize kvmalloc warnings are triggered once using
> > > > WARN_ON_ONCE() macro. One issue with this approach is that it only
> > > > detects the first abuser and then ignores the remaining abusers which
> > > > complicates detecting all such abusers in a timely manner. The situation
> > > > becomes worse when the repro has low probability and requires production
> > > > traffic and thus require large set of machines to find such abusers. In
> > > > Mera production, this warn once is slowing down the detection of these
> > > > abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> > > 
> > > Long time ago, I've had a patch to do the once_per_callsite WARN. I
> > > cannot find reference at the moment but it used stack depot to note
> > > stacks that have already triggered. Back then there was no reponse on
> > > the ML. Should I try to dig deep and recover it from my archives? I
> > > think this is exactly kind of usecase where it would fit.
> > > 
> > 
> > Do you mean something like warn once per unique call stack?
> 
> Exactly!
> 
> > If yes then
> > I think that is better than the simple ratelimiting version as
> > ratelimiting one may still miss some abusers and also may keep warning
> > about the same abuser. Please do share your patch.
> 
> https://lore.kernel.org/all/20170103134424.28123-1-mhocko@kernel.org/

Btw. the code has changed a lot since 2017 when this was posted so it
will likely need a lot of massaging to rebase. Also I am not entirely
sure it is ok to change WARN_ONCE semantic like that anymore. Maybe we
need an explicit variant that does this per-call-site warnings. It is a
notable difference between library functions which can be called from
different callpaths and those that are used from a single place. I do
not have much time to dig deeper into this but if you want to take over
then go ahead. I still think this is a useful WARN_ONCE or in general
do_something_once semantic.
Mateusz Guzik June 19, 2024, 12:49 p.m. UTC | #8
On Tue, Jun 18, 2024 at 02:34:21PM -0700, Shakeel Butt wrote:
> At the moment oversize kvmalloc warnings are triggered once using
> WARN_ON_ONCE() macro. One issue with this approach is that it only
> detects the first abuser and then ignores the remaining abusers which
> complicates detecting all such abusers in a timely manner. The situation
> becomes worse when the repro has low probability and requires production
> traffic and thus require large set of machines to find such abusers. In
> Mera production, this warn once is slowing down the detection of these
> abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> 
> Reported-by: Kyle McMartin <kyle@infradead.org>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  mm/util.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 10f215985fe5..de36344e8d53 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -649,7 +649,8 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
>  
>  	/* Don't even allow crazy sizes */
>  	if (unlikely(size > INT_MAX)) {
> -		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
> +		WARN_RATELIMIT(!(flags & __GFP_NOWARN), "size = %zu > INT_MAX",
> +			       size);
>  		return NULL;
>  	}
>  

I don't think this is necessary. From the description I think interested
parties can get away with bpftrace.

Suppose you have an abuser of the sort and you are worried there is more
than one.

Then this one-liner will catch *all* of them, not just the ones which
were "lucky" to get logged with ratelimit:
bpftrace -e 'kprobe:kvmalloc_node_noprof /arg0 > 2147483647/ { @[kstack()] = count(); }'

Of course adding a probe is not free, but then again kvmalloc should not
be used often to begin with so I doubt it is going to have material
impact in terms of performance.

While I concede it takes more effort to get this running on all affected
machines, the result is much better than mere ratelimit. Also there is
no need to patch the kernel.

btw, I found drm keeps spamming kvmalloc, someone(tm) should look into
it:
@[
    kvmalloc_node_noprof+5
    drm_property_create_blob+76
    drm_atomic_helper_dirtyfb+234
    drm_fbdev_generic_helper_fb_dirty+509
    drm_fb_helper_damage_work+139
    process_one_work+376
    worker_thread+753
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
, 104]: 12
Mateusz Guzik June 19, 2024, 12:51 p.m. UTC | #9
On Wed, Jun 19, 2024 at 2:49 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 02:34:21PM -0700, Shakeel Butt wrote:
> > At the moment oversize kvmalloc warnings are triggered once using
> > WARN_ON_ONCE() macro. One issue with this approach is that it only
> > detects the first abuser and then ignores the remaining abusers which
> > complicates detecting all such abusers in a timely manner. The situation
> > becomes worse when the repro has low probability and requires production
> > traffic and thus require large set of machines to find such abusers. In
> > Mera production, this warn once is slowing down the detection of these
> > abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> >
> > Reported-by: Kyle McMartin <kyle@infradead.org>
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> >  mm/util.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/util.c b/mm/util.c
> > index 10f215985fe5..de36344e8d53 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -649,7 +649,8 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
> >
> >       /* Don't even allow crazy sizes */
> >       if (unlikely(size > INT_MAX)) {
> > -             WARN_ON_ONCE(!(flags & __GFP_NOWARN));
> > +             WARN_RATELIMIT(!(flags & __GFP_NOWARN), "size = %zu > INT_MAX",
> > +                            size);
> >               return NULL;
> >       }
> >
>
> I don't think this is necessary. From the description I think interested
> parties can get away with bpftrace.
>
> Suppose you have an abuser of the sort and you are worried there is more
> than one.
>
> Then this one-liner will catch *all* of them, not just the ones which
> were "lucky" to get logged with ratelimit:
> bpftrace -e 'kprobe:kvmalloc_node_noprof /arg0 > 2147483647/ { @[kstack()] = count(); }'
>
> Of course adding a probe is not free, but then again kvmalloc should not
> be used often to begin with so I doubt it is going to have material
> impact in terms of performance.
>
> While I concede it takes more effort to get this running on all affected
> machines, the result is much better than mere ratelimit. Also there is
> no need to patch the kernel.
>
> btw, I found drm keeps spamming kvmalloc, someone(tm) should look into
> it:
> @[
>     kvmalloc_node_noprof+5
>     drm_property_create_blob+76
>     drm_atomic_helper_dirtyfb+234
>     drm_fbdev_generic_helper_fb_dirty+509
>     drm_fb_helper_damage_work+139
>     process_one_work+376
>     worker_thread+753
>     kthread+207
>     ret_from_fork+49
>     ret_from_fork_asm+26
> , 104]: 12

I should clarify this is allocs of 104 bytes, not some outlandish size.
Shakeel Butt June 19, 2024, 5:47 p.m. UTC | #10
On Wed, Jun 19, 2024 at 10:48:07AM +0200, Michal Hocko wrote:
> On Wed 19-06-24 10:30:46, Michal Hocko wrote:
> > On Wed 19-06-24 01:03:16, Shakeel Butt wrote:
> > > On Wed, Jun 19, 2024 at 09:19:41AM GMT, Michal Hocko wrote:
> > > > On Tue 18-06-24 14:34:21, Shakeel Butt wrote:
> > > > > At the moment oversize kvmalloc warnings are triggered once using
> > > > > WARN_ON_ONCE() macro. One issue with this approach is that it only
> > > > > detects the first abuser and then ignores the remaining abusers which
> > > > > complicates detecting all such abusers in a timely manner. The situation
> > > > > becomes worse when the repro has low probability and requires production
> > > > > traffic and thus require large set of machines to find such abusers. In
> > > > > Mera production, this warn once is slowing down the detection of these
> > > > > abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> > > > 
> > > > Long time ago, I've had a patch to do the once_per_callsite WARN. I
> > > > cannot find reference at the moment but it used stack depot to note
> > > > stacks that have already triggered. Back then there was no reponse on
> > > > the ML. Should I try to dig deep and recover it from my archives? I
> > > > think this is exactly kind of usecase where it would fit.
> > > > 
> > > 
> > > Do you mean something like warn once per unique call stack?
> > 
> > Exactly!
> > 
> > > If yes then
> > > I think that is better than the simple ratelimiting version as
> > > ratelimiting one may still miss some abusers and also may keep warning
> > > about the same abuser. Please do share your patch.
> > 
> > https://lore.kernel.org/all/20170103134424.28123-1-mhocko@kernel.org/
> 
> Btw. the code has changed a lot since 2017 when this was posted so it
> will likely need a lot of massaging to rebase. Also I am not entirely
> sure it is ok to change WARN_ONCE semantic like that anymore. Maybe we
> need an explicit variant that does this per-call-site warnings. It is a
> notable difference between library functions which can be called from
> different callpaths and those that are used from a single place. I do
> not have much time to dig deeper into this but if you want to take over
> then go ahead. I still think this is a useful WARN_ONCE or in general
> do_something_once semantic.

I think a separate variant like WARN_UNIQUE() would be better. I will
look into this.

Linus, please let me know if you have any concerns on the approach
Michal is suggesting i.e. a variant for warn once for unique call stack.

Shakeel
Shakeel Butt June 19, 2024, 5:52 p.m. UTC | #11
On Wed, Jun 19, 2024 at 02:49:21PM +0200, Mateusz Guzik wrote:
> On Tue, Jun 18, 2024 at 02:34:21PM -0700, Shakeel Butt wrote:
> > At the moment oversize kvmalloc warnings are triggered once using
> > WARN_ON_ONCE() macro. One issue with this approach is that it only
> > detects the first abuser and then ignores the remaining abusers which
> > complicates detecting all such abusers in a timely manner. The situation
> > becomes worse when the repro has low probability and requires production
> > traffic and thus require large set of machines to find such abusers. In
> > Mera production, this warn once is slowing down the detection of these
> > abusers. Simply replace WARN_ON_ONCE with WARN_RATELIMIT.
> > 
> > Reported-by: Kyle McMartin <kyle@infradead.org>
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> >  mm/util.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/util.c b/mm/util.c
> > index 10f215985fe5..de36344e8d53 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -649,7 +649,8 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
> >  
> >  	/* Don't even allow crazy sizes */
> >  	if (unlikely(size > INT_MAX)) {
> > -		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
> > +		WARN_RATELIMIT(!(flags & __GFP_NOWARN), "size = %zu > INT_MAX",
> > +			       size);
> >  		return NULL;
> >  	}
> >  
> 
> I don't think this is necessary. From the description I think interested
> parties can get away with bpftrace.
> 
> Suppose you have an abuser of the sort and you are worried there is more
> than one.
> 
> Then this one-liner will catch *all* of them, not just the ones which
> were "lucky" to get logged with ratelimit:
> bpftrace -e 'kprobe:kvmalloc_node_noprof /arg0 > 2147483647/ { @[kstack()] = count(); }'
> 
> Of course adding a probe is not free, but then again kvmalloc should not
> be used often to begin with so I doubt it is going to have material
> impact in terms of performance.
> 
> While I concede it takes more effort to get this running on all affected
> machines, the result is much better than mere ratelimit. Also there is
> no need to patch the kernel.
> 

Thanks for the response and suggestion. I am inclined towards warn once
for each unique stack trace as suggested by Michal because I think it
would be useful in general.
Linus Torvalds June 19, 2024, 7:30 p.m. UTC | #12
On Wed, 19 Jun 2024 at 10:47, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> Linus, please let me know if you have any concerns on the approach
> Michal is suggesting i.e. a variant for warn once for unique call stack.

I think we should just try to change the existing WARN_ONCE(), and see
if it causes any issues.

A new "WARN_UNIQUE()" might be the borign and safe approach, but

 (a) it won't actually be unique if you don't have stackdepot anyway,
and will just be WARN_ONCE

 (b) I suspect most WARN_ONCE users really do want WARN_UNIQUE

so let's at least _start_ with just changing semantics of the existing
"once", and then if it causes problems we'll have to revisit this.

I doubt it will cause problems,

                Linus
Michal Hocko June 19, 2024, 7:54 p.m. UTC | #13
On Wed 19-06-24 12:30:42, Linus Torvalds wrote:
> On Wed, 19 Jun 2024 at 10:47, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > Linus, please let me know if you have any concerns on the approach
> > Michal is suggesting i.e. a variant for warn once for unique call stack.
> 
> I think we should just try to change the existing WARN_ONCE(), and see
> if it causes any issues.
> 
> A new "WARN_UNIQUE()" might be the borign and safe approach, but
> 
>  (a) it won't actually be unique if you don't have stackdepot anyway,
> and will just be WARN_ONCE
> 
>  (b) I suspect most WARN_ONCE users really do want WARN_UNIQUE
> 
> so let's at least _start_ with just changing semantics of the existing
> "once", and then if it causes problems we'll have to revisit this.
> 
> I doubt it will cause problems,

I would be careful about the WARN_ONCE used from stackdepot itself. It's
been some time since I have looked into that code but a quick grep tells
there is some usage.
diff mbox series

Patch

diff --git a/mm/util.c b/mm/util.c
index 10f215985fe5..de36344e8d53 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -649,7 +649,8 @@  void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
 
 	/* Don't even allow crazy sizes */
 	if (unlikely(size > INT_MAX)) {
-		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
+		WARN_RATELIMIT(!(flags & __GFP_NOWARN), "size = %zu > INT_MAX",
+			       size);
 		return NULL;
 	}