diff mbox series

rhashtable: Better error message on allocation failure

Message ID 20231123235949.421106-1-kent.overstreet@linux.dev (mailing list archive)
State Changes Requested
Headers show
Series rhashtable: Better error message on allocation failure | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Kent Overstreet Nov. 23, 2023, 11:59 p.m. UTC
Memory allocation failures print backtraces by default, but when we're
running out of a rhashtable worker the backtrace is useless - it doesn't
tell us which hashtable the allocation failure was for.

This adds a dedicated warning that prints out functions from the
rhashtable params, which will be a bit more useful.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 lib/rhashtable.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

David Laight Nov. 25, 2023, 3:23 p.m. UTC | #1
From: Kent Overstreet
> Sent: 24 November 2023 00:00
> 
> Memory allocation failures print backtraces by default, but when we're
> running out of a rhashtable worker the backtrace is useless - it doesn't
> tell us which hashtable the allocation failure was for.
> 
> This adds a dedicated warning that prints out functions from the
> rhashtable params, which will be a bit more useful.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
>  lib/rhashtable.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 6ae2ba8e06a2..d3fce9c8989a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -360,9 +360,14 @@ static int rhashtable_rehash_alloc(struct rhashtable *ht,
> 
>  	ASSERT_RHT_MUTEX(ht);
> 
> -	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
> -	if (new_tbl == NULL)
> +	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL|__GFP_NOWARN);
> +	if (new_tbl == NULL) {
> +		WARN("rhashtable bucket table allocation failure for %ps",

Won't WARN() be a panic on systems with PANICK_ON_WARN set?

> +		     (void *) ht->p.hashfn ?:
> +		     (void *) ht->p.obj_hashfn ?:
> +		     (void *) ht->p.obj_cmpfn);

That layout is horrid (and I bet checkpatch complains).
You only actually need one (void *) cast on the RH value:
			ht->p.hashfn ?: ht->p.obj_hashfn ?: (void *)ht->p.obj_cmpfn

	David


>  		return -ENOMEM;
> +	}
> 
>  	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
>  	if (err)
> --
> 2.42.0
> 

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Jakub Kicinski Nov. 29, 2023, 1:35 a.m. UTC | #2
On Sat, 25 Nov 2023 15:23:49 +0000 David Laight wrote:
> > +	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL|__GFP_NOWARN);
> > +	if (new_tbl == NULL) {
> > +		WARN("rhashtable bucket table allocation failure for %ps",  
> 
> Won't WARN() be a panic on systems with PANICK_ON_WARN set?

Yes, that's problematic :(
Let's leave out the GFP_NOWARN and add a pr_warn() instead of
the WARN()?
Kent Overstreet Nov. 29, 2023, 1:57 a.m. UTC | #3
On Tue, Nov 28, 2023 at 05:35:36PM -0800, Jakub Kicinski wrote:
> On Sat, 25 Nov 2023 15:23:49 +0000 David Laight wrote:
> > > +	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL|__GFP_NOWARN);
> > > +	if (new_tbl == NULL) {
> > > +		WARN("rhashtable bucket table allocation failure for %ps",  
> > 
> > Won't WARN() be a panic on systems with PANICK_ON_WARN set?
> 
> Yes, that's problematic :(
> Let's leave out the GFP_NOWARN and add a pr_warn() instead of
> the WARN()?

pr_warn() instead of WARN() is fine, but the stack trace from
warn_alloc() will be entirely useless.

Perhaps if we had a GFP flag to just suppress the backtrace in
warn_alloc() - we could even stash a backtrace in the rhashtable at
rhashtable_init() time, if we want to print out a more useful one.
Jakub Kicinski Nov. 29, 2023, 2:15 a.m. UTC | #4
On Tue, 28 Nov 2023 20:57:05 -0500 Kent Overstreet wrote:
> > Yes, that's problematic :(
> > Let's leave out the GFP_NOWARN and add a pr_warn() instead of
> > the WARN()?  
> 
> pr_warn() instead of WARN() is fine, but the stack trace from
> warn_alloc() will be entirely useless.
> 
> Perhaps if we had a GFP flag to just suppress the backtrace in
> warn_alloc() - we could even stash a backtrace in the rhashtable at
> rhashtable_init() time, if we want to print out a more useful one.

Interesting idea, up to you how far down the rabbit hole you're
willing to go, really :)

Stating the obvious but would be good to add to the commit message,
if you decide to implement this, how many rht instances there are
on a sample system, IOW how much memory we expect the stacks to burn.
David Laight Nov. 29, 2023, 9:20 a.m. UTC | #5
From: Jakub Kicinski
> Sent: 29 November 2023 02:15
> 
> On Tue, 28 Nov 2023 20:57:05 -0500 Kent Overstreet wrote:
> > > Yes, that's problematic :(
> > > Let's leave out the GFP_NOWARN and add a pr_warn() instead of
> > > the WARN()?
> >
> > pr_warn() instead of WARN() is fine, but the stack trace from
> > warn_alloc() will be entirely useless.
> >
> > Perhaps if we had a GFP flag to just suppress the backtrace in
> > warn_alloc() - we could even stash a backtrace in the rhashtable at
> > rhashtable_init() time, if we want to print out a more useful one.
> 
> Interesting idea, up to you how far down the rabbit hole you're
> willing to go, really :)
> 
> Stating the obvious but would be good to add to the commit message,
> if you decide to implement this, how many rht instances there are
> on a sample system, IOW how much memory we expect the stacks to burn.

It's not really memory, just 'junk' in the console buffer.
But completely supressing the traceback from warn_alloc() (et al)
would give absolutely no indication of what failed.
You wouldn't even know it was a call from rhashtable().

Actually I'd have thought the traceback would show where the hash table
was being allocated - which would (mostly) tell you which one.
(Unless they were being allocated by a worker thread - which seems unlikely.)

IIRC the traceback includes the cpu registers (and code??) they probably
are noise for 'controlled' errors like kmalloc failing.

Is the whole extra trace really worthwhile at all?
How often does kmalloc() really fail?
Although if the code recovers by using a smaller table then maybe
that might be worth a trace instead of the one from kmalloc().

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6ae2ba8e06a2..d3fce9c8989a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -360,9 +360,14 @@  static int rhashtable_rehash_alloc(struct rhashtable *ht,
 
 	ASSERT_RHT_MUTEX(ht);
 
-	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
-	if (new_tbl == NULL)
+	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL|__GFP_NOWARN);
+	if (new_tbl == NULL) {
+		WARN("rhashtable bucket table allocation failure for %ps",
+		     (void *) ht->p.hashfn ?:
+		     (void *) ht->p.obj_hashfn ?:
+		     (void *) ht->p.obj_cmpfn);
 		return -ENOMEM;
+	}
 
 	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
 	if (err)