diff mbox series

[04/13] tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()

Message ID 20230201150815.409582-5-urezki@gmail.com (mailing list archive)
State Accepted
Commit 74ad5a1bf0685861d0a3d2871148c9bab642a00a
Headers show
Series Rename k[v]free_rcu() single argument to k[v]free_rcu_mightsleep() | expand

Commit Message

Uladzislau Rezki Feb. 1, 2023, 3:08 p.m. UTC
The kvfree_rcu()'s single argument name is deprecated therefore
rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
underline that it is for sleepable contexts.

Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/trace/trace_osnoise.c | 2 +-
 kernel/trace/trace_probe.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Uladzislau Rezki March 9, 2023, 1:45 p.m. UTC | #1
> The kvfree_rcu()'s single argument name is deprecated therefore
> rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
> underline that it is for sleepable contexts.
> 
> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
Could you please add you reviwed-by or Acked-by tags so we can bring
our series with renaming for the next merge window?

Thanks!

--
Uladzisslau Rezki
Steven Rostedt March 15, 2023, 10:36 p.m. UTC | #2
On Thu, 9 Mar 2023 14:45:21 +0100
Uladzislau Rezki <urezki@gmail.com> wrote:

> > The kvfree_rcu()'s single argument name is deprecated therefore
> > rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
> > underline that it is for sleepable contexts.
> > 
> > Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> >  
> Could you please add you reviwed-by or Acked-by tags so we can bring
> our series with renaming for the next merge window?

I don't know. Perhaps we should just apply this patch and not worry about
sleeping and whatnot.

-- Steve

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 04f0fdae19a1..5de945a8f61d 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -76,6 +76,7 @@ static unsigned long osnoise_options	= OSN_DEFAULT_OPTIONS;
 struct osnoise_instance {
 	struct list_head	list;
 	struct trace_array	*tr;
+	struct rcu_head		rcu;
 };
 
 static struct list_head osnoise_instances;
@@ -159,7 +160,7 @@ static void osnoise_unregister_instance(struct trace_array *tr)
 	if (!found)
 		return;
 
-	kvfree_rcu(inst);
+	kvfree_rcu(inst, rcu);
 }
 
 /*
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 20d0c4a97633..ef5fafb40c76 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1172,7 +1172,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
 		return -ENOENT;
 
 	list_del_rcu(&link->list);
-	kvfree_rcu(link);
+	kvfree_rcu(link, rcu);
 
 	if (list_empty(&tp->event->files))
 		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index ef8ed3b65d05..e6037752dcf0 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -256,6 +256,7 @@ struct trace_probe {
 struct event_file_link {
 	struct trace_event_file		*file;
 	struct list_head		list;
+	struct rcu_head			rcu;
 };
 
 static inline bool trace_probe_test_flag(struct trace_probe *tp,
Jens Axboe March 15, 2023, 11:19 p.m. UTC | #3
On 3/15/23 4:36 PM, Steven Rostedt wrote:
> On Thu, 9 Mar 2023 14:45:21 +0100
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
>>> The kvfree_rcu()'s single argument name is deprecated therefore
>>> rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
>>> underline that it is for sleepable contexts.
>>>
>>> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>>>  
>> Could you please add you reviwed-by or Acked-by tags so we can bring
>> our series with renaming for the next merge window?
> 
> I don't know. Perhaps we should just apply this patch and not worry about
> sleeping and whatnot.

That's a cop out, just removing the one case you care about. Fact is
the naming is awful, and the 1/2 argument thing is making it worse.
If a big change is warranted, why not do it right and ACTUALLY
get it right?
Paul E. McKenney March 16, 2023, 12:37 a.m. UTC | #4
On Wed, Mar 15, 2023 at 05:19:18PM -0600, Jens Axboe wrote:
> On 3/15/23 4:36 PM, Steven Rostedt wrote:
> > On Thu, 9 Mar 2023 14:45:21 +0100
> > Uladzislau Rezki <urezki@gmail.com> wrote:
> > 
> >>> The kvfree_rcu()'s single argument name is deprecated therefore
> >>> rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
> >>> underline that it is for sleepable contexts.
> >>>
> >>> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
> >>> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> >>>  
> >> Could you please add you reviwed-by or Acked-by tags so we can bring
> >> our series with renaming for the next merge window?
> > 
> > I don't know. Perhaps we should just apply this patch and not worry about
> > sleeping and whatnot.

That does work, and I am guessing that the size increase is not a big
problem for you there.

> That's a cop out, just removing the one case you care about. Fact is
> the naming is awful, and the 1/2 argument thing is making it worse.
> If a big change is warranted, why not do it right and ACTUALLY
> get it right?

You both do realize that the kvfree_rcu_mightsleep() definition is
already in mainline, right?

Anyway, to sum up, kvfree_rcu_mightsleep()--or whatever the entire
community eventually decides to name it--can do any of the following:

1.	Put the pointer into an already allocated array of pointers.

2.	Allocate a new array of pointers, have the allocation succeed
	without sleeping, then put the pointer into an already allocated
	array of pointers.

3.	Allocate a new array of pointers, have the allocation succeed
	after sleeping, then put the pointer into an already allocated
	array of pointers.

4.	Attempt to allocate a new array of pointers, have the allocation
	fail (presumably after sleeping), then invoke synchronize_rcu()
	directly.

Too much fun!  ;-)

							Thanx, Paul
Steven Rostedt March 16, 2023, 2:23 a.m. UTC | #5
On Wed, 15 Mar 2023 17:37:30 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> 
> That does work, and I am guessing that the size increase is not a big
> problem for you there.

Well, I was fine with it as long as it stayed in the headers, where
ugliness is warmly welcomed. Just ask all the #ifdefs.

> 
> > That's a cop out, just removing the one case you care about. Fact is
> > the naming is awful, and the 1/2 argument thing is making it worse.
> > If a big change is warranted, why not do it right and ACTUALLY
> > get it right?  
> 
> You both do realize that the kvfree_rcu_mightsleep() definition is
> already in mainline, right?
> 
> Anyway, to sum up, kvfree_rcu_mightsleep()--or whatever the entire
> community eventually decides to name it--can do any of the following:
> 
> 1.	Put the pointer into an already allocated array of pointers.
> 
> 2.	Allocate a new array of pointers, have the allocation succeed
> 	without sleeping, then put the pointer into an already allocated
> 	array of pointers.
> 
> 3.	Allocate a new array of pointers, have the allocation succeed
> 	after sleeping, then put the pointer into an already allocated
> 	array of pointers.
> 
> 4.	Attempt to allocate a new array of pointers, have the allocation
> 	fail (presumably after sleeping), then invoke synchronize_rcu()
> 	directly.
> 
> Too much fun!  ;-)
> 

  kvfree_rcu_kitchen_sink() ?

  kvfree_rcu_goldie_locks()?

I honestly like the name "headless" as that perfectly describes the
difference between kvfree_rcu(arg1, arg2) and kvfree_rcu(arg1).

Whereas mightsleep() is confusing to me because it doesn't tell me why
kvfree_rcu() has two args and kvfree_rcu_mightsleep() has only one.
Usually, code that has two sleep variants is about limiting the
functionality of the atomic friendly one.

-- Steve
Paul E. McKenney March 16, 2023, 3:44 a.m. UTC | #6
On Wed, Mar 15, 2023 at 10:23:23PM -0400, Steven Rostedt wrote:
> On Wed, 15 Mar 2023 17:37:30 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > 
> > That does work, and I am guessing that the size increase is not a big
> > problem for you there.
> 
> Well, I was fine with it as long as it stayed in the headers, where
> ugliness is warmly welcomed. Just ask all the #ifdefs.
> 
> > 
> > > That's a cop out, just removing the one case you care about. Fact is
> > > the naming is awful, and the 1/2 argument thing is making it worse.
> > > If a big change is warranted, why not do it right and ACTUALLY
> > > get it right?  
> > 
> > You both do realize that the kvfree_rcu_mightsleep() definition is
> > already in mainline, right?
> > 
> > Anyway, to sum up, kvfree_rcu_mightsleep()--or whatever the entire
> > community eventually decides to name it--can do any of the following:
> > 
> > 1.	Put the pointer into an already allocated array of pointers.
> > 
> > 2.	Allocate a new array of pointers, have the allocation succeed
> > 	without sleeping, then put the pointer into an already allocated
> > 	array of pointers.
> > 
> > 3.	Allocate a new array of pointers, have the allocation succeed
> > 	after sleeping, then put the pointer into an already allocated
> > 	array of pointers.
> > 
> > 4.	Attempt to allocate a new array of pointers, have the allocation
> > 	fail (presumably after sleeping), then invoke synchronize_rcu()
> > 	directly.
> > 
> > Too much fun!  ;-)
> > 
> 
>   kvfree_rcu_kitchen_sink() ?
> 
>   kvfree_rcu_goldie_locks()?
> 
> I honestly like the name "headless" as that perfectly describes the
> difference between kvfree_rcu(arg1, arg2) and kvfree_rcu(arg1).
> 
> Whereas mightsleep() is confusing to me because it doesn't tell me why
> kvfree_rcu() has two args and kvfree_rcu_mightsleep() has only one.
> Usually, code that has two sleep variants is about limiting the
> functionality of the atomic friendly one.

	kvfree_rcu_alloc_head()?
	kvfree_rcu_dynhead()?
	kvfree_rcu_gearhead()?
	kvfree_rcu_radiohead()?
	kvfree_rcu_getahead()?

I don't know about you guys, but to me, kvfree_rcu_mightsleep() is
sounding better and better by comparison...

							Thanx, Paul
Joel Fernandes March 16, 2023, 4:16 a.m. UTC | #7
On Wed, Mar 15, 2023 at 11:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Mar 15, 2023 at 10:23:23PM -0400, Steven Rostedt wrote:
> > On Wed, 15 Mar 2023 17:37:30 -0700
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >
> > >
> > > That does work, and I am guessing that the size increase is not a big
> > > problem for you there.
> >
> > Well, I was fine with it as long as it stayed in the headers, where
> > ugliness is warmly welcomed. Just ask all the #ifdefs.
> >
> > >
> > > > That's a cop out, just removing the one case you care about. Fact is
> > > > the naming is awful, and the 1/2 argument thing is making it worse.
> > > > If a big change is warranted, why not do it right and ACTUALLY
> > > > get it right?
> > >
> > > You both do realize that the kvfree_rcu_mightsleep() definition is
> > > already in mainline, right?
> > >
> > > Anyway, to sum up, kvfree_rcu_mightsleep()--or whatever the entire
> > > community eventually decides to name it--can do any of the following:
> > >
> > > 1.  Put the pointer into an already allocated array of pointers.
> > >
> > > 2.  Allocate a new array of pointers, have the allocation succeed
> > >     without sleeping, then put the pointer into an already allocated
> > >     array of pointers.
> > >
> > > 3.  Allocate a new array of pointers, have the allocation succeed
> > >     after sleeping, then put the pointer into an already allocated
> > >     array of pointers.
> > >
> > > 4.  Attempt to allocate a new array of pointers, have the allocation
> > >     fail (presumably after sleeping), then invoke synchronize_rcu()
> > >     directly.
> > >
> > > Too much fun!  ;-)
> > >
> >
> >   kvfree_rcu_kitchen_sink() ?
> >
> >   kvfree_rcu_goldie_locks()?
> >
> > I honestly like the name "headless" as that perfectly describes the
> > difference between kvfree_rcu(arg1, arg2) and kvfree_rcu(arg1).
> >
> > Whereas mightsleep() is confusing to me because it doesn't tell me why
> > kvfree_rcu() has two args and kvfree_rcu_mightsleep() has only one.
> > Usually, code that has two sleep variants is about limiting the
> > functionality of the atomic friendly one.
>
>         kvfree_rcu_alloc_head()?
>         kvfree_rcu_dynhead()?
>         kvfree_rcu_gearhead()?
>         kvfree_rcu_radiohead()?
>         kvfree_rcu_getahead()?
>
> I don't know about you guys, but to me, kvfree_rcu_mightsleep() is
> sounding better and better by comparison...

Indeed, and one could argue that "headless" sounds like something out
of a horror movie ;-). Which of course does match the situation when
the API is applied incorrectly.

 - Joel
Uladzislau Rezki March 16, 2023, 8:16 a.m. UTC | #8
> On Thu, 9 Mar 2023 14:45:21 +0100
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > > The kvfree_rcu()'s single argument name is deprecated therefore
> > > rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
> > > underline that it is for sleepable contexts.
> > > 
> > > Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > >  
> > Could you please add you reviwed-by or Acked-by tags so we can bring
> > our series with renaming for the next merge window?
> 
> I don't know. Perhaps we should just apply this patch and not worry about
> sleeping and whatnot.
> 
> -- Steve
> 
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 04f0fdae19a1..5de945a8f61d 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -76,6 +76,7 @@ static unsigned long osnoise_options	= OSN_DEFAULT_OPTIONS;
>  struct osnoise_instance {
>  	struct list_head	list;
>  	struct trace_array	*tr;
> +	struct rcu_head		rcu;
>  };
>  
>  static struct list_head osnoise_instances;
> @@ -159,7 +160,7 @@ static void osnoise_unregister_instance(struct trace_array *tr)
>  	if (!found)
>  		return;
>  
> -	kvfree_rcu(inst);
> +	kvfree_rcu(inst, rcu);
>  }
>  
>  /*
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 20d0c4a97633..ef5fafb40c76 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1172,7 +1172,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
>  		return -ENOENT;
>  
>  	list_del_rcu(&link->list);
> -	kvfree_rcu(link);
> +	kvfree_rcu(link, rcu);
>  
>  	if (list_empty(&tp->event->files))
>  		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index ef8ed3b65d05..e6037752dcf0 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -256,6 +256,7 @@ struct trace_probe {
>  struct event_file_link {
>  	struct trace_event_file		*file;
>  	struct list_head		list;
> +	struct rcu_head			rcu;
>  };
>  
>  static inline bool trace_probe_test_flag(struct trace_probe *tp,
>
struct foo_a {
  int a;
  int b;
};

your obj size is 8 byte

struct foo_b {
  struct rcu_head rcu;
  int a;
  int b;
};

now it becomes 16 + 8 = 24 bytes. In reallity a foo_b object
will be 32 bytes since there is no slab for 24 bytes:

<snip>
  kmalloc-32         19840  19840     32  128    1 : tunables    0    0    0 : slabdata    155    155      0
  kmalloc-16         28857  28928     16  256    1 : tunables    0    0    0 : slabdata    113    113      0
  kmalloc-8          37376  37376      8  512    1 : tunables    0    0    0 : slabdata     73     73      0
<snip>

if we allocate 512 objects of foo_a it would be 4096 bytes
in case of foo_b it is 24 * 512 = 12228 bytes.

single argument will give you 4096 + 512 * 8 = 8192 bytes
int terms of memory consumtion.

And double argument will not give you better performance comparing
with a single argument.

So it depends on what you want to achieve by that patch.

--
Uladzislau Rezki
Steven Rostedt March 16, 2023, 12:14 p.m. UTC | #9
On Thu, 16 Mar 2023 00:16:39 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> Indeed, and one could argue that "headless" sounds like something out
> of a horror movie ;-). Which of course does match the situation when
> the API is applied incorrectly.

Well, "headless" is a common term in IT.

   https://en.wikipedia.org/wiki/Headless_software


We could be specific to what horror movie/story, and call it:

  kvfree_rcu_sleepy_hollow()

Which will imply both headless *and* might_sleep!

-- Steve
Steven Rostedt March 16, 2023, 1:56 p.m. UTC | #10
On Thu, 16 Mar 2023 09:16:37 +0100
Uladzislau Rezki <urezki@gmail.com> wrote:

> > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > index ef8ed3b65d05..e6037752dcf0 100644
> > --- a/kernel/trace/trace_probe.h
> > +++ b/kernel/trace/trace_probe.h
> > @@ -256,6 +256,7 @@ struct trace_probe {
> >  struct event_file_link {
> >  	struct trace_event_file		*file;
> >  	struct list_head		list;
> > +	struct rcu_head			rcu;
> >  };
> >  
> >  static inline bool trace_probe_test_flag(struct trace_probe *tp,
> >  
> struct foo_a {
>   int a;
>   int b;
> };

Most machines today are 64 bits, even low end machines.

 struct foo_a {
	long long a;
	long long b;
 };

is more accurate. That's 16 bytes.

Although it is more likely off because list_head is a double pointer. But
let's just go with this, as the amount really doesn't matter here.

> 
> your obj size is 8 byte
> 
> struct foo_b {
>   struct rcu_head rcu;

Isn't rcu_head defined as;

struct callback_head {
        struct callback_head *next;
        void (*func)(struct callback_head *head);
} __attribute__((aligned(sizeof(void *))));
#define rcu_head callback_head

Which makes it 8 not 16 on 32 bit as well?

>   int a;
>   int b;
> };

So it should be 8 + 8 = 16, on 32 bit and 16 + 16 = 32 on 64bit.

> 
> now it becomes 16 + 8 = 24 bytes. In reallity a foo_b object
> will be 32 bytes since there is no slab for 24 bytes:
> 
> <snip>
>   kmalloc-32         19840  19840     32  128    1 : tunables    0    0    0 : slabdata    155    155      0
>   kmalloc-16         28857  28928     16  256    1 : tunables    0    0    0 : slabdata    113    113      0
>   kmalloc-8          37376  37376      8  512    1 : tunables    0    0    0 : slabdata     73     73      0
> <snip>
> 
> if we allocate 512 objects of foo_a it would be 4096 bytes
> in case of foo_b it is 24 * 512 = 12228 bytes.

This is for probe events. We usually allocate 1, maybe 2. Oh, some may even
allocate 100 to be crazy. But each probe event is in reality much larger
(1K perhaps) as each one allocates dentry's, inodes, etc. So 8 or 16 bytes
extra is still lost in the noise.

> 
> single argument will give you 4096 + 512 * 8 = 8192 bytes
> int terms of memory consumtion.

If someone allocate 512 instances, that would be closer to a meg in size
without this change. 8k is probably less than 1%

> 
> And double argument will not give you better performance comparing
> with a single argument.

It will, because it will no longer have to allocate anything if need be.
Note, when it doesn't allocate the system is probably mostly idle and we
don't care about performance, but when it needs allocation, that's likely a
time when performance is a bit more important.

-- Steve
Paul E. McKenney March 16, 2023, 2:56 p.m. UTC | #11
On Thu, Mar 16, 2023 at 08:14:24AM -0400, Steven Rostedt wrote:
> On Thu, 16 Mar 2023 00:16:39 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > Indeed, and one could argue that "headless" sounds like something out
> > of a horror movie ;-). Which of course does match the situation when
> > the API is applied incorrectly.
> 
> Well, "headless" is a common term in IT.
> 
>    https://en.wikipedia.org/wiki/Headless_software

Indeed it is.  But RCU is incapable of headlessness, so in the
kvfree_rcu_mightsleep() case, the rcu_head is dynamically allocated.

> We could be specific to what horror movie/story, and call it:
> 
>   kvfree_rcu_sleepy_hollow()
> 
> Which will imply both headless *and* might_sleep!

Heh!  That one is almost bad enough to be good!  Almost!  ;-)

							Thanx, Paul
Uladzislau Rezki March 16, 2023, 3:05 p.m. UTC | #12
On Thu, Mar 16, 2023 at 09:56:53AM -0400, Steven Rostedt wrote:
> On Thu, 16 Mar 2023 09:16:37 +0100
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > > index ef8ed3b65d05..e6037752dcf0 100644
> > > --- a/kernel/trace/trace_probe.h
> > > +++ b/kernel/trace/trace_probe.h
> > > @@ -256,6 +256,7 @@ struct trace_probe {
> > >  struct event_file_link {
> > >  	struct trace_event_file		*file;
> > >  	struct list_head		list;
> > > +	struct rcu_head			rcu;
> > >  };
> > >  
> > >  static inline bool trace_probe_test_flag(struct trace_probe *tp,
> > >  
> > struct foo_a {
> >   int a;
> >   int b;
> > };
> 
> Most machines today are 64 bits, even low end machines.
> 
>  struct foo_a {
> 	long long a;
> 	long long b;
>  };
> 
> is more accurate. That's 16 bytes.
> 
> Although it is more likely off because list_head is a double pointer. But
> let's just go with this, as the amount really doesn't matter here.
> 
> > 
> > your obj size is 8 byte
> > 
> > struct foo_b {
> >   struct rcu_head rcu;
> 
> Isn't rcu_head defined as;
> 
> struct callback_head {
>         struct callback_head *next;
>         void (*func)(struct callback_head *head);
> } __attribute__((aligned(sizeof(void *))));
> #define rcu_head callback_head
> 
> Which makes it 8 not 16 on 32 bit as well?
> 
> >   int a;
> >   int b;
> > };
> 
> So it should be 8 + 8 = 16, on 32 bit and 16 + 16 = 32 on 64bit.
> 
> > 
> > now it becomes 16 + 8 = 24 bytes. In reallity a foo_b object
> > will be 32 bytes since there is no slab for 24 bytes:
> > 
> > <snip>
> >   kmalloc-32         19840  19840     32  128    1 : tunables    0    0    0 : slabdata    155    155      0
> >   kmalloc-16         28857  28928     16  256    1 : tunables    0    0    0 : slabdata    113    113      0
> >   kmalloc-8          37376  37376      8  512    1 : tunables    0    0    0 : slabdata     73     73      0
> > <snip>
> > 
> > if we allocate 512 objects of foo_a it would be 4096 bytes
> > in case of foo_b it is 24 * 512 = 12228 bytes.
> 
> This is for probe events. We usually allocate 1, maybe 2. Oh, some may even
> allocate 100 to be crazy. But each probe event is in reality much larger
> (1K perhaps) as each one allocates dentry's, inodes, etc. So 8 or 16 bytes
> extra is still lost in the noise.
> 
> > 
> > single argument will give you 4096 + 512 * 8 = 8192 bytes
> > int terms of memory consumtion.
> 
> If someone allocate 512 instances, that would be closer to a meg in size
> without this change. 8k is probably less than 1%
> 
In percentage. My case. (12228 - 8192) * 100 / 12228 = ~33% difference.

> > 
> > And double argument will not give you better performance comparing
> > with a single argument.
> 
> It will, because it will no longer have to allocate anything if need be.
> Note, when it doesn't allocate the system is probably mostly idle and we
> don't care about performance, but when it needs allocation, that's likely a
> time when performance is a bit more important.
> 
The problem further is about pointer chasing, like comparing arrays and
lists. It will take longer time to offload all pointers.

--
Uladzislau Rezki
Paul E. McKenney March 16, 2023, 3:12 p.m. UTC | #13
On Thu, Mar 16, 2023 at 09:56:53AM -0400, Steven Rostedt wrote:
> On Thu, 16 Mar 2023 09:16:37 +0100
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > > index ef8ed3b65d05..e6037752dcf0 100644
> > > --- a/kernel/trace/trace_probe.h
> > > +++ b/kernel/trace/trace_probe.h
> > > @@ -256,6 +256,7 @@ struct trace_probe {
> > >  struct event_file_link {
> > >  	struct trace_event_file		*file;
> > >  	struct list_head		list;
> > > +	struct rcu_head			rcu;
> > >  };
> > >  
> > >  static inline bool trace_probe_test_flag(struct trace_probe *tp,
> > >  
> > struct foo_a {
> >   int a;
> >   int b;
> > };
> 
> Most machines today are 64 bits, even low end machines.
> 
>  struct foo_a {
> 	long long a;
> 	long long b;
>  };
> 
> is more accurate. That's 16 bytes.
> 
> Although it is more likely off because list_head is a double pointer. But
> let's just go with this, as the amount really doesn't matter here.
> 
> > 
> > your obj size is 8 byte
> > 
> > struct foo_b {
> >   struct rcu_head rcu;
> 
> Isn't rcu_head defined as;
> 
> struct callback_head {
>         struct callback_head *next;
>         void (*func)(struct callback_head *head);
> } __attribute__((aligned(sizeof(void *))));
> #define rcu_head callback_head
> 
> Which makes it 8 not 16 on 32 bit as well?
> 
> >   int a;
> >   int b;
> > };
> 
> So it should be 8 + 8 = 16, on 32 bit and 16 + 16 = 32 on 64bit.
> 
> > 
> > now it becomes 16 + 8 = 24 bytes. In reallity a foo_b object
> > will be 32 bytes since there is no slab for 24 bytes:
> > 
> > <snip>
> >   kmalloc-32         19840  19840     32  128    1 : tunables    0    0    0 : slabdata    155    155      0
> >   kmalloc-16         28857  28928     16  256    1 : tunables    0    0    0 : slabdata    113    113      0
> >   kmalloc-8          37376  37376      8  512    1 : tunables    0    0    0 : slabdata     73     73      0
> > <snip>
> > 
> > if we allocate 512 objects of foo_a it would be 4096 bytes
> > in case of foo_b it is 24 * 512 = 12228 bytes.
> 
> This is for probe events. We usually allocate 1, maybe 2. Oh, some may even
> allocate 100 to be crazy. But each probe event is in reality much larger
> (1K perhaps) as each one allocates dentry's, inodes, etc. So 8 or 16 bytes
> extra is still lost in the noise.

Agreed, in this case, there is almost no penalty for adding an rcu_head
structure to the probe event...

> > single argument will give you 4096 + 512 * 8 = 8192 bytes
> > int terms of memory consumtion.
> 
> If someone allocate 512 instances, that would be closer to a meg in size
> without this change. 8k is probably less than 1%
> 
> > 
> > And double argument will not give you better performance comparing
> > with a single argument.
> 
> It will, because it will no longer have to allocate anything if need be.
> Note, when it doesn't allocate the system is probably mostly idle and we
> don't care about performance, but when it needs allocation, that's likely a
> time when performance is a bit more important.

... and also agreed that there are performance benefits to doing so,
especially under OOM conditions.

							Thanx, Paul
Paul E. McKenney March 16, 2023, 5:54 p.m. UTC | #14
On Wed, Mar 15, 2023 at 06:36:48PM -0400, Steven Rostedt wrote:
> On Thu, 9 Mar 2023 14:45:21 +0100
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > > The kvfree_rcu()'s single argument name is deprecated therefore
> > > rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
> > > underline that it is for sleepable contexts.
> > > 
> > > Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > >  
> > Could you please add you reviwed-by or Acked-by tags so we can bring
> > our series with renaming for the next merge window?
> 
> I don't know. Perhaps we should just apply this patch and not worry about
> sleeping and whatnot.

Just in case it is not clear:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> -- Steve
> 
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 04f0fdae19a1..5de945a8f61d 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -76,6 +76,7 @@ static unsigned long osnoise_options	= OSN_DEFAULT_OPTIONS;
>  struct osnoise_instance {
>  	struct list_head	list;
>  	struct trace_array	*tr;
> +	struct rcu_head		rcu;
>  };
>  
>  static struct list_head osnoise_instances;
> @@ -159,7 +160,7 @@ static void osnoise_unregister_instance(struct trace_array *tr)
>  	if (!found)
>  		return;
>  
> -	kvfree_rcu(inst);
> +	kvfree_rcu(inst, rcu);
>  }
>  
>  /*
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 20d0c4a97633..ef5fafb40c76 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1172,7 +1172,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
>  		return -ENOENT;
>  
>  	list_del_rcu(&link->list);
> -	kvfree_rcu(link);
> +	kvfree_rcu(link, rcu);
>  
>  	if (list_empty(&tp->event->files))
>  		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index ef8ed3b65d05..e6037752dcf0 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -256,6 +256,7 @@ struct trace_probe {
>  struct event_file_link {
>  	struct trace_event_file		*file;
>  	struct list_head		list;
> +	struct rcu_head			rcu;
>  };
>  
>  static inline bool trace_probe_test_flag(struct trace_probe *tp,
Uladzislau Rezki March 16, 2023, 5:57 p.m. UTC | #15
On Wed, Mar 15, 2023 at 06:36:48PM -0400, Steven Rostedt wrote:
> On Thu, 9 Mar 2023 14:45:21 +0100
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > > The kvfree_rcu()'s single argument name is deprecated therefore
> > > rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
> > > underline that it is for sleepable contexts.
> > > 
> > > Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > >  
> > Could you please add you reviwed-by or Acked-by tags so we can bring
> > our series with renaming for the next merge window?
> 
> I don't know. Perhaps we should just apply this patch and not worry about
> sleeping and whatnot.
> 
> -- Steve
> 
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 04f0fdae19a1..5de945a8f61d 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -76,6 +76,7 @@ static unsigned long osnoise_options	= OSN_DEFAULT_OPTIONS;
>  struct osnoise_instance {
>  	struct list_head	list;
>  	struct trace_array	*tr;
> +	struct rcu_head		rcu;
>  };
>  
>  static struct list_head osnoise_instances;
> @@ -159,7 +160,7 @@ static void osnoise_unregister_instance(struct trace_array *tr)
>  	if (!found)
>  		return;
>  
> -	kvfree_rcu(inst);
> +	kvfree_rcu(inst, rcu);
>  }
>  
>  /*
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 20d0c4a97633..ef5fafb40c76 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1172,7 +1172,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
>  		return -ENOENT;
>  
>  	list_del_rcu(&link->list);
> -	kvfree_rcu(link);
> +	kvfree_rcu(link, rcu);
>  
>  	if (list_empty(&tp->event->files))
>  		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index ef8ed3b65d05..e6037752dcf0 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -256,6 +256,7 @@ struct trace_probe {
>  struct event_file_link {
>  	struct trace_event_file		*file;
>  	struct list_head		list;
> +	struct rcu_head			rcu;
>  };
>  
>  static inline bool trace_probe_test_flag(struct trace_probe *tp,
>
Anyway i do not see any problems with it

Acked-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki
Joel Fernandes March 16, 2023, 6:01 p.m. UTC | #16
> On Mar 16, 2023, at 1:58 PM, Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> On Wed, Mar 15, 2023 at 06:36:48PM -0400, Steven Rostedt wrote:
>> On Thu, 9 Mar 2023 14:45:21 +0100
>> Uladzislau Rezki <urezki@gmail.com> wrote:
>> 
>>>> The kvfree_rcu()'s single argument name is deprecated therefore
>>>> rename it to kvfree_rcu_mightsleep() variant. The goal is explicitly
>>>> underline that it is for sleepable contexts.
>>>> 
>>>> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>>> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>>>> 
>>> Could you please add you reviwed-by or Acked-by tags so we can bring
>>> our series with renaming for the next merge window?
>> 
>> I don't know. Perhaps we should just apply this patch and not worry about
>> sleeping and whatnot.
>> 
>> -- Steve
>> 
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index 04f0fdae19a1..5de945a8f61d 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -76,6 +76,7 @@ static unsigned long osnoise_options    = OSN_DEFAULT_OPTIONS;
>> struct osnoise_instance {
>>    struct list_head    list;
>>    struct trace_array    *tr;
>> +    struct rcu_head        rcu;
>> };
>> 
>> static struct list_head osnoise_instances;
>> @@ -159,7 +160,7 @@ static void osnoise_unregister_instance(struct trace_array *tr)
>>    if (!found)
>>        return;
>> 
>> -    kvfree_rcu(inst);
>> +    kvfree_rcu(inst, rcu);
>> }
>> 
>> /*
>> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
>> index 20d0c4a97633..ef5fafb40c76 100644
>> --- a/kernel/trace/trace_probe.c
>> +++ b/kernel/trace/trace_probe.c
>> @@ -1172,7 +1172,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
>>        return -ENOENT;
>> 
>>    list_del_rcu(&link->list);
>> -    kvfree_rcu(link);
>> +    kvfree_rcu(link, rcu);
>> 
>>    if (list_empty(&tp->event->files))
>>        trace_probe_clear_flag(tp, TP_FLAG_TRACE);
>> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
>> index ef8ed3b65d05..e6037752dcf0 100644
>> --- a/kernel/trace/trace_probe.h
>> +++ b/kernel/trace/trace_probe.h
>> @@ -256,6 +256,7 @@ struct trace_probe {
>> struct event_file_link {
>>    struct trace_event_file        *file;
>>    struct list_head        list;
>> +    struct rcu_head            rcu;
>> };
>> 
>> static inline bool trace_probe_test_flag(struct trace_probe *tp,
>> 
> Anyway i do not see any problems with it
> 
> Acked-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Just to clarify, we are dropping the original patch and instead taking Steves version?

If so, Steve please send a patch when you are able to and with Vlads Ack,
we can take it via the RCU tree if that is Ok with you.

Thanks,

 - Joel


> 
> --
> Uladzislau Rezki
Uladzislau Rezki March 17, 2023, 9:05 a.m. UTC | #17
On Thu, Mar 16, 2023 at 04:05:09PM +0100, Uladzislau Rezki wrote:
> On Thu, Mar 16, 2023 at 09:56:53AM -0400, Steven Rostedt wrote:
> > On Thu, 16 Mar 2023 09:16:37 +0100
> > Uladzislau Rezki <urezki@gmail.com> wrote:
> > 
> > > > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > > > index ef8ed3b65d05..e6037752dcf0 100644
> > > > --- a/kernel/trace/trace_probe.h
> > > > +++ b/kernel/trace/trace_probe.h
> > > > @@ -256,6 +256,7 @@ struct trace_probe {
> > > >  struct event_file_link {
> > > >  	struct trace_event_file		*file;
> > > >  	struct list_head		list;
> > > > +	struct rcu_head			rcu;
> > > >  };
> > > >  
> > > >  static inline bool trace_probe_test_flag(struct trace_probe *tp,
> > > >  
> > > struct foo_a {
> > >   int a;
> > >   int b;
> > > };
> > 
> > Most machines today are 64 bits, even low end machines.
> > 
> >  struct foo_a {
> > 	long long a;
> > 	long long b;
> >  };
> > 
> > is more accurate. That's 16 bytes.
> > 
> > Although it is more likely off because list_head is a double pointer. But
> > let's just go with this, as the amount really doesn't matter here.
> > 
> > > 
> > > your obj size is 8 byte
> > > 
> > > struct foo_b {
> > >   struct rcu_head rcu;
> > 
> > Isn't rcu_head defined as;
> > 
> > struct callback_head {
> >         struct callback_head *next;
> >         void (*func)(struct callback_head *head);
> > } __attribute__((aligned(sizeof(void *))));
> > #define rcu_head callback_head
> > 
> > Which makes it 8 not 16 on 32 bit as well?
> > 
> > >   int a;
> > >   int b;
> > > };
> > 
> > So it should be 8 + 8 = 16, on 32 bit and 16 + 16 = 32 on 64bit.
> > 
> > > 
> > > now it becomes 16 + 8 = 24 bytes. In reallity a foo_b object
> > > will be 32 bytes since there is no slab for 24 bytes:
> > > 
> > > <snip>
> > >   kmalloc-32         19840  19840     32  128    1 : tunables    0    0    0 : slabdata    155    155      0
> > >   kmalloc-16         28857  28928     16  256    1 : tunables    0    0    0 : slabdata    113    113      0
> > >   kmalloc-8          37376  37376      8  512    1 : tunables    0    0    0 : slabdata     73     73      0
> > > <snip>
> > > 
> > > if we allocate 512 objects of foo_a it would be 4096 bytes
> > > in case of foo_b it is 24 * 512 = 12228 bytes.
> > 
> > This is for probe events. We usually allocate 1, maybe 2. Oh, some may even
> > allocate 100 to be crazy. But each probe event is in reality much larger
> > (1K perhaps) as each one allocates dentry's, inodes, etc. So 8 or 16 bytes
> > extra is still lost in the noise.
> > 
> > > 
> > > single argument will give you 4096 + 512 * 8 = 8192 bytes
> > > int terms of memory consumtion.
> > 
> > If someone allocate 512 instances, that would be closer to a meg in size
> > without this change. 8k is probably less than 1%
> > 
> In percentage. My case. (12228 - 8192) * 100 / 12228 = ~33% difference.
> 
> > > 
> > > And double argument will not give you better performance comparing
> > > with a single argument.
> > 
> > It will, because it will no longer have to allocate anything if need be.
> > Note, when it doesn't allocate the system is probably mostly idle and we
> > don't care about performance, but when it needs allocation, that's likely a
> > time when performance is a bit more important.
> > 
> The problem further is about pointer chasing, like comparing arrays and
> lists. It will take longer time to offload all pointers.
> 
Since i have a data, IMHO it is better to share than not:

--bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=3 rcuscale.holdoff=20 rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot"

# double-argument 10 run
Total time taken by all kfree'ers: 4387872408 ns, loops: 10000, batches: 958,  memory footprint: 40MB
Total time taken by all kfree'ers: 4415232304 ns, loops: 10000, batches: 982,  memory footprint: 39MB
Total time taken by all kfree'ers: 4270303081 ns, loops: 10000, batches: 955,  memory footprint: 42MB
Total time taken by all kfree'ers: 4364984147 ns, loops: 10000, batches: 953,  memory footprint: 40MB
Total time taken by all kfree'ers: 4225994506 ns, loops: 10000, batches: 942,  memory footprint: 40MB
Total time taken by all kfree'ers: 4601087346 ns, loops: 10000, batches: 1033, memory footprint: 40MB
Total time taken by all kfree'ers: 4853397855 ns, loops: 10000, batches: 1109, memory footprint: 38MB
Total time taken by all kfree'ers: 4627914204 ns, loops: 10000, batches: 1037, memory footprint: 39MB
Total time taken by all kfree'ers: 4274587317 ns, loops: 10000, batches: 938,  memory footprint: 33MB
Total time taken by all kfree'ers: 4642151205 ns, loops: 10000, batches: 1068, memory footprint: 39MB

# single-argument 10 run
Total time taken by all kfree'ers: 3661190052 ns, loops: 10000, batches: 831, memory footprint: 29MB
Total time taken by all kfree'ers: 3616277061 ns, loops: 10000, batches: 780, memory footprint: 27MB
Total time taken by all kfree'ers: 3704584439 ns, loops: 10000, batches: 810, memory footprint: 27MB
Total time taken by all kfree'ers: 3631291959 ns, loops: 10000, batches: 812, memory footprint: 28MB
Total time taken by all kfree'ers: 3610490769 ns, loops: 10000, batches: 795, memory footprint: 27MB
Total time taken by all kfree'ers: 3595446243 ns, loops: 10000, batches: 825, memory footprint: 28MB
Total time taken by all kfree'ers: 3686252889 ns, loops: 10000, batches: 784, memory footprint: 27MB
Total time taken by all kfree'ers: 3821475275 ns, loops: 10000, batches: 869, memory footprint: 27MB
Total time taken by all kfree'ers: 3740407185 ns, loops: 10000, batches: 813, memory footprint: 28MB
Total time taken by all kfree'ers: 3646684795 ns, loops: 10000, batches: 780, memory footprint: 24MB

And yes, there are side effects. For example a low memory condition.

--
Uladzislau Rezki
Steven Rostedt March 18, 2023, 4:11 p.m. UTC | #18
On Thu, 16 Mar 2023 14:01:30 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> If so, Steve please send a patch when you are able to and with Vlads Ack,
> we can take it via the RCU tree if that is Ok with you.

Daniel acked it, so you can just take it as he can be responsible for that code.

For the trace_probe.c if you get an ack from Masami, then you can take
the original patch as is.

-- Steve
Joel Fernandes March 22, 2023, 11:10 p.m. UTC | #19
On Sat, Mar 18, 2023 at 12:11:31PM -0400, Steven Rostedt wrote:
> On Thu, 16 Mar 2023 14:01:30 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > If so, Steve please send a patch when you are able to and with Vlads Ack,
> > we can take it via the RCU tree if that is Ok with you.
> 
> Daniel acked it, so you can just take it as he can be responsible for that code.
> 
> For the trace_probe.c if you get an ack from Masami, then you can take
> the original patch as is.


Masami, do you Ack the changes of the original patch to trace_probe.c ?

Here it is again:
https://lore.kernel.org/lkml/20230201150815.409582-5-urezki@gmail.com/


thanks,

 - Joel
diff mbox series

Patch

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 94c1b5eb1dc0..67392d872b67 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -160,7 +160,7 @@  static void osnoise_unregister_instance(struct trace_array *tr)
 	if (!found)
 		return;
 
-	kvfree_rcu(inst);
+	kvfree_rcu_mightsleep(inst);
 }
 
 /*
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 01ebabbbe8c9..32a7741dc731 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1170,7 +1170,7 @@  int trace_probe_remove_file(struct trace_probe *tp,
 		return -ENOENT;
 
 	list_del_rcu(&link->list);
-	kvfree_rcu(link);
+	kvfree_rcu_mightsleep(link);
 
 	if (list_empty(&tp->event->files))
 		trace_probe_clear_flag(tp, TP_FLAG_TRACE);