diff mbox series

[RESEND] tracing: Fix overflow in get_free_elt()

Message ID 20240805055922.6277-1-Tze-nan.Wu@mediatek.com (mailing list archive)
State New
Headers show
Series [RESEND] tracing: Fix overflow in get_free_elt() | expand

Commit Message

Tze-nan Wu Aug. 5, 2024, 5:59 a.m. UTC
"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.

Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.

Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elts".

Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---
We encountered this issue internally after enabling the throttle_rss_stat
feature provided by Perfetto in the background for more than two days,
during which the `rss_stat` tracepoint was invoked over 2^32 times.
After "tracing_map->next_elt" overflows, new elements can continue to be
inserted into the tracing_map belonging to `rss_stat`.
The CPU can then hang in the infinite while loop inside
`__tracing_map_insert()` once the tracing_map has no empty entries left.

Call trace during hang:
  __tracing_map_insert()
  tracing_map_insert()
  event_hist_trigger()
  event_triggers_call()
  __event_trigger_test_discard()
  trace_event_buffer_commit()
  trace_event_raw_event_rss_stat()
  __traceiter_rss_stat()
  trace_rss_stat()
  mm_trace_rss_stat()
  inc_mm_counter()
  do_swap_page()

throttle_rss_stat is essentially a synthetic event triggered by `rss_stat`,
configured as follows:
 1. $echo "rss_stat_throttled unsigned int mm_id unsigned int curr int
    member long size" >> /sys/kernel/tracing/synthetic_events
 2. $echo
    'hist:keys=mm_id,member:bucket=size/0x80000:onchange($bucket).rss_stat_
    throttled(mm_id,curr,member,size)' >
    /sys/kernel/tracing/events/kmem/rss_stat/trigger

The hang can also be reproduced easily by calling a custom trace event
`myevent(u64 mycount)` more than 2^32+(map_size-max_elts) times while its
histogram is enabled with the key set to the variable "mycount", passing a
different "mycount" value on each call.

BTW, I added Cheng-Jui as Co-developed-by because we had a lot of
discussions while debugging this.
---
 kernel/trace/tracing_map.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Steven Rostedt Aug. 6, 2024, 7:40 p.m. UTC | #1
On Mon, 5 Aug 2024 13:59:22 +0800
Tze-nan Wu <Tze-nan.Wu@mediatek.com> wrote:

> diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
> index a4dcf0f24352..3a56e7c8aa4f 100644
> --- a/kernel/trace/tracing_map.c
> +++ b/kernel/trace/tracing_map.c
> @@ -454,7 +454,7 @@ static struct tracing_map_elt *get_free_elt(struct tracing_map *map)
>  	struct tracing_map_elt *elt = NULL;
>  	int idx;
>  
> -	idx = atomic_inc_return(&map->next_elt);
> +	idx = atomic_fetch_add_unless(&map->next_elt, 1, map->max_elts);

I guess we need to add (with a comment):

	idx--;

>  	if (idx < map->max_elts) {

Otherwise the max elements will be off by one.

>  		elt = *(TRACING_MAP_ELT(map->elts, idx));

And the index will skip position zero.

-- Steve

>  		if (map->ops && map->ops->elt_init)
> @@ -699,7 +699,7 @@ void tracing_map_clear(struct tracing_map *map)
>  {
>  	unsigned int i;
>  
> -	atomic_set(&map->next_elt, -1);
> +	atomic_set(&map->next_elt, 0);
>  	atomic64_set(&map->hits, 0);
>  	atomic64_set(&map->drops, 0);
>  
> @@ -783,7 +783,7 @@ struct tracing_map *tracing_map_create(unsigned int map_bits,
>  
>  	map->map_bits = map_bits;
>  	map->max_elts = (1 << map_bits);
> -	atomic_set(&map->next_elt, -1);
> +	atomic_set(&map->next_elt, 0);
>  
>  	map->map_size = (1 << (map_bits + 1));
>  	map->ops = ops;
Tze-nan Wu Aug. 7, 2024, 11:34 a.m. UTC | #2
On Tue, 2024-08-06 at 15:40 -0400, Steven Rostedt wrote:
> > -	idx = atomic_inc_return(&map->next_elt);
> > +	idx = atomic_fetch_add_unless(&map->next_elt, 1, map->max_elts);
> 
> I guess we need to add (with a comment):
> 
> 	idx--;
> 
Thanks for reviewing this!

According to "include/linux/atomic/atomic-instrumented.h",
`atomic_fetch_add_unless()` returns the original value, before the
addition happens.
Therefore, the values returned by `atomic_fetch_add_unless()` will
sequentially be:
    0, 1, 2, ..., max_elts-1, max_elts, max_elts, max_elts, ... (in this case)

I've also observed this result in my local test, so I believe it is
already what we want (elements are inserted into the map starting from
index 0).

Here's the comment from "atomic-instrumented.h":
  atomic_fetch_add_unless() - atomic add unless value with full ordering
  @v: pointer to atomic_t
  @a: int value to add
  @u: int value to compare with
  If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
  Otherwise, @v is not modified and relaxed ordering is provided.
  Unsafe to use in noinstr code; use raw_atomic_fetch_add_unless() there.
  Return: The original value of @v.

Thanks
-- Tze-nan

Steven Rostedt Aug. 7, 2024, 1:29 p.m. UTC | #3
On Wed, 7 Aug 2024 11:34:43 +0000
Tze-nan Wu (吳澤南) <Tze-nan.Wu@mediatek.com> wrote:

> > > -idx = atomic_inc_return(&map->next_elt);
> > > +idx = atomic_fetch_add_unless(&map->next_elt, 1, map->max_elts);  
> > 
> > I guess we need to add (with a comment):
> > 
> > idx--;
> >   
> Thanks for reviewing this!
> 
> By reference to "/include/linux/atomic/atomic-instrumented.h",
> `atomic_fetch_add_unless` will return the original value before the
> adding happens on operands.

Ah, right. I haven't used that one before and was thinking it behaved the
same as inc_return().

-- Steve

Patch

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index a4dcf0f24352..3a56e7c8aa4f 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -454,7 +454,7 @@  static struct tracing_map_elt *get_free_elt(struct tracing_map *map)
 	struct tracing_map_elt *elt = NULL;
 	int idx;
 
-	idx = atomic_inc_return(&map->next_elt);
+	idx = atomic_fetch_add_unless(&map->next_elt, 1, map->max_elts);
 	if (idx < map->max_elts) {
 		elt = *(TRACING_MAP_ELT(map->elts, idx));
 		if (map->ops && map->ops->elt_init)
@@ -699,7 +699,7 @@  void tracing_map_clear(struct tracing_map *map)
 {
 	unsigned int i;
 
-	atomic_set(&map->next_elt, -1);
+	atomic_set(&map->next_elt, 0);
 	atomic64_set(&map->hits, 0);
 	atomic64_set(&map->drops, 0);
 
@@ -783,7 +783,7 @@  struct tracing_map *tracing_map_create(unsigned int map_bits,
 
 	map->map_bits = map_bits;
 	map->max_elts = (1 << map_bits);
-	atomic_set(&map->next_elt, -1);
+	atomic_set(&map->next_elt, 0);
 
 	map->map_size = (1 << (map_bits + 1));
 	map->ops = ops;