diff mbox series

tracing: Fix to avoid wakeup loop in splice read of per-cpu buffer

Message ID 169262755804.106231.8245792908363050528.stgit@devnote2 (mailing list archive)
State Changes Requested
Headers show
Series tracing: Fix to avoid wakeup loop in splice read of per-cpu buffer | expand

Commit Message

Masami Hiramatsu (Google) Aug. 21, 2023, 2:19 p.m. UTC
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

ftrace user can set 0 or small number to the 'buffer_percent' for quick
response for the ring buffer. In that case wait_on_pipe() will return
before filling a page of the ring buffer. That is too soon for splice()
because ring_buffer_read_page() will fail again.
This leads unnecessary loop in tracing_buffers_splice_read().

Set a minimum percentage of the buffer which is enough to fill a page to
wait_on_pipe() to avoid this situation.

Fixes: 03329f993978 ("tracing: Add tracefs file buffer_percentage")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/trace.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Masami Hiramatsu (Google) Aug. 21, 2023, 3:16 p.m. UTC | #1
On Mon, 21 Aug 2023 23:19:18 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> ftrace user can set 0 or small number to the 'buffer_percent' for quick
> response for the ring buffer. In that case wait_on_pipe() will return
> before filling a page of the ring buffer. That is too soon for splice()
> because ring_buffer_read_page() will fail again.
> This leads unnecessary loop in tracing_buffers_splice_read().
> 
> Set a minimum percentage of the buffer which is enough to fill a page to
> wait_on_pipe() to avoid this situation.
> 
> Fixes: 03329f993978 ("tracing: Add tracefs file buffer_percentage")
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

So this is what I found when I played with virtio-trace.

I think now trace-cmd has richer feature, so it can be replaced with that,
maybe what we need is a selftest that ensures trace_pipe_raw supports
splice correctly.

BTW, even with this fix, blocking splice still causes a strange behavior.
If I set '400' to buffer_size_kb (so 100 pages) and '1' to buffer_percent,
splice always returns 8192 (2 pages) to read. But I expected that should
return 4096 (1 page). This means splice() waits longer than I thought.

I think the fullfilled percentage calculation will be a bit wrong.

Thank you,

> ---
>  kernel/trace/trace.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index b8870078ef58..88448e8d8214 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -8462,6 +8462,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
>  	/* did we read anything? */
>  	if (!spd.nr_pages) {
>  		long wait_index;
> +		size_t nr_pages;
> +		size_t full;
>  
>  		if (ret)
>  			goto out;
> @@ -8472,7 +8474,15 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
>  
>  		wait_index = READ_ONCE(iter->wait_index);
>  
> -		ret = wait_on_pipe(iter, iter->tr->buffer_percent);
> +		/* For splice, we have to ensure at least 1 page is filled */
> +		nr_pages = ring_buffer_nr_pages(iter->array_buffer->buffer, iter->cpu_file);
> +		if (nr_pages * iter->tr->buffer_percent < 100) {
> +			full = nr_pages + 99;
> +			do_div(full, nr_pages);
> +		} else
> +			full = iter->tr->buffer_percent;
> +
> +		ret = wait_on_pipe(iter, full);
>  		if (ret)
>  			goto out;
>  
>
Steven Rostedt Aug. 21, 2023, 3:27 p.m. UTC | #2
On Tue, 22 Aug 2023 00:16:39 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> BTW, even with this fix, blocking splice still causes a strange behavior.
> If I set '400' to buffer_size_kb (so 100 pages) and '1' to buffer_percent,
> splice always returns 8192 (2 pages) to read. But I expected that should
> return 4096 (1 page). This means splice() waits longer than I thought.
> 
> I think the fullfilled percentage calculation will be a bit wrong.

Note, the percentage is when to wake up the task. If between the wakeup and
the read/splice, another ring buffer page gets filled more, then it will
give that as well. The buffer_percent is just how long to wait, not for how
much to read.

Now if you test this with just adding enough to fill one page, and it
doesn't wake up the reader, then that would be a bug.

-- Steve
Masami Hiramatsu (Google) Aug. 21, 2023, 11:50 p.m. UTC | #3
On Mon, 21 Aug 2023 11:27:03 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 22 Aug 2023 00:16:39 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > BTW, even with this fix, blocking splice still causes a strange behavior.
> > If I set '400' to buffer_size_kb (so 100 pages) and '1' to buffer_percent,
> > splice always returns 8192 (2 pages) to read. But I expected that should
> > return 4096 (1 page). This means splice() waits longer than I thought.
> > 
> > I think the fullfilled percentage calculation will be a bit wrong.
> 
> Note, the percentage is when to wake up the task. If between the wakeup and
> the read/splice, another ring buffer page gets filled more, then it will
> give that as well. The buffer_percent is just how long to wait, not for how
> much to read.

Yes, but I used the trace_marker as Zheng did for testing, and I saw the
buffer is filled more than 4096 bytes via tracefs/per_cpu/cpu*/stats.

> 
> Now if you test this with just adding enough to fill one page, and it
> doesn't wake up the reader, then that would be a bug.

Yes.

Run trace_agent with 1% buffer_percent == 1 page;

 # echo 400 > /sys/kernel/tracing/buffer_size_kb
 # echo 1 > /sys/kernel/tracing/buffer_percent
 # trace-agent &

Ready to read in the host side,

$ for i in `seq 0 7`; do cat trace-path-cpu$i.out > trace-data.$i & done
$ echo 1 > agent-ctl-path.in

Write events on per-cpu buffer in the guest
 # for i in `seq 1 1000`; do echo "test event data $i" | tee /sys/kernel/tracing/trace_marker > /dev/null; done 

But trace_agent doesn't wake up. The data is still there.

 # grep bytes /sys/kernel/tracing/per_cpu/cpu*/stats 
/sys/kernel/tracing/per_cpu/cpu0/stats:bytes: 5936
/sys/kernel/tracing/per_cpu/cpu1/stats:bytes: 5584
/sys/kernel/tracing/per_cpu/cpu2/stats:bytes: 6368
/sys/kernel/tracing/per_cpu/cpu3/stats:bytes: 4704
/sys/kernel/tracing/per_cpu/cpu4/stats:bytes: 5972
/sys/kernel/tracing/per_cpu/cpu5/stats:bytes: 5620
/sys/kernel/tracing/per_cpu/cpu6/stats:bytes: 6588
/sys/kernel/tracing/per_cpu/cpu7/stats:bytes: 3496

And write more events via trace_marker again

 # for i in `seq 1001 2000`; do echo "test event data $i" | tee /sys/kernel/tracing/trace_marker > /dev/null; done 

The data is read;

 # grep bytes /sys/kernel/tracing/per_cpu/cpu*/stats 
/sys/kernel/tracing/per_cpu/cpu0/stats:bytes: 3220
/sys/kernel/tracing/per_cpu/cpu1/stats:bytes: 3960
/sys/kernel/tracing/per_cpu/cpu2/stats:bytes: 2420
/sys/kernel/tracing/per_cpu/cpu3/stats:bytes: 2024
/sys/kernel/tracing/per_cpu/cpu4/stats:bytes: 2912
/sys/kernel/tracing/per_cpu/cpu5/stats:bytes: 1064
/sys/kernel/tracing/per_cpu/cpu6/stats:bytes: 5004
/sys/kernel/tracing/per_cpu/cpu7/stats:bytes: 2684

I think this full_hit() function is somewhat wrong.

static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
{
...
        return (dirty * 100) > (full * nr_pages);
}

Ah, I also found a mistake on this patch too.

Thanks,
Masami Hiramatsu (Google) Aug. 22, 2023, 12:15 a.m. UTC | #4
On Mon, 21 Aug 2023 23:19:18 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> ftrace user can set 0 or small number to the 'buffer_percent' for quick
> response for the ring buffer. In that case wait_on_pipe() will return
> before filling a page of the ring buffer. That is too soon for splice()
> because ring_buffer_read_page() will fail again.
> This leads unnecessary loop in tracing_buffers_splice_read().
> 
> Set a minimum percentage of the buffer which is enough to fill a page to
> wait_on_pipe() to avoid this situation.
> 
> Fixes: 03329f993978 ("tracing: Add tracefs file buffer_percentage")
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  kernel/trace/trace.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index b8870078ef58..88448e8d8214 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -8462,6 +8462,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
>  	/* did we read anything? */
>  	if (!spd.nr_pages) {
>  		long wait_index;
> +		size_t nr_pages;
> +		size_t full;
>  
>  		if (ret)
>  			goto out;
> @@ -8472,7 +8474,15 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
>  
>  		wait_index = READ_ONCE(iter->wait_index);
>  
> -		ret = wait_on_pipe(iter, iter->tr->buffer_percent);
> +		/* For splice, we have to ensure at least 1 page is filled */
> +		nr_pages = ring_buffer_nr_pages(iter->array_buffer->buffer, iter->cpu_file);
> +		if (nr_pages * iter->tr->buffer_percent < 100) {
> +			full = nr_pages + 99;
> +			do_div(full, nr_pages);
> +		} else
> +			full = iter->tr->buffer_percent;

Ah I must have to take a sleep well. What I need is to ensure full >= 1.

static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
{
...
        return (dirty * 100) > (full * nr_pages);
}

If dirty = 0, this always false.
But I think if full == 0, this should return true.

If dirty = 1,

- nr_pages < 100, this is always true and that is good.

- nr_pages > 100, even if full is 1 (smallest), it doesn't true. But that is OK
  because dirty page number will be increased later.

- nr_pages == 100 is the corner case. I think this should be

  return (dirty * 100) >= (full * nr_pages);

Let me update the patch.

Thank you,

> +
> +		ret = wait_on_pipe(iter, full);
>  		if (ret)
>  			goto out;
>  
>
kernel test robot Aug. 22, 2023, 4 a.m. UTC | #5
Hi Masami,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.5-rc7 next-20230821]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/tracing-Fix-to-avoid-wakeup-loop-in-splice-read-of-per-cpu-buffer/20230821-222307
base:   linus/master
patch link:    https://lore.kernel.org/r/169262755804.106231.8245792908363050528.stgit%40devnote2
patch subject: [PATCH] tracing: Fix to avoid wakeup loop in splice read of per-cpu buffer
config: mips-allyesconfig (https://download.01.org/0day-ci/archive/20230822/202308221154.hil5JaxI-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230822/202308221154.hil5JaxI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308221154.hil5JaxI-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/mips/include/asm/div64.h:89,
                    from include/linux/math.h:6,
                    from include/linux/math64.h:6,
                    from include/linux/time64.h:5,
                    from include/linux/restart_block.h:10,
                    from include/linux/thread_info.h:14,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/mips/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:79,
                    from include/linux/spinlock.h:56,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:7,
                    from include/linux/mm.h:7,
                    from include/linux/ring_buffer.h:5,
                    from kernel/trace/trace.c:15:
   kernel/trace/trace.c: In function 'tracing_buffers_splice_read':
>> include/asm-generic/div64.h:222:35: warning: comparison of distinct pointer types lacks a cast
     222 |         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
         |                                   ^~
   kernel/trace/trace.c:8481:25: note: in expansion of macro 'do_div'
    8481 |                         do_div(full, nr_pages);
         |                         ^~~~~~
   In file included from arch/mips/include/asm/bug.h:5,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6:
>> include/asm-generic/div64.h:234:32: warning: right shift count >= width of type [-Wshift-count-overflow]
     234 |         } else if (likely(((n) >> 32) == 0)) {          \
         |                                ^~
   include/linux/compiler.h:76:45: note: in definition of macro 'likely'
      76 | # define likely(x)      __builtin_expect(!!(x), 1)
         |                                             ^
   kernel/trace/trace.c:8481:25: note: in expansion of macro 'do_div'
    8481 |                         do_div(full, nr_pages);
         |                         ^~~~~~


vim +222 include/asm-generic/div64.h

^1da177e4c3f41 Linus Torvalds     2005-04-16  215  
^1da177e4c3f41 Linus Torvalds     2005-04-16  216  /* The unnecessary pointer compare is there
^1da177e4c3f41 Linus Torvalds     2005-04-16  217   * to check for type safety (n must be 64bit)
^1da177e4c3f41 Linus Torvalds     2005-04-16  218   */
^1da177e4c3f41 Linus Torvalds     2005-04-16  219  # define do_div(n,base) ({				\
^1da177e4c3f41 Linus Torvalds     2005-04-16  220  	uint32_t __base = (base);			\
^1da177e4c3f41 Linus Torvalds     2005-04-16  221  	uint32_t __rem;					\
^1da177e4c3f41 Linus Torvalds     2005-04-16 @222  	(void)(((typeof((n)) *)0) == ((uint64_t *)0));	\
911918aa7ef6f8 Nicolas Pitre      2015-11-02  223  	if (__builtin_constant_p(__base) &&		\
911918aa7ef6f8 Nicolas Pitre      2015-11-02  224  	    is_power_of_2(__base)) {			\
911918aa7ef6f8 Nicolas Pitre      2015-11-02  225  		__rem = (n) & (__base - 1);		\
911918aa7ef6f8 Nicolas Pitre      2015-11-02  226  		(n) >>= ilog2(__base);			\
c747ce4706190e Geert Uytterhoeven 2021-08-11  227  	} else if (__builtin_constant_p(__base) &&	\
461a5e51060c93 Nicolas Pitre      2015-10-30  228  		   __base != 0) {			\
461a5e51060c93 Nicolas Pitre      2015-10-30  229  		uint32_t __res_lo, __n_lo = (n);	\
461a5e51060c93 Nicolas Pitre      2015-10-30  230  		(n) = __div64_const32(n, __base);	\
461a5e51060c93 Nicolas Pitre      2015-10-30  231  		/* the remainder can be computed with 32-bit regs */ \
461a5e51060c93 Nicolas Pitre      2015-10-30  232  		__res_lo = (n);				\
461a5e51060c93 Nicolas Pitre      2015-10-30  233  		__rem = __n_lo - __res_lo * __base;	\
911918aa7ef6f8 Nicolas Pitre      2015-11-02 @234  	} else if (likely(((n) >> 32) == 0)) {		\
^1da177e4c3f41 Linus Torvalds     2005-04-16  235  		__rem = (uint32_t)(n) % __base;		\
^1da177e4c3f41 Linus Torvalds     2005-04-16  236  		(n) = (uint32_t)(n) / __base;		\
c747ce4706190e Geert Uytterhoeven 2021-08-11  237  	} else {					\
^1da177e4c3f41 Linus Torvalds     2005-04-16  238  		__rem = __div64_32(&(n), __base);	\
c747ce4706190e Geert Uytterhoeven 2021-08-11  239  	}						\
^1da177e4c3f41 Linus Torvalds     2005-04-16  240  	__rem;						\
^1da177e4c3f41 Linus Torvalds     2005-04-16  241   })
^1da177e4c3f41 Linus Torvalds     2005-04-16  242
kernel test robot Aug. 22, 2023, 6:56 a.m. UTC | #6
Hi Masami,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.5-rc7 next-20230821]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/tracing-Fix-to-avoid-wakeup-loop-in-splice-read-of-per-cpu-buffer/20230821-222307
base:   linus/master
patch link:    https://lore.kernel.org/r/169262755804.106231.8245792908363050528.stgit%40devnote2
patch subject: [PATCH] tracing: Fix to avoid wakeup loop in splice read of per-cpu buffer
config: arm-randconfig-r004-20230822 (https://download.01.org/0day-ci/archive/20230822/202308221429.cY9ifC65-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230822/202308221429.cY9ifC65-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308221429.cY9ifC65-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> kernel/trace/trace.c:8481:4: warning: comparison of distinct pointer types ('typeof ((full)) *' (aka 'unsigned int *') and 'uint64_t *' (aka 'unsigned long long *')) [-Wcompare-distinct-pointer-types]
    8481 |                         do_div(full, nr_pages);
         |                         ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/div64.h:222:28: note: expanded from macro 'do_div'
     222 |         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
         |                ~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~
>> kernel/trace/trace.c:8481:4: error: incompatible pointer types passing 'size_t *' (aka 'unsigned int *') to parameter of type 'uint64_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
    8481 |                         do_div(full, nr_pages);
         |                         ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/div64.h:238:22: note: expanded from macro 'do_div'
     238 |                 __rem = __div64_32(&(n), __base);       \
         |                                    ^~~~
   arch/arm/include/asm/div64.h:24:45: note: passing argument to parameter 'n' here
      24 | static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
         |                                             ^
>> kernel/trace/trace.c:8481:4: warning: shift count >= width of type [-Wshift-count-overflow]
    8481 |                         do_div(full, nr_pages);
         |                         ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/div64.h:234:25: note: expanded from macro 'do_div'
     234 |         } else if (likely(((n) >> 32) == 0)) {          \
         |                                ^  ~~
   include/linux/compiler.h:76:40: note: expanded from macro 'likely'
      76 | # define likely(x)      __builtin_expect(!!(x), 1)
         |                                             ^
   2 warnings and 1 error generated.


vim +8481 kernel/trace/trace.c

  8400	
  8401		if (*ppos & (PAGE_SIZE - 1))
  8402			return -EINVAL;
  8403	
  8404		if (len & (PAGE_SIZE - 1)) {
  8405			if (len < PAGE_SIZE)
  8406				return -EINVAL;
  8407			len &= PAGE_MASK;
  8408		}
  8409	
  8410		if (splice_grow_spd(pipe, &spd))
  8411			return -ENOMEM;
  8412	
  8413	 again:
  8414		trace_access_lock(iter->cpu_file);
  8415		entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
  8416	
  8417		for (i = 0; i < spd.nr_pages_max && len && entries; i++, len -= PAGE_SIZE) {
  8418			struct page *page;
  8419			int r;
  8420	
  8421			ref = kzalloc(sizeof(*ref), GFP_KERNEL);
  8422			if (!ref) {
  8423				ret = -ENOMEM;
  8424				break;
  8425			}
  8426	
  8427			refcount_set(&ref->refcount, 1);
  8428			ref->buffer = iter->array_buffer->buffer;
  8429			ref->page = ring_buffer_alloc_read_page(ref->buffer, iter->cpu_file);
  8430			if (IS_ERR(ref->page)) {
  8431				ret = PTR_ERR(ref->page);
  8432				ref->page = NULL;
  8433				kfree(ref);
  8434				break;
  8435			}
  8436			ref->cpu = iter->cpu_file;
  8437	
  8438			r = ring_buffer_read_page(ref->buffer, &ref->page,
  8439						  len, iter->cpu_file, 1);
  8440			if (r < 0) {
  8441				ring_buffer_free_read_page(ref->buffer, ref->cpu,
  8442							   ref->page);
  8443				kfree(ref);
  8444				break;
  8445			}
  8446	
  8447			page = virt_to_page(ref->page);
  8448	
  8449			spd.pages[i] = page;
  8450			spd.partial[i].len = PAGE_SIZE;
  8451			spd.partial[i].offset = 0;
  8452			spd.partial[i].private = (unsigned long)ref;
  8453			spd.nr_pages++;
  8454			*ppos += PAGE_SIZE;
  8455	
  8456			entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
  8457		}
  8458	
  8459		trace_access_unlock(iter->cpu_file);
  8460		spd.nr_pages = i;
  8461	
  8462		/* did we read anything? */
  8463		if (!spd.nr_pages) {
  8464			long wait_index;
  8465			size_t nr_pages;
  8466			size_t full;
  8467	
  8468			if (ret)
  8469				goto out;
  8470	
  8471			ret = -EAGAIN;
  8472			if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
  8473				goto out;
  8474	
  8475			wait_index = READ_ONCE(iter->wait_index);
  8476	
  8477			/* For splice, we have to ensure at least 1 page is filled */
  8478			nr_pages = ring_buffer_nr_pages(iter->array_buffer->buffer, iter->cpu_file);
  8479			if (nr_pages * iter->tr->buffer_percent < 100) {
  8480				full = nr_pages + 99;
> 8481				do_div(full, nr_pages);
  8482			} else
  8483				full = iter->tr->buffer_percent;
  8484	
  8485			ret = wait_on_pipe(iter, full);
  8486			if (ret)
  8487				goto out;
  8488	
  8489			/* No need to wait after waking up when tracing is off */
  8490			if (!tracer_tracing_is_on(iter->tr))
  8491				goto out;
  8492	
  8493			/* Make sure we see the new wait_index */
  8494			smp_rmb();
  8495			if (wait_index != iter->wait_index)
  8496				goto out;
  8497	
  8498			goto again;
  8499		}
  8500	
  8501		ret = splice_to_pipe(pipe, &spd);
  8502	out:
  8503		splice_shrink_spd(&spd);
  8504	
  8505		return ret;
  8506	}
  8507
Steven Rostedt Aug. 22, 2023, 8:59 a.m. UTC | #7
On Mon, 21 Aug 2023 23:19:18 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

>  	if (!spd.nr_pages) {
>  		long wait_index;
> +		size_t nr_pages;
> +		size_t full;

size_t is usually considered "long" (machine word length).

>  
>  		if (ret)
>  			goto out;
> @@ -8472,7 +8474,15 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
>  
>  		wait_index = READ_ONCE(iter->wait_index);
>  
> -		ret = wait_on_pipe(iter, iter->tr->buffer_percent);
> +		/* For splice, we have to ensure at least 1 page is filled */
> +		nr_pages = ring_buffer_nr_pages(iter->array_buffer->buffer, iter->cpu_file);
> +		if (nr_pages * iter->tr->buffer_percent < 100) {
> +			full = nr_pages + 99;
> +			do_div(full, nr_pages);

No need for do_div() as full is not 64 bit on 32 bit machines.

That's why the kernel test robot is complaining.

-- Steve

> +		} else
> +			full = iter->tr->buffer_percent;
> +
> +		ret = wait_on_pipe(iter, full);
Masami Hiramatsu (Google) Aug. 22, 2023, 12:51 p.m. UTC | #8
On Tue, 22 Aug 2023 04:59:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 21 Aug 2023 23:19:18 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> >  	if (!spd.nr_pages) {
> >  		long wait_index;
> > +		size_t nr_pages;
> > +		size_t full;
> 
> size_t is usually considered "long" (machine word length).
> 
> >  
> >  		if (ret)
> >  			goto out;
> > @@ -8472,7 +8474,15 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
> >  
> >  		wait_index = READ_ONCE(iter->wait_index);
> >  
> > -		ret = wait_on_pipe(iter, iter->tr->buffer_percent);
> > +		/* For splice, we have to ensure at least 1 page is filled */
> > +		nr_pages = ring_buffer_nr_pages(iter->array_buffer->buffer, iter->cpu_file);
> > +		if (nr_pages * iter->tr->buffer_percent < 100) {
> > +			full = nr_pages + 99;
> > +			do_div(full, nr_pages);
> 
> No need for do_div() as full is not 64 bit on 32 bit machines.
> 
> That's why the kernel test robot is complaining.

Thanks for the comment. I decided to set full = 1, maybe we don't need to set
the percent to be the page size because full_hit() checks it by the number
of dirty pages. :)
I'm testing a new one.

Thanks,

> 
> -- Steve
> 
> > +		} else
> > +			full = iter->tr->buffer_percent;
> > +
> > +		ret = wait_on_pipe(iter, full);
kernel test robot Aug. 23, 2023, 12:18 p.m. UTC | #9
Hi Masami,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.5-rc7 next-20230823]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/tracing-Fix-to-avoid-wakeup-loop-in-splice-read-of-per-cpu-buffer/20230821-222307
base:   linus/master
patch link:    https://lore.kernel.org/r/169262755804.106231.8245792908363050528.stgit%40devnote2
patch subject: [PATCH] tracing: Fix to avoid wakeup loop in splice read of per-cpu buffer
config: mips-randconfig-r131-20230823 (https://download.01.org/0day-ci/archive/20230823/202308232056.401OxapL-lkp@intel.com/config)
compiler: mipsel-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230823/202308232056.401OxapL-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308232056.401OxapL-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   kernel/trace/trace.c:446:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct trace_export **list @@     got struct trace_export [noderef] __rcu ** @@
   kernel/trace/trace.c:446:28: sparse:     expected struct trace_export **list
   kernel/trace/trace.c:446:28: sparse:     got struct trace_export [noderef] __rcu **
   kernel/trace/trace.c:460:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct trace_export **list @@     got struct trace_export [noderef] __rcu ** @@
   kernel/trace/trace.c:460:33: sparse:     expected struct trace_export **list
   kernel/trace/trace.c:460:33: sparse:     got struct trace_export [noderef] __rcu **
   kernel/trace/trace.c:2959:38: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct event_filter *filter @@     got struct event_filter [noderef] __rcu *filter @@
   kernel/trace/trace.c:2959:38: sparse:     expected struct event_filter *filter
   kernel/trace/trace.c:2959:38: sparse:     got struct event_filter [noderef] __rcu *filter
>> kernel/trace/trace.c:8481:25: sparse: sparse: incompatible types in comparison expression (different type sizes):
>> kernel/trace/trace.c:8481:25: sparse:    unsigned int *
>> kernel/trace/trace.c:8481:25: sparse:    unsigned long long [usertype] *
   kernel/trace/trace.c:400:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/trace/trace.c:400:9: sparse:    struct trace_export [noderef] __rcu *
   kernel/trace/trace.c:400:9: sparse:    struct trace_export *
   kernel/trace/trace.c:415:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/trace/trace.c:415:9: sparse:    struct trace_export [noderef] __rcu *
   kernel/trace/trace.c:415:9: sparse:    struct trace_export *
>> kernel/trace/trace.c:8481:25: sparse: sparse: shift too big (32) for type unsigned int

vim +8481 kernel/trace/trace.c

  8400	
  8401		if (*ppos & (PAGE_SIZE - 1))
  8402			return -EINVAL;
  8403	
  8404		if (len & (PAGE_SIZE - 1)) {
  8405			if (len < PAGE_SIZE)
  8406				return -EINVAL;
  8407			len &= PAGE_MASK;
  8408		}
  8409	
  8410		if (splice_grow_spd(pipe, &spd))
  8411			return -ENOMEM;
  8412	
  8413	 again:
  8414		trace_access_lock(iter->cpu_file);
  8415		entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
  8416	
  8417		for (i = 0; i < spd.nr_pages_max && len && entries; i++, len -= PAGE_SIZE) {
  8418			struct page *page;
  8419			int r;
  8420	
  8421			ref = kzalloc(sizeof(*ref), GFP_KERNEL);
  8422			if (!ref) {
  8423				ret = -ENOMEM;
  8424				break;
  8425			}
  8426	
  8427			refcount_set(&ref->refcount, 1);
  8428			ref->buffer = iter->array_buffer->buffer;
  8429			ref->page = ring_buffer_alloc_read_page(ref->buffer, iter->cpu_file);
  8430			if (IS_ERR(ref->page)) {
  8431				ret = PTR_ERR(ref->page);
  8432				ref->page = NULL;
  8433				kfree(ref);
  8434				break;
  8435			}
  8436			ref->cpu = iter->cpu_file;
  8437	
  8438			r = ring_buffer_read_page(ref->buffer, &ref->page,
  8439						  len, iter->cpu_file, 1);
  8440			if (r < 0) {
  8441				ring_buffer_free_read_page(ref->buffer, ref->cpu,
  8442							   ref->page);
  8443				kfree(ref);
  8444				break;
  8445			}
  8446	
  8447			page = virt_to_page(ref->page);
  8448	
  8449			spd.pages[i] = page;
  8450			spd.partial[i].len = PAGE_SIZE;
  8451			spd.partial[i].offset = 0;
  8452			spd.partial[i].private = (unsigned long)ref;
  8453			spd.nr_pages++;
  8454			*ppos += PAGE_SIZE;
  8455	
  8456			entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
  8457		}
  8458	
  8459		trace_access_unlock(iter->cpu_file);
  8460		spd.nr_pages = i;
  8461	
  8462		/* did we read anything? */
  8463		if (!spd.nr_pages) {
  8464			long wait_index;
  8465			size_t nr_pages;
  8466			size_t full;
  8467	
  8468			if (ret)
  8469				goto out;
  8470	
  8471			ret = -EAGAIN;
  8472			if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
  8473				goto out;
  8474	
  8475			wait_index = READ_ONCE(iter->wait_index);
  8476	
  8477			/* For splice, we have to ensure at least 1 page is filled */
  8478			nr_pages = ring_buffer_nr_pages(iter->array_buffer->buffer, iter->cpu_file);
  8479			if (nr_pages * iter->tr->buffer_percent < 100) {
  8480				full = nr_pages + 99;
> 8481				do_div(full, nr_pages);
  8482			} else
  8483				full = iter->tr->buffer_percent;
  8484	
  8485			ret = wait_on_pipe(iter, full);
  8486			if (ret)
  8487				goto out;
  8488	
  8489			/* No need to wait after waking up when tracing is off */
  8490			if (!tracer_tracing_is_on(iter->tr))
  8491				goto out;
  8492	
  8493			/* Make sure we see the new wait_index */
  8494			smp_rmb();
  8495			if (wait_index != iter->wait_index)
  8496				goto out;
  8497	
  8498			goto again;
  8499		}
  8500	
  8501		ret = splice_to_pipe(pipe, &spd);
  8502	out:
  8503		splice_shrink_spd(&spd);
  8504	
  8505		return ret;
  8506	}
  8507
diff mbox series

Patch

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b8870078ef58..88448e8d8214 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8462,6 +8462,8 @@  tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 	/* did we read anything? */
 	if (!spd.nr_pages) {
 		long wait_index;
+		size_t nr_pages;
+		size_t full;
 
 		if (ret)
 			goto out;
@@ -8472,7 +8474,15 @@  tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 
 		wait_index = READ_ONCE(iter->wait_index);
 
-		ret = wait_on_pipe(iter, iter->tr->buffer_percent);
+		/* For splice, we have to ensure at least 1 page is filled */
+		nr_pages = ring_buffer_nr_pages(iter->array_buffer->buffer, iter->cpu_file);
+		if (nr_pages * iter->tr->buffer_percent < 100) {
+			full = nr_pages + 99;
+			do_div(full, nr_pages);
+		} else
+			full = iter->tr->buffer_percent;
+
+		ret = wait_on_pipe(iter, full);
 		if (ret)
 			goto out;