diff mbox series

[v21,26/30] splice: Convert trace/seq to use copy_splice_read()

Message ID 20230520000049.2226926-27-dhowells@redhat.com (mailing list archive)
State Handled Elsewhere

Commit Message

David Howells May 20, 2023, midnight UTC
For the splice from the trace seq buffer, just use copy_splice_read().

In the future, something better can probably be done by gifting pages from
seq->buf into the pipe, but that would require changing seq->buf into a
vmap over an array of pages.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Steven Rostedt <rostedt@goodmis.org>
cc: Masami Hiramatsu <mhiramat@kernel.org>
cc: linux-kernel@vger.kernel.org
cc: linux-trace-kernel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
---
 kernel/trace/trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig May 20, 2023, 4:14 a.m. UTC | #1
s/splice/trace/ ?
Masami Hiramatsu (Google) May 21, 2023, 10:28 a.m. UTC | #2
Hi David,

On Sat, 20 May 2023 01:00:45 +0100
David Howells <dhowells@redhat.com> wrote:

> For the splice from the trace seq buffer, just use copy_splice_read().

So this is because you will remove generic_file_splice_read() (since
it's buggy), right?

> 
> In the future, something better can probably be done by gifting pages from
> seq->buf into the pipe, but that would require changing seq->buf into a
> vmap over an array of pages.

So what we need is to introduce a vmap? We introduced splice support to
avoid copying ring buffer pages, but this drops it, so it will reduce the
performance of splice on the ring buffer (trace file). If that is correct,
can you also add a note about it?

Thank you,

> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Steven Rostedt <rostedt@goodmis.org>
> cc: Masami Hiramatsu <mhiramat@kernel.org>
> cc: linux-kernel@vger.kernel.org
> cc: linux-trace-kernel@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-block@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>  kernel/trace/trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index ebc59781456a..c210d02fac97 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5171,7 +5171,7 @@ static const struct file_operations tracing_fops = {
>  	.open		= tracing_open,
>  	.read		= seq_read,
>  	.read_iter	= seq_read_iter,
> -	.splice_read	= generic_file_splice_read,
> +	.splice_read	= copy_splice_read,
>  	.write		= tracing_write_stub,
>  	.llseek		= tracing_lseek,
>  	.release	= tracing_release,
>
David Howells May 21, 2023, 12:50 p.m. UTC | #3
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> David Howells <dhowells@redhat.com> wrote:
> 
> > For the splice from the trace seq buffer, just use copy_splice_read().
> 
> So this is because you will remove generic_file_splice_read() (since
> it's buggy), right?

With other changes I want to make, an ITER_PIPE iterator has a problem if it
gets reverted: it may not be valid to control the lifetime of the data in the
buffer with get_page().  The pages may need a pin taken (FOLL_PIN), or the
lifetime might be controlled with kfree() or rmmod.

> > In the future, something better can probably be done by gifting pages from
> > seq->buf into the pipe, but that would require changing seq->buf into a
> > vmap over an array of pages.
> 
> ... We introduced splice support to avoid copying ring buffer pages, but
> this drops it, so it will reduce the performance of splice on the ring
> buffer (trace file). If that is correct, can you also add a note about it?

Actually, no.  There is no special splice support for tracing_fops.  You
currently use generic_file_splice_read(), which wends its way down into
seq_read_iter().  However, the seqfile stuff uses kvmalloc() to allocate the
buffer, and you are not allowed to splice page refs from kmalloc'd or
vmalloc'd memory into a pipe, so it doesn't.  It calls copy_to_iter(), which
will cause ITER_PIPE to allocate bufferage on an as-needed basis.

copy_splice_read() instead creates an ITER_BVEC and populates it up front
using the bulk allocator, so if you're splicing a lot of data, this ought to
be marginally faster.
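
As a rough illustration of the "populate it up front" pattern, here is a
userspace C model, not kernel code.  The names model_bvec,
model_copy_splice_read() and MODEL_PAGE_SIZE are hypothetical; malloc()
stands in for the bulk allocator and memcpy() for the read into the
ITER_BVEC.  It only sketches the allocate-all, fill, free-unused ordering.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for one bio_vec: a page and the bytes used in it. */
struct model_bvec {
	unsigned char *page;
	size_t len;
};

/* Free the filled pages and the array returned by model_copy_splice_read(). */
static void model_free(struct model_bvec *bv, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		free(bv[i].page);
	free(bv);
}

/*
 * Copy up to 'want' bytes from src into freshly allocated pages.
 * Returns the number of bytes copied, or -1 if an allocation fails.
 * On success, *out holds *nr_out filled entries.
 */
static long model_copy_splice_read(const unsigned char *src, size_t src_len,
				   size_t want, struct model_bvec **out,
				   size_t *nr_out)
{
	size_t npages = (want + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	size_t avail = src_len < want ? src_len : want;
	size_t i, used, copied = 0;
	struct model_bvec *bv;

	assert(out && nr_out);
	*out = NULL;
	*nr_out = 0;
	if (npages == 0)
		return 0;

	bv = calloc(npages, sizeof(*bv));
	if (!bv)
		return -1;

	/* Step 1: allocate every page before any data is copied. */
	for (i = 0; i < npages; i++) {
		bv[i].page = malloc(MODEL_PAGE_SIZE);
		if (!bv[i].page) {
			model_free(bv, i);
			return -1;
		}
	}

	/* Step 2: the "read" fills the pages in order. */
	for (i = 0; i < npages && copied < avail; i++) {
		size_t chunk = avail - copied;

		if (chunk > MODEL_PAGE_SIZE)
			chunk = MODEL_PAGE_SIZE;
		memcpy(bv[i].page, src + copied, chunk);
		bv[i].len = chunk;
		copied += chunk;
	}
	used = i;

	/* Step 3: a short read leaves pages unused; give those back. */
	for (i = used; i < npages; i++) {
		free(bv[i].page);
		bv[i].page = NULL;
	}

	*out = bv;
	*nr_out = used;
	return (long)copied;
}
```

In this model a request for 9000 bytes allocates three pages at once, and a
source that only yields 100 bytes ends up handing over a single page.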

> So what we need is to introduce a vmap?

We could implement seq_splice_read().  What we would need to do is to change
how the buffer is allocated: bulk allocate a bunch of arbitrary pages which we
then vmap().  When we need to splice, we read into the buffer, do a vunmap()
and then splice the pages holding the data we used into the pipe.

If we don't manage to splice all the data, we can continue splicing from the
pages we have left next time.  If a read() comes along to view partially
spliced data, we would need to copy from the individual pages.

When we use up all the data, we discard all the pages we might have spliced
from and shuffle down the other pages, call the bulk allocator to replenish
the buffer and then vmap() it again.

Any pages we've spliced from must be discarded and replaced and not rewritten.

If a read() comes without the buffer having been spliced from, it can do as it
does now.
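
The scheme above can be modelled very roughly in userspace C.  This is only
a sketch under stated assumptions, not a proposed seq_splice_read(): struct
seq_model and the seq_model_*() helpers are hypothetical, malloc() stands in
for the bulk allocator plus vmap(), and handing a pointer to the caller
stands in for splicing a page into the pipe.  It gifts whole pages only and
shuffles down immediately rather than waiting until all the data is used up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SEQ_PAGE_SIZE 4096u	/* assumed page size for this model */
#define SEQ_NR_PAGES  4u	/* assumed buffer size in pages */

/* Hypothetical seq buffer backed by an array of separately owned pages. */
struct seq_model {
	unsigned char *pages[SEQ_NR_PAGES];
	size_t count;		/* bytes of valid data, starting at page 0 */
};

/* Replenish empty slots; stands in for the bulk allocator plus vmap(). */
static int seq_model_refill(struct seq_model *m)
{
	size_t i;

	for (i = 0; i < SEQ_NR_PAGES; i++) {
		if (!m->pages[i]) {
			m->pages[i] = malloc(SEQ_PAGE_SIZE);
			if (!m->pages[i])
				return -1;
		}
	}
	return 0;
}

/* Fill the buffer from src; stands in for generating the seq output. */
static size_t seq_model_fill(struct seq_model *m, const unsigned char *src,
			     size_t len)
{
	size_t cap = (size_t)SEQ_NR_PAGES * SEQ_PAGE_SIZE;
	size_t off = 0, i;

	if (len > cap)
		len = cap;
	for (i = 0; off < len; i++) {
		size_t chunk = len - off;

		if (chunk > SEQ_PAGE_SIZE)
			chunk = SEQ_PAGE_SIZE;
		assert(m->pages[i]);
		memcpy(m->pages[i], src + off, chunk);
		off += chunk;
	}
	m->count = len;
	return len;
}

/*
 * Gift up to max_pages data-bearing pages to the caller, who now owns them.
 * The remaining pages are shuffled down and the vacated slots left empty,
 * so a gifted page is replaced by seq_model_refill(), never rewritten.
 */
static size_t seq_model_splice(struct seq_model *m, unsigned char **gifted,
			       size_t *lens, size_t max_pages)
{
	size_t have = (m->count + SEQ_PAGE_SIZE - 1) / SEQ_PAGE_SIZE;
	size_t n = have < max_pages ? have : max_pages;
	size_t i, bytes = 0;

	for (i = 0; i < n; i++) {
		size_t left = m->count - bytes;

		lens[i] = left < SEQ_PAGE_SIZE ? left : SEQ_PAGE_SIZE;
		gifted[i] = m->pages[i];
		bytes += lens[i];
	}
	for (i = 0; i + n < SEQ_NR_PAGES; i++)
		m->pages[i] = m->pages[i + n];
	for (; i < SEQ_NR_PAGES; i++)
		m->pages[i] = NULL;
	m->count -= bytes;
	return n;
}
```

After a splice of one page from a 6000-byte buffer, the model is left with
the remaining 1904 bytes at the front and an empty slot at the end, which the
next seq_model_refill() call fills with a fresh page.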

David
Steven Rostedt May 23, 2023, 2:27 p.m. UTC | #4
On Sat, 20 May 2023 01:00:45 +0100
David Howells <dhowells@redhat.com> wrote:

> For the splice from the trace seq buffer, just use copy_splice_read().
> 
> In the future, something better can probably be done by gifting pages from
> seq->buf into the pipe, but that would require changing seq->buf into a
> vmap over an array of pages.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Steven Rostedt <rostedt@goodmis.org>
> cc: Masami Hiramatsu <mhiramat@kernel.org>
> cc: linux-kernel@vger.kernel.org
> cc: linux-trace-kernel@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-block@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>  kernel/trace/trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index ebc59781456a..c210d02fac97 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5171,7 +5171,7 @@ static const struct file_operations tracing_fops = {
>  	.open		= tracing_open,
>  	.read		= seq_read,
>  	.read_iter	= seq_read_iter,
> -	.splice_read	= generic_file_splice_read,
> +	.splice_read	= copy_splice_read,

Anyway, for this change:

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

>  	.write		= tracing_write_stub,
>  	.llseek		= tracing_lseek,
>  	.release	= tracing_release,

Patch

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ebc59781456a..c210d02fac97 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5171,7 +5171,7 @@ static const struct file_operations tracing_fops = {
 	.open		= tracing_open,
 	.read		= seq_read,
 	.read_iter	= seq_read_iter,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= copy_splice_read,
 	.write		= tracing_write_stub,
 	.llseek		= tracing_lseek,
 	.release	= tracing_release,