[3/2] splice: increase pipe size in splice_direct_to_actor()
diff mbox series

Message ID 20181109005410.GG19305@dastard
State New
Headers show
Series
  • : dedupe/copy_file_range fixes
Related show

Commit Message

Dave Chinner Nov. 9, 2018, 12:54 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When copy_file_range() is called on files that have been opened with
O_DIRECT, do_splice_direct() does a manual copy of the range one
pipe buffer at a time. The default is 16 pages, which means on
x86_64 it is limited to 64kB IO. This is extremely slow - 64k
synchrnous read/write will run at maybe 5-10MB/s on a spinning disk
and be seek bound. It will be faster on SSDs, but still very
inefficient.

Increase the pipe size to the maximum allowed user size so that we
can get decent throughput for this highly sub-optimal copy loop. Add
a new function to the pipe code that lets us set the pipe size to
the maximum allowed without root permissions to keep things really
simple. We also don't care if changing the pipe size fails - that
will just result in a slower copy.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/pipe.c                 | 10 ++++++++++
 fs/splice.c               |  7 +++++++
 include/linux/pipe_fs_i.h |  1 +
 3 files changed, 18 insertions(+)

Comments

Darrick J. Wong Nov. 13, 2018, 4:22 p.m. UTC | #1
On Fri, Nov 09, 2018 at 11:54:10AM +1100, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> When copy_file_range() is called on files that have been opened with
> O_DIRECT, do_splice_direct() does a manual copy of the range one
> pipe buffer at a time. The default is 16 pages, which means on
> x86_64 it is limited to 64kB IO. This is extremely slow - 64k
> synchrnous read/write will run at maybe 5-10MB/s on a spinning disk
> and be seek bound. It will be faster on SSDs, but still very
> inefficient.
> 
> Increase the pipe size to the maximum allowed user size so that we
> can get decent throughput for this highly sub-optimal copy loop. Add
> a new function to the pipe code that lets us set the pipe size to
> the maximum allowed without root permissions to keep things really
> simple. We also don't care if changing the pipe size fails - that
> will just result in a slower copy.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/pipe.c                 | 10 ++++++++++
>  fs/splice.c               |  7 +++++++
>  include/linux/pipe_fs_i.h |  1 +
>  3 files changed, 18 insertions(+)
> 
> diff --git a/fs/pipe.c b/fs/pipe.c
> index bdc5d3c0977d..436bc0464569 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -1109,6 +1109,16 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
>  	return ret;
>  }
>  
> +/*
> + * Set the pipe to the maximum allowable user size. Advisory only, will
> + * swallow any errors and return the resultant pipe size.
> + */
> +long pipe_set_max_safe_size(struct pipe_inode_info *pipe)
> +{
> +	pipe_set_size(pipe, pipe_max_size);
> +	return pipe->buffers * PAGE_SIZE;
> +}
> +
>  /*
>   * After the inode slimming patch, i_pipe/i_bdev/i_cdev share the same
>   * location, so checking ->i_pipe is not enough to verify that this is a
> diff --git a/fs/splice.c b/fs/splice.c
> index 3553f1956508..9749139da731 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -931,6 +931,13 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
>  		current->splice_pipe = pipe;
>  	}
>  
> +	/*
> +	 * Try to increase the data holding capacity of the pipe so we can do
> +	 * larger IOs. This may not increase the size at all because maximum
> +	 * user pipe size is administrator controlled, but we still should try.
> +	 */
> +	pipe_set_max_safe_size(pipe);

I get where you're going with this, but I have two questions:

- Is it safe to be enlarging the pipe buffer size unconditionally?

- Especially if we didn't just create the splice pipe?  Suppose someone
  comes along later trying to splice things and doesn't realize the pipe
  is now 1MB...

Then I started wondering about the splice_pipe lifetime and couldn't
figure out if it ever gets detached from current prior to do_exit.
I don't think it does, which means that we're stuck with the 1MB
kernel memory allocation until the process dies.

--D

> +
>  	/*
>  	 * Do the splice.
>  	 */
> diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
> index 5a3bb3b7c9ad..962ba4cfcb74 100644
> --- a/include/linux/pipe_fs_i.h
> +++ b/include/linux/pipe_fs_i.h
> @@ -191,5 +191,6 @@ struct pipe_inode_info *get_pipe_info(struct file *file);
>  
>  int create_pipe_files(struct file **, int);
>  unsigned int round_pipe_size(unsigned long size);
> +long pipe_set_max_safe_size(struct pipe_inode_info *pipe);
>  
>  #endif
Dave Chinner Nov. 13, 2018, 9:41 p.m. UTC | #2
On Tue, Nov 13, 2018 at 08:22:37AM -0800, Darrick J. Wong wrote:
> On Fri, Nov 09, 2018 at 11:54:10AM +1100, Dave Chinner wrote:
> > 
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When copy_file_range() is called on files that have been opened with
> > O_DIRECT, do_splice_direct() does a manual copy of the range one
> > pipe buffer at a time. The default is 16 pages, which means on
> > x86_64 it is limited to 64kB IO. This is extremely slow - 64k
> > synchrnous read/write will run at maybe 5-10MB/s on a spinning disk
> > and be seek bound. It will be faster on SSDs, but still very
> > inefficient.
> > 
> > Increase the pipe size to the maximum allowed user size so that we
> > can get decent throughput for this highly sub-optimal copy loop. Add
> > a new function to the pipe code that lets us set the pipe size to
> > the maximum allowed without root permissions to keep things really
> > simple. We also don't care if changing the pipe size fails - that
> > will just result in a slower copy.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/pipe.c                 | 10 ++++++++++
> >  fs/splice.c               |  7 +++++++
> >  include/linux/pipe_fs_i.h |  1 +
> >  3 files changed, 18 insertions(+)
> > 
> > diff --git a/fs/pipe.c b/fs/pipe.c
> > index bdc5d3c0977d..436bc0464569 100644
> > --- a/fs/pipe.c
> > +++ b/fs/pipe.c
> > @@ -1109,6 +1109,16 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
> >  	return ret;
> >  }
> >  
> > +/*
> > + * Set the pipe to the maximum allowable user size. Advisory only, will
> > + * swallow any errors and return the resultant pipe size.
> > + */
> > +long pipe_set_max_safe_size(struct pipe_inode_info *pipe)
> > +{
> > +	pipe_set_size(pipe, pipe_max_size);
> > +	return pipe->buffers * PAGE_SIZE;
> > +}
> > +
> >  /*
> >   * After the inode slimming patch, i_pipe/i_bdev/i_cdev share the same
> >   * location, so checking ->i_pipe is not enough to verify that this is a
> > diff --git a/fs/splice.c b/fs/splice.c
> > index 3553f1956508..9749139da731 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -931,6 +931,13 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
> >  		current->splice_pipe = pipe;
> >  	}
> >  
> > +	/*
> > +	 * Try to increase the data holding capacity of the pipe so we can do
> > +	 * larger IOs. This may not increase the size at all because maximum
> > +	 * user pipe size is administrator controlled, but we still should try.
> > +	 */
> > +	pipe_set_max_safe_size(pipe);
> 
> I get where you're going with this, but I have two questions:
> 
> - Is it safe to be enlarging the pipe buffer size unconditionally?

Don't see why it would be unsafe.

> - Especially if we didn't just create the splice pipe?  Suppose someone
>   comes along later trying to splice things and doesn't realize the pipe
>   is now 1MB...

The splice code is supposed to handle arbitrary pipe sizes
correctly. if something breaks because it has assumptions about how
much data a pipe can hold, it's already broken.

> Then I started wondering about the splice_pipe lifetime and couldn't
> figure out if it ever gets detached from current prior to do_exit.
> I don't think it does, which means that we're stuck with the 1MB
> kernel memory allocation until the process dies.

If you are using do_splice_direct(), you either have a short term
process (i.e. a cp type utility) or you are moving bulk data around,
in which case 1MB of extra memory isn't a big deal.

And given that the default pipe size is dependent on PAGE_SIZE (i.e.
the default is 16 pages, not 64kB) then on 64k page architectures we
are already using pipes of 1MB capacity by default.

I could make this contingent on O_DIRECT, but then we have the
problem that 64k pipes aren't big enough for efficient buffered IO
with 64k block size filesystems, either. IOWs, the pipe size in
do_splice_direct needs to be increased whichever way we look at it.

This all said, I really think we need to imlpement our own
->copy_file_range() code that uses iomap to iterate data-only
extents (i.e. hole-preserving) copying and does well formed IO.
IOWs, only fall back to do_splice_direct() when doing copies to/from
non-XFS filesystems.

Cheers,

Dave.
Christoph Hellwig Nov. 15, 2018, 10:17 a.m. UTC | #3
> +long pipe_set_max_safe_size(struct pipe_inode_info *pipe)
> +{
> +	pipe_set_size(pipe, pipe_max_size);
> +	return pipe->buffers * PAGE_SIZE;
> +}

This should probably return an unsigned value, given that we don't return
errors.  Then again the callers ignores the return value entirely.
Wouldn't it be easier to just call pipe_set_size from splice.c after
removing the static marker?
Dave Chinner Nov. 16, 2018, 5:42 a.m. UTC | #4
On Thu, Nov 15, 2018 at 02:17:35AM -0800, Christoph Hellwig wrote:
> > +long pipe_set_max_safe_size(struct pipe_inode_info *pipe)
> > +{
> > +	pipe_set_size(pipe, pipe_max_size);
> > +	return pipe->buffers * PAGE_SIZE;
> > +}
> 
> This should probably return an unsigned value, given that we don't return
> errors.  Then again the callers ignores the return value entirely.
> Wouldn't it be easier to just call pipe_set_size from splice.c after
> removing the static marker?

I didn't want to use pipe_max_size outside of the pipe code. it's
definitely simpler without a wrapper if using pipe_max_size directly
is acceptable.

Cheers,

Dave.

Patch
diff mbox series

diff --git a/fs/pipe.c b/fs/pipe.c
index bdc5d3c0977d..436bc0464569 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1109,6 +1109,16 @@  static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 	return ret;
 }
 
+/*
+ * Set the pipe to the maximum allowable user size. Advisory only, will
+ * swallow any errors and return the resultant pipe size.
+ */
+long pipe_set_max_safe_size(struct pipe_inode_info *pipe)
+{
+	pipe_set_size(pipe, pipe_max_size);
+	return pipe->buffers * PAGE_SIZE;
+}
+
 /*
  * After the inode slimming patch, i_pipe/i_bdev/i_cdev share the same
  * location, so checking ->i_pipe is not enough to verify that this is a
diff --git a/fs/splice.c b/fs/splice.c
index 3553f1956508..9749139da731 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -931,6 +931,13 @@  ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		current->splice_pipe = pipe;
 	}
 
+	/*
+	 * Try to increase the data holding capacity of the pipe so we can do
+	 * larger IOs. This may not increase the size at all because maximum
+	 * user pipe size is administrator controlled, but we still should try.
+	 */
+	pipe_set_max_safe_size(pipe);
+
 	/*
 	 * Do the splice.
 	 */
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 5a3bb3b7c9ad..962ba4cfcb74 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -191,5 +191,6 @@  struct pipe_inode_info *get_pipe_info(struct file *file);
 
 int create_pipe_files(struct file **, int);
 unsigned int round_pipe_size(unsigned long size);
+long pipe_set_max_safe_size(struct pipe_inode_info *pipe);
 
 #endif