[06/25] vfs: strengthen checking of file range inputs to generic_remap_checks
diff mbox series

Message ID 153913028015.32295.15993665528948323051.stgit@magnolia
State New
Headers show
Series
  • fs: fixes for serious clone/dedupe problems
Related show

Commit Message

Darrick J. Wong Oct. 10, 2018, 12:11 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

File range remapping, if allowed to run past the destination file's EOF,
is an optimization on a regular file write.  Regular file writes that
extend the file length are subject to various constraints which are not
checked by range cloning.

This is a correctness problem because we're never allowed to touch
ranges that the page cache can't support (s_maxbytes); we're not
supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't
set; and we must obey resource limits (RLIMIT_FSIZE).

Therefore, add these checks to the new generic_remap_checks function so
that we curtail unexpected behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 mm/filemap.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Comments

Amir Goldstein Oct. 10, 2018, 5:23 a.m. UTC | #1
On Wed, Oct 10, 2018 at 3:11 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> File range remapping, if allowed to run past the destination file's EOF,
> is an optimization on a regular file write.  Regular file writes that
> extend the file length are subject to various constraints which are not
> checked by range cloning.
>
> This is a correctness problem because we're never allowed to touch
> ranges that the page cache can't support (s_maxbytes); we're not
> supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't
> set; and we must obey resource limits (RLIMIT_FSIZE).
>
> Therefore, add these checks to the new generic_remap_checks function so
> that we curtail unexpected behavior.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  mm/filemap.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 14041a8468ba..59056bd9c58a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2974,6 +2974,27 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
>  }
>  EXPORT_SYMBOL(generic_write_checks);
>
> +static int
> +generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count)
> +{
> +       struct inode *inode = file->f_mapping->host;
> +
> +       /* Don't exceed the LFS limits. */
> +       if (unlikely(pos + *count > MAX_NON_LFS &&
> +                               !(file->f_flags & O_LARGEFILE))) {
> +               if (pos >= MAX_NON_LFS)
> +                       return -EFBIG;
> +               *count = min(*count, MAX_NON_LFS - (uint64_t)pos);
> +       }
> +
> +       /* Don't operate on ranges the page cache doesn't support. */
> +       if (unlikely(pos >= inode->i_sb->s_maxbytes))
> +               return -EFBIG;
> +
> +       *count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos);
> +       return 0;
> +}
> +

Sorry. I haven't explained myself properly last time.
What I meant is that it hurts my eyes to see generic_write_checks() and
generic_remap_check_limits() which from the line of (limit != RLIM_INFINITY)
are exactly the same thing. Yes, generic_remap_check_limits() uses
iov_iter_truncate(), but that's a minor semantic change - it can be easily
resolved by creating a dummy iter in generic_remap_checks() instead of
passing int *count.

You could say that this is nit picking, but the very reason this patch
set exists
it because clone/dedup implementation did not use the same range checks
of write to begin with, so it just seems wrong to diverge them at this point.

So to be clear, I suggest that generic_write_checks() should use your
generic_remap_check_limits() helper.
If you disagree and others can live with this minor duplication, fine by me.

Thanks,
Amir.
Darrick J. Wong Oct. 10, 2018, 5:01 p.m. UTC | #2
On Wed, Oct 10, 2018 at 08:23:27AM +0300, Amir Goldstein wrote:
> On Wed, Oct 10, 2018 at 3:11 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > File range remapping, if allowed to run past the destination file's EOF,
> > is an optimization on a regular file write.  Regular file writes that
> > extend the file length are subject to various constraints which are not
> > checked by range cloning.
> >
> > This is a correctness problem because we're never allowed to touch
> > ranges that the page cache can't support (s_maxbytes); we're not
> > supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't
> > set; and we must obey resource limits (RLIMIT_FSIZE).
> >
> > Therefore, add these checks to the new generic_remap_checks function so
> > that we curtail unexpected behavior.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  mm/filemap.c |   39 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 39 insertions(+)
> >
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 14041a8468ba..59056bd9c58a 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2974,6 +2974,27 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
> >  }
> >  EXPORT_SYMBOL(generic_write_checks);
> >
> > +static int
> > +generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count)
> > +{
> > +       struct inode *inode = file->f_mapping->host;
> > +
> > +       /* Don't exceed the LFS limits. */
> > +       if (unlikely(pos + *count > MAX_NON_LFS &&
> > +                               !(file->f_flags & O_LARGEFILE))) {
> > +               if (pos >= MAX_NON_LFS)
> > +                       return -EFBIG;
> > +               *count = min(*count, MAX_NON_LFS - (uint64_t)pos);
> > +       }
> > +
> > +       /* Don't operate on ranges the page cache doesn't support. */
> > +       if (unlikely(pos >= inode->i_sb->s_maxbytes))
> > +               return -EFBIG;
> > +
> > +       *count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos);
> > +       return 0;
> > +}
> > +
> 
> Sorry. I haven't explained myself properly last time.
> What I meant is that it hurts my eyes to see generic_write_checks() and
> generic_remap_check_limits() which from the line of (limit != RLIM_INFINITY)
> are exactly the same thing. Yes, generic_remap_check_limits() uses
> iov_iter_truncate(), but that's a minor semantic change - it can be easily
> resolved by creating a dummy iter in generic_remap_checks() instead of
> passing int *count.

Making a fake kiocb and iterator seem like a terribly fragile idea.

How about I make the common helper take a pos and *count, and
generic_write_checks can translate that into iov_iter_truncate?

> You could say that this is nit picking, but the very reason this patch
> set exists
> it because clone/dedup implementation did not use the same range checks
> of write to begin with, so it just seems wrong to diverge them at this point.
> 
> So to be clear, I suggest that generic_write_checks() should use your
> generic_remap_check_limits() helper.
> If you disagree and others can live with this minor duplication, fine by me.

Nah, I think I misunderstood you the first time, sorry about that.

--D

> Thanks,
> Amir.
Amir Goldstein Oct. 10, 2018, 5:26 p.m. UTC | #3
On Wed, Oct 10, 2018 at 8:01 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Wed, Oct 10, 2018 at 08:23:27AM +0300, Amir Goldstein wrote:
> > On Wed, Oct 10, 2018 at 3:11 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > >
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > File range remapping, if allowed to run past the destination file's EOF,
> > > is an optimization on a regular file write.  Regular file writes that
> > > extend the file length are subject to various constraints which are not
> > > checked by range cloning.
> > >
> > > This is a correctness problem because we're never allowed to touch
> > > ranges that the page cache can't support (s_maxbytes); we're not
> > > supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't
> > > set; and we must obey resource limits (RLIMIT_FSIZE).
> > >
> > > Therefore, add these checks to the new generic_remap_checks function so
> > > that we curtail unexpected behavior.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  mm/filemap.c |   39 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 39 insertions(+)
> > >
> > >
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index 14041a8468ba..59056bd9c58a 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -2974,6 +2974,27 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
> > >  }
> > >  EXPORT_SYMBOL(generic_write_checks);
> > >
> > > +static int
> > > +generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count)
> > > +{
> > > +       struct inode *inode = file->f_mapping->host;
> > > +
> > > +       /* Don't exceed the LFS limits. */
> > > +       if (unlikely(pos + *count > MAX_NON_LFS &&
> > > +                               !(file->f_flags & O_LARGEFILE))) {
> > > +               if (pos >= MAX_NON_LFS)
> > > +                       return -EFBIG;
> > > +               *count = min(*count, MAX_NON_LFS - (uint64_t)pos);
> > > +       }
> > > +
> > > +       /* Don't operate on ranges the page cache doesn't support. */
> > > +       if (unlikely(pos >= inode->i_sb->s_maxbytes))
> > > +               return -EFBIG;
> > > +
> > > +       *count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos);
> > > +       return 0;
> > > +}
> > > +
> >
> > Sorry. I haven't explained myself properly last time.
> > What I meant is that it hurts my eyes to see generic_write_checks() and
> > generic_remap_check_limits() which from the line of (limit != RLIM_INFINITY)
> > are exactly the same thing. Yes, generic_remap_check_limits() uses
> > iov_iter_truncate(), but that's a minor semantic change - it can be easily
> > resolved by creating a dummy iter in generic_remap_checks() instead of
> > passing int *count.
>
> Making a fake kiocb and iterator seem like a terribly fragile idea.
>
> How about I make the common helper take a pos and *count, and
> generic_write_checks can translate that into iov_iter_truncate?
>

Seems good to me.

Thanks,
Amir.

Patch
diff mbox series

diff --git a/mm/filemap.c b/mm/filemap.c
index 14041a8468ba..59056bd9c58a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2974,6 +2974,27 @@  inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
 }
 EXPORT_SYMBOL(generic_write_checks);
 
+static int
+generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count)
+{
+	struct inode *inode = file->f_mapping->host;
+
+	/* Don't exceed the LFS limits. */
+	if (unlikely(pos + *count > MAX_NON_LFS &&
+				!(file->f_flags & O_LARGEFILE))) {
+		if (pos >= MAX_NON_LFS)
+			return -EFBIG;
+		*count = min(*count, MAX_NON_LFS - (uint64_t)pos);
+	}
+
+	/* Don't operate on ranges the page cache doesn't support. */
+	if (unlikely(pos >= inode->i_sb->s_maxbytes))
+		return -EFBIG;
+
+	*count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos);
+	return 0;
+}
+
 /*
  * Performs necessary checks before doing a clone.
  *
@@ -2992,6 +3013,7 @@  int generic_remap_checks(struct file *file_in, loff_t pos_in,
 	uint64_t bcount;
 	loff_t size_in, size_out;
 	loff_t bs = inode_out->i_sb->s_blocksize;
+	int ret;
 
 	/* The start of both ranges must be aligned to an fs block. */
 	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_out, bs))
@@ -3015,6 +3037,23 @@  int generic_remap_checks(struct file *file_in, loff_t pos_in,
 		return -EINVAL;
 	count = min(count, size_in - (uint64_t)pos_in);
 
+	/* Don't exceed RLMIT_FSIZE in the file we're writing into. */
+	if (limit != RLIM_INFINITY) {
+		if (pos_out >= limit) {
+			send_sig(SIGXFSZ, current, 0);
+			return -EFBIG;
+		}
+		count = min(count, limit - (uint64_t)pos_out);
+	}
+
+	ret = generic_remap_check_limits(file_in, pos_in, &count);
+	if (ret)
+		return ret;
+
+	ret = generic_remap_check_limits(file_out, pos_out, &count);
+	if (ret)
+		return ret;
+
 	/*
 	 * If the user wanted us to link to the infile's EOF, round up to the
 	 * next block boundary for this check.