diff mbox series

ceph: copy_file_range needs to strip setuid bits and update timestamps

Message ID 20190610174007.4818-1-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series ceph: copy_file_range needs to strip setuid bits and update timestamps | expand

Commit Message

Amir Goldstein June 10, 2019, 5:40 p.m. UTC
Because ceph doesn't hold destination inode lock throughout the copy,
strip setuid bits before and after copy.

The destination inode mtime is updated before and after the copy and the
source inode atime is updated after the copy, similar to the filesystem
->read_iter() implementation.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Hi Ilya,

Please consider applying this patch to ceph branch after merging
Darrick's copy-file-range-fixes branch from:
        git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git

The series (including this patch) was tested on ceph by
Luis Henriques using new copy_range xfstests.

AFAIK, only fallback from ceph to generic_copy_file_range()
implementation was tested and not the actual ceph clustered
copy_file_range.

Thanks,
Amir.

 fs/ceph/file.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Ilya Dryomov June 10, 2019, 7:24 p.m. UTC | #1
On Mon, Jun 10, 2019 at 7:40 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Because ceph doesn't hold destination inode lock throughout the copy,
> strip setuid bits before and after copy.
>
> The destination inode mtime is updated before and after the copy and the
> source inode atime is updated after the copy, similar to the filesystem
> ->read_iter() implementation.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Hi Ilya,
>
> Please consider applying this patch to ceph branch after merging
> Darrick's copy-file-range-fixes branch from:
>         git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
>
> The series (including this patch) was tested on ceph by
> Luis Henriques using new copy_range xfstests.
>
> AFAIK, only fallback from ceph to generic_copy_file_range()
> implementation was tested and not the actual ceph clustered
> copy_file_range.

Zheng, Jeff, please take a look.

Thanks,

                Ilya
Luis Henriques June 11, 2019, 8:39 a.m. UTC | #2
Amir Goldstein <amir73il@gmail.com> writes:

> Because ceph doesn't hold destination inode lock throughout the copy,
> strip setuid bits before and after copy.
>
> The destination inode mtime is updated before and after the copy and the
> source inode atime is updated after the copy, similar to the filesystem
> ->read_iter() implementation.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Hi Ilya,
>
> Please consider applying this patch to ceph branch after merging
> Darrick's copy-file-range-fixes branch from:
>         git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
>
> The series (including this patch) was tested on ceph by
> Luis Henriques using new copy_range xfstests.
>
> AFAIK, only fallback from ceph to generic_copy_file_range()
> implementation was tested and not the actual ceph clustered
> copy_file_range.

JFYI I've also run tests that exercise the ceph implementation,
i.e. that actually do the copy offload.  It's the xfstests that (AFAIR)
only exercise the generic VFS copy_file_range as they never meet the
requirements for this copy offload to happen (for example, the copy must
be at least the same length as the files object size which is 4M by
default).

Cheers,
Jeff Layton June 13, 2019, 12:03 p.m. UTC | #3
On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote:
> Because ceph doesn't hold destination inode lock throughout the copy,
> strip setuid bits before and after copy.
> 
> The destination inode mtime is updated before and after the copy and the
> source inode atime is updated after the copy, similar to the filesystem
> ->read_iter() implementation.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Hi Ilya,
> 
> Please consider applying this patch to ceph branch after merging
> Darrick's copy-file-range-fixes branch from:
>         git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
> 
> The series (including this patch) was tested on ceph by
> Luis Henriques using new copy_range xfstests.
> 
> AFAIK, only fallback from ceph to generic_copy_file_range()
> implementation was tested and not the actual ceph clustered
> copy_file_range.
> 
> Thanks,
> Amir.
> 
>  fs/ceph/file.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index c5517ffeb11c..b04c97c7d393 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>  		goto out;
>  	}
>  
> +	/* Should dst_inode lock be held throughout the copy operation? */
> +	inode_lock(dst_inode);
> +	ret = file_modified(dst_file);
> +	inode_unlock(dst_inode);
> +	if (ret < 0) {
> +		dout("failed to modify dst file before copy (%zd)\n", ret);
> +		goto out;
> +	}
> +

I don't see anything that guarantees that the mode of the destination
file is up to date at this point. file_modified() just ends up checking
the mode cached in the inode.

I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference
to AUTH_SHARED caps on the destination inode, and then call
file_modified() after we get those caps. That would also mean that we
wouldn't need to do this a second time after the copy.

The catch is that if we did need to issue a setattr, I'm not sure if
we'd need to release those caps first.

Luis, Zheng, thoughts?

>  	/*
>  	 * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other
>  	 * clients may have dirty data in their caches.  And OSDs know nothing
> @@ -2099,6 +2108,14 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>  out:
>  	ceph_free_cap_flush(prealloc_cf);
>  
> +	file_accessed(src_file);
> +	/* To be on the safe side, try to remove privs also after copy */
> +	inode_lock(dst_inode);
> +	err = file_modified(dst_file);
> +	inode_unlock(dst_inode);
> +	if (err < 0)
> +		dout("failed to modify dst file after copy (%d)\n", err);
> +
>  	return ret;
>  }
>
Luis Henriques June 13, 2019, 3:50 p.m. UTC | #4
Jeff Layton <jlayton@kernel.org> writes:

> On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote:
>> Because ceph doesn't hold destination inode lock throughout the copy,
>> strip setuid bits before and after copy.
>> 
>> The destination inode mtime is updated before and after the copy and the
>> source inode atime is updated after the copy, similar to the filesystem
>> ->read_iter() implementation.
>> 
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>> 
>> Hi Ilya,
>> 
>> Please consider applying this patch to ceph branch after merging
>> Darrick's copy-file-range-fixes branch from:
>>         git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
>> 
>> The series (including this patch) was tested on ceph by
>> Luis Henriques using new copy_range xfstests.
>> 
>> AFAIK, only fallback from ceph to generic_copy_file_range()
>> implementation was tested and not the actual ceph clustered
>> copy_file_range.
>> 
>> Thanks,
>> Amir.
>> 
>>  fs/ceph/file.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>> 
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index c5517ffeb11c..b04c97c7d393 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>  		goto out;
>>  	}
>>  
>> +	/* Should dst_inode lock be held throughout the copy operation? */
>> +	inode_lock(dst_inode);
>> +	ret = file_modified(dst_file);
>> +	inode_unlock(dst_inode);
>> +	if (ret < 0) {
>> +		dout("failed to modify dst file before copy (%zd)\n", ret);
>> +		goto out;
>> +	}
>> +
>
> I don't see anything that guarantees that the mode of the destination
> file is up to date at this point. file_modified() just ends up checking
> the mode cached in the inode.
>
> I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference
> to AUTH_SHARED caps on the destination inode, and then call
> file_modified() after we get those caps. That would also mean that we
> wouldn't need to do this a second time after the copy.
>
> The catch is that if we did need to issue a setattr, I'm not sure if
> we'd need to release those caps first.
>
> Luis, Zheng, thoughts?

Hmm... I missed that.  IIRC the FILE_WR caps allow to modify some
metadata (such as timestamps, and file size).  I suppose it doesn't
allow to cache the mode, does it?  If it does, fixing it would be a
matter of moving the code a bit further down.  If it doesn't the
ceph_copy_file_range function already has this problem, as it calls
file_update_time.  And I wonder if other code paths have this problem
too.

Obviously, the chunk below will have the same problem.

Cheers,
Jeff Layton June 13, 2019, 5:48 p.m. UTC | #5
On Thu, 2019-06-13 at 16:50 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote:
> > > Because ceph doesn't hold destination inode lock throughout the copy,
> > > strip setuid bits before and after copy.
> > > 
> > > The destination inode mtime is updated before and after the copy and the
> > > source inode atime is updated after the copy, similar to the filesystem
> > > ->read_iter() implementation.
> > > 
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > > 
> > > Hi Ilya,
> > > 
> > > Please consider applying this patch to ceph branch after merging
> > > Darrick's copy-file-range-fixes branch from:
> > >         git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
> > > 
> > > The series (including this patch) was tested on ceph by
> > > Luis Henriques using new copy_range xfstests.
> > > 
> > > AFAIK, only fallback from ceph to generic_copy_file_range()
> > > implementation was tested and not the actual ceph clustered
> > > copy_file_range.
> > > 
> > > Thanks,
> > > Amir.
> > > 
> > >  fs/ceph/file.c | 17 +++++++++++++++++
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index c5517ffeb11c..b04c97c7d393 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> > >  		goto out;
> > >  	}
> > >  
> > > +	/* Should dst_inode lock be held throughout the copy operation? */
> > > +	inode_lock(dst_inode);
> > > +	ret = file_modified(dst_file);
> > > +	inode_unlock(dst_inode);
> > > +	if (ret < 0) {
> > > +		dout("failed to modify dst file before copy (%zd)\n", ret);
> > > +		goto out;
> > > +	}
> > > +
> > 
> > I don't see anything that guarantees that the mode of the destination
> > file is up to date at this point. file_modified() just ends up checking
> > the mode cached in the inode.
> > 
> > I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference
> > to AUTH_SHARED caps on the destination inode, and then call
> > file_modified() after we get those caps. That would also mean that we
> > wouldn't need to do this a second time after the copy.
> > 
> > The catch is that if we did need to issue a setattr, I'm not sure if
> > we'd need to release those caps first.
> > 
> > Luis, Zheng, thoughts?
> 
> Hmm... I missed that.  IIRC the FILE_WR caps allow to modify some
> metadata (such as timestamps, and file size).  I suppose it doesn't
> allow to cache the mode, does it? 

No, W caps don't guarantee that the mode won't change. You need As or Ax
caps for that.

>  If it does, fixing it would be a
> matter of moving the code a bit further down.  If it doesn't the
> ceph_copy_file_range function already has this problem, as it calls
> file_update_time.  And I wonder if other code paths have this problem
> too.
> 

I think you mean file_remove_privs, but yes...the write codepath has a
similar problem. file_remove_privs is called before acquiring any caps,
so the same thing could happen there too.

It'd be good to fix both places, but taking As cap references in the
write codepath could have performance impact in some cases. OTOH, they
don't change that much, so maybe that's OK.

> Obviously, the chunk below will have the same problem.
> 

Right. If however, we have this code take an As cap reference before
doing the copy, then we can be sure that the mode can't change until we
drop them. That way we wouldn't need the second call.
Luis Henriques June 14, 2019, 8:52 a.m. UTC | #6
Jeff Layton <jlayton@kernel.org> writes:

> On Thu, 2019-06-13 at 16:50 +0100, Luis Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>> 
>> > On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote:
>> > > Because ceph doesn't hold destination inode lock throughout the copy,
>> > > strip setuid bits before and after copy.
>> > > 
>> > > The destination inode mtime is updated before and after the copy and the
>> > > source inode atime is updated after the copy, similar to the filesystem
>> > > ->read_iter() implementation.
>> > > 
>> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> > > ---
>> > > 
>> > > Hi Ilya,
>> > > 
>> > > Please consider applying this patch to ceph branch after merging
>> > > Darrick's copy-file-range-fixes branch from:
>> > >         git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
>> > > 
>> > > The series (including this patch) was tested on ceph by
>> > > Luis Henriques using new copy_range xfstests.
>> > > 
>> > > AFAIK, only fallback from ceph to generic_copy_file_range()
>> > > implementation was tested and not the actual ceph clustered
>> > > copy_file_range.
>> > > 
>> > > Thanks,
>> > > Amir.
>> > > 
>> > >  fs/ceph/file.c | 17 +++++++++++++++++
>> > >  1 file changed, 17 insertions(+)
>> > > 
>> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> > > index c5517ffeb11c..b04c97c7d393 100644
>> > > --- a/fs/ceph/file.c
>> > > +++ b/fs/ceph/file.c
>> > > @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>> > >  		goto out;
>> > >  	}
>> > >  
>> > > +	/* Should dst_inode lock be held throughout the copy operation? */
>> > > +	inode_lock(dst_inode);
>> > > +	ret = file_modified(dst_file);
>> > > +	inode_unlock(dst_inode);
>> > > +	if (ret < 0) {
>> > > +		dout("failed to modify dst file before copy (%zd)\n", ret);
>> > > +		goto out;
>> > > +	}
>> > > +
>> > 
>> > I don't see anything that guarantees that the mode of the destination
>> > file is up to date at this point. file_modified() just ends up checking
>> > the mode cached in the inode.
>> > 
>> > I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference
>> > to AUTH_SHARED caps on the destination inode, and then call
>> > file_modified() after we get those caps. That would also mean that we
>> > wouldn't need to do this a second time after the copy.
>> > 
>> > The catch is that if we did need to issue a setattr, I'm not sure if
>> > we'd need to release those caps first.
>> > 
>> > Luis, Zheng, thoughts?
>> 
>> Hmm... I missed that.  IIRC the FILE_WR caps allow to modify some
>> metadata (such as timestamps, and file size).  I suppose it doesn't
>> allow to cache the mode, does it? 
>
> No, W caps don't guarantee that the mode won't change. You need As or Ax
> caps for that.
>
>>  If it does, fixing it would be a
>> matter of moving the code a bit further down.  If it doesn't the
>> ceph_copy_file_range function already has this problem, as it calls
>> file_update_time.  And I wonder if other code paths have this problem
>> too.
>> 
>
> I think you mean file_remove_privs, but yes...the write codepath has a
> similar problem. file_remove_privs is called before acquiring any caps,
> so the same thing could happen there too.
>
> It'd be good to fix both places, but taking As cap references in the
> write codepath could have performance impact in some cases. OTOH, they
> don't change that much, so maybe that's OK.
>
>> Obviously, the chunk below will have the same problem.
>> 
>
> Right. If however, we have this code take an As cap reference before
> doing the copy, then we can be sure that the mode can't change until we
> drop them. That way we wouldn't need the second call.

So, do you think the patch below would be enough?  It's totally
untested, but I wanted to know if that would be acceptable before
running some tests on it.

Cheers,
Jeff Layton June 14, 2019, 11:43 a.m. UTC | #7
On Fri, 2019-06-14 at 09:52 +0100, Luis Henriques wrote:
> So, do you think the patch below would be enough?  It's totally
> untested, but I wanted to know if that would be acceptable before
> running some tests on it.
> 
> Cheers,
> --
> Luis
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index c5517ffeb11c..f6b0683dd8dc 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1949,6 +1949,21 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>                 goto out;
>         }
>  
> +       ret = ceph_do_getattr(dst_inode, CEPH_CAP_AUTH_SHARED, false);
> +       if (ret < 0) {
> +               dout("failed to get auth caps on dst file (%zd)\n", ret);
> +               goto out;
> +       }
> +

I think this is still racy. You could lose As caps before file_modified
is called. IMO, this code should hold a reference to As caps until the
c_f_r operation is complete.

That may get tricky however if you do need to issue a setattr to change
the mode, as the MDS may try to recall As caps at that point. You won't
be able to release them until you drop the reference, so will that
deadlock? I'm not sure.

> +       /* Should dst_inode lock be held throughout the copy operation? */
> +       inode_lock(dst_inode);
> +       ret = file_modified(dst_file);
> +       inode_unlock(dst_inode);
> +       if (ret < 0) {
> +               dout("failed to modify dst file before copy (%zd)\n", ret);
> +               goto out;
> +       }
> +
>         /*
>          * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other
>          * clients may have dirty data in their caches.  And OSDs know nothing
Jeff Layton June 14, 2019, 5:38 p.m. UTC | #8
On Fri, 2019-06-14 at 07:43 -0400, Jeff Layton wrote:
> On Fri, 2019-06-14 at 09:52 +0100, Luis Henriques wrote:
> > So, do you think the patch below would be enough?  It's totally
> > untested, but I wanted to know if that would be acceptable before
> > running some tests on it.
> > 
> > Cheers,
> > --
> > Luis
> > 
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index c5517ffeb11c..f6b0683dd8dc 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1949,6 +1949,21 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >                 goto out;
> >         }
> >  
> > +       ret = ceph_do_getattr(dst_inode, CEPH_CAP_AUTH_SHARED, false);
> > +       if (ret < 0) {
> > +               dout("failed to get auth caps on dst file (%zd)\n", ret);
> > +               goto out;
> > +       }
> > +
> 
> I think this is still racy. You could lose As caps before file_modified
> is called. IMO, this code should hold a reference to As caps until the
> c_f_r operation is complete.
> 
> That may get tricky however if you do need to issue a setattr to change
> the mode, as the MDS may try to recall As caps at that point. You won't
> be able to release them until you drop the reference, so will that
> deadlock? I'm not sure.
> 

That said...in many (most?) cases the client will already have As caps
on the file from the permission check during open, and mode changes
(and AUTH cap revokes) are relatively rare. So, the race I'm talking
about probably almost never happens in practice.

But...privilege escalation attacks often involve setuid changes, so I
think we really ought to be careful here.

In any case, if holding a reference to As is not feasible, then I we
could take the original version of the patch, and maybe pair it with
the getattr above. It's not ideal, but it's better than nothing.


> > +       /* Should dst_inode lock be held throughout the copy operation? */
> > +       inode_lock(dst_inode);
> > +       ret = file_modified(dst_file);
> > +       inode_unlock(dst_inode);
> > +       if (ret < 0) {
> > +               dout("failed to modify dst file before copy (%zd)\n", ret);
> > +               goto out;
> > +       }
> > +
> >         /*
> >          * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other
> >          * clients may have dirty data in their caches.  And OSDs know nothing
diff mbox series

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c5517ffeb11c..b04c97c7d393 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1949,6 +1949,15 @@  static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 		goto out;
 	}
 
+	/* Should dst_inode lock be held throughout the copy operation? */
+	inode_lock(dst_inode);
+	ret = file_modified(dst_file);
+	inode_unlock(dst_inode);
+	if (ret < 0) {
+		dout("failed to modify dst file before copy (%zd)\n", ret);
+		goto out;
+	}
+
 	/*
 	 * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other
 	 * clients may have dirty data in their caches.  And OSDs know nothing
@@ -2099,6 +2108,14 @@  static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 out:
 	ceph_free_cap_flush(prealloc_cf);
 
+	file_accessed(src_file);
+	/* To be on the safe side, try to remove privs also after copy */
+	inode_lock(dst_inode);
+	err = file_modified(dst_file);
+	inode_unlock(dst_inode);
+	if (err < 0)
+		dout("failed to modify dst file after copy (%d)\n", err);
+
 	return ret;
 }