xfs: preserve i_mode if __xfs_set_acl() fails

Message ID	20170814081806.GA3924@debian.home (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-xfs-owner@kernel.org> Date: Mon, 14 Aug 2017 05:18:10 -0300 From: Ernesto =?utf-8?Q?A=2E_Fern=C3=A1ndez?= <ernesto.mnd.fernandez@gmail.com> To: linux-xfs@vger.kernel.org Cc: "Darrick J. Wong" <darrick.wong@oracle.com>, Zorro Lang <zlang@redhat.com>, Ernesto =?utf-8?Q?A=2E_Fern=C3=A1ndez?= <ernesto.mnd.fernandez@gmail.com> Subject: [PATCH] xfs: preserve i_mode if __xfs_set_acl() fails Message-ID: <20170814081806.GA3924@debian.home> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk

Ernesto A. Fernández Aug. 14, 2017, 8:18 a.m. UTC

When changing a file's ACL mask, xfs_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new ACL.

If the second part fails (due to lack of space, for example) and the
file had no ACL attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

To prevent this, perform the mode update and the ACL update as part of
the same transaction, and restore the original mode to the vfs inode in
case of failure. This requires a change in how extended attributes are
deleted: commit the transaction even if there was no attribute to
remove, to log the standard inode fields.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
---
This issue is covered by generic/449 in xfstests. Several other filesystems
are affected by this, some of them have already applied patches:
  - 397e434 ext4: preserve i_mode if __ext4_set_acl() fails
  - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
  - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails
  - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
The test won't exactly pass after this patch is applied. It will fail
differently, this time claiming that the filesystem is inconsistent. This
probably has something to do with setting too many extended attributes, as
it goes away if you change the attribute size in the test from 1k to 64k.

 fs/xfs/libxfs/xfs_attr.c | 13 +++++++++++--
 fs/xfs/xfs_acl.c         | 39 ++++++++++++++-------------------------
 2 files changed, 25 insertions(+), 27 deletions(-)

Darrick J. Wong Aug. 14, 2017, 10:28 p.m. UTC | #1

On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> When changing a file's ACL mask, xfs_set_acl() will first set the group
> bits of i_mode to the value of the mask, and only then set the actual
> extended attribute representing the new ACL.
> 
> If the second part fails (due to lack of space, for example) and the
> file had no ACL attribute to begin with, the system will from now on
> assume that the mask permission bits are actual group permission bits,
> potentially granting access to the wrong users.
> 
> To prevent this, perform the mode update and the ACL update as part of
> the same transaction, and restore the original mode to the vfs inode in
> case of failure. This requires a change in how extended attributes are
> deleted: commit the transaction even if there was no attribute to
> remove, to log the standard inode fields.
> 
> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
> ---
> This issue is covered by generic/449 in xfstests. Several other filesystems
> are affected by this, some of them have already applied patches:
>   - 397e434 ext4: preserve i_mode if __ext4_set_acl() fails
>   - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
>   - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails
>   - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
> The test won't exactly pass after this patch is applied. It will fail
> differently, this time claiming that the filesystem is inconsistent. This

I disagree that corrupting the filesystem is an acceptable strategy for
dealing with insufficient space to handle setfacl.

> probably has something to do with setting too many extended attributes, as
> it goes away if you change the attribute size in the test from 1k to 64k.

Hadn't you better look into why it does that and handle it?

--D

> 
>  fs/xfs/libxfs/xfs_attr.c | 13 +++++++++++--
>  fs/xfs/xfs_acl.c         | 39 ++++++++++++++-------------------------
>  2 files changed, 25 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index de7b9bd..c3193e1 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -395,6 +395,7 @@ xfs_attr_remove(
>  	struct xfs_defer_ops	dfops;
>  	xfs_fsblock_t		firstblock;
>  	int			error;
> +	int			noattr = 0;
>  
>  	XFS_STATS_INC(mp, xs_attr_remove);
>  
> @@ -438,7 +439,12 @@ xfs_attr_remove(
>  	xfs_trans_ijoin(args.trans, dp, 0);
>  
>  	if (!xfs_inode_hasattr(dp)) {
> -		error = -ENOATTR;
> +		/*
> +		 * Commit even if there is no attr to remove, because when
> +		 * setting ACLs the caller may be changing the mode in the
> +		 * same transaction.
> +		 */
> +		noattr = 1;
>  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
>  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>  		error = xfs_attr_shortform_remove(&args);
> @@ -458,7 +464,7 @@ xfs_attr_remove(
>  	if (mp->m_flags & XFS_MOUNT_WSYNC)
>  		xfs_trans_set_sync(args.trans);
>  
> -	if ((flags & ATTR_KERNOTIME) == 0)
> +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
>  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
>  
>  	/*
> @@ -468,6 +474,9 @@ xfs_attr_remove(
>  	error = xfs_trans_commit(args.trans);
>  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
>  
> +	if (!error && noattr)
> +		error = -ENOATTR;
> +
>  	return error;
>  
>  out:
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index 7034e17..1fe4363 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  	return error;
>  }
>  
> -static int
> -xfs_set_mode(struct inode *inode, umode_t mode)
> -{
> -	int error = 0;
> -
> -	if (mode != inode->i_mode) {
> -		struct iattr iattr;
> -
> -		iattr.ia_valid = ATTR_MODE | ATTR_CTIME;
> -		iattr.ia_mode = mode;
> -		iattr.ia_ctime = current_time(inode);
> -
> -		error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
> -	}
> -
> -	return error;
> -}
> -
>  int
>  xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
>  	int error = 0;
> +	int mode_changed = 0;
> +	umode_t old_mode = inode->i_mode;
> +	struct timespec old_ctime = inode->i_ctime;
>  
>  	if (!acl)
>  		goto set_acl;
> @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  		return error;
>  
>  	if (type == ACL_TYPE_ACCESS) {
> -		umode_t mode;
> -
> -		error = posix_acl_update_mode(inode, &mode, &acl);
> -		if (error)
> -			return error;
> -		error = xfs_set_mode(inode, mode);
> +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
>  		if (error)
>  			return error;
> +		inode->i_ctime = current_time(inode);
> +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> +		mode_changed = 1;
>  	}
>  
>   set_acl:
> -	return __xfs_set_acl(inode, acl, type);
> +	error = __xfs_set_acl(inode, acl, type);
> +	if (error && mode_changed) {
> +		inode->i_mode = old_mode;
> +		inode->i_ctime = old_ctime;
> +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> +	}
> +	return error;
>  }
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ernesto A. Fernández Aug. 15, 2017, 12:58 a.m. UTC | #2

On Mon, Aug 14, 2017 at 03:28:27PM -0700, Darrick J. Wong wrote:
> On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > The test won't exactly pass after this patch is applied. It will fail
> > differently, this time claiming that the filesystem is inconsistent. This
> 
> I disagree that corrupting the filesystem is an acceptable strategy for
> dealing with insufficient space to handle setfacl.
> 
> > probably has something to do with setting too many extended attributes, as
> > it goes away if you change the attribute size in the test from 1k to 64k.
> 
> Hadn't you better look into why it does that and handle it?

Hi Darrick, thank for your quick reply.

I think I failed to explain myself properly. The inconsistency issue is NOT
being caused by my patch. It was already there, and my patch just makes it
visible. I'll try to be more clear. The test generic/449 essentially goes
like this:

  1) It sets as many extended attributes as possible so the filesystem will
     run out of space for the ACL.
  2) It tries to set the ACL, which will fail because of (1).
  3) It checks that the mode was not altered by the failed setfacl.
  4) It checks that the filesystem was not corrupted.

The filesystem becomes corrupted after step (1). Before my patch, the test
was failing in step (3), so step (4) was never run. Once the ACL issue is
fixed, step (3) will pass and the test will get to (4). So now the test
announces the inconsistency, while before it was silent about it.

If the test is run without applying my patch, but changing it slightly so it
passes step (3), the end result will be the same: inconsistent filesystem. A
simple way to try this is to comment out the whole if block that deals with
the call to setfacl.

So whatever this issue is, it seems to be a general problem with extended
attributes in xfs. It has nothing to do with the bug my patch intends to
fix, other than the fact that generic/449 accidentally checks for both
problems.

My suggestion to adjust the attribute size in the test was so that we could
focus on the issue at hand, not so we could ignore the other one. Of course
it needs to be fixed as well. I will try to figure out what is going on and
hopefully send another patch. But both issues are independent, so please
take a look at my patch on its own merits. It does fix the ACL bug even if
the test does not pass.

Thank you for your attention,
Ernest
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dave Chinner Aug. 15, 2017, 2 a.m. UTC | #3

On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index de7b9bd..c3193e1 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -395,6 +395,7 @@ xfs_attr_remove(
>  	struct xfs_defer_ops	dfops;
>  	xfs_fsblock_t		firstblock;
>  	int			error;
> +	int			noattr = 0;

bool.

And make it "bool has_attr = true" to avoid the nasty double
negative of "!noattr" later on.

>  
>  	XFS_STATS_INC(mp, xs_attr_remove);
>  
> @@ -438,7 +439,12 @@ xfs_attr_remove(
>  	xfs_trans_ijoin(args.trans, dp, 0);
>  
>  	if (!xfs_inode_hasattr(dp)) {
> -		error = -ENOATTR;
> +		/*
> +		 * Commit even if there is no attr to remove, because when
> +		 * setting ACLs the caller may be changing the mode in the
> +		 * same transaction.
> +		 */
> +		noattr = 1;
>  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
>  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>  		error = xfs_attr_shortform_remove(&args);
> @@ -458,7 +464,7 @@ xfs_attr_remove(
>  	if (mp->m_flags & XFS_MOUNT_WSYNC)
>  		xfs_trans_set_sync(args.trans);
>  
> -	if ((flags & ATTR_KERNOTIME) == 0)
> +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
>  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);

Why?

>  
>  	/*
> @@ -468,6 +474,9 @@ xfs_attr_remove(
>  	error = xfs_trans_commit(args.trans);
>  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
>  
> +	if (!error && noattr)
> +		error = -ENOATTR;
> +
>  	return error;
>  
>  out:
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index 7034e17..1fe4363 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  	return error;
>  }
>  
> -static int
> -xfs_set_mode(struct inode *inode, umode_t mode)
> -{
> -	int error = 0;
> -
> -	if (mode != inode->i_mode) {
> -		struct iattr iattr;
> -
> -		iattr.ia_valid = ATTR_MODE | ATTR_CTIME;
> -		iattr.ia_mode = mode;
> -		iattr.ia_ctime = current_time(inode);
> -
> -		error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
> -	}
> -
> -	return error;
> -}

Ok, so that does an atomic, transactional change to the inode
mode....

>  int
>  xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
>  	int error = 0;
> +	int mode_changed = 0;
> +	umode_t old_mode = inode->i_mode;
> +	struct timespec old_ctime = inode->i_ctime;
>  
>  	if (!acl)
>  		goto set_acl;
> @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  		return error;
>  
>  	if (type == ACL_TYPE_ACCESS) {
> -		umode_t mode;
> -
> -		error = posix_acl_update_mode(inode, &mode, &acl);
> -		if (error)
> -			return error;
> -		error = xfs_set_mode(inode, mode);
> +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
>  		if (error)
>  			return error;
> +		inode->i_ctime = current_time(inode);
> +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> +		mode_changed = 1;
>  	}

And this doesn't.

So this breaks all our rules for inode metadata updates, both for
runtime access serialisation and for crash consistency.

>   set_acl:
> -	return __xfs_set_acl(inode, acl, type);
> +	error = __xfs_set_acl(inode, acl, type);
> +	if (error && mode_changed) {
> +		inode->i_mode = old_mode;
> +		inode->i_ctime = old_ctime;
> +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> +	}

Same here.

IOWs, you can't just remove a transactional modification of inode
state and replace it with in-memory VFS inode level changes. You
need to do the changes in transaction context and with the correct
locks held.

IOWs, if you need to undo the mode changes xfs_set_mode() did, you
need to call xfs_set_mode() again. And because modifications have
been made, the new ctime should remain in place, too.

Of course, this still isn't atomic from a runtime access POV, but
that's substantially harder to provide than just undoing the mode
change on ACL attribute save error.

Cheers,

Dave.

Ernesto A. Fernández Aug. 15, 2017, 6:18 a.m. UTC | #4

Hi Dave, thanks for the review!

On Tue, Aug 15, 2017 at 12:00:18PM +1000, Dave Chinner wrote:
> On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index de7b9bd..c3193e1 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -395,6 +395,7 @@ xfs_attr_remove(
> >  	struct xfs_defer_ops	dfops;
> >  	xfs_fsblock_t		firstblock;
> >  	int			error;
> > +	int			noattr = 0;
> 
> bool.
> 
> And make it "bool has_attr = true" to avoid the nasty double
> negative of "!noattr" later on.

I called it noattr because the error was -ENOATTR, but you are right,
has_attr is better. I assume you would prefer mode_changed to be a bool
as well?

> 
> >  
> >  	XFS_STATS_INC(mp, xs_attr_remove);
> >  
> > @@ -438,7 +439,12 @@ xfs_attr_remove(
> >  	xfs_trans_ijoin(args.trans, dp, 0);
> >  
> >  	if (!xfs_inode_hasattr(dp)) {
> > -		error = -ENOATTR;
> > +		/*
> > +		 * Commit even if there is no attr to remove, because when
> > +		 * setting ACLs the caller may be changing the mode in the
> > +		 * same transaction.
> > +		 */
> > +		noattr = 1;
> >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> >  		error = xfs_attr_shortform_remove(&args);
> > @@ -458,7 +464,7 @@ xfs_attr_remove(
> >  	if (mp->m_flags & XFS_MOUNT_WSYNC)
> >  		xfs_trans_set_sync(args.trans);
> >  
> > -	if ((flags & ATTR_KERNOTIME) == 0)
> > +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> >  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> 
> Why?

Because we don't want the ctime to change when you try to delete an
attribute that didn't exist in the first place. At least that's how
it went before, so I tried to keep it the same.

> 
> >  
> >  	/*
> > @@ -468,6 +474,9 @@ xfs_attr_remove(
> >  	error = xfs_trans_commit(args.trans);
> >  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> >  
> > +	if (!error && noattr)
> > +		error = -ENOATTR;
> > +
> >  	return error;
> >  
> >  out:
> > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > index 7034e17..1fe4363 100644
> > --- a/fs/xfs/xfs_acl.c
> > +++ b/fs/xfs/xfs_acl.c
> > @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  	return error;
> >  }
> >  
> > -static int
> > -xfs_set_mode(struct inode *inode, umode_t mode)
> > -{
> > -	int error = 0;
> > -
> > -	if (mode != inode->i_mode) {
> > -		struct iattr iattr;
> > -
> > -		iattr.ia_valid = ATTR_MODE | ATTR_CTIME;
> > -		iattr.ia_mode = mode;
> > -		iattr.ia_ctime = current_time(inode);
> > -
> > -		error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
> > -	}
> > -
> > -	return error;
> > -}
> 
> Ok, so that does an atomic, transactional change to the inode
> mode....
> 
> >  int
> >  xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  {
> >  	int error = 0;
> > +	int mode_changed = 0;
> > +	umode_t old_mode = inode->i_mode;
> > +	struct timespec old_ctime = inode->i_ctime;
> >  
> >  	if (!acl)
> >  		goto set_acl;
> > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  		return error;
> >  
> >  	if (type == ACL_TYPE_ACCESS) {
> > -		umode_t mode;
> > -
> > -		error = posix_acl_update_mode(inode, &mode, &acl);
> > -		if (error)
> > -			return error;
> > -		error = xfs_set_mode(inode, mode);
> > +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> >  		if (error)
> >  			return error;
> > +		inode->i_ctime = current_time(inode);
> > +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > +		mode_changed = 1;
> >  	}
> 
> And this doesn't.
> 
> So this breaks all our rules for inode metadata updates, both for
> runtime access serialisation and for crash consistency.
> 

I'm only making changes to the vfs inode, and xfs_set_acl() should always
be called with i_rwsem held for writing. I think that should be enough to
serialize access to the vfs inode fields.

I'm sorry, I'm not sure I follow the crash consistency part. This is not
writing anything to disk. My expectation was that the changes I made to
the vfs inode would be attached to the transaction later, when calling
xfs_trans_log_inode() in xfs_attr_set() or xfs_attr_remove(). Would that
not work as intended? I was mostly imitating how btrfs sets an ACL, so I
could be misunderstanding how xfs does journaling.

> >   set_acl:
> > -	return __xfs_set_acl(inode, acl, type);
> > +	error = __xfs_set_acl(inode, acl, type);
> > +	if (error && mode_changed) {
> > +		inode->i_mode = old_mode;
> > +		inode->i_ctime = old_ctime;
> > +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > +	}
> 
> Same here.

i_rwsem would still be held here. My reasoning is that, since there was
an error, no changes were actually committed; so there would be no need
to commit the restoration either, just restore the old in-memory vfs inode
fields. Once again, this is inspired by btrfs, so I suppose it may fail
here in some way I can't see.

(I should probably clarify I did run tests before submitting)

> IOWs, you can't just remove a transactional modification of inode
> state and replace it with in-memory VFS inode level changes. You
> need to do the changes in transaction context and with the correct
> locks held.
>
> IOWs, if you need to undo the mode changes xfs_set_mode() did, you
> need to call xfs_set_mode() again. And because modifications have

But that could fail as well, and how would we handle that? If this bug
is to be fixed the whole thing needs to be a single transaction. That's
why I removed the xfs_set_mode() function in the first place.

> been made, the new ctime should remain in place, too.
>
> Of course, this still isn't atomic from a runtime access POV, but
> that's substantially harder to provide than just undoing the mode
> change on ACL attribute save error.
>
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thanks,
Ernest
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dave Chinner Aug. 15, 2017, 8:44 a.m. UTC | #5

On Tue, Aug 15, 2017 at 03:18:58AM -0300, Ernesto A. Fernández wrote:
> Hi Dave, thanks for the review!
> 
> On Tue, Aug 15, 2017 at 12:00:18PM +1000, Dave Chinner wrote:
> > On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index de7b9bd..c3193e1 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -395,6 +395,7 @@ xfs_attr_remove(
> > >  	struct xfs_defer_ops	dfops;
> > >  	xfs_fsblock_t		firstblock;
> > >  	int			error;
> > > +	int			noattr = 0;
> > 
> > bool.
> > 
> > And make it "bool has_attr = true" to avoid the nasty double
> > negative of "!noattr" later on.
> 
> I called it noattr because the error was -ENOATTR, but you are right,
> has_attr is better. I assume you would prefer mode_changed to be a bool
> as well?
> 
> > 
> > >  
> > >  	XFS_STATS_INC(mp, xs_attr_remove);
> > >  
> > > @@ -438,7 +439,12 @@ xfs_attr_remove(
> > >  	xfs_trans_ijoin(args.trans, dp, 0);
> > >  
> > >  	if (!xfs_inode_hasattr(dp)) {
> > > -		error = -ENOATTR;
> > > +		/*
> > > +		 * Commit even if there is no attr to remove, because when
> > > +		 * setting ACLs the caller may be changing the mode in the
> > > +		 * same transaction.
> > > +		 */
> > > +		noattr = 1;
> > >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > >  		error = xfs_attr_shortform_remove(&args);
> > > @@ -458,7 +464,7 @@ xfs_attr_remove(
> > >  	if (mp->m_flags & XFS_MOUNT_WSYNC)
> > >  		xfs_trans_set_sync(args.trans);
> > >  
> > > -	if ((flags & ATTR_KERNOTIME) == 0)
> > > +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> > >  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> > 
> > Why?
> 
> Because we don't want the ctime to change when you try to delete an
> attribute that didn't exist in the first place. At least that's how
> it went before, so I tried to keep it the same.

Which means the logic is convoluted and no longer obvious as to what
it does. i.e. the noattr case is running a transaction that dirties
an inode that no modifications were made to. Alarm bells right
there.

> > > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > >  		return error;
> > >  
> > >  	if (type == ACL_TYPE_ACCESS) {
> > > -		umode_t mode;
> > > -
> > > -		error = posix_acl_update_mode(inode, &mode, &acl);
> > > -		if (error)
> > > -			return error;
> > > -		error = xfs_set_mode(inode, mode);
> > > +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> > >  		if (error)
> > >  			return error;
> > > +		inode->i_ctime = current_time(inode);
> > > +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > +		mode_changed = 1;
> > >  	}
> > 
> > And this doesn't.
> > 
> > So this breaks all our rules for inode metadata updates, both for
> > runtime access serialisation and for crash consistency.
> 
> I'm only making changes to the vfs inode, and xfs_set_acl() should always
> be called with i_rwsem held for writing. I think that should be enough to
> serialize access to the vfs inode fields.

It's not. The i_rwsem does not protect XFS inode metadata, which is
what needs changing here. i.e. we change both the VFS inode and
the corresponding XFS inode metadata at the same time, under the
same XFS_ILOCK_EXCL lock, only inside a transaction context.

That's the rules, no exceptions.

> I'm sorry, I'm not sure I follow the crash consistency part. This is not
> writing anything to disk.

xfs_set_mode() runs a transaction that is journalled to change the
inode mode. You break in-memory consistency by modifying the VFS
inode state directly without holding the correct locks (see
xfs_setattr_mode()), and you break the transactional change model
(i.e. on-disk journal consistency and on-disk metadata change
ordering) by making metadata changes this outside transaction
contexts and without holding the correct locks.

> My expectation was that the changes I made to
> the vfs inode would be attached to the transaction later, when calling
> xfs_trans_log_inode() in xfs_attr_set() or xfs_attr_remove(). Would that
> not work as intended?

No, it wouldn't.  You have to follow the filesystem specific update
rules to change inode metadata, not make up your own....

> I was mostly imitating how btrfs sets an ACL, so I
> could be misunderstanding how xfs does journaling.

... and btrfs has a completely different transaction model and
implementation to XFS. IOWs, trying to do what works for btrfs in
XFS code will break XFS. And vice versa.

> > >   set_acl:
> > > -	return __xfs_set_acl(inode, acl, type);
> > > +	error = __xfs_set_acl(inode, acl, type);
> > > +	if (error && mode_changed) {
> > > +		inode->i_mode = old_mode;
> > > +		inode->i_ctime = old_ctime;
> > > +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > +	}
> > 
> > Same here.
> 
> i_rwsem would still be held here. My reasoning is that, since there was
> an error, no changes were actually committed; so there would be no need
> to commit the restoration either, just restore the old in-memory vfs inode
> fields. Once again, this is inspired by btrfs, so I suppose it may fail
> here in some way I can't see.
> 
> (I should probably clarify I did run tests before submitting)

When you treat the code as a black box for testing, then end result
might appear to be what you want. That doesn't mean the
implementation is correct or that hidden transient states within the
block box are incorrect or invalid.

> > IOWs, you can't just remove a transactional modification of inode
> > state and replace it with in-memory VFS inode level changes. You
> > need to do the changes in transaction context and with the correct
> > locks held.
> >
> > IOWs, if you need to undo the mode changes xfs_set_mode() did, you
> > need to call xfs_set_mode() again. And because modifications have
> 
> But that could fail as well, and how would we handle that?

If the xattr change was not a fatal error (which will shut down the
filesystem), then calling xfs_set_mode() should work just fine as it
doesn't require any new space to be allocated (XFS != BTRFS - we can
run overwriting transactions even at ENOSPC).

And you need to run the mode undo as a separate transaction
because....

> If this bug
> is to be fixed the whole thing needs to be a single transaction.

.... this is simply not possible. Why? Because __xfs_set_acl() can
run up three separate transactions internally in adding the new
xattr. That means your unlogged vfs indoe change has already been
logged to the journal and so we *require* a transaction to overwrite
the new mode that has already been committed to the journal.

But I have to ask - why do we even need to modify the mode first?
Why not change the ACL first, then modify the mode+timestamp? If
setting the ACL fails, then we don't have anything to undo and all
is good....

Cheers,

Dave.

Ernesto A. Fernández Aug. 15, 2017, 7:29 p.m. UTC | #6

On Tue, Aug 15, 2017 at 06:44:30PM +1000, Dave Chinner wrote:
> On Tue, Aug 15, 2017 at 03:18:58AM -0300, Ernesto A. Fernández wrote:
> > Hi Dave, thanks for the review!
> > 
> > On Tue, Aug 15, 2017 at 12:00:18PM +1000, Dave Chinner wrote:
> > > On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index de7b9bd..c3193e1 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > @@ -395,6 +395,7 @@ xfs_attr_remove(
> > > >  	struct xfs_defer_ops	dfops;
> > > >  	xfs_fsblock_t		firstblock;
> > > >  	int			error;
> > > > +	int			noattr = 0;
> > > 
> > > bool.
> > > 
> > > And make it "bool has_attr = true" to avoid the nasty double
> > > negative of "!noattr" later on.
> > 
> > I called it noattr because the error was -ENOATTR, but you are right,
> > has_attr is better. I assume you would prefer mode_changed to be a bool
> > as well?
> > 
> > > 
> > > >  
> > > >  	XFS_STATS_INC(mp, xs_attr_remove);
> > > >  
> > > > @@ -438,7 +439,12 @@ xfs_attr_remove(
> > > >  	xfs_trans_ijoin(args.trans, dp, 0);
> > > >  
> > > >  	if (!xfs_inode_hasattr(dp)) {
> > > > -		error = -ENOATTR;
> > > > +		/*
> > > > +		 * Commit even if there is no attr to remove, because when
> > > > +		 * setting ACLs the caller may be changing the mode in the
> > > > +		 * same transaction.
> > > > +		 */
> > > > +		noattr = 1;
> > > >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > > >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > > >  		error = xfs_attr_shortform_remove(&args);
> > > > @@ -458,7 +464,7 @@ xfs_attr_remove(
> > > >  	if (mp->m_flags & XFS_MOUNT_WSYNC)
> > > >  		xfs_trans_set_sync(args.trans);
> > > >  
> > > > -	if ((flags & ATTR_KERNOTIME) == 0)
> > > > +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> > > >  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> > > 
> > > Why?
> > 
> > Because we don't want the ctime to change when you try to delete an
> > attribute that didn't exist in the first place. At least that's how
> > it went before, so I tried to keep it the same.
> 
> Which means the logic is convoluted and no longer obvious as to what
> it does. i.e. the noattr case is running a transaction that dirties
> an inode that no modifications were made to. Alarm bells right
> there.
> 
> > > > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > > >  		return error;
> > > >  
> > > >  	if (type == ACL_TYPE_ACCESS) {
> > > > -		umode_t mode;
> > > > -
> > > > -		error = posix_acl_update_mode(inode, &mode, &acl);
> > > > -		if (error)
> > > > -			return error;
> > > > -		error = xfs_set_mode(inode, mode);
> > > > +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> > > >  		if (error)
> > > >  			return error;
> > > > +		inode->i_ctime = current_time(inode);
> > > > +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > > +		mode_changed = 1;
> > > >  	}
> > > 
> > > And this doesn't.
> > > 
> > > So this breaks all our rules for inode metadata updates, both for
> > > runtime access serialisation and for crash consistency.
> > 
> > I'm only making changes to the vfs inode, and xfs_set_acl() should always
> > be called with i_rwsem held for writing. I think that should be enough to
> > serialize access to the vfs inode fields.
> 
> It's not. The i_rwsem does not protect XFS inode metadata, which is
> what needs changing here. i.e. we change both the VFS inode and
> the corresponding XFS inode metadata at the same time, under the
> same XFS_ILOCK_EXCL lock, only inside a transaction context.
> 
> That's the rules, no exceptions.
> 
> > I'm sorry, I'm not sure I follow the crash consistency part. This is not
> > writing anything to disk.
> 
> xfs_set_mode() runs a transaction that is journalled to change the
> inode mode. You break in-memory consistency by modifying the VFS
> inode state directly without holding the correct locks (see
> xfs_setattr_mode()), and you break the transactional change model
> (i.e. on-disk journal consistency and on-disk metadata change
> ordering) by making metadata changes this outside transaction
> contexts and without holding the correct locks.
> 
> > My expectation was that the changes I made to
> > the vfs inode would be attached to the transaction later, when calling
> > xfs_trans_log_inode() in xfs_attr_set() or xfs_attr_remove(). Would that
> > not work as intended?
> 
> No, it wouldn't.  You have to follow the filesystem specific update
> rules to change inode metadata, not make up your own....
> 
> > I was mostly imitating how btrfs sets an ACL, so I
> > could be misunderstanding how xfs does journaling.
> 
> ... and btrfs has a completely different transaction model and
> implementation to XFS. IOWs, trying to do what works for btrfs in
> XFS code will break XFS. And vice versa.
> 
> > > >   set_acl:
> > > > -	return __xfs_set_acl(inode, acl, type);
> > > > +	error = __xfs_set_acl(inode, acl, type);
> > > > +	if (error && mode_changed) {
> > > > +		inode->i_mode = old_mode;
> > > > +		inode->i_ctime = old_ctime;
> > > > +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > > +	}
> > > 
> > > Same here.
> > 
> > i_rwsem would still be held here. My reasoning is that, since there was
> > an error, no changes were actually committed; so there would be no need
> > to commit the restoration either, just restore the old in-memory vfs inode
> > fields. Once again, this is inspired by btrfs, so I suppose it may fail
> > here in some way I can't see.
> > 
> > (I should probably clarify I did run tests before submitting)
> 
> When you treat the code as a black box for testing, then end result
> might appear to be what you want. That doesn't mean the
> implementation is correct or that hidden transient states within the
> block box are incorrect or invalid.
> 
> > > IOWs, you can't just remove a transactional modification of inode
> > > state and replace it with in-memory VFS inode level changes. You
> > > need to do the changes in transaction context and with the correct
> > > locks held.
> > >
> > > IOWs, if you need to undo the mode changes xfs_set_mode() did, you
> > > need to call xfs_set_mode() again. And because modifications have
> > 
> > But that could fail as well, and how would we handle that?
> 
> If the xattr change was not a fatal error (which will shut down the
> filesystem), then calling xfs_set_mode() should work just fine as it
> doesn't require any new space to be allocated (XFS != BTRFS - we can
> run overwriting transactions even at ENOSPC).
> 
> And you need to run the mode undo as a separate transaction
> because....
> 
> > If this bug
> > is to be fixed the whole thing needs to be a single transaction.
> 
> .... this is simply not possible. Why? Because __xfs_set_acl() can
> run up three separate transactions internally in adding the new
> xattr. That means your unlogged vfs indoe change has already been
> logged to the journal and so we *require* a transaction to overwrite
> the new mode that has already been committed to the journal.

It's clear that I don't understand xfs, and I'm causing more trouble than
I'm helping. I should leave this issue to you guys.
 
> But I have to ask - why do we even need to modify the mode first?
> Why not change the ACL first, then modify the mode+timestamp? If
> setting the ACL fails, then we don't have anything to undo and all
> is good....

I intended for the mode to be committed as part of the same transaction
that sets or removes the ACL. In my mind making the changes later, as part
of a separate transaction, would have meant that a crash between the two
left the filesystem in an inconsistent state, with a new ACL but without
the corresponding mode bits.

Of course that was based on a poor understanding of xfs journaling, so now
I don't know.

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thanks,
Ernest
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bill O'Donnell Dec. 7, 2017, 5:31 p.m. UTC | #7

On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> When changing a file's ACL mask, xfs_set_acl() will first set the group
> bits of i_mode to the value of the mask, and only then set the actual
> extended attribute representing the new ACL.
> 
> If the second part fails (due to lack of space, for example) and the
> file had no ACL attribute to begin with, the system will from now on
> assume that the mask permission bits are actual group permission bits,
> potentially granting access to the wrong users.
> 
> To prevent this, perform the mode update and the ACL update as part of
> the same transaction, and restore the original mode to the vfs inode in
> case of failure. This requires a change in how extended attributes are
> deleted: commit the transaction even if there was no attribute to
> remove, to log the standard inode fields.
> 
> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
> ---
> This issue is covered by generic/449 in xfstests. Several other filesystems

Sorry, I'm coming into this late, and might be missing something...
is there a particular xfs configuration that you've set up on which
generic/449 fails? On 4.15-rc1, I see no failures of g/449.

Thanks-
Bill



> are affected by this, some of them have already applied patches:
>   - 397e434 ext4: preserve i_mode if __ext4_set_acl() fails
>   - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
>   - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails
>   - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
> The test won't exactly pass after this patch is applied. It will fail
> differently, this time claiming that the filesystem is inconsistent. This
> probably has something to do with setting too many extended attributes, as
> it goes away if you change the attribute size in the test from 1k to 64k.
> 
>  fs/xfs/libxfs/xfs_attr.c | 13 +++++++++++--
>  fs/xfs/xfs_acl.c         | 39 ++++++++++++++-------------------------
>  2 files changed, 25 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index de7b9bd..c3193e1 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -395,6 +395,7 @@ xfs_attr_remove(
>  	struct xfs_defer_ops	dfops;
>  	xfs_fsblock_t		firstblock;
>  	int			error;
> +	int			noattr = 0;
>  
>  	XFS_STATS_INC(mp, xs_attr_remove);
>  
> @@ -438,7 +439,12 @@ xfs_attr_remove(
>  	xfs_trans_ijoin(args.trans, dp, 0);
>  
>  	if (!xfs_inode_hasattr(dp)) {
> -		error = -ENOATTR;
> +		/*
> +		 * Commit even if there is no attr to remove, because when
> +		 * setting ACLs the caller may be changing the mode in the
> +		 * same transaction.
> +		 */
> +		noattr = 1;
>  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
>  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>  		error = xfs_attr_shortform_remove(&args);
> @@ -458,7 +464,7 @@ xfs_attr_remove(
>  	if (mp->m_flags & XFS_MOUNT_WSYNC)
>  		xfs_trans_set_sync(args.trans);
>  
> -	if ((flags & ATTR_KERNOTIME) == 0)
> +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
>  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
>  
>  	/*
> @@ -468,6 +474,9 @@ xfs_attr_remove(
>  	error = xfs_trans_commit(args.trans);
>  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
>  
> +	if (!error && noattr)
> +		error = -ENOATTR;
> +
>  	return error;
>  
>  out:
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index 7034e17..1fe4363 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  	return error;
>  }
>  
> -static int
> -xfs_set_mode(struct inode *inode, umode_t mode)
> -{
> -	int error = 0;
> -
> -	if (mode != inode->i_mode) {
> -		struct iattr iattr;
> -
> -		iattr.ia_valid = ATTR_MODE | ATTR_CTIME;
> -		iattr.ia_mode = mode;
> -		iattr.ia_ctime = current_time(inode);
> -
> -		error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
> -	}
> -
> -	return error;
> -}
> -
>  int
>  xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
>  	int error = 0;
> +	int mode_changed = 0;
> +	umode_t old_mode = inode->i_mode;
> +	struct timespec old_ctime = inode->i_ctime;
>  
>  	if (!acl)
>  		goto set_acl;
> @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  		return error;
>  
>  	if (type == ACL_TYPE_ACCESS) {
> -		umode_t mode;
> -
> -		error = posix_acl_update_mode(inode, &mode, &acl);
> -		if (error)
> -			return error;
> -		error = xfs_set_mode(inode, mode);
> +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
>  		if (error)
>  			return error;
> +		inode->i_ctime = current_time(inode);
> +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> +		mode_changed = 1;
>  	}
>  
>   set_acl:
> -	return __xfs_set_acl(inode, acl, type);
> +	error = __xfs_set_acl(inode, acl, type);
> +	if (error && mode_changed) {
> +		inode->i_mode = old_mode;
> +		inode->i_ctime = old_ctime;
> +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> +	}
> +	return error;
>  }
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bill O'Donnell Dec. 7, 2017, 5:38 p.m. UTC | #8

On Thu, Dec 07, 2017 at 11:31:51AM -0600, Bill O'Donnell wrote:
> On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > When changing a file's ACL mask, xfs_set_acl() will first set the group
> > bits of i_mode to the value of the mask, and only then set the actual
> > extended attribute representing the new ACL.
> > 
> > If the second part fails (due to lack of space, for example) and the
> > file had no ACL attribute to begin with, the system will from now on
> > assume that the mask permission bits are actual group permission bits,
> > potentially granting access to the wrong users.
> > 
> > To prevent this, perform the mode update and the ACL update as part of
> > the same transaction, and restore the original mode to the vfs inode in
> > case of failure. This requires a change in how extended attributes are
> > deleted: commit the transaction even if there was no attribute to
> > remove, to log the standard inode fields.
> > 
> > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
> > ---
> > This issue is covered by generic/449 in xfstests. Several other filesystems
> 
> Sorry, I'm coming into this late, and might be missing something...
> is there a particular xfs configuration that you've set up on which
> generic/449 fails? On 4.15-rc1, I see no failures of g/449.

Oh, I see this was merged and fixes g/449:
commit 67f2ffe31d1a683170c2ba0ecc643e42a5fdd397
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Oct 9 11:37:23 2017 -0700

    xfs: don't change inode mode if ACL update fails



> 
> Thanks-
> Bill
> 
> 
> 
> > are affected by this, some of them have already applied patches:
> >   - 397e434 ext4: preserve i_mode if __ext4_set_acl() fails
> >   - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
> >   - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails
> >   - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
> > The test won't exactly pass after this patch is applied. It will fail
> > differently, this time claiming that the filesystem is inconsistent. This
> > probably has something to do with setting too many extended attributes, as
> > it goes away if you change the attribute size in the test from 1k to 64k.
> > 
> >  fs/xfs/libxfs/xfs_attr.c | 13 +++++++++++--
> >  fs/xfs/xfs_acl.c         | 39 ++++++++++++++-------------------------
> >  2 files changed, 25 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index de7b9bd..c3193e1 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -395,6 +395,7 @@ xfs_attr_remove(
> >  	struct xfs_defer_ops	dfops;
> >  	xfs_fsblock_t		firstblock;
> >  	int			error;
> > +	int			noattr = 0;
> >  
> >  	XFS_STATS_INC(mp, xs_attr_remove);
> >  
> > @@ -438,7 +439,12 @@ xfs_attr_remove(
> >  	xfs_trans_ijoin(args.trans, dp, 0);
> >  
> >  	if (!xfs_inode_hasattr(dp)) {
> > -		error = -ENOATTR;
> > +		/*
> > +		 * Commit even if there is no attr to remove, because when
> > +		 * setting ACLs the caller may be changing the mode in the
> > +		 * same transaction.
> > +		 */
> > +		noattr = 1;
> >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> >  		error = xfs_attr_shortform_remove(&args);
> > @@ -458,7 +464,7 @@ xfs_attr_remove(
> >  	if (mp->m_flags & XFS_MOUNT_WSYNC)
> >  		xfs_trans_set_sync(args.trans);
> >  
> > -	if ((flags & ATTR_KERNOTIME) == 0)
> > +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> >  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> >  
> >  	/*
> > @@ -468,6 +474,9 @@ xfs_attr_remove(
> >  	error = xfs_trans_commit(args.trans);
> >  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> >  
> > +	if (!error && noattr)
> > +		error = -ENOATTR;
> > +
> >  	return error;
> >  
> >  out:
> > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > index 7034e17..1fe4363 100644
> > --- a/fs/xfs/xfs_acl.c
> > +++ b/fs/xfs/xfs_acl.c
> > @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  	return error;
> >  }
> >  
> > -static int
> > -xfs_set_mode(struct inode *inode, umode_t mode)
> > -{
> > -	int error = 0;
> > -
> > -	if (mode != inode->i_mode) {
> > -		struct iattr iattr;
> > -
> > -		iattr.ia_valid = ATTR_MODE | ATTR_CTIME;
> > -		iattr.ia_mode = mode;
> > -		iattr.ia_ctime = current_time(inode);
> > -
> > -		error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
> > -	}
> > -
> > -	return error;
> > -}
> > -
> >  int
> >  xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  {
> >  	int error = 0;
> > +	int mode_changed = 0;
> > +	umode_t old_mode = inode->i_mode;
> > +	struct timespec old_ctime = inode->i_ctime;
> >  
> >  	if (!acl)
> >  		goto set_acl;
> > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  		return error;
> >  
> >  	if (type == ACL_TYPE_ACCESS) {
> > -		umode_t mode;
> > -
> > -		error = posix_acl_update_mode(inode, &mode, &acl);
> > -		if (error)
> > -			return error;
> > -		error = xfs_set_mode(inode, mode);
> > +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> >  		if (error)
> >  			return error;
> > +		inode->i_ctime = current_time(inode);
> > +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > +		mode_changed = 1;
> >  	}
> >  
> >   set_acl:
> > -	return __xfs_set_acl(inode, acl, type);
> > +	error = __xfs_set_acl(inode, acl, type);
> > +	if (error && mode_changed) {
> > +		inode->i_mode = old_mode;
> > +		inode->i_ctime = old_ctime;
> > +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > +	}
> > +	return error;
> >  }
> > -- 
> > 2.1.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Dec. 7, 2017, 5:43 p.m. UTC | #9

On Thu, Dec 07, 2017 at 11:31:51AM -0600, Bill O'Donnell wrote:
> On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > When changing a file's ACL mask, xfs_set_acl() will first set the group
> > bits of i_mode to the value of the mask, and only then set the actual
> > extended attribute representing the new ACL.
> > 
> > If the second part fails (due to lack of space, for example) and the
> > file had no ACL attribute to begin with, the system will from now on
> > assume that the mask permission bits are actual group permission bits,
> > potentially granting access to the wrong users.
> > 
> > To prevent this, perform the mode update and the ACL update as part of
> > the same transaction, and restore the original mode to the vfs inode in
> > case of failure. This requires a change in how extended attributes are
> > deleted: commit the transaction even if there was no attribute to
> > remove, to log the standard inode fields.
> > 
> > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
> > ---
> > This issue is covered by generic/449 in xfstests. Several other filesystems
> 
> Sorry, I'm coming into this late, and might be missing something...
> is there a particular xfs configuration that you've set up on which
> generic/449 fails? On 4.15-rc1, I see no failures of g/449.

Fixed in:
67f2ffe31d1 ("xfs: don't change inode mode if ACL update fails")

(I think...)

--D

> Thanks-
> Bill
> 
> 
> 
> > are affected by this, some of them have already applied patches:
> >   - 397e434 ext4: preserve i_mode if __ext4_set_acl() fails
> >   - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
> >   - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails
> >   - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
> > The test won't exactly pass after this patch is applied. It will fail
> > differently, this time claiming that the filesystem is inconsistent. This
> > probably has something to do with setting too many extended attributes, as
> > it goes away if you change the attribute size in the test from 1k to 64k.
> > 
> >  fs/xfs/libxfs/xfs_attr.c | 13 +++++++++++--
> >  fs/xfs/xfs_acl.c         | 39 ++++++++++++++-------------------------
> >  2 files changed, 25 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index de7b9bd..c3193e1 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -395,6 +395,7 @@ xfs_attr_remove(
> >  	struct xfs_defer_ops	dfops;
> >  	xfs_fsblock_t		firstblock;
> >  	int			error;
> > +	int			noattr = 0;
> >  
> >  	XFS_STATS_INC(mp, xs_attr_remove);
> >  
> > @@ -438,7 +439,12 @@ xfs_attr_remove(
> >  	xfs_trans_ijoin(args.trans, dp, 0);
> >  
> >  	if (!xfs_inode_hasattr(dp)) {
> > -		error = -ENOATTR;
> > +		/*
> > +		 * Commit even if there is no attr to remove, because when
> > +		 * setting ACLs the caller may be changing the mode in the
> > +		 * same transaction.
> > +		 */
> > +		noattr = 1;
> >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> >  		error = xfs_attr_shortform_remove(&args);
> > @@ -458,7 +464,7 @@ xfs_attr_remove(
> >  	if (mp->m_flags & XFS_MOUNT_WSYNC)
> >  		xfs_trans_set_sync(args.trans);
> >  
> > -	if ((flags & ATTR_KERNOTIME) == 0)
> > +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> >  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> >  
> >  	/*
> > @@ -468,6 +474,9 @@ xfs_attr_remove(
> >  	error = xfs_trans_commit(args.trans);
> >  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> >  
> > +	if (!error && noattr)
> > +		error = -ENOATTR;
> > +
> >  	return error;
> >  
> >  out:
> > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > index 7034e17..1fe4363 100644
> > --- a/fs/xfs/xfs_acl.c
> > +++ b/fs/xfs/xfs_acl.c
> > @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  	return error;
> >  }
> >  
> > -static int
> > -xfs_set_mode(struct inode *inode, umode_t mode)
> > -{
> > -	int error = 0;
> > -
> > -	if (mode != inode->i_mode) {
> > -		struct iattr iattr;
> > -
> > -		iattr.ia_valid = ATTR_MODE | ATTR_CTIME;
> > -		iattr.ia_mode = mode;
> > -		iattr.ia_ctime = current_time(inode);
> > -
> > -		error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
> > -	}
> > -
> > -	return error;
> > -}
> > -
> >  int
> >  xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  {
> >  	int error = 0;
> > +	int mode_changed = 0;
> > +	umode_t old_mode = inode->i_mode;
> > +	struct timespec old_ctime = inode->i_ctime;
> >  
> >  	if (!acl)
> >  		goto set_acl;
> > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  		return error;
> >  
> >  	if (type == ACL_TYPE_ACCESS) {
> > -		umode_t mode;
> > -
> > -		error = posix_acl_update_mode(inode, &mode, &acl);
> > -		if (error)
> > -			return error;
> > -		error = xfs_set_mode(inode, mode);
> > +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> >  		if (error)
> >  			return error;
> > +		inode->i_ctime = current_time(inode);
> > +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > +		mode_changed = 1;
> >  	}
> >  
> >   set_acl:
> > -	return __xfs_set_acl(inode, acl, type);
> > +	error = __xfs_set_acl(inode, acl, type);
> > +	if (error && mode_changed) {
> > +		inode->i_mode = old_mode;
> > +		inode->i_ctime = old_ctime;
> > +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > +	}
> > +	return error;
> >  }
> > -- 
> > 2.1.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bill O'Donnell Dec. 7, 2017, 5:51 p.m. UTC | #10

On Thu, Dec 07, 2017 at 09:43:09AM -0800, Darrick J. Wong wrote:
> On Thu, Dec 07, 2017 at 11:31:51AM -0600, Bill O'Donnell wrote:
> > On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > > When changing a file's ACL mask, xfs_set_acl() will first set the group
> > > bits of i_mode to the value of the mask, and only then set the actual
> > > extended attribute representing the new ACL.
> > > 
> > > If the second part fails (due to lack of space, for example) and the
> > > file had no ACL attribute to begin with, the system will from now on
> > > assume that the mask permission bits are actual group permission bits,
> > > potentially granting access to the wrong users.
> > > 
> > > To prevent this, perform the mode update and the ACL update as part of
> > > the same transaction, and restore the original mode to the vfs inode in
> > > case of failure. This requires a change in how extended attributes are
> > > deleted: commit the transaction even if there was no attribute to
> > > remove, to log the standard inode fields.
> > > 
> > > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
> > > ---
> > > This issue is covered by generic/449 in xfstests. Several other filesystems
> > 
> > Sorry, I'm coming into this late, and might be missing something...
> > is there a particular xfs configuration that you've set up on which
> > generic/449 fails? On 4.15-rc1, I see no failures of g/449.
> 
> Fixed in:
> 67f2ffe31d1 ("xfs: don't change inode mode if ACL update fails")
> 
> (I think...)

Correct.
Thanks-
Bill


> 
> --D
> 
> > Thanks-
> > Bill
> > 
> > 
> > 
> > > are affected by this, some of them have already applied patches:
> > >   - 397e434 ext4: preserve i_mode if __ext4_set_acl() fails
> > >   - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
> > >   - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails
> > >   - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
> > > The test won't exactly pass after this patch is applied. It will fail
> > > differently, this time claiming that the filesystem is inconsistent. This
> > > probably has something to do with setting too many extended attributes, as
> > > it goes away if you change the attribute size in the test from 1k to 64k.
> > > 
> > >  fs/xfs/libxfs/xfs_attr.c | 13 +++++++++++--
> > >  fs/xfs/xfs_acl.c         | 39 ++++++++++++++-------------------------
> > >  2 files changed, 25 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index de7b9bd..c3193e1 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -395,6 +395,7 @@ xfs_attr_remove(
> > >  	struct xfs_defer_ops	dfops;
> > >  	xfs_fsblock_t		firstblock;
> > >  	int			error;
> > > +	int			noattr = 0;
> > >  
> > >  	XFS_STATS_INC(mp, xs_attr_remove);
> > >  
> > > @@ -438,7 +439,12 @@ xfs_attr_remove(
> > >  	xfs_trans_ijoin(args.trans, dp, 0);
> > >  
> > >  	if (!xfs_inode_hasattr(dp)) {
> > > -		error = -ENOATTR;
> > > +		/*
> > > +		 * Commit even if there is no attr to remove, because when
> > > +		 * setting ACLs the caller may be changing the mode in the
> > > +		 * same transaction.
> > > +		 */
> > > +		noattr = 1;
> > >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > >  		error = xfs_attr_shortform_remove(&args);
> > > @@ -458,7 +464,7 @@ xfs_attr_remove(
> > >  	if (mp->m_flags & XFS_MOUNT_WSYNC)
> > >  		xfs_trans_set_sync(args.trans);
> > >  
> > > -	if ((flags & ATTR_KERNOTIME) == 0)
> > > +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> > >  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> > >  
> > >  	/*
> > > @@ -468,6 +474,9 @@ xfs_attr_remove(
> > >  	error = xfs_trans_commit(args.trans);
> > >  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> > >  
> > > +	if (!error && noattr)
> > > +		error = -ENOATTR;
> > > +
> > >  	return error;
> > >  
> > >  out:
> > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > > index 7034e17..1fe4363 100644
> > > --- a/fs/xfs/xfs_acl.c
> > > +++ b/fs/xfs/xfs_acl.c
> > > @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > >  	return error;
> > >  }
> > >  
> > > -static int
> > > -xfs_set_mode(struct inode *inode, umode_t mode)
> > > -{
> > > -	int error = 0;
> > > -
> > > -	if (mode != inode->i_mode) {
> > > -		struct iattr iattr;
> > > -
> > > -		iattr.ia_valid = ATTR_MODE | ATTR_CTIME;
> > > -		iattr.ia_mode = mode;
> > > -		iattr.ia_ctime = current_time(inode);
> > > -
> > > -		error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
> > > -	}
> > > -
> > > -	return error;
> > > -}
> > > -
> > >  int
> > >  xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > >  {
> > >  	int error = 0;
> > > +	int mode_changed = 0;
> > > +	umode_t old_mode = inode->i_mode;
> > > +	struct timespec old_ctime = inode->i_ctime;
> > >  
> > >  	if (!acl)
> > >  		goto set_acl;
> > > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > >  		return error;
> > >  
> > >  	if (type == ACL_TYPE_ACCESS) {
> > > -		umode_t mode;
> > > -
> > > -		error = posix_acl_update_mode(inode, &mode, &acl);
> > > -		if (error)
> > > -			return error;
> > > -		error = xfs_set_mode(inode, mode);
> > > +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> > >  		if (error)
> > >  			return error;
> > > +		inode->i_ctime = current_time(inode);
> > > +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > +		mode_changed = 1;
> > >  	}
> > >  
> > >   set_acl:
> > > -	return __xfs_set_acl(inode, acl, type);
> > > +	error = __xfs_set_acl(inode, acl, type);
> > > +	if (error && mode_changed) {
> > > +		inode->i_mode = old_mode;
> > > +		inode->i_ctime = old_ctime;
> > > +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > +	}
> > > +	return error;
> > >  }
> > > -- 
> > > 2.1.4
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

xfs: preserve i_mode if __xfs_set_acl() fails

Commit Message

Comments

Patch