diff mbox

[1/7] xfs: fix toctou race when locking an inode to access the data map

Message ID 148582219035.12293.12084220786527965512.stgit@birch.djwong.org (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Darrick J. Wong Jan. 31, 2017, 12:23 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

We use di_format and if_flags to decide whether we're grabbing the ilock
in btree mode (btree extents not loaded) or shared mode (anything else),
but the state of those fields can be changed by other threads that are
also trying to load the btree extents -- IFEXTENTS gets set before the
_bmap_read_extents call and cleared if it fails.  Therefore, once we've
grabbed the shared ilock we have to re-check the fields to see if we
actually need to upgrade to the exclusive ilock in order to try loading
the extents.

Without this patch, we trigger ilock assert failures when a bunch of
threads try to access a btree format directory with a corrupt bmbt root
and corrupt the incore data structures, leading to a crash.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_inode.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Sandeen Jan. 31, 2017, 3:01 a.m. UTC | #1
On 1/30/17 6:23 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We use di_format and if_flags to decide whether we're grabbing the ilock
> in btree mode (btree extents not loaded) or shared mode (anything else),
> but the state of those fields can be changed by other threads that are
> also trying to load the btree extents -- IFEXTENTS gets set before the
> _bmap_read_extents call and cleared if it fails.  Therefore, once we've
> grabbed the shared ilock we have to re-check the fields to see if we
> actually need to upgrade to the exclusive ilock in order to try loading
> the extents.
> 
> Without this patch, we trigger ilock assert failures when a bunch of
> threads try to access a btree format directory with a corrupt bmbt root
> and corrupt the incore data structures, leading to a crash.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

I'm a little surprised this hasn't been hit before, TBH.

Seems right -

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> ---
>  fs/xfs/xfs_inode.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index de32f0f..7d7206c 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -125,6 +125,18 @@ xfs_ilock_data_map_shared(
>  	    (ip->i_df.if_flags & XFS_IFEXTENTS) == 0)
>  		lock_mode = XFS_ILOCK_EXCL;
>  	xfs_ilock(ip, lock_mode);
> +	/*
> +	 * We can change if_flags under ilock if we try to read the
> +	 * extents and fail.  Since we hadn't grabbed the ilock at check
> +	 * time, we have to re-check and upgrade the lock now.
> +	 */
> +	if (lock_mode == XFS_ILOCK_SHARED &&
> +	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
> +	    (ip->i_df.if_flags & XFS_IFEXTENTS) == 0) {
> +		xfs_iunlock(ip, lock_mode);
> +		lock_mode = XFS_ILOCK_EXCL;
> +		xfs_ilock(ip, lock_mode);
> +	}
>  	return lock_mode;
>  }
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Jan. 31, 2017, 1:26 p.m. UTC | #2
On Mon, Jan 30, 2017 at 04:23:10PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We use di_format and if_flags to decide whether we're grabbing the ilock
> in btree mode (btree extents not loaded) or shared mode (anything else),
> but the state of those fields can be changed by other threads that are
> also trying to load the btree extents -- IFEXTENTS gets set before the
> _bmap_read_extents call and cleared if it fails.  Therefore, once we've
> grabbed the shared ilock we have to re-check the fields to see if we
> actually need to upgrade to the exclusive ilock in order to try loading
> the extents.
> 
> Without this patch, we trigger ilock assert failures when a bunch of
> threads try to access a btree format directory with a corrupt bmbt root
> and corrupt the incore data structures, leading to a crash.

I see the problem here, but I really don't like the fix.  Instead
I'd much rather check for a new flag that tells that the extent list
hasn't been read, which can only be cleared under the exclusive
ilock.  That way we shouldn't need any additional relocking or
checks.

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong Jan. 31, 2017, 7:45 p.m. UTC | #3
On Tue, Jan 31, 2017 at 05:26:58AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 30, 2017 at 04:23:10PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > We use di_format and if_flags to decide whether we're grabbing the ilock
> > in btree mode (btree extents not loaded) or shared mode (anything else),
> > but the state of those fields can be changed by other threads that are
> > also trying to load the btree extents -- IFEXTENTS gets set before the
> > _bmap_read_extents call and cleared if it fails.  Therefore, once we've
> > grabbed the shared ilock we have to re-check the fields to see if we
> > actually need to upgrade to the exclusive ilock in order to try loading
> > the extents.
> > 
> > Without this patch, we trigger ilock assert failures when a bunch of
> > threads try to access a btree format directory with a corrupt bmbt root
> > and corrupt the incore data structures, leading to a crash.
> 
> I see the problem here, but I really don't like the fix.  Instead
> I'd much rather check for a new flag that tells that the extent list
> hasn't been read, which can only be cleared under the exclusive
> ilock.  That way we shouldn't need any additional relocking or
> checks.

I'm confused --

I thought XFS_IFEXTENTS means "extents have been read", which is the
inverse of the flag you propose.  AFAICT the bit is only ever set (or
cleared) with ILOCK_EXCL held, so the problem here is that we're
performing an unlocked read of if_flags prior to actually taking the
lock that we need.

On the other hand, I /think/ it's the case that none of the functions
called in _iread_extents actually cares about IFEXTENTS being set, so
perhaps an alternative could be to avoid setting the bit until we've
successfully read in all the bmbt records?

I'll try that out and report back.

--D

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong Jan. 31, 2017, 9:40 p.m. UTC | #4
On Tue, Jan 31, 2017 at 11:45:20AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 31, 2017 at 05:26:58AM -0800, Christoph Hellwig wrote:
> > On Mon, Jan 30, 2017 at 04:23:10PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > We use di_format and if_flags to decide whether we're grabbing the ilock
> > > in btree mode (btree extents not loaded) or shared mode (anything else),
> > > but the state of those fields can be changed by other threads that are
> > > also trying to load the btree extents -- IFEXTENTS gets set before the
> > > _bmap_read_extents call and cleared if it fails.  Therefore, once we've
> > > grabbed the shared ilock we have to re-check the fields to see if we
> > > actually need to upgrade to the exclusive ilock in order to try loading
> > > the extents.
> > > 
> > > Without this patch, we trigger ilock assert failures when a bunch of
> > > threads try to access a btree format directory with a corrupt bmbt root
> > > and corrupt the incore data structures, leading to a crash.
> > 
> > I see the problem here, but I really don't like the fix.  Instead
> > I'd much rather check for a new flag that tells that the extent list
> > hasn't been read, which can only be cleared under the exclusive
> > ilock.  That way we shouldn't need any additional relocking or
> > checks.
> 
> I'm confused --
> 
> I thought XFS_IFEXTENTS means "extents have been read", which is the
> inverse of the flag you propose.  AFAICT the bit is only ever set (or
> cleared) with ILOCK_EXCL held, so the problem here is that we're
> performing an unlocked read of if_flags prior to actually taking the
> lock that we need.
> 
> On the other hand, I /think/ it's the case that none of the functions
> called in _iread_extents actually cares about IFEXTENTS being set, so
> perhaps an alternative could be to avoid setting the bit until we've
> successfully read in all the bmbt records?
> 
> I'll try that out and report back.

Seems to work, will send a revised patch.

--D

> 
> --D
> 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index de32f0f..7d7206c 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -125,6 +125,18 @@  xfs_ilock_data_map_shared(
 	    (ip->i_df.if_flags & XFS_IFEXTENTS) == 0)
 		lock_mode = XFS_ILOCK_EXCL;
 	xfs_ilock(ip, lock_mode);
+	/*
+	 * We can change if_flags under ilock if we try to read the
+	 * extents and fail.  Since we hadn't grabbed the ilock at check
+	 * time, we have to re-check and upgrade the lock now.
+	 */
+	if (lock_mode == XFS_ILOCK_SHARED &&
+	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
+	    (ip->i_df.if_flags & XFS_IFEXTENTS) == 0) {
+		xfs_iunlock(ip, lock_mode);
+		lock_mode = XFS_ILOCK_EXCL;
+		xfs_ilock(ip, lock_mode);
+	}
 	return lock_mode;
 }