diff mbox

[v3] fs: clear writeback errors in inode_init_always

Message ID 20180522164359.GB12940@magnolia (mailing list archive)
State Accepted
Headers show

Commit Message

Darrick J. Wong May 22, 2018, 4:43 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

In inode_init_always(), we clear the inode mapping flags, which clears
any retained error (AS_EIO, AS_ENOSPC) bits.  Unfortunately, we do not
also clear wb_err, which means that old mapping errors can leak through
to new inodes.

This is crucial for the XFS inode allocation path because we recycle old
in-core inodes and we do not want error state from an old file to leak
into the new file.  This bug was discovered by running generic/036 and
generic/047 in a loop and noticing that the EIOs generated by the
collision of direct and buffered writes in generic/036 would survive the
remount between 036 and 047, and get reported to the fsyncs (on
different files!) in generic/047.

Since we're changing the semantics of inode_init_always, we must also
change xfs_reinit_inode to retain the writeback error state when we go
to recover an inode that has been torn down in the vfs but not yet
disposed of by XFS.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
v3: clear error state when allocating new inode
v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
---
 fs/inode.c          |    1 +
 fs/xfs/xfs_icache.c |    9 +++++++++
 fs/xfs/xfs_inode.c  |    5 +++++
 3 files changed, 15 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Brian Foster May 22, 2018, 6:40 p.m. UTC | #1
On Tue, May 22, 2018 at 09:43:59AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In inode_init_always(), we clear the inode mapping flags, which clears
> any retained error (AS_EIO, AS_ENOSPC) bits.  Unfortunately, we do not
> also clear wb_err, which means that old mapping errors can leak through
> to new inodes.
> 
> This is crucial for the XFS inode allocation path because we recycle old
> in-core inodes and we do not want error state from an old file to leak
> into the new file.  This bug was discovered by running generic/036 and
> generic/047 in a loop and noticing that the EIOs generated by the
> collision of direct and buffered writes in generic/036 would survive the
> remount between 036 and 047, and get reported to the fsyncs (on
> different files!) in generic/047.
> 
> Since we're changing the semantics of inode_init_always, we must also
> change xfs_reinit_inode to retain the writeback error state when we go
> to recover an inode that has been torn down in the vfs but not yet
> disposed of by XFS.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> v3: clear error state when allocating new inode
> v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
> ---
>  fs/inode.c          |    1 +
>  fs/xfs/xfs_icache.c |    9 +++++++++
>  fs/xfs/xfs_inode.c  |    5 +++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 13ceb98c3bd3..3b55391072f3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -178,6 +178,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	mapping->a_ops = &empty_aops;
>  	mapping->host = inode;
>  	mapping->flags = 0;
> +	mapping->wb_err = 0;
>  	atomic_set(&mapping->i_mmap_writable, 0);
>  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
>  	mapping->private_data = NULL;
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 164350d91efc..d01f9544ff01 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -298,6 +298,10 @@ xfs_reinit_inode(
>  	uint64_t	version = inode_peek_iversion(inode);
>  	umode_t		mode = inode->i_mode;
>  	dev_t		dev = inode->i_rdev;
> +	errseq_t	old_err = inode->i_mapping->wb_err;
> +	bool		as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
> +	bool		as_enospc = test_bit(AS_ENOSPC,
> +					     &inode->i_mapping->flags);
>  
>  	error = inode_init_always(mp->m_super, inode);
>  
> @@ -306,6 +310,11 @@ xfs_reinit_inode(
>  	inode_set_iversion_queried(inode, version);
>  	inode->i_mode = mode;
>  	inode->i_rdev = dev;
> +	inode->i_mapping->wb_err = old_err;
> +	if (as_eio)
> +		set_bit(AS_EIO, &inode->i_mapping->flags);
> +	if (as_enospc)
> +		set_bit(AS_ENOSPC, &inode->i_mapping->flags);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 02eae5059231..6c47ea3e577b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -835,6 +835,11 @@ xfs_ialloc(
>  			inode->i_mode |= S_ISGID;
>  	}
>  
> +	/* Reset all vfs error state. */

I'd prefer a comment that reminds why we have to do this. E.g.,
something like:

	/*
	 * In-core inode reinit carries over vfs error state. Reset it
	 * so we don't leak error state from a previous generation of
	 * the inode.
	 */

... but feel free to reword that.

> +	inode->i_mapping->wb_err = 0;
> +	clear_bit(AS_EIO, &inode->i_mapping->flags);
> +	clear_bit(AS_ENOSPC, &inode->i_mapping->flags);
> +

I also wonder whether it might be cleaner to just reset this stuff in
xfs_ifree() where we kill the i_mode and bump the gen and whatnot, but
afaict this should work too. With the comment fix, at least:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  	/*
>  	 * If the group ID of the new file does not match the effective group
>  	 * ID or one of the supplementary group IDs, the S_ISGID bit is cleared
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong May 22, 2018, 6:47 p.m. UTC | #2
On Tue, May 22, 2018 at 02:40:09PM -0400, Brian Foster wrote:
> On Tue, May 22, 2018 at 09:43:59AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In inode_init_always(), we clear the inode mapping flags, which clears
> > any retained error (AS_EIO, AS_ENOSPC) bits.  Unfortunately, we do not
> > also clear wb_err, which means that old mapping errors can leak through
> > to new inodes.
> > 
> > This is crucial for the XFS inode allocation path because we recycle old
> > in-core inodes and we do not want error state from an old file to leak
> > into the new file.  This bug was discovered by running generic/036 and
> > generic/047 in a loop and noticing that the EIOs generated by the
> > collision of direct and buffered writes in generic/036 would survive the
> > remount between 036 and 047, and get reported to the fsyncs (on
> > different files!) in generic/047.
> > 
> > Since we're changing the semantics of inode_init_always, we must also
> > change xfs_reinit_inode to retain the writeback error state when we go
> > to recover an inode that has been torn down in the vfs but not yet
> > disposed of by XFS.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > v3: clear error state when allocating new inode
> > v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
> > ---
> >  fs/inode.c          |    1 +
> >  fs/xfs/xfs_icache.c |    9 +++++++++
> >  fs/xfs/xfs_inode.c  |    5 +++++
> >  3 files changed, 15 insertions(+)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 13ceb98c3bd3..3b55391072f3 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -178,6 +178,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
> >  	mapping->a_ops = &empty_aops;
> >  	mapping->host = inode;
> >  	mapping->flags = 0;
> > +	mapping->wb_err = 0;
> >  	atomic_set(&mapping->i_mmap_writable, 0);
> >  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
> >  	mapping->private_data = NULL;
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 164350d91efc..d01f9544ff01 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -298,6 +298,10 @@ xfs_reinit_inode(
> >  	uint64_t	version = inode_peek_iversion(inode);
> >  	umode_t		mode = inode->i_mode;
> >  	dev_t		dev = inode->i_rdev;
> > +	errseq_t	old_err = inode->i_mapping->wb_err;
> > +	bool		as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
> > +	bool		as_enospc = test_bit(AS_ENOSPC,
> > +					     &inode->i_mapping->flags);
> >  
> >  	error = inode_init_always(mp->m_super, inode);
> >  
> > @@ -306,6 +310,11 @@ xfs_reinit_inode(
> >  	inode_set_iversion_queried(inode, version);
> >  	inode->i_mode = mode;
> >  	inode->i_rdev = dev;
> > +	inode->i_mapping->wb_err = old_err;
> > +	if (as_eio)
> > +		set_bit(AS_EIO, &inode->i_mapping->flags);
> > +	if (as_enospc)
> > +		set_bit(AS_ENOSPC, &inode->i_mapping->flags);
> >  	return error;
> >  }
> >  
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 02eae5059231..6c47ea3e577b 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -835,6 +835,11 @@ xfs_ialloc(
> >  			inode->i_mode |= S_ISGID;
> >  	}
> >  
> > +	/* Reset all vfs error state. */
> 
> I'd prefer a comment that reminds why we have to do this. E.g.,
> something like:
> 
> 	/*
> 	 * In-core inode reinit carries over vfs error state. Reset it
> 	 * so we don't leak error state from a previous generation of
> 	 * the inode.
> 	 */

I settled on:

	/*
	 * In-core inode reinit does not clear vfs error state.  Reset
	 * it so we don't leak error state from a previous generation of
	 * the inode.
	 */

> 
> ... but feel free to reword that.
> 
> > +	inode->i_mapping->wb_err = 0;
> > +	clear_bit(AS_EIO, &inode->i_mapping->flags);
> > +	clear_bit(AS_ENOSPC, &inode->i_mapping->flags);
> > +
> 
> I also wonder whether it might be cleaner to just reset this stuff in
> xfs_ifree() where we kill the i_mode and bump the gen and whatnot, but
> afaict this should work too. With the comment fix, at least:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Thanks for the review!

--D

> 
> >  	/*
> >  	 * If the group ID of the new file does not match the effective group
> >  	 * ID or one of the supplementary group IDs, the S_ISGID bit is cleared
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner May 22, 2018, 10:05 p.m. UTC | #3
On Tue, May 22, 2018 at 09:43:59AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In inode_init_always(), we clear the inode mapping flags, which clears
> any retained error (AS_EIO, AS_ENOSPC) bits.  Unfortunately, we do not
> also clear wb_err, which means that old mapping errors can leak through
> to new inodes.
> 
> This is crucial for the XFS inode allocation path because we recycle old
> in-core inodes and we do not want error state from an old file to leak
> into the new file.  This bug was discovered by running generic/036 and
> generic/047 in a loop and noticing that the EIOs generated by the
> collision of direct and buffered writes in generic/036 would survive the
> remount between 036 and 047, and get reported to the fsyncs (on
> different files!) in generic/047.
> 
> Since we're changing the semantics of inode_init_always, we must also
> change xfs_reinit_inode to retain the writeback error state when we go
> to recover an inode that has been torn down in the vfs but not yet
> disposed of by XFS.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> v3: clear error state when allocating new inode
> v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
> ---
>  fs/inode.c          |    1 +
>  fs/xfs/xfs_icache.c |    9 +++++++++
>  fs/xfs/xfs_inode.c  |    5 +++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 13ceb98c3bd3..3b55391072f3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -178,6 +178,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	mapping->a_ops = &empty_aops;
>  	mapping->host = inode;
>  	mapping->flags = 0;
> +	mapping->wb_err = 0;
>  	atomic_set(&mapping->i_mmap_writable, 0);
>  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
>  	mapping->private_data = NULL;
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 164350d91efc..d01f9544ff01 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -298,6 +298,10 @@ xfs_reinit_inode(
>  	uint64_t	version = inode_peek_iversion(inode);
>  	umode_t		mode = inode->i_mode;
>  	dev_t		dev = inode->i_rdev;
> +	errseq_t	old_err = inode->i_mapping->wb_err;
> +	bool		as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
> +	bool		as_enospc = test_bit(AS_ENOSPC,
> +					     &inode->i_mapping->flags);
>  
>  	error = inode_init_always(mp->m_super, inode);
>  
> @@ -306,6 +310,11 @@ xfs_reinit_inode(
>  	inode_set_iversion_queried(inode, version);
>  	inode->i_mode = mode;
>  	inode->i_rdev = dev;
> +	inode->i_mapping->wb_err = old_err;
> +	if (as_eio)
> +		set_bit(AS_EIO, &inode->i_mapping->flags);
> +	if (as_enospc)
> +		set_bit(AS_ENOSPC, &inode->i_mapping->flags);

I just don't see this as valid.

The inode has already been removed from the VFS cache when this
happens (i.e. it's gone through iput_final()->evict->destroy_inode)
and so any VFS-specific state is already invalid.

We're retaining XFS-specific inode information across the recycling
of the xfs_inode (i.e. stuff that doesn't change *on-disk* across
recycling an inode, but is reset by inode_init_always()). This is a
creating a new life cycle for the VFS inode (i.e. I_NEW is set), so 
*there is no VFS state retained* across this operation from it's
previous life.

Trying to do this is just going to lead us into nasty layering
violations where the VFS state has to be hacked around just in case
this recycling of the previous life cycle's state was the wrong
thing to do.

> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 02eae5059231..6c47ea3e577b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -835,6 +835,11 @@ xfs_ialloc(
>  			inode->i_mode |= S_ISGID;
>  	}
>  
> +	/* Reset all vfs error state. */
> +	inode->i_mapping->wb_err = 0;
> +	clear_bit(AS_EIO, &inode->i_mapping->flags);
> +	clear_bit(AS_ENOSPC, &inode->i_mapping->flags);

Just like this. This is a sure sign we're doing the wrong thing in
xfs_reinit_inode(). VFS inode lifecycle state is trashed at
->destroy_inode. It does not persist into new instantiations of VFS
inode state, regardless of that new inode lifecycle gets to the
I_NEW state...

Cheers,

Dave.
Darrick J. Wong May 23, 2018, 3 a.m. UTC | #4
On Wed, May 23, 2018 at 08:05:52AM +1000, Dave Chinner wrote:
> On Tue, May 22, 2018 at 09:43:59AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In inode_init_always(), we clear the inode mapping flags, which clears
> > any retained error (AS_EIO, AS_ENOSPC) bits.  Unfortunately, we do not
> > also clear wb_err, which means that old mapping errors can leak through
> > to new inodes.
> > 
> > This is crucial for the XFS inode allocation path because we recycle old
> > in-core inodes and we do not want error state from an old file to leak
> > into the new file.  This bug was discovered by running generic/036 and
> > generic/047 in a loop and noticing that the EIOs generated by the
> > collision of direct and buffered writes in generic/036 would survive the
> > remount between 036 and 047, and get reported to the fsyncs (on
> > different files!) in generic/047.
> > 
> > Since we're changing the semantics of inode_init_always, we must also
> > change xfs_reinit_inode to retain the writeback error state when we go
> > to recover an inode that has been torn down in the vfs but not yet
> > disposed of by XFS.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > v3: clear error state when allocating new inode
> > v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
> > ---
> >  fs/inode.c          |    1 +
> >  fs/xfs/xfs_icache.c |    9 +++++++++
> >  fs/xfs/xfs_inode.c  |    5 +++++
> >  3 files changed, 15 insertions(+)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 13ceb98c3bd3..3b55391072f3 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -178,6 +178,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
> >  	mapping->a_ops = &empty_aops;
> >  	mapping->host = inode;
> >  	mapping->flags = 0;
> > +	mapping->wb_err = 0;
> >  	atomic_set(&mapping->i_mmap_writable, 0);
> >  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
> >  	mapping->private_data = NULL;
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 164350d91efc..d01f9544ff01 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -298,6 +298,10 @@ xfs_reinit_inode(
> >  	uint64_t	version = inode_peek_iversion(inode);
> >  	umode_t		mode = inode->i_mode;
> >  	dev_t		dev = inode->i_rdev;
> > +	errseq_t	old_err = inode->i_mapping->wb_err;
> > +	bool		as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
> > +	bool		as_enospc = test_bit(AS_ENOSPC,
> > +					     &inode->i_mapping->flags);
> >  
> >  	error = inode_init_always(mp->m_super, inode);
> >  
> > @@ -306,6 +310,11 @@ xfs_reinit_inode(
> >  	inode_set_iversion_queried(inode, version);
> >  	inode->i_mode = mode;
> >  	inode->i_rdev = dev;
> > +	inode->i_mapping->wb_err = old_err;
> > +	if (as_eio)
> > +		set_bit(AS_EIO, &inode->i_mapping->flags);
> > +	if (as_enospc)
> > +		set_bit(AS_ENOSPC, &inode->i_mapping->flags);
> 
> I just don't see this as valid.
> 
> The inode has already been removed from the VFS cache when this
> happens (i.e. it's gone through iput_final()->evict->destroy_inode)
> and so any VFS-specific state is already invalid.
> 
> We're retaining XFS-specific inode information across the recycling
> of the xfs_inode (i.e. stuff that doesn't change *on-disk* across
> recycling an inode, but is reset by inode_init_always()). This is a
> creating a new life cycle for the VFS inode (i.e. I_NEW is set), so 
> *there is no VFS state retained* across this operation from it's
> previous life.
> 
> Trying to do this is just going to lead us into nasty layering
> violations where the VFS state has to be hacked around just in case
> this recycling of the previous life cycle's state was the wrong
> thing to do.
> 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 02eae5059231..6c47ea3e577b 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -835,6 +835,11 @@ xfs_ialloc(
> >  			inode->i_mode |= S_ISGID;
> >  	}
> >  
> > +	/* Reset all vfs error state. */
> > +	inode->i_mapping->wb_err = 0;
> > +	clear_bit(AS_EIO, &inode->i_mapping->flags);
> > +	clear_bit(AS_ENOSPC, &inode->i_mapping->flags);
> 
> Just like this. This is a sure sign we're doing the wrong thing in
> xfs_reinit_inode(). VFS inode lifecycle state is trashed at
> ->destroy_inode. It does not persist into new instantiations of VFS
> inode state, regardless of that new inode lifecycle gets to the
> I_NEW state...

(Just reiterating a conversation we had on irc...)

So basically as far as the vfs is concerned the inode is gone and done
and destroyed, and it has no expectation of state being preserved.
Therefore, this patch can be trimmed to the inode_init_always part.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/inode.c b/fs/inode.c
index 13ceb98c3bd3..3b55391072f3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -178,6 +178,7 @@  int inode_init_always(struct super_block *sb, struct inode *inode)
 	mapping->a_ops = &empty_aops;
 	mapping->host = inode;
 	mapping->flags = 0;
+	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
 	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
 	mapping->private_data = NULL;
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 164350d91efc..d01f9544ff01 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -298,6 +298,10 @@  xfs_reinit_inode(
 	uint64_t	version = inode_peek_iversion(inode);
 	umode_t		mode = inode->i_mode;
 	dev_t		dev = inode->i_rdev;
+	errseq_t	old_err = inode->i_mapping->wb_err;
+	bool		as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
+	bool		as_enospc = test_bit(AS_ENOSPC,
+					     &inode->i_mapping->flags);
 
 	error = inode_init_always(mp->m_super, inode);
 
@@ -306,6 +310,11 @@  xfs_reinit_inode(
 	inode_set_iversion_queried(inode, version);
 	inode->i_mode = mode;
 	inode->i_rdev = dev;
+	inode->i_mapping->wb_err = old_err;
+	if (as_eio)
+		set_bit(AS_EIO, &inode->i_mapping->flags);
+	if (as_enospc)
+		set_bit(AS_ENOSPC, &inode->i_mapping->flags);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 02eae5059231..6c47ea3e577b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -835,6 +835,11 @@  xfs_ialloc(
 			inode->i_mode |= S_ISGID;
 	}
 
+	/* Reset all vfs error state. */
+	inode->i_mapping->wb_err = 0;
+	clear_bit(AS_EIO, &inode->i_mapping->flags);
+	clear_bit(AS_ENOSPC, &inode->i_mapping->flags);
+
 	/*
 	 * If the group ID of the new file does not match the effective group
 	 * ID or one of the supplementary group IDs, the S_ISGID bit is cleared