Message ID | 20180522040631.GD14384@magnolia (mailing list archive) |
---|---|
State | New, archived |
On Mon, 2018-05-21 at 21:06 -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> In inode_init_always(), we clear the inode mapping flags, which clears
> any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not
> also clear wb_err, which means that old mapping errors can leak through
> to new inodes.
>
> This is crucial for the XFS inode allocation path because we recycle old
> in-core inodes and we do not want error state from an old file to leak
> into the new file. This bug was discovered by running generic/036 and
> generic/047 in a loop and noticing that the EIOs generated by the
> collision of direct and buffered writes in generic/036 would survive the
> remount between 036 and 047, and get reported to the fsyncs (on
> different files!) in generic/047.
>
> Since we're changing the semantics of inode_init_always, we must also
> change xfs_reinit_inode to retain the writeback error state when we go
> to recover an inode that has been torn down in the vfs but not yet
> disposed of by XFS.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
> ---
>  fs/inode.c          | 1 +
>  fs/xfs/xfs_icache.c | 9 +++++++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 13ceb98c3bd3..3b55391072f3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -178,6 +178,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	mapping->a_ops = &empty_aops;
>  	mapping->host = inode;
>  	mapping->flags = 0;
> +	mapping->wb_err = 0;
>  	atomic_set(&mapping->i_mmap_writable, 0);
>  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
>  	mapping->private_data = NULL;
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 164350d91efc..d01f9544ff01 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -298,6 +298,10 @@ xfs_reinit_inode(
>  	uint64_t version = inode_peek_iversion(inode);
>  	umode_t mode = inode->i_mode;
>  	dev_t dev = inode->i_rdev;
> +	errseq_t old_err = inode->i_mapping->wb_err;
> +	bool as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
> +	bool as_enospc = test_bit(AS_ENOSPC,
> +			&inode->i_mapping->flags);
>
>  	error = inode_init_always(mp->m_super, inode);
>
> @@ -306,6 +310,11 @@ xfs_reinit_inode(
>  	inode_set_iversion_queried(inode, version);
>  	inode->i_mode = mode;
>  	inode->i_rdev = dev;
> +	inode->i_mapping->wb_err = old_err;
> +	if (as_eio)
> +		set_bit(AS_EIO, &inode->i_mapping->flags);
> +	if (as_enospc)
> +		set_bit(AS_ENOSPC, &inode->i_mapping->flags);
>  	return error;
>  }
>

Reviewed-by: Jeff Layton <jlayton@kernel.org>

On Mon, May 21, 2018 at 09:06:31PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> In inode_init_always(), we clear the inode mapping flags, which clears
> any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not
> also clear wb_err, which means that old mapping errors can leak through
> to new inodes.
>
> This is crucial for the XFS inode allocation path because we recycle old
> in-core inodes and we do not want error state from an old file to leak
> into the new file. This bug was discovered by running generic/036 and
> generic/047 in a loop and noticing that the EIOs generated by the
> collision of direct and buffered writes in generic/036 would survive the
> remount between 036 and 047, and get reported to the fsyncs (on
> different files!) in generic/047.
>
> Since we're changing the semantics of inode_init_always, we must also
> change xfs_reinit_inode to retain the writeback error state when we go
> to recover an inode that has been torn down in the vfs but not yet
> disposed of by XFS.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
> ---
>  fs/inode.c          | 1 +
>  fs/xfs/xfs_icache.c | 9 +++++++++
>  2 files changed, 10 insertions(+)
>
...
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 164350d91efc..d01f9544ff01 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -298,6 +298,10 @@ xfs_reinit_inode(
>  	uint64_t version = inode_peek_iversion(inode);
>  	umode_t mode = inode->i_mode;
>  	dev_t dev = inode->i_rdev;
> +	errseq_t old_err = inode->i_mapping->wb_err;
> +	bool as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
> +	bool as_enospc = test_bit(AS_ENOSPC,
> +			&inode->i_mapping->flags);
>
>  	error = inode_init_always(mp->m_super, inode);
>
> @@ -306,6 +310,11 @@ xfs_reinit_inode(
>  	inode_set_iversion_queried(inode, version);
>  	inode->i_mode = mode;
>  	inode->i_rdev = dev;
> +	inode->i_mapping->wb_err = old_err;
> +	if (as_eio)
> +		set_bit(AS_EIO, &inode->i_mapping->flags);
> +	if (as_enospc)
> +		set_bit(AS_ENOSPC, &inode->i_mapping->flags);

I'm wondering how safe this is. Can't the associated on-disk inode have
been unlinked and reallocated anew across this kind of reinit of the
in-core ip?

Brian

>  	return error;
>  }
>

On Tue, May 22, 2018 at 08:14:01AM -0400, Brian Foster wrote:
> On Mon, May 21, 2018 at 09:06:31PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > In inode_init_always(), we clear the inode mapping flags, which clears
> > any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not
> > also clear wb_err, which means that old mapping errors can leak through
> > to new inodes.
> >
> > This is crucial for the XFS inode allocation path because we recycle old
> > in-core inodes and we do not want error state from an old file to leak
> > into the new file. This bug was discovered by running generic/036 and
> > generic/047 in a loop and noticing that the EIOs generated by the
> > collision of direct and buffered writes in generic/036 would survive the
> > remount between 036 and 047, and get reported to the fsyncs (on
> > different files!) in generic/047.
> >
> > Since we're changing the semantics of inode_init_always, we must also
> > change xfs_reinit_inode to retain the writeback error state when we go
> > to recover an inode that has been torn down in the vfs but not yet
> > disposed of by XFS.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: retain AS_EIO/AS_ENOSPC across xfs inode reinit
> > ---
> >  fs/inode.c          | 1 +
> >  fs/xfs/xfs_icache.c | 9 +++++++++
> >  2 files changed, 10 insertions(+)
> >
> ...
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 164350d91efc..d01f9544ff01 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -298,6 +298,10 @@ xfs_reinit_inode(
> >  	uint64_t version = inode_peek_iversion(inode);
> >  	umode_t mode = inode->i_mode;
> >  	dev_t dev = inode->i_rdev;
> > +	errseq_t old_err = inode->i_mapping->wb_err;
> > +	bool as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
> > +	bool as_enospc = test_bit(AS_ENOSPC,
> > +			&inode->i_mapping->flags);
> >
> >  	error = inode_init_always(mp->m_super, inode);
> >
> > @@ -306,6 +310,11 @@ xfs_reinit_inode(
> >  	inode_set_iversion_queried(inode, version);
> >  	inode->i_mode = mode;
> >  	inode->i_rdev = dev;
> > +	inode->i_mapping->wb_err = old_err;
> > +	if (as_eio)
> > +		set_bit(AS_EIO, &inode->i_mapping->flags);
> > +	if (as_enospc)
> > +		set_bit(AS_ENOSPC, &inode->i_mapping->flags);
>
> I'm wondering how safe this is. Can't the associated on-disk inode have
> been unlinked and reallocated anew across this kind of reinit of the
> in-core ip?

Oops, yeah, xfs_ialloc ought to clear those error states unconditionally
when allocating a new on-disk inode.

--D

> Brian
>
> >  	return error;
> >  }
> >

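The concern raised in this exchange can be modelled outside the kernel. What follows is a minimal userspace sketch, not kernel code: `toy_mapping`, `toy_init_always`, `toy_reinit` and `toy_alloc_on_disk` are hypothetical stand-ins, a plain `unsigned int` stands in for `errseq_t`, and the place where XFS would clear the state is an assumption, since the thread only says that xfs_ialloc ought to do so.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel definitions. */
#define TOY_AS_EIO	(1UL << 0)
#define TOY_AS_ENOSPC	(1UL << 1)

struct toy_mapping {
	unsigned long flags;	/* models mapping->flags */
	unsigned int wb_err;	/* models mapping->wb_err (errseq_t) */
};

/* Models the patched inode_init_always(): all error state is dropped. */
static void toy_init_always(struct toy_mapping *m)
{
	m->flags = 0;
	m->wb_err = 0;
}

/* Models the patched xfs_reinit_inode(): error state is carried across. */
static void toy_reinit(struct toy_mapping *m)
{
	unsigned int old_err = m->wb_err;
	bool as_eio = m->flags & TOY_AS_EIO;
	bool as_enospc = m->flags & TOY_AS_ENOSPC;

	toy_init_always(m);

	m->wb_err = old_err;
	if (as_eio)
		m->flags |= TOY_AS_EIO;
	if (as_enospc)
		m->flags |= TOY_AS_ENOSPC;
}

/*
 * Assumed follow-up: when the in-core inode is reused for a newly
 * allocated on-disk inode, clear the error state unconditionally.
 */
static void toy_alloc_on_disk(struct toy_mapping *m)
{
	toy_reinit(m);
	m->flags &= ~(TOY_AS_EIO | TOY_AS_ENOSPC);
	m->wb_err = 0;
}

/* Returns 0 if the model behaves as described above. */
static int toy_reinit_demo(void)
{
	struct toy_mapping m = { .flags = TOY_AS_EIO, .wb_err = EIO };

	toy_reinit(&m);		/* same file recovered: error is retained */
	assert(m.wb_err == EIO && (m.flags & TOY_AS_EIO));

	toy_alloc_on_disk(&m);	/* new file: nothing may leak through */
	assert(m.wb_err == 0 && m.flags == 0);
	return 0;
}
```

Calling `toy_reinit_demo()` from a `main()` exercises both paths. The save-and-restore in `toy_reinit` mirrors the hunk under discussion, while the unconditional clear is only one possible reading of the reply.
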
diff --git a/fs/inode.c b/fs/inode.c
index 13ceb98c3bd3..3b55391072f3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -178,6 +178,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	mapping->a_ops = &empty_aops;
 	mapping->host = inode;
 	mapping->flags = 0;
+	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
 	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
 	mapping->private_data = NULL;
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 164350d91efc..d01f9544ff01 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -298,6 +298,10 @@ xfs_reinit_inode(
 	uint64_t version = inode_peek_iversion(inode);
 	umode_t mode = inode->i_mode;
 	dev_t dev = inode->i_rdev;
+	errseq_t old_err = inode->i_mapping->wb_err;
+	bool as_eio = test_bit(AS_EIO, &inode->i_mapping->flags);
+	bool as_enospc = test_bit(AS_ENOSPC,
+			&inode->i_mapping->flags);
 
 	error = inode_init_always(mp->m_super, inode);
 
@@ -306,6 +310,11 @@ xfs_reinit_inode(
 	inode_set_iversion_queried(inode, version);
 	inode->i_mode = mode;
 	inode->i_rdev = dev;
+	inode->i_mapping->wb_err = old_err;
+	if (as_eio)
+		set_bit(AS_EIO, &inode->i_mapping->flags);
+	if (as_enospc)
+		set_bit(AS_ENOSPC, &inode->i_mapping->flags);
 	return error;
 }
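
The leak described in the commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions rather than kernel code: `sketch_mapping`, `init_before_patch`, `init_after_patch` and `sketch_fsync` are hypothetical names, and `wb_err` is treated as a bare error number, which greatly simplifies the real `errseq_t` semantics.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the AS_EIO mapping flag bit. */
#define SKETCH_AS_EIO	(1UL << 0)

/* Hypothetical stand-in for the error-carrying part of address_space. */
struct sketch_mapping {
	unsigned long flags;	/* models mapping->flags */
	unsigned int wb_err;	/* models mapping->wb_err, simplified */
};

/* Models inode_init_always() before the patch: wb_err is left alone. */
static void init_before_patch(struct sketch_mapping *m)
{
	m->flags = 0;
}

/* Models inode_init_always() after the patch: wb_err is cleared too. */
static void init_after_patch(struct sketch_mapping *m)
{
	m->flags = 0;
	m->wb_err = 0;
}

/* Simplified model of what an fsync on the recycled inode would report. */
static int sketch_fsync(const struct sketch_mapping *m)
{
	return m->wb_err ? -(int)m->wb_err : 0;
}

/* Returns 0 if the model shows the leak and then the fix. */
static int leak_demo(void)
{
	struct sketch_mapping m = { .flags = SKETCH_AS_EIO, .wb_err = EIO };

	/* Old behaviour: flags look clean, but the stale error survives. */
	init_before_patch(&m);
	assert(m.flags == 0);
	assert(sketch_fsync(&m) == -EIO);

	/* New behaviour: the recycled inode starts with no error state. */
	m.flags = SKETCH_AS_EIO;
	m.wb_err = EIO;
	init_after_patch(&m);
	assert(m.flags == 0);
	assert(sketch_fsync(&m) == 0);
	return 0;
}
```

Calling `leak_demo()` from a `main()` runs both cases. The first half corresponds to the stale EIO reaching fsyncs on different files, and the second half to the one-line change in fs/inode.c.
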