diff mbox series

xfs: log the inode on directory sf to block format change

Message ID 20191004125520.7857-1-bfoster@redhat.com (mailing list archive)
State Accepted
Headers show
Series xfs: log the inode on directory sf to block format change | expand

Commit Message

Brian Foster Oct. 4, 2019, 12:55 p.m. UTC
When a directory changes from shortform (sf) to block format, the sf
format is copied to a temporary buffer, the inode format is modified
and the updated format filled with the dentries from the temporary
buffer. If the inode format is modified and attempt to grow the
inode fails (due to I/O error, for example), it is possible to
return an error while leaving the directory in an inconsistent state
and with an otherwise clean transaction. This results in corruption
of the associated directory and leads to xfs_dabuf_map() errors as
subsequent lookups cannot accurately determine the format of the
directory. This problem is reproduced occasionally by generic/475.

The fundamental problem is that xfs_dir2_sf_to_block() changes the
on-disk inode format without logging the inode. The inode is
eventually logged by the bmapi layer in the common case, but error
checking introduces the possibility of failing the high level
request before this happens.

Update xfs_dir2_sf_to_block() to log the inode when the on-disk
format is changed. This ensures that any subsequent errors after the
format has changed cause the transaction to abort.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_dir2_block.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Darrick J. Wong Oct. 4, 2019, 3:24 p.m. UTC | #1
On Fri, Oct 04, 2019 at 08:55:20AM -0400, Brian Foster wrote:
> When a directory changes from shortform (sf) to block format, the sf
> format is copied to a temporary buffer, the inode format is modified
> and the updated format filled with the dentries from the temporary
> buffer. If the inode format is modified and attempt to grow the
> inode fails (due to I/O error, for example), it is possible to
> return an error while leaving the directory in an inconsistent state
> and with an otherwise clean transaction. This results in corruption
> of the associated directory and leads to xfs_dabuf_map() errors as
> subsequent lookups cannot accurately determine the format of the
> directory. This problem is reproduced occasionally by generic/475.
> 
> The fundamental problem is that xfs_dir2_sf_to_block() changes the
> on-disk inode format without logging the inode. The inode is
> eventually logged by the bmapi layer in the common case, but error
> checking introduces the possibility of failing the high level
> request before this happens.
> 
> Update xfs_dir2_sf_to_block() to log the inode when the on-disk
> format is changed. This ensures that any subsequent errors after the
> format has changed cause the transaction to abort.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_dir2_block.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
> index 9595ced393dc..3d1e5f6d64fd 100644
> --- a/fs/xfs/libxfs/xfs_dir2_block.c
> +++ b/fs/xfs/libxfs/xfs_dir2_block.c
> @@ -1098,6 +1098,7 @@ xfs_dir2_sf_to_block(
>  	xfs_idata_realloc(dp, -ifp->if_bytes, XFS_DATA_FORK);
>  	xfs_bmap_local_to_extents_empty(dp, XFS_DATA_FORK);
>  	dp->i_d.di_size = 0;
> +	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);

I think the general idea looks ok, but is there any reason why we don't
log the inode in xfs_bmap_local_to_extents_empty, since it changes the
ondisk format?

Also, does xfs_attr_shortform_to_leaf have a similar problem?

--D

>  
>  	/*
>  	 * Add block 0 to the inode.
> -- 
> 2.20.1
>
Brian Foster Oct. 4, 2019, 3:51 p.m. UTC | #2
On Fri, Oct 04, 2019 at 08:24:08AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 04, 2019 at 08:55:20AM -0400, Brian Foster wrote:
> > When a directory changes from shortform (sf) to block format, the sf
> > format is copied to a temporary buffer, the inode format is modified
> > and the updated format filled with the dentries from the temporary
> > buffer. If the inode format is modified and attempt to grow the
> > inode fails (due to I/O error, for example), it is possible to
> > return an error while leaving the directory in an inconsistent state
> > and with an otherwise clean transaction. This results in corruption
> > of the associated directory and leads to xfs_dabuf_map() errors as
> > subsequent lookups cannot accurately determine the format of the
> > directory. This problem is reproduced occasionally by generic/475.
> > 
> > The fundamental problem is that xfs_dir2_sf_to_block() changes the
> > on-disk inode format without logging the inode. The inode is
> > eventually logged by the bmapi layer in the common case, but error
> > checking introduces the possibility of failing the high level
> > request before this happens.
> > 
> > Update xfs_dir2_sf_to_block() to log the inode when the on-disk
> > format is changed. This ensures that any subsequent errors after the
> > format has changed cause the transaction to abort.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_dir2_block.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
> > index 9595ced393dc..3d1e5f6d64fd 100644
> > --- a/fs/xfs/libxfs/xfs_dir2_block.c
> > +++ b/fs/xfs/libxfs/xfs_dir2_block.c
> > @@ -1098,6 +1098,7 @@ xfs_dir2_sf_to_block(
> >  	xfs_idata_realloc(dp, -ifp->if_bytes, XFS_DATA_FORK);
> >  	xfs_bmap_local_to_extents_empty(dp, XFS_DATA_FORK);
> >  	dp->i_d.di_size = 0;
> > +	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
> 
> I think the general idea looks ok, but is there any reason why we don't
> log the inode in xfs_bmap_local_to_extents_empty, since it changes the
> ondisk format?
> 

No reason in particular that I'm aware of...

> Also, does xfs_attr_shortform_to_leaf have a similar problem?
> 

Hmmm I think it might, but I'm not totally sure. What is slightly
interesting is that xfs_attr_shortform_to_leaf() makes an attempt to put
the shortform format back into place on error. I briefly thought about
doing this for the dir code but figured it wasn't worth it given the
other dir conversions and operations are likely to have dirtied the tx
by that point. That said, the attr variant skips the recovery attempt in
the event of -EIO for some reason, which happens to be the error here.

So I suppose we could either stuff it in
xfs_bmap_local_to_extents_empty() since it's only called in these two
places, or we could do something similar in xfs_attr_shortform_to_leaf()
and perhaps remove the "recovery" attempt since we'll definitely abort
there as well. Hm?

Brian

> --D
> 
> >  
> >  	/*
> >  	 * Add block 0 to the inode.
> > -- 
> > 2.20.1
> >
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 9595ced393dc..3d1e5f6d64fd 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -1098,6 +1098,7 @@  xfs_dir2_sf_to_block(
 	xfs_idata_realloc(dp, -ifp->if_bytes, XFS_DATA_FORK);
 	xfs_bmap_local_to_extents_empty(dp, XFS_DATA_FORK);
 	dp->i_d.di_size = 0;
+	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
 
 	/*
 	 * Add block 0 to the inode.