diff mbox series

[01/15] xfs: don't clear the "dinode core" in xfs_inode_alloc

Message ID 20200620071102.462554-2-hch@lst.de (mailing list archive)
State Superseded, archived
Headers show
Series [01/15] xfs: don't clear the "dinode core" in xfs_inode_alloc | expand

Commit Message

Christoph Hellwig June 20, 2020, 7:10 a.m. UTC
The xfs_icdinode structure just contains a random mix of inode field,
which are all read from the on-disk inode and mostly not looked at
before reading the inode or initializing a new inode cluster.  The
only exceptions are the forkoff and blocks field, which are used
in sanity checks for freshly allocated inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_icache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Chandan Babu R June 22, 2020, 7:11 a.m. UTC | #1
On Saturday 20 June 2020 12:40:48 PM IST Christoph Hellwig wrote:
> The xfs_icdinode structure just contains a random mix of inode field,
> which are all read from the on-disk inode and mostly not looked at
> before reading the inode or initializing a new inode cluster.  The
> only exceptions are the forkoff and blocks field, which are used
> in sanity checks for freshly allocated inodes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_icache.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 0a5ac6f9a58349..660e7abd4e8b76 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -66,7 +66,8 @@ xfs_inode_alloc(
>  	memset(&ip->i_df, 0, sizeof(ip->i_df));
>  	ip->i_flags = 0;
>  	ip->i_delayed_blks = 0;
> -	memset(&ip->i_d, 0, sizeof(ip->i_d));
> +	ip->i_d.di_nblocks = 0;
> +	ip->i_d.di_forkoff = 0;
>  	ip->i_sick = 0;
>  	ip->i_checked = 0;
>  	INIT_WORK(&ip->i_ioend_work, xfs_end_io);
> 

i_d.di_nblocks is accessed by xfs_iget_check_free_state() and
i_d.di_forkoff is being accessed by xfs_setup_inode(). Hence,

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Darrick J. Wong June 30, 2020, 5:58 p.m. UTC | #2
On Sat, Jun 20, 2020 at 09:10:48AM +0200, Christoph Hellwig wrote:
> The xfs_icdinode structure just contains a random mix of inode field,
> which are all read from the on-disk inode and mostly not looked at
> before reading the inode or initializing a new inode cluster.  The
> only exceptions are the forkoff and blocks field, which are used
> in sanity checks for freshly allocated inodes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

/me thinks this looks ok, though I'm leaning on Chandan probably having
done a more thorough examination than I did...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_icache.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 0a5ac6f9a58349..660e7abd4e8b76 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -66,7 +66,8 @@ xfs_inode_alloc(
>  	memset(&ip->i_df, 0, sizeof(ip->i_df));
>  	ip->i_flags = 0;
>  	ip->i_delayed_blks = 0;
> -	memset(&ip->i_d, 0, sizeof(ip->i_d));
> +	ip->i_d.di_nblocks = 0;
> +	ip->i_d.di_forkoff = 0;
>  	ip->i_sick = 0;
>  	ip->i_checked = 0;
>  	INIT_WORK(&ip->i_ioend_work, xfs_end_io);
> -- 
> 2.26.2
>
Darrick J. Wong July 2, 2020, 6:34 p.m. UTC | #3
On Tue, Jun 30, 2020 at 10:58:49AM -0700, Darrick J. Wong wrote:
> On Sat, Jun 20, 2020 at 09:10:48AM +0200, Christoph Hellwig wrote:
> > The xfs_icdinode structure just contains a random mix of inode field,
> > which are all read from the on-disk inode and mostly not looked at
> > before reading the inode or initializing a new inode cluster.  The
> > only exceptions are the forkoff and blocks field, which are used
> > in sanity checks for freshly allocated inodes.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> /me thinks this looks ok, though I'm leaning on Chandan probably having
> done a more thorough examination than I did...

...which I should not have done, because this is not correct.

Prior to this patch, xfs_inode_alloc would always zero ip->i_d before
loading the inode in from disk.  xfs_inode_from_disk, in turn, didn't
have to worry about zeroing fields if it was reading in a v2 inode.
Hence it does nothing about i_diflags2 or i_crtime.

Unfortunately, this opens a crash vector.  Let's say you mount a V5 fs,
create some reflinked files, and unmount the V5 fs.  Next, you mount a
V4 fs, and if the slab reuses the incore inodes, the next
xfs_inode_alloc won't clear the v3inode fields, and neither will
xfs_inode_from_disk.  The xfs_is_reflink_inode helper will see the
REFLINK flag set in i_diflags2 (remember, nobody cleared it) and try to
do CoW things, and kaboom.

We could (should?) probably fix the helper, but we definitely cannot be
leaving nuggets like this around in memory from the previous occupants.

I suspect the (v3inode && IGET_CREATE && !MOUNT_IKEEP) case could also
be a (theoretical) future landmine.

FWIW I found this by running fstests with 'MKFS_OPTIONS="-m crc=0"' and
then running xfs/096 and then generic/456 in that order, though I think
reproducibility here depends on the luck of the draw wrt slab caches.

> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Withdrawn, for now...

--D

> 
> --D
> 
> > ---
> >  fs/xfs/xfs_icache.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 0a5ac6f9a58349..660e7abd4e8b76 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -66,7 +66,8 @@ xfs_inode_alloc(
> >  	memset(&ip->i_df, 0, sizeof(ip->i_df));
> >  	ip->i_flags = 0;
> >  	ip->i_delayed_blks = 0;
> > -	memset(&ip->i_d, 0, sizeof(ip->i_d));
> > +	ip->i_d.di_nblocks = 0;
> > +	ip->i_d.di_forkoff = 0;
> >  	ip->i_sick = 0;
> >  	ip->i_checked = 0;
> >  	INIT_WORK(&ip->i_ioend_work, xfs_end_io);
> > -- 
> > 2.26.2
> >
diff mbox series

Patch

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0a5ac6f9a58349..660e7abd4e8b76 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -66,7 +66,8 @@  xfs_inode_alloc(
 	memset(&ip->i_df, 0, sizeof(ip->i_df));
 	ip->i_flags = 0;
 	ip->i_delayed_blks = 0;
-	memset(&ip->i_d, 0, sizeof(ip->i_d));
+	ip->i_d.di_nblocks = 0;
+	ip->i_d.di_forkoff = 0;
 	ip->i_sick = 0;
 	ip->i_checked = 0;
 	INIT_WORK(&ip->i_ioend_work, xfs_end_io);