Message ID | 20190911012107.26553-1-david@fromorbit.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [RFC] xfs: fix inode fork extent count overflow | expand |
... and there went my hopes to eventually squeeze xfs_ifork into a single 64-byte cacheline. But the analys looks sensible.
On Wed, Sep 11, 2019 at 03:55:51AM -0700, Christoph Hellwig wrote: > ... and there went my hopes to eventually squeeze xfs_ifork into > a single 64-byte cacheline. But the analys looks sensible. Not sure what the issue is here: struct xfs_ifork { int64_t if_bytes; /* 0 8 */ struct xfs_btree_block * if_broot; /* 8 8 */ unsigned int if_seq; /* 16 4 */ int if_height; /* 20 4 */ union { void * if_root; /* 24 8 */ char * if_data; /* 24 8 */ } if_u1; /* 24 8 */ short int if_broot_bytes; /* 32 2 */ unsigned char if_flags; /* 34 1 */ /* size: 40, cachelines: 1, members: 7 */ /* padding: 5 */ /* last cacheline: 40 bytes */ }; it's already well inside a 64-byte single cacheline, even with a 64bit if_bytes. Yes, I've just pushed it from 32 to 40 bytes, but but if that is a problem we could pack some things more tightly... Cheers, Dave.
On Wed, Sep 11, 2019 at 11:21:07AM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > [commit message is verbose for discussion purposes - will trim it > down later. Some questions about implementation details at the end.] > > Zorro Lang recently ran a new test to stress single inode extent > counts now that they are no longer limited by memory allocation. > The test was simply: > > # xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file > # ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file > > This test uncovered a problem where the hole punching operation > appeared to finish with no error, but apparently only created 268M > extents instead of the 10 billion it was supposed to. > > Further, trying to punch out extents that should have been present > resulted in success, but no change in the extent count. It looked > like a silent failure. > > While running the test and observing the behaviour in real time, > I observed the extent coutn growing at ~2M extents/minute, and saw > this after about an hour: > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \ > > sleep 60 ; \ > > xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > fsxattr.nextents = 127657993 > fsxattr.nextents = 129683339 > # > > And a few minutes later this: > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > fsxattr.nextents = 4177861124 > # > > Ah, what? Where did that 4 billion extra extents suddenly come from? > > Stop the workload, unmount, mount: > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > fsxattr.nextents = 166044375 > # > > And it's back at the expected number. i.e. the extent count is > correct on disk, but it's screwed up in memory. I loaded up the > extent list, and immediately: > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > fsxattr.nextents = 4192576215 > # > > It's bad again. So, where does that number come from? > xfs_fill_fsxattr(): > > if (ip->i_df.if_flags & XFS_IFEXTENTS) > fa->fsx_nextents = xfs_iext_count(&ip->i_df); > else > fa->fsx_nextents = ip->i_d.di_nextents; > > And that's the behaviour I just saw in a nutshell. The on disk count > is correct, but once the tree is loaded into memory, it goes whacky. > Clearly there's something wrong with xfs_iext_count(): > > inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp) > { > return ifp->if_bytes / sizeof(struct xfs_iext_rec); > } > > Simple enough, but 134M extents is 2**27, and that's right about On the plus side, 2^27 is way better than the last time anyone tried to create an egregious number of extents. > where things went wrong. A struct xfs_iext_rec is 16 bytes in size, > which means 2**27 * 2**4 = 2**31 and we're right on target for an > integer overflow. And, sure enough: > > struct xfs_ifork { > int if_bytes; /* bytes in if_u1 */ > .... > > Once we get 2**27 extents in a file, we overflow if_bytes and the > in-core extent count goes wrong. And when we reach 2**28 extents, > if_bytes wraps back to zero and things really start to go wrong > there. This is where the silent failure comes from - only the first > 2**28 extents can be looked up directly due to the overflow, all the > extents above this index wrap back to somewhere in the first 2**28 > extents. Hence with a regular pattern, trying to punch a hole in the > range that didn't have holes mapped to a hole in the first 2**28 > extents and so "succeeded" without changing anything. Hence "silent > failure"... > > Fix this by converting if_bytes to a int64_t and converting all the > index variables and size calculations to use int64_t types to avoid > overflows in future. Signed integers are still used to enable easy > detection of extent count underflows. This enables scalability of > extent counts to the limits of the on-disk format - MAXEXTNUM > (2**31) extents. > > Current testing is at over 500M extents and still going: > > fsxattr.nextents = 517310478 > > Reported-by: Zorro Lang <zlang@redhat.com> > Signed-off-by: Dave Chinner <dchinner@redhat.com> Looks reasonable to me; did Zorro retest w/ this patch? If so, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > --- > fs/xfs/libxfs/xfs_attr_leaf.c | 18 ++++++++++-------- > fs/xfs/libxfs/xfs_dir2_sf.c | 2 +- > fs/xfs/libxfs/xfs_iext_tree.c | 2 +- > fs/xfs/libxfs/xfs_inode_fork.c | 8 ++++---- > fs/xfs/libxfs/xfs_inode_fork.h | 14 ++++++++------ > 5 files changed, 24 insertions(+), 20 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c > index b9f019603d0b..0bfb1ba919e2 100644 > --- a/fs/xfs/libxfs/xfs_attr_leaf.c > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c > @@ -453,13 +453,15 @@ xfs_attr_copy_value( > * special case for dev/uuid inodes, they have fixed size data forks. > */ > int > -xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes) > +xfs_attr_shortform_bytesfit( > + struct xfs_inode *dp, > + int bytes) > { > - int offset; > - int minforkoff; /* lower limit on valid forkoff locations */ > - int maxforkoff; /* upper limit on valid forkoff locations */ > - int dsize; > - xfs_mount_t *mp = dp->i_mount; > + struct xfs_mount *mp = dp->i_mount; > + int64_t dsize; > + int minforkoff; > + int maxforkoff; > + int offset; > > /* rounded down */ > offset = (XFS_LITINO(mp, dp->i_d.di_version) - bytes) >> 3; > @@ -525,7 +527,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes) > * A data fork btree root must have space for at least > * MINDBTPTRS key/ptr pairs if the data fork is small or empty. > */ > - minforkoff = max(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); > + minforkoff = max_t(int64_t, dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); > minforkoff = roundup(minforkoff, 8) >> 3; > > /* attr fork btree root can have at least this many key/ptr pairs */ > @@ -939,7 +941,7 @@ xfs_attr_shortform_verify( > char *endp; > struct xfs_ifork *ifp; > int i; > - int size; > + int64_t size; > > ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL); > ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK); > diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c > index 85f14fc2a8da..ae16ca7c422a 100644 > --- a/fs/xfs/libxfs/xfs_dir2_sf.c > +++ b/fs/xfs/libxfs/xfs_dir2_sf.c > @@ -628,7 +628,7 @@ xfs_dir2_sf_verify( > int i; > int i8count; > int offset; > - int size; > + int64_t size; > int error; > uint8_t filetype; > > diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c > index 7bc87408f1a0..52451809c478 100644 > --- a/fs/xfs/libxfs/xfs_iext_tree.c > +++ b/fs/xfs/libxfs/xfs_iext_tree.c > @@ -596,7 +596,7 @@ xfs_iext_realloc_root( > struct xfs_ifork *ifp, > struct xfs_iext_cursor *cur) > { > - size_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec); > + int64_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec); > void *new; > > /* account for the prev/next pointers */ > diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c > index c643beeb5a24..8fdd0424070e 100644 > --- a/fs/xfs/libxfs/xfs_inode_fork.c > +++ b/fs/xfs/libxfs/xfs_inode_fork.c > @@ -129,7 +129,7 @@ xfs_init_local_fork( > struct xfs_inode *ip, > int whichfork, > const void *data, > - int size) > + int64_t size) > { > struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); > int mem_size = size, real_size = 0; > @@ -467,11 +467,11 @@ xfs_iroot_realloc( > void > xfs_idata_realloc( > struct xfs_inode *ip, > - int byte_diff, > + int64_t byte_diff, > int whichfork) > { > struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); > - int new_size = (int)ifp->if_bytes + byte_diff; > + int64_t new_size = ifp->if_bytes + byte_diff; > > ASSERT(new_size >= 0); > ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork)); > @@ -552,7 +552,7 @@ xfs_iextents_copy( > struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); > struct xfs_iext_cursor icur; > struct xfs_bmbt_irec rec; > - int copied = 0; > + int64_t copied = 0; > > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)); > ASSERT(ifp->if_bytes > 0); > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h > index 00c62ce170d0..7b845c052fb4 100644 > --- a/fs/xfs/libxfs/xfs_inode_fork.h > +++ b/fs/xfs/libxfs/xfs_inode_fork.h > @@ -13,16 +13,16 @@ struct xfs_dinode; > * File incore extent information, present for each of data & attr forks. > */ > struct xfs_ifork { > - int if_bytes; /* bytes in if_u1 */ > - unsigned int if_seq; /* fork mod counter */ > + int64_t if_bytes; /* bytes in if_u1 */ > struct xfs_btree_block *if_broot; /* file's incore btree root */ > - short if_broot_bytes; /* bytes allocated for root */ > - unsigned char if_flags; /* per-fork flags */ > + unsigned int if_seq; /* fork mod counter */ > int if_height; /* height of the extent tree */ > union { > void *if_root; /* extent tree root */ > char *if_data; /* inline file data */ > } if_u1; > + short if_broot_bytes; /* bytes allocated for root */ > + unsigned char if_flags; /* per-fork flags */ > }; > > /* > @@ -93,12 +93,14 @@ int xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *); > void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *, > struct xfs_inode_log_item *, int); > void xfs_idestroy_fork(struct xfs_inode *, int); > -void xfs_idata_realloc(struct xfs_inode *, int, int); > +void xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff, > + int whichfork); > void xfs_iroot_realloc(struct xfs_inode *, int, int); > int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int); > int xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *, > int); > -void xfs_init_local_fork(struct xfs_inode *, int, const void *, int); > +void xfs_init_local_fork(struct xfs_inode *ip, int whichfork, > + const void *data, int64_t size); > > xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp); > void xfs_iext_insert(struct xfs_inode *, struct xfs_iext_cursor *cur, > -- > 2.23.0.rc1 >
On Mon, Sep 16, 2019 at 09:20:05AM -0700, Darrick J. Wong wrote: > On Wed, Sep 11, 2019 at 11:21:07AM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > [commit message is verbose for discussion purposes - will trim it > > down later. Some questions about implementation details at the end.] > > > > Zorro Lang recently ran a new test to stress single inode extent > > counts now that they are no longer limited by memory allocation. > > The test was simply: > > > > # xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file > > # ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file > > > > This test uncovered a problem where the hole punching operation > > appeared to finish with no error, but apparently only created 268M > > extents instead of the 10 billion it was supposed to. > > > > Further, trying to punch out extents that should have been present > > resulted in success, but no change in the extent count. It looked > > like a silent failure. > > > > While running the test and observing the behaviour in real time, > > I observed the extent coutn growing at ~2M extents/minute, and saw > > this after about an hour: > > > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \ > > > sleep 60 ; \ > > > xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > > fsxattr.nextents = 127657993 > > fsxattr.nextents = 129683339 > > # > > > > And a few minutes later this: > > > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > > fsxattr.nextents = 4177861124 > > # > > > > Ah, what? Where did that 4 billion extra extents suddenly come from? > > > > Stop the workload, unmount, mount: > > > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > > fsxattr.nextents = 166044375 > > # > > > > And it's back at the expected number. i.e. the extent count is > > correct on disk, but it's screwed up in memory. I loaded up the > > extent list, and immediately: > > > > # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next > > fsxattr.nextents = 4192576215 > > # > > > > It's bad again. So, where does that number come from? > > xfs_fill_fsxattr(): > > > > if (ip->i_df.if_flags & XFS_IFEXTENTS) > > fa->fsx_nextents = xfs_iext_count(&ip->i_df); > > else > > fa->fsx_nextents = ip->i_d.di_nextents; > > > > And that's the behaviour I just saw in a nutshell. The on disk count > > is correct, but once the tree is loaded into memory, it goes whacky. > > Clearly there's something wrong with xfs_iext_count(): > > > > inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp) > > { > > return ifp->if_bytes / sizeof(struct xfs_iext_rec); > > } > > > > Simple enough, but 134M extents is 2**27, and that's right about > > On the plus side, 2^27 is way better than the last time anyone tried to > create an egregious number of extents. > > > where things went wrong. A struct xfs_iext_rec is 16 bytes in size, > > which means 2**27 * 2**4 = 2**31 and we're right on target for an > > integer overflow. And, sure enough: > > > > struct xfs_ifork { > > int if_bytes; /* bytes in if_u1 */ > > .... > > > > Once we get 2**27 extents in a file, we overflow if_bytes and the > > in-core extent count goes wrong. And when we reach 2**28 extents, > > if_bytes wraps back to zero and things really start to go wrong > > there. This is where the silent failure comes from - only the first > > 2**28 extents can be looked up directly due to the overflow, all the > > extents above this index wrap back to somewhere in the first 2**28 > > extents. Hence with a regular pattern, trying to punch a hole in the > > range that didn't have holes mapped to a hole in the first 2**28 > > extents and so "succeeded" without changing anything. Hence "silent > > failure"... > > > > Fix this by converting if_bytes to a int64_t and converting all the > > index variables and size calculations to use int64_t types to avoid > > overflows in future. Signed integers are still used to enable easy > > detection of extent count underflows. This enables scalability of > > extent counts to the limits of the on-disk format - MAXEXTNUM > > (2**31) extents. > > > > Current testing is at over 500M extents and still going: > > > > fsxattr.nextents = 517310478 > > > > Reported-by: Zorro Lang <zlang@redhat.com> > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > Looks reasonable to me; did Zorro retest w/ this patch? After merging this patch, currently I've gotten: # xfs_io -c stat /mnt/500Test/2nd-40t-file fd.path = "/mnt/500Test/2nd-40t-file" fd.flags = non-sync,non-direct,read-write stat.ino = 132 stat.type = regular file stat.size = 43980465111040 stat.blocks = 79880689352 fsxattr.xflags = 0x80000002 [-p--------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.cowextsize = 0 fsxattr.nextents = 755358063 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 And the *nextents* is still increasing. So I think this patch works as it expects. But I haven't gotten enough time to hit the MAX extents limitation. I'll let the test keep running, to see what'll happen. I'll do more test after you merge this patch into xfs-linux. Thanks, Zorro > > If so, > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > --D > > > --- > > fs/xfs/libxfs/xfs_attr_leaf.c | 18 ++++++++++-------- > > fs/xfs/libxfs/xfs_dir2_sf.c | 2 +- > > fs/xfs/libxfs/xfs_iext_tree.c | 2 +- > > fs/xfs/libxfs/xfs_inode_fork.c | 8 ++++---- > > fs/xfs/libxfs/xfs_inode_fork.h | 14 ++++++++------ > > 5 files changed, 24 insertions(+), 20 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c > > index b9f019603d0b..0bfb1ba919e2 100644 > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c > > @@ -453,13 +453,15 @@ xfs_attr_copy_value( > > * special case for dev/uuid inodes, they have fixed size data forks. > > */ > > int > > -xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes) > > +xfs_attr_shortform_bytesfit( > > + struct xfs_inode *dp, > > + int bytes) > > { > > - int offset; > > - int minforkoff; /* lower limit on valid forkoff locations */ > > - int maxforkoff; /* upper limit on valid forkoff locations */ > > - int dsize; > > - xfs_mount_t *mp = dp->i_mount; > > + struct xfs_mount *mp = dp->i_mount; > > + int64_t dsize; > > + int minforkoff; > > + int maxforkoff; > > + int offset; > > > > /* rounded down */ > > offset = (XFS_LITINO(mp, dp->i_d.di_version) - bytes) >> 3; > > @@ -525,7 +527,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes) > > * A data fork btree root must have space for at least > > * MINDBTPTRS key/ptr pairs if the data fork is small or empty. > > */ > > - minforkoff = max(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); > > + minforkoff = max_t(int64_t, dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); > > minforkoff = roundup(minforkoff, 8) >> 3; > > > > /* attr fork btree root can have at least this many key/ptr pairs */ > > @@ -939,7 +941,7 @@ xfs_attr_shortform_verify( > > char *endp; > > struct xfs_ifork *ifp; > > int i; > > - int size; > > + int64_t size; > > > > ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL); > > ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK); > > diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c > > index 85f14fc2a8da..ae16ca7c422a 100644 > > --- a/fs/xfs/libxfs/xfs_dir2_sf.c > > +++ b/fs/xfs/libxfs/xfs_dir2_sf.c > > @@ -628,7 +628,7 @@ xfs_dir2_sf_verify( > > int i; > > int i8count; > > int offset; > > - int size; > > + int64_t size; > > int error; > > uint8_t filetype; > > > > diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c > > index 7bc87408f1a0..52451809c478 100644 > > --- a/fs/xfs/libxfs/xfs_iext_tree.c > > +++ b/fs/xfs/libxfs/xfs_iext_tree.c > > @@ -596,7 +596,7 @@ xfs_iext_realloc_root( > > struct xfs_ifork *ifp, > > struct xfs_iext_cursor *cur) > > { > > - size_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec); > > + int64_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec); > > void *new; > > > > /* account for the prev/next pointers */ > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c > > index c643beeb5a24..8fdd0424070e 100644 > > --- a/fs/xfs/libxfs/xfs_inode_fork.c > > +++ b/fs/xfs/libxfs/xfs_inode_fork.c > > @@ -129,7 +129,7 @@ xfs_init_local_fork( > > struct xfs_inode *ip, > > int whichfork, > > const void *data, > > - int size) > > + int64_t size) > > { > > struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); > > int mem_size = size, real_size = 0; > > @@ -467,11 +467,11 @@ xfs_iroot_realloc( > > void > > xfs_idata_realloc( > > struct xfs_inode *ip, > > - int byte_diff, > > + int64_t byte_diff, > > int whichfork) > > { > > struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); > > - int new_size = (int)ifp->if_bytes + byte_diff; > > + int64_t new_size = ifp->if_bytes + byte_diff; > > > > ASSERT(new_size >= 0); > > ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork)); > > @@ -552,7 +552,7 @@ xfs_iextents_copy( > > struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); > > struct xfs_iext_cursor icur; > > struct xfs_bmbt_irec rec; > > - int copied = 0; > > + int64_t copied = 0; > > > > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)); > > ASSERT(ifp->if_bytes > 0); > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h > > index 00c62ce170d0..7b845c052fb4 100644 > > --- a/fs/xfs/libxfs/xfs_inode_fork.h > > +++ b/fs/xfs/libxfs/xfs_inode_fork.h > > @@ -13,16 +13,16 @@ struct xfs_dinode; > > * File incore extent information, present for each of data & attr forks. > > */ > > struct xfs_ifork { > > - int if_bytes; /* bytes in if_u1 */ > > - unsigned int if_seq; /* fork mod counter */ > > + int64_t if_bytes; /* bytes in if_u1 */ > > struct xfs_btree_block *if_broot; /* file's incore btree root */ > > - short if_broot_bytes; /* bytes allocated for root */ > > - unsigned char if_flags; /* per-fork flags */ > > + unsigned int if_seq; /* fork mod counter */ > > int if_height; /* height of the extent tree */ > > union { > > void *if_root; /* extent tree root */ > > char *if_data; /* inline file data */ > > } if_u1; > > + short if_broot_bytes; /* bytes allocated for root */ > > + unsigned char if_flags; /* per-fork flags */ > > }; > > > > /* > > @@ -93,12 +93,14 @@ int xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *); > > void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *, > > struct xfs_inode_log_item *, int); > > void xfs_idestroy_fork(struct xfs_inode *, int); > > -void xfs_idata_realloc(struct xfs_inode *, int, int); > > +void xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff, > > + int whichfork); > > void xfs_iroot_realloc(struct xfs_inode *, int, int); > > int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int); > > int xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *, > > int); > > -void xfs_init_local_fork(struct xfs_inode *, int, const void *, int); > > +void xfs_init_local_fork(struct xfs_inode *ip, int whichfork, > > + const void *data, int64_t size); > > > > xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp); > > void xfs_iext_insert(struct xfs_inode *, struct xfs_iext_cursor *cur, > > -- > > 2.23.0.rc1 > >
On Mon, Sep 16, 2019 at 09:20:05AM -0700, Darrick J. Wong wrote: > On Wed, Sep 11, 2019 at 11:21:07AM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > And that's the behaviour I just saw in a nutshell. The on disk count > > is correct, but once the tree is loaded into memory, it goes whacky. > > Clearly there's something wrong with xfs_iext_count(): > > > > inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp) > > { > > return ifp->if_bytes / sizeof(struct xfs_iext_rec); > > } > > > > Simple enough, but 134M extents is 2**27, and that's right about > > On the plus side, 2^27 is way better than the last time anyone tried to > create an egregious number of extents. Well, we'd get to 2^26 (~65M extents) before memory allocation stopped progress... > > Current testing is at over 500M extents and still going: > > > > fsxattr.nextents = 517310478 > > > > Reported-by: Zorro Lang <zlang@redhat.com> > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > Looks reasonable to me; did Zorro retest w/ this patch? No idea, but I got to 1.3B extents before the VM ran out of RAM and oom-killed itself to death - the extent list took up >47GB of the 48GB of RAM I gave the VM. At some point we are going to have to think about demand paging extent lists.... > If so, > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Thanks! -Dave.
On Thu, Sep 12, 2019 at 11:08:38AM +1000, Dave Chinner wrote: > On Wed, Sep 11, 2019 at 03:55:51AM -0700, Christoph Hellwig wrote: > > ... and there went my hopes to eventually squeeze xfs_ifork into > > a single 64-byte cacheline. But the analys looks sensible. > > Not sure what the issue is here: > > struct xfs_ifork { > int64_t if_bytes; /* 0 8 */ > struct xfs_btree_block * if_broot; /* 8 8 */ > unsigned int if_seq; /* 16 4 */ > int if_height; /* 20 4 */ > union { > void * if_root; /* 24 8 */ > char * if_data; /* 24 8 */ > } if_u1; /* 24 8 */ > short int if_broot_bytes; /* 32 2 */ > unsigned char if_flags; /* 34 1 */ > > /* size: 40, cachelines: 1, members: 7 */ > /* padding: 5 */ > /* last cacheline: 40 bytes */ > }; > > it's already well inside a 64-byte single cacheline, even with a > 64bit if_bytes. Yes, I've just pushed it from 32 to 40 bytes, but > but if that is a problem we could pack some things more tightly... Ok, I misremembered. But before it fit into half a cacheline and nicely aligned slab, and now it doesn't. Not really as an argument against the patch because it is clearly needed..
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index b9f019603d0b..0bfb1ba919e2 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -453,13 +453,15 @@ xfs_attr_copy_value( * special case for dev/uuid inodes, they have fixed size data forks. */ int -xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes) +xfs_attr_shortform_bytesfit( + struct xfs_inode *dp, + int bytes) { - int offset; - int minforkoff; /* lower limit on valid forkoff locations */ - int maxforkoff; /* upper limit on valid forkoff locations */ - int dsize; - xfs_mount_t *mp = dp->i_mount; + struct xfs_mount *mp = dp->i_mount; + int64_t dsize; + int minforkoff; + int maxforkoff; + int offset; /* rounded down */ offset = (XFS_LITINO(mp, dp->i_d.di_version) - bytes) >> 3; @@ -525,7 +527,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes) * A data fork btree root must have space for at least * MINDBTPTRS key/ptr pairs if the data fork is small or empty. */ - minforkoff = max(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); + minforkoff = max_t(int64_t, dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); minforkoff = roundup(minforkoff, 8) >> 3; /* attr fork btree root can have at least this many key/ptr pairs */ @@ -939,7 +941,7 @@ xfs_attr_shortform_verify( char *endp; struct xfs_ifork *ifp; int i; - int size; + int64_t size; ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL); ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK); diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c index 85f14fc2a8da..ae16ca7c422a 100644 --- a/fs/xfs/libxfs/xfs_dir2_sf.c +++ b/fs/xfs/libxfs/xfs_dir2_sf.c @@ -628,7 +628,7 @@ xfs_dir2_sf_verify( int i; int i8count; int offset; - int size; + int64_t size; int error; uint8_t filetype; diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c index 7bc87408f1a0..52451809c478 100644 --- a/fs/xfs/libxfs/xfs_iext_tree.c +++ b/fs/xfs/libxfs/xfs_iext_tree.c @@ -596,7 +596,7 @@ xfs_iext_realloc_root( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { - size_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec); + int64_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec); void *new; /* account for the prev/next pointers */ diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index c643beeb5a24..8fdd0424070e 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -129,7 +129,7 @@ xfs_init_local_fork( struct xfs_inode *ip, int whichfork, const void *data, - int size) + int64_t size) { struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); int mem_size = size, real_size = 0; @@ -467,11 +467,11 @@ xfs_iroot_realloc( void xfs_idata_realloc( struct xfs_inode *ip, - int byte_diff, + int64_t byte_diff, int whichfork) { struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); - int new_size = (int)ifp->if_bytes + byte_diff; + int64_t new_size = ifp->if_bytes + byte_diff; ASSERT(new_size >= 0); ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork)); @@ -552,7 +552,7 @@ xfs_iextents_copy( struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); struct xfs_iext_cursor icur; struct xfs_bmbt_irec rec; - int copied = 0; + int64_t copied = 0; ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)); ASSERT(ifp->if_bytes > 0); diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 00c62ce170d0..7b845c052fb4 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -13,16 +13,16 @@ struct xfs_dinode; * File incore extent information, present for each of data & attr forks. */ struct xfs_ifork { - int if_bytes; /* bytes in if_u1 */ - unsigned int if_seq; /* fork mod counter */ + int64_t if_bytes; /* bytes in if_u1 */ struct xfs_btree_block *if_broot; /* file's incore btree root */ - short if_broot_bytes; /* bytes allocated for root */ - unsigned char if_flags; /* per-fork flags */ + unsigned int if_seq; /* fork mod counter */ int if_height; /* height of the extent tree */ union { void *if_root; /* extent tree root */ char *if_data; /* inline file data */ } if_u1; + short if_broot_bytes; /* bytes allocated for root */ + unsigned char if_flags; /* per-fork flags */ }; /* @@ -93,12 +93,14 @@ int xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *); void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *, struct xfs_inode_log_item *, int); void xfs_idestroy_fork(struct xfs_inode *, int); -void xfs_idata_realloc(struct xfs_inode *, int, int); +void xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff, + int whichfork); void xfs_iroot_realloc(struct xfs_inode *, int, int); int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int); int xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *, int); -void xfs_init_local_fork(struct xfs_inode *, int, const void *, int); +void xfs_init_local_fork(struct xfs_inode *ip, int whichfork, + const void *data, int64_t size); xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp); void xfs_iext_insert(struct xfs_inode *, struct xfs_iext_cursor *cur,