
[0/3] xfs: Extend per-inode extent counters

Message ID: 20200831130010.454-1-chandanrlinux@gmail.com

Chandan Babu R Aug. 31, 2020, 1 p.m. UTC
The commit "xfs: fix inode fork extent count overflow"
(3f8a4f1d876d3e3e49e50b0396eaffcc4ba71b08) mentions that it should be
possible to create 10 billion data fork extents. However, the
corresponding on-disk field has a signed 32-bit type, which limits the
count to 2^31 - 1 (about 2.1 billion). Hence this patchset extends the
per-inode data extent counter to 47 bits. A width of 47 bits was chosen
because:
Maximum file size = 2^63 bytes.
Maximum extent count when using a 64k (2^16-byte) block size = 2^63 / 2^16 = 2^47.
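The arithmetic above can be checked with a few lines of C. This is only an illustrative sketch that restates the two lines above (a 2^63-byte maximum file size and 2^16-byte blocks, with every block assumed to be a separate extent); the function names are hypothetical and not part of the patchset.

```c
#include <stdint.h>

/* floor(log2(n)) for n >= 1: the index of the highest set bit. */
static unsigned int ilog2_u64(uint64_t n)
{
	unsigned int bits = 0;

	while (n > 1) {
		n >>= 1;
		bits++;
	}
	return bits;
}

/*
 * Hypothetical helper: assume every block of a maximum-size file is a
 * separate extent, so the extent count is max_file_size / block_size.
 * Returns log2 of that count.
 */
static unsigned int max_extent_count_log2(uint64_t max_file_size,
					  uint64_t block_size)
{
	return ilog2_u64(max_file_size / block_size);
}
```

For example, `max_extent_count_log2(UINT64_C(1) << 63, UINT64_C(1) << 16)` evaluates to 47.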

Also, XFS has a per-inode xattr extent counter which is only 16 bits
wide. The following workload causes this counter to overflow:
1. Create 1 million 255-byte xattrs.
2. Delete 50% of these xattrs in an alternating manner.
3. Try to insert 400,000 new 255-byte xattrs.

Dave tells me that there are instances where a single file has more
than 100 million hardlinks. With parent pointers stored in xattrs, the
signed 16-bit xattr extent counter will overflow when a large number of
hardlinks is created. Hence this patchset extends the on-disk field to
32 bits.
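The overflow that a wider counter avoids can be sketched as a simple limit check. This is a hypothetical illustration, not code from the patchset: `max_nextents()` and `nextents_would_overflow()` are made-up names, and the limits assume a signed counter of the given width, as with the 16-bit field described above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: largest value a signed counter of 'bits' width
 * can hold (16 -> 32767, 32 -> 2147483647). Valid for 1 <= bits <= 63.
 */
static int64_t max_nextents(unsigned int bits)
{
	return (INT64_C(1) << (bits - 1)) - 1;
}

/*
 * Hypothetical check: would adding 'nr_to_add' extents to a fork that
 * already has 'nextents' exceed what a 'bits'-wide counter can store?
 */
static bool nextents_would_overflow(int64_t nextents, int64_t nr_to_add,
				    unsigned int bits)
{
	return nextents + nr_to_add > max_nextents(bits);
}
```

With a 16-bit width the limit is 32767, so adding one extent to a fork that already has 32767 extents is rejected; with a 32-bit width the same addition fits easily.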

The following changes are made to accomplish this:
1. A new incompat superblock flag prevents older kernels from mounting
   the filesystem. This flag has to be set at mkfs time.
2. A new 32-bit field is carved out of xfs_dinode->di_pad2[]. This field
   holds the most significant 15 bits of the data extent counter.
3. A new 16-bit field is carved out of xfs_dinode->di_pad2[]. This field
   holds the most significant 16 bits of the attr extent counter.
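The split-field layout described in the list above can be illustrated as follows. This is a simplified sketch, not the patchset's code: the struct, field names and feature bit (`struct demo_dinode`, `nextents_lo`, `nextents_hi`, `DEMO_FEAT_INCOMPAT_WIDE_EXTCNT`) are hypothetical, big-endian on-disk conversion is omitted, and the high parts are only consulted when the incompat flag is set.

```c
#include <stdint.h>

#define DEMO_FEAT_INCOMPAT_WIDE_EXTCNT	(1U << 0)	/* hypothetical flag bit */

/* Hypothetical in-memory view of the relevant inode fields. */
struct demo_dinode {
	uint32_t nextents_lo;	/* existing 32-bit data extent counter */
	uint16_t anextents_lo;	/* existing 16-bit attr extent counter */
	uint32_t nextents_hi;	/* carved from padding: upper 15 bits */
	uint16_t anextents_hi;	/* carved from padding: upper 16 bits */
};

/* Full data extent count: up to 47 bits when the flag is set. */
static uint64_t demo_data_nextents(const struct demo_dinode *dip,
				   uint32_t incompat_features)
{
	uint64_t count = dip->nextents_lo;

	/* Without the flag these bytes are still padding; ignore them. */
	if (incompat_features & DEMO_FEAT_INCOMPAT_WIDE_EXTCNT)
		count |= (uint64_t)(dip->nextents_hi & 0x7fff) << 32;
	return count;
}

/* Full attr extent count: up to 32 bits when the flag is set. */
static uint32_t demo_attr_nextents(const struct demo_dinode *dip,
				   uint32_t incompat_features)
{
	uint32_t count = dip->anextents_lo;

	if (incompat_features & DEMO_FEAT_INCOMPAT_WIDE_EXTCNT)
		count |= (uint32_t)dip->anextents_hi << 16;
	return count;
}

/* Split a data extent count; bits above 2^47 - 1 are silently dropped. */
static void demo_set_data_nextents(struct demo_dinode *dip, uint64_t count)
{
	dip->nextents_lo = (uint32_t)count;
	dip->nextents_hi = (uint32_t)(count >> 32) & 0x7fff;
}
```

Because the flag is an incompat feature, a kernel that does not recognise it refuses to mount the filesystem, so it never misreads a count whose upper bits live in bytes it still treats as padding.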

I tried, without success, to do this seamlessly (i.e. by setting an
ro-compat flag just in time, when an extent counter is about to
overflow). The maximum height of the data and attr BMBT trees is a
function of the maximum number of per-inode data and attr extents
respectively. Because the maximum number of data/attr extents
increased, the maximum height of the data/attr BMBT trees increased,
causing the dependent log reservation values to increase as well.
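To see why a larger extent limit raises the BMBT height, here is a minimal sketch of a worst-case height computation: divide the record count by the minimum records per leaf, then repeatedly by the minimum pointers per interior node, until a single root block remains. It follows the general shape of the kernel's `xfs_btree_compute_maxlevels()`, but `demo_btree_maxlevels()` is a hypothetical name and the occupancy values in the usage note are made-up placeholders, not real XFS geometry.

```c
#include <stdint.h>

/*
 * Worst-case height of a B+tree holding 'nrecs' records when each leaf
 * holds at least 'leaf_minrecs' records and each interior node at least
 * 'node_minrecs' pointers. Requires leaf_minrecs >= 1 and
 * node_minrecs >= 2 so the loop terminates.
 */
static unsigned int demo_btree_maxlevels(uint64_t nrecs,
					 uint64_t leaf_minrecs,
					 uint64_t node_minrecs)
{
	unsigned int level;
	/* Number of leaf blocks in the worst case (ceiling division). */
	uint64_t blocks = (nrecs + leaf_minrecs - 1) / leaf_minrecs;

	/* Add interior levels until a single root block remains. */
	for (level = 1; blocks > 1; level++)
		blocks = (blocks + node_minrecs - 1) / node_minrecs;
	return level;
}
```

With the placeholder value of 125 for both minimums, the function returns 5 for 2^31 records and 7 for 2^47 records. Each extra level is another block that a full-height split may dirty, which is why the dependent log reservations grow.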

The increased log reservation values caused the "minimum log
reservation size" check to fail in some scenarios, and hence the mount
syscall would return an error. Reducing the log reservation values by
making the corresponding calculations more precise is not an option,
since these code changes, once percolated to mkfs.xfs, could create
filesystems that won't mount on older kernels (Please refer to the discussion at

The patchset has been tested by running xfstests with the following
mkfs.xfs options:
1. -m crc=0 -b size=1k
2. -m crc=0 -b size=4k
3. -m crc=0 -b size=512
4. -m rmapbt=1,reflink=1 -b size=1k
5. -m rmapbt=1,reflink=1 -b size=4k

Each of the above test scenarios was executed on the following
combinations. For the V4 filesystem test scenarios (those using
-m crc=0), the last combination, i.e. "Patched (enable widextcnt)",
was omitted.
| Xfsprogs                    | Kernel    |
|-----------------------------|-----------|
| Unpatched                   | Patched   |
| Patched (disable widextcnt) | Unpatched |
| Patched (disable widextcnt) | Patched   |
| Patched (enable widextcnt)  | Patched   |

This patchset is built on top of V3 of the "Bail out if transaction can
cause extent count to overflow" patchset. It can also be obtained
from https://github.com/chandanr/linux.git at branch

I will be posting the changes associated with xfsprogs separately.

Chandan Babu R (3):
  xfs: Introduce xfs_iext_max() helper
  xfs: Introduce xfs_dfork_nextents() helper
  xfs: Extend data/attr fork extent counter width

 fs/xfs/libxfs/xfs_bmap.c        | 17 ++++----
 fs/xfs/libxfs/xfs_format.h      | 24 +++++++-----
 fs/xfs/libxfs/xfs_inode_buf.c   | 69 ++++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_inode_buf.h   |  6 ++-
 fs/xfs/libxfs/xfs_inode_fork.c  | 10 +++--
 fs/xfs/libxfs/xfs_inode_fork.h  | 19 ++++++++-
 fs/xfs/libxfs/xfs_log_format.h  |  8 ++--
 fs/xfs/libxfs/xfs_types.h       | 10 +++--
 fs/xfs/scrub/inode.c            | 14 ++++---
 fs/xfs/xfs_inode.c              |  2 +-
 fs/xfs/xfs_inode_item.c         | 12 +++++-
 fs/xfs/xfs_inode_item_recover.c | 20 +++++++---
 12 files changed, 153 insertions(+), 58 deletions(-)