Message ID | 20250118-vfs-dio-3ca805947186@brauner (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] vfs dio | expand |
On Sat, 18 Jan 2025 at 05:09, Christian Brauner <brauner@kernel.org> wrote: > > Add a separate dio read align field to statx, as many out of place write > file systems can easily do reads aligned to the device sector size, but > require bigger alignment for writes. I've pulled this, but it needs some fixing. You added the 'dio_read_offset_align' field to 'struct kstat', and that structure is *critical*, because it's used even for the real 'stat()' calls that people actually use (as opposed to the statx side that is seldom a real issue). And that field was added in a way that causes the struct to grow due to alignment issues. For no good reason, because there were existing holes in there. So please just fix it. I despise the whole statx thing exactly because it has (approximately) five specialized users, while slowing down regular stat/fstat that is used widely absolutely *evertwhere*. Of course, judging by past performance, I wouldn't be surprised if glibc has screwed the pooch, and decided to use 'statx()' to implement stat, together with extra pointless user space overhead to convert one into the other. Because that's the glibc way (ie the whole "turn fstat() into the much slower fstatat() call, just because"). So here's the deal: 'statx()' is *not* an "improved stat". It's an actively worse stat() for people who need very unusual and specialized information, and it makes everything else worse. Linus
On Mon, Jan 20, 2025 at 11:24:56AM -0800, Linus Torvalds wrote: > On Sat, 18 Jan 2025 at 05:09, Christian Brauner <brauner@kernel.org> wrote: > > > > Add a separate dio read align field to statx, as many out of place write > > file systems can easily do reads aligned to the device sector size, but > > require bigger alignment for writes. > > I've pulled this, but it needs some fixing. > > You added the 'dio_read_offset_align' field to 'struct kstat', and > that structure is *critical*, because it's used even for the real > 'stat()' calls that people actually use (as opposed to the statx side > that is seldom a real issue). > > And that field was added in a way that causes the struct to grow due > to alignment issues. For no good reason, because there were existing > holes in there. > > So please just fix it. Right, sorry I should've noticed that during review. > I despise the whole statx thing exactly because it has (approximately) > five specialized users, while slowing down regular stat/fstat that is > used widely absolutely *evertwhere*. > > Of course, judging by past performance, I wouldn't be surprised if > glibc has screwed the pooch, and decided to use 'statx()' to implement > stat, together with extra pointless user space overhead to convert one > into the other. Because that's the glibc way (ie the whole "turn > fstat() into the much slower fstatat() call, just because"). > > So here's the deal: 'statx()' is *not* an "improved stat". It's an > actively worse stat() for people who need very unusual and specialized > information, and it makes everything else worse. Yes, I'm well aware that you dislike statx(). :) It is heavily used nowadays though because there's a few additional bits in there that don't require calling into filesystems but are heavily used. I want to reiterate that if I'd been involved in statx() I would've done it as an extensible struct. So v1 of the struct would just be exactly what stat() was. This way neither copy-in nor copy-out would have done any unnecessary work and perf would've been the same. Only when userspace actually needed additional information it would have to pass in a larger struct and pay the price for copy-in and copy-out. I'll get you a new PR soon!
On Mon, 20 Jan 2025 at 11:24, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > And that field was added in a way that causes the struct to grow due > to alignment issues. For no good reason, because there were existing > holes in there. > > So please just fix it. Side note: for extra bonus points, just make those fields be 'u8' or even smaller, knowing that the values (a) have to be powers of two anyway because nobody can deal with anything else (b) the powers are already limited to 31 on the top end (by the existing use of 'u32') and 9 on the low end (by SECTOR_SIZE) And yeah, maybe somebody wants to have a "no alignment possible" value, so that would make it a total of 24 different values. You really don't need 32 bits to encode 24 values. So wasting three 32-bit words for this all - and an extra one for the bad alignment - is just a crime, and only makes kstat bigger on stack. It also just adds extra code, since most *sources* have the size in bits anyway: it typically comes from things i_blocksize(), which just does return (1 << node->i_blkbits); and it would be better to just give the bit number. (XFS uses a special XFS_FSB_TO_B() thing to turn its bit-numbers to byte sizes - I'm not claiming that everybody uses i_blkbits, I'm just claiming that all sane users already are in terms of bits) Linus
On Mon, 20 Jan 2025 at 11:38, Christian Brauner <brauner@kernel.org> wrote: > > It is heavily used nowadays though because there's a few > additional bits in there that don't require calling into filesystems > but are heavily used. By "heavily used" you mean "there is probably a 1:1000 ratio between statx:stat in reality". Those "few additional bits" are all very specialized. No normal application cares about things like "mount ID" or subvolume data. I can't imagine what other fields you think are so important. Linus
The pull request you sent on Sat, 18 Jan 2025 14:09:26 +0100:
> git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.14-rc1.statx.dio
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/47c9f2b3c838a33552dbd41db6c5d93377842fcd
Thank you!
On Mon, Jan 20, 2025 at 11:50:09AM -0800, Linus Torvalds wrote: > On Mon, 20 Jan 2025 at 11:38, Christian Brauner <brauner@kernel.org> wrote: > > > > It is heavily used nowadays though because there's a few > > additional bits in there that don't require calling into filesystems > > but are heavily used. > > By "heavily used" you mean "there is probably a 1:1000 ratio between > statx:stat in reality". > > Those "few additional bits" are all very specialized. No normal > application cares about things like "mount ID" or subvolume data. I > can't imagine what other fields you think are so important. Sorry, I wasn't talking about new fields I was talking about actual bits that statx has. #define STATX_ATTR_COMPRESSED 0x00000004 /* [I] File is compressed by the fs */ #define STATX_ATTR_IMMUTABLE 0x00000010 /* [I] File is marked immutable */ #define STATX_ATTR_APPEND 0x00000020 /* [I] File is append-only */ #define STATX_ATTR_NODUMP 0x00000040 /* [I] File is not to be dumped */ #define STATX_ATTR_ENCRYPTED 0x00000800 /* [I] File requires key to decrypt in fs */ #define STATX_ATTR_AUTOMOUNT 0x00001000 /* Dir: Automount trigger */ #define STATX_ATTR_MOUNT_ROOT 0x00002000 /* Root of a mount */ #define STATX_ATTR_VERITY 0x00100000 /* [I] Verity protected file */ #define STATX_ATTR_DAX 0x00200000 /* File is currently in DAX state */ #define STATX_ATTR_WRITE_ATOMIC 0x00400000 /* File supports atomic write operations */ I think that such an attribute field could've been added to struct stat itself because afaict there are two unused fields in there currently: struct stat { unsigned long st_dev; /* Device. */ unsigned long st_ino; /* File serial number. */ unsigned int st_mode; /* File mode. */ unsigned int st_nlink; /* Link count. */ unsigned int st_uid; /* User ID of the file's owner. */ unsigned int st_gid; /* Group ID of the file's group. */ [...] unsigned int __unused4; unsigned int __unused5; }; I happily concede that stat() will exceed the usage of statx() for most application that e.g., just want to do the basic mode, uid, gid thing. So compared to that statx() usage is probably not that important. And nothing would stop us from adding a statx2()... int statx(int dfd, const char *path, unsigned int flags, unsigned int mask struct statx *st, size_t st_size) #define STATX_SIZE_VER0 // compatible with struct stat and then only the stat portion is filled in. This could allow glibc to use statx() for everything without perf impact especially when paired with allowing NULL with AT_EMPTY_PATH.
[Sorry for the delay, I've been distrracted by a fun mix of personal business and urgend customer work] On Mon, Jan 20, 2025 at 11:24:56AM -0800, Linus Torvalds wrote: > On Sat, 18 Jan 2025 at 05:09, Christian Brauner <brauner@kernel.org> wrote: > > > > Add a separate dio read align field to statx, as many out of place write > > file systems can easily do reads aligned to the device sector size, but > > require bigger alignment for writes. > > I've pulled this, but it needs some fixing. > > You added the 'dio_read_offset_align' field to 'struct kstat', and > that structure is *critical*, because it's used even for the real > 'stat()' calls that people actually use (as opposed to the statx side > that is seldom a real issue). > > And that field was added in a way that causes the struct to grow due > to alignment issues. For no good reason, because there were existing > holes in there. Indeed, sorry. I put all attention on the user visible struct statx and missed optimizing the in-kernel kstat. > I despise the whole statx thing exactly because it has (approximately) > five specialized users, while slowing down regular stat/fstat that is > used widely absolutely *evertwhere*. Yeah. I still hate the statx design as it overload the critical stat information with all the misc based path information that is useful in some cases but not actually needed for most. Not much we can do about that now, though.