Message ID | 20250206-min-io-default-blocksize-v1-1-2312e0bb8809@samsung.com (mailing list archive)
---|---
State | Not Applicable, archived
Series | mkfs: use stx_blksize for dev block size by default
On Thu, Feb 06, 2025 at 07:00:55PM +0000, da.gomez@kernel.org wrote:
> From: Daniel Gomez <da.gomez@samsung.com>
>
> In patch [1] ("bdev: use bdev_io_min() for statx block size"), block
> devices will now report their preferred minimum I/O size for optimal
> performance in the stx_blksize field of the statx data structure. This
> change updates the current default 4 KiB block size for all devices
> reporting a minimum I/O larger than 4 KiB, opting instead to query for
> its advertised minimum I/O value in the statx data struct.
>
> [1]:
> https://lore.kernel.org/all/20250204231209.429356-9-mcgrof@kernel.org/

This isn't even upstream yet...

> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> ---
> Set MIN-IO from statx as the default filesystem fundamental block size.
> This ensures that, for devices reporting values within the supported
> XFS block size range, we do not incur in RMW. If the MIN-IO reported
> value is lower than the current default of 4 KiB, then 4 KiB will be
> used instead.

I don't think this is a good idea -- assuming you mean the same MIN-IO
as what lsblk puts out:

NAME                     MIN-IO
sda                         512
├─sda1                      512
├─sda2                      512
│ └─node0.boot              512
├─sda3                      512
│ └─node0.swap              512
└─sda4                      512
  └─node0.lvm               512
    └─node0-root            512
sdb                        4096
└─sdb1                     4096
nvme1n1                     512
└─md0                    524288
  └─node0.raid           524288
    └─node0_raid-storage 524288
nvme0n1                     512
└─md0                    524288
  └─node0.raid           524288
    └─node0_raid-storage 524288
nvme2n1                     512
└─md0                    524288
  └─node0.raid           524288
    └─node0_raid-storage 524288
nvme3n1                     512
└─md0                    524288
  └─node0.raid           524288
    └─node0_raid-storage 524288

Now you've decreased the default blocksize to 512 on sda, and md0 gets
an impossible 512k blocksize. Also, disrupting the default 4k blocksize
will introduce portability problems with distros that aren't yet
shipping 6.12.

--D

> ---
> mkfs/xfs_mkfs.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index bbd0dbb6c80ab63ebf9edbe0a9a304149770f89d..e17389622682bc23f9b20c207db7e22181e2fe6f 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2178,6 +2178,26 @@ _("block size %d cannot be smaller than sector size %d\n"),
>  	}
>  }
>
> +void
> +get_dev_blocksize(
> +	struct cli_params		*cli,
> +	struct mkfs_default_params	*dft)
> +{
> +	struct statx			stx;
> +	int				ret;
> +
> +	if (!cli->xi->data.name)
> +		return;
> +
> +	ret = statx(AT_FDCWD, cli->xi->data.name,
> +			AT_STATX_DONT_SYNC | AT_NO_AUTOMOUNT,
> +			STATX_DIOALIGN | STATX_TYPE | STATX_MODE | STATX_INO, &stx);
> +	if (!ret)
> +		dft->blocksize =
> +			min(max(1 << XFS_DFL_BLOCKSIZE_LOG, stx.stx_blksize),
> +				XFS_MAX_BLOCKSIZE);
> +}
> +
>  static void
>  validate_blocksize(
>  	struct mkfs_params	*cfg,
> @@ -2189,6 +2209,7 @@ validate_blocksize(
>  	 * For RAID4/5/6 we want to align sector size and block size,
>  	 * so we need to start with the device geometry extraction too.
>  	 */
> +	get_dev_blocksize(cli, dft);
>  	if (!cli->blocksize)
>  		cfg->blocksize = dft->blocksize;
>  	else
>
> ---
> base-commit: 90d6da68ee54e6d4ef99eca4a82cac6036a34b00
> change-id: 20250206-min-io-default-blocksize-13334f54899e
>
> Best regards,
> --
> Daniel Gomez <da.gomez@samsung.com>
On Thu, Feb 06, 2025 at 02:27:16PM -0800, Darrick J. Wong wrote:
> Now you've decreased the default blocksize to 512 on sda, and md0 gets
> an impossible 512k blocksize. Also, disrupting the default 4k blocksize
> will introduce portability problems with distros that aren't yet
> shipping 6.12.

Our default should be 4k, and to address the latter we should sanity check and use an upper limit of what XFS supports, 64k.

Thoughts?

  Luis
On Thu, Feb 06, 2025 at 02:50:28PM -0800, Luis Chamberlain wrote:
> On Thu, Feb 06, 2025 at 02:27:16PM -0800, Darrick J. Wong wrote:
> > Now you've decreased the default blocksize to 512 on sda, and md0 gets
> > an impossible 512k blocksize. Also, disrupting the default 4k blocksize
> > will introduce portability problems with distros that aren't yet
> > shipping 6.12.
>
> Our default should be 4k, and to address the latter we should sanity
> check and use an upper limit of what XFS supports, 64k.

I don't think it's a good idea to boost the default fsblock size beyond 4k until we get further into the era where the major distros are shipping 6.12 kernels. I wouldn't want to deal with people accidentally ending up with an 8k fsblock filesystem that they can't mount on fairly new things like RHEL9/Debian12/etc.

--D

> Thoughts?
>
> Luis
On Thu, Feb 06, 2025 at 07:00:55PM +0000, da.gomez@kernel.org wrote:
> From: Daniel Gomez <da.gomez@samsung.com>
>
> In patch [1] ("bdev: use bdev_io_min() for statx block size"), block
> devices will now report their preferred minimum I/O size for optimal
> performance in the stx_blksize field of the statx data structure. This
> change updates the current default 4 KiB block size for all devices
> reporting a minimum I/O larger than 4 KiB, opting instead to query for
> its advertised minimum I/O value in the statx data struct.

UUuh, no. Larger block sizes have their use cases, but this will regress performance for a lot of (most?) common setups. A lot of devices report fairly high values there, but, say, increasing the directory and bmap btree block size unconditionally using the current algorithms will dramatically increase write amplification. Similarly for small buffered writes.
On Thu, Feb 06, 2025 at 02:50:28PM +0100, Luis Chamberlain wrote:
> On Thu, Feb 06, 2025 at 02:27:16PM -0800, Darrick J. Wong wrote:
> > Now you've decreased the default blocksize to 512 on sda, and md0 gets
> > an impossible 512k blocksize. Also, disrupting the default 4k blocksize
> > will introduce portability problems with distros that aren't yet
> > shipping 6.12.
>
> Our default should be 4k, and to address the latter we should sanity
> check and use an upper limit of what XFS supports, 64k.

To clarify, the patch addresses the cases described above. It sets mkfs.xfs's default block size to the device’s reported minimum I/O size (via stx_blksize), clamping the value between 4k and 64k: if the reported size is less than 4k, 4k is used, and if it exceeds 64k (XFS_MAX_BLOCKSIZE), 64k is applied.

>
> Thoughts?
>
> Luis
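A minimal sketch of the clamping described above, assuming the 4 KiB default (XFS_DFL_BLOCKSIZE_LOG = 12) and the 64 KiB XFS_MAX_BLOCKSIZE limit as found in current xfsprogs/XFS headers; the helper name is only illustrative, not the code in the posted patch:

#include <stdint.h>

#define XFS_DFL_BLOCKSIZE_LOG	12		/* assumed: 4 KiB default */
#define XFS_MAX_BLOCKSIZE	(1 << 16)	/* assumed: 64 KiB XFS maximum */

/* Clamp the statx-reported preferred I/O size into the 4k..64k range. */
static uint32_t default_blocksize_from_statx(uint32_t stx_blksize)
{
	uint32_t bs = stx_blksize;

	if (bs < (1U << XFS_DFL_BLOCKSIZE_LOG))
		bs = 1U << XFS_DFL_BLOCKSIZE_LOG;	/* never below 4 KiB */
	if (bs > XFS_MAX_BLOCKSIZE)
		bs = XFS_MAX_BLOCKSIZE;			/* never above 64 KiB */
	return bs;
}

With the lsblk values quoted earlier in the thread, sda's 512 would come out as 4096 and md0's 524288 as 65536.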
On Thu, Feb 06, 2025 at 02:27:16PM +0100, Darrick J. Wong wrote:
> On Thu, Feb 06, 2025 at 07:00:55PM +0000, da.gomez@kernel.org wrote:
> > From: Daniel Gomez <da.gomez@samsung.com>
> >
> > In patch [1] ("bdev: use bdev_io_min() for statx block size"), block
> > devices will now report their preferred minimum I/O size for optimal
> > performance in the stx_blksize field of the statx data structure. This
> > change updates the current default 4 KiB block size for all devices
> > reporting a minimum I/O larger than 4 KiB, opting instead to query for
> > its advertised minimum I/O value in the statx data struct.
> >
> > [1]:
> > https://lore.kernel.org/all/20250204231209.429356-9-mcgrof@kernel.org/
>
> This isn't even upstream yet...
>
> > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > ---
> > Set MIN-IO from statx as the default filesystem fundamental block size.
> > This ensures that, for devices reporting values within the supported
> > XFS block size range, we do not incur in RMW. If the MIN-IO reported
> > value is lower than the current default of 4 KiB, then 4 KiB will be
> > used instead.
>
> I don't think this is a good idea -- assuming you mean the same MIN-IO
> as what lsblk puts out:

This is just about matching the values in code and documentation across all layers (to guarantee that writes do not incur RMW when possible and supported by the fs):

minimum_io_size (block layer) -> stx_blksize (statx) -> lsblk MIN-IO (minimum I/O size) -> filesystem fundamental block size (mkfs.xfs -b size).

* MIN-IO is the minimum I/O size in lsblk [1], which should be the queue-sysfs minimum_io_size [2] [3] ("This is the smallest preferred IO size reported by the device").

* From the statx [4] manual (and the kernel statx data struct description), stx_blksize is 'The "preferred" block size for efficient filesystem I/O (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)'

[1] https://github.com/util-linux/util-linux/blob/master/misc-utils/lsblk.c#L199
[2] minimum_io_size: https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt
[3] https://www.kernel.org/doc/Documentation/ABI/stable/sysfs-block

    What:		/sys/block/<disk>/queue/minimum_io_size
    Date:		April 2009
    Contact:	Martin K. Petersen <martin.petersen@oracle.com>
    Description:
    		[RO] Storage devices may report a granularity or preferred
    		minimum I/O size which is the smallest request the device can
    		perform without incurring a performance penalty. For disk
    		drives this is often the physical block size. For RAID arrays
    		it is often the stripe chunk size. A properly aligned multiple
    		of minimum_io_size is the preferred request size for workloads
    		where a high number of I/O operations is desired.

[4] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man2/statx.2?id=master#n369

    kernel: __u32 stx_blksize; /* Preferred general I/O size [uncond] */

> nvme1n1                     512
> └─md0                    524288
>   └─node0.raid           524288
>     └─node0_raid-storage 524288

Is the MIN-IO correctly reported in RAID arrays here? I guess it should match the stripe chunk size as per the description above?
On Thu, Feb 06, 2025 at 08:30:22PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 07:00:55PM +0000, da.gomez@kernel.org wrote:
> > From: Daniel Gomez <da.gomez@samsung.com>
> >
> > In patch [1] ("bdev: use bdev_io_min() for statx block size"), block
> > devices will now report their preferred minimum I/O size for optimal
> > performance in the stx_blksize field of the statx data structure. This
> > change updates the current default 4 KiB block size for all devices
> > reporting a minimum I/O larger than 4 KiB, opting instead to query for
> > its advertised minimum I/O value in the statx data struct.
>
> UUuh, no. Larger block sizes have their use cases, but this will
> regress performance for a lot of (most?) common setups. A lot of
> devices report fairly high values there, but say increasing the

Are these devices reporting the correct value? As I mentioned in my discussion with Darrick, matching the minimum_io_size with the "fs fundamental blocksize" actually allows us to avoid RMW operations (when using the default path in mkfs.xfs and the value reported is within boundaries).

> directory and bmap btree block size unconditionally using the current
> algorithms will dramatically increase write amplification. Similarly

I agree, but it seems to be a consequence of using such a large minimum_io_size.

> for small buffered writes.

Exactly. Even though write amplification happens with a 512 byte minimum_io_size and a 4k default block size, it doesn't incur a performance penalty. But we will incur that when minimum_io_size exceeds 4k. So, this solves the issue but comes at the cost of write amplification.
On Fri, Feb 07, 2025 at 10:12:07AM +0100, Daniel Gomez wrote:
> On Thu, Feb 06, 2025 at 02:50:28PM +0100, Luis Chamberlain wrote:
> > On Thu, Feb 06, 2025 at 02:27:16PM -0800, Darrick J. Wong wrote:
> > > Now you've decreased the default blocksize to 512 on sda, and md0 gets
> > > an impossible 512k blocksize. Also, disrupting the default 4k blocksize
> > > will introduce portability problems with distros that aren't yet
> > > shipping 6.12.
> >
> > Our default should be 4k, and to address the latter we should sanity
> > check and use an upper limit of what XFS supports, 64k.
>
> To clarify, the patch addresses the cases described above. It sets mkfs.xfs's
> default block size to the device’s reported minimum I/O size (via stx_blksize),
> clamping the value between 4k and 64k: if the reported size is less than 4k, 4k
> is used, and if it exceeds 64k (XFS_MAX_BLOCKSIZE), 64k is applied.

Ah, neat. Then the only other concern expressed by Darrick is using this for distros not yet shipping v6.12. Let me address that next in reply to another part of this thread.

  Luis
On Thu, Feb 06, 2025 at 02:27:16PM -0800, Darrick J. Wong wrote:
> NAME                     MIN-IO
> sda                         512
> ├─sda1                      512
> ├─sda2                      512
> │ └─node0.boot              512
> ├─sda3                      512
> │ └─node0.swap              512
> └─sda4                      512
>   └─node0.lvm               512
>     └─node0-root            512
> sdb                        4096
> └─sdb1                     4096
> nvme1n1                     512
> └─md0                    524288
>   └─node0.raid           524288
>     └─node0_raid-storage 524288
> nvme0n1                     512
> └─md0                    524288
>   └─node0.raid           524288
>     └─node0_raid-storage 524288
> nvme2n1                     512
> └─md0                    524288
>   └─node0.raid           524288
>     └─node0_raid-storage 524288
> nvme3n1                     512
> └─md0                    524288
>   └─node0.raid           524288
>     └─node0_raid-storage 524288

Can you try this for each of these:

  stat --print=%o

I believe that without that new patch I posted [0] you will get 4 KiB here. Then the blocksize passed won't be the min-io until that patch gets applied.

The above is:

  statx(AT_FDCWD, "/dev/nvme0n1", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, 0,
        {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0,
         stx_mode=S_IFBLK|0660, stx_size=0, ...}) = 0

So if we use this instead at mkfs, then even older kernels will get 4 KiB, and if distros want to automatically lift the value at mkfs, they could cherry pick that simple patch.

[0] https://lkml.kernel.org/r/20250204231209.429356-9-mcgrof@kernel.org

  Luis
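For anyone wanting to check the same value outside of strace, a small stand-alone sketch of the statx() call above, assuming a glibc that provides the statx() wrapper; the device path and error handling are only illustrative:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/dev/nvme0n1";	/* example device */
	struct statx stx;

	if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT,
		  STATX_BASIC_STATS, &stx) != 0) {
		perror("statx");
		return 1;
	}

	/* This is what stat --print=%o reports: 4096 on kernels without the
	 * bdev_io_min() patch, the device's minimum_io_size with it. */
	printf("%s: stx_blksize = %u\n", path, stx.stx_blksize);
	return 0;
}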
On Fri, Feb 07, 2025 at 11:26:20AM -0800, Luis Chamberlain wrote:
> On Thu, Feb 06, 2025 at 02:27:16PM -0800, Darrick J. Wong wrote:
> > NAME                     MIN-IO
> > sda                         512
> > ├─sda1                      512
> > ├─sda2                      512
> > │ └─node0.boot              512
> > ├─sda3                      512
> > │ └─node0.swap              512
> > └─sda4                      512
> >   └─node0.lvm               512
> >     └─node0-root            512
> > sdb                        4096
> > └─sdb1                     4096
> > nvme1n1                     512
> > └─md0                    524288
> >   └─node0.raid           524288
> >     └─node0_raid-storage 524288
> > nvme0n1                     512
> > └─md0                    524288
> >   └─node0.raid           524288
> >     └─node0_raid-storage 524288
> > nvme2n1                     512
> > └─md0                    524288
> >   └─node0.raid           524288
> >     └─node0_raid-storage 524288
> > nvme3n1                     512
> > └─md0                    524288
> >   └─node0.raid           524288
> >     └─node0_raid-storage 524288
>
> Can you try this for each of these:
>
>   stat --print=%o
>
> I believe that without that new patch I posted [0] you will get 4 KiB
> here. Then the blocksize passed won't be the min-io until that patch
> gets applied.

Yes, that returns 4K on 6.13.0 for every device in the list. I think you're saying that stat will start returning 512K for the blocksize if your patch is merged?

> The above is:
>
>   statx(AT_FDCWD, "/dev/nvme0n1", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, 0,
>         {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0,
>          stx_mode=S_IFBLK|0660, stx_size=0, ...}) = 0
>
> So if we use this instead at mkfs, then even older kernels will get 4
> KiB, and if distros want to automatically lift the value at mkfs, they
> could cherry pick that simple patch.

How well does that work if the gold master image creator machine has a new kernel and a RAID setup, but the kernel written into the gold master image is something older than a 6.12 kernel?

--D

>
> [0] https://lkml.kernel.org/r/20250204231209.429356-9-mcgrof@kernel.org
>
> Luis
On Fri, Feb 07, 2025 at 11:31:17AM -0800, Darrick J. Wong wrote:
> On Fri, Feb 07, 2025 at 11:26:20AM -0800, Luis Chamberlain wrote:
> > On Thu, Feb 06, 2025 at 02:27:16PM -0800, Darrick J. Wong wrote:
> > > NAME                     MIN-IO
> > > sda                         512
> > > ├─sda1                      512
> > > ├─sda2                      512
> > > │ └─node0.boot              512
> > > ├─sda3                      512
> > > │ └─node0.swap              512
> > > └─sda4                      512
> > >   └─node0.lvm               512
> > >     └─node0-root            512
> > > sdb                        4096
> > > └─sdb1                     4096
> > > nvme1n1                     512
> > > └─md0                    524288
> > >   └─node0.raid           524288
> > >     └─node0_raid-storage 524288
> > > nvme0n1                     512
> > > └─md0                    524288
> > >   └─node0.raid           524288
> > >     └─node0_raid-storage 524288
> > > nvme2n1                     512
> > > └─md0                    524288
> > >   └─node0.raid           524288
> > >     └─node0_raid-storage 524288
> > > nvme3n1                     512
> > > └─md0                    524288
> > >   └─node0.raid           524288
> > >     └─node0_raid-storage 524288
> >
> > Can you try this for each of these:
> >
> >   stat --print=%o
> >
> > I believe that without that new patch I posted [0] you will get 4 KiB
> > here. Then the blocksize passed won't be the min-io until that patch
> > gets applied.
>
> Yes, that returns 4K on 6.13.0 for every device in the list. I think
> you're saying that stat will start returning 512K for the blocksize if
> your patch is merged?

Yes, as that *is* min-io for block devices.

> > The above is:
> >
> >   statx(AT_FDCWD, "/dev/nvme0n1", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, 0,
> >         {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0,
> >          stx_mode=S_IFBLK|0660, stx_size=0, ...}) = 0
> >
> > So if we use this instead at mkfs, then even older kernels will get 4
> > KiB, and if distros want to automatically lift the value at mkfs, they
> > could cherry pick that simple patch.
>
> How well does that work if the gold master image creator machine has a
> new kernel and a RAID setup, but the kernel written into the gold master
> image is something older than a 6.12 kernel?

I think you're asking about an image creator for an old release where that old release uses an old kernel. Isn't that the same concern as what if the same new kernel creates a filesystem on an image for an old release? If so, the risk is that the new kernel defaults take precedence. And the only thing I can think of for this case is an option to ignore this heuristic for blocksize, which could be used in configs to override defaults for older target kernels.

  Luis
On Fri, Feb 07, 2025 at 11:04:06AM +0100, Daniel Gomez wrote:
> > > performance in the stx_blksize field of the statx data structure. This
> > > change updates the current default 4 KiB block size for all devices
> > > reporting a minimum I/O larger than 4 KiB, opting instead to query for
> > > its advertised minimum I/O value in the statx data struct.
> >
> > UUuh, no. Larger block sizes have their use cases, but this will
> > regress performance for a lot of (most?) common setups. A lot of
> > devices report fairly high values there, but say increasing the
>
> Are these devices reporting the correct value?

Who defines what "correct" means to start with?

> As I mentioned in my
> discussion with Darrick, matching the minimum_io_size with the "fs
> fundamental blocksize" actually allows us to avoid RMW operations (when
> using the default path in mkfs.xfs and the value reported is within
> boundaries).

At least for buffered I/O it does not remove RMW operations at all, it just moves them up.

And for plenty of devices the min_io size might be set to larger than LBA size, but the device is still reasonably efficient at handling smaller I/O (e.g. because it has power loss protection and stages writes in powerfail protected memory).
On Wed, Feb 12, 2025 at 08:10:04PM +0100, Christoph Hellwig wrote:
> On Fri, Feb 07, 2025 at 11:04:06AM +0100, Daniel Gomez wrote:
> > > > performance in the stx_blksize field of the statx data structure. This
> > > > change updates the current default 4 KiB block size for all devices
> > > > reporting a minimum I/O larger than 4 KiB, opting instead to query for
> > > > its advertised minimum I/O value in the statx data struct.
> > >
> > > UUuh, no. Larger block sizes have their use cases, but this will
> > > regress performance for a lot of (most?) common setups. A lot of
> > > devices report fairly high values there, but say increasing the
> >
> > Are these devices reporting the correct value?
>
> Who defines what "correct" means to start with?

That's a good question. The stx_blksize field description indicates the value should be referring to the fs block size that avoids RMW.

* From the statx manual:

    DESCRIPTION
        struct statx {
            __u32 stx_blksize;    /* Block size for filesystem I/O */
            {...}

    The returned information
        stx_blksize
            The "preferred" block size for efficient filesystem I/O.
            (Writing to a file in smaller chunks may cause an inefficient
            read-modify-rewrite.)

* From include/uapi/linux/stat.h:

    struct statx {
        /* Preferred general I/O size [uncond] */
        __u32 stx_blksize;

So I think, if devices report high values in stx_blksize, it is either because smaller values than the reported one cause RMW or they are incorrectly reporting a value in the wrong statx field.

The commit I refer to in the commit message maps the minimum_io_size reported by the block layer to stx_blksize.

* Documentation/ABI/stable/sysfs-block

    What:		/sys/block/<disk>/queue/minimum_io_size
    Date:		April 2009
    Contact:	Martin K. Petersen <martin.petersen@oracle.com>
    Description:
    		[RO] Storage devices may report a granularity or preferred
    		minimum I/O size which is the smallest request the device can
    		perform without incurring a performance penalty. For disk
    		drives this is often the physical block size. For RAID arrays
    		it is often the stripe chunk size. A properly aligned multiple
    		of minimum_io_size is the preferred request size for workloads
    		where a high number of I/O operations is desired.

I guess the correct approach is to ensure the mapping only occurs for "disk drives" and we avoid mapping the RAID stripe chunk size to stx_blksize.

In addition, we also have the optimal I/O size:

    What:		/sys/block/<disk>/queue/optimal_io_size
    Date:		April 2009
    Contact:	Martin K. Petersen <martin.petersen@oracle.com>
    Description:
    		[RO] Storage devices may report an optimal I/O size, which is
    		the device's preferred unit for sustained I/O. This is rarely
    		reported for disk drives. For RAID arrays it is usually the
    		stripe width or the internal track size. A properly aligned
    		multiple of optimal_io_size is the preferred request size for
    		workloads where sustained throughput is desired. If no optimal
    		I/O size is reported this file contains 0.

Optimal I/O is not preferred I/O (minimum_io_size). However, I think the xfs mount option mixes both values:

* From Documentation/admin-guide/xfs.rst

    Mount Options
    =============

    largeio or nolargeio (default)
        If ``nolargeio`` is specified, the optimal I/O reported in
        ``st_blksize`` by **stat(2)** will be as small as possible to allow
        user applications to avoid inefficient read/modify/write I/O. This
        is typically the page size of the machine, as this is the
        granularity of the page cache.

But it must be referring to the minimum_io_size, as this is the limit "to avoid inefficient read/modify/write I/O" as per the sysfs-block minimum_io_size description, right?

> > As I mentioned in my
> > discussion with Darrick, matching the minimum_io_size with the "fs
> > fundamental blocksize" actually allows us to avoid RMW operations (when
> > using the default path in mkfs.xfs and the value reported is within
> > boundaries).
>
> At least for buffered I/O it does not remove RMW operations at all,
> it just moves them up.

But only if the RMW boundary is smaller than the fs bs. If they are matched, it should not move them up.

> And for plenty of devices the min_io size might be set to larger than
> LBA size,

Yes, I agree.

> but the device is still reasonably efficient at handling
> smaller I/O (e.g. because it has power loss protection and stages

That is device specific and covered in the stx_blksize and minimum_io_size descriptions.

> writes in powerfail protected memory).

Assuming min_io is the lsblk-reported MIN-IO field, this is the block layer minimum_io_size.

* From util-linux code:

    case COL_MINIO:
        ul_path_read_string(dev->sysfs, &str, "queue/minimum_io_size");
        if (rawdata)
            str2u64(str, rawdata);
        break;

I think this might be a bit trickier. The value should report the disk "preferred I/O minimum granularity" and "the stripe chunk size" for RAID arrays.
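A small sketch of reading the two sysfs attributes contrasted above, roughly what lsblk does for MIN-IO; the disk name is only an example:

#include <stdio.h>

/* Read one queue attribute for a disk, e.g. "minimum_io_size"; -1 on error. */
static long read_queue_attr(const char *disk, const char *attr)
{
	char path[128];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	const char *disk = "md0";	/* example device from the thread */

	printf("minimum_io_size = %ld\n", read_queue_attr(disk, "minimum_io_size"));
	printf("optimal_io_size = %ld\n", read_queue_attr(disk, "optimal_io_size"));
	return 0;
}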
On Thu, Feb 13, 2025 at 02:26:37PM +0100, Daniel Gomez wrote:
> That's a good question. The stx_blksize field description indicates the value
> should be referring to the fs block size that avoids RMW.

One that is optimal; the RMW is an example. This is what POSIX says:

    blksize_t st_blksize
        A file system-specific preferred I/O block size for this object. In
        some file system types, this may vary from file to file.

> So I think, if devices report high values in stx_blksize, it is either because
> smaller values than the reported one cause RMW or they are incorrectly reporting
> a value in the wrong statx field.

Or it is just more efficient. E.g. on NFS or XFS you'll get fairly big values.

> The commit I refer to in the commit message maps the minimum_io_size reported by
> the block layer to stx_blksize.

Yes, and that was my question again - minimum_io_size for block devices is specified even more handwavily.

In other words, I'm really sceptical this is going to be a net win. To win me over you'll need to show an improvement over a wide variety of devices and workloads.
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index bbd0dbb6c80ab63ebf9edbe0a9a304149770f89d..e17389622682bc23f9b20c207db7e22181e2fe6f 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2178,6 +2178,26 @@ _("block size %d cannot be smaller than sector size %d\n"),
 	}
 }
 
+void
+get_dev_blocksize(
+	struct cli_params		*cli,
+	struct mkfs_default_params	*dft)
+{
+	struct statx			stx;
+	int				ret;
+
+	if (!cli->xi->data.name)
+		return;
+
+	ret = statx(AT_FDCWD, cli->xi->data.name,
+			AT_STATX_DONT_SYNC | AT_NO_AUTOMOUNT,
+			STATX_DIOALIGN | STATX_TYPE | STATX_MODE | STATX_INO, &stx);
+	if (!ret)
+		dft->blocksize =
+			min(max(1 << XFS_DFL_BLOCKSIZE_LOG, stx.stx_blksize),
+				XFS_MAX_BLOCKSIZE);
+}
+
 static void
 validate_blocksize(
 	struct mkfs_params	*cfg,
@@ -2189,6 +2209,7 @@ validate_blocksize(
 	 * For RAID4/5/6 we want to align sector size and block size,
 	 * so we need to start with the device geometry extraction too.
 	 */
+	get_dev_blocksize(cli, dft);
 	if (!cli->blocksize)
 		cfg->blocksize = dft->blocksize;
 	else