Message ID | 20250127152236.612108-1-hch@lst.de (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | loop: take the file system minimum dio alignment into account | expand |
On 1/28/25 00:22, Christoph Hellwig wrote: > The loop driver currently uses the logical block size of the underlying > bdev as the lower bound of the loop device block size. While this works > for many cases, it fails for file systems made up of multiple devices > with different logic block size (e.g. XFS with a RT device that has a > larger logical block size), or when the file systems doesn't support > direct I/O writes at the sector size granularity (e.g. because it does > out of place writes with a file system block size larger than the sector > size). > > Fix this by querying the minimum direct I/O alignment from statx when > available. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/block/loop.c | 58 ++++++++++++++++++++++++-------------------- > 1 file changed, 32 insertions(+), 26 deletions(-) [...] > +static void loop_query_min_dio_size(struct loop_device *lo) > +{ > + struct file *file = lo->lo_backing_file; > + struct block_device *sb_bdev = file->f_mapping->host->i_sb->s_bdev; > + struct kstat st; > + > + if (!vfs_getattr(&file->f_path, &st, STATX_DIOALIGN, 0) && > + (st.result_mask & STATX_DIOALIGN)) > + lo->lo_min_dio_size = st.dio_offset_align; Nit: Maybe return here to avoid the else below and the comment block in the middle of a if-else-if ? > + /* > + * In a perfect world this wouldn't be needed, but as of Linux 6.13 only > + * a handful of file systems support the STATX_DIOALIGN flag. > + */ > + else if (sb_bdev) > + lo->lo_min_dio_size = bdev_logical_block_size(sb_bdev); > + else > + lo->lo_min_dio_size = SECTOR_SIZE; > +} [...] > -static unsigned int loop_default_blocksize(struct loop_device *lo, > - struct block_device *backing_bdev) > +static unsigned int loop_default_blocksize(struct loop_device *lo) > { > - /* In case of direct I/O, match underlying block size */ > - if ((lo->lo_backing_file->f_flags & O_DIRECT) && backing_bdev) > - return bdev_logical_block_size(backing_bdev); > + /* In case of direct I/O, match underlying minimum I/O size */ > + if (lo->lo_backing_file->f_flags & O_DIRECT) > + return lo->lo_min_dio_size; I wonder if conditionally returning lo_min_dio_size only for O_DIRECT is correct. What if the loop dev is first started without direct IO and gets a block size of 512B, and then later restarted with direct IO and now gets a block size of say 4K ? That could break a file system that is already on the device and formatted using the initial smaller block size. So shouldn't we simply always return lo_min_dio_size here, regardless of if O_DIRECT is used or not ? > return SECTOR_SIZE; > } > > @@ -993,7 +998,7 @@ static void loop_update_limits(struct loop_device *lo, struct queue_limits *lim, > backing_bdev = inode->i_sb->s_bdev; > > if (!bsize) > - bsize = loop_default_blocksize(lo, backing_bdev); > + bsize = loop_default_blocksize(lo); > > loop_get_discard_config(lo, &granularity, &max_discard_sectors); > > @@ -1090,6 +1095,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode, > lo->lo_backing_file = file; > lo->old_gfp_mask = mapping_gfp_mask(mapping); > mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS)); > + loop_query_min_dio_size(lo); > > lim = queue_limits_start_update(lo->lo_queue); > loop_update_limits(lo, &lim, config->block_size);
On Thu, Jan 30, 2025 at 02:49:38PM +0900, Damien Le Moal wrote: > > +static void loop_query_min_dio_size(struct loop_device *lo) > > +{ > > + struct file *file = lo->lo_backing_file; > > + struct block_device *sb_bdev = file->f_mapping->host->i_sb->s_bdev; > > + struct kstat st; > > + > > + if (!vfs_getattr(&file->f_path, &st, STATX_DIOALIGN, 0) && > > + (st.result_mask & STATX_DIOALIGN)) > > + lo->lo_min_dio_size = st.dio_offset_align; > > Nit: Maybe return here to avoid the else below and the comment block in the > middle of a if-else-if ? That looks kinda weird. But maybe I can just return the value instead and let the caller assign it, which would allow an early return without multi-line blocks. > > -static unsigned int loop_default_blocksize(struct loop_device *lo, > > - struct block_device *backing_bdev) > > +static unsigned int loop_default_blocksize(struct loop_device *lo) > > { > > - /* In case of direct I/O, match underlying block size */ > > - if ((lo->lo_backing_file->f_flags & O_DIRECT) && backing_bdev) > > - return bdev_logical_block_size(backing_bdev); > > + /* In case of direct I/O, match underlying minimum I/O size */ > > + if (lo->lo_backing_file->f_flags & O_DIRECT) > > + return lo->lo_min_dio_size; > > I wonder if conditionally returning lo_min_dio_size only for O_DIRECT is > correct. What if the loop dev is first started without direct IO and gets a > block size of 512B, and then later restarted with direct IO and now gets a block > size of say 4K ? That could break a file system that is already on the device > and formatted using the initial smaller block size. So shouldn't we simply > always return lo_min_dio_size here, regardless of if O_DIRECT is used or not ? Yes, that's not handled correctly here or in the existing code. I'll add a patch before this one to fix that up.
diff --git a/drivers/block/loop.c b/drivers/block/loop.c index d1f1d6bef2e6..84b810b0f278 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -54,7 +54,8 @@ struct loop_device { int lo_flags; char lo_file_name[LO_NAME_SIZE]; - struct file * lo_backing_file; + struct file *lo_backing_file; + unsigned int lo_min_dio_size; struct block_device *lo_device; gfp_t old_gfp_mask; @@ -169,29 +170,14 @@ static loff_t get_loop_size(struct loop_device *lo, struct file *file) * of backing device, and the logical block size of loop is bigger than that of * the backing device. */ -static bool lo_bdev_can_use_dio(struct loop_device *lo, - struct block_device *backing_bdev) -{ - unsigned int sb_bsize = bdev_logical_block_size(backing_bdev); - - if (queue_logical_block_size(lo->lo_queue) < sb_bsize) - return false; - if (lo->lo_offset & (sb_bsize - 1)) - return false; - return true; -} - static bool lo_can_use_dio(struct loop_device *lo) { - struct inode *inode = lo->lo_backing_file->f_mapping->host; - if (!(lo->lo_backing_file->f_mode & FMODE_CAN_ODIRECT)) return false; - - if (S_ISBLK(inode->i_mode)) - return lo_bdev_can_use_dio(lo, I_BDEV(inode)); - if (inode->i_sb->s_bdev) - return lo_bdev_can_use_dio(lo, inode->i_sb->s_bdev); + if (queue_logical_block_size(lo->lo_queue) < lo->lo_min_dio_size) + return false; + if (lo->lo_offset & (lo->lo_min_dio_size - 1)) + return false; return true; } @@ -541,6 +527,25 @@ static void loop_reread_partitions(struct loop_device *lo) __func__, lo->lo_number, lo->lo_file_name, rc); } +static void loop_query_min_dio_size(struct loop_device *lo) +{ + struct file *file = lo->lo_backing_file; + struct block_device *sb_bdev = file->f_mapping->host->i_sb->s_bdev; + struct kstat st; + + if (!vfs_getattr(&file->f_path, &st, STATX_DIOALIGN, 0) && + (st.result_mask & STATX_DIOALIGN)) + lo->lo_min_dio_size = st.dio_offset_align; + /* + * In a perfect world this wouldn't be needed, but as of Linux 6.13 only + * a handful of file systems support the STATX_DIOALIGN flag. + */ + else if (sb_bdev) + lo->lo_min_dio_size = bdev_logical_block_size(sb_bdev); + else + lo->lo_min_dio_size = SECTOR_SIZE; +} + static inline int is_loop_device(struct file *file) { struct inode *i = file->f_mapping->host; @@ -629,6 +634,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev, lo->old_gfp_mask = mapping_gfp_mask(file->f_mapping); mapping_set_gfp_mask(file->f_mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS)); + loop_query_min_dio_size(lo); loop_update_dio(lo); blk_mq_unfreeze_queue(lo->lo_queue); partscan = lo->lo_flags & LO_FLAGS_PARTSCAN; @@ -970,12 +976,11 @@ loop_set_status_from_info(struct loop_device *lo, return 0; } -static unsigned int loop_default_blocksize(struct loop_device *lo, - struct block_device *backing_bdev) +static unsigned int loop_default_blocksize(struct loop_device *lo) { - /* In case of direct I/O, match underlying block size */ - if ((lo->lo_backing_file->f_flags & O_DIRECT) && backing_bdev) - return bdev_logical_block_size(backing_bdev); + /* In case of direct I/O, match underlying minimum I/O size */ + if (lo->lo_backing_file->f_flags & O_DIRECT) + return lo->lo_min_dio_size; return SECTOR_SIZE; } @@ -993,7 +998,7 @@ static void loop_update_limits(struct loop_device *lo, struct queue_limits *lim, backing_bdev = inode->i_sb->s_bdev; if (!bsize) - bsize = loop_default_blocksize(lo, backing_bdev); + bsize = loop_default_blocksize(lo); loop_get_discard_config(lo, &granularity, &max_discard_sectors); @@ -1090,6 +1095,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode, lo->lo_backing_file = file; lo->old_gfp_mask = mapping_gfp_mask(mapping); mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS)); + loop_query_min_dio_size(lo); lim = queue_limits_start_update(lo->lo_queue); loop_update_limits(lo, &lim, config->block_size);
The loop driver currently uses the logical block size of the underlying bdev as the lower bound of the loop device block size. While this works for many cases, it fails for file systems made up of multiple devices with different logic block size (e.g. XFS with a RT device that has a larger logical block size), or when the file systems doesn't support direct I/O writes at the sector size granularity (e.g. because it does out of place writes with a file system block size larger than the sector size). Fix this by querying the minimum direct I/O alignment from statx when available. Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/block/loop.c | 58 ++++++++++++++++++++++++-------------------- 1 file changed, 32 insertions(+), 26 deletions(-)