diff mbox series

bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()

Message ID 20250305015301.1610092-1-mcgrof@kernel.org (mailing list archive)
State New
Headers show
Series bdev: add back PAGE_SIZE block size validation for sb_set_blocksize() | expand

Commit Message

Luis Chamberlain March 5, 2025, 1:53 a.m. UTC
The commit titled "block/bdev: lift block size restrictions to 64k"
lifted the block layer's max supported block size to 64k inside the
helper blk_validate_block_size() now that we support large folios.
However in lifting the block size we also removed the silly use
cases many filesystems have to use sb_set_blocksize() to *verify*
that the block size < PAGE_SIZE. The call to sb_set_blocksize() can
happen in-kernel given mkfs utilities *can* create for example an
ext4 32k block size filesystem on x86_64, the issue we want to prevent
is mounting it on x86_64 unless the filesystem supports LBS.

While, we could argue that such checks should be filesystem specific,
there are much more users of sb_set_blocksize() than LBS enabled
filesystem on linux-next, so just do the easier thing and bring back
the PAGE_SIZE check for sb_set_blocksize() users.

This will ensure that tests such as generic/466 when run in a loop
against say, ext4, won't try to try to actually mount a filesystem with
a block size larger than your filesystem supports given your PAGE_SIZE
and in the worst case crash.

Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---

Christian, a small fixup for a crash when running generic/466 on ext4
in a loop. The issue is obvious, and we just need to ensure we don't
break old filesystem expectations of sb_set_blocksize().

This still allows XFS with 32k block size and I even tested with XFS
with 32k block size and a 32k sector size set.

 block/bdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Wilcox March 5, 2025, 6:04 a.m. UTC | #1
On Tue, Mar 04, 2025 at 05:53:01PM -0800, Luis Chamberlain wrote:
> The commit titled "block/bdev: lift block size restrictions to 64k"
> lifted the block layer's max supported block size to 64k inside the
> helper blk_validate_block_size() now that we support large folios.
> However in lifting the block size we also removed the silly use
> cases many filesystems have to use sb_set_blocksize() to *verify*
> that the block size < PAGE_SIZE. The call to sb_set_blocksize() can
> happen in-kernel given mkfs utilities *can* create for example an
> ext4 32k block size filesystem on x86_64, the issue we want to prevent
> is mounting it on x86_64 unless the filesystem supports LBS.
> 
> While, we could argue that such checks should be filesystem specific,
> there are much more users of sb_set_blocksize() than LBS enabled
> filesystem on linux-next, so just do the easier thing and bring back
> the PAGE_SIZE check for sb_set_blocksize() users.
> 
> This will ensure that tests such as generic/466 when run in a loop
> against say, ext4, won't try to try to actually mount a filesystem with
> a block size larger than your filesystem supports given your PAGE_SIZE
> and in the worst case crash.

So this is expedient because XFS happens to not call sb_set_blocksize()?
What is the path forward for filesystems which call sb_set_blocksize()
today and want to support LBS in future?
Darrick J. Wong March 5, 2025, 6:33 a.m. UTC | #2
On Wed, Mar 05, 2025 at 06:04:21AM +0000, Matthew Wilcox wrote:
> On Tue, Mar 04, 2025 at 05:53:01PM -0800, Luis Chamberlain wrote:
> > The commit titled "block/bdev: lift block size restrictions to 64k"
> > lifted the block layer's max supported block size to 64k inside the
> > helper blk_validate_block_size() now that we support large folios.
> > However in lifting the block size we also removed the silly use
> > cases many filesystems have to use sb_set_blocksize() to *verify*
> > that the block size < PAGE_SIZE. The call to sb_set_blocksize() can
> > happen in-kernel given mkfs utilities *can* create for example an
> > ext4 32k block size filesystem on x86_64, the issue we want to prevent
> > is mounting it on x86_64 unless the filesystem supports LBS.
> > 
> > While, we could argue that such checks should be filesystem specific,
> > there are much more users of sb_set_blocksize() than LBS enabled
> > filesystem on linux-next, so just do the easier thing and bring back
> > the PAGE_SIZE check for sb_set_blocksize() users.
> > 
> > This will ensure that tests such as generic/466 when run in a loop
> > against say, ext4, won't try to try to actually mount a filesystem with
> > a block size larger than your filesystem supports given your PAGE_SIZE
> > and in the worst case crash.
> 
> So this is expedient because XFS happens to not call sb_set_blocksize()?
> What is the path forward for filesystems which call sb_set_blocksize()
> today and want to support LBS in future?

Well they /could/ set sb_blocksize/sb_blocksize_bits themselves, like
XFS does.

Luis: Is the bsize > PAGE_SIZE constraint in set_blocksize go away?
IOWs, will xfs support sector sizes > 4k in the near future?

--D
Hannes Reinecke March 5, 2025, 7:18 a.m. UTC | #3
On 3/5/25 02:53, Luis Chamberlain wrote:
> The commit titled "block/bdev: lift block size restrictions to 64k"
> lifted the block layer's max supported block size to 64k inside the
> helper blk_validate_block_size() now that we support large folios.
> However in lifting the block size we also removed the silly use
> cases many filesystems have to use sb_set_blocksize() to *verify*
> that the block size < PAGE_SIZE. The call to sb_set_blocksize() can
> happen in-kernel given mkfs utilities *can* create for example an
> ext4 32k block size filesystem on x86_64, the issue we want to prevent
> is mounting it on x86_64 unless the filesystem supports LBS.
> 
> While, we could argue that such checks should be filesystem specific,
> there are much more users of sb_set_blocksize() than LBS enabled
> filesystem on linux-next, so just do the easier thing and bring back
> the PAGE_SIZE check for sb_set_blocksize() users.
> 
> This will ensure that tests such as generic/466 when run in a loop
> against say, ext4, won't try to try to actually mount a filesystem with
> a block size larger than your filesystem supports given your PAGE_SIZE
> and in the worst case crash.
> 
> Cc: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> 
> Christian, a small fixup for a crash when running generic/466 on ext4
> in a loop. The issue is obvious, and we just need to ensure we don't
> break old filesystem expectations of sb_set_blocksize().
> 
> This still allows XFS with 32k block size and I even tested with XFS
> with 32k block size and a 32k sector size set.
> 
>   block/bdev.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 3bd948e6438d..de9ebc3e5d15 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -181,7 +181,7 @@ EXPORT_SYMBOL(set_blocksize);
>   
>   int sb_set_blocksize(struct super_block *sb, int size)
>   {
> -	if (set_blocksize(sb->s_bdev_file, size))
> +	if (size > PAGE_SIZE || set_blocksize(sb->s_bdev_file, size))
>   		return 0;
>   	/* If we get here, we know size is validated */
>   	sb->s_blocksize = size;

Can you add a comment stating why it's needed, even with LBS?
It's kinda non-obious, and we don't want to repeat the mistake
in the future.

Cheers,

Hannes
Matthew Wilcox March 5, 2025, 2:15 p.m. UTC | #4
On Tue, Mar 04, 2025 at 10:33:30PM -0800, Darrick J. Wong wrote:
> > So this is expedient because XFS happens to not call sb_set_blocksize()?
> > What is the path forward for filesystems which call sb_set_blocksize()
> > today and want to support LBS in future?
> 
> Well they /could/ set sb_blocksize/sb_blocksize_bits themselves, like
> XFS does.

I'm kind of hoping that isn't the answer.

> Luis: Is the bsize > PAGE_SIZE constraint in set_blocksize go away?
> IOWs, will xfs support sector sizes > 4k in the near future?

Already there in linux-next.  47dd67532303 in next-20250304.
Luis Chamberlain March 5, 2025, 5:04 p.m. UTC | #5
On Wed, Mar 05, 2025 at 02:15:47PM +0000, Matthew Wilcox wrote:
> On Tue, Mar 04, 2025 at 10:33:30PM -0800, Darrick J. Wong wrote:
> > > So this is expedient because XFS happens to not call sb_set_blocksize()?
> > > What is the path forward for filesystems which call sb_set_blocksize()
> > > today and want to support LBS in future?
> > 
> > Well they /could/ set sb_blocksize/sb_blocksize_bits themselves, like
> > XFS does.
> 
> I'm kind of hoping that isn't the answer.

set_blocksize() can be used. The only extra steps the filesystem needs
to in addition is:

	sb->s_blocksize = size;
	sb->s_blocksize_bits = blksize_bits(size);

Which is what both XFS and bcachefs do.

We could modify sb to add an LBS flag but that alone would not suffice
either as the upper limit is still a filesystem specific limit. Additionally
it also does not suffice for filesystems that support a different device
for metadata writes, for instance XFS supports this and uses the sector
size for set_blocksize().

So I think that if ext4 for example wants to use LBS then simply it
would open code the above two lines and use set_blocksize(). Let me know
if you have any other recommendations.

  Luis
Luis Chamberlain March 5, 2025, 5:05 p.m. UTC | #6
On Wed, Mar 05, 2025 at 08:18:29AM +0100, Hannes Reinecke wrote:
> On 3/5/25 02:53, Luis Chamberlain wrote:
> > The commit titled "block/bdev: lift block size restrictions to 64k"
> > lifted the block layer's max supported block size to 64k inside the
> > helper blk_validate_block_size() now that we support large folios.
> > However in lifting the block size we also removed the silly use
> > cases many filesystems have to use sb_set_blocksize() to *verify*
> > that the block size < PAGE_SIZE. The call to sb_set_blocksize() can
> > happen in-kernel given mkfs utilities *can* create for example an
> > ext4 32k block size filesystem on x86_64, the issue we want to prevent
> > is mounting it on x86_64 unless the filesystem supports LBS.
> > 
> > While, we could argue that such checks should be filesystem specific,
> > there are much more users of sb_set_blocksize() than LBS enabled
> > filesystem on linux-next, so just do the easier thing and bring back
> > the PAGE_SIZE check for sb_set_blocksize() users.
> > 
> > This will ensure that tests such as generic/466 when run in a loop
> > against say, ext4, won't try to try to actually mount a filesystem with
> > a block size larger than your filesystem supports given your PAGE_SIZE
> > and in the worst case crash.
> > 
> > Cc: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> > 
> > Christian, a small fixup for a crash when running generic/466 on ext4
> > in a loop. The issue is obvious, and we just need to ensure we don't
> > break old filesystem expectations of sb_set_blocksize().
> > 
> > This still allows XFS with 32k block size and I even tested with XFS
> > with 32k block size and a 32k sector size set.
> > 
> >   block/bdev.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/block/bdev.c b/block/bdev.c
> > index 3bd948e6438d..de9ebc3e5d15 100644
> > --- a/block/bdev.c
> > +++ b/block/bdev.c
> > @@ -181,7 +181,7 @@ EXPORT_SYMBOL(set_blocksize);
> >   int sb_set_blocksize(struct super_block *sb, int size)
> >   {
> > -	if (set_blocksize(sb->s_bdev_file, size))
> > +	if (size > PAGE_SIZE || set_blocksize(sb->s_bdev_file, size))
> >   		return 0;
> >   	/* If we get here, we know size is validated */
> >   	sb->s_blocksize = size;
> 
> Can you add a comment stating why it's needed, even with LBS?
> It's kinda non-obious, and we don't want to repeat the mistake
> in the future.

Sure.

 Luis
Darrick J. Wong March 5, 2025, 6:48 p.m. UTC | #7
On Wed, Mar 05, 2025 at 09:04:50AM -0800, Luis Chamberlain wrote:
> On Wed, Mar 05, 2025 at 02:15:47PM +0000, Matthew Wilcox wrote:
> > On Tue, Mar 04, 2025 at 10:33:30PM -0800, Darrick J. Wong wrote:
> > > > So this is expedient because XFS happens to not call sb_set_blocksize()?
> > > > What is the path forward for filesystems which call sb_set_blocksize()
> > > > today and want to support LBS in future?
> > > 
> > > Well they /could/ set sb_blocksize/sb_blocksize_bits themselves, like
> > > XFS does.
> > 
> > I'm kind of hoping that isn't the answer.
> 
> set_blocksize() can be used. The only extra steps the filesystem needs
> to in addition is:
> 
> 	sb->s_blocksize = size;
> 	sb->s_blocksize_bits = blksize_bits(size);
> 
> Which is what both XFS and bcachefs do.
> 
> We could modify sb to add an LBS flag but that alone would not suffice
> either as the upper limit is still a filesystem specific limit. Additionally
> it also does not suffice for filesystems that support a different device
> for metadata writes, for instance XFS supports this and uses the sector
> size for set_blocksize().
> 
> So I think that if ext4 for example wants to use LBS then simply it
> would open code the above two lines and use set_blocksize(). Let me know
> if you have any other recommendations.

int sb_set_large_blocksize(struct super_block *sb, int size)
{
	if (set_blocksize(sb->s_bdev_file, size))
		return 0;
	sb->s_blocksize = size;
	sb->s_blocksize_bits = blksize_bits(size);
	return sb->s_blocksize;
}
EXPORT_SYMBOL_GPL(sb_set_large_blocksize);

int sb_set_blocksize(struct super_block *sb, int size)
{
	if (size > PAGE_SIZE)
		return 0;
	return sb_set_large_blocksize(sb, size);
}
EXPORT_SYMBOL(sb_set_blocksize);

Though you'll note that this doesn't help XFS, or any other filesystem
where the bdev block size isn't set to the fs block size.  But xfs can
just be weird on its own like always. ;)

--D

> 
>   Luis
Luis Chamberlain March 5, 2025, 7:55 p.m. UTC | #8
On Wed, Mar 05, 2025 at 10:48:34AM -0800, Darrick J. Wong wrote:
> On Wed, Mar 05, 2025 at 09:04:50AM -0800, Luis Chamberlain wrote:
> > On Wed, Mar 05, 2025 at 02:15:47PM +0000, Matthew Wilcox wrote:
> > > On Tue, Mar 04, 2025 at 10:33:30PM -0800, Darrick J. Wong wrote:
> > > > > So this is expedient because XFS happens to not call sb_set_blocksize()?
> > > > > What is the path forward for filesystems which call sb_set_blocksize()
> > > > > today and want to support LBS in future?
> > > > 
> > > > Well they /could/ set sb_blocksize/sb_blocksize_bits themselves, like
> > > > XFS does.
> > > 
> > > I'm kind of hoping that isn't the answer.
> > 
> > set_blocksize() can be used. The only extra steps the filesystem needs
> > to in addition is:
> > 
> > 	sb->s_blocksize = size;
> > 	sb->s_blocksize_bits = blksize_bits(size);
> > 
> > Which is what both XFS and bcachefs do.
> > 
> > We could modify sb to add an LBS flag but that alone would not suffice
> > either as the upper limit is still a filesystem specific limit. Additionally
> > it also does not suffice for filesystems that support a different device
> > for metadata writes, for instance XFS supports this and uses the sector
> > size for set_blocksize().
> > 
> > So I think that if ext4 for example wants to use LBS then simply it
> > would open code the above two lines and use set_blocksize(). Let me know
> > if you have any other recommendations.
> 
> int sb_set_large_blocksize(struct super_block *sb, int size)
> {
> 	if (set_blocksize(sb->s_bdev_file, size))
> 		return 0;
> 	sb->s_blocksize = size;
> 	sb->s_blocksize_bits = blksize_bits(size);
> 	return sb->s_blocksize;
> }
> EXPORT_SYMBOL_GPL(sb_set_large_blocksize);
> 
> int sb_set_blocksize(struct super_block *sb, int size)
> {
> 	if (size > PAGE_SIZE)
> 		return 0;
> 	return sb_set_large_blocksize(sb, size);
> }
> EXPORT_SYMBOL(sb_set_blocksize);
> 
> Though you'll note that this doesn't help XFS, or any other filesystem
> where the bdev block size isn't set to the fs block size.  But xfs can
> just be weird on its own like always. ;)

Actually, I failed to also notice XFS implicitly calls sb_set_large_blocksize()
through:

xfs_fs_get_tree() -->
  get_tree_bdev() -->
    get_tree_bdev_flags() -->
      get_tree_bdev_flags() -->
        sb_set_blocksize()

We just don't care if sb_set_blocksize() if fails inside get_tree_bdev_flags().
To be clear this has been the case since we added and tested LBS on XFS.

So if we wanted something more automatic, how about:

diff --git a/block/bdev.c b/block/bdev.c
index 3bd948e6438d..4844d1e27b6f 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -181,6 +181,8 @@ EXPORT_SYMBOL(set_blocksize);
 
 int sb_set_blocksize(struct super_block *sb, int size)
 {
+	if (!(sb->s_type->fs_flags & FS_LBS) && size > PAGE_SIZE)
+		return 0;
 	if (set_blocksize(sb->s_bdev_file, size))
 		return 0;
 	/* If we get here, we know size is validated */
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 90ade8f648d9..e99e378d68ea 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -2396,7 +2396,7 @@ static struct file_system_type bcache_fs_type = {
 	.name			= "bcachefs",
 	.init_fs_context	= bch2_init_fs_context,
 	.kill_sb		= bch2_kill_sb,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_LBS,
 };
 
 MODULE_ALIAS_FS("bcachefs");
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d92d7a07ea89..3d8b80165d48 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2118,7 +2118,8 @@ static struct file_system_type xfs_fs_type = {
 	.init_fs_context	= xfs_init_fs_context,
 	.parameters		= xfs_fs_parameters,
 	.kill_sb		= xfs_kill_sb,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
+				  FS_LBS,
 };
 MODULE_ALIAS_FS("xfs");
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2c3b2f8a621f..16d17bd82be0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2616,6 +2616,7 @@ struct file_system_type {
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
 #define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
 #define FS_MGTIME		64	/* FS uses multigrain timestamps */
+#define FS_LBS			128	/* FS supports LBS */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
diff mbox series

Patch

diff --git a/block/bdev.c b/block/bdev.c
index 3bd948e6438d..de9ebc3e5d15 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -181,7 +181,7 @@  EXPORT_SYMBOL(set_blocksize);
 
 int sb_set_blocksize(struct super_block *sb, int size)
 {
-	if (set_blocksize(sb->s_bdev_file, size))
+	if (size > PAGE_SIZE || set_blocksize(sb->s_bdev_file, size))
 		return 0;
 	/* If we get here, we know size is validated */
 	sb->s_blocksize = size;