[v3,0/8] enable bs > ps for block devices

Message ID: 20250221223823.1680616-1-mcgrof@kernel.org

Luis Chamberlain Feb. 21, 2025, 10:38 p.m. UTC
Christian, Andrew,

This v3 series addresses the feedback from the v2 series [0]. The only
patch modified was "fs/mpage: use blocks_per_folio instead of
blocks_per_page". The main motivation for this series is to start
supporting block devices with logical block sizes larger than 4k; we
do this by addressing the buffer-head support required for the block
device cache.

In the future these changes can be leveraged to also start experimenting
with LBS support for filesystems which support only buffer-heads. This
paves the way for that work.

It may be surprising to some, but since this also lifts the block
device cache sector size support to 64k, devices which support sector
sizes up to 64k can leverage this to enable filesystems created with
larger sector sizes, up to 64k. The filesystem sector size is used and
documented somewhat obscurely except for a few filesystems, but in
short it ensures that the filesystem itself will not generate writes
smaller than the specified sector size. In practice this means you can
also constrain metadata writes to a minimum size, and so be completely
deterministic with regard to the specified sector size for minimum IO
writes. For example, since XFS supports up to a 32k sector size, these
changes enable filesystems to be created on x86_64 with both the
filesystem block size and sector size set to 32k, now that the block
device cache limitation is lifted.

Since this touches buffer-heads I've run this through fstests on ext4
and found no new regressions. I've also used blktests against a kernel
built with these changes to test block devices with logical block
sizes larger than 4k on x86_64. All changes needed to test block
devices with a logical block size > 4k are now merged in upstream
blktests. I've tested the block layer with blktests using block
devices with logical block sizes up to 64k, which is the maximum we
currently support, and found no new regressions.
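
For local experimentation, one way to stand up such a device is with
null_blk (a sketch; this assumes your null_blk build accepts block
sizes above 4k via its 'bs' module parameter; NVMe devices exposing
larger LBA formats are another option):

  # assumes null_blk accepts a 64k block size on this kernel
  modprobe null_blk nr_devices=1 bs=65536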

Detailed changes in this series:

  - Modifies the commit log for "fs/buffer: remove batching from async
    read" as per Willy's request and collects his SOB.
  - Collects Reviewed-by tags
  - The patch titled "fs/mpage: use blocks_per_folio instead of blocks_per_page"
    received more love to account for Willy's point that we should keep
    the nr_pages accounting in mpage. It does this by using
    folio_nr_pages() on the args passed and adjusts the last_block
    accounting accordingly.
  - Through code inspection, fixed folio_zero_segment() to use
    folio_size() as we move to support large folios for unmapped folio
    segments in do_mpage_readpage(). This is dealt with in the patch
    titled "fs/mpage: use blocks_per_folio instead of blocks_per_page"
    as that's when large folios come into the picture.

[0] https://lkml.kernel.org/r/20250204231209.429356-1-mcgrof@kernel.org

Hannes Reinecke (2):
  fs/mpage: avoid negative shift for large blocksize
  block/bdev: enable large folio support for large logical block sizes

Luis Chamberlain (5):
  fs/buffer: simplify block_read_full_folio() with bh_offset()
  fs/mpage: use blocks_per_folio instead of blocks_per_page
  fs/buffer fs/mpage: remove large folio restriction
  block/bdev: lift block size restrictions to 64k
  bdev: use bdev_io_min() for statx block size

Matthew Wilcox (1):
  fs/buffer: remove batching from async read

 block/bdev.c           | 11 ++++----
 fs/buffer.c            | 58 +++++++++++++++++-------------------------
 fs/mpage.c             | 49 +++++++++++++++++------------------
 include/linux/blkdev.h |  8 +++++-
 4 files changed, 59 insertions(+), 67 deletions(-)