
[RFC,00/34] Open block devices as files & a bd_inode proposal

Message ID: 20240103-vfs-bdev-file-v1-0-6c8ee55fb6ef@kernel.org
Series: Open block devices as files & a bd_inode proposal

Message

Christian Brauner Jan. 3, 2024, 12:54 p.m. UTC
Hey Christoph,
Hey Jan,
Hey Jens,

I've been toying with this idea, essentially in between changing diapers,
and I've taken it far enough that I'd like some general input before
going further and massaging out any corner cases I might have missed.

I wanted to see whether we can make struct bdev_handle completely
private to the block layer in the next cycle and unexport low-level
helpers such as bdev_release() - formerly blkdev_put() - completely.

And afaict, we can actually get that to work. Simply put, instead of
doing this bdev_open_by_*() dance where we return a struct block_device,
we can just make bdev_file_open_by_*() return a struct file. Opening and
closing a block device from setup_bdev_super() and in all other places
then just becomes equivalent to opening and closing a file.
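To make the open/close symmetry concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions and not code from this series. struct block_device and struct file are cut-down stand-ins. model_bdev_file_open() and model_fput() are hypothetical names that stand in for a bdev_file_open_by_*() style helper and for fput().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct block_device {
	int bd_openers;			/* number of active opens */
};

struct file {
	struct block_device *bdev;	/* model shortcut; the kernel would derive it */
	unsigned int f_mode;		/* open mode bits */
	void *private_data;		/* holder, if any */
};

/* Hypothetical model of a bdev_file_open_by_*() style helper: opening a
 * block device hands back a struct file rather than a bdev_handle. */
static struct file *model_bdev_file_open(struct block_device *bdev,
					 unsigned int mode, void *holder)
{
	struct file *file = calloc(1, sizeof(*file));

	if (!file)
		return NULL;
	file->bdev = bdev;
	file->f_mode = mode;
	file->private_data = holder;
	bdev->bd_openers++;
	return file;
}

/* Hypothetical model of fput(): closing the file is the release path, so
 * callers never call a low-level bdev_release() themselves. */
static void model_fput(struct file *file)
{
	assert(file->bdev->bd_openers > 0);
	file->bdev->bd_openers--;
	free(file);
}
```

In this model the caller holds only a struct file. Releasing the device is the same call as closing any other file.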

This has held up in xfstests and in blktests so far, and it seems stable
and clean. Making the opening and closing of block devices equivalent to
the opening and closing of regular files is a win in and of itself imho.
Added to that is the ability to hide away all of the details of struct
bdev_handle and various other low-level helpers.

So for that reason alone I think we should do it. In all places where we
currently stash a bdev_handle, we just stash a file instead and use
accessors such as F_BDEV(), akin to I_BDEV(), to get to the block device.
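The F_BDEV() idea can be modelled in userspace C. This sketch makes two assumptions. First, the bdev inode is embedded next to its block_device in one allocation, as struct bdev_inode does in block/bdev.c. Second, a block device file's f_mapping->host is that inode. The structs are trimmed stand-ins. This F_BDEV() is a hypothetical illustration and not necessarily how the patches define it.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed stand-ins for the kernel structures. */
struct block_device { int bd_dev; };
struct inode { int i_ino; };
struct address_space { struct inode *host; };
struct file { struct address_space *f_mapping; };

/* The block_device and its inode share one allocation (modelled on bdev_inode). */
struct bdev_inode {
	struct block_device bdev;
	struct inode vfs_inode;
};

/* Like I_BDEV(): map the bdev inode back to its block_device. */
static inline struct block_device *I_BDEV(struct inode *inode)
{
	return &container_of(inode, struct bdev_inode, vfs_inode)->bdev;
}

/* Hypothetical F_BDEV(): the same lookup, starting from an opened file. */
static inline struct block_device *F_BDEV(struct file *file)
{
	assert(file->f_mapping && file->f_mapping->host);
	return I_BDEV(file->f_mapping->host);
}
```

Because the file already leads to the bdev inode, the accessor needs no extra state stored alongside the file.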

While I was doing that, I realized that this is also a way for us to get
rid of bd_inode in fs/buffer.c, though I don't think that's a requirement
for this change to be worth it.

Basically, we simply record a struct file for the block device in struct
buffer_head and in struct iomap. That works without a problem afaict.
All filesystems will have a struct file handle to the block device, so we
can trivially get access to it in nearly all places. The only exception
is the block/fops.c layer itself, where we obviously don't have a
struct file for the inode. So if we can solve that problem, we can kill
bd_inode access and simply rely on file->f_mapping->host there as well.
IOW, just export and use bdev_file_inode() everywhere in fs/buffer.c.
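Here is a sketch of the fs/buffer.c direction. It assumes that struct buffer_head gains a pointer to the block device file, and the b_bdev_file field name is hypothetical. bdev_file_inode() here mirrors the helper of that name in block/fops.c, which returns file->f_mapping->host. bh_bdev_inode() is a hypothetical helper, and all structs are simplified userspace stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-ins for the kernel structures. */
struct inode { long i_size; };
struct address_space { struct inode *host; };
struct file { struct address_space *f_mapping; };

/* Hypothetical buffer_head that records the block device file
 * (field name assumed for illustration). */
struct buffer_head {
	struct file *b_bdev_file;
	unsigned long long b_blocknr;
};

/* Mirrors block/fops.c: the inode behind a block device file is the
 * host of its page cache mapping. */
static inline struct inode *bdev_file_inode(struct file *file)
{
	return file->f_mapping->host;
}

/* Hypothetical buffer-layer helper: reach the bdev inode without bd_inode. */
static inline struct inode *bh_bdev_inode(const struct buffer_head *bh)
{
	assert(bh->b_bdev_file != NULL);
	return bdev_file_inode(bh->b_bdev_file);
}
```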

I've only roughly drafted that bd_inode removal in fs/buffer.c. I think
it would work, but I'd like to hear your thoughts on it. Again, I don't
think it's a requirement for this change to be worth it.

The patch series is barebones with really tiny commit messages because
I'd like to get early input. The core patches are:

bdev: open block device as files

In that patch, the order of allocating a file and opening a bdev
handle is still reversed. That is fully cleaned up once all users of
bdev_handle have been ported to rely on files. So the final form is:

bdev: rework bdev_open_by_dev()

and I think that looks fairly nice.

I've added a few additional illustrative patches for future work on
top:

* port ext4 to only rely on sb->s_f_bdev instead of sb->s_bdev
* port ext4 to never touch bdev->bd_inode and just rely on bdev_file_inode()
* remove bdev->bd_inode access from fs/buffer.c and just rely on bdev_file_inode()

I haven't thought about potential corner cases too much yet, but the
file stuff should actually be doable.

Thanks!
Christian

---
Christian Brauner (34):
      bdev: open block device as files
      block/ioctl: port blkdev_bszset() to file
      block/genhd: port disk_scan_partitions() to file
      md: port block device access to file
      swap: port block device usage to file
      power: port block device access to file
      xfs: port block device access to files
      drbd: port block device access to file
      pktcdvd: port block device access to file
      rnbd: port block device access to file
      xen: port block device access to file
      zram: port block device access to file
      bcache: port block device access to files
      block2mtd: port device access to files
      nvme: port block device access to file
      s390: port block device access to file
      target: port block device access to file
      bcachefs: port block device access to file
      btrfs: port device access to file
      erofs: port device access to file
      ext4: port block device access to file
      f2fs: port block device access to files
      jfs: port block device access to file
      nfs: port block device access to files
      ocfs2: port block device access to file
      reiserfs: port block device access to file
      bdev: remove bdev_open_by_path()
      bdev: make bdev_release() private to block layer
      bdev: make struct bdev_handle private to the block layer
      bdev: rework bdev_open_by_dev()
      ext4: rely on sb->f_bdev only
      block: expose bdev_file_inode()
      ext4: use bdev_file_inode()
      [DRAFT] buffer: port block device access to files and get rid of bd_inode access

 block/bdev.c                        | 220 +++++++++++++++++++++++-------------
 block/blk.h                         |  10 ++
 block/fops.c                        |  40 +++----
 block/genhd.c                       |  12 +-
 block/ioctl.c                       |   9 +-
 drivers/block/drbd/drbd_int.h       |   4 +-
 drivers/block/drbd/drbd_nl.c        |  58 +++++-----
 drivers/block/pktcdvd.c             |  68 +++++------
 drivers/block/rnbd/rnbd-srv.c       |  26 ++---
 drivers/block/rnbd/rnbd-srv.h       |   2 +-
 drivers/block/xen-blkback/blkback.c |   4 +-
 drivers/block/xen-blkback/common.h  |   4 +-
 drivers/block/xen-blkback/xenbus.c  |  36 +++---
 drivers/block/zram/zram_drv.c       |  26 ++---
 drivers/block/zram/zram_drv.h       |   2 +-
 drivers/md/bcache/bcache.h          |   4 +-
 drivers/md/bcache/super.c           |  74 ++++++------
 drivers/md/dm.c                     |  23 ++--
 drivers/md/md-bitmap.c              |   1 +
 drivers/md/md.c                     |  12 +-
 drivers/md/md.h                     |   2 +-
 drivers/mtd/devices/block2mtd.c     |  42 +++----
 drivers/nvme/target/io-cmd-bdev.c   |  16 +--
 drivers/nvme/target/nvmet.h         |   2 +-
 drivers/s390/block/dasd.c           |  10 +-
 drivers/s390/block/dasd_genhd.c     |  36 +++---
 drivers/s390/block/dasd_int.h       |   2 +-
 drivers/s390/block/dasd_ioctl.c     |   2 +-
 drivers/target/target_core_iblock.c |  18 +--
 drivers/target/target_core_iblock.h |   2 +-
 drivers/target/target_core_pscsi.c  |  22 ++--
 drivers/target/target_core_pscsi.h  |   2 +-
 fs/affs/file.c                      |   1 +
 fs/bcachefs/super-io.c              |  20 ++--
 fs/bcachefs/super_types.h           |   2 +-
 fs/btrfs/dev-replace.c              |  14 +--
 fs/btrfs/inode.c                    |   1 +
 fs/btrfs/ioctl.c                    |  16 +--
 fs/btrfs/volumes.c                  |  92 +++++++--------
 fs/btrfs/volumes.h                  |   4 +-
 fs/buffer.c                         |  69 +++++------
 fs/cramfs/inode.c                   |   2 +-
 fs/direct-io.c                      |   2 +-
 fs/erofs/data.c                     |  13 ++-
 fs/erofs/internal.h                 |   3 +-
 fs/erofs/super.c                    |  16 +--
 fs/erofs/zmap.c                     |   1 +
 fs/ext2/inode.c                     |   8 +-
 fs/ext4/dir.c                       |   2 +-
 fs/ext4/ext4.h                      |   2 +-
 fs/ext4/ext4_jbd2.c                 |   2 +-
 fs/ext4/fast_commit.c               |   2 +-
 fs/ext4/fsmap.c                     |   8 +-
 fs/ext4/inode.c                     |   6 +-
 fs/ext4/super.c                     |  88 +++++++--------
 fs/f2fs/data.c                      |   6 +-
 fs/f2fs/f2fs.h                      |   3 +-
 fs/f2fs/super.c                     |  12 +-
 fs/fuse/dax.c                       |   1 +
 fs/gfs2/aops.c                      |   1 +
 fs/gfs2/bmap.c                      |   1 +
 fs/hpfs/file.c                      |   1 +
 fs/jbd2/commit.c                    |   1 +
 fs/jbd2/journal.c                   |  26 +++--
 fs/jbd2/recovery.c                  |   6 +-
 fs/jbd2/revoke.c                    |  10 +-
 fs/jbd2/transaction.c               |   1 +
 fs/jfs/jfs_logmgr.c                 |  26 ++---
 fs/jfs/jfs_logmgr.h                 |   2 +-
 fs/jfs/jfs_mount.c                  |   2 +-
 fs/mpage.c                          |   5 +-
 fs/nfs/blocklayout/blocklayout.h    |   2 +-
 fs/nfs/blocklayout/dev.c            |  68 +++++------
 fs/nilfs2/btnode.c                  |   2 +
 fs/nilfs2/gcinode.c                 |   1 +
 fs/nilfs2/mdt.c                     |   1 +
 fs/nilfs2/page.c                    |   2 +
 fs/nilfs2/recovery.c                |  20 ++--
 fs/nilfs2/the_nilfs.c               |   1 +
 fs/ntfs/aops.c                      |   3 +
 fs/ntfs/file.c                      |   1 +
 fs/ntfs/mft.c                       |   2 +
 fs/ntfs3/fsntfs.c                   |   8 +-
 fs/ntfs3/inode.c                    |   1 +
 fs/ntfs3/super.c                    |   2 +-
 fs/ocfs2/cluster/heartbeat.c        |  32 +++---
 fs/ocfs2/journal.c                  |   2 +-
 fs/reiserfs/journal.c               |  44 ++++----
 fs/reiserfs/procfs.c                |   2 +-
 fs/reiserfs/reiserfs.h              |   8 +-
 fs/reiserfs/tail_conversion.c       |   1 +
 fs/romfs/super.c                    |   2 +-
 fs/super.c                          |  18 +--
 fs/xfs/xfs_buf.c                    |  10 +-
 fs/xfs/xfs_buf.h                    |   4 +-
 fs/xfs/xfs_iomap.c                  |   7 +-
 fs/xfs/xfs_super.c                  |  43 ++++---
 fs/zonefs/file.c                    |   2 +
 include/linux/blkdev.h              |  18 +--
 include/linux/buffer_head.h         |  45 ++++----
 include/linux/device-mapper.h       |   2 +-
 include/linux/fs.h                  |   4 +-
 include/linux/iomap.h               |   1 +
 include/linux/jbd2.h                |   6 +-
 include/linux/pktcdvd.h             |   4 +-
 include/linux/swap.h                |   2 +-
 kernel/power/swap.c                 |  28 ++---
 mm/swapfile.c                       |  22 ++--
 108 files changed, 908 insertions(+), 782 deletions(-)
---
base-commit: aee755dd02191d5669860f38e28ec93d8f0a4e70
change-id: 20240103-vfs-bdev-file-1208da73d7ea

Comments

Christoph Hellwig Jan. 8, 2024, 4:26 p.m. UTC | #1
On Wed, Jan 03, 2024 at 01:54:58PM +0100, Christian Brauner wrote:
> I wanted to see whether we can make struct bdev_handle completely
> private to the block layer in the next cycle and unexport low-level
> helpers such as bdev_release() - formerly blkdev_put() - completely.

I think we can actually kill bdev_handle entirely.  We can get the
bdev from the bdev inode using I_BDEV already, so no need to store
the bdev.  We don't need the mode field as we know an exclusive
open is equivalent to having a holder.  So just store the holder in
file->private_data and the bdev_handle can be removed again.
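Christoph's suggestion can be sketched in userspace C with a trimmed struct file. The holder lives in file->private_data, and exclusivity is derived from it instead of being stored separately. The block_device itself would come from I_BDEV() on the file's inode, which this sketch leaves out. The model_* helpers are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed stand-in: only the field this sketch needs. */
struct file {
	void *private_data;	/* holder of an exclusive open, or NULL */
};

/* Hypothetical: at open time, remember the holder in the file itself. */
static inline void model_bdev_file_set_holder(struct file *file, void *holder)
{
	file->private_data = holder;
}

/* An exclusive open is exactly an open with a holder, so no separate
 * mode field is needed to answer this. */
static inline bool model_bdev_file_is_exclusive(const struct file *file)
{
	return file->private_data != NULL;
}

/* Only meaningful for exclusive opens. */
static inline void *model_bdev_file_holder(const struct file *file)
{
	assert(model_bdev_file_is_exclusive(file));
	return file->private_data;
}
```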
Jan Kara Jan. 9, 2024, 8:46 a.m. UTC | #2
On Mon 08-01-24 17:26:41, Christoph Hellwig wrote:
> On Wed, Jan 03, 2024 at 01:54:58PM +0100, Christian Brauner wrote:
> > I wanted to see whether we can make struct bdev_handle completely
> > private to the block layer in the next cycle and unexport low-level
> > helpers such as bdev_release() - formerly blkdev_put() - completely.
> 
> I think we can actually kill bdev_handle entirely.  We can get the
> bdev from the bdev inode using I_BDEV already, so no need to store
> the bdev.  We don't need the mode field as we know an exclusive
> open is equivalent to having a holder.  So just store the holder in
> file->private_data and the bdev_handle can be removed again.

Well, we also need the read-write mode of the handle in some places but that
could be stored in file->f_mode (not sure if it really gets stored there
in this patch set - still need to read the details) so in principle I agree
that bdev_handle should not be necessary.
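Here is a minimal sketch of carrying the read-write mode in file->f_mode instead of in a bdev_handle. The bit values are local stand-ins for BLK_OPEN_READ/BLK_OPEN_WRITE and FMODE_READ/FMODE_WRITE. model_blk_mode_to_fmode() and model_bdev_file_writable() are hypothetical helpers and do not come from the patch set.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; this model does not depend on the kernel's exact values. */
#define BLK_OPEN_READ	(1u << 0)
#define BLK_OPEN_WRITE	(1u << 1)
#define FMODE_READ	0x1u
#define FMODE_WRITE	0x2u

struct file { unsigned int f_mode; };

/* Hypothetical: translate the open mode once, at open time, into f_mode. */
static inline unsigned int model_blk_mode_to_fmode(unsigned int blk_mode)
{
	unsigned int f_mode = 0;

	/* This model requires at least one of read or write access. */
	assert(blk_mode & (BLK_OPEN_READ | BLK_OPEN_WRITE));
	if (blk_mode & BLK_OPEN_READ)
		f_mode |= FMODE_READ;
	if (blk_mode & BLK_OPEN_WRITE)
		f_mode |= FMODE_WRITE;
	return f_mode;
}

/* Later checks read the mode back from the file instead of a bdev_handle. */
static inline bool model_bdev_file_writable(const struct file *file)
{
	return (file->f_mode & FMODE_WRITE) != 0;
}
```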

								Honza
Christian Brauner Jan. 15, 2024, 2:24 p.m. UTC | #3
On Tue, Jan 09, 2024 at 09:46:27AM +0100, Jan Kara wrote:
> On Mon 08-01-24 17:26:41, Christoph Hellwig wrote:
> > On Wed, Jan 03, 2024 at 01:54:58PM +0100, Christian Brauner wrote:
> > > I wanted to see whether we can make struct bdev_handle completely
> > > private to the block layer in the next cycle and unexport low-level
> > > helpers such as bdev_release() - formerly blkdev_put() - completely.
> > 
> > I think we can actually kill bdev_handle entirely.  We can get the
> > bdev from the bdev inode using I_BDEV already, so no need to store
> > the bdev.  We don't need the mode field as we know an exclusive
> > open is equivalent to having a holder.  So just store the holder in
> > file->private_data and the bdev_handle can be removed again.
> 
> Well, we also need the read-write mode of the handle in some places but that
> could be stored in file->f_mode (not sure if it really gets stored there
> in this patch set - still need to read the details) so in principle I agree
> that bdev_handle should not be necessary.

So I think I've found a way to not even use a new fmode flag for this.
We can just use a separate set of file operations, def_blk_fops_restricted,
to detect when a block device was opened with restricted write access.
def_blk_fops isn't needed to check whether something is a block device;
IS_BLK() is enough for that. And def_blk_fops_restricted can be kept
completely local to block/.
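The file-operations trick can be illustrated with a small userspace model. It uses two tables that behave identically, and the table a file points to is the only record that the open restricted writes. The write hook, the struct layout, and the model_* helpers are simplified, hypothetical stand-ins. Only the names def_blk_fops and def_blk_fops_restricted come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct file;

/* Simplified stand-in for struct file_operations with a single hook. */
struct file_operations {
	long (*write)(struct file *file, const char *buf, size_t len);
};

struct file {
	const struct file_operations *f_op;
};

/* Hypothetical write hook shared by both tables: behaviour is identical. */
static long model_blkdev_write(struct file *file, const char *buf, size_t len)
{
	(void)file;
	assert(buf != NULL || len == 0);
	return (long)len;
}

static const struct file_operations def_blk_fops = {
	.write = model_blkdev_write,
};

/* Same hooks, different address: pointing at this table is the marker. */
static const struct file_operations def_blk_fops_restricted = {
	.write = model_blkdev_write,
};

/* Hypothetical open-time choice of table; no extra fmode flag is set. */
static inline void model_bdev_file_init(struct file *file, bool restrict_writes)
{
	file->f_op = restrict_writes ? &def_blk_fops_restricted : &def_blk_fops;
}

/* A pointer comparison answers "was this opened with restricted writes?". */
static inline bool model_bdev_writes_restricted(const struct file *file)
{
	return file->f_op == &def_blk_fops_restricted;
}
```

Since distinct objects have distinct addresses in C, the comparison is reliable even though both tables have the same contents. Keeping both tables static also matches the point that def_blk_fops_restricted can stay local to block/.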
Jan Kara Jan. 17, 2024, 4:46 p.m. UTC | #4
On Wed 03-01-24 13:54:58, Christian Brauner wrote:
> I wanted to see whether we can make struct bdev_handle completely
> private to the block layer in the next cycle and unexport low-level
> helpers such as bdev_release() - formerly blkdev_put() - completely.
> 
> And afaict, we can actually get that to work. Simply put, instead of
> doing this bdev_open_by_*() dance where we return a struct block_device,
> we can just make bdev_file_open_by_*() return a struct file. Opening and
> closing a block device from setup_bdev_super() and in all other places
> just becomes equivalent to opening and closing a file.

So I've checked the patchset (not too carefully) and I like the
direction. I've commented on the few things I didn't quite understand or
like, but overall I like this.

								Honza