Message ID | 20240123-vfs-bdev-file-v2-0-adbd023e19cc@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | Open block devices as files | expand |
Do you have a git tree for this series?
On Mon, Jan 29, 2024 at 07:17:07AM +0100, Christoph Hellwig wrote:
> Do you have a git tree for this series?
b4/vfs-bdev-file on vfs.git
On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote: > Hey Christoph, > Hey Jan, > Hey Jens, > > This opens block devices as files. Instead of introducing a separate > indirection into bdev_open_by_*() vis struct bdev_handle we can just > make bdev_file_open_by_*() return a struct file. Opening and closing a > block device from setup_bdev_super() and in all other places just > becomes equivalent to opening and closing a file. > > This has held up in xfstests and in blktests so far and it seems stable > and clean. The equivalence of opening and closing block devices to > regular files is a win in and of itself imho. Added to that is the > ability to do away with struct bdev_handle completely and make various > low-level helpers private to the block layer. > > All places were we currently stash a struct bdev_handle we just stash a > file and use an accessor such as file_bdev() akin to I_BDEV() to get to > the block device. > > It's now also possible to use file->f_mapping as a replacement for > bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as > an alternative to bdev->bd_inode allowing us to significantly reduce or > even fully remove bdev->bd_inode in follow-up patches. > > In addition, we could get rid of sb->s_bdev and various other places > that stash the block device directly and instead stash the block device > file. Again, this is follow-up work. > > Thanks! > Christian > > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- With all fixes applied I've moved this into vfs.super on vfs/vfs.git so this gets some exposure in -next asap. Please let me know if you have quarrels with that.
Hi! On Mon 05-02-24 12:55:18, Christian Brauner wrote: > On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote: > > Hey Christoph, > > Hey Jan, > > Hey Jens, > > > > This opens block devices as files. Instead of introducing a separate > > indirection into bdev_open_by_*() vis struct bdev_handle we can just > > make bdev_file_open_by_*() return a struct file. Opening and closing a > > block device from setup_bdev_super() and in all other places just > > becomes equivalent to opening and closing a file. > > > > This has held up in xfstests and in blktests so far and it seems stable > > and clean. The equivalence of opening and closing block devices to > > regular files is a win in and of itself imho. Added to that is the > > ability to do away with struct bdev_handle completely and make various > > low-level helpers private to the block layer. > > > > All places were we currently stash a struct bdev_handle we just stash a > > file and use an accessor such as file_bdev() akin to I_BDEV() to get to > > the block device. > > > > It's now also possible to use file->f_mapping as a replacement for > > bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as > > an alternative to bdev->bd_inode allowing us to significantly reduce or > > even fully remove bdev->bd_inode in follow-up patches. > > > > In addition, we could get rid of sb->s_bdev and various other places > > that stash the block device directly and instead stash the block device > > file. Again, this is follow-up work. > > > > Thanks! > > Christian > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > --- > > With all fixes applied I've moved this into vfs.super on vfs/vfs.git so > this gets some exposure in -next asap. Please let me know if you have > quarrels with that. No quarrels really. I went through the patches and all of them look fine to me to feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> I have just noticed that in "bdev: make struct bdev_handle private to the block layer" in bdev_open() we are still leaking the handle in case of error. This is however temporary (until the end of the series when we get rid of handles altogether) so whatever. Honza
On Mon, Feb 05, 2024 at 03:19:11PM +0100, Jan Kara wrote: > Hi! > > On Mon 05-02-24 12:55:18, Christian Brauner wrote: > > On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote: > > > Hey Christoph, > > > Hey Jan, > > > Hey Jens, > > > > > > This opens block devices as files. Instead of introducing a separate > > > indirection into bdev_open_by_*() vis struct bdev_handle we can just > > > make bdev_file_open_by_*() return a struct file. Opening and closing a > > > block device from setup_bdev_super() and in all other places just > > > becomes equivalent to opening and closing a file. > > > > > > This has held up in xfstests and in blktests so far and it seems stable > > > and clean. The equivalence of opening and closing block devices to > > > regular files is a win in and of itself imho. Added to that is the > > > ability to do away with struct bdev_handle completely and make various > > > low-level helpers private to the block layer. > > > > > > All places were we currently stash a struct bdev_handle we just stash a > > > file and use an accessor such as file_bdev() akin to I_BDEV() to get to > > > the block device. > > > > > > It's now also possible to use file->f_mapping as a replacement for > > > bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as > > > an alternative to bdev->bd_inode allowing us to significantly reduce or > > > even fully remove bdev->bd_inode in follow-up patches. > > > > > > In addition, we could get rid of sb->s_bdev and various other places > > > that stash the block device directly and instead stash the block device > > > file. Again, this is follow-up work. > > > > > > Thanks! > > > Christian > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > --- > > > > With all fixes applied I've moved this into vfs.super on vfs/vfs.git so > > this gets some exposure in -next asap. Please let me know if you have > > quarrels with that. > > No quarrels really. I went through the patches and all of them look fine to > me to feel free to add: > > Reviewed-by: Jan Kara <jack@suse.cz> > > I have just noticed that in "bdev: make struct bdev_handle private to the > block layer" in bdev_open() we are still leaking the handle in case of > error. This is however temporary (until the end of the series when we get > rid of handles altogether) so whatever. Can you double-check what's in vfs.super right now? I thought I fixed this up. I'll check too!
On Tue 06-02-24 14:39:13, Christian Brauner wrote: > On Mon, Feb 05, 2024 at 03:19:11PM +0100, Jan Kara wrote: > > Hi! > > > > On Mon 05-02-24 12:55:18, Christian Brauner wrote: > > > On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote: > > > > Hey Christoph, > > > > Hey Jan, > > > > Hey Jens, > > > > > > > > This opens block devices as files. Instead of introducing a separate > > > > indirection into bdev_open_by_*() vis struct bdev_handle we can just > > > > make bdev_file_open_by_*() return a struct file. Opening and closing a > > > > block device from setup_bdev_super() and in all other places just > > > > becomes equivalent to opening and closing a file. > > > > > > > > This has held up in xfstests and in blktests so far and it seems stable > > > > and clean. The equivalence of opening and closing block devices to > > > > regular files is a win in and of itself imho. Added to that is the > > > > ability to do away with struct bdev_handle completely and make various > > > > low-level helpers private to the block layer. > > > > > > > > All places were we currently stash a struct bdev_handle we just stash a > > > > file and use an accessor such as file_bdev() akin to I_BDEV() to get to > > > > the block device. > > > > > > > > It's now also possible to use file->f_mapping as a replacement for > > > > bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as > > > > an alternative to bdev->bd_inode allowing us to significantly reduce or > > > > even fully remove bdev->bd_inode in follow-up patches. > > > > > > > > In addition, we could get rid of sb->s_bdev and various other places > > > > that stash the block device directly and instead stash the block device > > > > file. Again, this is follow-up work. > > > > > > > > Thanks! > > > > Christian > > > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > --- > > > > > > With all fixes applied I've moved this into vfs.super on vfs/vfs.git so > > > this gets some exposure in -next asap. Please let me know if you have > > > quarrels with that. > > > > No quarrels really. I went through the patches and all of them look fine to > > me to feel free to add: > > > > Reviewed-by: Jan Kara <jack@suse.cz> > > > > I have just noticed that in "bdev: make struct bdev_handle private to the > > block layer" in bdev_open() we are still leaking the handle in case of > > error. This is however temporary (until the end of the series when we get > > rid of handles altogether) so whatever. > > Can you double-check what's in vfs.super right now? I thought I fixed > this up. I'll check too! Well, you've fixed the "double allocation" issue but there's still a problem that you do: int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder, const struct blk_holder_ops *hops, struct file *bdev_file) { ... handle = kmalloc(sizeof(struct bdev_handle), GFP_KERNEL); if (!handle) return -ENOMEM; if (holder) { mode |= BLK_OPEN_EXCL; ret = bd_prepare_to_claim(bdev, holder, hops); if (ret) return ret; } else { ... So in case bd_prepare_to_claim() fails we forget to free the allocated handle. Honza
> > Can you double-check what's in vfs.super right now? I thought I fixed > > this up. I'll check too! > > Well, you've fixed the "double allocation" issue but there's still a > problem that you do: > > int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder, > const struct blk_holder_ops *hops, struct file *bdev_file) > { > ... > handle = kmalloc(sizeof(struct bdev_handle), GFP_KERNEL); > if (!handle) > return -ENOMEM; > if (holder) { > mode |= BLK_OPEN_EXCL; > ret = bd_prepare_to_claim(bdev, holder, hops); > if (ret) > return ret; > } else { > ... > > > So in case bd_prepare_to_claim() fails we forget to free the allocated > handle. Grumble grumble grumble, thank you! Fixing.
On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote: > This opens block devices as files. Instead of introducing a separate > indirection into bdev_open_by_*() vis struct bdev_handle we can just > make bdev_file_open_by_*() return a struct file. Opening and closing a > block device from setup_bdev_super() and in all other places just > becomes equivalent to opening and closing a file. > > This has held up in xfstests and in blktests so far and it seems stable > and clean. The equivalence of opening and closing block devices to > regular files is a win in and of itself imho. Added to that is the > ability to do away with struct bdev_handle completely and make various > low-level helpers private to the block layer. It fails to hold up in xfstests for me. git bisect leads to: commit 321de651fa565dcf76c017b257bdf15ec7fff45d Author: Christian Brauner <brauner@kernel.org> Date: Tue Jan 23 14:26:48 2024 +0100 block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access QA output created by 015 mkfs failed (see /ktest-out/xfstests/generic/015.full for details) umount: /dev/vdc: not mounted. ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 ** ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 ** mkfs.xfs: cannot open /dev/vdc: Device or resource busy mkfs failed About half the xfstests fail this way (722 of 1387 tests)
On Thu, Mar 21, 2024 at 10:17:55PM +0000, Matthew Wilcox wrote: > On Tue, Jan 23, 2024 at 02:26:17PM +0100, Christian Brauner wrote: > > This opens block devices as files. Instead of introducing a separate > > indirection into bdev_open_by_*() vis struct bdev_handle we can just > > make bdev_file_open_by_*() return a struct file. Opening and closing a > > block device from setup_bdev_super() and in all other places just > > becomes equivalent to opening and closing a file. > > > > This has held up in xfstests and in blktests so far and it seems stable > > and clean. The equivalence of opening and closing block devices to > > regular files is a win in and of itself imho. Added to that is the > > ability to do away with struct bdev_handle completely and make various > > low-level helpers private to the block layer. > > It fails to hold up in xfstests for me. > > git bisect leads to: > > commit 321de651fa565dcf76c017b257bdf15ec7fff45d > Author: Christian Brauner <brauner@kernel.org> > Date: Tue Jan 23 14:26:48 2024 +0100 > > block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access > > QA output created by 015 > mkfs failed > (see /ktest-out/xfstests/generic/015.full for details) > umount: /dev/vdc: not mounted. > > ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 ** > ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 ** > mkfs.xfs: cannot open /dev/vdc: Device or resource busy > mkfs failed > > About half the xfstests fail this way (722 of 1387 tests) Christain, let's chat about testing at LSF - I was looking at this too because we thought it was a ktest update that broke at first, but if we can get you using the automated test infrastructure I built this could get caught before hitting -next
> ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 ** > ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 ** > mkfs.xfs: cannot open /dev/vdc: Device or resource busy > mkfs failed > > About half the xfstests fail this way (722 of 1387 tests) Thanks for the report. Can you please show me the kernel config and the xfstests config that was used for this?
On Fri, Mar 22, 2024 at 01:31:23PM +0100, Christian Brauner wrote: > > ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 ** > > ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 ** > > mkfs.xfs: cannot open /dev/vdc: Device or resource busy > > mkfs failed > > > > About half the xfstests fail this way (722 of 1387 tests) > > Thanks for the report. Can you please show me the kernel config and the > xfstests config that was used for this? Kernel config attached. I'll have to defer to Kent on the xfstests config that's used. It might be this: cat << EOF > /ktest/tests/xfstests/local.config TEST_DEV=${ktest_scratch_dev[0]} TEST_DIR=$TEST_DIR SCRATCH_DEV=${ktest_scratch_dev[1]} SCRATCH_MNT=/mnt/scratch LOGWRITES_DEV=${ktest_scratch_dev[2]} RESULT_BASE=/ktest-out/xfstests LOGGER_PROG=true EOF Also, while generic/015 is the first to fail, you can't just run generic/015. You can't even just run 012, 013, 014, 015. I haven't bisected to exactly how many predecessor tests are necessary to get a failure.
On Fri, Mar 22, 2024 at 12:40:03PM +0000, Matthew Wilcox wrote: > On Fri, Mar 22, 2024 at 01:31:23PM +0100, Christian Brauner wrote: > > > ** mkfs failed with extra mkfs options added to "-m reflink=1,rmapbt=1 -i sparse=1,nrext64=1" by test 015 ** > > > ** attempting to mkfs using only test 015 options: -d size=268435456 -b size=4096 ** > > > mkfs.xfs: cannot open /dev/vdc: Device or resource busy > > > mkfs failed > > > > > > About half the xfstests fail this way (722 of 1387 tests) > > > > Thanks for the report. Can you please show me the kernel config and the > > xfstests config that was used for this? > > Kernel config attached. > > I'll have to defer to Kent on the xfstests config that's used. It might > be this: > > cat << EOF > /ktest/tests/xfstests/local.config > TEST_DEV=${ktest_scratch_dev[0]} > TEST_DIR=$TEST_DIR > SCRATCH_DEV=${ktest_scratch_dev[1]} > SCRATCH_MNT=/mnt/scratch > LOGWRITES_DEV=${ktest_scratch_dev[2]} > RESULT_BASE=/ktest-out/xfstests > LOGGER_PROG=true > EOF > > > Also, while generic/015 is the first to fail, you can't just run > generic/015. You can't even just run 012, 013, 014, 015. I haven't > bisected to exactly how many predecessor tests are necessary to get > a failure. Thanks for the info. So it's as I suspected. The config that was used has # CONFIG_BLK_DEV_WRITE_MOUNTED is not set That means it isn't possible to write to mounted block devices. The default is CONFIG_BLK_DEV_WRITE_MOUNTED=y. I go through all block-based filesystems with xfstests for such changes wit various config options. My test matrix hasn't been updated to specifically unset CONFIG_BLK_DEV_WRITE_MOUNTED which is why this escaped. I'll send a fix shortly.
> Christain, let's chat about testing at LSF - I was looking at this too > because we thought it was a ktest update that broke at first, but if we > can get you using the automated test infrastructure I built this could > get caught before hitting -next I already do automated testing. This specific error depends on a new config option CONFIG_BLK_DEV_WRITE_MOUNTED which wasn't reflected in my test matrix. I've fixed that now. But I'm always happy to have the tree integrated with other automated testing as well!
Hey Christoph, Hey Jan, Hey Jens, This opens block devices as files. Instead of introducing a separate indirection into bdev_open_by_*() vis struct bdev_handle we can just make bdev_file_open_by_*() return a struct file. Opening and closing a block device from setup_bdev_super() and in all other places just becomes equivalent to opening and closing a file. This has held up in xfstests and in blktests so far and it seems stable and clean. The equivalence of opening and closing block devices to regular files is a win in and of itself imho. Added to that is the ability to do away with struct bdev_handle completely and make various low-level helpers private to the block layer. All places were we currently stash a struct bdev_handle we just stash a file and use an accessor such as file_bdev() akin to I_BDEV() to get to the block device. It's now also possible to use file->f_mapping as a replacement for bdev->bd_inode->i_mapping and file->f_inode or file->f_mapping->host as an alternative to bdev->bd_inode allowing us to significantly reduce or even fully remove bdev->bd_inode in follow-up patches. In addition, we could get rid of sb->s_bdev and various other places that stash the block device directly and instead stash the block device file. Again, this is follow-up work. Thanks! Christian Signed-off-by: Christian Brauner <brauner@kernel.org> --- Changes in v2: - This is not an RFC anymore. - The patches to convert all of fs/buffer.c and associated helpers to struct file have been split out of the core infrastructure. - Various renaming of helpers in response to v1. - Link to v1: https://lore.kernel.org/r/20240103-vfs-bdev-file-v1-0-6c8ee55fb6ef@kernel.org --- Christian Brauner (34): bdev: open block device as files block/ioctl: port blkdev_bszset() to file block/genhd: port disk_scan_partitions() to file md: port block device access to file swap: port block device usage to file power: port block device access to file xfs: port block device access to files drbd: port block device access to file pktcdvd: port block device access to file rnbd: port block device access to file xen: port block device access to file zram: port block device access to file bcache: port block device access to files block2mtd: port device access to files nvme: port block device access to file s390: port block device access to file target: port block device access to file bcachefs: port block device access to file btrfs: port device access to file erofs: port device access to file ext4: port block device access to file f2fs: port block device access to files jfs: port block device access to file nfs: port block device access to files ocfs2: port block device access to file reiserfs: port block device access to file bdev: remove bdev_open_by_path() bdev: make bdev_release() private to block layer bdev: make struct bdev_handle private to the block layer bdev: remove bdev pointer from struct bdev_handle block: use file->f_op to indicate restricted writes block: remove bdev_handle completely block: expose bdev_file_inode() ext4: rely on sb->f_bdev only block/bdev.c | 249 ++++++++++++++++++++++-------------- block/blk.h | 6 + block/fops.c | 48 ++++--- block/genhd.c | 12 +- block/ioctl.c | 9 +- drivers/block/drbd/drbd_int.h | 4 +- drivers/block/drbd/drbd_nl.c | 58 ++++----- drivers/block/pktcdvd.c | 68 +++++----- drivers/block/rnbd/rnbd-srv.c | 28 ++-- drivers/block/rnbd/rnbd-srv.h | 2 +- drivers/block/xen-blkback/blkback.c | 4 +- drivers/block/xen-blkback/common.h | 4 +- drivers/block/xen-blkback/xenbus.c | 37 +++--- drivers/block/zram/zram_drv.c | 26 ++-- drivers/block/zram/zram_drv.h | 2 +- drivers/md/bcache/bcache.h | 4 +- drivers/md/bcache/super.c | 74 +++++------ drivers/md/dm.c | 23 ++-- drivers/md/md.c | 12 +- drivers/md/md.h | 2 +- drivers/mtd/devices/block2mtd.c | 46 +++---- drivers/nvme/target/io-cmd-bdev.c | 16 +-- drivers/nvme/target/nvmet.h | 2 +- drivers/s390/block/dasd.c | 10 +- drivers/s390/block/dasd_genhd.c | 36 +++--- drivers/s390/block/dasd_int.h | 2 +- drivers/s390/block/dasd_ioctl.c | 2 +- drivers/target/target_core_iblock.c | 18 +-- drivers/target/target_core_iblock.h | 2 +- drivers/target/target_core_pscsi.c | 22 ++-- drivers/target/target_core_pscsi.h | 2 +- fs/bcachefs/super-io.c | 20 +-- fs/bcachefs/super_types.h | 2 +- fs/btrfs/dev-replace.c | 14 +- fs/btrfs/ioctl.c | 16 +-- fs/btrfs/volumes.c | 92 ++++++------- fs/btrfs/volumes.h | 4 +- fs/cramfs/inode.c | 2 +- fs/erofs/data.c | 6 +- fs/erofs/internal.h | 2 +- fs/erofs/super.c | 16 +-- fs/ext4/dir.c | 2 +- fs/ext4/ext4.h | 2 +- fs/ext4/ext4_jbd2.c | 2 +- fs/ext4/fast_commit.c | 2 +- fs/ext4/fsmap.c | 8 +- fs/ext4/super.c | 87 ++++++------- fs/f2fs/f2fs.h | 2 +- fs/f2fs/super.c | 12 +- fs/file_table.c | 36 +++++- fs/jfs/jfs_logmgr.c | 26 ++-- fs/jfs/jfs_logmgr.h | 2 +- fs/jfs/jfs_mount.c | 2 +- fs/nfs/blocklayout/blocklayout.h | 2 +- fs/nfs/blocklayout/dev.c | 68 +++++----- fs/ocfs2/cluster/heartbeat.c | 32 ++--- fs/reiserfs/journal.c | 38 +++--- fs/reiserfs/procfs.c | 2 +- fs/reiserfs/reiserfs.h | 8 +- fs/romfs/super.c | 2 +- fs/super.c | 18 +-- fs/xfs/xfs_buf.c | 10 +- fs/xfs/xfs_buf.h | 4 +- fs/xfs/xfs_super.c | 43 +++---- include/linux/blkdev.h | 18 +-- include/linux/device-mapper.h | 2 +- include/linux/file.h | 2 + include/linux/fs.h | 4 +- include/linux/pktcdvd.h | 4 +- include/linux/swap.h | 2 +- kernel/power/swap.c | 28 ++-- mm/swapfile.c | 22 ++-- 72 files changed, 791 insertions(+), 705 deletions(-) --- base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d change-id: 20240103-vfs-bdev-file-1208da73d7ea