diff mbox series

[block/for-next,v2,01/16] block: add a new helper to get inode from block_device

Message ID 20231127062116.2355129-2-yukuai1@huaweicloud.com (mailing list archive)
State New, archived
Headers show
Series block: remove field 'bd_inode' from block_device | expand

Commit Message

Yu Kuai Nov. 27, 2023, 6:21 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

block_devcie is allocated from bdev_alloc() by bdev_alloc_inode(), and
currently block_device contains a pointer that point to the address of
inode, while such inode is allocated together:

bdev_alloc
 inode = new_inode()
  // inode is &bdev_inode->vfs_inode
 bdev = I_BDEV(inode)
  // bdev is &bdev_inode->bdev
 bdev->inode = inode

Add a new helper to get address of inode from bdev by add operation
instead of memory access, which is more efficiency.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c              |  5 -----
 include/linux/blk_types.h | 12 ++++++++++++
 2 files changed, 12 insertions(+), 5 deletions(-)

Comments

Christoph Hellwig Nov. 27, 2023, 7:21 a.m. UTC | #1
On Mon, Nov 27, 2023 at 02:21:01PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> block_devcie is allocated from bdev_alloc() by bdev_alloc_inode(), and
> currently block_device contains a pointer that point to the address of
> inode, while such inode is allocated together:

This is going the wrong way.  Nothing outside of core block layer code
should ever directly use the bdev inode.  We've been rather sloppy
and added a lot of direct reference to it, but they really need to
go away and be replaced with well defined high level operation on
struct block_device.  Once that is done we can remove the bd_inode
pointer, but replacing it with something that pokes even more deeply
into bdev internals is a bad idea.
Yu Kuai Nov. 27, 2023, 1:07 p.m. UTC | #2
Hi,

在 2023/11/27 15:21, Christoph Hellwig 写道:
> On Mon, Nov 27, 2023 at 02:21:01PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> block_devcie is allocated from bdev_alloc() by bdev_alloc_inode(), and
>> currently block_device contains a pointer that point to the address of
>> inode, while such inode is allocated together:
> 
> This is going the wrong way.  Nothing outside of core block layer code
> should ever directly use the bdev inode.  We've been rather sloppy
> and added a lot of direct reference to it, but they really need to
> go away and be replaced with well defined high level operation on
> struct block_device.  Once that is done we can remove the bd_inode
> pointer, but replacing it with something that pokes even more deeply
> into bdev internals is a bad idea.

Thanks for the advice, however, after collecting how other modules are
using bdev inode, I got two main questions:

1) Is't okay to add a new helper to pass in bdev for following apis?
If so, then almost all the fs and driver can avoid to access bd_inode
dirctly.

errseq_check(&bdev->bd_inode->i_mapping->wb_err, wb_err);
errseq_check_and_advance(&bdev->bd_inode->i_mapping->wb_err, &wb_err);
mapping_gfp_constraint(bdev->bd_inode->i_mapping, gfp);
i_size_read(bdev->bd_inode)
find_get_page(bdev->bd_inode->i_mapping, offset);
find_or_create_page(bdev->bd_inode->i_mapping, index, gfp);
read_cache_page_gfp(bdev->bd_inode->i_mapping, index, gfp);
invalidate_inode_pages2(bdev->bd_inode->i_mapping);
invalidate_inode_pages2_range(bdev->bd_inode->i_mapping, start, end);
read_mapping_folio(bdev->bd_inode->i_mapping, index, file);
read_mapping_page(bdev->bd_inode->i_mapping, index, file);
balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping)
file_ra_state_init(ra, bdev->bd_inode->i_mapping);
page_cache_sync_readahead(bdev->bd_inode->i_mapping, ra, file, index, 
req_count);
inode_to_bdi(bdev->bd_inode)

2) For the file fs/buffer.c, there are some special usage like
following that I don't think it's good to add a helper:

spin_lock(&bd_inode->i_mapping->private_lock);

Is't okay to move following apis from fs/buffer.c directly to
block/bdev.c?

__find_get_block
bdev_getblk

Thanks,
Kuai

> .
>
Christoph Hellwig Nov. 27, 2023, 4:32 p.m. UTC | #3
On Mon, Nov 27, 2023 at 09:07:22PM +0800, Yu Kuai wrote:
> 1) Is't okay to add a new helper to pass in bdev for following apis?


For some we already have them (e.g. bdev_nr_bytes to read the bdev)
size, for some we need to add them.  The big thing that seems to
stick out is page cache API, and I think that is where we need to
define maintainable APIs for file systems and others to use the
block device page cache.  Probably only in folio versions and not
pages once if we're touching the code anyay

> 2) For the file fs/buffer.c, there are some special usage like
> following that I don't think it's good to add a helper:
> 
> spin_lock(&bd_inode->i_mapping->private_lock);
> 
> Is't okay to move following apis from fs/buffer.c directly to
> block/bdev.c?
> 
> __find_get_block
> bdev_getblk

I'm not sure moving is a good idea, but we might end up the
some kind of low-level access from buffer.c, be that special
helpers, a separate header or something else.  Let's sort out
the rest of the kernel first.
Yu Kuai Nov. 28, 2023, 1:35 a.m. UTC | #4
Hi,

在 2023/11/28 0:32, Christoph Hellwig 写道:
> On Mon, Nov 27, 2023 at 09:07:22PM +0800, Yu Kuai wrote:
>> 1) Is't okay to add a new helper to pass in bdev for following apis?
> 
> 
> For some we already have them (e.g. bdev_nr_bytes to read the bdev)
> size, for some we need to add them.  The big thing that seems to
> stick out is page cache API, and I think that is where we need to
> define maintainable APIs for file systems and others to use the
> block device page cache.  Probably only in folio versions and not
> pages once if we're touching the code anyay

Thanks for the advice! In case I'm understanding correctly, do you mean
that all other fs/drivers that is using pages versions can safely switch
to folio versions now?

By the way, my orginal idea was trying to add a new field 'bd_flags'
in block_devcie, and then add a new bit so that bio_check_ro() will
only warn once for each partition. Now that this patchset will be quite
complex, I'll add a new bool field 'bd_ro_warned' to fix the above
problem first, and then add 'bd_flags' once this patchset is done.

Thanks,
Kuai

> 
>> 2) For the file fs/buffer.c, there are some special usage like
>> following that I don't think it's good to add a helper:
>>
>> spin_lock(&bd_inode->i_mapping->private_lock);
>>
>> Is't okay to move following apis from fs/buffer.c directly to
>> block/bdev.c?
>>
>> __find_get_block
>> bdev_getblk
> 
> I'm not sure moving is a good idea, but we might end up the
> some kind of low-level access from buffer.c, be that special
> helpers, a separate header or something else.  Let's sort out
> the rest of the kernel first.
> 
> .
>
Christoph Hellwig Nov. 28, 2023, 5:48 a.m. UTC | #5
On Tue, Nov 28, 2023 at 09:35:56AM +0800, Yu Kuai wrote:
> Thanks for the advice! In case I'm understanding correctly, do you mean
> that all other fs/drivers that is using pages versions can safely switch
> to folio versions now?

If you never allocate a high-order folio pages are identical to folios.
So yes, we can do folio based interfaces only, and also use that as
an opportunity to convert over the callers.

> By the way, my orginal idea was trying to add a new field 'bd_flags'
> in block_devcie, and then add a new bit so that bio_check_ro() will
> only warn once for each partition. Now that this patchset will be quite
> complex, I'll add a new bool field 'bd_ro_warned' to fix the above
> problem first, and then add 'bd_flags' once this patchset is done.

Yes, please do a minimal version if you can find space where the
rmw cycles don't cause damage to neighbouring fields.  Or just leave
the current set of warnings in if it's too hard.
diff mbox series

Patch

diff --git a/block/bdev.c b/block/bdev.c
index e4cfb7adb645..7509389095b7 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -30,11 +30,6 @@ 
 #include "../fs/internal.h"
 #include "blk.h"
 
-struct bdev_inode {
-	struct block_device bdev;
-	struct inode vfs_inode;
-};
-
 static inline struct bdev_inode *BDEV_I(struct inode *inode)
 {
 	return container_of(inode, struct bdev_inode, vfs_inode);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d5c5e59ddbd2..06de8393dcd1 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -85,6 +85,18 @@  struct block_device {
 #define bdev_kobj(_bdev) \
 	(&((_bdev)->bd_device.kobj))
 
+struct bdev_inode {
+	struct block_device bdev;
+	struct inode vfs_inode;
+};
+
+static inline struct inode *bdev_inode(struct block_device *bdev)
+{
+	struct bdev_inode *bi = container_of(bdev, struct bdev_inode, bdev);
+
+	return &bi->vfs_inode;
+}
+
 /*
  * Block error status values.  See block/blk-core:blk_errors for the details.
  * Alpha cannot write a byte atomically, so we need to use 32-bit value.