[v2,04/13] fs: split off __alloc_page_buffers function

Message ID 20220218195739.585044-5-shr@fb.com (mailing list archive)
State New, archived
Series Support sync buffered writes for io-uring

Commit Message

Stefan Roesch Feb. 18, 2022, 7:57 p.m. UTC
This splits off the __alloc_page_buffers() function from the
alloc_page_buffers() function. In addition, it adds a gfp_t parameter, so
the caller can specify the allocation flags.

Signed-off-by: Stefan Roesch <shr@fb.com>
---
 fs/buffer.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

Comments

Matthew Wilcox Feb. 18, 2022, 8:42 p.m. UTC | #1
On Fri, Feb 18, 2022 at 11:57:30AM -0800, Stefan Roesch wrote:
> This splits off the __alloc_page_buffers() function from the
> alloc_page_buffers_function(). In addition it adds a gfp_t parameter, so
> the caller can specify the allocation flags.

This one only has six callers, so let's get the API right.  I suggest
making this:

struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
		gfp_t gfp)
{
	gfp |= __GFP_ACCOUNT;

and then all the existing callers specify either GFP_NOFS or
GFP_NOFS | __GFP_NOFAIL.
Stefan Roesch Feb. 18, 2022, 8:50 p.m. UTC | #2
On 2/18/22 12:42 PM, Matthew Wilcox wrote:
> On Fri, Feb 18, 2022 at 11:57:30AM -0800, Stefan Roesch wrote:
>> This splits off the __alloc_page_buffers() function from the
>> alloc_page_buffers_function(). In addition it adds a gfp_t parameter, so
>> the caller can specify the allocation flags.
> 
> This one only has six callers, so let's get the API right.  I suggest
> making this:
> 
> struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
> 		gfp_t gfp)
> {
> 	gfp |= __GFP_ACCOUNT;
> 
> and then all the existing callers specify either GFP_NOFS or
> GFP_NOFS | __GFP_NOFAIL.
> 


I can make that change, but I don't see how I would decide in block_write_begin()
to use different gfp flags when an async buffered write request is being processed.
Christoph Hellwig Feb. 19, 2022, 7:35 a.m. UTC | #3
Err, hell no.  Please do not add any new functionality to the legacy
buffer head code.  If you want new features do that on the
non-bufferhead iomap code path only please.
Matthew Wilcox Feb. 20, 2022, 4:23 a.m. UTC | #4
On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote:
> Err, hell no.  Please do not add any new functionality to the legacy
> buffer head code.  If you want new features do that on the
> non-bufferhead iomap code path only please.

I think "first convert the block device code from buffer_heads to iomap"
might be a bit much of a prerequisite.  I think running ext4 on top of a
block device still requires buffer_heads, for example (I tried to convert
the block device to use mpage in order to avoid creating buffer_heads
when possible, and ext4 stopped working.  I didn't try too hard to debug
it as it was a bit of a distraction at the time).
Jens Axboe Feb. 20, 2022, 4:38 a.m. UTC | #5
On 2/19/22 9:23 PM, Matthew Wilcox wrote:
> On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote:
>> Err, hell no.  Please do not add any new functionality to the legacy
>> buffer head code.  If you want new features do that on the
>> non-bufferhead iomap code path only please.
> 
> I think "first convert the block device code from buffer_heads to
> iomap" might be a bit much of a prerequisite.  I think running ext4 on
> top of a

Yes, that's exactly what Christoph was trying to say, but failing to
state in an appropriate manner. And we did actually discuss that, I'm
not against doing something like that.

> block device still requires buffer_heads, for example (I tried to convert
> the block device to use mpage in order to avoid creating buffer_heads
> when possible, and ext4 stopped working.  I didn't try too hard to debug
> it as it was a bit of a distraction at the time).

That's one of the main reasons why I didn't push this particular path,
as it is a bit fraught with weirdness and legacy buffer_head code which
isn't that easy to tackle...
Jens Axboe Feb. 20, 2022, 4:51 a.m. UTC | #6
On 2/19/22 9:38 PM, Jens Axboe wrote:
> On 2/19/22 9:23 PM, Matthew Wilcox wrote:
>> On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote:
>>> Err, hell no.  Please do not add any new functionality to the legacy
>>> buffer head code.  If you want new features do that on the
>>> non-bufferhead iomap code path only please.
>>
>> I think "first convert the block device code from buffer_heads to
>> iomap" might be a bit much of a prerequisite.  I think running ext4 on
>> top of a
> 
> Yes, that's exactly what Christoph was trying to say, but failing to
> state in an appropriate manner. And we did actually discuss that, I'm
> not against doing something like that.

Just to be clear, I do agree with you that it's an unfair ask for this
change. And as you mentioned, ext4 would require the buffer_head code
to be touched anyway, just layering on top of the necessary changes
for the bdev code.
Christoph Hellwig Feb. 22, 2022, 8:18 a.m. UTC | #7
On Sun, Feb 20, 2022 at 04:23:50AM +0000, Matthew Wilcox wrote:
> On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote:
> > Err, hell no.  Please do not add any new functionality to the legacy
> > buffer head code.  If you want new features do that on the
> > non-bufferhead iomap code path only please.
> 
> I think "first convert the block device code from buffer_heads to iomap"
> might be a bit much of a prerequisite.  I think running ext4 on top of a
> block device still requires buffer_heads, for example (I tried to convert
> the block device to use mpage in order to avoid creating buffer_heads
> when possible, and ext4 stopped working.  I didn't try too hard to debug
> it as it was a bit of a distraction at the time).

Oh, I did not spot that the user here is the block device.  Which is really
weird: why would anyone do buffered writes to a block device?  Doing
so is a bit of a data integrity nightmare.

Can we please develop this feature for iomap based file systems first,
and if by then a use case for block devices arises I'll see what we can
do there.  I've been planning to get the block device code to stop using
buffer_heads by default, but taking them into account if used by a
legacy buffer_head user anyway.
Jens Axboe Feb. 22, 2022, 11:19 p.m. UTC | #8
On 2/22/22 1:18 AM, Christoph Hellwig wrote:
> On Sun, Feb 20, 2022 at 04:23:50AM +0000, Matthew Wilcox wrote:
>> On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote:
>>> Err, hell no.  Please do not add any new functionality to the legacy
>>> buffer head code.  If you want new features do that on the
>>> non-bufferhead iomap code path only please.
>>
>> I think "first convert the block device code from buffer_heads to iomap"
>> might be a bit much of a prerequisite.  I think running ext4 on top of a
>> block device still requires buffer_heads, for example (I tried to convert
>> the block device to use mpage in order to avoid creating buffer_heads
>> when possible, and ext4 stopped working.  I didn't try too hard to debug
>> it as it was a bit of a distraction at the time).
> 
> Oh, I did not spot the users here is the block device.  Which is
> really weird, why would anyone do buffered writes to a block devices?
> Doing so is a bit of a data integrity nightmare.
> 
> Can we please develop this feature for iomap based file systems first,
> and if by then a use case for block devices arises I'll see what we
> can do there.

The original plan wasn't to develop bdev async writes as a separate
useful feature, but rather to do it as a first step to both become
acquainted with the code base and solve some of the common issues for
both.

The fact that we need to touch buffer_heads for the bdev path is
annoying, and something that I'd very much rather just avoid. And
converting bdev to iomap first is a waste of time, exactly because it's
not a separately useful feature.

Hence I think we'll change gears here and start with iomap and XFS
instead.

> I've been planning to get the block device code to stop using
> buffer_heads by default, but taking them into account if used by a
> legacy buffer_head user anyway.

That would indeed be great, and to be honest, the current code for bdev
read/write doesn't make much sense except from a historical point of
view.
Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index 6e6a69a12eed..2858eaf433c8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -802,26 +802,13 @@  int remove_inode_buffers(struct inode *inode)
 	return ret;
 }
 
-/*
- * Create the appropriate buffers when given a page for data area and
- * the size of each buffer.. Use the bh->b_this_page linked list to
- * follow the buffers created.  Return NULL if unable to create more
- * buffers.
- *
- * The retry flag is used to differentiate async IO (paging, swapping)
- * which may not fail from ordinary buffer allocations.
- */
-struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
-		bool retry)
+static struct buffer_head *__alloc_page_buffers(struct page *page,
+						unsigned long size, gfp_t gfp)
 {
 	struct buffer_head *bh, *head;
-	gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
 	long offset;
 	struct mem_cgroup *memcg, *old_memcg;
 
-	if (retry)
-		gfp |= __GFP_NOFAIL;
-
 	/* The page lock pins the memcg */
 	memcg = page_memcg(page);
 	old_memcg = set_active_memcg(memcg);
@@ -859,6 +846,26 @@  struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 
 	goto out;
 }
+
+/*
+ * Create the appropriate buffers when given a page for data area and
+ * the size of each buffer.. Use the bh->b_this_page linked list to
+ * follow the buffers created.  Return NULL if unable to create more
+ * buffers.
+ *
+ * The retry flag is used to differentiate async IO (paging, swapping)
+ * which may not fail from ordinary buffer allocations.
+ */
+struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
+		bool retry)
+{
+	gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
+
+	if (retry)
+		gfp |= __GFP_NOFAIL;
+
+	return __alloc_page_buffers(page, size, gfp);
+}
 EXPORT_SYMBOL_GPL(alloc_page_buffers);
 
 static inline void