Message ID | 20220218195739.585044-5-shr@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Support sync buffered writes for io-uring | expand |
On Fri, Feb 18, 2022 at 11:57:30AM -0800, Stefan Roesch wrote: > This splits off the __alloc_page_buffers() function from the > alloc_page_buffers_function(). In addition it adds a gfp_t parameter, so > the caller can specify the allocation flags. This one only has six callers, so let's get the API right. I suggest making this: struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, gfp_t gfp) { gfp |= __GFP_ACCOUNT; and then all the existing callers specify either GFP_NOFS or GFP_NOFS | __GFP_NOFAIL.
On 2/18/22 12:42 PM, Matthew Wilcox wrote: > On Fri, Feb 18, 2022 at 11:57:30AM -0800, Stefan Roesch wrote: >> This splits off the __alloc_page_buffers() function from the >> alloc_page_buffers_function(). In addition it adds a gfp_t parameter, so >> the caller can specify the allocation flags. > > This one only has six callers, so let's get the API right. I suggest > making this: > > struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, > gfp_t gfp) > { > gfp |= __GFP_ACCOUNT; > > and then all the existing callers specify either GFP_NOFS or > GFP_NOFS | __GFP_NOFAIL. > I can make that change, but i don't see how i can decide in block_write_begin() to use different gfp flags when an async buffered write request is processed?
Err, hell no. Please do not add any new functionality to the legacy buffer head code. If you want new features do that on the non-bufferhead iomap code path only please.
On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote: > Err, hell no. Please do not add any new functionality to the legacy > buffer head code. If you want new features do that on the > non-bufferhead iomap code path only please. I think "first convert the block device code from buffer_heads to iomap" might be a bit much of a prerequisite. I think running ext4 on top of a block device still requires buffer_heads, for example (I tried to convert the block device to use mpage in order to avoid creating buffer_heads when possible, and ext4 stopped working. I didn't try too hard to debug it as it was a bit of a distraction at the time).
On 2/19/22 9:23 PM, Matthew Wilcox wrote: > On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote: >> Err, hell no. Please do not add any new functionality to the legacy >> buffer head code. If you want new features do that on the >> non-bufferhead iomap code path only please. > > I think "first convert the block device code from buffer_heads to > iomap" might be a bit much of a prerequisite. I think running ext4 on > top of a Yes, that's exactly what Christoph was trying to say, but failing to state in an appropriate manner. And we did actually discuss that, I'm not against doing something like that. > block device still requires buffer_heads, for example (I tried to convert > the block device to use mpage in order to avoid creating buffer_heads > when possible, and ext4 stopped working. I didn't try too hard to debug > it as it was a bit of a distraction at the time). That's one of the main reasons why I didn't push this particular path, as it is a bit fraught with weirdness and legacy buffer_head code which isn't that easy to tackle...
On 2/19/22 9:38 PM, Jens Axboe wrote: > On 2/19/22 9:23 PM, Matthew Wilcox wrote: >> On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote: >>> Err, hell no. Please do not add any new functionality to the legacy >>> buffer head code. If you want new features do that on the >>> non-bufferhead iomap code path only please. >> >> I think "first convert the block device code from buffer_heads to >> iomap" might be a bit much of a prerequisite. I think running ext4 on >> top of a > > Yes, that's exactly what Christoph was trying to say, but failing to > state in an appropriate manner. And we did actually discuss that, I'm > not against doing something like that. Just to be clear, I do agree with you that it's an unfair ask for this change. And as you mentioned, ext4 would require the buffer_head code to be touched anyway, just layering on top of the necessary changes for the bdev code.
On Sun, Feb 20, 2022 at 04:23:50AM +0000, Matthew Wilcox wrote: > On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote: > > Err, hell no. Please do not add any new functionality to the legacy > > buffer head code. If you want new features do that on the > > non-bufferhead iomap code path only please. > > I think "first convert the block device code from buffer_heads to iomap" > might be a bit much of a prerequisite. I think running ext4 on top of a > block device still requires buffer_heads, for example (I tried to convert > the block device to use mpage in order to avoid creating buffer_heads > when possible, and ext4 stopped working. I didn't try too hard to debug > it as it was a bit of a distraction at the time). Oh, I did not spot the users here is the block device. Which is really weird, why would anyone do buffered writes to a block devices? Doing so is a bit of a data integrity nightmare. Can we please develop this feature for iomap based file systems first, and if by then a use case for block devices arises I'll see what we can do there. I've been planning to get the block device code to stop using buffer_heads by default, but taking them into account if used by a legacy buffer_head user anyway.
On 2/22/22 1:18 AM, Christoph Hellwig wrote: > On Sun, Feb 20, 2022 at 04:23:50AM +0000, Matthew Wilcox wrote: >> On Fri, Feb 18, 2022 at 11:35:10PM -0800, Christoph Hellwig wrote: >>> Err, hell no. Please do not add any new functionality to the legacy >>> buffer head code. If you want new features do that on the >>> non-bufferhead iomap code path only please. >> >> I think "first convert the block device code from buffer_heads to iomap" >> might be a bit much of a prerequisite. I think running ext4 on top of a >> block device still requires buffer_heads, for example (I tried to convert >> the block device to use mpage in order to avoid creating buffer_heads >> when possible, and ext4 stopped working. I didn't try too hard to debug >> it as it was a bit of a distraction at the time). > > Oh, I did not spot the users here is the block device. Which is > really weird, why would anyone do buffered writes to a block devices? > Doing so is a bit of a data integrity nightmare. > > Can we please develop this feature for iomap based file systems first, > and if by then a use case for block devices arises I'll see what we > can do there. The original plan wasn't to develop bdev async writes as a separate useful feature, but rather to do it as a first step to both become acquainted with the code base and solve some of the common issues for both. The fact that we need to touch buffer_heads for the bdev path is annoying, and something that I'd very much rather just avoid. And converting bdev to iomap first is a waste of time, exactly because it's not a separately useful feature. Hence I think we'll change gears here and start with iomap and XFS instead. > I've been planning to get the block device code to stop using > buffer_heads by default, but taking them into account if used by a > legacy buffer_head user anyway. That would indeed be great, and to be honest, the current code for bdev read/write doesn't make much sense outside of from a historical point of view.
diff --git a/fs/buffer.c b/fs/buffer.c index 6e6a69a12eed..2858eaf433c8 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -802,26 +802,13 @@ int remove_inode_buffers(struct inode *inode) return ret; } -/* - * Create the appropriate buffers when given a page for data area and - * the size of each buffer.. Use the bh->b_this_page linked list to - * follow the buffers created. Return NULL if unable to create more - * buffers. - * - * The retry flag is used to differentiate async IO (paging, swapping) - * which may not fail from ordinary buffer allocations. - */ -struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, - bool retry) +static struct buffer_head *__alloc_page_buffers(struct page *page, + unsigned long size, gfp_t gfp) { struct buffer_head *bh, *head; - gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT; long offset; struct mem_cgroup *memcg, *old_memcg; - if (retry) - gfp |= __GFP_NOFAIL; - /* The page lock pins the memcg */ memcg = page_memcg(page); old_memcg = set_active_memcg(memcg); @@ -859,6 +846,26 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, goto out; } + +/* + * Create the appropriate buffers when given a page for data area and + * the size of each buffer.. Use the bh->b_this_page linked list to + * follow the buffers created. Return NULL if unable to create more + * buffers. + * + * The retry flag is used to differentiate async IO (paging, swapping) + * which may not fail from ordinary buffer allocations. + */ +struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, + bool retry) +{ + gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT; + + if (retry) + gfp |= __GFP_NOFAIL; + + return __alloc_page_buffers(page, size, gfp); +} EXPORT_SYMBOL_GPL(alloc_page_buffers); static inline void
This splits off the __alloc_page_buffers() function from the alloc_page_buffers_function(). In addition it adds a gfp_t parameter, so the caller can specify the allocation flags. Signed-off-by: Stefan Roesch <shr@fb.com> --- fs/buffer.c | 37 ++++++++++++++++++++++--------------- 1 file changed, 22 insertions(+), 15 deletions(-)