Message ID | 20220623175157.1715274-1-shr@fb.com (mailing list archive) |
---|---|
Headers | show |
Series | io-uring/xfs: support async buffered writes | expand |
On Thu, Jun 23, 2022 at 10:51:43AM -0700, Stefan Roesch wrote: > This patch series adds support for async buffered writes when using both > xfs and io-uring. Currently io-uring only supports buffered writes in the > slow path, by processing them in the io workers. With this patch series it is > now possible to support buffered writes in the fast path. To be able to use > the fast path the required pages must be in the page cache, the required locks > in xfs can be granted immediately and no additional blocks need to be read > form disk. > > Updating the inode can take time. An optimization has been implemented for > the time update. Time updates will be processed in the slow path. While there > is already a time update in process, other write requests for the same file, > can skip the update of the modification time. > > > Performance results: > For fio the following results have been obtained with a queue depth of > 1 and 4k block size (runtime 600 secs): > > sequential writes: > without patch with patch libaio psync > iops: 77k 209k 195K 233K > bw: 314MB/s 854MB/s 790MB/s 953MB/s > clat: 9600ns 120ns 540ns 3000ns Hey, nice! > > > For an io depth of 1, the new patch improves throughput by over three times > (compared to the exiting behavior, where buffered writes are processed by an > io-worker process) and also the latency is considerably reduced. To achieve the > same or better performance with the exisiting code an io depth of 4 is required. > Increasing the iodepth further does not lead to improvements. > > In addition the latency of buffered write operations is reduced considerably. > > > > Support for async buffered writes: > > To support async buffered writes the flag FMODE_BUF_WASYNC is introduced. In > addition the check in generic_write_checks is modified to allow for async > buffered writes that have this flag set. > > Changes to the iomap page create function to allow the caller to specify > the gfp flags. Sets the IOMAP_NOWAIT flag in iomap if IOCB_NOWAIT has been set > and specifies the requested gfp flags. > > Adds the iomap async buffered write support to the xfs iomap layer. > Adds async buffered write support to the xfs iomap layer. > > Support for async buffered write support and inode time modification > > Splits the functions for checking if the file privileges need to be removed in > two functions: check function and a function for the removal of file privileges. > The same split is also done for the function to update the file modification time. > > Implement an optimization that while a file modification time is pending other > requests for the same file don't need to wait for the file modification update. > This avoids that a considerable number of buffered async write requests get > punted. > > Take the ilock in nowait mode if async buffered writes are enabled and enable > the async buffered writes optimization in io_uring. > > Support for write throttling of async buffered writes: > > Add a no_wait parameter to the exisiting balance_dirty_pages() function. The > function will return -EAGAIN if the parameter is true and write throttling is > required. > > Add a new function called balance_dirty_pages_ratelimited_async() that will be > invoked from iomap_write_iter() if an async buffered write is requested. > > Enable async buffered write support in xfs > This enables async buffered writes for xfs. > > > Testing: > This patch has been tested with xfstests, fsx, fio and individual test programs. Good to hear. Will there be some new fstest coming/already merged? <snip> Hmm, well, vger and lore are still having stomach problems, so even the resend didn't result in #5 ending up in my mailbox. :( For the patches I haven't received, I'll just attach my replies as comments /after/ each patch subject line. What a way to review code! > Jan Kara (3): > mm: Move starting of background writeback into the main balancing loop > mm: Move updates of dirty_exceeded into one place > mm: Add balance_dirty_pages_ratelimited_flags() function (Yeah, I guess these changes make sense...) > Stefan Roesch (11): > iomap: Add flags parameter to iomap_page_create() > iomap: Add async buffered write support Reviewed-by: Darrick J. Wong <djwong@kernel.org> > iomap: Return -EAGAIN from iomap_write_iter() > fs: Add check for async buffered writes to generic_write_checks > fs: add __remove_file_privs() with flags parameter > fs: Split off inode_needs_update_time and __file_update_time > fs: Add async write file modification handling. The commit message references a file_modified_async function, but all I see is file_modified_flags? Assuming that's just a clerical error, Reviewed-by: Darrick J. Wong <djwong@kernel.org> > io_uring: Add support for async buffered writes Hm, ok, so the EAGAINs that we sprinkle everywhere get turned into short writes at the end of iomap_file_buffered_write, and that's what this picks up? If so, then... > io_uring: Add tracepoint for short writes > xfs: Specify lockmode when calling xfs_ilock_for_iomap() > xfs: Add async buffered write support ...I guess I'm ok with signing off on the last patch: Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > > fs/inode.c | 168 +++++++++++++++++++++++--------- > fs/io_uring.c | 32 +++++- > fs/iomap/buffered-io.c | 71 +++++++++++--- > fs/read_write.c | 4 +- > fs/xfs/xfs_file.c | 11 +-- > fs/xfs/xfs_iomap.c | 11 ++- > include/linux/fs.h | 4 + > include/linux/writeback.h | 7 ++ > include/trace/events/io_uring.h | 25 +++++ > mm/page-writeback.c | 89 +++++++++++------ > 10 files changed, 314 insertions(+), 108 deletions(-) > > > base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 > -- > 2.30.2 >
On 6/23/22 2:31 PM, Darrick J. Wong wrote: >> Testing: >> This patch has been tested with xfstests, fsx, fio and individual test programs. > > Good to hear. Will there be some new fstest coming/already merged? It should not really require any new tests, as anything buffered + io_uring on xfs will now use this code. But Stefan has run a bunch of things on the side too, some of those synthetic (like ensure that various parts of a buffered write range isn't cached, etc) and some more generic (fsx). There might be some that could be turned into xfstests, I'll let him answer that one. > Hmm, well, vger and lore are still having stomach problems, so even the > resend didn't result in #5 ending up in my mailbox. :( > > For the patches I haven't received, I'll just attach my replies as > comments /after/ each patch subject line. What a way to review code! Really not sure what's going on with email these days, it's quite a pain... Thanks for taking a look so quickly! I've added your reviewed-bys and also made that ternary change you suggested. Only other change is addressing a kernelbot noticing that one ret in the mm side was being set to zero only, so we could kill it. End result: https://git.kernel.dk/cgit/linux-block/log/?h=for-5.20/io_uring-buffered-writes
On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote: > Hmm, well, vger and lore are still having stomach problems, so even the > resend didn't result in #5 ending up in my mailbox. :( I can see a all here. Sometimes it helps to just wait a bit.
FYI, I think a subject like "fs: add a FMODE_BUF_WASYNC flags for f_mode" might be a more descriptive. As the new flag here really is the interesting part, not that we check it.
On 6/23/22 11:21 PM, Christoph Hellwig wrote: > FYI, I think a subject like > > "fs: add a FMODE_BUF_WASYNC flags for f_mode" > > might be a more descriptive. As the new flag here really is the > interesting part, not that we check it. Agree on that - if others do too, I can just make that edit.
On 6/23/22 11:14 PM, Christoph Hellwig wrote: > On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote: >> Hmm, well, vger and lore are still having stomach problems, so even the >> resend didn't result in #5 ending up in my mailbox. :( > > I can see a all here. Sometimes it helps to just wait a bit. on lore? I'm still seeing some missing. Which is a bit odd, since eg b4 can pull the series down just fine.
On 6/24/22 9:49 PM, Jens Axboe wrote: > On 6/23/22 11:14 PM, Christoph Hellwig wrote: >> On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote: >>> Hmm, well, vger and lore are still having stomach problems, so even the >>> resend didn't result in #5 ending up in my mailbox. :( >> >> I can see a all here. Sometimes it helps to just wait a bit. > > on lore? I'm still seeing some missing. Which is a bit odd, since eg b4 > can pull the series down just fine. I'm still seeing some missing on the io-uring lore too: https://lore.kernel.org/io-uring/20220623175157.1715274-1-shr@fb.com/ (missing) But changing the path to `/all/` shows the complete series: https://lore.kernel.org/all/20220623175157.1715274-1-shr@fb.com/ (complete) b4 seems to be fetching from `/all/`.
On 6/24/22 9:27 AM, Ammar Faizi wrote: > On 6/24/22 9:49 PM, Jens Axboe wrote: >> On 6/23/22 11:14 PM, Christoph Hellwig wrote: >>> On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote: >>>> Hmm, well, vger and lore are still having stomach problems, so even the >>>> resend didn't result in #5 ending up in my mailbox. :( >>> >>> I can see a all here. Sometimes it helps to just wait a bit. >> >> on lore? I'm still seeing some missing. Which is a bit odd, since eg b4 >> can pull the series down just fine. > > I'm still seeing some missing on the io-uring lore too: > > https://lore.kernel.org/io-uring/20220623175157.1715274-1-shr@fb.com/ (missing) > > But changing the path to `/all/` shows the complete series: > > https://lore.kernel.org/all/20220623175157.1715274-1-shr@fb.com/ (complete) > > b4 seems to be fetching from `/all/`. Ah, I see. Konstantin, do you know what is going on here? tldr - /all/ has the whole series, io-uring list does not.
On 6/23/22 11:21 PM, Christoph Hellwig wrote: > FYI, I think a subject like > > "fs: add a FMODE_BUF_WASYNC flags for f_mode" > > might be a more descriptive. As the new flag here really is the > interesting part, not that we check it. I made that edit.
On Thu, 23 Jun 2022 10:51:43 -0700, Stefan Roesch wrote: > This patch series adds support for async buffered writes when using both > xfs and io-uring. Currently io-uring only supports buffered writes in the > slow path, by processing them in the io workers. With this patch series it is > now possible to support buffered writes in the fast path. To be able to use > the fast path the required pages must be in the page cache, the required locks > in xfs can be granted immediately and no additional blocks need to be read > form disk. > > [...] Applied, thanks! [13/14] xfs: Specify lockmode when calling xfs_ilock_for_iomap() (no commit info) [14/14] xfs: Add async buffered write support (no commit info) Best regards,
On Thu, Jun 23, 2022 at 10:51:47AM -0700, Stefan Roesch wrote: > Add the kiocb flags parameter to the function iomap_page_create(). > Depending on the value of the flags parameter it enables different gfp > flags. > > No intended functional changes in this patch. [...] > @@ -226,7 +234,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter, > if (WARN_ON_ONCE(size > iomap->length)) > return -EIO; > if (offset > 0) > - iop = iomap_page_create(iter->inode, folio); > + iop = iomap_page_create(iter->inode, folio, iter->flags); > else > iop = to_iomap_page(folio); I really don't like what this change has done to this file. I'm modifying this function, and I start thinking "Well, hang on, if flags has IOMAP_NOWAIT set, then GFP_NOWAIT can fail, and iop will be NULL, so we'll end up marking the entire folio uptodate when really we should only be marking some blocks uptodate, so we should really be failing the entire read if the allocation failed, but maybe it's OK because IOMAP_NOWAIT is never set in this path". I don't know how we fix this. Maybe return ERR_PTR(-ENOMEM) or -EAGAIN if the memory allocation fails (leaving the NULL return for "we don't need an iop"). Thoughts?
On Fri, Mar 03, 2023 at 04:51:10AM +0000, Matthew Wilcox wrote: > On Thu, Jun 23, 2022 at 10:51:47AM -0700, Stefan Roesch wrote: > > Add the kiocb flags parameter to the function iomap_page_create(). > > Depending on the value of the flags parameter it enables different gfp > > flags. > > > > No intended functional changes in this patch. > > [...] > > > @@ -226,7 +234,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter, > > if (WARN_ON_ONCE(size > iomap->length)) > > return -EIO; > > if (offset > 0) > > - iop = iomap_page_create(iter->inode, folio); > > + iop = iomap_page_create(iter->inode, folio, iter->flags); > > else > > iop = to_iomap_page(folio); > > I really don't like what this change has done to this file. I'm > modifying this function, and I start thinking "Well, hang on, if > flags has IOMAP_NOWAIT set, then GFP_NOWAIT can fail, and iop > will be NULL, so we'll end up marking the entire folio uptodate > when really we should only be marking some blocks uptodate, so > we should really be failing the entire read if the allocation > failed, but maybe it's OK because IOMAP_NOWAIT is never set in > this path". > > I don't know how we fix this. Maybe return ERR_PTR(-ENOMEM) or > -EAGAIN if the memory allocation fails (leaving the NULL return > for "we don't need an iop"). Thoughts? I don't see any problem with that, aside from being pre-coffee and on vacation for the rest of today. ;) --D
On 3/3/23 8:53 AM, Darrick J. Wong wrote: > > > On Fri, Mar 03, 2023 at 04:51:10AM +0000, Matthew Wilcox wrote: >> On Thu, Jun 23, 2022 at 10:51:47AM -0700, Stefan Roesch wrote: >>> Add the kiocb flags parameter to the function iomap_page_create(). >>> Depending on the value of the flags parameter it enables different gfp >>> flags. >>> >>> No intended functional changes in this patch. >> >> [...] >> >>> @@ -226,7 +234,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter, >>> if (WARN_ON_ONCE(size > iomap->length)) >>> return -EIO; >>> if (offset > 0) >>> - iop = iomap_page_create(iter->inode, folio); >>> + iop = iomap_page_create(iter->inode, folio, iter->flags); >>> else >>> iop = to_iomap_page(folio); >> >> I really don't like what this change has done to this file. I'm >> modifying this function, and I start thinking "Well, hang on, if >> flags has IOMAP_NOWAIT set, then GFP_NOWAIT can fail, and iop >> will be NULL, so we'll end up marking the entire folio uptodate >> when really we should only be marking some blocks uptodate, so >> we should really be failing the entire read if the allocation >> failed, but maybe it's OK because IOMAP_NOWAIT is never set in >> this path". >> >> I don't know how we fix this. Maybe return ERR_PTR(-ENOMEM) or >> -EAGAIN if the memory allocation fails (leaving the NULL return >> for "we don't need an iop"). Thoughts? > > I don't see any problem with that, aside from being pre-coffee and on > vacation for the rest of today. ;) > > --D If IOMAP_NOWAIT is set, and the allocation fails, we should return -EAGAIN, so the write request is retried in the slow path. --Stefan
On Fri, Mar 03, 2023 at 09:29:30AM -0800, Stefan Roesch wrote: > If IOMAP_NOWAIT is set, and the allocation fails, we should return > -EAGAIN, so the write request is retried in the slow path. Yes. Another vote for doing the ERR_PTR. willy, are you going to look into that yourself or are you waiting for someone to take care of it?