
[v2,2/2] xfs: kick extra large ioends to completion workqueue

Message ID 20201005152102.15797-1-bfoster@redhat.com (mailing list archive)
State New

Commit Message

Brian Foster Oct. 5, 2020, 3:21 p.m. UTC
We've had reports of soft lockup warnings in the iomap ioend
completion path due to very large bios and/or bio chains. Divert any
ioends with 256k or more pages to process to the workqueue so
completion occurs in non-atomic context and can reschedule to avoid
soft lockup warnings.
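
For scale: assuming 4 KiB pages (PAGE_SHIFT == 12), that threshold works
out to 262144 << 12 = 1073741824 bytes, i.e. 1 GiB of data per ioend.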

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

v2:
- Fix type in macro.

 fs/xfs/xfs_aops.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Oct. 6, 2020, 3:55 a.m. UTC | #1
On Mon, Oct 05, 2020 at 11:21:02AM -0400, Brian Foster wrote:
> We've had reports of soft lockup warnings in the iomap ioend
> completion path due to very large bios and/or bio chains. Divert any
> ioends with 256k or more pages to process to the workqueue so
> completion occurs in non-atomic context and can reschedule to avoid
> soft lockup warnings.

Hmmmm... is there any way we can just make end_page_writeback faster?

TBH it still strikes me as odd that we'd cap ioends this way just to
cover for the fact that we have to poke each and every page.

(Also, those 'bool atomic' in the other patch make me kind of nervous --
how do we make sure (from a QA perspective) that nobody gets that wrong?)

--D

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> v2:
> - Fix type in macro.
> 
>  fs/xfs/xfs_aops.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 3e061ea99922..c00cc0624986 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -30,6 +30,13 @@ XFS_WPC(struct iomap_writepage_ctx *ctx)
>  	return container_of(ctx, struct xfs_writepage_ctx, ctx);
>  }
>  
> +/*
> + * Kick extra large ioends off to the workqueue. Completion will process a lot
> + * of pages for a large bio or bio chain and a non-atomic context is required to
> + * reschedule and avoid soft lockup warnings.
> + */
> +#define XFS_LARGE_IOEND	(262144ULL << PAGE_SHIFT)
> +
>  /*
>   * Fast and loose check if this write could update the on-disk inode size.
>   */
> @@ -239,7 +246,8 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
>  {
>  	return ioend->io_private ||
>  		ioend->io_type == IOMAP_UNWRITTEN ||
> -		(ioend->io_flags & IOMAP_F_SHARED);
> +		(ioend->io_flags & IOMAP_F_SHARED) ||
> +		(ioend->io_size >= XFS_LARGE_IOEND);
>  }
>  
>  STATIC void
> -- 
> 2.25.4
>
Brian Foster Oct. 6, 2020, 12:44 p.m. UTC | #2
On Mon, Oct 05, 2020 at 08:55:37PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 05, 2020 at 11:21:02AM -0400, Brian Foster wrote:
> > We've had reports of soft lockup warnings in the iomap ioend
> > completion path due to very large bios and/or bio chains. Divert any
> > ioends with 256k or more pages to process to the workqueue so
> > completion occurs in non-atomic context and can reschedule to avoid
> > soft lockup warnings.
> 
> Hmmmm... is there any way we can just make end_page_writeback faster?
> 

I'm not sure that would help us. It's not doing much work as it is. The
issue is just that we effectively queue so many of them to a single bio
completion due to either bio chaining or the physical page merging
implemented by multipage bvecs.
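
For reference, here is a simplified sketch of the per-page walk that
completion does (paraphrased, not the verbatim kernel code; the real
path goes through iomap_finish_page_writeback() and also handles errors
and sub-page state):

	/*
	 * Walk every bio in the ioend's chain and every page merged into
	 * each bio, ending writeback one page at a time. With multipage
	 * bvecs and bio chaining this can amount to hundreds of thousands
	 * of end_page_writeback() calls from a single completion.
	 */
	static void sketch_finish_ioend(struct iomap_ioend *ioend)
	{
		struct bio *last = ioend->io_bio;
		struct bio *bio, *next;

		for (bio = &ioend->io_inline_bio; bio; bio = next) {
			struct bio_vec *bv;
			struct bvec_iter_all iter_all;

			/* the last bio's bi_private points back at the ioend */
			next = (bio == last) ? NULL : bio->bi_private;

			/* one call per merged page: this is the hot loop */
			bio_for_each_segment_all(bv, bio, iter_all)
				end_page_writeback(bv->bv_page);

			bio_put(bio);
		}
	}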

> TBH it still strikes me as odd that we'd cap ioends this way just to
> cover for the fact that we have to poke each and every page.
> 

I suppose, but it's not like we don't already account for constructing
bios that must be handed off to a workqueue for completion processing.
Also FWIW this doesn't cap ioend size like my original patch does. It
just updates XFS to send them to the completion workqueue.

> (Also, those 'bool atomic' in the other patch make me kind of nervous --
> how do we make sure (from a QA perspective) that nobody gets that wrong?)
> 

Yeah, that's a bit ugly. If somebody has a better idea on the factoring
I'm interested in hearing about it. My understanding is that in_atomic()
is not reliable and/or generally frowned upon, hence the explicit
context parameter.

Also, I don't have the error handy but my development kernel complains
quite clearly if we make a call that can potentially sleep in atomic
context. I believe this is the purpose of the __might_sleep()
(CONFIG_DEBUG_ATOMIC_SLEEP) annotation.
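
To make that concrete, the pattern in question looks roughly like this
(illustrative only, loosely following patch 1/2 of this series; names
and details are not exact):

	/*
	 * Completion helper that may run from bio completion (atomic) or
	 * from the workqueue (process context). The caller must pass the
	 * context truthfully: calling cond_resched() from atomic context
	 * is a bug, and CONFIG_DEBUG_ATOMIC_SLEEP makes the might-sleep
	 * check in cond_resched() complain loudly when that happens.
	 */
	static void sketch_finish_ioends(struct iomap_ioend *ioend, int error,
			bool atomic)
	{
		struct list_head tmp;

		list_replace_init(&ioend->io_list, &tmp);
		iomap_finish_ioend(ioend, error);

		while (!list_empty(&tmp)) {
			ioend = list_first_entry(&tmp, struct iomap_ioend,
					io_list);
			list_del_init(&ioend->io_list);
			iomap_finish_ioend(ioend, error);
			/* only reschedule when the caller says we may sleep */
			if (!atomic)
				cond_resched();
		}
	}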

Brian

> --D
> 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > v2:
> > - Fix type in macro.
> > 
> >  fs/xfs/xfs_aops.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 3e061ea99922..c00cc0624986 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -30,6 +30,13 @@ XFS_WPC(struct iomap_writepage_ctx *ctx)
> >  	return container_of(ctx, struct xfs_writepage_ctx, ctx);
> >  }
> >  
> > +/*
> > + * Kick extra large ioends off to the workqueue. Completion will process a lot
> > + * of pages for a large bio or bio chain and a non-atomic context is required to
> > + * reschedule and avoid soft lockup warnings.
> > + */
> > +#define XFS_LARGE_IOEND	(262144ULL << PAGE_SHIFT)
> > +
> >  /*
> >   * Fast and loose check if this write could update the on-disk inode size.
> >   */
> > @@ -239,7 +246,8 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> >  {
> >  	return ioend->io_private ||
> >  		ioend->io_type == IOMAP_UNWRITTEN ||
> > -		(ioend->io_flags & IOMAP_F_SHARED);
> > +		(ioend->io_flags & IOMAP_F_SHARED) ||
> > +		(ioend->io_size >= XFS_LARGE_IOEND);
> >  }
> >  
> >  STATIC void
> > -- 
> > 2.25.4
> > 
>
Matthew Wilcox Oct. 6, 2020, 2:07 p.m. UTC | #3
On Mon, Oct 05, 2020 at 08:55:37PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 05, 2020 at 11:21:02AM -0400, Brian Foster wrote:
> > We've had reports of soft lockup warnings in the iomap ioend
> > completion path due to very large bios and/or bio chains. Divert any
> > ioends with 256k or more pages to process to the workqueue so
> > completion occurs in non-atomic context and can reschedule to avoid
> > soft lockup warnings.
> 
> Hmmmm... is there any way we can just make end_page_writeback faster?

There are ways to make it faster.  I don't know if they're a "just"
solution ...

1. We can use THPs.  That will reduce the number of pages being operated
on.  I hear somebody might have a patch set for that.  Incidentally,
this patch set will clash with the THP patchset, so one of us is going to
have to rebase on the other's work.  Not a complaint, just acknowledging
that some coordination will be needed for the 5.11 merge window.

2. We could create end_writeback_pages(struct pagevec *pvec) which
calls a new test_clear_writeback_pages(pvec).  That could amortise
taking the memcg lock and finding the lruvec and taking the mapping
lock -- assuming these pages are sufficiently virtually contiguous.
It can definitely amortise all the statistics updates.  (A rough
caller-side sketch of this idea follows at the end of this list.)

3. We can make wake_up_page(page, PG_writeback) more efficient.  If
you can produce this situation on demand, I had a patch for that which
languished due to lack of interest.

https://lore.kernel.org/linux-fsdevel/20200416220130.13343-1-willy@infradead.org/
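
As a rough caller-side sketch of option 2 above (the batched helpers
end_writeback_pages() and test_clear_writeback_pages() are proposed and
do not exist yet, so this is purely illustrative):

	/*
	 * Gather the pages of a completed bio into a pagevec and end
	 * writeback on the whole batch, so the memcg/mapping locking and
	 * the statistics updates are amortised instead of paid per page.
	 */
	static void sketch_finish_bio_batched(struct bio *bio)
	{
		struct pagevec pvec;
		struct bio_vec *bv;
		struct bvec_iter_all iter_all;

		pagevec_init(&pvec);
		bio_for_each_segment_all(bv, bio, iter_all) {
			/* pagevec_add() returns 0 once the pagevec is full */
			if (!pagevec_add(&pvec, bv->bv_page)) {
				end_writeback_pages(&pvec);	/* proposed API */
				pagevec_reinit(&pvec);
			}
		}
		if (pagevec_count(&pvec))
			end_writeback_pages(&pvec);
	}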

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3e061ea99922..c00cc0624986 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -30,6 +30,13 @@ XFS_WPC(struct iomap_writepage_ctx *ctx)
 	return container_of(ctx, struct xfs_writepage_ctx, ctx);
 }
 
+/*
+ * Kick extra large ioends off to the workqueue. Completion will process a lot
+ * of pages for a large bio or bio chain and a non-atomic context is required to
+ * reschedule and avoid soft lockup warnings.
+ */
+#define XFS_LARGE_IOEND	(262144ULL << PAGE_SHIFT)
+
 /*
  * Fast and loose check if this write could update the on-disk inode size.
  */
@@ -239,7 +246,8 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
 {
 	return ioend->io_private ||
 		ioend->io_type == IOMAP_UNWRITTEN ||
-		(ioend->io_flags & IOMAP_F_SHARED);
+		(ioend->io_flags & IOMAP_F_SHARED) ||
+		(ioend->io_size >= XFS_LARGE_IOEND);
 }
 
 STATIC void