diff mbox series

[v3,1/3] iomap: resched ioend completion when in non-atomic context

Message ID 20210517171722.1266878-2-bfoster@redhat.com (mailing list archive)
State New
Headers show
Series iomap: avoid soft lockup warnings on large ioends | expand

Commit Message

Brian Foster May 17, 2021, 5:17 p.m. UTC
The iomap ioend mechanism has the ability to construct very large,
contiguous bios and/or bio chains. This has been reported to lead to
soft lockup warnings in bio completion due to the amount of page
processing that occurs. Update the ioend completion path with a
parameter to indicate atomic context and insert a cond_resched()
call to avoid soft lockups in either scenario.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 15 +++++++++------
 fs/xfs/xfs_aops.c      |  2 +-
 include/linux/iomap.h  |  2 +-
 3 files changed, 11 insertions(+), 8 deletions(-)

Comments

Matthew Wilcox May 17, 2021, 5:54 p.m. UTC | #1
On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
>  			next = bio->bi_private;
>  
>  		/* walk each page on bio, ending page IO on them */
> -		bio_for_each_segment_all(bv, bio, iter_all)
> +		bio_for_each_segment_all(bv, bio, iter_all) {
>  			iomap_finish_page_writeback(inode, bv->bv_page, error,
>  					bv->bv_len);
> +			if (!atomic)
> +				cond_resched();
> +		}

I don't know that it makes sense to check after _every_ page.  I might
go for every segment.  Some users check after every thousand pages.
Brian Foster May 18, 2021, 11:38 a.m. UTC | #2
On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote:
> On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
> >  			next = bio->bi_private;
> >  
> >  		/* walk each page on bio, ending page IO on them */
> > -		bio_for_each_segment_all(bv, bio, iter_all)
> > +		bio_for_each_segment_all(bv, bio, iter_all) {
> >  			iomap_finish_page_writeback(inode, bv->bv_page, error,
> >  					bv->bv_len);
> > +			if (!atomic)
> > +				cond_resched();
> > +		}
> 
> I don't know that it makes sense to check after _every_ page.  I might
> go for every segment.  Some users check after every thousand pages.
> 

The handful of examples I come across on a brief scan (including the
other iomap usage) have a similar pattern as used here. I don't doubt
there are others, but I think I'd prefer to have more reasoning behind
adding more code than might be necessary (i.e. do we expect additional
overhead to be measurable here?). As it is, the intent isn't so much to
check on every page as much as this just happens to be the common point
of the function to cover both long bio chains and single vector bios
with large numbers of pages.

Brian
Darrick J. Wong May 20, 2021, 9:58 p.m. UTC | #3
On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote:
> On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote:
> > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
> > >  			next = bio->bi_private;
> > >  
> > >  		/* walk each page on bio, ending page IO on them */
> > > -		bio_for_each_segment_all(bv, bio, iter_all)
> > > +		bio_for_each_segment_all(bv, bio, iter_all) {
> > >  			iomap_finish_page_writeback(inode, bv->bv_page, error,
> > >  					bv->bv_len);
> > > +			if (!atomic)
> > > +				cond_resched();
> > > +		}
> > 
> > I don't know that it makes sense to check after _every_ page.  I might
> > go for every segment.  Some users check after every thousand pages.
> > 
> 
> The handful of examples I come across on a brief scan (including the
> other iomap usage) have a similar pattern as used here. I don't doubt
> there are others, but I think I'd prefer to have more reasoning behind
> adding more code than might be necessary (i.e. do we expect additional
> overhead to be measurable here?). As it is, the intent isn't so much to
> check on every page as much as this just happens to be the common point
> of the function to cover both long bio chains and single vector bios
> with large numbers of pages.

It's been a while since I waded through the macro hell to find out what
cond_resched actually does, but iirc it can do some fairly heavyweight
things (disable preemption, call the scheduler, rcu stuff) which is why
we're supposed to be a little judicious about amortizing each call over
a few thousand pages.

--D

> Brian
>
Ming Lei May 22, 2021, 7:45 a.m. UTC | #4
On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> The iomap ioend mechanism has the ability to construct very large,
> contiguous bios and/or bio chains. This has been reported to lead to

BTW, it is actually wrong to complete a large bio chains in
iomap_finish_ioend(), which may risk in bio allocation deadlock, cause
bio_alloc_bioset() relies on bio submission to make forward progress. But
it becomes not true when all chained bios are freed just after the whole
ioend is done since all chained bios(except for the one embedded in ioend)
are allocated from same bioset(fs_bio_set).


Thanks,
Ming
Brian Foster May 24, 2021, 11:57 a.m. UTC | #5
On Thu, May 20, 2021 at 02:58:58PM -0700, Darrick J. Wong wrote:
> On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote:
> > On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote:
> > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> > > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
> > > >  			next = bio->bi_private;
> > > >  
> > > >  		/* walk each page on bio, ending page IO on them */
> > > > -		bio_for_each_segment_all(bv, bio, iter_all)
> > > > +		bio_for_each_segment_all(bv, bio, iter_all) {
> > > >  			iomap_finish_page_writeback(inode, bv->bv_page, error,
> > > >  					bv->bv_len);
> > > > +			if (!atomic)
> > > > +				cond_resched();
> > > > +		}
> > > 
> > > I don't know that it makes sense to check after _every_ page.  I might
> > > go for every segment.  Some users check after every thousand pages.
> > > 
> > 
> > The handful of examples I come across on a brief scan (including the
> > other iomap usage) have a similar pattern as used here. I don't doubt
> > there are others, but I think I'd prefer to have more reasoning behind
> > adding more code than might be necessary (i.e. do we expect additional
> > overhead to be measurable here?). As it is, the intent isn't so much to
> > check on every page as much as this just happens to be the common point
> > of the function to cover both long bio chains and single vector bios
> > with large numbers of pages.
> 
> It's been a while since I waded through the macro hell to find out what
> cond_resched actually does, but iirc it can do some fairly heavyweight
> things (disable preemption, call the scheduler, rcu stuff) which is why
> we're supposed to be a little judicious about amortizing each call over
> a few thousand pages.
> 

It looks to me it just checks some state bit and only does any work if
actually necessary. I suppose not doing that less often is cheaper than
doing it more, but it's not clear to me it's enough that it really
matters and/or warrants more code to filter out calls..

What exactly did you have in mind for logic? I suppose we could always
stuff a 'if (!(count++ % 1024)) cond_resched();' or some such in the
inner loop, but that might have less of an effect on larger chains
constructed of bios with fewer pages (depending on whether that might
still be possible).

Brian

> --D
> 
> > Brian
> > 
>
Brian Foster May 24, 2021, 11:57 a.m. UTC | #6
On Sat, May 22, 2021 at 03:45:11PM +0800, Ming Lei wrote:
> On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> > The iomap ioend mechanism has the ability to construct very large,
> > contiguous bios and/or bio chains. This has been reported to lead to
> 
> BTW, it is actually wrong to complete a large bio chains in
> iomap_finish_ioend(), which may risk in bio allocation deadlock, cause
> bio_alloc_bioset() relies on bio submission to make forward progress. But
> it becomes not true when all chained bios are freed just after the whole
> ioend is done since all chained bios(except for the one embedded in ioend)
> are allocated from same bioset(fs_bio_set).
> 

Interesting. Do you have a reproducer (or error report) for this? Is it
addressed by the next patch, or are further changes required?

Brian

> 
> Thanks,
> Ming
>
Ming Lei May 24, 2021, 2:11 p.m. UTC | #7
On Mon, May 24, 2021 at 07:57:48AM -0400, Brian Foster wrote:
> On Sat, May 22, 2021 at 03:45:11PM +0800, Ming Lei wrote:
> > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> > > The iomap ioend mechanism has the ability to construct very large,
> > > contiguous bios and/or bio chains. This has been reported to lead to
> > 
> > BTW, it is actually wrong to complete a large bio chains in
> > iomap_finish_ioend(), which may risk in bio allocation deadlock, cause
> > bio_alloc_bioset() relies on bio submission to make forward progress. But
> > it becomes not true when all chained bios are freed just after the whole
> > ioend is done since all chained bios(except for the one embedded in ioend)
> > are allocated from same bioset(fs_bio_set).
> > 
> 
> Interesting. Do you have a reproducer (or error report) for this? Is it

No, but the theory has been applied for long time.

> addressed by the next patch, or are further changes required?

Your patchset can't address the issue.


Thanks,
Ming
Darrick J. Wong May 24, 2021, 4:53 p.m. UTC | #8
On Mon, May 24, 2021 at 07:57:31AM -0400, Brian Foster wrote:
> On Thu, May 20, 2021 at 02:58:58PM -0700, Darrick J. Wong wrote:
> > On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote:
> > > On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote:
> > > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> > > > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
> > > > >  			next = bio->bi_private;
> > > > >  
> > > > >  		/* walk each page on bio, ending page IO on them */
> > > > > -		bio_for_each_segment_all(bv, bio, iter_all)
> > > > > +		bio_for_each_segment_all(bv, bio, iter_all) {
> > > > >  			iomap_finish_page_writeback(inode, bv->bv_page, error,
> > > > >  					bv->bv_len);
> > > > > +			if (!atomic)
> > > > > +				cond_resched();
> > > > > +		}
> > > > 
> > > > I don't know that it makes sense to check after _every_ page.  I might
> > > > go for every segment.  Some users check after every thousand pages.
> > > > 
> > > 
> > > The handful of examples I come across on a brief scan (including the
> > > other iomap usage) have a similar pattern as used here. I don't doubt
> > > there are others, but I think I'd prefer to have more reasoning behind
> > > adding more code than might be necessary (i.e. do we expect additional
> > > overhead to be measurable here?). As it is, the intent isn't so much to
> > > check on every page as much as this just happens to be the common point
> > > of the function to cover both long bio chains and single vector bios
> > > with large numbers of pages.
> > 
> > It's been a while since I waded through the macro hell to find out what
> > cond_resched actually does, but iirc it can do some fairly heavyweight
> > things (disable preemption, call the scheduler, rcu stuff) which is why
> > we're supposed to be a little judicious about amortizing each call over
> > a few thousand pages.
> > 
> 
> It looks to me it just checks some state bit and only does any work if
> actually necessary. I suppose not doing that less often is cheaper than
> doing it more, but it's not clear to me it's enough that it really
> matters and/or warrants more code to filter out calls..
> 
> What exactly did you have in mind for logic? I suppose we could always
> stuff a 'if (!(count++ % 1024)) cond_resched();' or some such in the
> inner loop, but that might have less of an effect on larger chains
> constructed of bios with fewer pages (depending on whether that might
> still be possible).

I /was/ thinking about a function level page counter until I noticed
that iomap_{write,unshare}_actor call cond_resched for every page it
touches.  I withdraw the comment. :)

--D

> 
> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > 
>
Darrick J. Wong May 26, 2021, 1:19 a.m. UTC | #9
On Mon, May 24, 2021 at 09:53:05AM -0700, Darrick J. Wong wrote:
> On Mon, May 24, 2021 at 07:57:31AM -0400, Brian Foster wrote:
> > On Thu, May 20, 2021 at 02:58:58PM -0700, Darrick J. Wong wrote:
> > > On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote:
> > > > On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote:
> > > > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote:
> > > > > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
> > > > > >  			next = bio->bi_private;
> > > > > >  
> > > > > >  		/* walk each page on bio, ending page IO on them */
> > > > > > -		bio_for_each_segment_all(bv, bio, iter_all)
> > > > > > +		bio_for_each_segment_all(bv, bio, iter_all) {
> > > > > >  			iomap_finish_page_writeback(inode, bv->bv_page, error,
> > > > > >  					bv->bv_len);
> > > > > > +			if (!atomic)
> > > > > > +				cond_resched();
> > > > > > +		}
> > > > > 
> > > > > I don't know that it makes sense to check after _every_ page.  I might
> > > > > go for every segment.  Some users check after every thousand pages.
> > > > > 
> > > > 
> > > > The handful of examples I come across on a brief scan (including the
> > > > other iomap usage) have a similar pattern as used here. I don't doubt
> > > > there are others, but I think I'd prefer to have more reasoning behind
> > > > adding more code than might be necessary (i.e. do we expect additional
> > > > overhead to be measurable here?). As it is, the intent isn't so much to
> > > > check on every page as much as this just happens to be the common point
> > > > of the function to cover both long bio chains and single vector bios
> > > > with large numbers of pages.
> > > 
> > > It's been a while since I waded through the macro hell to find out what
> > > cond_resched actually does, but iirc it can do some fairly heavyweight
> > > things (disable preemption, call the scheduler, rcu stuff) which is why
> > > we're supposed to be a little judicious about amortizing each call over
> > > a few thousand pages.
> > > 
> > 
> > It looks to me it just checks some state bit and only does any work if
> > actually necessary. I suppose not doing that less often is cheaper than
> > doing it more, but it's not clear to me it's enough that it really
> > matters and/or warrants more code to filter out calls..
> > 
> > What exactly did you have in mind for logic? I suppose we could always
> > stuff a 'if (!(count++ % 1024)) cond_resched();' or some such in the
> > inner loop, but that might have less of an effect on larger chains
> > constructed of bios with fewer pages (depending on whether that might
> > still be possible).
> 
> I /was/ thinking about a function level page counter until I noticed
> that iomap_{write,unshare}_actor call cond_resched for every page it
> touches.  I withdraw the comment. :)

Oh, also:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> --D
> 
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > 
> >
diff mbox series

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 414769a6ad11..642422775e4e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1061,7 +1061,7 @@  iomap_finish_page_writeback(struct inode *inode, struct page *page,
  * ioend after this.
  */
 static void
-iomap_finish_ioend(struct iomap_ioend *ioend, int error)
+iomap_finish_ioend(struct iomap_ioend *ioend, int error, bool atomic)
 {
 	struct inode *inode = ioend->io_inode;
 	struct bio *bio = &ioend->io_inline_bio;
@@ -1084,9 +1084,12 @@  iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bv, bio, iter_all)
+		bio_for_each_segment_all(bv, bio, iter_all) {
 			iomap_finish_page_writeback(inode, bv->bv_page, error,
 					bv->bv_len);
+			if (!atomic)
+				cond_resched();
+		}
 		bio_put(bio);
 	}
 	/* The ioend has been freed by bio_put() */
@@ -1099,17 +1102,17 @@  iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 }
 
 void
-iomap_finish_ioends(struct iomap_ioend *ioend, int error)
+iomap_finish_ioends(struct iomap_ioend *ioend, int error, bool atomic)
 {
 	struct list_head tmp;
 
 	list_replace_init(&ioend->io_list, &tmp);
-	iomap_finish_ioend(ioend, error);
+	iomap_finish_ioend(ioend, error, atomic);
 
 	while (!list_empty(&tmp)) {
 		ioend = list_first_entry(&tmp, struct iomap_ioend, io_list);
 		list_del_init(&ioend->io_list);
-		iomap_finish_ioend(ioend, error);
+		iomap_finish_ioend(ioend, error, atomic);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_finish_ioends);
@@ -1178,7 +1181,7 @@  static void iomap_writepage_end_bio(struct bio *bio)
 {
 	struct iomap_ioend *ioend = bio->bi_private;
 
-	iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status));
+	iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status), true);
 }
 
 /*
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 9b08db45ce85..84cd6cf46b12 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -123,7 +123,7 @@  xfs_end_ioend(
 	if (!error && xfs_ioend_is_append(ioend))
 		error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
 done:
-	iomap_finish_ioends(ioend, error);
+	iomap_finish_ioends(ioend, error, false);
 	memalloc_nofs_restore(nofs_flag);
 }
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index d202fd2d0f91..07f3f4e69084 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -232,7 +232,7 @@  struct iomap_writepage_ctx {
 	const struct iomap_writeback_ops *ops;
 };
 
-void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
+void iomap_finish_ioends(struct iomap_ioend *ioend, int error, bool atomic);
 void iomap_ioend_try_merge(struct iomap_ioend *ioend,
 		struct list_head *more_ioends,
 		void (*merge_private)(struct iomap_ioend *ioend,