diff mbox series

iomap: Submit BIOs at the end of each extent

Message ID 20200319150720.24622-1-willy@infradead.org (mailing list archive)
State New, archived
Headers show
Series iomap: Submit BIOs at the end of each extent | expand

Commit Message

Matthew Wilcox March 19, 2020, 3:07 p.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

By definition, an extent covers a range of consecutive blocks, so
it would be quite rare to be able to just add pages to the BIO from
a previous range.  The only case we can think of is a mapped extent
followed by a hole extent, followed by another mapped extent which has
been allocated immediately after the first extent.  We believe this to
be an unlikely layout for a filesystem to choose and, since the queue
is plugged, those two BIOs would be merged by the block layer.

The reason we care is that ext2/ext4 choose to lay out blocks 0-11
consecutively, followed by the indirect block, and we want to merge those
two BIOs.  If we don't submit the data BIO before asking the filesystem
for the next extent, then the indirect BIO will be submitted first,
and waited for, leading to inefficient I/O patterns.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Darrick J. Wong March 19, 2020, 3:18 p.m. UTC | #1
On Thu, Mar 19, 2020 at 08:07:20AM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> By definition, an extent covers a range of consecutive blocks, so
> it would be quite rare to be able to just add pages to the BIO from
> a previous range.  The only case we can think of is a mapped extent
> followed by a hole extent, followed by another mapped extent which has
> been allocated immediately after the first extent.  We believe this to

Well... userspace can induce that with fallocate(INSERT_RANGE). :)

> be an unlikely layout for a filesystem to choose and, since the queue
> is plugged, those two BIOs would be merged by the block layer.
> 
> The reason we care is that ext2/ext4 choose to lay out blocks 0-11
> consecutively, followed by the indirect block, and we want to merge those
> two BIOs.  If we don't submit the data BIO before asking the filesystem
> for the next extent, then the indirect BIO will be submitted first,
> and waited for, leading to inefficient I/O patterns.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 83438b3257de..8d26920ddf00 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -388,6 +388,11 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
>  				ctx, iomap, srcmap);
>  	}
>  
> +	if (ctx->bio) {
> +		submit_bio(ctx->bio);
> +		ctx->bio = NULL;
> +	}

Makes sense, but could we have a quick comment here to capture why we're
submitting the bio here?

/*
 * Submit the bio now so that we neither combine IO requests for
 * non-adjacent ranges nor interleave data and metadata requests.
 */

--D

> +
>  	return done;
>  }
>  
> -- 
> 2.25.1
>
Matthew Wilcox March 19, 2020, 7:06 p.m. UTC | #2
On Thu, Mar 19, 2020 at 08:18:19AM -0700, Darrick J. Wong wrote:
> On Thu, Mar 19, 2020 at 08:07:20AM -0700, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > By definition, an extent covers a range of consecutive blocks, so
> > it would be quite rare to be able to just add pages to the BIO from
> > a previous range.  The only case we can think of is a mapped extent
> > followed by a hole extent, followed by another mapped extent which has
> > been allocated immediately after the first extent.  We believe this to
> 
> Well... userspace can induce that with fallocate(INSERT_RANGE). :)

It's not impossible, of course ... just unlikely.  Nobody actually uses
INSERT_RANGE anyway.

> > be an unlikely layout for a filesystem to choose and, since the queue
> > is plugged, those two BIOs would be merged by the block layer.
> > 
> > The reason we care is that ext2/ext4 choose to lay out blocks 0-11
> > consecutively, followed by the indirect block, and we want to merge those
> > two BIOs.  If we don't submit the data BIO before asking the filesystem
> > for the next extent, then the indirect BIO will be submitted first,
> > and waited for, leading to inefficient I/O patterns.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  fs/iomap/buffered-io.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 83438b3257de..8d26920ddf00 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -388,6 +388,11 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
> >  				ctx, iomap, srcmap);
> >  	}
> >  
> > +	if (ctx->bio) {
> > +		submit_bio(ctx->bio);
> > +		ctx->bio = NULL;
> > +	}
> 
> Makes sense, but could we have a quick comment here to capture why we're
> submitting the bio here?
> 
> /*
>  * Submit the bio now so that we neither combine IO requests for
>  * non-adjacent ranges nor interleave data and metadata requests.
>  */

How about:

         * Submitting the bio here leads to better I/O patterns for
         * filesystems which need to do metadata reads to find the
         * next range.

I also realised we can add:

@@ -454,8 +459,6 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
        }
        ret = 0;
 done:
-       if (ctx.bio)
-               submit_bio(ctx.bio);
        if (ctx.cur_page) {
                if (!ctx.cur_page_in_bio)
                        unlock_page(ctx.cur_page);

since we always subit the bio in readpages_actor.
Darrick J. Wong March 19, 2020, 7:08 p.m. UTC | #3
On Thu, Mar 19, 2020 at 12:06:46PM -0700, Matthew Wilcox wrote:
> On Thu, Mar 19, 2020 at 08:18:19AM -0700, Darrick J. Wong wrote:
> > On Thu, Mar 19, 2020 at 08:07:20AM -0700, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > 
> > > By definition, an extent covers a range of consecutive blocks, so
> > > it would be quite rare to be able to just add pages to the BIO from
> > > a previous range.  The only case we can think of is a mapped extent
> > > followed by a hole extent, followed by another mapped extent which has
> > > been allocated immediately after the first extent.  We believe this to
> > 
> > Well... userspace can induce that with fallocate(INSERT_RANGE). :)
> 
> It's not impossible, of course ... just unlikely.  Nobody actually uses
> INSERT_RANGE anyway.
> 
> > > be an unlikely layout for a filesystem to choose and, since the queue
> > > is plugged, those two BIOs would be merged by the block layer.
> > > 
> > > The reason we care is that ext2/ext4 choose to lay out blocks 0-11
> > > consecutively, followed by the indirect block, and we want to merge those
> > > two BIOs.  If we don't submit the data BIO before asking the filesystem
> > > for the next extent, then the indirect BIO will be submitted first,
> > > and waited for, leading to inefficient I/O patterns.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > >  fs/iomap/buffered-io.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index 83438b3257de..8d26920ddf00 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -388,6 +388,11 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
> > >  				ctx, iomap, srcmap);
> > >  	}
> > >  
> > > +	if (ctx->bio) {
> > > +		submit_bio(ctx->bio);
> > > +		ctx->bio = NULL;
> > > +	}
> > 
> > Makes sense, but could we have a quick comment here to capture why we're
> > submitting the bio here?
> > 
> > /*
> >  * Submit the bio now so that we neither combine IO requests for
> >  * non-adjacent ranges nor interleave data and metadata requests.
> >  */
> 
> How about:
> 
>          * Submitting the bio here leads to better I/O patterns for
>          * filesystems which need to do metadata reads to find the
>          * next range.

Oooh I like your version better.

> I also realised we can add:
> 
> @@ -454,8 +459,6 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
>         }
>         ret = 0;
>  done:
> -       if (ctx.bio)
> -               submit_bio(ctx.bio);
>         if (ctx.cur_page) {
>                 if (!ctx.cur_page_in_bio)
>                         unlock_page(ctx.cur_page);
> 
> since we always subit the bio in readpages_actor.

<nod> I'll keep an eye out for v2. :)

--D
diff mbox series

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 83438b3257de..8d26920ddf00 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -388,6 +388,11 @@  iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 				ctx, iomap, srcmap);
 	}
 
+	if (ctx->bio) {
+		submit_bio(ctx->bio);
+		ctx->bio = NULL;
+	}
+
 	return done;
 }