diff mbox series

[05/45] xfs: async blkdev cache flush

Message ID 20210305051143.182133-6-david@fromorbit.com (mailing list archive)
State New
Headers show
Series xfs: consolidated log and optimisation changes | expand

Commit Message

Dave Chinner March 5, 2021, 5:11 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

The new checkpoint caceh flush mechanism requires us to issue an
unconditional cache flush before we start a new checkpoint. We don't
want to block for this if we can help it, and we have a fair chunk
of CPU work to do between starting the checkpoint and issuing the
first journal IO.

Hence it makes sense to amortise the latency cost of the cache flush
by issuing it asynchronously and then waiting for it only when we
need to issue the first IO in the transaction.

TO do this, we need async cache flush primitives to submit the cache
flush bio and to wait on it. THe block layer has no such primitives
for filesystems, so roll our own for the moment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_linux.h  |  2 ++
 2 files changed, 38 insertions(+)

Comments

Chandan Babu R March 8, 2021, 9:48 a.m. UTC | #1
On 05 Mar 2021 at 10:41, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> The new checkpoint caceh flush mechanism requires us to issue an
> unconditional cache flush before we start a new checkpoint. We don't
> want to block for this if we can help it, and we have a fair chunk
> of CPU work to do between starting the checkpoint and issuing the
> first journal IO.
>
> Hence it makes sense to amortise the latency cost of the cache flush
> by issuing it asynchronously and then waiting for it only when we
> need to issue the first IO in the transaction.
>
> TO do this, we need async cache flush primitives to submit the cache
> flush bio and to wait on it. THe block layer has no such primitives
> for filesystems, so roll our own for the moment.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_linux.h  |  2 ++
>  2 files changed, 38 insertions(+)
>
> diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
> index 17f36db2f792..668f8bd27b4a 100644
> --- a/fs/xfs/xfs_bio_io.c
> +++ b/fs/xfs/xfs_bio_io.c
> @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count)
>  	return bio_max_segs(howmany(count, PAGE_SIZE));
>  }
>  
> +void
> +xfs_flush_bdev_async_endio(
> +	struct bio	*bio)
> +{
> +	if (bio->bi_private)
> +		complete(bio->bi_private);
> +}
> +
> +/*
> + * Submit a request for an async cache flush to run. If the request queue does
> + * not require flush operations, just skip it altogether. If the caller needsi
> + * to wait for the flush completion at a later point in time, they must supply a
> + * valid completion. This will be signalled when the flush completes.  The
> + * caller never sees the bio that is issued here.
> + */
> +void
> +xfs_flush_bdev_async(
> +	struct bio		*bio,
> +	struct block_device	*bdev,
> +	struct completion	*done)
> +{
> +	struct request_queue	*q = bdev->bd_disk->queue;
> +
> +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> +		complete(done);

complete() should be invoked only when "done" has a non-NULL value.

> +		return;
> +	}
Darrick J. Wong March 8, 2021, 10:24 p.m. UTC | #2
On Mon, Mar 08, 2021 at 03:18:09PM +0530, Chandan Babu R wrote:
> On 05 Mar 2021 at 10:41, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > The new checkpoint caceh flush mechanism requires us to issue an
> > unconditional cache flush before we start a new checkpoint. We don't
> > want to block for this if we can help it, and we have a fair chunk
> > of CPU work to do between starting the checkpoint and issuing the
> > first journal IO.
> >
> > Hence it makes sense to amortise the latency cost of the cache flush
> > by issuing it asynchronously and then waiting for it only when we
> > need to issue the first IO in the transaction.
> >
> > TO do this, we need async cache flush primitives to submit the cache
> > flush bio and to wait on it. THe block layer has no such primitives
> > for filesystems, so roll our own for the moment.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_linux.h  |  2 ++
> >  2 files changed, 38 insertions(+)
> >
> > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
> > index 17f36db2f792..668f8bd27b4a 100644
> > --- a/fs/xfs/xfs_bio_io.c
> > +++ b/fs/xfs/xfs_bio_io.c
> > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count)
> >  	return bio_max_segs(howmany(count, PAGE_SIZE));
> >  }
> >  
> > +void
> > +xfs_flush_bdev_async_endio(
> > +	struct bio	*bio)
> > +{
> > +	if (bio->bi_private)
> > +		complete(bio->bi_private);
> > +}
> > +
> > +/*
> > + * Submit a request for an async cache flush to run. If the request queue does
> > + * not require flush operations, just skip it altogether. If the caller needsi
> > + * to wait for the flush completion at a later point in time, they must supply a
> > + * valid completion. This will be signalled when the flush completes.  The
> > + * caller never sees the bio that is issued here.
> > + */
> > +void
> > +xfs_flush_bdev_async(
> > +	struct bio		*bio,
> > +	struct block_device	*bdev,
> > +	struct completion	*done)
> > +{
> > +	struct request_queue	*q = bdev->bd_disk->queue;
> > +
> > +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> > +		complete(done);
> 
> complete() should be invoked only when "done" has a non-NULL value.

The only caller always provides a completion.

--D

> > +		return;
> > +	}
> 
> -- 
> chandan
Darrick J. Wong March 8, 2021, 10:26 p.m. UTC | #3
On Fri, Mar 05, 2021 at 04:11:03PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The new checkpoint caceh flush mechanism requires us to issue an
> unconditional cache flush before we start a new checkpoint. We don't
> want to block for this if we can help it, and we have a fair chunk
> of CPU work to do between starting the checkpoint and issuing the
> first journal IO.
> 
> Hence it makes sense to amortise the latency cost of the cache flush
> by issuing it asynchronously and then waiting for it only when we
> need to issue the first IO in the transaction.
> 
> TO do this, we need async cache flush primitives to submit the cache
> flush bio and to wait on it. THe block layer has no such primitives
> for filesystems, so roll our own for the moment.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_linux.h  |  2 ++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
> index 17f36db2f792..668f8bd27b4a 100644
> --- a/fs/xfs/xfs_bio_io.c
> +++ b/fs/xfs/xfs_bio_io.c
> @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count)
>  	return bio_max_segs(howmany(count, PAGE_SIZE));
>  }
>  
> +void

static void?

> +xfs_flush_bdev_async_endio(
> +	struct bio	*bio)
> +{
> +	if (bio->bi_private)
> +		complete(bio->bi_private);

Er... when would bi_private be null?  We always set it in
xfs_flush_bdev_async, and nobody else uses this helper, right?

--D

> +}
> +
> +/*
> + * Submit a request for an async cache flush to run. If the request queue does
> + * not require flush operations, just skip it altogether. If the caller needsi
> + * to wait for the flush completion at a later point in time, they must supply a
> + * valid completion. This will be signalled when the flush completes.  The
> + * caller never sees the bio that is issued here.
> + */
> +void
> +xfs_flush_bdev_async(
> +	struct bio		*bio,
> +	struct block_device	*bdev,
> +	struct completion	*done)
> +{
> +	struct request_queue	*q = bdev->bd_disk->queue;
> +
> +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> +		complete(done);
> +		return;
> +	}
> +
> +	bio_init(bio, NULL, 0);
> +	bio_set_dev(bio, bdev);
> +	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
> +	bio->bi_private = done;
> +	bio->bi_end_io = xfs_flush_bdev_async_endio;
> +
> +	submit_bio(bio);
> +}
>  int
>  xfs_rw_bdev(
>  	struct block_device	*bdev,
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index af6be9b9ccdf..953d98bc4832 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -196,6 +196,8 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
>  
>  int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count,
>  		char *data, unsigned int op);
> +void xfs_flush_bdev_async(struct bio *bio, struct block_device *bdev,
> +		struct completion *done);
>  
>  #define ASSERT_ALWAYS(expr)	\
>  	(likely(expr) ? (void)0 : assfail(NULL, #expr, __FILE__, __LINE__))
> -- 
> 2.28.0
>
Brian Foster March 15, 2021, 2:41 p.m. UTC | #4
On Mon, Mar 08, 2021 at 02:24:07PM -0800, Darrick J. Wong wrote:
> On Mon, Mar 08, 2021 at 03:18:09PM +0530, Chandan Babu R wrote:
> > On 05 Mar 2021 at 10:41, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > The new checkpoint caceh flush mechanism requires us to issue an
> > > unconditional cache flush before we start a new checkpoint. We don't
> > > want to block for this if we can help it, and we have a fair chunk
> > > of CPU work to do between starting the checkpoint and issuing the
> > > first journal IO.
> > >
> > > Hence it makes sense to amortise the latency cost of the cache flush
> > > by issuing it asynchronously and then waiting for it only when we
> > > need to issue the first IO in the transaction.
> > >
> > > TO do this, we need async cache flush primitives to submit the cache
> > > flush bio and to wait on it. THe block layer has no such primitives
> > > for filesystems, so roll our own for the moment.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_linux.h  |  2 ++
> > >  2 files changed, 38 insertions(+)
> > >
> > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
> > > index 17f36db2f792..668f8bd27b4a 100644
> > > --- a/fs/xfs/xfs_bio_io.c
> > > +++ b/fs/xfs/xfs_bio_io.c
> > > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count)
> > >  	return bio_max_segs(howmany(count, PAGE_SIZE));
> > >  }
> > >  
> > > +void
> > > +xfs_flush_bdev_async_endio(
> > > +	struct bio	*bio)
> > > +{
> > > +	if (bio->bi_private)
> > > +		complete(bio->bi_private);
> > > +}
> > > +
> > > +/*
> > > + * Submit a request for an async cache flush to run. If the request queue does
> > > + * not require flush operations, just skip it altogether. If the caller needsi
> > > + * to wait for the flush completion at a later point in time, they must supply a
> > > + * valid completion. This will be signalled when the flush completes.  The
> > > + * caller never sees the bio that is issued here.
> > > + */
> > > +void
> > > +xfs_flush_bdev_async(
> > > +	struct bio		*bio,
> > > +	struct block_device	*bdev,
> > > +	struct completion	*done)
> > > +{
> > > +	struct request_queue	*q = bdev->bd_disk->queue;
> > > +
> > > +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> > > +		complete(done);
> > 
> > complete() should be invoked only when "done" has a non-NULL value.
> 
> The only caller always provides a completion.
> 

IMO, if the mechanism (i.e. the helper) accommodates a NULL parameter,
the underlying completion callback should as well..

Brian

> --D
> 
> > > +		return;
> > > +	}
> > 
> > -- 
> > chandan
>
Brian Foster March 15, 2021, 2:42 p.m. UTC | #5
On Fri, Mar 05, 2021 at 04:11:03PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The new checkpoint caceh flush mechanism requires us to issue an

		     cache

> unconditional cache flush before we start a new checkpoint. We don't
> want to block for this if we can help it, and we have a fair chunk
> of CPU work to do between starting the checkpoint and issuing the
> first journal IO.
> 
> Hence it makes sense to amortise the latency cost of the cache flush
> by issuing it asynchronously and then waiting for it only when we
> need to issue the first IO in the transaction.
> 
> TO do this, we need async cache flush primitives to submit the cache

  To

> flush bio and to wait on it. THe block layer has no such primitives

			       The

> for filesystems, so roll our own for the moment.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_linux.h  |  2 ++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
> index 17f36db2f792..668f8bd27b4a 100644
> --- a/fs/xfs/xfs_bio_io.c
> +++ b/fs/xfs/xfs_bio_io.c
> @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count)
>  	return bio_max_segs(howmany(count, PAGE_SIZE));
>  }
>  
> +void
> +xfs_flush_bdev_async_endio(
> +	struct bio	*bio)
> +{
> +	if (bio->bi_private)
> +		complete(bio->bi_private);
> +}
> +
> +/*
> + * Submit a request for an async cache flush to run. If the request queue does
> + * not require flush operations, just skip it altogether. If the caller needsi

									   needs

> + * to wait for the flush completion at a later point in time, they must supply a
> + * valid completion. This will be signalled when the flush completes.  The
> + * caller never sees the bio that is issued here.
> + */
> +void
> +xfs_flush_bdev_async(
> +	struct bio		*bio,
> +	struct block_device	*bdev,
> +	struct completion	*done)
> +{
> +	struct request_queue	*q = bdev->bd_disk->queue;
> +

It seems rather odd to me to accept a bio here and then init it, but I
see this was explicitly changed from the previous version to avoid an
allocation (I'd rather see the bio in the CIL context or something
rather than dropped on the stack, but whatever).

> +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> +		complete(done);

The NULL or no NULL debate aside, this should be consistent with the
logic in the callback (IMO, just check for NULL here as Chandan
suggested). With that fixed up, one way or the other:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +		return;
> +	}
> +
> +	bio_init(bio, NULL, 0);
> +	bio_set_dev(bio, bdev);
> +	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
> +	bio->bi_private = done;
> +	bio->bi_end_io = xfs_flush_bdev_async_endio;
> +
> +	submit_bio(bio);
> +}
>  int
>  xfs_rw_bdev(
>  	struct block_device	*bdev,
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index af6be9b9ccdf..953d98bc4832 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -196,6 +196,8 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
>  
>  int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count,
>  		char *data, unsigned int op);
> +void xfs_flush_bdev_async(struct bio *bio, struct block_device *bdev,
> +		struct completion *done);
>  
>  #define ASSERT_ALWAYS(expr)	\
>  	(likely(expr) ? (void)0 : assfail(NULL, #expr, __FILE__, __LINE__))
> -- 
> 2.28.0
>
Darrick J. Wong March 15, 2021, 4:32 p.m. UTC | #6
On Mon, Mar 15, 2021 at 10:41:13AM -0400, Brian Foster wrote:
> On Mon, Mar 08, 2021 at 02:24:07PM -0800, Darrick J. Wong wrote:
> > On Mon, Mar 08, 2021 at 03:18:09PM +0530, Chandan Babu R wrote:
> > > On 05 Mar 2021 at 10:41, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > The new checkpoint caceh flush mechanism requires us to issue an
> > > > unconditional cache flush before we start a new checkpoint. We don't
> > > > want to block for this if we can help it, and we have a fair chunk
> > > > of CPU work to do between starting the checkpoint and issuing the
> > > > first journal IO.
> > > >
> > > > Hence it makes sense to amortise the latency cost of the cache flush
> > > > by issuing it asynchronously and then waiting for it only when we
> > > > need to issue the first IO in the transaction.
> > > >
> > > > TO do this, we need async cache flush primitives to submit the cache
> > > > flush bio and to wait on it. THe block layer has no such primitives
> > > > for filesystems, so roll our own for the moment.
> > > >
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/xfs_linux.h  |  2 ++
> > > >  2 files changed, 38 insertions(+)
> > > >
> > > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
> > > > index 17f36db2f792..668f8bd27b4a 100644
> > > > --- a/fs/xfs/xfs_bio_io.c
> > > > +++ b/fs/xfs/xfs_bio_io.c
> > > > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count)
> > > >  	return bio_max_segs(howmany(count, PAGE_SIZE));
> > > >  }
> > > >  
> > > > +void
> > > > +xfs_flush_bdev_async_endio(
> > > > +	struct bio	*bio)
> > > > +{
> > > > +	if (bio->bi_private)
> > > > +		complete(bio->bi_private);
> > > > +}
> > > > +
> > > > +/*
> > > > + * Submit a request for an async cache flush to run. If the request queue does
> > > > + * not require flush operations, just skip it altogether. If the caller needsi
> > > > + * to wait for the flush completion at a later point in time, they must supply a
> > > > + * valid completion. This will be signalled when the flush completes.  The
> > > > + * caller never sees the bio that is issued here.
> > > > + */
> > > > +void
> > > > +xfs_flush_bdev_async(
> > > > +	struct bio		*bio,
> > > > +	struct block_device	*bdev,
> > > > +	struct completion	*done)
> > > > +{
> > > > +	struct request_queue	*q = bdev->bd_disk->queue;
> > > > +
> > > > +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> > > > +		complete(done);
> > > 
> > > complete() should be invoked only when "done" has a non-NULL value.
> > 
> > The only caller always provides a completion.
> > 
> 
> IMO, if the mechanism (i.e. the helper) accommodates a NULL parameter,
> the underlying completion callback should as well..

Yes, I agree with that principle.  However, the use case for !done isn't
clear ot me -- what is the point of issuing a flush and not waiting for
the results?

Can PREFLUSHes generate IO errors?  And if they do, why don't we return
the error to the caller?

--D

> Brian
> 
> > --D
> > 
> > > > +		return;
> > > > +	}
> > > 
> > > -- 
> > > chandan
> > 
>
Christoph Hellwig March 16, 2021, 8:43 a.m. UTC | #7
On Mon, Mar 15, 2021 at 09:32:22AM -0700, Darrick J. Wong wrote:
> Yes, I agree with that principle.  However, the use case for !done isn't
> clear ot me -- what is the point of issuing a flush and not waiting for
> the results?

There is none.  This check should go away.

> 
> Can PREFLUSHes generate IO errors?  And if they do, why don't we return
> the error to the caller?

The caller owns the bio and can look at bi_status itself.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
index 17f36db2f792..668f8bd27b4a 100644
--- a/fs/xfs/xfs_bio_io.c
+++ b/fs/xfs/xfs_bio_io.c
@@ -9,6 +9,42 @@  static inline unsigned int bio_max_vecs(unsigned int count)
 	return bio_max_segs(howmany(count, PAGE_SIZE));
 }
 
+void
+xfs_flush_bdev_async_endio(
+	struct bio	*bio)
+{
+	if (bio->bi_private)
+		complete(bio->bi_private);
+}
+
+/*
+ * Submit a request for an async cache flush to run. If the request queue does
+ * not require flush operations, just skip it altogether. If the caller needsi
+ * to wait for the flush completion at a later point in time, they must supply a
+ * valid completion. This will be signalled when the flush completes.  The
+ * caller never sees the bio that is issued here.
+ */
+void
+xfs_flush_bdev_async(
+	struct bio		*bio,
+	struct block_device	*bdev,
+	struct completion	*done)
+{
+	struct request_queue	*q = bdev->bd_disk->queue;
+
+	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
+		complete(done);
+		return;
+	}
+
+	bio_init(bio, NULL, 0);
+	bio_set_dev(bio, bdev);
+	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
+	bio->bi_private = done;
+	bio->bi_end_io = xfs_flush_bdev_async_endio;
+
+	submit_bio(bio);
+}
 int
 xfs_rw_bdev(
 	struct block_device	*bdev,
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index af6be9b9ccdf..953d98bc4832 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -196,6 +196,8 @@  static inline uint64_t howmany_64(uint64_t x, uint32_t y)
 
 int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count,
 		char *data, unsigned int op);
+void xfs_flush_bdev_async(struct bio *bio, struct block_device *bdev,
+		struct completion *done);
 
 #define ASSERT_ALWAYS(expr)	\
 	(likely(expr) ? (void)0 : assfail(NULL, #expr, __FILE__, __LINE__))