Message ID | 20210305051143.182133-6-david@fromorbit.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xfs: consolidated log and optimisation changes | expand |
On 05 Mar 2021 at 10:41, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > The new checkpoint caceh flush mechanism requires us to issue an > unconditional cache flush before we start a new checkpoint. We don't > want to block for this if we can help it, and we have a fair chunk > of CPU work to do between starting the checkpoint and issuing the > first journal IO. > > Hence it makes sense to amortise the latency cost of the cache flush > by issuing it asynchronously and then waiting for it only when we > need to issue the first IO in the transaction. > > TO do this, we need async cache flush primitives to submit the cache > flush bio and to wait on it. THe block layer has no such primitives > for filesystems, so roll our own for the moment. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_linux.h | 2 ++ > 2 files changed, 38 insertions(+) > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c > index 17f36db2f792..668f8bd27b4a 100644 > --- a/fs/xfs/xfs_bio_io.c > +++ b/fs/xfs/xfs_bio_io.c > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count) > return bio_max_segs(howmany(count, PAGE_SIZE)); > } > > +void > +xfs_flush_bdev_async_endio( > + struct bio *bio) > +{ > + if (bio->bi_private) > + complete(bio->bi_private); > +} > + > +/* > + * Submit a request for an async cache flush to run. If the request queue does > + * not require flush operations, just skip it altogether. If the caller needsi > + * to wait for the flush completion at a later point in time, they must supply a > + * valid completion. This will be signalled when the flush completes. The > + * caller never sees the bio that is issued here. > + */ > +void > +xfs_flush_bdev_async( > + struct bio *bio, > + struct block_device *bdev, > + struct completion *done) > +{ > + struct request_queue *q = bdev->bd_disk->queue; > + > + if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) { > + complete(done); complete() should be invoked only when "done" has a non-NULL value. > + return; > + }
On Mon, Mar 08, 2021 at 03:18:09PM +0530, Chandan Babu R wrote: > On 05 Mar 2021 at 10:41, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > The new checkpoint caceh flush mechanism requires us to issue an > > unconditional cache flush before we start a new checkpoint. We don't > > want to block for this if we can help it, and we have a fair chunk > > of CPU work to do between starting the checkpoint and issuing the > > first journal IO. > > > > Hence it makes sense to amortise the latency cost of the cache flush > > by issuing it asynchronously and then waiting for it only when we > > need to issue the first IO in the transaction. > > > > TO do this, we need async cache flush primitives to submit the cache > > flush bio and to wait on it. THe block layer has no such primitives > > for filesystems, so roll our own for the moment. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++ > > fs/xfs/xfs_linux.h | 2 ++ > > 2 files changed, 38 insertions(+) > > > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c > > index 17f36db2f792..668f8bd27b4a 100644 > > --- a/fs/xfs/xfs_bio_io.c > > +++ b/fs/xfs/xfs_bio_io.c > > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count) > > return bio_max_segs(howmany(count, PAGE_SIZE)); > > } > > > > +void > > +xfs_flush_bdev_async_endio( > > + struct bio *bio) > > +{ > > + if (bio->bi_private) > > + complete(bio->bi_private); > > +} > > + > > +/* > > + * Submit a request for an async cache flush to run. If the request queue does > > + * not require flush operations, just skip it altogether. If the caller needsi > > + * to wait for the flush completion at a later point in time, they must supply a > > + * valid completion. This will be signalled when the flush completes. The > > + * caller never sees the bio that is issued here. > > + */ > > +void > > +xfs_flush_bdev_async( > > + struct bio *bio, > > + struct block_device *bdev, > > + struct completion *done) > > +{ > > + struct request_queue *q = bdev->bd_disk->queue; > > + > > + if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) { > > + complete(done); > > complete() should be invoked only when "done" has a non-NULL value. The only caller always provides a completion. --D > > + return; > > + } > > -- > chandan
On Fri, Mar 05, 2021 at 04:11:03PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > The new checkpoint caceh flush mechanism requires us to issue an > unconditional cache flush before we start a new checkpoint. We don't > want to block for this if we can help it, and we have a fair chunk > of CPU work to do between starting the checkpoint and issuing the > first journal IO. > > Hence it makes sense to amortise the latency cost of the cache flush > by issuing it asynchronously and then waiting for it only when we > need to issue the first IO in the transaction. > > TO do this, we need async cache flush primitives to submit the cache > flush bio and to wait on it. THe block layer has no such primitives > for filesystems, so roll our own for the moment. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_linux.h | 2 ++ > 2 files changed, 38 insertions(+) > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c > index 17f36db2f792..668f8bd27b4a 100644 > --- a/fs/xfs/xfs_bio_io.c > +++ b/fs/xfs/xfs_bio_io.c > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count) > return bio_max_segs(howmany(count, PAGE_SIZE)); > } > > +void static void? > +xfs_flush_bdev_async_endio( > + struct bio *bio) > +{ > + if (bio->bi_private) > + complete(bio->bi_private); Er... when would bi_private be null? We always set it in xfs_flush_bdev_async, and nobody else uses this helper, right? --D > +} > + > +/* > + * Submit a request for an async cache flush to run. If the request queue does > + * not require flush operations, just skip it altogether. If the caller needsi > + * to wait for the flush completion at a later point in time, they must supply a > + * valid completion. This will be signalled when the flush completes. The > + * caller never sees the bio that is issued here. > + */ > +void > +xfs_flush_bdev_async( > + struct bio *bio, > + struct block_device *bdev, > + struct completion *done) > +{ > + struct request_queue *q = bdev->bd_disk->queue; > + > + if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) { > + complete(done); > + return; > + } > + > + bio_init(bio, NULL, 0); > + bio_set_dev(bio, bdev); > + bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; > + bio->bi_private = done; > + bio->bi_end_io = xfs_flush_bdev_async_endio; > + > + submit_bio(bio); > +} > int > xfs_rw_bdev( > struct block_device *bdev, > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h > index af6be9b9ccdf..953d98bc4832 100644 > --- a/fs/xfs/xfs_linux.h > +++ b/fs/xfs/xfs_linux.h > @@ -196,6 +196,8 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y) > > int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, > char *data, unsigned int op); > +void xfs_flush_bdev_async(struct bio *bio, struct block_device *bdev, > + struct completion *done); > > #define ASSERT_ALWAYS(expr) \ > (likely(expr) ? (void)0 : assfail(NULL, #expr, __FILE__, __LINE__)) > -- > 2.28.0 >
On Mon, Mar 08, 2021 at 02:24:07PM -0800, Darrick J. Wong wrote: > On Mon, Mar 08, 2021 at 03:18:09PM +0530, Chandan Babu R wrote: > > On 05 Mar 2021 at 10:41, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > The new checkpoint caceh flush mechanism requires us to issue an > > > unconditional cache flush before we start a new checkpoint. We don't > > > want to block for this if we can help it, and we have a fair chunk > > > of CPU work to do between starting the checkpoint and issuing the > > > first journal IO. > > > > > > Hence it makes sense to amortise the latency cost of the cache flush > > > by issuing it asynchronously and then waiting for it only when we > > > need to issue the first IO in the transaction. > > > > > > TO do this, we need async cache flush primitives to submit the cache > > > flush bio and to wait on it. THe block layer has no such primitives > > > for filesystems, so roll our own for the moment. > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > --- > > > fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++ > > > fs/xfs/xfs_linux.h | 2 ++ > > > 2 files changed, 38 insertions(+) > > > > > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c > > > index 17f36db2f792..668f8bd27b4a 100644 > > > --- a/fs/xfs/xfs_bio_io.c > > > +++ b/fs/xfs/xfs_bio_io.c > > > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count) > > > return bio_max_segs(howmany(count, PAGE_SIZE)); > > > } > > > > > > +void > > > +xfs_flush_bdev_async_endio( > > > + struct bio *bio) > > > +{ > > > + if (bio->bi_private) > > > + complete(bio->bi_private); > > > +} > > > + > > > +/* > > > + * Submit a request for an async cache flush to run. If the request queue does > > > + * not require flush operations, just skip it altogether. If the caller needsi > > > + * to wait for the flush completion at a later point in time, they must supply a > > > + * valid completion. This will be signalled when the flush completes. The > > > + * caller never sees the bio that is issued here. > > > + */ > > > +void > > > +xfs_flush_bdev_async( > > > + struct bio *bio, > > > + struct block_device *bdev, > > > + struct completion *done) > > > +{ > > > + struct request_queue *q = bdev->bd_disk->queue; > > > + > > > + if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) { > > > + complete(done); > > > > complete() should be invoked only when "done" has a non-NULL value. > > The only caller always provides a completion. > IMO, if the mechanism (i.e. the helper) accommodates a NULL parameter, the underlying completion callback should as well.. Brian > --D > > > > + return; > > > + } > > > > -- > > chandan >
On Fri, Mar 05, 2021 at 04:11:03PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > The new checkpoint caceh flush mechanism requires us to issue an cache > unconditional cache flush before we start a new checkpoint. We don't > want to block for this if we can help it, and we have a fair chunk > of CPU work to do between starting the checkpoint and issuing the > first journal IO. > > Hence it makes sense to amortise the latency cost of the cache flush > by issuing it asynchronously and then waiting for it only when we > need to issue the first IO in the transaction. > > TO do this, we need async cache flush primitives to submit the cache To > flush bio and to wait on it. THe block layer has no such primitives The > for filesystems, so roll our own for the moment. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_linux.h | 2 ++ > 2 files changed, 38 insertions(+) > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c > index 17f36db2f792..668f8bd27b4a 100644 > --- a/fs/xfs/xfs_bio_io.c > +++ b/fs/xfs/xfs_bio_io.c > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count) > return bio_max_segs(howmany(count, PAGE_SIZE)); > } > > +void > +xfs_flush_bdev_async_endio( > + struct bio *bio) > +{ > + if (bio->bi_private) > + complete(bio->bi_private); > +} > + > +/* > + * Submit a request for an async cache flush to run. If the request queue does > + * not require flush operations, just skip it altogether. If the caller needsi needs > + * to wait for the flush completion at a later point in time, they must supply a > + * valid completion. This will be signalled when the flush completes. The > + * caller never sees the bio that is issued here. > + */ > +void > +xfs_flush_bdev_async( > + struct bio *bio, > + struct block_device *bdev, > + struct completion *done) > +{ > + struct request_queue *q = bdev->bd_disk->queue; > + It seems rather odd to me to accept a bio here and then init it, but I see this was explicitly changed from the previous version to avoid an allocation (I'd rather see the bio in the CIL context or something rather than dropped on the stack, but whatever). > + if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) { > + complete(done); The NULL or no NULL debate aside, this should be consistent with the logic in the callback (IMO, just check for NULL here as Chandan suggested). With that fixed up, one way or the other: Reviewed-by: Brian Foster <bfoster@redhat.com> > + return; > + } > + > + bio_init(bio, NULL, 0); > + bio_set_dev(bio, bdev); > + bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; > + bio->bi_private = done; > + bio->bi_end_io = xfs_flush_bdev_async_endio; > + > + submit_bio(bio); > +} > int > xfs_rw_bdev( > struct block_device *bdev, > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h > index af6be9b9ccdf..953d98bc4832 100644 > --- a/fs/xfs/xfs_linux.h > +++ b/fs/xfs/xfs_linux.h > @@ -196,6 +196,8 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y) > > int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, > char *data, unsigned int op); > +void xfs_flush_bdev_async(struct bio *bio, struct block_device *bdev, > + struct completion *done); > > #define ASSERT_ALWAYS(expr) \ > (likely(expr) ? (void)0 : assfail(NULL, #expr, __FILE__, __LINE__)) > -- > 2.28.0 >
On Mon, Mar 15, 2021 at 10:41:13AM -0400, Brian Foster wrote: > On Mon, Mar 08, 2021 at 02:24:07PM -0800, Darrick J. Wong wrote: > > On Mon, Mar 08, 2021 at 03:18:09PM +0530, Chandan Babu R wrote: > > > On 05 Mar 2021 at 10:41, Dave Chinner wrote: > > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > > > The new checkpoint caceh flush mechanism requires us to issue an > > > > unconditional cache flush before we start a new checkpoint. We don't > > > > want to block for this if we can help it, and we have a fair chunk > > > > of CPU work to do between starting the checkpoint and issuing the > > > > first journal IO. > > > > > > > > Hence it makes sense to amortise the latency cost of the cache flush > > > > by issuing it asynchronously and then waiting for it only when we > > > > need to issue the first IO in the transaction. > > > > > > > > TO do this, we need async cache flush primitives to submit the cache > > > > flush bio and to wait on it. THe block layer has no such primitives > > > > for filesystems, so roll our own for the moment. > > > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > > --- > > > > fs/xfs/xfs_bio_io.c | 36 ++++++++++++++++++++++++++++++++++++ > > > > fs/xfs/xfs_linux.h | 2 ++ > > > > 2 files changed, 38 insertions(+) > > > > > > > > diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c > > > > index 17f36db2f792..668f8bd27b4a 100644 > > > > --- a/fs/xfs/xfs_bio_io.c > > > > +++ b/fs/xfs/xfs_bio_io.c > > > > @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count) > > > > return bio_max_segs(howmany(count, PAGE_SIZE)); > > > > } > > > > > > > > +void > > > > +xfs_flush_bdev_async_endio( > > > > + struct bio *bio) > > > > +{ > > > > + if (bio->bi_private) > > > > + complete(bio->bi_private); > > > > +} > > > > + > > > > +/* > > > > + * Submit a request for an async cache flush to run. If the request queue does > > > > + * not require flush operations, just skip it altogether. If the caller needsi > > > > + * to wait for the flush completion at a later point in time, they must supply a > > > > + * valid completion. This will be signalled when the flush completes. The > > > > + * caller never sees the bio that is issued here. > > > > + */ > > > > +void > > > > +xfs_flush_bdev_async( > > > > + struct bio *bio, > > > > + struct block_device *bdev, > > > > + struct completion *done) > > > > +{ > > > > + struct request_queue *q = bdev->bd_disk->queue; > > > > + > > > > + if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) { > > > > + complete(done); > > > > > > complete() should be invoked only when "done" has a non-NULL value. > > > > The only caller always provides a completion. > > > > IMO, if the mechanism (i.e. the helper) accommodates a NULL parameter, > the underlying completion callback should as well.. Yes, I agree with that principle. However, the use case for !done isn't clear ot me -- what is the point of issuing a flush and not waiting for the results? Can PREFLUSHes generate IO errors? And if they do, why don't we return the error to the caller? --D > Brian > > > --D > > > > > > + return; > > > > + } > > > > > > -- > > > chandan > > >
On Mon, Mar 15, 2021 at 09:32:22AM -0700, Darrick J. Wong wrote: > Yes, I agree with that principle. However, the use case for !done isn't > clear ot me -- what is the point of issuing a flush and not waiting for > the results? There is none. This check should go away. > > Can PREFLUSHes generate IO errors? And if they do, why don't we return > the error to the caller? The caller owns the bio and can look at bi_status itself.
diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c index 17f36db2f792..668f8bd27b4a 100644 --- a/fs/xfs/xfs_bio_io.c +++ b/fs/xfs/xfs_bio_io.c @@ -9,6 +9,42 @@ static inline unsigned int bio_max_vecs(unsigned int count) return bio_max_segs(howmany(count, PAGE_SIZE)); } +void +xfs_flush_bdev_async_endio( + struct bio *bio) +{ + if (bio->bi_private) + complete(bio->bi_private); +} + +/* + * Submit a request for an async cache flush to run. If the request queue does + * not require flush operations, just skip it altogether. If the caller needsi + * to wait for the flush completion at a later point in time, they must supply a + * valid completion. This will be signalled when the flush completes. The + * caller never sees the bio that is issued here. + */ +void +xfs_flush_bdev_async( + struct bio *bio, + struct block_device *bdev, + struct completion *done) +{ + struct request_queue *q = bdev->bd_disk->queue; + + if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) { + complete(done); + return; + } + + bio_init(bio, NULL, 0); + bio_set_dev(bio, bdev); + bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; + bio->bi_private = done; + bio->bi_end_io = xfs_flush_bdev_async_endio; + + submit_bio(bio); +} int xfs_rw_bdev( struct block_device *bdev, diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index af6be9b9ccdf..953d98bc4832 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -196,6 +196,8 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y) int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, char *data, unsigned int op); +void xfs_flush_bdev_async(struct bio *bio, struct block_device *bdev, + struct completion *done); #define ASSERT_ALWAYS(expr) \ (likely(expr) ? (void)0 : assfail(NULL, #expr, __FILE__, __LINE__))