Message ID | a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | dm: don't reuse bio for flushes | expand |
On Wed, Dec 19, 2018 at 08:50:09AM -0700, Jens Axboe wrote: > DM currently has a statically allocated bio that it uses to issue empty > flushes. It doesn't submit this bio, it just uses it for maintaining > state while setting up clones. Multiple users can access this bio at the > same time. This wasn't previously an issue, even if it was a bit iffy, > but with the blkg associations it can become one. > > We setup the blkg association, then clone bio's and submit, then remove > the blkg assocation again. But since we can have multiple tasks doing > this at the same time, against multiple blkg's, then we can either lose > references to a blkg, or put it twice. The latter causes complaints on > the percpu ref being <= 0 when released, and can cause use-after-free as > well. Ming reports that xfstest generic/475 triggers this: > > ------------[ cut here ]------------ > percpu ref (blkg_release) <= 0 (0) after switching to atomic > WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 > > Switch to just using an on-stack bio for this, and get rid of the > embedded bio. > > Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") > Reported-by: Ming Lei <ming.lei@redhat.com> > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > --- > > Only other use case like this I could find is raid5-cache, which > serializes access to its embedded bio (and actually submits it, too). > That one looks fine. With this patch, xfstests(generic/475) won't trigger the warning of 'percpu ref (blkg_release) <= 0 (0) after switching to atomic' any more. Tested-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming
On 12/19/18 9:06 AM, Ming Lei wrote: > On Wed, Dec 19, 2018 at 08:50:09AM -0700, Jens Axboe wrote: >> DM currently has a statically allocated bio that it uses to issue empty >> flushes. It doesn't submit this bio, it just uses it for maintaining >> state while setting up clones. Multiple users can access this bio at the >> same time. This wasn't previously an issue, even if it was a bit iffy, >> but with the blkg associations it can become one. >> >> We setup the blkg association, then clone bio's and submit, then remove >> the blkg assocation again. But since we can have multiple tasks doing >> this at the same time, against multiple blkg's, then we can either lose >> references to a blkg, or put it twice. The latter causes complaints on >> the percpu ref being <= 0 when released, and can cause use-after-free as >> well. Ming reports that xfstest generic/475 triggers this: >> >> ------------[ cut here ]------------ >> percpu ref (blkg_release) <= 0 (0) after switching to atomic >> WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 >> >> Switch to just using an on-stack bio for this, and get rid of the >> embedded bio. >> >> Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") >> Reported-by: Ming Lei <ming.lei@redhat.com> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> >> --- >> >> Only other use case like this I could find is raid5-cache, which >> serializes access to its embedded bio (and actually submits it, too). >> That one looks fine. > > With this patch, xfstests(generic/475) won't trigger the warning of > 'percpu ref (blkg_release) <= 0 (0) after switching to atomic' any more. > > Tested-by: Ming Lei <ming.lei@redhat.com> Great, thanks Ming. I think this was somewhat of a ticking time bomb, so good to have it fixed even if we could have done without a bug. And thanks for all the testing!
On Wed, Dec 19 2018 at 10:50am -0500, Jens Axboe <axboe@kernel.dk> wrote: > DM currently has a statically allocated bio that it uses to issue empty > flushes. It doesn't submit this bio, it just uses it for maintaining > state while setting up clones. Multiple users can access this bio at the > same time. This wasn't previously an issue, even if it was a bit iffy, > but with the blkg associations it can become one. > > We setup the blkg association, then clone bio's and submit, then remove > the blkg assocation again. But since we can have multiple tasks doing > this at the same time, against multiple blkg's, then we can either lose > references to a blkg, or put it twice. The latter causes complaints on > the percpu ref being <= 0 when released, and can cause use-after-free as > well. Ming reports that xfstest generic/475 triggers this: > > ------------[ cut here ]------------ > percpu ref (blkg_release) <= 0 (0) after switching to atomic > WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 > > Switch to just using an on-stack bio for this, and get rid of the > embedded bio. > > Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") > Reported-by: Ming Lei <ming.lei@redhat.com> > Signed-off-by: Jens Axboe <axboe@kernel.dk> Thanks for sorting this one out, definitely wasn't happy with how exposed DM was left with the recent blkg changes. This is clearly better. Acked-by: Mike Snitzer <snitzer@redhat.com>
On Wed, Dec 19 2018 at 11:11am -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Wed, Dec 19 2018 at 10:50am -0500, > Jens Axboe <axboe@kernel.dk> wrote: > > > DM currently has a statically allocated bio that it uses to issue empty > > flushes. It doesn't submit this bio, it just uses it for maintaining > > state while setting up clones. Multiple users can access this bio at the > > same time. This wasn't previously an issue, even if it was a bit iffy, > > but with the blkg associations it can become one. > > > > We setup the blkg association, then clone bio's and submit, then remove > > the blkg assocation again. But since we can have multiple tasks doing > > this at the same time, against multiple blkg's, then we can either lose > > references to a blkg, or put it twice. The latter causes complaints on > > the percpu ref being <= 0 when released, and can cause use-after-free as > > well. Ming reports that xfstest generic/475 triggers this: > > > > ------------[ cut here ]------------ > > percpu ref (blkg_release) <= 0 (0) after switching to atomic > > WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 > > > > Switch to just using an on-stack bio for this, and get rid of the > > embedded bio. > > > > Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") > > Reported-by: Ming Lei <ming.lei@redhat.com> > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > Thanks for sorting this one out, definitely wasn't happy with how > exposed DM was left with the recent blkg changes. This is clearly > better. > > Acked-by: Mike Snitzer <snitzer@redhat.com> Please upgrade this to: Reviewed-by: Mike Snitzer <snitzer@redhat.com>
On 12/19/18 9:13 AM, Mike Snitzer wrote: > On Wed, Dec 19 2018 at 11:11am -0500, > Mike Snitzer <snitzer@redhat.com> wrote: > >> On Wed, Dec 19 2018 at 10:50am -0500, >> Jens Axboe <axboe@kernel.dk> wrote: >> >>> DM currently has a statically allocated bio that it uses to issue empty >>> flushes. It doesn't submit this bio, it just uses it for maintaining >>> state while setting up clones. Multiple users can access this bio at the >>> same time. This wasn't previously an issue, even if it was a bit iffy, >>> but with the blkg associations it can become one. >>> >>> We setup the blkg association, then clone bio's and submit, then remove >>> the blkg assocation again. But since we can have multiple tasks doing >>> this at the same time, against multiple blkg's, then we can either lose >>> references to a blkg, or put it twice. The latter causes complaints on >>> the percpu ref being <= 0 when released, and can cause use-after-free as >>> well. Ming reports that xfstest generic/475 triggers this: >>> >>> ------------[ cut here ]------------ >>> percpu ref (blkg_release) <= 0 (0) after switching to atomic >>> WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 >>> >>> Switch to just using an on-stack bio for this, and get rid of the >>> embedded bio. >>> >>> Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") >>> Reported-by: Ming Lei <ming.lei@redhat.com> >>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> >> Thanks for sorting this one out, definitely wasn't happy with how >> exposed DM was left with the recent blkg changes. This is clearly >> better. >> >> Acked-by: Mike Snitzer <snitzer@redhat.com> > > Please upgrade this to: > > Reviewed-by: Mike Snitzer <snitzer@redhat.com> Done, thanks Mike.
On Wed, Dec 19, 2018 at 09:14:25AM -0700, Jens Axboe wrote: > On 12/19/18 9:13 AM, Mike Snitzer wrote: > > On Wed, Dec 19 2018 at 11:11am -0500, > > Mike Snitzer <snitzer@redhat.com> wrote: > > > >> On Wed, Dec 19 2018 at 10:50am -0500, > >> Jens Axboe <axboe@kernel.dk> wrote: > >> > >>> DM currently has a statically allocated bio that it uses to issue empty > >>> flushes. It doesn't submit this bio, it just uses it for maintaining > >>> state while setting up clones. Multiple users can access this bio at the > >>> same time. This wasn't previously an issue, even if it was a bit iffy, > >>> but with the blkg associations it can become one. > >>> > >>> We setup the blkg association, then clone bio's and submit, then remove > >>> the blkg assocation again. But since we can have multiple tasks doing > >>> this at the same time, against multiple blkg's, then we can either lose > >>> references to a blkg, or put it twice. The latter causes complaints on > >>> the percpu ref being <= 0 when released, and can cause use-after-free as > >>> well. Ming reports that xfstest generic/475 triggers this: > >>> > >>> ------------[ cut here ]------------ > >>> percpu ref (blkg_release) <= 0 (0) after switching to atomic > >>> WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 > >>> > >>> Switch to just using an on-stack bio for this, and get rid of the > >>> embedded bio. > >>> > >>> Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") > >>> Reported-by: Ming Lei <ming.lei@redhat.com> > >>> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >> > >> Thanks for sorting this one out, definitely wasn't happy with how > >> exposed DM was left with the recent blkg changes. This is clearly > >> better. > >> > >> Acked-by: Mike Snitzer <snitzer@redhat.com> > > > > Please upgrade this to: > > > > Reviewed-by: Mike Snitzer <snitzer@redhat.com> > > Done, thanks Mike. > > -- > Jens Axboe > Thanks all of you for helping fix this issue! Apologies for introducing with the initial change to dm. Thanks, Dennis
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h index 6fe883fac471..95c6d86ab5e8 100644 --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -106,9 +106,6 @@ struct mapped_device { struct block_device *bdev; - /* zero-length flush that will be cloned and submitted to targets */ - struct bio flush_bio; - struct dm_stats stats; /* for blk-mq request-based DM support */ diff --git a/drivers/md/dm.c b/drivers/md/dm.c index dddbca63e140..f588a6a83d80 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1420,11 +1420,11 @@ static int __send_empty_flush(struct clone_info *ci) struct dm_target *ti; /* - * Empty flush uses a statically initialized bio, &md->flush_bio, as - * the base for cloning. However, blkg association requires that a - * bdev is associated with a gendisk, which doesn't happen until the - * bdev is opened. So, blkg association is done at issue time of the - * flush rather than when the device is created in alloc_dev(). + * Empty flush uses a statically initialized bio, as the base for + * cloning. However, blkg association requires that a bdev is + * associated with a gendisk, which doesn't happen until the bdev is + * opened. So, blkg association is done at issue time of the flush + * rather than when the device is created in alloc_dev(). */ bio_set_dev(ci->bio, ci->io->md->bdev); @@ -1609,7 +1609,16 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md, init_clone_info(&ci, md, map, bio); if (bio->bi_opf & REQ_PREFLUSH) { - ci.bio = &ci.io->md->flush_bio; + struct bio flush_bio; + + /* + * Use an on-stack bio for this, it's safe since we don't + * need to reference it after submit. It's just used as + * the basis for the clone(s). + */ + bio_init(&flush_bio, NULL, 0); + flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; + ci.bio = &flush_bio; ci.sector_count = 0; error = __send_empty_flush(&ci); /* dec_pending submits any data associated with flush */ @@ -1665,7 +1674,16 @@ static blk_qc_t __process_bio(struct mapped_device *md, init_clone_info(&ci, md, map, bio); if (bio->bi_opf & REQ_PREFLUSH) { - ci.bio = &ci.io->md->flush_bio; + struct bio flush_bio; + + /* + * Use an on-stack bio for this, it's safe since we don't + * need to reference it after submit. It's just used as + * the basis for the clone(s). + */ + bio_init(&flush_bio, NULL, 0); + flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; + ci.bio = &flush_bio; ci.sector_count = 0; error = __send_empty_flush(&ci); /* dec_pending submits any data associated with flush */ @@ -1949,9 +1967,6 @@ static struct mapped_device *alloc_dev(int minor) if (!md->bdev) goto bad; - bio_init(&md->flush_bio, NULL, 0); - md->flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC; - dm_stats_init(&md->stats); /* Populate the mapping, nobody knows we exist yet */
DM currently has a statically allocated bio that it uses to issue empty flushes. It doesn't submit this bio, it just uses it for maintaining state while setting up clones. Multiple users can access this bio at the same time. This wasn't previously an issue, even if it was a bit iffy, but with the blkg associations it can become one. We setup the blkg association, then clone bio's and submit, then remove the blkg assocation again. But since we can have multiple tasks doing this at the same time, against multiple blkg's, then we can either lose references to a blkg, or put it twice. The latter causes complaints on the percpu ref being <= 0 when released, and can cause use-after-free as well. Ming reports that xfstest generic/475 triggers this: ------------[ cut here ]------------ percpu ref (blkg_release) <= 0 (0) after switching to atomic WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 Switch to just using an on-stack bio for this, and get rid of the embedded bio. Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> --- Only other use case like this I could find is raid5-cache, which serializes access to its embedded bio (and actually submits it, too). That one looks fine.