Message ID | alpine.LRH.2.02.1407031514320.3347@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive) |
---|---|
State | Deferred, archived |
Delegated to: | Mike Snitzer |
> +	if (unlikely(err))
> +		ACCESS_ONCE(bb->error) = err;

I can't see a reason for the ACCESS_ONCE here.

Also the likely/unlikely annotations here smell like premature
optimization.

Otherwise looks good to me.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, 8 Jul 2014, Christoph Hellwig wrote:

> > +	if (unlikely(err))
> > +		ACCESS_ONCE(bb->error) = err;
>
> I can't see a reason for the ACCESS_ONCE here.

Multiple bios can be completed concurrently, so they write bb->error at
the same time. The compiler may do store tearing (see "store tearing" in
Documentation/memory-barriers.txt) - it may split one 4-byte write into
several smaller writes - and it could result in setting bb->error to
invalid value. We need ACCESS_ONCE to make sure that store tearing doesn't
happen.

Mikulas

> Also the likely/unlikely annotations here smell like premature
> optimization.
>
> Otherwise looks good to me.
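For reference, in kernels of this period ACCESS_ONCE() is nothing more than a volatile cast, defined in include/linux/compiler.h. The userspace sketch below copies that definition (spelled with the GCC/Clang `__typeof__` extension) and applies it to a hypothetical, cut-down `struct bio_batch_model`. It runs on a single thread and only shows how the macro expands and is used for both stores and loads; it does not demonstrate whether a compiler would otherwise tear the store.

```c
#include <assert.h>

/* ACCESS_ONCE() as defined in Linux 3.x include/linux/compiler.h.
 * The kernel writes typeof; __typeof__ is the same GCC/Clang
 * extension and also builds in strict ISO mode. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical cut-down model of the patched struct bio_batch. */
struct bio_batch_model {
	int error;
};

/* Mirrors the patched completion path. The cast produces a volatile
 * lvalue, so the compiler must perform this store as written rather
 * than drop, merge or re-issue it. */
static void model_end_io(struct bio_batch_model *bb, int err)
{
	assert(err <= 0); /* completion status: 0 or a negative errno */
	if (err)
		ACCESS_ONCE(bb->error) = err;
}

/* The same macro can wrap a load. */
static int model_read_error(struct bio_batch_model *bb)
{
	return ACCESS_ONCE(bb->error);
}
```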
On Tue, 2014-07-08 at 09:05 -0400, Mikulas Patocka wrote:
>
> On Tue, 8 Jul 2014, Christoph Hellwig wrote:
>
> > > +	if (unlikely(err))
> > > +		ACCESS_ONCE(bb->error) = err;
> >
> > I can't see a reason for the ACCESS_ONCE here.
>
> Multiple bios can be completed concurrently, so they write bb->error at
> the same time. The compiler may do store tearing (see "store tearing" in
> Documentation/memory-barriers.txt) - it may split one 4-byte write into
> several smaller writes - and it could result in setting bb->error to
> invalid value. We need ACCESS_ONCE to make sure that store tearing doesn't
> happen.

That's not correct, because it's not applicable in this case. Tearing
may occur on misalignment (which ACCESS_ONCE() cannot rectify because
it's architectural), short constant loads (again, usually architectural)
and structure copies, none of which applies here.

We can rely on a properly aligned 32 bit write being atomic.

James
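For comparison, outside the kernel ISO C11 makes the no-tearing guarantee explicit instead of tying it to alignment: every access to an `_Atomic` object is indivisible, even with `memory_order_relaxed`, which adds no ordering constraints. The sketch below is only a point of comparison; `<stdatomic.h>` is not the kernel's API, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace counterpart of bb->error. Accesses to an
 * _Atomic object are never torn, whatever memory order is used. */
struct batch_error_c11 {
	_Atomic int error;
};

/* Record a failing completion; relaxed order asks only for
 * atomicity, not for ordering against other memory accesses. */
static void c11_record_error(struct batch_error_c11 *bb, int err)
{
	assert(err <= 0); /* 0 or a negative errno */
	if (err)
		atomic_store_explicit(&bb->error, err, memory_order_relaxed);
}

static int c11_read_error(struct batch_error_c11 *bb)
{
	return atomic_load_explicit(&bb->error, memory_order_relaxed);
}
```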
On Tue, 8 Jul 2014, James Bottomley wrote:

> On Tue, 2014-07-08 at 09:05 -0400, Mikulas Patocka wrote:
> >
> > On Tue, 8 Jul 2014, Christoph Hellwig wrote:
> >
> > > > +	if (unlikely(err))
> > > > +		ACCESS_ONCE(bb->error) = err;
> > >
> > > I can't see a reason for the ACCESS_ONCE here.
> >
> > Multiple bios can be completed concurrently, so they write bb->error at
> > the same time. The compiler may do store tearing (see "store tearing" in
> > Documentation/memory-barriers.txt) - it may split one 4-byte write into
> > several smaller writes - and it could result in setting bb->error to
> > invalid value. We need ACCESS_ONCE to make sure that store tearing doesn't
> > happen.
>
> That's not correct, because it's not applicable in this case. Tearing
> may occur on misalignment (which ACCESS_ONCE() cannot rectify because
> it's architectural), short constant loads (again, usually architectural)
> and structure copies, none of which applies here.

Suppose this scenario:

CPU1 writes the low byte of the first error code
CPU2 writes the low byte of the second error code
CPU2 writes the 3 high bytes of the second error code
CPU1 writes the 3 high bytes of the first error code

Now bb->error contains garbage: a mix of the first and the second error
code. That's why we need ACCESS_ONCE.

It may happen even if the variable is aligned. The compiler is allowed to
split a larger memory access into several smaller accesses. The compiler
usually doesn't do this split (that's why omitting ACCESS_ONCE usually
doesn't result in any observable misbehavior), but it is still a bug to
omit it: you can't really know that gcc won't split the memory write on
any of the 29 architectures...

> We can rely on a properly aligned 32 bit write being atomic.
>
> James

... only if you use ACCESS_ONCE ...

Mikulas
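The interleaving in the scenario above can be replayed deterministically. In the sketch below the memory word is modelled as a `uint32_t`, and each "CPU" store is split exactly as described (low byte first, then the three high bytes). The helper names and the test values 0x11111111 and 0x22222222 are hypothetical, chosen so the mixed result is easy to read. Whether a compiler ever emits such split stores for an aligned int is precisely what this thread disputes; the code only shows what the stored value would be if it did.

```c
#include <assert.h>
#include <stdint.h>

/* Replace byte number idx (0 = least significant) of word with val. */
static uint32_t write_byte(uint32_t word, unsigned idx, uint8_t val)
{
	assert(idx < 4);
	word &= ~((uint32_t)0xff << (8 * idx));
	return word | ((uint32_t)val << (8 * idx));
}

/* Replay the scenario: CPU1 stores v1, CPU2 stores v2, and each store
 * is split into a low-byte store plus a store of the 3 high bytes. */
static uint32_t replay_torn_stores(uint32_t v1, uint32_t v2)
{
	uint32_t mem = 0;

	mem = write_byte(mem, 0, v1 & 0xff);              /* CPU1: low byte   */
	mem = write_byte(mem, 0, v2 & 0xff);              /* CPU2: low byte   */
	for (unsigned i = 1; i < 4; i++)                  /* CPU2: high bytes */
		mem = write_byte(mem, i, (v2 >> (8 * i)) & 0xff);
	for (unsigned i = 1; i < 4; i++)                  /* CPU1: high bytes */
		mem = write_byte(mem, i, (v1 >> (8 * i)) & 0xff);
	return mem;
}
```

With these inputs the final word is 0x11111122: the low byte comes from CPU2 and the three high bytes from CPU1, so it equals neither stored value.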
Index: linux-3.16-rc3/block/blk-lib.c
===================================================================
--- linux-3.16-rc3.orig/block/blk-lib.c	2014-07-03 18:17:21.000000000 +0200
+++ linux-3.16-rc3/block/blk-lib.c	2014-07-03 18:36:52.000000000 +0200
@@ -11,7 +11,7 @@
 
 struct bio_batch {
 	atomic_t done;
-	unsigned long flags;
+	int error;
 	struct completion *wait;
 };
 
@@ -19,8 +19,8 @@ static void bio_batch_end_io(struct bio
 {
 	struct bio_batch *bb = bio->bi_private;
 
-	if (err && (err != -EOPNOTSUPP))
-		clear_bit(BIO_UPTODATE, &bb->flags);
+	if (unlikely(err))
+		ACCESS_ONCE(bb->error) = err;
 	if (atomic_dec_and_test(&bb->done))
 		complete(bb->wait);
 	bio_put(bio);
@@ -78,7 +78,7 @@ int blkdev_issue_discard(struct block_de
 	}
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	blk_start_plug(&plug);
@@ -134,8 +134,8 @@ int blkdev_issue_discard(struct block_de
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -EIO;
+	if (likely(!ret))
+		ret = bb.error;
 
 	return ret;
 }
@@ -172,7 +172,7 @@ int blkdev_issue_write_same(struct block
 		return -EOPNOTSUPP;
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	while (nr_sects) {
@@ -208,8 +208,8 @@ int blkdev_issue_write_same(struct block
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -ENOTSUPP;
+	if (likely(!ret))
+		ret = bb.error;
 
 	return ret;
 }
@@ -236,7 +236,7 @@ static int __blkdev_issue_zeroout(struct
 	DECLARE_COMPLETION_ONSTACK(wait);
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	ret = 0;
@@ -270,9 +270,8 @@ static int __blkdev_issue_zeroout(struct
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		/* One of bios in the batch was completed with error.*/
-		ret = -EIO;
+	if (likely(!ret))
+		ret = bb.error;
 
 	return ret;
 }
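The patched completion protocol can be modelled in userspace to follow how an error travels from bio_batch_end_io() back to the caller. The sketch below is hypothetical: `struct bio_batch_model` and the `batch_*` helpers are illustrative names, C11 atomics stand in for `atomic_t`, and `struct completion` is left out because everything runs on one thread. The reference counting follows the diff: `done` starts at 1 for the submitter, each submitted bio adds one, and whoever drops the count to zero is the last user.

```c
#include <assert.h>
#include <stdatomic.h>

#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical userspace model of the patched struct bio_batch. */
struct bio_batch_model {
	atomic_int done; /* submitter's reference + one per in-flight bio */
	int error;       /* 0, or the most recently stored bio error */
};

static void batch_init(struct bio_batch_model *bb)
{
	atomic_init(&bb->done, 1); /* atomic_set(&bb.done, 1) */
	bb->error = 0;             /* replaces bb.flags = 1 << BIO_UPTODATE */
}

/* Taken once per bio before it is submitted. */
static void batch_get(struct bio_batch_model *bb)
{
	atomic_fetch_add(&bb->done, 1);
}

/* Mirrors the patched bio_batch_end_io(); returns 1 where the kernel
 * would call complete(bb->wait). */
static int batch_end_io(struct bio_batch_model *bb, int err)
{
	assert(atomic_load(&bb->done) > 0);
	if (err)
		ACCESS_ONCE(bb->error) = err;
	return atomic_fetch_sub(&bb->done, 1) == 1;
}

/* Mirrors the tail of the patched callers. In this single-threaded
 * model every bio has already completed, so the submitter's decrement
 * must be the last one (otherwise the kernel would wait here). */
static int batch_finish(struct bio_batch_model *bb, int ret)
{
	int last = atomic_fetch_sub(&bb->done, 1) == 1;

	assert(last);
	(void)last;
	if (!ret) /* if (likely(!ret)) ret = bb.error; */
		ret = bb->error;
	return ret;
}
```

For example, after two batch_get() calls and completions with 0 and -EOPNOTSUPP, batch_finish(&bb, 0) returns -EOPNOTSUPP, while a nonzero ret from the submission loop takes precedence over bb->error.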
The function bio_batch_end_io ignores -EOPNOTSUPP: it clears BIO_UPTODATE
for every other error, but not for this one. It doesn't matter for discard
(the device isn't required to discard anything, so missing the error code
and reporting success shouldn't cause any trouble). However, for the WRITE
SAME command, missing the error code is obviously wrong. It may fool the
user into thinking that the data were written when in fact they weren't.

Note that in device mapper, devices may be dynamically reconfigured, so a
device that supports WRITE SAME may stop supporting it at any time and
return -EOPNOTSUPP. Ignoring -EOPNOTSUPP is wrong.

This patch changes bio_batch->flags to an error field and stores the most
recent error there, so that the error is reported accurately instead of
being ignored.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 block/blk-lib.c |   25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)