
blk-lib: fix error reporting

Message ID alpine.LRH.2.02.1407031514320.3347@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State Deferred, archived
Delegated to: Mike Snitzer

Commit Message

Mikulas Patocka July 3, 2014, 7:16 p.m. UTC
The function bio_batch_end_io ignores -EOPNOTSUPP. It doesn't matter for
discard (the device isn't required to discard anything, so missing the
error code and reporting success shouldn't cause any trouble). However,
for the WRITE SAME command, missing the error code is obviously wrong. It may
fool the user into thinking that the data were written while in fact they
weren't.

Note that in device mapper, devices may be dynamically reconfigured, so a
device that supports WRITE SAME may stop supporting it at any time and
return -EOPNOTSUPP. Ignoring -EOPNOTSUPP is wrong.

This patch changes bio_batch->flags to an error field and stores the last
error there - so that the error is reported accurately and it isn't
ignored.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 block/blk-lib.c |   25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)
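The behavioural change described in the commit message can be sketched in userspace as follows. This is an illustration only, not the kernel code: the struct and function names are hypothetical stand-ins for struct bio_batch and bio_batch_end_io, and the "old" result function models the discard path, which reported -EIO when the flag had been cleared.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for struct bio_batch (not kernel code). */
struct batch_old { bool uptodate; };	/* models the BIO_UPTODATE flag bit */
struct batch_new { int error; };	/* models the new error field */

/* Old completion logic: -EOPNOTSUPP is filtered out, other errors clear a flag. */
static void end_io_old(struct batch_old *bb, int err)
{
	if (err && err != -EOPNOTSUPP)
		bb->uptodate = false;
}

/* New completion logic: every non-zero error is stored; the last one wins. */
static void end_io_new(struct batch_new *bb, int err)
{
	if (err)
		bb->error = err;
}

/* Old result: the specific error code is collapsed into a fixed one. */
static int result_old(const struct batch_old *bb)
{
	return bb->uptodate ? 0 : -EIO;
}

/* New result: the recorded error code is passed through unchanged. */
static int result_new(const struct batch_new *bb)
{
	return bb->error;
}
```

Feeding -EOPNOTSUPP to both variants shows the difference: the old pair reports 0 (success), while the new pair reports -EOPNOTSUPP.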


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Christoph Hellwig July 8, 2014, 9:50 a.m. UTC | #1
> +	if (unlikely(err))
> +		ACCESS_ONCE(bb->error) = err;

I can't see a reason for the ACCESS_ONCE here.

Also the likely/unlikely annotations here smell like premature
optimization.

Otherwise looks good to me.

Mikulas Patocka July 8, 2014, 1:05 p.m. UTC | #2
On Tue, 8 Jul 2014, Christoph Hellwig wrote:

> > +	if (unlikely(err))
> > +		ACCESS_ONCE(bb->error) = err;
> 
> I can't see a reason for the ACCESS_ONCE here.

Multiple bios can be completed concurrently, so they write bb->error at 
the same time. The compiler may do store tearing (see "store tearing" in 
Documentation/memory-barriers.txt) - it may split one 4-byte write into 
several smaller writes - and it could result in setting bb->error to an
invalid value. We need ACCESS_ONCE to make sure that store tearing doesn't
happen.

Mikulas

> Also the likely/unlikely annotations here smell like premature
> optimization.
> 
> Otherwise looks good to me.
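
For readers unfamiliar with the macro under discussion: in kernels of this era, ACCESS_ONCE() is a cast through a volatile-qualified pointer, which asks the compiler to perform the access as written. A minimal userspace sketch follows. The struct and the record_error() helper are hypothetical, and __typeof__ assumes GCC or Clang. Whether the annotation is actually needed for an aligned int store is the point this thread disputes.

```c
/* Userspace copy of the macro's shape; __typeof__ is a GNU C extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the error field of struct bio_batch. */
struct batch {
	int error;
};

/* Record a non-zero completion error through a volatile-qualified lvalue. */
static void record_error(struct batch *bb, int err)
{
	if (err)
		ACCESS_ONCE(bb->error) = err;
}
```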

James Bottomley July 8, 2014, 2:04 p.m. UTC | #3
On Tue, 2014-07-08 at 09:05 -0400, Mikulas Patocka wrote:
> 
> On Tue, 8 Jul 2014, Christoph Hellwig wrote:
> 
> > > +	if (unlikely(err))
> > > +		ACCESS_ONCE(bb->error) = err;
> > 
> > I can't see a reason for the ACCESS_ONCE here.
> 
> Multiple bios can be completed concurrently, so they write bb->error at 
> the same time. The compiler may do store tearing (see "store tearing" in 
> Documentation/memory-barriers.txt) - it may split one 4-byte write into 
> several smaller writes - and it could result in setting bb->error to an
> invalid value. We need ACCESS_ONCE to make sure that store tearing doesn't
> happen.

That's not correct, because it's not applicable in this case.  Tearing
may occur on misalignment (which ACCESS_ONCE() cannot rectify because
it's architectural), short constant loads (again, usually architectural)
and structure copies, none of which applies here.

We can rely on a properly aligned 32 bit write being atomic.

James


Mikulas Patocka July 16, 2014, 10:22 a.m. UTC | #4
On Tue, 8 Jul 2014, James Bottomley wrote:

> On Tue, 2014-07-08 at 09:05 -0400, Mikulas Patocka wrote:
> > 
> > On Tue, 8 Jul 2014, Christoph Hellwig wrote:
> > 
> > > > +	if (unlikely(err))
> > > > +		ACCESS_ONCE(bb->error) = err;
> > > 
> > > I can't see a reason for the ACCESS_ONCE here.
> > 
> > Multiple bios can be completed concurrently, so they write bb->error at 
> > the same time. The compiler may do store tearing (see "store tearing" in 
> > Documentation/memory-barriers.txt) - it may split one 4-byte write into 
> > several smaller writes - and it could result in setting bb->error to an
> > invalid value. We need ACCESS_ONCE to make sure that store tearing doesn't
> > happen.
> 
> That's not correct, because it's not applicable in this case.  Tearing
> may occur on misalignment (which ACCESS_ONCE() cannot rectify because
> it's architectural), short constant loads (again, usually architectural)
> and structure copies, none of which applies here.

Suppose this scenario:
CPU1 writes low byte of the first error code
CPU2 writes low byte of the second error code
CPU2 writes 3 high bytes of the second error code
CPU1 writes 3 high bytes of the first error code

- now, bb->error contains garbage - a mix of the first and second error 
code. That's why we need ACCESS_ONCE.

It may happen even if the variable is aligned. The compiler is allowed to
split a larger memory access into several smaller accesses. The compiler
usually doesn't do this split (that's why omitting ACCESS_ONCE usually 
doesn't result in any observable misbehavior), but it is still a bug to 
omit it - you don't really know that for all 29 architectures gcc won't 
split the memory write...

> We can rely on a properly aligned 32 bit write being atomic.
>
> James

... only if you use ACCESS_ONCE ...

Mikulas
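
The four-step interleaving in the scenario above can be replayed deterministically. The sketch below is a single-threaded simulation, not a demonstration that any compiler actually emits such split stores. The helper names are hypothetical, and the 1-byte/3-byte split is taken from the scenario as stated.

```c
#include <stdint.h>
#include <string.h>

/* Overwrite only the low byte of *word, as one part of a torn store would. */
static void store_low_byte(uint32_t *word, int32_t value)
{
	*word = (*word & 0xFFFFFF00u) | ((uint32_t)value & 0x000000FFu);
}

/* Overwrite only the three high bytes of *word. */
static void store_high_bytes(uint32_t *word, int32_t value)
{
	*word = (*word & 0x000000FFu) | ((uint32_t)value & 0xFFFFFF00u);
}

/* Replay the interleaving: CPU1 stores `first`, CPU2 stores `second`. */
static int32_t torn_result(int32_t first, int32_t second)
{
	uint32_t word = 0;
	int32_t out;

	store_low_byte(&word, first);	 /* CPU1 writes low byte of first code */
	store_low_byte(&word, second);	 /* CPU2 writes low byte of second code */
	store_high_bytes(&word, second); /* CPU2 writes 3 high bytes of second code */
	store_high_bytes(&word, first);	 /* CPU1 writes 3 high bytes of first code */

	memcpy(&out, &word, sizeof(out)); /* reinterpret the bits as signed */
	return out;
}
```

With -5 (-EIO) and -524 (the kernel-internal -ENOTSUPP), torn_result(-5, -524) yields -12, which matches neither input. With two codes whose upper three bytes are identical, such as -5 and -95, the mix equals the second code, so a mixed value would only be visible when the codes differ above the low byte.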


Patch

Index: linux-3.16-rc3/block/blk-lib.c
===================================================================
--- linux-3.16-rc3.orig/block/blk-lib.c	2014-07-03 18:17:21.000000000 +0200
+++ linux-3.16-rc3/block/blk-lib.c	2014-07-03 18:36:52.000000000 +0200
@@ -11,7 +11,7 @@ 
 
 struct bio_batch {
 	atomic_t		done;
-	unsigned long		flags;
+	int			error;
 	struct completion	*wait;
 };
 
@@ -19,8 +19,8 @@  static void bio_batch_end_io(struct bio 
 {
 	struct bio_batch *bb = bio->bi_private;
 
-	if (err && (err != -EOPNOTSUPP))
-		clear_bit(BIO_UPTODATE, &bb->flags);
+	if (unlikely(err))
+		ACCESS_ONCE(bb->error) = err;
 	if (atomic_dec_and_test(&bb->done))
 		complete(bb->wait);
 	bio_put(bio);
@@ -78,7 +78,7 @@  int blkdev_issue_discard(struct block_de
 	}
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	blk_start_plug(&plug);
@@ -134,8 +134,8 @@  int blkdev_issue_discard(struct block_de
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -EIO;
+	if (likely(!ret))
+		ret = bb.error;
 
 	return ret;
 }
@@ -172,7 +172,7 @@  int blkdev_issue_write_same(struct block
 		return -EOPNOTSUPP;
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	while (nr_sects) {
@@ -208,8 +208,8 @@  int blkdev_issue_write_same(struct block
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -ENOTSUPP;
+	if (likely(!ret))
+		ret = bb.error;
 
 	return ret;
 }
@@ -236,7 +236,7 @@  static int __blkdev_issue_zeroout(struct
 	DECLARE_COMPLETION_ONSTACK(wait);
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	ret = 0;
@@ -270,9 +270,8 @@  static int __blkdev_issue_zeroout(struct
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		/* One of bios in the batch was completed with error.*/
-		ret = -EIO;
+	if (likely(!ret))
+		ret = bb.error;
 
 	return ret;
 }
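
The counting and result logic that the hunks above rely on can be sketched in userspace with C11 atomics. This is a single-threaded model under stated assumptions: all names are hypothetical, the per-bio increment in batch_submit() is assumed (it is not visible in the hunks shown), and a boolean stands in for complete(bb->wait).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of struct bio_batch after the patch. */
struct batch {
	atomic_int done;	/* outstanding references, biased by one */
	int error;		/* last completion error, 0 if none */
	bool completed;		/* stands in for complete(bb->wait) */
};

static void batch_init(struct batch *bb)
{
	atomic_init(&bb->done, 1);	/* the submitter's own reference */
	bb->error = 0;
	bb->completed = false;
}

/* Assumed: one reference is taken for each bio that is submitted. */
static void batch_submit(struct batch *bb)
{
	atomic_fetch_add(&bb->done, 1);
}

/* Drop one reference; returns true if it was the last one. */
static bool batch_put(struct batch *bb)
{
	return atomic_fetch_sub(&bb->done, 1) == 1;
}

/* Completion side: record the error, signal when the last reference drops. */
static void batch_end_io(struct batch *bb, int err)
{
	if (err)
		bb->error = err;
	if (batch_put(bb))
		bb->completed = true;
}

/* Return path: a submission error takes precedence over a completion error. */
static int batch_result(const struct batch *bb, int ret)
{
	return ret ? ret : bb->error;
}
```

The initial bias of one keeps the counter from reaching zero while bios are still being submitted. The submitter drops that reference with batch_put() at the end and only needs to wait if the call returns false.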