diff mbox

SG does not ignore dxferp (direct io + mmap)

Message ID yq18trzhink.fsf@sermon.lab.mkp.net (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Martin K. Petersen Dec. 1, 2016, 1:40 p.m. UTC
>>>>> "Ewan" == Ewan D Milne <emilne@redhat.com> writes:

>> I think what we need to understand is what caused the regression in
>> the first place, I probably should have been bisecting the original
>> failure rather than trying to find where it started working.

Ewan> Bisecting leads to this commit:

commit 37f19e57a0de3c4a3417aa13ff4d04f1e0dee4b3
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Jan 18 16:16:33 2015 +0100

    block: merge __bio_map_user_iov into bio_map_user_iov
    
    And also remove the unused bdev argument.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Ming Lei <tom.leiming@gmail.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>

Specifically, the problem appears to be caused by the removal of
the setting of bio->bi_bdev, which would previously be set to NULL.
If I add:

        /*

Ewan> The test passes (no zero byte corruption).

Ewan> Setting dxferp would cause map_data.null_mapped to be set before
Ewan> it is passed to blk_rq_map_user(_iov) which would cause a
Ewan> difference in behavior.

Christoph?

Comments

Christoph Hellwig Dec. 2, 2016, 12:21 p.m. UTC | #1
On Thu, Dec 01, 2016 at 08:40:31AM -0500, Martin K. Petersen wrote:
> Specifically, the problem appears to be caused by the removal of
> the setting of bio->bi_bdev, which would previously be set to NULL.
> If I add:

Very odd.  For one I would expect it to be NULL anyway, second
I don't see why the behavior changed.  But given that this reverts
to the original assignment and makes things work I'll happily hack it
to get things working again:

Acked-by: Christoph Hellwig <hch@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ewan Milne Dec. 2, 2016, 1:29 p.m. UTC | #2
On Fri, 2016-12-02 at 04:21 -0800, Christoph Hellwig wrote:
> On Thu, Dec 01, 2016 at 08:40:31AM -0500, Martin K. Petersen wrote:
> > Specifically, the problem appears to be caused by the removal of
> > the setting of bio->bi_bdev, which would previously be set to NULL.
> > If I add:
> 
> Very odd.  For one I would expect it to be NULL anyway, second
> I don't see why the behavior changed.  But given that this reverts
> to the original assignment and makes things work I'll happily hack it
> to get things working again:
> 
> Acked-by: Christoph Hellwig <hch@lst.de>

Yeah, I'm not sure I understand this either, apart from the change
adjusting the code to effectively do what it used to and making the
test case work.  I'm reluctant to cc: stable yet, let me look at this
a bit more and I'll post the actual patch soon.

-Ewan


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke Dec. 2, 2016, 2:10 p.m. UTC | #3
On 12/02/2016 02:29 PM, Ewan D. Milne wrote:
> On Fri, 2016-12-02 at 04:21 -0800, Christoph Hellwig wrote:
>> On Thu, Dec 01, 2016 at 08:40:31AM -0500, Martin K. Petersen wrote:
>>> Specifically, the problem appears to be caused by the removal of
>>> the setting of bio->bi_bdev, which would previously be set to NULL.
>>> If I add:
>>
>> Very odd.  For one I would expect it to be NULL anyway, second
>> I don't see why the behavior changed.  But given that this reverts
>> to the original assignment and makes things work I'll happily hack it
>> to get things working again:
>>
>> Acked-by: Christoph Hellwig <hch@lst.de>
>
> Yeah, I'm not sure I understand this either, apart from the change
> adjusting the code to effectively do what it used to and making the
> test case work.  I'm reluctant to cc: stable yet, let me look at this
> a bit more and I'll post the actual patch soon.
>
Plus we found that this is basically a timing issue; we've found that 
supposedly fixed bugs will crop up after ~4k iterations.
(Johannes did a _lot_ of testing here :-)
So just because the bug failed to materialize can also mean that you 
simply didn't test long enough.

Cheers,

Hannes
Laurence Oberman Dec. 2, 2016, 2:17 p.m. UTC | #4
----- Original Message -----
> From: "Hannes Reinecke" <hare@suse.de>
> To: emilne@redhat.com, "Christoph Hellwig" <hch@infradead.org>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, "Johannes Thumshirn" <jthumshirn@suse.de>, "Laurence Oberman"
> <loberman@redhat.com>, "Eyal Ben David" <bdeyal@gmail.com>, dgilbert@interlog.com, linux-scsi@vger.kernel.org
> Sent: Friday, December 2, 2016 9:10:01 AM
> Subject: Re: SG does not ignore dxferp (direct io + mmap)
> 
> On 12/02/2016 02:29 PM, Ewan D. Milne wrote:
> > On Fri, 2016-12-02 at 04:21 -0800, Christoph Hellwig wrote:
> >> On Thu, Dec 01, 2016 at 08:40:31AM -0500, Martin K. Petersen wrote:
> >>> Specifically, the problem appears to be caused by the removal of
> >>> the setting of bio->bi_bdev, which would previously be set to NULL.
> >>> If I add:
> >>
> >> Very odd.  For one I would expect it to be NULL anyway, second
> >> I don't see why the behavior changed.  But given that this reverts
> >> to the original assignment and makes things work I'll happily hack it
> >> to get things working again:
> >>
> >> Acked-by: Christoph Hellwig <hch@lst.de>
> >
> > Yeah, I'm not sure I understand this either, apart from the change
> > adjusting the code to effectively do what it used to and making the
> > test case work.  I'm reluctant to cc: stable yet, let me look at this
> > a bit more and I'll post the actual patch soon.
> >
> Plus we found that this is basically a timing issue; we've found that
> supposedly fixed bugs will crop up after ~4k iterations.
> (Johannes did a _lot_ of testing here :-)
> So just because the bug failed to materialize can also mean that you
> simply didn't test long enough.
> 
> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
> 
Hannes, 
Just to clarify, my original test reverting  commit 3fa6c507319c897598512da91c010a4ad2ed682c,
what Ewan originally bisected to and running > 100000 iterations never failed.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ewan Milne Dec. 2, 2016, 7:29 p.m. UTC | #5
On Fri, 2016-12-02 at 15:10 +0100, Hannes Reinecke wrote:
> On 12/02/2016 02:29 PM, Ewan D. Milne wrote:
> > On Fri, 2016-12-02 at 04:21 -0800, Christoph Hellwig wrote:
> >> On Thu, Dec 01, 2016 at 08:40:31AM -0500, Martin K. Petersen wrote:
> >>> Specifically, the problem appears to be caused by the removal of
> >>> the setting of bio->bi_bdev, which would previously be set to NULL.
> >>> If I add:
> >>
> >> Very odd.  For one I would expect it to be NULL anyway, second
> >> I don't see why the behavior changed.  But given that this reverts
> >> to the original assignment and makes things work I'll happily hack it
> >> to get things working again:
> >>
> >> Acked-by: Christoph Hellwig <hch@lst.de>
> >
> > Yeah, I'm not sure I understand this either, apart from the change
> > adjusting the code to effectively do what it used to and making the
> > test case work.  I'm reluctant to cc: stable yet, let me look at this
> > a bit more and I'll post the actual patch soon.
> >
> Plus we found that this is basically a timing issue; we've found that 
> supposedly fixed bugs will crop up after ~4k iterations.
> (Johannes did a _lot_ of testing here :-)
> So just because the bug failed to materialize can also mean that you 
> simply didn't test long enough.
> 
Yes, and following the code paths it isn't completely clear how this
leads to the single zero-byte corruption, I am continuing to investigate.
There may very well be more than one problem.

On kernel versions I tested where I got a failure it was a solid
failure, it never worked no matter how many times I tried, but I
did not exhaustively test apparently successful kernel versions.
Not thousands, of times, anyway.

-Ewan


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ewan Milne Dec. 2, 2016, 8:37 p.m. UTC | #6
On Thu, 2016-12-01 at 08:40 -0500, Martin K. Petersen wrote:
> >>>>> "Ewan" == Ewan D Milne <emilne@redhat.com> writes:
...
> Specifically, the problem appears to be caused by the removal of
> the setting of bio->bi_bdev, which would previously be set to NULL.
> If I add:
> 
> diff --git a/block/bio.c b/block/bio.c
> index 0723d4c..ecac37b 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1351,6 +1351,7 @@ struct bio *bio_map_user_iov(struct request_queue
> *q,
>         if (iter->type & WRITE)
>                 bio->bi_rw |= REQ_WRITE;
>  
> +       bio->bi_bdev = NULL;
>         bio->bi_flags |= (1 << BIO_USER_MAPPED);
>  
>         /*
> 
> Ewan> The test passes (no zero byte corruption).
> 

Adding this seemed to work on top of the commit commit id mentioned
above (37f19e57a0de3c4a3417aa13ff4d04f1e0dee4b3)
but does not help on 4.7, because the test case now takes the
bio_copy_user_iov() code path rather than bio_map_user_iov().

-Ewan



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/block/bio.c b/block/bio.c
index 0723d4c..ecac37b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1351,6 +1351,7 @@  struct bio *bio_map_user_iov(struct request_queue
*q,
        if (iter->type & WRITE)
                bio->bi_rw |= REQ_WRITE;
 
+       bio->bi_bdev = NULL;
        bio->bi_flags |= (1 << BIO_USER_MAPPED);