[BUG] "block: make generic_make_request handle arbitrarily sized bios" breaks boot on parisc-linux
diff mbox

Message ID CACVXFVOnff1sFfMH+zsoypuYwM4PokJkHeKhbxJYDDH6ZtVScg@mail.gmail.com
State New
Headers show

Commit Message

Ming Lei Feb. 24, 2016, 7:59 a.m. UTC
On Wed, Feb 24, 2016 at 10:28 AM, John David Anglin
<dave.anglin@bell.net> wrote:
> The following block change breaks boot on parisc-linux:
>
> commit 54efd50bfd873e2dbf784e0b21a8027ba4299a3e
> Author: Kent Overstreet <kent.overstreet@gmail.com>
> Date:   Thu Apr 23 22:37:18 2015 -0700
>
>    block: make generic_make_request handle arbitrarily sized bios
>
>    The way the block layer is currently written, it goes to great lengths
>    to avoid having to split bios; upper layer code (such as bio_add_page())
>    checks what the underlying device can handle and tries to always create
>    bios that don't need to be split.
>
>    But this approach becomes unwieldy and eventually breaks down with
>    stacked devices and devices with dynamic limits, and it adds a lot of
>    complexity. If the block layer could split bios as needed, we could
>    eliminate a lot of complexity elsewhere - particularly in stacked
>    drivers. Code that creates bios can then create whatever size bios are
>    convenient, and more importantly stacked drivers don't have to deal with
>    both their own bio size limitations and the limitations of the
>    (potentially multiple) devices underneath them.  In the future this will
>    let us delete merge_bvec_fn and a bunch of other code.
>
>    We do this by adding calls to blk_queue_split() to the various
>    make_request functions that need it - a few can already handle arbitrary
>    size bios. Note that we add the call _after_ any call to
>    blk_queue_bounce(); this means that blk_queue_split() and
>    blk_recalc_rq_segments() don't need to be concerned with bouncing
>    affecting segment merging.
>
>    Some make_request_fn() callbacks were simple enough to audit and verify
>    they don't need blk_queue_split() calls. The skipped ones are:
>
>     * nfhd_make_request (arch/m68k/emu/nfblock.c)
>     * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
>     * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
>     * brd_make_request (ramdisk - drivers/block/brd.c)
>     * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
>     * loop_make_request
>     * null_queue_bio
>     * bcache's make_request fns
>
>    Some others are almost certainly safe to remove now, but will be left
>    for future patches.
>
>    Cc: Jens Axboe <axboe@kernel.dk>
>    Cc: Christoph Hellwig <hch@infradead.org>
>    Cc: Al Viro <viro@zeniv.linux.org.uk>
>    Cc: Ming Lei <ming.lei@canonical.com>
>    Cc: Neil Brown <neilb@suse.de>
>    Cc: Alasdair Kergon <agk@redhat.com>
>    Cc: Mike Snitzer <snitzer@redhat.com>
>    Cc: dm-devel@redhat.com
>    Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
>    Cc: drbd-user@lists.linbit.com
>    Cc: Jiri Kosina <jkosina@suse.cz>
>    Cc: Geoff Levand <geoff@infradead.org>
>    Cc: Jim Paris <jim@jtan.com>
>    Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
>    Cc: Minchan Kim <minchan@kernel.org>
>    Cc: Nitin Gupta <ngupta@vflare.org>
>    Cc: Oleg Drokin <oleg.drokin@intel.com>
>    Cc: Andreas Dilger <andreas.dilger@intel.com>
>    Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
>    Acked-by: Mike Snitzer <snitzer@redhat.com>
>    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>    Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
>    [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
>    Signed-off-by: Dongsu Park <dpark@posteo.net>
>    Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
>    Signed-off-by: Jens Axboe <axboe@fb.com>
>
> This thread on the linux-parisc has most of the discussion and analysis:
> http://www.spinics.net/lists/linux-parisc/msg06710.html
>
> Essentially, the SCSI layer underestimates the number of sg segments needed and we run off the end of the sg list and crash.
> This happens because the protect bit is ignored.  As a result 4.3 and later kernels fail to boot.  This includes the current Debian
> kernel for hppa.
>
> Hopefully, the block group can help resolve this issue.  We can help with testing if needed.
>

We fixed several similar bugs, but maybe there is another one, :-(

Could you apply the attached debug patch and post the log after the issue is
triggered?

Thanks,
Ming Lei

Comments

Helge Deller Feb. 24, 2016, 9:36 p.m. UTC | #1
Hi Ming Lei,

On 24.02.2016 08:59, Ming Lei wrote:
> On Wed, Feb 24, 2016 at 10:28 AM, John David Anglin
> <dave.anglin@bell.net> wrote:
>> The following block change breaks boot on parisc-linux:
>>
>> commit 54efd50bfd873e2dbf784e0b21a8027ba4299a3e
>> Author: Kent Overstreet <kent.overstreet@gmail.com>
>> Date:   Thu Apr 23 22:37:18 2015 -0700
>>
>>    block: make generic_make_request handle arbitrarily sized bios
>>
>>    The way the block layer is currently written, it goes to great lengths
>>    to avoid having to split bios; upper layer code (such as bio_add_page())
>>    checks what the underlying device can handle and tries to always create
>>    bios that don't need to be split.
>>
>>    But this approach becomes unwieldy and eventually breaks down with
>>    stacked devices and devices with dynamic limits, and it adds a lot of
>>    complexity. If the block layer could split bios as needed, we could
>>    eliminate a lot of complexity elsewhere - particularly in stacked
>>    drivers. Code that creates bios can then create whatever size bios are
>>    convenient, and more importantly stacked drivers don't have to deal with
>>    both their own bio size limitations and the limitations of the
>>    (potentially multiple) devices underneath them.  In the future this will
>>    let us delete merge_bvec_fn and a bunch of other code.
>>
>>    We do this by adding calls to blk_queue_split() to the various
>>    make_request functions that need it - a few can already handle arbitrary
>>    size bios. Note that we add the call _after_ any call to
>>    blk_queue_bounce(); this means that blk_queue_split() and
>>    blk_recalc_rq_segments() don't need to be concerned with bouncing
>>    affecting segment merging.
>>
>>    Some make_request_fn() callbacks were simple enough to audit and verify
>>    they don't need blk_queue_split() calls. The skipped ones are:
>>
>>     * nfhd_make_request (arch/m68k/emu/nfblock.c)
>>     * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
>>     * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
>>     * brd_make_request (ramdisk - drivers/block/brd.c)
>>     * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
>>     * loop_make_request
>>     * null_queue_bio
>>     * bcache's make_request fns
>>
>>    Some others are almost certainly safe to remove now, but will be left
>>    for future patches.
>>
>>    Cc: Jens Axboe <axboe@kernel.dk>
>>    Cc: Christoph Hellwig <hch@infradead.org>
>>    Cc: Al Viro <viro@zeniv.linux.org.uk>
>>    Cc: Ming Lei <ming.lei@canonical.com>
>>    Cc: Neil Brown <neilb@suse.de>
>>    Cc: Alasdair Kergon <agk@redhat.com>
>>    Cc: Mike Snitzer <snitzer@redhat.com>
>>    Cc: dm-devel@redhat.com
>>    Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
>>    Cc: drbd-user@lists.linbit.com
>>    Cc: Jiri Kosina <jkosina@suse.cz>
>>    Cc: Geoff Levand <geoff@infradead.org>
>>    Cc: Jim Paris <jim@jtan.com>
>>    Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
>>    Cc: Minchan Kim <minchan@kernel.org>
>>    Cc: Nitin Gupta <ngupta@vflare.org>
>>    Cc: Oleg Drokin <oleg.drokin@intel.com>
>>    Cc: Andreas Dilger <andreas.dilger@intel.com>
>>    Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
>>    Acked-by: Mike Snitzer <snitzer@redhat.com>
>>    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>>    Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
>>    [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
>>    Signed-off-by: Dongsu Park <dpark@posteo.net>
>>    Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
>>    Signed-off-by: Jens Axboe <axboe@fb.com>
>>
>> This thread on the linux-parisc has most of the discussion and analysis:
>> http://www.spinics.net/lists/linux-parisc/msg06710.html
>>
>> Essentially, the SCSI layer underestimates the number of sg segments needed and we run off the end of the sg list and crash.
>> This happens because the protect bit is ignored.  As a result 4.3 and later kernels fail to boot.  This includes the current Debian
>> kernel for hppa.
>>
>> Hopefully, the block group can help resolve this issue.  We can help with testing if needed.
>>
> 
> We fixed several similar bugs, but maybe there is another one, :-(

Thanks for your help!

> Could you apply the attached debug patch and post the log after the issue is
> triggered?

Sadly I was not yet able to produce the requested output for you.
It seems we have - probably triggered due to the block splitting itself - some kind of memory/stack
corruption in here as well. Note, the stack grows upwards(!) on parisc, so maybe local variables
get overwritten somehow...?

First I applied your patch as is. 
Here is the system log so far:

[   24.940000] cdrom: Uniform CD-ROM driver Revision: 3.20
[   25.000000] sd 3:0:6:0: [sdb] 71132960 512-byte logical blocks: (36.4 GB/33.9 GiB)
[   25.092000] sd 3:0:5:0: [sda] Write Protect is off
[   25.152000] sd 3:0:5:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[   25.268000] sd 3:0:6:0: [sdb] Write Protect is off
[   25.328000] sd 3:0:6:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[   25.452000]  sda: sda1 sda2 sda3 < sda5 sda6 >
[   25.508000]  sdb: sdb1 sdb2 sdb3 < sdb5 sdb6 >
[   25.584000] scsi_id(112): unaligned access to 0x00000000facac009 at ip=0x000000004100390b
[   25.688000] sd 3:0:6:0: [sdb] Attached SCSI disk
[   25.752000] sd 3:0:5:0: [sda] Attached SCSI disk
[   25.840000] scsi_id(113): unaligned access to 0x00000000fa799009 at ip=0x000000004100390b
[   25.972000] random: nonblocking pool is initialized
[   26.064000] ------------[ cut here ]------------
[   26.120000] WARNING: at /build/linux-4.4/linux-4.4.2/block/blk-merge.c:484
[   26.200000] Modules linked in: sd_mod sr_mod cdrom ata_generic ohci_pci ehci_pci ohci_hcd ehci_hcd pata_ns87415 libata sym53c8xx scsi_transport_spi usbcore usb_common scsi_mod tulip
[   26.396000] CPU: 0 PID: 67 Comm: systemd-udevd Not tainted 4.4.0-1-parisc64-smp #7 Debian 4.4.2-2
[   26.504000] task: 00000000bbe96b18 ti: 00000000bbf00000 task.ti: 00000000bbf00000
[   26.592000]
[   26.608000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[   26.664000] PSW: 00001000000001001111111100001110 Not tainted
[   26.736000] r00-03  000000ff0804ff0e 00000000409e9280 00000000404f22a4 00000000bbf011e0

Here it stops. Usually you would see a stack backtrace now, but as I mentioned above
maybe the stack got corrupted and as such your tracing code is not being executed.

Then I moved the WARN_ON behind you tracing code. With that I got:

[   25.352000] cdrom: Uniform CD-ROM driver Revision: 3.20
[   25.444000]  sda: sda1 sda2 sda3 < sda5 sda6 >
[   25.496000]  sdb: sdb1 sdb2 sdb3 < sdb5 sdb6 >
[   25.568000] sd 3:0:5:0: [sda] Attached SCSI disk
[   25.644000] scsi_id(112): unaligned access to 0x00000000fae80009 at ip=0x000000004100390b
[   25.752000] sd 3:0:6:0: [sdb] Attached SCSI disk
[   25.836000] scsi_id(113): unaligned access to 0x00000000fac76009 at ip=0x000000004100390b
[   25.988000] random: nonblocking pool is initialized
[   26.048000] ------------[ cut here ]------------
[   26.104000] kernel BUG at /build/linux-4.4/linux-4.4.2/include/linux/scatterlist.h:92!
[   26.196000] CPU: 0 PID: 68 Comm: systemd-udevd Not tainted 4.4.0-1-parisc64-smp #8 Debian 4.4.2-2
[   26.304000] task: 00000000bbeb40f8 ti: 00000000bbf08000 task.ti: 00000000bbf08000
[   26.392000] 
[   26.412000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[   26.468000] PSW: 00001000000001001111111100001110 Not tainted
[   26.536000] r00-03  000000ff0804ff0e 00000000bbf091e0 00000000404f228c 00000000bbf091e0
[   26.632000] r04-07  00000000409b3a80 0000000000005000 0000000000000000 0000000000001000
[   26.728000] r08-11  000000000000001b 00000000000001b0 00000000bfd6e0f0 00000000bbe15a60
[   26.824000] r12-15  0000000000000000 0000000000000000 0000000000000003 000000007fa95178

Again, no stack backtrace.

Maybe Dave has more luck, otherwise I'll continue to try to get some info.

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Feb. 25, 2016, 3:38 a.m. UTC | #2
On Thu, Feb 25, 2016 at 7:28 AM, John David Anglin <dave.anglin@bell.net> wrote:
> On 2016-02-24, at 4:36 PM, Helge Deller wrote:
>
>> Maybe Dave has more luck, otherwise I'll continue to try to get some info.
>
> I tried your patch on the commit in linux-block which first failed to boot.  As with Helge, the
> system crashed and no useful data was output on console.  I then applied following patch
> to give some extra segments and tired again:
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b1a2631..b421f03 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -595,6 +595,11 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
>
>         BUG_ON(!nents);
>
> +       /* Provide extra entries in case of split.  */
> +       nents += 8;
> +       if (nents > SCSI_MAX_SG_SEGMENTS)
> +               nents = SCSI_MAX_SG_SEGMENTS;
> +

Yeah, this is needed for sake of safety.

>         if (mq) {
>                 if (nents <= SCSI_MAX_SG_SEGMENTS) {
>                         sdb->table.nents = nents;
>
> The attached file shows the crash in first boot.  The second boot was successful and various output
> was generated by your check code.

From the following log(just select one simple, and looks all are similar) in
2nd boot, the bi_phys_segments is figured out as one by block core
, which is wrong because the max segment size is 64k according to
your investigation in the below link, but the whole req/bio is 192k(4k*48).

    http://www.spinics.net/lists/linux-parisc/msg06749.html

Looks weird, it shouldn't have happened because blk_bio_segment_split()
does respect the max segment size limit.

BTW, what is the scsi driver for the device?

blk_rq_map_sg: merge bug: 3 1, extra_len 0, dma_drain 0
check_bvec: dump bvec for 000000007e53c5f0(f:24490000, t:1)
            0: 0 4096 245852 000000007e2c4c40
            1: 0 4096 245853 000000007e2c4c40
            2: 0 4096 245854 000000007e2c4c40
            3: 0 4096 245855 000000007e2c4c40
            4: 0 4096 245856 000000007e2c4c40
            5: 0 4096 245857 000000007e2c4c40
            6: 0 4096 245858 000000007e2c4c40
            7: 0 4096 245859 000000007e2c4c40
            8: 0 4096 245860 000000007e2c4c40
            9: 0 4096 245861 000000007e2c4c40
           10: 0 4096 245862 000000007e2c4c40
           11: 0 4096 245863 000000007e2c4c40
           12: 0 4096 245864 000000007e2c4c40
           13: 0 4096 245865 000000007e2c4c40
           14: 0 4096 245866 000000007e2c4c40
           15: 0 4096 245867 000000007e2c4c40
           16: 0 4096 245868 000000007e2c4c40
           17: 0 4096 245869 000000007e2c4c40
           18: 0 4096 245870 000000007e2c4c40
           19: 0 4096 245871 000000007e2c4c40
           20: 0 4096 245872 000000007e2c4c40
           21: 0 4096 245873 000000007e2c4c40
           22: 0 4096 245874 000000007e2c4c40
           23: 0 4096 245875 000000007e2c4c40
           24: 0 4096 245876 000000007e2c4c40
           25: 0 4096 245877 000000007e2c4c40
           26: 0 4096 245878 000000007e2c4c40
           27: 0 4096 245879 000000007e2c4c40
           28: 0 4096 245880 000000007e2c4c40
           29: 0 4096 245881 000000007e2c4c40
           30: 0 4096 245882 000000007e2c4c40
           31: 0 4096 245883 000000007e2c4c40
           32: 0 4096 245884 000000007e2c4c40
           33: 0 4096 245885 000000007e2c4c40
           34: 0 4096 245886 000000007e2c4c40
           35: 0 4096 245887 000000007e2c4c40
           36: 0 4096 245888 000000007e2c4c40
           37: 0 4096 245889 000000007e2c4c40
           38: 0 4096 245890 000000007e2c4c40
           39: 0 4096 245891 000000007e2c4c40
           40: 0 4096 245892 000000007e2c4c40
           41: 0 4096 245893 000000007e2c4c40
           42: 0 4096 245894 000000007e2c4c40
           43: 0 4096 245895 000000007e2c4c40
           44: 0 4096 245896 000000007e2c4c40
           45: 0 4096 245897 000000007e2c4c40
           46: 0 4096 245898 000000007e2c4c40
           47: 0 4096 245899 000000007e2c4c40


Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Helge Deller Feb. 25, 2016, 10:10 a.m. UTC | #3
> On Thu, Feb 25, 2016 at 7:28 AM, John David Anglin <dave.anglin@bell.net> wrote:
> > On 2016-02-24, at 4:36 PM, Helge Deller wrote:
> >
> >> Maybe Dave has more luck, otherwise I'll continue to try to get some info.
> >
> > I tried your patch on the commit in linux-block which first failed to boot.  As with Helge, the
> > system crashed and no useful data was output on console.  I then applied following patch
> > to give some extra segments and tired again:
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index b1a2631..b421f03 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -595,6 +595,11 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents, bool mq)
> >
> >         BUG_ON(!nents);
> >
> > +       /* Provide extra entries in case of split.  */
> > +       nents += 8;
> > +       if (nents > SCSI_MAX_SG_SEGMENTS)
> > +               nents = SCSI_MAX_SG_SEGMENTS;
> > +
> 
> Yeah, this is needed for sake of safety.
> 
> >         if (mq) {
> >                 if (nents <= SCSI_MAX_SG_SEGMENTS) {
> >                         sdb->table.nents = nents;
> >
> > The attached file shows the crash in first boot.  The second boot was successful and various output
> > was generated by your check code.
> 
> From the following log(just select one simple, and looks all are similar) in
> 2nd boot, the bi_phys_segments is figured out as one by block core
> , which is wrong because the max segment size is 64k according to
> your investigation in the below link, but the whole req/bio is 192k(4k*48).
> 
>     http://www.spinics.net/lists/linux-parisc/msg06749.html
> 
> Looks weird, it shouldn't have happened because blk_bio_segment_split()
> does respect the max segment size limit.
> 
> BTW, what is the scsi driver for the device?

It happens with various drivers.
sym53c8xx (on my machine) and mptspi (Fusion MPT, on Daves machine).
Then we have PATA/SATA controllers too: sil680, sata_sil24, pata_ns87415.
The problem can be reproduced by using sym53c8xx or mptsi and blacklisting all others.

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley March 9, 2016, 12:55 p.m. UTC | #4
On Thu, 2016-02-25 at 11:38 +0800, Ming Lei wrote:
> On Thu, Feb 25, 2016 at 7:28 AM, John David Anglin <
> dave.anglin@bell.net> wrote:
> > On 2016-02-24, at 4:36 PM, Helge Deller wrote:
> > 
> > > Maybe Dave has more luck, otherwise I'll continue to try to get
> > > some info.
> > 
> > I tried your patch on the commit in linux-block which first failed
> > to boot.  As with Helge, the
> > system crashed and no useful data was output on console.  I then
> > applied following patch
> > to give some extra segments and tired again:
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index b1a2631..b421f03 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -595,6 +595,11 @@ static int scsi_alloc_sgtable(struct
> > scsi_data_buffer *sdb, int nents, bool mq)
> > 
> >         BUG_ON(!nents);
> > 
> > +       /* Provide extra entries in case of split.  */
> > +       nents += 8;
> > +       if (nents > SCSI_MAX_SG_SEGMENTS)
> > +               nents = SCSI_MAX_SG_SEGMENTS;
> > +
> 
> Yeah, this is needed for sake of safety.
> 
> >         if (mq) {
> >                 if (nents <= SCSI_MAX_SG_SEGMENTS) {
> >                         sdb->table.nents = nents;
> > 
> > The attached file shows the crash in first boot.  The second boot
> > was successful and various output
> > was generated by your check code.
> 
> From the following log(just select one simple, and looks all are
> similar) in
> 2nd boot, the bi_phys_segments is figured out as one by block core
> , which is wrong because the max segment size is 64k according to
> your investigation in the below link, but the whole req/bio is
> 192k(4k*48).
> 
>     http://www.spinics.net/lists/linux-parisc/msg06749.html
> 
> Looks weird, it shouldn't have happened because
> blk_bio_segment_split()
> does respect the max segment size limit.
> 
> BTW, what is the scsi driver for the device?
> 
> blk_rq_map_sg: merge bug: 3 1, extra_len 0, dma_drain 0
> check_bvec: dump bvec for 000000007e53c5f0(f:24490000, t:1)
>             0: 0 4096 245852 000000007e2c4c40
>             1: 0 4096 245853 000000007e2c4c40
>             2: 0 4096 245854 000000007e2c4c40
>             3: 0 4096 245855 000000007e2c4c40
>             4: 0 4096 245856 000000007e2c4c40
>             5: 0 4096 245857 000000007e2c4c40
>             6: 0 4096 245858 000000007e2c4c40
>             7: 0 4096 245859 000000007e2c4c40
>             8: 0 4096 245860 000000007e2c4c40
>             9: 0 4096 245861 000000007e2c4c40
>            10: 0 4096 245862 000000007e2c4c40
>            11: 0 4096 245863 000000007e2c4c40
>            12: 0 4096 245864 000000007e2c4c40
>            13: 0 4096 245865 000000007e2c4c40
>            14: 0 4096 245866 000000007e2c4c40
>            15: 0 4096 245867 000000007e2c4c40
>            16: 0 4096 245868 000000007e2c4c40
>            17: 0 4096 245869 000000007e2c4c40
>            18: 0 4096 245870 000000007e2c4c40
>            19: 0 4096 245871 000000007e2c4c40
>            20: 0 4096 245872 000000007e2c4c40
>            21: 0 4096 245873 000000007e2c4c40
>            22: 0 4096 245874 000000007e2c4c40
>            23: 0 4096 245875 000000007e2c4c40
>            24: 0 4096 245876 000000007e2c4c40
>            25: 0 4096 245877 000000007e2c4c40
>            26: 0 4096 245878 000000007e2c4c40
>            27: 0 4096 245879 000000007e2c4c40
>            28: 0 4096 245880 000000007e2c4c40
>            29: 0 4096 245881 000000007e2c4c40
>            30: 0 4096 245882 000000007e2c4c40
>            31: 0 4096 245883 000000007e2c4c40
>            32: 0 4096 245884 000000007e2c4c40
>            33: 0 4096 245885 000000007e2c4c40
>            34: 0 4096 245886 000000007e2c4c40
>            35: 0 4096 245887 000000007e2c4c40
>            36: 0 4096 245888 000000007e2c4c40
>            37: 0 4096 245889 000000007e2c4c40
>            38: 0 4096 245890 000000007e2c4c40
>            39: 0 4096 245891 000000007e2c4c40
>            40: 0 4096 245892 000000007e2c4c40
>            41: 0 4096 245893 000000007e2c4c40
>            42: 0 4096 245894 000000007e2c4c40
>            43: 0 4096 245895 000000007e2c4c40
>            44: 0 4096 245896 000000007e2c4c40
>            45: 0 4096 245897 000000007e2c4c40
>            46: 0 4096 245898 000000007e2c4c40
>            47: 0 4096 245899 000000007e2c4c40


We've provided all the information you asked for, what's the next step
on this, or do we have to unwind the bio splitting code with reverts
until it starts working?

Thanks,

James


--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei March 9, 2016, 2:43 p.m. UTC | #5
On Wed, Mar 9, 2016 at 8:55 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Thu, 2016-02-25 at 11:38 +0800, Ming Lei wrote:
>> On Thu, Feb 25, 2016 at 7:28 AM, John David Anglin <
>> dave.anglin@bell.net> wrote:
>> > On 2016-02-24, at 4:36 PM, Helge Deller wrote:
>> >
>> > > Maybe Dave has more luck, otherwise I'll continue to try to get
>> > > some info.
>> >
>> > I tried your patch on the commit in linux-block which first failed
>> > to boot.  As with Helge, the
>> > system crashed and no useful data was output on console.  I then
>> > applied following patch
>> > to give some extra segments and tired again:
>> >
>> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> > index b1a2631..b421f03 100644
>> > --- a/drivers/scsi/scsi_lib.c
>> > +++ b/drivers/scsi/scsi_lib.c
>> > @@ -595,6 +595,11 @@ static int scsi_alloc_sgtable(struct
>> > scsi_data_buffer *sdb, int nents, bool mq)
>> >
>> >         BUG_ON(!nents);
>> >
>> > +       /* Provide extra entries in case of split.  */
>> > +       nents += 8;
>> > +       if (nents > SCSI_MAX_SG_SEGMENTS)
>> > +               nents = SCSI_MAX_SG_SEGMENTS;
>> > +
>>
>> Yeah, this is needed for sake of safety.
>>
>> >         if (mq) {
>> >                 if (nents <= SCSI_MAX_SG_SEGMENTS) {
>> >                         sdb->table.nents = nents;
>> >
>> > The attached file shows the crash in first boot.  The second boot
>> > was successful and various output
>> > was generated by your check code.
>>
>> From the following log(just select one simple, and looks all are
>> similar) in
>> 2nd boot, the bi_phys_segments is figured out as one by block core
>> , which is wrong because the max segment size is 64k according to
>> your investigation in the below link, but the whole req/bio is
>> 192k(4k*48).
>>
>>     http://www.spinics.net/lists/linux-parisc/msg06749.html
>>
>> Looks weird, it shouldn't have happened because
>> blk_bio_segment_split()
>> does respect the max segment size limit.
>>
>> BTW, what is the scsi driver for the device?
>>
>> blk_rq_map_sg: merge bug: 3 1, extra_len 0, dma_drain 0
>> check_bvec: dump bvec for 000000007e53c5f0(f:24490000, t:1)
>>             0: 0 4096 245852 000000007e2c4c40
>>             1: 0 4096 245853 000000007e2c4c40
>>             2: 0 4096 245854 000000007e2c4c40
>>             3: 0 4096 245855 000000007e2c4c40
>>             4: 0 4096 245856 000000007e2c4c40
>>             5: 0 4096 245857 000000007e2c4c40
>>             6: 0 4096 245858 000000007e2c4c40
>>             7: 0 4096 245859 000000007e2c4c40
>>             8: 0 4096 245860 000000007e2c4c40
>>             9: 0 4096 245861 000000007e2c4c40
>>            10: 0 4096 245862 000000007e2c4c40
>>            11: 0 4096 245863 000000007e2c4c40
>>            12: 0 4096 245864 000000007e2c4c40
>>            13: 0 4096 245865 000000007e2c4c40
>>            14: 0 4096 245866 000000007e2c4c40
>>            15: 0 4096 245867 000000007e2c4c40
>>            16: 0 4096 245868 000000007e2c4c40
>>            17: 0 4096 245869 000000007e2c4c40
>>            18: 0 4096 245870 000000007e2c4c40
>>            19: 0 4096 245871 000000007e2c4c40
>>            20: 0 4096 245872 000000007e2c4c40
>>            21: 0 4096 245873 000000007e2c4c40
>>            22: 0 4096 245874 000000007e2c4c40
>>            23: 0 4096 245875 000000007e2c4c40
>>            24: 0 4096 245876 000000007e2c4c40
>>            25: 0 4096 245877 000000007e2c4c40
>>            26: 0 4096 245878 000000007e2c4c40
>>            27: 0 4096 245879 000000007e2c4c40
>>            28: 0 4096 245880 000000007e2c4c40
>>            29: 0 4096 245881 000000007e2c4c40
>>            30: 0 4096 245882 000000007e2c4c40
>>            31: 0 4096 245883 000000007e2c4c40
>>            32: 0 4096 245884 000000007e2c4c40
>>            33: 0 4096 245885 000000007e2c4c40
>>            34: 0 4096 245886 000000007e2c4c40
>>            35: 0 4096 245887 000000007e2c4c40
>>            36: 0 4096 245888 000000007e2c4c40
>>            37: 0 4096 245889 000000007e2c4c40
>>            38: 0 4096 245890 000000007e2c4c40
>>            39: 0 4096 245891 000000007e2c4c40
>>            40: 0 4096 245892 000000007e2c4c40
>>            41: 0 4096 245893 000000007e2c4c40
>>            42: 0 4096 245894 000000007e2c4c40
>>            43: 0 4096 245895 000000007e2c4c40
>>            44: 0 4096 245896 000000007e2c4c40
>>            45: 0 4096 245897 000000007e2c4c40
>>            46: 0 4096 245898 000000007e2c4c40
>>            47: 0 4096 245899 000000007e2c4c40
>
>
> We've provided all the information you asked for, what's the next step
> on this, or do we have to unwind the bio splitting code with reverts
> until it starts working?

John, Helge, and I did discuss the problem for a while privately, and looks
it is related with compiler.  Last time, I sent one patch which can make the
issue disappear, but the main change is just invovled with the below:

 struct bio_vec {
     struct page    *bv_page;
-    unsigned int    bv_len;
+    unsigned int    bv_seg:8;
+    unsigned int    bv_len:24;
     unsigned int    bv_offset;
 };

Maybe John and Helge have some update recently?

The logic in blk_bio_segment_split() is correct, and it does respect the max
segment size limit.

Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John David Anglin March 9, 2016, 3:15 p.m. UTC | #6
On 2016-03-09 9:43 AM, Ming Lei wrote:
>> We've provided all the information you asked for, what's the next step
>> >on this, or do we have to unwind the bio splitting code with reverts
>> >until it starts working?
> John, Helge, and I did discuss the problem for a while privately, and looks
> it is related with compiler.  Last time, I sent one patch which can make the
> issue disappear, but the main change is just invovled with the below:
>
>   struct bio_vec {
>       struct page    *bv_page;
> -    unsigned int    bv_len;
> +    unsigned int    bv_seg:8;
> +    unsigned int    bv_len:24;
>       unsigned int    bv_offset;
>   };
>
> Maybe John and Helge have some update recently?
>
> The logic in blk_bio_segment_split() is correct, and it does respect the max
> segment size limit.
Helge has found that tagging blk_bio_segment_split() with "__attribute__ 
((optimize("O0")))"
makes the issue disappear.  The bug remains if one just adds bv_len to 
the struct without the
bit fields.  Maybe problem is evident from following output which I sent 
to Ming and Helge
last weekend?

blk_rq_map_sg: merge bug: 3 2, extra_len 0, dma_drain 0
check_bvec: dump bvec for 000000007e4efdc0(f:24490000, t:1)
             0: 0 4096 246503 000000007e4a4f00(0, 94208, 1)
             1: 0 4096 246504 000000007e4a4f00(0, 94208, 1)
             2: 0 4096 246505 000000007e4a4f00(0, 94208, 1)
             3: 0 4096 246506 000000007e4a4f00(0, 94208, 1)
             4: 0 4096 246538 000000007e4a4f00(0, 94208, 2)
             5: 0 4096 246539 000000007e4a4f00(0, 94208, 2)
             6: 0 4096 246540 000000007e4a4f00(0, 94208, 2)
             7: 0 4096 246541 000000007e4a4f00(0, 94208, 2)
             8: 0 4096 246542 000000007e4a4f00(0, 94208, 2)
             9: 0 4096 246543 000000007e4a4f00(0, 94208, 2)
            10: 0 4096 246544 000000007e4a4f00(0, 94208, 2)
            11: 0 4096 246545 000000007e4a4f00(0, 94208, 2)
            12: 0 4096 246546 000000007e4a4f00(0, 94208, 2)
            13: 0 4096 246547 000000007e4a4f00(0, 94208, 2)
            14: 0 4096 246548 000000007e4a4f00(0, 94208, 2)
            15: 0 4096 246549 000000007e4a4f00(0, 94208, 2)
            16: 0 4096 246550 000000007e4a4f00(0, 94208, 2)
            17: 0 4096 246551 000000007e4a4f00(0, 94208, 2)
            18: 0 4096 246552 000000007e4a4f00(0, 94208, 2)
            19: 0 4096 246553 000000007e4a4f00(0, 94208, 2)
            20: 0 4096 246554 000000007e4a4f00(0, 94208, 2)
            21: 0 4096 246555 000000007e4a4f00(0, 94208, 2)
            22: 0 4096 246556 000000007e4a4f00(0, 94208, 2)
Kernel panic - not syncing: bad block merge

It seems segment 1 is too small and segment 2 too big?

The general plan is to disable inlining (maybe move 
blk_bio_segment_split() to a separate
function) to try to figure out what is miscompiled.

As you say, this is probably a GCC bug.  However, it's likely a 
middle-end or optimization
bug in the common GCC code.

Dave
Ming Lei March 9, 2016, 3:51 p.m. UTC | #7
On Wed, Mar 9, 2016 at 11:15 PM, John David Anglin <dave.anglin@bell.net> wrote:
> On 2016-03-09 9:43 AM, Ming Lei wrote:
>>>
>>> We've provided all the information you asked for, what's the next step
>>> >on this, or do we have to unwind the bio splitting code with reverts
>>> >until it starts working?
>>
>> John, Helge, and I did discuss the problem for a while privately, and
>> looks
>> it is related with compiler.  Last time, I sent one patch which can make
>> the
>> issue disappear, but the main change is just invovled with the below:
>>
>>   struct bio_vec {
>>       struct page    *bv_page;
>> -    unsigned int    bv_len;
>> +    unsigned int    bv_seg:8;
>> +    unsigned int    bv_len:24;
>>       unsigned int    bv_offset;
>>   };
>>
>> Maybe John and Helge have some update recently?
>>
>> The logic in blk_bio_segment_split() is correct, and it does respect the
>> max
>> segment size limit.
>
> Helge has found that tagging blk_bio_segment_split() with "__attribute__
> ((optimize("O0")))"
> makes the issue disappear.  The bug remains if one just adds bv_len to the
> struct without the
> bit fields.  Maybe problem is evident from following output which I sent to
> Ming and Helge
> last weekend?
>
> blk_rq_map_sg: merge bug: 3 2, extra_len 0, dma_drain 0
> check_bvec: dump bvec for 000000007e4efdc0(f:24490000, t:1)
>             0: 0 4096 246503 000000007e4a4f00(0, 94208, 1)
>             1: 0 4096 246504 000000007e4a4f00(0, 94208, 1)
>             2: 0 4096 246505 000000007e4a4f00(0, 94208, 1)
>             3: 0 4096 246506 000000007e4a4f00(0, 94208, 1)

The above 4 io vectors belong to one same segment since
they are contineous physically from the 3rd column of PFN,
but the vector 4(the below one) isn't contineous with above 3
vectors, so the following one starts the 2nd segment.

>             4: 0 4096 246538 000000007e4a4f00(0, 94208, 2)
>             5: 0 4096 246539 000000007e4a4f00(0, 94208, 2)
>             6: 0 4096 246540 000000007e4a4f00(0, 94208, 2)
>             7: 0 4096 246541 000000007e4a4f00(0, 94208, 2)
>             8: 0 4096 246542 000000007e4a4f00(0, 94208, 2)
>             9: 0 4096 246543 000000007e4a4f00(0, 94208, 2)
>            10: 0 4096 246544 000000007e4a4f00(0, 94208, 2)
>            11: 0 4096 246545 000000007e4a4f00(0, 94208, 2)
>            12: 0 4096 246546 000000007e4a4f00(0, 94208, 2)
>            13: 0 4096 246547 000000007e4a4f00(0, 94208, 2)
>            14: 0 4096 246548 000000007e4a4f00(0, 94208, 2)
>            15: 0 4096 246549 000000007e4a4f00(0, 94208, 2)
>            16: 0 4096 246550 000000007e4a4f00(0, 94208, 2)
>            17: 0 4096 246551 000000007e4a4f00(0, 94208, 2)
>            18: 0 4096 246552 000000007e4a4f00(0, 94208, 2)
>            19: 0 4096 246553 000000007e4a4f00(0, 94208, 2)

The above 16 vectors are contineous physically, but the segment
size becomes 64K now, so blk_bio_segment_split() should have
seen that and started to split the bio.

>            20: 0 4096 246554 000000007e4a4f00(0, 94208, 2)
>            21: 0 4096 246555 000000007e4a4f00(0, 94208, 2)
>            22: 0 4096 246556 000000007e4a4f00(0, 94208, 2)

Unfortunately the bio isn't splitted at all, that means the following
code is run incorrectly:

if (seg_size + bv.bv_len > queue_max_segment_size(q))
                                goto new_segment;

seg_size should be 64K, and bv.bv_len should be 4K, so
the sum between the two should be bigger than
64K(queue_max_segment_size(q)). Unfortunately, the
code is optimized as run incorrectly.

> Kernel panic - not syncing: bad block merge
>
> It seems segment 1 is too small and segment 2 too big?

segment 1 is correct, and segment 2 becomes too big, but
__blk_segment_map_sg() runs correctly and figured out
this bio has 3 segments, so causes the oops.

>
> The general plan is to disable inlining (maybe move blk_bio_segment_split()
> to a separate
> function) to try to figure out what is miscompiled.
>
> As you say, this is probably a GCC bug.  However, it's likely a middle-end
> or optimization
> bug in the common GCC code.
>
> Dave
>
>
> --
> John David Anglin  dave.anglin@bell.net
>
Helge Deller March 9, 2016, 9:20 p.m. UTC | #8
On 09.03.2016 16:15, John David Anglin wrote:
> On 2016-03-09 9:43 AM, Ming Lei wrote:
>>> We've provided all the information you asked for, what's the next step
>>> >on this, or do we have to unwind the bio splitting code with reverts
>>> >until it starts working?
>> John, Helge, and I did discuss the problem for a while privately, and looks
>> it is related with compiler.  Last time, I sent one patch which can make the
>> issue disappear, but the main change is just invovled with the below:
>>
>>   struct bio_vec {
>>       struct page    *bv_page;
>> -    unsigned int    bv_len;
>> +    unsigned int    bv_seg:8;
>> +    unsigned int    bv_len:24;
>>       unsigned int    bv_offset;
>>   };
>>
>> Maybe John and Helge have some update recently?
>>
>> The logic in blk_bio_segment_split() is correct, and it does respect the max
>> segment size limit.
> Helge has found that tagging blk_bio_segment_split() with "__attribute__ ((optimize("O0")))"
> makes the issue disappear.  The bug remains if one just adds bv_len to the struct without the
> bit fields.  Maybe problem is evident from following output which I sent to Ming and Helge
> last weekend?
> 
> blk_rq_map_sg: merge bug: 3 2, extra_len 0, dma_drain 0
> check_bvec: dump bvec for 000000007e4efdc0(f:24490000, t:1)
>             0: 0 4096 246503 000000007e4a4f00(0, 94208, 1)
>             1: 0 4096 246504 000000007e4a4f00(0, 94208, 1)
>             2: 0 4096 246505 000000007e4a4f00(0, 94208, 1)
>             3: 0 4096 246506 000000007e4a4f00(0, 94208, 1)
>             4: 0 4096 246538 000000007e4a4f00(0, 94208, 2)
>             5: 0 4096 246539 000000007e4a4f00(0, 94208, 2)
>             6: 0 4096 246540 000000007e4a4f00(0, 94208, 2)
>             7: 0 4096 246541 000000007e4a4f00(0, 94208, 2)
>             8: 0 4096 246542 000000007e4a4f00(0, 94208, 2)
>             9: 0 4096 246543 000000007e4a4f00(0, 94208, 2)
>            10: 0 4096 246544 000000007e4a4f00(0, 94208, 2)
>            11: 0 4096 246545 000000007e4a4f00(0, 94208, 2)
>            12: 0 4096 246546 000000007e4a4f00(0, 94208, 2)
>            13: 0 4096 246547 000000007e4a4f00(0, 94208, 2)
>            14: 0 4096 246548 000000007e4a4f00(0, 94208, 2)
>            15: 0 4096 246549 000000007e4a4f00(0, 94208, 2)
>            16: 0 4096 246550 000000007e4a4f00(0, 94208, 2)
>            17: 0 4096 246551 000000007e4a4f00(0, 94208, 2)
>            18: 0 4096 246552 000000007e4a4f00(0, 94208, 2)
>            19: 0 4096 246553 000000007e4a4f00(0, 94208, 2)
>            20: 0 4096 246554 000000007e4a4f00(0, 94208, 2)
>            21: 0 4096 246555 000000007e4a4f00(0, 94208, 2)
>            22: 0 4096 246556 000000007e4a4f00(0, 94208, 2)
> Kernel panic - not syncing: bad block merge
> 
> It seems segment 1 is too small and segment 2 too big?
> 
> The general plan is to disable inlining (maybe move blk_bio_segment_split() to a separate
> function) to try to figure out what is miscompiled.

Right.
I just succeeded in reproducing the bug with moving blk_bio_segment_split() into an own file
(and with "extern" instead of "static" in blk-merge.c). When compiled with -O2 it still crashes.
So, next step is to analyze what gcc does wrong when compiling this function.
It should get easier now to find the reason, since we have a smaller reproducer now.

Helge
 
> As you say, this is probably a GCC bug.  However, it's likely a middle-end or optimization
> bug in the common GCC code.
> 
> Dave
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley March 10, 2016, 12:16 a.m. UTC | #9
On Wed, 2016-03-09 at 22:20 +0100, Helge Deller wrote:
> On 09.03.2016 16:15, John David Anglin wrote:
> > On 2016-03-09 9:43 AM, Ming Lei wrote:
> > > > We've provided all the information you asked for, what's the
> > > > next step
> > > > > on this, or do we have to unwind the bio splitting code with
> > > > > reverts
> > > > > until it starts working?
> > > John, Helge, and I did discuss the problem for a while privately,
> > > and looks
> > > it is related with compiler.  Last time, I sent one patch which
> > > can make the
> > > issue disappear, but the main change is just invovled with the
> > > below:
> > > 
> > >   struct bio_vec {
> > >       struct page    *bv_page;
> > > -    unsigned int    bv_len;
> > > +    unsigned int    bv_seg:8;
> > > +    unsigned int    bv_len:24;
> > >       unsigned int    bv_offset;
> > >   };
> > > 
> > > Maybe John and Helge have some update recently?
> > > 
> > > The logic in blk_bio_segment_split() is correct, and it does
> > > respect the max
> > > segment size limit.
> > Helge has found that tagging blk_bio_segment_split() with
> > "__attribute__ ((optimize("O0")))"
> > makes the issue disappear.  The bug remains if one just adds bv_len
> > to the struct without the
> > bit fields.  Maybe problem is evident from following output which I
> > sent to Ming and Helge
> > last weekend?
> > 
> > blk_rq_map_sg: merge bug: 3 2, extra_len 0, dma_drain 0
> > check_bvec: dump bvec for 000000007e4efdc0(f:24490000, t:1)
> >             0: 0 4096 246503 000000007e4a4f00(0, 94208, 1)
> >             1: 0 4096 246504 000000007e4a4f00(0, 94208, 1)
> >             2: 0 4096 246505 000000007e4a4f00(0, 94208, 1)
> >             3: 0 4096 246506 000000007e4a4f00(0, 94208, 1)
> >             4: 0 4096 246538 000000007e4a4f00(0, 94208, 2)
> >             5: 0 4096 246539 000000007e4a4f00(0, 94208, 2)
> >             6: 0 4096 246540 000000007e4a4f00(0, 94208, 2)
> >             7: 0 4096 246541 000000007e4a4f00(0, 94208, 2)
> >             8: 0 4096 246542 000000007e4a4f00(0, 94208, 2)
> >             9: 0 4096 246543 000000007e4a4f00(0, 94208, 2)
> >            10: 0 4096 246544 000000007e4a4f00(0, 94208, 2)
> >            11: 0 4096 246545 000000007e4a4f00(0, 94208, 2)
> >            12: 0 4096 246546 000000007e4a4f00(0, 94208, 2)
> >            13: 0 4096 246547 000000007e4a4f00(0, 94208, 2)
> >            14: 0 4096 246548 000000007e4a4f00(0, 94208, 2)
> >            15: 0 4096 246549 000000007e4a4f00(0, 94208, 2)
> >            16: 0 4096 246550 000000007e4a4f00(0, 94208, 2)
> >            17: 0 4096 246551 000000007e4a4f00(0, 94208, 2)
> >            18: 0 4096 246552 000000007e4a4f00(0, 94208, 2)
> >            19: 0 4096 246553 000000007e4a4f00(0, 94208, 2)
> >            20: 0 4096 246554 000000007e4a4f00(0, 94208, 2)
> >            21: 0 4096 246555 000000007e4a4f00(0, 94208, 2)
> >            22: 0 4096 246556 000000007e4a4f00(0, 94208, 2)
> > Kernel panic - not syncing: bad block merge
> > 
> > It seems segment 1 is too small and segment 2 too big?
> > 
> > The general plan is to disable inlining (maybe move
> > blk_bio_segment_split() to a separate
> > function) to try to figure out what is miscompiled.
> 
> Right.
> I just succeeded in reproducing the bug with moving
> blk_bio_segment_split() into an own file
> (and with "extern" instead of "static" in blk-merge.c). When compiled
> with -O2 it still crashes.
> So, next step is to analyze what gcc does wrong when compiling this
> function.
> It should get easier now to find the reason, since we have a smaller 
> reproducer now.

OK, that would explain why I don't see the problem, since I'm using an
older compiler.  So it's our issue and basically no action for block.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rolf Eike Beer March 10, 2016, 7:04 a.m. UTC | #10
> Right.
> I just succeeded in reproducing the bug with moving blk_bio_segment_split()
> into an own file (and with "extern" instead of "static" in blk-merge.c).
> When compiled with -O2 it still crashes. So, next step is to analyze what
> gcc does wrong when compiling this function. It should get easier now to
> find the reason, since we have a smaller reproducer now.

I have a ton of compilers here on my C8000 and a few even on my C3600 which 
drive nightly CMake dashboards. Could you send the testcase so I can pass it 
through the list and see what breaks?

Greetings,

Eike
Helge Deller March 20, 2016, 6:12 p.m. UTC | #11
Hi Eike,

On 10.03.2016 08:04, Rolf Eike Beer wrote:
>> Right.
>> I just succeeded in reproducing the bug with moving blk_bio_segment_split()
>> into an own file (and with "extern" instead of "static" in blk-merge.c).
>> When compiled with -O2 it still crashes. So, next step is to analyze what
>> gcc does wrong when compiling this function. It should get easier now to
>> find the reason, since we have a smaller reproducer now.
> 
> I have a ton of compilers here on my C8000 and a few even on my C3600 which 
> drive nightly CMake dashboards. Could you send the testcase so I can pass it 
> through the list and see what breaks?

Thanks for the offer!
Sadly it's not a stand-alone testcase, just the original source code extracted
which can be manually compiled and then the generated assembly can be compared
to what it should be.
That said, I think this specific testcase is not really usable in your test environment.

By the way, the bug was just fixed. Details are in bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70188

Thanks,
Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..1a73eb6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -416,6 +416,24 @@  single_segment:
 	return nsegs;
 }
 
+static void check_bvec(struct request *req)
+{
+	struct bio_vec bvec;
+	struct req_iterator iter;
+	int i = 0;
+
+	printk("%s: dump bvec for %p(f:%llx, t:%d)\n",
+			__func__, req, req->cmd_flags,
+			req->cmd_type);
+	rq_for_each_segment(bvec, req, iter) {
+		printk("\t %4d: %u %u %lu %p\n",
+			     i, bvec.bv_offset, bvec.bv_len,
+			     (unsigned long)page_to_pfn(bvec.bv_page),
+			     iter.bio);
+		i++;
+	}
+}
+
 /*
  * map a request to scatterlist, return number of sg entries setup. Caller
  * must make sure sg can hold rq->nr_phys_segments entries
@@ -425,6 +443,7 @@  int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 {
 	struct scatterlist *sg = NULL;
 	int nsegs = 0;
+	bool extra_len = false, dma_drain = false;
 
 	if (rq->bio)
 		nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg);
@@ -436,6 +455,7 @@  int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 
 		sg->length += pad_len;
 		rq->extra_len += pad_len;
+		extra_len = true;
 	}
 
 	if (q->dma_drain_size && q->dma_drain_needed(rq)) {
@@ -450,6 +470,7 @@  int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 			    (PAGE_SIZE - 1));
 		nsegs++;
 		rq->extra_len += q->dma_drain_size;
+		dma_drain = true;
 	}
 
 	if (sg)
@@ -460,6 +481,12 @@  int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 	 * segment is bigger than number of req's physical segments
 	 */
 	WARN_ON(nsegs > rq->nr_phys_segments);
+	if (nsegs > rq->nr_phys_segments) {
+		printk("%s: merge bug: %d %d, extra_len %d, dma_drain %d\n",
+			__func__, nsegs, rq->nr_phys_segments,
+			extra_len, dma_drain);
+		check_bvec(rq);
+	}
 
 	return nsegs;
 }