Message ID | CACVXFVOnff1sFfMH+zsoypuYwM4PokJkHeKhbxJYDDH6ZtVScg@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Hi Ming Lei, On 24.02.2016 08:59, Ming Lei wrote: > On Wed, Feb 24, 2016 at 10:28 AM, John David Anglin > <dave.anglin@bell.net> wrote: >> The following block change breaks boot on parisc-linux: >> >> commit 54efd50bfd873e2dbf784e0b21a8027ba4299a3e >> Author: Kent Overstreet <kent.overstreet@gmail.com> >> Date: Thu Apr 23 22:37:18 2015 -0700 >> >> block: make generic_make_request handle arbitrarily sized bios >> >> The way the block layer is currently written, it goes to great lengths >> to avoid having to split bios; upper layer code (such as bio_add_page()) >> checks what the underlying device can handle and tries to always create >> bios that don't need to be split. >> >> But this approach becomes unwieldy and eventually breaks down with >> stacked devices and devices with dynamic limits, and it adds a lot of >> complexity. If the block layer could split bios as needed, we could >> eliminate a lot of complexity elsewhere - particularly in stacked >> drivers. Code that creates bios can then create whatever size bios are >> convenient, and more importantly stacked drivers don't have to deal with >> both their own bio size limitations and the limitations of the >> (potentially multiple) devices underneath them. In the future this will >> let us delete merge_bvec_fn and a bunch of other code. >> >> We do this by adding calls to blk_queue_split() to the various >> make_request functions that need it - a few can already handle arbitrary >> size bios. Note that we add the call _after_ any call to >> blk_queue_bounce(); this means that blk_queue_split() and >> blk_recalc_rq_segments() don't need to be concerned with bouncing >> affecting segment merging. >> >> Some make_request_fn() callbacks were simple enough to audit and verify >> they don't need blk_queue_split() calls. The skipped ones are: >> >> * nfhd_make_request (arch/m68k/emu/nfblock.c) >> * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) >> * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) >> * brd_make_request (ramdisk - drivers/block/brd.c) >> * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) >> * loop_make_request >> * null_queue_bio >> * bcache's make_request fns >> >> Some others are almost certainly safe to remove now, but will be left >> for future patches. >> >> Cc: Jens Axboe <axboe@kernel.dk> >> Cc: Christoph Hellwig <hch@infradead.org> >> Cc: Al Viro <viro@zeniv.linux.org.uk> >> Cc: Ming Lei <ming.lei@canonical.com> >> Cc: Neil Brown <neilb@suse.de> >> Cc: Alasdair Kergon <agk@redhat.com> >> Cc: Mike Snitzer <snitzer@redhat.com> >> Cc: dm-devel@redhat.com >> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> >> Cc: drbd-user@lists.linbit.com >> Cc: Jiri Kosina <jkosina@suse.cz> >> Cc: Geoff Levand <geoff@infradead.org> >> Cc: Jim Paris <jim@jtan.com> >> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> >> Cc: Minchan Kim <minchan@kernel.org> >> Cc: Nitin Gupta <ngupta@vflare.org> >> Cc: Oleg Drokin <oleg.drokin@intel.com> >> Cc: Andreas Dilger <andreas.dilger@intel.com> >> Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits) >> Acked-by: Mike Snitzer <snitzer@redhat.com> >> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> >> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] >> Signed-off-by: Dongsu Park <dpark@posteo.net> >> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> >> Signed-off-by: Jens Axboe <axboe@fb.com> >> >> This thread on the linux-parisc has most of the discussion and analysis: >> http://www.spinics.net/lists/linux-parisc/msg06710.html >> >> Essentially, the SCSI layer underestimates the number of sg segments needed and we run off the end of the sg list and crash. >> This happens because the protect bit is ignored. As a result 4.3 and later kernels fail to boot. This includes the current Debian >> kernel for hppa. >> >> Hopefully, the block group can help resolve this issue. We can help with testing if needed. >> > > We fixed several similar bugs, but maybe there is another one, :-( Thanks for your help! > Could you apply the attached debug patch and post the log after the issue is > triggered? Sadly I was not yet able to produce the requested output for you. It seems we have - probably triggered due to the block splitting itself - some kind of memory/stack corruption in here as well. Note, the stack grows upwards(!) on parisc, so maybe local variables get overwritten somehow...? First I applied your patch as is. Here is the system log so far: [ 24.940000] cdrom: Uniform CD-ROM driver Revision: 3.20 [ 25.000000] sd 3:0:6:0: [sdb] 71132960 512-byte logical blocks: (36.4 GB/33.9 GiB) [ 25.092000] sd 3:0:5:0: [sda] Write Protect is off [ 25.152000] sd 3:0:5:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA [ 25.268000] sd 3:0:6:0: [sdb] Write Protect is off [ 25.328000] sd 3:0:6:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA [ 25.452000] sda: sda1 sda2 sda3 < sda5 sda6 > [ 25.508000] sdb: sdb1 sdb2 sdb3 < sdb5 sdb6 > [ 25.584000] scsi_id(112): unaligned access to 0x00000000facac009 at ip=0x000000004100390b [ 25.688000] sd 3:0:6:0: [sdb] Attached SCSI disk [ 25.752000] sd 3:0:5:0: [sda] Attached SCSI disk [ 25.840000] scsi_id(113): unaligned access to 0x00000000fa799009 at ip=0x000000004100390b [ 25.972000] random: nonblocking pool is initialized [ 26.064000] ------------[ cut here ]------------ [ 26.120000] WARNING: at /build/linux-4.4/linux-4.4.2/block/blk-merge.c:484 [ 26.200000] Modules linked in: sd_mod sr_mod cdrom ata_generic ohci_pci ehci_pci ohci_hcd ehci_hcd pata_ns87415 libata sym53c8xx scsi_transport_spi usbcore usb_common scsi_mod tulip [ 26.396000] CPU: 0 PID: 67 Comm: systemd-udevd Not tainted 4.4.0-1-parisc64-smp #7 Debian 4.4.2-2 [ 26.504000] task: 00000000bbe96b18 ti: 00000000bbf00000 task.ti: 00000000bbf00000 [ 26.592000] [ 26.608000] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI [ 26.664000] PSW: 00001000000001001111111100001110 Not tainted [ 26.736000] r00-03 000000ff0804ff0e 00000000409e9280 00000000404f22a4 00000000bbf011e0 Here it stops. Usually you would see a stack backtrace now, but as I mentioned above maybe the stack got corrupted and as such your tracing code is not being executed. Then I moved the WARN_ON behind you tracing code. With that I got: [ 25.352000] cdrom: Uniform CD-ROM driver Revision: 3.20 [ 25.444000] sda: sda1 sda2 sda3 < sda5 sda6 > [ 25.496000] sdb: sdb1 sdb2 sdb3 < sdb5 sdb6 > [ 25.568000] sd 3:0:5:0: [sda] Attached SCSI disk [ 25.644000] scsi_id(112): unaligned access to 0x00000000fae80009 at ip=0x000000004100390b [ 25.752000] sd 3:0:6:0: [sdb] Attached SCSI disk [ 25.836000] scsi_id(113): unaligned access to 0x00000000fac76009 at ip=0x000000004100390b [ 25.988000] random: nonblocking pool is initialized [ 26.048000] ------------[ cut here ]------------ [ 26.104000] kernel BUG at /build/linux-4.4/linux-4.4.2/include/linux/scatterlist.h:92! [ 26.196000] CPU: 0 PID: 68 Comm: systemd-udevd Not tainted 4.4.0-1-parisc64-smp #8 Debian 4.4.2-2 [ 26.304000] task: 00000000bbeb40f8 ti: 00000000bbf08000 task.ti: 00000000bbf08000 [ 26.392000] [ 26.412000] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI [ 26.468000] PSW: 00001000000001001111111100001110 Not tainted [ 26.536000] r00-03 000000ff0804ff0e 00000000bbf091e0 00000000404f228c 00000000bbf091e0 [ 26.632000] r04-07 00000000409b3a80 0000000000005000 0000000000000000 0000000000001000 [ 26.728000] r08-11 000000000000001b 00000000000001b0 00000000bfd6e0f0 00000000bbe15a60 [ 26.824000] r12-15 0000000000000000 0000000000000000 0000000000000003 000000007fa95178 Again, no stack backtrace. Maybe Dave has more luck, otherwise I'll continue to try to get some info. Helge -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/block/blk-merge.c b/block/blk-merge.c index 2613531..1a73eb6 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -416,6 +416,24 @@ single_segment: return nsegs; } +static void check_bvec(struct request *req) +{ + struct bio_vec bvec; + struct req_iterator iter; + int i = 0; + + printk("%s: dump bvec for %p(f:%llx, t:%d)\n", + __func__, req, req->cmd_flags, + req->cmd_type); + rq_for_each_segment(bvec, req, iter) { + printk("\t %4d: %u %u %lu %p\n", + i, bvec.bv_offset, bvec.bv_len, + (unsigned long)page_to_pfn(bvec.bv_page), + iter.bio); + i++; + } +} + /* * map a request to scatterlist, return number of sg entries setup. Caller * must make sure sg can hold rq->nr_phys_segments entries @@ -425,6 +443,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, { struct scatterlist *sg = NULL; int nsegs = 0; + bool extra_len = false, dma_drain = false; if (rq->bio) nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg); @@ -436,6 +455,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, sg->length += pad_len; rq->extra_len += pad_len; + extra_len = true; } if (q->dma_drain_size && q->dma_drain_needed(rq)) { @@ -450,6 +470,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, (PAGE_SIZE - 1)); nsegs++; rq->extra_len += q->dma_drain_size; + dma_drain = true; } if (sg) @@ -460,6 +481,12 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, * segment is bigger than number of req's physical segments */ WARN_ON(nsegs > rq->nr_phys_segments); + if (nsegs > rq->nr_phys_segments) { + printk("%s: merge bug: %d %d, extra_len %d, dma_drain %d\n", + __func__, nsegs, rq->nr_phys_segments, + extra_len, dma_drain); + check_bvec(rq); + } return nsegs; }