From patchwork Wed Feb 24 07:59:22 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 8402431 Return-Path: X-Original-To: patchwork-linux-block@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 8BB599F1D4 for ; Wed, 24 Feb 2016 07:59:41 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 690902028D for ; Wed, 24 Feb 2016 07:59:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2F19420212 for ; Wed, 24 Feb 2016 07:59:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757625AbcBXH7Y (ORCPT ); Wed, 24 Feb 2016 02:59:24 -0500 Received: from mail-yw0-f169.google.com ([209.85.161.169]:35055 "EHLO mail-yw0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750718AbcBXH7W (ORCPT ); Wed, 24 Feb 2016 02:59:22 -0500 Received: by mail-yw0-f169.google.com with SMTP id g127so9112657ywf.2; Tue, 23 Feb 2016 23:59:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=psK68E2NU5btFoXvzoCdJgwE0aoDZRa8wnx9hEVm0Ns=; b=j0Ug5FkT+aPP9YfV12wrcnN7fAhsE8oPGUDWZwyiSIVOzjfPiOv17SYzrimbNB2XHr rTvj1ReKU/c63Eq70ZFYYilHzUISwupUtk1ZpZUNy74Ap7ErX67gV7PZMMm/Z3eUpHdd yi4j1dRHTC7Gx9Rs8BeNgCRLxGw36Sk/bnQZnNS5snBRZLxR4dJF+d4XvVCeX/eYiPvL J0WJzVT5ek5B/JoWx1xa6sNCIv+NG/sTOfNTTk4mcgJd+woIgFqxuDfQMhs/GG31mqUk 8KyqUnr9cPvFa3LLUk8S5HEA0L7FLqlbOJN3DRcC9Mebb+x/CKWDKDLmAO0nNbMa0crl fS3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=psK68E2NU5btFoXvzoCdJgwE0aoDZRa8wnx9hEVm0Ns=; b=chTME/kN2zw9pW8NW26QH0jqqShby1Td6qz0la2HSUlBwQ0qST+Dppvn/+RnuF36VR EYksCW8RpnNsDNq8eKZk8uPVFTlZvJc3IPK8bWZq+AQL6HnEvGXzZNHNsifd9ZFxVGMF t8iviKwXky9nymJkUrCnY8e/MCoQvsAojRdFQT5LP5AjsCTNK2AHtWq0ATD5hCohvsqq CC4OkwY0n8QFMMwr3SRG2JH72WQ5EHO/lqflhDCQmp/gBb6/tnO5awcISTfOBCSnbau1 HiG7jpH4ZAymmwF/Wo3d4b3OWI1qZ+nL4jTBfjrHFeLdgKNbxZHOayYdS2TvqZ8pzuHw pqXw== X-Gm-Message-State: AG10YOTDaRI8Lc1ngPn9K+jLrnopfrBKyFImV8aFNZdqK19dR9bj3MZgt1S68KVQCc++fffqtqWDB8B5RBSIEg== MIME-Version: 1.0 X-Received: by 10.13.241.199 with SMTP id a190mr20948459ywf.47.1456300762205; Tue, 23 Feb 2016 23:59:22 -0800 (PST) Received: by 10.37.25.134 with HTTP; Tue, 23 Feb 2016 23:59:22 -0800 (PST) In-Reply-To: <227C05EC-5A1C-4FAF-89D8-4A45AF600EC4@bell.net> References: <227C05EC-5A1C-4FAF-89D8-4A45AF600EC4@bell.net> Date: Wed, 24 Feb 2016 15:59:22 +0800 Message-ID: Subject: Re: [BUG] "block: make generic_make_request handle arbitrarily sized bios" breaks boot on parisc-linux From: Ming Lei To: John David Anglin Cc: linux-block@vger.kernel.org, Linux SCSI List , Helge Deller , James Bottomley , linux-parisc List , Kent Overstreet Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RP_MATCHES_RCVD, T_DKIM_INVALID, T_TVD_MIME_EPI, UNPARSEABLE_RELAY autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, Feb 24, 2016 at 10:28 AM, John David Anglin wrote: > The following block change breaks boot on parisc-linux: > > commit 54efd50bfd873e2dbf784e0b21a8027ba4299a3e > Author: Kent Overstreet > Date: Thu Apr 23 22:37:18 2015 -0700 > > block: make generic_make_request handle arbitrarily sized bios > > The way the block layer is currently written, it goes to great lengths > to avoid having to split bios; upper layer code (such as bio_add_page()) > checks what the underlying device can handle and tries to always create > bios that don't need to be split. > > But this approach becomes unwieldy and eventually breaks down with > stacked devices and devices with dynamic limits, and it adds a lot of > complexity. If the block layer could split bios as needed, we could > eliminate a lot of complexity elsewhere - particularly in stacked > drivers. Code that creates bios can then create whatever size bios are > convenient, and more importantly stacked drivers don't have to deal with > both their own bio size limitations and the limitations of the > (potentially multiple) devices underneath them. In the future this will > let us delete merge_bvec_fn and a bunch of other code. > > We do this by adding calls to blk_queue_split() to the various > make_request functions that need it - a few can already handle arbitrary > size bios. Note that we add the call _after_ any call to > blk_queue_bounce(); this means that blk_queue_split() and > blk_recalc_rq_segments() don't need to be concerned with bouncing > affecting segment merging. > > Some make_request_fn() callbacks were simple enough to audit and verify > they don't need blk_queue_split() calls. The skipped ones are: > > * nfhd_make_request (arch/m68k/emu/nfblock.c) > * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) > * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) > * brd_make_request (ramdisk - drivers/block/brd.c) > * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) > * loop_make_request > * null_queue_bio > * bcache's make_request fns > > Some others are almost certainly safe to remove now, but will be left > for future patches. > > Cc: Jens Axboe > Cc: Christoph Hellwig > Cc: Al Viro > Cc: Ming Lei > Cc: Neil Brown > Cc: Alasdair Kergon > Cc: Mike Snitzer > Cc: dm-devel@redhat.com > Cc: Lars Ellenberg > Cc: drbd-user@lists.linbit.com > Cc: Jiri Kosina > Cc: Geoff Levand > Cc: Jim Paris > Cc: Philip Kelleher > Cc: Minchan Kim > Cc: Nitin Gupta > Cc: Oleg Drokin > Cc: Andreas Dilger > Acked-by: NeilBrown (for the 'md/md.c' bits) > Acked-by: Mike Snitzer > Reviewed-by: Martin K. Petersen > Signed-off-by: Kent Overstreet > [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] > Signed-off-by: Dongsu Park > Signed-off-by: Ming Lin > Signed-off-by: Jens Axboe > > This thread on the linux-parisc has most of the discussion and analysis: > http://www.spinics.net/lists/linux-parisc/msg06710.html > > Essentially, the SCSI layer underestimates the number of sg segments needed and we run off the end of the sg list and crash. > This happens because the protect bit is ignored. As a result 4.3 and later kernels fail to boot. This includes the current Debian > kernel for hppa. > > Hopefully, the block group can help resolve this issue. We can help with testing if needed. > We fixed several similar bugs, but maybe there is another one, :-( Could you apply the attached debug patch and post the log after the issue is triggered? Thanks, Ming Lei diff --git a/block/blk-merge.c b/block/blk-merge.c index 2613531..1a73eb6 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -416,6 +416,24 @@ single_segment: return nsegs; } +static void check_bvec(struct request *req) +{ + struct bio_vec bvec; + struct req_iterator iter; + int i = 0; + + printk("%s: dump bvec for %p(f:%llx, t:%d)\n", + __func__, req, req->cmd_flags, + req->cmd_type); + rq_for_each_segment(bvec, req, iter) { + printk("\t %4d: %u %u %lu %p\n", + i, bvec.bv_offset, bvec.bv_len, + (unsigned long)page_to_pfn(bvec.bv_page), + iter.bio); + i++; + } +} + /* * map a request to scatterlist, return number of sg entries setup. Caller * must make sure sg can hold rq->nr_phys_segments entries @@ -425,6 +443,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, { struct scatterlist *sg = NULL; int nsegs = 0; + bool extra_len = false, dma_drain = false; if (rq->bio) nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg); @@ -436,6 +455,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, sg->length += pad_len; rq->extra_len += pad_len; + extra_len = true; } if (q->dma_drain_size && q->dma_drain_needed(rq)) { @@ -450,6 +470,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, (PAGE_SIZE - 1)); nsegs++; rq->extra_len += q->dma_drain_size; + dma_drain = true; } if (sg) @@ -460,6 +481,12 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq, * segment is bigger than number of req's physical segments */ WARN_ON(nsegs > rq->nr_phys_segments); + if (nsegs > rq->nr_phys_segments) { + printk("%s: merge bug: %d %d, extra_len %d, dma_drain %d\n", + __func__, nsegs, rq->nr_phys_segments, + extra_len, dma_drain); + check_bvec(rq); + } return nsegs; }