From patchwork Tue Aug 23 13:49:45 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 9295765
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 98775607D0
	for ; Tue, 23 Aug 2016 14:42:49 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8916028C61
	for ; Tue, 23 Aug 2016 14:42:49 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 7DAA928C6A; Tue, 23 Aug 2016 14:42:49 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DA17928C61
	for ; Tue, 23 Aug 2016 14:42:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753386AbcHWOms (ORCPT ); Tue, 23 Aug 2016 10:42:48 -0400
Received: from mail-pa0-f67.google.com ([209.85.220.67]:36113 "EHLO
	mail-pa0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752554AbcHWOmr (ORCPT );
	Tue, 23 Aug 2016 10:42:47 -0400
Received: by mail-pa0-f67.google.com with SMTP id ez1so10037677pab.3;
	Tue, 23 Aug 2016 07:39:19 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:from:to:cc:subject:date:message-id;
	bh=FeGb6azOPmIecc8j0uZYO7kieQTv38rlMx/+m/iQxgg=;
	b=WXew703xiuHwEOGJ5WULJ7vcZOWGRJv7I90dSnAYW6/6MBp06OarVtYE7L3tNOkgy2
	JEQpV34mx4B1J40S5MuJnzx2JZNqguQZX8yg7DB6VQcxvMYVvWsxPBm9ksJlRn682eRM
	UhhV8tWLVs9rBkdoqJ8uAMs36ZtaFUDTxGpqmUURet9IxGOnHQ/a2vh/nAfjCzryuVHE
	HmVu6LyDt8d+PEfjTl+mIDzy20G43WtxorgT8UjjWB0GNd0oiTQURPy8Whbh0B09dheW
	teOsv3SZjLMpl6wNsj31Lljfhe3RDrIc/JhHxO041DexFNf9VENK+OYH8g34pFUR8Vsu
	v3fw==
X-Gm-Message-State: AEkoousDjOvf1fXWX5WA9R3NpQFlYHEsJ8R917KWNv4sKwRM17POOxdk7biodVP4kbicNg==
X-Received: by 10.66.246.134 with SMTP id xw6mr53005058pac.35.1471960233080;
	Tue, 23 Aug 2016 06:50:33 -0700 (PDT)
Received: from localhost (56.34.213.162.lcy-01.canonistack.canonical.com. [162.213.34.56])
	by smtp.gmail.com with ESMTPSA id h1sm6059564pay.48.2016.08.23.06.50.31
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Tue, 23 Aug 2016 06:50:31 -0700 (PDT)
From: Ming Lei
To: Jens Axboe, linux-kernel@vger.kernel.org
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Kent Overstreet,
	Eric Wheeler, Sebastian Roesner, Ming Lei,
	stable@vger.kernel.org (4.3+), Shaohua Li, Jens Axboe
Subject: [PATCH v4] block: make sure big bio is splitted into at most 256 bvecs
Date: Tue, 23 Aug 2016 21:49:45 +0800
Message-Id: <1471960185-14044-1-git-send-email-ming.lei@canonical.com>
X-Mailer: git-send-email 2.7.4
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Now that arbitrary bio sizes are supported, an incoming bio may be
very big. We have to split such a bio into smaller bios so that each
holds at most BIO_MAX_PAGES bvecs, which keeps users such as
bio_clone() safe.

This patch fixes the following kernel crash:

> [ 172.660142] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> [ 172.660229] IP: [] bio_trim+0xf/0x2a
> [ 172.660289] PGD 7faf3e067 PUD 7f9279067 PMD 0
> [ 172.660399] Oops: 0000 [#1] SMP
> [...]
> [ 172.664780] Call Trace:
> [ 172.664813] [] ? raid1_make_request+0x2e8/0xad7 [raid1]
> [ 172.664846] [] ? blk_queue_split+0x377/0x3d4
> [ 172.664880] [] ? md_make_request+0xf6/0x1e9 [md_mod]
> [ 172.664912] [] ? generic_make_request+0xb5/0x155
> [ 172.664947] [] ? prio_io+0x85/0x95 [bcache]
> [ 172.664981] [] ? register_cache_set+0x355/0x8d0 [bcache]
> [ 172.665016] [] ? register_bcache+0x1006/0x1174 [bcache]

The issue can be reproduced with the following steps:
	- create one raid1 over two virtio-blk devices
	- build a bcache device over the above raid1 and another cache
	  device, with the bucket size set to 2 MB
	- set the cache mode to writeback
	- run random writes over ext4 on the bcache device

Fixes: 54efd50 ("block: make generic_make_request handle arbitrarily sized bios")
Reported-by: Sebastian Roesner
Reported-by: Eric Wheeler
Cc: stable@vger.kernel.org (4.3+)
Cc: Shaohua Li
Acked-by: Kent Overstreet
Signed-off-by: Ming Lei
---
V4:
	- to keep the change simple, don't consider merging a bio that
	  was split because it reached the max bvec limit, as requested
	  by Christoph and Kent
V3:
	- rebase against v4.8-rc1, since .bi_rw of bio was renamed to
	  .bi_opf
V2:
	- don't mark the bio as REQ_NOMERGE when it is split because it
	  reached the bvec count limit
V1:
	- Kent pointed out that using the max io size can't cover the
	  case of non-full bvecs/pages

 block/blk-merge.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3eec75a..f6ae884 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -94,9 +94,31 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	bool do_split = true;
 	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
+	unsigned bvecs = 0;
 
 	bio_for_each_segment(bv, bio, iter) {
 		/*
+		 * With arbitrary bio size, the incoming bio may be very
+		 * big. We have to split the bio into small bios so that
+		 * each holds at most BIO_MAX_PAGES bvecs because
+		 * bio_clone() can fail to allocate big bvecs.
+		 *
+		 * It should have been better to apply the limit per
+		 * request queue in which bio_clone() is involved,
+		 * instead of globally. The biggest blocker is the
+		 * bio_clone() in bio bounce.
+		 *
+		 * If bio is splitted by this reason, we should have
+		 * allowed to continue bios merging, but don't do
+		 * that now for making the change simple.
+		 *
+		 * TODO: deal with bio bounce's bio_clone() gracefully
+		 * and convert the global limit into per-queue limit.
+		 */
+		if (bvecs++ >= BIO_MAX_PAGES)
+			goto split;
+
+		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
 		 */
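
For reviewers who want to check the counting by hand, here is a minimal
userspace sketch of the new check. It is not kernel code. The names
SKETCH_BIO_MAX_PAGES, segments_before_split(), pieces_after_splitting()
and bio_sectors_sketch() are hypothetical, the value 256 is taken from
the subject line, and the sketch assumes the remainder of a split bio is
run through the same check again.

```c
#include <assert.h>

/* Assumption: 256 is the BIO_MAX_PAGES value named in the subject line. */
#define SKETCH_BIO_MAX_PAGES 256u

/*
 * Hypothetical model of the loop in blk_bio_segment_split(): walk the
 * segments of one bio and return how many of them end up in front of
 * the split point. The test mirrors
 * "if (bvecs++ >= BIO_MAX_PAGES) goto split;".
 */
unsigned segments_before_split(unsigned total_segments)
{
	unsigned bvecs = 0;
	unsigned taken = 0;

	for (unsigned i = 0; i < total_segments; i++) {
		if (bvecs++ >= SKETCH_BIO_MAX_PAGES)
			break;	/* the patch does "goto split" here */
		taken++;
	}
	return taken;
}

/*
 * Number of bios produced if the remainder is fed through the same
 * check until nothing is left (an assumption of this sketch).
 */
unsigned pieces_after_splitting(unsigned total_segments)
{
	unsigned pieces = 0;

	while (total_segments > 0) {
		unsigned taken = segments_before_split(total_segments);

		/* every piece is non-empty and within the bvec limit */
		assert(taken > 0 && taken <= SKETCH_BIO_MAX_PAGES);
		total_segments -= taken;
		pieces++;
	}
	return pieces;
}

/* 512-byte sectors covered by a bio with equally sized segments. */
unsigned long bio_sectors_sketch(unsigned segments,
				 unsigned bytes_per_segment)
{
	return ((unsigned long)segments * bytes_per_segment) / 512;
}
```

With these helpers, a bio of 1000 segments comes out as four pieces
(256 + 256 + 256 + 232). The same bio built from 512-byte segments
covers only 1000 sectors, which illustrates the V1 remark: a limit on
io size alone does not bound the bvec count when bvecs are not full
pages.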