From patchwork Fri Jun 19 18:52:51 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 6647131 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id A5CC99F399 for ; Fri, 19 Jun 2015 18:56:39 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 875C220967 for ; Fri, 19 Jun 2015 18:56:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 43AAF20960 for ; Fri, 19 Jun 2015 18:56:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755269AbbFSSyH (ORCPT ); Fri, 19 Jun 2015 14:54:07 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:15031 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754234AbbFSSyE (ORCPT ); Fri, 19 Jun 2015 14:54:04 -0400 Received: from pps.filterd (m0004077 [127.0.0.1]) by mx0b-00082601.pphosted.com (8.14.5/8.14.5) with SMTP id t5JIra1F008533; Fri, 19 Jun 2015 11:53:41 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=WceEgxOQ/FZ9qYXSt1Fa/BcGZR4VHUWqjYPfttwpRWs=; b=KWioh1bnQxhr5DPq1Nj6W7TKvTsT7vZpY65noPMge3psreyVauzlQUrXDQhfgVxJiYl5 DkQavRd86r487RbIbrW6GT/K2+tJgu+GVRdMBFezmZPAxxH8+8wttdWO8xLYiDnhTGbb fuS4rXvpQnFezrQm80ohbXaLQ2/PHQxdC8E= Received: from mail.thefacebook.com ([199.201.64.23]) by mx0b-00082601.pphosted.com with ESMTP id 1v4rfb01u0-9 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Fri, 19 Jun 2015 11:53:41 -0700 Received: from huxley.thefacebook.com (192.168.52.123) by mail.thefacebook.com (192.168.16.22) with Microsoft SMTP Server (TLS) id 14.3.195.1; Fri, 19 Jun 2015 11:53:16 -0700 From: Omar Sandoval To: CC: Miao Xie , Zhao Lei , wangyf , Philip , Omar Sandoval Subject: [PATCH v2 4/5] Btrfs: fix device replace of a missing RAID 5/6 device Date: Fri, 19 Jun 2015 11:52:51 -0700 Message-ID: <95c1a5d41185c3c263f85228535353e0bc21e755.1434739053.git.osandov@fb.com> X-Mailer: git-send-email 2.4.4 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [192.168.52.123] X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.14.151, 1.0.33, 0.0.0000 definitions=2015-06-19_06:2015-06-18, 2015-06-19, 1970-01-01 signatures=0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The original implementation of device replace on RAID 5/6 seems to have missed support for replacing a missing device. When this is attempted, we end up calling bio_add_page() on a bio with a NULL ->bi_bdev, which crashes when we try to dereference it. This happens because btrfs_map_block() has no choice but to return us the missing device because RAID 5/6 don't have any alternate mirrors to read from, and a missing device has a NULL bdev. The idea implemented here is to handle the missing device case separately, which better only happen when we're replacing a missing RAID 5/6 device. We use the new BTRFS_RBIO_REBUILD_MISSING operation to reconstruct the data from parity, check it with scrub_recheck_block_checksum(), and write it out with scrub_write_block_to_dev_replace(). Reported-by: Philip Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96141 Signed-off-by: Omar Sandoval --- fs/btrfs/scrub.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 147 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index b94694d59de5..b75f1e9c6adc 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -125,6 +125,7 @@ struct scrub_block { /* It is for the data with checksum */ unsigned int data_corrected:1; }; + struct btrfs_work work; }; /* Used for the chunks with parity stripe such RAID5/6 */ @@ -2164,6 +2165,134 @@ again: return 0; } +static void scrub_missing_raid56_end_io(struct bio *bio, int error) +{ + struct scrub_block *sblock = bio->bi_private; + struct btrfs_fs_info *fs_info = sblock->sctx->dev_root->fs_info; + + if (error) + sblock->no_io_error_seen = 0; + + btrfs_queue_work(fs_info->scrub_workers, &sblock->work); +} + +static void scrub_missing_raid56_worker(struct btrfs_work *work) +{ + struct scrub_block *sblock = container_of(work, struct scrub_block, work); + struct scrub_ctx *sctx = sblock->sctx; + struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info; + unsigned int is_metadata; + unsigned int have_csum; + u8 *csum; + u64 generation; + u64 logical; + struct btrfs_device *dev; + + is_metadata = !(sblock->pagev[0]->flags & BTRFS_EXTENT_FLAG_DATA); + have_csum = sblock->pagev[0]->have_csum; + csum = sblock->pagev[0]->csum; + generation = sblock->pagev[0]->generation; + logical = sblock->pagev[0]->logical; + dev = sblock->pagev[0]->dev; + + if (sblock->no_io_error_seen) { + scrub_recheck_block_checksum(fs_info, sblock, is_metadata, + have_csum, csum, generation, + sctx->csum_size); + } + + if (!sblock->no_io_error_seen) { + spin_lock(&sctx->stat_lock); + sctx->stat.read_errors++; + spin_unlock(&sctx->stat_lock); + printk_ratelimited_in_rcu(KERN_ERR + "BTRFS: I/O error rebulding logical %llu for dev %s\n", + logical, rcu_str_deref(dev->name)); + } else if (sblock->header_error || sblock->checksum_error) { + spin_lock(&sctx->stat_lock); + sctx->stat.uncorrectable_errors++; + spin_unlock(&sctx->stat_lock); + printk_ratelimited_in_rcu(KERN_ERR + "BTRFS: failed to rebuild valid logical %llu for dev %s\n", + logical, rcu_str_deref(dev->name)); + } else { + scrub_write_block_to_dev_replace(sblock); + } + + scrub_block_put(sblock); + + if (sctx->is_dev_replace && + atomic_read(&sctx->wr_ctx.flush_all_writes)) { + mutex_lock(&sctx->wr_ctx.wr_lock); + scrub_wr_submit(sctx); + mutex_unlock(&sctx->wr_ctx.wr_lock); + } + + scrub_pending_bio_dec(sctx); +} + +static void scrub_missing_raid56_pages(struct scrub_block *sblock) +{ + struct scrub_ctx *sctx = sblock->sctx; + struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info; + u64 length = sblock->page_count * PAGE_SIZE; + u64 logical = sblock->pagev[0]->logical; + struct btrfs_bio *bbio; + struct bio *bio; + struct btrfs_raid_bio *rbio; + int ret; + int i; + + ret = btrfs_map_sblock(fs_info, REQ_GET_READ_MIRRORS, logical, &length, + &bbio, 0, 1); + if (ret || !bbio || !bbio->raid_map) + goto bbio_out; + + if (WARN_ON(!sctx->is_dev_replace || + !(bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK))) { + /* + * We shouldn't be scrubbing a missing device. Even for dev + * replace, we should only get here for RAID 5/6. We either + * managed to mount something with no mirrors remaining or + * there's a bug in scrub_remap_extent()/btrfs_map_block(). + */ + goto bbio_out; + } + + bio = btrfs_io_bio_alloc(GFP_NOFS, 0); + if (!bio) + goto bbio_out; + + bio->bi_iter.bi_sector = logical >> 9; + bio->bi_private = sblock; + bio->bi_end_io = scrub_missing_raid56_end_io; + + rbio = raid56_alloc_missing_rbio(sctx->dev_root, bio, bbio, length); + if (!rbio) + goto rbio_out; + + for (i = 0; i < sblock->page_count; i++) { + struct scrub_page *spage = sblock->pagev[i]; + + raid56_add_scrub_pages(rbio, spage->page, spage->logical); + } + + btrfs_init_work(&sblock->work, btrfs_scrub_helper, + scrub_missing_raid56_worker, NULL, NULL); + scrub_block_get(sblock); + scrub_pending_bio_inc(sctx); + raid56_submit_missing_rbio(rbio); + return; + +rbio_out: + bio_put(bio); +bbio_out: + btrfs_put_bbio(bbio); + spin_lock(&sctx->stat_lock); + sctx->stat.malloc_errors++; + spin_unlock(&sctx->stat_lock); +} + static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len, u64 physical, struct btrfs_device *dev, u64 flags, u64 gen, int mirror_num, u8 *csum, int force, @@ -2227,19 +2356,27 @@ leave_nomem: } WARN_ON(sblock->page_count == 0); - for (index = 0; index < sblock->page_count; index++) { - struct scrub_page *spage = sblock->pagev[index]; - int ret; + if (dev->missing) { + /* + * This case should only be hit for RAID 5/6 device replace. See + * the comment in scrub_missing_raid56_pages() for details. + */ + scrub_missing_raid56_pages(sblock); + } else { + for (index = 0; index < sblock->page_count; index++) { + struct scrub_page *spage = sblock->pagev[index]; + int ret; - ret = scrub_add_page_to_rd_bio(sctx, spage); - if (ret) { - scrub_block_put(sblock); - return ret; + ret = scrub_add_page_to_rd_bio(sctx, spage); + if (ret) { + scrub_block_put(sblock); + return ret; + } } - } - if (force) - scrub_submit(sctx); + if (force) + scrub_submit(sctx); + } /* last one frees, either here or in bio completion for last page */ scrub_block_put(sblock);