From patchwork Fri Sep 23 21:05:04 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Liu Bo X-Patchwork-Id: 9348879 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 262056077A for ; Fri, 23 Sep 2016 21:00:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0C6DA2ADED for ; Fri, 23 Sep 2016 21:00:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 00FDD2ADEF; Fri, 23 Sep 2016 20:59:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75A012ADED for ; Fri, 23 Sep 2016 20:59:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760918AbcIWU7v (ORCPT ); Fri, 23 Sep 2016 16:59:51 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:48935 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753106AbcIWU7u (ORCPT ); Fri, 23 Sep 2016 16:59:50 -0400 Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u8NKxjYw010000 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 23 Sep 2016 20:59:46 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserv0021.oracle.com (8.13.8/8.13.8) with ESMTP id u8NKxj45024656 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 23 Sep 2016 20:59:45 GMT Received: from abhmp0005.oracle.com (abhmp0005.oracle.com [141.146.116.11]) by aserv0121.oracle.com (8.13.8/8.13.8) with ESMTP id u8NKxfkF020309; Fri, 23 Sep 2016 20:59:44 GMT Received: from localhost.us.oracle.com (/10.211.47.181) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 23 Sep 2016 13:59:41 -0700 From: Liu Bo To: linux-btrfs@vger.kernel.org Cc: David Sterba , Chris Mason , Josef Bacik Subject: [PATCH v2] Btrfs: kill BUG_ON in do_relocation Date: Fri, 23 Sep 2016 14:05:04 -0700 Message-Id: <1474664704-2612-1-git-send-email-bo.li.liu@oracle.com> X-Mailer: git-send-email 2.5.5 In-Reply-To: <1473870467-18721-1-git-send-email-bo.li.liu@oracle.com> References: <1473870467-18721-1-git-send-email-bo.li.liu@oracle.com> X-Source-IP: aserv0021.oracle.com [141.146.126.233] Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP While updating btree, we try to push items between sibling nodes/leaves in order to keep height as low as possible. But we don't memset the original places with zero when pushing items so that we could end up leaving stale content in nodes/leaves. One may read the above stale content by increasing btree blocks' @nritems. One case I've come across is that in fs tree, a leaf has two parent nodes, hence running balance ends up with processing this leaf with two parent nodes, but it can only reach the valid parent node through btrfs_search_slot, so it'd be like, do_relocation for P in all parent nodes of block A: if !P->eb: btrfs_search_slot(key); --> get path from P to A. if lowest: BUG_ON(A->bytenr != bytenr of A recorded in P); btrfs_cow_block(P, A); --> change A's bytenr in P. After btrfs_cow_block, P has the new bytenr of A, but with the same @key, we get the same path again, and get panic by BUG_ON. Note that this is only happening in a corrupted fs, for a regular fs in which we have correct @nritems so that we won't read stale content in any case. Reviewed-by: Josef Bacik Signed-off-by: Liu Bo --- v2: - use new internal error EFSCORRUPTED as "Filesystem is corrupted", suggested by David Sterba. fs/btrfs/ctree.h | 2 ++ fs/btrfs/relocation.c | 9 ++++++++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 67d71c0..8b23410 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -127,6 +127,8 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes) #define BTRFS_OLD_BACKREF_REV 0 #define BTRFS_MIXED_BACKREF_REV 1 +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ + /* * every tree block (leaf or node) starts with this header. */ diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index c0c13dc..6f8b952 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2712,7 +2712,14 @@ static int do_relocation(struct btrfs_trans_handle *trans, bytenr = btrfs_node_blockptr(upper->eb, slot); if (lowest) { - BUG_ON(bytenr != node->bytenr); + if (bytenr != node->bytenr) { + btrfs_err(root->fs_info, + "lowest leaf/node mismatch: bytenr %llu node->bytenr %llu slot %d upper %llu", + bytenr, node->bytenr, slot, + upper->eb->start); + err = -EFSCORRUPTED; + goto next; + } } else { if (node->eb->start == bytenr) goto next;