From patchwork Tue Apr 16 00:19:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10901753 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ABA38186E for ; Tue, 16 Apr 2019 00:20:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 89B822852A for ; Tue, 16 Apr 2019 00:20:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7B64A286B0; Tue, 16 Apr 2019 00:20:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A17502852A for ; Tue, 16 Apr 2019 00:20:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727202AbfDPAUB (ORCPT ); Mon, 15 Apr 2019 20:20:01 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:42322 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727938AbfDPAUB (ORCPT ); Mon, 15 Apr 2019 20:20:01 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3G0JcHV057425; Tue, 16 Apr 2019 00:19:58 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=psMPPb9MklS3lItOudxAeyvVX1NAkZjfBb9DSvOdFSA=; b=yiQO67q/UUHfB0yh+SYnUe9doC2AG84GtmLLUleoF62i9PT+N5VN2rkJKIb+lITtUmKi 8qY7hSlMJiucesCc18kn7H9+/AkPipEU7F5yAZuQv6MpMEIRG8iLk3h8UTpNSt+2E0mv sqRfMDfvsHdT8TqFwpKzkSh0/m6IRqVKYscPi0AYq+a6uDpjQUhQBH/FkmHJkJhBW4ST E6bGYUM15kg6UdjnazdmNTfM3AarOMWZD0oWNgTBNtFDQV8jUj3QE2x8+R1WD42bbJj2 vPhaetW7znI4PuZYWYHlanDPZa1P7myLs66uDXyTTFKITUiHmVSCjiptiEl3SBXM9ruy Jg== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 2rusneqm93-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 16 Apr 2019 00:19:58 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3G0Ife2129034; Tue, 16 Apr 2019 00:19:57 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 2rubq619wb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 16 Apr 2019 00:19:57 +0000 Received: from abhmp0010.oracle.com (abhmp0010.oracle.com [141.146.116.16]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x3G0JvZS021018; Tue, 16 Apr 2019 00:19:57 GMT Received: from localhost (/10.159.133.168) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 15 Apr 2019 17:19:56 -0700 Subject: [PATCH 4/5] xfs: scrub/repair should update filesystem metadata health From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, Brian Foster Date: Mon, 15 Apr 2019 17:19:55 -0700 Message-ID: <155537399567.27935.2695652869908662243.stgit@magnolia> In-Reply-To: <155537397092.27935.16073573221774618735.stgit@magnolia> References: <155537397092.27935.16073573221774618735.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9228 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904150160 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9228 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904160000 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong Now that we have the ability to track sick metadata in-core, make scrub and repair update those health assessments after doing work. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/health.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/health.h | 12 +++ fs/xfs/scrub/scrub.c | 4 + fs/xfs/scrub/scrub.h | 7 ++ 5 files changed, 200 insertions(+) create mode 100644 fs/xfs/scrub/health.c create mode 100644 fs/xfs/scrub/health.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 786379c143f4..b20964e26a22 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -143,6 +143,7 @@ xfs-y += $(addprefix scrub/, \ common.o \ dabtree.o \ dir.o \ + health.o \ ialloc.o \ inode.o \ parent.o \ diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c new file mode 100644 index 000000000000..770ab0723a38 --- /dev/null +++ b/fs/xfs/scrub/health.c @@ -0,0 +1,176 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2019 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_inode.h" +#include "xfs_health.h" +#include "scrub/scrub.h" +#include "scrub/health.h" + +/* + * Scrub and In-Core Filesystem Health Assessments + * =============================================== + * + * Online scrub and repair have the time and the ability to perform stronger + * checks than we can do from the metadata verifiers, because they can + * cross-reference records between data structures. Therefore, scrub is in a + * good position to update the online filesystem health assessments to reflect + * the good/bad state of the data structure. + * + * We therefore extend scrub in the following ways to achieve this: + * + * 1. Create a "sick_mask_update" field in the scrub context. When we're + * setting up a scrub call, set this to the default XFS_SICK_* flag(s) for the + * selected scrub type (call it A). Scrub and repair functions can override + * the default sick_mask_update value if they choose. + * + * 2. If the scrubber returns a runtime error code, we exit making no changes + * to the incore sick state. + * + * 3. If the scrubber finds that A is clean, use sick_mask_update to clear the + * incore sick flags before exiting. + * + * 4. If the scrubber finds that A is corrupt, use sick_mask_update to set the + * incore sick flags. If the user didn't want to repair then we exit, leaving + * the metadata structure unfixed and the sick flag set. + * + * 5. Now we know that A is corrupt and the user wants to repair, so run the + * repairer. If the repairer returns an error code, we exit with that error + * code, having made no further changes to the incore sick state. + * + * 6. If repair rebuilds A correctly and the subsequent re-scrub of A is + * clean, use sick_mask_update to clear the incore sick flags. This should + * have the effect that A is no longer marked sick. + * + * 7. If repair rebuilds A incorrectly, the re-scrub will find it corrupt and + * use sick_mask_update to set the incore sick flags. This should have no + * externally visible effect since we already set them in step (4). + * + * There are some complications to this story, however. For certain types of + * complementary metadata indices (e.g. inobt/finobt), it is easier to rebuild + * both structures at the same time. The following principles apply to this + * type of repair strategy: + * + * 8. Any repair function that rebuilds multiple structures should update + * sick_mask_visible to reflect whatever other structures are rebuilt, and + * verify that all the rebuilt structures can pass a scrub check. The + * outcomes of 5-7 still apply, but with a sick_mask_update that covers + * everything being rebuilt. + */ + +/* Map our scrub type to a sick mask and a set of health update functions. */ + +enum xchk_health_group { + XHG_FS = 1, + XHG_RT, + XHG_AG, + XHG_INO, +}; + +struct xchk_health_map { + enum xchk_health_group group; + unsigned int sick_mask; +}; + +static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { + [XFS_SCRUB_TYPE_SB] = { XHG_AG, XFS_SICK_AG_SB }, + [XFS_SCRUB_TYPE_AGF] = { XHG_AG, XFS_SICK_AG_AGF }, + [XFS_SCRUB_TYPE_AGFL] = { XHG_AG, XFS_SICK_AG_AGFL }, + [XFS_SCRUB_TYPE_AGI] = { XHG_AG, XFS_SICK_AG_AGI }, + [XFS_SCRUB_TYPE_BNOBT] = { XHG_AG, XFS_SICK_AG_BNOBT }, + [XFS_SCRUB_TYPE_CNTBT] = { XHG_AG, XFS_SICK_AG_CNTBT }, + [XFS_SCRUB_TYPE_INOBT] = { XHG_AG, XFS_SICK_AG_INOBT }, + [XFS_SCRUB_TYPE_FINOBT] = { XHG_AG, XFS_SICK_AG_FINOBT }, + [XFS_SCRUB_TYPE_RMAPBT] = { XHG_AG, XFS_SICK_AG_RMAPBT }, + [XFS_SCRUB_TYPE_REFCNTBT] = { XHG_AG, XFS_SICK_AG_REFCNTBT }, + [XFS_SCRUB_TYPE_INODE] = { XHG_INO, XFS_SICK_INO_CORE }, + [XFS_SCRUB_TYPE_BMBTD] = { XHG_INO, XFS_SICK_INO_BMBTD }, + [XFS_SCRUB_TYPE_BMBTA] = { XHG_INO, XFS_SICK_INO_BMBTA }, + [XFS_SCRUB_TYPE_BMBTC] = { XHG_INO, XFS_SICK_INO_BMBTC }, + [XFS_SCRUB_TYPE_DIR] = { XHG_INO, XFS_SICK_INO_DIR }, + [XFS_SCRUB_TYPE_XATTR] = { XHG_INO, XFS_SICK_INO_XATTR }, + [XFS_SCRUB_TYPE_SYMLINK] = { XHG_INO, XFS_SICK_INO_SYMLINK }, + [XFS_SCRUB_TYPE_PARENT] = { XHG_INO, XFS_SICK_INO_PARENT }, + [XFS_SCRUB_TYPE_RTBITMAP] = { XHG_RT, XFS_SICK_RT_BITMAP }, + [XFS_SCRUB_TYPE_RTSUM] = { XHG_RT, XFS_SICK_RT_SUMMARY }, + [XFS_SCRUB_TYPE_UQUOTA] = { XHG_FS, XFS_SICK_FS_UQUOTA }, + [XFS_SCRUB_TYPE_GQUOTA] = { XHG_FS, XFS_SICK_FS_GQUOTA }, + [XFS_SCRUB_TYPE_PQUOTA] = { XHG_FS, XFS_SICK_FS_PQUOTA }, +}; + +/* Return the health status mask for this scrub type. */ +unsigned int +xchk_health_mask_for_scrub_type( + __u32 scrub_type) +{ + return type_to_health_flag[scrub_type].sick_mask; +} + +/* + * Update filesystem health assessments based on what we found and did. + * + * If the scrubber finds errors, we mark sick whatever's mentioned in + * sick_mask_update, no matter whether this is a first scan or an + * evaluation of repair effectiveness. + * + * Otherwise, no direct corruption was found, so mark whatever's in + * sick_mask_update as healthy. + */ +void +xchk_update_health( + struct xfs_scrub *sc) +{ + struct xfs_perag *pag; + bool bad; + + if (!sc->sick_mask_update) + return; + + bad = (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT); + switch (type_to_health_flag[sc->sm->sm_type].group) { + case XHG_AG: + pag = xfs_perag_get(sc->mp, sc->sm->sm_agno); + if (bad) + xfs_ag_mark_sick(pag, sc->sick_mask_update); + else + xfs_ag_mark_healthy(pag, sc->sick_mask_update); + xfs_perag_put(pag); + break; + case XHG_INO: + if (!sc->ip) + return; + if (bad) + xfs_inode_mark_sick(sc->ip, sc->sick_mask_update); + else + xfs_inode_mark_healthy(sc->ip, sc->sick_mask_update); + break; + case XHG_FS: + if (bad) + xfs_fs_mark_sick(sc->mp, sc->sick_mask_update); + else + xfs_fs_mark_healthy(sc->mp, sc->sick_mask_update); + break; + case XHG_RT: + if (bad) + xfs_rt_mark_sick(sc->mp, sc->sick_mask_update); + else + xfs_rt_mark_healthy(sc->mp, sc->sick_mask_update); + break; + default: + ASSERT(0); + break; + } +} diff --git a/fs/xfs/scrub/health.h b/fs/xfs/scrub/health.h new file mode 100644 index 000000000000..fd0d466c8658 --- /dev/null +++ b/fs/xfs/scrub/health.h @@ -0,0 +1,12 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2019 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_SCRUB_HEALTH_H__ +#define __XFS_SCRUB_HEALTH_H__ + +unsigned int xchk_health_mask_for_scrub_type(__u32 scrub_type); +void xchk_update_health(struct xfs_scrub *sc); + +#endif /* __XFS_SCRUB_HEALTH_H__ */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 02d278b7d20b..01d5bfc1917c 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -40,6 +40,7 @@ #include "scrub/trace.h" #include "scrub/btree.h" #include "scrub/repair.h" +#include "scrub/health.h" /* * Online Scrub and Repair @@ -498,6 +499,7 @@ xfs_scrub_metadata( xchk_experimental_warning(mp); sc.ops = &meta_scrub_ops[sm->sm_type]; + sc.sick_mask_update = xchk_health_mask_for_scrub_type(sm->sm_type); retry_op: /* Set up for the operation. */ error = sc.ops->setup(&sc, ip); @@ -520,6 +522,8 @@ xfs_scrub_metadata( } else if (error) goto out_teardown; + xchk_update_health(&sc); + if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !(sc.flags & XREP_ALREADY_FIXED)) { bool needs_fix; diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 1b23bf141438..b1d632f1c5ff 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -66,6 +66,13 @@ struct xfs_scrub { /* See the XCHK/XREP state flags below. */ unsigned int flags; + /* + * The XFS_SICK_* flags that correspond to the metadata being scrubbed + * or repaired. We will use this mask to update the in-core fs health + * status with whatever we find. + */ + unsigned int sick_mask_update; + /* State tracking for single-AG operations. */ struct xchk_ag sa; };