From patchwork Thu Apr 11 01:45:45 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 10895011
Subject: [PATCH 3/8] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org
Date: Wed, 10 Apr 2019 18:45:45 -0700
Message-ID: <155494714539.1090518.9582107800209578968.stgit@magnolia>
In-Reply-To: <155494712442.1090518.2784809287026447547.stgit@magnolia>
References: <155494712442.1090518.2784809287026447547.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty
Sender: linux-xfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

If we know the filesystem metadata isn't healthy during unmount, we want
to encourage the administrator to run xfs_repair right away.  We can't
do this if BAD_SUMMARY will cause an unclean log unmount to force
summary recalculation, so turn it off if the fs is bad.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
---
 fs/xfs/libxfs/xfs_health.h |    2 +
 fs/xfs/xfs_health.c        |   74 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.c         |    2 +
 fs/xfs/xfs_trace.h         |    3 ++
 4 files changed, 81 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 30762a5d4862..a434b47f2aa0 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -110,6 +110,8 @@ void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
 void xfs_inode_measure_sickness(struct xfs_inode *ip, unsigned int *sick,
 		unsigned int *checked);
 
+void xfs_health_unmount(struct xfs_mount *mp);
+
 /* Now some helpers. */
 
 static inline bool
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 941f33037e2f..21728228e08b 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -19,6 +19,80 @@
 #include "xfs_trace.h"
 #include "xfs_health.h"
 
+/*
+ * Warn about metadata corruption that we detected but haven't fixed, and
+ * make sure we're not sitting on anything that would get in the way of
+ * recovery.
+ */
+void
+xfs_health_unmount(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	unsigned int		sick = 0;
+	unsigned int		checked = 0;
+	bool			warn = false;
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return;
+
+	/* Measure AG corruption levels. */
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		xfs_ag_measure_sickness(pag, &sick, &checked);
+		if (sick) {
+			trace_xfs_ag_unfixed_corruption(mp, agno, sick);
+			warn = true;
+		}
+		xfs_perag_put(pag);
+	}
+
+	/* Measure realtime volume corruption levels. */
+	xfs_rt_measure_sickness(mp, &sick, &checked);
+	if (sick) {
+		trace_xfs_rt_unfixed_corruption(mp, sick);
+		warn = true;
+	}
+
+	/*
+	 * Measure fs corruption and keep the sample around for the warning.
+	 * See the note below for why we exempt FS_COUNTERS.
+	 */
+	xfs_fs_measure_sickness(mp, &sick, &checked);
+	if (sick & ~XFS_SICK_FS_COUNTERS) {
+		trace_xfs_fs_unfixed_corruption(mp, sick);
+		warn = true;
+	}
+
+	if (warn) {
+		xfs_warn(mp,
+"Uncorrected metadata errors detected; please run xfs_repair.");
+
+		/*
+		 * We discovered uncorrected metadata problems at some point
+		 * during this filesystem mount and have advised the
+		 * administrator to run repair once the unmount completes.
+		 *
+		 * However, we must be careful -- when FSCOUNTERS are flagged
+		 * unhealthy, the unmount procedure omits writing the clean
+		 * unmount record to the log so that the next mount will run
+		 * recovery and recompute the summary counters.  In other
+		 * words, we leave a dirty log to get the counters fixed.
+		 *
+		 * Unfortunately, xfs_repair cannot recover dirty logs, so if
+		 * there were filesystem problems, FSCOUNTERS was flagged, and
+		 * the administrator takes our advice to run xfs_repair,
+		 * they'll have to zap the log before repairing structures.
+		 * We don't really want to encourage this, so we mark the
+		 * FSCOUNTERS healthy so that a subsequent repair run won't see
+		 * a dirty log.
+		 */
+		if (sick & XFS_SICK_FS_COUNTERS)
+			xfs_fs_mark_healthy(mp, XFS_SICK_FS_COUNTERS);
+	}
+}
+
 /* Mark unhealthy per-fs metadata. */
 void
 xfs_fs_mark_sick(
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 14f454e09e6e..eff8b4c3eb3e 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1070,6 +1070,7 @@ xfs_mountfs(
 	 */
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);
  out_log_dealloc:
 	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
@@ -1152,6 +1153,7 @@ xfs_unmountfs(
 	 */
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);
 
 	xfs_qm_unmount(mp);
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f079841c7af6..2464ea351f83 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
 	TP_ARGS(mp, flags))
 DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
 DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
+DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
 DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
 DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
+DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
 
 DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
@@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
 	TP_ARGS(mp, agno, flags))
 DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
 DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
+DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
 
 DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
 	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
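
Illustrative note, not part of the patch: the decision made by
xfs_health_unmount() can be modelled in plain userspace C as a rough
sketch.  Everything named here (SICK_FS_COUNTERS, SICK_FS_OTHER,
struct unmount_decision, model_health_unmount) is a hypothetical
stand-in, not a kernel symbol, and the forced-shutdown early return and
the tracepoints are left out.  The point is only that a counters-only
problem leaves the flag set (so the log stays dirty and the next mount
recomputes the counters), while any other sickness triggers the warning
and clears the counters flag so xfs_repair is not handed a dirty log.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's XFS_SICK_FS_* bits. */
#define SICK_FS_COUNTERS	(1u << 0)
#define SICK_FS_OTHER		(1u << 1)

struct unmount_decision {
	bool		warn;		/* advise the admin to run xfs_repair */
	unsigned int	fs_sick;	/* fs-level sick mask after adjustment */
};

/*
 * Model of the unmount-time health check: ag_sick and rt_sick say whether
 * any AG or the realtime volume reported unfixed corruption; fs_sick is
 * the fs-level sick mask.
 */
static struct unmount_decision
model_health_unmount(bool ag_sick, bool rt_sick, unsigned int fs_sick)
{
	struct unmount_decision d = { .warn = false, .fs_sick = fs_sick };

	if (ag_sick || rt_sick)
		d.warn = true;

	/* A counters-only problem does not trigger the warning by itself. */
	if (fs_sick & ~SICK_FS_COUNTERS)
		d.warn = true;

	/*
	 * If we are sending the admin to xfs_repair anyway, drop the
	 * counters flag so that the unmount can leave a clean log.
	 */
	if (d.warn && (fs_sick & SICK_FS_COUNTERS))
		d.fs_sick &= ~SICK_FS_COUNTERS;

	return d;
}
```

In this model, model_health_unmount(false, false, SICK_FS_COUNTERS)
neither warns nor clears the flag, whereas passing ag_sick = true with
the same mask warns and returns an fs_sick of 0.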