From patchwork Thu Oct 29 21:04:55 2020
X-Patchwork-Submitter: Wengang Wang
X-Patchwork-Id: 11867537
From: Wengang Wang
To: ocfs2-devel@oss.oracle.com
Date: Thu, 29 Oct 2020 14:04:55 -0700
Message-Id: <20201029210455.15587-1-wen.gang.wang@oracle.com>
X-Mailer: git-send-email 2.21.0
Subject: [Ocfs2-devel] [PATCH] ocfs2: initialize ip_next_orphan

Though the problem was found on an older 4.1.12 kernel, I think upstream
has the same issue.

On one node in the cluster, there is the following call trace:

# cat /proc/21473/stack
[] __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
[] ocfs2_evict_inode+0x152/0x820 [ocfs2]
[] evict+0xae/0x1a0
[] iput+0x1c6/0x230
[] ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
[] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
[] ocfs2_dir_foreach+0x29/0x30 [ocfs2]
[] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
[] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
[] process_one_work+0x169/0x4a0
[] worker_thread+0x5b/0x560
[] kthread+0xcb/0xf0
[] ret_from_fork+0x61/0x90
[] 0xffffffffffffffff

The above stack is not reasonable; the final iput() shouldn't happen in
ocfs2_orphan_filldir(). Looking at the code:

2067         /* Skip inodes which are already added to recover list, since dio may
2068          * happen concurrently with unlink/rename */
2069         if (OCFS2_I(iter)->ip_next_orphan) {
2070                 iput(iter);
2071                 return 0;
2072         }
2073

The logic takes a non-NULL ip_next_orphan to mean the inode is already in
the recovery list, so it skips the inode after dropping the reference
taken in ocfs2_iget(). However, if the inode really were in the recovery
list, it would hold another reference, and the iput() at line 2070 would
not be the final iput() (the one dropping the last reference). So I don't
think the inode is really in the recovery list (there is no vmcore to
confirm that).

Note that ocfs2_queue_orphans(), though it does not show up in the call
trace, holds the cluster lock on the orphan directory while looking up
unlinked inodes. Evicting the on-disk inode can involve a lot of I/O,
which may take a long time to finish. That means this node could hold the
cluster lock for a very long time, which can leave lock requests (from
other nodes) on the orphan directory hanging for a long time.

Looking further at ip_next_orphan, I found it is not initialized when a
new ocfs2_inode_info structure is allocated.

Fix: initialize ip_next_orphan to NULL.
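For context, ocfs2_inode_init_once() is the constructor passed to
kmem_cache_create(), and a slab constructor runs only when an object is
first created, not on every allocation; any field the constructor misses
is therefore left as whatever happened to be in memory. A minimal sketch
of the pattern (example_obj and the cache name are made up for
illustration, not the actual ocfs2 code):

#include <linux/init.h>
#include <linux/slab.h>

/* Hypothetical object with a list pointer analogous to ip_next_orphan. */
struct example_obj {
	struct example_obj *next;
};

static struct kmem_cache *example_cachep;

/* Runs once per slab object, like ocfs2_inode_init_once(): every field
 * that later code tests must get a known value here, otherwise it stays
 * uninitialized memory. */
static void example_init_once(void *data)
{
	struct example_obj *obj = data;

	obj->next = NULL;
}

static int __init example_cache_init(void)
{
	example_cachep = kmem_cache_create("example_cache",
					   sizeof(struct example_obj), 0,
					   SLAB_HWCACHE_ALIGN,
					   example_init_once);
	return example_cachep ? 0 : -ENOMEM;
}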
Signed-off-by: Wengang Wang
---
 fs/ocfs2/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 1d91dd1e8711..6f0e07584a15 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1724,6 +1724,8 @@ static void ocfs2_inode_init_once(void *data)
 				  &ocfs2_inode_caching_ops);
 
 	inode_init_once(&oi->vfs_inode);
+
+	oi->ip_next_orphan = NULL;
 }
 
 static int ocfs2_initialize_mem_caches(void)
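Not part of the patch, just to illustrate the failure mode: a small
user-space toy (all names here are made up) showing how a pointer field
missed by an init-once style constructor can read as non-NULL garbage,
which is what makes the skip test in ocfs2_orphan_filldir() fire
spuriously:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct fake_inode_info {
	unsigned long ip_blkno;	/* initialized by the ctor */
	void *ip_next_orphan;	/* missed by the ctor (the bug) */
};

/* Stand-in for ocfs2_inode_init_once() before the fix: it sets up some
 * fields but never touches ip_next_orphan. */
static void init_once(struct fake_inode_info *oi)
{
	oi->ip_blkno = 0;
	/* oi->ip_next_orphan = NULL;   <- the line this patch adds */
}

int main(void)
{
	struct fake_inode_info *oi = malloc(sizeof(*oi));

	if (!oi)
		return 1;
	/* Fresh memory may hold stale non-zero bytes; force that here so
	 * the demo is deterministic. */
	memset(oi, 0xa5, sizeof(*oi));

	init_once(oi);

	/* The check ocfs2_orphan_filldir() performs: with the field
	 * uninitialized, the inode wrongly looks "already queued". */
	if (oi->ip_next_orphan)
		printf("skipped: ip_next_orphan = %p (garbage)\n",
		       oi->ip_next_orphan);
	else
		printf("queued for recovery\n");

	free(oi);
	return 0;
}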