[14/14] ocfs2: initialize ip_next_orphan

Message ID	20201114065223.JN1eernhY%akpm@linux-foundation.org
State	New, archived
Series
[01/14] mm/compaction: count pages and stop correctly during page isolation
[02/14] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate
[03/14] mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
[04/14] mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov
[05/14] mm/slub: fix panic in slab_alloc_node()
[06/14] mm/gup: use unpin_user_pages() in __gup_longterm_locked()
[07/14] compiler.h: fix barrier_data() on clang
[08/14] Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
[09/14] reboot: fix overflow parsing reboot cpu number
[10/14] kernel/watchdog: fix watchdog_allowed_mask not used warning
[11/14] mm: memcontrol: fix missing wakeup polling thread
[12/14] hugetlbfs: fix anon huge page migration race
[13/14] panic: don't dump stack twice on warn
[14/14] ocfs2: initialize ip_next_orphan

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 23A552064B
Date: Fri, 13 Nov 2020 22:52:23 -0800
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, gechangwei@live.cn, ghe@suse.com,
 jlbec@evilplan.org, joseph.qi@linux.alibaba.com, junxiao.bi@oracle.com,
 linux-mm@kvack.org, mark@fasheh.com, mm-commits@vger.kernel.org,
 piaojun@huawei.com, stable@vger.kernel.org,
 torvalds@linux-foundation.org, wen.gang.wang@oracle.com
Subject: [patch 14/14] ocfs2: initialize ip_next_orphan
Message-ID: <20201114065223.JN1eernhY%akpm@linux-foundation.org>
In-Reply-To: <20201113225115.b24faebc85f710d5aff55aa7@linux-foundation.org>
User-Agent: s-nail v14.8.16
Sender: owner-linux-mm@kvack.org
Precedence: bulk


Commit Message

Andrew Morton Nov. 14, 2020, 6:52 a.m. UTC

From: Wengang Wang <wen.gang.wang@oracle.com>
Subject: ocfs2: initialize ip_next_orphan

Though the problem was found on an older 4.1.12 kernel, I think upstream
has the same issue.

On one node in the cluster, there is the following call trace:

# cat /proc/21473/stack
[<ffffffffc09a2f06>] __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
[<ffffffffc09a4481>] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
[<ffffffffc09b2ce2>] ocfs2_evict_inode+0x152/0x820 [ocfs2]
[<ffffffff8122b36e>] evict+0xae/0x1a0
[<ffffffff8122bd26>] iput+0x1c6/0x230
[<ffffffffc09b60ed>] ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
[<ffffffffc0992ae0>] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
[<ffffffffc099a1e9>] ocfs2_dir_foreach+0x29/0x30 [ocfs2]
[<ffffffffc09b7716>] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
[<ffffffffc09b9b4e>] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
[<ffffffff810a1399>] process_one_work+0x169/0x4a0
[<ffffffff810a1bcb>] worker_thread+0x5b/0x560
[<ffffffff810a7a2b>] kthread+0xcb/0xf0
[<ffffffff816f5d21>] ret_from_fork+0x61/0x90
[<ffffffffffffffff>] 0xffffffffffffffff

The above stack is not reasonable: the final iput shouldn't happen in the
ocfs2_orphan_filldir() function.  Looking at the code:

2067         /* Skip inodes which are already added to recover list, since dio may
2068          * happen concurrently with unlink/rename */
2069         if (OCFS2_I(iter)->ip_next_orphan) {
2070                 iput(iter);
2071                 return 0;
2072         }
2073

The logic assumes the inode is already in the recover list when it sees
that ip_next_orphan is non-NULL, so it skips this inode after dropping the
reference that was taken in ocfs2_iget().

However, if the inode were already in the recover list, it would hold
another reference, and the iput() at line 2070 would not be the final iput
(the one dropping the last reference).  So I don't think the inode is
really in the recover list (there is no vmcore to confirm this).

Note that ocfs2_queue_orphans(), though it does not show up in the call
trace, holds the cluster lock on the orphan directory while looking up
unlinked inodes.  The on-disk inode eviction can involve a lot of I/O,
which may take a long time to finish.  That means this node could hold the
cluster lock for a very long time, which can cause lock requests (from
other nodes) on the orphan directory to hang for a long time.

Looking further into ip_next_orphan, I found that it is not initialized
when a new ocfs2_inode_info structure is allocated.

This causes reflink operations from some nodes to hang for a very long
time while waiting for the cluster lock on the orphan directory.

Fix: initialize ip_next_orphan as NULL.

Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/super.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/ocfs2/super.c~ocfs2-initialize-ip_next_orphan
+++ a/fs/ocfs2/super.c
@@ -1713,6 +1713,7 @@  static void ocfs2_inode_init_once(void *
 
 	oi->ip_blkno = 0ULL;
 	oi->ip_clusters = 0;
+	oi->ip_next_orphan = NULL;
 
 	ocfs2_resv_init_once(&oi->ip_la_data_resv);
