afs: Re-enable freezing once a page fault is interrupted

Message ID 162387854886.1035841.15139736369962171742.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived

Commit Message

David Howells June 16, 2021, 9:22 p.m. UTC
From: Matthew Wilcox (Oracle) <willy@infradead.org>

If a task is killed during a page fault, it does not currently call
sb_end_pagefault(), which means that the filesystem cannot be frozen
at any time thereafter.  This may be reported by lockdep like this:

====================================
WARNING: fsstress/10757 still has locks held!
5.13.0-rc4-build4+ #91 Not tainted
------------------------------------
1 lock held by fsstress/10757:
 #0: ffff888104eac530 (sb_pagefaults

as filesystem freezing is modelled as a lock.

Fix this by removing all the direct returns from within the function,
and using 'ret' to indicate whether we were interrupted or successful.
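
The single-exit shape described above can be illustrated outside the kernel
with a minimal userspace sketch. Everything in it is a hypothetical stand-in:
struct fake_sb, the fake_*_pagefault() helpers, and the FAULT_* codes are
invented for illustration and only model sb_start_pagefault() /
sb_end_pagefault() as a counter. It assumes the handler takes the pagefault
protection before any of its killable waits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's return codes. */
enum fault_result { FAULT_RETRY, FAULT_LOCKED };

/* Hypothetical superblock: the counter models the sb_pagefaults "lock". */
struct fake_sb {
	int pagefault_holders;
};

static void fake_start_pagefault(struct fake_sb *sb)
{
	sb->pagefault_holders++;
}

static void fake_end_pagefault(struct fake_sb *sb)
{
	assert(sb->pagefault_holders > 0);
	sb->pagefault_holders--;
}

/* In this model, freezing can proceed only when nobody holds the protection. */
static bool fake_can_freeze(const struct fake_sb *sb)
{
	return sb->pagefault_holders == 0;
}

/* Buggy shape: the early return skips fake_end_pagefault(). */
static enum fault_result mkwrite_early_return(struct fake_sb *sb, bool interrupted)
{
	fake_start_pagefault(sb);
	if (interrupted)
		return FAULT_RETRY;	/* leaks the protection */
	fake_end_pagefault(sb);
	return FAULT_LOCKED;
}

/* Fixed shape: every path leaves through "out", so the release always runs. */
static enum fault_result mkwrite_single_exit(struct fake_sb *sb, bool interrupted)
{
	enum fault_result ret = FAULT_RETRY;

	fake_start_pagefault(sb);
	if (interrupted)
		goto out;

	ret = FAULT_LOCKED;
out:
	fake_end_pagefault(sb);
	return ret;
}
```

Calling mkwrite_early_return() with interrupted set leaves the counter at one,
so fake_can_freeze() stays false from then on; mkwrite_single_exit() returns
the same result code but leaves the counter at zero on both paths.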

Fixes: 1cf7a1518aef ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/
---

 fs/afs/write.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Linus Torvalds June 18, 2021, 8:52 p.m. UTC | #1
On Wed, Jun 16, 2021 at 2:22 PM David Howells <dhowells@redhat.com> wrote:
>
> If a task is killed during a page fault, it does not currently call
> sb_end_pagefault(), which means that the filesystem cannot be frozen
> at any time thereafter.  This may be reported by lockdep like this:

I've applied this patch.

Everything in me screams "the sb_start/end_pagefault() code is
completely broken", but in the meantime this patch fixes the immediate
bug.

I suspect that the whole sb_start/end_pagefault thing should just go
away entirely, and the freezer should be re-examined. Alternatively,
it should just be done by generic code, not by the filesystem.

But it is what it is.

           Linus

Patch

diff --git a/fs/afs/write.c b/fs/afs/write.c
index f722cb80a594..ff36800a7389 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -848,6 +848,7 @@  vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	struct inode *inode = file_inode(file);
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	unsigned long priv;
+	vm_fault_t ret = VM_FAULT_RETRY;
 
 	_enter("{{%llx:%llu}},{%lx}", vnode->fid.vid, vnode->fid.vnode, page->index);
 
@@ -859,14 +860,14 @@  vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 #ifdef CONFIG_AFS_FSCACHE
 	if (PageFsCache(page) &&
 	    wait_on_page_fscache_killable(page) < 0)
-		return VM_FAULT_RETRY;
+		goto out;
 #endif
 
 	if (wait_on_page_writeback_killable(page))
-		return VM_FAULT_RETRY;
+		goto out;
 
 	if (lock_page_killable(page) < 0)
-		return VM_FAULT_RETRY;
+		goto out;
 
 	/* We mustn't change page->private until writeback is complete as that
 	 * details the portion of the page we need to write back and we might
@@ -874,7 +875,7 @@  vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	 */
 	if (wait_on_page_writeback_killable(page) < 0) {
 		unlock_page(page);
-		return VM_FAULT_RETRY;
+		goto out;
 	}
 
 	priv = afs_page_dirty(page, 0, thp_size(page));
@@ -888,8 +889,10 @@  vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	}
 	file_update_time(file);
 
+	ret = VM_FAULT_LOCKED;
+out:
 	sb_end_pagefault(inode->i_sb);
-	return VM_FAULT_LOCKED;
+	return ret;
 }
 
 /*