diff mbox series

[10/40] lustre: llite: check truncated page in ->readpage()

Message ID 1681042400-15491-11-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Headers show
Series lustre: backport OpenSFS changes from March XX, 2023 | expand

Commit Message

James Simmons April 9, 2023, 12:12 p.m. UTC
From: Qian Yingjin <qian@ddn.com>

The page end offset calculation in filemap_get_read_batch() was
off by one. This bug was introduced in commit v5.11-10234-gcbd59c48ae
("mm/filemap: use head pages in generic_file_buffered_read")

When a read is submitted with end offset 1048575, it calculates
the end page index for read of 256 where it should be 255. This
results in the readpage() call for the page with index 256 is over
stripe boundary and may not be covered by a DLM extent lock.

This happens in a corner race case: filemap_get_read_batch()
batches the page with index 256 for read, but later this page is
removed from page cache due to the lock protected it being revoked,
but has a reference count due to the batch.  This results in this
page in the read path is not covered by any DLM lock.

The solution is simple. We can check whether the page was
truncated and removed from page cache in ->readpage() by the
address_sapce pointer of the page. If it was truncated, return
AOP_TRUNCATED_PAGE to the upper caller.  This will cause the
kernel to retry to batch pages and the truncated page will not
be added as it was already removed from page cache of the file.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16412
Lustre-commit: 209afbe28b5f164bd ("LU-16412 llite: check truncated page in ->readpage()")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49433
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h |  6 ++++--
 fs/lustre/llite/rw.c            | 35 +++++++++++++++++++++++++++++++++++
 fs/lustre/llite/rw26.c          |  7 +++++++
 3 files changed, 46 insertions(+), 2 deletions(-)
diff mbox series

Patch

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index a2930c8..4ef5c61 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -458,8 +458,8 @@ 
 /* was	OBD_FAIL_LLOG_CATINFO_NET			0x1309 until 2.3 */
 #define OBD_FAIL_MDS_SYNC_CAPA_SL			0x1310
 #define OBD_FAIL_SEQ_ALLOC				0x1311
-#define OBD_FAIL_PLAIN_RECORDS			    0x1319
-#define OBD_FAIL_CATALOG_FULL_CHECK		    0x131a
+#define OBD_FAIL_PLAIN_RECORDS				0x1319
+#define OBD_FAIL_CATALOG_FULL_CHECK			0x131a
 
 #define OBD_FAIL_LLITE					0x1400
 #define OBD_FAIL_LLITE_FAULT_TRUNC_RACE			0x1401
@@ -488,6 +488,8 @@ 
 #define OBD_FAIL_LLITE_PAGE_ALLOC			0x1418
 #define OBD_FAIL_LLITE_OPEN_DELAY			0x1419
 #define OBD_FAIL_LLITE_XATTR_PAUSE			0x1420
+#define OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE		0x1421
+#define OBD_FAIL_LLITE_READPAGE_PAUSE			0x1422
 
 #define OBD_FAIL_FID_INDIR				0x1501
 #define OBD_FAIL_FID_INLMA				0x1502
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 0b14ea6..dea2af1 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1865,6 +1865,41 @@  int ll_readpage(struct file *file, struct page *vmpage)
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	int result;
 
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_READPAGE_PAUSE)) {
+		unlock_page(vmpage);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_READPAGE_PAUSE, cfs_fail_val);
+		lock_page(vmpage);
+	}
+
+	/*
+	 * The @vmpage got truncated.
+	 * This is a kernel bug introduced since kernel 5.12:
+	 * comment: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251
+	 * ("mm/filemap: use head pages in generic_file_buffered_read")
+	 *
+	 * The page end offset calculation in filemap_get_read_batch() was off
+	 * by one.  When a read is submitted with end offset 1048575, then it
+	 * calculates the end page for read of 256 where it should be 255. This
+	 * results in the readpage() for the page with index 256 is over stripe
+	 * boundary and may not covered by a DLM extent lock.
+	 *
+	 * This happens in a corner race case: filemap_get_read_batch() adds
+	 * the page with index 256 for read which is not in the current read
+	 * I/O context, and this page is being invalidated and will be removed
+	 * from page cache due to the lock protected it being revoken. This
+	 * results in this page in the read path not covered by any DLM lock.
+	 *
+	 * The solution is simple. Check whether the page was truncated in
+	 * ->readpage(). If so, just return AOP_TRUNCATED_PAGE to the upper
+	 * caller. Then the kernel will retry to batch pages, and it will not
+	 * add the truncated page into batches as it was removed from page
+	 * cache of the file.
+	 */
+	if (vmpage->mapping != inode->i_mapping) {
+		unlock_page(vmpage);
+		return AOP_TRUNCATED_PAGE;
+	}
+
 	lcc = ll_cl_find(inode);
 	if (lcc) {
 		env = lcc->lcc_env;
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index cadded4..6700717 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -96,6 +96,13 @@  static void ll_invalidatepage(struct page *vmpage, unsigned int offset,
 		}
 		cl_env_percpu_put(env);
 	}
+
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE)) {
+		unlock_page(vmpage);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE,
+				 cfs_fail_val);
+		lock_page(vmpage);
+	}
 }
 
 static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask)