@@ -1484,70 +1484,29 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
* and will write it out. This saves a lot of processing time.
*
* All writes here are within one page, so exclusion is handled by the page
- * lock on the vm page. Exception is appending, which requires locking the
- * full file to handle size issues. We do not do tiny writes for writes which
- * touch multiple pages because it's very unlikely multiple sequential pages
+ * lock on the vm page. We do not do tiny writes for writes which touch
+ * multiple pages because it's very unlikely multiple sequential pages
* are already dirty.
*
* We limit these to < PAGE_SIZE because PAGE_SIZE writes are relatively common
* and are unlikely to be to already dirty pages.
*
- * Attribute updates are important here, we do it in ll_tiny_write_end.
+ * Attribute updates are important here; we do them in ll_tiny_write_end.
*/
static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter)
{
ssize_t count = iov_iter_count(iter);
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
- struct ll_inode_info *lli = ll_i2info(inode);
- struct range_lock range;
ssize_t result = 0;
- bool append = false;
-
- /* NB: we can't do direct IO for tiny writes because they use the page
- * cache, and we can't do sync writes because tiny writes can't flush
- * pages.
- */
- if (file->f_flags & (O_DIRECT | O_SYNC))
- return 0;
- /* It is relatively unlikely we will overwrite a full dirty page, so
- * limit tiny writes to < PAGE_SIZE
+ /* Restrict writes to a single page and < PAGE_SIZE. See comment at top
+ * of function for why.
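+ *
+ * e.g. with a 4 KiB PAGE_SIZE, a 100-byte write at offset 4050 stays
+ * under PAGE_SIZE but still crosses into the next page, since
+ * (4050 & 4095) + 100 = 4150 > 4096, so it falls through to the
+ * normal write path.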
*/
- if (count >= PAGE_SIZE)
+ if (count >= PAGE_SIZE ||
+ (iocb->ki_pos & (PAGE_SIZE-1)) + count > PAGE_SIZE)
return 0;
- /* For append writes, we must take the range lock to protect size
- * and also move pos to current size before writing.
- */
- if (file->f_flags & O_APPEND) {
- struct lu_env *env;
- u16 refcheck;
-
- append = true;
- range_lock_init(&range, 0, LUSTRE_EOF);
- result = range_lock(&lli->lli_write_tree, &range);
- if (result)
- return result;
- env = cl_env_get(&refcheck);
- if (IS_ERR(env)) {
- result = PTR_ERR(env);
- goto out;
- }
- ll_merge_attr(env, inode);
- cl_env_put(env, &refcheck);
- iocb->ki_pos = i_size_read(inode);
- }
-
- /* Does this write touch multiple pages?
- *
- * This partly duplicates the PAGE_SIZE check above, but must come
- * after range locking for append writes because it depends on the
- * write position (ki_pos).
- */
- if ((iocb->ki_pos & (PAGE_SIZE-1)) + count > PAGE_SIZE)
- goto out;
-
result = __generic_file_write_iter(iocb, iter);
/* If the page is not already dirty, ll_tiny_write_begin returns
@@ -1562,10 +1521,6 @@ static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter)
set_bit(LLIF_DATA_MODIFIED, &ll_i2info(inode)->lli_flags);
}
-out:
- if (append)
- range_unlock(&lli->lli_write_tree, &range);
-
CDEBUG(D_VFSTRACE, "result: %zu, original count %zu\n", result, count);
return result;
@@ -1578,10 +1533,17 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct lu_env *env;
struct vvp_io_args *args;
- ssize_t rc_tiny, rc_normal;
+ ssize_t rc_tiny = 0, rc_normal;
u16 refcheck;
- rc_tiny = ll_do_tiny_write(iocb, from);
+ /* NB: we can't do direct IO for tiny writes because they use the page
+ * cache, we can't do sync writes because tiny writes can't flush
+ * pages, and we can't do append writes because we can't guarantee the
+ * required DLM locks are held to protect file size.
+ */
+ if (ll_sbi_has_tiny_write(ll_i2sbi(file_inode(iocb->ki_filp))) &&
+ !(iocb->ki_filp->f_flags & (O_DIRECT | O_SYNC | O_APPEND)))
+ rc_tiny = ll_do_tiny_write(iocb, from);
/* In case of error, go on and try normal write - Only stop if tiny
* write completed I/O.
@@ -419,6 +419,7 @@ enum stats_track_type {
#define LL_SBI_FILE_SECCTX 0x800000 /* set file security context at
* create
*/
+#define LL_SBI_TINY_WRITE 0x2000000 /* tiny write support */
#define LL_SBI_FLAGS { \
"nolck", \
@@ -445,6 +446,8 @@ enum stats_track_type {
"always_ping", \
"fast_read", \
"file_secctx", \
+ "pio", \
+ "tiny_write", \
}
/*
@@ -705,6 +708,11 @@ static inline bool ll_sbi_has_fast_read(struct ll_sb_info *sbi)
return !!(sbi->ll_flags & LL_SBI_FAST_READ);
}
+static inline bool ll_sbi_has_tiny_write(struct ll_sb_info *sbi)
+{
+ return !!(sbi->ll_flags & LL_SBI_TINY_WRITE);
+}
+
void ll_ras_enter(struct file *f);
/* llite/lcommon_misc.c */
@@ -125,6 +125,7 @@ static struct ll_sb_info *ll_init_sbi(void)
atomic_set(&sbi->ll_agl_total, 0);
sbi->ll_flags |= LL_SBI_AGL_ENABLED;
sbi->ll_flags |= LL_SBI_FAST_READ;
+ sbi->ll_flags |= LL_SBI_TINY_WRITE;
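+ /* tiny writes are enabled by default and can be disabled at runtime
+ * via the tiny_write sysfs attribute.
+ */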
/* root squash */
sbi->ll_squash.rsi_uid = 0;
@@ -1024,6 +1024,41 @@ static ssize_t xattr_cache_store(struct kobject *kobj,
}
LUSTRE_RW_ATTR(xattr_cache);
+static ssize_t tiny_write_show(struct kobject *kobj,
+ struct attribute *attr,
+ char *buf)
+{
+ struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+ ll_kset.kobj);
+
+ return sprintf(buf, "%u\n", !!(sbi->ll_flags & LL_SBI_TINY_WRITE));
+}
+
+static ssize_t tiny_write_store(struct kobject *kobj,
+ struct attribute *attr,
+ const char *buffer,
+ size_t count)
+{
+ struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+ ll_kset.kobj);
+ bool val;
+ int rc;
+
+ rc = kstrtobool(buffer, &val);
+ if (rc)
+ return rc;
+
+ spin_lock(&sbi->ll_lock);
+ if (val)
+ sbi->ll_flags |= LL_SBI_TINY_WRITE;
+ else
+ sbi->ll_flags &= ~LL_SBI_TINY_WRITE;
+ spin_unlock(&sbi->ll_lock);
+
+ return count;
+}
+LUSTRE_RW_ATTR(tiny_write);
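+/* Usage sketch (the exact sysfs path is an assumption and depends on the
+ * client mount instance):
+ *   echo 0 > /sys/fs/lustre/llite/<fsname>-<instance>/tiny_write
+ *   cat /sys/fs/lustre/llite/<fsname>-<instance>/tiny_write
+ */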
+
static ssize_t fast_read_show(struct kobject *kobj,
struct attribute *attr,
char *buf)
@@ -1225,6 +1260,7 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file,
&lustre_attr_default_easize.attr,
&lustre_attr_xattr_cache.attr,
&lustre_attr_fast_read.attr,
+ &lustre_attr_tiny_write.attr,
NULL,
};