new file mode 100644
@@ -0,0 +1,304 @@
+Ladvise Lock Ahead design
+
+Lock ahead is a new Lustre feature aimed at solving a long standing problem
+with shared file write performance in Lustre. It requires client and server
+support. It will be used primarily via the MPI-I/O library, not directly from
+user applications.
+
+The first part of this document (sections 1 and 2) is an overview of the
+problem and high level description of the solution. Section 3 explains how the
+library will make use of this feature, and sections 4 and 5 describe the design
+of the Lustre changes.
+
+1. Overview: Purpose & Interface
+Lock ahead is intended to allow optimization of certain I/O patterns which
+would otherwise suffer LDLM* lock contention. It allows applications to
+manually request locks on specific extents of a file, avoiding the usual
+server side optimizations. This allows applications which know their I/O
+pattern to use that information to avoid false conflicts due to server side
+optimizations.
+
+*The Lustre distributed lock manager: the locking layer shared between
+clients and servers, used to manage concurrent access between clients.
+
+Normally, clients get locks automatically as the first step of an I/O.
+The client asks for a lock which covers exactly the area of interest (i.e., a
+read or write lock of n bytes at offset x), but the server attempts to optimize
+this by expanding the lock to cover as much of the file as possible. This is
+useful for a single client, but can cause trouble for multiple clients.
+
+In cases where multiple clients wish to write to the same file, this
+optimization can result in locks that conflict when the actual I/O operations
+do not. This requires clients to wait for one another to complete I/O, even
+when there is no conflict between actual I/O requests. This can significantly
+reduce performance (anywhere from 40% to 90%, depending on system specs) for
+some workloads.
+
+The lockahead feature makes it possible to avoid this problem by acquiring the
+necessary locks in advance, by explicit requests with server side extent
+changes disabled. We add a new ladvise advice type, LU_LADVISE_LOCKAHEAD,
+which allows lock requests from userspace on the client, specifying the extent
+and the I/O mode (read/write) for the lock. These lock requests explicitly
+disable server side changes to the lock extent, so the lock returned to the
+client covers only the extent requested.
+
+When using this feature, clients which intend to write to a file can request
+locks to cover their I/O pattern, wait a moment for the locks to be granted,
+then write or read the file.
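+
+As a concrete sketch (assuming the llapi_ladvise() wrapper from
+liblustreapi and the llapi_lu_ladvise field names added by this patch;
+error handling omitted), a single write lock request might look like:
+
+    #include <lustre/lustreapi.h>
+
+    /* Request a lockahead write lock on bytes [start, end] of a file. */
+    int request_write_lock(int fd, __u64 start, __u64 end)
+    {
+            struct llapi_lu_ladvise advice = { 0 };
+
+            advice.lla_advice = LU_LADVISE_LOCKAHEAD;
+            advice.lla_lockahead_mode = MODE_WRITE_USER;
+            advice.lla_peradvice_flags = LF_ASYNC; /* async, non-blocking */
+            advice.lla_start = start;
+            advice.lla_end = end;
+
+            if (llapi_ladvise(fd, 0, 1, &advice) < 0)
+                    return -1;
+
+            /* For async requests, lla_lockahead_result reports whether a
+             * matching lock already existed (LLA_RESULT_SAME/DIFFERENT).
+             */
+            return advice.lla_lockahead_result;
+    }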
+
+In this way, a set of clients which know their I/O pattern in advance can
+force the LDLM layer to grant locks appropriate for that I/O pattern. This
+allows applications which are poorly handled by the default lock optimization
+behavior to significantly improve their performance.
+
+2. I/O Pattern & Locking problems
+2. A. Strided writing and MPI-I/O
+There is a thorough explanation and overview of strided writing and the
+benefits of this functionality in the slides from the lock ahead presentation
+at LUG 2015. It is highly recommended to read that first, as the graphics are
+much clearer than the prose here.
+
+See slides 1-13:
+http://wiki.lustre.org/images/f/f9/Shared-File-Performance-in-Lustre_Farrell.pdf
+
+MPI-I/O uses strided writing when doing I/O from a large job to a single file.
+I/O is aggregated from all the nodes running a particular application to a
+small number of I/O aggregator nodes which then write out the data, in a
+strided manner.
+
+In strided writing, different clients take turns writing different blocks of a
+file (A block is some arbitrary number of bytes). Client 1 is responsible for
+writes to block 0, block 2, block 4, etc., client 2 is responsible for block 1,
+block 3, etc.
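+
+For reference, the block-to-client mapping just described is simple
+arithmetic (illustrative helpers, not part of any Lustre API):
+
+    /* Block b of the file is written by client (b % nclients) + 1,
+     * at byte offset b * block_size.
+     */
+    static inline int block_owner(__u64 block, int nclients)
+    {
+            return (int)(block % nclients) + 1;
+    }
+
+    static inline __u64 block_offset(__u64 block, __u64 block_size)
+    {
+            return block * block_size;
+    }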
+
+Without the ability to manually request locks, strided writing is set up in
+concert with Lustre file striping so each client writes to one OST. (For
+example, for a file striped across three OSTs, we would write from three
+clients.)
+
+The particular case of interest is when we want to use more than one client
+per OST. This is important, because an OST typically has much more bandwidth
+than one client. Strided writes are non-overlapping, so they should be able to
+proceed in parallel with more than one client per OST. In practice, on Lustre,
+they do not, due to lock expansion.
+
+2. B. Locking problems
+We will now describe locking when there is more than one client per OST. This
+behavior is the same on a per OST basis in a file striped across multiple OSTs.
+When the first client asks to write block 0, it asks for the required lock from
+the server. When it receives this request, the server sees that there are no
+other locks on the file. Since it assumes the client will want to write to the
+file again, the server expands the lock as far as possible. In this case, it
+expands the lock to the maximum file size (effectively, to infinity), then
+grants it to client 1.
+
+When client 2 wants to write block 1, it conflicts with the expanded lock
+granted to client 1. The server then must revoke (in Lustre terms,
+'call back') the lock granted to client 1 so it can grant a lock to client 2.
+After the lock granted to client 1 is revoked, there are no locks on the file.
+The server sees this when processing the lock request from client 2, and
+expands that lock to cover the whole file.
+
+Client 1 then wishes to write block 2 of the file... And the cycle continues.
+The two clients exchange the expanded lock throughout the write, allowing
+only one client to write at a time, plus the latency to exchange the lock.
+The effect is dramatic: Two clients are actually slower than one. (Similar
+behavior is seen with more than two clients.)
+
+The solution is to use this new advice type to acquire locks before they are
+needed. In effect, before it starts writing to the file, client 1 requests
+locks on block 0, block 2, etc. It requests a certain (tunable) number of
+locks 'ahead' of its I/O. Client 2 does the same. Then they both begin to
+write, and are able to do so in parallel. A description of the actual
+library implementation follows.
+
+3. Library implementation
+Actually implementing this in the library carries a number of wrinkles.
+The basic pattern is this:
+Before writing, an I/O aggregator requests a certain number of locks on blocks
+that it is responsible for. It may or may not ever write to these blocks, but
+it takes locks knowing it might. It then begins to write, tracking how many of
+the locks it has used. When the number of locks 'ahead' of the I/O is low
+enough, it requests more locks in advance of the I/O.
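+
+In outline, an aggregator's write loop might look like the following
+sketch (have_data_to_write(), request_lockahead_on_block() and
+write_block() are hypothetical helpers, not library functions):
+
+    /* Keep 'window' lock requests outstanding ahead of the write
+     * position, stepping by nclients to touch only our strided blocks.
+     */
+    __u64 next_lock = my_first_block;
+    __u64 next_write = my_first_block;
+
+    while (have_data_to_write()) {
+            while (next_lock < next_write + window * nclients) {
+                    request_lockahead_on_block(fd, next_lock);
+                    next_lock += nclients;
+            }
+            write_block(fd, next_write);
+            next_write += nclients;
+    }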
+
+For technical reasons which are explained in the implementation section, these
+lock requests are either asynchronous and non-blocking or synchronous and
+blocking. In Lustre terms, non-blocking means if there is already a lock on
+the relevant extent of the file, the manual lock request is not granted. This
+means that if there is already a lock on the file (quite common; imagine
+writing to a file which was previously read by another process), these lock
+requests will be denied. However, once the first 'real' write arrives that
+was hoping to use a lockahead lock, that write will cause the blocking lock to
+be cancelled, so this interference is not fatal.
+
+It is of course possible for another process to get in the way by immediately
+asking for a lock on the file. This is something users should try to avoid.
+When writing out a file, repeatedly trying to read it will impact performance
+even without this feature.
+
+These interfering locks can also happen if a manually requested lock is, for
+some reason, not available in time for the write which intended to use it.
+The lock which results from this write request is expanded using the
+normal rules. So it's possible for that lock (depending on the position of
+other locks at the time) to be extended to cover the rest of the file. That
+will block future lockahead locks.
+
+The expanded lock will be revoked when a write happens (from another client)
+in the range covered by that lock, but the lock for that write will be expanded
+as well - And then we return to handing the lock back and forth between
+clients. These expanded locks will still block future lockahead locks,
+rendering them useless.
+
+The way to avoid this is to turn off lock expansion for I/Os which are
+supposed to be using these manually requested locks. That way, if the
+manually requested lock is not available, the lock request for the I/O will not
+be expanded. Instead, that request (which is blocking, unlike a lockahead
+request) will cancel any interfering locks, but the resulting lock will not be
+expanded. This leaves the later parts of the file open, allowing future
+manual lock requests to succeed. This means that if an interfering lock blocks
+some manual requests, those are lost, but the next set of manual requests can
+proceed as normal.
+
+In effect, the 'locking ahead of I/O' is interrupted, but then is able to
+re-assert itself. The feature used here is referred to as 'no expansion'
+locking (as only the extent required by the actual I/O operation is locked)
+and is turned on with another new ladvise advice, LU_LADVISE_LOCKNOEXPAND.
+This feature is added as part of the lockahead patch. The strided writing
+library will use this advice on the file descriptor it uses for writing.
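+
+Setting or clearing the flag is a single advice with no extent; a minimal
+sketch (again assuming llapi_ladvise(); per this patch, LF_UNSET in the
+per-advice flags clears the setting):
+
+    struct llapi_lu_ladvise advice = { 0 };
+
+    advice.lla_advice = LU_LADVISE_LOCKNOEXPAND;
+    advice.lla_peradvice_flags = 0; /* 0 = set, LF_UNSET = clear */
+    rc = llapi_ladvise(fd, 0, 1, &advice);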
+
+4. Client side design
+4. A. Ladvise lockahead
+Lockahead uses the existing asynchronous lock request functionality
+implemented for asynchronous glimpse locks (AGLs), a long standing Lustre
+feature. AGLs are locks requested by statahead to get file size information
+before it is needed. The key thing about an
+asynchronous lock request is that it does not have a specific I/O operation
+waiting for the lock.
+
+This means two key things:
+
+1. There is no OSC lock (lock layer above LDLM for data locking) associated
+with the LDLM lock
+2. There is no thread waiting for the LDLM lock, so lock grant processing
+must be handled by the ptlrpc daemon thread which received the reply
+
+Since both of these issues are addressed by the asynchronous lock request code
+which lockahead shares with AGL, we will not explore them in depth here.
+
+Finally, lockahead requests set the CEF_LOCK_NO_EXPAND flag, which tells the
+OSC (the per OST layer of the client) to set LDLM_FL_NO_EXPANSION on any lock
+requests. LDLM_FL_NO_EXPANSION is a new LDLM lock flag which tells the server
+not to expand the lock extent.
+
+This leaves the user facing interface. Lockahead is implemented as a new
+ladvise advice, and it uses the ladvise feature of multiple advices in one API
+call to put many lock requests into an array of advices.
+
+The arguments required for this advice are a mode (read or write), range (start
+and end), and flags.
+
+The client will then make lock requests on these extents, one at a time.
+Because the lock requests are asynchronous (replies are handled by ptlrpcd),
+many requests can be made quickly by overlapping them, rather than waiting for
+each one to complete. (This requires that they be non-blocking, as the
+ptlrpcd threads must not wait in the ldlm layer.)
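+
+A batched request might be assembled as in this sketch (NUM_LOCKS,
+BLOCK_SIZE and my_block() are illustrative placeholders):
+
+    struct llapi_lu_ladvise advices[NUM_LOCKS] = { { 0 } };
+    int i;
+
+    for (i = 0; i < NUM_LOCKS; i++) {
+            advices[i].lla_advice = LU_LADVISE_LOCKAHEAD;
+            advices[i].lla_lockahead_mode = MODE_WRITE_USER;
+            advices[i].lla_peradvice_flags = LF_ASYNC;
+            advices[i].lla_start = my_block(i) * BLOCK_SIZE;
+            advices[i].lla_end = advices[i].lla_start + BLOCK_SIZE;
+    }
+    rc = llapi_ladvise(fd, 0, NUM_LOCKS, advices);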
+
+4. B. LU_LADVISE_LOCKNOEXPAND
+The lock no expand ladvise advice sets a boolean in a Lustre data structure
+associated with a file descriptor. When an I/O is done to this file
+descriptor, the flag is picked up and passed through to the ldlm layer, where
+it sets LDLM_FL_NO_EXPANSION on lock requests made for that I/O.
+
+5. Server side changes
+Implementing lockahead requires server support for LDLM_FL_NO_EXPANSION, but
+it also requires an additional pair of server side changes to fix issues which
+came up because of lockahead. These changes are not part of the core design;
+instead, they are separate fixes which are required for it to work.
+
+5. A. Support LDLM_FL_NO_EXPANSION
+
+Server side lock expansion is disabled with a new LDLM flag,
+LDLM_FL_NO_EXPANSION. The server simply checks for this flag before
+attempting to expand the lock; if the flag is set, lock expansion is skipped.
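+
+The check is roughly the following sketch (server code is not part of
+this client patch; the upstream change lives in the extent policy code in
+ldlm_extent.c):
+
+    static void ldlm_extent_policy(struct ldlm_resource *res,
+                                   struct ldlm_lock *lock, __u64 *flags)
+    {
+            /* Grant exactly the requested extent for no-expansion locks. */
+            if (ldlm_is_do_not_expand(lock))
+                    return;
+
+            /* ... normal behavior: expand the extent as far as is
+             * compatible with other granted locks on this resource ...
+             */
+    }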
+
+5. B. Implement LDLM_FL_SPECULATIVE
+
+As described above, lockahead locks are non-blocking. The existing
+BLOCK_NOWAIT LDLM flag implements some non-blocking behavior, but it
+considers only group locks blocking. For asynchronous lock requests to work
+correctly, however, they cannot wait for any other locks. For this purpose,
+we add LDLM_FL_SPECULATIVE. This new flag is used for asynchronous lock
+requests, and implements the broader non-blocking behavior they require.
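+
+Conceptually, conflict handling differs as in this pseudocode sketch
+('conflict' stands for a conflicting lock found during the extent
+compatibility scan):
+
+    if (conflict) {
+            /* Speculative requests fail on any conflict, so a ptlrpcd
+             * thread is never left waiting in the LDLM layer.
+             */
+            if (ldlm_is_speculative(req))
+                    return -EAGAIN;
+            /* BLOCK_NOWAIT only treats group locks as blocking. */
+            if (req->l_flags & LDLM_FL_BLOCK_NOWAIT &&
+                conflict->l_req_mode == LCK_GROUP)
+                    return -EWOULDBLOCK;
+            /* Otherwise the request waits for the conflict to go away. */
+    }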
+
+5. C. File size & ofd_intent_policy changes
+
+Knowing the current file size during writes is tricky on a distributed file
+system, because multiple clients can be writing to a file at any time. When
+writes are in progress, the server must identify which client is currently
+responsible for growing the file size, and ask that client what the file size
+is.
+
+To do this, the server uses glimpse locking (in ofd_intent_policy) to get the
+current file size from the clients. This code uses the assumption that the
+holder of the highest write lock (PW lock) knows the current file size. A
+client learns the (then current) file size when a lock is granted. Because
+only the holder of the highest lock can grow a file, either the size hasn't
+changed, or that client knows the new size; so the server only has to contact
+the client which holds this lock to learn the current file size.
+
+Note that the above is actually racy. When the server asks, the client can
+still be writing, or another client could acquire a higher lock during this
+time. The goal is a good approximation while the file is being written, and a
+correct answer once all the clients are done writing. This is achieved because
+once writes to a file are complete, the holder of that highest lock is
+guaranteed to know the current file size. This is where manually requested
+locks cause trouble.
+
+By creating write locks in advance of an actual I/O, lockahead breaks the
+assumption that the holder of the highest lock knows the file size.
+
+This assumption is normally true because locks which are created as part of
+I/O - rather than in advance of it - are guaranteed to be 'active', i.e.,
+involved in I/O, and the holder of the highest 'active' lock always knows the
+current file size, because the size is either not changing or the holder of
+that lock is responsible for updating it.
+
+Consider: Two clients, A and B, strided writing. Each client requests, for
+example, two lockahead locks. (Real numbers are much higher.) Client A
+holds locks on segments 0 and 2, client B holds locks on segments 1 and 3.
+
+The request comes to write 3 segments of data. Client A writes to segment 0,
+client B writes to segment 1, and client A also writes to segment 2. No data
+is written to segment 3. At this point, the server checks the file size by
+glimpsing the highest lock: the lock on segment 3. Client B does not know
+about the writing done by client A to segment 2, so it gives an incorrect file
+size.
+
+This would be OK if client B had pending writes to segment 3, but it does not.
+In this situation, the server will never get the correct file size while this
+lock exists.
+
+The solution is relatively straightforward: The server needs to glimpse every
+client holding a write lock (starting from the top) until we find one holding
+an 'active' lock (because the size is known to be at least the size returned
+from an 'active' lock), and take the largest size returned. This avoids asking
+only a client which may not know the correct file size.
+
+Unfortunately, there is no way to know if a manually requested lock is active
+from the server side. So when we see such a lock, we must send a glimpse to
+the holder (unless we have already sent a glimpse to that client*). However,
+because locks without LDLM_FL_NO_EXPANSION set are guaranteed to be 'active',
+once we reach the first such lock, we can stop glimpsing.
+
+*This is because when we glimpse a specific lock, the client holding it returns
+its best idea of the size information, so we only need to send one glimpse to
+each client.
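+
+In rough outline, the glimpse selection in ofd_intent_policy becomes the
+loop sketched below (client_already_glimpsed() and queue_glimpse_work()
+stand in for the export tracking and glimpse work list in the real code):
+
+    /* Walk granted write locks from the top of the file downward. */
+    list_for_each_entry(lock, &res->lr_granted, l_res_link) {
+            if (lock->l_granted_mode != LCK_PW)
+                    continue;
+            if (client_already_glimpsed(lock->l_export))
+                    continue; /* one glimpse per client is enough */
+            queue_glimpse_work(&gl_list, lock);
+            if (!ldlm_is_do_not_expand(lock))
+                    break; /* 'active' lock found; it bounds the size */
+    }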
+
+This is less efficient than the standard "glimpse only the top lock"
+methodology, but since we only need to glimpse one lock per client (and the
+number of clients writing to the part of a file on a given OST is fairly
+limited), the cost is contained.
+
+Additionally, lock cancellation methods such as early lock cancel aggressively
+clean up older locks, particularly when the LRU limit is exceeded, so the
+total lock count should also remain manageable.
+
+In the end, the final verdict here is performance. Lockahead testing for the
+strided I/O case has shown good performance results.
@@ -1597,10 +1597,14 @@ enum cl_enq_flags {
*/
CEF_NONBLOCK = 0x00000001,
/**
- * take lock asynchronously (out of order), as it cannot
- * deadlock. This is for LDLM_FL_HAS_INTENT locks used for glimpsing.
+ * Tell lower layers this is a glimpse request, translated to
+ * LDLM_FL_HAS_INTENT at LDLM layer.
+ *
+ * Also, because glimpse locks never block other locks, we count this
+ * as automatically compatible with other osc locks.
+ * (see osc_lock_compatible)
*/
- CEF_ASYNC = 0x00000002,
+ CEF_GLIMPSE = 0x00000002,
/**
* tell the server to instruct (though a flag in the blocking ast) an
* owner of the conflicting lock, that it can drop dirty pages
@@ -1609,8 +1613,9 @@ enum cl_enq_flags {
CEF_DISCARD_DATA = 0x00000004,
/**
* tell the sub layers that it must be a `real' lock. This is used for
- * mmapped-buffer locks and glimpse locks that must be never converted
- * into lockless mode.
+ * mmapped-buffer locks, glimpse locks, and manually requested locks
+ * (LU_LADVISE_LOCKAHEAD) that must never be converted into lockless
+ * mode.
*
* \see vvp_mmap_locks(), cl_glimpse_lock().
*/
@@ -1627,9 +1632,16 @@ enum cl_enq_flags {
*/
CEF_NEVER = 0x00000010,
/**
- * for async glimpse lock.
+ * tell the dlm layer this is a speculative lock request.
+ * Speculative lock requests are locks which are not requested as part
+ * of an I/O operation. Instead, they are requested because we expect
+ * to use them in the future. They are requested asynchronously at the
+ * ptlrpc layer.
+ *
+ * Currently used for asynchronous glimpse locks and manually requested
+ * locks (LU_LADVISE_LOCKAHEAD).
*/
- CEF_AGL = 0x00000020,
+ CEF_SPECULATIVE = 0x00000020,
/**
* enqueue a lock to test DLM lock existence.
*/
@@ -1640,9 +1652,13 @@ enum cl_enq_flags {
*/
CEF_LOCK_MATCH = BIT(7),
/**
+ * tell the DLM layer to lock only the requested range
+ */
+ CEF_LOCK_NO_EXPAND = BIT(8),
+ /**
* mask of enq_flags.
*/
- CEF_MASK = 0x000000ff,
+ CEF_MASK = 0x000001ff,
};
/**
@@ -1849,7 +1865,9 @@ struct cl_io {
/**
* O_NOATIME
*/
- ci_noatime:1;
+ ci_noatime:1,
+ /* Tell sublayers not to expand LDLM locks requested for this IO */
+ ci_lock_no_expand:1;
/**
* Number of pages owned by this IO. For invariant checking.
*/
@@ -508,8 +508,8 @@ struct ldlm_glimpse_work {
*/
};
-/** The ldlm_glimpse_work is allocated on the stack and should not be freed. */
-#define LDLM_GL_WORK_NOFREE 0x1
+/* The ldlm_glimpse_work was slab allocated & must be freed accordingly.*/
+#define LDLM_GL_WORK_SLAB_ALLOCATED 0x1
/**
* Interval tree for extent locks.
@@ -62,6 +62,15 @@
#define ldlm_set_block_wait(_l) LDLM_SET_FLAG((_l), 1ULL << 3)
#define ldlm_clear_block_wait(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 3)
+/**
+ * Lock request is speculative/asynchronous, and cannot wait for any reason.
+ * Fail the lock request if any blocking locks are encountered.
+ */
+#define LDLM_FL_SPECULATIVE 0x0000000000000010ULL /* bit 4 */
+#define ldlm_is_speculative(_l) LDLM_TEST_FLAG((_l), 1ULL << 4)
+#define ldlm_set_speculative(_l) LDLM_SET_FLAG((_l), 1ULL << 4)
+#define ldlm_clear_speculative(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 4)
+
/** blocking or cancel packet was queued for sending. */
#define LDLM_FL_AST_SENT 0x0000000000000020ULL /* bit 5 */
#define ldlm_is_ast_sent(_l) LDLM_TEST_FLAG((_l), 1ULL << 5)
@@ -137,6 +146,25 @@
#define ldlm_clear_cancel_on_block(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 23)
/**
+ * Part of original lockahead implementation, OBD_CONNECT_LOCKAHEAD_OLD.
+ * Reserved temporarily to allow those implementations to keep working.
+ * Will be removed after 2.12 release.
+ */
+#define LDLM_FL_LOCKAHEAD_OLD_RESERVED 0x0000000010000000ULL /* bit 28 */
+#define ldlm_is_do_not_expand_io(_l) LDLM_TEST_FLAG((_l), 1ULL << 28)
+#define ldlm_set_do_not_expand_io(_l) LDLM_SET_FLAG((_l), 1ULL << 28)
+#define ldlm_clear_do_not_expand_io(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 28)
+
+/**
+ * Do not expand this lock. Grant it only on the extent requested.
+ * Used for manually requested locks from the client (LU_LADVISE_LOCKAHEAD).
+ */
+#define LDLM_FL_NO_EXPANSION 0x0000000020000000ULL /* bit 29 */
+#define ldlm_is_do_not_expand(_l) LDLM_TEST_FLAG((_l), 1ULL << 29)
+#define ldlm_set_do_not_expand(_l) LDLM_SET_FLAG((_l), 1ULL << 29)
+#define ldlm_clear_do_not_expand(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 29)
+
+/**
* measure lock contention and return -EUSERS if locking contention is high
*/
#define LDLM_FL_DENY_ON_CONTENTION 0x0000000040000000ULL /* bit 30 */
@@ -376,13 +404,16 @@
#define LDLM_FL_GONE_MASK (LDLM_FL_DESTROYED |\
LDLM_FL_FAILED)
-/** l_flags bits marked as "inherit" bits */
-/* Flags inherited from wire on enqueue/reply between client/server. */
-/* NO_TIMEOUT flag to force ldlm_lock_match() to wait with no timeout. */
-/* TEST_LOCK flag to not let TEST lock to be granted. */
+/** l_flags bits marked as "inherit" bits
+ * Flags inherited from wire on enqueue/reply between client/server.
+ * NO_TIMEOUT flag to force ldlm_lock_match() to wait with no timeout.
+ * TEST_LOCK flag to not let TEST lock to be granted.
+ * NO_EXPANSION to tell server not to expand extent of lock request
+ */
#define LDLM_FL_INHERIT_MASK (LDLM_FL_CANCEL_ON_BLOCK |\
LDLM_FL_NO_TIMEOUT |\
- LDLM_FL_TEST_LOCK)
+ LDLM_FL_TEST_LOCK |\
+ LDLM_FL_NO_EXPANSION)
/** test for ldlm_lock flag bit set */
#define LDLM_TEST_FLAG(_l, _b) (((_l)->l_flags & (_b)) != 0)
@@ -149,6 +149,16 @@ static inline u64 exp_connect_flags(struct obd_export *exp)
return *exp_connect_flags_ptr(exp);
}
+static inline u64 *exp_connect_flags2_ptr(struct obd_export *exp)
+{
+ return &exp->exp_connect_data.ocd_connect_flags2;
+}
+
+static inline u64 exp_connect_flags2(struct obd_export *exp)
+{
+ return *exp_connect_flags2_ptr(exp);
+}
+
static inline int exp_max_brw_size(struct obd_export *exp)
{
if (exp_connect_flags(exp) & OBD_CONNECT_BRW_SIZE)
@@ -235,6 +245,16 @@ static inline bool imp_connect_disp_stripe(struct obd_import *imp)
return ocd->ocd_connect_flags & OBD_CONNECT_DISP_STRIPE;
}
+static inline int exp_connect_lockahead_old(struct obd_export *exp)
+{
+ return !!(exp_connect_flags(exp) & OBD_CONNECT_LOCKAHEAD_OLD);
+}
+
+static inline int exp_connect_lockahead(struct obd_export *exp)
+{
+ return !!(exp_connect_flags2(exp) & OBD_CONNECT2_LOCKAHEAD);
+}
+
struct obd_export *class_conn2export(struct lustre_handle *conn);
#define KKUC_CT_DATA_MAGIC 0x092013cea
@@ -380,7 +380,16 @@ struct osc_lock {
/*
* For async glimpse lock.
*/
- ols_agl:1;
+ ols_agl:1,
+ /*
+ * for speculative locks - asynchronous glimpse locks and ladvise
+ * lockahead manual lock requests
+ *
+ * Used to tell osc layer to not wait for the ldlm reply from the
+ * server, so the osc lock will be short lived - It only exists to
+ * create the ldlm request and is not updated on request completion.
+ */
+ ols_speculative:1;
};
/*
@@ -558,6 +558,7 @@ int client_connect_import(const struct lu_env *env,
ocd->ocd_connect_flags, "old %#llx, new %#llx\n",
data->ocd_connect_flags, ocd->ocd_connect_flags);
data->ocd_connect_flags = ocd->ocd_connect_flags;
+ data->ocd_connect_flags2 = ocd->ocd_connect_flags2;
}
ptlrpc_pinger_add_import(imp);
@@ -43,6 +43,8 @@
#include <obd_class.h>
#include "ldlm_internal.h"
+struct kmem_cache *ldlm_glimpse_work_kmem;
+
/* lock types */
char *ldlm_lockname[] = {
[0] = "--",
@@ -1756,8 +1758,11 @@ static int ldlm_work_gl_ast_lock(struct ptlrpc_request_set *rqset, void *opaq)
LDLM_LOCK_RELEASE(lock);
- if ((gl_work->gl_flags & LDLM_GL_WORK_NOFREE) == 0)
+ if (gl_work->gl_flags & LDLM_GL_WORK_SLAB_ALLOCATED)
+ kmem_cache_free(ldlm_glimpse_work_kmem, gl_work);
+ else
kfree(gl_work);
+ gl_work = NULL;
return rc;
}
@@ -1112,9 +1112,12 @@ static bool file_is_noatime(const struct file *file)
static void ll_io_init(struct cl_io *io, const struct file *file, int write)
{
+ struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
struct inode *inode = file_inode(file);
io->u.ci_rw.crw_nonblock = file->f_flags & O_NONBLOCK;
+ io->ci_lock_no_expand = fd->ll_lock_no_expand;
+
if (write) {
io->u.ci_wr.wr_append = !!(file->f_flags & O_APPEND);
io->u.ci_wr.wr_sync = file->f_flags & O_SYNC ||
@@ -2168,6 +2171,203 @@ static int ll_file_futimes_3(struct file *file, const struct ll_futimes_3 *lfu)
return rc;
}
+static enum cl_lock_mode cl_mode_user_to_kernel(enum lock_mode_user mode)
+{
+ enum cl_lock_mode cl_mode;
+
+ switch (mode) {
+ case MODE_READ_USER:
+ cl_mode = CLM_READ;
+ break;
+ case MODE_WRITE_USER:
+ cl_mode = CLM_WRITE;
+ break;
+ default:
+ cl_mode = -EINVAL;
+ break;
+ }
+ return cl_mode;
+}
+
+static const char *const user_lockname[] = LOCK_MODE_NAMES;
+
+/* Used to allow the upper layers of the client to request an LDLM lock
+ * without doing an actual read or write.
+ *
+ * Used for ladvise lockahead to manually request specific locks.
+ *
+ * @file file this ladvise lock request is on
+ * @ladvise ladvise struct describing this lock request
+ *
+ * Return 0 on success, no detailed result available (sync requests
+ * and requests sent to the server [not handled locally]
+ * cannot return detailed results)
+ *
+ * LLA_RESULT_{SAME,DIFFERENT} - detailed result of the lock
+ * request, see definitions for details.
+ *
+ * negative errno on error
+ */
+int ll_file_lock_ahead(struct file *file, struct llapi_lu_ladvise *ladvise)
+{
+ struct dentry *dentry = file->f_path.dentry;
+ struct inode *inode = dentry->d_inode;
+ struct cl_lock_descr *descr = NULL;
+ struct cl_lock *lock = NULL;
+ struct cl_io *io = NULL;
+ struct lu_env *env = NULL;
+ enum cl_lock_mode cl_mode;
+ u64 start = ladvise->lla_start;
+ u64 end = ladvise->lla_end;
+ u16 refcheck;
+ int result;
+
+ CDEBUG(D_VFSTRACE,
+ "Lock request: file=%.*s, inode=%p, mode=%s start=%llu, end=%llu\n",
+ dentry->d_name.len, dentry->d_name.name, dentry->d_inode,
+ user_lockname[ladvise->lla_lockahead_mode], (__u64) start, end);
+
+ cl_mode = cl_mode_user_to_kernel(ladvise->lla_lockahead_mode);
+ if (cl_mode < 0) {
+ result = cl_mode;
+ goto out;
+ }
+
+ /* Get IO environment */
+ result = cl_io_get(inode, &env, &io, &refcheck);
+ if (result <= 0)
+ goto out;
+
+ result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
+ if (result > 0) {
+ /*
+ * nothing to do for this io. This currently happens when
+		 * stripe sub-objects are not yet created.
+ */
+ result = io->ci_result;
+ } else if (result == 0) {
+ lock = vvp_env_lock(env);
+ descr = &lock->cll_descr;
+
+ descr->cld_obj = io->ci_obj;
+ /* Convert byte offsets to pages */
+ descr->cld_start = cl_index(io->ci_obj, start);
+ descr->cld_end = cl_index(io->ci_obj, end);
+ descr->cld_mode = cl_mode;
+ /* CEF_MUST is used because we do not want to convert a
+ * lockahead request to a lockless lock
+ */
+ descr->cld_enq_flags = CEF_MUST | CEF_LOCK_NO_EXPAND |
+ CEF_NONBLOCK;
+
+ if (ladvise->lla_peradvice_flags & LF_ASYNC)
+ descr->cld_enq_flags |= CEF_SPECULATIVE;
+
+ result = cl_lock_request(env, io, lock);
+
+ /* On success, we need to release the lock */
+ if (result >= 0)
+ cl_lock_release(env, lock);
+ }
+ cl_io_fini(env, io);
+ cl_env_put(env, &refcheck);
+
+ /* -ECANCELED indicates a matching lock with a different extent
+ * was already present, and -EEXIST indicates a matching lock
+ * on exactly the same extent was already present.
+ * We convert them to positive values for userspace to make
+ * recognizing true errors easier.
+ * Note we can only return these detailed results on async requests,
+ * as sync requests look the same as i/o requests for locking.
+ */
+ if (result == -ECANCELED)
+ result = LLA_RESULT_DIFFERENT;
+ else if (result == -EEXIST)
+ result = LLA_RESULT_SAME;
+
+out:
+ return result;
+}
+
+static const char *const ladvise_names[] = LU_LADVISE_NAMES;
+
+static int ll_ladvise_sanity(struct inode *inode,
+ struct llapi_lu_ladvise *ladvise)
+{
+ enum lu_ladvise_type advice = ladvise->lla_advice;
+ /* Note the peradvice flags is a 32 bit field, so per advice flags must
+ * be in the first 32 bits of enum ladvise_flags
+ */
+ u32 flags = ladvise->lla_peradvice_flags;
+ /* 3 lines at 80 characters per line, should be plenty */
+ int rc = 0;
+
+ if (advice > LU_LADVISE_MAX || advice == LU_LADVISE_INVALID) {
+ rc = -EINVAL;
+ CDEBUG(D_VFSTRACE,
+ "%s: advice with value '%d' not recognized, last supported advice is %s (value '%d'): rc = %d\n",
+ ll_get_fsname(inode->i_sb, NULL, 0), advice,
+ ladvise_names[LU_LADVISE_MAX - 1], LU_LADVISE_MAX - 1,
+ rc);
+ goto out;
+ }
+
+ /* Per-advice checks */
+ switch (advice) {
+ case LU_LADVISE_LOCKNOEXPAND:
+ if (flags & ~LF_LOCKNOEXPAND_MASK) {
+ rc = -EINVAL;
+ CDEBUG(D_VFSTRACE,
+ "%s: Invalid flags (%x) for %s: rc = %d\n",
+ ll_get_fsname(inode->i_sb, NULL, 0), flags,
+ ladvise_names[advice], rc);
+ goto out;
+ }
+ break;
+ case LU_LADVISE_LOCKAHEAD:
+ /* Currently only READ and WRITE modes can be requested */
+ if (ladvise->lla_lockahead_mode >= MODE_MAX_USER ||
+ ladvise->lla_lockahead_mode == 0) {
+ rc = -EINVAL;
+ CDEBUG(D_VFSTRACE,
+ "%s: Invalid mode (%d) for %s: rc = %d\n",
+ ll_get_fsname(inode->i_sb, NULL, 0),
+ ladvise->lla_lockahead_mode,
+ ladvise_names[advice], rc);
+ goto out;
+ }
+ /* fallthrough */
+ case LU_LADVISE_WILLREAD:
+ case LU_LADVISE_DONTNEED:
+ default:
+ /* Note fall through above - These checks apply to all advices
+ * except LOCKNOEXPAND
+ */
+ if (flags & ~LF_DEFAULT_MASK) {
+ rc = -EINVAL;
+ CDEBUG(D_VFSTRACE,
+ "%s: Invalid flags (%x) for %s: rc = %d\n",
+ ll_get_fsname(inode->i_sb, NULL, 0), flags,
+ ladvise_names[advice], rc);
+ goto out;
+ }
+ if (ladvise->lla_start >= ladvise->lla_end) {
+ rc = -EINVAL;
+ CDEBUG(D_VFSTRACE,
+ "%s: Invalid range (%llu to %llu) for %s: rc = %d\n",
+ ll_get_fsname(inode->i_sb, NULL, 0),
+ ladvise->lla_start, ladvise->lla_end,
+ ladvise_names[advice], rc);
+ goto out;
+ }
+ break;
+ }
+
+out:
+ return rc;
+}
+#undef ERRSIZE
+
/*
* Give file access advices
*
@@ -2216,6 +2416,15 @@ static int ll_ladvise(struct inode *inode, struct file *file, u64 flags,
return rc;
}
+static int ll_lock_noexpand(struct file *file, int flags)
+{
+ struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
+
+ fd->ll_lock_no_expand = !(flags & LF_UNSET);
+
+ return 0;
+}
+
int ll_ioctl_fsgetxattr(struct inode *inode, unsigned int cmd,
unsigned long arg)
{
@@ -2634,61 +2843,89 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
return ll_file_futimes_3(file, &lfu);
}
case LL_IOC_LADVISE: {
- struct llapi_ladvise_hdr *ladvise_hdr;
- int alloc_size = sizeof(*ladvise_hdr);
+ struct llapi_ladvise_hdr __user *u_ladvise_hdr;
+ struct llapi_ladvise_hdr *k_ladvise_hdr;
+ int alloc_size = sizeof(*k_ladvise_hdr);
int num_advise;
int i;
rc = 0;
- ladvise_hdr = kzalloc(alloc_size, GFP_KERNEL);
- if (!ladvise_hdr)
+ u_ladvise_hdr = (void __user *)arg;
+ k_ladvise_hdr = kzalloc(alloc_size, GFP_KERNEL);
+ if (!k_ladvise_hdr)
return -ENOMEM;
- if (copy_from_user(ladvise_hdr,
- (const struct llapi_ladvise_hdr __user *)arg,
- alloc_size)) {
+ if (copy_from_user(k_ladvise_hdr, u_ladvise_hdr, alloc_size)) {
rc = -EFAULT;
goto out_ladvise;
}
- if (ladvise_hdr->lah_magic != LADVISE_MAGIC ||
- ladvise_hdr->lah_count < 1) {
+ if (k_ladvise_hdr->lah_magic != LADVISE_MAGIC ||
+ k_ladvise_hdr->lah_count < 1) {
rc = -EINVAL;
goto out_ladvise;
}
- num_advise = ladvise_hdr->lah_count;
+ num_advise = k_ladvise_hdr->lah_count;
if (num_advise >= LAH_COUNT_MAX) {
rc = -EFBIG;
goto out_ladvise;
}
- kfree(ladvise_hdr);
- alloc_size = offsetof(typeof(*ladvise_hdr),
+ kfree(k_ladvise_hdr);
+ alloc_size = offsetof(typeof(*k_ladvise_hdr),
lah_advise[num_advise]);
- ladvise_hdr = kzalloc(alloc_size, GFP_KERNEL);
- if (!ladvise_hdr)
+ k_ladvise_hdr = kzalloc(alloc_size, GFP_KERNEL);
+ if (!k_ladvise_hdr)
return -ENOMEM;
/*
* TODO: submit multiple advices to one server in a single RPC
*/
- if (copy_from_user(ladvise_hdr,
- (const struct llapi_advise_hdr __user *)arg,
- alloc_size)) {
+ if (copy_from_user(k_ladvise_hdr, u_ladvise_hdr, alloc_size)) {
rc = -EFAULT;
goto out_ladvise;
}
for (i = 0; i < num_advise; i++) {
- rc = ll_ladvise(inode, file, ladvise_hdr->lah_flags,
- &ladvise_hdr->lah_advise[i]);
+ struct llapi_lu_ladvise __user *u_ladvise;
+ struct llapi_lu_ladvise *k_ladvise;
+
+ k_ladvise = &k_ladvise_hdr->lah_advise[i];
+ u_ladvise = &u_ladvise_hdr->lah_advise[i];
+
+ rc = ll_ladvise_sanity(inode, k_ladvise);
if (rc)
+ goto out_ladvise;
+
+ switch (k_ladvise->lla_advice) {
+ case LU_LADVISE_LOCKNOEXPAND:
+ rc = ll_lock_noexpand(file,
+ k_ladvise->lla_peradvice_flags);
+ goto out_ladvise;
+ case LU_LADVISE_LOCKAHEAD:
+ rc = ll_file_lock_ahead(file, k_ladvise);
+ if (rc < 0)
+ goto out_ladvise;
+
+ if (put_user(rc,
+ &u_ladvise->lla_lockahead_result)) {
+ rc = -EFAULT;
+ goto out_ladvise;
+ }
break;
+ default:
+ rc = ll_ladvise(inode, file,
+ k_ladvise_hdr->lah_flags,
+ k_ladvise);
+ if (rc)
+ goto out_ladvise;
+ break;
+ }
}
out_ladvise:
- kfree(ladvise_hdr);
+ kfree(k_ladvise_hdr);
return rc;
}
case FS_IOC_FSGETXATTR:
@@ -88,7 +88,7 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
CDEBUG(D_DLMTRACE, "Glimpsing inode " DFID "\n", PFID(fid));
/* NOTE: this looks like DLM lock request, but it may
- * not be one. Due to CEF_ASYNC flag (translated
+ * not be one. Due to CEF_GLIMPSE flag (translated
* to LDLM_FL_HAS_INTENT by osc), this is
* glimpse request, that won't revoke any
* conflicting DLM locks held. Instead,
@@ -104,14 +104,10 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
*descr = whole_file;
descr->cld_obj = clob;
descr->cld_mode = CLM_READ;
- descr->cld_enq_flags = CEF_ASYNC | CEF_MUST;
+ descr->cld_enq_flags = CEF_GLIMPSE | CEF_MUST;
if (agl)
- descr->cld_enq_flags |= CEF_AGL;
+ descr->cld_enq_flags |= CEF_SPECULATIVE | CEF_NONBLOCK;
/*
- * CEF_ASYNC is used because glimpse sub-locks cannot
- * deadlock (because they never conflict with other
- * locks) and, hence, can be enqueued out-of-order.
- *
* CEF_MUST protects glimpse lock from conversion into
* a lockless mode.
*/
@@ -137,8 +133,21 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
return result;
}
-static int cl_io_get(struct inode *inode, struct lu_env **envout,
- struct cl_io **ioout, u16 *refcheck)
+/**
+ * Get an IO environment for special operations such as glimpse locks and
+ * manually requested locks (ladvise lockahead)
+ *
+ * @inode inode the operation is being performed on
+ * @envout thread specific execution environment
+ * @ioout client io description
+ * @refcheck reference check
+ *
+ * Return 1 on success
+ * 0 not a regular file, cannot get environment
+ * negative errno on error
+ */
+int cl_io_get(struct inode *inode, struct lu_env **envout,
+ struct cl_io **ioout, u16 *refcheck)
{
struct lu_env *env;
struct cl_io *io;
@@ -652,6 +652,7 @@ struct ll_file_data {
* false: unknown failure, should report.
*/
bool fd_write_failed;
+ bool ll_lock_no_expand;
rwlock_t fd_lock; /* protect lcc list */
struct list_head fd_lccs; /* list of ll_cl_context */
};
@@ -1163,11 +1164,19 @@ static inline int cl_glimpse_size(struct inode *inode)
return __cl_glimpse_size(inode, 0);
}
+/* AGL is 'asynchronous glimpse lock', which is a speculative lock taken as
+ * part of statahead
+ */
static inline int cl_agl(struct inode *inode)
{
return __cl_glimpse_size(inode, 1);
}
+int ll_file_lock_ahead(struct file *file, struct llapi_lu_ladvise *ladvise);
+
+int cl_io_get(struct inode *inode, struct lu_env **envout,
+ struct cl_io **ioout, __u16 *refcheck);
+
static inline int ll_glimpse_size(struct inode *inode)
{
struct ll_inode_info *lli = ll_i2info(inode);
@@ -185,7 +185,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
*/
data->ocd_grant_blkbits = PAGE_SHIFT;
- /* indicate the features supported by this client */
+ /* indicate MDT features supported by this client */
data->ocd_connect_flags = OBD_CONNECT_IBITS | OBD_CONNECT_NODEVOH |
OBD_CONNECT_ATTRFID |
OBD_CONNECT_VERSION | OBD_CONNECT_BRW_SIZE |
@@ -374,6 +374,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
goto out_md_fid;
}
+ /* indicate OST features supported by this client */
data->ocd_connect_flags = OBD_CONNECT_GRANT | OBD_CONNECT_VERSION |
OBD_CONNECT_REQPORTAL | OBD_CONNECT_BRW_SIZE |
OBD_CONNECT_CANCELSET | OBD_CONNECT_FID |
@@ -386,9 +387,25 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
OBD_CONNECT_JOBSTATS | OBD_CONNECT_LVB_TYPE |
OBD_CONNECT_LAYOUTLOCK |
OBD_CONNECT_PINGLESS | OBD_CONNECT_LFSCK |
- OBD_CONNECT_BULK_MBITS;
+ OBD_CONNECT_BULK_MBITS | OBD_CONNECT_FLAGS2;
+
+ /* The client currently advertises support for OBD_CONNECT_LOCKAHEAD_OLD
+ * so it can interoperate with an older version of lockahead which was
+ * released prior to landing in master. This support will be dropped
+ * when 2.13 development starts. At the point, we should not just drop
+ * the connect flag (below), we should also remove the support in the
+ * code.
+ *
+ * Removing it means a few things:
+ * 1. Remove this section here
+ * 2. Remove CEF_NONBLOCK in ll_file_lock_ahead()
+ * 3. Remove function exp_connect_lockahead_old
+ * 4. Remove LDLM_FL_LOCKAHEAD_OLD_RESERVED in lustre_dlm_flags.h
+ */
+ if (data->ocd_version < OBD_OCD_VERSION(2, 12, 50, 0))
+ data->ocd_connect_flags |= OBD_CONNECT_LOCKAHEAD_OLD;
- data->ocd_connect_flags2 = 0;
+ data->ocd_connect_flags2 = OBD_CONNECT2_LOCKAHEAD;
if (!OBD_FAIL_CHECK(OBD_FAIL_OSC_CONNECT_GRANT_PARAM))
data->ocd_connect_flags |= OBD_CONNECT_GRANT_PARAM;
@@ -524,6 +524,9 @@ static int vvp_io_rw_lock(const struct lu_env *env, struct cl_io *io,
if (io->u.ci_rw.crw_nonblock)
ast_flags |= CEF_NONBLOCK;
+ if (io->ci_lock_no_expand)
+ ast_flags |= CEF_LOCK_NO_EXPAND;
+
result = vvp_mmap_locks(env, vio, io);
if (result == 0)
result = vvp_io_one_lock(env, io, ast_flags, mode, start, end);
@@ -120,6 +120,7 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
sub_io->ci_type = io->ci_type;
sub_io->ci_no_srvlock = io->ci_no_srvlock;
sub_io->ci_noatime = io->ci_noatime;
+ sub_io->ci_lock_no_expand = io->ci_lock_no_expand;
rc = cl_io_sub_init(sub->sub_env, sub_io, io->ci_type, sub_obj);
if (rc < 0)
@@ -188,7 +188,7 @@ int cl_lock_request(const struct lu_env *env, struct cl_io *io,
if (rc < 0)
return rc;
- if ((enq_flags & CEF_ASYNC) && !(enq_flags & CEF_AGL)) {
+ if ((enq_flags & CEF_GLIMPSE) && !(enq_flags & CEF_SPECULATIVE)) {
anchor = &cl_env_info(env)->clt_anchor;
cl_sync_io_init(anchor, 1);
}
@@ -106,12 +106,13 @@
"multi_mod_rpcs",
"dir_stripe",
"subtree",
- "lock_ahead",
+ "lockahead",
"bulk_mbits",
"compact_obdo",
"second_flags",
/* flags2 names */
"file_secctx",
+ "lockaheadv2",
NULL
};
@@ -53,7 +53,8 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
struct ost_lvb *lvb, int kms_valid,
osc_enqueue_upcall_f upcall,
void *cookie, struct ldlm_enqueue_info *einfo,
- struct ptlrpc_request_set *rqset, int async, int agl);
+ struct ptlrpc_request_set *rqset, int async,
+ bool speculative);
int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id,
enum ldlm_type type, union ldlm_policy_data *policy,
@@ -158,11 +158,13 @@ static u64 osc_enq2ldlm_flags(u32 enqflags)
{
u64 result = 0;
+ CDEBUG(D_DLMTRACE, "flags: %x\n", enqflags);
+
LASSERT((enqflags & ~CEF_MASK) == 0);
if (enqflags & CEF_NONBLOCK)
result |= LDLM_FL_BLOCK_NOWAIT;
- if (enqflags & CEF_ASYNC)
+ if (enqflags & CEF_GLIMPSE)
result |= LDLM_FL_HAS_INTENT;
if (enqflags & CEF_DISCARD_DATA)
result |= LDLM_FL_AST_DISCARD_DATA;
@@ -170,6 +172,10 @@ static u64 osc_enq2ldlm_flags(u32 enqflags)
result |= LDLM_FL_TEST_LOCK;
if (enqflags & CEF_LOCK_MATCH)
result |= LDLM_FL_MATCH_LOCK;
+ if (enqflags & CEF_LOCK_NO_EXPAND)
+ result |= LDLM_FL_NO_EXPANSION;
+ if (enqflags & CEF_SPECULATIVE)
+ result |= LDLM_FL_SPECULATIVE;
return result;
}
@@ -345,8 +351,9 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh,
return rc;
}
-static int osc_lock_upcall_agl(void *cookie, struct lustre_handle *lockh,
- int errcode)
+static int osc_lock_upcall_speculative(void *cookie,
+ struct lustre_handle *lockh,
+ int errcode)
{
struct osc_object *osc = cookie;
struct ldlm_lock *dlmlock;
@@ -370,7 +377,7 @@ static int osc_lock_upcall_agl(void *cookie, struct lustre_handle *lockh,
lock_res_and_lock(dlmlock);
LASSERT(dlmlock->l_granted_mode == dlmlock->l_req_mode);
- /* there is no osc_lock associated with AGL lock */
+ /* there is no osc_lock associated with speculative lock */
osc_lock_lvb_update(env, osc, dlmlock, NULL);
unlock_res_and_lock(dlmlock);
@@ -808,7 +815,7 @@ static bool osc_lock_compatible(const struct osc_lock *qing,
struct cl_lock_descr *qed_descr = &qed->ols_cl.cls_lock->cll_descr;
struct cl_lock_descr *qing_descr = &qing->ols_cl.cls_lock->cll_descr;
- if (qed->ols_glimpse)
+ if (qed->ols_glimpse || qed->ols_speculative)
return true;
if (qing_descr->cld_mode == CLM_READ && qed_descr->cld_mode == CLM_READ)
@@ -925,13 +932,14 @@ static int osc_lock_enqueue(const struct lu_env *env,
struct osc_io *oio = osc_env_io(env);
struct osc_object *osc = cl2osc(slice->cls_obj);
struct osc_lock *oscl = cl2osc_lock(slice);
+ struct obd_export *exp = osc_export(osc);
struct cl_lock *lock = slice->cls_lock;
struct ldlm_res_id *resname = &info->oti_resname;
union ldlm_policy_data *policy = &info->oti_policy;
osc_enqueue_upcall_f upcall = osc_lock_upcall;
void *cookie = oscl;
bool async = false;
- int result;
+ int result = 0;
LASSERTF(ergo(oscl->ols_glimpse, lock->cll_descr.cld_mode <= CLM_READ),
"lock = %p, ols = %p\n", lock, oscl);
@@ -939,11 +947,23 @@ static int osc_lock_enqueue(const struct lu_env *env,
if (oscl->ols_state == OLS_GRANTED)
return 0;
+ if ((oscl->ols_flags & LDLM_FL_NO_EXPANSION) &&
+ !(exp_connect_lockahead_old(exp) || exp_connect_lockahead(exp))) {
+ result = -EOPNOTSUPP;
+ CERROR("%s: server does not support lockahead/locknoexpand: rc = %d\n",
+ exp->exp_obd->obd_name, result);
+ return result;
+ }
+
if (oscl->ols_flags & LDLM_FL_TEST_LOCK)
goto enqueue_base;
- if (oscl->ols_glimpse) {
- LASSERT(equi(oscl->ols_agl, !anchor));
+ /* For glimpse and/or speculative locks, do not wait for reply from
+ * server on LDLM request
+ */
+ if (oscl->ols_glimpse || oscl->ols_speculative) {
+ /* Speculative and glimpse locks do not have an anchor */
+ LASSERT(equi(oscl->ols_speculative, !anchor));
async = true;
goto enqueue_base;
}
@@ -970,25 +990,31 @@ static int osc_lock_enqueue(const struct lu_env *env,
/**
* DLM lock's ast data must be osc_object;
- * if glimpse or AGL lock, async of osc_enqueue_base() must be true,
+ * if glimpse or speculative lock, async of osc_enqueue_base()
+ * must be true
+ *
+ * For non-speculative locks:
* DLM's enqueue callback set to osc_lock_upcall() with cookie as
* osc_lock.
+ * For speculative locks:
+ * osc_lock_upcall_speculative & cookie is the osc object, since
+ * there is no osc_lock
*/
ostid_build_res_name(&osc->oo_oinfo->loi_oi, resname);
osc_lock_build_policy(env, lock, policy);
- if (oscl->ols_agl) {
+ if (oscl->ols_speculative) {
oscl->ols_einfo.ei_cbdata = NULL;
/* hold a reference for callback */
cl_object_get(osc2cl(osc));
- upcall = osc_lock_upcall_agl;
+ upcall = osc_lock_upcall_speculative;
cookie = osc;
}
- result = osc_enqueue_base(osc_export(osc), resname, &oscl->ols_flags,
+ result = osc_enqueue_base(exp, resname, &oscl->ols_flags,
policy, &oscl->ols_lvb,
osc->oo_oinfo->loi_kms_valid,
upcall, cookie,
&oscl->ols_einfo, PTLRPCD_SET, async,
- oscl->ols_agl);
+ oscl->ols_speculative);
if (!result) {
if (osc_lock_is_lockless(oscl)) {
oio->oi_lockless = 1;
@@ -997,9 +1023,12 @@ static int osc_lock_enqueue(const struct lu_env *env,
LASSERT(oscl->ols_hold);
LASSERT(oscl->ols_dlmlock);
}
- } else if (oscl->ols_agl) {
+ } else if (oscl->ols_speculative) {
cl_object_put(env, osc2cl(osc));
- result = 0;
+ if (oscl->ols_glimpse) {
+ /* hide error for AGL request */
+ result = 0;
+ }
}
out:
@@ -1161,10 +1190,16 @@ int osc_lock_init(const struct lu_env *env,
INIT_LIST_HEAD(&oscl->ols_wait_entry);
INIT_LIST_HEAD(&oscl->ols_nextlock_oscobj);
+ /* Speculative lock requests must be either no_expand or glimpse
+ * request (CEF_GLIMPSE). non-glimpse no_expand speculative extent
+ * locks will break ofd_intent_cb. (see comment there)
+ */
+ LASSERT(ergo((enqflags & CEF_SPECULATIVE) != 0,
+ (enqflags & (CEF_LOCK_NO_EXPAND | CEF_GLIMPSE)) != 0));
+
oscl->ols_flags = osc_enq2ldlm_flags(enqflags);
- oscl->ols_agl = !!(enqflags & CEF_AGL);
- if (oscl->ols_agl)
- oscl->ols_flags |= LDLM_FL_BLOCK_NOWAIT;
+ oscl->ols_speculative = !!(enqflags & CEF_SPECULATIVE);
+
if (oscl->ols_flags & LDLM_FL_HAS_INTENT) {
oscl->ols_flags |= LDLM_FL_BLOCK_GRANTED;
oscl->ols_glimpse = 1;
@@ -106,7 +106,7 @@ struct osc_enqueue_args {
void *oa_cookie;
struct ost_lvb *oa_lvb;
struct lustre_handle oa_lockh;
- unsigned int oa_agl:1;
+ unsigned int oa_speculative;
};
static void osc_release_ppga(struct brw_page **ppga, u32 count);
@@ -2044,7 +2044,7 @@ static int osc_set_lock_data(struct ldlm_lock *lock, void *data)
static int osc_enqueue_fini(struct ptlrpc_request *req,
osc_enqueue_upcall_f upcall, void *cookie,
struct lustre_handle *lockh, enum ldlm_mode mode,
- u64 *flags, int agl, int errcode)
+ u64 *flags, int speculative, int errcode)
{
bool intent = *flags & LDLM_FL_HAS_INTENT;
int rc;
@@ -2059,7 +2059,7 @@ static int osc_enqueue_fini(struct ptlrpc_request *req,
ptlrpc_status_ntoh(rep->lock_policy_res1);
if (rep->lock_policy_res1)
errcode = rep->lock_policy_res1;
- if (!agl)
+ if (!speculative)
*flags |= LDLM_FL_LVB_READY;
} else if (errcode == ELDLM_OK) {
*flags |= LDLM_FL_LVB_READY;
@@ -2107,7 +2107,7 @@ static int osc_enqueue_interpret(const struct lu_env *env,
/* Let CP AST to grant the lock first. */
OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_CP_ENQ_RACE, 1);
- if (aa->oa_agl) {
+ if (aa->oa_speculative) {
LASSERT(!aa->oa_lvb);
LASSERT(!aa->oa_flags);
aa->oa_flags = &flags;
@@ -2119,7 +2119,7 @@ static int osc_enqueue_interpret(const struct lu_env *env,
lockh, rc);
/* Complete osc stuff. */
rc = osc_enqueue_fini(req, aa->oa_upcall, aa->oa_cookie, lockh, mode,
- aa->oa_flags, aa->oa_agl, rc);
+ aa->oa_flags, aa->oa_speculative, rc);
OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_CP_CANCEL_RACE, 10);
@@ -2141,7 +2141,8 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
struct ost_lvb *lvb, int kms_valid,
osc_enqueue_upcall_f upcall, void *cookie,
struct ldlm_enqueue_info *einfo,
- struct ptlrpc_request_set *rqset, int async, int agl)
+ struct ptlrpc_request_set *rqset, int async,
+ bool speculative)
{
struct obd_device *obd = exp->exp_obd;
struct lustre_handle lockh = { 0 };
@@ -2182,7 +2183,11 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
mode = einfo->ei_mode;
if (einfo->ei_mode == LCK_PR)
mode |= LCK_PW;
- if (agl == 0)
+ /* Normal lock requests must wait for the LVB to be ready before
+ * matching a lock; speculative lock requests do not need to,
+ * because they will not actually use the lock.
+ */
+ if (!speculative)
match_flags |= LDLM_FL_LVB_READY;
if (intent != 0)
match_flags |= LDLM_FL_BLOCK_GRANTED;
@@ -2195,14 +2200,23 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
return ELDLM_OK;
matched = ldlm_handle2lock(&lockh);
- if (agl) {
- /* AGL enqueues DLM locks speculatively. Therefore if
- * it already exists a DLM lock, it wll just inform the
- * caller to cancel the AGL process for this stripe.
+ if (speculative) {
+ /* This DLM lock request is speculative, and does not
+ * have an associated IO request. Therefore if there
+			 * is already a DLM lock, it will just inform the
+ * caller to cancel the request for this stripe.
*/
+ lock_res_and_lock(matched);
+ if (ldlm_extent_equal(&policy->l_extent,
+ &matched->l_policy_data.l_extent))
+ rc = -EEXIST;
+ else
+ rc = -ECANCELED;
+ unlock_res_and_lock(matched);
+
ldlm_lock_decref(&lockh, mode);
LDLM_LOCK_PUT(matched);
- return -ECANCELED;
+ return rc;
}
if (osc_set_lock_data(matched, einfo->ei_cbdata)) {
*flags |= LDLM_FL_LVB_READY;
@@ -2254,14 +2268,14 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
lustre_handle_copy(&aa->oa_lockh, &lockh);
aa->oa_upcall = upcall;
aa->oa_cookie = cookie;
- aa->oa_agl = !!agl;
- if (!agl) {
+ aa->oa_speculative = speculative;
+ if (!speculative) {
aa->oa_flags = flags;
aa->oa_lvb = lvb;
} else {
- /* AGL is essentially to enqueue an DLM lock
- * in advance, so we don't care about the
- * result of AGL enqueue.
+ /* speculative locks are essentially to enqueue
+ * a DLM lock in advance, so we don't care
+ * about the result of the enqueue.
*/
aa->oa_lvb = NULL;
aa->oa_flags = NULL;
@@ -2277,7 +2291,7 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
}
rc = osc_enqueue_fini(req, upcall, cookie, &lockh, einfo->ei_mode,
- flags, agl, rc);
+ flags, speculative, rc);
if (intent)
ptlrpc_req_finished(req);
@@ -40,6 +40,8 @@
#include <obd_class.h>
#include <lustre_net.h>
#include <lustre_disk.h>
+#include <uapi/linux/lustre/lustre_idl.h>
+
#include "ptlrpc_internal.h"
void lustre_assert_wire_constants(void)
@@ -1113,14 +1115,16 @@ void lustre_assert_wire_constants(void)
OBD_CONNECT_DIR_STRIPE);
LASSERTF(OBD_CONNECT_SUBTREE == 0x800000000000000ULL, "found 0x%.16llxULL\n",
OBD_CONNECT_SUBTREE);
- LASSERTF(OBD_CONNECT_LOCK_AHEAD == 0x1000000000000000ULL, "found 0x%.16llxULL\n",
- OBD_CONNECT_LOCK_AHEAD);
+ LASSERTF(OBD_CONNECT_LOCKAHEAD_OLD == 0x1000000000000000ULL, "found 0x%.16llxULL\n",
+ OBD_CONNECT_LOCKAHEAD_OLD);
LASSERTF(OBD_CONNECT_OBDOPACK == 0x4000000000000000ULL, "found 0x%.16llxULL\n",
OBD_CONNECT_OBDOPACK);
LASSERTF(OBD_CONNECT_FLAGS2 == 0x8000000000000000ULL, "found 0x%.16llxULL\n",
OBD_CONNECT_FLAGS2);
LASSERTF(OBD_CONNECT2_FILE_SECCTX == 0x1ULL, "found 0x%.16llxULL\n",
OBD_CONNECT2_FILE_SECCTX);
+ LASSERTF(OBD_CONNECT2_LOCKAHEAD == 0x2ULL, "found 0x%.16llxULL\n",
+ OBD_CONNECT2_LOCKAHEAD);
LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
(unsigned int)OBD_CKSUM_CRC32);
LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
@@ -774,7 +774,8 @@ struct ptlrpc_body_v2 {
*/
#define OBD_CONNECT_DIR_STRIPE 0x400000000000000ULL/* striped DNE dir */
#define OBD_CONNECT_SUBTREE 0x800000000000000ULL /* fileset mount */
-#define OBD_CONNECT_LOCK_AHEAD 0x1000000000000000ULL /* lock ahead */
+#define OBD_CONNECT_LOCKAHEAD_OLD 0x1000000000000000ULL /* Old Cray lockahead */
+
/** bulk matchbits is sent within ptlrpc_body */
#define OBD_CONNECT_BULK_MBITS 0x2000000000000000ULL
#define OBD_CONNECT_OBDOPACK 0x4000000000000000ULL /* compact OUT obdo */
@@ -783,6 +784,9 @@ struct ptlrpc_body_v2 {
#define OBD_CONNECT2_FILE_SECCTX 0x1ULL /* set file security
* context at create
*/
+#define OBD_CONNECT2_LOCKAHEAD 0x2ULL /* ladvise lockahead
+ * v2
+ */
/* XXX README XXX:
* Please DO NOT add flag values here before first ensuring that this same
@@ -2097,6 +2101,12 @@ struct ldlm_extent {
__u64 gid;
};
+static inline bool ldlm_extent_equal(const struct ldlm_extent *ex1,
+ const struct ldlm_extent *ex2)
+{
+ return ex1->start == ex2->start && ex1->end == ex2->end;
+}
+
struct ldlm_inodebits {
__u64 bits;
};
@@ -1409,11 +1409,16 @@ enum lu_ladvise_type {
LU_LADVISE_INVALID = 0,
LU_LADVISE_WILLREAD = 1,
LU_LADVISE_DONTNEED = 2,
+ LU_LADVISE_LOCKNOEXPAND = 3,
+ LU_LADVISE_LOCKAHEAD = 4,
+ LU_LADVISE_MAX
};
-#define LU_LADVISE_NAMES { \
- [LU_LADVISE_WILLREAD] = "willread", \
- [LU_LADVISE_DONTNEED] = "dontneed", \
+#define LU_LADVISE_NAMES { \
+ [LU_LADVISE_WILLREAD] = "willread", \
+ [LU_LADVISE_DONTNEED] = "dontneed", \
+ [LU_LADVISE_LOCKNOEXPAND] = "locknoexpand", \
+ [LU_LADVISE_LOCKAHEAD] = "lockahead", \
}
/*
@@ -1433,10 +1438,20 @@ struct llapi_lu_ladvise {
enum ladvise_flag {
LF_ASYNC = 0x00000001,
+ LF_UNSET = 0x00000002,
};
#define LADVISE_MAGIC 0x1ADF1CE0
-#define LF_MASK LF_ASYNC
+/* Masks of valid flags for each advice */
+#define LF_LOCKNOEXPAND_MASK LF_UNSET
+/* Flags valid for all advices not explicitly specified */
+#define LF_DEFAULT_MASK LF_ASYNC
+/* All flags */
+#define LF_MASK (LF_ASYNC | LF_UNSET)
+
+#define lla_lockahead_mode lla_value1
+#define lla_peradvice_flags lla_value2
+#define lla_lockahead_result lla_value3
/*
* This is the userspace argument for ladvise, corresponds to ladvise_hdr which
@@ -1455,6 +1470,23 @@ struct llapi_ladvise_hdr {
#define LAH_COUNT_MAX 1024
+enum lock_mode_user {
+ MODE_READ_USER = 1,
+ MODE_WRITE_USER,
+ MODE_MAX_USER,
+};
+
+#define LOCK_MODE_NAMES { \
+ [MODE_READ_USER] = "READ", \
+ [MODE_WRITE_USER] = "WRITE" \
+}
+
+enum lockahead_results {
+ LLA_RESULT_SENT = 0,
+ LLA_RESULT_DIFFERENT,
+ LLA_RESULT_SAME,
+};
+
/** @} lustreuser */
#endif /* _LUSTRE_USER_H */