From patchwork Mon May 25 22:08:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11569545 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6D61260D for ; Mon, 25 May 2020 22:09:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 569DC2071A for ; Mon, 25 May 2020 22:09:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 569DC2071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C36FD2472AE; Mon, 25 May 2020 15:09:00 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3D5CC21F88F for ; Mon, 25 May 2020 15:08:41 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5F5811005EF6; Mon, 25 May 2020 18:08:27 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5EDCF49A; Mon, 25 May 2020 18:08:27 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 25 May 2020 18:08:12 -0400 Message-Id: <1590444502-20533-36-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1590444502-20533-1-git-send-email-jsimmons@infradead.org> References: <1590444502-20533-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 35/45] lustre: osc: Do not wait for grants for too long X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin obd_timeout is way too long considering we are holding a lock that might be contended. If OST is slow to respond, we might get evicted, so limit us to a half of the shortest possible max wait a server might have before switching to synchronous IO. WC-bug-id: https://jira.whamcloud.com/browse/LU-13131 Lustre-commit: 1eee11c75ca13 ("LU-13131 osc: Do not wait for grants for too long") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/38283 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 2 ++ fs/lustre/ldlm/ldlm_request.c | 1 + fs/lustre/osc/osc_cache.c | 13 ++++++++++++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index f67b612..174b314 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -1320,6 +1320,8 @@ int ldlm_cli_cancel_list(struct list_head *head, int count, enum ldlm_cancel_flags flags); /** @} ldlm_cli_api */ +extern unsigned int ldlm_enqueue_min; + int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop); int ldlm_cli_inodebits_convert(struct ldlm_lock *lock, enum ldlm_cancel_flags cancel_flags); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 5f06def..12ee241 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -69,6 +69,7 @@ unsigned int ldlm_enqueue_min = OBD_TIMEOUT_DEFAULT; module_param(ldlm_enqueue_min, uint, 0644); MODULE_PARM_DESC(ldlm_enqueue_min, "lock enqueue timeout minimum"); +EXPORT_SYMBOL(ldlm_enqueue_min); /* in client side, whether the cached locks will be canceled before replay */ unsigned int ldlm_cancel_unused_locks_before_replay = 1; diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 9e28ff6..c7f1502 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -39,6 +39,7 @@ #define DEBUG_SUBSYSTEM S_OSC #include +#include #include "osc_internal.h" @@ -1630,10 +1631,20 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli, { struct osc_object *osc = oap->oap_obj; struct lov_oinfo *loi = osc->oo_oinfo; - unsigned long timeout = (AT_OFF ? obd_timeout : at_max) * HZ; int rc = -EDQUOT; int remain; bool entered = false; + /* We cannot wait for a long time here since we are holding ldlm lock + * across the actual IO. If no requests complete fast (e.g. due to + * overloaded OST that takes a long time to process everything, we'd + * get evicted if we wait for a normal obd_timeout or some such. + * So we try to wait half the time it would take the client to be + * evicted by server which is half obd_timeout when AT is off + * or at least ldlm_enqueue_min with AT on. + * See LU-13131 + */ + unsigned long timeout = (AT_OFF ? obd_timeout / 2 : + ldlm_enqueue_min / 2) * HZ; OSC_DUMP_GRANT(D_CACHE, cli, "need:%d\n", bytes);