From patchwork Tue Oct 6 00:06:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11817929 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9FDDD1580 for ; Tue, 6 Oct 2020 00:07:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BC402206F4 for ; Tue, 6 Oct 2020 00:07:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BC402206F4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 53CAB21FF56; Mon, 5 Oct 2020 17:06:58 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AD3A22FA49A for ; Mon, 5 Oct 2020 17:06:33 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 3228F10087DD; Mon, 5 Oct 2020 20:06:25 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2F76B2F0E5; Mon, 5 Oct 2020 20:06:25 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Oct 2020 20:06:00 -0400 Message-Id: <1601942781-24950-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1601942781-24950-1-git-send-email-jsimmons@infradead.org> References: <1601942781-24950-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 21/42] lnet: Loosen restrictions on LNet Health params X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The functions that set various LNet Health related parameters require that the parameters be set in a specific order depending on whether health is enabled or disabled. This is not user-friendly. - Don't overwrite lnet_transaction_timeout when health is being enabled or disabled. - Don't overwrite lnet_retry_count when health is being enabled (still set it to zero when health is disabled). - Allow lnet_retry_count to be set to 0 when health is disabled - Correct off-by-one error in transaction_to_set() to ensure lnet_transaction_timeout is greater than lnet_retry_count HPE-bug-id: LUS-8995 WC-bug-id: https://jira.whamcloud.com/browse/LU-13735 Lustre-commit: b8ff97611717d ("LU-13735 lnet: Loosen restrictions on LNet Health params") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/39228 Reviewed-by: Neil Brown Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 35 ++++++++++------------------------- 1 file changed, 10 insertions(+), 25 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index af2089e..f678ae2 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -140,10 +140,8 @@ static int recovery_interval_set(const char *val, MODULE_PARM_DESC(lnet_drop_asym_route, "Set to 1 to drop asymmetrical route messages."); -#define LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT 50 -#define LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT 50 - -unsigned int lnet_transaction_timeout = LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT; +#define LNET_TRANSACTION_TIMEOUT_DEFAULT 50 +unsigned int lnet_transaction_timeout = LNET_TRANSACTION_TIMEOUT_DEFAULT; static int transaction_to_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_transaction_timeout = { .set = transaction_to_set, @@ -156,8 +154,8 @@ static int recovery_interval_set(const char *val, MODULE_PARM_DESC(lnet_transaction_timeout, "Maximum number of seconds to wait for a peer response."); -#define LNET_RETRY_COUNT_HEALTH_DEFAULT 2 -unsigned int lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT; +#define LNET_RETRY_COUNT_DEFAULT 2 +unsigned int lnet_retry_count = LNET_RETRY_COUNT_DEFAULT; static int retry_count_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_retry_count = { .set = retry_count_set, @@ -184,8 +182,8 @@ static int response_tracking_set(const char *val, MODULE_PARM_DESC(lnet_response_tracking, "(0|1|2|3) LNet Internal Only|GET Reply only|PUT ACK only|Full Tracking (default)"); -#define LNET_LND_TIMEOUT_DEFAULT ((LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT - 1) / \ - (LNET_RETRY_COUNT_HEALTH_DEFAULT + 1)) +#define LNET_LND_TIMEOUT_DEFAULT ((LNET_TRANSACTION_TIMEOUT_DEFAULT - 1) / \ + (LNET_RETRY_COUNT_DEFAULT + 1)) unsigned int lnet_lnd_timeout = LNET_LND_TIMEOUT_DEFAULT; static void lnet_set_lnd_timeout(void) { @@ -235,20 +233,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force, return -EINVAL; } - /* if we're turning on health then use the health timeout - * defaults. - */ - if (*sensitivity == 0 && value != 0) { - lnet_transaction_timeout = - LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT; - lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT; - lnet_set_lnd_timeout(); - /* if we're turning off health then use the no health timeout - * default. - */ - } else if (*sensitivity != 0 && value == 0) { - lnet_transaction_timeout = - LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT; + if (*sensitivity != 0 && value == 0 && lnet_retry_count != 0) { lnet_retry_count = 0; lnet_set_lnd_timeout(); } @@ -396,7 +381,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (value < lnet_retry_count || value == 0) { + if (value <= lnet_retry_count || value == 0) { mutex_unlock(&the_lnet.ln_api_mutex); CERROR("Invalid value for lnet_transaction_timeout (%lu). Has to be greater than lnet_retry_count (%u)\n", value, lnet_retry_count); @@ -437,9 +422,9 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (lnet_health_sensitivity == 0) { + if (lnet_health_sensitivity == 0 && value > 0) { mutex_unlock(&the_lnet.ln_api_mutex); - CERROR("Can not set retry_count when health feature is turned off\n"); + CERROR("Can not set lnet_retry_count when health feature is turned off\n"); return -EINVAL; }