From patchwork Thu Feb 27 21:07:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4DE0314BC for ; Thu, 27 Feb 2020 21:18:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 36A00246A1 for ; Thu, 27 Feb 2020 21:18:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 36A00246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 201C021FACA; Thu, 27 Feb 2020 13:18:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4B5F321F906 for ; Thu, 27 Feb 2020 13:18:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 37C006C5; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2DBBD46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:49 -0500 Message-Id: <1582838290-17243-2-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 001/622] lustre: always enable special debugging, fhandles, and quota support. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Lustre heavily depends on fhandles for its FID handling and needs quota always enabled. Signed-off-by: James Simmons --- fs/lustre/Kconfig | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/lustre/Kconfig b/fs/lustre/Kconfig index 2ea3f24..2eb7e45 100644 --- a/fs/lustre/Kconfig +++ b/fs/lustre/Kconfig @@ -9,6 +9,9 @@ config LUSTRE_FS select CRYPTO_SHA1 select CRYPTO_SHA256 select CRYPTO_SHA512 + select DEBUG_FS + select FHANDLE + select QUOTA depends on MULTIUSER help This option enables Lustre file system client support. Choose Y @@ -43,6 +46,7 @@ config LUSTRE_FS_POSIX_ACL config LUSTRE_DEBUG_EXPENSIVE_CHECK bool "Enable Lustre DEBUG checks" + select REFCOUNT_FULL depends on LUSTRE_FS help This option is mainly for debug purpose. 
It enables Lustre code to do From patchwork Thu Feb 27 21:07:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409645 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0447314BC for ; Thu, 27 Feb 2020 21:18:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E0CC1246A1 for ; Thu, 27 Feb 2020 21:18:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E0CC1246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6C8E021FA5C; Thu, 27 Feb 2020 13:18:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8DBDD21F906 for ; Thu, 27 Feb 2020 13:18:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 39A6B6C9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 308C846C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:50 -0500 Message-Id: <1582838290-17243-3-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 002/622] lustre: osc_cache: remove __might_sleep() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The patch 'simplify osc_wake_cache_waiters()' created a new wrapper wait_event_idle_exclusive_timeout_cmd() which includes a __might_sleep() test. This was causing the following backtrace: kernel: BUG: sleeping function called from invalid context at fs/lustre/osc/osc_cache.c:1635 kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 19374, name: cp kernel: INFO: lockdep is turned off. kernel: Preemption disabled at: kernel: [<0000000000000000>] 0x0 kernel: CPU: 11 PID: 19374 Comm: cp Tainted: G W 5.4.0-rc5+ #1 kernel: Call Trace: kernel: dump_stack+0x5e/0x8b kernel: ___might_sleep+0x205/0x260 kernel: osc_queue_async_io+0x1104/0x1de0 [osc] kernel: ? _raw_spin_unlock+0x2e/0x50 kernel: ? libcfs_debug_msg+0x6ab/0xc80 [libcfs] kernel: ? vvp_io_setattr_start+0x200/0x200 [lustre] kernel: osc_page_cache_add+0x2c/0xa0 [osc] kernel: osc_io_commit_async+0x1a8/0x420 [osc] kernel: cl_io_commit_async+0x58/0x80 [obdclass] kernel: ? vvp_io_setattr_start+0x200/0x200 [lustre:1 This can be called from an atomic context, and examining the code suggests we don't need __might_sleep(), so let's remove it.
Fixes: def8e96d4f3d ("lustre: osc_cache: simplify osc_wake_cache_waiters()") Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 3189eb3..2ed7ca2 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -1570,7 +1570,6 @@ static bool osc_enter_cache_try(struct client_obd *cli, cmd1, cmd2) \ ({ \ long __ret = timeout; \ - might_sleep(); \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_idle_exclusive_timeout_cmd( \ wq_head, condition, timeout, cmd1, cmd2); \ From patchwork Thu Feb 27 21:07:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409657 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EBE0D138D for ; Thu, 27 Feb 2020 21:18:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D41A7246A1 for ; Thu, 27 Feb 2020 21:18:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D41A7246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C19EF21FA68; Thu, 27 Feb 2020 13:18:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
CEB6E21F982 for ; Thu, 27 Feb 2020 13:18:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3C5356CA; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3174346D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:51 -0500 Message-Id: <1582838290-17243-4-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 003/622] lustre: uapi: remove enum hsm_progress_states X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" This enum is used only by server side code. 
Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 21 --------------------- 1 file changed, 21 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 0566afad..f5474c5 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -1532,27 +1532,6 @@ enum hsm_states { */ #define HSM_FLAGS_MASK (HSM_USER_MASK | HSM_STATUS_MASK) -/** - * HSM request progress state - */ -enum hsm_progress_states { - HPS_WAITING = 1, - HPS_RUNNING = 2, - HPS_DONE = 3, -}; - -#define HPS_NONE 0 - -static inline const char *hsm_progress_state2name(enum hsm_progress_states s) -{ - switch (s) { - case HPS_WAITING: return "waiting"; - case HPS_RUNNING: return "running"; - case HPS_DONE: return "done"; - default: return "unknown"; - } -} - struct hsm_extent { __u64 offset; __u64 length; From patchwork Thu Feb 27 21:07:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409647 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 569EA14BC for ; Thu, 27 Feb 2020 21:18:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3E454246A1 for ; Thu, 27 Feb 2020 21:18:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3E454246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6184E21FA8C; 
Thu, 27 Feb 2020 13:18:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2024221F982 for ; Thu, 27 Feb 2020 13:18:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3E5D66CB; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 34FF046F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:52 -0500 Message-Id: <1582838290-17243-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 004/622] lustre: uapi: sync enum obd_statfs_state X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" With the drift between the OpenSFS and Linux clients, various enum obd_statfs_state values that are transmitted over the wire were dropped. Sync the values.
Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index f5474c5..27501a2 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -101,9 +101,9 @@ enum obd_statfs_state { OS_STATE_DEGRADED = 0x00000001, /**< RAID degraded/rebuilding */ OS_STATE_READONLY = 0x00000002, /**< filesystem is read-only */ - OS_STATE_RDONLY_1 = 0x00000004, /**< obsolete 1.6, was EROFS=30 */ - OS_STATE_RDONLY_2 = 0x00000008, /**< obsolete 1.6, was EROFS=30 */ - OS_STATE_RDONLY_3 = 0x00000010, /**< obsolete 1.6, was EROFS=30 */ + OS_STATE_NOPRECREATE = 0x00000004, /**< no object precreation */ + OS_STATE_ENOSPC = 0x00000020, /**< not enough free space */ + OS_STATE_ENOINO = 0x00000040, /**< not enough inodes */ }; struct obd_statfs { From patchwork Thu Feb 27 21:07:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409651 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02EEF14BC for ; Thu, 27 Feb 2020 21:18:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DF92C246A1 for ; Thu, 27 Feb 2020 21:18:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DF92C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 317A521FABD; Thu, 27 Feb 2020 13:18:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6293421FA25 for ; Thu, 27 Feb 2020 13:18:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 439046CC; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3800E47C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:53 -0500 Message-Id: <1582838290-17243-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 005/622] lustre: llite: return compatible fsid for statfs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Fan Yong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Fan Yong Lustre uses a 64-bit inode number to identify an object on the client side. When Lustre is re-exported via NFS, NFS detects whether fsid is supported via statfs(). In the unsupported case, it only recognizes and packs the low 32 bits of the inode number into the NFS handle. Such a handle cannot be used to locate the object properly. To avoid patching the Linux kernel, the Lustre client should generate an fsid and return it via statfs() to the upper layer. To be compatible with old Lustre clients (NFS servers), the fsid is generated from super_block::s_dev.
WC-bug-id: https://jira.whamcloud.com/browse/LU-2904 Lustre-commit: abe4d83fab00 ("LU-2904 llite: return compatible fsid for statfs") Signed-off-by: Fan Yong Reviewed-on: http://review.whamcloud.com/7434 Reviewed-by: Bobi Jam Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 3 --- fs/lustre/llite/llite_lib.c | 8 ++++---- fs/lustre/llite/llite_nfs.c | 16 ---------------- 3 files changed, 4 insertions(+), 23 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index f0a50fc..3192340 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -538,8 +538,6 @@ struct ll_sb_info { /* st_blksize returned by stat(2), when non-zero */ unsigned int ll_stat_blksize; - __kernel_fsid_t ll_fsid; - struct kset ll_kset; /* sysfs object */ struct completion ll_kobj_unregister; }; @@ -941,7 +939,6 @@ static inline ssize_t ll_lov_user_md_size(const struct lov_user_md *lum) /* llite/llite_nfs.c */ extern const struct export_operations lustre_export_operations; u32 get_uuid2int(const char *name, int len); -void get_uuid2fsid(const char *name, int len, __kernel_fsid_t *fsid); struct inode *search_inode_for_lustre(struct super_block *sb, const struct lu_fid *fid); int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index a48d753..e1932ae 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -591,10 +591,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) * only a node-local comparison. 
*/ uuid = obd_get_uuid(sbi->ll_md_exp); - if (uuid) { + if (uuid) sb->s_dev = get_uuid2int(uuid->uuid, strlen(uuid->uuid)); - get_uuid2fsid(uuid->uuid, strlen(uuid->uuid), &sbi->ll_fsid); - } kfree(data); kfree(osfs); @@ -1775,6 +1773,7 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) { struct super_block *sb = de->d_sb; struct obd_statfs osfs; + u64 fsid = huge_encode_dev(sb->s_dev); int rc; CDEBUG(D_VFSTRACE, "VFS Op: at %llu jiffies\n", get_jiffies_64()); @@ -1805,7 +1804,8 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) sfs->f_blocks = osfs.os_blocks; sfs->f_bfree = osfs.os_bfree; sfs->f_bavail = osfs.os_bavail; - sfs->f_fsid = ll_s2sbi(sb)->ll_fsid; + sfs->f_fsid.val[0] = (u32)fsid; + sfs->f_fsid.val[1] = (u32)(fsid >> 32); return 0; } diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c index d6643d0..434f92b 100644 --- a/fs/lustre/llite/llite_nfs.c +++ b/fs/lustre/llite/llite_nfs.c @@ -57,22 +57,6 @@ u32 get_uuid2int(const char *name, int len) return (key0 << 1); } -void get_uuid2fsid(const char *name, int len, __kernel_fsid_t *fsid) -{ - u64 key = 0, key0 = 0x12a3fe2d, key1 = 0x37abe8f9; - - while (len--) { - key = key1 + (key0 ^ (*name++ * 7152373)); - if (key & 0x8000000000000000ULL) - key -= 0x7fffffffffffffffULL; - key1 = key0; - key0 = key; - } - - fsid->val[0] = key; - fsid->val[1] = key >> 32; -} - struct inode *search_inode_for_lustre(struct super_block *sb, const struct lu_fid *fid) { From patchwork Thu Feb 27 21:07:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409661 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C4F9D14BC for ; Thu, 27 Feb 2020 21:18:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AD4B6246A1 for ; Thu, 27 Feb 2020 21:18:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AD4B6246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E495F21FB04; Thu, 27 Feb 2020 13:18:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B93DD21FA25 for ; Thu, 27 Feb 2020 13:18:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 45E806CD; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3AE2D496; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:54 -0500 Message-Id: <1582838290-17243-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 006/622] lustre: ldlm: Make kvzalloc | kvfree use consistent X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "Christopher J. 
Morrone" struct ldlm_lock's l_lvb_data field is freed in ldlm_lock_put() using kfree. However, some other code paths can attach a buffer to l_lvb_data that was allocated using vmalloc(). This can lead to a kfree() of a vmalloc()ed buffer, which can trigger a kernel Oops. WC-bug-id: https://jira.whamcloud.com/browse/LU-4194 Lustre-commit: 9c4d506c5fea ("LU-4194 ldlm: Make OBD_[ALLOC|FREE]_LARGE use consistent") Signed-off-by: Christopher J. Morrone Reviewed-on: http://review.whamcloud.com/8298 Reviewed-by: Andreas Dilger Reviewed-by: Faccini Bruno Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 6eebf5f..7242cd1 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -185,7 +185,7 @@ void ldlm_lock_put(struct ldlm_lock *lock) lock->l_export = NULL; } - kfree(lock->l_lvb_data); + kvfree(lock->l_lvb_data); lu_ref_fini(&lock->l_reference); OBD_FREE_RCU(lock, sizeof(*lock), &lock->l_handle); @@ -1548,7 +1548,7 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns, if (lvb_len) { lock->l_lvb_len = lvb_len; - lock->l_lvb_data = kzalloc(lvb_len, GFP_NOFS); + lock->l_lvb_data = kvzalloc(lvb_len, GFP_NOFS); if (!lock->l_lvb_data) { rc = -ENOMEM; goto out; From patchwork Thu Feb 27 21:07:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409655 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB2BF138D for ; Thu, 27 Feb 2020 21:18:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 938FC246A1 
for ; Thu, 27 Feb 2020 21:18:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 938FC246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9BC1621FAC8; Thu, 27 Feb 2020 13:18:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 497CF21FA41 for ; Thu, 27 Feb 2020 13:18:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 481D66CF; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3DA57498; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:55 -0500 Message-Id: <1582838290-17243-8-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 007/622] lustre: llite: limit smallest max_cached_mb value X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Nunez , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: James Nunez Currently, ost-survey hangs due to calling 'lfs setstripe' in an old (positional) style and setting max_cached_mb to zero. 
In ll_max_cached_mb_seq_write(), the number of pages requested is set to the max of pages requested or PTLRPC_MAX_BRW_PAGES to allow the client to make well formed RPCs. WC-bug-id: https://jira.whamcloud.com/browse/LU-4768 Lustre-commit: 46bec835ac72 ("LU-4768 tests: Update ost-survey script") Signed-off-by: James Nunez Reviewed-on: http://review.whamcloud.com/11971 Reviewed-by: Nathaniel Clark Reviewed-by: Cliff White Reviewed-by: Jian Yu Reviewed-by: Jinshan Xiong Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/lproc_llite.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index e108326..5ac6689 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -527,6 +527,8 @@ static ssize_t ll_max_cached_mb_seq_write(struct file *file, totalram_pages() >> (20 - PAGE_SHIFT)); return -ERANGE; } + /* Allow enough cache so clients can make well-formed RPCs */ + pages_number = max_t(long, pages_number, PTLRPC_MAX_BRW_PAGES); spin_lock(&sbi->ll_lock); diff = pages_number - cache->ccc_lru_max; From patchwork Thu Feb 27 21:07:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409665 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0911614BC for ; Thu, 27 Feb 2020 21:18:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E5C54246A1 for ; Thu, 27 Feb 2020 21:18:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E5C54246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D724E21FB27; Thu, 27 Feb 2020 13:18:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 08B3D21FA41 for ; Thu, 27 Feb 2020 13:18:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 481406CE; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 41E6C468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:56 -0500 Message-Id: <1582838290-17243-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 008/622] lustre: obdecho: turn on async flag only for mode 3 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Rahul Deshmukh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Rahul Deshmukh There are a couple of problems in obdfilter-survey: - The brw test type, i.e. "g", was not followed by npages, - The target netdisk was not set properly, and - The async flag was not restricted to mode 3. This patch fixes the last problem, which is on the kernel side.
WC-bug-id: https://jira.whamcloud.com/browse/LU-5031 Lustre-commit: 9f38647a7b24 ("LU-5031 tests: obdfilter-survey fixes") Signed-off-by: Rahul Deshmukh Reviewed-on: http://review.whamcloud.com/10264 Reviewed-by: Cliff White Reviewed-by: Bob Glossman Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdecho/echo_client.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index ca963bb..3984cb4 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1425,7 +1425,7 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, struct obdo *oa = &data->ioc_obdo1; struct echo_object *eco; int rc; - int async = 1; + int async = 0; long test_mode; LASSERT(oa->o_valid & OBD_MD_FLGROUP); @@ -1438,14 +1438,14 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, /* OFD/obdfilter works only via prep/commit */ test_mode = (long)data->ioc_pbuf1; - if (test_mode == 1) - async = 0; - if (!ed->ed_next && test_mode != 3) { test_mode = 3; data->ioc_plen1 = data->ioc_count; } + if (test_mode == 3) + async = 1; + /* Truncate batch size to maximum */ if (data->ioc_plen1 > PTLRPC_MAX_BRW_SIZE) data->ioc_plen1 = PTLRPC_MAX_BRW_SIZE; From patchwork Thu Feb 27 21:07:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409659 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 058B514BC for ; Thu, 27 Feb 2020 21:18:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DEC51246A1 for ; Thu, 27 Feb 2020 21:18:39 +0000 (UTC) 
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DEC51246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D4DFC21FAD5; Thu, 27 Feb 2020 13:18:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B8A421FA46 for ; Thu, 27 Feb 2020 13:18:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 49B066D0; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4531F46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:57 -0500 Message-Id: <1582838290-17243-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 009/622] lustre: llite: reorganize variable and data structures X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" This patch covers the bits missed in the patch series "Lustre IO stack simplifications and cleanups" from the OpenSFS branch for the LU-5971 work. 
Details of the original push can be viewed at https://lore.kernel.org/patchwork/cover/662900. No Fixes: tag is provided since the staging patch series was broken up into a much larger patch set. WC-bug-id: https://jira.whamcloud.com/browse/LU-5971 Lustre-commit: 6eda93c7b5f6 ("LU-5971 llite: reorganize variable and data structures") Signed-off-by: John L. Hammond Signed-off-by: Jinshan Xiong Reviewed-on: http://review.whamcloud.com/13714 Reviewed-by: Bobi Jam Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 1 + fs/lustre/llite/glimpse.c | 1 + fs/lustre/llite/lcommon_cl.c | 5 ++--- fs/lustre/llite/lcommon_misc.c | 24 ++++++++++++------------ fs/lustre/llite/llite_internal.h | 8 ++++---- fs/lustre/llite/llite_lib.c | 4 ++-- fs/lustre/llite/super25.c | 1 + fs/lustre/llite/vvp_dev.c | 1 + fs/lustre/llite/vvp_internal.h | 13 +++---------- fs/lustre/llite/vvp_io.c | 4 ++-- 10 files changed, 29 insertions(+), 33 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index fe4340d..fe965b1 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -49,6 +49,7 @@ #include #include "llite_internal.h" +#include "vvp_internal.h" struct split_param { struct inode *sp_inode; diff --git a/fs/lustre/llite/glimpse.c b/fs/lustre/llite/glimpse.c index de1a31f..3441904 100644 --- a/fs/lustre/llite/glimpse.c +++ b/fs/lustre/llite/glimpse.c @@ -47,6 +47,7 @@ #include #include "llite_internal.h" +#include "vvp_internal.h" static const struct cl_lock_descr whole_file = { .cld_start = 0, diff --git a/fs/lustre/llite/lcommon_cl.c b/fs/lustre/llite/lcommon_cl.c index 988855b..978e05b 100644 --- a/fs/lustre/llite/lcommon_cl.c +++ b/fs/lustre/llite/lcommon_cl.c @@ -30,8 +30,6 @@ * This file is part of Lustre, http://www.lustre.org/ * Lustre is a trademark of Sun Microsystems, Inc. * - * cl code used by vvp (and other Lustre clients in the future). 
- * * Author: Nikita Danilov */ @@ -63,6 +61,7 @@ * Vvp device and device type functions. * */ +#include "vvp_internal.h" /** * An `emergency' environment used by cl_inode_fini() when cl_env_get() @@ -282,7 +281,7 @@ u64 cl_fid_build_ino(const struct lu_fid *fid, bool api32) return fid_flatten(fid); } -/** +/* * build inode generation from passed @fid. If our FID overflows the 32-bit * inode number then return a non-zero generation to distinguish them. */ diff --git a/fs/lustre/llite/lcommon_misc.c b/fs/lustre/llite/lcommon_misc.c index 29daf5b..48503d6 100644 --- a/fs/lustre/llite/lcommon_misc.c +++ b/fs/lustre/llite/lcommon_misc.c @@ -46,7 +46,7 @@ * maximum-sized (= maximum striped) EA and cookie without having to * calculate this (via a call into the LOV + OSCs) each time we make an RPC. */ -int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp) +static int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp) { u32 val_size, max_easize, def_easize; int rc; @@ -115,7 +115,7 @@ int cl_ocd_update(struct obd_device *host, struct obd_device *watched, #define GROUPLOCK_SCOPE "grouplock" int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, - struct ll_grouplock *cg) + struct ll_grouplock *lg) { struct lu_env *env; struct cl_io *io; @@ -160,22 +160,22 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, return rc; } - cg->lg_env = env; - cg->lg_io = io; - cg->lg_lock = lock; - cg->lg_gid = gid; + lg->lg_env = env; + lg->lg_io = io; + lg->lg_lock = lock; + lg->lg_gid = gid; return 0; } -void cl_put_grouplock(struct ll_grouplock *cg) +void cl_put_grouplock(struct ll_grouplock *lg) { - struct lu_env *env = cg->lg_env; - struct cl_io *io = cg->lg_io; - struct cl_lock *lock = cg->lg_lock; + struct lu_env *env = lg->lg_env; + struct cl_io *io = lg->lg_io; + struct cl_lock *lock = lg->lg_lock; - LASSERT(cg->lg_env); - LASSERT(cg->lg_gid); + LASSERT(lg->lg_env); + LASSERT(lg->lg_gid); 
cl_lock_release(env, lock); cl_io_fini(env, io); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 3192340..fbe93a4 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -707,7 +707,6 @@ static inline bool ll_sbi_has_tiny_write(struct ll_sb_info *sbi) void ll_ras_enter(struct file *f); /* llite/lcommon_misc.c */ -int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp); int cl_ocd_update(struct obd_device *host, struct obd_device *watched, enum obd_notify_event ev, void *owner); int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, @@ -975,9 +974,9 @@ struct ll_cl_context { struct ll_thread_info { struct iov_iter lti_iter; - struct vvp_io_args lti_args; - struct ra_io_arg lti_ria; - struct ll_cl_context lti_io_ctx; + struct vvp_io_args lti_args; + struct ra_io_arg lti_ria; + struct ll_cl_context lti_io_ctx; }; extern struct lu_context_key ll_thread_key; @@ -1165,6 +1164,7 @@ struct ll_statahead_info { blkcnt_t dirty_cnt(struct inode *inode); int __cl_glimpse_size(struct inode *inode, int agl); + int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io, struct inode *inode, struct cl_object *clob, int agl); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index e1932ae..aaa8ad2 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2542,7 +2542,7 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret) { char *buf, *path = NULL; struct dentry *dentry = NULL; - struct vvp_object *obj = cl_inode2vvp(page->mapping->host); + struct inode *inode = page->mapping->host; /* this can be called inside spin lock so use GFP_ATOMIC. 
*/ buf = (char *)__get_free_page(GFP_ATOMIC); @@ -2556,7 +2556,7 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret) "%s: dirty page discard: %s/fid: " DFID "/%s may get corrupted (rc %d)\n", ll_get_fsname(page->mapping->host->i_sb, NULL, 0), s2lsi(page->mapping->host->i_sb)->lsi_lmd->lmd_dev, - PFID(&obj->vob_header.coh_lu.loh_fid), + PFID(ll_inode2fid(inode)), (path && !IS_ERR(path)) ? path : "", ioret); if (dentry) diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c index 2b65e2f..133fe2a 100644 --- a/fs/lustre/llite/super25.c +++ b/fs/lustre/llite/super25.c @@ -42,6 +42,7 @@ #include #include #include "llite_internal.h" +#include "vvp_internal.h" static struct kmem_cache *ll_inode_cachep; diff --git a/fs/lustre/llite/vvp_dev.c b/fs/lustre/llite/vvp_dev.c index 9f793e9..e1d87f9 100644 --- a/fs/lustre/llite/vvp_dev.c +++ b/fs/lustre/llite/vvp_dev.c @@ -93,6 +93,7 @@ static void *ll_thread_key_init(const struct lu_context *ctx, info = kmem_cache_zalloc(ll_thread_kmem, GFP_NOFS); if (!info) info = ERR_PTR(-ENOMEM); + return info; } diff --git a/fs/lustre/llite/vvp_internal.h b/fs/lustre/llite/vvp_internal.h index 96f10d2..7a463cb 100644 --- a/fs/lustre/llite/vvp_internal.h +++ b/fs/lustre/llite/vvp_internal.h @@ -166,7 +166,7 @@ static inline struct cl_io *vvp_env_thread_io(const struct lu_env *env) } struct vvp_session { - struct vvp_io cs_ios; + struct vvp_io vs_ios; }; static inline struct vvp_session *vvp_env_session(const struct lu_env *env) @@ -181,11 +181,11 @@ static inline struct vvp_session *vvp_env_session(const struct lu_env *env) static inline struct vvp_io *vvp_env_io(const struct lu_env *env) { - return &vvp_env_session(env)->cs_ios; + return &vvp_env_session(env)->vs_ios; } /** - * ccc-private object state. + * VPP-private object state. 
*/ struct vvp_object { struct cl_object_header vob_header; @@ -246,13 +246,6 @@ struct vvp_device { struct cl_device *vdv_next; }; -void *ccc_key_init(const struct lu_context *ctx, - struct lu_context_key *key); -void ccc_key_fini(const struct lu_context *ctx, - struct lu_context_key *key, void *data); - -void ccc_umount(const struct lu_env *env, struct cl_device *dev); - static inline struct lu_device *vvp2lu_dev(struct vvp_device *vdv) { return &vdv->vdv_cl.cd_lu_dev; diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 6145064..37bf942 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -416,10 +416,10 @@ static enum cl_lock_mode vvp_mode_from_vma(struct vm_area_struct *vma) static int vvp_mmap_locks(const struct lu_env *env, struct vvp_io *vio, struct cl_io *io) { - struct vvp_thread_info *cti = vvp_env_info(env); + struct vvp_thread_info *vti = vvp_env_info(env); struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - struct cl_lock_descr *descr = &cti->vti_descr; + struct cl_lock_descr *descr = &vti->vti_descr; union ldlm_policy_data policy; unsigned long addr; ssize_t count; From patchwork Thu Feb 27 21:07:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409663 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A05ED138D for ; Thu, 27 Feb 2020 21:18:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 89031246A1 for ; Thu, 27 Feb 2020 21:18:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 89031246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C9CAF21FB18; Thu, 27 Feb 2020 13:18:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E5B3C21FA4E for ; Thu, 27 Feb 2020 13:18:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4B3CE6D1; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4825D46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:58 -0500 Message-Id: <1582838290-17243-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 010/622] lustre: llite: increase whole-file readahead to RPC size X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Increase the default whole-file readahead limit to match the current RPC size. That ensures that files smaller than the RPC size will be read in a single round-trip instead of sending multiple smaller RPCs. 
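As a rough worked example of the sizing rule described above (a user-space sketch, not the kernel code): when the limit has not been set explicitly, it becomes the larger of the existing default and the RPC size expressed in pages. The function name ra_whole_pages() and the sample numbers in the note below are assumptions for illustration; in the patch the default comes from SBI_DEFAULT_READAHEAD_WHOLE_MAX, the RPC size from ocd_brw_size, and -1 marks a value that was not specified in the config log.

```c
/*
 * Sketch of the whole-file readahead sizing. cur_pages == -1 means the
 * limit was not specified in the config log; any other value is kept.
 * All parameter names are hypothetical.
 */
static long ra_whole_pages(long cur_pages, unsigned long default_pages,
			   unsigned long brw_size, unsigned int page_shift)
{
	/* RPC size in bytes converted to a page count. */
	unsigned long rpc_pages = brw_size >> page_shift;

	if (cur_pages != -1)
		return cur_pages;

	return (long)(rpc_pages > default_pages ? rpc_pages : default_pages);
}
```

For instance, with 4 KiB pages (page_shift of 12) and a 4 MiB RPC size, rpc_pages is 1024, so an unset limit becomes 1024 pages unless the default is already larger.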
WC-bug-id: https://jira.whamcloud.com/browse/LU-7990 Lustre-commit: 627d0133d9d7 ("LU-7990 llite: increase whole-file readahead to RPC size") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/26955 Reviewed-by: Patrick Farrell Reviewed-by: Dmitry Eremin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index aaa8ad2..12aafe0 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -465,6 +465,12 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) sbi->ll_dt_exp->exp_connect_data = *data; + /* Don't change value if it was specified in the config log */ + if (sbi->ll_ra_info.ra_max_read_ahead_whole_pages == -1) + sbi->ll_ra_info.ra_max_read_ahead_whole_pages = + max_t(unsigned long, SBI_DEFAULT_READAHEAD_WHOLE_MAX, + (data->ocd_brw_size >> PAGE_SHIFT)); + err = obd_fid_init(sbi->ll_dt_exp->exp_obd, sbi->ll_dt_exp, LUSTRE_SEQ_METADATA); if (err) { From patchwork Thu Feb 27 21:07:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409667 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EA9A614BC for ; Thu, 27 Feb 2020 21:18:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D30FA246A1 for ; Thu, 27 Feb 2020 21:18:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D30FA246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BDD2E21FA61; Thu, 27 Feb 2020 13:18:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36DEA21FA4E for ; Thu, 27 Feb 2020 13:18:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4F5E86D3; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4B21D46D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:59 -0500 Message-Id: <1582838290-17243-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 011/622] lustre: llite: handle ORPHAN/DEAD directories X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Di Wang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Di Wang Don't set the directory MDS striping if the parent is dead. To test that this works, add the OBD_FAIL_LLITE_NO_CHECK_DEAD fault injection point. WC-bug-id: https://jira.whamcloud.com/browse/LU-7579 Lustre-commit: 098fb363c39 ("LU-7579 osd: move ORPHAN/DEAD flag to OSD") Signed-off-by: Di Wang Reviewed-on: http://review.whamcloud.com/18024 Reviewed-by: John L. 
Hammond Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/llite/dir.c | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index e10b372..653a456 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -442,6 +442,7 @@ #define OBD_FAIL_LLITE_XATTR_ENOMEM 0x1405 #define OBD_FAIL_MAKE_LOVEA_HOLE 0x1406 #define OBD_FAIL_LLITE_LOST_LAYOUT 0x1407 +#define OBD_FAIL_LLITE_NO_CHECK_DEAD 0x1408 #define OBD_FAIL_GETATTR_DELAY 0x1409 #define OBD_FAIL_LLITE_CREATE_NODE_PAUSE 0x140c #define OBD_FAIL_LLITE_IMUTEX_SEC 0x140e diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index d3ef669..f21727b 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -433,6 +433,10 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, !(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_DIR_STRIPE)) return -EINVAL; + if (IS_DEADDIR(parent) && + !OBD_FAIL_CHECK(OBD_FAIL_LLITE_NO_CHECK_DEAD)) + return -ENOENT; + if (lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC) && lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC_SPECIFIC)) lustre_swab_lmv_user_md(lump); From patchwork Thu Feb 27 21:08:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409669 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8FA26138D for ; Thu, 27 Feb 2020 21:18:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7845B246A1 for ; Thu, 27 Feb 2020 21:18:52 +0000 (UTC) DMARC-Filter: OpenDMARC 
Filter v1.3.2 mail.kernel.org 7845B246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E5EFA21FA63; Thu, 27 Feb 2020 13:18:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D2EF21FA4E for ; Thu, 27 Feb 2020 13:18:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 515F76D7; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4DF5446F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:00 -0500 Message-Id: <1582838290-17243-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 012/622] lustre: lov: protected ost pool count updation X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jadhav Vikram , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jadhav Vikram ASSERTION(iter->lpi_idx <= ((iter->lpi_pool)->pool_obds.op_count)) was triggered because reads of the OST pool count were not protected in pool_proc_next and pool_proc_show; pool_proc_show was called when op_count was zero. 
Fix this by taking the lock in the sequence start function pool_proc_start and releasing it in pool_proc_stop. Rather than using down_read / up_read pairs around pool_proc_next and pool_proc_show, this change makes sure the OST pool data is protected throughout the sequence operation. Seagate-bug-id: MRP-3629 WC-bug-id: https://jira.whamcloud.com/browse/LU-9620 Lustre-commit: 61c803319b91 ("LU-9620 lod: protected ost pool count updation") Signed-off-by: Jadhav Vikram Reviewed-by: Ashish Purkar Reviewed-by: Vladimir Saveliev Reviewed-on: https://review.whamcloud.com/27506 Reviewed-by: Fan Yong Reviewed-by: Niu Yawei Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_pool.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c index 60565b9..a0552fb 100644 --- a/fs/lustre/lov/lov_pool.c +++ b/fs/lustre/lov/lov_pool.c @@ -117,14 +117,11 @@ static void *pool_proc_next(struct seq_file *s, void *v, loff_t *pos) /* iterate to find a non empty entry */ prev_idx = iter->idx; - down_read(&pool_tgt_rw_sem(iter->pool)); iter->idx++; - if (iter->idx == pool_tgt_count(iter->pool)) { + if (iter->idx >= pool_tgt_count(iter->pool)) { iter->idx = prev_idx; /* we stay on the last entry */ - up_read(&pool_tgt_rw_sem(iter->pool)); return NULL; } - up_read(&pool_tgt_rw_sem(iter->pool)); (*pos)++; /* return != NULL to continue */ return iter; @@ -157,6 +154,7 @@ static void *pool_proc_start(struct seq_file *s, loff_t *pos) */ /* /!\ do not forget to restore it to pool before freeing it */ s->private = iter; + down_read(&pool_tgt_rw_sem(pool)); if (*pos > 0) { loff_t i; void *ptr; @@ -179,6 +177,7 @@ static void pool_proc_stop(struct seq_file *s, void *v) * we have to free only if s->private is an iterator */ if ((iter) && (iter->magic == POOL_IT_MAGIC)) { + up_read(&pool_tgt_rw_sem(iter->pool)); /* we restore s->private so next call to pool_proc_start() * will work */ @@ -197,9 
+196,7 @@ static int pool_proc_show(struct seq_file *s, void *v) LASSERT(iter->pool); LASSERT(iter->idx <= pool_tgt_count(iter->pool)); - down_read(&pool_tgt_rw_sem(iter->pool)); tgt = pool_tgt(iter->pool, iter->idx); - up_read(&pool_tgt_rw_sem(iter->pool)); if (tgt) seq_printf(s, "%s\n", obd_uuid2str(&tgt->ltd_uuid)); From patchwork Thu Feb 27 21:08:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409671 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD0C014BC for ; Thu, 27 Feb 2020 21:18:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C1D11246A1 for ; Thu, 27 Feb 2020 21:18:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C1D11246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 93B5321FB4F; Thu, 27 Feb 2020 13:18:47 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB83F21FA4E for ; Thu, 27 Feb 2020 13:18:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 539496D9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 50F9747C; 
Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:01 -0500 Message-Id: <1582838290-17243-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 013/622] lustre: obdclass: fix llog_cat_cleanup() usage on Client X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini With patch/commit 3a83b4b9 for LU-5195, the LLOG code has been strengthened against catalog inconsistency: it detects when a referenced plain LLOG is missing and clears the associated entry by calling llog_cat_cleanup(). That function now needs to handle the case where it is also executed on a client (i.e., cathandle->lgh_obj == NULL) and thus must not attempt to update the on-disk catalog. WC-bug-id: https://jira.whamcloud.com/browse/LU-6471 Lustre-commit: 485f3ba87433 ("LU-6471 obdclass: fix llog_cat_cleanup() usage on Client") Signed-off-by: Bruno Faccini Reviewed-on: http://review.whamcloud.com/14489 Reviewed-by: Alex Zhuravlev Reviewed-by: John L. 
Hammond Reviewed-by: Mikhail Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/llog_cat.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/lustre/obdclass/llog_cat.c b/fs/lustre/obdclass/llog_cat.c index 580d807..ca97e08 100644 --- a/fs/lustre/obdclass/llog_cat.c +++ b/fs/lustre/obdclass/llog_cat.c @@ -133,10 +133,8 @@ int llog_cat_close(const struct lu_env *env, struct llog_handle *cathandle) list_del_init(&loghandle->u.phd.phd_entry); llog_close(env, loghandle); } - /* if handle was stored in ctxt, remove it too */ - if (cathandle->lgh_ctxt->loc_handle == cathandle) - cathandle->lgh_ctxt->loc_handle = NULL; - return llog_close(env, cathandle); + + return 0; } EXPORT_SYMBOL(llog_cat_close); From patchwork Thu Feb 27 21:08:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409649 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F1B3138D for ; Thu, 27 Feb 2020 21:18:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 47865246A1 for ; Thu, 27 Feb 2020 21:18:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 47865246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A5FA121FA4B; Thu, 27 Feb 2020 13:18:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 17EE321FA60 for ; Thu, 27 Feb 2020 13:18:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 55C936DA; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 54011468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:02 -0500 Message-Id: <1582838290-17243-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 014/622] lustre: mdc: fix possible NULL pointer dereference X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Fix two static analysis errors. fs/lustre/mdc/mdc_dev.c: in mdc_enqueue_send(), pointer 'matched' returned from the call to function 'ldlm_handle2lock' at line 704 may be NULL and will be dereferenced at line 705. If the client is evicted between ldlm_lock_match() and ldlm_handle2lock(), the lock pointer could be NULL. fs/lustre/lov/lov_dev.c:488 in lov_process_config, sscanf format specification '%d' expects type 'int' for 'd', but parameter 3 has a different type '__u32'. 
Converting to kstrtou32() requires changing the "index" variable type from __u32 to u32, which is fine since it is only used internally. Also fix up the few functions that are passing "__u32 index" and the resulting checkpatch.pl warnings. WC-bug-id: https://jira.whamcloud.com/browse/LU-10264 Lustre-commit: b89206476174 ("LU-10264 mdc: fix possible NULL pointer dereference") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31621 Reviewed-by: Dmitry Eremin Reviewed-by: Bob Glossman Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_obd.c | 45 ++++++++++++++++++++++++--------------------- fs/lustre/mdc/mdc_dev.c | 2 +- 2 files changed, 25 insertions(+), 22 deletions(-) diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 1708fa9..26637bc 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -312,7 +312,8 @@ static int lov_disconnect(struct obd_export *exp) { struct obd_device *obd = class_exp2obd(exp); struct lov_obd *lov = &obd->u.lov; - int i, rc; + u32 index; + int rc; if (!lov->lov_tgts) goto out; @@ -321,19 +322,19 @@ static int lov_disconnect(struct obd_export *exp) lov->lov_connects--; if (lov->lov_connects != 0) { /* why should there be more than 1 connect? 
*/ - CERROR("disconnect #%d\n", lov->lov_connects); + CWARN("%s: unexpected disconnect #%d\n", + obd->obd_name, lov->lov_connects); goto out; } - /* Let's hold another reference so lov_del_obd doesn't spin through - * putref every time - */ + /* hold another ref so lov_del_obd() doesn't spin in putref each time */ lov_tgts_getref(obd); - for (i = 0; i < lov->desc.ld_tgt_count; i++) { - if (lov->lov_tgts[i] && lov->lov_tgts[i]->ltd_exp) { - /* Disconnection is the last we know about an obd */ - lov_del_target(obd, i, NULL, lov->lov_tgts[i]->ltd_gen); + for (index = 0; index < lov->desc.ld_tgt_count; index++) { + if (lov->lov_tgts[index] && lov->lov_tgts[index]->ltd_exp) { + /* Disconnection is the last we know about an OBD */ + lov_del_target(obd, index, NULL, + lov->lov_tgts[index]->ltd_gen); } } @@ -490,13 +491,12 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, uuidp->uuid, index, gen, active); if (gen <= 0) { - CERROR("request to add OBD %s with invalid generation: %d\n", - uuidp->uuid, gen); + CERROR("%s: request to add '%s' with invalid generation: %d\n", + obd->obd_name, uuidp->uuid, gen); return -EINVAL; } - tgt_obd = class_find_client_obd(uuidp, LUSTRE_OSC_NAME, - &obd->obd_uuid); + tgt_obd = class_find_client_obd(uuidp, LUSTRE_OSC_NAME, &obd->obd_uuid); if (!tgt_obd) return -EINVAL; @@ -504,10 +504,11 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, if ((index < lov->lov_tgt_size) && lov->lov_tgts[index]) { tgt = lov->lov_tgts[index]; - CERROR("UUID %s already assigned at LOV target index %d\n", - obd_uuid2str(&tgt->ltd_uuid), index); + rc = -EEXIST; + CERROR("%s: UUID %s already assigned at index %d: rc = %d\n", + obd->obd_name, obd_uuid2str(&tgt->ltd_uuid), index, rc); mutex_unlock(&lov->lov_lock); - return -EEXIST; + return rc; } if (index >= lov->lov_tgt_size) { @@ -602,8 +603,8 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, out: if (rc) { - CERROR("add failed (%d), 
deleting %s\n", rc, - obd_uuid2str(&tgt->ltd_uuid)); + CERROR("%s: add failed, deleting %s: rc = %d\n", + obd->obd_name, obd_uuid2str(&tgt->ltd_uuid), rc); lov_del_target(obd, index, NULL, 0); } lov_tgts_putref(obd); @@ -860,6 +861,7 @@ int lov_process_config_base(struct obd_device *obd, struct lustre_cfg *lcfg, case LCFG_LOV_DEL_OBD: { u32 index; int gen; + /* lov_modify_tgts add 0:lov_mdsA 1:ost1_UUID 2:0 3:1 */ if (LUSTRE_CFG_BUFLEN(lcfg, 1) > sizeof(obd_uuid.uuid)) { rc = -EINVAL; @@ -868,11 +870,11 @@ int lov_process_config_base(struct obd_device *obd, struct lustre_cfg *lcfg, obd_str2uuid(&obd_uuid, lustre_cfg_buf(lcfg, 1)); - rc = kstrtoint(lustre_cfg_buf(lcfg, 2), 10, indexp); - if (rc < 0) + rc = kstrtou32(lustre_cfg_buf(lcfg, 2), 10, indexp); + if (rc) goto out; rc = kstrtoint(lustre_cfg_buf(lcfg, 3), 10, genp); - if (rc < 0) + if (rc) goto out; index = *indexp; gen = *genp; @@ -882,6 +884,7 @@ int lov_process_config_base(struct obd_device *obd, struct lustre_cfg *lcfg, rc = lov_add_target(obd, &obd_uuid, index, gen, 0); else rc = lov_del_target(obd, index, &obd_uuid, gen); + goto out; } case LCFG_PARAM: { diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index ca0822d..80e3120 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -684,7 +684,7 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, return ELDLM_OK; matched = ldlm_handle2lock(&lockh); - if (ldlm_is_kms_ignore(matched)) + if (!matched || ldlm_is_kms_ignore(matched)) goto no_match; if (mdc_set_dom_lock_data(env, matched, einfo->ei_cbdata)) { From patchwork Thu Feb 27 21:08:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409675 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 29310138D for ; Thu, 27 Feb 2020 21:19:02 +0000 
(UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 11856246A1 for ; Thu, 27 Feb 2020 21:19:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 11856246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6BA0321FB2A; Thu, 27 Feb 2020 13:18:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6EBF221FA67 for ; Thu, 27 Feb 2020 13:18:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 585EA6DF; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 572BB46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:03 -0500 Message-Id: <1582838290-17243-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 015/622] lustre: obdclass: allow specifying complex jobids X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Andreas Dilger

Allow specifying a format string for the jobid_name variable to create a jobid for processes on the client. The jobid_name is used when jobid_var=nodelocal, if jobid_name contains "%j", or as a fallback if getting the specified jobid_var from the environment fails.

The jobid_name string allows the following escape sequences:

    %e = executable name
    %g = group ID
    %h = hostname (system utsname)
    %j = jobid from jobid_var environment variable
    %p = process ID
    %u = user ID

Any unknown escape sequences are dropped. Other arbitrary characters pass through unmodified, up to the maximum jobid string size of 32, though whitespace within the jobid is not copied.

This allows, for example, specifying an arbitrary prefix, such as the cluster name, in addition to the traditional "procname.uid" format, to distinguish between jobs running on clients in different clusters:

    lctl set_param jobid_var=nodelocal jobid_name=cluster2.%e.%u
or
    lctl set_param jobid_var=SLURM_JOB_ID jobid_name=cluster2.%j.%e

To use an environment-specified JobID, if available, but fall back to a static string for all processes that do not have a valid JobID:

    lctl set_param jobid_var=SLURM_JOB_ID jobid_name=unknown

Implementation notes:

The LUSTRE_JOBID_SIZE includes a trailing NUL, so don't use "LUSTRE_JOBID_SIZE + 1" anywhere, as that is misleading. Rename the "obd_jobid_node" variable to "obd_jobid_name" to match the sysfs "jobid_name" parameter name and avoid confusion. Rename "struct jobid_to_pid_map" to "jobid_pid_map" since this is not actually mapping from a jobid *to* a PID, but the reverse. Save the jobid length, and reorder fields to avoid holes in the structure. Consolidate PID->jobid cache handling in jobid_get_from_cache(), which only does environment lookups and caches the results.
The fallback to using obd_jobid_name is handled by the caller. Rename check_job_name() to jobid_name_is_valid(), since that makes it clear to the reader a "true" return is a valid name. In jobid_cache_init() there is no benefit for locking the jobid_hash creation, since the spinlock is just initialized in this function, so multiple callers of this function would already be broken. Pass the buffer size from the callers (who know the buffer size) to lustre_get_jobid() instead of assuming it is LUSTRE_JOBID_SIZE. WC-bug-id: https://jira.whamcloud.com/browse/LU-10698 Lustre-commit: 6488c0ec57de ("LU-10698 obdclass: allow specifying complex jobids") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31691 Reviewed-by: Jinshan Xiong Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 4 +- fs/lustre/llite/llite_internal.h | 4 +- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/llite/vvp_io.c | 2 +- fs/lustre/llite/vvp_object.c | 3 +- fs/lustre/obdclass/jobid.c | 95 +++++++++++++++++++++++++++++++--- fs/lustre/obdclass/obd_sysfs.c | 10 ++-- fs/lustre/ptlrpc/pack_generic.c | 4 +- include/uapi/linux/lustre/lustre_idl.h | 2 +- 9 files changed, 105 insertions(+), 21 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 9e07853..146c37e 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -54,7 +54,7 @@ /* OBD Operations Declarations */ struct obd_device *class_exp2obd(struct obd_export *exp); int class_handle_ioctl(unsigned int cmd, unsigned long arg); -int lustre_get_jobid(char *jobid); +int lustre_get_jobid(char *jobid, size_t len); struct lu_device_type; @@ -1672,7 +1672,7 @@ static inline void class_uuid_unparse(class_uuid_t uu, struct obd_uuid *out) int class_check_uuid(struct obd_uuid *uuid, u64 nid); /* class_obd.c */ -extern char obd_jobid_node[]; +extern char obd_jobid_name[]; int class_procfs_init(void); int 
class_procfs_clean(void); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index fbe93a4..d0a703d 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -195,11 +195,11 @@ struct ll_inode_info { int lli_async_rc; /* - * whenever a process try to read/write the file, the + * Whenever a process try to read/write the file, the * jobid of the process will be saved here, and it'll * be packed into the write PRC when flush later. * - * so the read/write statistics for jobid will not be + * So the read/write statistics for jobid will not be * accurate if the file is shared by different jobs. */ char lli_jobid[LUSTRE_JOBID_SIZE]; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 12aafe0..7580d57 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -937,7 +937,7 @@ void ll_lli_init(struct ll_inode_info *lli) lli->lli_async_rc = 0; } mutex_init(&lli->lli_layout_mutex); - memset(lli->lli_jobid, 0, LUSTRE_JOBID_SIZE); + memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); } int ll_fill_super(struct super_block *sb) diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 37bf942..85bb3e0 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -1419,7 +1419,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj, * it's not accurate if the file is shared by different * jobs. 
*/ - lustre_get_jobid(lli->lli_jobid); + lustre_get_jobid(lli->lli_jobid, sizeof(lli->lli_jobid)); } else if (io->ci_type == CIT_SETATTR) { if (!cl_io_is_trunc(io)) io->ci_lockreq = CILR_MANDATORY; diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c index c750a80..24cde0d 100644 --- a/fs/lustre/llite/vvp_object.c +++ b/fs/lustre/llite/vvp_object.c @@ -212,7 +212,8 @@ static void vvp_req_attr_set(const struct lu_env *env, struct cl_object *obj, obdo_set_parent_fid(oa, &ll_i2info(inode)->lli_fid); if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_INVALID_PFID)) oa->o_parent_oid++; - memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid, LUSTRE_JOBID_SIZE); + memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid, + sizeof(attr->cra_jobid)); } static const struct cl_object_operations vvp_ops = { diff --git a/fs/lustre/obdclass/jobid.c b/fs/lustre/obdclass/jobid.c index 3655a2e..8bad859 100644 --- a/fs/lustre/obdclass/jobid.c +++ b/fs/lustre/obdclass/jobid.c @@ -32,17 +32,19 @@ */ #define DEBUG_SUBSYSTEM S_RPC +#include #include #ifdef HAVE_UIDGID_HEADER #include #endif +#include #include #include #include char obd_jobid_var[JOBSTATS_JOBID_VAR_MAX_LEN + 1] = JOBSTATS_DISABLE; -char obd_jobid_node[LUSTRE_JOBID_SIZE + 1]; +char obd_jobid_name[LUSTRE_JOBID_SIZE] = "%e.%u"; /* Get jobid of current process from stored variable or calculate * it from pid and user_id. @@ -52,9 +54,89 @@ * This is now deprecated. */ -int lustre_get_jobid(char *jobid) +/* + * jobid_interpret_string() + * + * Interpret the jobfmt string to expand specified fields, like coredumps do: + * %e = executable + * %g = gid + * %h = hostname + * %j = jobid from environment + * %p = pid + * %u = uid + * + * Unknown escape strings are dropped. Other characters are copied through, + * excluding whitespace (to avoid making jobid parsing difficult). 
+ * + * Return: -EOVERFLOW if the expanded string does not fit within @joblen + * 0 for success + */ +static int jobid_interpret_string(const char *jobfmt, char *jobid, + ssize_t joblen) +{ + char c; + + while ((c = *jobfmt++) && joblen > 1) { + char f; + int l; + + if (isspace(c)) /* Don't allow embedded spaces */ + continue; + + if (c != '%') { + *jobid = c; + joblen--; + jobid++; + continue; + } + + switch ((f = *jobfmt++)) { + case 'e': /* executable name */ + l = snprintf(jobid, joblen, "%s", current->comm); + break; + case 'g': /* group ID */ + l = snprintf(jobid, joblen, "%u", + from_kgid(&init_user_ns, current_fsgid())); + break; + case 'h': /* hostname */ + l = snprintf(jobid, joblen, "%s", + init_utsname()->nodename); + break; + case 'j': /* jobid requested by process + * - currently not supported + */ + l = snprintf(jobid, joblen, "%s", "jobid"); + break; + case 'p': /* process ID */ + l = snprintf(jobid, joblen, "%u", current->pid); + break; + case 'u': /* user ID */ + l = snprintf(jobid, joblen, "%u", + from_kuid(&init_user_ns, current_fsuid())); + break; + case '\0': /* '%' at end of format string */ + l = 0; + goto out; + default: /* drop unknown %x format strings */ + l = 0; + break; + } + jobid += l; + joblen -= l; + } + /* + * This points at the end of the buffer, so long as jobid is always + * incremented the same amount as joblen is decremented. + */ +out: + jobid[joblen - 1] = '\0'; + + return joblen < 0 ? 
-EOVERFLOW : 0; +} + +int lustre_get_jobid(char *jobid, size_t joblen) { - char tmp_jobid[LUSTRE_JOBID_SIZE] = { 0 }; + char tmp_jobid[LUSTRE_JOBID_SIZE] = ""; /* Jobstats isn't enabled */ if (strcmp(obd_jobid_var, JOBSTATS_DISABLE) == 0) @@ -70,10 +152,11 @@ int lustre_get_jobid(char *jobid) /* Whole node dedicated to single job */ if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0) { - strcpy(tmp_jobid, obd_jobid_node); - goto out_cache_jobid; + int rc2 = jobid_interpret_string(obd_jobid_name, + tmp_jobid, joblen); + if (!rc2) + goto out_cache_jobid; } - return -ENOENT; out_cache_jobid: diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c index bac8e7c5..cd2917e 100644 --- a/fs/lustre/obdclass/obd_sysfs.c +++ b/fs/lustre/obdclass/obd_sysfs.c @@ -233,7 +233,7 @@ static ssize_t jobid_var_store(struct kobject *kobj, struct attribute *attr, static ssize_t jobid_name_show(struct kobject *kobj, struct attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_node); + return snprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_name); } static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, @@ -243,13 +243,13 @@ static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, if (!count || count > LUSTRE_JOBID_SIZE) return -EINVAL; - memcpy(obd_jobid_node, buffer, count); + memcpy(obd_jobid_name, buffer, count); - obd_jobid_node[count] = 0; + obd_jobid_name[count] = 0; /* Trim the trailing '\n' if any */ - if (obd_jobid_node[count - 1] == '\n') - obd_jobid_node[count - 1] = 0; + if (obd_jobid_name[count - 1] == '\n') + obd_jobid_name[count - 1] = 0; return count; } diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index b6a4fd8..bc5e513 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1406,9 +1406,9 @@ void lustre_msg_set_jobid(struct lustre_msg *msg, char *jobid) LASSERTF(pb, "invalid msg %p: no ptlrpc body!\n", msg); if (jobid) - 
memcpy(pb->pb_jobid, jobid, LUSTRE_JOBID_SIZE); + memcpy(pb->pb_jobid, jobid, sizeof(pb->pb_jobid)); else if (pb->pb_jobid[0] == '\0') - lustre_get_jobid(pb->pb_jobid); + lustre_get_jobid(pb->pb_jobid, sizeof(pb->pb_jobid)); return; } default: diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 401f7ef..4e1605a2 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -635,7 +635,7 @@ struct ptlrpc_body_v3 { __u64 pb_padding64_0; __u64 pb_padding64_1; __u64 pb_padding64_2; - char pb_jobid[LUSTRE_JOBID_SIZE]; /* req: ASCII MPI jobid from env */ + char pb_jobid[LUSTRE_JOBID_SIZE]; /* req: ASCII jobid from env + NUL */ }; #define ptlrpc_body ptlrpc_body_v3 From patchwork Thu Feb 27 21:08:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409679 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E8A114BC for ; Thu, 27 Feb 2020 21:19:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1727D246A1 for ; Thu, 27 Feb 2020 21:19:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1727D246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AE39521FB82; Thu, 27 Feb 2020 13:18:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C3E3421FA67 for ; Thu, 27 Feb 2020 13:18:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5D3558E9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5A58C46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:04 -0500 Message-Id: <1582838290-17243-17-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 016/622] lustre: ldlm: don't disable softirq for exp_rpc_lock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Liang Zhen It is not necessary to call ldlm_lock_busy() in the context of the timer callback; we can call it in the thread context of expired_lock_main. With this change, we don't need to disable softirq for exp_rpc_lock. Instead of moving busy locks to the end of the waiting list one at a time in the context of the timer callback, move any locks that may be expired onto the expired list. If these locks are still being used by RPCs being processed, then put them back onto the end of the waiting list instead of evicting the client. For the Linux client, the impact of this change is the switch from spin_lock_bh() to spin_lock() for the exp_rpc_lock.
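
The two-phase handling described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the server-side ldlm code: every demo_* name is hypothetical, the lists are plain arrays, and no locking is shown. It only shows the ordering of the steps: the timer phase moves candidates without checking whether they are busy, and the thread phase decides between requeueing and eviction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a waiting lock; not a Lustre structure. */
struct demo_lock {
	int  id;
	long deadline;	/* time at which the lock counts as expired */
	bool busy;	/* still referenced by an RPC being processed */
};

#define DEMO_MAX 8

struct demo_list {
	struct demo_lock *items[DEMO_MAX];
	size_t count;
};

static void demo_list_add_tail(struct demo_list *l, struct demo_lock *lk)
{
	if (l->count < DEMO_MAX)
		l->items[l->count++] = lk;
}

/* Phase 1 (timer context): only move candidates, no busy check here. */
static void demo_timer_phase(struct demo_list *waiting,
			     struct demo_list *expired, long now)
{
	size_t i, kept = 0;

	for (i = 0; i < waiting->count; i++) {
		struct demo_lock *lk = waiting->items[i];

		if (lk->deadline <= now)
			demo_list_add_tail(expired, lk);
		else
			waiting->items[kept++] = lk;
	}
	waiting->count = kept;
}

/*
 * Phase 2 (thread context): busy locks go back to the tail of the
 * waiting list with a new deadline; the rest are counted as evictions.
 */
static int demo_thread_phase(struct demo_list *waiting,
			     struct demo_list *expired, long new_deadline)
{
	int evicted = 0;
	size_t i;

	for (i = 0; i < expired->count; i++) {
		struct demo_lock *lk = expired->items[i];

		if (lk->busy) {
			lk->deadline = new_deadline;
			demo_list_add_tail(waiting, lk);
		} else {
			evicted++;
		}
	}
	expired->count = 0;
	return evicted;
}
```

Because the busy check runs only in the thread phase, nothing in the timer phase needs the lock that protects the per-export RPC lists, which is the reason given above for no longer disabling softirqs around exp_rpc_lock.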
WC-bug-id: https://jira.whamcloud.com/browse/LU-6032 Lustre-commit: 292aa42e0897 ("LU-6032 ldlm: don't disable softirq for exp_rpc_lock") Signed-off-by: Liang Zhen Reviewed-on: https://review.whamcloud.com/12957 Reviewed-by: Dmitry Eremin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index d57df36..3c61e83 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1307,9 +1307,9 @@ static int ptlrpc_server_hpreq_init(struct ptlrpc_service_part *svcpt, LASSERT(rc <= 1); } - spin_lock_bh(&req->rq_export->exp_rpc_lock); + spin_lock(&req->rq_export->exp_rpc_lock); list_add(&req->rq_exp_list, &req->rq_export->exp_hp_rpcs); - spin_unlock_bh(&req->rq_export->exp_rpc_lock); + spin_unlock(&req->rq_export->exp_rpc_lock); } ptlrpc_nrs_req_initialize(svcpt, req, rc); @@ -1327,9 +1327,9 @@ static void ptlrpc_server_hpreq_fini(struct ptlrpc_request *req) if (req->rq_ops->hpreq_fini) req->rq_ops->hpreq_fini(req); - spin_lock_bh(&req->rq_export->exp_rpc_lock); + spin_lock(&req->rq_export->exp_rpc_lock); list_del_init(&req->rq_exp_list); - spin_unlock_bh(&req->rq_export->exp_rpc_lock); + spin_unlock(&req->rq_export->exp_rpc_lock); } } From patchwork Thu Feb 27 21:08:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409683 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 488D314BC for ; Thu, 27 Feb 2020 21:19:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS 
id 3155D246A1 for ; Thu, 27 Feb 2020 21:19:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3155D246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ABB2B21FBA7; Thu, 27 Feb 2020 13:19:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 159D621FA67 for ; Thu, 27 Feb 2020 13:18:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5F2DF905; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5D53246D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:05 -0500 Message-Id: <1582838290-17243-18-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 017/622] lustre: obdclass: new wrapper to convert NID to string X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Liang Zhen This patch includes a couple of changes: - add new wrapper function obd_import_nid2str - use obd_import_nid2str and obd_export_nid2str to replace all libcfs_nid2str conversions for NID of export/import connection WC-bug-id: https://jira.whamcloud.com/browse/LU-6032 Lustre-commit: 61f9847a812f ("LU-6032 obdclass: new wrapper to convert NID to string") Signed-off-by: Liang Zhen Reviewed-on: https://review.whamcloud.com/12956 Reviewed-by: Dmitry Eremin Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 12 ++++++++++++ fs/lustre/ldlm/ldlm_lock.c | 4 ++-- fs/lustre/ptlrpc/client.c | 5 ++--- fs/lustre/ptlrpc/import.c | 6 +++--- 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 146c37e..d896049 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -86,6 +86,18 @@ struct obd_device *class_devices_in_group(struct obd_uuid *grp_uuid, int obd_connect_flags2str(char *page, int count, u64 flags, u64 flags2, const char *sep); +static inline char *obd_export_nid2str(struct obd_export *exp) +{ + return exp->exp_connection ? + libcfs_nid2str(exp->exp_connection->c_peer.nid) : ""; +} + +static inline char *obd_import_nid2str(struct obd_import *imp) +{ + return imp->imp_connection ? 
+ libcfs_nid2str(imp->imp_connection->c_peer.nid) : ""; +} + int obd_zombie_impexp_init(void); void obd_zombie_impexp_stop(void); void obd_zombie_barrier(void); diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 7242cd1..aa19b89 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -1987,11 +1987,11 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, vaf.va = &args; if (exp && exp->exp_connection) { - nid = libcfs_nid2str(exp->exp_connection->c_peer.nid); + nid = obd_export_nid2str(exp); } else if (exp && exp->exp_obd) { struct obd_import *imp = exp->exp_obd->u.cli.cl_import; - nid = libcfs_nid2str(imp->imp_connection->c_peer.nid); + nid = obd_import_nid2str(imp); } if (!resource) { diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index a533cbb..424db55 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1605,8 +1605,7 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req) current->comm, imp->imp_obd->obd_uuid.uuid, lustre_msg_get_status(req->rq_reqmsg), req->rq_xid, - libcfs_nid2str(imp->imp_connection->c_peer.nid), - lustre_msg_get_opc(req->rq_reqmsg)); + obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg)); rc = ptl_send_rpc(req, 0); if (rc == -ENOMEM) { @@ -2017,7 +2016,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) current->comm, imp->imp_obd->obd_uuid.uuid, lustre_msg_get_status(req->rq_reqmsg), req->rq_xid, - libcfs_nid2str(imp->imp_connection->c_peer.nid), + obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg)); spin_lock(&imp->imp_lock); diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index d032962..dca4aa0 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -171,13 +171,13 @@ int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt) LCONSOLE_WARN("%s: Connection to %.*s (at %s) was lost; in progress operations using this service will wait for recovery to 
complete\n", imp->imp_obd->obd_name, target_len, target_start, - libcfs_nid2str(imp->imp_connection->c_peer.nid)); + obd_import_nid2str(imp)); } else { LCONSOLE_ERROR_MSG(0x166, "%s: Connection to %.*s (at %s) was lost; in progress operations using this service will fail\n", imp->imp_obd->obd_name, target_len, target_start, - libcfs_nid2str(imp->imp_connection->c_peer.nid)); + obd_import_nid2str(imp)); } IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_DISCON); spin_unlock(&imp->imp_lock); @@ -1461,7 +1461,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) LCONSOLE_INFO("%s: Connection restored to %.*s (at %s)\n", imp->imp_obd->obd_name, target_len, target_start, - libcfs_nid2str(imp->imp_connection->c_peer.nid)); + obd_import_nid2str(imp)); } if (imp->imp_state == LUSTRE_IMP_FULL) { From patchwork Thu Feb 27 21:08:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409673 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5084514BC for ; Thu, 27 Feb 2020 21:18:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 361E9246A1 for ; Thu, 27 Feb 2020 21:18:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 361E9246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 12EC221FB5A; Thu, 27 Feb 2020 13:18:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org 
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6D0EE21FA55 for ; Thu, 27 Feb 2020 13:18:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 61F5D909; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 604C3468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:06 -0500 Message-Id: <1582838290-17243-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 018/622] lustre: ptlrpc: Add QoS for uid and gid in NRS-TBF X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Li Xi , Qian Yingjin , Teddy Chan , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Teddy Chan This patch adds a new QoS feature to the TBF policy that can limit the rate based on uid or gid. The policy can limit the rate on both the MDT and OSS sides.
The commands for this feature look like this:

Start the tbf uid QoS on OST:
    lctl set_param ost.OSS.*.nrs_policies="tbf uid"
Limit the rate of ptlrpc requests of the uid 500:
    lctl set_param ost.OSS.*.nrs_tbf_rule="start tbf_name uid={500} rate=100"

Start the tbf gid QoS on OST:
    lctl set_param ost.OSS.*.nrs_policies="tbf gid"
Limit the rate of ptlrpc requests of the gid 500:
    lctl set_param ost.OSS.*.nrs_tbf_rule="start tbf_name gid={500} rate=100"

Or use the generic tbf rule to mix them on OST:
    lctl set_param ost.OSS.*.nrs_policies="tbf"
Limit the rate of ptlrpc requests of the uid 500 gid 500:
    lctl set_param ost.OSS.*.nrs_tbf_rule="start tbf_name uid={500}&gid={500} rate=100"

Also, you can use the following rule to control all requests to the MDS:

Start the tbf uid QoS on MDS:
    lctl set_param mds.MDS.*.nrs_policies="tbf uid"
Limit the rate of ptlrpc requests of the uid 500:
    lctl set_param mds.MDS.*.nrs_tbf_rule="start tbf_name uid={500} rate=100"

For the Linux client we need to send the uid and gid information to the NRS-TBF handling on the servers.
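
As a rough illustration of what a rule such as "uid={500} rate=100" asks the server to do, here is a minimal user-space token bucket keyed by uid. It is a sketch under assumptions, not the NRS-TBF implementation: the demo_* names and the depth field are hypothetical, rate= is assumed to mean requests per second, and timestamps are assumed to come from a monotonic clock.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-UID token bucket; not the Lustre NRS-TBF structures. */
struct demo_bucket {
	uint32_t uid;		/* key the rule matches on */
	uint64_t rate;		/* tokens (requests) added per second */
	uint64_t depth;		/* maximum tokens the bucket can hold */
	uint64_t tokens;	/* tokens currently available */
	uint64_t last_ns;	/* time already converted into tokens */
};

/* Add the tokens earned since last_ns; now_ns must not go backwards. */
static void demo_bucket_refill(struct demo_bucket *b, uint64_t now_ns)
{
	uint64_t elapsed = now_ns - b->last_ns;
	/* may overflow for very large elapsed * rate; fine for a sketch */
	uint64_t add = elapsed * b->rate / DEMO_NSEC_PER_SEC;

	if (add == 0)
		return;	/* keep last_ns so fractional time is not lost */

	/* advance last_ns only by the time that was turned into tokens */
	b->last_ns += add * DEMO_NSEC_PER_SEC / b->rate;
	b->tokens += add;
	if (b->tokens > b->depth)
		b->tokens = b->depth;
}

/* Return 1 if a request may be dispatched now, 0 if it must wait. */
static int demo_bucket_take(struct demo_bucket *b, uint64_t now_ns)
{
	demo_bucket_refill(b, now_ns);
	if (b->tokens == 0)
		return 0;
	b->tokens--;
	return 1;
}
```

A request is matched to a bucket by the uid (or gid) it carries, which is why the client has to fill in o_uid and o_gid for every request rather than only for writes.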
WC-bug-id: https://jira.whamcloud.com/browse/LU-9658 Lustre-commit: e0cdde123c14 ("LU-9658 ptlrpc: Add QoS for uid and gid in NRS-TBF") Signed-off-by: Teddy Chan Signed-off-by: Li Xi Signed-off-by: Wang Shilong Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/27608 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/vvp_object.c | 5 ++--- fs/lustre/obdclass/obdo.c | 5 +++++ fs/lustre/osc/osc_request.c | 10 ++++++++++ 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c index 24cde0d..eeb8823 100644 --- a/fs/lustre/llite/vvp_object.c +++ b/fs/lustre/llite/vvp_object.c @@ -196,7 +196,7 @@ static int vvp_object_glimpse(const struct lu_env *env, static void vvp_req_attr_set(const struct lu_env *env, struct cl_object *obj, struct cl_req_attr *attr) { - u64 valid_flags = OBD_MD_FLTYPE; + u64 valid_flags = OBD_MD_FLTYPE | OBD_MD_FLUID | OBD_MD_FLGID; struct inode *inode; struct obdo *oa; @@ -204,8 +204,7 @@ static void vvp_req_attr_set(const struct lu_env *env, struct cl_object *obj, inode = vvp_object_inode(obj); if (attr->cra_type == CRT_WRITE) { - valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME | - OBD_MD_FLUID | OBD_MD_FLGID; + valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME; obdo_set_o_projid(oa, ll_i2info(inode)->lli_projid); } obdo_from_inode(oa, inode, valid_flags & attr->cra_flags); diff --git a/fs/lustre/obdclass/obdo.c b/fs/lustre/obdclass/obdo.c index 1926896..e5475f1 100644 --- a/fs/lustre/obdclass/obdo.c +++ b/fs/lustre/obdclass/obdo.c @@ -144,6 +144,11 @@ void lustre_set_wire_obdo(const struct obd_connect_data *ocd, if (!ocd) return; + if (!(wobdo->o_valid & OBD_MD_FLUID)) + wobdo->o_uid = from_kuid(&init_user_ns, current_uid()); + if (!(wobdo->o_valid & OBD_MD_FLGID)) + wobdo->o_gid = from_kgid(&init_user_ns, current_gid()); + if (unlikely(!(ocd->ocd_connect_flags & OBD_CONNECT_FID)) && 
fid_seq_is_echo(ostid_seq(&lobdo->o_oi))) { /* diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 300dee5..99c9620 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1184,6 +1184,16 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, lustre_set_wire_obdo(&req->rq_import->imp_connect_data, &body->oa, oa); + /* For READ and WRITE, we can't fill o_uid and o_gid using from_kuid() + * and from_kgid(), because they are asynchronous. Fortunately, variable + * oa contains valid o_uid and o_gid in these two operations. + * Besides, filling o_uid and o_gid is enough for nrs-tbf, see LU-9658. + * OBD_MD_FLUID and OBD_MD_FLUID is not set in order to avoid breaking + * other process logic + */ + body->oa.o_uid = oa->o_uid; + body->oa.o_gid = oa->o_gid; + obdo_to_ioobj(oa, ioobj); ioobj->ioo_bufcnt = niocount; /* The high bits of ioo_max_brw tells server _maximum_ number of bulks From patchwork Thu Feb 27 21:08:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409677 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E0C314BC for ; Thu, 27 Feb 2020 21:19:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 46BE4246A1 for ; Thu, 27 Feb 2020 21:19:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 46BE4246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D083821FB70; Thu, 27 Feb 2020 13:18:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C484921FA44 for ; Thu, 27 Feb 2020 13:18:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6438091F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 632A446A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:07 -0500 Message-Id: <1582838290-17243-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 019/622] lustre: hsm: ignore compound_id X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" Ignore request compound ids in the HSM coordinator. Compound ids prevent batching of CDT to CT requests and degrade HSM performance. Use CT/archive id compatibility when deciding which HSM actions to put in a request. WC-bug-id: https://jira.whamcloud.com/browse/LU-10383 Lustre-commit: 9ee81f920bb3 ("LU-10383 hsm: ignore compound_id") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/30949 Reviewed-by: Quentin Bouget Reviewed-by: Faccini Bruno Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_idl.h | 2 +- include/uapi/linux/lustre/lustre_user.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 4e1605a2..307feb3 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2508,7 +2508,7 @@ struct llog_agent_req_rec { */ __u32 arr_archive_id; /**< backend archive number */ __u64 arr_flags; /**< req flags */ - __u64 arr_compound_id;/**< compound cookie */ + __u64 arr_compound_id;/**< compound cookie, ignored */ __u64 arr_req_create; /**< req. creation time */ __u64 arr_req_change; /**< req. status change time */ struct hsm_action_item arr_hai; /**< req. to the agent */ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 27501a2..5405e1b 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -1729,7 +1729,7 @@ static inline char *hai_dump_data_field(struct hsm_action_item *hai, struct hsm_action_list { __u32 hal_version; __u32 hal_count; /* number of hai's to follow */ - __u64 hal_compound_id; /* returned by coordinator */ + __u64 hal_compound_id; /* returned by coordinator, ignored */ __u64 hal_flags; __u32 hal_archive_id; /* which archive backend */ __u32 padding1; From patchwork Thu Feb 27 21:08:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409689 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B5A714BC for ; Thu, 27 Feb 2020 21:19:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2424A246A1 for ; Thu, 27 Feb 2020 21:19:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2424A246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9CD8521FBEC; Thu, 27 Feb 2020 13:19:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 120A421FA44 for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 672C79E0; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 65F8C46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:08 -0500 Message-Id: <1582838290-17243-21-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 020/622] lnet: libcfs: remove unnecessary set_fs(KERNEL_DS) X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mike Marciniszyn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mike Marciniszyn When we converted to using kernel_write(), we left some set_fs() calls that are unnecessary. Remove them. The original OpenSFS version of this patch, as mentioned below, did the full conversion to kernel_write(). WC-bug-id: https://jira.whamcloud.com/browse/LU-10560 Lustre-commit: b9a32054600a ("LU-10560 libcfs: Use kernel_write when appropriate") Signed-off-by: Mike Marciniszyn Reviewed-on: https://review.whamcloud.com/31154 Reviewed-by: James Simmons Reviewed-by: Dmitry Eremin Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/tracefile.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c index 3b29116..6e4cc31 100644 --- a/net/lnet/libcfs/tracefile.c +++ b/net/lnet/libcfs/tracefile.c @@ -807,7 +807,6 @@ int cfs_tracefile_dump_all_pages(char *filename) struct cfs_trace_page *tage; struct cfs_trace_page *tmp; char *buf; - mm_segment_t __oldfs; int rc; down_write(&cfs_tracefile_sem); @@ -828,8 +827,6 @@ int cfs_tracefile_dump_all_pages(char *filename) rc = 0; goto close; } - __oldfs = get_fs(); - set_fs(KERNEL_DS); /* ok, for now, just write the pages. 
in the future we'll be building * iobufs with the pages and calling generic_direct_IO @@ -851,7 +848,7 @@ int cfs_tracefile_dump_all_pages(char *filename) list_del(&tage->linkage); cfs_tage_free(tage); } - set_fs(__oldfs); + rc = vfs_fsync(filp, 1); if (rc) pr_err("sync returns %d\n", rc); From patchwork Thu Feb 27 21:08:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409681 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D1B9138D for ; Thu, 27 Feb 2020 21:19:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 25DAB246A1 for ; Thu, 27 Feb 2020 21:19:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 25DAB246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36CBA21FB3B; Thu, 27 Feb 2020 13:18:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 52B5F21FA7D for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6A0389E1; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 68DFD46C; Thu, 27 Feb 2020 16:18:13 -0500 
(EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:09 -0500 Message-Id: <1582838290-17243-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 021/622] lustre: ptlrpc: ptlrpc_register_bulk() LBUG on ENOMEM X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh Assertion fails on !desc->bd_registered during retry after ENOMEM. Drop bd_registered flag and exit via cleanup_bulk to ensure that bulk is fully unregistered. Cray-bug-id: MRP-4733 WC-bug-id: https://jira.whamcloud.com/browse/LU-10643 Lustre-commit: 4a81be263079 ("LU-10643 ptlrpc: ptlrpc_register_bulk() LBUG on ENOMEM") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/31228 Reviewed-by: Alexandr Boyko Reviewed-by: Andrew Perepechko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/ptlrpc/niobuf.c | 12 +++++++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 653a456..67500b5 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -349,6 +349,7 @@ #define OBD_FAIL_PTLRPC_DROP_BULK 0x51a #define OBD_FAIL_PTLRPC_LONG_REQ_UNLINK 0x51b #define OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK 0x51c +#define OBD_FAIL_PTLRPC_BULK_ATTACH 0x521 #define OBD_FAIL_OBD_PING_NET 0x600 #define OBD_FAIL_OBD_LOG_CANCEL_NET 0x601 diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c 
index 02ed373..2e866fe 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -179,8 +179,13 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) LNET_MD_OP_GET : LNET_MD_OP_PUT); ptlrpc_fill_bulk_md(&md, desc, posted_md); - rc = LNetMEAttach(desc->bd_portal, peer, mbits, 0, - LNET_UNLINK, LNET_INS_AFTER, &me_h); + if (posted_md > 0 && posted_md + 1 == total_md && + OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_ATTACH)) { + rc = -ENOMEM; + } else { + rc = LNetMEAttach(desc->bd_portal, peer, mbits, 0, + LNET_UNLINK, LNET_INS_AFTER, &me_h); + } if (rc != 0) { CERROR("%s: LNetMEAttach failed x%llu/%d: rc = %d\n", desc->bd_import->imp_obd->obd_name, mbits, @@ -209,6 +214,7 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) LASSERT(desc->bd_md_count >= 0); mdunlink_iterate_helper(desc->bd_mds, desc->bd_md_max_brw); req->rq_status = -ENOMEM; + desc->bd_registered = 0; return -ENOMEM; } @@ -585,7 +591,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) if (request->rq_bulk) { rc = ptlrpc_register_bulk(request); if (rc != 0) - goto out; + goto cleanup_bulk; /* * All the mds in the request will have the same cpt * encoded in the cookie. 
So we can just get the first From patchwork Thu Feb 27 21:08:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409693 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 62A54138D for ; Thu, 27 Feb 2020 21:19:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4A962246A1 for ; Thu, 27 Feb 2020 21:19:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A962246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B798321FC22; Thu, 27 Feb 2020 13:19:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9482321FA3C for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6EACB9E3; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6BC7246D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:10 -0500 Message-Id: <1582838290-17243-23-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 022/622] lustre: llite: yield cpu after call to ll_agl_trigger X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler The statahead and agl threads loop over all entries in the directory without yielding the CPU. If the number of entries in the directory is large enough then these threads may trigger soft lockups. The fix is to add calls to cond_resched() after calling ll_agl_trigger(), which gets the glimpse lock for a file. Cray-bug-id: LUS-2584 WC-bug-id: https://jira.whamcloud.com/browse/LU-10649 Lustre-commit: 031001f0d438 ("LU-10649 llite: yield cpu after call to ll_agl_trigger") Signed-off-by: Ann Koehler Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/31240 Reviewed-by: Patrick Farrell Reviewed-by: Sergey Cheremencev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/statahead.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 99b3fee..4a61dac 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -907,6 +907,7 @@ static int ll_agl_thread(void *arg) list_del_init(&clli->lli_agl_list); spin_unlock(&plli->lli_agl_lock); ll_agl_trigger(&clli->lli_vfs_inode, sai); + cond_resched(); } else { spin_unlock(&plli->lli_agl_lock); } @@ -1071,7 +1072,7 @@ static int ll_statahead_thread(void *arg) ll_agl_trigger(&clli->lli_vfs_inode, sai); - + cond_resched(); spin_lock(&lli->lli_agl_lock); } spin_unlock(&lli->lli_agl_lock); From patchwork Thu Feb 27 21:08:11 
2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409697 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9ADE414BC for ; Thu, 27 Feb 2020 21:19:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 83E4C246A1 for ; Thu, 27 Feb 2020 21:19:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 83E4C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D33D621FB98; Thu, 27 Feb 2020 13:19:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D8D3621FA84 for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 70B869E8; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6F16E468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:11 -0500 Message-Id: <1582838290-17243-24-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 023/622] lustre: osc: Do not request more than 2GiB grant X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The server enforces a grant limit of 2 GiB, which the client must honor. The existing client code combined with 16 MiB RPCs make it possible for the client to ask for more than this limit. Make this limit explicit, and also fix an overflow bug in o_undirty calculation in osc_announce_cached. (o_undirty is a 32 bit value and 16 MiB*256 rpcs_in_flight = 4 GiB. 4 GiB + extra grant components overflows o_undirty.) Cray-bug-id: LUS-5750 WC-bug-id: https://jira.whamcloud.com/browse/LU-10776 Lustre-commit: c0246d887809 ("LU-10776 osc: Do not request more than 2GiB grant") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/31533 Reviewed-by: Nathaniel Clark Reviewed-by: Bobi Jam Reviewed-by: Andrew Perepechko Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 10 ++++++++-- include/uapi/linux/lustre/lustre_idl.h | 2 ++ 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 99c9620..c430239 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -664,11 +664,12 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa, oa->o_undirty = 0; } else { unsigned long nrpages; + unsigned long undirty; nrpages = cli->cl_max_pages_per_rpc; nrpages *= cli->cl_max_rpcs_in_flight + 1; nrpages = max(nrpages, cli->cl_dirty_max_pages); - oa->o_undirty = nrpages << PAGE_SHIFT; + undirty = nrpages << PAGE_SHIFT; if 
(OCD_HAS_FLAG(&cli->cl_import->imp_connect_data, GRANT_PARAM)) { int nrextents; @@ -679,8 +680,13 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa, */ nrextents = DIV_ROUND_UP(nrpages, cli->cl_max_extent_pages); - oa->o_undirty += nrextents * cli->cl_grant_extent_tax; + undirty += nrextents * cli->cl_grant_extent_tax; } + /* Do not ask for more than OBD_MAX_GRANT - a margin for server + * to add extent tax, etc. + */ + oa->o_undirty = min(undirty, OBD_MAX_GRANT - + (PTLRPC_MAX_BRW_PAGES << PAGE_SHIFT)*4UL); } oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant; oa->o_dropped = cli->cl_lost_grant; diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 307feb3..0bce63d 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1213,6 +1213,8 @@ struct hsm_state_set { * it to sync quickly */ +#define OBD_MAX_GRANT 0x7fffffffUL /* Max grant allowed to one client: 2 GiB */ + #define OBD_OBJECT_EOF LUSTRE_EOF #define OST_MIN_PRECREATE 32 From patchwork Thu Feb 27 21:08:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409701 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BB444138D for ; Thu, 27 Feb 2020 21:19:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A3D1C246A3 for ; Thu, 27 Feb 2020 21:19:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A3D1C246A3 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 134F921FAC8; Thu, 27 Feb 2020 13:19:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 27EF221FA84 for ; Thu, 27 Feb 2020 13:18:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 735FF9EA; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7204E46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:12 -0500 Message-Id: <1582838290-17243-25-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 024/622] lustre: llite: rename FSFILT_IOC_* to system flags X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jinshan Xiong The FSFILT_IOC_* definitions were probably created for compatibility. Now that the FS_IOC_* definitions have existed in the kernel for a long time, we should use them to avoid confusion. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10779 Lustre-commit: 7e3fc106d6e7 ("LU-10779 llite: rename FSFILT_IOC_* to system flags") Signed-off-by: Jinshan Xiong Reviewed-on: https://review.whamcloud.com/31546 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 13 +++++++------ fs/lustre/llite/file.c | 19 ++++++++++--------- fs/lustre/llite/llite_lib.c | 4 ++-- 3 files changed, 19 insertions(+), 17 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index f21727b..b006e32 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1108,18 +1108,19 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_IOCTL, 1); switch (cmd) { - case FSFILT_IOC_GETFLAGS: - case FSFILT_IOC_SETFLAGS: + case FS_IOC_GETFLAGS: + case FS_IOC_SETFLAGS: return ll_iocontrol(inode, file, cmd, arg); - case FSFILT_IOC_GETVERSION_OLD: case FSFILT_IOC_GETVERSION: + case FS_IOC_GETVERSION: return put_user(inode->i_generation, (int __user *)arg); /* We need to special case any other ioctls we want to handle, * to send them to the MDS/OST as appropriate and to properly * network encode the arg field. 
- case FSFILT_IOC_SETVERSION_OLD: - case FSFILT_IOC_SETVERSION: - */ + */ + case FS_IOC_SETVERSION: + return -ENOTSUPP; + case LL_IOC_GET_MDTIDX: { int mdtidx; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index fe965b1..c3fb104b 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3055,12 +3055,19 @@ static long ll_file_set_lease(struct file *file, struct ll_ioc_lease *ioc, case LL_IOC_LOV_GETSTRIPE: case LL_IOC_LOV_GETSTRIPE_NEW: return ll_file_getstripe(inode, (void __user *)arg, 0); - case FSFILT_IOC_GETFLAGS: - case FSFILT_IOC_SETFLAGS: + case FS_IOC_GETFLAGS: + case FS_IOC_SETFLAGS: return ll_iocontrol(inode, file, cmd, arg); - case FSFILT_IOC_GETVERSION_OLD: case FSFILT_IOC_GETVERSION: + case FS_IOC_GETVERSION: return put_user(inode->i_generation, (int __user *)arg); + /* We need to special case any other ioctls we want to handle, + * to send them to the MDS/OST as appropriate and to properly + * network encode the arg field. + */ + case FS_IOC_SETVERSION: + return -ENOTSUPP; + case LL_IOC_GROUP_LOCK: return ll_get_grouplock(inode, file, arg); case LL_IOC_GROUP_UNLOCK: @@ -3068,12 +3075,6 @@ static long ll_file_set_lease(struct file *file, struct ll_ioc_lease *ioc, case IOC_OBD_STATFS: return ll_obd_statfs(inode, (void __user *)arg); - /* We need to special case any other ioctls we want to handle, - * to send them to the MDS/OST as appropriate and to properly - * network encode the arg field. 
- case FSFILT_IOC_SETVERSION_OLD: - case FSFILT_IOC_SETVERSION: - */ case LL_IOC_FLUSHCTX: return ll_flush_ctx(inode); case LL_IOC_PATH2FID: { diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 7580d57..e2c7a4d 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2037,7 +2037,7 @@ int ll_iocontrol(struct inode *inode, struct file *file, int rc, flags = 0; switch (cmd) { - case FSFILT_IOC_GETFLAGS: { + case FS_IOC_GETFLAGS: { struct mdt_body *body; struct md_op_data *op_data; @@ -2065,7 +2065,7 @@ int ll_iocontrol(struct inode *inode, struct file *file, return put_user(flags, (int __user *)arg); } - case FSFILT_IOC_SETFLAGS: { + case FS_IOC_SETFLAGS: { struct md_op_data *op_data; struct cl_object *obj; struct iattr *attr; From patchwork Thu Feb 27 21:08:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409705 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 28D7014BC for ; Thu, 27 Feb 2020 21:19:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 118AF246A1 for ; Thu, 27 Feb 2020 21:19:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 118AF246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BFC1F21FBD5; Thu, 27 Feb 2020 13:19:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8386421FA84 for ; Thu, 27 Feb 2020 13:18:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 764989ED; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 74F3946C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:13 -0500 Message-Id: <1582838290-17243-26-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 025/622] lnet: fix nid range format '*@' support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Emoly Liu In cfs_ip_min_max(), (nidrange->nr_all == 1) means this nid range is a full IP address range (*.*.*.*). In this case, we don't need to compare it to any other nid range; we can set min_nid to 0.0.0.0 and max_nid to 255.255.255.255 directly. 
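The wildcard shortcut described in this commit message can be sketched in user-space C as follows. This is a simplified illustration under stated assumptions, not the code in nidstrings.c: struct demo_nidrange and demo_ip_min_max() are hypothetical stand-ins, and each non-wildcard range is reduced to a single [lo, hi] pair of 32-bit addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one parsed NID range. */
struct demo_nidrange {
	int      all;	/* nonzero if the range was written as "*@<net>" */
	uint32_t lo;	/* lowest address, only used when all == 0 */
	uint32_t hi;	/* highest address, only used when all == 0 */
};

/*
 * Compute the smallest and largest IPv4 address covered by n ranges.
 * Returns 0 on success and -1 if there is nothing to scan.
 */
int demo_ip_min_max(const struct demo_nidrange *r, size_t n,
		    uint32_t *min_ip, uint32_t *max_ip)
{
	uint32_t lo = UINT32_MAX;
	uint32_t hi = 0;
	size_t i;

	if (n == 0)
		return -1;

	for (i = 0; i < n; i++) {
		if (r[i].all) {
			/* "*" covers 0.0.0.0 through 255.255.255.255,
			 * so no further comparison can change the result.
			 */
			lo = 0;
			hi = UINT32_MAX;
			break;
		}
		if (r[i].lo < lo)
			lo = r[i].lo;
		if (r[i].hi > hi)
			hi = r[i].hi;
	}

	*min_ip = lo;
	*max_ip = hi;
	return 0;
}
```

The early break mirrors the intent stated above: once a range is known to cover every address, walking its address sub-ranges or comparing against other ranges adds nothing.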
WC-bug-id: https://jira.whamcloud.com/browse/LU-8913 Lustre-commit: 230266326f49 ("LU-8913 nodemap: fix nodemap range format '*@' support") Signed-off-by: Emoly Liu Reviewed-on: https://review.whamcloud.com/31684 Reviewed-by: Sebastien Buisson Reviewed-by: Fan Yong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/nidstrings.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c index b4e38e5..13338d0 100644 --- a/net/lnet/lnet/nidstrings.c +++ b/net/lnet/lnet/nidstrings.c @@ -680,6 +680,12 @@ static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, if (nidlist_count > 0) return -EINVAL; + if (nr->nr_all) { + min_ip_addr = 0; + max_ip_addr = 0xffffffff; + break; + } + list_for_each_entry(ar, &nr->nr_addrranges, ar_link) { rc = cfs_ip_ar_min_max(ar, &tmp_min_ip_addr, &tmp_max_ip_addr); From patchwork Thu Feb 27 21:08:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409709 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24382138D for ; Thu, 27 Feb 2020 21:19:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D6F6246A1 for ; Thu, 27 Feb 2020 21:19:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D6F6246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 89BD421FBB0; Thu, 
27 Feb 2020 13:19:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C9E8221FA93 for ; Thu, 27 Feb 2020 13:18:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 791F49EF; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 77B6D46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:14 -0500 Message-Id: <1582838290-17243-27-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 026/622] lustre: ptlrpc: fix test_req_buffer_pressure behavior X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini The 2nd patch for LU-9372, which allows limiting the number of rqbd buffers, added a wrong and unnecessary test intended to enhance the test_req_buffer_pressure feature. This patch fixes the issue by removing that test. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10826 Lustre-commit: 040eca67f8d5 ("LU-10826 ptlrpc: fix test_req_buffer_pressure behavior") Signed-off-by: Bruno Faccini Reviewed-on: https://review.whamcloud.com/31690 Reviewed-by: Wang Shilong Reviewed-by: Li Dongyang Reviewed-by: Dmitry Eremin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 3c61e83..8dae21a 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -150,8 +150,7 @@ /* NB: another thread might have recycled enough rqbds, we * need to make sure it wouldn't over-allocate, see LU-1212. */ - if (test_req_buffer_pressure || - svcpt->scp_nrqbds_posted >= svc->srv_nbuf_per_group || + if (svcpt->scp_nrqbds_posted >= svc->srv_nbuf_per_group || (svc->srv_nrqbds_max != 0 && svcpt->scp_nrqbds_total > svc->srv_nrqbds_max)) break; From patchwork Thu Feb 27 21:08:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409713 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8A1A14BC for ; Thu, 27 Feb 2020 21:19:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9174C246A2 for ; Thu, 27 Feb 2020 21:19:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9174C246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com 
(localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7A96C21FAB4; Thu, 27 Feb 2020 13:19:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1952521FA96 for ; Thu, 27 Feb 2020 13:18:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7C4C19F0; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7A7BD46D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:15 -0500 Message-Id: <1582838290-17243-28-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 027/622] lustre: lu_object: improve debug message for lu_object_put() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov Use the top-level object in the lu_object_put() debug message to match lu_object_get(). 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10877 Lustre-commit: fd669eba1921 ("LU-10877 lu: fix reference leak") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/31870 Reviewed-by: Andrew Perepechko Reviewed-by: Sergey Cheremencev Reviewed-by: Alex Zhuravlev Reviewed-by: Mikhal Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lu_object.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index d8dfc721..2ab4977 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -184,8 +184,8 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) LASSERT(list_empty(&top->loh_lru)); list_add_tail(&top->loh_lru, &bkt->lsb_lru); percpu_counter_inc(&site->ls_lru_len_counter); - CDEBUG(D_INODE, "Add %p to site lru. hash: %p, bkt: %p\n", - o, site->ls_obj_hash, bkt); + CDEBUG(D_INODE, "Add %p/%p to site lru. hash: %p, bkt: %p\n", + orig, top, site->ls_obj_hash, bkt); cfs_hash_bd_unlock(site->ls_obj_hash, &bd, 1); return; } From patchwork Thu Feb 27 21:08:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409685 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F1C914BC for ; Thu, 27 Feb 2020 21:19:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14932246A2 for ; Thu, 27 Feb 2020 21:19:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14932246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 92CF021FB5C; Thu, 27 Feb 2020 13:19:02 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5AC7521FA96 for ; Thu, 27 Feb 2020 13:18:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7EBB79F1; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7D3F7468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:16 -0500 Message-Id: <1582838290-17243-29-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 028/622] lustre: idl: remove obsolete directory split flags X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The directory split functionality from the old CMD (pre-DNE) feature was never usable in production, and was removed before the DNE 2.4 release. Remove old flags relating to this feature. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-1187 Lustre-commit: 5c53c353fd82 ("LU-1187 idl: remove obsolete directory split flags") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31700 Reviewed-by: James Simmons Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_lib.c | 2 -- fs/lustre/ptlrpc/wiretest.c | 4 ---- include/uapi/linux/lustre/lustre_idl.h | 4 ++-- 3 files changed, 2 insertions(+), 8 deletions(-) diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index d4b2bb9..467503c 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -520,8 +520,6 @@ void mdc_getattr_pack(struct ptlrpc_request *req, u64 valid, u32 flags, &RMF_MDT_BODY); b->mbo_valid = valid; - if (op_data->op_bias & MDS_CHECK_SPLIT) - b->mbo_valid |= OBD_MD_FLCKSPLIT; if (op_data->op_bias & MDS_CROSS_REF) b->mbo_valid |= OBD_MD_FLCROSSREF; b->mbo_eadatasize = ea_size; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 21698cc..bcd0229 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1341,8 +1341,6 @@ void lustre_assert_wire_constants(void) OBD_MD_FLMDSCAPA); LASSERTF(OBD_MD_FLOSSCAPA == (0x0000040000000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLOSSCAPA); - LASSERTF(OBD_MD_FLCKSPLIT == (0x0000080000000000ULL), "found 0x%.16llxULL\n", - OBD_MD_FLCKSPLIT); LASSERTF(OBD_MD_FLCROSSREF == (0x0000100000000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLCROSSREF); LASSERTF(OBD_MD_FLGETATTRLOCK == (0x0000200000000000ULL), "found 0x%.16llxULL\n", @@ -1866,8 +1864,6 @@ void lustre_assert_wire_constants(void) LASSERTF((int)sizeof(((struct ll_fid *)0)->f_type) == 4, "found %lld\n", (long long)(int)sizeof(((struct ll_fid *)0)->f_type)); - LASSERTF(MDS_CHECK_SPLIT == 0x00000001UL, "found 0x%.8xUL\n", - (unsigned int)MDS_CHECK_SPLIT); LASSERTF(MDS_CROSS_REF == 0x00000002UL, "found 0x%.8xUL\n", (unsigned int)MDS_CROSS_REF); LASSERTF(MDS_VTX_BYPASS == 0x00000004UL, 
"found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 0bce63d..589bb81 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1131,7 +1131,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) /* OBD_MD_FLRMTPERM (0x0000010000000000ULL) remote perm, obsolete */ #define OBD_MD_FLMDSCAPA (0x0000020000000000ULL) /* MDS capability */ #define OBD_MD_FLOSSCAPA (0x0000040000000000ULL) /* OSS capability */ -#define OBD_MD_FLCKSPLIT (0x0000080000000000ULL) /* Check split on server */ +/* OBD_MD_FLCKSPLIT (0x0000080000000000ULL) obsolete 2.3.58*/ #define OBD_MD_FLCROSSREF (0x0000100000000000ULL) /* Cross-ref case */ #define OBD_MD_FLGETATTRLOCK (0x0000200000000000ULL) /* Get IOEpoch attributes * under lock; for xattr @@ -1640,7 +1640,7 @@ struct mdt_rec_setattr { #define MDS_ATTR_PROJID 0x10000ULL /* = 65536 */ enum mds_op_bias { - MDS_CHECK_SPLIT = 1 << 0, +/* MDS_CHECK_SPLIT = 1 << 0, obsolete before 2.3.58 */ MDS_CROSS_REF = 1 << 1, MDS_VTX_BYPASS = 1 << 2, MDS_PERM_BYPASS = 1 << 3, From patchwork Thu Feb 27 21:08:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409691 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1CC0614BC for ; Thu, 27 Feb 2020 21:19:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 05377246A1 for ; Thu, 27 Feb 2020 21:19:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 05377246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1188B21FAEC; Thu, 27 Feb 2020 13:19:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BA5FD21FA96 for ; Thu, 27 Feb 2020 13:18:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 819679F4; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8017D46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:17 -0500 Message-Id: <1582838290-17243-30-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 029/622] lustre: mdc: resend quotactl if needed X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang In mdc_quotactl, it is better to resend the quotactl request if reconnection or failover is triggered during the process. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10368 Lustre-commit: d511918e8eb7 ("LU-10368 mdc: resend quotactl if needed") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/31773 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_request.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 5718db2..feac374 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1867,7 +1867,7 @@ static int mdc_ioc_hsm_ct_start(struct obd_export *exp, struct lustre_kernelcomm *lk); static int mdc_quotactl(struct obd_device *unused, struct obd_export *exp, - struct obd_quotactl *oqctl) + struct obd_quotactl *oqctl) { struct ptlrpc_request *req; struct obd_quotactl *oqc; @@ -1884,7 +1884,6 @@ static int mdc_quotactl(struct obd_device *unused, struct obd_export *exp, ptlrpc_request_set_replen(req); ptlrpc_at_set_req_timeout(req); - req->rq_no_resend = 1; rc = ptlrpc_queue_wait(req); if (rc) From patchwork Thu Feb 27 21:08:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 357E914BC for ; Thu, 27 Feb 2020 21:22:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1DED9246A0 for ; Thu, 27 Feb 2020 21:22:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1DED9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; 
spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 155D321FF21; Thu, 27 Feb 2020 13:20:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0BD1A21FA58 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 843D39F5; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8301D46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:18 -0500 Message-Id: <1582838290-17243-31-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 030/622] lustre: obd: create ping sysfs file X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" We have ping in the Lustre debugfs tree. It's a perfect fit for sysfs. Create a sysfs equivalent so we can in time remove the debugfs file. 
WC-bug-id: https://jira.hpdd.intel.com/browse/LU-8066 Lustre-commit: 0100ab268c31 ("LU-8066 obd: final pieces for sysfs/debugfs support") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/28108 Lustre-commit: 6bbae72c6900 ("LU-8066 sysfs: make ping sysfs file read and writable") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33776 Reviewed-by: Dmitry Eremin Reviewed-by: Ben Evans Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 6 ++++-- fs/lustre/mdc/lproc_mdc.c | 7 +++---- fs/lustre/mgc/lproc_mgc.c | 7 +++---- fs/lustre/osc/lproc_osc.c | 7 +++---- fs/lustre/ptlrpc/lproc_ptlrpc.c | 18 ++++++++---------- 5 files changed, 21 insertions(+), 24 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 965f8a1..32d43fb 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -457,8 +457,10 @@ int lprocfs_wr_uint(struct file *file, const char __user *buffer, struct adaptive_timeout; int lprocfs_at_hist_helper(struct seq_file *m, struct adaptive_timeout *at); int lprocfs_rd_timeouts(struct seq_file *m, void *data); -int lprocfs_wr_ping(struct file *file, const char __user *buffer, - size_t count, loff_t *off); + +ssize_t ping_show(struct kobject *kobj, struct attribute *attr, + char *buffer); + int lprocfs_wr_import(struct file *file, const char __user *buffer, size_t count, loff_t *off); int lprocfs_rd_pinger_recov(struct seq_file *m, void *n); diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index f09292e..6b87e76 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -306,6 +306,8 @@ static ssize_t max_mod_rpcs_in_flight_store(struct kobject *kobj, #define mdc_conn_uuid_show conn_uuid_show LUSTRE_RO_ATTR(mdc_conn_uuid); +LUSTRE_RO_ATTR(ping); + static ssize_t mdc_rpc_stats_seq_write(struct file *file, const char 
__user *buf, size_t len, loff_t *off) @@ -454,8 +456,6 @@ static ssize_t mdc_stats_seq_write(struct file *file, } LPROC_SEQ_FOPS(mdc_stats); -LPROC_SEQ_FOPS_WR_ONLY(mdc, ping); - LPROC_SEQ_FOPS_RO_TYPE(mdc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(mdc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(mdc, timeouts); @@ -465,8 +465,6 @@ static ssize_t mdc_stats_seq_write(struct file *file, LPROC_SEQ_FOPS_RW_TYPE(mdc, pinger_recov); static struct lprocfs_vars lprocfs_mdc_obd_vars[] = { - { .name = "ping", - .fops = &mdc_ping_fops }, { .name = "connect_flags", .fops = &mdc_connect_flags_fops }, { .name = "mds_server_uuid", @@ -500,6 +498,7 @@ static ssize_t mdc_stats_seq_write(struct file *file, &lustre_attr_max_mod_rpcs_in_flight.attr, &lustre_attr_max_pages_per_rpc.attr, &lustre_attr_mdc_conn_uuid.attr, + &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/mgc/lproc_mgc.c b/fs/lustre/mgc/lproc_mgc.c index d977d51..4c276f9 100644 --- a/fs/lustre/mgc/lproc_mgc.c +++ b/fs/lustre/mgc/lproc_mgc.c @@ -45,8 +45,6 @@ LPROC_SEQ_FOPS_RO_TYPE(mgc, state); -LPROC_SEQ_FOPS_WR_ONLY(mgc, ping); - static int mgc_ir_state_seq_show(struct seq_file *m, void *v) { return lprocfs_mgc_rd_ir_state(m, m->private); @@ -55,8 +53,6 @@ static int mgc_ir_state_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO(mgc_ir_state); struct lprocfs_vars lprocfs_mgc_obd_vars[] = { - { .name = "ping", - .fops = &mgc_ping_fops }, { .name = "connect_flags", .fops = &mgc_connect_flags_fops }, { .name = "mgs_server_uuid", @@ -73,8 +69,11 @@ struct lprocfs_vars lprocfs_mgc_obd_vars[] = { #define mgs_conn_uuid_show conn_uuid_show LUSTRE_RO_ATTR(mgs_conn_uuid); +LUSTRE_RO_ATTR(ping); + static struct attribute *mgc_attrs[] = { &lustre_attr_mgs_conn_uuid.attr, + &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index df48138..605a236 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -176,6 +176,8 @@ static ssize_t max_dirty_mb_store(struct 
kobject *kobj, #define ost_conn_uuid_show conn_uuid_show LUSTRE_RO_ATTR(ost_conn_uuid); +LUSTRE_RO_ATTR(ping); + static int osc_cached_mb_seq_show(struct seq_file *m, void *v) { struct obd_device *dev = m->private; @@ -601,14 +603,10 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO_TYPE(osc, timeouts); LPROC_SEQ_FOPS_RO_TYPE(osc, state); -LPROC_SEQ_FOPS_WR_ONLY(osc, ping); - LPROC_SEQ_FOPS_RW_TYPE(osc, import); LPROC_SEQ_FOPS_RW_TYPE(osc, pinger_recov); static struct lprocfs_vars lprocfs_osc_obd_vars[] = { - { .name = "ping", - .fops = &osc_ping_fops }, { .name = "connect_flags", .fops = &osc_connect_flags_fops }, { .name = "ost_server_uuid", @@ -812,6 +810,7 @@ void lproc_osc_attach_seqstat(struct obd_device *dev) &lustre_attr_short_io_bytes.attr, &lustre_attr_resend_count.attr, &lustre_attr_ost_conn_uuid.attr, + &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index 3dc99d4..e48a4e8 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -1227,13 +1227,11 @@ void ptlrpc_lprocfs_unregister_obd(struct obd_device *obd) } EXPORT_SYMBOL(ptlrpc_lprocfs_unregister_obd); -#undef BUFLEN - -int lprocfs_wr_ping(struct file *file, const char __user *buffer, - size_t count, loff_t *off) +ssize_t ping_show(struct kobject *kobj, struct attribute *attr, + char *buffer) { - struct seq_file *m = file->private_data; - struct obd_device *obd = m->private; + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); struct ptlrpc_request *req; int rc; @@ -1249,13 +1247,13 @@ int lprocfs_wr_ping(struct file *file, const char __user *buffer, req->rq_send_state = LUSTRE_IMP_FULL; rc = ptlrpc_queue_wait(req); - ptlrpc_req_finished(req); - if (rc >= 0) - return count; + return rc; } -EXPORT_SYMBOL(lprocfs_wr_ping); +EXPORT_SYMBOL(ping_show); + +#undef BUFLEN /* Write the connection UUID to this file to attempt to connect to that node. 
* The connection UUID is a node's primary NID. For example, From patchwork Thu Feb 27 21:08:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409695 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2BF8B138D for ; Thu, 27 Feb 2020 21:19:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14A32246A1 for ; Thu, 27 Feb 2020 21:19:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14A32246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7E65621FB95; Thu, 27 Feb 2020 13:19:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 638CF21FAA9 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8901A9F6; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 85F2C46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:19 -0500 Message-Id: <1582838290-17243-32-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 031/622] lustre: ldlm: change LDLM_POOL_ADD_VAR macro to inline function X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Dmitry Eremin , Oleg Drokin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Simple cleanup to create the inline function ldlm_add_var(). WC-bug-id: https://jira.hpdd.intel.com/browse/LU-8066 Lustre-commit: 05a36534ba2d ("LU-8066 ldlm: move all remaining files from procfs to debugfs") Signed-off-by: Dmitry Eremin Signed-off-by: Oleg Drokin Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/29255 WC-bug-id: https://jira.hpdd.intel.com/browse/LU-3319 Lustre-commit: 4ad445ccd54 ("LU-3319 procfs: move ldlm proc handling over to seq_file") Reviewed-on: http://review.whamcloud.com/7293 Reviewed-by: Dmitry Eremin Reviewed-by: Andreas Dilger Reviewed-by: Peng Tao Reviewed-by: Bob Glossman Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_internal.h | 10 ++++++++++ fs/lustre/ldlm/ldlm_pool.c | 11 ++--------- 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index 6e54521..96dff1d 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -292,6 +292,16 @@ enum ldlm_policy_res { } \ struct __##var##__dummy_write {; } /* semicolon catcher */ +static inline void +ldlm_add_var(struct lprocfs_vars *vars, struct dentry *debugfs_entry, + const char *name, void *data, const struct file_operations *ops) +{ + vars->name = name; + vars->data = data; + vars->fops = ops; + 
ldebugfs_add_vars(debugfs_entry, vars, NULL); +} + static inline int is_granted_or_cancelled(struct ldlm_lock *lock) { int ret = 0; diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c index 04bf5de..d2149a6 100644 --- a/fs/lustre/ldlm/ldlm_pool.c +++ b/fs/lustre/ldlm/ldlm_pool.c @@ -504,14 +504,6 @@ static ssize_t grant_speed_show(struct kobject *kobj, struct attribute *attr, LDLM_POOL_SYSFS_WRITER_NOLOCK_STORE(lock_volume_factor, atomic); LUSTRE_RW_ATTR(lock_volume_factor); -#define LDLM_POOL_ADD_VAR(_name, var, ops) \ - do { \ - pool_vars[0].name = #_name; \ - pool_vars[0].data = var; \ - pool_vars[0].fops = ops; \ - ldebugfs_add_vars(pl->pl_debugfs_entry, pool_vars, NULL);\ - } while (0) - /* These are for pools in /sys/fs/lustre/ldlm/namespaces/.../pool */ static struct attribute *ldlm_pl_attrs[] = { &lustre_attr_grant_speed.attr, @@ -571,7 +563,8 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl) memset(pool_vars, 0, sizeof(pool_vars)); - LDLM_POOL_ADD_VAR(state, pl, &lprocfs_pool_state_fops); + ldlm_add_var(&pool_vars[0], pl->pl_debugfs_entry, "state", pl, + &lprocfs_pool_state_fops); pl->pl_stats = lprocfs_alloc_stats(LDLM_POOL_LAST_STAT - LDLM_POOL_FIRST_STAT, 0); From patchwork Thu Feb 27 21:08:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409717 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E8BA138D for ; Thu, 27 Feb 2020 21:20:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15B91246A1 for ; Thu, 27 Feb 2020 21:20:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15B91246A1 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EBEF221FC40; Thu, 27 Feb 2020 13:19:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A659C21FA75 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8BC6A9F7; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 890C846D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:20 -0500 Message-Id: <1582838290-17243-33-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 032/622] lustre: obdecho: use vmalloc for lnb X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger When allocating the niobuf_local, if there are a large number of (potential) fragments this allocation can be quite large. Use kvmalloc_array() and kvfree() to avoid allocation errors and console noise. 
This was causing sanity test_180c to fail in a VM on occasion, and could also be a problem in real use. WC-bug-id: https://jira.whamcloud.com/browse/LU-10903 Lustre-commit: 8878bab7ae5f ("LU-10903 obdecho: use OBD_ALLOC_LARGE for lnb") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31964 Reviewed-by: Emoly Liu Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdecho/echo_client.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 3984cb4..0735a5a 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1343,7 +1343,8 @@ static int echo_client_prep_commit(const struct lu_env *env, npages = batch >> PAGE_SHIFT; tot_pages = count >> PAGE_SHIFT; - lnb = kcalloc(npages, sizeof(struct niobuf_local), GFP_NOFS); + lnb = kvmalloc_array(npages, sizeof(struct niobuf_local), + GFP_NOFS | __GFP_ZERO); if (!lnb) { ret = -ENOMEM; goto out; @@ -1411,7 +1412,7 @@ static int echo_client_prep_commit(const struct lu_env *env, } out: - kfree(lnb); + kvfree(lnb); return ret; } From patchwork Thu Feb 27 21:08:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409721 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F8A214BC for ; Thu, 27 Feb 2020 21:20:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 77993246A1 for ; Thu, 27 Feb 2020 21:20:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 77993246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 35E5321FC5F; Thu, 27 Feb 2020 13:19:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E9DC721FA75 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8DAE59F8; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8C9C5468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:21 -0500 Message-Id: <1582838290-17243-34-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 033/622] lustre: mdc: deny layout swap for DoM file X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Layout swap is prohibited for DoM files until LU-10177 is implemented. The only exception is a new layout that has the same DoM component. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10910 Lustre-commit: 51c11d7cfaff ("LU-10910 mdd: deny layout swap for DoM file") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32044 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_dev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 80e3120..21dc83e 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -149,7 +149,8 @@ struct ldlm_lock *mdc_dlmlock_at_pgoff(const struct lu_env *env, * writers can share a single PW lock. */ mode = mdc_dom_lock_match(env, osc_export(obj), resname, LDLM_IBITS, - policy, LCK_PR | LCK_PW, &flags, obj, &lockh, + policy, LCK_PR | LCK_PW | LCK_GROUP, &flags, + obj, &lockh, dap_flags & OSC_DAP_FL_CANCELING); if (mode) { lock = ldlm_handle2lock(&lockh); From patchwork Thu Feb 27 21:08:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409699 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2C5F514BC for ; Thu, 27 Feb 2020 21:19:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14489246A1 for ; Thu, 27 Feb 2020 21:19:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14489246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5E1BE21FC5F; Thu, 27 Feb 2020 13:19:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3641621FA63 for ; Thu, 27 Feb 2020 13:18:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 90E979F9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8F7A446A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:22 -0500 Message-Id: <1582838290-17243-35-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 034/622] lustre: mgc: remove obsolete IR swabbing workaround X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The OBD_CONNECT_MNE_SWAB check was added to the MGC for compatibility with servers in the 2.2.0-2.2.55 range (in 2012) with big-endian clients. 2.2 was not an LTS release and is no longer being used. Remove the checks on the client for OBD_CONNECT_MNE_SWAB being set, and assume that the server does not have this bug. This will allow the removal of the rest of this workaround from the server code once there are no more clients depending on the presence of this flag. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-1644 Lustre-commit: a0c644fde340 ("LU-1644 mgc: remove obsolete IR swabbing workaround") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32087 Reviewed-by: John L. Hammond Reviewed-by: Jinshan Xiong Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 4 ---- fs/lustre/mgc/mgc_request.c | 9 +-------- fs/lustre/ptlrpc/import.c | 21 --------------------- 3 files changed, 1 insertion(+), 33 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index 522e5b7..0d7bb0f 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -289,10 +289,6 @@ struct obd_import { imp_resend_replay:1, /* disable normal recovery, for test only. */ imp_no_pinger_recover:1, -#if OBD_OCD_VERSION(3, 0, 53, 0) > LUSTRE_VERSION_CODE - /* need IR MNE swab */ - imp_need_mne_swab:1, -#endif /* import must be reconnected instead of * chosing new connection */ diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index ca4b8a9..c114aa8 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -1436,14 +1436,7 @@ static int mgc_process_recover_log(struct obd_device *obd, goto out; } - mne_swab = !!ptlrpc_rep_need_swab(req); -#if OBD_OCD_VERSION(3, 0, 53, 0) > LUSTRE_VERSION_CODE - /* This import flag means the server did an extra swab of IR MNE - * records (fixed in LU-1252), reverse it here if needed. 
LU-1644 - */ - if (unlikely(req->rq_import->imp_need_mne_swab)) - mne_swab = !mne_swab; -#endif + mne_swab = ptlrpc_rep_need_swab(req); for (i = 0; i < nrpages && ealen > 0; i++) { int rc2; diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index dca4aa0..f69b907 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -780,27 +780,6 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, warned = true; } -#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0) - /* - * Check if server has LU-1252 fix applied to not always swab - * the IR MNE entries. Do this only once per connection. This - * fixup is version-limited, because we don't want to carry the - * OBD_CONNECT_MNE_SWAB flag around forever, just so long as we - * need interop with unpatched 2.2 servers. For newer servers, - * the client will do MNE swabbing only as needed. LU-1644 - */ - if (unlikely((ocd->ocd_connect_flags & OBD_CONNECT_VERSION) && - !(ocd->ocd_connect_flags & OBD_CONNECT_MNE_SWAB) && - OBD_OCD_VERSION_MAJOR(ocd->ocd_version) == 2 && - OBD_OCD_VERSION_MINOR(ocd->ocd_version) == 2 && - OBD_OCD_VERSION_PATCH(ocd->ocd_version) < 55 && - !strcmp(imp->imp_obd->obd_type->typ_name, - LUSTRE_MGC_NAME))) - imp->imp_need_mne_swab = 1; - else /* clear if server was upgraded since last connect */ - imp->imp_need_mne_swab = 0; -#endif - if (ocd->ocd_connect_flags & OBD_CONNECT_CKSUM) { /* * We sent to the server ocd_cksum_types with bits set From patchwork Thu Feb 27 21:08:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409703 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A3EC814BC for ; Thu, 27 Feb 2020 21:19:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 89CD4246A1 for ; Thu, 27 Feb 2020 21:19:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 89CD4246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A4F0A21FBC7; Thu, 27 Feb 2020 13:19:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B84921FAB0 for ; Thu, 27 Feb 2020 13:18:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9429B9FA; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 927E346C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:23 -0500 Message-Id: <1582838290-17243-36-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 035/622] lustre: ptlrpc: add dir migration connect flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Add dir migration connect flag to prevent collision with other features. Though dir migration code exists, it will be reworked, and the new RPC protocol won't be compatible with current one. Also handle the previously-added OBD_CONNECT2_FLR flag. WC-bug-id: https://jira.whamcloud.com/browse/LU-4684 Lustre-commit: 14b98596fa24 ("LU-4684 ptlrpc: add dir migration connect flag") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/31914 Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 8 ++++++-- fs/lustre/ptlrpc/wiretest.c | 4 ++++ include/uapi/linux/lustre/lustre_idl.h | 2 ++ 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 33c76c1..66d2679 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -111,8 +111,12 @@ "compact_obdo", "second_flags", /* flags2 names */ - "file_secctx", - "lockaheadv2", + "file_secctx", /* 0x01 */ + "lockaheadv2", /* 0x02 */ + "dir_migrate", /* 0x04 */ + "unknown", /* 0x08 */ + "unknown", /* 0x10 */ + "flr", /* 0x20 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index bcd0229..46d5e74 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1111,6 +1111,10 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_FILE_SECCTX); LASSERTF(OBD_CONNECT2_LOCKAHEAD == 0x2ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_LOCKAHEAD); + LASSERTF(OBD_CONNECT2_DIR_MIGRATE == 0x4ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_DIR_MIGRATE); + LASSERTF(OBD_CONNECT2_FLR == 0x20ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_FLR); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 
0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 589bb81..e898e67 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -791,6 +791,8 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_LOCKAHEAD 0x2ULL /* ladvise lockahead * v2 */ +#define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir + */ #define OBD_CONNECT2_FLR 0x20ULL /* FLR support */ /* XXX README XXX: From patchwork Thu Feb 27 21:08:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409797 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B707114BC for ; Thu, 27 Feb 2020 21:22:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9D913246A0 for ; Thu, 27 Feb 2020 21:22:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D913246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BE4DC3488DF; Thu, 27 Feb 2020 13:21:02 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D87F721FAB4 for ; Thu, 27 Feb 2020 13:18:25 
-0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 968379FE; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9549346F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:24 -0500 Message-Id: <1582838290-17243-37-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 036/622] lustre: mds: remove obsolete MDS_VTX_BYPASS flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The MDS_VTX_BYPASS flag is only set and never checked. This is true since 2.3.53-66-g54fe979 "LU-2216 mdt: remove obsolete DNE code", but it was already obsolete for a long time before that. WC-bug-id: https://jira.whamcloud.com/browse/LU-6349 Lustre-commit: b99344dda425 ("LU-6349 mds: remove obsolete MDS_VTX_BYPASS flag") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31984 Reviewed-by: Lai Siyao Reviewed-by: John L. 
Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 2 -- include/uapi/linux/lustre/lustre_idl.h | 4 ++-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 46d5e74..c92663b 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1870,8 +1870,6 @@ void lustre_assert_wire_constants(void) LASSERTF(MDS_CROSS_REF == 0x00000002UL, "found 0x%.8xUL\n", (unsigned int)MDS_CROSS_REF); - LASSERTF(MDS_VTX_BYPASS == 0x00000004UL, "found 0x%.8xUL\n", - (unsigned int)MDS_VTX_BYPASS); LASSERTF(MDS_PERM_BYPASS == 0x00000008UL, "found 0x%.8xUL\n", (unsigned int)MDS_PERM_BYPASS); LASSERTF(MDS_QUOTA_IGNORE == 0x00000020UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index e898e67..794e6d6 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1644,11 +1644,11 @@ struct mdt_rec_setattr { enum mds_op_bias { /* MDS_CHECK_SPLIT = 1 << 0, obsolete before 2.3.58 */ MDS_CROSS_REF = 1 << 1, - MDS_VTX_BYPASS = 1 << 2, +/* MDS_VTX_BYPASS = 1 << 2, obsolete since 2.3.54 */ MDS_PERM_BYPASS = 1 << 3, /* MDS_SOM = 1 << 4, obsolete since 2.8.0 */ MDS_QUOTA_IGNORE = 1 << 5, - MDS_CLOSE_CLEANUP = 1 << 6, +/* MDS_CLOSE_CLEANUP = 1 << 6, obsolete since 2.3.51 */ MDS_KEEP_ORPHAN = 1 << 7, MDS_RECOV_OPEN = 1 << 8, MDS_DATA_MODIFIED = 1 << 9, From patchwork Thu Feb 27 21:08:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409725 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A6F9138D for ; Thu, 27 Feb 2020 21:20:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with 
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 43266246A1 for ; Thu, 27 Feb 2020 21:20:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 43266246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 258DE21FE4A; Thu, 27 Feb 2020 13:19:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2520721FABD for ; Thu, 27 Feb 2020 13:18:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9AD96A03; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 98831468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:25 -0500 Message-Id: <1582838290-17243-38-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 037/622] lustre: ldlm: expose dirty age limit for flush-on-glimpse X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Glimpse request may cancel old lock and cause data flush. That helps to cache stat results on client locally early. The time limit was hardcoded to 10s and is exposed now as ns_dirty_age_limit namespace value, it can be set/check via /sys/fs/lustre/ldlm/namespaces//dirty_age_limit WC-bug-id: https://jira.whamcloud.com/browse/LU-10413 Lustre-commit: 69727e45b4c0 ("LU-10413 ldlm: expose dirty age limit for flush-on-glimpse") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32113 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 12 +++++++++++- fs/lustre/ldlm/ldlm_lockd.c | 2 +- fs/lustre/ldlm/ldlm_resource.c | 28 ++++++++++++++++++++++++++++ 3 files changed, 40 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index b1a37f0..8dea9ab 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -60,6 +60,10 @@ #define LDLM_DEFAULT_LRU_SIZE (100 * num_online_cpus()) #define LDLM_DEFAULT_MAX_ALIVE (64 * 60) /* 65 min */ +/* if client lock is unused for that time it can be cancelled if any other + * client shows interest in that lock, e.g. glimpse is occurred. + */ +#define LDLM_DIRTY_AGE_LIMIT (10) #define LDLM_DEFAULT_PARALLEL_AST_LIMIT 1024 /** @@ -412,7 +416,13 @@ struct ldlm_namespace { /** Maximum allowed age (last used time) for locks in the LRU */ ktime_t ns_max_age; - + /** + * Number of seconds since the lock was last used. The client may + * cancel the lock limited by this age and flush related data if + * any other client shows interest in it doing glimpse request. + * This allows to cache stat data locally for such files early. 
+ */ + time64_t ns_dirty_age_limit; /** * Used to rate-limit ldlm_namespace_dump calls. * \see ldlm_namespace_dump. Increased by 10 seconds every time diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 84d73e6..481719b 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -305,7 +305,7 @@ static void ldlm_handle_gl_callback(struct ptlrpc_request *req, !lock->l_readers && !lock->l_writers && ktime_after(ktime_get(), ktime_add(lock->l_last_used, - ktime_set(10, 0)))) { + ktime_set(ns->ns_dirty_age_limit, 0)))) { unlock_res_and_lock(lock); if (ldlm_bl_to_thread_lock(ns, NULL, lock)) ldlm_handle_bl_callback(ns, NULL, lock); diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 4e3c6e7..5e0dd53 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -327,6 +327,32 @@ static ssize_t early_lock_cancel_store(struct kobject *kobj, } LUSTRE_RW_ATTR(early_lock_cancel); +static ssize_t dirty_age_limit_show(struct kobject *kobj, + struct attribute *attr, char *buf) +{ + struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace, + ns_kobj); + + return sprintf(buf, "%llu\n", ns->ns_dirty_age_limit); +} + +static ssize_t dirty_age_limit_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, size_t count) +{ + struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace, + ns_kobj); + unsigned long long tmp; + + if (kstrtoull(buffer, 10, &tmp)) + return -EINVAL; + + ns->ns_dirty_age_limit = tmp; + + return count; +} +LUSTRE_RW_ATTR(dirty_age_limit); + /* These are for namespaces in /sys/fs/lustre/ldlm/namespaces/ */ static struct attribute *ldlm_ns_attrs[] = { &lustre_attr_resource_count.attr, @@ -335,6 +361,7 @@ static ssize_t early_lock_cancel_store(struct kobject *kobj, &lustre_attr_lru_size.attr, &lustre_attr_lru_max_age.attr, &lustre_attr_early_lock_cancel.attr, + &lustre_attr_dirty_age_limit.attr, NULL, }; @@ -653,6 +680,7 @@ 
struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, ns->ns_max_age = ktime_set(LDLM_DEFAULT_MAX_ALIVE, 0); ns->ns_orig_connect_flags = 0; ns->ns_connect_flags = 0; + ns->ns_dirty_age_limit = LDLM_DIRTY_AGE_LIMIT; ns->ns_stopping = 0; rc = ldlm_namespace_sysfs_register(ns); From patchwork Thu Feb 27 21:08:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409707 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E02C138D for ; Thu, 27 Feb 2020 21:19:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 66D99246A1 for ; Thu, 27 Feb 2020 21:19:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 66D99246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E96A121FCB5; Thu, 27 Feb 2020 13:19:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7CF8F21FA96 for ; Thu, 27 Feb 2020 13:18:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9D55DA04; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9B7CC46D; Thu, 27 Feb 2020 
16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:26 -0500 Message-Id: <1582838290-17243-39-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 038/622] lustre: ldlm: IBITS lock convert instead of cancel X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin For IBITS lock it is possible to drop just conflicting bits and keep lock itself instead of cancelling it. Lock convert is only bits downgrade on client and then on server. Patch implements lock convert during blocking AST. 
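The bit downgrade described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct ibits_lock and ibits_try_convert() are hypothetical stand-ins, not the kernel's struct ldlm_lock or ldlm_cli_dropbits(), and the sketch omits resource locking and the RPC to the server. It shows the two cases in which the convert is refused so that the caller falls back to a full cancel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the inodebits state of a lock. */
struct ibits_lock {
	uint64_t bits;          /* inodebits currently held */
	int cancel_pending;     /* nonzero if a cancel is already in progress */
};

/*
 * Drop only the conflicting bits and keep the rest of the lock.
 * Returns 0 after a successful downgrade, or -EINVAL when the caller
 * should continue with a normal cancel instead.
 */
static int ibits_try_convert(struct ibits_lock *lock, uint64_t drop_bits)
{
	/* Nothing would remain after the drop: cancel the whole lock. */
	if (!(lock->bits & ~drop_bits))
		return -EINVAL;

	/* A cancel is racing with the convert: let the cancel win. */
	if (lock->cancel_pending)
		return -EINVAL;

	/* A downgrade only ever clears bits, it never adds any. */
	lock->bits &= ~drop_bits;
	return 0;
}
```

Because the operation only clears bits, a lock that survives the convert covers a strict subset of what it covered before, which is presumably why a converted lock can keep being matched locally.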
WC-bug-id: https://jira.whamcloud.com/browse/LU-10175 Lustre-commit: 37932c4beb98 ("LU-10175 ldlm: IBITS lock convert instead of cancel") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/30202 Reviewed-by: Lai Siyao Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 6 + fs/lustre/include/lustre_dlm_flags.h | 16 +- fs/lustre/ldlm/ldlm_inodebits.c | 92 +++++++- fs/lustre/ldlm/ldlm_internal.h | 2 + fs/lustre/ldlm/ldlm_lock.c | 13 +- fs/lustre/ldlm/ldlm_lockd.c | 18 ++ fs/lustre/ldlm/ldlm_request.c | 198 ++++++++++++++++- fs/lustre/llite/namei.c | 383 ++++++++++++++++++++------------- fs/lustre/ptlrpc/wiretest.c | 2 +- include/uapi/linux/lustre/lustre_idl.h | 1 + 10 files changed, 569 insertions(+), 162 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 8dea9ab..66608a9 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -544,6 +544,7 @@ enum ldlm_cancel_flags { LCF_BL_AST = 0x4, /* Cancel locks marked as LDLM_FL_BL_AST * in the same RPC */ + LCF_CONVERT = 0x8, /* Try to convert IBITS lock before cancel */ }; struct ldlm_flock { @@ -1306,6 +1307,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, enum ldlm_mode mode, u64 *flags, void *lvb, u32 lvb_len, const struct lustre_handle *lockh, int rc); +int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags); int ldlm_cli_update_pool(struct ptlrpc_request *req); int ldlm_cli_cancel(const struct lustre_handle *lockh, enum ldlm_cancel_flags cancel_flags); @@ -1330,6 +1332,10 @@ int ldlm_cli_cancel_list(struct list_head *head, int count, enum ldlm_cancel_flags flags); /** @} ldlm_cli_api */ +int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop); +int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits); +int ldlm_cli_dropbits_list(struct list_head *converts, u64 drop_bits); + /* mds/handler.c */ /* This has to be here 
because recursive inclusion sucks. */ int intent_disposition(struct ldlm_reply *rep, int flag); diff --git a/fs/lustre/include/lustre_dlm_flags.h b/fs/lustre/include/lustre_dlm_flags.h index 22fb595..c8667c8 100644 --- a/fs/lustre/include/lustre_dlm_flags.h +++ b/fs/lustre/include/lustre_dlm_flags.h @@ -26,10 +26,10 @@ */ #ifndef LDLM_ALL_FLAGS_MASK -/** l_flags bits marked as "all_flags" bits */ -#define LDLM_FL_ALL_FLAGS_MASK 0x00FFFFFFC08F932FULL +/* l_flags bits marked as "all_flags" bits */ +#define LDLM_FL_ALL_FLAGS_MASK 0x00FFFFFFC28F932FULL -/** extent, mode, or resource changed */ +/* extent, mode, or resource changed */ #define LDLM_FL_LOCK_CHANGED 0x0000000000000001ULL /* bit 0 */ #define ldlm_is_lock_changed(_l) LDLM_TEST_FLAG((_l), 1ULL << 0) #define ldlm_set_lock_changed(_l) LDLM_SET_FLAG((_l), 1ULL << 0) @@ -146,6 +146,16 @@ #define ldlm_clear_cancel_on_block(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 23) /** + * Flag indicates that lock is being converted (downgraded) during the blocking + * AST instead of cancelling. Used for IBITS locks now and drops conflicting + * bits only, keeping the others. + */ +#define LDLM_FL_CONVERTING 0x0000000002000000ULL /* bit 25 */ +#define ldlm_is_converting(_l) LDLM_TEST_FLAG((_l), 1ULL << 25) +#define ldlm_set_converting(_l) LDLM_SET_FLAG((_l), 1ULL << 25) +#define ldlm_clear_converting(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 25) + +/* * Part of original lockahead implementation, OBD_CONNECT_LOCKAHEAD_OLD. * Reserved temporarily to allow those implementations to keep working. * Will be removed after 2.12 release.
diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c index ea63d9d..e74928e 100644 --- a/fs/lustre/ldlm/ldlm_inodebits.c +++ b/fs/lustre/ldlm/ldlm_inodebits.c @@ -68,7 +68,14 @@ void ldlm_ibits_policy_local_to_wire(const union ldlm_policy_data *lpolicy, wpolicy->l_inodebits.bits = lpolicy->l_inodebits.bits; } -int ldlm_inodebits_drop(struct ldlm_lock *lock, __u64 to_drop) +/** + * Attempt to convert already granted IBITS lock with several bits set to + * a lock with less bits (downgrade). + * + * Such lock conversion is used to keep lock with non-blocking bits instead of + * cancelling it, introduced for better support of DoM files. + */ +int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop) { check_res_locked(lock->l_resource); @@ -89,3 +96,86 @@ int ldlm_inodebits_drop(struct ldlm_lock *lock, __u64 to_drop) return 0; } EXPORT_SYMBOL(ldlm_inodebits_drop); + +/* convert single lock */ +int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) +{ + struct lustre_handle lockh; + u32 flags = 0; + int rc; + + LASSERT(drop_bits); + LASSERT(!lock->l_readers && !lock->l_writers); + + LDLM_DEBUG(lock, "client lock convert START"); + + ldlm_lock2handle(lock, &lockh); + lock_res_and_lock(lock); + /* check if all bits are cancelled */ + if (!(lock->l_policy_data.l_inodebits.bits & ~drop_bits)) { + unlock_res_and_lock(lock); + /* return error to continue with cancel */ + rc = -EINVAL; + goto exit; + } + + /* check if there is race with cancel */ + if (ldlm_is_canceling(lock) || ldlm_is_cancel(lock)) { + unlock_res_and_lock(lock); + rc = -EINVAL; + goto exit; + } + + /* clear cbpending flag early, it is safe to match lock right after + * client convert because it is downgrade always. 
+ */ + ldlm_clear_cbpending(lock); + ldlm_clear_bl_ast(lock); + + /* If lock is being converted already, check drop bits first */ + if (ldlm_is_converting(lock)) { + /* raced lock convert, lock inodebits are remaining bits + * so check if they are conflicting with new convert or not. + */ + if (!(lock->l_policy_data.l_inodebits.bits & drop_bits)) { + unlock_res_and_lock(lock); + rc = 0; + goto exit; + } + /* Otherwise drop new conflicting bits in new convert */ + } + ldlm_set_converting(lock); + /* from all bits of blocking lock leave only conflicting */ + drop_bits &= lock->l_policy_data.l_inodebits.bits; + /* save them in cancel_bits, so l_blocking_ast will know + * which bits from the current lock were dropped. + */ + lock->l_policy_data.l_inodebits.cancel_bits = drop_bits; + /* Finally clear these bits in lock ibits */ + ldlm_inodebits_drop(lock, drop_bits); + unlock_res_and_lock(lock); + /* Finally call cancel callback for remaining bits only. + * It is important to have converting flag during that + * so blocking_ast callback can distinguish convert from + * cancels. 
+ */ + if (lock->l_blocking_ast) + lock->l_blocking_ast(lock, NULL, lock->l_ast_data, + LDLM_CB_CANCELING); + + /* now notify server about convert */ + rc = ldlm_cli_convert(lock, &flags); + if (rc) { + lock_res_and_lock(lock); + ldlm_clear_converting(lock); + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + unlock_res_and_lock(lock); + LASSERT(list_empty(&lock->l_lru)); + goto exit; + } + +exit: + LDLM_DEBUG(lock, "client lock convert END"); + return rc; +} diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index 96dff1d..ec68713 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -153,7 +153,9 @@ int ldlm_run_ast_work(struct ldlm_namespace *ns, struct list_head *rpc_list, #define ldlm_lock_remove_from_lru(lock) \ ldlm_lock_remove_from_lru_check(lock, ktime_set(0, 0)) int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock); +void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock); void ldlm_lock_destroy_nolock(struct ldlm_lock *lock); +void ldlm_grant_lock_with_skiplist(struct ldlm_lock *lock); /* ldlm_lockd.c */ int ldlm_bl_to_thread_lock(struct ldlm_namespace *ns, struct ldlm_lock_desc *ld, diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index aa19b89..9847c43 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -241,7 +241,7 @@ int ldlm_lock_remove_from_lru_check(struct ldlm_lock *lock, ktime_t last_use) /** * Adds LDLM lock @lock to namespace LRU. Assumes LRU is already locked. 
*/ -static void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock) +void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock) { struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); @@ -791,7 +791,8 @@ void ldlm_lock_decref_internal(struct ldlm_lock *lock, enum ldlm_mode mode) ldlm_bl_to_thread_lock(ns, NULL, lock) != 0) ldlm_handle_bl_callback(ns, NULL, lock); } else if (!lock->l_readers && !lock->l_writers && - !ldlm_is_no_lru(lock) && !ldlm_is_bl_ast(lock)) { + !ldlm_is_no_lru(lock) && !ldlm_is_bl_ast(lock) && + !ldlm_is_converting(lock)) { LDLM_DEBUG(lock, "add lock into lru list"); /* If this is a client-side namespace and this was the last @@ -1648,6 +1649,13 @@ enum ldlm_error ldlm_lock_enqueue(struct ldlm_namespace *ns, unlock_res_and_lock(lock); ldlm_lock2desc(lock->l_blocking_lock, &d); + /* copy blocking lock ibits in cancel_bits as well, + * new client may use them for lock convert and it is + * important to use new field to convert locks from + * new servers only + */ + d.l_policy_data.l_inodebits.cancel_bits = + lock->l_blocking_lock->l_policy_data.l_inodebits.bits; rc = lock->l_blocking_ast(lock, &d, (void *)arg, LDLM_CB_BLOCKING); LDLM_LOCK_RELEASE(lock->l_blocking_lock); @@ -1896,6 +1904,7 @@ void ldlm_lock_cancel(struct ldlm_lock *lock) */ if (lock->l_readers || lock->l_writers) { LDLM_ERROR(lock, "lock still has references"); + unlock_res_and_lock(lock); LBUG(); } diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 481719b..b50a3f7 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -118,6 +118,24 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, LDLM_DEBUG(lock, "client blocking AST callback handler"); lock_res_and_lock(lock); + + /* set bits to cancel for this lock for possible lock convert */ + if (lock->l_resource->lr_type == LDLM_IBITS) { + /* Lock description contains policy of blocking lock, + * and its cancel_bits is used to pass conflicting bits. 
+ * NOTE: ld can be NULL or can be not NULL but zeroed if + * passed from ldlm_bl_thread_blwi(), check below used bits + * in ld to make sure it is valid description. + */ + if (ld && ld->l_policy_data.l_inodebits.bits) + lock->l_policy_data.l_inodebits.cancel_bits = + ld->l_policy_data.l_inodebits.cancel_bits; + /* if there is no valid ld and lock is cbpending already + * then cancel_bits should be kept, otherwise it is zeroed. + */ + else if (!ldlm_is_cbpending(lock)) + lock->l_policy_data.l_inodebits.cancel_bits = 0; + } ldlm_set_cbpending(lock); if (ldlm_is_cancel_on_block(lock)) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 92e4f69..5ec0da5 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -818,6 +818,177 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, EXPORT_SYMBOL(ldlm_cli_enqueue); /** + * Client-side lock convert reply handling. + * + * Finish client lock converting, checks for concurrent converts + * and clear 'converting' flag so lock can be placed back into LRU. 
+ */ +static int lock_convert_interpret(const struct lu_env *env, + struct ptlrpc_request *req, + struct ldlm_async_args *aa, int rc) +{ + struct ldlm_lock *lock; + struct ldlm_reply *reply; + + lock = ldlm_handle2lock(&aa->lock_handle); + if (!lock) { + LDLM_DEBUG_NOLOCK("convert ACK for unknown local cookie %#llx", + aa->lock_handle.cookie); + return -ESTALE; + } + + LDLM_DEBUG(lock, "CONVERTED lock:"); + + if (rc != ELDLM_OK) + goto out; + + reply = req_capsule_server_get(&req->rq_pill, &RMF_DLM_REP); + if (!reply) { + rc = -EPROTO; + goto out; + } + + if (reply->lock_handle.cookie != aa->lock_handle.cookie) { + LDLM_ERROR(lock, + "convert ACK with wrong lock cookie %#llx but cookie %#llx from server %s id %s\n", + aa->lock_handle.cookie, reply->lock_handle.cookie, + req->rq_export->exp_client_uuid.uuid, + libcfs_id2str(req->rq_peer)); + rc = -ESTALE; + goto out; + } + + lock_res_and_lock(lock); + /* Lock convert is sent for any new bits to drop, the converting flag + * is dropped when ibits on server are the same as on client. Meanwhile + * a later convert may be replied to first and clear the converting + * flag, so in case of such a race just exit here if the lock no + * longer has the converting flag. + */ + if (!ldlm_is_converting(lock)) { + LDLM_DEBUG(lock, + "convert ACK for lock without converting flag, reply ibits %#llx", + reply->lock_desc.l_policy_data.l_inodebits.bits); + } else if (reply->lock_desc.l_policy_data.l_inodebits.bits != + lock->l_policy_data.l_inodebits.bits) { + /* Compare server returned lock ibits and local lock ibits + * if they are the same we consider conversion is done, + * otherwise we have more converts inflight and keep + * converting flag. + */ + LDLM_DEBUG(lock, "convert ACK with ibits %#llx\n", + reply->lock_desc.l_policy_data.l_inodebits.bits); + } else { + ldlm_clear_converting(lock); + + /* Concurrent BL AST has arrived, it may cause another convert + * or cancel so just exit here.
+ */ + if (!ldlm_is_bl_ast(lock)) { + struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); + + /* Drop cancel_bits since there are no more converts + * and put lock into LRU if it is not there yet. + */ + lock->l_policy_data.l_inodebits.cancel_bits = 0; + spin_lock(&ns->ns_lock); + if (!list_empty(&lock->l_lru)) + ldlm_lock_remove_from_lru_nolock(lock); + ldlm_lock_add_to_lru_nolock(lock); + spin_unlock(&ns->ns_lock); + } + } + unlock_res_and_lock(lock); +out: + if (rc) { + lock_res_and_lock(lock); + if (ldlm_is_converting(lock)) { + LASSERT(list_empty(&lock->l_lru)); + ldlm_clear_converting(lock); + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + } + unlock_res_and_lock(lock); + } + + LDLM_LOCK_PUT(lock); + return rc; +} + +/** + * Client-side IBITS lock convert. + * + * Inform server that lock has been converted instead of canceling. + * Server finishes convert on own side and does reprocess to grant + * all related waiting locks. + * + * Since convert means only ibits downgrading, client doesn't need to + * wait for server reply to finish local converting process so this request + * is made asynchronous. 
+ * + */ +int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) +{ + struct ldlm_request *body; + struct ptlrpc_request *req; + struct ldlm_async_args *aa; + struct obd_export *exp = lock->l_conn_export; + + if (!exp) { + LDLM_ERROR(lock, "convert must not be called on local locks."); + return -EINVAL; + } + + if (lock->l_resource->lr_type != LDLM_IBITS) { + LDLM_ERROR(lock, "convert works with IBITS locks only."); + return -EINVAL; + } + + LDLM_DEBUG(lock, "client-side convert"); + + req = ptlrpc_request_alloc_pack(class_exp2cliimp(exp), + &RQF_LDLM_CONVERT, LUSTRE_DLM_VERSION, + LDLM_CONVERT); + if (!req) + return -ENOMEM; + + body = req_capsule_client_get(&req->rq_pill, &RMF_DLM_REQ); + body->lock_handle[0] = lock->l_remote_handle; + + body->lock_desc.l_req_mode = lock->l_req_mode; + body->lock_desc.l_granted_mode = lock->l_granted_mode; + + body->lock_desc.l_policy_data.l_inodebits.bits = + lock->l_policy_data.l_inodebits.bits; + body->lock_desc.l_policy_data.l_inodebits.cancel_bits = 0; + + body->lock_flags = ldlm_flags_to_wire(*flags); + body->lock_count = 1; + + ptlrpc_request_set_replen(req); + + /* That could be useful to use cancel portals for convert as well + * as high-priority handling. This will require changes in + * ldlm_cancel_handler to understand convert RPC as well. + * + * req->rq_request_portal = LDLM_CANCEL_REQUEST_PORTAL; + * req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; + */ + ptlrpc_at_set_req_timeout(req); + + if (exp->exp_obd->obd_svc_stats) + lprocfs_counter_incr(exp->exp_obd->obd_svc_stats, + LDLM_CONVERT - LDLM_FIRST_OPC); + + aa = ptlrpc_req_async_args(aa, req); + ldlm_lock2handle(lock, &aa->lock_handle); + req->rq_interpret_reply = (ptlrpc_interpterer_t)lock_convert_interpret; + + ptlrpcd_add_req(req); + return 0; +} + +/** * Cancel locks locally. 
* * Returns: LDLM_FL_LOCAL_ONLY if there is no need for a CANCEL RPC @@ -1057,6 +1228,19 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, return 0; } + /* Convert lock bits instead of cancel for IBITS locks */ + if (cancel_flags & LCF_CONVERT) { + LASSERT(lock->l_resource->lr_type == LDLM_IBITS); + LASSERT(lock->l_policy_data.l_inodebits.cancel_bits != 0); + + rc = ldlm_cli_dropbits(lock, + lock->l_policy_data.l_inodebits.cancel_bits); + if (rc == 0) { + LDLM_LOCK_RELEASE(lock); + return 0; + } + } + lock_res_and_lock(lock); /* Lock is being canceled and the caller doesn't want to wait */ if (ldlm_is_canceling(lock)) { @@ -1069,6 +1253,15 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, return 0; } + /* Lock is being converted, cancel it immediately. + * When convert will end, it releases lock and it will be gone. + */ + if (ldlm_is_converting(lock)) { + /* set back flags removed by convert */ + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + } + ldlm_set_canceling(lock); unlock_res_and_lock(lock); @@ -1439,7 +1632,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, /* Somebody is already doing CANCEL. No need for this * lock in LRU, do not traverse it again. */ - if (!ldlm_is_canceling(lock)) + if (!ldlm_is_canceling(lock) || + !ldlm_is_converting(lock)) break; ldlm_lock_remove_from_lru_nolock(lock); @@ -1483,7 +1677,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, lock_res_and_lock(lock); /* Check flags again under the lock. 
*/ - if (ldlm_is_canceling(lock) || + if (ldlm_is_canceling(lock) || ldlm_is_converting(lock) || (ldlm_lock_remove_from_lru_check(lock, last_use) == 0)) { /* Another thread is removing lock from LRU, or * somebody is already doing CANCEL, or there diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 1b5e270..8b1a1ca 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -213,184 +213,261 @@ int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) return rc; } -int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, - void *data, int flag) +void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) { - struct lustre_handle lockh; + struct inode *inode = ll_inode_from_resource_lock(lock); + u64 bits = to_cancel; int rc; - switch (flag) { - case LDLM_CB_BLOCKING: - ldlm_lock2handle(lock, &lockh); - rc = ldlm_cli_cancel(&lockh, LCF_ASYNC); - if (rc < 0) { - CDEBUG(D_INODE, "ldlm_cli_cancel: rc = %d\n", rc); - return rc; - } - break; - case LDLM_CB_CANCELING: { - struct inode *inode = ll_inode_from_resource_lock(lock); - u64 bits = lock->l_policy_data.l_inodebits.bits; + if (!inode) + return; - if (!inode) - break; + if (!fid_res_name_eq(ll_inode2fid(inode), + &lock->l_resource->lr_name)) { + LDLM_ERROR(lock, + "data mismatch with object " DFID "(%p)", + PFID(ll_inode2fid(inode)), inode); + LBUG(); + } - /* Invalidate all dentries associated with this inode */ - LASSERT(ldlm_is_canceling(lock)); + if (bits & MDS_INODELOCK_XATTR) { + if (S_ISDIR(inode->i_mode)) + ll_i2info(inode)->lli_def_stripe_offset = -1; + ll_xattr_cache_destroy(inode); + bits &= ~MDS_INODELOCK_XATTR; + } - if (!fid_res_name_eq(ll_inode2fid(inode), - &lock->l_resource->lr_name)) { - LDLM_ERROR(lock, - "data mismatch with object " DFID "(%p)", - PFID(ll_inode2fid(inode)), inode); + /* For OPEN locks we differentiate between lock modes + * LCK_CR, LCK_CW, LCK_PR - bug 22891 + */ + if (bits & MDS_INODELOCK_OPEN) + ll_have_md_lock(inode, 
&bits, lock->l_req_mode); + + if (bits & MDS_INODELOCK_OPEN) { + fmode_t fmode; + + switch (lock->l_req_mode) { + case LCK_CW: + fmode = FMODE_WRITE; + break; + case LCK_PR: + fmode = FMODE_EXEC; + break; + case LCK_CR: + fmode = FMODE_READ; + break; + default: + LDLM_ERROR(lock, "bad lock mode for OPEN lock"); LBUG(); } - if (bits & MDS_INODELOCK_XATTR) { - if (S_ISDIR(inode->i_mode)) - ll_i2info(inode)->lli_def_stripe_offset = -1; - ll_xattr_cache_destroy(inode); - bits &= ~MDS_INODELOCK_XATTR; - } + ll_md_real_close(inode, fmode); - /* For OPEN locks we differentiate between lock modes - * LCK_CR, LCK_CW, LCK_PR - bug 22891 - */ - if (bits & MDS_INODELOCK_OPEN) - ll_have_md_lock(inode, &bits, lock->l_req_mode); - - if (bits & MDS_INODELOCK_OPEN) { - fmode_t fmode; - - switch (lock->l_req_mode) { - case LCK_CW: - fmode = FMODE_WRITE; - break; - case LCK_PR: - fmode = FMODE_EXEC; - break; - case LCK_CR: - fmode = FMODE_READ; - break; - default: - LDLM_ERROR(lock, "bad lock mode for OPEN lock"); - LBUG(); - } + bits &= ~MDS_INODELOCK_OPEN; + } - ll_md_real_close(inode, fmode); - } + if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | + MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM | + MDS_INODELOCK_DOM)) + ll_have_md_lock(inode, &bits, LCK_MINMODE); + + if (bits & MDS_INODELOCK_DOM) { + rc = ll_dom_lock_cancel(inode, lock); + if (rc < 0) + CDEBUG(D_INODE, "cannot flush DoM data " + DFID": rc = %d\n", + PFID(ll_inode2fid(inode)), rc); + lock_res_and_lock(lock); + ldlm_set_kms_ignore(lock); + unlock_res_and_lock(lock); + } - if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | - MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM | - MDS_INODELOCK_DOM)) - ll_have_md_lock(inode, &bits, LCK_MINMODE); - - if (bits & MDS_INODELOCK_DOM) { - rc = ll_dom_lock_cancel(inode, lock); - if (rc < 0) - CDEBUG(D_INODE, "cannot flush DoM data " - DFID": rc = %d\n", - PFID(ll_inode2fid(inode)), rc); - lock_res_and_lock(lock); - ldlm_set_kms_ignore(lock); - unlock_res_and_lock(lock); - 
bits &= ~MDS_INODELOCK_DOM; - } + if (bits & MDS_INODELOCK_LAYOUT) { + struct cl_object_conf conf = { + .coc_opc = OBJECT_CONF_INVALIDATE, + .coc_inode = inode, + }; - if (bits & MDS_INODELOCK_LAYOUT) { - struct cl_object_conf conf = { - .coc_opc = OBJECT_CONF_INVALIDATE, - .coc_inode = inode, - }; - - rc = ll_layout_conf(inode, &conf); - if (rc < 0) - CDEBUG(D_INODE, "cannot invalidate layout of " - DFID ": rc = %d\n", - PFID(ll_inode2fid(inode)), rc); - } + rc = ll_layout_conf(inode, &conf); + if (rc < 0) + CDEBUG(D_INODE, "cannot invalidate layout of " + DFID ": rc = %d\n", + PFID(ll_inode2fid(inode)), rc); + } - if (bits & MDS_INODELOCK_UPDATE) { - set_bit(LLIF_UPDATE_ATIME, - &ll_i2info(inode)->lli_flags); - } + if (bits & MDS_INODELOCK_UPDATE) + set_bit(LLIF_UPDATE_ATIME, + &ll_i2info(inode)->lli_flags); - if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) { - struct ll_inode_info *lli = ll_i2info(inode); + if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) { + struct ll_inode_info *lli = ll_i2info(inode); - CDEBUG(D_INODE, - "invalidating inode " DFID " lli = %p, pfid = " DFID "\n", - PFID(ll_inode2fid(inode)), lli, - PFID(&lli->lli_pfid)); + CDEBUG(D_INODE, + "invalidating inode "DFID" lli = %p, pfid = "DFID"\n", + PFID(ll_inode2fid(inode)), + lli, PFID(&lli->lli_pfid)); + truncate_inode_pages(inode->i_mapping, 0); - truncate_inode_pages(inode->i_mapping, 0); + if (unlikely(!fid_is_zero(&lli->lli_pfid))) { + struct inode *master_inode = NULL; + unsigned long hash; - if (unlikely(!fid_is_zero(&lli->lli_pfid))) { - struct inode *master_inode = NULL; - unsigned long hash; + /* + * This is slave inode, since all of the child dentry + * is connected on the master inode, so we have to + * invalidate the negative children on master inode + */ + CDEBUG(D_INODE, + "Invalidate s" DFID " m" DFID "\n", + PFID(ll_inode2fid(inode)), PFID(&lli->lli_pfid)); - /* - * This is slave inode, since all of the child - * dentry is connected on the master inode, 
so - * we have to invalidate the negative children - * on master inode - */ - CDEBUG(D_INODE, - "Invalidate s" DFID " m" DFID "\n", - PFID(ll_inode2fid(inode)), - PFID(&lli->lli_pfid)); - - hash = cl_fid_build_ino(&lli->lli_pfid, - ll_need_32bit_api(ll_i2sbi(inode))); - /* - * Do not lookup the inode with ilookup5, - * otherwise it will cause dead lock, - * - * 1. Client1 send chmod req to the MDT0, then - * on MDT0, it enqueues master and all of its - * slaves lock, (mdt_attr_set() -> - * mdt_lock_slaves()), after gets master and - * stripe0 lock, it will send the enqueue req - * (for stripe1) to MDT1, then MDT1 finds the - * lock has been granted to client2. Then MDT1 - * sends blocking ast to client2. - * - * 2. At the same time, client2 tries to unlink - * the striped dir (rm -rf striped_dir), and - * during lookup, it will hold the master inode - * of the striped directory, whose inode state - * is NEW, then tries to revalidate all of its - * slaves, (ll_prep_inode()->ll_iget()-> - * ll_read_inode2()-> ll_update_inode().). And - * it will be blocked on the server side because - * of 1. - * - * 3. Then the client get the blocking_ast req, - * cancel the lock, but being blocked if using - * ->ilookup5()), because master inode state is - * NEW. - */ - master_inode = ilookup5_nowait(inode->i_sb, - hash, - ll_test_inode_by_fid, - (void *)&lli->lli_pfid); - if (master_inode) { - ll_invalidate_negative_children(master_inode); - iput(master_inode); - } - } else { - ll_invalidate_negative_children(inode); + hash = cl_fid_build_ino(&lli->lli_pfid, + ll_need_32bit_api( + ll_i2sbi(inode))); + /* + * Do not lookup the inode with ilookup5, otherwise + * it will cause dead lock, + * 1. 
Client1 send chmod req to the MDT0, then on MDT0, + * it enqueues master and all of its slaves lock, + * (mdt_attr_set() -> mdt_lock_slaves()), after gets + * master and stripe0 lock, it will send the enqueue + * req (for stripe1) to MDT1, then MDT1 finds the lock + * has been granted to client2. Then MDT1 sends blocking + * ast to client2. + * 2. At the same time, client2 tries to unlink + * the striped dir (rm -rf striped_dir), and during + * lookup, it will hold the master inode of the striped + * directory, whose inode state is NEW, then tries to + * revalidate all of its slaves, (ll_prep_inode()-> + * ll_iget()->ll_read_inode2()-> ll_update_inode().). + * And it will be blocked on the server side because + * of 1. + * 3. Then the client get the blocking_ast req, cancel + * the lock, but being blocked if using ->ilookup5()), + * because master inode state is NEW. + */ + master_inode = ilookup5_nowait(inode->i_sb, hash, + ll_test_inode_by_fid, + (void *)&lli->lli_pfid); + if (master_inode) { + ll_invalidate_negative_children(master_inode); + iput(master_inode); } + } else { + ll_invalidate_negative_children(inode); } + } - if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) && - inode->i_sb->s_root && - !is_root_inode(inode)) - ll_invalidate_aliases(inode); + if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) && + inode->i_sb->s_root && + !is_root_inode(inode)) + ll_invalidate_aliases(inode); - iput(inode); + iput(inode); +} + +/* Check if the given lock may be downgraded instead of canceling and + * that convert is really needed. 
+ */ +int ll_md_need_convert(struct ldlm_lock *lock) +{ + struct inode *inode; + u64 wanted = lock->l_policy_data.l_inodebits.cancel_bits; + u64 bits = lock->l_policy_data.l_inodebits.bits & ~wanted; + enum ldlm_mode mode = LCK_MINMODE; + + if (!wanted || !bits || ldlm_is_cancel(lock)) + return 0; + + /* do not convert locks other than DOM for now */ + if (!((bits | wanted) & MDS_INODELOCK_DOM)) + return 0; + + /* We may have already remaining bits in some other lock so + * lock convert will leave us just extra lock for the same bit. + * Check if client has other lock with the same bits and the same + * or lower mode and don't convert if any. + */ + switch (lock->l_req_mode) { + case LCK_PR: + mode = LCK_PR; + /* fall-through */ + case LCK_PW: + mode |= LCK_CR; + break; + case LCK_CW: + mode = LCK_CW; + /* fall-through */ + case LCK_CR: + mode |= LCK_CR; break; + default: + /* do not convert other modes */ + return 0; } + + /* is lock too old to be converted? */ + lock_res_and_lock(lock); + if (ktime_after(ktime_get(), + ktime_add(lock->l_last_used, + ktime_set(10, 0)))) { + unlock_res_and_lock(lock); + return 0; + } + unlock_res_and_lock(lock); + + inode = ll_inode_from_resource_lock(lock); + ll_have_md_lock(inode, &bits, mode); + iput(inode); + return !!(bits); +} + +int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, + void *data, int flag) +{ + struct lustre_handle lockh; + u64 bits = lock->l_policy_data.l_inodebits.bits; + int rc; + + switch (flag) { + case LDLM_CB_BLOCKING: + { + u64 cancel_flags = LCF_ASYNC; + + if (ll_md_need_convert(lock)) { + cancel_flags |= LCF_CONVERT; + /* For lock convert some cancel actions may require + * this lock with non-dropped canceled bits, e.g. page + * flush for DOM lock. So call ll_lock_cancel_bits() + * here while canceled bits are still set.
+ */ + bits = lock->l_policy_data.l_inodebits.cancel_bits; + if (bits & MDS_INODELOCK_DOM) + ll_lock_cancel_bits(lock, MDS_INODELOCK_DOM); + } + ldlm_lock2handle(lock, &lockh); + rc = ldlm_cli_cancel(&lockh, cancel_flags); + if (rc < 0) { + CDEBUG(D_INODE, "ldlm_cli_cancel: rc = %d\n", rc); + return rc; + } + break; + } + case LDLM_CB_CANCELING: + if (ldlm_is_converting(lock)) { + /* this is called on already converted lock, so + * ibits has remained bits only and cancel_bits + * are bits that were dropped. + * Note that DOM lock is handled prior lock convert + * and is excluded here. + */ + bits = lock->l_policy_data.l_inodebits.cancel_bits & + ~MDS_INODELOCK_DOM; + } else { + LASSERT(ldlm_is_canceling(lock)); + } + ll_lock_cancel_bits(lock, bits); + break; default: LBUG(); } diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index c92663b..b14d301c 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -3027,7 +3027,7 @@ void lustre_assert_wire_constants(void) (long long)(int)sizeof(((struct ldlm_extent *)0)->gid)); /* Checks for struct ldlm_inodebits */ - LASSERTF((int)sizeof(struct ldlm_inodebits) == 8, "found %lld\n", + LASSERTF((int)sizeof(struct ldlm_inodebits) == 16, "found %lld\n", (long long)(int)sizeof(struct ldlm_inodebits)); LASSERTF((int)offsetof(struct ldlm_inodebits, bits) == 0, "found %lld\n", (long long)(int)offsetof(struct ldlm_inodebits, bits)); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 794e6d6..2403b89 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2120,6 +2120,7 @@ static inline bool ldlm_extent_equal(const struct ldlm_extent *ex1, struct ldlm_inodebits { __u64 bits; + __u64 cancel_bits; /* for lock convert */ }; struct ldlm_flock_wire { From patchwork Thu Feb 27 21:08:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
James Simmons X-Patchwork-Id: 11409729 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D49ED14BC for ; Thu, 27 Feb 2020 21:20:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BD16A246A1 for ; Thu, 27 Feb 2020 21:20:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD16A246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1585521FECF; Thu, 27 Feb 2020 13:19:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB40B21FA6E for ; Thu, 27 Feb 2020 13:18:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A0540A05; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9EAD546A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:27 -0500 Message-Id: <1582838290-17243-40-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 039/622] lustre: ptlrpc: fix return type of boolean 
functions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Some functions are returning type int with values 0 or 1 when they could be returning bool. Fix up the return type of: lustre_req_swabbed() lustre_rep_swabbed() ptlrpc_req_need_swab() ptlrpc_rep_need_swab() ptlrpc_buf_need_swab() WC-bug-id: https://jira.whamcloud.com/browse/LU-1644 Lustre-commit: e2cac9fb9baf ("LU-1644 ptlrpc: fix return type of boolean functions") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32088 Reviewed-by: John L. Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 20 ++++++++++---------- fs/lustre/ptlrpc/pack_generic.c | 9 ++++----- fs/lustre/ptlrpc/sec_plain.c | 7 +++---- 3 files changed, 17 insertions(+), 19 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 961b8cb..0231011 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -953,35 +953,35 @@ static inline bool ptlrpc_nrs_req_can_move(struct ptlrpc_request *req) /** @} nrs */ /** - * Returns 1 if request buffer at offset @index was already swabbed + * Returns true if request buffer at offset @index was already swabbed */ -static inline int lustre_req_swabbed(struct ptlrpc_request *req, size_t index) +static inline bool lustre_req_swabbed(struct ptlrpc_request *req, size_t index) { LASSERT(index < sizeof(req->rq_req_swab_mask) * 8); return req->rq_req_swab_mask & (1 << index); } /** - * Returns 1 if request reply buffer at offset @index was already swabbed + * Returns true if request reply buffer at offset @index was already swabbed */ -static inline int 
lustre_rep_swabbed(struct ptlrpc_request *req, size_t index) +static inline bool lustre_rep_swabbed(struct ptlrpc_request *req, size_t index) { LASSERT(index < sizeof(req->rq_rep_swab_mask) * 8); return req->rq_rep_swab_mask & (1 << index); } /** - * Returns 1 if request needs to be swabbed into local cpu byteorder + * Returns true if request needs to be swabbed into local cpu byteorder */ -static inline int ptlrpc_req_need_swab(struct ptlrpc_request *req) +static inline bool ptlrpc_req_need_swab(struct ptlrpc_request *req) { return lustre_req_swabbed(req, MSG_PTLRPC_HEADER_OFF); } /** - * Returns 1 if request reply needs to be swabbed into local cpu byteorder + * Returns true if request reply needs to be swabbed into local cpu byteorder */ -static inline int ptlrpc_rep_need_swab(struct ptlrpc_request *req) +static inline bool ptlrpc_rep_need_swab(struct ptlrpc_request *req) { return lustre_rep_swabbed(req, MSG_PTLRPC_HEADER_OFF); } @@ -1999,8 +1999,8 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf, * * @{ */ -int ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, - u32 index); +bool ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, + u32 index); void ptlrpc_buf_set_swabbed(struct ptlrpc_request *req, const int inout, u32 index); int ptlrpc_unpack_rep_msg(struct ptlrpc_request *req, int len); diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index bc5e513..9cea826 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -78,15 +78,14 @@ void ptlrpc_buf_set_swabbed(struct ptlrpc_request *req, const int inout, lustre_set_rep_swabbed(req, index); } -int ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, - u32 index) +bool ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, + u32 index) { if (inout) return (ptlrpc_req_need_swab(req) && !lustre_req_swabbed(req, index)); - else - return (ptlrpc_rep_need_swab(req) && - 
!lustre_rep_swabbed(req, index)); + + return (ptlrpc_rep_need_swab(req) && !lustre_rep_swabbed(req, index)); } /* early reply size */ diff --git a/fs/lustre/ptlrpc/sec_plain.c b/fs/lustre/ptlrpc/sec_plain.c index 2358c3f..93a9a17 100644 --- a/fs/lustre/ptlrpc/sec_plain.c +++ b/fs/lustre/ptlrpc/sec_plain.c @@ -217,7 +217,7 @@ int plain_ctx_verify(struct ptlrpc_cli_ctx *ctx, struct ptlrpc_request *req) struct lustre_msg *msg = req->rq_repdata; struct plain_header *phdr; u32 cksum; - int swabbed; + bool swabbed; if (msg->lm_bufcount != PLAIN_PACK_SEGMENTS) { CERROR("unexpected reply buf count %u\n", msg->lm_bufcount); @@ -715,12 +715,11 @@ int plain_enlarge_reqbuf(struct ptlrpc_sec *sec, .sc_policy = &plain_policy, }; -static -int plain_accept(struct ptlrpc_request *req) +static int plain_accept(struct ptlrpc_request *req) { struct lustre_msg *msg = req->rq_reqbuf; struct plain_header *phdr; - int swabbed; + bool swabbed; LASSERT(SPTLRPC_FLVR_POLICY(req->rq_flvr.sf_rpc) == SPTLRPC_POLICY_PLAIN); From patchwork Thu Feb 27 21:08:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409733 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A0CE2138D for ; Thu, 27 Feb 2020 21:20:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 88ADE246A1 for ; Thu, 27 Feb 2020 21:20:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 88ADE246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 882E221CB88; Thu, 27 Feb 2020 13:19:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3D93821FA6E for ; Thu, 27 Feb 2020 13:18:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A3C3BA1C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A204746C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:28 -0500 Message-Id: <1582838290-17243-41-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 040/622] lustre: llite: decrease sa_running if fail to start statahead X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Fan Yong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Fan Yong Otherwise the ll_sb_info::ll_sa_running counter will leak, and the umount process will block forever.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10992 Lustre-commit: 6b8638bf7920 ("LU-10992 llite: decrease sa_running if fail to start statahead") Signed-off-by: Fan Yong Reviewed-on: https://review.whamcloud.com/32287 Reviewed-by: Lai Siyao Reviewed-by: Bobi Jam Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/statahead.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 4a61dac..122b9d8 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -1566,6 +1566,7 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry) spin_lock(&lli->lli_sa_lock); lli->lli_sai = NULL; spin_unlock(&lli->lli_sa_lock); + atomic_dec(&ll_i2sbi(parent->d_inode)->ll_sa_running); rc = PTR_ERR(task); CERROR("can't start ll_sa thread, rc : %d\n", rc); goto out; From patchwork Thu Feb 27 21:08:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409737 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B7C2D14BC for ; Thu, 27 Feb 2020 21:20:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A02F5246A1 for ; Thu, 27 Feb 2020 21:20:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A02F5246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A7A0921FB29; Thu, 27 Feb 
2020 13:19:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8044521FAC3 for ; Thu, 27 Feb 2020 13:18:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A666AA1D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A501546F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:29 -0500 Message-Id: <1582838290-17243-42-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 041/622] lustre: lmv: dir page is released while in use X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao When popping a stripe dirent, if it reaches the end of the page, stripe_dirent_next() releases the current page and then reads the next one. The current dirent is still in use at that point, which causes wrong values to be used and triggers an assertion. This patch no longer reads the next page upon reaching the end of the page, but leaves that to the next dirent read. WC-bug-id: https://jira.whamcloud.com/browse/LU-9857 Lustre-commit: b51e8d6b53a3 ("LU-9857 lmv: dir page is released while in use") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/32180 Reviewed-by: Fan Yong Reviewed-by: John L.
Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 123 +++++++++++++++++++++++------------------------- 1 file changed, 60 insertions(+), 63 deletions(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index d0f626f..c7bf8c7 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -2016,7 +2016,7 @@ struct lmv_dir_ctxt { struct stripe_dirent ldc_stripes[0]; }; -static inline void put_stripe_dirent(struct stripe_dirent *stripe) +static inline void stripe_dirent_unload(struct stripe_dirent *stripe) { if (stripe->sd_page) { kunmap(stripe->sd_page); @@ -2031,62 +2031,77 @@ static inline void put_lmv_dir_ctxt(struct lmv_dir_ctxt *ctxt) int i; for (i = 0; i < ctxt->ldc_count; i++) - put_stripe_dirent(&ctxt->ldc_stripes[i]); + stripe_dirent_unload(&ctxt->ldc_stripes[i]); } -static struct lu_dirent *stripe_dirent_next(struct lmv_dir_ctxt *ctxt, +/* if @ent is dummy, or . .., get next */ +static struct lu_dirent *stripe_dirent_get(struct lmv_dir_ctxt *ctxt, + struct lu_dirent *ent, + int stripe_index) +{ + for (; ent; ent = lu_dirent_next(ent)) { + /* Skip dummy entry */ + if (le16_to_cpu(ent->lde_namelen) == 0) + continue; + + /* skip . and .. 
for other stripes */ + if (stripe_index && + (strncmp(ent->lde_name, ".", + le16_to_cpu(ent->lde_namelen)) == 0 || + strncmp(ent->lde_name, "..", + le16_to_cpu(ent->lde_namelen)) == 0)) + continue; + + if (le64_to_cpu(ent->lde_hash) >= ctxt->ldc_hash) + break; + } + + return ent; +} + +static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt, struct stripe_dirent *stripe, int stripe_index) { + struct md_op_data *op_data = ctxt->ldc_op_data; + struct lmv_oinfo *oinfo; + struct lu_fid fid = op_data->op_fid1; + struct inode *inode = op_data->op_data; + struct lmv_tgt_desc *tgt; struct lu_dirent *ent = stripe->sd_ent; u64 hash = ctxt->ldc_hash; - u64 end; int rc = 0; LASSERT(stripe == &ctxt->ldc_stripes[stripe_index]); - - if (stripe->sd_eof) - return NULL; - - if (ent) { - ent = lu_dirent_next(ent); - if (!ent) { -check_eof: - end = le64_to_cpu(stripe->sd_dp->ldp_hash_end); - - LASSERTF(hash <= end, "hash %llx end %llx\n", - hash, end); + LASSERT(!ent); + + do { + if (stripe->sd_page) { + u64 end = le64_to_cpu(stripe->sd_dp->ldp_hash_end); + + /* @hash should be the last dirent hash */ + LASSERTF(hash <= end, + "ctxt@%p stripe@%p hash %llx end %llx\n", + ctxt, stripe, hash, end); + /* unload last page */ + stripe_dirent_unload(stripe); + /* eof */ if (end == MDS_DIR_END_OFF) { stripe->sd_ent = NULL; stripe->sd_eof = true; - return NULL; + break; } - - put_stripe_dirent(stripe); hash = end; } - } - - if (!ent) { - struct md_op_data *op_data = ctxt->ldc_op_data; - struct lmv_oinfo *oinfo; - struct lu_fid fid = op_data->op_fid1; - struct inode *inode = op_data->op_data; - struct lmv_tgt_desc *tgt; - - LASSERT(!stripe->sd_page); oinfo = &op_data->op_mea1->lsm_md_oinfo[stripe_index]; tgt = lmv_get_target(ctxt->ldc_lmv, oinfo->lmo_mds, NULL); if (IS_ERR(tgt)) { rc = PTR_ERR(tgt); - goto out; + break; } - /* - * op_data will be shared by each stripe, so we need - * reset these value for each stripe - */ + /* op_data is shared by stripes, reset after use */ 
op_data->op_fid1 = oinfo->lmo_fid; op_data->op_fid2 = oinfo->lmo_fid; op_data->op_data = oinfo->lmo_root; @@ -2099,42 +2114,24 @@ static struct lu_dirent *stripe_dirent_next(struct lmv_dir_ctxt *ctxt, op_data->op_data = inode; if (rc) - goto out; - - stripe->sd_dp = page_address(stripe->sd_page); - ent = lu_dirent_start(stripe->sd_dp); - } - - for (; ent; ent = lu_dirent_next(ent)) { - /* Skip dummy entry */ - if (!le16_to_cpu(ent->lde_namelen)) - continue; - - /* skip . and .. for other stripes */ - if (stripe_index && - (strncmp(ent->lde_name, ".", - le16_to_cpu(ent->lde_namelen)) == 0 || - strncmp(ent->lde_name, "..", - le16_to_cpu(ent->lde_namelen)) == 0)) - continue; - - if (le64_to_cpu(ent->lde_hash) >= hash) break; - } - if (!ent) - goto check_eof; + stripe->sd_dp = page_address(stripe->sd_page); + ent = stripe_dirent_get(ctxt, lu_dirent_start(stripe->sd_dp), + stripe_index); + /* in case a page filled with ., .. and dummy, read next */ + } while (!ent); -out: stripe->sd_ent = ent; - /* treat error as eof, so dir can be partially accessed */ if (rc) { - put_stripe_dirent(stripe); + LASSERT(!ent); + /* treat error as eof, so dir can be partially accessed */ stripe->sd_eof = true; LCONSOLE_WARN("dir " DFID " stripe %d readdir failed: %d, directory is partially accessed!\n", PFID(&ctxt->ldc_op_data->op_fid1), stripe_index, rc); } + return ent; } @@ -2186,8 +2183,7 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt) continue; if (!stripe->sd_ent) { - /* locate starting entry */ - stripe_dirent_next(ctxt, stripe, i); + stripe_dirent_load(ctxt, stripe, i); if (!stripe->sd_ent) { LASSERT(stripe->sd_eof); continue; @@ -2208,7 +2204,8 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt) stripe = &ctxt->ldc_stripes[min]; ent = stripe->sd_ent; /* pop found dirent */ - stripe_dirent_next(ctxt, stripe, min); + stripe->sd_ent = stripe_dirent_get(ctxt, lu_dirent_next(ent), + min); } return ent; From patchwork Thu Feb 27 21:08:30 2020 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409741 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 762D0138D for ; Thu, 27 Feb 2020 21:20:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5F38B2469F for ; Thu, 27 Feb 2020 21:20:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5F38B2469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AC06C21FAC1; Thu, 27 Feb 2020 13:20:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D6AB021FA55 for ; Thu, 27 Feb 2020 13:18:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A9844A1E; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A827C468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:30 -0500 Message-Id: <1582838290-17243-43-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 042/622] lustre: ldlm: speed up preparation for list of lock cancel X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Sergey Cheremencev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng Keeping the skipped locks in the LRU list causes serious contention for ns_lock, since we have to traverse them every time in ldlm_prepare_lru_list(). So use a cursor to record the position of the last accessed lock in the LRU list. WC-bug-id: https://jira.whamcloud.com/browse/LU-9230 Lustre-commit: 651f2cdd2d8d ("LU-9230 ldlm: speed up preparation for list of lock cancel") Signed-off-by: Yang Sheng Signed-off-by: Sergey Cheremencev Reviewed-on: https://review.whamcloud.com/26327 Reviewed-by: Fan Yong Reviewed-by: Vitaly Fertman Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 1 + fs/lustre/include/lustre_dlm_flags.h | 9 ----- fs/lustre/ldlm/ldlm_lock.c | 3 +- fs/lustre/ldlm/ldlm_request.c | 72 ++++++++++++++++-------------------- fs/lustre/ldlm/ldlm_resource.c | 1 + 5 files changed, 35 insertions(+), 51 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 66608a9..1a19b35 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -406,6 +406,7 @@ struct ldlm_namespace { struct list_head ns_unused_list; /** Number of locks in the LRU list above */ int ns_nr_unused; + struct list_head *ns_last_pos; /** * Maximum number of locks permitted in the LRU.
If 0, means locks diff --git a/fs/lustre/include/lustre_dlm_flags.h b/fs/lustre/include/lustre_dlm_flags.h index c8667c8..3d69c49 100644 --- a/fs/lustre/include/lustre_dlm_flags.h +++ b/fs/lustre/include/lustre_dlm_flags.h @@ -200,15 +200,6 @@ #define ldlm_set_fail_loc(_l) LDLM_SET_FLAG((_l), 1ULL << 32) #define ldlm_clear_fail_loc(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 32) -/** - * Used while processing the unused list to know that we have already - * handled this lock and decided to skip it. - */ -#define LDLM_FL_SKIPPED 0x0000000200000000ULL /* bit 33 */ -#define ldlm_is_skipped(_l) LDLM_TEST_FLAG((_l), 1ULL << 33) -#define ldlm_set_skipped(_l) LDLM_SET_FLAG((_l), 1ULL << 33) -#define ldlm_clear_skipped(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 33) - /** this lock is being destroyed */ #define LDLM_FL_CBPENDING 0x0000000400000000ULL /* bit 34 */ #define ldlm_is_cbpending(_l) LDLM_TEST_FLAG((_l), 1ULL << 34) diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 9847c43..894b99b 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -204,6 +204,8 @@ int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock) struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); LASSERT(lock->l_resource->lr_type != LDLM_FLOCK); + if (ns->ns_last_pos == &lock->l_lru) + ns->ns_last_pos = lock->l_lru.prev; list_del_init(&lock->l_lru); LASSERT(ns->ns_nr_unused > 0); ns->ns_nr_unused--; @@ -249,7 +251,6 @@ void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock) LASSERT(list_empty(&lock->l_lru)); LASSERT(lock->l_resource->lr_type != LDLM_FLOCK); list_add_tail(&lock->l_lru, &ns->ns_unused_list); - ldlm_clear_skipped(lock); LASSERT(ns->ns_nr_unused >= 0); ns->ns_nr_unused++; } diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 5ec0da5..dd4d958 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1368,9 +1368,6 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, /* fall through */ 
default: result = LDLM_POLICY_SKIP_LOCK; - lock_res_and_lock(lock); - ldlm_set_skipped(lock); - unlock_res_and_lock(lock); break; } @@ -1592,54 +1589,47 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, int flags) { ldlm_cancel_lru_policy_t pf; - struct ldlm_lock *lock, *next; - int added = 0, unused, remained; + int added = 0; int no_wait = flags & LDLM_LRU_FLAG_NO_WAIT; - spin_lock(&ns->ns_lock); - unused = ns->ns_nr_unused; - remained = unused; - if (!ns_connect_lru_resize(ns)) - count += unused - ns->ns_max_unused; + count += ns->ns_nr_unused - ns->ns_max_unused; pf = ldlm_cancel_lru_policy(ns, flags); LASSERT(pf); - while (!list_empty(&ns->ns_unused_list)) { + /* For any flags, stop scanning if @max is reached. */ + while (!list_empty(&ns->ns_unused_list) && (max == 0 || added < max)) { + struct ldlm_lock *lock; + struct list_head *item, *next; enum ldlm_policy_res result; ktime_t last_use = ktime_set(0, 0); - /* all unused locks */ - if (remained-- <= 0) - break; - - /* For any flags, stop scanning if @max is reached. */ - if (max && added >= max) - break; + spin_lock(&ns->ns_lock); + item = no_wait ? ns->ns_last_pos : &ns->ns_unused_list; + for (item = item->next, next = item->next; + item != &ns->ns_unused_list; + item = next, next = item->next) { + lock = list_entry(item, struct ldlm_lock, l_lru); - list_for_each_entry_safe(lock, next, &ns->ns_unused_list, - l_lru) { /* No locks which got blocking requests. */ LASSERT(!ldlm_is_bl_ast(lock)); - if (no_wait && ldlm_is_skipped(lock)) - /* already processed */ - continue; - - last_use = lock->l_last_used; - - /* Somebody is already doing CANCEL. No need for this - * lock in LRU, do not traverse it again. - */ if (!ldlm_is_canceling(lock) || !ldlm_is_converting(lock)) break; + /* Somebody is already doing CANCEL. No need for this + * lock in LRU, do not traverse it again. 
+ */ ldlm_lock_remove_from_lru_nolock(lock); } - if (&lock->l_lru == &ns->ns_unused_list) + if (item == &ns->ns_unused_list) { + spin_unlock(&ns->ns_lock); break; + } + + last_use = lock->l_last_used; LDLM_LOCK_GET(lock); spin_unlock(&ns->ns_lock); @@ -1659,19 +1649,23 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, * their weight. Big extent locks will stay in * the cache. */ - result = pf(ns, lock, unused, added, count); + result = pf(ns, lock, ns->ns_nr_unused, added, count); if (result == LDLM_POLICY_KEEP_LOCK) { - lu_ref_del(&lock->l_reference, - __func__, current); + lu_ref_del(&lock->l_reference, __func__, current); LDLM_LOCK_RELEASE(lock); - spin_lock(&ns->ns_lock); break; } + if (result == LDLM_POLICY_SKIP_LOCK) { - lu_ref_del(&lock->l_reference, - __func__, current); + lu_ref_del(&lock->l_reference, __func__, current); LDLM_LOCK_RELEASE(lock); - spin_lock(&ns->ns_lock); + if (no_wait) { + spin_lock(&ns->ns_lock); + if (!list_empty(&lock->l_lru) && + lock->l_lru.prev == ns->ns_last_pos) + ns->ns_last_pos = &lock->l_lru; + spin_unlock(&ns->ns_lock); + } continue; } @@ -1690,7 +1684,6 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, lu_ref_del(&lock->l_reference, __func__, current); LDLM_LOCK_RELEASE(lock); - spin_lock(&ns->ns_lock); continue; } LASSERT(!lock->l_readers && !lock->l_writers); @@ -1728,11 +1721,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, list_add(&lock->l_bl_ast, cancels); unlock_res_and_lock(lock); lu_ref_del(&lock->l_reference, __func__, current); - spin_lock(&ns->ns_lock); added++; - unused--; } - spin_unlock(&ns->ns_lock); return added; } diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 5e0dd53..7fe8a8b 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -682,6 +682,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, ns->ns_connect_flags = 0; ns->ns_dirty_age_limit = LDLM_DIRTY_AGE_LIMIT; 
ns->ns_stopping = 0; + ns->ns_last_pos = &ns->ns_unused_list; rc = ldlm_namespace_sysfs_register(ns); if (rc != 0) { From patchwork Thu Feb 27 21:08:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409711 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0A2D159A for ; Thu, 27 Feb 2020 21:19:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D98EF246A1 for ; Thu, 27 Feb 2020 21:19:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D98EF246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69D9321FCDA; Thu, 27 Feb 2020 13:19:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3A89421FACC for ; Thu, 27 Feb 2020 13:18:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AC4C6B89; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AAF0D46D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:31 -0500 Message-Id: 
<1582838290-17243-44-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 043/622] lustre: checksum: enable/disable checksum correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Emoly Liu There are three ways to set checksum support in Lustre. Their order during client mount is: - 1. configure --enable/disable-checksum: this (ENABLE_CHECKSUM) only affects the default mount option and is set in client_obd_setup(). - 2. lctl set_param -P osc.*.checksums=0/1: when the llog is processed, this value is set by osc_checksum_seq_write(). - 3. mount option checksum/nochecksum: this is checked in ll_options() and set in client_common_fill_super()->obd_set_info_async(). This patch fixes an issue in 3: if the mount option "-o checksum/nochecksum" is specified, checksum is changed accordingly, no matter what was set by "set_param -P" or the default option; if no mount option is specified, the value set by "set_param -P" is kept. Also, test_77k is added to sanity.sh to verify this patch. In addition, a minor initialization issue with cl_supp_cksum_types is fixed: cl_supp_cksum_types should always be initialized, whether or not checksum is enabled.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10906 Lustre-commit: e9b13cd1daf9 ("LU-10906 checksum: enable/disable checksum correctly") Signed-off-by: Emoly Liu Reviewed-on: https://review.whamcloud.com/32095 Reviewed-by: Yingjin Qian Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lib.c | 5 +++-- fs/lustre/llite/llite_internal.h | 3 ++- fs/lustre/llite/llite_lib.c | 23 ++++++++++++++--------- 3 files changed, 19 insertions(+), 12 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 7bc1d10..2c0fad3 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -355,6 +355,8 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) init_waitqueue_head(&cli->cl_destroy_waitq); atomic_set(&cli->cl_destroy_in_flight, 0); + + cli->cl_supp_cksum_types = OBD_CKSUM_CRC32; /* Turn on checksumming by default. */ cli->cl_checksum = 1; /* @@ -362,8 +364,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) * Set cl_chksum* to CRC32 for now to avoid returning screwed info * through procfs. 
*/ - cli->cl_cksum_type = OBD_CKSUM_CRC32; - cli->cl_supp_cksum_types = OBD_CKSUM_CRC32; + cli->cl_cksum_type = cli->cl_supp_cksum_types; atomic_set(&cli->cl_resends, OSC_DEFAULT_RESENDS); /* diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index d0a703d..6bdbf28 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -479,7 +479,8 @@ struct ll_sb_info { unsigned int ll_umounting:1, ll_xattr_cache_enabled:1, ll_xattr_cache_set:1, /* already set to 0/1 */ - ll_client_common_fill_super_succeeded:1; + ll_client_common_fill_super_succeeded:1, + ll_checksum_set:1; struct lustre_client_ocd ll_lco; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index e2c7a4d..eb29064 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -560,13 +560,15 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) } checksum = sbi->ll_flags & LL_SBI_CHECKSUM; - err = obd_set_info_async(NULL, sbi->ll_dt_exp, sizeof(KEY_CHECKSUM), - KEY_CHECKSUM, sizeof(checksum), &checksum, - NULL); - if (err) { - CERROR("%s: Set checksum failed: rc = %d\n", - sbi->ll_dt_exp->exp_obd->obd_name, err); - goto out_root; + if (sbi->ll_checksum_set) { + err = obd_set_info_async(NULL, sbi->ll_dt_exp, + sizeof(KEY_CHECKSUM), KEY_CHECKSUM, + sizeof(checksum), &checksum, NULL); + if (err) { + CERROR("%s: Set checksum failed: rc = %d\n", + sbi->ll_dt_exp->exp_obd->obd_name, err); + goto out_root; + } } cl_sb_init(sb); @@ -763,10 +765,11 @@ static inline int ll_set_opt(const char *opt, char *data, int fl) } /* non-client-specific mount options are parsed in lmd_parse */ -static int ll_options(char *options, int *flags) +static int ll_options(char *options, struct ll_sb_info *sbi) { int tmp; char *s1 = options, *s2; + int *flags = &sbi->ll_flags; if (!options) return 0; @@ -832,11 +835,13 @@ static int ll_options(char *options, int *flags) tmp = ll_set_opt("checksum", s1, 
LL_SBI_CHECKSUM); if (tmp) { *flags |= tmp; + sbi->ll_checksum_set = 1; goto next; } tmp = ll_set_opt("nochecksum", s1, LL_SBI_CHECKSUM); if (tmp) { *flags &= ~tmp; + sbi->ll_checksum_set = 1; goto next; } tmp = ll_set_opt("lruresize", s1, LL_SBI_LRU_RESIZE); @@ -971,7 +976,7 @@ int ll_fill_super(struct super_block *sb) goto out_free; } - err = ll_options(lsi->lsi_lmd->lmd_opts, &sbi->ll_flags); + err = ll_options(lsi->lsi_lmd->lmd_opts, sbi); if (err) goto out_free; From patchwork Thu Feb 27 21:08:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409745 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 64F7014BC for ; Thu, 27 Feb 2020 21:21:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4D5A22469F for ; Thu, 27 Feb 2020 21:21:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4D5A22469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E507321FD07; Thu, 27 Feb 2020 13:20:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9041F21FA61 for ; Thu, 27 Feb 2020 13:18:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by 
smtp3.ccs.ornl.gov (Postfix) with ESMTP id AF50DBA9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ADE5746A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:32 -0500 Message-Id: <1582838290-17243-45-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 044/622] lustre: build: armv7 client build fixes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Perepechko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andrew Perepechko This commit is supposed to fix armv7 Lustre client build, mostly 64-bit division related changes. 
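As a rough illustration of the 64-bit division changes (a userspace sketch under stated assumptions, not kernel code): on 32-bit ARM a plain `/` with a 64-bit operand is typically compiled into a call to a compiler runtime helper that the kernel does not link against, which is why the hunks in this patch route the division through div_u64(). The `sketch_*` names and `SKETCH_NSEC_PER_SEC` below are hypothetical stand-ins; only the (64-bit dividend, 32-bit divisor) shape is meant to mirror the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NSEC_PER_SEC constant. */
#define SKETCH_NSEC_PER_SEC 1000000000U

/*
 * Hypothetical userspace stand-in with the same shape as the kernel's
 * div_u64(): 64-bit dividend, 32-bit divisor, 64-bit quotient.
 * Plain division is fine here because userspace toolchains link the
 * runtime division helper; the in-kernel version avoids that helper.
 */
static uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Lock age in whole seconds from a nanosecond delta (LRU policy hunk). */
static uint64_t sketch_lock_age_sec(uint64_t delta_ns)
{
	return sketch_div_u64(delta_ns, SKETCH_NSEC_PER_SEC);
}

/* Timeout padded by one third (import invalidation hunk). */
static uint64_t sketch_pad_timeout(uint64_t timeout)
{
	return timeout + sketch_div_u64(timeout, 3);
}
```

For instance, a delta of 2500000000 ns maps to a lock age of 2 seconds, and a 30 second timeout is padded to 40.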
WC-bug-id: https://jira.whamcloud.com/browse/LU-10964 Lustre-commit: 0300a6efd226 ("LU-10964 build: armv7 client build fixes") Signed-off-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/32194 Reviewed-by: James Simmons Reviewed-by: Alexander Zarochentsev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 3 ++- fs/lustre/ptlrpc/import.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index dd4d958..3991a8f 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1408,7 +1408,8 @@ static enum ldlm_policy_res ldlm_cancel_lrur_policy(struct ldlm_namespace *ns, slv = ldlm_pool_get_slv(pl); lvf = ldlm_pool_get_lvf(pl); - la = ktime_to_ns(ktime_sub(cur, lock->l_last_used)) / NSEC_PER_SEC; + la = div_u64(ktime_to_ns(ktime_sub(cur, lock->l_last_used)), + NSEC_PER_SEC); lv = lvf * la * unused; /* Inform pool about current CLV to see it via debugfs. 
*/ diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index f69b907..5d6546d 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -289,7 +289,7 @@ void ptlrpc_invalidate_import(struct obd_import *imp) */ if (!OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_LONG_REPL_UNLINK)) { timeout = ptlrpc_inflight_timeout(imp); - timeout += timeout / 3; + timeout += div_u64(timeout, 3); if (timeout == 0) timeout = obd_timeout; From patchwork Thu Feb 27 21:08:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409715 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 428F214BC for ; Thu, 27 Feb 2020 21:20:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2B0B4246A1 for ; Thu, 27 Feb 2020 21:20:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2B0B4246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 002CE21FCFD; Thu, 27 Feb 2020 13:19:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D447121FA61 for ; Thu, 27 Feb 2020 13:18:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 
B244EE01; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B0D4F46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:33 -0500 Message-Id: <1582838290-17243-46-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 045/622] lustre: ldlm: fix l_last_activity usage X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko When a race happens between ldlm_server_blocking_ast() and ldlm_request_cancel(), at_measured() is called with a wrong value equal to the current time. Even worse, ldlm_bl_timeout() can return current_time*1.5. Before the time functions were fixed for 64-bit by LU-9019 (fdeeed2fb), this race led to ETIMEDOUT in ptlrpc_import_delay_req() and to client eviction while sending the blocking AST. The wrong type conversion takes place in ptlrpc_send_limit_expired(), at cfs_time_seconds(). We should not take cancels into account if the BLAST has not been sent, because last_activity is then not properly initialised and the measurement destroys the AT completely. The patch divides l_last_activity into the client l_activity and the server l_blast_sent for better understanding. l_blast_sent is used for blocking ASTs only, to measure the time between the BLAST and the cancel request.
For example: server cancels blocked lock after 1518731697s waiting_locks_callback()) ### lock callback timer expired after 0s: evicting client WC-bug-id: https://jira.whamcloud.com/browse/LU-10945 Lustre-commit: e09d273cb5f2 ("LU-10945 ldlm: fix l_last_activity usage") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-5736 Reviewed-on: https://review.whamcloud.com/32133 Reviewed-by: Andreas Dilger Reviewed-by: Vitaly Fertman Reviewed-by: James Simmons Reviewed-by: Mikhal Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 13 +++++++------ fs/lustre/ldlm/ldlm_lock.c | 1 + fs/lustre/ldlm/ldlm_request.c | 14 +++++++------- 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 1a19b35..6ad12a3 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -708,12 +708,6 @@ struct ldlm_lock { wait_queue_head_t l_waitq; /** - * Seconds. It will be updated if there is any activity related to - * the lock, e.g. enqueue the lock or send blocking AST. - */ - time64_t l_last_activity; - - /** * Time, in nanoseconds, last used by e.g. being matched by lock match. */ ktime_t l_last_used; @@ -735,6 +729,13 @@ struct ldlm_lock { /** Private storage for lock user. Opaque to LDLM. */ void *l_ast_data; + + /** + * Seconds. It will be updated if there is any activity related to + * the lock at client, e.g. enqueue the lock. + */ + time64_t l_activity; + /* Separate ost_lvb used mostly by Data-on-MDT for now. * It is introduced to don't mix with layout lock data. 
*/ diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 894b99b..1bf387a 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -420,6 +420,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource) lu_ref_init(&lock->l_reference); lu_ref_add(&lock->l_reference, "hash", lock); lock->l_callback_timeout = 0; + lock->l_activity = 0; #if LUSTRE_TRACKS_LOCK_EXP_REFS INIT_LIST_HEAD(&lock->l_exp_refs_link); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 3991a8f..67c23fc 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -114,9 +114,9 @@ static void ldlm_expired_completion_wait(struct ldlm_lock *lock, u32 conn_cnt) LDLM_ERROR(lock, "lock timed out (enqueued at %lld, %llds ago); not entering recovery in server code, just going back to sleep", - (s64)lock->l_last_activity, + (s64)lock->l_activity, (s64)(ktime_get_real_seconds() - - lock->l_last_activity)); + lock->l_activity)); if (ktime_get_seconds() > next_dump) { last_dump = next_dump; next_dump = ktime_get_seconds() + 300; @@ -133,8 +133,8 @@ static void ldlm_expired_completion_wait(struct ldlm_lock *lock, u32 conn_cnt) ptlrpc_fail_import(imp, conn_cnt); LDLM_ERROR(lock, "lock timed out (enqueued at %lld, %llds ago), entering recovery for %s@%s", - (s64)lock->l_last_activity, - (s64)(ktime_get_real_seconds() - lock->l_last_activity), + (s64)lock->l_activity, + (s64)(ktime_get_real_seconds() - lock->l_activity), obd2cli_tgt(obd), imp->imp_connection->c_remote_uuid.uuid); } @@ -182,7 +182,7 @@ static int ldlm_completion_tail(struct ldlm_lock *lock, void *data) LDLM_DEBUG(lock, "client-side enqueue: granted"); } else { /* Take into AT only CP RPC, not immediately granted locks */ - delay = ktime_get_real_seconds() - lock->l_last_activity; + delay = ktime_get_real_seconds() - lock->l_activity; LDLM_DEBUG(lock, "client-side enqueue: granted after %lds", delay); @@ -245,7 +245,7 @@ int 
ldlm_completion_ast(struct ldlm_lock *lock, u64 flags, void *data) timeout = ldlm_cp_timeout(lock); - lock->l_last_activity = ktime_get_real_seconds(); + lock->l_activity = ktime_get_real_seconds(); if (imp) { spin_lock(&imp->imp_lock); @@ -725,7 +725,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, lock->l_export = NULL; lock->l_blocking_ast = einfo->ei_cb_bl; lock->l_flags |= (*flags & (LDLM_FL_NO_LRU | LDLM_FL_EXCL)); - lock->l_last_activity = ktime_get_real_seconds(); + lock->l_activity = ktime_get_real_seconds(); /* lock not sent to server yet */ if (!reqp || !*reqp) { From patchwork Thu Feb 27 21:08:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409719 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 09AA6138D for ; Thu, 27 Feb 2020 21:20:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E506F246A1 for ; Thu, 27 Feb 2020 21:20:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E506F246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A232721FD53; Thu, 27 Feb 2020 13:19:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
38B6F21FAD6 for ; Thu, 27 Feb 2020 13:18:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B5447E02; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B39EE46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:34 -0500 Message-Id: <1582838290-17243-47-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 046/622] lustre: ptlrpc: Add WBC connect flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin It denotes ability of the node to understand additional types of intent requests, exclusive metadata locks issued to clients and server operations performed under such locks while still held by clients. 
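A minimal sketch of how such a capability bit might be tested, assuming the usual pattern in which each side advertises a flag mask and only bits set by both sides are treated as usable. The two flag values come from the lustre_idl.h hunk in this patch; `struct sketch_connect_data`, its `flags2` field and the `sketch_*` helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the lustre_idl.h hunk of this patch. */
#define OBD_CONNECT2_FLR         0x20ULL
#define OBD_CONNECT2_WBC_INTENTS 0x40ULL

/* Hypothetical holder for the second connect-flags word. */
struct sketch_connect_data {
	uint64_t flags2;
};

/* Assumed negotiation rule: keep only bits both sides advertise. */
static uint64_t sketch_negotiate(uint64_t requested, uint64_t offered)
{
	return requested & offered;
}

/* True when the negotiated mask carries the WBC intents bit. */
static bool sketch_supports_wbc(const struct sketch_connect_data *ocd)
{
	return (ocd->flags2 & OBD_CONNECT2_WBC_INTENTS) != 0;
}
```

Under that assumption, a client asking for FLR and WBC intents from a peer that offers only FLR ends up with a mask for which sketch_supports_wbc() is false.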
WC-bug-id: https://jira.whamcloud.com/browse/LU-10938 Lustre-commit: f024aabf8bbf ("LU-10938 ptlrpc: Add WBC connect flag") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/32241 Reviewed-by: Andreas Dilger Reviewed-by: Mikhal Pershin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 5 +++++ 3 files changed, 8 insertions(+) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 66d2679..e2575b4 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -117,6 +117,7 @@ "unknown", /* 0x08 */ "unknown", /* 0x10 */ "flr", /* 0x20 */ + "wbc", /* 0x40 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index b14d301c..c566dea 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1115,6 +1115,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_DIR_MIGRATE); LASSERTF(OBD_CONNECT2_FLR == 0x20ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_FLR); + LASSERTF(OBD_CONNECT2_WBC_INTENTS == 0x40ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_WBC_INTENTS); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 2403b89..f437614 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -794,6 +794,11 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir */ #define OBD_CONNECT2_FLR 0x20ULL /* FLR support */ +#define OBD_CONNECT2_WBC_INTENTS 0x40ULL /* create/unlink/... 
intents + * for wbc, also operations + * under client-held parent + * locks + */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:08:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409749 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A360514BC for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8BFD12469F for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8BFD12469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 779DA21FD4D; Thu, 27 Feb 2020 13:20:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8D1C521F982 for ; Thu, 27 Feb 2020 13:18:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B82C3E03; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B6AA5468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:35 -0500 
Message-Id: <1582838290-17243-48-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 047/622] lustre: llog: remove obsolete llog handlers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" Remove the obsolete llog RPC handling for cancel, close, and destroy. Remove llog handling from ldlm_callback_handler(). Remove the unused client side method llog_client_destroy(). WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 85011d372dfb ("LU-10855 llog: remove obsolete llog handlers") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/32202 Reviewed-by: Mikhal Pershin Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 3 --- fs/lustre/ptlrpc/layout.c | 26 -------------------------- include/uapi/linux/lustre/lustre_idl.h | 12 ++++++------ 3 files changed, 6 insertions(+), 35 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 2348569..2737240 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -212,13 +212,10 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_format RQF_LDLM_GL_CALLBACK; extern struct req_format RQF_LDLM_GL_CALLBACK_DESC; /* LOG req_format */ -extern struct req_format RQF_LOG_CANCEL; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_CREATE; -extern struct req_format RQF_LLOG_ORIGIN_HANDLE_DESTROY; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_PREV_BLOCK; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_READ_HEADER; -extern struct req_format RQF_LLOG_ORIGIN_CONNECT; extern struct req_format RQF_CONNECT; diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 4909b30..8fe661d 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -88,11 +88,6 @@ &RMF_MGS_CONFIG_RES }; -static const struct req_msg_field *log_cancel_client[] = { - &RMF_PTLRPC_BODY, - &RMF_LOGCOOKIES -}; - static const struct req_msg_field *mdt_body_only[] = { &RMF_PTLRPC_BODY, &RMF_MDT_BODY @@ -547,11 +542,6 @@ &RMF_LLOG_LOG_HDR }; -static const struct req_msg_field *llogd_conn_body_only[] = { - &RMF_PTLRPC_BODY, - &RMF_LLOGD_CONN_BODY -}; - static const struct req_msg_field *llog_origin_handle_next_block_server[] = { &RMF_PTLRPC_BODY, &RMF_LLOGD_BODY, @@ -766,13 +756,10 @@ &RQF_LDLM_INTENT_CREATE, &RQF_LDLM_INTENT_UNLINK, &RQF_LDLM_INTENT_GETXATTR, - &RQF_LOG_CANCEL, 
&RQF_LLOG_ORIGIN_HANDLE_CREATE, - &RQF_LLOG_ORIGIN_HANDLE_DESTROY, &RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK, &RQF_LLOG_ORIGIN_HANDLE_PREV_BLOCK, &RQF_LLOG_ORIGIN_HANDLE_READ_HEADER, - &RQF_LLOG_ORIGIN_CONNECT, &RQF_CONNECT, }; @@ -1254,10 +1241,6 @@ struct req_format RQF_FLD_READ = DEFINE_REQ_FMT0("FLD_READ", fld_read_client, fld_read_server); EXPORT_SYMBOL(RQF_FLD_READ); -struct req_format RQF_LOG_CANCEL = - DEFINE_REQ_FMT0("OBD_LOG_CANCEL", log_cancel_client, empty); -EXPORT_SYMBOL(RQF_LOG_CANCEL); - struct req_format RQF_MDS_QUOTACTL = DEFINE_REQ_FMT0("MDS_QUOTACTL", quotactl_only, quotactl_only); EXPORT_SYMBOL(RQF_MDS_QUOTACTL); @@ -1511,11 +1494,6 @@ struct req_format RQF_LLOG_ORIGIN_HANDLE_CREATE = llog_origin_handle_create_client, llogd_body_only); EXPORT_SYMBOL(RQF_LLOG_ORIGIN_HANDLE_CREATE); -struct req_format RQF_LLOG_ORIGIN_HANDLE_DESTROY = - DEFINE_REQ_FMT0("LLOG_ORIGIN_HANDLE_DESTROY", - llogd_body_only, llogd_body_only); -EXPORT_SYMBOL(RQF_LLOG_ORIGIN_HANDLE_DESTROY); - struct req_format RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK = DEFINE_REQ_FMT0("LLOG_ORIGIN_HANDLE_NEXT_BLOCK", llogd_body_only, llog_origin_handle_next_block_server); @@ -1531,10 +1509,6 @@ struct req_format RQF_LLOG_ORIGIN_HANDLE_READ_HEADER = llogd_body_only, llog_log_hdr_only); EXPORT_SYMBOL(RQF_LLOG_ORIGIN_HANDLE_READ_HEADER); -struct req_format RQF_LLOG_ORIGIN_CONNECT = - DEFINE_REQ_FMT0("LLOG_ORIGIN_CONNECT", llogd_conn_body_only, empty); -EXPORT_SYMBOL(RQF_LLOG_ORIGIN_CONNECT); - struct req_format RQF_CONNECT = DEFINE_REQ_FMT0("CONNECT", obd_connect_client, obd_connect_server); EXPORT_SYMBOL(RQF_CONNECT); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index f437614..7cf7307 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2312,7 +2312,7 @@ struct cfg_marker { enum obd_cmd { OBD_PING = 400, - OBD_LOG_CANCEL, + OBD_LOG_CANCEL, /* Obsolete since 1.5. 
*/ OBD_QC_CALLBACK, /* not used since 2.4 */ OBD_IDX_READ, OBD_LAST_OPC @@ -2624,12 +2624,12 @@ enum llogd_rpc_ops { LLOG_ORIGIN_HANDLE_CREATE = 501, LLOG_ORIGIN_HANDLE_NEXT_BLOCK = 502, LLOG_ORIGIN_HANDLE_READ_HEADER = 503, - LLOG_ORIGIN_HANDLE_WRITE_REC = 504, - LLOG_ORIGIN_HANDLE_CLOSE = 505, - LLOG_ORIGIN_CONNECT = 506, - LLOG_CATINFO = 507, /* deprecated */ + LLOG_ORIGIN_HANDLE_WRITE_REC = 504, /* Obsolete by 2.1. */ + LLOG_ORIGIN_HANDLE_CLOSE = 505, /* Obsolete by 1.8. */ + LLOG_ORIGIN_CONNECT = 506, /* Obsolete by 2.4. */ + LLOG_CATINFO = 507, /* Obsolete by 2.3. */ LLOG_ORIGIN_HANDLE_PREV_BLOCK = 508, - LLOG_ORIGIN_HANDLE_DESTROY = 509, /* for destroy llog object*/ + LLOG_ORIGIN_HANDLE_DESTROY = 509, /* Obsolete. */ LLOG_LAST_OPC, LLOG_FIRST_OPC = LLOG_ORIGIN_HANDLE_CREATE }; From patchwork Thu Feb 27 21:08:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409723 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B86F14BC for ; Thu, 27 Feb 2020 21:20:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2307C246A1 for ; Thu, 27 Feb 2020 21:20:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2307C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A8F5721FE03; Thu, 27 Feb 2020 13:19:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E599D21F982 for ; Thu, 27 Feb 2020 13:18:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BB005E04; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B991646D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:36 -0500 Message-Id: <1582838290-17243-49-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 048/622] lustre: ldlm: fix for l_lru usage X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng Fixes for the lock convert code to prevent false assertions and busy locks in the LRU: - ensure there are no l_readers and l_writers when adding a lock to the LRU after convert. - don't verify l_lru without ns_lock.
WC-bug-id: https://jira.whamcloud.com/browse/LU-11003 Lustre-commit: 2a77dd3bee76 ("LU-11003 ldlm: fix for l_lru usage") Signed-off-by: Yang Sheng Reviewed-on: https://review.whamcloud.com/32309 Reviewed-by: Fan Yong Reviewed-by: Mikhal Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_inodebits.c | 1 - fs/lustre/ldlm/ldlm_request.c | 19 +++++++++++-------- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c index e74928e..ddbf8d4 100644 --- a/fs/lustre/ldlm/ldlm_inodebits.c +++ b/fs/lustre/ldlm/ldlm_inodebits.c @@ -171,7 +171,6 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) ldlm_set_cbpending(lock); ldlm_set_bl_ast(lock); unlock_res_and_lock(lock); - LASSERT(list_empty(&lock->l_lru)); goto exit; } diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 67c23fc..5833f59 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -881,21 +881,25 @@ static int lock_convert_interpret(const struct lu_env *env, } else { ldlm_clear_converting(lock); - /* Concurrent BL AST has arrived, it may cause another convert - * or cancel so just exit here. + /* Concurrent BL AST may arrive and cause another convert + * or cancel so just do nothing here if bl_ast is set, + * finish with convert otherwise. */ if (!ldlm_is_bl_ast(lock)) { struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); /* Drop cancel_bits since there are no more converts - * and put lock into LRU if it is not there yet. + * and put lock into LRU if it is still not used and + * is not there yet. 
*/ lock->l_policy_data.l_inodebits.cancel_bits = 0; - spin_lock(&ns->ns_lock); - if (!list_empty(&lock->l_lru)) + if (!lock->l_readers && !lock->l_writers) { + spin_lock(&ns->ns_lock); + /* there is check for list_empty() inside */ ldlm_lock_remove_from_lru_nolock(lock); - ldlm_lock_add_to_lru_nolock(lock); - spin_unlock(&ns->ns_lock); + ldlm_lock_add_to_lru_nolock(lock); + spin_unlock(&ns->ns_lock); + } } } unlock_res_and_lock(lock); @@ -903,7 +907,6 @@ static int lock_convert_interpret(const struct lu_env *env, if (rc) { lock_res_and_lock(lock); if (ldlm_is_converting(lock)) { - LASSERT(list_empty(&lock->l_lru)); ldlm_clear_converting(lock); ldlm_set_cbpending(lock); ldlm_set_bl_ast(lock); From patchwork Thu Feb 27 21:08:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409727 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0D2C138D for ; Thu, 27 Feb 2020 21:20:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A999E246A1 for ; Thu, 27 Feb 2020 21:20:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A999E246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73E8C21FE95; Thu, 27 Feb 2020 13:19:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 321A421F982 for ; Thu, 27 Feb 2020 13:18:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BE4B5E05; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BCA6046A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:37 -0500 Message-Id: <1582838290-17243-50-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 049/622] lustre: lov: Move lov_tgts_kobj init to lov_setup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin Move the lov_tgts_kobj init to lov_setup and free it in lov_cleanup. This looks like a more robust solution than doing it in lov_putref, especially since we know the refcount there crosses 0 repeatedly, confusing things. WC-bug-id: https://jira.whamcloud.com/browse/LU-11015 Lustre-commit: 313ac16698db ("LU-11015 lov: Move lov_tgts_kobj init to lov_setup") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/32367 Reviewed-by: James Simmons Reviewed-by: John L.
Hammond Signed-off-by: James Simmons --- fs/lustre/lov/lov_obd.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 26637bc..9449aa9 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -110,10 +110,6 @@ void lov_tgts_putref(struct obd_device *obd) /* Disconnect */ __lov_del_obd(obd, tgt); } - - if (lov->lov_tgts_kobj) - kobject_put(lov->lov_tgts_kobj); - } else { mutex_unlock(&lov->lov_lock); } @@ -235,9 +231,6 @@ static int lov_connect(const struct lu_env *env, lov_tgts_getref(obd); - lov->lov_tgts_kobj = kobject_create_and_add("target_obds", - &obd->obd_kset.kobj); - for (i = 0; i < lov->desc.ld_tgt_count; i++) { tgt = lov->lov_tgts[i]; if (!tgt || obd_uuid_empty(&tgt->ltd_uuid)) @@ -784,6 +777,9 @@ int lov_setup(struct obd_device *obd, struct lustre_cfg *lcfg) if (rc) goto out_tunables; + lov->lov_tgts_kobj = kobject_create_and_add("target_obds", + &obd->obd_kset.kobj); + return 0; out_tunables: @@ -799,6 +795,11 @@ static int lov_cleanup(struct obd_device *obd) struct lov_obd *lov = &obd->u.lov; struct pool_desc *pool, *tmp; + if (lov->lov_tgts_kobj) { + kobject_put(lov->lov_tgts_kobj); + lov->lov_tgts_kobj = NULL; + } + list_for_each_entry_safe(pool, tmp, &lov->lov_pool_list, pool_list) { /* free pool structs */ CDEBUG(D_INFO, "delete pool %p\n", pool); From patchwork Thu Feb 27 21:08:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409751 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20BFF138D for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.kernel.org (Postfix) with ESMTPS id 083092469F for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 083092469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AFB4A21FD8C; Thu, 27 Feb 2020 13:20:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 74F0921FA4B for ; Thu, 27 Feb 2020 13:18:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C14AAE07; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BFFCC46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:38 -0500 Message-Id: <1582838290-17243-51-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 050/622] lustre: osc: add T10PI support for RPC checksum X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Xi

T10 Protection Information (T10 PI), previously known as Data Integrity Field (DIF), is a standard for end-to-end data integrity validation. T10 PI prevents silent data corruption, ensuring that incomplete and incorrect data cannot overwrite good data.

The Lustre file system already supports an RPC-level checksum, which validates the data in bulk RPCs when writing/reading data to/from objects on OSTs. The RPC-level checksum can detect data corruption that happens while the RPC is being transferred over the wire. However, it cannot prevent silent data corruption that happens under other conditions, for example, memory corruption while data is cached in the page cache. And with the existing checksum mechanism, only disjoint protection coverage is provided. Thus, in order to provide end-to-end data protection, T10PI support for Lustre should be added.

In order to provide end-to-end data integrity validation, the T10 PI checksum of the data in a sector needs to be calculated on the Lustre client side and validated later on the Lustre OSS side. The T10 protection information should be sent together with the data in the RPC. However, in order to avoid significant performance degradation, instead of sending all original guard tags for all sectors in a bulk RPC, the existing checksum feature of bulk RPC will be integrated with the new T10PI feature.

When an OST starts, the necessary T10PI information will be extracted from storage, i.e. the T10PI DIF type and sector size. The DIF type can be one of TYPE1_IP, TYPE1_CRC, TYPE3_IP and TYPE3_CRC, and the sector size can be either 512 or 4K bytes. When an OSC connects to an OST, the OSC and OST will negotiate the checksum types.
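To make the integration of per-sector guard tags with the bulk RPC checksum more concrete, here is a minimal user-space sketch, not the kernel code from this patch. It computes one 16-bit IP-style guard tag per sector and then a single Adler-32 over the tags, so only one 32-bit value has to travel with the RPC. The names `ip_csum16`, `adler32_sketch`, `rpc_cksum_sketch` and `MAX_GUARDS` are hypothetical, and hashing the tags in host byte order is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GUARDS 2048	/* hypothetical cap: 1MB of 512-byte sectors */

/* 16-bit one's-complement (IP-style) checksum of one sector.
 * Hypothetical stand-in for a guard-tag function; sized for sectors
 * of at most 4KB so the 32-bit accumulator cannot overflow.
 */
static uint16_t ip_csum16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Plain byte-at-a-time Adler-32, used here as the top-level checksum. */
static uint32_t adler32_sketch(const uint8_t *data, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + data[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/* Checksum-of-checksums: one guard tag per sector, then one Adler-32
 * over the raw bytes of the tag array (host byte order, an assumption).
 * Returns the number of guard tags used, or -1 on bad input.
 */
static int rpc_cksum_sketch(const uint8_t *buf, size_t len,
			    size_t sector_size, uint32_t *out)
{
	uint16_t guards[MAX_GUARDS];
	size_t off, n = 0;

	assert(out != NULL);
	if (sector_size == 0)
		return -1;

	for (off = 0; off < len; off += sector_size) {
		size_t chunk = len - off;

		if (chunk > sector_size)
			chunk = sector_size;
		if (n >= MAX_GUARDS)
			return -1;
		guards[n++] = ip_csum16(buf + off, chunk);
	}

	*out = adler32_sketch((const uint8_t *)guards, n * sizeof(guards[0]));
	return (int)n;
}
```

Calling `rpc_cksum_sketch(buf, 4096, 512, &ck)` on one page yields eight guard tags folded into a single `ck`, which is the only checksum value the sketch would send alongside the data.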
New checksum types are added for T10PI support, including OBD_CKSUM_T10IP512, OBD_CKSUM_T10IP4K, OBD_CKSUM_T10CRC512, and OBD_CKSUM_T10CRC4K. If the OST storage has T10PI support, the only selectable T10PI checksum type is the one that matches the T10PI type of the hardware. The other existing checksum types (crc32, crc32c, adler32) are still valid options for the RPC checksum type.

When calculating the RPC checksum of T10PI, the T10PI checksums of all sectors are calculated first using the T10PI checksum type, i.e. 16-bit crc or IP checksum. Then the RPC checksum is calculated over all of the T10PI checksums. The RPC checksum type used in this step is always adler32. Considering that the checksum-of-checksums is only computed on a 4KB chunk of GRD tags for a 1MB RPC for 512B sectors, or 16KB of GRD tags for 16MB of 4KB sectors, this is only 1/256 or 1/1024 of the total data being checksummed, so the checksum type used here should not affect overall system performance noticeably.

obdfilter.*.enforce_t10pi_cksum can be used to tune whether to enforce the T10-PI checksum or not.

If the OST supports the T10-PI feature and the T10-PI checksum is enforced, clients will have no choice for the RPC checksum type other than the T10PI checksum type. This is useful for enforcing end-to-end integrity in the whole system.

If the OST doesn't support the T10-PI feature and the T10-PI checksum is enforced, together with other checksums with reasonably good speeds (e.g. crc32, crc32c, adler, etc.), all the T10-PI checksum types (t10ip512, t10ip4K, t10crc512, t10crc4K) will be added to the available checksum types, regardless of the speeds of the T10-PI checksums. This is useful for testing T10-PI checksums of RPC.

If the OST supports the T10-PI feature and the T10-PI checksum is NOT enforced, the corresponding T10-PI checksum type will be added to the checksum type list, regardless of the speed of the T10-PI checksum.
This gives clients the flexibility to choose whether to enable end-to-end integrity or not.

If the OST does NOT support the T10-PI feature and the T10-PI checksum is NOT enforced, together with other checksums with reasonably good speeds, all the T10-PI checksum types with good speeds will be added into the checksum type list. Note that a T10-PI checksum type with a speed worse than half of Adler will NOT be added as an option. In this circumstance, T10-PI checksum types behave like other normal checksum types.

Clients that have no T10-PI RPC checksum support will not be affected by the above-mentioned logic. And that logic will only be applied to clients that connect after obdfilter.*.enforce_t10pi_cksum is changed on an OST.

Following are the speeds of different checksum types on a server with an Intel(R) Xeon(R) E5-2650 @ 2.00GHz CPU:

crc: 1575 MB/s
crc32c: 9763 MB/s
adler: 1255 MB/s
t10ip512: 6151 MB/s
t10ip4k: 7935 MB/s
t10crc512: 1119 MB/s
t10crc4k: 1531 MB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-10472
Lustre-commit: b1e7be00cb6e ("LU-10472 osc: add T10PI support for RPC checksum")
Signed-off-by: Li Xi
Reviewed-on: https://review.whamcloud.com/30980
Reviewed-by: Andreas Dilger
Reviewed-by: Faccini Bruno
Signed-off-by: James Simmons
---
 fs/lustre/include/obd_cksum.h | 123 +++++++++------
 fs/lustre/include/obd_class.h | 1 -
 fs/lustre/llite/llite_lib.c | 4 +-
 fs/lustre/obdclass/Makefile | 2 +-
 fs/lustre/obdclass/integrity.c | 273 +++++++++++++++++++++++++++++++++
 fs/lustre/obdclass/obd_cksum.c | 151 ++++++++++++++++++
 fs/lustre/osc/osc_request.c | 214 +++++++++++++++++++++++---
 fs/lustre/ptlrpc/import.c | 8 +-
 fs/lustre/ptlrpc/wiretest.c | 17 +-
 include/uapi/linux/lustre/lustre_idl.h | 48 ++++--
 net/lnet/libcfs/linux-crypto.c | 3 +
 11 files changed, 753 insertions(+), 91 deletions(-)
 create mode 100644 fs/lustre/obdclass/integrity.c
 create mode 100644 fs/lustre/obdclass/obd_cksum.c
diff --git a/fs/lustre/include/obd_cksum.h
b/fs/lustre/include/obd_cksum.h index 26a9555..cc47c44 100644 --- a/fs/lustre/include/obd_cksum.h +++ b/fs/lustre/include/obd_cksum.h @@ -35,6 +35,9 @@ #include #include +int obd_t10_cksum_speed(const char *obd_name, + enum cksum_type cksum_type); + static inline unsigned char cksum_obd2cfs(enum cksum_type cksum_type) { switch (cksum_type) { @@ -51,59 +54,23 @@ static inline unsigned char cksum_obd2cfs(enum cksum_type cksum_type) return 0; } -/* The OBD_FL_CKSUM_* flags is packed into 5 bits of o_flags, since there can - * only be a single checksum type per RPC. - * - * The OBD_CHECKSUM_* type bits passed in ocd_cksum_types are a 32-bit bitmask - * since they need to represent the full range of checksum algorithms that - * both the client and server can understand. - * - * In case of an unsupported types/flags we fall back to ADLER - * because that is supported by all clients since 1.8 - * - * In case multiple algorithms are supported the best one is used. - */ -static inline u32 cksum_type_pack(enum cksum_type cksum_type) -{ - unsigned int performance = 0, tmp; - u32 flag = OBD_FL_CKSUM_ADLER; - - if (cksum_type & OBD_CKSUM_CRC32) { - tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)); - if (tmp > performance) { - performance = tmp; - flag = OBD_FL_CKSUM_CRC32; - } - } - if (cksum_type & OBD_CKSUM_CRC32C) { - tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)); - if (tmp > performance) { - performance = tmp; - flag = OBD_FL_CKSUM_CRC32C; - } - } - if (cksum_type & OBD_CKSUM_ADLER) { - tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)); - if (tmp > performance) { - performance = tmp; - flag = OBD_FL_CKSUM_ADLER; - } - } - if (unlikely(cksum_type && !(cksum_type & (OBD_CKSUM_CRC32C | - OBD_CKSUM_CRC32 | - OBD_CKSUM_ADLER)))) - CWARN("unknown cksum type %x\n", cksum_type); - - return flag; -} +u32 obd_cksum_type_pack(const char *obd_name, enum cksum_type cksum_type); -static inline enum cksum_type cksum_type_unpack(u32 o_flags) +static 
inline enum cksum_type obd_cksum_type_unpack(u32 o_flags) { switch (o_flags & OBD_FL_CKSUM_ALL) { case OBD_FL_CKSUM_CRC32C: return OBD_CKSUM_CRC32C; case OBD_FL_CKSUM_CRC32: return OBD_CKSUM_CRC32; + case OBD_FL_CKSUM_T10IP512: + return OBD_CKSUM_T10IP512; + case OBD_FL_CKSUM_T10IP4K: + return OBD_CKSUM_T10IP4K; + case OBD_FL_CKSUM_T10CRC512: + return OBD_CKSUM_T10CRC512; + case OBD_FL_CKSUM_T10CRC4K: + return OBD_CKSUM_T10CRC4K; default: break; } @@ -115,7 +82,7 @@ static inline enum cksum_type cksum_type_unpack(u32 o_flags) * 1.8 supported ADLER it is base and not depend on hw * Client uses all available local algos */ -static inline enum cksum_type cksum_types_supported_client(void) +static inline enum cksum_type obd_cksum_types_supported_client(void) { enum cksum_type ret = OBD_CKSUM_ADLER; @@ -128,6 +95,8 @@ static inline enum cksum_type cksum_types_supported_client(void) ret |= OBD_CKSUM_CRC32C; if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)) > 0) ret |= OBD_CKSUM_CRC32; + /* Client support all kinds of T10 checksum */ + ret |= OBD_CKSUM_T10_ALL; return ret; } @@ -140,14 +109,68 @@ static inline enum cksum_type cksum_types_supported_client(void) * Caution is advised, however, since what is fastest on a single client may * not be the fastest or most efficient algorithm on the server. */ -static inline enum cksum_type cksum_type_select(enum cksum_type cksum_types) +static inline enum cksum_type +obd_cksum_type_select(const char *obd_name, enum cksum_type cksum_types) { - return cksum_type_unpack(cksum_type_pack(cksum_types)); + u32 flag = obd_cksum_type_pack(obd_name, cksum_types); + + return obd_cksum_type_unpack(flag); } /* Checksum algorithm names. Must be defined in the same order as the * OBD_CKSUM_* flags. 
*/ -#define DECLARE_CKSUM_NAME char *cksum_name[] = {"crc32", "adler", "crc32c"} +#define DECLARE_CKSUM_NAME const char *cksum_name[] = {"crc32", "adler", \ + "crc32c", "reserved", "t10ip512", "t10ip4K", "t10crc512", "t10crc4K"} + +typedef u16 (obd_dif_csum_fn) (void *, unsigned int); + +u16 obd_dif_crc_fn(void *data, unsigned int len); +u16 obd_dif_ip_fn(void *data, unsigned int len); +int obd_page_dif_generate_buffer(const char *obd_name, struct page *page, + u32 offset, u32 length, + u16 *guard_start, int guard_number, + int *used_number, int sector_size, + obd_dif_csum_fn *fn); +/* + * If checksum type is one T10 checksum types, init the csum_fn and sector + * size. Otherwise, init them to NULL/zero. + */ +static inline void obd_t10_cksum2dif(enum cksum_type cksum_type, + obd_dif_csum_fn **fn, int *sector_size) +{ + *fn = NULL; + *sector_size = 0; + + switch (cksum_type) { + case OBD_CKSUM_T10IP512: + *fn = obd_dif_ip_fn; + *sector_size = 512; + break; + case OBD_CKSUM_T10IP4K: + *fn = obd_dif_ip_fn; + *sector_size = 4096; + break; + case OBD_CKSUM_T10CRC512: + *fn = obd_dif_crc_fn; + *sector_size = 512; + break; + case OBD_CKSUM_T10CRC4K: + *fn = obd_dif_crc_fn; + *sector_size = 4096; + break; + default: + break; + } +} + +enum obd_t10_cksum_type { + OBD_T10_CKSUM_UNKNOWN = 0, + OBD_T10_CKSUM_IP512, + OBD_T10_CKSUM_IP4K, + OBD_T10_CKSUM_CRC512, + OBD_T10_CKSUM_CRC4K, + OBD_T10_CKSUM_MAX +}; #endif /* __OBD_H */ diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index d896049..0153c50 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1687,7 +1687,6 @@ static inline void class_uuid_unparse(class_uuid_t uu, struct obd_uuid *out) extern char obd_jobid_name[]; int class_procfs_init(void); int class_procfs_clean(void); - /* prng.c */ #define ll_generate_random_uuid(uuid_out) \ get_random_bytes(uuid_out, sizeof(class_uuid_t)) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 
eb29064..dff349f 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -218,7 +218,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_LARGE_ACL; #endif - data->ocd_cksum_types = cksum_types_supported_client(); + data->ocd_cksum_types = obd_cksum_types_supported_client(); if (OBD_FAIL_CHECK(OBD_FAIL_MDC_LIGHTWEIGHT)) /* flag mdc connection as lightweight, only used for test @@ -432,7 +432,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) if (OBD_FAIL_CHECK(OBD_FAIL_OSC_CKSUM_ADLER_ONLY)) data->ocd_cksum_types = OBD_CKSUM_ADLER; else - data->ocd_cksum_types = cksum_types_supported_client(); + data->ocd_cksum_types = obd_cksum_types_supported_client(); data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile index 96fce1b..25d2e1d 100644 --- a/fs/lustre/obdclass/Makefile +++ b/fs/lustre/obdclass/Makefile @@ -8,4 +8,4 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \ lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \ obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \ cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \ - jobid.o + jobid.o integrity.o obd_cksum.o diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c new file mode 100644 index 0000000..8348b16 --- /dev/null +++ b/fs/lustre/obdclass/integrity.c @@ -0,0 +1,273 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * Copyright (c) 2018, DataDirect Networks Storage. + * Author: Li Xi. + * + * General data integrity functions + */ +#include +#include +#include +#include +#include + +u16 obd_dif_crc_fn(void *data, unsigned int len) +{ + return cpu_to_be16(crc_t10dif(data, len)); +} +EXPORT_SYMBOL(obd_dif_crc_fn); + +u16 obd_dif_ip_fn(void *data, unsigned int len) +{ + return ip_compute_csum(data, len); +} +EXPORT_SYMBOL(obd_dif_ip_fn); + +int obd_page_dif_generate_buffer(const char *obd_name, struct page *page, + u32 offset, u32 length, + u16 *guard_start, int guard_number, + int *used_number, int sector_size, + obd_dif_csum_fn *fn) +{ + unsigned int i; + char *data_buf; + u16 *guard_buf = guard_start; + unsigned int data_size; + int used = 0; + + data_buf = kmap(page) + offset; + for (i = 0; i < length; i += sector_size) { + if (used >= guard_number) { + CERROR("%s: unexpected used guard number of DIF %u/%u, data length %u, sector size %u: rc = %d\n", + obd_name, used, guard_number, length, + sector_size, -E2BIG); + return -E2BIG; + } + data_size = length - i; + if (data_size > sector_size) + data_size = sector_size; + *guard_buf = fn(data_buf, data_size); + guard_buf++; + data_buf += data_size; + used++; + } + kunmap(page); + *used_number = used; + + return 0; +} +EXPORT_SYMBOL(obd_page_dif_generate_buffer); + +static int __obd_t10_performance_test(const char *obd_name, + enum cksum_type cksum_type, + struct page *data_page, + int repeat_number) +{ + unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP); + struct ahash_request *hdesc; + obd_dif_csum_fn *fn = NULL; + unsigned int bufsize; + unsigned char *buffer; + struct page *__page; + 
u16 *guard_start; + int guard_number; + int used_number = 0; + int sector_size = 0; + u32 cksum; + int rc = 0; + int rc2; + int used; + int i; + + obd_t10_cksum2dif(cksum_type, &fn, &sector_size); + if (!fn) + return -EINVAL; + + __page = alloc_page(GFP_KERNEL); + if (!__page) + return -ENOMEM; + + hdesc = cfs_crypto_hash_init(cfs_alg, NULL, 0); + if (IS_ERR(hdesc)) { + rc = PTR_ERR(hdesc); + CERROR("%s: unable to initialize checksum hash %s: rc = %d\n", + obd_name, cfs_crypto_hash_name(cfs_alg), rc); + goto out; + } + + buffer = kmap(__page); + guard_start = (u16 *)buffer; + guard_number = PAGE_SIZE / sizeof(*guard_start); + for (i = 0; i < repeat_number; i++) { + /* + * The left guard number should be able to hold checksums of a + * whole page + */ + rc = obd_page_dif_generate_buffer(obd_name, data_page, 0, + PAGE_SIZE, + guard_start + used_number, + guard_number - used_number, + &used, sector_size, fn); + if (rc) + break; + + used_number += used; + if (used_number == guard_number) { + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + used_number = 0; + } + } + kunmap(__page); + if (rc) + goto out_final; + + if (used_number != 0) + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + + bufsize = sizeof(cksum); +out_final: + rc2 = cfs_crypto_hash_final(hdesc, (unsigned char *)&cksum, &bufsize); + rc = rc ? 
rc : rc2; +out: + __free_page(__page); + + return rc; +} + +/** + * Array of T10PI checksum algorithm speed in MByte per second + */ +static int obd_t10_cksum_speeds[OBD_T10_CKSUM_MAX]; + +static enum obd_t10_cksum_type +obd_t10_cksum2type(enum cksum_type cksum_type) +{ + switch (cksum_type) { + case OBD_CKSUM_T10IP512: + return OBD_T10_CKSUM_IP512; + case OBD_CKSUM_T10IP4K: + return OBD_T10_CKSUM_IP4K; + case OBD_CKSUM_T10CRC512: + return OBD_T10_CKSUM_CRC512; + case OBD_CKSUM_T10CRC4K: + return OBD_T10_CKSUM_CRC4K; + default: + return OBD_T10_CKSUM_UNKNOWN; + } +} + +static const char *obd_t10_cksum_name(enum obd_t10_cksum_type index) +{ + DECLARE_CKSUM_NAME; + + /* Need to skip "crc32", "adler", "crc32c", "reserved" */ + return cksum_name[3 + index]; +} + +/** + * Compute the speed of specified T10PI checksum type + * + * Run a speed test on the given T10PI checksum on buffer using a 1MB buffer + * size. This is a reasonable buffer size for Lustre RPCs, even if the actual + * RPC size is larger or smaller. + * + * The speed is stored internally in the obd_t10_cksum_speeds[] array, and + * is available through the obd_t10_cksum_speed() function. + * + * This function needs to stay the same as cfs_crypto_performance_test() so + * that the speeds are comparable. And this function should reflect the real + * cost of the checksum calculation. 
+ * + * \param[in] obd_name name of the OBD device + * \param[in] cksum_type checksum type (OBD_CKSUM_T10*) + */ +static void obd_t10_performance_test(const char *obd_name, + enum cksum_type cksum_type) +{ + enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type); + const int buf_len = max(PAGE_SIZE, 1048576UL); + unsigned long bcount; + unsigned long start; + unsigned long end; + struct page *page; + int rc = 0; + void *buf; + + page = alloc_page(GFP_KERNEL); + if (!page) { + rc = -ENOMEM; + goto out; + } + + buf = kmap(page); + memset(buf, 0xAD, PAGE_SIZE); + kunmap(page); + + for (start = jiffies, end = start + msecs_to_jiffies(MSEC_PER_SEC / 4), + bcount = 0; time_before(jiffies, end) && rc == 0; bcount++) { + rc = __obd_t10_performance_test(obd_name, cksum_type, page, + buf_len / PAGE_SIZE); + if (rc) + break; + } + end = jiffies; + __free_page(page); +out: + if (rc) { + obd_t10_cksum_speeds[index] = rc; + CDEBUG(D_INFO, + "%s: T10 checksum algorithm %s test error: rc = %d\n", + obd_name, obd_t10_cksum_name(index), rc); + } else { + unsigned long tmp; + + tmp = ((bcount * buf_len / jiffies_to_msecs(end - start)) * + 1000) / (1024 * 1024); + obd_t10_cksum_speeds[index] = (int)tmp; + CDEBUG(D_CONFIG, + "%s: T10 checksum algorithm %s speed = %d MB/s\n", + obd_name, obd_t10_cksum_name(index), + obd_t10_cksum_speeds[index]); + } +} + +int obd_t10_cksum_speed(const char *obd_name, + enum cksum_type cksum_type) +{ + enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type); + + if (unlikely(obd_t10_cksum_speeds[index] == 0)) { + static DEFINE_MUTEX(obd_t10_cksum_speed_mutex); + + mutex_lock(&obd_t10_cksum_speed_mutex); + if (obd_t10_cksum_speeds[index] == 0) + obd_t10_performance_test(obd_name, cksum_type); + mutex_unlock(&obd_t10_cksum_speed_mutex); + } + + return obd_t10_cksum_speeds[index]; +} +EXPORT_SYMBOL(obd_t10_cksum_speed); diff --git a/fs/lustre/obdclass/obd_cksum.c b/fs/lustre/obdclass/obd_cksum.c new file mode 100644 index 0000000..601feb7 --- 
/dev/null +++ b/fs/lustre/obdclass/obd_cksum.c @@ -0,0 +1,151 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * Copyright (c) 2018, DataDirect Networks Storage. + * Author: Li Xi. 
+ * + * Checksum functions + */ +#include +#include + +/* Server uses algos that perform at 50% or better of the Adler */ +enum cksum_type obd_cksum_types_supported_server(const char *obd_name) +{ + enum cksum_type ret = OBD_CKSUM_ADLER; + int base_speed; + + CDEBUG(D_INFO, + "%s: checksum speed: crc %d, crc32c %d, adler %d, t10ip512 %d, t10ip4k %d, t10crc512 %d, t10crc4k %d\n", + obd_name, + cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)), + cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)), + cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K)); + + base_speed = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)) / 2; + + if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)) >= + base_speed) + ret |= OBD_CKSUM_CRC32C; + + if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)) >= + base_speed) + ret |= OBD_CKSUM_CRC32; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512) >= base_speed) + ret |= OBD_CKSUM_T10IP512; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K) >= base_speed) + ret |= OBD_CKSUM_T10IP4K; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512) >= base_speed) + ret |= OBD_CKSUM_T10CRC512; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K) >= base_speed) + ret |= OBD_CKSUM_T10CRC4K; + + return ret; +} +EXPORT_SYMBOL(obd_cksum_types_supported_server); + +/* The OBD_FL_CKSUM_* flags is packed into 5 bits of o_flags, since there can + * only be a single checksum type per RPC. + * + * The OBD_CKSUM_* type bits passed in ocd_cksum_types are a 32-bit bitmask + * since they need to represent the full range of checksum algorithms that + * both the client and server can understand. 
+ * + * In case of an unsupported types/flags we fall back to ADLER + * because that is supported by all clients since 1.8 + * + * In case multiple algorithms are supported the best one is used. + */ +u32 obd_cksum_type_pack(const char *obd_name, enum cksum_type cksum_type) +{ + unsigned int performance = 0, tmp; + u32 flag = OBD_FL_CKSUM_ADLER; + + if (cksum_type & OBD_CKSUM_CRC32) { + tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_CRC32; + } + } + if (cksum_type & OBD_CKSUM_CRC32C) { + tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_CRC32C; + } + } + if (cksum_type & OBD_CKSUM_ADLER) { + tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_ADLER; + } + } + + if (cksum_type & OBD_CKSUM_T10IP512) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10IP512; + } + } + + if (cksum_type & OBD_CKSUM_T10IP4K) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10IP4K; + } + } + + if (cksum_type & OBD_CKSUM_T10CRC512) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10CRC512; + } + } + + if (cksum_type & OBD_CKSUM_T10CRC4K) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10CRC4K; + } + } + + if (unlikely(cksum_type && !(cksum_type & OBD_CKSUM_ALL))) + CWARN("%s: unknown cksum type %x\n", obd_name, cksum_type); + + return flag; +} +EXPORT_SYMBOL(obd_cksum_type_pack); diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index c430239..9ac9c84 100644 --- a/fs/lustre/osc/osc_request.c +++ 
b/fs/lustre/osc/osc_request.c @@ -1030,6 +1030,105 @@ static inline int can_merge_pages(struct brw_page *p1, struct brw_page *p2) return (p1->off + p1->count == p2->off); } +static int osc_checksum_bulk_t10pi(const char *obd_name, int nob, + size_t pg_count, struct brw_page **pga, + int opc, obd_dif_csum_fn *fn, + int sector_size, + u32 *check_sum) +{ + struct ahash_request *hdesc; + /* Used Adler as the default checksum type on top of DIF tags */ + unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP); + struct page *__page; + unsigned char *buffer; + u16 *guard_start; + unsigned int bufsize; + int guard_number; + int used_number = 0; + int used; + u32 cksum; + int rc = 0; + int i = 0; + + LASSERT(pg_count > 0); + + __page = alloc_page(GFP_KERNEL); + if (!__page) + return -ENOMEM; + + hdesc = cfs_crypto_hash_init(cfs_alg, NULL, 0); + if (IS_ERR(hdesc)) { + rc = PTR_ERR(hdesc); + CERROR("%s: unable to initialize checksum hash %s: rc = %d\n", + obd_name, cfs_crypto_hash_name(cfs_alg), rc); + goto out; + } + + buffer = kmap(__page); + guard_start = (u16 *)buffer; + guard_number = PAGE_SIZE / sizeof(*guard_start); + while (nob > 0 && pg_count > 0) { + unsigned int count = pga[i]->count > nob ? 
nob : pga[i]->count; + + /* corrupt the data before we compute the checksum, to + * simulate an OST->client data error + */ + if (unlikely(i == 0 && opc == OST_READ && + OBD_FAIL_CHECK(OBD_FAIL_OSC_CHECKSUM_RECEIVE))) { + unsigned char *ptr = kmap(pga[i]->pg); + int off = pga[i]->off & ~PAGE_MASK; + + memcpy(ptr + off, "bad1", min_t(typeof(nob), 4, nob)); + kunmap(pga[i]->pg); + } + + /* + * The left guard number should be able to hold checksums of a + * whole page + */ + rc = obd_page_dif_generate_buffer(obd_name, pga[i]->pg, 0, + count, + guard_start + used_number, + guard_number - used_number, + &used, sector_size, + fn); + if (rc) + break; + + used_number += used; + if (used_number == guard_number) { + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + used_number = 0; + } + + nob -= pga[i]->count; + pg_count--; + i++; + } + kunmap(__page); + if (rc) + goto out; + + if (used_number != 0) + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + + bufsize = sizeof(cksum); + cfs_crypto_hash_final(hdesc, (unsigned char *)&cksum, &bufsize); + + /* For sending we only compute the wrong checksum instead + * of corrupting the data so it is still correct on a redo + */ + if (opc == OST_WRITE && OBD_FAIL_CHECK(OBD_FAIL_OSC_CHECKSUM_SEND)) + cksum++; + + *check_sum = cksum; +out: + __free_page(__page); + return rc; +} + static int osc_checksum_bulk(int nob, u32 pg_count, struct brw_page **pga, int opc, enum cksum_type cksum_type, @@ -1090,6 +1189,28 @@ static int osc_checksum_bulk(int nob, u32 pg_count, return 0; } +static int osc_checksum_bulk_rw(const char *obd_name, + enum cksum_type cksum_type, + int nob, size_t pg_count, + struct brw_page **pga, int opc, + u32 *check_sum) +{ + obd_dif_csum_fn *fn = NULL; + int sector_size = 0; + int rc; + + obd_t10_cksum2dif(cksum_type, &fn, &sector_size); + + if (fn) + rc = osc_checksum_bulk_t10pi(obd_name, nob, pg_count, pga, + opc, fn, sector_size, check_sum); + else 
+ rc = osc_checksum_bulk(nob, pg_count, pga, opc, cksum_type, + check_sum); + + return rc; +} + static int osc_brw_prep_request(int cmd, struct client_obd *cli, struct obdo *oa, u32 page_count, struct brw_page **pga, @@ -1107,6 +1228,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, struct req_capsule *pill; struct brw_page *pg_prev; void *short_io_buf; + const char *obd_name = cli->cl_import->imp_obd->obd_name; if (OBD_FAIL_CHECK(OBD_FAIL_OSC_BRW_PREP_REQ)) return -ENOMEM; /* Recoverable */ @@ -1306,12 +1428,14 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0) body->oa.o_flags = 0; - body->oa.o_flags |= cksum_type_pack(cksum_type); + body->oa.o_flags |= obd_cksum_type_pack(obd_name, + cksum_type); body->oa.o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS; - rc = osc_checksum_bulk(requested_nob, page_count, - pga, OST_WRITE, cksum_type, - &body->oa.o_cksum); + rc = osc_checksum_bulk_rw(obd_name, cksum_type, + requested_nob, page_count, + pga, OST_WRITE, + &body->oa.o_cksum); if (rc < 0) { CDEBUG(D_PAGE, "failed to checksum, rc = %d\n", rc); @@ -1322,7 +1446,8 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, /* save this in 'oa', too, for later checking */ oa->o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS; - oa->o_flags |= cksum_type_pack(cksum_type); + oa->o_flags |= obd_cksum_type_pack(obd_name, + cksum_type); } else { /* clear out the checksum flag, in case this is a * resend but cl_checksum is no longer set. 
b=11238 @@ -1338,7 +1463,8 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, !sptlrpc_flavor_has_bulk(&req->rq_flvr)) { if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0) body->oa.o_flags = 0; - body->oa.o_flags |= cksum_type_pack(cli->cl_cksum_type); + body->oa.o_flags |= obd_cksum_type_pack(obd_name, + cli->cl_cksum_type); body->oa.o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS; } @@ -1441,6 +1567,10 @@ static int check_write_checksum(struct obdo *oa, u32 client_cksum, u32 server_cksum, struct osc_brw_async_args *aa) { + const char *obd_name = aa->aa_cli->cl_import->imp_obd->obd_name; + obd_dif_csum_fn *fn = NULL; + int sector_size = 0; + bool t10pi = false; u32 new_cksum; char *msg; enum cksum_type cksum_type; @@ -1455,15 +1585,50 @@ static int check_write_checksum(struct obdo *oa, dump_all_bulk_pages(oa, aa->aa_page_count, aa->aa_ppga, server_cksum, client_cksum); - cksum_type = cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ? - oa->o_flags : 0); - rc = osc_checksum_bulk(aa->aa_requested_nob, aa->aa_page_count, - aa->aa_ppga, OST_WRITE, cksum_type, - &new_cksum); + cksum_type = obd_cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ? 
+ oa->o_flags : 0); + + switch (cksum_type) { + case OBD_CKSUM_T10IP512: + t10pi = true; + fn = obd_dif_ip_fn; + sector_size = 512; + break; + case OBD_CKSUM_T10IP4K: + t10pi = true; + fn = obd_dif_ip_fn; + sector_size = 4096; + break; + case OBD_CKSUM_T10CRC512: + t10pi = true; + fn = obd_dif_crc_fn; + sector_size = 512; + break; + case OBD_CKSUM_T10CRC4K: + t10pi = true; + fn = obd_dif_crc_fn; + sector_size = 4096; + break; + default: + break; + } + + if (t10pi) + rc = osc_checksum_bulk_t10pi(obd_name, aa->aa_requested_nob, + aa->aa_page_count, + aa->aa_ppga, + OST_WRITE, + fn, + sector_size, + &new_cksum); + else + rc = osc_checksum_bulk(aa->aa_requested_nob, aa->aa_page_count, + aa->aa_ppga, OST_WRITE, cksum_type, + &new_cksum); if (rc < 0) msg = "failed to calculate the client write checksum"; - else if (cksum_type != cksum_type_unpack(aa->aa_oa->o_flags)) + else if (cksum_type != obd_cksum_type_unpack(aa->aa_oa->o_flags)) msg = "the server did not use the checksum type specified in the original request - likely a protocol problem"; else if (new_cksum == server_cksum) msg = "changed on the client after we checksummed it - likely false positive due to mmap IO (bug 11742)"; @@ -1474,15 +1639,15 @@ static int check_write_checksum(struct obdo *oa, LCONSOLE_ERROR_MSG(0x132, "%s: BAD WRITE CHECKSUM: %s: from %s inode " DFID " object " DOSTID " extent [%llu-%llu], original client csum %x (type %x), server csum %x (type %x), client csum now %x\n", - aa->aa_cli->cl_import->imp_obd->obd_name, - msg, libcfs_nid2str(peer->nid), + obd_name, msg, libcfs_nid2str(peer->nid), oa->o_valid & OBD_MD_FLFID ? oa->o_parent_seq : (u64)0, oa->o_valid & OBD_MD_FLFID ? oa->o_parent_oid : 0, oa->o_valid & OBD_MD_FLFID ? 
oa->o_parent_ver : 0, POSTID(&oa->o_oi), aa->aa_ppga[0]->off, aa->aa_ppga[aa->aa_page_count - 1]->off + aa->aa_ppga[aa->aa_page_count - 1]->count - 1, - client_cksum, cksum_type_unpack(aa->aa_oa->o_flags), + client_cksum, + obd_cksum_type_unpack(aa->aa_oa->o_flags), server_cksum, cksum_type, new_cksum); return 1; @@ -1495,6 +1660,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) const struct lnet_process_id *peer = &req->rq_import->imp_connection->c_peer; struct client_obd *cli = aa->aa_cli; + const char *obd_name = cli->cl_import->imp_obd->obd_name; struct ost_body *body; u32 client_cksum = 0; @@ -1619,17 +1785,17 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) char *via = ""; char *router = ""; enum cksum_type cksum_type; + u32 o_flags = body->oa.o_valid & OBD_MD_FLFLAGS ? + body->oa.o_flags : 0; - cksum_type = cksum_type_unpack(body->oa.o_valid & OBD_MD_FLFLAGS ? - body->oa.o_flags : 0); + cksum_type = obd_cksum_type_unpack(o_flags); - rc = osc_checksum_bulk(rc, aa->aa_page_count, aa->aa_ppga, - OST_READ, cksum_type, &client_cksum); - if (rc < 0) { - CDEBUG(D_PAGE, - "failed to calculate checksum, rc = %d\n", rc); + rc = osc_checksum_bulk_rw(obd_name, cksum_type, rc, + aa->aa_page_count, aa->aa_ppga, + OST_READ, &client_cksum); + if (rc < 0) goto out; - } + if (req->rq_bulk && peer->nid != req->rq_bulk->bd_sender) { via = " via "; @@ -1652,7 +1818,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) "%s: BAD READ CHECKSUM: from %s%s%s inode " DFID " object " DOSTID " extent [%llu-%llu], client %x, server %x, cksum_type %x\n", - req->rq_import->imp_obd->obd_name, + obd_name, libcfs_nid2str(peer->nid), via, router, clbody->oa.o_valid & OBD_MD_FLFID ? 
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 5d6546d..019648b 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -786,11 +786,12 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, * for algorithms we understand. The server masked off * the checksum types it doesn't support */ - if (!(ocd->ocd_cksum_types & cksum_types_supported_client())) { + if (!(ocd->ocd_cksum_types & + obd_cksum_types_supported_client())) { LCONSOLE_ERROR("The negotiation of the checksum algorithm to use with server %s failed (%x/%x), disabling checksums\n", obd2cli_tgt(imp->imp_obd), ocd->ocd_cksum_types, - cksum_types_supported_client()); + obd_cksum_types_supported_client()); return -EPROTO; } cli->cl_supp_cksum_types = ocd->ocd_cksum_types; @@ -801,7 +802,8 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, */ cli->cl_supp_cksum_types = OBD_CKSUM_ADLER; } - cli->cl_cksum_type = cksum_type_select(cli->cl_supp_cksum_types); + cli->cl_cksum_type = obd_cksum_type_select(imp->imp_obd->obd_name, + cli->cl_supp_cksum_types); if (ocd->ocd_connect_flags & OBD_CONNECT_BRW_SIZE) cli->cl_max_pages_per_rpc = diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index c566dea..01ddbee 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1123,6 +1123,18 @@ void lustre_assert_wire_constants(void) (unsigned int)OBD_CKSUM_ADLER); LASSERTF(OBD_CKSUM_CRC32C == 0x00000004UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32C); + LASSERTF(OBD_CKSUM_RESERVED == 0x00000008UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_RESERVED); + LASSERTF(OBD_CKSUM_T10IP512 == 0x00000010UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10IP512); + LASSERTF(OBD_CKSUM_T10IP4K == 0x00000020UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10IP4K); + LASSERTF(OBD_CKSUM_T10CRC512 == 0x00000040UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10CRC512); + LASSERTF(OBD_CKSUM_T10CRC4K == 0x00000080UL, "found 
0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10CRC4K); + LASSERTF(OBD_CKSUM_T10_TOP == 0x00000002UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10_TOP); /* Checks for struct ost_layout */ LASSERTF((int)sizeof(struct ost_layout) == 28, "found %lld\n", @@ -1372,7 +1384,10 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(OBD_FL_CKSUM_CRC32 != 0x00001000); BUILD_BUG_ON(OBD_FL_CKSUM_ADLER != 0x00002000); BUILD_BUG_ON(OBD_FL_CKSUM_CRC32C != 0x00004000); - BUILD_BUG_ON(OBD_FL_CKSUM_RSVD2 != 0x00008000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10IP512 != 0x00005000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10IP4K != 0x00006000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10CRC512 != 0x00007000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10CRC4K != 0x00008000); BUILD_BUG_ON(OBD_FL_CKSUM_RSVD3 != 0x00010000); BUILD_BUG_ON(OBD_FL_SHRINK_GRANT != 0x00020000); BUILD_BUG_ON(OBD_FL_MMAP != 0x00040000); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 7cf7307..11df7b4 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -883,15 +883,37 @@ struct obd_connect_data { /* * Supported checksum algorithms. Up to 32 checksum types are supported. * (32-bit mask stored in obd_connect_data::ocd_cksum_types) - * Please update DECLARE_CKSUM_NAME/OBD_CKSUM_ALL in obd.h when adding a new - * algorithm and also the OBD_FL_CKSUM* flags. + * Please update DECLARE_CKSUM_NAME in obd_cksum.h when adding a new + * algorithm and also the OBD_FL_CKSUM* flags, OBD_CKSUM_ALL flag, + * OBD_FL_CKSUM_ALL flag and potentially OBD_CKSUM_T10_ALL flag. 
*/ enum cksum_type { - OBD_CKSUM_CRC32 = 0x00000001, - OBD_CKSUM_ADLER = 0x00000002, - OBD_CKSUM_CRC32C = 0x00000004, + OBD_CKSUM_CRC32 = 0x00000001, + OBD_CKSUM_ADLER = 0x00000002, + OBD_CKSUM_CRC32C = 0x00000004, + OBD_CKSUM_RESERVED = 0x00000008, + OBD_CKSUM_T10IP512 = 0x00000010, + OBD_CKSUM_T10IP4K = 0x00000020, + OBD_CKSUM_T10CRC512 = 0x00000040, + OBD_CKSUM_T10CRC4K = 0x00000080, }; +#define OBD_CKSUM_T10_ALL (OBD_CKSUM_T10IP512 | OBD_CKSUM_T10IP4K | \ + OBD_CKSUM_T10CRC512 | OBD_CKSUM_T10CRC4K) + +#define OBD_CKSUM_ALL (OBD_CKSUM_CRC32 | OBD_CKSUM_ADLER | OBD_CKSUM_CRC32C | \ + OBD_CKSUM_T10_ALL) + +/* + * The default checksum algorithm used on top of T10PI GRD tags for RPC. + * Considering that the checksum-of-checksums is only computing CRC32 on a + * 4KB chunk of GRD tags for a 1MB RPC for 512B sectors, or 8KB of GRD + * tags for 16MB of 4KB sectors, this is only 1/256 or 1/2048 of the + * total data being checksummed, so the checksum type used here should not + * affect overall system performance noticeably. + */ +#define OBD_CKSUM_T10_TOP OBD_CKSUM_ADLER + /* * OST requests: OBDO & OBD request records */ @@ -940,7 +962,10 @@ enum obdo_flags { OBD_FL_CKSUM_CRC32 = 0x00001000, /* CRC32 checksum type */ OBD_FL_CKSUM_ADLER = 0x00002000, /* ADLER checksum type */ OBD_FL_CKSUM_CRC32C = 0x00004000, /* CRC32C checksum type */ - OBD_FL_CKSUM_RSVD2 = 0x00008000, /* for future cksum types */ + OBD_FL_CKSUM_T10IP512 = 0x00005000, /* T10PI IP cksum, 512B sector */ + OBD_FL_CKSUM_T10IP4K = 0x00006000, /* T10PI IP cksum, 4KB sector */ + OBD_FL_CKSUM_T10CRC512 = 0x00007000, /* T10PI CRC cksum, 512B sector */ + OBD_FL_CKSUM_T10CRC4K = 0x00008000, /* T10PI CRC cksum, 4KB sector */ OBD_FL_CKSUM_RSVD3 = 0x00010000, /* for future cksum types */ OBD_FL_SHRINK_GRANT = 0x00020000, /* object shrink the grant */ OBD_FL_MMAP = 0x00040000, /* object is mmapped on the client.
@@ -953,11 +978,16 @@ enum obdo_flags { OBD_FL_SHORT_IO = 0x00400000, /* short io request */ /* OBD_FL_LOCAL_MASK = 0xF0000000, was local-only flags until 2.10 */ - /* Note that while these checksum values are currently separate bits, - * in 2.x we can actually allow all values from 1-31 if we wanted. + /* + * Note that while the original checksum values were separate bits, + * in 2.x we can actually allow all values from 1-31. T10-PI checksum + * types already use values which are not separate bits. */ OBD_FL_CKSUM_ALL = (OBD_FL_CKSUM_CRC32 | OBD_FL_CKSUM_ADLER | - OBD_FL_CKSUM_CRC32C), + OBD_FL_CKSUM_CRC32C | OBD_FL_CKSUM_T10IP512 | + OBD_FL_CKSUM_T10IP4K | + OBD_FL_CKSUM_T10CRC512 | + OBD_FL_CKSUM_T10CRC4K), }; /* diff --git a/net/lnet/libcfs/linux-crypto.c b/net/lnet/libcfs/linux-crypto.c index 53285c2..532fab4 100644 --- a/net/lnet/libcfs/linux-crypto.c +++ b/net/lnet/libcfs/linux-crypto.c @@ -318,6 +318,9 @@ int cfs_crypto_hash_final(struct ahash_request *req, * The speed is stored internally in the cfs_crypto_hash_speeds[] array, and * is available through the cfs_crypto_hash_speed() function. * + * This function needs to stay the same as obd_t10_performance_test() so that + * the speeds are comparable. 
+ *
  * @hash_alg	hash algorithm id (CFS_HASH_ALG_*)
  * @buf	data buffer on which to compute the hash
  * @buf_len	length of @buf on which to compute hash

From patchwork Thu Feb 27 21:08:39 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409731
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AC66138D
	for ; Thu, 27 Feb 2020 21:20:29 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 52EF7246A1
	for ; Thu, 27 Feb 2020 21:20:29 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 52EF7246A1
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CFB8E21FB59;
	Thu, 27 Feb 2020 13:19:51 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CCF0321FA4B
	for ; Thu, 27 Feb 2020 13:18:30 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C43FEE09;
	Thu, 27 Feb 2020 16:18:13 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id C2F30468; Thu, 27 Feb 2020 16:18:13 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:39 -0500
Message-Id: <1582838290-17243-52-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 051/622] lustre: ldlm: Reduce debug to console during eviction
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Patrick Farrell

During an eviction, Lustre calls ldlm_namespace_cleanup, and it will
sometimes end up dumping all of the locks on a particular resource to
the console log (ldlm_resource_complain), which is very wasteful and
only rarely helpful.

Move the debug level for this to D_NETERROR since it is in the default
debug mask.

Cray-bug-id: LUS-1418
WC-bug-id: https://jira.whamcloud.com/browse/LU-10648
Lustre-commit: f92fcb863cb9 ("LU-10648 ldlm: Reduce debug to console during eviction")
Signed-off-by: Chris Horn
Signed-off-by: Patrick Farrell
Reviewed-on: https://review.whamcloud.com/31237
Reviewed-by: Sergey Cheremencev
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/ldlm/ldlm_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 7fe8a8b..5d73132 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -819,7 +819,8 @@ static int ldlm_resource_complain(struct cfs_hash *hs, struct cfs_hash_bd *bd,
 	       ldlm_ns_name(ldlm_res_to_ns(res)), PLDLMRES(res), res,
 	       atomic_read(&res->lr_refcount) - 1);
 
-	ldlm_resource_dump(D_ERROR, res);
+	/* Use D_NETERROR since it is in the default mask */
+	ldlm_resource_dump(D_NETERROR, res);
 	unlock_res(res);
 	return 0;
 }

From patchwork Thu Feb 27 21:08:40 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409757
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B3ED159A
	for ; Thu, 27 Feb 2020 21:21:29 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 3260E246A1
	for ; Thu, 27 Feb 2020 21:21:29 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3260E246A1
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F19CF21FDC3;
	Thu, 27 Feb 2020 13:20:18 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1BE0E21FA5B
	for ; Thu, 27 Feb 2020 13:18:31 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C7551E0B;
	Thu, 27 Feb 2020 16:18:13 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id C5EF646D; Thu, 27 Feb 2020 16:18:13 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:40 -0500
Message-Id: <1582838290-17243-53-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 052/622] lustre: ptlrpc: idle connections can disconnect
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Alex Zhuravlev

 - when new request is being allocated ptlrpc initiates connection
   if it's not connected yet
 - if the import is idle (no locks, no active RPCs, no non-PING reply
   for last osc_idle_timeout seconds), then pinger tries to disconnect
   asynchronously
 - currently only client-to-OST connections can be idle
 - lctl set_param osc.*.idle_timeout=N controls new feature:
   N=0 - disable
   N>0 - seconds to idle before disconnect
 - lctl set_param osc.*.idle_connect=N to reconnect if idle
   (N is positive number)
 - OSC module parameter osc_idle_timeout controls default idle timeout
   and set to 20 seconds by default

WC-bug-id: https://jira.whamcloud.com/browse/LU-7236
Lustre-commit: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Signed-off-by: Alex Zhuravlev
Reviewed-on: https://review.whamcloud.com/16682
Reviewed-by: Dmitry Eremin
Reviewed-by: Andreas Dilger
Reviewed-by: James Simmons
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_import.h |  17 +++--
 fs/lustre/include/lustre_net.h    |   1 +
 fs/lustre/lov/lov_ea.c            |   3 +-
 fs/lustre/lov/lov_obd.c           |   8 ++-
 fs/lustre/lov/lov_request.c       |  25 ++++++--
 fs/lustre/osc/lproc_osc.c         |  66 +++++++++++++++++++
 fs/lustre/osc/osc_request.c       |   3 +
 fs/lustre/ptlrpc/client.c         |  32 +++++++++-
 fs/lustre/ptlrpc/events.c         |   3 +-
 fs/lustre/ptlrpc/import.c         | 130 ++++++++++++++++++++++++++++++--------
 fs/lustre/ptlrpc/pinger.c         |  30 +++++++++
 11 files changed, 275 insertions(+), 43 deletions(-)

diff --git a/fs/lustre/include/lustre_import.h
b/fs/lustre/include/lustre_import.h index 0d7bb0f..c4452e1 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -96,6 +96,8 @@ enum lustre_imp_state { LUSTRE_IMP_RECOVER = 8, LUSTRE_IMP_FULL = 9, LUSTRE_IMP_EVICTED = 10, + LUSTRE_IMP_IDLE = 11, + LUSTRE_IMP_LAST }; /** Returns test string representation of numeric import state @state */ @@ -104,10 +106,10 @@ static inline char *ptlrpc_import_state_name(enum lustre_imp_state state) static char *import_state_names[] = { "", "CLOSED", "NEW", "DISCONN", "CONNECTING", "REPLAY", "REPLAY_LOCKS", "REPLAY_WAIT", - "RECOVER", "FULL", "EVICTED", + "RECOVER", "FULL", "EVICTED", "IDLE", }; - LASSERT(state <= LUSTRE_IMP_EVICTED); + LASSERT(state < LUSTRE_IMP_LAST); return import_state_names[state]; } @@ -226,12 +228,14 @@ struct obd_import { int imp_state_hist_idx; /** Current import generation. Incremented on every reconnect */ int imp_generation; + /* Idle connection initiated at this generation */ + int imp_initiated_at; /** Incremented every time we send reconnection request */ u32 imp_conn_cnt; - /** - * \see ptlrpc_free_committed remembers imp_generation value here - * after a check to save on unnecessary replay list iterations - */ + /* + * \see ptlrpc_free_committed remembers imp_generation value here + * after a check to save on unnecessary replay list iterations + */ int imp_last_generation_checked; /** Last transno we replayed */ u64 imp_last_replay_transno; @@ -299,6 +303,7 @@ struct obd_import { imp_connected:1; u32 imp_connect_op; + u32 imp_idle_timeout; struct obd_connect_data imp_connect_data; u64 imp_connect_flags_orig; u64 imp_connect_flags2_orig; diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 0231011..674803c 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1988,6 +1988,7 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf, int ptlrpc_connect_import(struct obd_import 
*imp); int ptlrpc_init_import(struct obd_import *imp); int ptlrpc_disconnect_import(struct obd_import *imp, int noclose); +int ptlrpc_disconnect_and_idle_import(struct obd_import *imp); int ptlrpc_import_recovery_state_machine(struct obd_import *imp); /* ptlrpc/pack_generic.c */ diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c index 41308d3..edca3b0 100644 --- a/fs/lustre/lov/lov_ea.c +++ b/fs/lustre/lov/lov_ea.c @@ -70,7 +70,8 @@ static loff_t lov_tgt_maxbytes(struct lov_tgt_desc *tgt) return maxbytes; spin_lock(&imp->imp_lock); - if (imp->imp_state == LUSTRE_IMP_FULL && + if ((imp->imp_state == LUSTRE_IMP_FULL || + imp->imp_state == LUSTRE_IMP_IDLE) && (imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_MAXBYTES) && imp->imp_connect_data.ocd_maxbytes > 0) maxbytes = imp->imp_connect_data.ocd_maxbytes; diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 9449aa9..35eaa1f 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -977,17 +977,21 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len, struct obd_ioctl_data *data = karg; struct obd_device *osc_obd; struct obd_statfs stat_buf = { 0 }; + struct obd_import *imp; u32 index; u32 flags; - memcpy(&index, data->ioc_inlbuf2, sizeof(u32)); + memcpy(&index, data->ioc_inlbuf2, sizeof(index)); if (index >= count) return -ENODEV; if (!lov->lov_tgts[index]) /* Try again with the next index */ return -EAGAIN; - if (!lov->lov_tgts[index]->ltd_active) + + imp = lov->lov_tgts[index]->ltd_exp->exp_obd->u.cli.cl_import; + if (!lov->lov_tgts[index]->ltd_active && + imp->imp_state != LUSTRE_IMP_IDLE) return -ENODATA; osc_obd = class_exp2obd(lov->lov_tgts[index]->ltd_exp); diff --git a/fs/lustre/lov/lov_request.c b/fs/lustre/lov/lov_request.c index 864e410..added19 100644 --- a/fs/lustre/lov/lov_request.c +++ b/fs/lustre/lov/lov_request.c @@ -99,6 +99,7 @@ static int lov_check_and_wait_active(struct lov_obd *lov, int ost_idx) { int cnt = 0; struct lov_tgt_desc 
*tgt; + struct obd_import *imp = NULL; int rc = 0; mutex_lock(&lov->lov_lock); @@ -115,7 +116,13 @@ static int lov_check_and_wait_active(struct lov_obd *lov, int ost_idx) goto out; } - if (tgt->ltd_exp && class_exp2cliimp(tgt->ltd_exp)->imp_connect_tried) { + if (tgt->ltd_exp) + imp = class_exp2cliimp(tgt->ltd_exp); + if (imp && imp->imp_connect_tried) { + rc = 0; + goto out; + } + if (imp && imp->imp_state == LUSTRE_IMP_IDLE) { rc = 0; goto out; } @@ -302,11 +309,10 @@ int lov_prep_statfs_set(struct obd_device *obd, struct obd_info *oinfo, /* We only get block data from the OBD */ for (i = 0; i < lov->desc.ld_tgt_count; i++) { + struct lov_tgt_desc *ltd = lov->lov_tgts[i]; struct lov_request *req; - if (!lov->lov_tgts[i] || - (oinfo->oi_flags & OBD_STATFS_NODELAY && - !lov->lov_tgts[i]->ltd_active)) { + if (!ltd) { CDEBUG(D_HA, "lov idx %d inactive\n", i); continue; } @@ -314,13 +320,20 @@ int lov_prep_statfs_set(struct obd_device *obd, struct obd_info *oinfo, /* skip targets that have been explicitly disabled by the * administrator */ - if (!lov->lov_tgts[i]->ltd_exp) { + if (!ltd->ltd_exp) { CDEBUG(D_HA, "lov idx %d administratively disabled\n", i); continue; } - if (!lov->lov_tgts[i]->ltd_active) + if (oinfo->oi_flags & OBD_STATFS_NODELAY && + class_exp2cliimp(ltd->ltd_exp)->imp_state != + LUSTRE_IMP_IDLE && !ltd->ltd_active) { + CDEBUG(D_HA, "lov idx %d inactive\n", i); + continue; + } + + if (!ltd->ltd_active) lov_check_and_wait_active(lov, i); req = kzalloc(sizeof(*req), GFP_NOFS); diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 605a236..fd84393 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -598,6 +598,68 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO(osc_unstable_stats); +static int osc_idle_timeout_seq_show(struct seq_file *m, void *v) +{ + struct obd_device *obd = m->private; + struct client_obd *cli = &obd->u.cli; + + seq_printf(m, "%u\n", 
cli->cl_import->imp_idle_timeout); + return 0; +} + +static ssize_t osc_idle_timeout_seq_write(struct file *f, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct obd_device *obd = ((struct seq_file *)f->private_data)->private; + struct client_obd *cli = &obd->u.cli; + struct ptlrpc_request *req; + unsigned int val; + int rc; + + rc = kstrtouint_from_user(buffer, count, 0, &val); + if (rc) + return rc; + + if (val > CONNECTION_SWITCH_MAX) + return -ERANGE; + + cli->cl_import->imp_idle_timeout = val; + + /* to initiate the connection if it's in IDLE state */ + if (!val) { + req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_STATFS); + if (req) + ptlrpc_req_finished(req); + } + + return count; +} +LPROC_SEQ_FOPS(osc_idle_timeout); + +static int osc_idle_connect_seq_show(struct seq_file *m, void *v) +{ + return 0; +} + +static ssize_t osc_idle_connect_seq_write(struct file *f, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct obd_device *dev = ((struct seq_file *)f->private_data)->private; + struct client_obd *cli = &dev->u.cli; + struct ptlrpc_request *req; + + /* to initiate the connection if it's in IDLE state */ + req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_STATFS); + if (req) + ptlrpc_req_finished(req); + ptlrpc_pinger_force(cli->cl_import); + + return count; +} +LPROC_SEQ_FOPS(osc_idle_connect); + LPROC_SEQ_FOPS_RO_TYPE(osc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(osc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(osc, timeouts); @@ -625,6 +687,10 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) .fops = &osc_pinger_recov_fops }, { .name = "unstable_stats", .fops = &osc_unstable_stats_fops }, + { .name = "idle_timeout", + .fops = &osc_idle_timeout_fops }, + { .name = "idle_connect", + .fops = &osc_idle_connect_fops }, { NULL } }; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 9ac9c84..e341fcc 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -61,6 
+61,8 @@ /* max memory used for request pool, unit is MB */ static unsigned int osc_reqpool_mem_max = 5; module_param(osc_reqpool_mem_max, uint, 0444); +static unsigned int osc_idle_timeout = 20; +module_param(osc_idle_timeout, uint, 0644); struct osc_async_args { struct obd_info *aa_oi; @@ -3214,6 +3216,7 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg) spin_lock(&osc_shrink_lock); list_add_tail(&cli->cl_shrink_list, &osc_shrink_list); spin_unlock(&osc_shrink_lock); + cli->cl_import->imp_idle_timeout = osc_idle_timeout; return rc; diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 424db55..9b41c12 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -885,6 +885,28 @@ struct ptlrpc_request *__ptlrpc_request_alloc(struct obd_import *imp, const struct req_format *format) { struct ptlrpc_request *request; + int connect = 0; + + if (unlikely(imp->imp_state == LUSTRE_IMP_IDLE)) { + int rc; + + CDEBUG(D_INFO, "%s: connect at new req\n", + imp->imp_obd->obd_name); + spin_lock(&imp->imp_lock); + if (imp->imp_state == LUSTRE_IMP_IDLE) { + imp->imp_generation++; + imp->imp_initiated_at = imp->imp_generation; + imp->imp_state = LUSTRE_IMP_NEW; + connect = 1; + } + spin_unlock(&imp->imp_lock); + if (connect) { + rc = ptlrpc_connect_import(imp); + if (rc < 0) + return NULL; + ptlrpc_pinger_add_import(imp); + } + } request = __ptlrpc_request_alloc(imp, pool); if (!request) @@ -1075,6 +1097,7 @@ void ptlrpc_set_add_req(struct ptlrpc_request_set *set, return; } + LASSERT(req->rq_import->imp_state != LUSTRE_IMP_IDLE); LASSERT(list_empty(&req->rq_set_chain)); /* The set takes over the caller's request reference */ @@ -1183,7 +1206,9 @@ static int ptlrpc_import_delay_req(struct obd_import *imp, if (atomic_read(&imp->imp_inval_count) != 0) { DEBUG_REQ(D_ERROR, req, "invalidate in flight"); *status = -EIO; - } else if (req->rq_no_delay) { + } else if (req->rq_no_delay && + imp->imp_generation != imp->imp_initiated_at) { + /* ignore
nodelay for requests initiating connections */ *status = -EWOULDBLOCK; } else if (req->rq_allow_replay && (imp->imp_state == LUSTRE_IMP_REPLAY || @@ -1842,8 +1867,11 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) spin_unlock(&imp->imp_lock); goto interpret; } + /* ignore on just initiated connections */ if (ptlrpc_no_resend(req) && - !req->rq_wait_ctx) { + !req->rq_wait_ctx && + imp->imp_generation != + imp->imp_initiated_at) { req->rq_status = -ENOTCONN; ptlrpc_rqphase_move(req, RQ_PHASE_INTERPRET); diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c index 93a59b8..87c0ab7 100644 --- a/fs/lustre/ptlrpc/events.c +++ b/fs/lustre/ptlrpc/events.c @@ -164,7 +164,8 @@ void reply_in_callback(struct lnet_event *ev) ev->mlength, ev->offset, req->rq_replen); } - req->rq_import->imp_last_reply_time = ktime_get_real_seconds(); + if (lustre_msg_get_opc(req->rq_reqmsg) != OBD_PING) + req->rq_import->imp_last_reply_time = ktime_get_real_seconds(); out_wake: /* NB don't unlock till after wakeup; req can disappear under us diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 019648b..b90f78c 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -925,6 +925,21 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, } if (rc) { + struct ptlrpc_request *free_req; + struct ptlrpc_request *tmp; + + /* abort all delayed requests initiated connection */ + list_for_each_entry_safe(free_req, tmp, &imp->imp_delayed_list, + rq_list) { + spin_lock(&free_req->rq_lock); + if (free_req->rq_no_resend) { + free_req->rq_err = 1; + free_req->rq_status = -EIO; + ptlrpc_client_wake_req(free_req); + } + spin_unlock(&free_req->rq_lock); + } + /* if this reconnect to busy export - not need select new target * for connecting */ @@ -1454,14 +1469,11 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) return rc; } -int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) +static struct 
ptlrpc_request *ptlrpc_disconnect_prep_req(struct obd_import *imp) { struct ptlrpc_request *req; int rq_opc, rc = 0; - if (imp->imp_obd->obd_force) - goto set_state; - switch (imp->imp_connect_op) { case OST_CONNECT: rq_opc = OST_DISCONNECT; @@ -1477,9 +1489,47 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) CERROR("%s: don't know how to disconnect from %s (connect_op %d): rc = %d\n", imp->imp_obd->obd_name, obd2cli_tgt(imp->imp_obd), imp->imp_connect_op, rc); - return rc; + return ERR_PTR(rc); } + req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_DISCONNECT, + LUSTRE_OBD_VERSION, rq_opc); + if (!req) + return ERR_PTR(-ENOMEM); + + /* We are disconnecting, do not retry a failed DISCONNECT rpc if + * it fails. We can get through the above with a down server + * if the client doesn't know the server is gone yet. + */ + req->rq_no_resend = 1; + + /* We want client umounts to happen quickly, no matter the + * server state... + */ + req->rq_timeout = min_t(int, req->rq_timeout, + INITIAL_CONNECT_TIMEOUT); + + IMPORT_SET_STATE(imp, LUSTRE_IMP_CONNECTING); + req->rq_send_state = LUSTRE_IMP_CONNECTING; + ptlrpc_request_set_replen(req); + + return req; +} + +int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) +{ + struct ptlrpc_request *req; + int rc = 0; + + if (imp->imp_obd->obd_force) + goto set_state; + + /* probably the import has been disconnected already being idle */ + spin_lock(&imp->imp_lock); + if (imp->imp_state == LUSTRE_IMP_IDLE) + goto out; + spin_unlock(&imp->imp_lock); + if (ptlrpc_import_in_recovery(imp)) { long timeout_jiffies; time64_t timeout; @@ -1512,27 +1562,13 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) goto out; spin_unlock(&imp->imp_lock); - req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_DISCONNECT, - LUSTRE_OBD_VERSION, rq_opc); - if (req) { - /* We are disconnecting, do not retry a failed DISCONNECT rpc if - * it fails.
We can get through the above with a down server - * if the client doesn't know the server is gone yet. - */ - req->rq_no_resend = 1; - - /* We want client umounts to happen quickly, no matter the - * server state... - */ - req->rq_timeout = min_t(int, req->rq_timeout, - INITIAL_CONNECT_TIMEOUT); - - IMPORT_SET_STATE(imp, LUSTRE_IMP_CONNECTING); - req->rq_send_state = LUSTRE_IMP_CONNECTING; - ptlrpc_request_set_replen(req); - rc = ptlrpc_queue_wait(req); - ptlrpc_req_finished(req); + req = ptlrpc_disconnect_prep_req(imp); + if (IS_ERR(req)) { + rc = PTR_ERR(req); + goto set_state; } + rc = ptlrpc_queue_wait(req); + ptlrpc_req_finished(req); set_state: spin_lock(&imp->imp_lock); @@ -1551,6 +1587,50 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) } EXPORT_SYMBOL(ptlrpc_disconnect_import); +static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, + struct ptlrpc_request *req, + void *data, int rc) +{ + struct obd_import *imp = req->rq_import; + + LASSERT(imp->imp_state == LUSTRE_IMP_CONNECTING); + spin_lock(&imp->imp_lock); + IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_IDLE); + memset(&imp->imp_remote_handle, 0, sizeof(imp->imp_remote_handle)); + spin_unlock(&imp->imp_lock); + + return 0; +} + +int ptlrpc_disconnect_and_idle_import(struct obd_import *imp) +{ + struct ptlrpc_request *req; + + if (imp->imp_obd->obd_force) + return 0; + + if (ptlrpc_import_in_recovery(imp)) + return 0; + + spin_lock(&imp->imp_lock); + if (imp->imp_state != LUSTRE_IMP_FULL) { + spin_unlock(&imp->imp_lock); + return 0; + } + spin_unlock(&imp->imp_lock); + + req = ptlrpc_disconnect_prep_req(imp); + if (IS_ERR(req)) + return PTR_ERR(req); + + CDEBUG(D_INFO, "%s: disconnect\n", imp->imp_obd->obd_name); + req->rq_interpret_reply = ptlrpc_disconnect_idle_interpret; + ptlrpcd_add_req(req); + + return 0; +} +EXPORT_SYMBOL(ptlrpc_disconnect_and_idle_import); + /* Adaptive Timeout utils */ /* diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c index 
762fd0e..c565e2d 100644 --- a/fs/lustre/ptlrpc/pinger.c +++ b/fs/lustre/ptlrpc/pinger.c @@ -79,10 +79,40 @@ int ptlrpc_obd_ping(struct obd_device *obd) } EXPORT_SYMBOL(ptlrpc_obd_ping); +static bool ptlrpc_check_import_is_idle(struct obd_import *imp) +{ + struct ldlm_namespace *ns = imp->imp_obd->obd_namespace; + time64_t now; + + if (!imp->imp_idle_timeout) + return false; + /* 4 comes from: + * - client_obd_setup() - hashed import + * - ptlrpcd_alloc_work() + * - ptlrpcd_alloc_work() + * - ptlrpc_pinger_add_import + */ + if (atomic_read(&imp->imp_refcount) > 4) + return false; + + /* any lock increases ns_bref being a resource holder */ + if (ns && atomic_read(&ns->ns_bref) > 0) + return false; + + now = ktime_get_real_seconds(); + if (now - imp->imp_last_reply_time < imp->imp_idle_timeout) + return false; + + return true; +} + static int ptlrpc_ping(struct obd_import *imp) { struct ptlrpc_request *req; + if (ptlrpc_check_import_is_idle(imp)) + return ptlrpc_disconnect_and_idle_import(imp); + req = ptlrpc_prep_ping(imp); if (!req) { CERROR("OOM trying to ping %s->%s\n", From patchwork Thu Feb 27 21:08:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409735 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 08043138D for ; Thu, 27 Feb 2020 21:20:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E4178246A1 for ; Thu, 27 Feb 2020 21:20:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E4178246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; 
spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ADD0520111E; Thu, 27 Feb 2020 13:19:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73AE221FA5B for ; Thu, 27 Feb 2020 13:18:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CA475E0E; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C8A0246F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:41 -0500 Message-Id: <1582838290-17243-54-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 053/622] lustre: osc: truncate does not update blocks count on client X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Abrarahmed Momin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain 'truncate' call correctly updates the server side with correct size and blocks count. However, on the client side all the metadata are correctly updated except the blocks count, which still reflects the old count prior to truncate call. 
This patch fixes this issue on the client by modifying osc_io_setattr_end() to update attr with the updated block count. A new test case is added under sanity to verify that the blocks count is correctly updated after a truncate call. Co-authored-by: Abrarahmed Momin WC-bug-id: https://jira.whamcloud.com/browse/LU-10370 Lustre-commit: 6115eb7fd55a ("LU-10370 ofd: truncate does not update blocks count on client") Signed-off-by: Abrarahmed Momin Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/31073 Reviewed-by: Jinshan Xiong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_io.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 970e8a7..1485962 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -588,6 +588,9 @@ void osc_io_setattr_end(const struct lu_env *env, struct osc_io *oio = cl2osc_io(env, slice); struct cl_object *obj = slice->cis_obj; struct osc_async_cbargs *cbargs = &oio->oi_cbarg; + struct cl_attr *attr = &osc_env_info(env)->oti_attr; + struct obdo *oa = &oio->oi_oa; + unsigned int cl_valid = 0; int result = 0; if (cbargs->opc_rpc_sent) { @@ -609,6 +612,14 @@ void osc_io_setattr_end(const struct lu_env *env, if (cl_io_is_trunc(io)) { u64 size = io->u.ci_setattr.sa_attr.lvb_size; + cl_object_attr_lock(obj); + if (oa->o_valid & OBD_MD_FLBLOCKS) { + attr->cat_blocks = oa->o_blocks; + cl_valid |= CAT_BLOCKS; + } + + cl_object_attr_update(env, obj, attr, cl_valid); + cl_object_attr_unlock(obj); osc_trunc_check(env, io, oio, size); osc_cache_truncate_end(env, oio->oi_trunc); oio->oi_trunc = NULL; From patchwork Thu Feb 27 21:08:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409761 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 80E49159A for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 69B13246A0 for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 69B13246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B2DF421FDF9; Thu, 27 Feb 2020 13:20:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C00A921FAE0 for ; Thu, 27 Feb 2020 13:18:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CD70EE11; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CBB9246A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:42 -0500 Message-Id: <1582838290-17243-55-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 054/622] lustre: ptlrpc: add LOCK_CONVERT connection flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Add a LOCK_CONVERT connection flag so that the lock convert feature is not used with old servers. WC-bug-id: https://jira.whamcloud.com/browse/LU-10175 Lustre-commit: 44a2092f08ca ("LU-10175 ptlrpc: add LOCK_CONVERT connection flag") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32593 Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 3 files changed, 4 insertions(+) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index e2575b4..385359f 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -118,6 +118,7 @@ "unknown", /* 0x10 */ "flr", /* 0x20 */ "wbc", /* 0x40 */ + "lock_convert", /* 0x80 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 01ddbee..202c5ab 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1117,6 +1117,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_FLR); LASSERTF(OBD_CONNECT2_WBC_INTENTS == 0x40ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_WBC_INTENTS); + LASSERTF(OBD_CONNECT2_LOCK_CONVERT == 0x80ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_LOCK_CONVERT); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 11df7b4..798aa57 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -799,6 +799,7 @@ struct ptlrpc_body_v2 { * under client-held parent * locks */ +#define OBD_CONNECT2_LOCK_CONVERT
0x80ULL /* IBITS lock convert support */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:08:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409739 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9E3DB138D for ; Thu, 27 Feb 2020 21:20:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8661D2084E for ; Thu, 27 Feb 2020 21:20:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8661D2084E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DF37F21FF16; Thu, 27 Feb 2020 13:20:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0FDB521FAAA for ; Thu, 27 Feb 2020 13:18:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D11ADE1E; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CEECB46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:43 -0500 Message-Id: 
<1582838290-17243-56-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 055/622] lustre: ldlm: handle lock converts in cancel handler X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin - Use cancel portals and high-priority handling for lock converts. Update ldlm_cancel_handler to understand LDLM_CONVERT RPC for that. - Use ns_dirty_age_limit for lock convert - don't convert too old locks. - Check for empty converts and skip such WC-bug-id: https://jira.whamcloud.com/browse/LU-10175 Lustre-commit: 541902a3f934 ("LU-10175 ldlm: handle lock converts in cancel handler") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32314 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_export.h | 6 ++++++ fs/lustre/ldlm/ldlm_inodebits.c | 19 ++++++++++++++----- fs/lustre/ldlm/ldlm_request.c | 39 +++++++++++++++++++++++++++++++-------- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/llite/namei.c | 7 ++++++- 5 files changed, 58 insertions(+), 15 deletions(-) diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h index de3b109..57cf68b 100644 --- a/fs/lustre/include/lustre_export.h +++ b/fs/lustre/include/lustre_export.h @@ -269,9 +269,15 @@ static inline int exp_connect_flr(struct obd_export *exp) return !!(exp_connect_flags2(exp) & OBD_CONNECT2_FLR); } +static inline int exp_connect_lock_convert(struct obd_export *exp) +{ + return 
!!(exp_connect_flags2(exp) & OBD_CONNECT2_LOCK_CONVERT); +} + struct obd_export *class_conn2export(struct lustre_handle *conn); #define KKUC_CT_DATA_MAGIC 0x092013cea + struct kkuc_ct_data { u32 kcd_magic; u32 kcd_archive; diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c index ddbf8d4..9cf3c5f 100644 --- a/fs/lustre/ldlm/ldlm_inodebits.c +++ b/fs/lustre/ldlm/ldlm_inodebits.c @@ -81,7 +81,7 @@ int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop) /* Just return if there are no conflicting bits */ if ((lock->l_policy_data.l_inodebits.bits & to_drop) == 0) { - LDLM_WARN(lock, "try to drop unset bits %#llx/%#llx\n", + LDLM_WARN(lock, "try to drop unset bits %#llx/%#llx", lock->l_policy_data.l_inodebits.bits, to_drop); /* nothing to do */ return 0; @@ -111,7 +111,7 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) ldlm_lock2handle(lock, &lockh); lock_res_and_lock(lock); - /* check if all bits are cancelled */ + /* check if all bits are blocked */ if (!(lock->l_policy_data.l_inodebits.bits & ~drop_bits)) { unlock_res_and_lock(lock); /* return error to continue with cancel */ @@ -119,6 +119,13 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) goto exit; } + /* check if no common bits, consider this as successful convert */ + if (!(lock->l_policy_data.l_inodebits.bits & drop_bits)) { + unlock_res_and_lock(lock); + rc = 0; + goto exit; + } + /* check if there is race with cancel */ if (ldlm_is_canceling(lock) || ldlm_is_cancel(lock)) { unlock_res_and_lock(lock); @@ -167,9 +174,11 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) rc = ldlm_cli_convert(lock, &flags); if (rc) { lock_res_and_lock(lock); - ldlm_clear_converting(lock); - ldlm_set_cbpending(lock); - ldlm_set_bl_ast(lock); + if (ldlm_is_converting(lock)) { + ldlm_clear_converting(lock); + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + } unlock_res_and_lock(lock); goto exit; } diff --git a/fs/lustre/ldlm/ldlm_request.c 
b/fs/lustre/ldlm/ldlm_request.c index 5833f59..ad54bd2 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -854,7 +854,7 @@ static int lock_convert_interpret(const struct lu_env *env, aa->lock_handle.cookie, reply->lock_handle.cookie, req->rq_export->exp_client_uuid.uuid, libcfs_id2str(req->rq_peer)); - rc = -ESTALE; + rc = ELDLM_NO_LOCK_DATA; goto out; } @@ -905,15 +905,30 @@ static int lock_convert_interpret(const struct lu_env *env, unlock_res_and_lock(lock); out: if (rc) { + int flag; + lock_res_and_lock(lock); if (ldlm_is_converting(lock)) { ldlm_clear_converting(lock); ldlm_set_cbpending(lock); ldlm_set_bl_ast(lock); + lock->l_policy_data.l_inodebits.cancel_bits = 0; } unlock_res_and_lock(lock); - } + /* fallback to normal lock cancel. If rc means there is no + * valid lock on server, do only local cancel + */ + if (rc == ELDLM_NO_LOCK_DATA) + flag = LCF_LOCAL; + else + flag = LCF_ASYNC; + + rc = ldlm_cli_cancel(&aa->lock_handle, flag); + if (rc < 0) + LDLM_DEBUG(lock, "failed to cancel lock: rc = %d\n", + rc); + } LDLM_LOCK_PUT(lock); return rc; } @@ -942,6 +957,15 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) return -EINVAL; } + /* this is better to check earlier and it is done so already, + * but this check is kept too as final one to issue an error + * if any new code will miss such check. + */ + if (!exp_connect_lock_convert(exp)) { + LDLM_ERROR(lock, "server doesn't support lock convert\n"); + return -EPROTO; + } + if (lock->l_resource->lr_type != LDLM_IBITS) { LDLM_ERROR(lock, "convert works with IBITS locks only."); return -EINVAL; @@ -970,13 +994,12 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) ptlrpc_request_set_replen(req); - /* That could be useful to use cancel portals for convert as well - * as high-priority handling. This will require changes in - * ldlm_cancel_handler to understand convert RPC as well. 
- * - * req->rq_request_portal = LDLM_CANCEL_REQUEST_PORTAL; - * req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; + /* + * Use cancel portals for convert as well as high-priority handling. */ + req->rq_request_portal = LDLM_CANCEL_REQUEST_PORTAL; + req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; + ptlrpc_at_set_req_timeout(req); if (exp->exp_obd->obd_svc_stats) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index dff349f..0844318 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -209,7 +209,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_GRANT_PARAM | OBD_CONNECT_SHORTIO | OBD_CONNECT_FLAGS2; - data->ocd_connect_flags2 = OBD_CONNECT2_FLR; + data->ocd_connect_flags2 = OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 8b1a1ca..f835abb 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -371,11 +371,16 @@ void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) */ int ll_md_need_convert(struct ldlm_lock *lock) { + struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); struct inode *inode; u64 wanted = lock->l_policy_data.l_inodebits.cancel_bits; u64 bits = lock->l_policy_data.l_inodebits.bits & ~wanted; enum ldlm_mode mode = LCK_MINMODE; + if (!lock->l_conn_export || + !exp_connect_lock_convert(lock->l_conn_export)) + return 0; + if (!wanted || !bits || ldlm_is_cancel(lock)) return 0; @@ -410,7 +415,7 @@ int ll_md_need_convert(struct ldlm_lock *lock) lock_res_and_lock(lock); if (ktime_after(ktime_get(), ktime_add(lock->l_last_used, - ktime_set(10, 0)))) { + ktime_set(ns->ns_dirty_age_limit, 0)))) { unlock_res_and_lock(lock); return 0; } From patchwork Thu Feb 27 21:08:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James 
Simmons X-Patchwork-Id: 11409743 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2832514BC for ; Thu, 27 Feb 2020 21:21:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1078E2469F for ; Thu, 27 Feb 2020 21:21:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1078E2469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3469921FD1F; Thu, 27 Feb 2020 13:20:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 690E321FAAA for ; Thu, 27 Feb 2020 13:18:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D4A06E1F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D1AD0468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:44 -0500 Message-Id: <1582838290-17243-57-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 056/622] lustre: ptlrpc: Serialize procfs access to scp_hist_reqs 
using mutex X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh The scp_hist_reqs list can be quite long, so many userland processes can waste CPU time spinning on the spinlock. Cray-bug-id: LUS-5833 WC-bug-id: https://jira.whamcloud.com/browse/LU-11004 Lustre-commit: 413a738a37d7 ("LU-11004 ptlrpc: Serialize procfs access to scp_hist_reqs using mutex") Signed-off-by: Andriy Skulysh Reviewed-by: Andrew Perepechko Reviewed-by: Alexander Boyko Reviewed-on: https://review.whamcloud.com/32307 Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 2 ++ fs/lustre/ptlrpc/lproc_ptlrpc.c | 7 +++++++ fs/lustre/ptlrpc/service.c | 1 + 3 files changed, 10 insertions(+) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 674803c..cf13555 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1543,6 +1543,8 @@ struct ptlrpc_service_part { * threads starting & stopping are also protected by this lock.
*/ spinlock_t scp_lock __cfs_cacheline_aligned; + /* userland serialization */ + struct mutex scp_mutex; /** total # req buffer descs allocated */ int scp_nrqbds_total; /** # posted request buffers for receiving */ diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index e48a4e8..0efbcfc 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -869,10 +869,12 @@ struct ptlrpc_srh_iterator { if (i > cpt) /* make up the lowest position for this CPT */ *pos = PTLRPC_REQ_CPT2POS(svc, i); + mutex_lock(&svcpt->scp_mutex); spin_lock(&svcpt->scp_lock); rc = ptlrpc_lprocfs_svc_req_history_seek(svcpt, srhi, PTLRPC_REQ_POS2SEQ(svc, *pos)); spin_unlock(&svcpt->scp_lock); + mutex_unlock(&svcpt->scp_mutex); if (rc == 0) { *pos = PTLRPC_REQ_SEQ2POS(svc, srhi->srhi_seq); srhi->srhi_idx = i; @@ -914,9 +916,11 @@ struct ptlrpc_srh_iterator { seq = srhi->srhi_seq + (1 << svc->srv_cpt_bits); } + mutex_lock(&svcpt->scp_mutex); spin_lock(&svcpt->scp_lock); rc = ptlrpc_lprocfs_svc_req_history_seek(svcpt, srhi, seq); spin_unlock(&svcpt->scp_lock); + mutex_unlock(&svcpt->scp_mutex); if (rc == 0) { *pos = PTLRPC_REQ_SEQ2POS(svc, srhi->srhi_seq); srhi->srhi_idx = i; @@ -940,6 +944,7 @@ static int ptlrpc_lprocfs_svc_req_history_show(struct seq_file *s, void *iter) svcpt = svc->srv_parts[srhi->srhi_idx]; + mutex_lock(&svcpt->scp_mutex); spin_lock(&svcpt->scp_lock); rc = ptlrpc_lprocfs_svc_req_history_seek(svcpt, srhi, srhi->srhi_seq); @@ -980,6 +985,8 @@ static int ptlrpc_lprocfs_svc_req_history_show(struct seq_file *s, void *iter) } spin_unlock(&svcpt->scp_lock); + mutex_unlock(&svcpt->scp_mutex); + return rc; } diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 8dae21a..cf920ae 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -471,6 +471,7 @@ static void ptlrpc_at_timer(struct timer_list *t) /* rqbd and incoming request queue */ spin_lock_init(&svcpt->scp_lock); + 
mutex_init(&svcpt->scp_mutex); INIT_LIST_HEAD(&svcpt->scp_rqbd_idle); INIT_LIST_HEAD(&svcpt->scp_rqbd_posted); INIT_LIST_HEAD(&svcpt->scp_req_incoming); From patchwork Thu Feb 27 21:08:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409803 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E34A4138D for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB5B9246A0 for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB5B9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A294734890A; Thu, 27 Feb 2020 13:21:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C109421FA65 for ; Thu, 27 Feb 2020 13:18:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D62BBE20; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D4CC046D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:45 -0500 Message-Id: 
<1582838290-17243-58-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 057/622] lustre: ldlm: don't add canceling lock back to LRU X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin When a lock is converted, check that it is not being canceled before adding it back to the LRU. Lustre-commit: ad52f394bd82 ("LU-11003 ldlm: don't add canceling lock back to LRU") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32692 Reviewed-by: Andreas Dilger Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index ad54bd2..bc441f0 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -893,7 +893,8 @@ static int lock_convert_interpret(const struct lu_env *env, * is not there yet.
*/ lock->l_policy_data.l_inodebits.cancel_bits = 0; - if (!lock->l_readers && !lock->l_writers) { + if (!lock->l_readers && !lock->l_writers && + !ldlm_is_canceling(lock)) { spin_lock(&ns->ns_lock); /* there is check for list_empty() inside */ ldlm_lock_remove_from_lru_nolock(lock); From patchwork Thu Feb 27 21:08:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409765 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B3C1314BC for ; Thu, 27 Feb 2020 21:21:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9C5F8246A0 for ; Thu, 27 Feb 2020 21:21:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9C5F8246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2408B21F8F8; Thu, 27 Feb 2020 13:20:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0F39E21FA65 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DA9EDE21; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D7D6146F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) 
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:46 -0500 Message-Id: <1582838290-17243-59-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 058/622] lustre: quota: add default quota setting support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang This feature is modeled on a similar GPFS function that makes quota management easier for cluster administrators. The basic idea of lazy default quota support is as follows: the default quota setting is a global setting for user, group, and project quotas. If a default quota is set for one quota type, newly created users/groups/projects inherit it automatically. Lustre itself has no way of knowing when a new user is created; it only finds out when that user tries to acquire space from Lustre. So the default setting is inherited lazily: the slave first checks whether a default quota setting exists and, if so, is forced to acquire quota from the master. The master detects whether a default quota is set, applies it to the ID, and returns the proper granted space to the slave. To implement this while reusing the existing quota APIs, the default quota is kept in the quota record of ID 0, and the quota check is enforced when the quota record is read from disk.
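The default marker itself travels in the grace field of a quota record: the low 48 bits hold the grace time and the high 16 bits hold flags. A minimal standalone sketch of that packing follows. It is an illustration under stated assumptions, not code from this patch: the LQUOTA_* names mirror the macros added to lustre_user.h, but the arguments are fully parenthesized here, uint64_t stands in for __u64, and grace_pack() and grace_is_default() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Same names and values as the macros in the patch; parenthesization
 * of the arguments is an addition made for this sketch. */
#define LQUOTA_GRACE_BITS	48
#define LQUOTA_GRACE_MASK	((1ULL << LQUOTA_GRACE_BITS) - 1)
#define LQUOTA_GRACE(t)		((uint64_t)(t) & LQUOTA_GRACE_MASK)
#define LQUOTA_FLAG(t)		((uint64_t)(t) >> LQUOTA_GRACE_BITS)
#define LQUOTA_FLAG_DEFAULT	0x0001

/* Hypothetical helper: combine a grace time and flags into one field. */
static uint64_t grace_pack(uint64_t grace, uint16_t flags)
{
	assert(grace <= LQUOTA_GRACE_MASK);	/* must fit in 48 bits */
	return grace | ((uint64_t)flags << LQUOTA_GRACE_BITS);
}

/* Hypothetical helper: true if the field carries the default flag. */
static int grace_is_default(uint64_t field)
{
	return (LQUOTA_FLAG(field) & LQUOTA_FLAG_DEFAULT) != 0;
}
```

With these helpers, grace_pack(0, LQUOTA_FLAG_DEFAULT) yields a field whose low 48 bits are zero and whose flag bits mark the ID as using the default quota, which matches the record layout described in the comment the patch adds above LQUOTA_FLAG_DEFAULT.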
In the current Lustre implementation, the grace time is either the time or the timestamp to be used after some quota ID exceeds the soft limit, so 48 bits should be enough for it. Its high 16 bits can be used for quota flags, and this patch uses one of them as the default quota flag. A global quota record that uses the default quota has its soft and hard limits set to zero, and its grace time carries the default flag. Use lfs setquota -U/-G/-P to set default quota. Use lfs setquota -u/-g/-p foo -d to set foo to use default quota. Use lfs quota -U/-G/-P to show default quota. WC-bug-id: https://jira.whamcloud.com/browse/LU-7816 Lustre-commit: 530881fe4ee2 ("LU-7816 quota: add default quota setting support") Signed-off-by: Wang Shilong Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/32306 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 4 +++- include/uapi/linux/lustre/lustre_user.h | 22 ++++++++++++++++++++++ 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index b006e32..c0c3bf0 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -949,10 +949,12 @@ static int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl) switch (cmd) { case Q_SETQUOTA: case Q_SETINFO: + case LUSTRE_Q_SETDEFAULT: if (!capable(CAP_SYS_ADMIN)) return -EPERM; break; case Q_GETQUOTA: + case LUSTRE_Q_GETDEFAULT: if (check_owner(type, id) && !capable(CAP_SYS_ADMIN)) return -EPERM; break; @@ -960,7 +962,7 @@ static int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl) break; default: CERROR("unsupported quotactl op: %#x\n", cmd); - return -ENOTTY; + return -ENOTSUPP; } if (valid != QC_GENERAL) { diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 5405e1b..5956f33 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ 
b/include/uapi/linux/lustre/lustre_user.h @@ -728,6 +728,28 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen) /* lustre-specific control commands */ #define LUSTRE_Q_INVALIDATE 0x80000b /* deprecated as of 2.4 */ #define LUSTRE_Q_FINVALIDATE 0x80000c /* deprecated as of 2.4 */ +#define LUSTRE_Q_GETDEFAULT 0x80000d /* get default quota */ +#define LUSTRE_Q_SETDEFAULT 0x80000e /* set default quota */ + +/* In the current Lustre implementation, the grace time is either the time + * or the timestamp to be used after some quota ID exceeds the soft limt, + * 48 bits should be enough, its high 16 bits can be used as quota flags. + */ +#define LQUOTA_GRACE_BITS 48 +#define LQUOTA_GRACE_MASK ((1ULL << LQUOTA_GRACE_BITS) - 1) +#define LQUOTA_GRACE_MAX LQUOTA_GRACE_MASK +#define LQUOTA_GRACE(t) (t & LQUOTA_GRACE_MASK) +#define LQUOTA_FLAG(t) (t >> LQUOTA_GRACE_BITS) +#define LQUOTA_GRACE_FLAG(t, f) ((__u64)t | (__u64)f << LQUOTA_GRACE_BITS) + +/* different quota flags */ + +/* the default quota flag, the corresponding quota ID will use the default + * quota setting, the hardlimit and softlimit of its quota record in the global + * quota file will be set to 0, the low 48 bits of the grace will be set to 0 + * and high 16 bits will contain this flag (see above comment). 
+ */ +#define LQUOTA_FLAG_DEFAULT 0x0001 #define ALLQUOTA 255 /* set all quota */ From patchwork Thu Feb 27 21:08:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409747 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1F2EA138D for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 07AD62469F for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 07AD62469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D224B21FB99; Thu, 27 Feb 2020 13:20:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 681BC21FA65 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DC388E22; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DAEBF46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:47 -0500 Message-Id: <1582838290-17243-60-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 
1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 059/622] lustre: ptlrpc: don't zero request handle X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko LNet can retransmit a request at any time if no reply has been received. ptlrpc_resend_req() zeroes the request handle and ptlrpc_send_rpc() sets it again. If a retransmission happens while the handle is zeroed, the client can't find a valid export by handle, sets rq_export to NULL, and replies with ENOTCONN. The server then evicts the client with this error: client (nid x.x.x.x@tcp) returned error from blocking AST (req status -107 rc -107), evict it WC-bug-id: https://jira.whamcloud.com/browse/LU-11117 Lustre-commit: 00c72ab6bb43 ("LU-11117 ptlrpc: don't zero request handle") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-6037 Reviewed-on: https://review.whamcloud.com/32781 Reviewed-by: Mikhail Pershin Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 9b41c12..d28a9cd 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -2728,7 +2728,6 @@ void ptlrpc_resend_req(struct ptlrpc_request *req) return; } - lustre_msg_set_handle(req->rq_reqmsg, &(struct lustre_handle){ 0 }); req->rq_status = -EAGAIN; req->rq_resend = 1; From patchwork Thu Feb 27 21:08:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409753 Return-Path: 
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 23F3A159A for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0966D246A1 for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0966D246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A072A21FD8A; Thu, 27 Feb 2020 13:20:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A8F9521FA65 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E5DB0E27; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E4381468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:48 -0500 Message-Id: <1582838290-17243-61-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 060/622] lnet: ko2iblnd: determine gaps correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata We're allowed to start at a non-aligned page offset in the first fragment and end at a non-aligned page offset in the last fragment. When checking the iovec exclude both of the first and last fragments from the tx_gaps check. WC-bug-id: https://jira.whamcloud.com/browse/LU-11064 Lustre-commit: e40ea6fd4494 ("LU-11064 lnd: determine gaps correctly") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32586 Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index c2ce3b9..60706b4 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -737,6 +737,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, struct kib_net *net = ni->ni_data; struct scatterlist *sg; int fragnob; + int max_nkiov; CDEBUG(D_NET, "niov %d offset %d nob %d\n", nkiov, offset, nob); @@ -751,16 +752,24 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(nkiov > 0); } + max_nkiov = nkiov; + sg = tx->tx_frags; do { LASSERT(nkiov > 0); fragnob = min((int)(kiov->bv_len - offset), nob); - if ((fragnob < (int)(kiov->bv_len - offset)) && nkiov > 1) { + /* We're allowed to start at a non-aligned page offset in + * the first fragment and end at a non-aligned page offset + * in the last fragment. 
+ */ + if ((fragnob < (int)(kiov->bv_len - offset)) && + nkiov < max_nkiov && nob > fragnob) { CDEBUG(D_NET, - "fragnob %d < available page %d: with remaining %d kiovs\n", - fragnob, (int)(kiov->bv_len - offset), nkiov); + "fragnob %d < available page %d: with remaining %d kiovs with %d nob left\n", + fragnob, (int)(kiov->bv_len - offset), + nkiov, nob); tx->tx_gaps = true; } From patchwork Thu Feb 27 21:08:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409755 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48DE314BC for ; Thu, 27 Feb 2020 21:21:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 30ECA2469F for ; Thu, 27 Feb 2020 21:21:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 30ECA2469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AABBC21FDBE; Thu, 27 Feb 2020 13:20:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EC15721FAF6 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E85BAE28; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id E74A546A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:49 -0500 Message-Id: <1582838290-17243-62-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 061/622] lustre: osc: increase default max_dirty_mb to 2G X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin While ideally we want to move away from the max_dirty_mb setting completely and let the grant code handle most of it, Andreas raises a somewhat valid point: for certain system configurations with high-latency links, system administrators might want the ability to limit the amount of dirty pages just for those OSCs, to limit the time it might take to flush that dirty data.
So a good compromise is to lift the max_dirty_mb default value first while we work out the current grant code deficiencies WC-bug-id: https://jira.whamcloud.com/browse/LU-10990 Lustre-commit: 92e2b514e06c ("LU-10990 osc: increase default max_dirty_mb to 2G") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/32288 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 99577e4..d2bd234 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -127,7 +127,7 @@ struct timeout_item { #define OBD_MAX_RIF_DEFAULT 8 #define OBD_MAX_RIF_MAX 512 #define OSC_MAX_RIF_MAX 256 -#define OSC_MAX_DIRTY_DEFAULT (OBD_MAX_RIF_DEFAULT * 4) +#define OSC_MAX_DIRTY_DEFAULT 2000 /* Arbitrary large value */ #define OSC_MAX_DIRTY_MB_MAX 2048 /* arbitrary, but < MAX_LONG bytes */ #define OSC_DEFAULT_RESENDS 10 From patchwork Thu Feb 27 21:08:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409759 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49D49138D for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 328DA246A0 for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 328DA246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 699CD21FDF3; Thu, 27 Feb 2020 13:20:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 38D9D21FA93 for ; Thu, 27 Feb 2020 13:18:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EB4A7E2B; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EA1C346C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:50 -0500 Message-Id: <1582838290-17243-63-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 062/622] lustre: ptlrpc: remove obsolete OBD RPC opcodes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove the obsolete OBD_LOG_CANCEL (since Lustre 1.5) and OBD_QC_CALLBACK (since Lustre 2.4) RPC opcodes. Assign OBD_IDX_READ an explicit opcode (as should be done with all enums in lustre_idl.h) so that the value does not change if some prior field is removed. Also remove the OBD_FAIL checks that were used to test them. The setting in conf_sanity.sh test_58 was unused for many years. 
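The reason for giving OBD_IDX_READ an explicit value can be seen in a small standalone sketch. The enum and member names below are hypothetical (old_cmd, implicit_cmd, explicit_cmd, and same_wire_value() are not Lustre identifiers); only the numeric values 400 to 404 follow the opcodes touched by this patch. With implicit numbering, dropping the two obsolete entries would silently renumber the entry after them, whereas an explicit value keeps the wire opcode fixed.

```c
#include <assert.h>

/* Original layout: values are assigned implicitly after the first one. */
enum old_cmd {
	OLD_PING = 400,
	OLD_LOG_CANCEL,		/* 401 */
	OLD_QC_CALLBACK,	/* 402 */
	OLD_IDX_READ,		/* 403 */
	OLD_LAST_OPC		/* 404 */
};

/* Obsolete entries dropped, still implicit: IDX_READ slides to 401. */
enum implicit_cmd {
	IMP_PING = 400,
	IMP_IDX_READ,		/* now 401: the wire value changed */
	IMP_LAST_OPC		/* 402 */
};

/* Obsolete entries dropped, explicit value: IDX_READ stays at 403. */
enum explicit_cmd {
	EXP_PING = 400,
	EXP_IDX_READ = 403,
	EXP_LAST_OPC		/* 404 */
};

/* Compile-time check in the spirit of the wiretest.c assertions. */
static_assert(EXP_IDX_READ == 403, "wire opcode must not change");

/* Hypothetical helper: would a peer built with the old header agree? */
static int same_wire_value(int old_value, int new_value)
{
	return old_value == new_value;
}
```

Comparing OLD_IDX_READ with EXP_IDX_READ gives a match, while OLD_IDX_READ and IMP_IDX_READ differ, which is the incompatibility the explicit assignment avoids.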
WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 7d89a5b8aefc ("LU-10855 ptlrpc: remove obsolete OBD RPC opcodes") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32651 Reviewed-by: John L. Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 6 +++--- fs/lustre/ptlrpc/lproc_ptlrpc.c | 4 ++-- fs/lustre/ptlrpc/wiretest.c | 4 ---- include/uapi/linux/lustre/lustre_idl.h | 12 ++++++------ 4 files changed, 11 insertions(+), 15 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 67500b5..99b4f1f 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -352,12 +352,12 @@ #define OBD_FAIL_PTLRPC_BULK_ATTACH 0x521 #define OBD_FAIL_OBD_PING_NET 0x600 -#define OBD_FAIL_OBD_LOG_CANCEL_NET 0x601 +/* OBD_FAIL_OBD_LOG_CANCEL_NET 0x601 obsolete since 1.5 */ #define OBD_FAIL_OBD_LOGD_NET 0x602 -/* OBD_FAIL_OBD_QC_CALLBACK_NET 0x603 obsolete since 2.4 */ +/* OBD_FAIL_OBD_QC_CALLBACK_NET 0x603 obsolete since 2.4 */ #define OBD_FAIL_OBD_DQACQ 0x604 #define OBD_FAIL_OBD_LLOG_SETUP 0x605 -#define OBD_FAIL_OBD_LOG_CANCEL_REP 0x606 +/* OBD_FAIL_OBD_LOG_CANCEL_REP 0x606 obsolete since 1.5 */ #define OBD_FAIL_OBD_IDX_READ_NET 0x607 #define OBD_FAIL_OBD_IDX_READ_BREAK 0x608 #define OBD_FAIL_OBD_NO_LRU 0x609 diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index 0efbcfc..b70a1c7 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -111,8 +111,8 @@ { MGS_SET_INFO, "mgs_set_info" }, { MGS_CONFIG_READ, "mgs_config_read" }, { OBD_PING, "obd_ping" }, - { OBD_LOG_CANCEL, "llog_cancel" }, - { OBD_QC_CALLBACK, "obd_quota_callback" }, + { 401, /* was OBD_LOG_CANCEL */ "llog_cancel" }, + { 402, /* was OBD_QC_CALLBACK */ "obd_quota_callback" }, { OBD_IDX_READ, "dt_index_read" }, { LLOG_ORIGIN_HANDLE_CREATE, "llog_origin_handle_open" }, { 
LLOG_ORIGIN_HANDLE_NEXT_BLOCK, "llog_origin_handle_next_block" }, diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 202c5ab..015c5bd 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -326,10 +326,6 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(LUSTRE_RES_ID_HSH_OFF != 3); LASSERTF(OBD_PING == 400, "found %lld\n", (long long)OBD_PING); - LASSERTF(OBD_LOG_CANCEL == 401, "found %lld\n", - (long long)OBD_LOG_CANCEL); - LASSERTF(OBD_QC_CALLBACK == 402, "found %lld\n", - (long long)OBD_QC_CALLBACK); LASSERTF(OBD_IDX_READ == 403, "found %lld\n", (long long)OBD_IDX_READ); LASSERTF(OBD_LAST_OPC == 404, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 798aa57..adaa994 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2342,13 +2342,13 @@ struct cfg_marker { */ enum obd_cmd { - OBD_PING = 400, - OBD_LOG_CANCEL, /* Obsolete since 1.5. */ - OBD_QC_CALLBACK, /* not used since 2.4 */ - OBD_IDX_READ, - OBD_LAST_OPC + OBD_PING = 400, +/* OBD_LOG_CANCEL = 401, Obsolete since 1.5 */ +/* OBD_QC_CALLBACK = 402, not used since 2.4 */ + OBD_IDX_READ = 403, + OBD_LAST_OPC, + OBD_FIRST_OPC = OBD_PING }; -#define OBD_FIRST_OPC OBD_PING /** * llog contexts indices. 
From patchwork Thu Feb 27 21:08:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409805 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7321514BC for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5AF99246A0 for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5AF99246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 820D9348940; Thu, 27 Feb 2020 13:21:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F8B821F982 for ; Thu, 27 Feb 2020 13:18:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EF4C9E33; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ECEC646D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:51 -0500 Message-Id: <1582838290-17243-64-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 063/622] lustre: ptlrpc: assign specific values to MGS opcodes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Assign specific values to all of the MGS opcodes in enum mgs_cmd so that these values do not change if a new item is added or one is removed in the future. These opcodes are part of the wire protocol and need to remain constant. WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 12c5a26609f1 ("LU-10855 ptlrpc: assign specific values to MGS opcodes") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32653 Reviewed-by: John L. Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 20 ++++++++++---------- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 015c5bd..ef07975 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -348,6 +348,8 @@ void lustre_assert_wire_constants(void) (long long)MGS_TARGET_DEL); LASSERTF(MGS_SET_INFO == 255, "found %lld\n", (long long)MGS_SET_INFO); + LASSERTF(MGS_CONFIG_READ == 256, "found %lld\n", + (long long)MGS_CONFIG_READ); LASSERTF(MGS_LAST_OPC == 257, "found %lld\n", (long long)MGS_LAST_OPC); LASSERTF(SEC_CTX_INIT == 801, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index adaa994..1b5794a 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2247,16 +2247,16 @@ 
struct ldlm_reply { * Opcodes for mountconf (mgs and mgc) */ enum mgs_cmd { - MGS_CONNECT = 250, - MGS_DISCONNECT, - MGS_EXCEPTION, /* node died, etc. */ - MGS_TARGET_REG, /* whenever target starts up */ - MGS_TARGET_DEL, - MGS_SET_INFO, - MGS_CONFIG_READ, - MGS_LAST_OPC -}; -#define MGS_FIRST_OPC MGS_CONNECT + MGS_CONNECT = 250, + MGS_DISCONNECT = 251, + MGS_EXCEPTION = 252, /* node died, etc. */ + MGS_TARGET_REG = 253, /* whenever target starts up */ + MGS_TARGET_DEL = 254, + MGS_SET_INFO = 255, + MGS_CONFIG_READ = 256, + MGS_LAST_OPC, + MGS_FIRST_OPC = MGS_CONNECT +}; #define MGS_PARAM_MAXLEN 1024 #define KEY_SET_INFO "set_info" From patchwork Thu Feb 27 21:08:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409769 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5C3BD159A for ; Thu, 27 Feb 2020 21:21:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 44D98246A0 for ; Thu, 27 Feb 2020 21:21:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 44D98246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ED69921C905; Thu, 27 Feb 2020 13:20:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id D2CBE21FA80 for ; Thu, 27 Feb 2020 13:18:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F16CEE35; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EFD6A46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:52 -0500 Message-Id: <1582838290-17243-65-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 064/622] lustre: ptlrpc: remove obsolete LLOG_ORIGIN_* RPCs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove the obsolete RPC opcodes LLOG_ORIGIN_HANDLE_WRITE_REC, LLOG_ORIGIN_HANDLE_CLOSE, LLOG_ORIGIN_CONNECT, LLOG_CATINFO along with their unused OBD_FAIL counterparts. WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 830ce1b10f3a ("LU-10855 ptlrpc: remove obsolete LLOG_ORIGIN_* RPCs") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32654 Reviewed-by: John L. 
Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 10 +++++----- fs/lustre/ptlrpc/lproc_ptlrpc.c | 8 ++++---- fs/lustre/ptlrpc/wiretest.c | 5 ----- include/uapi/linux/lustre/lustre_idl.h | 10 +++++----- 4 files changed, 14 insertions(+), 19 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 99b4f1f..28becfa 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -423,15 +423,15 @@ #define OBD_FAIL_SEC_CTX_HDL_PAUSE 0x1204 #define OBD_FAIL_LLOG 0x1300 -#define OBD_FAIL_LLOG_ORIGIN_CONNECT_NET 0x1301 +/* was OBD_FAIL_LLOG_ORIGIN_CONNECT_NET 0x1301 until 2.4 */ #define OBD_FAIL_LLOG_ORIGIN_HANDLE_CREATE_NET 0x1302 -#define OBD_FAIL_LLOG_ORIGIN_HANDLE_DESTROY_NET 0x1303 +/* was OBD_FAIL_LLOG_ORIGIN_HANDLE_DESTROY_NET 0x1303 until 2.11 */ #define OBD_FAIL_LLOG_ORIGIN_HANDLE_READ_HEADER_NET 0x1304 #define OBD_FAIL_LLOG_ORIGIN_HANDLE_NEXT_BLOCK_NET 0x1305 #define OBD_FAIL_LLOG_ORIGIN_HANDLE_PREV_BLOCK_NET 0x1306 -#define OBD_FAIL_LLOG_ORIGIN_HANDLE_WRITE_REC_NET 0x1307 -#define OBD_FAIL_LLOG_ORIGIN_HANDLE_CLOSE_NET 0x1308 -#define OBD_FAIL_LLOG_CATINFO_NET 0x1309 +/* was OBD_FAIL_LLOG_ORIGIN_HANDLE_WRITE_REC_NET 0x1307 until 2.1 */ +/* was OBD_FAIL_LLOG_ORIGIN_HANDLE_CLOSE_NET 0x1308 until 1.8 */ +/* was OBD_FAIL_LLOG_CATINFO_NET 0x1309 until 2.3 */ #define OBD_FAIL_MDS_SYNC_CAPA_SL 0x1310 #define OBD_FAIL_SEQ_ALLOC 0x1311 diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index b70a1c7..6af3384 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -117,10 +117,10 @@ { LLOG_ORIGIN_HANDLE_CREATE, "llog_origin_handle_open" }, { LLOG_ORIGIN_HANDLE_NEXT_BLOCK, "llog_origin_handle_next_block" }, { LLOG_ORIGIN_HANDLE_READ_HEADER, "llog_origin_handle_read_header" }, - { LLOG_ORIGIN_HANDLE_WRITE_REC, "llog_origin_handle_write_rec" }, - { LLOG_ORIGIN_HANDLE_CLOSE, 
"llog_origin_handle_close" }, - { LLOG_ORIGIN_CONNECT, "llog_origin_connect" }, - { LLOG_CATINFO, "llog_catinfo" }, + { 504, /*LLOG_ORIGIN_HANDLE_WRITE_REC*/ "llog_origin_handle_write_rec" }, + { 505, /* was LLOG_ORIGIN_HANDLE_CLOSE */"llog_origin_handle_close" }, + { 506, /* was LLOG_ORIGIN_CONNECT */ "llog_origin_connect" }, + { 507, /* was LLOG_CATINFO */ "llog_catinfo" }, { LLOG_ORIGIN_HANDLE_PREV_BLOCK, "llog_origin_handle_prev_block" }, { LLOG_ORIGIN_HANDLE_DESTROY, "llog_origin_handle_destroy" }, { QUOTA_DQACQ, "quota_acquire" }, diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index ef07975..7b6ea86 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -3757,12 +3757,7 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_CREATE != 501); BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_NEXT_BLOCK != 502); BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_READ_HEADER != 503); - BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_WRITE_REC != 504); - BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_CLOSE != 505); - BUILD_BUG_ON(LLOG_ORIGIN_CONNECT != 506); - BUILD_BUG_ON(LLOG_CATINFO != 507); BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_PREV_BLOCK != 508); - BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_DESTROY != 509); BUILD_BUG_ON(LLOG_FIRST_OPC != 501); BUILD_BUG_ON(LLOG_LAST_OPC != 510); BUILD_BUG_ON(LLOG_CONFIG_ORIG_CTXT != 0); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 1b5794a..5db742f 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2655,12 +2655,12 @@ enum llogd_rpc_ops { LLOG_ORIGIN_HANDLE_CREATE = 501, LLOG_ORIGIN_HANDLE_NEXT_BLOCK = 502, LLOG_ORIGIN_HANDLE_READ_HEADER = 503, - LLOG_ORIGIN_HANDLE_WRITE_REC = 504, /* Obsolete by 2.1. */ - LLOG_ORIGIN_HANDLE_CLOSE = 505, /* Obsolete by 1.8. */ - LLOG_ORIGIN_CONNECT = 506, /* Obsolete by 2.4. */ - LLOG_CATINFO = 507, /* Obsolete by 2.3. */ +/* LLOG_ORIGIN_HANDLE_WRITE_REC = 504, Obsolete by 2.1. 
*/ +/* LLOG_ORIGIN_HANDLE_CLOSE = 505, Obsolete by 1.8. */ +/* LLOG_ORIGIN_CONNECT = 506, Obsolete by 2.4. */ +/* LLOG_CATINFO = 507, Obsolete by 2.3. */ LLOG_ORIGIN_HANDLE_PREV_BLOCK = 508, - LLOG_ORIGIN_HANDLE_DESTROY = 509, /* Obsolete. */ + LLOG_ORIGIN_HANDLE_DESTROY = 509, /* Obsolete by 2.11. */ LLOG_LAST_OPC, LLOG_FIRST_OPC = LLOG_ORIGIN_HANDLE_CREATE }; From patchwork Thu Feb 27 21:08:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409763 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E176D14BC for ; Thu, 27 Feb 2020 21:21:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CA075246A0 for ; Thu, 27 Feb 2020 21:21:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CA075246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 83C4E21FE4B; Thu, 27 Feb 2020 13:20:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 34EF221FA8C for ; Thu, 27 Feb 2020 13:18:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 00D7EE3B; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id F2AAE468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:53 -0500 Message-Id: <1582838290-17243-66-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 065/622] lustre: osc: fix idle_timeout handling X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

The patch that landed for LU-7236 introduced new proc entries that were implemented incorrectly.

1) idle_timeout returns -ERANGE for any value passed in except zero. This does not match what the commit message for LU-7236 said. So I replaced lprocfs_str_with_units_to_s64() with kstrtouint(), since a signed 64-bit timeout is not needed. Using kstrtouint() ensures that negative values are not possible. Also cap the value at CONNECTION_SWITCH_MAX, since a maximum of 4 billion seconds is overkill.

2) idle_connect is really a write-only file, but it was treated as both readable and writable. There is no need for the osc_idle_connect_seq_show() function.

3) Lastly, no more stuffing new entries into proc or debugfs. This patch converts these new proc entries to sysfs. This seems to be a common occurrence, so add LPROC_SEQ_* to spelling.txt so that checkpatch complains about new uses of LPROC_SEQ_*, which will go away.
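As a rough illustration of point 1), the parsing and range check can be sketched in userspace C. This is not the patch's code: sketch_parse_uint() only approximates kstrtouint() using strtoul(), SKETCH_IDLE_TIMEOUT_MAX is a made-up stand-in for CONNECTION_SWITCH_MAX (whose value is not shown here), and rejecting out-of-range values with -ERANGE instead of clamping them is an assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONNECTION_SWITCH_MAX; the real value is not
 * part of this patch.
 */
#define SKETCH_IDLE_TIMEOUT_MAX 50U

/* Userspace approximation of kstrtouint(): the whole string must be one
 * unsigned number (an optional leading '+' and one trailing newline are
 * tolerated), so a leading '-' is rejected outright.
 */
static int sketch_parse_uint(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	if (*s == '+')
		s++;
	if (!isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	v = strtoul(s, &end, 0);
	if (errno == ERANGE || v > UINT_MAX)
		return -ERANGE;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = (unsigned int)v;
	return 0;
}

/* Sketch of the store path: parse first, then refuse values above the
 * limit (assumption: reject rather than clamp). *timeout is left
 * untouched on error.
 */
static int sketch_idle_timeout_store(const char *buf, unsigned int *timeout)
{
	unsigned int val;
	int rc;

	rc = sketch_parse_uint(buf, &val);
	if (rc)
		return rc;
	if (val > SKETCH_IDLE_TIMEOUT_MAX)
		return -ERANGE;

	*timeout = val;
	return 0;
}
```

With this sketch, "20\n" and "0" are accepted, "-1" fails with -EINVAL before any range check, and "4000000000" parses as an unsigned int but is refused with -ERANGE.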
WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: 406cd8a74d84 ("LU-8066 osc: fix idle_timeout handling") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/32719 Reviewed-by: Alex Zhuravlev Reviewed-by: John L. Hammond Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 42 ++++++++++++++++++------------------------ 1 file changed, 18 insertions(+), 24 deletions(-) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index fd84393..0a12079 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -598,26 +598,27 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO(osc_unstable_stats); -static int osc_idle_timeout_seq_show(struct seq_file *m, void *v) +static ssize_t idle_timeout_show(struct kobject *kobj, struct attribute *attr, + char *buf) { - struct obd_device *obd = m->private; + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); struct client_obd *cli = &obd->u.cli; - seq_printf(m, "%u\n", cli->cl_import->imp_idle_timeout); - return 0; + return sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout); } -static ssize_t osc_idle_timeout_seq_write(struct file *f, - const char __user *buffer, - size_t count, loff_t *off) +static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) { - struct obd_device *obd = ((struct seq_file *)f->private_data)->private; + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); struct client_obd *cli = &obd->u.cli; struct ptlrpc_request *req; unsigned int val; int rc; - rc = kstrtouint_from_user(buffer, count, 0, &val); + rc = kstrtouint(buffer, 0, &val); if (rc) return rc; @@ -635,18 +636,13 @@ static ssize_t osc_idle_timeout_seq_write(struct file *f, return count; } -LPROC_SEQ_FOPS(osc_idle_timeout); +LUSTRE_RW_ATTR(idle_timeout); -static int 
osc_idle_connect_seq_show(struct seq_file *m, void *v) +static ssize_t idle_connect_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) { - return 0; -} - -static ssize_t osc_idle_connect_seq_write(struct file *f, - const char __user *buffer, - size_t count, loff_t *off) -{ - struct obd_device *dev = ((struct seq_file *)f->private_data)->private; + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); struct client_obd *cli = &dev->u.cli; struct ptlrpc_request *req; @@ -658,7 +654,7 @@ static ssize_t osc_idle_connect_seq_write(struct file *f, return count; } -LPROC_SEQ_FOPS(osc_idle_connect); +LUSTRE_WO_ATTR(idle_connect); LPROC_SEQ_FOPS_RO_TYPE(osc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(osc, server_uuid); @@ -687,10 +683,6 @@ static ssize_t osc_idle_connect_seq_write(struct file *f, .fops = &osc_pinger_recov_fops }, { .name = "unstable_stats", .fops = &osc_unstable_stats_fops }, - { .name = "idle_timeout", - .fops = &osc_idle_timeout_fops }, - { .name = "idle_connect", - .fops = &osc_idle_connect_fops }, { NULL } }; @@ -877,6 +869,8 @@ void lproc_osc_attach_seqstat(struct obd_device *dev) &lustre_attr_resend_count.attr, &lustre_attr_ost_conn_uuid.attr, &lustre_attr_ping.attr, + &lustre_attr_idle_timeout.attr, + &lustre_attr_idle_connect.attr, NULL, }; From patchwork Thu Feb 27 21:08:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409767 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C8A37138D for ; Thu, 27 Feb 2020 21:21:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B17A2246A0 
for ; Thu, 27 Feb 2020 21:21:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B17A2246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3F26021FE9B; Thu, 27 Feb 2020 13:20:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8E74121FABD for ; Thu, 27 Feb 2020 13:18:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 02F5FE3D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 016C346A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:54 -0500 Message-Id: <1582838290-17243-67-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 066/622] lustre: ptlrpc: ASSERTION(!list_empty(imp->imp_replay_cursor)) X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh It's ptlrpc_replay_next() vs close race. 
ll_close_inode_openhandle() calls mdc_free_open()->ptlrpc_request_committed->ptlrpc_free_request Need to reset imp_replay_cursor while dropping a request from replay list. Cray-bug-id: LUS-2455 WC-bug-id: https://jira.whamcloud.com/browse/LU-11098 Lustre-commit: d69d488e1778 ("LU-11098 ptlrpc: ASSERTION(!list_empty(imp->imp_replay_cursor))") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/32727 Reviewed-by: Andreas Dilger Reviewed-by: Vladimir Saveliev Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index d28a9cd..57b08de 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -2613,8 +2613,11 @@ void ptlrpc_request_committed(struct ptlrpc_request *req, int force) return; } - if (force || req->rq_transno <= imp->imp_peer_committed_transno) + if (force || req->rq_transno <= imp->imp_peer_committed_transno) { + if (imp->imp_replay_cursor == &req->rq_replay_list) + imp->imp_replay_cursor = req->rq_replay_list.next; ptlrpc_free_request(req); + } spin_unlock(&imp->imp_lock); } From patchwork Thu Feb 27 21:08:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409773 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2592214BC for ; Thu, 27 Feb 2020 21:21:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D0F8246A0 for ; Thu, 27 Feb 2020 21:21:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D0F8246A0 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6667B21FC83; Thu, 27 Feb 2020 13:20:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D3D7421FA5B for ; Thu, 27 Feb 2020 13:18:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 059B8E3E; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0444646C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:55 -0500 Message-Id: <1582838290-17243-68-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 067/622] lustre: obd: keep dirty_max_pages a round number of MB X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In client_adjust_max_dirty() ensure that the dirty pages limit is always divisible by 256 so that it may faithfully be represented in MB as is the case when the max_dirty_mb parameters are used. 
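The rounding described above can be sketched in plain C. The sketch assumes 4 KiB pages (a page shift of 12), which is what makes 1 MB equal to 256 pages; the sketch_ names are hypothetical, and sketch_round_up_pow2() stands in for the kernel's round_up() for power-of-two alignments.

```c
/* Assumption: 4 KiB pages, so 1 MB is 1 << (20 - 12) = 256 pages. */
#define SKETCH_PAGE_SHIFT 12

/* Round x up to the next multiple of align; align must be a power of two. */
static unsigned long sketch_round_up_pow2(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Round a page count up so that it is a whole number of MB. */
static unsigned long sketch_round_pages_to_mb(unsigned long pages)
{
	unsigned long pages_per_mb = 1UL << (20 - SKETCH_PAGE_SHIFT);

	return sketch_round_up_pow2(pages, pages_per_mb);
}

/* Convert a page count to MB, truncating any partial MB. */
static unsigned long sketch_pages_to_mb(unsigned long pages)
{
	return pages >> (20 - SKETCH_PAGE_SHIFT);
}
```

For example, 8193 pages rounds up to 8448 pages, which is exactly 33 MB, so converting to MB and back to pages no longer loses anything.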
WC-bug-id: https://jira.whamcloud.com/browse/LU-11157 Lustre-commit: d3f88d376c49 ("LU-11157 obd: keep dirty_max_pages a round number of MB") Signed-off-by: John L. Hammond Reviewed-on: https://review.whamcloud.com/32831 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index d2bd234..5656eb0 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -1106,7 +1106,7 @@ static inline int cli_brw_size(struct obd_device *obd) } /* - * when RPC size or the max RPCs in flight is increased, the max dirty pages + * When RPC size or the max RPCs in flight is increased, the max dirty pages * of the client should be increased accordingly to avoid sending fragmented * RPCs over the network when the client runs out of the maximum dirty space * when so many RPCs are being generated. @@ -1114,10 +1114,10 @@ static inline int cli_brw_size(struct obd_device *obd) static inline void client_adjust_max_dirty(struct client_obd *cli) { /* initializing */ - if (cli->cl_dirty_max_pages <= 0) + if (cli->cl_dirty_max_pages <= 0) { cli->cl_dirty_max_pages = (OSC_MAX_DIRTY_DEFAULT * 1024 * 1024) >> PAGE_SHIFT; - else { + } else { unsigned long dirty_max = cli->cl_max_rpcs_in_flight * cli->cl_max_pages_per_rpc; @@ -1127,6 +1127,13 @@ static inline void client_adjust_max_dirty(struct client_obd *cli) if (cli->cl_dirty_max_pages > totalram_pages() / 8) cli->cl_dirty_max_pages = totalram_pages() / 8; + + /* This value is exported to userspace through the max_dirty_mb + * parameter. So we round up the number of pages to make it a round + * number of MBs. 
+ */ + cli->cl_dirty_max_pages = round_up(cli->cl_dirty_max_pages, + 1 << (20 - PAGE_SHIFT)); } #endif /* __OBD_H */ From patchwork Thu Feb 27 21:08:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409771 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0DD69138D for ; Thu, 27 Feb 2020 21:21:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EA2A5246A1 for ; Thu, 27 Feb 2020 21:21:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA2A5246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E59221FEB4; Thu, 27 Feb 2020 13:20:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2666521FAF6 for ; Thu, 27 Feb 2020 13:18:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 08891E3F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 072BA46D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:56 -0500 Message-Id: 
<1582838290-17243-69-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 068/622] lustre: osc: depart grant shrinking from pinger X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Bobi Jam , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"
From: Bobi Jam

* Move the grant shrinking code out of the pinger and use a workqueue to handle the grant shrinking timer.
* Enable OSC grant shrinking by default.

bugzilla: 19507 WC-bug-id: https://jira.whamcloud.com/browse/LU-8708 Lustre-commit: fc915a43786e ("LU-8708 osc: depart grant shrinking from pinger") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/23202 Reviewed-by: Hongchao Zhang Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lib.c | 1 + fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/osc/osc_request.c | 155 ++++++++++++++++++++++++++++++-------------- 3 files changed, 110 insertions(+), 48 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 2c0fad3..838ddb3 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -349,6 +349,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) spin_lock_init(&cli->cl_lru_list_lock); atomic_long_set(&cli->cl_unstable_count, 0); INIT_LIST_HEAD(&cli->cl_shrink_list); + INIT_LIST_HEAD(&cli->cl_grant_chain); INIT_LIST_HEAD(&cli->cl_flight_waiters); cli->cl_rpcs_in_flight = 0; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 0844318..56624e8 100644 --- a/fs/lustre/llite/llite_lib.c +++ 
b/fs/lustre/llite/llite_lib.c @@ -399,7 +399,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_LAYOUTLOCK | OBD_CONNECT_PINGLESS | OBD_CONNECT_LFSCK | OBD_CONNECT_BULK_MBITS | OBD_CONNECT_SHORTIO | - OBD_CONNECT_FLAGS2; + OBD_CONNECT_FLAGS2 | OBD_CONNECT_GRANT_SHRINK; /* The client currently advertises support for OBD_CONNECT_LOCKAHEAD_OLD * so it can interoperate with an older version of lockahead which was diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index e341fcc..1a9ed8d 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -33,6 +33,7 @@ #define DEBUG_SUBSYSTEM S_OSC +#include #include #include #include @@ -721,6 +722,16 @@ static void osc_update_grant(struct client_obd *cli, struct ost_body *body) } } +/** + * grant thread data for shrinking space. + */ +struct grant_thread_data { + struct list_head gtd_clients; + struct mutex gtd_mutex; + unsigned long gtd_stopped:1; +}; +static struct grant_thread_data client_gtd; + static int osc_shrink_grant_interpret(const struct lu_env *env, struct ptlrpc_request *req, void *aa, int rc) @@ -823,6 +834,9 @@ static int osc_should_shrink_grant(struct client_obd *client) { time64_t next_shrink = client->cl_next_shrink_grant; + if (!client->cl_import) + return 0; + if ((client->cl_import->imp_connect_data.ocd_connect_flags & OBD_CONNECT_GRANT_SHRINK) == 0) return 0; @@ -843,38 +857,83 @@ static int osc_should_shrink_grant(struct client_obd *client) return 0; } -static int osc_grant_shrink_grant_cb(struct timeout_item *item, void *data) -{ - struct client_obd *client; +#define GRANT_SHRINK_RPC_BATCH 100 + +static void osc_grant_work_handler(struct work_struct *data); +static DECLARE_DELAYED_WORK(work, osc_grant_work_handler); - list_for_each_entry(client, &item->ti_obd_list, cl_grant_shrink_list) { - if (osc_should_shrink_grant(client)) - osc_shrink_grant(client); +static void osc_grant_work_handler(struct work_struct *data) +{ + 
struct client_obd *cli; + int rpc_sent; + bool init_next_shrink = true; + time64_t next_shrink = ktime_get_seconds() + GRANT_SHRINK_INTERVAL; + + rpc_sent = 0; + mutex_lock(&client_gtd.gtd_mutex); + list_for_each_entry(cli, &client_gtd.gtd_clients, + cl_grant_chain) { + if (++rpc_sent < GRANT_SHRINK_RPC_BATCH && + osc_should_shrink_grant(cli)) + osc_shrink_grant(cli); + + if (!init_next_shrink) { + if (cli->cl_next_shrink_grant < next_shrink && + cli->cl_next_shrink_grant > ktime_get_seconds()) + next_shrink = cli->cl_next_shrink_grant; + } else { + init_next_shrink = false; + next_shrink = cli->cl_next_shrink_grant; + } } - return 0; + mutex_unlock(&client_gtd.gtd_mutex); + + if (client_gtd.gtd_stopped == 1) + return; + + if (next_shrink > ktime_get_seconds()) + schedule_delayed_work(&work, msecs_to_jiffies( + (next_shrink - ktime_get_seconds()) * + MSEC_PER_SEC)); + else + schedule_work(&work.work); } -static int osc_add_shrink_grant(struct client_obd *client) +/** + * Start grant thread for returing grant to server for idle clients. 
+ */ +static int osc_start_grant_work(void) { - int rc; + client_gtd.gtd_stopped = 0; + mutex_init(&client_gtd.gtd_mutex); + INIT_LIST_HEAD(&client_gtd.gtd_clients); + + schedule_work(&work.work); - rc = ptlrpc_add_timeout_client(client->cl_grant_shrink_interval, - TIMEOUT_GRANT, - osc_grant_shrink_grant_cb, NULL, - &client->cl_grant_shrink_list); - if (rc) { - CERROR("add grant client %s error %d\n", cli_name(client), rc); - return rc; - } - CDEBUG(D_CACHE, "add grant client %s\n", cli_name(client)); - osc_update_next_shrink(client); return 0; } -static int osc_del_shrink_grant(struct client_obd *client) +static void osc_stop_grant_work(void) +{ + client_gtd.gtd_stopped = 1; + cancel_delayed_work_sync(&work); +} + +static void osc_add_grant_list(struct client_obd *client) { - return ptlrpc_del_timeout_client(&client->cl_grant_shrink_list, - TIMEOUT_GRANT); + mutex_lock(&client_gtd.gtd_mutex); + list_add(&client->cl_grant_chain, &client_gtd.gtd_clients); + mutex_unlock(&client_gtd.gtd_mutex); +} + +static void osc_del_grant_list(struct client_obd *client) +{ + if (list_empty(&client->cl_grant_chain)) + return; + + mutex_lock(&client_gtd.gtd_mutex); + list_del_init(&client->cl_grant_chain); + mutex_unlock(&client_gtd.gtd_mutex); } void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd) @@ -929,9 +988,8 @@ void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd) cli_name(cli), cli->cl_avail_grant, cli->cl_lost_grant, cli->cl_chunkbits, cli->cl_max_extent_pages); - if (ocd->ocd_connect_flags & OBD_CONNECT_GRANT_SHRINK && - list_empty(&cli->cl_grant_shrink_list)) - osc_add_shrink_grant(cli); + if (OCD_HAS_FLAG(ocd, GRANT_SHRINK) && list_empty(&cli->cl_grant_chain)) + osc_add_grant_list(cli); } EXPORT_SYMBOL(osc_init_grant); @@ -2971,15 +3029,12 @@ int osc_disconnect(struct obd_export *exp) * osc_disconnect * del_shrink_grant * ptlrpc_connect_interrupt - * init_grant_shrink + * osc_init_grant * add this client to shrink list - * 
cleanup_osc - * Bang! pinger trigger the shrink. - * So the osc should be disconnected from the shrink list, after we - * are sure the import has been destroyed. BUG18662 + * cleanup_osc + * Bang! grant shrink thread trigger the shrink. BUG18662 */ - if (!obd->u.cli.cl_import) - osc_del_shrink_grant(&obd->u.cli); + osc_del_grant_list(&obd->u.cli); return rc; } EXPORT_SYMBOL(osc_disconnect); @@ -3159,8 +3214,8 @@ int osc_setup_common(struct obd_device *obd, struct lustre_cfg *lcfg) goto out_ptlrpcd_work; cli->cl_grant_shrink_interval = GRANT_SHRINK_INTERVAL; + osc_update_next_shrink(cli); - INIT_LIST_HEAD(&cli->cl_grant_shrink_list); return 0; out_ptlrpcd_work: @@ -3210,7 +3265,6 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg) atomic_add(added, &osc_pool_req_count); } - INIT_LIST_HEAD(&cli->cl_grant_shrink_list); ns_register_cancel(obd->obd_namespace, osc_cancel_weight); spin_lock(&osc_shrink_lock); @@ -3356,14 +3410,19 @@ static int __init osc_init(void) if (rc) return rc; + rc = class_register_type(&osc_obd_ops, NULL, + LUSTRE_OSC_NAME, &osc_device_type); + if (rc) + goto out_kmem; + rc = register_shrinker(&osc_cache_shrinker); if (rc) - goto err; + goto out_type; /* This is obviously too much memory, only prevent overflow here */ if (osc_reqpool_mem_max >= 1 << 12 || osc_reqpool_mem_max == 0) { rc = -EINVAL; - goto err; + goto out_shrinker; } reqpool_size = osc_reqpool_mem_max << 20; @@ -3383,29 +3442,31 @@ static int __init osc_init(void) atomic_set(&osc_pool_req_count, 0); osc_rq_pool = ptlrpc_init_rq_pool(0, OST_MAXREQSIZE, ptlrpc_add_rqs_to_pool); + if (!osc_rq_pool) { + rc = -ENOMEM; + goto out_shrinker; + } - rc = -ENOMEM; - - if (!osc_rq_pool) - goto err; - - rc = class_register_type(&osc_obd_ops, NULL, - LUSTRE_OSC_NAME, &osc_device_type); + rc = osc_start_grant_work(); if (rc) - goto err; + goto out_req_pool; return rc; -err: - if (osc_rq_pool) - ptlrpc_free_rq_pool(osc_rq_pool); +out_req_pool: + ptlrpc_free_rq_pool(osc_rq_pool); 
+out_type: + class_unregister_type(LUSTRE_OSC_NAME); +out_shrinker: unregister_shrinker(&osc_cache_shrinker); +out_kmem: lu_kmem_fini(osc_caches); return rc; } static void /*__exit*/ osc_exit(void) { + osc_stop_grant_work(); unregister_shrinker(&osc_cache_shrinker); class_unregister_type(LUSTRE_OSC_NAME); lu_kmem_fini(osc_caches); From patchwork Thu Feb 27 21:08:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409777 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 30C7D159A for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 19B1C246A1 for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 19B1C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8BBB421FF8F; Thu, 27 Feb 2020 13:20:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D84A21FA75 for ; Thu, 27 Feb 2020 13:18:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0CAC1EC0; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 
0A27646F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:57 -0500 Message-Id: <1582838290-17243-70-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 069/622] lustre: mdt: Lazy size on MDT X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"
From: Qian Yingjin

The design of Lazy Size on MDT (LSOM) does not guarantee accuracy. A file that is held open for a long time may leave the LSOM inaccurate for just as long. Eviction or a crash of a client can also leave a close incomplete, which likewise makes the LSOM inaccurate. A precise LSOM can only be read from the MDT when 1) all corruption and inconsistency caused by client eviction or a client/server crash has been fixed by LFSCK and 2) the file is not open for write.

In the first step of implementing LSOM, LSOM is not accessible from the client; its values can only be read on the MDT. Thus no interface or logic is added on the client side to enable access to LSOM from the client.

The LSOM is saved as an EA value on the MDT. It includes both the apparent size and the disk usage of the file.

Whenever a file is truncated, its LSOM on the MDT is updated. Whenever a client closes a file, ll_prepare_close() sends the size and blocks to the MDS. The MDS updates the LSOM of the file if the file size or block count has increased.
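The update rule described above can be sketched as follows. This is only an illustration of the stated policy, not the MDT implementation: struct sketch_lsom and both helpers are hypothetical names, and treating size and blocks independently on close is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory view of the lazy size and block count kept on
 * the MDT.
 */
struct sketch_lsom {
	uint64_t size;
	uint64_t blocks;
};

/* On close: only grow the stored values, never shrink them.
 * Returns true if anything was updated.
 */
static bool sketch_lsom_close_update(struct sketch_lsom *lsom,
				     uint64_t size, uint64_t blocks)
{
	bool changed = false;

	if (size > lsom->size) {
		lsom->size = size;
		changed = true;
	}
	if (blocks > lsom->blocks) {
		lsom->blocks = blocks;
		changed = true;
	}
	return changed;
}

/* On truncate: the stored values are overwritten unconditionally. */
static void sketch_lsom_truncate_update(struct sketch_lsom *lsom,
					uint64_t size, uint64_t blocks)
{
	lsom->size = size;
	lsom->blocks = blocks;
}
```

Under this reading, a close that reports a smaller size than the one already stored leaves the stored size alone, while a truncate is the only path that can make the stored values smaller.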
WC-bug-id: https://jira.whamcloud.com/browse/LU-9538 Lustre-commit: f1ebf88aef21 ("LU-9538 mdt: Lazy size on MDT") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/29960 Reviewed-by: Vitaly Fertman Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 4 +++- fs/lustre/llite/file.c | 5 +++++ fs/lustre/mdc/mdc_lib.c | 4 ++++ fs/lustre/ptlrpc/wiretest.c | 24 ++++++++++++++++++++++++ include/uapi/linux/lustre/lustre_idl.h | 2 ++ include/uapi/linux/lustre/lustre_user.h | 17 +++++++++++++++-- 6 files changed, 53 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 5656eb0..c712979 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -204,7 +204,7 @@ struct client_obd { long cl_reserved_grant; wait_queue_head_t cl_cache_waiters; /* waiting for cache/grant */ time64_t cl_next_shrink_grant; /* seconds */ - struct list_head cl_grant_shrink_list; /* Timeout event list */ + struct list_head cl_grant_chain; time64_t cl_grant_shrink_interval; /* seconds */ /* A chunk is an optimal size used by osc_extent to determine @@ -670,6 +670,8 @@ enum op_xvalid { OP_XVALID_OWNEROVERRIDE = BIT(2), /* 0x0004 */ OP_XVALID_FLAGS = BIT(3), /* 0x0008 */ OP_XVALID_PROJID = BIT(4), /* 0x0010 */ + OP_XVALID_LAZYSIZE = BIT(5), /* 0x0020 */ + OP_XVALID_LAZYBLOCKS = BIT(6), /* 0x0040 */ }; struct lu_context; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index c3fb104b..837add1 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -207,6 +207,11 @@ static int ll_close_inode_openhandle(struct inode *inode, break; } + if (!(op_data->op_attr.ia_valid & ATTR_SIZE)) + op_data->op_xvalid |= OP_XVALID_LAZYSIZE; + if (!(op_data->op_xvalid & OP_XVALID_BLOCKS)) + op_data->op_xvalid |= OP_XVALID_LAZYBLOCKS; + rc = md_close(md_exp, op_data, och->och_mod, &req); if (rc && rc != -EINTR) { CERROR("%s: inode " DFID " mdc close failed: rc = %d\n", 
diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index 467503c..e2f1a49 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -317,6 +317,10 @@ static inline u64 attr_pack(unsigned int ia_valid, enum op_xvalid ia_xvalid) sa_valid |= MDS_OPEN_OWNEROVERRIDE; if (ia_xvalid & OP_XVALID_PROJID) sa_valid |= MDS_ATTR_PROJID; + if (ia_xvalid & OP_XVALID_LAZYSIZE) + sa_valid |= MDS_ATTR_LSIZE; + if (ia_xvalid & OP_XVALID_LAZYBLOCKS) + sa_valid |= MDS_ATTR_LBLOCKS; return sa_valid; } diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 7b6ea86..b4bb30d 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -258,6 +258,10 @@ void lustre_assert_wire_constants(void) LASSERTF(MDS_ATTR_PROJID == 0x0000000000010000ULL, "found 0x%.16llxULL\n", (long long)MDS_ATTR_PROJID); + LASSERTF(MDS_ATTR_LSIZE == 0x0000000000020000ULL, "found 0x%.16llxULL\n", + (long long)MDS_ATTR_LSIZE); + LASSERTF(MDS_ATTR_LBLOCKS == 0x0000000000040000ULL, "found 0x%.16llxULL\n", + (long long)MDS_ATTR_LBLOCKS); LASSERTF(FLD_QUERY == 900, "found %lld\n", (long long)FLD_QUERY); LASSERTF(FLD_FIRST_OPC == 900, "found %lld\n", @@ -390,6 +394,26 @@ void lustre_assert_wire_constants(void) LASSERTF(LU_SEQ_RANGE_OST == 1, "found %lld\n", (long long)LU_SEQ_RANGE_OST); + /* Checks for struct lustre_som_attrs */ + LASSERTF((int)sizeof(struct lustre_som_attrs) == 24, "found %lld\n", + (long long)(int)sizeof(struct lustre_som_attrs)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_valid) == 0, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_valid)); + LASSERTF((int)sizeof(((struct lustre_som_attrs *)0)->lsa_valid) == 2, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_valid)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_reserved) == 2, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_reserved)); + LASSERTF((int)sizeof(((struct lustre_som_attrs 
*)0)->lsa_reserved) == 6, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_reserved)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_size) == 8, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_size)); + LASSERTF((int)sizeof(((struct lustre_som_attrs *)0)->lsa_size) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_size)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_blocks) == 16, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_blocks)); + LASSERTF((int)sizeof(((struct lustre_som_attrs *)0)->lsa_blocks) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_blocks)); + /* Checks for struct lustre_mdt_attrs */ LASSERTF((int)sizeof(struct lustre_mdt_attrs) == 24, "found %lld\n", (long long)(int)sizeof(struct lustre_mdt_attrs)); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 5db742f..9f8d65d 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1676,6 +1676,8 @@ struct mdt_rec_setattr { */ #define MDS_ATTR_BLOCKS 0x8000ULL /* = 32768 */ #define MDS_ATTR_PROJID 0x10000ULL /* = 65536 */ +#define MDS_ATTR_LSIZE 0x20000ULL /* = 131072 */ +#define MDS_ATTR_LBLOCKS 0x40000ULL /* = 262144 */ enum mds_op_bias { /* MDS_CHECK_SPLIT = 1 << 0, obsolete before 2.3.58 */ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 5956f33..b2f5b57 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -202,8 +202,19 @@ struct lustre_mdt_attrs { */ #define LMA_OLD_SIZE (sizeof(struct lustre_mdt_attrs) + 5 * sizeof(__u64)) -enum { - LSOM_FL_VALID = 1 << 0, +enum lustre_som_flags { + /* Unknown or no SoM data, must get size from OSTs. */ + SOM_FL_UNKNOWN = 0x0000, + /* Known strictly correct, FLR or DoM file (SoM guaranteed). 
*/ + SOM_FL_STRICT = 0x0001, + /* Known stale - was right at some point in the past, but it is + * known (or likely) to be incorrect now (e.g. opened for write). + */ + SOM_FL_STALE = 0x0002, + /* Approximate, may never have been strictly correct, + * need to sync SOM data to achieve eventual consistency. + */ + SOM_FL_LAZY = 0x0004, }; struct lustre_som_attrs { @@ -882,6 +893,8 @@ enum la_valid { LA_KILL_SGID = 1 << 14, LA_PROJID = 1 << 15, LA_LAYOUT_VERSION = 1 << 16, + LA_LSIZE = 1 << 17, + LA_LBLOCKS = 1 << 18, /** * Attributes must be transmitted to OST objects */ From patchwork Thu Feb 27 21:08:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 30C5014BC for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 18AA9246A0 for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 18AA9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73A0F21FF8D; Thu, 27 Feb 2020 13:20:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D4D0C21FA75 for ; Thu, 27 Feb 2020 
13:18:36 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0E444EC1; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0D11F468; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:08:58 -0500
Message-Id: <1582838290-17243-71-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 070/622] lustre: lfsck: layout LFSCK for mirrored file
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
Cc: Fan Yong , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Fan Yong

This patch makes the layout LFSCK support mirrored files as follows:

1. Verify the mirrored file's LOV EA and PFID EA, including all kinds
   of inconsistencies that a non-mirrored file may hit.

2. Rebuild the mirrored file's LOV EA from orphan OST-objects,
   recovering the component's status/flags from before the crash:
   init, stale, and so on.

3. For a mirrored file with a dangling reference (OST object), it
   does NOT rebuild the lost OST-object from another replica.
   Instead, it either reports the corruption or re-creates an empty
   OST-object, following the same rules as the non-mirrored case.

Some code cleanup and new test cases for LFSCK against mirrored files.

For the Linux client we want to keep the wire protocol in sync.
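
On the client, keeping the wire protocol in sync amounts to carving
lcme_layout_gen out of the existing padding of struct
lov_comp_md_entry_v1 without moving any other byte. The userspace
sketch below checks that property with compile-time assertions, using
the offsets asserted in wiretest.c (old padding at 32 for 16 bytes;
new fields at 32, 36 and 40). The struct names comp_entry_old and
comp_entry_new and the helper are illustrative only, and the first 32
bytes of the entry are modelled as an opaque array rather than the
real fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry before this patch: 16 bytes of padding at offset 32. */
struct comp_entry_old {
	uint8_t  lcme_head[32];		/* fields up to lcme_size, not modelled */
	uint64_t lcme_padding[2];
};

/* Entry after this patch: the padding is split, the size is unchanged. */
struct comp_entry_new {
	uint8_t  lcme_head[32];		/* fields up to lcme_size, not modelled */
	uint32_t lcme_layout_gen;
	uint32_t lcme_padding_1;
	uint64_t lcme_padding_2;
};

static_assert(sizeof(struct comp_entry_old) ==
	      sizeof(struct comp_entry_new), "entry size must not change");
static_assert(offsetof(struct comp_entry_old, lcme_padding) == 32,
	      "old padding offset");
static_assert(offsetof(struct comp_entry_new, lcme_layout_gen) == 32,
	      "layout_gen offset");
static_assert(offsetof(struct comp_entry_new, lcme_padding_1) == 36,
	      "padding_1 offset");
static_assert(offsetof(struct comp_entry_new, lcme_padding_2) == 40,
	      "padding_2 offset");

/* Returns 1 when the new field reuses exactly the old padding bytes. */
int layout_gen_reuses_padding(void)
{
	return offsetof(struct comp_entry_new, lcme_layout_gen) ==
	       offsetof(struct comp_entry_old, lcme_padding) &&
	       sizeof(struct comp_entry_new) == sizeof(struct comp_entry_old);
}
```

The kernel structure is declared __packed; the mirrors above rely on
natural alignment instead, which yields the same offsets for these
field types on common ABIs.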
WC-bug-id: https://jira.whamcloud.com/browse/LU-10288 Lustre-commit: 36ba989752c6 ("LU-10288 lfsck: layout LFSCK for mirrored file") Signed-off-by: Fan Yong Reviewed-on: https://review.whamcloud.com/32705 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/pack_generic.c | 4 +++- fs/lustre/ptlrpc/wiretest.c | 16 ++++++++++++---- include/uapi/linux/lustre/lustre_user.h | 4 +++- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 9cea826..d09cf3f 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -2066,7 +2066,9 @@ void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum) __swab64s(&ent->lcme_extent.e_end); __swab32s(&ent->lcme_offset); __swab32s(&ent->lcme_size); - BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding) == 0); + __swab32s(&ent->lcme_layout_gen); + BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding_1) == 0); + BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding_2) == 0); v1 = (struct lov_user_md_v1 *)((char *)lum + off); stripe_count = v1->lmm_stripe_count; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index b4bb30d..e22f8f8 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1536,10 +1536,18 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_size)); LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_size) == 4, "found %lld\n", (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_size)); - LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding) == 32, "found %lld\n", - (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding)); - LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding) == 16, "found %lld\n", - (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding)); + LASSERTF((int)offsetof(struct 
lov_comp_md_entry_v1, lcme_layout_gen) == 32, "found %lld\n", + (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_layout_gen)); + LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_layout_gen) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_layout_gen)); + LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_1) == 36, "found %lld\n", + (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_1)); + LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_1) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_1)); + LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_2) == 40, "found %lld\n", + (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_2)); + LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_2) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_2)); LASSERTF(LCME_FL_INIT == 0x00000010UL, "found 0x%.8xUL\n", (unsigned int)LCME_FL_INIT); LASSERTF(LCME_FL_NEG == 0x80000000UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index b2f5b57..8fd5b26 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -517,7 +517,9 @@ struct lov_comp_md_entry_v1 { * start from lov_comp_md_v1 */ __u32 lcme_size; /* size of component blob */ - __u64 lcme_padding[2]; + __u32 lcme_layout_gen; + __u32 lcme_padding_1; + __u64 lcme_padding_2; } __packed; #define SEQ_ID_MAX 0x0000FFFF From patchwork Thu Feb 27 21:08:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409779 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3F11A138D 
for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 27C92246A0 for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 27C92246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D8A5721F5DB; Thu, 27 Feb 2020 13:20:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3627121FA75 for ; Thu, 27 Feb 2020 13:18:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 11B0DED7; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 107B846A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:59 -0500 Message-Id: <1582838290-17243-72-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 071/622] lustre: mdt: read on open for DoM files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
Cc: Mikhail Pershin , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Mikhail Pershin

Read file data upon open and return it in the reply. That works only
for a file with a Data-on-MDT layout and no OST components
initialized. Three cases may occur:

1) The file data fits in the already allocated reply buffer (~9K) and
   is returned in that buffer in the OPEN reply.
2) The file fits in the maximum reply buffer (128K) and the reply is
   returned to the client with a larger size, causing a resend with a
   re-allocated buffer.
3) The file doesn't fit in the reply buffer but its tail fills a page
   partially; in that case the tail is returned. This can be useful
   for an append case.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10181
Lustre-commit: 13372d6c243c ("LU-10181 mdt: read on open for DoM files")
Signed-off-by: Mikhail Pershin
Reviewed-on: https://review.whamcloud.com/23011
Reviewed-by: Andreas Dilger
Reviewed-by: Lai Siyao
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
--- fs/lustre/include/lustre_req_layout.h | 1 + fs/lustre/include/obd.h | 11 +++ fs/lustre/llite/file.c | 131 +++++++++++++++++++++++++++++++++- fs/lustre/llite/llite_internal.h | 3 + fs/lustre/llite/namei.c | 3 + fs/lustre/mdc/lproc_mdc.c | 32 +++++++++ fs/lustre/mdc/mdc_internal.h | 4 ++ fs/lustre/mdc/mdc_locks.c | 28 +++++++- fs/lustre/mdc/mdc_request.c | 2 + fs/lustre/ptlrpc/layout.c | 11 ++- fs/lustre/ptlrpc/niobuf.c | 5 ++ 11 files changed, 227 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 2737240..807d080 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -291,6 +291,7 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_msg_field RMF_OBD_ID; extern struct req_msg_field RMF_FID; extern struct req_msg_field RMF_NIOBUF_REMOTE; +extern
struct req_msg_field RMF_NIOBUF_INLINE; extern struct req_msg_field RMF_RCS; extern struct req_msg_field RMF_FIEMAP_KEY; extern struct req_msg_field RMF_FIEMAP_VAL; diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index c712979..de9642f 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -184,6 +184,17 @@ struct client_obd { */ u32 cl_max_mds_easize; + /* Data-on-MDT specific value to set larger reply buffer for possible + * data read along with open/stat requests. By default it tries to use + * unused space in reply buffer. + * This value is used to ensure that reply buffer has at least as + * much free space as value indicates. That free space is gained from + * LOV EA buffer which is small for DoM files and on big systems can + * provide up to 32KB of extra space in reply buffer. + * Default value is 8K now. + */ + u32 cl_dom_min_inline_repsize; + enum lustre_sec_part cl_sp_me; enum lustre_sec_part cl_sp_to; struct sptlrpc_flavor cl_flvr_mgc; /* fixed flavor of mgc->mgs */ diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 837add1..7657c79 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -393,6 +393,132 @@ int ll_file_release(struct inode *inode, struct file *file) return rc; } +static inline int ll_dom_readpage(void *data, struct page *page) +{ + struct niobuf_local *lnb = data; + void *kaddr; + + kaddr = kmap_atomic(page); + memcpy(kaddr, lnb->lnb_data, lnb->lnb_len); + if (lnb->lnb_len < PAGE_SIZE) + memset(kaddr + lnb->lnb_len, 0, + PAGE_SIZE - lnb->lnb_len); + flush_dcache_page(page); + SetPageUptodate(page); + kunmap_atomic(kaddr); + unlock_page(page); + + return 0; +} + +void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, + struct lookup_intent *it) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct cl_object *obj = lli->lli_clob; + struct address_space *mapping = inode->i_mapping; + struct page *vmpage; + struct niobuf_remote *rnb; + char *data; + struct 
lu_env *env; + struct cl_io *io; + u16 refcheck; + struct lustre_handle lockh; + struct ldlm_lock *lock; + unsigned long index, start; + struct niobuf_local lnb; + int rc; + bool dom_lock = false; + + if (!obj) + return; + + if (it->it_lock_mode != 0) { + lockh.cookie = it->it_lock_handle; + lock = ldlm_handle2lock(&lockh); + if (lock) + dom_lock = ldlm_has_dom(lock); + LDLM_LOCK_PUT(lock); + } + + if (!dom_lock) + return; + + env = cl_env_get(&refcheck); + if (IS_ERR(env)) + return; + + if (!req_capsule_has_field(&req->rq_pill, &RMF_NIOBUF_INLINE, + RCL_SERVER)) { + rc = -ENODATA; + goto out_env; + } + + rnb = req_capsule_server_get(&req->rq_pill, &RMF_NIOBUF_INLINE); + data = (char *)rnb + sizeof(*rnb); + + if (!rnb || rnb->rnb_len == 0) { + rc = 0; + goto out_env; + } + + CDEBUG(D_INFO, "Get data buffer along with open, len %i, i_size %llu\n", + rnb->rnb_len, i_size_read(inode)); + + io = vvp_env_thread_io(env); + io->ci_obj = obj; + io->ci_ignore_layout = 1; + rc = cl_io_init(env, io, CIT_MISC, obj); + if (rc) + goto out_io; + + lnb.lnb_file_offset = rnb->rnb_offset; + start = lnb.lnb_file_offset / PAGE_SIZE; + index = 0; + LASSERT(lnb.lnb_file_offset % PAGE_SIZE == 0); + lnb.lnb_page_offset = 0; + do { + struct cl_page *clp; + + lnb.lnb_data = data + (index << PAGE_SHIFT); + lnb.lnb_len = rnb->rnb_len - (index << PAGE_SHIFT); + if (lnb.lnb_len > PAGE_SIZE) + lnb.lnb_len = PAGE_SIZE; + + vmpage = read_cache_page(mapping, index + start, + ll_dom_readpage, &lnb); + if (IS_ERR(vmpage)) { + CWARN("%s: cannot fill page %lu for "DFID + " with data: rc = %li\n", + ll_get_fsname(inode->i_sb, NULL, 0), + index + start, PFID(lu_object_fid(&obj->co_lu)), + PTR_ERR(vmpage)); + break; + } + lock_page(vmpage); + clp = cl_page_find(env, obj, vmpage->index, vmpage, + CPT_CACHEABLE); + if (IS_ERR(clp)) { + unlock_page(vmpage); + put_page(vmpage); + rc = PTR_ERR(clp); + goto out_io; + } + + /* export page */ + cl_page_export(env, clp, 1); + cl_page_put(env, clp); + 
unlock_page(vmpage); + put_page(vmpage); + index++; + } while (rnb->rnb_len > (index << PAGE_SHIFT)); + rc = 0; +out_io: + cl_io_fini(env, io); +out_env: + cl_env_put(env, &refcheck); +} + static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, struct lookup_intent *itp) { @@ -450,8 +576,11 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, } rc = ll_prep_inode(&inode, req, NULL, itp); - if (!rc && itp->it_lock_mode) + + if (!rc && itp->it_lock_mode) { + ll_dom_finish_open(d_inode(de), req, itp); ll_set_lock_data(sbi->ll_md_exp, inode, itp, NULL); + } out: ptlrpc_req_finished(req); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 6bdbf28..7491397 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -916,6 +916,9 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, ssize_t ll_copy_user_md(const struct lov_user_md __user *md, struct lov_user_md **kbuf); +void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, + struct lookup_intent *it); + /* Compute expected user md size when passing in a md from user space */ static inline ssize_t ll_lov_user_md_size(const struct lov_user_md *lum) { diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index f835abb..4ac62b2 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -600,6 +600,9 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, if (rc) return rc; + if (it->it_op & IT_OPEN) + ll_dom_finish_open(inode, request, it); + ll_set_lock_data(ll_i2sbi(parent)->ll_md_exp, inode, it, &bits); /* We used to query real size from OSTs here, but actually diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 6b87e76..0c52bcf 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -456,6 +456,36 @@ static ssize_t mdc_stats_seq_write(struct file *file, } LPROC_SEQ_FOPS(mdc_stats); +static int 
mdc_dom_min_repsize_seq_show(struct seq_file *m, void *v) +{ + struct obd_device *dev = m->private; + + seq_printf(m, "%u\n", dev->u.cli.cl_dom_min_inline_repsize); + + return 0; +} + +static ssize_t mdc_dom_min_repsize_seq_write(struct file *file, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct obd_device *dev; + unsigned int val; + int rc; + + dev = ((struct seq_file *)file->private_data)->private; + rc = kstrtouint_from_user(buffer, count, 0, &val); + if (rc) + return rc; + + if (val > MDC_DOM_MAX_INLINE_REPSIZE) + return -ERANGE; + + dev->u.cli.cl_dom_min_inline_repsize = val; + return count; +} +LPROC_SEQ_FOPS(mdc_dom_min_repsize); + LPROC_SEQ_FOPS_RO_TYPE(mdc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(mdc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(mdc, timeouts); @@ -489,6 +519,8 @@ static ssize_t mdc_stats_seq_write(struct file *file, .fops = &mdc_unstable_stats_fops }, { .name = "mdc_stats", .fops = &mdc_stats_fops }, + { .name = "mdc_dom_min_repsize", + .fops = &mdc_dom_min_repsize_fops }, { NULL } }; diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index 079539d..6cfa79c 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -159,4 +159,8 @@ int mdc_ldlm_blocking_ast(struct ldlm_lock *dlmlock, struct ldlm_lock_desc *new, void *data, int flag); int mdc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data); int mdc_fill_lvb(struct ptlrpc_request *req, struct ost_lvb *lvb); + +#define MDC_DOM_DEF_INLINE_REPSIZE 8192 +#define MDC_DOM_MAX_INLINE_REPSIZE XATTR_SIZE_MAX + #endif diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 2e4a5c6..abbc908 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -254,8 +254,9 @@ static int mdc_save_lovea(struct ptlrpc_request *req, u32 lmmsize = op_data->op_data_size; LIST_HEAD(cancels); int count = 0; - int mode; + enum ldlm_mode mode; int rc; + int repsize; it->it_create_mode = (it->it_create_mode & ~S_IFMT) | S_IFREG; 
@@ -336,7 +337,32 @@ static int mdc_save_lovea(struct ptlrpc_request *req, obddev->u.cli.cl_max_mds_easize); req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize); + /** + * Inline buffer for possible data from Data-on-MDT files. + */ + req_capsule_set_size(&req->rq_pill, &RMF_NIOBUF_INLINE, RCL_SERVER, + sizeof(struct niobuf_remote)); ptlrpc_request_set_replen(req); + + /* Get real repbuf allocated size as rounded up power of 2 */ + repsize = size_roundup_power2(req->rq_replen + + lustre_msg_early_size()); + + /* Estimate free space for DoM files in repbuf */ + repsize -= req->rq_replen - obddev->u.cli.cl_max_mds_easize + + sizeof(struct lov_comp_md_v1) + + sizeof(struct lov_comp_md_entry_v1) + + lov_mds_md_size(0, LOV_MAGIC_V3); + + if (repsize < obddev->u.cli.cl_dom_min_inline_repsize) { + repsize = obddev->u.cli.cl_dom_min_inline_repsize - repsize; + req_capsule_set_size(&req->rq_pill, &RMF_NIOBUF_INLINE, + RCL_SERVER, + sizeof(struct niobuf_remote) + repsize); + ptlrpc_request_set_replen(req); + CDEBUG(D_INFO, "Increase repbuf by %d bytes, total: %d\n", + repsize, req->rq_replen); + } return req; } diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index feac374..b173937 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -2551,6 +2551,8 @@ int mdc_setup(struct obd_device *obd, struct lustre_cfg *cfg) if (rc) goto err_osc_cleanup; + obd->u.cli.cl_dom_min_inline_repsize = MDC_DOM_DEF_INLINE_REPSIZE; + ns_register_cancel(obd->obd_namespace, mdc_cancel_weight); obd->obd_namespace->ns_lvbo = &inode_lvbo; diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 8fe661d..c11b1b0 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -414,7 +414,8 @@ &RMF_MDT_MD, &RMF_ACL, &RMF_CAPA1, - &RMF_CAPA2 + &RMF_CAPA2, + &RMF_NIOBUF_INLINE, }; static const struct req_msg_field *ldlm_intent_getattr_client[] = { @@ -1065,8 +1066,14 @@ struct req_msg_field RMF_NIOBUF_REMOTE = 
dump_rniobuf); EXPORT_SYMBOL(RMF_NIOBUF_REMOTE); +struct req_msg_field RMF_NIOBUF_INLINE = + DEFINE_MSGF("niobuf_inline", RMF_F_NO_SIZE_CHECK, + sizeof(struct niobuf_remote), lustre_swab_niobuf_remote, + dump_rniobuf); +EXPORT_SYMBOL(RMF_NIOBUF_INLINE); + struct req_msg_field RMF_RCS = - DEFINE_MSGF("niobuf_remote", RMF_F_STRUCT_ARRAY, sizeof(u32), + DEFINE_MSGF("niobuf_rcs", RMF_F_STRUCT_ARRAY, sizeof(u32), lustre_swab_generic_32s, dump_rcs); EXPORT_SYMBOL(RMF_RCS); diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index 2e866fe..e8ba57b 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -617,6 +617,11 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) request->rq_status = rc; goto cleanup_bulk; } + /* Use real allocated value in lm_repsize, + * so the server may use whole reply buffer + * without resends where it is needed. + */ + request->rq_reqmsg->lm_repsize = request->rq_repbuf_len; } else { request->rq_repdata = NULL; request->rq_repmsg = NULL; From patchwork Thu Feb 27 21:09:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409781 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AC60159A for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 51991246A0 for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 51991246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: 
from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E24021FA9B; Thu, 27 Feb 2020 13:20:45 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8DF0621FA75 for ; Thu, 27 Feb 2020 13:18:37 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 15A08ED8; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1378046C; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:09:00 -0500
Message-Id: <1582838290-17243-73-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 072/622] lustre: migrate: pack lmv ea in migrate rpc
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
Cc: Lai Siyao , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Lai Siyao

To support striped directory migration, pack lmv_user_md in the
migrate RPC.

Add the 'mdt-count' and 'mdt-hash' arguments to 'lfs migrate'.

Disable directory migration related tests temporarily; we'll enable
them later in the last patch of this set.
WC-bug-id: https://jira.whamcloud.com/browse/LU-4684 Lustre-commit: 470bdeec6ca5 ("LU-4684 migrate: pack lmv ea in migrate rpc") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/31424 Reviewed-by: Andreas Dilger Reviewed-by: Fan Yong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 19 ++++++---- fs/lustre/llite/file.c | 67 +++++++++++++++++---------------- fs/lustre/llite/llite_internal.h | 4 +- fs/lustre/llite/llite_lib.c | 4 +- fs/lustre/mdc/mdc_lib.c | 21 +++++++---- fs/lustre/mdc/mdc_reint.c | 20 ++-------- fs/lustre/ptlrpc/layout.c | 3 +- include/uapi/linux/lustre/lustre_idl.h | 2 +- include/uapi/linux/lustre/lustre_user.h | 8 +++- 9 files changed, 77 insertions(+), 71 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index c0c3bf0..751d0183 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1322,7 +1322,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) goto finish_req; } - lum_size = lmv_user_md_size(stripe_count, LMV_MAGIC_V1); + lum_size = lmv_user_md_size(stripe_count, + LMV_USER_MAGIC_SPECIFIC); tmp = kzalloc(lum_size, GFP_NOFS); if (!tmp) { rc = -ENOMEM; @@ -1655,14 +1656,14 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return rc; } case LL_IOC_MIGRATE: { - const char *filename; + struct lmv_user_md *lum; + char *filename; int namelen = 0; int len; int rc; - int mdtidx; rc = obd_ioctl_getdata(&data, &len, (void __user *)arg); - if (rc < 0) + if (rc) return rc; if (!data->ioc_inlbuf1 || !data->ioc_inlbuf2 || @@ -1674,17 +1675,21 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) filename = data->ioc_inlbuf1; namelen = data->ioc_inllen1; if (namelen < 1 || namelen != strlen(filename) + 1) { + CDEBUG(D_INFO, "IOC_MDC_LOOKUP missing filename\n"); rc = -EINVAL; goto migrate_free; } - if (data->ioc_inllen2 != sizeof(mdtidx)) { + lum = (struct lmv_user_md 
*)data->ioc_inlbuf2; + if (lum->lum_magic != LMV_USER_MAGIC && + lum->lum_magic != LMV_USER_MAGIC_SPECIFIC) { rc = -EINVAL; + CERROR("%s: wrong lum magic %x: rc = %d\n", + filename, lum->lum_magic, rc); goto migrate_free; } - mdtidx = *(int *)data->ioc_inlbuf2; - rc = ll_migrate(inode, file, mdtidx, filename, namelen - 1); + rc = ll_migrate(inode, file, lum, filename); migrate_free: kvfree(data); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 7657c79..68fb623 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3785,8 +3785,8 @@ int ll_get_fid_by_name(struct inode *parent, const char *name, return rc; } -int ll_migrate(struct inode *parent, struct file *file, int mdtidx, - const char *name, int namelen) +int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, + const char *name) { struct ptlrpc_request *request = NULL; struct obd_client_handle *och = NULL; @@ -3795,16 +3795,18 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, struct md_op_data *op_data; struct mdt_body *body; u64 data_version = 0; + size_t namelen = strlen(name); + int lumlen = lmv_user_md_size(lum->lum_stripe_count, lum->lum_magic); struct qstr qstr; int rc; - CDEBUG(D_VFSTRACE, "migrate %s under " DFID " to MDT%d\n", - name, PFID(ll_inode2fid(parent)), mdtidx); + CDEBUG(D_VFSTRACE, "migrate " DFID "/%s to MDT%d stripe count %d\n", + PFID(ll_inode2fid(parent)), name, + lum->lum_stripe_offset, lum->lum_stripe_count); - op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen, - 0, LUSTRE_OPC_ANY, NULL); - if (IS_ERR(op_data)) - return PTR_ERR(op_data); + if (lum->lum_magic != cpu_to_le32(LMV_USER_MAGIC) && + lum->lum_magic != cpu_to_le32(LMV_USER_MAGIC_SPECIFIC)) + lustre_swab_lmv_user_md(lum); /* Get child FID first */ qstr.hash = full_name_hash(file_dentry(file), name, namelen); @@ -3818,16 +3820,14 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, } if (!child_inode) { - rc = 
ll_get_fid_by_name(parent, name, namelen, - &op_data->op_fid3, &child_inode); + rc = ll_get_fid_by_name(parent, name, namelen, NULL, + &child_inode); if (rc) - goto out_free; + return rc; } - if (!child_inode) { - rc = -EINVAL; - goto out_free; - } + if (!child_inode) + return -ENOENT; /* * lfs migrate command needs to be blocked on the client @@ -3839,6 +3839,13 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto out_iput; } + op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen, + child_inode->i_mode, LUSTRE_OPC_ANY, NULL); + if (IS_ERR(op_data)) { + rc = PTR_ERR(op_data); + goto out_iput; + } + inode_lock(child_inode); op_data->op_fid3 = *ll_inode2fid(child_inode); if (!fid_is_sane(&op_data->op_fid3)) { @@ -3849,16 +3856,10 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto out_unlock; } - rc = ll_get_mdt_idx_by_fid(ll_i2sbi(parent), &op_data->op_fid3); - if (rc < 0) - goto out_unlock; + op_data->op_cli_flags |= CLI_MIGRATE | CLI_SET_MEA; + op_data->op_data = lum; + op_data->op_data_size = lumlen; - if (rc == mdtidx) { - CDEBUG(D_INFO, "%s: " DFID " is already on MDT%d.\n", name, - PFID(&op_data->op_fid3), mdtidx); - rc = 0; - goto out_unlock; - } again: if (S_ISREG(child_inode->i_mode)) { och = ll_lease_open(child_inode, NULL, FMODE_WRITE, 0); @@ -3874,16 +3875,17 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto out_close; op_data->op_handle = och->och_fh; - op_data->op_data = och->och_mod; op_data->op_data_version = data_version; op_data->op_lease_handle = och->och_lease_handle; - op_data->op_bias |= MDS_RENAME_MIGRATE; + op_data->op_bias |= MDS_CLOSE_MIGRATE; + + spin_lock(&och->och_mod->mod_open_req->rq_lock); + och->och_mod->mod_open_req->rq_replay = 0; + spin_unlock(&och->och_mod->mod_open_req->rq_lock); } - op_data->op_mds = mdtidx; - op_data->op_cli_flags = CLI_MIGRATE; - rc = md_rename(ll_i2sbi(parent)->ll_md_exp, op_data, name, - namelen, name, namelen, &request); + 
rc = md_rename(ll_i2sbi(parent)->ll_md_exp, op_data, name, namelen, + name, namelen, &request); if (!rc) { LASSERT(request); ll_update_times(request, parent); @@ -3915,16 +3917,15 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto again; out_close: - if (och) /* close the file */ + if (och) ll_lease_close(och, child_inode, NULL); if (!rc) clear_nlink(child_inode); out_unlock: inode_unlock(child_inode); + ll_finish_md_op_data(op_data); out_iput: iput(child_inode); -out_free: - ll_finish_md_op_data(op_data); return rc; } diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 7491397..edb5f2a 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -824,8 +824,8 @@ int ll_getattr(const struct path *path, struct kstat *stat, #define ll_set_acl NULL #endif /* CONFIG_LUSTRE_FS_POSIX_ACL */ -int ll_migrate(struct inode *parent, struct file *file, int mdtidx, - const char *name, int namelen); +int ll_migrate(struct inode *parent, struct file *file, + struct lmv_user_md *lum, const char *name); int ll_get_fid_by_name(struct inode *parent, const char *name, int namelen, struct lu_fid *fid, struct inode **inode); int ll_inode_permission(struct inode *inode, int mask); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 56624e8..c04146f 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -209,7 +209,9 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_GRANT_PARAM | OBD_CONNECT_SHORTIO | OBD_CONNECT_FLAGS2; - data->ocd_connect_flags2 = OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT; + data->ocd_connect_flags2 = OBD_CONNECT2_FLR | + OBD_CONNECT2_LOCK_CONVERT | + OBD_CONNECT2_DIR_MIGRATE; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index e2f1a49..1d38574 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ 
b/fs/lustre/mdc/mdc_lib.c @@ -443,7 +443,7 @@ static void mdc_close_intent_pack(struct ptlrpc_request *req, struct close_data *data; struct ldlm_lock *lock; - if (!(bias & (MDS_CLOSE_INTENT | MDS_RENAME_MIGRATE))) + if (!(bias & (MDS_CLOSE_INTENT | MDS_CLOSE_MIGRATE))) return; data = req_capsule_client_get(&req->rq_pill, &RMF_CLOSE_DATA); @@ -507,13 +507,20 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data, if (new) mdc_pack_name(req, &RMF_SYMTGT, new, newlen); - if (op_data->op_cli_flags & CLI_MIGRATE && - op_data->op_bias & MDS_RENAME_MIGRATE) { - struct mdt_ioepoch *epoch; + if (op_data->op_cli_flags & CLI_MIGRATE) { + char *tmp; - mdc_close_intent_pack(req, op_data); - epoch = req_capsule_client_get(&req->rq_pill, &RMF_MDT_EPOCH); - mdc_ioepoch_pack(epoch, op_data); + if (op_data->op_bias & MDS_CLOSE_MIGRATE) { + struct mdt_ioepoch *epoch; + + mdc_close_intent_pack(req, op_data); + epoch = req_capsule_client_get(&req->rq_pill, + &RMF_MDT_EPOCH); + mdc_ioepoch_pack(epoch, op_data); + } + + tmp = req_capsule_client_get(&req->rq_pill, &RMF_EADATA); + memcpy(tmp, op_data->op_data, op_data->op_data_size); } } diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index d326962..030c247 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -390,6 +390,9 @@ int mdc_rename(struct obd_export *exp, struct md_op_data *op_data, req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, oldlen + 1); req_capsule_set_size(&req->rq_pill, &RMF_SYMTGT, RCL_CLIENT, newlen + 1); + if (op_data->op_cli_flags & CLI_MIGRATE) + req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT, + op_data->op_data_size); rc = mdc_prep_elc_req(exp, req, MDS_REINT, &cancels, count); if (rc) { @@ -397,23 +400,6 @@ int mdc_rename(struct obd_export *exp, struct md_op_data *op_data, return rc; } - if (op_data->op_cli_flags & CLI_MIGRATE && op_data->op_data) { - struct md_open_data *mod = op_data->op_data; - - LASSERTF(mod->mod_open_req && 
- mod->mod_open_req->rq_type != LI_POISON, - "POISONED open %p!\n", mod->mod_open_req); - - DEBUG_REQ(D_HA, mod->mod_open_req, "matched open"); - /* - * We no longer want to preserve this open for replay even - * though the open was committed. b=3632, b=3633 - */ - spin_lock(&mod->mod_open_req->rq_lock); - mod->mod_open_req->rq_replay = 0; - spin_unlock(&mod->mod_open_req->rq_lock); - } - if (exp_connect_cancelset(exp) && req) ldlm_cli_cancel_list(&cancels, count, req, 0); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index c11b1b0..ae573a2 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -263,7 +263,8 @@ &RMF_SYMTGT, &RMF_DLM_REQ, &RMF_MDT_EPOCH, - &RMF_CLOSE_DATA + &RMF_CLOSE_DATA, + &RMF_EADATA }; static const struct req_msg_field *mds_last_unlink_server[] = { diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 9f8d65d..75326c0 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1693,7 +1693,7 @@ enum mds_op_bias { MDS_CREATE_VOLATILE = 1 << 10, MDS_OWNEROVERRIDE = 1 << 11, MDS_HSM_RELEASE = 1 << 12, - MDS_RENAME_MIGRATE = 1 << 13, + MDS_CLOSE_MIGRATE = 1 << 13, MDS_CLOSE_LAYOUT_SWAP = 1 << 14, MDS_CLOSE_LAYOUT_MERGE = 1 << 15, MDS_CLOSE_RESYNC_DONE = 1 << 16, diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 8fd5b26..421c977 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -632,8 +632,12 @@ struct lmv_user_md_v1 { static inline int lmv_user_md_size(int stripes, int lmm_magic) { - return sizeof(struct lmv_user_md) + - stripes * sizeof(struct lmv_user_mds_data); + int size = sizeof(struct lmv_user_md); + + if (lmm_magic == LMV_USER_MAGIC_SPECIFIC) + size += stripes * sizeof(struct lmv_user_mds_data); + + return size; } struct ll_recreate_obj { From patchwork Thu Feb 27 21:09:01 2020 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409783 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 81E8214BC for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6A7B7246A0 for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6A7B7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B2ED034879C; Thu, 27 Feb 2020 13:20:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E35CB21FA75 for ; Thu, 27 Feb 2020 13:18:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 17B14ED9; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1678046D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:01 -0500 Message-Id: <1582838290-17243-74-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
Subject: [lustre-devel] [PATCH 073/622] lustre: hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Teddy Zheng , Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Teddy Zheng Clients registered with the MDS using OBD_CONNECT2_ARCHIVE_ID_ARRAY will pass archive IDs in an array, while clients without the flag still use a bitmap. This flag allows old clients to connect to new MDSs. WC-bug-id: https://jira.whamcloud.com/browse/LU-10114 Lustre-commit: 1c7e7d1243f7 ("LU-10114 hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array") Signed-off-by: Teddy Zheng Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/32806 Reviewed-by: Andreas Dilger Reviewed-by: John L. Hammond Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 3 files changed, 4 insertions(+) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 385359f..fbd46df 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -119,6 +119,7 @@ "flr", /* 0x20 */ "wbc", /* 0x40 */ "lock_convert", /* 0x80 */ + "archive_id_array", /* 0x100 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index e22f8f8..1afbb41 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1141,6 +1141,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_WBC_INTENTS); LASSERTF(OBD_CONNECT2_LOCK_CONVERT == 0x80ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_LOCK_CONVERT); + LASSERTF(OBD_CONNECT2_ARCHIVE_ID_ARRAY == 0x100ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_ARCHIVE_ID_ARRAY); 
LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 75326c0..dc9872cf3 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -800,6 +800,7 @@ struct ptlrpc_body_v2 { * locks */ #define OBD_CONNECT2_LOCK_CONVERT 0x80ULL /* IBITS lock convert support */ +#define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* store HSM archive_id in array */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:09:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409809 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F35614BC for ; Thu, 27 Feb 2020 21:22:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 538E5246A0 for ; Thu, 27 Feb 2020 21:22:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 538E5246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2B994348973; Thu, 27 Feb 2020 13:21:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 309A521FA64 for ; Thu, 27 Feb 2020 13:18:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1B570EDA; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 19D71468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:02 -0500 Message-Id: <1582838290-17243-75-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 074/622] lustre: llite: handle zero length xattr values correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In mdt_getxattr(), set OBD_MD_FLXATTR in mbo_valid of the reply's MDT body so that the client can distinguish between nonexistent extended attributes and zero length values. In ll_xattr_list() and ll_getxattr_common() test for OBD_MD_FLXATTR and return 0 rather than -ENODATA in the appropriate cases. Add sanity test_102t() to test that zero length values are handled correctly. Lustre-commit: 1e4164a1254d ("LU-11109 mdt: handle zero length xattr values correctly") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/32755 Reviewed-by: Andreas Dilger Reviewed-by: Mikhail Pershin Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/llite/xattr.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index f25ae59..636334e 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -363,6 +363,11 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, /* only detect the xattr size */ if (size == 0) { + /* LU-11109: Older MDTs do not distinguish + * between nonexistent xattrs and zero length + * values in this case. Newer MDTs will return + * -ENODATA or set OBD_MD_FLXATTR. + */ rc = body->mbo_eadatasize; goto out; } @@ -375,7 +380,22 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, } if (body->mbo_eadatasize == 0) { - rc = -ENODATA; + /* LU-11109: Newer MDTs set OBD_MD_FLXATTR on + * success so that we can distinguish between + * zero length value and nonexistent xattr. + * + * If OBD_MD_FLXATTR is not set then we keep + * the old behavior and return -ENODATA for + * getxattr() when mbo_eadatasize is 0. But + * -ENODATA only makes sense for getxattr() + * and not for listxattr(). 
+ */ + if (body->mbo_valid & OBD_MD_FLXATTR) + rc = 0; + else if (valid == OBD_MD_FLXATTR) + rc = -ENODATA; + else + rc = 0; goto out; } From patchwork Thu Feb 27 21:09:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409787 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A889B138D for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8FA85246A0 for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8FA85246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 04FA8348832; Thu, 27 Feb 2020 13:20:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7730221FAAF for ; Thu, 27 Feb 2020 13:18:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1FF10EE3; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1D23946A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:03 -0500 Message-Id: 
<1582838290-17243-76-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 075/622] lnet: refactor lnet_select_pathway() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata lnet_select_pathway() is a complex monolithic function which handles many send cases. Break lnet_select_pathway() down into multiple functions, each of which handles a different send case. This will make it easier to add the handling of the different health cases in future patches. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 4e48761a5719 ("LU-9120 lnet: refactor lnet_select_pathway()") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32760 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 13 + net/lnet/lnet/lib-move.c | 1398 ++++++++++++++++++++++++++--------------- 2 files changed, 911 insertions(+), 500 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 22c6152..20b4660 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -827,6 +827,19 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return false; } +static inline struct lnet_peer_net * +lnet_find_peer_net_locked(struct lnet_peer *peer, u32 net_id) +{ + struct lnet_peer_net *peer_net; + + list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { + if (peer_net->lpn_net_id == net_id) + return peer_net; + } + + return NULL; +} + 
static inline void lnet_peer_set_alive(struct lnet_peer_ni *lp) { diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index cab830a..10aa753 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -45,6 +45,23 @@ module_param(local_nid_dist_zero, int, 0444); MODULE_PARM_DESC(local_nid_dist_zero, "Reserved"); +struct lnet_send_data { + struct lnet_ni *sd_best_ni; + struct lnet_peer_ni *sd_best_lpni; + struct lnet_peer_ni *sd_final_dst_lpni; + struct lnet_peer *sd_peer; + struct lnet_peer *sd_gw_peer; + struct lnet_peer_ni *sd_gw_lpni; + struct lnet_peer_net *sd_peer_net; + struct lnet_msg *sd_msg; + lnet_nid_t sd_dst_nid; + lnet_nid_t sd_src_nid; + lnet_nid_t sd_rtr_nid; + int sd_cpt; + int sd_md_cpt; + u32 sd_send_case; +}; + static inline struct lnet_comm_count * get_stats_counts(struct lnet_element_stats *stats, enum lnet_stats_type stats_type) @@ -1188,7 +1205,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } static struct lnet_peer_ni * -lnet_find_route_locked(struct lnet_net *net, lnet_nid_t target, +lnet_find_route_locked(struct lnet_net *net, u32 remote_net, lnet_nid_t rtr_nid) { struct lnet_remotenet *rnet; @@ -1203,7 +1220,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * If @rtr_nid is not LNET_NID_ANY, return the gateway with * rtr_nid nid, otherwise find the best gateway I can use */ - rnet = lnet_find_rnet_locked(LNET_NIDNET(target)); + rnet = lnet_find_rnet_locked(remote_net); if (!rnet) return NULL; @@ -1252,13 +1269,20 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } static struct lnet_ni * -lnet_get_best_ni(struct lnet_net *local_net, struct lnet_ni *cur_ni, +lnet_get_best_ni(struct lnet_net *local_net, struct lnet_ni *best_ni, + struct lnet_peer *peer, struct lnet_peer_net *peer_net, int md_cpt) { - struct lnet_ni *ni = NULL, *best_ni = cur_ni; + struct lnet_ni *ni = NULL; unsigned int shortest_distance; int 
best_credits; + /* If there is no peer_ni that we can send to on this network, + * then there is no point in looking for a new best_ni here. + */ + if (!lnet_get_next_peer_ni_locked(peer, peer_net, NULL)) + return best_ni; + if (!best_ni) { shortest_distance = UINT_MAX; best_credits = INT_MIN; @@ -1286,6 +1310,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, md_cpt, ni->ni_dev_cpt); + CDEBUG(D_NET, + "compare ni %s [c:%d, d:%d, s:%d] with best_ni %s [c:%d, d:%d, s:%d]\n", + libcfs_nid2str(ni->ni_nid), ni_credits, distance, + ni->ni_seq, (best_ni) ? libcfs_nid2str(best_ni->ni_nid) + : "not selected", best_credits, shortest_distance, + (best_ni) ? best_ni->ni_seq : 0); + /* * All distances smaller than the NUMA range * are treated equally. @@ -1311,6 +1342,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, best_credits = ni_credits; } + CDEBUG(D_NET, "selected best_ni %s\n", + (best_ni) ? libcfs_nid2str(best_ni->ni_nid) : "no selection"); + return best_ni; } @@ -1335,421 +1369,140 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return false; } +#define SRC_SPEC 0x0001 +#define SRC_ANY 0x0002 +#define LOCAL_DST 0x0004 +#define REMOTE_DST 0x0008 +#define MR_DST 0x0010 +#define NMR_DST 0x0020 +#define SND_RESP 0x0040 + +/* The following two defines are used for return codes */ +#define REPEAT_SEND 0x1000 +#define PASS_THROUGH 0x2000 + +/* The different cases lnet_select_pathway() needs to handle */ +#define SRC_SPEC_LOCAL_MR_DST (SRC_SPEC | LOCAL_DST | MR_DST) +#define SRC_SPEC_ROUTER_MR_DST (SRC_SPEC | REMOTE_DST | MR_DST) +#define SRC_SPEC_LOCAL_NMR_DST (SRC_SPEC | LOCAL_DST | NMR_DST) +#define SRC_SPEC_ROUTER_NMR_DST (SRC_SPEC | REMOTE_DST | NMR_DST) +#define SRC_ANY_LOCAL_MR_DST (SRC_ANY | LOCAL_DST | MR_DST) +#define SRC_ANY_ROUTER_MR_DST (SRC_ANY | REMOTE_DST | MR_DST) +#define SRC_ANY_LOCAL_NMR_DST (SRC_ANY | LOCAL_DST | NMR_DST) +#define SRC_ANY_ROUTER_NMR_DST 
(SRC_ANY | REMOTE_DST | NMR_DST) + static int -lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, - struct lnet_msg *msg, lnet_nid_t rtr_nid) +lnet_handle_send(struct lnet_send_data *sd) { - struct lnet_ni *best_ni = NULL; - struct lnet_peer_ni *best_lpni = NULL; - struct lnet_peer_ni *best_gw = NULL; - struct lnet_peer_ni *lpni; - struct lnet_peer_ni *final_dst; - struct lnet_peer *peer; - struct lnet_peer_net *peer_net; - struct lnet_net *local_net; - int cpt, cpt2, rc; - bool routing; - bool routing2; - bool ni_is_pref; - bool preferred; - bool local_found; - int best_lpni_credits; - int md_cpt; - - /* - * get an initial CPT to use for locking. The idea here is not to - * serialize the calls to select_pathway, so that as many - * operations can run concurrently as possible. To do that we use - * the CPT where this call is being executed. Later on when we - * determine the CPT to use in lnet_message_commit, we switch the - * lock and check if there was any configuration change. If none, - * then we proceed, if there is, then we restart the operation. - */ - cpt = lnet_net_lock_current(); - - md_cpt = lnet_cpt_of_md(msg->msg_md, msg->msg_offset); - if (md_cpt == CFS_CPT_ANY) - md_cpt = cpt; - -again: - best_ni = NULL; - best_lpni = NULL; - best_gw = NULL; - final_dst = NULL; - local_net = NULL; - routing = false; - routing2 = false; - local_found = false; - - /* - * lnet_nid2peerni_locked() is the path that will find an - * existing peer_ni, or create one and mark it as having been - * created due to network traffic. 
- */ - lpni = lnet_nid2peerni_locked(dst_nid, LNET_NID_ANY, cpt); - if (IS_ERR(lpni)) { - lnet_net_unlock(cpt); - return PTR_ERR(lpni); - } + struct lnet_ni *best_ni = sd->sd_best_ni; + struct lnet_peer_ni *best_lpni = sd->sd_best_lpni; + struct lnet_peer_ni *final_dst_lpni = sd->sd_final_dst_lpni; + struct lnet_msg *msg = sd->sd_msg; + int cpt2; + u32 send_case = sd->sd_send_case; + int rc; + u32 routing = send_case & REMOTE_DST; - /* If we're being asked to send to the loopback interface, there - * is no need to go through any selection. We can just shortcut - * the entire process and send over lolnd + /* Increment sequence number of the selected peer so that we + * pick the next one in Round Robin. */ - if (LNET_NETTYP(LNET_NIDNET(dst_nid)) == LOLND) { - lnet_peer_ni_decref_locked(lpni); - best_ni = the_lnet.ln_loni; - goto send; - } + best_lpni->lpni_seq++; - /* - * Now that we have a peer_ni, check if we want to discover - * the peer. Traffic to the LNET_RESERVED_PORTAL should not - * trigger discovery. + /* grab a reference on the peer_ni so it sticks around even if + * we need to drop and relock the lnet_net_lock below. */ - peer = lpni->lpni_peer_net->lpn_peer; - if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { - rc = lnet_discover_peer_locked(lpni, cpt, false); - if (rc) { - lnet_peer_ni_decref_locked(lpni); - lnet_net_unlock(cpt); - return rc; - } - /* The peer may have changed. 
*/ - peer = lpni->lpni_peer_net->lpn_peer; - /* queue message and return */ - msg->msg_src_nid_param = src_nid; - msg->msg_rtr_nid_param = rtr_nid; - msg->msg_sending = 0; - list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); - CDEBUG(D_NET, "%s pending discovery\n", - libcfs_nid2str(peer->lp_primary_nid)); - lnet_peer_ni_decref_locked(lpni); - lnet_net_unlock(cpt); - - return LNET_DC_WAIT; - } - lnet_peer_ni_decref_locked(lpni); - - /* If peer is not healthy then can not send anything to it */ - if (!lnet_is_peer_healthy_locked(peer)) { - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } + lnet_peer_ni_addref_locked(best_lpni); - /* - * STEP 1: first jab at determining best_ni - * if src_nid is explicitly specified, then best_ni is already - * pre-determiend for us. Otherwise we need to select the best - * one to use later on + /* Use lnet_cpt_of_nid() to determine the CPT used to commit the + * message. This ensures that we get a CPT that is correct for + * the NI when the NI has been restricted to a subset of all CPTs. + * If the selected CPT differs from the one currently locked, we + * must unlock and relock the lnet_net_lock(), and then check whether + * the configuration has changed. We don't have a hold on the best_ni + * yet, and it may have vanished. */ - if (src_nid != LNET_NID_ANY) { - best_ni = lnet_nid2ni_locked(src_nid, cpt); - if (!best_ni) { - lnet_net_unlock(cpt); - LCONSOLE_WARN("Can't send to %s: src %s is not a local nid\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(src_nid)); - return -EINVAL; - } - } + cpt2 = lnet_cpt_of_nid_locked(best_lpni->lpni_nid, best_ni); + if (sd->sd_cpt != cpt2) { + u32 seq = lnet_get_dlc_seq_locked(); - if (msg->msg_type == LNET_MSG_REPLY || - msg->msg_type == LNET_MSG_ACK || - !lnet_peer_is_multi_rail(peer) || - best_ni) { - /* - * for replies we want to respond on the same peer_ni we - * received the message on if possible. 
If not, then pick - * a peer_ni to send to - * - * if the peer is non-multi-rail then you want to send to - * the dst_nid provided as well. - * - * If the best_ni has already been determined, IE the - * src_nid has been specified, then use the - * destination_nid provided as well, since we're - * continuing a series of related messages for the same - * RPC. - * - * It is expected to find the lpni using dst_nid, since we - * created it earlier. - */ - best_lpni = lnet_find_peer_ni_locked(dst_nid); - if (best_lpni) + lnet_net_unlock(sd->sd_cpt); + sd->sd_cpt = cpt2; + lnet_net_lock(sd->sd_cpt); + if (seq != lnet_get_dlc_seq_locked()) { lnet_peer_ni_decref_locked(best_lpni); - - if (best_lpni && !lnet_get_net_locked(LNET_NIDNET(dst_nid))) { - /* - * this lpni is not on a local network so we need - * to route this reply. - */ - best_gw = lnet_find_route_locked(NULL, - best_lpni->lpni_nid, - rtr_nid); - if (best_gw) { - /* - * RULE: Each node considers only the next-hop - * - * We're going to route the message, - * so change the peer to the router. - */ - LASSERT(best_gw->lpni_peer_net); - LASSERT(best_gw->lpni_peer_net->lpn_peer); - peer = best_gw->lpni_peer_net->lpn_peer; - - /* - * if the router is not multi-rail - * then use the best_gw found to send - * the message to - */ - if (!lnet_peer_is_multi_rail(peer)) - best_lpni = best_gw; - else - best_lpni = NULL; - - routing = true; - } else { - best_lpni = NULL; - } - } else if (!best_lpni) { - lnet_net_unlock(cpt); - CERROR("unable to send msg_type %d to originating %s. Destination NID not in DB\n", - msg->msg_type, libcfs_nid2str(dst_nid)); - return -EINVAL; - } - } - - /* - * We must use a consistent source address when sending to a - * non-MR peer. However, a non-MR peer can have multiple NIDs - * on multiple networks, and we may even need to talk to this - * peer on multiple networks -- certain types of - * load-balancing configuration do this. 
- * - * So we need to pick the NI the peer prefers for this - * particular network. - */ - if (!lnet_peer_is_multi_rail(peer)) { - if (!best_lpni) { - lnet_net_unlock(cpt); - CERROR("no route to %s\n", - libcfs_nid2str(dst_nid)); - return -EHOSTUNREACH; - } - - /* best ni is already set if src_nid was provided */ - if (!best_ni) { - /* Get the target peer_ni */ - peer_net = lnet_peer_get_net_locked( - peer, LNET_NIDNET(best_lpni->lpni_nid)); - list_for_each_entry(lpni, &peer_net->lpn_peer_nis, - lpni_peer_nis) { - if (lpni->lpni_pref_nnids == 0) - continue; - LASSERT(lpni->lpni_pref_nnids == 1); - best_ni = lnet_nid2ni_locked( - lpni->lpni_pref.nid, cpt); - break; - } + return REPEAT_SEND; } - /* if best_ni is still not set just pick one */ - if (!best_ni) { - best_ni = lnet_net2ni_locked( - best_lpni->lpni_net->net_id, cpt); - /* If there is no best_ni we don't have a route */ - if (!best_ni) { - CERROR("no path to %s from net %s\n", - libcfs_nid2str(best_lpni->lpni_nid), - libcfs_net2str(best_lpni->lpni_net->net_id)); - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } - lpni = list_first_entry(&peer_net->lpn_peer_nis, - struct lnet_peer_ni, - lpni_peer_nis); - } - /* Set preferred NI if necessary. */ - if (lpni->lpni_pref_nnids == 0) - lnet_peer_ni_set_non_mr_pref_nid(lpni, best_ni->ni_nid); } - /* - * if we already found a best_ni because src_nid is specified and - * best_lpni because we are replying to a message then just send - * the message + /* store the best_lpni in the message right away to avoid having + * to do the same operation under different conditions */ - if (best_ni && best_lpni) - goto send; + msg->msg_txpeer = best_lpni; + msg->msg_txni = best_ni; - /* - * If we already found a best_ni because src_nid is specified then - * pick the peer then send the message + /* grab a reference for the best_ni since now it's in use in this + * send. 
The reference will be dropped in lnet_finalize() */ - if (best_ni) - goto pick_peer; + lnet_ni_addref_locked(msg->msg_txni, sd->sd_cpt); - /* - * pick the best_ni by going through all the possible networks of - * that peer and see which local NI is best suited to talk to that - * peer. - * - * Locally connected networks will always be preferred over - * a routed network. If there are only routed paths to the peer, - * then the best route is chosen. If all routes are equal then - * they are used in round robin. + /* Always set the target.nid to the best peer picked. Either the + * NID will be one of the peer NIDs selected, or the same NID as + * what was originally set in the target or it will be the NID of + * a router if this message should be routed */ - list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { - if (!lnet_is_peer_net_healthy_locked(peer_net)) - continue; - - local_net = lnet_get_net_locked(peer_net->lpn_net_id); - if (!local_net && !routing && !local_found) { - struct lnet_peer_ni *net_gw; - - lpni = list_first_entry(&peer_net->lpn_peer_nis, - struct lnet_peer_ni, - lpni_peer_nis); - - net_gw = lnet_find_route_locked(NULL, - lpni->lpni_nid, - rtr_nid); - if (!net_gw) - continue; - - if (best_gw) { - /* - * lnet_find_route_locked() call - * will return the best_Gw on the - * lpni->lpni_nid network. - * However, best_gw and net_gw can - * be on different networks. - * Therefore need to compare them - * to pick the better of either. - */ - if (lnet_compare_peers(best_gw, net_gw) > 0) - continue; - if (best_gw->lpni_gw_seq <= net_gw->lpni_gw_seq) - continue; - } - best_gw = net_gw; - final_dst = lpni; - - routing2 = true; - } else { - best_gw = NULL; - final_dst = NULL; - routing2 = false; - local_found = true; - } - - /* - * a gw on this network is found, but there could be - * other better gateways on other networks. So don't pick - * the best_ni until we determine the best_gw. 
- */ - if (best_gw) - continue; - - /* if no local_net found continue */ - if (!local_net) - continue; - - /* - * Iterate through the NIs in this local Net and select - * the NI to send from. The selection is determined by - * these 3 criterion in the following priority: - * 1. NUMA - * 2. NI available credits - * 3. Round Robin - */ - best_ni = lnet_get_best_ni(local_net, best_ni, md_cpt); - } - - if (!best_ni && !best_gw) { - lnet_net_unlock(cpt); - LCONSOLE_WARN("No local ni found to send from to %s\n", - libcfs_nid2str(dst_nid)); - return -EINVAL; - } - - if (!best_ni) { - best_ni = lnet_get_best_ni(best_gw->lpni_net, best_ni, md_cpt); - LASSERT(best_gw && best_ni); - - /* - * We're going to route the message, so change the peer to - * the router. - */ - LASSERT(best_gw->lpni_peer_net); - LASSERT(best_gw->lpni_peer_net->lpn_peer); - best_gw->lpni_gw_seq++; - peer = best_gw->lpni_peer_net->lpn_peer; - } + msg->msg_target.nid = msg->msg_txpeer->lpni_nid; - /* - * Now that we selected the NI to use increment its sequence - * number so the Round Robin algorithm will detect that it has - * been used and pick the next NI. + /* lnet_msg_commit assigns the correct cpt to the message, which + * is used to decrement the correct refcount on the ni when it's + * time to return the credits */ - best_ni->ni_seq++; + lnet_msg_commit(msg, sd->sd_cpt); -pick_peer: - /* - * At this point the best_ni is on a local network on which - * the peer has a peer_ni as well - */ - peer_net = lnet_peer_get_net_locked(peer, - best_ni->ni_net->net_id); - /* - * peer_net is not available or the src_nid is explicitly defined - * and the peer_net for that src_nid is unhealthy. find a route to - * the destination nid. + /* If we are routing the message then we keep the src_nid that was + * set by the originator. If we are not routing then we are the + * originator and set it here. 
*/ - if (!peer_net || - (src_nid != LNET_NID_ANY && - !lnet_is_peer_net_healthy_locked(peer_net))) { - best_gw = lnet_find_route_locked(best_ni->ni_net, - dst_nid, - rtr_nid); - /* - * if no route is found for that network then - * move onto the next peer_ni in the peer - */ - if (!best_gw) { - LCONSOLE_WARN("No route to peer from %s\n", - libcfs_nid2str(best_ni->ni_nid)); - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } - - CDEBUG(D_NET, "Best route to %s via %s for %s %d\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(best_gw->lpni_nid), - lnet_msgtyp2str(msg->msg_type), msg->msg_len); + if (!msg->msg_routing) + msg->msg_hdr.src_nid = cpu_to_le64(msg->msg_txni->ni_nid); - routing2 = true; - /* - * RULE: Each node considers only the next-hop + if (routing) { + msg->msg_target_is_router = 1; + msg->msg_target.pid = LNET_PID_LUSTRE; + /* since we're routing we want to ensure that the + * msg_hdr.dest_nid is set to the final destination. When + * the router receives this message it knows how to route + * it. * - * We're going to route the message, so change the peer to - * the router. + * final_dst_lpni is set at the beginning of the + * lnet_select_pathway() function and is never changed. + * It's safe to use it here. */ - LASSERT(best_gw->lpni_peer_net); - LASSERT(best_gw->lpni_peer_net->lpn_peer); - peer = best_gw->lpni_peer_net->lpn_peer; - } else if (!lnet_is_peer_net_healthy_locked(peer_net)) { - /* - * this peer_net is unhealthy but we still have an opportunity - * to find another peer_net that we can use + msg->msg_hdr.dest_nid = cpu_to_le64(final_dst_lpni->lpni_nid); + } else { + /* if we're not routing set the dest_nid to the best peer + * ni NID that we picked earlier in the algorithm. 
*/ - u32 net_id = peer_net->lpn_net_id; - - LCONSOLE_WARN("peer net %s unhealthy\n", - libcfs_net2str(net_id)); - goto again; + msg->msg_hdr.dest_nid = cpu_to_le64(msg->msg_txpeer->lpni_nid); } + rc = lnet_post_send_locked(msg, 0); + if (!rc) + CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_txni->ni_nid), + libcfs_nid2str(sd->sd_src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(msg->msg_txpeer->lpni_nid), + lnet_msgtyp2str(msg->msg_type)); + + return rc; +} + +static struct lnet_peer_ni * +lnet_select_peer_ni(struct lnet_send_data *sd, struct lnet_peer *peer, + struct lnet_peer_net *peer_net) +{ /* * Look at the peer NIs for the destination peer that connect * to the chosen net. If a peer_ni is preferred when using the @@ -1758,20 +1511,30 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * the available transmit credits are used. If the transmit * credits are equal, we round-robin over the peer_ni. 
*/ - lpni = NULL; - best_lpni_credits = INT_MIN; - preferred = false; - best_lpni = NULL; + struct lnet_peer_ni *lpni = NULL; + struct lnet_peer_ni *best_lpni = NULL; + struct lnet_ni *best_ni = sd->sd_best_ni; + lnet_nid_t dst_nid = sd->sd_dst_nid; + int best_lpni_credits = INT_MIN; + bool preferred = false; + bool ni_is_pref; + while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni))) { - /* - * if this peer ni is not healthy just skip it, no point in - * examining it further + /* if the best_ni we've chosen already has this lpni + * preferred, then let's use it */ - if (!lnet_is_peer_ni_healthy_locked(lpni)) - continue; ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, best_ni->ni_nid); + CDEBUG(D_NET, "%s ni_is_pref = %d\n", + libcfs_nid2str(best_ni->ni_nid), ni_is_pref); + + if (best_lpni) + CDEBUG(D_NET, "%s c:[%d, %d], s:[%d, %d]\n", + libcfs_nid2str(lpni->lpni_nid), + lpni->lpni_txcredits, best_lpni_credits, + lpni->lpni_seq, best_lpni->lpni_seq); + /* if this is a preferred peer use it */ if (!preferred && ni_is_pref) { preferred = true; @@ -1810,131 +1573,766 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, u32 net_id = peer_net ? peer_net->lpn_net_id : LNET_NIDNET(dst_nid); - lnet_net_unlock(cpt); - LCONSOLE_WARN("no peer_ni found on peer net %s\n", - libcfs_net2str(net_id)); - return -EHOSTUNREACH; + CDEBUG(D_NET, "no peer_ni found on peer net %s\n", + libcfs_net2str(net_id)); + return NULL; } -send: - /* Shortcut for loopback.
*/ - if (best_ni == the_lnet.ln_loni) { - /* No send credit hassles with LOLND */ - lnet_ni_addref_locked(best_ni, cpt); - msg->msg_hdr.dest_nid = cpu_to_le64(best_ni->ni_nid); - if (!msg->msg_routing) - msg->msg_hdr.src_nid = cpu_to_le64(best_ni->ni_nid); - msg->msg_target.nid = best_ni->ni_nid; - lnet_msg_commit(msg, cpt); - msg->msg_txni = best_ni; - lnet_net_unlock(cpt); - - return LNET_CREDIT_OK; - } + CDEBUG(D_NET, "sd_best_lpni = %s\n", + libcfs_nid2str(best_lpni->lpni_nid)); - routing = routing || routing2; + return best_lpni; +} - /* - * Increment sequence number of the peer selected so that we - * pick the next one in Round Robin. - */ - best_lpni->lpni_seq++; +/* Prerequisite: the best_ni should already be set in the sd + */ +static inline struct lnet_peer_ni * +lnet_find_best_lpni_on_net(struct lnet_send_data *sd, struct lnet_peer *peer, + u32 net_id) +{ + struct lnet_peer_net *peer_net; - /* - * grab a reference on the peer_ni so it sticks around even if - * we need to drop and relock the lnet_net_lock below. + /* The gateway is Multi-Rail capable so now we must select the + * proper peer_ni */ - lnet_peer_ni_addref_locked(best_lpni); + peer_net = lnet_peer_get_net_locked(peer, net_id); - /* - * Use lnet_cpt_of_nid() to determine the CPT used to commit the - * message. This ensures that we get a CPT that is correct for - * the NI when the NI has been restricted to a subset of all CPTs. - * If the selected CPT differs from the one currently locked, we - * must unlock and relock the lnet_net_lock(), and then check whether - * the configuration has changed. We don't have a hold on the best_ni - * yet, and it may have vanished. 
+ if (!peer_net) { + CERROR("gateway peer %s has no NI on net %s\n", + libcfs_nid2str(peer->lp_primary_nid), + libcfs_net2str(net_id)); + return NULL; + } + + return lnet_select_peer_ni(sd, peer, peer_net); +} + +static inline void +lnet_set_non_mr_pref_nid(struct lnet_send_data *sd) +{ + if (sd->sd_send_case & NMR_DST && + sd->sd_msg->msg_type != LNET_MSG_REPLY && + sd->sd_msg->msg_type != LNET_MSG_ACK && + sd->sd_best_lpni->lpni_pref_nnids == 0) { + CDEBUG(D_NET, "Setting preferred local NID %s on NMR peer %s\n", + libcfs_nid2str(sd->sd_best_ni->ni_nid), + libcfs_nid2str(sd->sd_best_lpni->lpni_nid)); + lnet_peer_ni_set_non_mr_pref_nid(sd->sd_best_lpni, + sd->sd_best_ni->ni_nid); + } +} + +/* Source Specified + * Local Destination + * non-mr peer + * + * use the source and destination NIDs as the pathway + */ +static int +lnet_handle_spec_local_nmr_dst(struct lnet_send_data *sd) +{ + /* the destination lpni is set before we get here. */ + + /* find local NI */ + sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt); + if (!sd->sd_best_ni) { + CERROR("Can't send to %s: src %s is not a local nid\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(sd->sd_src_nid)); + return -EINVAL; + } + + /* the preferred NID will only be set for NMR peers */ - cpt2 = lnet_cpt_of_nid_locked(best_lpni->lpni_nid, best_ni); - if (cpt != cpt2) { - u32 seq = lnet_get_dlc_seq_locked(); - lnet_net_unlock(cpt); - cpt = cpt2; - lnet_net_lock(cpt); - if (seq != lnet_get_dlc_seq_locked()) { - lnet_peer_ni_decref_locked(best_lpni); - goto again; - } + lnet_set_non_mr_pref_nid(sd); + + return lnet_handle_send(sd); +} + +/* Source Specified + * Local Destination + * MR Peer + * + * Run the selection algorithm on the peer NIs unless we're sending + * a response, in this case just send to the destination + */ +static int +lnet_handle_spec_local_mr_dst(struct lnet_send_data *sd) +{ + sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt); + if (!sd->sd_best_ni) { + 
CERROR("Can't send to %s: src %s is not a local nid\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(sd->sd_src_nid)); + return -EINVAL; } - /* - * store the best_lpni in the message right away to avoid having - * to do the same operation under different conditions + /* only run the selection algorithm to pick the peer_ni if we're + * sending a GET or a PUT. Responses are sent to the same + * destination NID provided. */ - msg->msg_txpeer = best_lpni; - msg->msg_txni = best_ni; + if (!(sd->sd_send_case & SND_RESP)) { + sd->sd_best_lpni = + lnet_find_best_lpni_on_net(sd, sd->sd_peer, + sd->sd_best_ni->ni_net->net_id); + } - /* - * grab a reference for the best_ni since now it's in use in this - * send. the reference will need to be dropped when the message is - * finished in lnet_finalize() + if (sd->sd_best_lpni) + return lnet_handle_send(sd); + + CERROR("can't send to %s. no NI on %s\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_net2str(sd->sd_best_ni->ni_net->net_id)); + + return -EHOSTUNREACH; +} + +struct lnet_ni * +lnet_find_best_ni_on_spec_net(struct lnet_ni *cur_best_ni, + struct lnet_peer *peer, + struct lnet_peer_net *peer_net, + int cpt, + bool incr_seq) +{ + struct lnet_net *local_net; + struct lnet_ni *best_ni; + + local_net = lnet_get_net_locked(peer_net->lpn_net_id); + if (!local_net) + return NULL; + + /* Iterate through the NIs in this local Net and select + * the NI to send from. The selection is determined by + * these 3 criterion in the following priority: + * 1. NUMA + * 2. NI available credits + * 3. Round Robin */ - lnet_ni_addref_locked(msg->msg_txni, cpt); + best_ni = lnet_get_best_ni(local_net, cur_best_ni, + peer, peer_net, cpt); - /* - * Always set the target.nid to the best peer picked. 
Either the - * nid will be one of the preconfigured NIDs, or the same NID as - * what was originally set in the target or it will be the NID of - * a router if this message should be routed + if (incr_seq && best_ni) + best_ni->ni_seq++; + + return best_ni; +} + +static int +lnet_handle_find_routed_path(struct lnet_send_data *sd, + lnet_nid_t dst_nid, + struct lnet_peer_ni **gw_lpni, + struct lnet_peer **gw_peer) +{ + struct lnet_peer_ni *gw; + lnet_nid_t src_nid = sd->sd_src_nid; + + gw = lnet_find_route_locked(NULL, LNET_NIDNET(dst_nid), + sd->sd_rtr_nid); + if (!gw) { + CERROR("no route to %s from %s\n", + libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); + return -EHOSTUNREACH; + } + + /* get the peer of the gw_ni */ + LASSERT(gw->lpni_peer_net); + LASSERT(gw->lpni_peer_net->lpn_peer); + + *gw_peer = gw->lpni_peer_net->lpn_peer; + + if (!sd->sd_best_ni) + sd->sd_best_ni = + lnet_find_best_ni_on_spec_net(NULL, *gw_peer, + gw->lpni_peer_net, + sd->sd_md_cpt, + true); + + if (!sd->sd_best_ni) { + CERROR("Internal Error. Expected local ni on %s but non found :%s\n", + libcfs_net2str(gw->lpni_peer_net->lpn_net_id), + libcfs_nid2str(sd->sd_src_nid)); + return -EFAULT; + } + + /* if gw is MR let's find its best peer_ni */ - msg->msg_target.nid = msg->msg_txpeer->lpni_nid; + if (lnet_peer_is_multi_rail(*gw_peer)) { + gw = lnet_find_best_lpni_on_net(sd, *gw_peer, + sd->sd_best_ni->ni_net->net_id); + /* We've already verified that the gw has an NI on that + * desired net, but we're not finding it. Something is + * wrong. + */ + if (!gw) { + CERROR("Internal Error. 
Route expected to %s from %s\n", + libcfs_nid2str(dst_nid), + libcfs_nid2str(src_nid)); + return -EFAULT; + } + } - /* - * lnet_msg_commit assigns the correct cpt to the message, which - * is used to decrement the correct refcount on the ni when it's - * time to return the credits + *gw_lpni = gw; + + return 0; +} + +/* Handle two cases: + * + * Case 1: + * Source specified + * Remote destination + * Non-MR destination + * + * Case 2: + * Source specified + * Remote destination + * MR destination + * + * The handling of these two cases is similar. Even though the destination + * can be MR or non-MR, we'll deal directly with the router. + */ +static int +lnet_handle_spec_router_dst(struct lnet_send_data *sd) +{ + int rc; + struct lnet_peer_ni *gw_lpni = NULL; + struct lnet_peer *gw_peer = NULL; + + /* find local NI */ + sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt); + if (!sd->sd_best_ni) { + CERROR("Can't send to %s: src %s is not a local nid\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(sd->sd_src_nid)); + return -EINVAL; + } + + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, + &gw_peer); + if (rc < 0) + return rc; + + if (sd->sd_send_case & NMR_DST) + /* since the final destination is non-MR let's set its preferred + * NID before we send + */ + lnet_set_non_mr_pref_nid(sd); + + /* We're going to send to the gw found so let's set its + * info */ - lnet_msg_commit(msg, cpt); + sd->sd_peer = gw_peer; + sd->sd_best_lpni = gw_lpni; - /* - * If we are routing the message then we don't need to overwrite - * the src_nid since it would've been set at the origin. Otherwise - * we are the originator so we need to set it. 
+ return lnet_handle_send(sd); +} + +struct lnet_ni * +lnet_find_best_ni_on_local_net(struct lnet_peer *peer, int md_cpt) +{ + struct lnet_peer_net *peer_net = NULL; + struct lnet_ni *best_ni = NULL; + + /* The peer can have multiple interfaces, some of them can be on + * the local network and others on a routed network. We should + * prefer the local network. However if the local network is not + * available then we need to try the routed network */ - if (!msg->msg_routing) - msg->msg_hdr.src_nid = cpu_to_le64(msg->msg_txni->ni_nid); - if (routing) { - msg->msg_target_is_router = 1; - msg->msg_target.pid = LNET_PID_LUSTRE; - /* - * since we're routing we want to ensure that the - * msg_hdr.dest_nid is set to the final destination. When - * the router receives this message it knows how to route - * it. - */ - msg->msg_hdr.dest_nid = - cpu_to_le64(final_dst ? final_dst->lpni_nid : dst_nid); - } else { - /* - * if we're not routing set the dest_nid to the best peer - * ni that we picked earlier in the algorithm. + /* go through all the peer nets and find the best_ni */ + list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { + /* The peer's list of nets can contain non-local nets. We + * want to only examine the local ones. 
*/ - msg->msg_hdr.dest_nid = cpu_to_le64(msg->msg_txpeer->lpni_nid); + if (!lnet_get_net_locked(peer_net->lpn_net_id)) + continue; + best_ni = lnet_find_best_ni_on_spec_net(best_ni, peer, + peer_net, md_cpt, + false); } - rc = lnet_post_send_locked(msg, 0); + if (best_ni) + /* increment sequence number so we can round robin */ + best_ni->ni_seq++; + + return best_ni; +} + +static struct lnet_ni * +lnet_find_existing_preferred_best_ni(struct lnet_send_data *sd) +{ + struct lnet_ni *best_ni = NULL; + struct lnet_peer_net *peer_net; + struct lnet_peer *peer = sd->sd_peer; + struct lnet_peer_ni *best_lpni = sd->sd_best_lpni; + struct lnet_peer_ni *lpni; + int cpt = sd->sd_cpt; + + /* We must use a consistent source address when sending to a + * non-MR peer. However, a non-MR peer can have multiple NIDs + * on multiple networks, and we may even need to talk to this + * peer on multiple networks -- certain types of + * load-balancing configuration do this. + * + * So we need to pick the NI the peer prefers for this + * particular network. + */ + + /* Get the target peer_ni */ + peer_net = lnet_peer_get_net_locked(peer, + LNET_NIDNET(best_lpni->lpni_nid)); + LASSERT(peer_net); + list_for_each_entry(lpni, &peer_net->lpn_peer_nis, + lpni_peer_nis) { + if (lpni->lpni_pref_nnids == 0) + continue; + LASSERT(lpni->lpni_pref_nnids == 1); + best_ni = lnet_nid2ni_locked(lpni->lpni_pref.nid, cpt); + break; + } + + return best_ni; +} + +/* Prerequisite: sd->sd_peer and sd->sd_best_lpni should be set */ +static int +lnet_select_preferred_best_ni(struct lnet_send_data *sd) +{ + struct lnet_ni *best_ni = NULL; + struct lnet_peer_ni *best_lpni = sd->sd_best_lpni; + + /* We must use a consistent source address when sending to a + * non-MR peer. However, a non-MR peer can have multiple NIDs + * on multiple networks, and we may even need to talk to this + * peer on multiple networks -- certain types of + * load-balancing configuration do this. 
+ * + * So we need to pick the NI the peer prefers for this + * particular network. + */ + + best_ni = lnet_find_existing_preferred_best_ni(sd); + + /* if best_ni is still not set just pick one */ + if (!best_ni) { + best_ni = + lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer, + sd->sd_best_lpni->lpni_peer_net, + sd->sd_md_cpt, true); + /* If there is no best_ni we don't have a route */ + if (!best_ni) { + CERROR("no path to %s from net %s\n", + libcfs_nid2str(best_lpni->lpni_nid), + libcfs_net2str(best_lpni->lpni_net->net_id)); + return -EHOSTUNREACH; + } + } + + sd->sd_best_ni = best_ni; + + /* Set preferred NI if necessary. */ + lnet_set_non_mr_pref_nid(sd); + + return 0; +} + +/* Source not specified + * Local destination + * Non-MR Peer + * + * always use the same source NID for NMR peers + * If we've talked to that peer before then we already have a preferred + * source NI associated with it. Otherwise, we select a preferred local NI + * and store it in the peer + */ +static int +lnet_handle_any_local_nmr_dst(struct lnet_send_data *sd) +{ + int rc; + + /* sd->sd_best_lpni is already set to the final destination */ + + /* At this point we should've created the peer ni and peer. If we + * can't find it, then something went wrong. Instead of assert + * output a relevant message and fail the send + */ + if (!sd->sd_best_lpni) { + CERROR("Internal fault. Unable to send msg %s to %s. 
NID not known\n", + lnet_msgtyp2str(sd->sd_msg->msg_type), + libcfs_nid2str(sd->sd_dst_nid)); + return -EFAULT; + } + + rc = lnet_select_preferred_best_ni(sd); if (!rc) - CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s\n", - libcfs_nid2str(msg->msg_hdr.src_nid), - libcfs_nid2str(msg->msg_txni->ni_nid), - libcfs_nid2str(src_nid), - libcfs_nid2str(msg->msg_hdr.dest_nid), - libcfs_nid2str(dst_nid), - libcfs_nid2str(msg->msg_txpeer->lpni_nid), - lnet_msgtyp2str(msg->msg_type)); + rc = lnet_handle_send(sd); - lnet_net_unlock(cpt); + return rc; +} + +static int +lnet_handle_any_mr_dsta(struct lnet_send_data *sd) +{ + /* NOTE we've already handled the remote peer case. So we only + * need to worry about the local case here. + * + * if we're sending a response, ACK or reply, we need to send it + * to the destination NID given to us. At this point we already + * have the peer_ni we're supposed to send to, so just find the + * best_ni on the peer net and use that. Since we're sending to an + * MR peer then we can just run the selection algorithm on our + * local NIs and pick the best one. + */ + if (sd->sd_send_case & SND_RESP) { + sd->sd_best_ni = + lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer, + sd->sd_best_lpni->lpni_peer_net, + sd->sd_md_cpt, true); + + if (!sd->sd_best_ni) { + /* We're not going to deal with not able to send + * a response to the provided final destination + */ + CERROR("Can't send response to %s. No local NI available\n", + libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } + + return lnet_handle_send(sd); + } + + /* If we get here that means we're sending a fresh request, PUT or + * GET, so we need to run our standard selection algorithm. + * First find the best local interface that's on any of the peer's + * networks.
+ */ + sd->sd_best_ni = lnet_find_best_ni_on_local_net(sd->sd_peer, + sd->sd_md_cpt); + if (sd->sd_best_ni) { + sd->sd_best_lpni = + lnet_find_best_lpni_on_net(sd, sd->sd_peer, + sd->sd_best_ni->ni_net->net_id); + + /* if we're successful in selecting a peer_ni on the local + * network, then send to it. Otherwise fall through and + * try and see if we can reach it over another routed + * network + */ + if (sd->sd_best_lpni) { + /* in case we initially started with a routed + * destination, let's reset to local + */ + sd->sd_send_case &= ~REMOTE_DST; + sd->sd_send_case |= LOCAL_DST; + return lnet_handle_send(sd); + } + + CERROR("Internal Error. Expected to have a best_lpni: %s -> %s\n", + libcfs_nid2str(sd->sd_src_nid), + libcfs_nid2str(sd->sd_dst_nid)); + + return -EFAULT; + } + + /* Peer doesn't have a local network. Let's see if there is + * a remote network we can reach it on. + */ + return PASS_THROUGH; +} + +/* Case 1: + * Source NID not specified + * Local destination + * MR peer + * + * Case 2: + * Source NID not specified + * Remote destination + * MR peer + * + * In both of these cases if we're sending a response, ACK or REPLY, then + * we need to send to the destination NID provided. + * + * In the remote case let's deal with MR routers. + * + */ +static int +lnet_handle_any_mr_dst(struct lnet_send_data *sd) +{ + int rc = 0; + struct lnet_peer *gw_peer = NULL; + struct lnet_peer_ni *gw_lpni = NULL; + + /* handle sending a response to a remote peer here so we don't + * have to worry about it if we hit lnet_handle_any_mr_dsta() + */ + if (sd->sd_send_case & REMOTE_DST && + sd->sd_send_case & SND_RESP) { + struct lnet_peer_ni *gw; + struct lnet_peer *gw_peer; + + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw, + &gw_peer); + if (rc < 0) { + CERROR("Can't send response to %s.
No route available\n", + libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } + + sd->sd_best_lpni = gw; + sd->sd_peer = gw_peer; + + return lnet_handle_send(sd); + } + + /* Even though the NID for the peer might not be on a local network, + * since the peer is MR there could be other interfaces on the + * local network. In that case we'd still like to prefer the local + * network over the routed network. If we're unable to do that + * then we select the best router among the different routed networks, + * and if the router is MR then we can deal with it as such. + */ + rc = lnet_handle_any_mr_dsta(sd); + if (rc != PASS_THROUGH) + return rc; + + /* TODO; One possible enhancement is to run the selection + * algorithm on the peer. However for remote peers the credits are + * not decremented, so we'll be basically going over the peer NIs + * in round robin. An MR router will run the selection algorithm + * on the next-hop interfaces. + */ + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, + &gw_peer); + if (rc < 0) + return rc; + + sd->sd_send_case &= ~LOCAL_DST; + sd->sd_send_case |= REMOTE_DST; + + sd->sd_peer = gw_peer; + sd->sd_best_lpni = gw_lpni; + + return lnet_handle_send(sd); +} + +/* Source not specified + * Remote destination + * Non-MR peer + * + * Must send to the specified peer NID using the same source NID that + * we've used before. If it's the first time to talk to that peer then + * find the source NI and assign it as preferred to that peer + */ +static int +lnet_handle_any_router_nmr_dst(struct lnet_send_data *sd) +{ + int rc; + struct lnet_peer_ni *gw_lpni = NULL; + struct lnet_peer *gw_peer = NULL; + + /* Let's set if we have a preferred NI to talk to this NMR peer + */ + sd->sd_best_ni = lnet_find_existing_preferred_best_ni(sd); + + /* find the router and that'll find the best NI if we didn't find + * it already. 
+ */ + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, + &gw_peer); + if (rc < 0) + return rc; + + /* set the best_ni we've chosen as the preferred one for + * this peer + */ + lnet_set_non_mr_pref_nid(sd); + + /* we'll be sending to the gw */ + sd->sd_best_lpni = gw_lpni; + sd->sd_peer = gw_peer; + + return lnet_handle_send(sd); +} + +static int +lnet_handle_send_case_locked(struct lnet_send_data *sd) +{ + /* Turn off the SND_RESP bit. + * It will be checked in the case handling + */ + u32 send_case = sd->sd_send_case &= ~SND_RESP; + + CDEBUG(D_NET, "Source %s%s to %s %s %s destination\n", + (send_case & SRC_SPEC) ? "Specified: " : "ANY", + (send_case & SRC_SPEC) ? libcfs_nid2str(sd->sd_src_nid) : "", + (send_case & MR_DST) ? "MR: " : "NMR: ", + libcfs_nid2str(sd->sd_dst_nid), + (send_case & LOCAL_DST) ? "local" : "routed"); + + switch (send_case) { + /* For all cases where the source is specified, we should always + * use the destination NID, whether it's an MR destination or not, + * since we're continuing a series of related messages for the + * same RPC + */ + case SRC_SPEC_LOCAL_NMR_DST: + return lnet_handle_spec_local_nmr_dst(sd); + case SRC_SPEC_LOCAL_MR_DST: + return lnet_handle_spec_local_mr_dst(sd); + case SRC_SPEC_ROUTER_NMR_DST: + case SRC_SPEC_ROUTER_MR_DST: + return lnet_handle_spec_router_dst(sd); + case SRC_ANY_LOCAL_NMR_DST: + return lnet_handle_any_local_nmr_dst(sd); + case SRC_ANY_LOCAL_MR_DST: + case SRC_ANY_ROUTER_MR_DST: + return lnet_handle_any_mr_dst(sd); + case SRC_ANY_ROUTER_NMR_DST: + return lnet_handle_any_router_nmr_dst(sd); + default: + CERROR("Unknown send case\n"); + return -1; + } +} + +static int +lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, + struct lnet_msg *msg, lnet_nid_t rtr_nid) +{ + struct lnet_peer_ni *lpni; + struct lnet_peer *peer; + struct lnet_send_data send_data; + int cpt, rc; + int md_cpt; + u32 send_case = 0; + + memset(&send_data, 0, sizeof(send_data)); + + /* get an initial CPT to 
use for locking. The idea here is not to + * serialize the calls to select_pathway, so that as many + * operations can run concurrently as possible. To do that we use + * the CPT where this call is being executed. Later on when we + * determine the CPT to use in lnet_message_commit, we switch the + * lock and check if there was any configuration change. If none, + * then we proceed, if there is, then we restart the operation. + */ + cpt = lnet_net_lock_current(); + + md_cpt = lnet_cpt_of_md(msg->msg_md, msg->msg_offset); + if (md_cpt == CFS_CPT_ANY) + md_cpt = cpt; + +again: + /* If we're being asked to send to the loopback interface, there + * is no need to go through any selection. We can just shortcut + * the entire process and send over lolnd + */ + if (LNET_NETTYP(LNET_NIDNET(dst_nid)) == LOLND) { + /* No send credit hassles with LOLND */ + lnet_ni_addref_locked(the_lnet.ln_loni, cpt); + msg->msg_hdr.dest_nid = cpu_to_le64(the_lnet.ln_loni->ni_nid); + if (!msg->msg_routing) + msg->msg_hdr.src_nid = + cpu_to_le64(the_lnet.ln_loni->ni_nid); + msg->msg_target.nid = the_lnet.ln_loni->ni_nid; + lnet_msg_commit(msg, cpt); + msg->msg_txni = the_lnet.ln_loni; + lnet_net_unlock(cpt); + + return LNET_CREDIT_OK; + } + + /* find an existing peer_ni, or create one and mark it as having been + * created due to network traffic. This call will create the + * peer->peer_net->peer_ni tree. + */ + lpni = lnet_nid2peerni_locked(dst_nid, LNET_NID_ANY, cpt); + if (IS_ERR(lpni)) { + lnet_net_unlock(cpt); + return PTR_ERR(lpni); + } + + /* Now that we have a peer_ni, check if we want to discover + * the peer. Traffic to the LNET_RESERVED_PORTAL should not + * trigger discovery. + */ + peer = lpni->lpni_peer_net->lpn_peer; + if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { + lnet_nid_t primary_nid; + + rc = lnet_discover_peer_locked(lpni, cpt, false); + if (rc) { + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(cpt); + return rc; + } + /* The peer may have changed. 
*/ + peer = lpni->lpni_peer_net->lpn_peer; + /* queue message and return */ + msg->msg_src_nid_param = src_nid; + msg->msg_rtr_nid_param = rtr_nid; + msg->msg_sending = 0; + list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); + lnet_peer_ni_decref_locked(lpni); + primary_nid = peer->lp_primary_nid; + lnet_net_unlock(cpt); + + CDEBUG(D_NET, "%s pending discovery\n", + libcfs_nid2str(primary_nid)); + + return LNET_DC_WAIT; + } + lnet_peer_ni_decref_locked(lpni); + + /* If peer is not healthy then can not send anything to it */ + if (!lnet_is_peer_healthy_locked(peer)) { + lnet_net_unlock(cpt); + return -EHOSTUNREACH; + } + + /* Identify the different send cases + */ + if (src_nid == LNET_NID_ANY) + send_case |= SRC_ANY; + else + send_case |= SRC_SPEC; + + if (lnet_get_net_locked(LNET_NIDNET(dst_nid))) + send_case |= LOCAL_DST; + else + send_case |= REMOTE_DST; + + if (!lnet_peer_is_multi_rail(peer)) + send_case |= NMR_DST; + else + send_case |= MR_DST; + + if (msg->msg_type == LNET_MSG_REPLY || + msg->msg_type == LNET_MSG_ACK) + send_case |= SND_RESP; + + /* assign parameters to the send_data */ + send_data.sd_msg = msg; + send_data.sd_rtr_nid = rtr_nid; + send_data.sd_src_nid = src_nid; + send_data.sd_dst_nid = dst_nid; + send_data.sd_best_lpni = lpni; + /* keep a pointer to the final destination in case we're going to + * route, so we'll need to access it later + */ + send_data.sd_final_dst_lpni = lpni; + send_data.sd_peer = peer; + send_data.sd_md_cpt = md_cpt; + send_data.sd_cpt = cpt; + send_data.sd_send_case = send_case; + + rc = lnet_handle_send_case_locked(&send_data); + + if (rc == REPEAT_SEND) + goto again; + + lnet_net_unlock(send_data.sd_cpt); return rc; } From patchwork Thu Feb 27 21:09:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409791 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 89B1E14BC for ; Thu, 27 Feb 2020 21:22:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 71F99246A0 for ; Thu, 27 Feb 2020 21:22:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 71F99246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9826B34886C; Thu, 27 Feb 2020 13:20:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EB09921FB11 for ; Thu, 27 Feb 2020 13:18:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 21A7BEE4; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2018646F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:04 -0500 Message-Id: <1582838290-17243-77-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 076/622] lnet: add health value per ni X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add a health value per local network interface. The health value reflects the health of the NI. It is initialized to 1000. 1000 is chosen to be able to granularly decrement the health value on error. If the NI is absolutely not healthy that will be indicated by an LND event, which will flag that the NI is down and should never be used. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: d54afb86116c ("LU-9120 lnet: add health value per ni") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32761 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 15 +++++++++++++++ net/lnet/lnet/api-ni.c | 1 + net/lnet/lnet/lib-move.c | 17 +++++++++++------ 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index e9560a9..0ed325a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -52,6 +52,12 @@ #define LNET_MAX_IOV (LNET_MAX_PAYLOAD >> PAGE_SHIFT) +/* + * This is the maximum health value. + * All local and peer NIs created have their health default to this value. + */ +#define LNET_MAX_HEALTH_VALUE 1000 + /* forward refs */ struct lnet_libmd; @@ -388,6 +394,15 @@ struct lnet_ni { u32 ni_seq; /* + * health value + * initialized to LNET_MAX_HEALTH_VALUE + * Value is decremented every time we fail to send a message over + * this NI because of a NI specific failure. + * Value is incremented if we successfully send a message. 
+ */ + atomic_t ni_healthv; + + /* * equivalent interfaces to use * This is an array because socklnd bonding can still be configured */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 8be3354..4e83fa8 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1817,6 +1817,7 @@ static void lnet_push_target_fini(void) atomic_set(&ni->ni_tx_credits, lnet_ni_tq_credits(ni) * ni->ni_ncpts); + atomic_set(&ni->ni_healthv, LNET_MAX_HEALTH_VALUE); CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n", libcfs_nid2str(ni->ni_nid), diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 10aa753..ab32c6f 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1276,6 +1276,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_ni *ni = NULL; unsigned int shortest_distance; int best_credits; + int best_healthv; /* If there is no peer_ni that we can send to on this network, * then there is no point in looking for a new best_ni here. @@ -1286,20 +1287,21 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (!best_ni) { shortest_distance = UINT_MAX; best_credits = INT_MIN; + best_healthv = 0; } else { shortest_distance = cfs_cpt_distance(lnet_cpt_table(), md_cpt, best_ni->ni_dev_cpt); best_credits = atomic_read(&best_ni->ni_tx_credits); + best_healthv = atomic_read(&best_ni->ni_healthv); } while ((ni = lnet_get_next_ni_locked(local_net, ni))) { unsigned int distance; int ni_credits; - - if (!lnet_is_ni_healthy_locked(ni)) - continue; + int ni_healthv; ni_credits = atomic_read(&ni->ni_tx_credits); + ni_healthv = atomic_read(&ni->ni_healthv); /* * calculate the distance from the CPT on which @@ -1325,21 +1327,24 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, distance = lnet_numa_range; /* - * Select on shorter distance, then available + * Select on health, shorter distance, available * credits, then round-robin. 
*/ - if (distance > shortest_distance) { + if (ni_healthv < best_healthv) { + continue; + } else if (distance > shortest_distance) { continue; } else if (distance < shortest_distance) { shortest_distance = distance; } else if (ni_credits < best_credits) { continue; } else if (ni_credits == best_credits) { - if (best_ni && (best_ni)->ni_seq <= ni->ni_seq) + if (best_ni && best_ni->ni_seq <= ni->ni_seq) continue; } best_ni = ni; best_credits = ni_credits; + best_healthv = ni_healthv; } CDEBUG(D_NET, "selected best_ni %s\n", From patchwork Thu Feb 27 21:09:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409813 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56C1E14BC for ; Thu, 27 Feb 2020 21:23:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3F410246A0 for ; Thu, 27 Feb 2020 21:23:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3F410246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E40F83489A8; Thu, 27 Feb 2020 13:21:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4FC7021FAAE for ; Thu, 27 Feb 2020 13:18:39 -0800 (PST) Received: from star.ccs.ornl.gov 
(star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 24D17EE5; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2362A46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:05 -0500 Message-Id: <1582838290-17243-78-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 077/622] lnet: add lnet_health_sensitivity X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add the lnet_health_sensitivity value. This value determines the amount by which the NI health value is decremented on error. The value defaults to 0, which turns the health feature off; the user must enable it explicitly. The assumption is that many sites have only one interface per node, in which case the health feature does not increase the resiliency of their system.
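For illustration only, and not part of this patch: a minimal userspace sketch of how a sensitivity value could be applied to an NI health value. The helper name health_decrement() and the clamping at 0 are assumptions; the patch itself only adds the tunable and does not yet decrement anything.

```c
#include <stdatomic.h>

#define LNET_MAX_HEALTH_VALUE 1000

/* Hypothetical helper (assumption, not in the patch): lower a health
 * value by `sensitivity`, clamping at 0. The value is assumed to stay
 * within [0, LNET_MAX_HEALTH_VALUE]. Returns the new health value.
 */
static int health_decrement(atomic_int *healthv, unsigned int sensitivity)
{
	int old = atomic_load(healthv);
	int next;

	do {
		if (sensitivity >= (unsigned int)old)
			next = 0;
		else
			next = old - (int)sensitivity;
		/* on failure, `old` is reloaded and the loop retries */
	} while (!atomic_compare_exchange_weak(healthv, &old, next));

	return next;
}
```

With the default sensitivity of 0 the health value never changes, which matches the feature being off by default.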
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 63cf744d0fdf ("LU-9120 lnet: add lnet_health_sensitivity") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32762 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + net/lnet/lnet/api-ni.c | 52 +++++++++++++++++++++++++++++++++++++++++++ net/lnet/lnet/lib-move.c | 11 ++++++++- 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 20b4660..5e13d32 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -479,6 +479,7 @@ struct lnet_ni * extern unsigned int lnet_transaction_timeout; extern unsigned int lnet_numa_range; +extern unsigned int lnet_health_sensitivity; extern unsigned int lnet_peer_discovery_disabled; extern int portal_rotor; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 4e83fa8..9d68434 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -78,6 +78,23 @@ struct lnet the_lnet = { MODULE_PARM_DESC(lnet_numa_range, "NUMA range to consider during Multi-Rail selection"); +/* lnet_health_sensitivity determines by how much we decrement the health + * value on sending error. The value defaults to 0, which means health + * checking is turned off by default. 
+ */ +unsigned int lnet_health_sensitivity; +static int sensitivity_set(const char *val, const struct kernel_param *kp); +static struct kernel_param_ops param_ops_health_sensitivity = { + .set = sensitivity_set, + .get = param_get_int, +}; + +#define param_check_health_sensitivity(name, p) \ + __param_check(name, p, int) +module_param(lnet_health_sensitivity, health_sensitivity, 0644); +MODULE_PARM_DESC(lnet_health_sensitivity, + "Value to decrement the health value by on error"); + static int lnet_interfaces_max = LNET_INTERFACES_MAX_DEFAULT; static int intf_max_set(const char *val, const struct kernel_param *kp); module_param_call(lnet_interfaces_max, intf_max_set, param_get_int, @@ -115,6 +132,41 @@ static int lnet_discover(struct lnet_process_id id, u32 force, struct lnet_process_id __user *ids, int n_ids); static int +sensitivity_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned int *sensitivity = (unsigned int *)kp->arg; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_health_sensitivity'\n"); + return rc; + } + + /* The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. 
+ */ + mutex_lock(&the_lnet.ln_api_mutex); + + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + if (value == *sensitivity) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + *sensitivity = value; + + mutex_unlock(&the_lnet.ln_api_mutex); + + return 0; +} + +static int discovery_set(const char *val, const struct kernel_param *kp) { int rc; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index ab32c6f..38815fd 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1332,6 +1332,16 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, */ if (ni_healthv < best_healthv) { continue; + } else if (ni_healthv > best_healthv) { + best_healthv = ni_healthv; + /* If we're going to prefer this ni because it's + * the healthiest, then we should set the + * shortest_distance in the algorithm in case + * there are multiple NIs with the same health but + * different distances. 
+ */ + if (distance < shortest_distance) + shortest_distance = distance; } else if (distance > shortest_distance) { continue; } else if (distance < shortest_distance) { @@ -1344,7 +1354,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } best_ni = ni; best_credits = ni_credits; - best_healthv = ni_healthv; } CDEBUG(D_NET, "selected best_ni %s\n", From patchwork Thu Feb 27 21:09:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409785 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9EDBE159A for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 87CA1246A0 for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 87CA1246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0593C34880A; Thu, 27 Feb 2020 13:20:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A52CB21FAAE for ; Thu, 27 Feb 2020 13:18:39 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 27DA6EE6; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id 2655C468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:06 -0500 Message-Id: <1582838290-17243-79-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 078/622] lnet: add monitor thread X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Refactor the router checker thread into the monitor thread. The monitor thread will check router aliveness, expire messages on the active list, recover local and remote NIs, and resend messages. In this patch it only checks router aliveness. A deadline is also added to each message to track when the message should expire.
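For illustration only: a minimal userspace sketch of the deadline arithmetic, assuming plain nanosecond timestamps. msg_deadline_ns() and msg_expired() are hypothetical names; the patch itself computes the deadline with ktime_add_ns() on ktime_get() and does not yet act on it.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: deadline = commit time + transaction timeout.
 * The timeout is widened to 64 bits before the multiplication.
 */
static int64_t msg_deadline_ns(int64_t now_ns, unsigned int timeout_sec)
{
	return now_ns + (int64_t)timeout_sec * NSEC_PER_SEC;
}

/* Hypothetical helper: a message counts as expired once the current
 * time has reached its deadline (the exact comparison is an assumption).
 */
static bool msg_expired(int64_t now_ns, int64_t deadline_ns)
{
	return now_ns >= deadline_ns;
}
```

The timeout is widened to 64 bits before multiplying so that the product cannot overflow a 32-bit type.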
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: b01e6fce1c98 ("LU-9120 lnet: add monitor thread") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32763 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 11 ++- include/linux/lnet/lib-types.h | 27 +++---- net/lnet/lnet/api-ni.c | 12 ++-- net/lnet/lnet/lib-move.c | 98 ++++++++++++++++++++++++++ net/lnet/lnet/lib-msg.c | 9 ++- net/lnet/lnet/router.c | 156 +++++++++++++---------------------------- 6 files changed, 185 insertions(+), 128 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 5e13d32..2c3f665 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -714,8 +714,15 @@ int lnet_sock_connect(struct socket **sockp, int *fatal, int lnet_peers_start_down(void); int lnet_peer_buffer_credits(struct lnet_net *net); -int lnet_router_checker_start(void); -void lnet_router_checker_stop(void); +int lnet_monitor_thr_start(void); +void lnet_monitor_thr_stop(void); + +bool lnet_router_checker_active(void); +void lnet_check_routers(void); +int lnet_router_pre_mt_start(void); +void lnet_router_post_mt_start(void); +void lnet_prune_rc_data(int wait_unlink); +void lnet_router_cleanup(void); void lnet_router_ni_update_locked(struct lnet_peer_ni *gw, u32 net); void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 0ed325a..e1a56a1 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -79,6 +79,12 @@ struct lnet_msg { lnet_nid_t msg_src_nid_param; lnet_nid_t msg_rtr_nid_param; + /* + * Deadline for the message after which it will be finalized if it + * has not completed. 
+ */ + ktime_t msg_deadline; + /* committed for sending */ unsigned int msg_tx_committed:1; /* CPT # this message committed for sending */ @@ -905,9 +911,9 @@ struct lnet_msg_container { /* Router Checker states */ enum lnet_rc_state { - LNET_RC_STATE_SHUTDOWN, /* not started */ - LNET_RC_STATE_RUNNING, /* started up OK */ - LNET_RC_STATE_STOPPING, /* telling thread to stop */ + LNET_MT_STATE_SHUTDOWN, /* not started */ + LNET_MT_STATE_RUNNING, /* started up OK */ + LNET_MT_STATE_STOPPING, /* telling thread to stop */ }; /* LNet states */ @@ -1014,8 +1020,8 @@ struct lnet { /* discovery startup/shutdown state */ int ln_dc_state; - /* router checker startup/shutdown state */ - enum lnet_rc_state ln_rc_state; + /* monitor thread startup/shutdown state */ + enum lnet_rc_state ln_mt_state; /* router checker's event queue */ struct lnet_handle_eq ln_rc_eqh; /* rcd still pending on net */ @@ -1023,7 +1029,7 @@ struct lnet { /* rcd ready for free */ struct list_head ln_rcd_zombie; /* serialise startup/shutdown */ - struct completion ln_rc_signal; + struct completion ln_mt_signal; struct mutex ln_api_mutex; struct mutex ln_lnd_mutex; @@ -1053,13 +1059,10 @@ struct lnet { */ bool ln_nis_from_mod_params; - /* - * waitq for router checker. As long as there are no routes in - * the list, the router checker will sleep on this queue. when - * routes are added the thread will wake up + /* waitq for the monitor thread. The monitor thread takes care of + * checking routes, timedout messages and resending messages. 
*/ - wait_queue_head_t ln_rc_waitq; - + wait_queue_head_t ln_mt_waitq; }; #endif diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 9d68434..418d65e 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -309,7 +309,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force, spin_lock_init(&the_lnet.ln_eq_wait_lock); spin_lock_init(&the_lnet.ln_msg_resend_lock); init_waitqueue_head(&the_lnet.ln_eq_waitq); - init_waitqueue_head(&the_lnet.ln_rc_waitq); + init_waitqueue_head(&the_lnet.ln_mt_waitq); mutex_init(&the_lnet.ln_lnd_mutex); } @@ -2281,13 +2281,13 @@ void lnet_lib_exit(void) lnet_ping_target_update(pbuf, ping_mdh); - rc = lnet_router_checker_start(); + rc = lnet_monitor_thr_start(); if (rc) goto err_stop_ping; rc = lnet_push_target_init(); if (rc != 0) - goto err_stop_router_checker; + goto err_stop_monitor_thr; rc = lnet_peer_discovery_start(); if (rc != 0) @@ -2302,8 +2302,8 @@ void lnet_lib_exit(void) err_destroy_push_target: lnet_push_target_fini(); -err_stop_router_checker: - lnet_router_checker_stop(); +err_stop_monitor_thr: + lnet_monitor_thr_stop(); err_stop_ping: lnet_ping_target_fini(); err_acceptor_stop: @@ -2353,7 +2353,7 @@ void lnet_lib_exit(void) lnet_router_debugfs_fini(); lnet_peer_discovery_stop(); lnet_push_target_fini(); - lnet_router_checker_stop(); + lnet_monitor_thr_stop(); lnet_ping_target_fini(); /* Teardown fns that use my own API functions BEFORE here */ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 38815fd..418e3ad 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -818,6 +818,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } } + /* unset the tx_delay flag as we're going to send it now */ + msg->msg_tx_delayed = 0; + if (do_send) { lnet_net_unlock(cpt); lnet_ni_send(ni, msg); @@ -914,6 +917,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, msg->msg_niov = rbp->rbp_npages; 
msg->msg_kiov = &rb->rb_kiov[0]; + /* unset the msg_rx_delayed flag since we're receiving the message */ + msg->msg_rx_delayed = 0; + if (do_recv) { int cpt = msg->msg_rx_cpt; @@ -2383,6 +2389,98 @@ struct lnet_ni * return 0; } +static int +lnet_monitor_thread(void *arg) +{ + /* The monitor thread takes care of the following: + * 1. Checks the aliveness of routers + * 2. Checks if there are messages on the resend queue and resends + * them. + * 3. Checks if there are any NIs on the local recovery queue and + * pings them. + * 4. Checks if there are any NIs on the remote recovery queue + * and pings them. + */ + while (the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING) { + if (lnet_router_checker_active()) + lnet_check_routers(); + + /* TODO do we need to check if we should sleep without + * timeout? Technically, an active system will always + * have messages in flight so this check will always + * evaluate to false. And on an idle system do we care + * if we wake up every 1 second? Although, we've seen + * cases where we get a complaint that an idle thread + * is waking up unnecessarily.
+ */ + wait_event_interruptible_timeout(the_lnet.ln_mt_waitq, + false, HZ); + } + + /* clean up the router checker */ + lnet_prune_rc_data(1); + + /* Shutting down */ + the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + + /* signal that the monitor thread is exiting */ + complete(&the_lnet.ln_mt_signal); + + return 0; +} + +int lnet_monitor_thr_start(void) +{ + int rc; + struct task_struct *task; + + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + + init_completion(&the_lnet.ln_mt_signal); + + /* Pre monitor thread start processing */ + rc = lnet_router_pre_mt_start(); + if (rc) + return rc; + + the_lnet.ln_mt_state = LNET_MT_STATE_RUNNING; + task = kthread_run(lnet_monitor_thread, NULL, "monitor_thread"); + if (IS_ERR(task)) { + rc = PTR_ERR(task); + CERROR("Can't start monitor thread: %d\n", rc); + /* block until event callback signals exit */ + wait_for_completion(&the_lnet.ln_mt_signal); + + /* clean up */ + lnet_router_cleanup(); + the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + return -ENOMEM; + } + + /* post monitor thread start processing */ + lnet_router_post_mt_start(); + + return 0; +} + +void lnet_monitor_thr_stop(void) +{ + if (the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN) + return; + + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING); + the_lnet.ln_mt_state = LNET_MT_STATE_STOPPING; + + /* tell the monitor thread that we're shutting down */ + wake_up(&the_lnet.ln_mt_waitq); + + /* block until monitor thread signals that it's done */ + wait_for_completion(&the_lnet.ln_mt_signal); + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + + lnet_router_cleanup(); +} + void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob, u32 msg_type) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index a7062f6..7869b96 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -141,13 +141,17 @@ { struct lnet_msg_container *container = the_lnet.ln_msg_containers[cpt]; struct lnet_counters
*counters = the_lnet.ln_counters[cpt]; + s64 timeout_ns; + + /* set the message deadline */ + timeout_ns = lnet_transaction_timeout * NSEC_PER_SEC; + msg->msg_deadline = ktime_add_ns(ktime_get(), timeout_ns); /* routed message can be committed for both receiving and sending */ LASSERT(!msg->msg_tx_committed); if (msg->msg_sending) { LASSERT(!msg->msg_receiving); - msg->msg_tx_cpt = cpt; msg->msg_tx_committed = 1; if (msg->msg_rx_committed) { /* routed message REPLY */ @@ -161,8 +165,9 @@ } LASSERT(!msg->msg_onactivelist); + msg->msg_onactivelist = 1; - list_add(&msg->msg_activelist, &container->msc_active); + list_add_tail(&msg->msg_activelist, &container->msc_active); counters->msgs_alloc++; if (counters->msgs_alloc > counters->msgs_max) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 278807d..3f9d8c5 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -70,9 +70,6 @@ return net->net_tunables.lct_peer_tx_credits; } -/* forward ref's */ -static int lnet_router_checker(void *); - static int check_routers_before_use; module_param(check_routers_before_use, int, 0444); MODULE_PARM_DESC(check_routers_before_use, "Assume routers are down and ping them before use"); @@ -423,8 +420,8 @@ static void lnet_shuffle_seed(void) if (rnet != rnet2) kfree(rnet); - /* indicate to startup the router checker if configured */ - wake_up(&the_lnet.ln_rc_waitq); + /* kick start the monitor thread to handle the added route */ + wake_up(&the_lnet.ln_mt_waitq); return rc; } @@ -809,7 +806,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) struct lnet_peer_ni *rtr; int all_known; - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING); + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING); for (;;) { int cpt = lnet_net_lock_current(); @@ -1038,7 +1035,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_ni_notify_locked(ni, rtr); if (!lnet_isrouter(rtr) || - the_lnet.ln_rc_state != 
LNET_RC_STATE_RUNNING) { + the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { /* router table changed or router checker is shutting down */ lnet_peer_ni_decref_locked(rtr); return; @@ -1092,14 +1089,9 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_peer_ni_decref_locked(rtr); } -int -lnet_router_checker_start(void) +int lnet_router_pre_mt_start(void) { - struct task_struct *task; int rc; - int eqsz = 0; - - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_SHUTDOWN); if (check_routers_before_use && dead_router_check_interval <= 0) { @@ -1107,27 +1099,17 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) return -EINVAL; } - init_completion(&the_lnet.ln_rc_signal); - rc = LNetEQAlloc(0, lnet_router_checker_event, &the_lnet.ln_rc_eqh); if (rc) { - CERROR("Can't allocate EQ(%d): %d\n", eqsz, rc); + CERROR("Can't allocate EQ(0): %d\n", rc); return -ENOMEM; } - the_lnet.ln_rc_state = LNET_RC_STATE_RUNNING; - task = kthread_run(lnet_router_checker, NULL, "router_checker"); - if (IS_ERR(task)) { - rc = PTR_ERR(task); - CERROR("Can't start router checker thread: %d\n", rc); - /* block until event callback signals exit */ - wait_for_completion(&the_lnet.ln_rc_signal); - rc = LNetEQFree(the_lnet.ln_rc_eqh); - LASSERT(!rc); - the_lnet.ln_rc_state = LNET_RC_STATE_SHUTDOWN; - return -ENOMEM; - } + return 0; +} +void lnet_router_post_mt_start(void) +{ if (check_routers_before_use) { /* * Note that a helpful side-effect of pinging all known routers @@ -1136,33 +1118,17 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) */ lnet_wait_known_routerstate(); } - - return 0; } -void -lnet_router_checker_stop(void) +void lnet_router_cleanup(void) { int rc; - if (the_lnet.ln_rc_state == LNET_RC_STATE_SHUTDOWN) - return; - - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING); - the_lnet.ln_rc_state = LNET_RC_STATE_STOPPING; - /* wakeup the RC thread if it's sleeping */ - wake_up(&the_lnet.ln_rc_waitq); - - /* 
block until event callback signals exit */ - wait_for_completion(&the_lnet.ln_rc_signal); - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_SHUTDOWN); - rc = LNetEQFree(the_lnet.ln_rc_eqh); - LASSERT(!rc); + LASSERT(rc == 0); } -static void -lnet_prune_rc_data(int wait_unlink) +void lnet_prune_rc_data(int wait_unlink) { struct lnet_rc_data *rcd; struct lnet_rc_data *tmp; @@ -1170,7 +1136,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) struct list_head head; int i = 2; - if (likely(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING && + if (likely(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING && list_empty(&the_lnet.ln_rcd_deathrow) && list_empty(&the_lnet.ln_rcd_zombie))) return; @@ -1179,7 +1145,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_net_lock(LNET_LOCK_EX); - if (the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING) { + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { /* router checker is stopping, prune all */ list_for_each_entry(lp, &the_lnet.ln_routers, lpni_rtr_list) { @@ -1242,18 +1208,12 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) } /* - * This function is called to check if the RC should block indefinitely. - * It's called from lnet_router_checker() as well as being passed to - * wait_event_interruptible() to avoid the lost wake_up problem. - * - * When it's called from wait_event_interruptible() it is necessary to - * also not sleep if the rc state is not running to avoid a deadlock - * when the system is shutting down + * This function is called from the monitor thread to check if there are + * any active routers that need to be checked. 
*/ -static inline bool -lnet_router_checker_active(void) +bool lnet_router_checker_active(void) { - if (the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING) + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) return true; /* @@ -1263,70 +1223,54 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) if (the_lnet.ln_routing) return true; + /* if there are routers that need to be cleaned up then do so */ + if (!list_empty(&the_lnet.ln_rcd_deathrow) || + !list_empty(&the_lnet.ln_rcd_zombie)) + return true; + return !list_empty(&the_lnet.ln_routers) && (live_router_check_interval > 0 || dead_router_check_interval > 0); } -static int -lnet_router_checker(void *arg) +void +lnet_check_routers(void) { struct lnet_peer_ni *rtr; + u64 version; + int cpt; + int cpt2; - while (the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING) { - u64 version; - int cpt; - int cpt2; - - cpt = lnet_net_lock_current(); + cpt = lnet_net_lock_current(); rescan: - version = the_lnet.ln_routers_version; + version = the_lnet.ln_routers_version; - list_for_each_entry(rtr, &the_lnet.ln_routers, lpni_rtr_list) { - cpt2 = rtr->lpni_cpt; - if (cpt != cpt2) { - lnet_net_unlock(cpt); - cpt = cpt2; - lnet_net_lock(cpt); - /* the routers list has changed */ - if (version != the_lnet.ln_routers_version) - goto rescan; - } - - lnet_ping_router_locked(rtr); - - /* NB dropped lock */ - if (version != the_lnet.ln_routers_version) { - /* the routers list has changed */ + list_for_each_entry(rtr, &the_lnet.ln_routers, lpni_rtr_list) { + cpt2 = rtr->lpni_cpt; + if (cpt != cpt2) { + lnet_net_unlock(cpt); + cpt = cpt2; + lnet_net_lock(cpt); + /* the routers list has changed */ + if (version != the_lnet.ln_routers_version) goto rescan; - } } - if (the_lnet.ln_routing) - lnet_update_ni_status_locked(); - - lnet_net_unlock(cpt); - - lnet_prune_rc_data(0); /* don't wait for UNLINK */ + lnet_ping_router_locked(rtr); - /* - * if there are any routes then wakeup every second. 
If - * there are no routes then sleep indefinitely until woken - * up by a user adding a route - */ - if (!lnet_router_checker_active()) - wait_event_idle(the_lnet.ln_rc_waitq, - lnet_router_checker_active()); - else - schedule_timeout_idle(HZ); + /* NB dropped lock */ + if (version != the_lnet.ln_routers_version) { + /* the routers list has changed */ + goto rescan; + } } - lnet_prune_rc_data(1); /* wait for UNLINK */ + if (the_lnet.ln_routing) + lnet_update_ni_status_locked(); - the_lnet.ln_rc_state = LNET_RC_STATE_SHUTDOWN; - complete(&the_lnet.ln_rc_signal); - /* The unlink event callback will signal final completion */ - return 0; + lnet_net_unlock(cpt); + + lnet_prune_rc_data(0); /* don't wait for UNLINK */ } void

From patchwork Thu Feb 27 21:09:07 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409819
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:09:07 -0500
Message-Id: <1582838290-17243-80-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 079/622] lnet: handle local ni failure
Cc: Amir Shehata, Lustre Development List

From: Amir Shehata

Added an enumerated type listing the different errors which the LND can propagate up to LNet for further handling. All local timeout errors will trigger a resend if the system is configured for resends. Remote errors will not trigger a resend to avoid creating duplicate message scenario on the receiving end. If a transmit error is encountered where we're sure the message wasn't received by the remote end we will attempt a resend. LNet level logic to handle local NI failure. When the LND finalizes a message lnet_finalize() will check if the message completed successfully, if so it increments the healthv of the local NI, but not beyond the max, and if it failed then it'll decrement the healthv but not below 0 and put the message on the resend queue.
On local NI failure the local NI is placed on a recovery queue. The monitor thread will wake up and resend all the messages pending. The selection algorithm will properly select the local and remote NIs based on the new healthv. The monitor thread will ping each NI on the local recovery queue. On reply it will check if the NIs healthv is back to maximum, if it is then it will remove it from the recovery queue, otherwise it'll keep it there until it's fully recovered. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 70616605dd44 ("LU-9120 lnet: handle local ni failure") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32764 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/api.h | 3 +- include/linux/lnet/lib-lnet.h | 3 + include/linux/lnet/lib-types.h | 54 +++-- net/lnet/lnet/api-ni.c | 30 ++- net/lnet/lnet/config.c | 3 +- net/lnet/lnet/lib-move.c | 516 +++++++++++++++++++++++++++++++++++++++-- net/lnet/lnet/lib-msg.c | 281 +++++++++++++++++++++- net/lnet/lnet/peer.c | 57 ++--- net/lnet/lnet/router.c | 2 +- net/lnet/selftest/rpc.c | 2 +- 10 files changed, 862 insertions(+), 89 deletions(-) diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h index 7cc1d04..a57ecc8 100644 --- a/include/linux/lnet/api.h +++ b/include/linux/lnet/api.h @@ -195,7 +195,8 @@ int LNetGet(lnet_nid_t self, struct lnet_process_id target_in, unsigned int portal_in, u64 match_bits_in, - unsigned int offset_in); + unsigned int offset_in, + bool recovery); /** @} lnet_data */ /** \defgroup lnet_misc Miscellaneous operations. 
diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 2c3f665..965fc5f 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -536,6 +536,8 @@ void lnet_prep_send(struct lnet_msg *msg, int type, struct lnet_process_id target, unsigned int offset, unsigned int len); int lnet_send(lnet_nid_t nid, struct lnet_msg *msg, lnet_nid_t rtr_nid); +int lnet_send_ping(lnet_nid_t dest_nid, struct lnet_handle_md *mdh, int nnis, + void *user_ptr, struct lnet_handle_eq eqh, bool recovery); void lnet_return_tx_credits_locked(struct lnet_msg *msg); void lnet_return_rx_credits_locked(struct lnet_msg *msg); void lnet_schedule_blocked_locked(struct lnet_rtrbufpool *rbp); @@ -623,6 +625,7 @@ void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, void lnet_msg_containers_destroy(void); int lnet_msg_containers_create(void); +char *lnet_health_error2str(enum lnet_msg_hstatus hstatus); char *lnet_msgtyp2str(int type); void lnet_print_hdr(struct lnet_hdr *hdr); int lnet_fail_nid(lnet_nid_t nid, unsigned int threshold); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index e1a56a1..8c3bf34 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -61,6 +61,20 @@ /* forward refs */ struct lnet_libmd; +enum lnet_msg_hstatus { + LNET_MSG_STATUS_OK = 0, + LNET_MSG_STATUS_LOCAL_INTERRUPT, + LNET_MSG_STATUS_LOCAL_DROPPED, + LNET_MSG_STATUS_LOCAL_ABORTED, + LNET_MSG_STATUS_LOCAL_NO_ROUTE, + LNET_MSG_STATUS_LOCAL_ERROR, + LNET_MSG_STATUS_LOCAL_TIMEOUT, + LNET_MSG_STATUS_REMOTE_ERROR, + LNET_MSG_STATUS_REMOTE_DROPPED, + LNET_MSG_STATUS_REMOTE_TIMEOUT, + LNET_MSG_STATUS_NETWORK_TIMEOUT +}; + struct lnet_msg { struct list_head msg_activelist; struct list_head msg_list; /* Q for credits/MD */ @@ -85,6 +99,13 @@ struct lnet_msg { */ ktime_t msg_deadline; + /* The message health status. 
*/ + enum lnet_msg_hstatus msg_health_status; + /* This is a recovery message */ + bool msg_recovery; + /* flag to indicate that we do not want to resend this message */ + bool msg_no_resend; + /* committed for sending */ unsigned int msg_tx_committed:1; /* CPT # this message committed for sending */ @@ -277,18 +298,11 @@ struct lnet_tx_queue { struct list_head tq_delayed; /* delayed TXs */ }; -enum lnet_ni_state { - /* set when NI block is allocated */ - LNET_NI_STATE_INIT = 0, - /* set when NI is started successfully */ - LNET_NI_STATE_ACTIVE, - /* set when LND notifies NI failed */ - LNET_NI_STATE_FAILED, - /* set when LND notifies NI degraded */ - LNET_NI_STATE_DEGRADED, - /* set when shuttding down NI */ - LNET_NI_STATE_DELETING -}; +#define LNET_NI_STATE_INIT (1 << 0) +#define LNET_NI_STATE_ACTIVE (1 << 1) +#define LNET_NI_STATE_FAILED (1 << 2) +#define LNET_NI_STATE_RECOVERY_PENDING (1 << 3) +#define LNET_NI_STATE_DELETING (1 << 4) enum lnet_stats_type { LNET_STATS_TYPE_SEND = 0, @@ -351,6 +365,12 @@ struct lnet_ni { /* chain on the lnet_net structure */ struct list_head ni_netlist; + /* chain on the recovery queue */ + struct list_head ni_recovery; + + /* MD handle for recovery ping */ + struct lnet_handle_md ni_ping_mdh; + /* number of CPTs */ int ni_ncpts; @@ -382,7 +402,7 @@ struct lnet_ni { struct lnet_ni_status *ni_status; /* NI FSM */ - enum lnet_ni_state ni_state; + u32 ni_state; /* per NI LND tunables */ struct lnet_lnd_tunables ni_lnd_tunables; @@ -1063,6 +1083,14 @@ struct lnet { * checking routes, timedout messages and resending messages. 
*/ wait_queue_head_t ln_mt_waitq; + + /* per-cpt resend queues */ + struct list_head **ln_mt_resendqs; + /* local NIs to recover */ + struct list_head ln_mt_localNIRecovq; + /* recovery eq handler */ + struct lnet_handle_eq ln_mt_eqh; + }; #endif diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 418d65e..deef404 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -831,6 +831,7 @@ struct lnet_libhandle * INIT_LIST_HEAD(&the_lnet.ln_dc_request); INIT_LIST_HEAD(&the_lnet.ln_dc_working); INIT_LIST_HEAD(&the_lnet.ln_dc_expired); + INIT_LIST_HEAD(&the_lnet.ln_mt_localNIRecovq); init_waitqueue_head(&the_lnet.ln_dc_waitq); rc = lnet_descriptor_setup(); @@ -1072,8 +1073,7 @@ struct lnet_net * bool lnet_is_ni_healthy_locked(struct lnet_ni *ni) { - if (ni->ni_state == LNET_NI_STATE_ACTIVE || - ni->ni_state == LNET_NI_STATE_DEGRADED) + if (ni->ni_state & LNET_NI_STATE_ACTIVE) return true; return false; @@ -1650,7 +1650,7 @@ static void lnet_push_target_fini(void) list_del_init(&ni->ni_netlist); /* the ni should be in deleting state. 
If it's not it's * a bug */ - LASSERT(ni->ni_state == LNET_NI_STATE_DELETING); + LASSERT(ni->ni_state & LNET_NI_STATE_DELETING); cfs_percpt_for_each(ref, j, ni->ni_refs) { if (!*ref) continue; @@ -1697,7 +1697,10 @@ static void lnet_push_target_fini(void) struct lnet_net *net = ni->ni_net; lnet_net_lock(LNET_LOCK_EX); - ni->ni_state = LNET_NI_STATE_DELETING; + lnet_ni_lock(ni); + ni->ni_state |= LNET_NI_STATE_DELETING; + ni->ni_state &= ~LNET_NI_STATE_ACTIVE; + lnet_ni_unlock(ni); lnet_ni_unlink_locked(ni); lnet_incr_dlc_seq(); lnet_net_unlock(LNET_LOCK_EX); @@ -1789,6 +1792,7 @@ static void lnet_push_target_fini(void) list_for_each_entry_safe(msg, tmp, &resend, msg_list) { list_del_init(&msg->msg_list); + msg->msg_no_resend = true; lnet_finalize(msg, -ECANCELED); } @@ -1827,7 +1831,10 @@ static void lnet_push_target_fini(void) goto failed0; } - ni->ni_state = LNET_NI_STATE_ACTIVE; + lnet_ni_lock(ni); + ni->ni_state |= LNET_NI_STATE_ACTIVE; + ni->ni_state &= ~LNET_NI_STATE_INIT; + lnet_ni_unlock(ni); /* We keep a reference on the loopback net through the loopback NI */ if (net->net_lnd->lnd_type == LOLND) { @@ -2554,11 +2561,17 @@ struct lnet_ni * struct lnet_ni *ni; struct lnet_net *net = mynet; + /* It is possible that the net has been cleaned out while there is + * a message being sent. 
This function accessed the net without + * checking if the list is empty + */ if (!prev) { if (!net) net = list_first_entry(&the_lnet.ln_nets, struct lnet_net, net_list); + if (list_empty(&net->net_ni_list)) + return NULL; ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist); @@ -2580,6 +2593,8 @@ struct lnet_ni * /* get the next net */ net = list_first_entry(&prev->ni_net->net_list, struct lnet_net, net_list); + if (list_empty(&net->net_ni_list)) + return NULL; /* get the ni on it */ ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist); @@ -2587,6 +2602,9 @@ struct lnet_ni * return ni; } + if (list_empty(&prev->ni_netlist)) + return NULL; + /* there are more nis left */ ni = list_first_entry(&prev->ni_netlist, struct lnet_ni, ni_netlist); @@ -3571,7 +3589,7 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, rc = LNetGet(LNET_NID_ANY, mdh, id, LNET_RESERVED_PORTAL, - LNET_PROTO_PING_MATCHBITS, 0); + LNET_PROTO_PING_MATCHBITS, 0, false); if (rc) { /* Don't CERROR; this could be deliberate! 
*/ rc2 = LNetMDUnlink(mdh); diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 0560215..ea62d36 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -442,6 +442,7 @@ struct lnet_net * spin_lock_init(&ni->ni_lock); INIT_LIST_HEAD(&ni->ni_netlist); + INIT_LIST_HEAD(&ni->ni_recovery); ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(), sizeof(*ni->ni_refs[0])); if (!ni->ni_refs) @@ -466,7 +467,7 @@ struct lnet_net * ni->ni_net_ns = NULL; ni->ni_last_alive = ktime_get_real_seconds(); - ni->ni_state = LNET_NI_STATE_INIT; + ni->ni_state |= LNET_NI_STATE_INIT; list_add_tail(&ni->ni_netlist, &net->net_ni_added); /* diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 418e3ad..f3f4b84 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -579,8 +579,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, (msg->msg_txcredit && msg->msg_peertxcredit)); rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg); - if (rc < 0) + if (rc < 0) { + msg->msg_no_resend = true; lnet_finalize(msg, rc); + } } static int @@ -759,8 +761,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, CNETERR("Dropping message for %s: peer not alive\n", libcfs_id2str(msg->msg_target)); - if (do_send) + if (do_send) { + msg->msg_health_status = LNET_MSG_STATUS_LOCAL_DROPPED; lnet_finalize(msg, -EHOSTUNREACH); + } lnet_net_lock(cpt); return -EHOSTUNREACH; @@ -772,8 +776,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, CNETERR("Aborting message for %s: LNetM[DE]Unlink() already called on the MD/ME.\n", libcfs_id2str(msg->msg_target)); - if (do_send) + if (do_send) { + msg->msg_no_resend = true; lnet_finalize(msg, -ECANCELED); + } lnet_net_lock(cpt); return -ECANCELED; @@ -1059,6 +1065,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, lnet_ni_recv(msg->msg_rxni, msg->msg_private, NULL, 0, 0, 0, msg->msg_hdr.payload_length); 
list_del_init(&msg->msg_list); + msg->msg_no_resend = true; lnet_finalize(msg, -ECANCELED); } @@ -2273,6 +2280,14 @@ struct lnet_ni * return PTR_ERR(lpni); } + /* Cache the original src_nid. If we need to resend the message + * then we'll need to know whether the src_nid was originally + * specified for this message. If it was originally specified, + * then we need to keep using the same src_nid since it's + * continuing the same sequence of messages. + */ + msg->msg_src_nid_param = src_nid; + /* Now that we have a peer_ni, check if we want to discover * the peer. Traffic to the LNET_RESERVED_PORTAL should not * trigger discovery. @@ -2290,7 +2305,6 @@ struct lnet_ni * /* The peer may have changed. */ peer = lpni->lpni_peer_net->lpn_peer; /* queue message and return */ - msg->msg_src_nid_param = src_nid; msg->msg_rtr_nid_param = rtr_nid; msg->msg_sending = 0; list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); @@ -2323,7 +2337,11 @@ struct lnet_ni * else send_case |= REMOTE_DST; - if (!lnet_peer_is_multi_rail(peer)) + /* if this is a non-MR peer or if we're recovering a peer ni then + * let's consider this an NMR case so we can hit the destination + * NID. 
+ */ + if (!lnet_peer_is_multi_rail(peer) || msg->msg_recovery) send_case |= NMR_DST; else send_case |= MR_DST; @@ -2370,6 +2388,7 @@ struct lnet_ni * */ /* NB: !ni == interface pre-determined (ACK/REPLY) */ LASSERT(!msg->msg_txpeer); + LASSERT(!msg->msg_txni); LASSERT(!msg->msg_sending); LASSERT(!msg->msg_target_is_router); LASSERT(!msg->msg_receiving); @@ -2389,6 +2408,314 @@ struct lnet_ni * return 0; } +static void +lnet_resend_pending_msgs_locked(struct list_head *resendq, int cpt) +{ + struct lnet_msg *msg; + + while (!list_empty(resendq)) { + struct lnet_peer_ni *lpni; + + msg = list_entry(resendq->next, struct lnet_msg, + msg_list); + + list_del_init(&msg->msg_list); + + lpni = lnet_find_peer_ni_locked(msg->msg_hdr.dest_nid); + if (!lpni) { + lnet_net_unlock(cpt); + CERROR("Expected that a peer is already created for %s\n", + libcfs_nid2str(msg->msg_hdr.dest_nid)); + msg->msg_no_resend = true; + lnet_finalize(msg, -EFAULT); + lnet_net_lock(cpt); + } else { + struct lnet_peer *peer; + int rc; + lnet_nid_t src_nid = LNET_NID_ANY; + + /* if this message is not being routed and the + * peer is non-MR then we must use the same + * src_nid that was used in the original send. + * Otherwise if we're routing the message (IE + * we're a router) then we can use any of our + * local interfaces. It doesn't matter to the + * final destination. + */ + peer = lpni->lpni_peer_net->lpn_peer; + if (!msg->msg_routing && + !lnet_peer_is_multi_rail(peer)) + src_nid = le64_to_cpu(msg->msg_hdr.src_nid); + + /* If we originally specified a src NID, then we + * must attempt to reuse it in the resend as well. 
+ */ + if (msg->msg_src_nid_param != LNET_NID_ANY) + src_nid = msg->msg_src_nid_param; + lnet_peer_ni_decref_locked(lpni); + + lnet_net_unlock(cpt); + rc = lnet_send(src_nid, msg, LNET_NID_ANY); + if (rc) { + CERROR("Error sending %s to %s: %d\n", + lnet_msgtyp2str(msg->msg_type), + libcfs_id2str(msg->msg_target), rc); + msg->msg_no_resend = true; + lnet_finalize(msg, rc); + } + lnet_net_lock(cpt); + } + } +} + +static void +lnet_resend_pending_msgs(void) +{ + int i; + + cfs_cpt_for_each(i, lnet_cpt_table()) { + lnet_net_lock(i); + lnet_resend_pending_msgs_locked(the_lnet.ln_mt_resendqs[i], i); + lnet_net_unlock(i); + } +} + +/* called with cpt and ni_lock held */ +static void +lnet_unlink_ni_recovery_mdh_locked(struct lnet_ni *ni, int cpt) +{ + struct lnet_handle_md recovery_mdh; + + LNetInvalidateMDHandle(&recovery_mdh); + + if (ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING) { + recovery_mdh = ni->ni_ping_mdh; + LNetInvalidateMDHandle(&ni->ni_ping_mdh); + } + lnet_ni_unlock(ni); + lnet_net_unlock(cpt); + if (!LNetMDHandleIsInvalid(recovery_mdh)) + LNetMDUnlink(recovery_mdh); + lnet_net_lock(cpt); + lnet_ni_lock(ni); +} + +static void +lnet_recover_local_nis(void) +{ + struct list_head processed_list; + struct list_head local_queue; + struct lnet_handle_md mdh; + struct lnet_ni *tmp; + struct lnet_ni *ni; + lnet_nid_t nid; + int healthv; + int rc; + + INIT_LIST_HEAD(&local_queue); + INIT_LIST_HEAD(&processed_list); + + /* splice the recovery queue on a local queue. We will iterate + * through the local queue and update it as needed. Once we're + * done with the traversal, we'll splice the local queue back on + * the head of the ln_mt_localNIRecovq. Any newly added local NIs + * will be traversed in the next iteration. 
+ */ + lnet_net_lock(0); + list_splice_init(&the_lnet.ln_mt_localNIRecovq, + &local_queue); + lnet_net_unlock(0); + + list_for_each_entry_safe(ni, tmp, &local_queue, ni_recovery) { + /* if an NI is being deleted or it is now healthy, there + * is no need to keep it around in the recovery queue. + * The monitor thread is the only thread responsible for + * removing the NI from the recovery queue. + * Multiple threads can be adding NIs to the recovery + * queue. + */ + healthv = atomic_read(&ni->ni_healthv); + + lnet_net_lock(0); + lnet_ni_lock(ni); + if (!(ni->ni_state & LNET_NI_STATE_ACTIVE) || + healthv == LNET_MAX_HEALTH_VALUE) { + list_del_init(&ni->ni_recovery); + lnet_unlink_ni_recovery_mdh_locked(ni, 0); + lnet_ni_unlock(ni); + lnet_ni_decref_locked(ni, 0); + lnet_net_unlock(0); + continue; + } + lnet_ni_unlock(ni); + lnet_net_unlock(0); + + /* protect the ni->ni_state field. Once we call the + * lnet_send_ping function it's possible we receive + * a response before we check the rc. The lock ensures + * a stable value for the ni_state RECOVERY_PENDING bit + */ + lnet_ni_lock(ni); + if (!(ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING)) { + ni->ni_state |= LNET_NI_STATE_RECOVERY_PENDING; + lnet_ni_unlock(ni); + mdh = ni->ni_ping_mdh; + /* Invalidate the ni mdh in case it's deleted. + * We'll unlink the mdh in this case below. + */ + LNetInvalidateMDHandle(&ni->ni_ping_mdh); + nid = ni->ni_nid; + + /* remove the NI from the local queue and drop the + * reference count to it while we're recovering + * it. The reason for that, is that the NI could + * be deleted, and the way the code is structured + * is if we don't drop the NI, then the deletion + * code will enter a loop waiting for the + * reference count to be removed while holding the + * ln_mutex_lock(). When we look up the peer to + * send to in lnet_select_pathway() we will try to + * lock the ln_mutex_lock() as well, leading to + * a deadlock. 
By dropping the refcount and + * removing it from the list, we allow for the NI + * to be removed, then we use the cached NID to + * look it up again. If it's gone, then we just + * continue examining the rest of the queue. + */ + lnet_net_lock(0); + list_del_init(&ni->ni_recovery); + lnet_ni_decref_locked(ni, 0); + lnet_net_unlock(0); + + rc = lnet_send_ping(nid, &mdh, + LNET_INTERFACES_MIN, (void *)nid, + the_lnet.ln_mt_eqh, true); + /* lookup the nid again */ + lnet_net_lock(0); + ni = lnet_nid2ni_locked(nid, 0); + if (!ni) { + /* the NI has been deleted when we dropped + * the ref count + */ + lnet_net_unlock(0); + LNetMDUnlink(mdh); + continue; + } + /* Same note as in lnet_recover_peer_nis(). When + * we're sending the ping, the NI is free to be + * deleted or manipulated. By this point it + * could've been added back on the recovery queue, + * and a refcount taken on it. + * So we can't just add it blindly again or we'll + * corrupt the queue. We must check under lock if + * it's not on any list and if not then add it + * to the processed list, which will eventually be + * spliced back on to the recovery queue. + */ + ni->ni_ping_mdh = mdh; + if (list_empty(&ni->ni_recovery)) { + list_add_tail(&ni->ni_recovery, + &processed_list); + lnet_ni_addref_locked(ni, 0); + } + lnet_net_unlock(0); + + lnet_ni_lock(ni); + if (rc) + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + } + lnet_ni_unlock(ni); + } + + /* put back the remaining NIs on the ln_mt_localNIRecovq to be + * reexamined in the next iteration. 
+ */ + list_splice_init(&processed_list, &local_queue); + lnet_net_lock(0); + list_splice(&local_queue, &the_lnet.ln_mt_localNIRecovq); + lnet_net_unlock(0); +} + +static struct list_head ** +lnet_create_array_of_queues(void) +{ + struct list_head **qs; + struct list_head *q; + int i; + + qs = cfs_percpt_alloc(lnet_cpt_table(), + sizeof(struct list_head)); + if (!qs) { + CERROR("Failed to allocate queues\n"); + return NULL; + } + + cfs_percpt_for_each(q, i, qs) + INIT_LIST_HEAD(q); + + return qs; +} + +static int +lnet_resendqs_create(void) +{ + struct list_head **resendqs; + + resendqs = lnet_create_array_of_queues(); + if (!resendqs) + return -ENOMEM; + + lnet_net_lock(LNET_LOCK_EX); + the_lnet.ln_mt_resendqs = resendqs; + lnet_net_unlock(LNET_LOCK_EX); + + return 0; +} + +static void +lnet_clean_local_ni_recoveryq(void) +{ + struct lnet_ni *ni; + + /* This is only called when the monitor thread has stopped */ + lnet_net_lock(0); + + while (!list_empty(&the_lnet.ln_mt_localNIRecovq)) { + ni = list_entry(the_lnet.ln_mt_localNIRecovq.next, + struct lnet_ni, ni_recovery); + list_del_init(&ni->ni_recovery); + lnet_ni_lock(ni); + lnet_unlink_ni_recovery_mdh_locked(ni, 0); + lnet_ni_unlock(ni); + lnet_ni_decref_locked(ni, 0); + } + + lnet_net_unlock(0); +} + +static void +lnet_clean_resendqs(void) +{ + struct lnet_msg *msg, *tmp; + struct list_head msgs; + int i; + + INIT_LIST_HEAD(&msgs); + + cfs_cpt_for_each(i, lnet_cpt_table()) { + lnet_net_lock(i); + list_splice_init(the_lnet.ln_mt_resendqs[i], &msgs); + lnet_net_unlock(i); + list_for_each_entry_safe(msg, tmp, &msgs, msg_list) { + list_del_init(&msg->msg_list); + msg->msg_no_resend = true; + lnet_finalize(msg, -ESHUTDOWN); + } + } + + cfs_percpt_free(the_lnet.ln_mt_resendqs); +} + static int lnet_monitor_thread(void *arg) { @@ -2405,6 +2732,10 @@ struct lnet_ni * if (lnet_router_checker_active()) lnet_check_routers(); + lnet_resend_pending_msgs(); + + lnet_recover_local_nis(); + /* TODO do we need to check if we 
should sleep without * timeout? Technically, an active system will always * have messages in flight so this check will always @@ -2429,42 +2760,180 @@ struct lnet_ni * return 0; } -int lnet_monitor_thr_start(void) +/* lnet_send_ping + * Sends a ping. + * Returns == 0 if success + * Returns > 0 if LNetMDBind or prior fails + * Returns < 0 if LNetGet fails + */ +int +lnet_send_ping(lnet_nid_t dest_nid, + struct lnet_handle_md *mdh, int nnis, + void *user_data, struct lnet_handle_eq eqh, bool recovery) { + struct lnet_md md = { NULL }; + struct lnet_process_id id; + struct lnet_ping_buffer *pbuf; int rc; + + if (dest_nid == LNET_NID_ANY) { + rc = -EHOSTUNREACH; + goto fail_error; + } + + pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); + if (!pbuf) { + rc = ENOMEM; + goto fail_error; + } + + /* initialize md content */ + md.start = &pbuf->pb_info; + md.length = LNET_PING_INFO_SIZE(nnis); + md.threshold = 2; /* GET/REPLY */ + md.max_size = 0; + md.options = LNET_MD_TRUNCATE; + md.user_ptr = user_data; + md.eq_handle = eqh; + + rc = LNetMDBind(md, LNET_UNLINK, mdh); + if (rc) { + lnet_ping_buffer_decref(pbuf); + CERROR("Can't bind MD: %d\n", rc); + rc = -rc; /* change the rc to positive */ + goto fail_error; + } + id.pid = LNET_PID_LUSTRE; + id.nid = dest_nid; + + rc = LNetGet(LNET_NID_ANY, *mdh, id, + LNET_RESERVED_PORTAL, + LNET_PROTO_PING_MATCHBITS, 0, recovery); + if (rc) + goto fail_unlink_md; + + return 0; + +fail_unlink_md: + LNetMDUnlink(*mdh); + LNetInvalidateMDHandle(mdh); +fail_error: + return rc; +} + +static void +lnet_mt_event_handler(struct lnet_event *event) +{ + lnet_nid_t nid = (lnet_nid_t)event->md.user_ptr; + struct lnet_ni *ni; + struct lnet_ping_buffer *pbuf; + + /* TODO: remove assert */ + LASSERT(event->type == LNET_EVENT_REPLY || + event->type == LNET_EVENT_SEND || + event->type == LNET_EVENT_UNLINK); + + CDEBUG(D_NET, "Received event: %d status: %d\n", event->type, + event->status); + + switch (event->type) { + case LNET_EVENT_REPLY: + /* If the 
NI has been restored completely then remove from + * the recovery queue + */ + lnet_net_lock(0); + ni = lnet_nid2ni_locked(nid, 0); + if (!ni) { + lnet_net_unlock(0); + break; + } + lnet_ni_lock(ni); + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + lnet_ni_unlock(ni); + lnet_net_unlock(0); + break; + case LNET_EVENT_SEND: + CDEBUG(D_NET, "%s recovery message sent %s:%d\n", + libcfs_nid2str(nid), + (event->status) ? "unsuccessfully" : + "successfully", event->status); + break; + case LNET_EVENT_UNLINK: + /* nothing to do */ + CDEBUG(D_NET, "%s recovery ping unlinked\n", + libcfs_nid2str(nid)); + break; + default: + CERROR("Unexpected event: %d\n", event->type); + return; + } + if (event->unlinked) { + pbuf = LNET_PING_INFO_TO_BUFFER(event->md.start); + lnet_ping_buffer_decref(pbuf); + } +} + +int lnet_monitor_thr_start(void) +{ + int rc = 0; struct task_struct *task; - LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + if (the_lnet.ln_mt_state != LNET_MT_STATE_SHUTDOWN) + return -EALREADY; - init_completion(&the_lnet.ln_mt_signal); + rc = lnet_resendqs_create(); + if (rc) + return rc; + + rc = LNetEQAlloc(0, lnet_mt_event_handler, &the_lnet.ln_mt_eqh); + if (rc != 0) { + CERROR("Can't allocate monitor thread EQ: %d\n", rc); + goto clean_queues; + } /* Pre monitor thread start processing */ rc = lnet_router_pre_mt_start(); - if (!rc) - return rc; + if (rc) + goto free_mem; + + init_completion(&the_lnet.ln_mt_signal); the_lnet.ln_mt_state = LNET_MT_STATE_RUNNING; task = kthread_run(lnet_monitor_thread, NULL, "monitor_thread"); if (IS_ERR(task)) { rc = PTR_ERR(task); CERROR("Can't start monitor thread: %d\n", rc); - /* block until event callback signals exit */ - wait_for_completion(&the_lnet.ln_mt_signal); - - /* clean up */ - lnet_router_cleanup(); - the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; - return -ENOMEM; + goto clean_thread; } /* post monitor thread start processing */ lnet_router_post_mt_start(); return 0; + +clean_thread: + 
the_lnet.ln_mt_state = LNET_MT_STATE_STOPPING; + /* block until event callback signals exit */ + wait_for_completion(&the_lnet.ln_mt_signal); + /* clean up */ + lnet_router_cleanup(); +free_mem: + the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + lnet_clean_resendqs(); + lnet_clean_local_ni_recoveryq(); + LNetEQFree(the_lnet.ln_mt_eqh); + LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); + return rc; +clean_queues: + lnet_clean_resendqs(); + lnet_clean_local_ni_recoveryq(); + return rc; } void lnet_monitor_thr_stop(void) { + int rc; + if (the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN) return; @@ -2478,7 +2947,12 @@ void lnet_monitor_thr_stop(void) wait_for_completion(&the_lnet.ln_mt_signal); LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + /* perform cleanup tasks */ lnet_router_cleanup(); + lnet_clean_resendqs(); + lnet_clean_local_ni_recoveryq(); + rc = LNetEQFree(the_lnet.ln_mt_eqh); + LASSERT(rc == 0); } void @@ -3173,6 +3647,8 @@ void lnet_monitor_thr_stop(void) lnet_drop_message(msg->msg_rxni, msg->msg_rx_cpt, msg->msg_private, msg->msg_len, msg->msg_type); + + msg->msg_no_resend = true; /* * NB: message will not generate event because w/o attached MD, * but we still should give error code so lnet_msg_decommit() @@ -3338,6 +3814,7 @@ void lnet_monitor_thr_stop(void) if (rc) { CNETERR("Error sending PUT to %s: %d\n", libcfs_id2str(target), rc); + msg->msg_no_resend = true; lnet_finalize(msg, rc); } @@ -3476,7 +3953,7 @@ struct lnet_msg * int LNetGet(lnet_nid_t self, struct lnet_handle_md mdh, struct lnet_process_id target, unsigned int portal, - u64 match_bits, unsigned int offset) + u64 match_bits, unsigned int offset, bool recovery) { struct lnet_msg *msg; struct lnet_libmd *md; @@ -3499,6 +3976,8 @@ struct lnet_msg * return -ENOMEM; } + msg->msg_recovery = recovery; + cpt = lnet_cpt_of_cookie(mdh.cookie); lnet_res_lock(cpt); @@ -3542,6 +4021,7 @@ struct lnet_msg * if (rc < 0) { CNETERR("Error sending GET to %s: %d\n", libcfs_id2str(target), rc); + 
msg->msg_no_resend = true; lnet_finalize(msg, rc); } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 7869b96..e7f7469 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -469,6 +469,234 @@ return 0; } +static void +lnet_dec_healthv_locked(atomic_t *healthv) +{ + int h = atomic_read(healthv); + + if (h < lnet_health_sensitivity) { + atomic_set(healthv, 0); + } else { + h -= lnet_health_sensitivity; + atomic_set(healthv, h); + } +} + +static inline void +lnet_inc_healthv(atomic_t *healthv) +{ + atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE); +} + +static void +lnet_handle_local_failure(struct lnet_msg *msg) +{ + struct lnet_ni *local_ni; + + local_ni = msg->msg_txni; + + /* the lnet_net_lock(0) is used to protect the addref on the ni + * and the recovery queue. + */ + lnet_net_lock(0); + /* the mt could've shutdown and cleaned up the queues */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { + lnet_net_unlock(0); + return; + } + + lnet_dec_healthv_locked(&local_ni->ni_healthv); + /* add the NI to the recovery queue if it's not already there + * and it's health value is actually below the maximum. It's + * possible that the sensitivity might be set to 0, and the health + * value will not be reduced. In this case, there is no reason to + * invoke recovery + */ + if (list_empty(&local_ni->ni_recovery) && + atomic_read(&local_ni->ni_healthv) < LNET_MAX_HEALTH_VALUE) { + CERROR("ni %s added to recovery queue. 
Health = %d\n", + libcfs_nid2str(local_ni->ni_nid), + atomic_read(&local_ni->ni_healthv)); + list_add_tail(&local_ni->ni_recovery, + &the_lnet.ln_mt_localNIRecovq); + lnet_ni_addref_locked(local_ni, 0); + } + lnet_net_unlock(0); +} + +/* Do a health check on the message: + * return -1 if we're not going to handle the error + * success case will return -1 as well + * return 0 if it the message is requeued for send + */ +static int +lnet_health_check(struct lnet_msg *msg) +{ + enum lnet_msg_hstatus hstatus = msg->msg_health_status; + + /* TODO: lnet_incr_hstats(hstatus); */ + + LASSERT(msg->msg_txni); + + if (hstatus != LNET_MSG_STATUS_OK && + ktime_compare(ktime_get(), msg->msg_deadline) >= 0) + return -1; + + /* if we're shutting down no point in handling health. */ + if (the_lnet.ln_state != LNET_STATE_RUNNING) + return -1; + + switch (hstatus) { + case LNET_MSG_STATUS_OK: + lnet_inc_healthv(&msg->msg_txni->ni_healthv); + /* we can finalize this message */ + return -1; + case LNET_MSG_STATUS_LOCAL_INTERRUPT: + case LNET_MSG_STATUS_LOCAL_DROPPED: + case LNET_MSG_STATUS_LOCAL_ABORTED: + case LNET_MSG_STATUS_LOCAL_NO_ROUTE: + case LNET_MSG_STATUS_LOCAL_TIMEOUT: + lnet_handle_local_failure(msg); + /* add to the re-send queue */ + goto resend; + + /* TODO: since the remote dropped the message we can + * attempt a resend safely. 
+ */ + case LNET_MSG_STATUS_REMOTE_DROPPED: + break; + + /* These errors will not trigger a resend so simply + * finalize the message + */ + case LNET_MSG_STATUS_LOCAL_ERROR: + lnet_handle_local_failure(msg); + return -1; + case LNET_MSG_STATUS_REMOTE_ERROR: + case LNET_MSG_STATUS_REMOTE_TIMEOUT: + case LNET_MSG_STATUS_NETWORK_TIMEOUT: + return -1; + } + +resend: + /* don't resend recovery messages */ + if (msg->msg_recovery) + return -1; + + /* if we explicitly indicated we don't want to resend then just + * return + */ + if (msg->msg_no_resend) + return -1; + + lnet_net_lock(msg->msg_tx_cpt); + + /* remove message from the active list and reset it in preparation + * for a resend. Two exception to this + * + * 1. the router case, when a message is committed for rx when + * received, then tx when it is sent. When committed to both tx and + * rx we don't want to remove it from the active list. + * + * 2. The REPLY case since it uses the same msg block for the GET + * that was received. + */ + if (!msg->msg_routing && msg->msg_type != LNET_MSG_REPLY) { + list_del_init(&msg->msg_activelist); + msg->msg_onactivelist = 0; + } + + /* The msg_target.nid which was originally set + * when calling LNetGet() or LNetPut() might've + * been overwritten if we're routing this message. + * Call lnet_return_tx_credits_locked() to return + * the credit this message consumed. The message will + * consume another credit when it gets resent. 
+ */ + msg->msg_target.nid = msg->msg_hdr.dest_nid; + lnet_msg_decommit_tx(msg, -EAGAIN); + msg->msg_sending = 0; + msg->msg_receiving = 0; + msg->msg_target_is_router = 0; + + CDEBUG(D_NET, "%s->%s:%s:%s - queuing for resend\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + lnet_msgtyp2str(msg->msg_type), + lnet_health_error2str(hstatus)); + + list_add_tail(&msg->msg_list, the_lnet.ln_mt_resendqs[msg->msg_tx_cpt]); + lnet_net_unlock(msg->msg_tx_cpt); + + wake_up(&the_lnet.ln_mt_waitq); + return 0; +} + +static void +lnet_detach_md(struct lnet_msg *msg, int status) +{ + int cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); + + lnet_res_lock(cpt); + lnet_msg_detach_md(msg, status); + lnet_res_unlock(cpt); +} + +static bool +lnet_is_health_check(struct lnet_msg *msg) +{ + bool hc; + int status = msg->msg_ev.status; + + /* perform a health check for any message committed for transmit */ + hc = msg->msg_tx_committed; + + /* Check for status inconsistencies */ + if (hc && + ((!status && msg->msg_health_status != LNET_MSG_STATUS_OK) || + (status && msg->msg_health_status == LNET_MSG_STATUS_OK))) { + CERROR("Msg is in inconsistent state, don't perform health checking (%d, %d)\n", + status, msg->msg_health_status); + hc = false; + } + + CDEBUG(D_NET, "health check = %d, status = %d, hstatus = %d\n", + hc, status, msg->msg_health_status); + + return hc; +} + +char * +lnet_health_error2str(enum lnet_msg_hstatus hstatus) +{ + switch (hstatus) { + case LNET_MSG_STATUS_LOCAL_INTERRUPT: + return "LOCAL_INTERRUPT"; + case LNET_MSG_STATUS_LOCAL_DROPPED: + return "LOCAL_DROPPED"; + case LNET_MSG_STATUS_LOCAL_ABORTED: + return "LOCAL_ABORTED"; + case LNET_MSG_STATUS_LOCAL_NO_ROUTE: + return "LOCAL_NO_ROUTE"; + case LNET_MSG_STATUS_LOCAL_TIMEOUT: + return "LOCAL_TIMEOUT"; + case LNET_MSG_STATUS_LOCAL_ERROR: + return "LOCAL_ERROR"; + case LNET_MSG_STATUS_REMOTE_DROPPED: + return "REMOTE_DROPPED"; + case LNET_MSG_STATUS_REMOTE_ERROR: + return 
"REMOTE_ERROR"; + case LNET_MSG_STATUS_REMOTE_TIMEOUT: + return "REMOTE_TIMEOUT"; + case LNET_MSG_STATUS_NETWORK_TIMEOUT: + return "NETWORK_TIMEOUT"; + case LNET_MSG_STATUS_OK: + return "OK"; + default: + return ""; + } +} + void lnet_finalize(struct lnet_msg *msg, int status) { @@ -477,6 +705,7 @@ int cpt; int rc; int i; + bool hc; LASSERT(!in_interrupt()); @@ -485,15 +714,27 @@ msg->msg_ev.status = status; - if (msg->msg_md) { - cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); - - lnet_res_lock(cpt); - lnet_msg_detach_md(msg, status); - lnet_res_unlock(cpt); - } + /* if the message is successfully sent, no need to keep the MD around */ + if (msg->msg_md && !status) + lnet_detach_md(msg, status); again: + hc = lnet_is_health_check(msg); + + /* the MD would've been detached from the message if it was + * successfully sent. However, if it wasn't successfully sent the + * MD would be around. And since we recalculate whether to + * health check or not, it's possible that we change our minds and + * we don't want to health check this message. In this case also + * free the MD. + * + * If the message is successful we're going to + * go through the lnet_health_check() function, but that'll just + * increment the appropriate health value and return. + */ + if (msg->msg_md && !hc) + lnet_detach_md(msg, status); + rc = 0; if (!msg->msg_tx_committed && !msg->msg_rx_committed) { /* not committed to network yet */ @@ -502,6 +743,28 @@ return; } + if (hc) { + /* Check the health status of the message. If it has one + * of the errors that we're supposed to handle, and it has + * not timed out, then + * 1. Decrement the appropriate health_value + * 2. queue the message on the resend queue + * + * if the message send is success, timed out or failed in the + * health check for any reason then we'll just finalize the + * message. Otherwise just return since the message has been + * put on the resend queue. 
+ */ + if (!lnet_health_check(msg)) + return; + + /* if we get here then we need to clean up the md because we're + * finalizing the message. + */ + if (msg->msg_md) + lnet_detach_md(msg, status); + } + /* * NB: routed message can be committed for both receiving and sending, * we should finalize in LIFO order and keep counters correct. @@ -536,7 +799,7 @@ while ((msg = list_first_entry_or_null(&container->msc_finalizing, struct lnet_msg, msg_list)) != NULL) { - list_del(&msg->msg_list); + list_del_init(&msg->msg_list); /* * NB drops and regains the lnet lock if it actually does @@ -575,7 +838,7 @@ msg_activelist)) != NULL) { LASSERT(msg->msg_onactivelist); msg->msg_onactivelist = 0; - list_del(&msg->msg_activelist); + list_del_init(&msg->msg_activelist); kfree(msg); count++; } diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 1534ab2..121876e 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2713,9 +2713,7 @@ static lnet_nid_t lnet_peer_select_nid(struct lnet_peer *lp) static int lnet_peer_send_ping(struct lnet_peer *lp) __must_hold(&lp->lp_lock) { - struct lnet_md md = { NULL }; - struct lnet_process_id id; - struct lnet_ping_buffer *pbuf; + lnet_nid_t pnid; int nnis; int rc; int cpt; @@ -2724,54 +2722,35 @@ static int lnet_peer_send_ping(struct lnet_peer *lp) lp->lp_state &= ~LNET_PEER_FORCE_PING; spin_unlock(&lp->lp_lock); - nnis = max_t(int, lp->lp_data_nnis, LNET_INTERFACES_MIN); - pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); - if (!pbuf) { - rc = -ENOMEM; - goto fail_error; - } - - /* initialize md content */ - md.start = &pbuf->pb_info; - md.length = LNET_PING_INFO_SIZE(nnis); - md.threshold = 2; /* GET/REPLY */ - md.max_size = 0; - md.options = LNET_MD_TRUNCATE; - md.user_ptr = lp; - md.eq_handle = the_lnet.ln_dc_eqh; - - rc = LNetMDBind(md, LNET_UNLINK, &lp->lp_ping_mdh); - if (rc != 0) { - lnet_ping_buffer_decref(pbuf); - CERROR("Can't bind MD: %d\n", rc); - goto fail_error; - } cpt = lnet_net_lock_current(); /* Refcount for 
MD. */ lnet_peer_addref_locked(lp); - id.pid = LNET_PID_LUSTRE; - id.nid = lnet_peer_select_nid(lp); + pnid = lnet_peer_select_nid(lp); lnet_net_unlock(cpt); - if (id.nid == LNET_NID_ANY) { - rc = -EHOSTUNREACH; - goto fail_unlink_md; - } + nnis = max_t(int, lp->lp_data_nnis, LNET_INTERFACES_MIN); - rc = LNetGet(LNET_NID_ANY, lp->lp_ping_mdh, id, - LNET_RESERVED_PORTAL, - LNET_PROTO_PING_MATCHBITS, 0); - if (rc) - goto fail_unlink_md; + rc = lnet_send_ping(pnid, &lp->lp_ping_mdh, nnis, lp, + the_lnet.ln_dc_eqh, false); + /* if LNetMDBind in lnet_send_ping fails we need to decrement the + * refcount on the peer, otherwise LNetMDUnlink will be called + * which will eventually do that. + */ + if (rc > 0) { + lnet_net_lock(cpt); + lnet_peer_decref_locked(lp); + lnet_net_unlock(cpt); + rc = -rc; /* change the rc to negative value */ + goto fail_error; + } else if (rc < 0) { + goto fail_error; + } CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); spin_lock(&lp->lp_lock); return 0; -fail_unlink_md: - LNetMDUnlink(lp->lp_ping_mdh); - LNetInvalidateMDHandle(&lp->lp_ping_mdh); fail_error: CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); /* diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 3f9d8c5..7c3bbd8 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1079,7 +1079,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_net_unlock(rtr->lpni_cpt); rc = LNetGet(LNET_NID_ANY, mdh, id, LNET_RESERVED_PORTAL, - LNET_PROTO_PING_MATCHBITS, 0); + LNET_PROTO_PING_MATCHBITS, 0, false); lnet_net_lock(rtr->lpni_cpt); if (rc) diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index 295d704..a5941e4 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -425,7 +425,7 @@ struct srpc_bulk * } else { LASSERT(options & LNET_MD_OP_GET); - rc = LNetGet(self, *mdh, peer, portal, matchbits, 0); + rc = LNetGet(self, *mdh, peer, portal, matchbits, 0, false); } if (rc) { 
From patchwork Thu Feb 27 21:09:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409795 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:08 -0500 Message-Id: <1582838290-17243-81-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 080/622] lnet: handle o2iblnd tx failure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Monitor the different types of failures that might occur on the transmit and flag the type of failure to be propagated to LNet which will handle either by attempting a resend or simply finalizing the message and propagating a failure to the ULP. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 8cf835e425d8 ("LU-9120 lnet: handle o2iblnd tx failure") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32765 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 2 +- net/lnet/klnds/o2iblnd/o2iblnd.h | 4 ++- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 59 ++++++++++++++++++++++++++++++++----- 3 files changed, 55 insertions(+), 10 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 825fe30..017fe5f 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -519,7 +519,7 @@ static int kiblnd_del_peer(struct lnet_ni *ni, lnet_nid_t nid) write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); - kiblnd_txlist_done(&zombies, -EIO); + kiblnd_txlist_done(&zombies, -EIO, LNET_MSG_STATUS_LOCAL_ERROR); return rc; } diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 9021051..999b58d 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -515,6 +515,7 @@ struct kib_tx { /* transmit message */ short tx_queued; /* queued for sending */ short 
tx_waiting; /* waiting for peer_ni */ int tx_status; /* LNET completion status */ + enum lnet_msg_hstatus tx_hstatus; /* health status of the transmit */ ktime_t tx_deadline; /* completion deadline */ u64 tx_cookie; /* completion cookie */ struct lnet_msg *tx_lntmsg[2]; /* lnet msgs to finalize on completion */ @@ -1027,7 +1028,8 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, void kiblnd_close_conn_locked(struct kib_conn *conn, int error); void kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid); -void kiblnd_txlist_done(struct list_head *txlist, int status); +void kiblnd_txlist_done(struct list_head *txlist, int status, + enum lnet_msg_hstatus hstatus); void kiblnd_qp_event(struct ib_event *event, void *arg); void kiblnd_cq_event(struct ib_event *event, void *arg); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 60706b4..007058a 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -89,12 +89,17 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, if (!lntmsg[i]) continue; + /* propagate health status to LNet for requests */ + if (i == 0 && lntmsg[i]) + lntmsg[i]->msg_health_status = tx->tx_hstatus; + lnet_finalize(lntmsg[i], rc); } } void -kiblnd_txlist_done(struct list_head *txlist, int status) +kiblnd_txlist_done(struct list_head *txlist, int status, + enum lnet_msg_hstatus hstatus) { struct kib_tx *tx; @@ -105,6 +110,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, /* complete now */ tx->tx_waiting = 0; tx->tx_status = status; + tx->tx_hstatus = hstatus; kiblnd_tx_done(tx); } } @@ -134,6 +140,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, LASSERT(!tx->tx_nfrags); tx->tx_gaps = false; + tx->tx_hstatus = LNET_MSG_STATUS_OK; return tx; } @@ -265,10 +272,12 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, 
} if (!tx->tx_status) { /* success so far */ - if (status < 0) /* failed? */ + if (status < 0) { /* failed? */ tx->tx_status = status; - else if (txtype == IBLND_MSG_GET_REQ) + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_ERROR; + } else if (txtype == IBLND_MSG_GET_REQ) { lnet_set_reply_msg_len(ni, tx->tx_lntmsg[1], status); + } } tx->tx_waiting = 0; @@ -846,6 +855,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, * posted NOOPs complete */ spin_unlock(&conn->ibc_lock); + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); spin_lock(&conn->ibc_lock); CDEBUG(D_NET, "%s(%d): redundant or enough NOOP\n", @@ -1045,6 +1055,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, conn->ibc_noops_posted--; if (failed) { + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_DROPPED; tx->tx_waiting = 0; /* don't wait for peer_ni */ tx->tx_status = -EIO; } @@ -1393,7 +1404,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, CWARN("Abort reconnection of %s: %s\n", libcfs_nid2str(peer_ni->ibp_nid), reason); - kiblnd_txlist_done(&txs, -ECONNABORTED); + kiblnd_txlist_done(&txs, -ECONNABORTED, + LNET_MSG_STATUS_LOCAL_ABORTED); return false; } @@ -1471,6 +1483,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (tx) { tx->tx_status = -EHOSTUNREACH; tx->tx_waiting = 0; + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); } return; @@ -1607,6 +1620,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (rc) { CERROR("Can't setup GET sink for %s: %d\n", libcfs_nid2str(target.nid), rc); + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); return -EIO; } @@ -1757,6 +1771,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, return; failed_1: + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); failed_0: lnet_finalize(lntmsg, -EIO); @@ -1839,6 +1854,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (rc) { CERROR("Can't setup PUT sink for %s: %d\n", 
libcfs_nid2str(conn->ibc_peer->ibp_nid), rc); + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); /* tell peer_ni it's over */ kiblnd_send_completion(rx->rx_conn, IBLND_MSG_PUT_NAK, @@ -2050,13 +2066,34 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (txs == &conn->ibc_active_txs) { LASSERT(!tx->tx_queued); LASSERT(tx->tx_waiting || tx->tx_sending); + if (conn->ibc_comms_error == -ETIMEDOUT) { + if (tx->tx_waiting && !tx->tx_sending) + tx->tx_hstatus = + LNET_MSG_STATUS_REMOTE_TIMEOUT; + else if (tx->tx_sending) + tx->tx_hstatus = + LNET_MSG_STATUS_NETWORK_TIMEOUT; + } } else { LASSERT(tx->tx_queued); + if (conn->ibc_comms_error == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; + else + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; } tx->tx_status = -ECONNABORTED; tx->tx_waiting = 0; + /* TODO: This makes an assumption that + * kiblnd_tx_complete() will be called for each tx. If + * that event is dropped we could end up with stale + * connections floating around. We'd like to deal with + * that in a better way. + * + * Also that means we can exceed the timeout by many + * seconds. + */ if (!tx->tx_sending) { tx->tx_queued = 0; list_del(&tx->tx_list); @@ -2066,7 +2103,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, spin_unlock(&conn->ibc_lock); - kiblnd_txlist_done(&zombies, -ECONNABORTED); + /* aborting transmits occurs when finalizing the connection. 
+ * The connection is finalized on error + */ + kiblnd_txlist_done(&zombies, -ECONNABORTED, -1); } static void @@ -2147,7 +2187,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, CNETERR("Deleting messages for %s: connection failed\n", libcfs_nid2str(peer_ni->ibp_nid)); - kiblnd_txlist_done(&zombies, -EHOSTUNREACH); + kiblnd_txlist_done(&zombies, error, + LNET_MSG_STATUS_LOCAL_DROPPED); } static void @@ -2223,7 +2264,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, kiblnd_close_conn_locked(conn, -ECONNABORTED); write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); - kiblnd_txlist_done(&txs, -ECONNABORTED); + kiblnd_txlist_done(&txs, -ECONNABORTED, + LNET_MSG_STATUS_LOCAL_ERROR); return; } @@ -3300,7 +3342,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); if (!list_empty(&timedout_txs)) - kiblnd_txlist_done(&timedout_txs, -ETIMEDOUT); + kiblnd_txlist_done(&timedout_txs, -ETIMEDOUT, + LNET_MSG_STATUS_LOCAL_TIMEOUT); /* * Handle timeout by closing the whole From patchwork Thu Feb 27 21:09:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409823 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A503E159A for ; Thu, 27 Feb 2020 21:23:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8DEB1246A0 for ; Thu, 27 Feb 2020 21:23:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DEB1246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 81F91348A0C; Thu, 27 Feb 2020 13:21:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B378221FAF1 for ; Thu, 27 Feb 2020 13:18:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 30E2CEEE; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2FC5A46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:09 -0500 Message-Id: <1582838290-17243-82-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 081/622] lnet: handle socklnd tx failure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Update the socklnd to propagate the health status up to LNet for handling. 
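For readers skimming the series, the errno-to-health-status mapping that the ksocknal_txlist_done() hunk below applies can be sketched in userspace roughly as follows. This is only an illustration: the enum and function names are hypothetical stand-ins (the kernel uses enum lnet_msg_hstatus and sets tx->tx_hstatus in place), and the enum numbering is an assumption.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's enum lnet_msg_hstatus.
 * Only the values used below are listed; the numbering is assumed.
 */
enum hstatus_sketch {
	HSTATUS_OK = 0,
	HSTATUS_LOCAL_DROPPED,
	HSTATUS_LOCAL_TIMEOUT,
	HSTATUS_LOCAL_ERROR,
};

/* Mirror of the decision made per tx when a tx list is completed with
 * an error: a status that was already set is kept, timeouts and
 * unreachable-network errors get a resendable status, and every other
 * non-zero error is marked LOCAL_ERROR so it is not retransmitted.
 */
static enum hstatus_sketch
hstatus_from_error(enum hstatus_sketch current, int error)
{
	if (current != HSTATUS_OK)
		return current;
	if (error == -ETIMEDOUT)
		return HSTATUS_LOCAL_TIMEOUT;
	if (error == -ENETDOWN || error == -EHOSTUNREACH ||
	    error == -ENETUNREACH)
		return HSTATUS_LOCAL_DROPPED;
	if (error)
		return HSTATUS_LOCAL_ERROR;
	return HSTATUS_OK;
}
```

A status recorded earlier on the send path takes precedence, which is why the function returns early when the current status is not OK.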
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 25c1cb2c4d6f ("LU-9120 lnet: handle socklnd tx failure") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32766 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.h | 1 + net/lnet/klnds/socklnd/socklnd_cb.c | 49 ++++++++++++++++++++++++++++++++++--- 2 files changed, 47 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 04381a0..48884cf 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -289,6 +289,7 @@ struct ksock_tx { /* transmit packet */ time64_t tx_deadline; /* when (in secs) tx times out */ struct ksock_msg tx_msg; /* socklnd message buffer */ int tx_desc_size; /* size of this descriptor */ + enum lnet_msg_hstatus tx_hstatus; /* health status of tx */ union { struct { struct kvec iov; /* virt hdr */ diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 5b75ea6..d50e0d2 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -56,6 +56,7 @@ struct ksock_tx * tx->tx_zc_aborted = 0; tx->tx_zc_capable = 0; tx->tx_zc_checked = 0; + tx->tx_hstatus = LNET_MSG_STATUS_OK; tx->tx_desc_size = size; atomic_inc(&ksocknal_data.ksnd_nactive_txs); @@ -328,18 +329,26 @@ struct ksock_tx * ksocknal_tx_done(struct lnet_ni *ni, struct ksock_tx *tx, int rc) { struct lnet_msg *lnetmsg = tx->tx_lnetmsg; + enum lnet_msg_hstatus hstatus = tx->tx_hstatus; LASSERT(ni || tx->tx_conn); - if (!rc && (tx->tx_resid != 0 || tx->tx_zc_aborted)) + if (!rc && (tx->tx_resid != 0 || tx->tx_zc_aborted)) { rc = -EIO; + hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + } if (tx->tx_conn) ksocknal_conn_decref(tx->tx_conn); ksocknal_free_tx(tx); - if (lnetmsg) /* KSOCK_MSG_NOOP go without lnetmsg */ + if (lnetmsg) { /* KSOCK_MSG_NOOP go without lnetmsg */ + if (rc) + CERROR("tx failure 
rc = %d, hstatus = %d\n", rc, + hstatus); + lnetmsg->msg_health_status = hstatus; lnet_finalize(lnetmsg, rc); + } } void @@ -362,6 +371,20 @@ struct ksock_tx * list_del(&tx->tx_list); + if (tx->tx_hstatus == LNET_MSG_STATUS_OK) { + if (error == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; + else if (error == -ENETDOWN || + error == -EHOSTUNREACH || + error == -ENETUNREACH) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_DROPPED; + /* for all other errors we don't want to + * retransmit + */ + else if (error) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + } + LASSERT(atomic_read(&tx->tx_refcount) == 1); ksocknal_tx_done(ni, tx, error); } @@ -481,12 +504,25 @@ struct ksock_tx * wake_up(&ksocknal_data.ksnd_reaper_waitq); spin_unlock_bh(&ksocknal_data.ksnd_reaper_lock); + + /* set the health status of the message which determines + * whether we should retry the transmit + */ + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; return rc; } /* Actual error */ LASSERT(rc < 0); + /* set the health status of the message which determines + * whether we should retry the transmit + */ + if (rc == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_TIMEOUT; + else + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + if (!conn->ksnc_closing) { switch (rc) { case -ECONNRESET: @@ -509,7 +545,7 @@ struct ksock_tx * ksocknal_uncheck_zc_req(tx); /* it's not an error if conn is being closed */ - ksocknal_close_conn_and_siblings(conn, (conn->ksnc_closing) ? 0 : rc); + ksocknal_close_conn_and_siblings(conn, conn->ksnc_closing ? 
0 : rc); return rc; } @@ -2167,6 +2203,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) { /* We're called with a shared lock on ksnd_global_lock */ struct ksock_conn *conn; + struct ksock_tx *tx; list_for_each_entry(conn, &peer_ni->ksnp_conns, ksnc_list) { int error; @@ -2229,6 +2266,10 @@ void ksocknal_write_callback(struct ksock_conn *conn) * buffered in the socket's send buffer */ ksocknal_conn_addref(conn); + list_for_each_entry(tx, &conn->ksnc_tx_queue, + tx_list) + tx->tx_hstatus = + LNET_MSG_STATUS_LOCAL_TIMEOUT; CNETERR("Timeout sending data to %s (%pI4h:%d) the network or that node may be down.\n", libcfs_id2str(peer_ni->ksnp_id), &conn->ksnc_ipaddr, @@ -2255,6 +2296,8 @@ void ksocknal_write_callback(struct ksock_conn *conn) if (ktime_get_seconds() < tx->tx_deadline) break; + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; + list_del(&tx->tx_list); list_add_tail(&tx->tx_list, &stale_txs); } From patchwork Thu Feb 27 21:09:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409839 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A55B14BC for ; Thu, 27 Feb 2020 21:23:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E710F246A0 for ; Thu, 27 Feb 2020 21:23:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E710F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id B0DF3348AE1; Thu, 27 Feb 2020 13:21:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1719321FA7D for ; Thu, 27 Feb 2020 13:18:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3514AEF1; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 32E2A468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:10 -0500 Message-Id: <1582838290-17243-83-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 082/622] lnet: handle remote errors in LNet X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add health value in the peer NI structure. Decrement the value whenever there is an error sending to the peer. Modify the selection algorithm to look at the peer NI health value when selecting the best peer NI to send to. Put the peer NI on the recovery queue whenever there is an error sending to it. Attempt only to resend on REMOTE DROPPED since we're sure the message was never received by the peer. For other errors finalize the message. 
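The health value arithmetic described above can be sketched in plain C as below. This assumes the peer NI uses the same arithmetic that the earlier patch in this series applies to local NIs (lnet_dec_healthv_locked() and lnet_inc_healthv()). The sketch_ names and the maximum of 1000 are assumptions for illustration; the kernel operates on atomic_t with LNET_MAX_HEALTH_VALUE and the lnet_health_sensitivity tunable.

```c
/* Assumed ceiling for the sketch; the kernel uses
 * LNET_MAX_HEALTH_VALUE, whose value is not shown in this patch.
 */
#define SKETCH_MAX_HEALTH 1000

/* On a send failure: subtract the sensitivity, saturating at 0. */
static int sketch_dec_health(int health, int sensitivity)
{
	return health < sensitivity ? 0 : health - sensitivity;
}

/* On a successful send: add 1 unless already at the maximum, which is
 * what atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE) does.
 */
static int sketch_inc_health(int health)
{
	return health == SKETCH_MAX_HEALTH ? health : health + 1;
}

/* An interface is queued for recovery only if its health actually
 * dropped below the maximum; with a sensitivity of 0 it never does.
 */
static int sketch_needs_recovery(int health)
{
	return health < SKETCH_MAX_HEALTH;
}
```

With a sensitivity of 0 the decrement is a no-op, so sketch_needs_recovery() stays false and nothing is queued, which matches the comment in lnet_handle_local_failure().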
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 76fad19c2dea ("LU-9120 lnet: handle remote errors in LNet") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32767 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 6 + include/linux/lnet/lib-types.h | 12 ++ net/lnet/lnet/api-ni.c | 1 + net/lnet/lnet/lib-move.c | 311 +++++++++++++++++++++++++++++++++++------ net/lnet/lnet/lib-msg.c | 87 ++++++++++-- net/lnet/lnet/peer.c | 9 ++ 6 files changed, 368 insertions(+), 58 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 965fc5f..b8ca114 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -894,6 +894,12 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return false; } +static inline void +lnet_inc_healthv(atomic_t *healthv) +{ + atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE); +} + void lnet_incr_stats(struct lnet_element_stats *stats, enum lnet_msg_type msg_type, enum lnet_stats_type stats_type); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 8c3bf34..19b83a4 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -478,6 +478,8 @@ struct lnet_peer_ni { struct list_head lpni_peer_nis; /* chain on remote peer list */ struct list_head lpni_on_remote_peer_ni_list; + /* chain on recovery queue */ + struct list_head lpni_recovery; /* chain on peer hash */ struct list_head lpni_hashlist; /* messages blocking for tx credits */ @@ -529,6 +531,10 @@ struct lnet_peer_ni { lnet_nid_t lpni_nid; /* # refs */ atomic_t lpni_refcount; + /* health value for the peer */ + atomic_t lpni_healthv; + /* recovery ping mdh */ + struct lnet_handle_md lpni_recovery_ping_mdh; /* CPT this peer attached on */ int lpni_cpt; /* state flags -- protected by lpni_lock */ @@ -558,6 +564,10 @@ struct lnet_peer_ni { /* Preferred path added due to traffic on 
non-MR peer_ni */ #define LNET_PEER_NI_NON_MR_PREF BIT(0) +/* peer is being recovered. */ +#define LNET_PEER_NI_RECOVERY_PENDING BIT(1) +/* peer is being deleted */ +#define LNET_PEER_NI_DELETING BIT(2) struct lnet_peer { /* chain on pt_peer_list */ @@ -1088,6 +1098,8 @@ struct lnet { struct list_head **ln_mt_resendqs; /* local NIs to recover */ struct list_head ln_mt_localNIRecovq; + /* local NIs to recover */ + struct list_head ln_mt_peerNIRecovq; /* recovery eq handler */ struct lnet_handle_eq ln_mt_eqh; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index deef404..97d9be5 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -832,6 +832,7 @@ struct lnet_libhandle * INIT_LIST_HEAD(&the_lnet.ln_dc_working); INIT_LIST_HEAD(&the_lnet.ln_dc_expired); INIT_LIST_HEAD(&the_lnet.ln_mt_localNIRecovq); + INIT_LIST_HEAD(&the_lnet.ln_mt_peerNIRecovq); init_waitqueue_head(&the_lnet.ln_dc_waitq); rc = lnet_descriptor_setup(); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index f3f4b84..5224490 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1025,15 +1025,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } if (txpeer) { - /* - * TODO: - * Once the patch for the health comes in we need to set - * the health of the peer ni to bad when we fail to send - * a message. 
- * int status = msg->msg_ev.status; - * if (status != 0) - * lnet_set_peer_ni_health_locked(txpeer, false) - */ msg->msg_txpeer = NULL; lnet_peer_ni_decref_locked(txpeer); } @@ -1545,6 +1536,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, int best_lpni_credits = INT_MIN; bool preferred = false; bool ni_is_pref; + int best_lpni_healthv = 0; + int lpni_healthv; while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni))) { /* if the best_ni we've chosen aleady has this lpni @@ -1553,6 +1546,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, best_ni->ni_nid); + lpni_healthv = atomic_read(&lpni->lpni_healthv); + CDEBUG(D_NET, "%s ni_is_pref = %d\n", libcfs_nid2str(best_ni->ni_nid), ni_is_pref); @@ -1562,8 +1557,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, lpni->lpni_txcredits, best_lpni_credits, lpni->lpni_seq, best_lpni->lpni_seq); + /* pick the healthiest peer ni */ + if (lpni_healthv < best_lpni_healthv) { + continue; + } else if (lpni_healthv > best_lpni_healthv) { + best_lpni_healthv = lpni_healthv; /* if this is a preferred peer use it */ - if (!preferred && ni_is_pref) { + } else if (!preferred && ni_is_pref) { preferred = true; } else if (preferred && !ni_is_pref) { /* @@ -2408,6 +2408,16 @@ struct lnet_ni * return 0; } +enum lnet_mt_event_type { + MT_TYPE_LOCAL_NI = 0, + MT_TYPE_PEER_NI +}; + +struct lnet_mt_event_info { + enum lnet_mt_event_type mt_type; + lnet_nid_t mt_nid; +}; + static void lnet_resend_pending_msgs_locked(struct list_head *resendq, int cpt) { @@ -2503,6 +2513,7 @@ struct lnet_ni * static void lnet_recover_local_nis(void) { + struct lnet_mt_event_info *ev_info; struct list_head processed_list; struct list_head local_queue; struct lnet_handle_md mdh; @@ -2550,15 +2561,24 @@ struct lnet_ni * lnet_ni_unlock(ni); lnet_net_unlock(0); - /* protect the ni->ni_state field. 
Once we call the - * lnet_send_ping function it's possible we receive - * a response before we check the rc. The lock ensures - * a stable value for the ni_state RECOVERY_PENDING bit - */ + CDEBUG(D_NET, "attempting to recover local ni: %s\n", + libcfs_nid2str(ni->ni_nid)); + lnet_ni_lock(ni); if (!(ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING)) { ni->ni_state |= LNET_NI_STATE_RECOVERY_PENDING; lnet_ni_unlock(ni); + + ev_info = kzalloc(sizeof(*ev_info), GFP_NOFS); + if (!ev_info) { + CERROR("out of memory. Can't recover %s\n", + libcfs_nid2str(ni->ni_nid)); + lnet_ni_lock(ni); + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + lnet_ni_unlock(ni); + continue; + } + mdh = ni->ni_ping_mdh; /* Invalidate the ni mdh in case it's deleted. * We'll unlink the mdh in this case below. @@ -2587,9 +2607,10 @@ struct lnet_ni * lnet_ni_decref_locked(ni, 0); lnet_net_unlock(0); - rc = lnet_send_ping(nid, &mdh, - LNET_INTERFACES_MIN, (void *)nid, - the_lnet.ln_mt_eqh, true); + ev_info->mt_type = MT_TYPE_LOCAL_NI; + ev_info->mt_nid = nid; + rc = lnet_send_ping(nid, &mdh, LNET_INTERFACES_MIN, + ev_info, the_lnet.ln_mt_eqh, true); /* lookup the nid again */ lnet_net_lock(0); ni = lnet_nid2ni_locked(nid, 0); @@ -2694,6 +2715,44 @@ struct lnet_ni * } static void +lnet_unlink_lpni_recovery_mdh_locked(struct lnet_peer_ni *lpni, int cpt) +{ + struct lnet_handle_md recovery_mdh; + + LNetInvalidateMDHandle(&recovery_mdh); + + if (lpni->lpni_state & LNET_PEER_NI_RECOVERY_PENDING) { + recovery_mdh = lpni->lpni_recovery_ping_mdh; + LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh); + } + spin_unlock(&lpni->lpni_lock); + lnet_net_unlock(cpt); + if (!LNetMDHandleIsInvalid(recovery_mdh)) + LNetMDUnlink(recovery_mdh); + lnet_net_lock(cpt); + spin_lock(&lpni->lpni_lock); +} + +static void +lnet_clean_peer_ni_recoveryq(void) +{ + struct lnet_peer_ni *lpni, *tmp; + + lnet_net_lock(LNET_LOCK_EX); + + list_for_each_entry_safe(lpni, tmp, &the_lnet.ln_mt_peerNIRecovq, + lpni_recovery) { + 
list_del_init(&lpni->lpni_recovery); + spin_lock(&lpni->lpni_lock); + lnet_unlink_lpni_recovery_mdh_locked(lpni, LNET_LOCK_EX); + spin_unlock(&lpni->lpni_lock); + lnet_peer_ni_decref_locked(lpni); + } + + lnet_net_unlock(LNET_LOCK_EX); +} + +static void lnet_clean_resendqs(void) { struct lnet_msg *msg, *tmp; @@ -2716,6 +2775,128 @@ struct lnet_ni * cfs_percpt_free(the_lnet.ln_mt_resendqs); } +static void +lnet_recover_peer_nis(void) +{ + struct lnet_mt_event_info *ev_info; + struct list_head processed_list; + struct list_head local_queue; + struct lnet_handle_md mdh; + struct lnet_peer_ni *lpni; + struct lnet_peer_ni *tmp; + lnet_nid_t nid; + int healthv; + int rc; + + INIT_LIST_HEAD(&local_queue); + INIT_LIST_HEAD(&processed_list); + + /* Always use cpt 0 for locking across all interactions with + * ln_mt_peerNIRecovq + */ + lnet_net_lock(0); + list_splice_init(&the_lnet.ln_mt_peerNIRecovq, + &local_queue); + lnet_net_unlock(0); + + list_for_each_entry_safe(lpni, tmp, &local_queue, + lpni_recovery) { + /* The same protection strategy is used here as is in the + * local recovery case. + */ + lnet_net_lock(0); + healthv = atomic_read(&lpni->lpni_healthv); + spin_lock(&lpni->lpni_lock); + if (lpni->lpni_state & LNET_PEER_NI_DELETING || + healthv == LNET_MAX_HEALTH_VALUE) { + list_del_init(&lpni->lpni_recovery); + lnet_unlink_lpni_recovery_mdh_locked(lpni, 0); + spin_unlock(&lpni->lpni_lock); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(0); + continue; + } + spin_unlock(&lpni->lpni_lock); + lnet_net_unlock(0); + + /* NOTE: we're racing with peer deletion from user space. + * It's possible that a peer is deleted after we check its + * state. 
In this case the recovery can create a new peer + */ + spin_lock(&lpni->lpni_lock); + if (!(lpni->lpni_state & LNET_PEER_NI_RECOVERY_PENDING) && + !(lpni->lpni_state & LNET_PEER_NI_DELETING)) { + lpni->lpni_state |= LNET_PEER_NI_RECOVERY_PENDING; + spin_unlock(&lpni->lpni_lock); + + ev_info = kzalloc(sizeof(*ev_info), GFP_NOFS); + if (!ev_info) { + CERROR("out of memory. Can't recover %s\n", + libcfs_nid2str(lpni->lpni_nid)); + spin_lock(&lpni->lpni_lock); + lpni->lpni_state &= + ~LNET_PEER_NI_RECOVERY_PENDING; + spin_unlock(&lpni->lpni_lock); + continue; + } + + /* look at the comments in lnet_recover_local_nis() */ + mdh = lpni->lpni_recovery_ping_mdh; + LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh); + nid = lpni->lpni_nid; + lnet_net_lock(0); + list_del_init(&lpni->lpni_recovery); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(0); + + ev_info->mt_type = MT_TYPE_PEER_NI; + ev_info->mt_nid = nid; + rc = lnet_send_ping(nid, &mdh, LNET_INTERFACES_MIN, + ev_info, the_lnet.ln_mt_eqh, true); + lnet_net_lock(0); + /* lnet_find_peer_ni_locked() grabs a refcount for + * us. No need to take it explicitly. + */ + lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + lnet_net_unlock(0); + LNetMDUnlink(mdh); + continue; + } + + lpni->lpni_recovery_ping_mdh = mdh; + /* While we're unlocked the lpni could've been + * readded on the recovery queue. In this case we + * don't need to add it to the local queue, since + * it's already on there and the thread that added + * it would've incremented the refcount on the + * peer, which means we need to decref the refcount + * that was implicitly grabbed by find_peer_ni_locked. + * Otherwise, if the lpni is still not on + * the recovery queue, then we'll add it to the + * processed list. 
+ */ + if (list_empty(&lpni->lpni_recovery)) + list_add_tail(&lpni->lpni_recovery, + &processed_list); + else + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(0); + + spin_lock(&lpni->lpni_lock); + if (rc) + lpni->lpni_state &= + ~LNET_PEER_NI_RECOVERY_PENDING; + } + spin_unlock(&lpni->lpni_lock); + } + + list_splice_init(&processed_list, &local_queue); + lnet_net_lock(0); + list_splice(&local_queue, &the_lnet.ln_mt_peerNIRecovq); + lnet_net_unlock(0); +} + static int lnet_monitor_thread(void *arg) { @@ -2736,6 +2917,8 @@ struct lnet_ni * lnet_recover_local_nis(); + lnet_recover_peer_nis(); + /* TODO do we need to check if we should sleep without * timeout? Technically, an active system will always * have messages in flight so this check will always @@ -2822,10 +3005,61 @@ struct lnet_ni * } static void +lnet_handle_recovery_reply(struct lnet_mt_event_info *ev_info, + int status) +{ + lnet_nid_t nid = ev_info->mt_nid; + + if (ev_info->mt_type == MT_TYPE_LOCAL_NI) { + struct lnet_ni *ni; + + lnet_net_lock(0); + ni = lnet_nid2ni_locked(nid, 0); + if (!ni) { + lnet_net_unlock(0); + return; + } + lnet_ni_lock(ni); + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + lnet_ni_unlock(ni); + lnet_net_unlock(0); + + if (status != 0) { + CERROR("local NI recovery failed with %d\n", status); + return; + } + /* need to increment healthv for the ni here, because in + * the lnet_finalize() path we don't have access to this + * NI. And in order to get access to it, we'll need to + * carry forward too much information. 
+ * In the peer case, it'll naturally be incremented + */ + lnet_inc_healthv(&ni->ni_healthv); + } else { + struct lnet_peer_ni *lpni; + int cpt; + + cpt = lnet_net_lock_current(); + lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + lnet_net_unlock(cpt); + return; + } + spin_lock(&lpni->lpni_lock); + lpni->lpni_state &= ~LNET_PEER_NI_RECOVERY_PENDING; + spin_unlock(&lpni->lpni_lock); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(cpt); + + if (status != 0) + CERROR("peer NI recovery failed with %d\n", status); + } +} + +static void lnet_mt_event_handler(struct lnet_event *event) { - lnet_nid_t nid = (lnet_nid_t)event->md.user_ptr; - struct lnet_ni *ni; + struct lnet_mt_event_info *ev_info = event->md.user_ptr; struct lnet_ping_buffer *pbuf; /* TODO: remove assert */ @@ -2837,37 +3071,25 @@ struct lnet_ni * event->status); switch (event->type) { + case LNET_EVENT_UNLINK: + CDEBUG(D_NET, "%s recovery ping unlinked\n", + libcfs_nid2str(ev_info->mt_nid)); + /* fall-through */ case LNET_EVENT_REPLY: - /* If the NI has been restored completely then remove from - * the recovery queue - */ - lnet_net_lock(0); - ni = lnet_nid2ni_locked(nid, 0); - if (!ni) { - lnet_net_unlock(0); - break; - } - lnet_ni_lock(ni); - ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; - lnet_ni_unlock(ni); - lnet_net_unlock(0); + lnet_handle_recovery_reply(ev_info, event->status); break; case LNET_EVENT_SEND: CDEBUG(D_NET, "%s recovery message sent %s:%d\n", - libcfs_nid2str(nid), + libcfs_nid2str(ev_info->mt_nid), (event->status) ? 
"unsuccessfully" : "successfully", event->status); break; - case LNET_EVENT_UNLINK: - /* nothing to do */ - CDEBUG(D_NET, "%s recovery ping unlinked\n", - libcfs_nid2str(nid)); - break; default: CERROR("Unexpected event: %d\n", event->type); - return; + break; } if (event->unlinked) { + kfree(ev_info); pbuf = LNET_PING_INFO_TO_BUFFER(event->md.start); lnet_ping_buffer_decref(pbuf); } @@ -2919,14 +3141,16 @@ int lnet_monitor_thr_start(void) lnet_router_cleanup(); free_mem: the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; - lnet_clean_resendqs(); lnet_clean_local_ni_recoveryq(); + lnet_clean_peer_ni_recoveryq(); + lnet_clean_resendqs(); LNetEQFree(the_lnet.ln_mt_eqh); LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); return rc; clean_queues: - lnet_clean_resendqs(); lnet_clean_local_ni_recoveryq(); + lnet_clean_peer_ni_recoveryq(); + lnet_clean_resendqs(); return rc; } @@ -2949,8 +3173,9 @@ void lnet_monitor_thr_stop(void) /* perform cleanup tasks */ lnet_router_cleanup(); - lnet_clean_resendqs(); lnet_clean_local_ni_recoveryq(); + lnet_clean_peer_ni_recoveryq(); + lnet_clean_resendqs(); rc = LNetEQFree(the_lnet.ln_mt_eqh); LASSERT(rc == 0); } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index e7f7469..046923b 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -482,12 +482,6 @@ } } -static inline void -lnet_inc_healthv(atomic_t *healthv) -{ - atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE); -} - static void lnet_handle_local_failure(struct lnet_msg *msg) { @@ -524,6 +518,43 @@ lnet_net_unlock(0); } +static void +lnet_handle_remote_failure(struct lnet_msg *msg) +{ + struct lnet_peer_ni *lpni; + + lpni = msg->msg_txpeer; + + /* lpni could be NULL if we're in the LOLND case */ + if (!lpni) + return; + + lnet_net_lock(0); + /* the mt could've shutdown and cleaned up the queues */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { + lnet_net_unlock(0); + return; + } + + lnet_dec_healthv_locked(&lpni->lpni_healthv); + /* add the peer 
NI to the recovery queue if it's not already there + * and it's health value is actually below the maximum. It's + * possible that the sensitivity might be set to 0, and the health + * value will not be reduced. In this case, there is no reason to + * invoke recovery + */ + if (list_empty(&lpni->lpni_recovery) && + atomic_read(&lpni->lpni_healthv) < LNET_MAX_HEALTH_VALUE) { + CERROR("lpni %s added to recovery queue. Health = %d\n", + libcfs_nid2str(lpni->lpni_nid), + atomic_read(&lpni->lpni_healthv)); + list_add_tail(&lpni->lpni_recovery, + &the_lnet.ln_mt_peerNIRecovq); + lnet_peer_ni_addref_locked(lpni); + } + lnet_net_unlock(0); +} + /* Do a health check on the message: * return -1 if we're not going to handle the error * success case will return -1 as well @@ -533,11 +564,20 @@ lnet_health_check(struct lnet_msg *msg) { enum lnet_msg_hstatus hstatus = msg->msg_health_status; + bool lo = false; /* TODO: lnet_incr_hstats(hstatus); */ LASSERT(msg->msg_txni); + /* if we're sending to the LOLND then the msg_txpeer will not be + * set. So no need to sanity check it. + */ + if (LNET_NETTYP(LNET_NIDNET(msg->msg_txni->ni_nid)) != LOLND) + LASSERT(msg->msg_txpeer); + else + lo = true; + if (hstatus != LNET_MSG_STATUS_OK && ktime_compare(ktime_get(), msg->msg_deadline) >= 0) return -1; @@ -546,9 +586,21 @@ if (the_lnet.ln_state != LNET_STATE_RUNNING) return -1; + CDEBUG(D_NET, "health check: %s->%s: %s: %s\n", + libcfs_nid2str(msg->msg_txni->ni_nid), + (lo) ? "self" : libcfs_nid2str(msg->msg_txpeer->lpni_nid), + lnet_msgtyp2str(msg->msg_type), + lnet_health_error2str(hstatus)); + switch (hstatus) { case LNET_MSG_STATUS_OK: lnet_inc_healthv(&msg->msg_txni->ni_healthv); + /* It's possible msg_txpeer is NULL in the LOLND + * case. 
+ */ + if (msg->msg_txpeer) + lnet_inc_healthv(&msg->msg_txpeer->lpni_healthv); + /* we can finalize this message */ return -1; case LNET_MSG_STATUS_LOCAL_INTERRUPT: @@ -560,22 +612,27 @@ /* add to the re-send queue */ goto resend; - /* TODO: since the remote dropped the message we can - * attempt a resend safely. - */ - case LNET_MSG_STATUS_REMOTE_DROPPED: - break; - - /* These errors will not trigger a resend so simply - * finalize the message - */ + /* These errors will not trigger a resend so simply + * finalize the message + */ case LNET_MSG_STATUS_LOCAL_ERROR: lnet_handle_local_failure(msg); return -1; + + /* TODO: since the remote dropped the message we can + * attempt a resend safely. + */ + case LNET_MSG_STATUS_REMOTE_DROPPED: + lnet_handle_remote_failure(msg); + goto resend; + case LNET_MSG_STATUS_REMOTE_ERROR: case LNET_MSG_STATUS_REMOTE_TIMEOUT: case LNET_MSG_STATUS_NETWORK_TIMEOUT: + lnet_handle_remote_failure(msg); return -1; + default: + LBUG(); } resend: diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 121876e..4a62f9a 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -124,6 +124,7 @@ INIT_LIST_HEAD(&lpni->lpni_routes); INIT_LIST_HEAD(&lpni->lpni_hashlist); INIT_LIST_HEAD(&lpni->lpni_peer_nis); + INIT_LIST_HEAD(&lpni->lpni_recovery); INIT_LIST_HEAD(&lpni->lpni_on_remote_peer_ni_list); spin_lock_init(&lpni->lpni_lock); @@ -133,6 +134,7 @@ lpni->lpni_ping_feats = LNET_PING_FEAT_INVAL; lpni->lpni_nid = nid; lpni->lpni_cpt = cpt; + atomic_set(&lpni->lpni_healthv, LNET_MAX_HEALTH_VALUE); lnet_set_peer_ni_health_locked(lpni, true); net = lnet_get_net_locked(LNET_NIDNET(nid)); @@ -331,6 +333,13 @@ /* remove peer ni from the hash list. */ list_del_init(&lpni->lpni_hashlist); + /* indicate the peer is being deleted so the monitor thread can + * remove it from the recovery queue. 
+ */ + spin_lock(&lpni->lpni_lock); + lpni->lpni_state |= LNET_PEER_NI_DELETING; + spin_unlock(&lpni->lpni_lock); + /* decrement the ref count on the peer table */ ptable = the_lnet.ln_peer_tables[lpni->lpni_cpt]; LASSERT(atomic_read(&ptable->pt_number) > 0);
From patchwork Thu Feb 27 21:09:11 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409789
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:09:11 -0500
Message-Id: <1582838290-17243-84-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 083/622] lnet: add retry count
Cc: Amir Shehata, Lustre Development List

From: Amir Shehata

Added a module parameter to define the number of retries on a message. It defaults to 0, which means no retries will be attempted. Each message will keep track of the number of times it has been retransmitted. When queuing it on the resend queue, the retry count will be checked and if it's exceeded, then the message will be finalized.
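
The retry check described above amounts to a small budget test before requeueing. A minimal user-space sketch, assuming a simplified message struct (the sketch_* names are made up for illustration; the real check lives in lnet_health_check() and operates on struct lnet_msg and the lnet_retry_count module parameter):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the retry-related fields of a message. */
struct sketch_msg {
	int  retry_count; /* times this message has been retried so far */
	bool no_resend;   /* sender asked for no retransmission */
};

/*
 * Decide whether a failed message may be queued for another attempt.
 * Returns 0 if it is requeued (and counts the retry), -1 if it should
 * be finalized instead. A limit of 0 therefore disables retries.
 */
static int sketch_try_requeue(struct sketch_msg *msg, unsigned int retry_limit)
{
	assert(msg != NULL);
	if (msg->no_resend)
		return -1;
	if ((unsigned int)msg->retry_count >= retry_limit)
		return -1;
	msg->retry_count++;
	return 0;
}
```

Because the counter is compared before it is incremented, a limit of N allows exactly N retransmissions on top of the original send.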
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 20e23980eae2 ("LU-9120 lnet: add retry count") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32769 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + include/linux/lnet/lib-types.h | 2 ++ net/lnet/lnet/api-ni.c | 5 +++++ net/lnet/lnet/lib-msg.c | 8 +++++++- 4 files changed, 15 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index b8ca114..ace0d51 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -478,6 +478,7 @@ struct lnet_ni * struct lnet_net *lnet_get_net_locked(u32 net_id); extern unsigned int lnet_transaction_timeout; +extern unsigned int lnet_retry_count; extern unsigned int lnet_numa_range; extern unsigned int lnet_health_sensitivity; extern unsigned int lnet_peer_discovery_disabled; diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 19b83a4..1108e3b 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -103,6 +103,8 @@ struct lnet_msg { enum lnet_msg_hstatus msg_health_status; /* This is a recovery message */ bool msg_recovery; + /* the number of times a transmission has been retried */ + int msg_retry_count; /* flag to indicate that we do not want to resend this message */ bool msg_no_resend; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 97d9be5..a54fe2c 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -116,6 +116,11 @@ struct lnet the_lnet = { MODULE_PARM_DESC(lnet_transaction_timeout, "Time in seconds to wait for a REPLY or an ACK"); +unsigned int lnet_retry_count; +module_param(lnet_retry_count, uint, 0444); +MODULE_PARM_DESC(lnet_retry_count, + "Maximum number of times to retry transmitting a message"); + /* * This sequence number keeps track of how many times DLC was used to * update the local NIs. 
It is incremented when a NI is added or diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 046923b..9841e14 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -556,7 +556,8 @@ } /* Do a health check on the message: - * return -1 if we're not going to handle the error + * return -1 if we're not going to handle the error or + * if we've reached the maximum number of retries. * success case will return -1 as well * return 0 if it the message is requeued for send */ @@ -646,6 +647,11 @@ if (msg->msg_no_resend) return -1; + /* check if the message has exceeded the number of retries */ + if (msg->msg_retry_count >= lnet_retry_count) + return -1; + msg->msg_retry_count++; + lnet_net_lock(msg->msg_tx_cpt); /* remove message from the active list and reset it in preparation
From patchwork Thu Feb 27 21:09:12 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409827
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:09:12 -0500
Message-Id: <1582838290-17243-85-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 084/622] lnet: calculate the lnd timeout
Cc: Amir Shehata, Lustre Development List

From: Amir Shehata

Calculate the LND timeout based on the transaction timeout and the retry count. Both of these are user defined values. Whenever they are set the lnd timeout is calculated. The LNDs use these timeouts instead of the LND timeout module parameter. Retry count can be set to 0, which means no retries. In that case the LND timeout will default to 5 seconds, which is the same as the default transaction timeout.
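
As a rough illustration of the timeout handling, the sketch below shows one way the LND timeout could be derived and how the LNDs then use it to size their periodic timeout scan. The chunk arithmetic follows the o2iblnd/socklnd hunks in this patch. The derivation itself is NOT visible in this excerpt, so the division of the transaction timeout by the retry count, the clamp to 1 second, and all sketch_* names are assumptions made for the example; only the 5 second default for a retry count of 0 comes from the commit message.

```c
#include <assert.h>

/* Same value as LNET_LND_DEFAULT_TIMEOUT in the patch. */
#define SKETCH_LND_DEFAULT_TIMEOUT 5

/*
 * ASSUMPTION: with retries enabled, the transaction timeout is split
 * evenly across the retry count. With retries disabled the default
 * of 5 seconds is used, as stated in the commit message.
 */
static unsigned int sketch_lnd_timeout(unsigned int transaction_timeout,
				       unsigned int retry_count)
{
	unsigned int t;

	if (retry_count == 0)
		return SKETCH_LND_DEFAULT_TIMEOUT;
	t = transaction_timeout / retry_count;
	return t ? t : 1; /* clamp is this sketch's own choice */
}

/*
 * Number of peer hash buckets to check per pass so that every
 * connection is examined n times per timeout interval (n = 4, p = 1
 * second between passes, as in the LND scan loops).
 */
static int sketch_scan_chunk(int hash_size, unsigned int lnd_timeout)
{
	const int n = 4;
	const int p = 1;
	int chunk = hash_size;

	assert(hash_size > 0);
	if (lnd_timeout > (unsigned int)(n * p))
		chunk = (chunk * n * p) / (int)lnd_timeout;
	if (!chunk)
		chunk = 1;
	return chunk;
}
```

With the default 5 second timeout and a 100 bucket hash table, each pass covers 80 buckets; a timeout of 4 seconds or less means the whole table is checked on every pass.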
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 84f3af43c4bd ("LU-9120 lnet: calculate the lnd timeout") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32770 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 20 +++++++++++--------- net/lnet/klnds/socklnd/socklnd.c | 6 +++--- net/lnet/klnds/socklnd/socklnd_cb.c | 22 ++++++++++++---------- net/lnet/lnet/api-ni.c | 9 +++++++++ 5 files changed, 37 insertions(+), 22 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index ace0d51..5500e3f 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -85,6 +85,7 @@ extern struct kmem_cache *lnet_small_mds_cachep; /* <= LNET_SMALL_MD_SIZE bytes * MDs kmem_cache */ +#define LNET_LND_DEFAULT_TIMEOUT 5 static inline int lnet_is_route_alive(struct lnet_route *route) { @@ -676,6 +677,7 @@ void lnet_copy_kiov2iter(struct iov_iter *to, struct page *lnet_kvaddr_to_page(unsigned long vaddr); int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset); +unsigned int lnet_get_lnd_timeout(void); void lnet_register_lnd(struct lnet_lnd *lnd); void lnet_unregister_lnd(struct lnet_lnd *lnd); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 007058a..c6e8e73 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1205,7 +1205,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(!tx->tx_queued); /* not queued for sending already */ LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED); - timeout_ns = *kiblnd_tunables.kib_timeout * NSEC_PER_SEC; + timeout_ns = lnet_get_lnd_timeout() * NSEC_PER_SEC; tx->tx_queued = 1; tx->tx_deadline = ktime_add_ns(ktime_get(), timeout_ns); @@ -1333,14 +1333,14 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if 
(*kiblnd_tunables.kib_use_priv_port) { rc = kiblnd_resolve_addr(cmid, &srcaddr, &dstaddr, - *kiblnd_tunables.kib_timeout * 1000); + lnet_get_lnd_timeout() * 1000); } else { rc = rdma_resolve_addr(cmid, (struct sockaddr *)&srcaddr, (struct sockaddr *)&dstaddr, - *kiblnd_tunables.kib_timeout * 1000); + lnet_get_lnd_timeout() * 1000); } - if (rc) { + if (rc != 0) { /* Can't initiate address resolution: */ CERROR("Can't resolve addr for %s: %d\n", libcfs_nid2str(peer_ni->ibp_nid), rc); @@ -3097,8 +3097,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, event->status); rc = event->status; } else { - rc = rdma_resolve_route( - cmid, *kiblnd_tunables.kib_timeout * 1000); + rc = rdma_resolve_route(cmid, + lnet_get_lnd_timeout() * 1000); if (!rc) { struct kib_net *net = peer_ni->ibp_ni->ni_data; struct kib_dev *dev = net->ibn_dev; @@ -3499,6 +3499,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, const int n = 4; const int p = 1; int chunk = kiblnd_data.kib_peer_hash_size; + unsigned int lnd_timeout; spin_unlock_irqrestore(lock, flags); dropped_lock = 1; @@ -3512,9 +3513,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, * connection within (n+1)/n times the timeout * interval. 
*/ - if (*kiblnd_tunables.kib_timeout > n * p) - chunk = (chunk * n * p) / - *kiblnd_tunables.kib_timeout; + + lnd_timeout = lnet_get_lnd_timeout(); + if (lnd_timeout > n * p) + chunk = (chunk * n * p) / lnd_timeout; if (!chunk) chunk = 1; diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 03fa706..891d3bd 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1284,7 +1284,7 @@ struct ksock_peer * /* Set the deadline for the outgoing HELLO to drain */ conn->ksnc_tx_bufnob = sock->sk->sk_wmem_queued; conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); mb(); /* order with adding to peer_ni's conn list */ list_add(&conn->ksnc_list, &peer_ni->ksnp_conns); @@ -1674,7 +1674,7 @@ struct ksock_peer * switch (conn->ksnc_rx_state) { case SOCKNAL_RX_LNET_PAYLOAD: last_rcv = conn->ksnc_rx_deadline - - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); CERROR("Completing partial receive from %s[%d], ip %pI4h:%d, with error, wanted: %zd, left: %d, last alive is %lld secs ago\n", libcfs_id2str(conn->ksnc_peer->ksnp_id), conn->ksnc_type, &conn->ksnc_ipaddr, conn->ksnc_port, @@ -1849,7 +1849,7 @@ struct ksock_peer * if (bufnob < conn->ksnc_tx_bufnob) { /* something got ACKed */ conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); peer_ni->ksnp_last_alive = now; conn->ksnc_tx_bufnob = bufnob; } diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index d50e0d2..8bc23d2 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -222,7 +222,7 @@ struct ksock_tx * * something got ACKed */ conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); conn->ksnc_peer->ksnp_last_alive = ktime_get_seconds(); conn->ksnc_tx_bufnob = bufnob; mb(); @@ -268,7 +268,7 @@ struct ksock_tx * 
conn->ksnc_peer->ksnp_last_alive = ktime_get_seconds(); conn->ksnc_rx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); mb(); /* order with setting rx_started */ conn->ksnc_rx_started = 1; @@ -423,7 +423,7 @@ struct ksock_tx * /* ZC_REQ is going to be pinned to the peer_ni */ tx->tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); LASSERT(!tx->tx_msg.ksm_zc_cookies[0]); @@ -705,7 +705,7 @@ struct ksock_conn * if (list_empty(&conn->ksnc_tx_queue) && !bufnob) { /* First packet starts the timeout */ conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); if (conn->ksnc_tx_bufnob > 0) /* something got ACKed */ conn->ksnc_peer->ksnp_last_alive = ktime_get_seconds(); conn->ksnc_tx_bufnob = 0; @@ -881,7 +881,7 @@ struct ksock_route * ksocknal_find_connecting_route_locked(peer_ni)) { /* the message is going to be pinned to the peer_ni */ tx->tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); /* Queue the message until a connection is established */ list_add_tail(&tx->tx_list, &peer_ni->ksnp_tx_queue); @@ -1663,7 +1663,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) /* socket type set on active connections - not set on passive */ LASSERT(!active == !(conn->ksnc_type != SOCKLND_CONN_NONE)); - timeout = active ? *ksocknal_tunables.ksnd_timeout : + timeout = active ? 
lnet_get_lnd_timeout() : lnet_acceptor_timeout(); rc = lnet_sock_read(sock, &hello->kshm_magic, @@ -1801,7 +1801,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) int retry_later = 0; int rc = 0; - deadline = ktime_get_seconds() + *ksocknal_tunables.ksnd_timeout; + deadline = ktime_get_seconds() + lnet_get_lnd_timeout(); write_lock_bh(&ksocknal_data.ksnd_global_lock); @@ -2552,6 +2552,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) const int n = 4; const int p = 1; int chunk = ksocknal_data.ksnd_peer_hash_size; + unsigned int lnd_timeout; /* * Time to check for timeouts on a few more peers: I do @@ -2561,9 +2562,10 @@ void ksocknal_write_callback(struct ksock_conn *conn) * timeout on any connection within (n+1)/n times the * timeout interval. */ - if (*ksocknal_tunables.ksnd_timeout > n * p) - chunk = (chunk * n * p) / - *ksocknal_tunables.ksnd_timeout; + + lnd_timeout = lnet_get_lnd_timeout(); + if (lnd_timeout > n * p) + chunk = (chunk * n * p) / lnd_timeout; if (!chunk) chunk = 1; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index a54fe2c..e467d64 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -121,6 +121,8 @@ struct lnet the_lnet = { MODULE_PARM_DESC(lnet_retry_count, "Maximum number of times to retry transmitting a message"); +unsigned int lnet_lnd_timeout = LNET_LND_DEFAULT_TIMEOUT; + /* * This sequence number keeps track of how many times DLC was used to * update the local NIs. 
It is incremented when a NI is added or @@ -570,6 +572,13 @@ static void lnet_assert_wire_constants(void) return NULL; } +unsigned int +lnet_get_lnd_timeout(void) +{ + return lnet_lnd_timeout; +} +EXPORT_SYMBOL(lnet_get_lnd_timeout); + void lnet_register_lnd(struct lnet_lnd *lnd) { From patchwork Thu Feb 27 21:09:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409831 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D84BC14BC for ; Thu, 27 Feb 2020 21:23:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C1075246A0 for ; Thu, 27 Feb 2020 21:23:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C1075246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9947C21FDCF; Thu, 27 Feb 2020 13:21:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 278EF21FA9A for ; Thu, 27 Feb 2020 13:18:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3D647EF5; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3BB9346C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) 
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:13 -0500 Message-Id: <1582838290-17243-86-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 085/622] lnet: sysfs functions for module params X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Allow transaction timeout and retry count module parameters to be set and shown via sysfs. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 5169827bf790 ("LU-9120 lnet: sysfs functions for module params") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32861 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 84 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 77 insertions(+), 7 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index e467d64..38e35bb 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -111,13 +111,27 @@ struct lnet the_lnet = { unsigned int lnet_transaction_timeout = 5; static int transaction_to_set(const char *val, const struct kernel_param *kp); -module_param_call(lnet_transaction_timeout, transaction_to_set, param_get_int, - &lnet_transaction_timeout, 0444); +static struct kernel_param_ops param_ops_transaction_timeout = { + .set = transaction_to_set, + .get = param_get_int, +}; + +#define param_check_transaction_timeout(name, p) \ + __param_check(name, p, int) +module_param(lnet_transaction_timeout, 
transaction_timeout, 0644); MODULE_PARM_DESC(lnet_transaction_timeout, - "Time in seconds to wait for a REPLY or an ACK"); + "Maximum number of seconds to wait for a peer response."); unsigned int lnet_retry_count; -module_param(lnet_retry_count, uint, 0444); +static int retry_count_set(const char *val, const struct kernel_param *kp); +static struct kernel_param_ops param_ops_retry_count = { + .set = retry_count_set, + .get = param_get_int, +}; + +#define param_check_retry_count(name, p) \ + __param_check(name, p, int) +module_param(lnet_retry_count, retry_count, 0644); MODULE_PARM_DESC(lnet_retry_count, "Maximum number of times to retry transmitting a message"); @@ -241,10 +255,15 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (value == 0) { + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + if (value < lnet_retry_count || value == 0) { mutex_unlock(&the_lnet.ln_api_mutex); - CERROR("Invalid value for lnet_transaction_timeout (%lu).\n", - value); + CERROR("Invalid value for lnet_transaction_timeout (%lu). Has to be greater than lnet_retry_count (%u)\n", + value, lnet_retry_count); return -EINVAL; } @@ -254,6 +273,57 @@ static int lnet_discover(struct lnet_process_id id, u32 force, } *transaction_to = value; + if (lnet_retry_count == 0) + lnet_lnd_timeout = value; + else + lnet_lnd_timeout = value / lnet_retry_count; + + mutex_unlock(&the_lnet.ln_api_mutex); + + return 0; +} + +static int +retry_count_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned int *retry_count = (unsigned int *)kp->arg; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_retry_count'\n"); + return rc; + } + + /* The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. 
+ */ + mutex_lock(&the_lnet.ln_api_mutex); + + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + if (value > lnet_transaction_timeout) { + mutex_unlock(&the_lnet.ln_api_mutex); + CERROR("Invalid value for lnet_retry_count (%lu). Has to be smaller than lnet_transaction_timeout (%u)\n", + value, lnet_transaction_timeout); + return -EINVAL; + } + + if (value == *retry_count) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + *retry_count = value; + + if (value == 0) + lnet_lnd_timeout = lnet_transaction_timeout; + else + lnet_lnd_timeout = lnet_transaction_timeout / value; mutex_unlock(&the_lnet.ln_api_mutex); From patchwork Thu Feb 27 21:09:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 221D3159A for ; Thu, 27 Feb 2020 21:22:37 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0AE60246A0 for ; Thu, 27 Feb 2020 21:22:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0AE60246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 16EED21FF2D; Thu, 27 Feb 2020 13:21:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F80421FA53 for ; Thu, 27 Feb 2020 13:18:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3FEF2EF6; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3EBB446D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:14 -0500 Message-Id: <1582838290-17243-87-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 086/622] lnet: timeout delayed REPLYs and ACKs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When a GET, or a PUT that requires an ACK, is sent, add a response tracker block to a percpt queue. When the REPLY/ACK is received, remove the block from the percpt queue. The monitor thread wakes up periodically to check whether any of the blocks have expired; if so, it sends a timeout event to the ULP, flags the MD as stale, and unlinks it.
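The tracking scheme described above can be sketched in plain userspace C. This is a simplified, single-threaded model and not the LNet code: rsp_tracker, rsp_queue, rsp_track(), rsp_answer() and rsp_sweep() are hypothetical names, integer timestamps stand in for ktime_t, and the per-CPT locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of a response tracker; not the LNet types. */
struct rsp_tracker {
	struct rsp_tracker *next;	/* singly linked FIFO, oldest first */
	int64_t deadline;		/* time at which the response is late */
	bool answered;			/* set when the REPLY/ACK arrives */
};

struct rsp_queue {
	struct rsp_tracker *head;
	struct rsp_tracker *tail;
};

/* Track a new request. Appending at the tail keeps the queue ordered by
 * deadline as long as every request uses the same timeout.
 */
static struct rsp_tracker *rsp_track(struct rsp_queue *q, int64_t now,
				     int64_t timeout)
{
	struct rsp_tracker *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->deadline = now + timeout;
	if (q->tail)
		q->tail->next = t;
	else
		q->head = t;
	q->tail = t;
	return t;
}

/* Mark a tracker as answered; the sweep below frees it later. */
static void rsp_answer(struct rsp_tracker *t)
{
	t->answered = true;
}

/* Periodic sweep: free answered trackers, expire overdue ones, and stop
 * at the first tracker that is still within its deadline.
 * Returns the number of trackers that timed out.
 */
static int rsp_sweep(struct rsp_queue *q, int64_t now)
{
	int expired = 0;

	while (q->head) {
		struct rsp_tracker *t = q->head;

		if (!t->answered && now < t->deadline)
			break;
		if (!t->answered)
			expired++;
		q->head = t->next;
		if (!q->head)
			q->tail = NULL;
		free(t);
	}
	return expired;
}
```

In this model a caller would invoke rsp_track() when sending, rsp_answer() when the response arrives, and rsp_sweep() from a periodic thread. Because the sweep stops at the first tracker still within its deadline, answered trackers queued behind it stay allocated until a later pass, which is the same trade-off the patch comments describe.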
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: a57fa1176e74 ("LU-9120 lnet: timeout delayed REPLYs and ACKs") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32771 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 20 ++++ include/linux/lnet/lib-types.h | 20 ++++ net/lnet/lnet/lib-move.c | 210 ++++++++++++++++++++++++++++++++++++++++- net/lnet/lnet/lib-msg.c | 9 ++ 4 files changed, 258 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 5500e3f..c2191e5 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -438,6 +438,25 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, lnet_net_unlock(0); } +static inline struct lnet_rsp_tracker * +lnet_rspt_alloc(int cpt) +{ + struct lnet_rsp_tracker *rspt; + + rspt = kzalloc(sizeof(*rspt), GFP_NOFS); + lnet_net_lock(cpt); + lnet_net_unlock(cpt); + return rspt; +} + +static inline void +lnet_rspt_free(struct lnet_rsp_tracker *rspt, int cpt) +{ + kfree(rspt); + lnet_net_lock(cpt); + lnet_net_unlock(cpt); +} + void lnet_ni_free(struct lnet_ni *ni); void lnet_net_free(struct lnet_net *net); @@ -614,6 +633,7 @@ struct lnet_msg *lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *get_msg); void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg, unsigned int len); +void lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt); void lnet_finalize(struct lnet_msg *msg, int rc); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 1108e3b..d815a87 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -75,6 +75,17 @@ enum lnet_msg_hstatus { LNET_MSG_STATUS_NETWORK_TIMEOUT }; +struct lnet_rsp_tracker { + /* chain on the waiting list */ + struct list_head rspt_on_list; + /* cpt to lock */ + int rspt_cpt; + /* deadline of the REPLY/ACK */ + ktime_t 
rspt_deadline; + /* parent MD */ + struct lnet_handle_md rspt_mdh; +}; + struct lnet_msg { struct list_head msg_activelist; struct list_head msg_list; /* Q for credits/MD */ @@ -201,6 +212,7 @@ struct lnet_libmd { unsigned int md_flags; unsigned int md_niov; /* # frags at end of struct */ void *md_user_ptr; + struct lnet_rsp_tracker *md_rspt_ptr; struct lnet_eq *md_eq; struct lnet_handle_md md_bulk_handle; union { @@ -1102,6 +1114,14 @@ struct lnet { struct list_head ln_mt_localNIRecovq; /* local NIs to recover */ struct list_head ln_mt_peerNIRecovq; + /* + * An array of queues for GET/PUT waiting for REPLY/ACK respectively. + * There are CPT number of queues. Since response trackers will be + * added on the fast path we can't afford to grab the exclusive + * net lock to protect these queues. The CPT will be calculated + * based on the mdh cookie. + */ + struct list_head **ln_mt_rstq; /* recovery eq handler */ struct lnet_handle_eq ln_mt_eqh; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 5224490..55cbf57 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2418,6 +2418,110 @@ struct lnet_mt_event_info { lnet_nid_t mt_nid; }; +void +lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt) +{ + struct lnet_rsp_tracker *rspt; + + /* msg has a refcount on the MD so the MD is not going away. + * The rspt queue for the cpt is protected by + * the lnet_net_lock(cpt). cpt is the cpt of the MD cookie. + */ + lnet_res_lock(cpt); + if (!md->md_rspt_ptr) { + lnet_res_unlock(cpt); + return; + } + rspt = md->md_rspt_ptr; + md->md_rspt_ptr = NULL; + + /* debug code */ + LASSERT(rspt->rspt_cpt == cpt); + + /* invalidate the handle to indicate that a response has been + * received, which will then lead the monitor thread to clean up + * the rspt block. 
+ */ + LNetInvalidateMDHandle(&rspt->rspt_mdh); + lnet_res_unlock(cpt); +} + +static void +lnet_finalize_expired_responses(bool force) +{ + struct lnet_libmd *md; + struct list_head local_queue; + struct lnet_rsp_tracker *rspt, *tmp; + int i; + + if (!the_lnet.ln_mt_rstq) + return; + + cfs_cpt_for_each(i, lnet_cpt_table()) { + INIT_LIST_HEAD(&local_queue); + + lnet_net_lock(i); + if (!the_lnet.ln_mt_rstq[i]) { + lnet_net_unlock(i); + continue; + } + list_splice_init(the_lnet.ln_mt_rstq[i], &local_queue); + lnet_net_unlock(i); + + list_for_each_entry_safe(rspt, tmp, &local_queue, + rspt_on_list) { + /* The rspt mdh will be invalidated when a response + * is received or whenever we want to discard the + * block. The monitor thread will walk the queue + * and clean up any rspts with an invalid mdh. + * The monitor thread will walk the queue until + * the first unexpired rspt block. This means that + * some rspt blocks which received their + * corresponding responses will linger in the + * queue until they are cleaned up eventually.
+ */ + lnet_res_lock(i); + if (LNetMDHandleIsInvalid(rspt->rspt_mdh)) { + lnet_res_unlock(i); + list_del_init(&rspt->rspt_on_list); + lnet_rspt_free(rspt, i); + continue; + } + + if (ktime_compare(ktime_get(), + rspt->rspt_deadline) >= 0 || + force) { + md = lnet_handle2md(&rspt->rspt_mdh); + if (!md) { + LNetInvalidateMDHandle(&rspt->rspt_mdh); + lnet_res_unlock(i); + list_del_init(&rspt->rspt_on_list); + lnet_rspt_free(rspt, i); + continue; + } + LASSERT(md->md_rspt_ptr == rspt); + md->md_rspt_ptr = NULL; + lnet_res_unlock(i); + + list_del_init(&rspt->rspt_on_list); + + CDEBUG(D_NET, + "Response timed out: md = %p\n", md); + LNetMDUnlink(rspt->rspt_mdh); + lnet_rspt_free(rspt, i); + } else { + lnet_res_unlock(i); + break; + } + } + + lnet_net_lock(i); + if (!list_empty(&local_queue)) + list_splice(&local_queue, the_lnet.ln_mt_rstq[i]); + lnet_net_unlock(i); + } +} + static void lnet_resend_pending_msgs_locked(struct list_head *resendq, int cpt) { @@ -2900,6 +3004,8 @@ struct lnet_mt_event_info { static int lnet_monitor_thread(void *arg) { + int wakeup_counter = 0; + /* The monitor thread takes care of the following: * 1. Checks the aliveness of routers * 2. 
Checks if there are messages on the resend queue to resend @@ -2915,6 +3021,12 @@ struct lnet_mt_event_info { lnet_resend_pending_msgs(); + wakeup_counter++; + if (wakeup_counter >= lnet_transaction_timeout / 2) { + lnet_finalize_expired_responses(false); + wakeup_counter = 0; + } + lnet_recover_local_nis(); lnet_recover_peer_nis(); @@ -3095,6 +3207,29 @@ struct lnet_mt_event_info { } } +static int +lnet_rsp_tracker_create(void) +{ + struct list_head **rstqs; + + rstqs = lnet_create_array_of_queues(); + if (!rstqs) + return -ENOMEM; + + the_lnet.ln_mt_rstq = rstqs; + + return 0; +} + +static void +lnet_rsp_tracker_clean(void) +{ + lnet_finalize_expired_responses(true); + + cfs_percpt_free(the_lnet.ln_mt_rstq); + the_lnet.ln_mt_rstq = NULL; +} + int lnet_monitor_thr_start(void) { int rc = 0; @@ -3107,6 +3242,10 @@ int lnet_monitor_thr_start(void) if (rc) return rc; + rc = lnet_rsp_tracker_create(); + if (rc) + goto clean_queues; + rc = LNetEQAlloc(0, lnet_mt_event_handler, &the_lnet.ln_mt_eqh); if (rc != 0) { CERROR("Can't allocate monitor thread EQ: %d\n", rc); @@ -3141,6 +3280,7 @@ int lnet_monitor_thr_start(void) lnet_router_cleanup(); free_mem: the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); @@ -3148,6 +3288,7 @@ int lnet_monitor_thr_start(void) LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); return rc; clean_queues: + lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); @@ -3173,6 +3314,7 @@ void lnet_monitor_thr_stop(void) /* perform cleanup tasks */ lnet_router_cleanup(); + lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); @@ -3917,6 +4059,41 @@ void lnet_monitor_thr_stop(void) } } +static void +lnet_attach_rsp_tracker(struct lnet_rsp_tracker *rspt, int cpt, + struct lnet_libmd *md, struct lnet_handle_md mdh) +{ + s64 
timeout_ns; + + /* MD has a refcount taken by message so it's not going away. + * The MD however can be looked up. We need to secure the access + * to the md_rspt_ptr by taking the res_lock. + * The rspt can be accessed without protection up to when it gets + * added to the list. + */ + + /* debug code */ + LASSERT(!md->md_rspt_ptr); + + /* we'll use that same event in case we never get a response */ + rspt->rspt_mdh = mdh; + rspt->rspt_cpt = cpt; + timeout_ns = lnet_transaction_timeout * NSEC_PER_SEC; + rspt->rspt_deadline = ktime_add_ns(ktime_get(), timeout_ns); + + lnet_res_lock(cpt); + /* store the rspt so we can access it when we get the REPLY */ + md->md_rspt_ptr = rspt; + lnet_res_unlock(cpt); + + /* add to the list of tracked responses. It's added to tail of the + * list in order to expire all the older entries first. + */ + lnet_net_lock(cpt); + list_add_tail(&rspt->rspt_on_list, the_lnet.ln_mt_rstq[cpt]); + lnet_net_unlock(cpt); +} + /** * Initiate an asynchronous PUT operation. 
* @@ -3968,6 +4145,7 @@ void lnet_monitor_thr_stop(void) u64 match_bits, unsigned int offset, u64 hdr_data) { + struct lnet_rsp_tracker *rspt = NULL; struct lnet_msg *msg; struct lnet_libmd *md; int cpt; @@ -3991,6 +4169,17 @@ void lnet_monitor_thr_stop(void) msg->msg_vmflush = !!(current->flags & PF_MEMALLOC); cpt = lnet_cpt_of_cookie(mdh.cookie); + + if (ack == LNET_ACK_REQ) { + rspt = lnet_rspt_alloc(cpt); + if (!rspt) { + CERROR("Dropping PUT to %s: ENOMEM on response tracker\n", + libcfs_id2str(target)); + return -ENOMEM; + } + INIT_LIST_HEAD(&rspt->rspt_on_list); + } + lnet_res_lock(cpt); md = lnet_handle2md(&mdh); @@ -4003,6 +4192,7 @@ void lnet_monitor_thr_stop(void) md->md_me->me_portal); lnet_res_unlock(cpt); + kfree(rspt); kfree(msg); return -ENOENT; } @@ -4035,11 +4225,15 @@ void lnet_monitor_thr_stop(void) lnet_build_msg_event(msg, LNET_EVENT_SEND); + if (ack == LNET_ACK_REQ) + lnet_attach_rsp_tracker(rspt, cpt, md, mdh); + rc = lnet_send(self, msg, LNET_NID_ANY); if (rc) { CNETERR("Error sending PUT to %s: %d\n", libcfs_id2str(target), rc); msg->msg_no_resend = true; + lnet_detach_rsp_tracker(msg->msg_md, cpt); lnet_finalize(msg, rc); } @@ -4180,6 +4374,7 @@ struct lnet_msg * struct lnet_process_id target, unsigned int portal, u64 match_bits, unsigned int offset, bool recovery) { + struct lnet_rsp_tracker *rspt; struct lnet_msg *msg; struct lnet_libmd *md; int cpt; @@ -4201,9 +4396,18 @@ struct lnet_msg * return -ENOMEM; } + cpt = lnet_cpt_of_cookie(mdh.cookie); + + rspt = lnet_rspt_alloc(cpt); + if (!rspt) { + CERROR("Dropping GET to %s: ENOMEM on response tracker\n", + libcfs_id2str(target)); + return -ENOMEM; + } + INIT_LIST_HEAD(&rspt->rspt_on_list); + msg->msg_recovery = recovery; - cpt = lnet_cpt_of_cookie(mdh.cookie); lnet_res_lock(cpt); md = lnet_handle2md(&mdh); @@ -4218,6 +4422,7 @@ struct lnet_msg * lnet_res_unlock(cpt); kfree(msg); + kfree(rspt); return -ENOENT; } @@ -4242,11 +4447,14 @@ struct lnet_msg * lnet_build_msg_event(msg, 
LNET_EVENT_SEND); + lnet_attach_rsp_tracker(rspt, cpt, md, mdh); + rc = lnet_send(self, msg, LNET_NID_ANY); if (rc < 0) { CNETERR("Error sending GET to %s: %d\n", libcfs_id2str(target), rc); msg->msg_no_resend = true; + lnet_detach_rsp_tracker(msg->msg_md, cpt); lnet_finalize(msg, rc); } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 9841e14..5046648 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -777,6 +777,15 @@ msg->msg_ev.status = status; + /* if this is an ACK or a REPLY then make sure to remove the + * response tracker. + */ + if (msg->msg_ev.type == LNET_EVENT_REPLY || + msg->msg_ev.type == LNET_EVENT_ACK) { + cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); + lnet_detach_rsp_tracker(msg->msg_md, cpt); + } + /* if the message is successfully sent, no need to keep the MD around */ if (msg->msg_md && !status) lnet_detach_md(msg, status); From patchwork Thu Feb 27 21:09:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409835 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4351138D for ; Thu, 27 Feb 2020 21:23:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BCF59246A0 for ; Thu, 27 Feb 2020 21:23:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BCF59246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
0786221FE17; Thu, 27 Feb 2020 13:21:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D9D0B21FA5D for ; Thu, 27 Feb 2020 13:18:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 42A8EEF7; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4191D468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:15 -0500 Message-Id: <1582838290-17243-88-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 087/622] lnet: remove duplicate timeout mechanism X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Remove the duplicate GET/PUT timeout mechanism currently implemented for discovery, as it has been replaced by a more generic timeout mechanism for all GET/PUT messages. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 0b1947d14188 ("LU-9120 lnet: remove duplicate timeout mechanism") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32992 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 39 --------------------------------------- 1 file changed, 39 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 4a62f9a..ca9b90b 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2925,25 +2925,6 @@ static int lnet_peer_rediscover(struct lnet_peer *lp) } /* - * Returns the first peer on the ln_dc_working queue if its timeout - * has expired. Takes the current time as an argument so as to not - * obsessively re-check the clock. The oldest discovery request will - * be at the head of the queue. - */ -static struct lnet_peer *lnet_peer_get_dc_timed_out(time64_t now) -{ - struct lnet_peer *lp; - - if (list_empty(&the_lnet.ln_dc_working)) - return NULL; - lp = list_first_entry(&the_lnet.ln_dc_working, - struct lnet_peer, lp_dc_list); - if (now < lp->lp_last_queued + lnet_transaction_timeout) - return NULL; - return lp; -} - -/* * Discovering this peer is taking too long. Cancel any Ping or Push * that discovery is waiting on by unlinking the relevant MDs. The * lnet_discovery_event_handler() will proceed from here and complete @@ -2998,8 +2979,6 @@ static int lnet_peer_discovery_wait_for_work(void) break; if (!list_empty(&the_lnet.ln_msg_resend)) break; - if (lnet_peer_get_dc_timed_out(ktime_get_real_seconds())) - break; lnet_net_unlock(cpt); /* @@ -3068,7 +3047,6 @@ static void lnet_resend_msgs(void) static int lnet_peer_discovery(void *arg) { struct lnet_peer *lp; - time64_t now; int rc; CDEBUG(D_NET, "started\n"); @@ -3159,23 +3137,6 @@ static int lnet_peer_discovery(void *arg) break; } - /* - * Now that the ln_dc_request queue has been emptied - * check the ln_dc_working queue for peers that are - * taking too long. 
Move all that are found to the - * ln_dc_expired queue and time out any pending - * Ping or Push. We have to drop the lnet_net_lock - * in the loop because lnet_peer_cancel_discovery() - * calls LNetMDUnlink(). - */ - now = ktime_get_real_seconds(); - while ((lp = lnet_peer_get_dc_timed_out(now)) != NULL) { - list_move(&lp->lp_dc_list, &the_lnet.ln_dc_expired); - lnet_net_unlock(LNET_LOCK_EX); - lnet_peer_cancel_discovery(lp); - lnet_net_lock(LNET_LOCK_EX); - } - lnet_net_unlock(LNET_LOCK_EX); } From patchwork Thu Feb 27 21:09:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409843 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BA2F138D for ; Thu, 27 Feb 2020 21:23:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 541BF246A0 for ; Thu, 27 Feb 2020 21:23:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 541BF246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 799B9348B06; Thu, 27 Feb 2020 13:21:47 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 27E3621FA9F for ; Thu, 27 Feb 2020 13:18:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 45ED4EF8; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 447A346F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:16 -0500 Message-Id: <1582838290-17243-89-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 088/622] lnet: handle fatal device error X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata The o2iblnd can receive device status events on the QP event handler. This patch handles three of them: IB_EVENT_DEVICE_FATAL, IB_EVENT_PORT_ERR and IB_EVENT_PORT_ACTIVE. For DEVICE_FATAL and PORT_ERR, the NI associated with the QP is put into fatal error mode and is no longer selected when sending messages. When PORT_ACTIVE is received, the fatal error is cleared on the NI associated with the QP, and future messages can use that NI again.
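The flag-and-skip behaviour described above can be illustrated with a small userspace sketch. It is a model under stated assumptions, not the kernel code: ni_model, dev_event, ni_handle_event() and ni_select() are hypothetical names, C11 atomic_int stands in for the kernel's atomic_t, and selection is reduced to "healthiest NI that is not flagged" without the distance, credit and round-robin criteria.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni; only the fields used here. */
struct ni_model {
	atomic_int fatal_error_on;	/* 1 while the device is in a fatal state */
	int healthv;			/* higher means healthier */
};

/* Hypothetical stand-ins for the three IB events named in the patch. */
enum dev_event { DEV_FATAL, PORT_ERR, PORT_ACTIVE };

/* Event handler side: set the flag on a fatal event, clear it on recovery. */
static void ni_handle_event(struct ni_model *ni, enum dev_event ev)
{
	switch (ev) {
	case DEV_FATAL:
	case PORT_ERR:
		atomic_store(&ni->fatal_error_on, 1);
		break;
	case PORT_ACTIVE:
		atomic_store(&ni->fatal_error_on, 0);
		break;
	}
}

/* Send side: pick the healthiest NI that is not flagged; NULL if none. */
static struct ni_model *ni_select(struct ni_model *nis, size_t count)
{
	struct ni_model *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		if (atomic_load(&nis[i].fatal_error_on))
			continue;
		if (!best || nis[i].healthv > best->healthv)
			best = &nis[i];
	}
	return best;
}
```

Checking the flag before any other criterion means a flagged NI is never chosen, however healthy it otherwise looks, and it becomes eligible again as soon as the flag is cleared.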
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 6b1571209a99 ("LU-9120 lnet: handle fatal device error") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32772 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 7 +++++++ net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 13 +++++++++++++ net/lnet/lnet/lib-move.c | 6 +++++- 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index d815a87..2b3e76a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -443,6 +443,13 @@ struct lnet_ni { atomic_t ni_healthv; /* + * Set to 1 by the LND when it receives an event telling it the device + * has gone into a fatal state. Set to 0 when the LND receives an + * event telling it the device is back online. + */ + atomic_t ni_fatal_error_on; + + /* * equivalent interfaces to use * This is an array because socklnd bonding can still be configured */ diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index c6e8e73..293a859 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -3567,6 +3567,19 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, rdma_notify(conn->ibc_cmid, IB_EVENT_COMM_EST); return; + case IB_EVENT_PORT_ERR: + case IB_EVENT_DEVICE_FATAL: + CERROR("Fatal device error for NI %s\n", + libcfs_nid2str(conn->ibc_peer->ibp_ni->ni_nid)); + atomic_set(&conn->ibc_peer->ibp_ni->ni_fatal_error_on, 1); + return; + + case IB_EVENT_PORT_ACTIVE: + CERROR("Port reactivated for NI %s\n", + libcfs_nid2str(conn->ibc_peer->ibp_ni->ni_nid)); + atomic_set(&conn->ibc_peer->ibp_ni->ni_fatal_error_on, 0); + return; + default: CERROR("%s: Async QP event type %d\n", libcfs_nid2str(conn->ibc_peer->ibp_nid), event->event); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 55cbf57..8d5f1e5 100644 --- 
a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1303,9 +1303,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, unsigned int distance; int ni_credits; int ni_healthv; + int ni_fatal; ni_credits = atomic_read(&ni->ni_tx_credits); ni_healthv = atomic_read(&ni->ni_healthv); + ni_fatal = atomic_read(&ni->ni_fatal_error_on); /* * calculate the distance from the CPT on which @@ -1334,7 +1336,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * Select on health, shorter distance, available * credits, then round-robin. */ - if (ni_healthv < best_healthv) { + if (ni_fatal) { + continue; + } else if (ni_healthv < best_healthv) { continue; } else if (ni_healthv > best_healthv) { best_healthv = ni_healthv; From patchwork Thu Feb 27 21:09:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409837 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B71A6138D for ; Thu, 27 Feb 2020 21:23:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9F991246A0 for ; Thu, 27 Feb 2020 21:23:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9F991246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88500348AD7; Thu, 27 Feb 2020 13:21:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7E27421FACC for ; Thu, 27 Feb 2020 13:18:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 48C73EF9; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 474DA46A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:17 -0500 Message-Id: <1582838290-17243-90-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 089/622] lnet: reset health value X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add an ioctl to set the local or peer NI health value. This is useful for debugging, where the selection algorithm and recovery mechanism can be tested by reducing the health of an interface. If the value specified is -1, the health value is reset to the maximum. This is useful for resetting the system once a network issue has been resolved: rather than waiting for the interface to become fully healthy on its own, the process can be shortcut.
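As a minimal sketch of the value handling this message describes, the following user-space C mirrors the range check done in the ioctl handler of this patch: a negative or too-large request is treated as "reset to maximum". The names `model_normalize_healthv()` and `model_needs_recovery()` are hypothetical, and the ceiling of 1000 is an assumption standing in for `LNET_MAX_HEALTH_VALUE`.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ceiling for this sketch; the patch calls it LNET_MAX_HEALTH_VALUE. */
#define MODEL_MAX_HEALTH_VALUE 1000

/* Map a requested value onto the valid range: out-of-range means "maximum". */
int model_normalize_healthv(int requested)
{
	if (requested < 0 || requested > MODEL_MAX_HEALTH_VALUE)
		return MODEL_MAX_HEALTH_VALUE;
	return requested;
}

/* An interface is queued for recovery only if it is not already queued
 * and its health value is below the maximum.
 */
bool model_needs_recovery(int healthv, bool already_queued)
{
	assert(healthv >= 0 && healthv <= MODEL_MAX_HEALTH_VALUE);
	return !already_queued && healthv < MODEL_MAX_HEALTH_VALUE;
}
```

With this shape, a request of -1 yields the maximum and never triggers recovery, while any lower in-range value sets the health and makes the interface a recovery candidate.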
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 2f5a6d1233ac ("LU-9120 lnet: reset health value") Lustre-commit: b04c35874dca ("LU-11283 lnet: fix setting health value manually") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32773 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ include/uapi/linux/lnet/libcfs_ioctl.h | 3 +- include/uapi/linux/lnet/lnet-dlc.h | 14 ++++++++ net/lnet/lnet/api-ni.c | 51 +++++++++++++++++++++++++++ net/lnet/lnet/lib-msg.c | 16 +-------- net/lnet/lnet/peer.c | 64 ++++++++++++++++++++++++++++++++++ 6 files changed, 134 insertions(+), 16 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index c2191e5..bd6ea90 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -524,6 +524,8 @@ struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *lnet_get_ni_idx_locked(int idx); int lnet_get_peer_list(u32 *countp, u32 *sizep, struct lnet_process_id __user *ids); +extern void lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all); +extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni); void lnet_router_debugfs_init(void); void lnet_router_debugfs_fini(void); diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 4396d26..458a634 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -148,6 +148,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_NUMA_RANGE _IOWR(IOC_LIBCFS_TYPE, 99, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_PEER_LIST _IOWR(IOC_LIBCFS_TYPE, 100, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS _IOWR(IOC_LIBCFS_TYPE, 101, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 101 +#define IOC_LIBCFS_SET_HEALHV _IOWR(IOC_LIBCFS_TYPE, 102, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 102 #endif /* 
__LIBCFS_IOCTL_H__ */ diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 484435d..2d3aad8 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -230,6 +230,20 @@ struct lnet_ioctl_peer_cfg { void __user *prcfg_bulk; }; + +enum lnet_health_type { + LNET_HEALTH_TYPE_LOCAL_NI = 0, + LNET_HEALTH_TYPE_PEER_NI, +}; + +struct lnet_ioctl_reset_health_cfg { + struct libcfs_ioctl_hdr rh_hdr; + enum lnet_health_type rh_type; + bool rh_all; + int rh_value; + lnet_nid_t rh_nid; +}; + struct lnet_ioctl_set_value { struct libcfs_ioctl_hdr sv_hdr; __u32 sv_value; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 38e35bb..0cadb2a 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3163,6 +3163,35 @@ u32 lnet_get_dlc_seq_locked(void) return atomic_read(&lnet_dlc_seq_no); } +static void +lnet_ni_set_healthv(lnet_nid_t nid, int value, bool all) +{ + struct lnet_net *net; + struct lnet_ni *ni; + + lnet_net_lock(LNET_LOCK_EX); + list_for_each_entry(net, &the_lnet.ln_nets, net_list) { + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { + if (ni->ni_nid == nid || all) { + atomic_set(&ni->ni_healthv, value); + if (list_empty(&ni->ni_recovery) && + value < LNET_MAX_HEALTH_VALUE) { + CERROR("manually adding local NI %s to recovery\n", + libcfs_nid2str(ni->ni_nid)); + list_add_tail(&ni->ni_recovery, + &the_lnet.ln_mt_localNIRecovq); + lnet_ni_addref_locked(ni, 0); + } + if (!all) { + lnet_net_unlock(LNET_LOCK_EX); + return; + } + } + } + } + lnet_net_unlock(LNET_LOCK_EX); +} + /** * LNet ioctl handler. 
* @@ -3446,6 +3475,28 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_SET_HEALHV: { + struct lnet_ioctl_reset_health_cfg *cfg = arg; + int value; + + if (cfg->rh_hdr.ioc_len < sizeof(*cfg)) + return -EINVAL; + if (cfg->rh_value < 0 || + cfg->rh_value > LNET_MAX_HEALTH_VALUE) + value = LNET_MAX_HEALTH_VALUE; + else + value = cfg->rh_value; + mutex_lock(&the_lnet.ln_api_mutex); + if (cfg->rh_type == LNET_HEALTH_TYPE_LOCAL_NI) + lnet_ni_set_healthv(cfg->rh_nid, value, + cfg->rh_all); + else + lnet_peer_ni_set_healthv(cfg->rh_nid, value, + cfg->rh_all); + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + case IOC_LIBCFS_NOTIFY_ROUTER: { time64_t deadline = ktime_get_real_seconds() - data->ioc_u64[0]; diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 5046648..32d49e9 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -530,12 +530,6 @@ return; lnet_net_lock(0); - /* the mt could've shutdown and cleaned up the queues */ - if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { - lnet_net_unlock(0); - return; - } - lnet_dec_healthv_locked(&lpni->lpni_healthv); /* add the peer NI to the recovery queue if it's not already there * and it's health value is actually below the maximum. It's @@ -543,15 +537,7 @@ * value will not be reduced. In this case, there is no reason to * invoke recovery */ - if (list_empty(&lpni->lpni_recovery) && - atomic_read(&lpni->lpni_healthv) < LNET_MAX_HEALTH_VALUE) { - CERROR("lpni %s added to recovery queue. 
Health = %d\n", - libcfs_nid2str(lpni->lpni_nid), - atomic_read(&lpni->lpni_healthv)); - list_add_tail(&lpni->lpni_recovery, - &the_lnet.ln_mt_peerNIRecovq); - lnet_peer_ni_addref_locked(lpni); - } + lnet_peer_ni_add_to_recoveryq_locked(lpni); lnet_net_unlock(0); } diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index ca9b90b..9dbb3bd4 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3437,3 +3437,67 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) out: return rc; } + +void +lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni) +{ + /* the mt could've shutdown and cleaned up the queues */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) + return; + + if (list_empty(&lpni->lpni_recovery) && + atomic_read(&lpni->lpni_healthv) < LNET_MAX_HEALTH_VALUE) { + CERROR("lpni %s added to recovery queue. Health = %d\n", + libcfs_nid2str(lpni->lpni_nid), + atomic_read(&lpni->lpni_healthv)); + list_add_tail(&lpni->lpni_recovery, + &the_lnet.ln_mt_peerNIRecovq); + lnet_peer_ni_addref_locked(lpni); + } +} + +/* Call with the ln_api_mutex held */ +void +lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all) +{ + struct lnet_peer_table *ptable; + struct lnet_peer *lp; + struct lnet_peer_net *lpn; + struct lnet_peer_ni *lpni; + int lncpt; + int cpt; + + if (the_lnet.ln_state != LNET_STATE_RUNNING) + return; + + if (!all) { + lnet_net_lock(LNET_LOCK_EX); + lpni = lnet_find_peer_ni_locked(nid); + atomic_set(&lpni->lpni_healthv, value); + lnet_peer_ni_add_to_recoveryq_locked(lpni); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(LNET_LOCK_EX); + return; + } + + lncpt = cfs_percpt_number(the_lnet.ln_peer_tables); + + /* Walk all the peers and reset the healhv for each one to the + * maximum value. 
+ */ + lnet_net_lock(LNET_LOCK_EX); + for (cpt = 0; cpt < lncpt; cpt++) { + ptable = the_lnet.ln_peer_tables[cpt]; + list_for_each_entry(lp, &ptable->pt_peer_list, lp_peer_list) { + list_for_each_entry(lpn, &lp->lp_peer_nets, + lpn_peer_nets) { + list_for_each_entry(lpni, &lpn->lpn_peer_nis, + lpni_peer_nis) { + atomic_set(&lpni->lpni_healthv, value); + lnet_peer_ni_add_to_recoveryq_locked(lpni); + } + } + } + } + lnet_net_unlock(LNET_LOCK_EX); +} From patchwork Thu Feb 27 21:09:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409801 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E35F814BC for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB705246A1 for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB705246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E0E69348910; Thu, 27 Feb 2020 13:21:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D76AD21FA63 for ; Thu, 27 Feb 2020 13:18:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with 
ESMTP id 4C0A7EFA; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4A2BC46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:18 -0500 Message-Id: <1582838290-17243-91-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 090/622] lnet: add health statistics X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add a health statistics block for each local and peer NI. 
These statistics will be incremented when processing errors reported by lnet_finalize() WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 67908ab34371 ("LU-9120 lnet: add health statistics") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32775 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 18 +++++++++++++++ net/lnet/lnet/lib-msg.c | 52 ++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 68 insertions(+), 2 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 2b3e76a..e5d4128 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -338,6 +338,22 @@ struct lnet_element_stats { struct lnet_comm_count el_drop_stats; }; +struct lnet_health_local_stats { + atomic_t hlt_local_interrupt; + atomic_t hlt_local_dropped; + atomic_t hlt_local_aborted; + atomic_t hlt_local_no_route; + atomic_t hlt_local_timeout; + atomic_t hlt_local_error; +}; + +struct lnet_health_remote_stats { + atomic_t hlt_remote_dropped; + atomic_t hlt_remote_timeout; + atomic_t hlt_remote_error; + atomic_t hlt_network_timeout; +}; + struct lnet_net { /* chain on the ln_nets */ struct list_head net_list; @@ -426,6 +442,7 @@ struct lnet_ni { /* NI statistics */ struct lnet_element_stats ni_stats; + struct lnet_health_local_stats ni_hstats; /* physical device CPT */ int ni_dev_cpt; @@ -511,6 +528,7 @@ struct lnet_peer_ni { struct list_head lpni_rtr_list; /* statistics kept on each peer NI */ struct lnet_element_stats lpni_stats; + struct lnet_health_remote_stats lpni_hstats; /* spin lock protecting credits and lpni_txq / lpni_rtrq */ spinlock_t lpni_lock; /* # tx credits available */ diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 32d49e9..dc51a17 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -541,6 +541,54 @@ lnet_net_unlock(0); } +static void +lnet_incr_hstats(struct lnet_msg 
*msg, enum lnet_msg_hstatus hstatus) +{ + struct lnet_ni *ni = msg->msg_txni; + struct lnet_peer_ni *lpni = msg->msg_txpeer; + + switch (hstatus) { + case LNET_MSG_STATUS_LOCAL_INTERRUPT: + atomic_inc(&ni->ni_hstats.hlt_local_interrupt); + break; + case LNET_MSG_STATUS_LOCAL_DROPPED: + atomic_inc(&ni->ni_hstats.hlt_local_dropped); + break; + case LNET_MSG_STATUS_LOCAL_ABORTED: + atomic_inc(&ni->ni_hstats.hlt_local_aborted); + break; + case LNET_MSG_STATUS_LOCAL_NO_ROUTE: + atomic_inc(&ni->ni_hstats.hlt_local_no_route); + break; + case LNET_MSG_STATUS_LOCAL_TIMEOUT: + atomic_inc(&ni->ni_hstats.hlt_local_timeout); + break; + case LNET_MSG_STATUS_LOCAL_ERROR: + atomic_inc(&ni->ni_hstats.hlt_local_error); + break; + case LNET_MSG_STATUS_REMOTE_DROPPED: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_remote_dropped); + break; + case LNET_MSG_STATUS_REMOTE_ERROR: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_remote_error); + break; + case LNET_MSG_STATUS_REMOTE_TIMEOUT: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_remote_timeout); + break; + case LNET_MSG_STATUS_NETWORK_TIMEOUT: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_network_timeout); + break; + case LNET_MSG_STATUS_OK: + break; + default: + LBUG(); + } +} + /* Do a health check on the message: * return -1 if we're not going to handle the error or * if we've reached the maximum number of retries. 
@@ -553,8 +601,6 @@ enum lnet_msg_hstatus hstatus = msg->msg_health_status; bool lo = false; - /* TODO: lnet_incr_hstats(hstatus); */ - LASSERT(msg->msg_txni); /* if we're sending to the LOLND then the msg_txpeer will not be @@ -565,6 +611,8 @@ else lo = true; + lnet_incr_hstats(msg, hstatus); + if (hstatus != LNET_MSG_STATUS_OK && ktime_compare(ktime_get(), msg->msg_deadline) >= 0) return -1; From patchwork Thu Feb 27 21:09:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409841 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 38DEC14BC for ; Thu, 27 Feb 2020 21:23:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 20966246A0 for ; Thu, 27 Feb 2020 21:23:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 20966246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 74056348AFD; Thu, 27 Feb 2020 13:21:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3B1DE21FA63 for ; Thu, 27 Feb 2020 13:18:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4E806EFB; Thu, 27 Feb 2020 16:18:14 -0500 (EST) 
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4D4A7468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:19 -0500 Message-Id: <1582838290-17243-92-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 091/622] lnet: Add ioctl to get health stats X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata At the time of this patch, the sysfs statistics feature is still in development. Therefore, use an ioctl to get the health stats from LNet.
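The idea of returning health statistics through an ioctl-style reply can be sketched in user space as copying live atomic counters into a plain-integer structure. The types `model_hstats` and `model_hstats_reply` and the function `model_get_hstats()` are hypothetical names chosen for illustration; they only loosely follow the fields added by this series and use C11 atomics instead of the kernel's `atomic_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Live counters, possibly updated concurrently by other threads. */
struct model_hstats {
	atomic_uint local_dropped;
	atomic_uint local_timeout;
	atomic_uint local_error;
};

/* Plain-integer reply handed back to the caller, as an ioctl payload would be. */
struct model_hstats_reply {
	uint32_t local_dropped;
	uint32_t local_timeout;
	uint32_t local_error;
	int32_t health_value;
};

/* Read each counter with an atomic load. Returns 0, or -1 on a NULL argument.
 * Each field is read individually, so the reply is not a single consistent
 * snapshot across all counters.
 */
int model_get_hstats(struct model_hstats *live, int healthv,
		     struct model_hstats_reply *out)
{
	if (live == NULL || out == NULL)
		return -1;

	out->local_dropped = atomic_load(&live->local_dropped);
	out->local_timeout = atomic_load(&live->local_timeout);
	out->local_error = atomic_load(&live->local_error);
	out->health_value = healthv;
	return 0;
}
```

Keeping the reply structure free of atomic types means it can be copied to the caller as raw bytes, which is the property an ioctl payload needs.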
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 10958cac798d ("LU-9120 lnet: Add ioctl to get health stats") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32776 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + include/uapi/linux/lnet/libcfs_ioctl.h | 3 ++- include/uapi/linux/lnet/lnet-dlc.h | 31 ++++++++++++++++----- net/lnet/lnet/api-ni.c | 49 ++++++++++++++++++++++++++++++++++ net/lnet/lnet/peer.c | 29 ++++++++++++++++---- 5 files changed, 101 insertions(+), 12 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index bd6ea90..ba237df 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -823,6 +823,7 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, u32 *ni_peer_tx_credits, u32 *peer_tx_credits, u32 *peer_rtr_credits, u32 *peer_min_rtr_credtis, u32 *peer_tx_qnob); +int lnet_get_peer_ni_hstats(struct lnet_ioctl_peer_ni_hstats *stats); static inline bool lnet_is_peer_ni_healthy_locked(struct lnet_peer_ni *lpni) diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 458a634..683d508 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -149,6 +149,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_PEER_LIST _IOWR(IOC_LIBCFS_TYPE, 100, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS _IOWR(IOC_LIBCFS_TYPE, 101, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_SET_HEALHV _IOWR(IOC_LIBCFS_TYPE, 102, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 102 +#define IOC_LIBCFS_GET_LOCAL_HSTATS _IOWR(IOC_LIBCFS_TYPE, 103, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 103 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 2d3aad8..8e9850c 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ 
-163,6 +163,31 @@ struct lnet_ioctl_element_stats { __u32 iel_drop_count; }; +enum lnet_health_type { + LNET_HEALTH_TYPE_LOCAL_NI = 0, + LNET_HEALTH_TYPE_PEER_NI, +}; + +struct lnet_ioctl_local_ni_hstats { + struct libcfs_ioctl_hdr hlni_hdr; + lnet_nid_t hlni_nid; + __u32 hlni_local_interrupt; + __u32 hlni_local_dropped; + __u32 hlni_local_aborted; + __u32 hlni_local_no_route; + __u32 hlni_local_timeout; + __u32 hlni_local_error; + __s32 hlni_health_value; +}; + +struct lnet_ioctl_peer_ni_hstats { + __u32 hlpni_remote_dropped; + __u32 hlpni_remote_timeout; + __u32 hlpni_remote_error; + __u32 hlpni_network_timeout; + __s32 hlpni_health_value; +}; + struct lnet_ioctl_element_msg_stats { struct libcfs_ioctl_hdr im_hdr; __u32 im_idx; @@ -230,12 +255,6 @@ struct lnet_ioctl_peer_cfg { void __user *prcfg_bulk; }; - -enum lnet_health_type { - LNET_HEALTH_TYPE_LOCAL_NI = 0, - LNET_HEALTH_TYPE_PEER_NI, -}; - struct lnet_ioctl_reset_health_cfg { struct libcfs_ioctl_hdr rh_hdr; enum lnet_health_type rh_type; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 0cadb2a..14a8f2c 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3192,6 +3192,42 @@ u32 lnet_get_dlc_seq_locked(void) lnet_net_unlock(LNET_LOCK_EX); } +static int +lnet_get_local_ni_hstats(struct lnet_ioctl_local_ni_hstats *stats) +{ + int cpt, rc = 0; + struct lnet_ni *ni; + lnet_nid_t nid = stats->hlni_nid; + + cpt = lnet_net_lock_current(); + ni = lnet_nid2ni_locked(nid, cpt); + + if (!ni) { + rc = -ENOENT; + goto unlock; + } + + stats->hlni_local_interrupt = + atomic_read(&ni->ni_hstats.hlt_local_interrupt); + stats->hlni_local_dropped = + atomic_read(&ni->ni_hstats.hlt_local_dropped); + stats->hlni_local_aborted = + atomic_read(&ni->ni_hstats.hlt_local_aborted); + stats->hlni_local_no_route = + atomic_read(&ni->ni_hstats.hlt_local_no_route); + stats->hlni_local_timeout = + atomic_read(&ni->ni_hstats.hlt_local_timeout); + stats->hlni_local_error = + 
atomic_read(&ni->ni_hstats.hlt_local_error); + stats->hlni_health_value = + atomic_read(&ni->ni_healthv); + +unlock: + lnet_net_unlock(cpt); + + return rc; +} + /** * LNet ioctl handler. * @@ -3399,6 +3435,19 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_GET_LOCAL_HSTATS: { + struct lnet_ioctl_local_ni_hstats *stats = arg; + + if (stats->hlni_hdr.ioc_len < sizeof(*stats)) + return -EINVAL; + + mutex_lock(&the_lnet.ln_api_mutex); + rc = lnet_get_local_ni_hstats(stats); + mutex_unlock(&the_lnet.ln_api_mutex); + + return rc; + } + case IOC_LIBCFS_ADD_PEER_NI: { struct lnet_ioctl_peer_cfg *cfg = arg; diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 9dbb3bd4..4a38ca6 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3339,6 +3339,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) { struct lnet_ioctl_element_stats *lpni_stats; struct lnet_ioctl_element_msg_stats *lpni_msg_stats; + struct lnet_ioctl_peer_ni_hstats *lpni_hstats; struct lnet_peer_ni_credit_info *lpni_info; struct lnet_peer_ni *lpni; struct lnet_peer *lp; @@ -3354,7 +3355,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) } size = sizeof(nid) + sizeof(*lpni_info) + sizeof(*lpni_stats) + - sizeof(*lpni_msg_stats); + sizeof(*lpni_msg_stats) + sizeof(*lpni_hstats); size *= lp->lp_nnis; if (size > cfg->prcfg_size) { cfg->prcfg_size = size; @@ -3380,6 +3381,9 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lpni_msg_stats = kzalloc(sizeof(*lpni_msg_stats), GFP_KERNEL); if (!lpni_msg_stats) goto out_free_stats; + lpni_hstats = kzalloc(sizeof(*lpni_hstats), GFP_NOFS); + if (!lpni_hstats) + goto out_free_msg_stats; lpni = NULL; @@ -3387,7 +3391,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { nid = lpni->lpni_nid; if (copy_to_user(bulk, &nid, sizeof(nid))) - goto 
out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(nid); memset(lpni_info, 0, sizeof(*lpni_info)); @@ -3406,7 +3410,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lpni_info->cr_peer_min_tx_credits = lpni->lpni_mintxcredits; lpni_info->cr_peer_tx_qnob = lpni->lpni_txqnob; if (copy_to_user(bulk, lpni_info, sizeof(*lpni_info))) - goto out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(*lpni_info); memset(lpni_stats, 0, sizeof(*lpni_stats)); @@ -3417,15 +3421,30 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lpni_stats->iel_drop_count = lnet_sum_stats(&lpni->lpni_stats, LNET_STATS_TYPE_DROP); if (copy_to_user(bulk, lpni_stats, sizeof(*lpni_stats))) - goto out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(*lpni_stats); lnet_usr_translate_stats(lpni_msg_stats, &lpni->lpni_stats); if (copy_to_user(bulk, lpni_msg_stats, sizeof(*lpni_msg_stats))) - goto out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(*lpni_msg_stats); + lpni_hstats->hlpni_network_timeout = + atomic_read(&lpni->lpni_hstats.hlt_network_timeout); + lpni_hstats->hlpni_remote_dropped = + atomic_read(&lpni->lpni_hstats.hlt_remote_dropped); + lpni_hstats->hlpni_remote_timeout = + atomic_read(&lpni->lpni_hstats.hlt_remote_timeout); + lpni_hstats->hlpni_remote_error = + atomic_read(&lpni->lpni_hstats.hlt_remote_error); + lpni_hstats->hlpni_health_value = + atomic_read(&lpni->lpni_healthv); + if (copy_to_user(bulk, lpni_hstats, sizeof(*lpni_hstats))) + goto out_free_hstats; + bulk += sizeof(*lpni_hstats); } rc = 0; +out_free_hstats: + kfree(lpni_hstats); out_free_msg_stats: kfree(lpni_msg_stats); out_free_stats: From patchwork Thu Feb 27 21:09:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409845 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7FE5414BC for ; Thu, 27 Feb 2020 21:23:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 68F3D246A0 for ; Thu, 27 Feb 2020 21:23:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 68F3D246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 547BD21FED4; Thu, 27 Feb 2020 13:21:51 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 916AE21FA63 for ; Thu, 27 Feb 2020 13:18:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 52C78EFC; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5043746D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:20 -0500 Message-Id: <1582838290-17243-93-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 092/622] lnet: remove obsolete health functions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Removed obsolete health functions that were originally added during the Multi-Rail project. Some assumptions were made about the health implementation back then, that are no longer true. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: ba05b3a98a0c ("LU-9120 lnet: remove obsolete health functions") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32862 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 40 ---------------------------------------- net/lnet/lnet/api-ni.c | 9 --------- net/lnet/lnet/lib-move.c | 6 ------ net/lnet/lnet/peer.c | 8 -------- 4 files changed, 63 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index ba237df..74660d3 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -494,7 +494,6 @@ struct lnet_ni * struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid); struct lnet_ni *lnet_net2ni_locked(u32 net, int cpt); struct lnet_ni *lnet_net2ni_addref(u32 net); -bool lnet_is_ni_healthy_locked(struct lnet_ni *ni); struct lnet_net *lnet_get_net_locked(u32 net_id); extern unsigned int lnet_transaction_timeout; @@ -825,45 +824,6 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, u32 *peer_tx_qnob); int lnet_get_peer_ni_hstats(struct lnet_ioctl_peer_ni_hstats *stats); -static inline bool -lnet_is_peer_ni_healthy_locked(struct lnet_peer_ni *lpni) -{ - return lpni->lpni_healthy; -} - -static inline void -lnet_set_peer_ni_health_locked(struct lnet_peer_ni *lpni, bool health) -{ - lpni->lpni_healthy = health; -} - -static inline bool -lnet_is_peer_net_healthy_locked(struct lnet_peer_net *peer_net) -{ - struct lnet_peer_ni *lpni; - - list_for_each_entry(lpni, 
&peer_net->lpn_peer_nis, - lpni_peer_nis) { - if (lnet_is_peer_ni_healthy_locked(lpni)) - return true; - } - - return false; -} - -static inline bool -lnet_is_peer_healthy_locked(struct lnet_peer *peer) -{ - struct lnet_peer_net *peer_net; - - list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { - if (lnet_is_peer_net_healthy_locked(peer_net)) - return true; - } - - return false; -} - static inline struct lnet_peer_net * lnet_find_peer_net_locked(struct lnet_peer *peer, u32 net_id) { diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 14a8f2c..1ee24c7 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1155,15 +1155,6 @@ struct lnet_net * return !!net; } -bool -lnet_is_ni_healthy_locked(struct lnet_ni *ni) -{ - if (ni->ni_state & LNET_NI_STATE_ACTIVE) - return true; - - return false; -} - struct lnet_ni * lnet_nid2ni_locked(lnet_nid_t nid, int cpt) { diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 8d5f1e5..c33cf8d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2323,12 +2323,6 @@ struct lnet_ni * } lnet_peer_ni_decref_locked(lpni); - /* If peer is not healthy then can not send anything to it */ - if (!lnet_is_peer_healthy_locked(peer)) { - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } - /* Identify the different send cases */ if (src_nid == LNET_NID_ANY) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 4a38ca6..b20230b 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -135,7 +135,6 @@ lpni->lpni_nid = nid; lpni->lpni_cpt = cpt; atomic_set(&lpni->lpni_healthv, LNET_MAX_HEALTH_VALUE); - lnet_set_peer_ni_health_locked(lpni, true); net = lnet_get_net_locked(LNET_NIDNET(nid)); lpni->lpni_net = net; @@ -2694,8 +2693,6 @@ static lnet_nid_t lnet_peer_select_nid(struct lnet_peer *lp) /* Look for a direct-connected NID for this peer. 
*/ lpni = NULL; while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { - if (!lnet_is_peer_ni_healthy_locked(lpni)) - continue; if (!lnet_get_net_locked(lpni->lpni_peer_net->lpn_net_id)) continue; break; @@ -2706,8 +2703,6 @@ static lnet_nid_t lnet_peer_select_nid(struct lnet_peer *lp) /* Look for a routed-connected NID for this peer. */ lpni = NULL; while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { - if (!lnet_is_peer_ni_healthy_locked(lpni)) - continue; if (!lnet_find_rnet_locked(lpni->lpni_peer_net->lpn_net_id)) continue; break; @@ -3082,9 +3077,6 @@ static int lnet_peer_discovery(void *arg) * forever, in case the GET message (for ping) * doesn't get a REPLY or the PUT message (for * push) doesn't get an ACK. - * - * TODO: LNet Health will deal with this scenario - * in a generic way. */ lp->lp_last_queued = ktime_get_real_seconds(); lnet_net_unlock(LNET_LOCK_EX); From patchwork Thu Feb 27 21:09:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409847 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B053114BC for ; Thu, 27 Feb 2020 21:23:55 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 99043246A0 for ; Thu, 27 Feb 2020 21:23:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 99043246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 45F5D348B2A; Thu, 27 Feb 2020 13:21:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E764121FAD6 for ; Thu, 27 Feb 2020 13:18:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 54E48EFD; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 535AD46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:21 -0500 Message-Id: <1582838290-17243-94-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 093/622] lnet: set health value from user space X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata

Log debugging information when the health value is set manually through the ioctl. Also check whether a peer is returned by lnet_find_peer_ni_locked() when lnet_get_peer_info() is called, and return early if it is not. The missing check was discovered when the userland tools were updated to set the health value.
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: c0ad398fd716 ("LU-9120 lnet: set health value from user space") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32863 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 6 ++++++ net/lnet/lnet/peer.c | 4 ++++ 2 files changed, 10 insertions(+) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 1ee24c7..82703dd 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3526,6 +3526,12 @@ u32 lnet_get_dlc_seq_locked(void) value = LNET_MAX_HEALTH_VALUE; else value = cfg->rh_value; + CDEBUG(D_NET, + "Manually setting healthv to %d for %s:%s. all = %d\n", + value, + (cfg->rh_type == LNET_HEALTH_TYPE_LOCAL_NI) ? + "local" : "peer", + libcfs_nid2str(cfg->rh_nid), cfg->rh_all); mutex_lock(&the_lnet.ln_api_mutex); if (cfg->rh_type == LNET_HEALTH_TYPE_LOCAL_NI) lnet_ni_set_healthv(cfg->rh_nid, value, diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index b20230b..2fc5dfc 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3484,6 +3484,10 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) if (!all) { lnet_net_lock(LNET_LOCK_EX); lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + lnet_net_unlock(LNET_LOCK_EX); + return; + } atomic_set(&lpni->lpni_healthv, value); lnet_peer_ni_add_to_recoveryq_locked(lpni); lnet_peer_ni_decref_locked(lpni); From patchwork Thu Feb 27 21:09:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410251 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D8C1892A for ; Thu, 27 Feb 2020 21:33:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C14E024677 for ; Thu, 27 Feb 2020 21:33:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C14E024677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 23E82349D36; Thu, 27 Feb 2020 13:28:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3616A21FB39 for ; Thu, 27 Feb 2020 13:18:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 57D69EFE; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5698B46A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:22 -0500 Message-Id: <1582838290-17243-95-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 094/622] lnet: add global health statistics X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata

Add global health statistics. They can be printed from lnetctl with:

    lnetctl stats show

lnet_selftest passes the statistics block over the wire. This unfortunately creates an unnecessary backwards-compatibility dependency for lnet_selftest, which shouldn't be there. This patch breaks that backwards compatibility, which means lnet_selftest will not work with older selftest modules.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 15020fd977af ("LU-9120 lnet: add global health statistics") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32949 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ include/uapi/linux/lnet/lnet-types.h | 13 +++++++++++++ net/lnet/lnet/api-ni.c | 13 +++++++++++++ net/lnet/lnet/lib-move.c | 11 +++++++++++ net/lnet/lnet/lib-msg.c | 28 +++++++++++++++++++++++----- 5 files changed, 62 insertions(+), 5 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 74660d3..e4d9ccc 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -445,6 +445,7 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, rspt = kzalloc(sizeof(*rspt), GFP_NOFS); lnet_net_lock(cpt); + the_lnet.ln_counters[cpt]->rst_alloc++; lnet_net_unlock(cpt); return rspt; } @@ -454,6 +455,7 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, { kfree(rspt); lnet_net_lock(cpt); + the_lnet.ln_counters[cpt]->rst_alloc--; lnet_net_unlock(cpt); } diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h index 2afdd83..1da72c4 100644 --- a/include/uapi/linux/lnet/lnet-types.h +++ b/include/uapi/linux/lnet/lnet-types.h @@ -278,11 +278,24 @@ struct lnet_ping_info {
struct lnet_counters { __u32 msgs_alloc; __u32 msgs_max; + __u32 rst_alloc; __u32 errors; __u32 send_count; __u32 recv_count; __u32 route_count; __u32 drop_count; + __u32 resend_count; + __u32 response_timeout_count; + __u32 local_interrupt_count; + __u32 local_dropped_count; + __u32 local_aborted_count; + __u32 local_no_route_count; + __u32 local_timeout_count; + __u32 local_error_count; + __u32 remote_dropped_count; + __u32 remote_error_count; + __u32 remote_timeout_count; + __u32 network_timeout_count; __u64 send_length; __u64 recv_length; __u64 route_length; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 82703dd..d58006d 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -694,7 +694,20 @@ static void lnet_assert_wire_constants(void) cfs_percpt_for_each(ctr, i, the_lnet.ln_counters) { counters->msgs_max += ctr->msgs_max; counters->msgs_alloc += ctr->msgs_alloc; + counters->rst_alloc += ctr->rst_alloc; counters->errors += ctr->errors; + counters->resend_count += ctr->resend_count; + counters->response_timeout_count += ctr->response_timeout_count; + counters->local_interrupt_count += ctr->local_interrupt_count; + counters->local_dropped_count += ctr->local_dropped_count; + counters->local_aborted_count += ctr->local_aborted_count; + counters->local_no_route_count += ctr->local_no_route_count; + counters->local_timeout_count += ctr->local_timeout_count; + counters->local_error_count += ctr->local_error_count; + counters->remote_dropped_count += ctr->remote_dropped_count; + counters->remote_error_count += ctr->remote_error_count; + counters->remote_timeout_count += ctr->remote_timeout_count; + counters->network_timeout_count += ctr->network_timeout_count; counters->send_count += ctr->send_count; counters->recv_count += ctr->recv_count; counters->route_count += ctr->route_count; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index c33cf8d..6a3704d 100644 --- a/net/lnet/lnet/lib-move.c +++ 
b/net/lnet/lnet/lib-move.c @@ -2501,6 +2501,10 @@ struct lnet_mt_event_info { md->md_rspt_ptr = NULL; lnet_res_unlock(i); + lnet_net_lock(i); + the_lnet.ln_counters[i]->response_timeout_count++; + lnet_net_unlock(i); + list_del_init(&rspt->rspt_on_list); CDEBUG(D_NET, @@ -2567,6 +2571,11 @@ struct lnet_mt_event_info { lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); + CDEBUG(D_NET, "resending %s->%s: %s recovery %d\n", + libcfs_nid2str(src_nid), + libcfs_id2str(msg->msg_target), + lnet_msgtyp2str(msg->msg_type), + msg->msg_recovery); rc = lnet_send(src_nid, msg, LNET_NID_ANY); if (rc) { CERROR("Error sending %s to %s: %d\n", @@ -2576,6 +2585,8 @@ struct lnet_mt_event_info { lnet_finalize(msg, rc); } lnet_net_lock(cpt); + if (!rc) + the_lnet.ln_counters[cpt]->resend_count++; } } } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index dc51a17..70decc7 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -546,41 +546,52 @@ { struct lnet_ni *ni = msg->msg_txni; struct lnet_peer_ni *lpni = msg->msg_txpeer; + struct lnet_counters *counters = the_lnet.ln_counters[0]; switch (hstatus) { case LNET_MSG_STATUS_LOCAL_INTERRUPT: atomic_inc(&ni->ni_hstats.hlt_local_interrupt); + counters->local_interrupt_count++; break; case LNET_MSG_STATUS_LOCAL_DROPPED: atomic_inc(&ni->ni_hstats.hlt_local_dropped); + counters->local_dropped_count++; break; case LNET_MSG_STATUS_LOCAL_ABORTED: atomic_inc(&ni->ni_hstats.hlt_local_aborted); + counters->local_aborted_count++; break; case LNET_MSG_STATUS_LOCAL_NO_ROUTE: atomic_inc(&ni->ni_hstats.hlt_local_no_route); + counters->local_no_route_count++; break; case LNET_MSG_STATUS_LOCAL_TIMEOUT: atomic_inc(&ni->ni_hstats.hlt_local_timeout); + counters->local_timeout_count++; break; case LNET_MSG_STATUS_LOCAL_ERROR: atomic_inc(&ni->ni_hstats.hlt_local_error); + counters->local_error_count++; break; case LNET_MSG_STATUS_REMOTE_DROPPED: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_dropped); + 
counters->remote_dropped_count++; break; case LNET_MSG_STATUS_REMOTE_ERROR: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_error); + counters->remote_error_count++; break; case LNET_MSG_STATUS_REMOTE_TIMEOUT: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_timeout); + counters->remote_timeout_count++; break; case LNET_MSG_STATUS_NETWORK_TIMEOUT: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_network_timeout); + counters->network_timeout_count++; break; case LNET_MSG_STATUS_OK: break; @@ -601,6 +612,10 @@ enum lnet_msg_hstatus hstatus = msg->msg_health_status; bool lo = false; + /* if we're shutting down no point in handling health. */ + if (the_lnet.ln_state != LNET_STATE_RUNNING) + return -1; + LASSERT(msg->msg_txni); /* if we're sending to the LOLND then the msg_txpeer will not be @@ -611,15 +626,18 @@ else lo = true; - lnet_incr_hstats(msg, hstatus); - if (hstatus != LNET_MSG_STATUS_OK && ktime_compare(ktime_get(), msg->msg_deadline) >= 0) return -1; - /* if we're shutting down no point in handling health. */ - if (the_lnet.ln_state != LNET_STATE_RUNNING) - return -1; + /* stats are only incremented for errors so avoid wasting time + * incrementing statistics if there is no error. 
+ */ + if (hstatus != LNET_MSG_STATUS_OK) { + lnet_net_lock(0); + lnet_incr_hstats(msg, hstatus); + lnet_net_unlock(0); + } CDEBUG(D_NET, "health check: %s->%s: %s: %s\n", libcfs_nid2str(msg->msg_txni->ni_nid), From patchwork Thu Feb 27 21:09:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409849 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD4F5138D for ; Thu, 27 Feb 2020 21:24:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A4F5F246A0 for ; Thu, 27 Feb 2020 21:24:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A4F5F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3AE19348B49; Thu, 27 Feb 2020 13:21:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8CBE721FB39 for ; Thu, 27 Feb 2020 13:18:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5AE3BEFF; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 59B07468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 
27 Feb 2020 16:09:23 -0500 Message-Id: <1582838290-17243-96-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 095/622] lnet: print recovery queues content X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata

Add commands to lnetctl to print the contents of the recovery queues from user space. The associated code to handle the ioctl is added to the LNet module.

For local NIs:
    lnetctl debug recovery --local
For peer NIs:
    lnetctl debug recovery --peer

WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 826ea19c077b ("LU-9120 lnet: print recovery queues content") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32950 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/uapi/linux/lnet/libcfs_ioctl.h | 3 +- include/uapi/linux/lnet/lnet-dlc.h | 8 +++++ net/lnet/lnet/api-ni.c | 53 ++++++++++++++++++++++++++++++++++ 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 683d508..dfb73f7 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -150,6 +150,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS _IOWR(IOC_LIBCFS_TYPE, 101, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_SET_HEALHV _IOWR(IOC_LIBCFS_TYPE, 102, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_LOCAL_HSTATS _IOWR(IOC_LIBCFS_TYPE, 103, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 103 +#define
IOC_LIBCFS_GET_RECOVERY_QUEUE _IOWR(IOC_LIBCFS_TYPE, 104, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 104 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 8e9850c..87f7680 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -35,6 +35,7 @@ #define MAX_NUM_SHOW_ENTRIES 32 #define LNET_MAX_STR_LEN 128 #define LNET_MAX_SHOW_NUM_CPT 128 +#define LNET_MAX_SHOW_NUM_NID 128 #define LNET_UNDEFINED_HOPS ((__u32)(-1)) /* @@ -263,6 +264,13 @@ struct lnet_ioctl_reset_health_cfg { lnet_nid_t rh_nid; }; +struct lnet_ioctl_recovery_list { + struct libcfs_ioctl_hdr rlst_hdr; + enum lnet_health_type rlst_type; + int rlst_num_nids; + lnet_nid_t rlst_nid_array[LNET_MAX_SHOW_NUM_NID]; +}; + struct lnet_ioctl_set_value { struct libcfs_ioctl_hdr sv_hdr; __u32 sv_value; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index d58006d..07bc29f 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3232,6 +3232,44 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } +static int +lnet_get_local_ni_recovery_list(struct lnet_ioctl_recovery_list *list) +{ + struct lnet_ni *ni; + int i = 0; + + lnet_net_lock(LNET_LOCK_EX); + list_for_each_entry(ni, &the_lnet.ln_mt_localNIRecovq, ni_recovery) { + list->rlst_nid_array[i] = ni->ni_nid; + i++; + if (i >= LNET_MAX_SHOW_NUM_NID) + break; + } + lnet_net_unlock(LNET_LOCK_EX); + list->rlst_num_nids = i; + + return 0; +} + +static int +lnet_get_peer_ni_recovery_list(struct lnet_ioctl_recovery_list *list) +{ + struct lnet_peer_ni *lpni; + int i = 0; + + lnet_net_lock(LNET_LOCK_EX); + list_for_each_entry(lpni, &the_lnet.ln_mt_peerNIRecovq, lpni_recovery) { + list->rlst_nid_array[i] = lpni->lpni_nid; + i++; + if (i >= LNET_MAX_SHOW_NUM_NID) + break; + } + lnet_net_unlock(LNET_LOCK_EX); + list->rlst_num_nids = i; + + return 0; +} + /** * LNet ioctl handler. 
* @@ -3452,6 +3490,21 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_GET_RECOVERY_QUEUE: { + struct lnet_ioctl_recovery_list *list = arg; + + if (list->rlst_hdr.ioc_len < sizeof(*list)) + return -EINVAL; + + mutex_lock(&the_lnet.ln_api_mutex); + if (list->rlst_type == LNET_HEALTH_TYPE_LOCAL_NI) + rc = lnet_get_local_ni_recovery_list(list); + else + rc = lnet_get_peer_ni_recovery_list(list); + mutex_unlock(&the_lnet.ln_api_mutex); + return rc; + } + case IOC_LIBCFS_ADD_PEER_NI: { struct lnet_ioctl_peer_cfg *cfg = arg; From patchwork Thu Feb 27 21:09:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410007 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D2691580 for ; Thu, 27 Feb 2020 21:27:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 05B41246A0 for ; Thu, 27 Feb 2020 21:27:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 05B41246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BDEBD348BDA; Thu, 27 Feb 2020 13:24:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E2A6321FB39 for ; Thu, 27 Feb 2020 13:18:45 -0800 (PST) Received: from 
star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5F5541020; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5CA4846C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:24 -0500 Message-Id: <1582838290-17243-97-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 096/622] lnet: health error simulation X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata

Modify the error simulation code to simulate health errors for testing purposes. The specific error can be set. If multiple errors are configured, one is chosen at random from the set.

Example:
    lctl net_drop_add -s *@tcp -d *@tcp -m GET -i 1 -e local_interrupt

The -e option can be repeated to specify different errors to simulate. The available errors are:
    local_interrupt
    local_dropped
    local_aborted
    local_no_route
    local_error
    local_timeout
    remote_error
    remote_dropped
    remote_timeout
    network_timeout
    random

A -n ("--random") option has been added to randomize error generation for drop rules. It relies on the interval value provided via -i: a random number no bigger than the interval is generated. If the number is smaller than half of the interval, the rule isn't matched; otherwise it is.
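The matching rule described above for the random option can be sketched in plain C. This is a minimal userspace illustration, not the code from the patch: the helper names random_rule_matches() and random_rule_draw() are hypothetical, rand() stands in for whatever random source the kernel code uses, and integer division is assumed for "half of the interval".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: decide whether a drop rule matches, given a value
 * that was already drawn from the range [0, interval).
 * Values below half of the interval (integer division) do not match.
 */
static inline bool random_rule_matches(uint32_t value, uint32_t interval)
{
	return value >= interval / 2;
}

/*
 * Hypothetical helper: draw a value in [0, interval) and apply the rule.
 * rand() is only a stand-in for the random source used in the kernel.
 */
static inline bool random_rule_draw(uint32_t interval)
{
	assert(interval > 0);
	return random_rule_matches((uint32_t)rand() % interval, interval);
}
```

Keeping the threshold check separate from the random draw makes the rule easy to verify on fixed inputs, for instance that with an interval of 10 the values 0 to 4 do not match and 5 to 9 do.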
This is needed because drop matching can happen multiple times on the send path of a message, so time-based or rate-based matching will not result in even error generation across the multiple calls. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 5c17777d97bd ("LU-9120 lnet: health error simulation") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32951 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 4 +- include/linux/lnet/lib-types.h | 3 +- include/uapi/linux/lnet/lnetctl.h | 17 +++++++++ net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 ++- net/lnet/klnds/socklnd/socklnd_cb.c | 27 ++++++++++---- net/lnet/lnet/lib-move.c | 2 +- net/lnet/lnet/lib-msg.c | 24 ++++++++++++ net/lnet/lnet/net_fault.c | 73 ++++++++++++++++++++++++++++++++++--- 8 files changed, 138 insertions(+), 18 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index e4d9ccc..4915a87 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -639,6 +639,8 @@ void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg, void lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt); void lnet_finalize(struct lnet_msg *msg, int rc); +bool lnet_send_error_simulation(struct lnet_msg *msg, + enum lnet_msg_hstatus *hstatus); void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob, u32 msg_type); @@ -661,7 +663,7 @@ void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, int lnet_fault_init(void); void lnet_fault_fini(void); -bool lnet_drop_rule_match(struct lnet_hdr *hdr); +bool lnet_drop_rule_match(struct lnet_hdr *hdr, enum lnet_msg_hstatus *hstatus); int lnet_delay_rule_add(struct lnet_fault_attr *attr); int lnet_delay_rule_del(lnet_nid_t src, lnet_nid_t dst, bool shutdown); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index e5d4128..f82ebb6 100644 ---
a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -72,7 +72,8 @@ enum lnet_msg_hstatus { LNET_MSG_STATUS_REMOTE_ERROR, LNET_MSG_STATUS_REMOTE_DROPPED, LNET_MSG_STATUS_REMOTE_TIMEOUT, - LNET_MSG_STATUS_NETWORK_TIMEOUT + LNET_MSG_STATUS_NETWORK_TIMEOUT, + LNET_MSG_STATUS_END, }; struct lnet_rsp_tracker { diff --git a/include/uapi/linux/lnet/lnetctl.h b/include/uapi/linux/lnet/lnetctl.h index 191689c..2eb9c82 100644 --- a/include/uapi/linux/lnet/lnetctl.h +++ b/include/uapi/linux/lnet/lnetctl.h @@ -41,6 +41,19 @@ enum { #define LNET_GET_BIT (1 << 2) #define LNET_REPLY_BIT (1 << 3) +#define HSTATUS_END 11 +#define HSTATUS_LOCAL_INTERRUPT_BIT (1 << 1) +#define HSTATUS_LOCAL_DROPPED_BIT (1 << 2) +#define HSTATUS_LOCAL_ABORTED_BIT (1 << 3) +#define HSTATUS_LOCAL_NO_ROUTE_BIT (1 << 4) +#define HSTATUS_LOCAL_ERROR_BIT (1 << 5) +#define HSTATUS_LOCAL_TIMEOUT_BIT (1 << 6) +#define HSTATUS_REMOTE_ERROR_BIT (1 << 7) +#define HSTATUS_REMOTE_DROPPED_BIT (1 << 8) +#define HSTATUS_REMOTE_TIMEOUT_BIT (1 << 9) +#define HSTATUS_NETWORK_TIMEOUT_BIT (1 << 10) +#define HSTATUS_RANDOM 0xffffffff + /** ioctl parameter for LNet fault simulation */ struct lnet_fault_attr { /** @@ -78,6 +91,10 @@ struct lnet_fault_attr { * with da_rate */ __u32 da_interval; + /** error type mask */ + __u32 da_health_error_mask; + /** randomize error generation */ + bool da_random; } drop; /** message latency simulation */ struct { diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 293a859..5680f2a 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -912,7 +912,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, bad->wr_id, bad->opcode, bad->send_flags, libcfs_nid2str(conn->ibc_peer->ibp_nid)); bad = NULL; - rc = ib_post_send(conn->ibc_cmid->qp, wrq, &bad); + if (lnet_send_error_simulation(tx->tx_lntmsg[0], + &tx->tx_hstatus)) + rc = -EINVAL; + else + rc = 
ib_post_send(conn->ibc_cmid->qp, wrq, &bad); } conn->ibc_last_send = ktime_get(); diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 8bc23d2..057c7f3 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -335,7 +335,8 @@ struct ksock_tx * if (!rc && (tx->tx_resid != 0 || tx->tx_zc_aborted)) { rc = -EIO; - hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + if (hstatus == LNET_MSG_STATUS_OK) + hstatus = LNET_MSG_STATUS_LOCAL_ERROR; } if (tx->tx_conn) @@ -467,6 +468,13 @@ struct ksock_tx * ksocknal_process_transmit(struct ksock_conn *conn, struct ksock_tx *tx) { int rc; + bool error_sim = false; + + if (lnet_send_error_simulation(tx->tx_lnetmsg, &tx->tx_hstatus)) { + error_sim = true; + rc = -EINVAL; + goto simulate_error; + } if (tx->tx_zc_capable && !tx->tx_zc_checked) ksocknal_check_zc_req(tx); @@ -512,16 +520,19 @@ struct ksock_tx * return rc; } +simulate_error: /* Actual error */ LASSERT(rc < 0); - /* set the health status of the message which determines - * whether we should retry the transmit - */ - if (rc == -ETIMEDOUT) - tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_TIMEOUT; - else - tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + if (!error_sim) { + /* set the health status of the message which determines + * whether we should retry the transmit + */ + if (rc == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_TIMEOUT; + else + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + } if (!conn->ksnc_closing) { switch (rc) { diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 6a3704d..eb0b48d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3875,7 +3875,7 @@ void lnet_monitor_thr_stop(void) } if (!list_empty(&the_lnet.ln_drop_rules) && - lnet_drop_rule_match(hdr)) { + lnet_drop_rule_match(hdr, NULL)) { CDEBUG(D_NET, "%s, src %s, dst %s: Dropping %s to simulate silent message loss\n", libcfs_nid2str(from_nid), libcfs_nid2str(src_nid), 
libcfs_nid2str(dest_nid), lnet_msgtyp2str(type)); diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 70decc7..5072238 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -812,6 +812,30 @@ } } +bool +lnet_send_error_simulation(struct lnet_msg *msg, + enum lnet_msg_hstatus *hstatus) +{ + if (!msg) + return false; + + if (list_empty(&the_lnet.ln_drop_rules)) + return false; + + /* match only health rules */ + if (!lnet_drop_rule_match(&msg->msg_hdr, hstatus)) + return false; + + CDEBUG(D_NET, "src %s, dst %s: %s simulate health error: %s\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + lnet_msgtyp2str(msg->msg_type), + lnet_health_error2str(*hstatus)); + + return true; +} +EXPORT_SYMBOL(lnet_send_error_simulation); + void lnet_finalize(struct lnet_msg *msg, int status) { diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c index 4589b17..becb709 100644 --- a/net/lnet/lnet/net_fault.c +++ b/net/lnet/lnet/net_fault.c @@ -292,13 +292,56 @@ struct lnet_drop_rule { lnet_net_unlock(cpt); } +static void +lnet_fault_match_health(enum lnet_msg_hstatus *hstatus, __u32 mask) +{ + int choice; + int delta; + int best_delta; + int i; + + /* assign a random failure */ + choice = prandom_u32_max(LNET_MSG_STATUS_END - LNET_MSG_STATUS_OK); + if (choice == 0) + choice++; + + if (mask == HSTATUS_RANDOM) { + *hstatus = choice; + return; + } + + if (mask & (1 << choice)) { + *hstatus = choice; + return; + } + + /* round to the closest ON bit */ + i = HSTATUS_END; + best_delta = HSTATUS_END; + while (i > 0) { + if (mask & (1 << i)) { + delta = choice - i; + if (delta < 0) + delta *= -1; + if (delta < best_delta) { + best_delta = delta; + choice = i; + } + } + i--; + } + + *hstatus = choice; +} + /** * check source/destination NID, portal, message type and drop rate, * decide whether should drop this message or not */ static bool drop_rule_match(struct lnet_drop_rule *rule, lnet_nid_t src, - lnet_nid_t 
dst, unsigned int type, unsigned int portal) + lnet_nid_t dst, unsigned int type, unsigned int portal, + enum lnet_msg_hstatus *hstatus) { struct lnet_fault_attr *attr = &rule->dr_attr; bool drop; @@ -306,9 +349,23 @@ struct lnet_drop_rule { if (!lnet_fault_attr_match(attr, src, dst, type, portal)) return false; + /* if we're trying to match a health status error but it hasn't + * been set in the rule, then don't match + */ + if ((hstatus && !attr->u.drop.da_health_error_mask) || + (!hstatus && attr->u.drop.da_health_error_mask)) + return false; + /* match this rule, check drop rate now */ spin_lock(&rule->dr_lock); - if (rule->dr_drop_time) { /* time based drop */ + if (attr->u.drop.da_random) { + int value = prandom_u32_max(attr->u.drop.da_interval); + + if (value >= (attr->u.drop.da_interval / 2)) + drop = true; + else + drop = false; + } else if (rule->dr_drop_time) { /* time based drop */ time64_t now = ktime_get_seconds(); rule->dr_stat.fs_count++; @@ -340,6 +397,9 @@ struct lnet_drop_rule { } if (drop) { /* drop this message, update counters */ + if (hstatus) + lnet_fault_match_health(hstatus, + attr->u.drop.da_health_error_mask); lnet_fault_stat_inc(&rule->dr_stat, type); rule->dr_stat.u.drop.ds_dropped++; } @@ -352,12 +412,12 @@ struct lnet_drop_rule { * Check if message from @src to @dst can match any existed drop rule */ bool -lnet_drop_rule_match(struct lnet_hdr *hdr) +lnet_drop_rule_match(struct lnet_hdr *hdr, enum lnet_msg_hstatus *hstatus) { - struct lnet_drop_rule *rule; lnet_nid_t src = le64_to_cpu(hdr->src_nid); lnet_nid_t dst = le64_to_cpu(hdr->dest_nid); unsigned int typ = le32_to_cpu(hdr->type); + struct lnet_drop_rule *rule; unsigned int ptl = -1; bool drop = false; int cpt; @@ -373,12 +433,13 @@ struct lnet_drop_rule { cpt = lnet_net_lock_current(); list_for_each_entry(rule, &the_lnet.ln_drop_rules, dr_link) { - drop = drop_rule_match(rule, src, dst, typ, ptl); + drop = drop_rule_match(rule, src, dst, typ, ptl, + hstatus); if (drop) break; } 
- lnet_net_unlock(cpt); + return drop; } From patchwork Thu Feb 27 21:09:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409853 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CDB414BC for ; Thu, 27 Feb 2020 21:24:06 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 656A5246A0 for ; Thu, 27 Feb 2020 21:24:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 656A5246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 76DB021FF70; Thu, 27 Feb 2020 13:21:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 44E0521FB39 for ; Thu, 27 Feb 2020 13:18:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 626A11021; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5F94646D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:25 -0500 Message-Id: <1582838290-17243-98-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 097/622] lustre: ptlrpc: replace simple_strtol with kstrtol X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Eventually simple_strtol() will be removed so replace its use in the ptlrpc with kstrtoXXX() class of functions. WC-bug-id: https://jira.whamcloud.com/browse/LU-9325 Lustre-commit: 8f37d64b6bc9 ("LU-9325 ptlrpc: replace simple_strtol with kstrtol") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/32785 Reviewed-by: Andreas Dilger Reviewed-by: Nikitas Angelinas Signed-off-by: James Simmons --- fs/lustre/ptlrpc/lproc_ptlrpc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index 6af3384..eb0ecc0 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -1303,13 +1303,13 @@ int lprocfs_wr_import(struct file *file, const char __user *buffer, ptr = strstr(uuid, "::"); if (ptr) { u32 inst; - char *endptr; + int rc; *ptr = 0; do_reconn = 0; ptr += strlen("::"); - inst = simple_strtoul(ptr, &endptr, 10); - if (*endptr) { + rc = kstrtouint(ptr, 10, &inst); + if (rc) { CERROR("config: wrong instance # %s\n", ptr); } else if (inst != imp->imp_connect_data.ocd_instance) { CDEBUG(D_INFO, From patchwork Thu Feb 27 21:09:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410011 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2C9881580 for ; Thu, 27 Feb 2020 21:27:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15303246A0 for ; Thu, 27 Feb 2020 21:27:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15303246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B9D8349218; Thu, 27 Feb 2020 13:24:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88C7121FAD5 for ; Thu, 27 Feb 2020 13:18:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 63EBC1022; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 62A2446A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:26 -0500 Message-Id: <1582838290-17243-99-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 098/622] lustre: obd: use correct ip_compute_csum() version X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software 
development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The Linux kernel provides a generic platform-independent version of ip_compute_csum() as well as platform-optimized versions. Some platforms will disable the generic version in favor of the optimized one. If the generic version is disabled and the checksum.h header from asm-generic is used, then we will end up with an undefined symbol error when loading the obdclass module. The solution is to use the platform-specific checksum.h header, which will handle using the generic or optimized version for us. As a bonus we get better performance with the right kernel configuration. WC-bug-id: https://jira.whamcloud.com/browse/LU-11224 Lustre-commit: 82fe90a1d07d ("LU-11224 obd: use correct ip_compute_csum() version") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/32953 Reviewed-by: Li Xi Reviewed-by: Li Dongyang Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/obdclass/integrity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c index 8348b16..5cb9a25 100644 --- a/fs/lustre/obdclass/integrity.c +++ b/fs/lustre/obdclass/integrity.c @@ -28,7 +28,7 @@ */ #include #include -#include +#include #include #include From patchwork Thu Feb 27 21:09:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410259 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03A08138D for ; Thu, 27 Feb 2020 21:33:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E02FA24677 for ; Thu, 27 Feb 2020 21:33:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E02FA24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 38C71349267; Thu, 27 Feb 2020 13:28:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CCEF821FAD5 for ; Thu, 27 Feb 2020 13:18:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 66F6D1023; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6572E46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:27 -0500 Message-Id: <1582838290-17243-100-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 099/622] lustre: osc: serialize access to idle_timeout vs cleanup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev use lprocfs_climp_check() and up_read() as cl_import can disappear due to umount. WC-bug-id: https://jira.whamcloud.com/browse/LU-11175 Lustre-commit: 5874da0b670b ("LU-11175 osc: serialize access to idle_timeout vs cleanup") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/32883 Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 0a12079..efb4998 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -604,8 +604,15 @@ static ssize_t idle_timeout_show(struct kobject *kobj, struct attribute *attr, struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); struct client_obd *cli = &obd->u.cli; + int ret; - return sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout); + ret = lprocfs_climp_check(obd); + if (ret) + return ret; + ret = sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout); + up_read(&obd->u.cli.cl_sem); + + return ret; } static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, @@ -625,6 +632,10 @@ static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, if (val > CONNECTION_SWITCH_MAX) return -ERANGE; + rc = lprocfs_climp_check(obd); + if (rc) + return rc; + cli->cl_import->imp_idle_timeout = val; /* to initiate the connection if it's in IDLE state */ @@ -633,6 +644,7 @@ static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, if (req) ptlrpc_req_finished(req); } + up_read(&obd->u.cli.cl_sem); return count; } @@ -645,12 +657,18 @@ static ssize_t idle_connect_store(struct kobject 
*kobj, struct attribute *attr, obd_kset.kobj); struct client_obd *cli = &dev->u.cli; struct ptlrpc_request *req; + int rc; + + rc = lprocfs_climp_check(dev); + if (rc) + return rc; /* to initiate the connection if it's in IDLE state */ req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_STATFS); if (req) ptlrpc_req_finished(req); ptlrpc_pinger_force(cli->cl_import); + up_read(&dev->u.cli.cl_sem); return count; } From patchwork Thu Feb 27 21:09:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409857 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24AB014BC for ; Thu, 27 Feb 2020 21:24:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D0B7246A0 for ; Thu, 27 Feb 2020 21:24:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D0B7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 01A8E348B9C; Thu, 27 Feb 2020 13:22:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1A0C421FB4B for ; Thu, 27 Feb 2020 13:18:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6E16E1024; Thu, 27 Feb 2020 
16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6C584468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:28 -0500 Message-Id: <1582838290-17243-101-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 100/622] lustre: mdc: remove obsolete intent opcodes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In enum ldlm_intent_flags, remove the obsolete constants IT_UNLINK, IT_TRUNC, IT_EXEC, IT_PIN, IT_SETXATTR. Remove any handling code for these opcodes. WC-bug-id: https://jira.whamcloud.com/browse/LU-11014 Lustre-commit: 511ea5850f25 ("LU-11014 mdc: remove obsolete intent opcodes") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/32361 Reviewed-by: Fan Yong Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 1 - fs/lustre/include/obd.h | 4 +--- fs/lustre/ldlm/ldlm_lock.c | 2 -- fs/lustre/mdc/mdc_locks.c | 44 +++------------------------------- fs/lustre/ptlrpc/layout.c | 15 ------------ include/uapi/linux/lustre/lustre_idl.h | 14 +++++------ 6 files changed, 11 insertions(+), 69 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 807d080..ed4fc42 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -203,7 +203,6 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_format RQF_LDLM_INTENT_GETATTR; extern struct req_format RQF_LDLM_INTENT_OPEN; extern struct req_format RQF_LDLM_INTENT_CREATE; -extern struct req_format RQF_LDLM_INTENT_UNLINK; extern struct req_format RQF_LDLM_INTENT_GETXATTR; extern struct req_format RQF_LDLM_CANCEL; extern struct req_format RQF_LDLM_CALLBACK; diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index de9642f..175a99f 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -700,8 +700,6 @@ static inline int it_to_lock_mode(struct lookup_intent *it) return LCK_PR; else if (it->it_op & IT_GETXATTR) return LCK_PR; - else if (it->it_op & IT_SETXATTR) - return LCK_PW; LASSERTF(0, "Invalid it_op: %d\n", it->it_op); return -EINVAL; @@ -730,7 +728,7 @@ enum md_cli_flags { */ static inline bool it_has_reply_body(const struct lookup_intent *it) { - return it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR); + return it->it_op & (IT_OPEN | IT_LOOKUP | IT_GETATTR); } struct md_op_data { diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 1bf387a..4f746ad 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -123,8 +123,6 @@ const char *ldlm_it2str(enum 
ldlm_intent_flags it) return "getattr"; case IT_LOOKUP: return "lookup"; - case IT_UNLINK: - return "unlink"; case IT_GETXATTR: return "getxattr"; case IT_LAYOUT: diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index abbc908..80f2e10 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -430,42 +430,6 @@ static int mdc_save_lovea(struct ptlrpc_request *req, return req; } -static struct ptlrpc_request *mdc_intent_unlink_pack(struct obd_export *exp, - struct lookup_intent *it, - struct md_op_data *op_data) -{ - struct ptlrpc_request *req; - struct obd_device *obddev = class_exp2obd(exp); - struct ldlm_intent *lit; - int rc; - - req = ptlrpc_request_alloc(class_exp2cliimp(exp), - &RQF_LDLM_INTENT_UNLINK); - if (!req) - return ERR_PTR(-ENOMEM); - - req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, - op_data->op_namelen + 1); - - rc = ldlm_prep_enqueue_req(exp, req, NULL, 0); - if (rc) { - ptlrpc_request_free(req); - return ERR_PTR(rc); - } - - /* pack the intent */ - lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT); - lit->opc = (u64)it->it_op; - - /* pack the intended request */ - mdc_unlink_pack(req, op_data); - - req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, - obddev->u.cli.cl_default_mds_easize); - ptlrpc_request_set_replen(req); - return req; -} - static struct ptlrpc_request * mdc_intent_getattr_pack(struct obd_export *exp, struct lookup_intent *it, struct md_op_data *op_data, u32 acl_bufsize) @@ -820,18 +784,18 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, LASSERT(!policy); saved_flags |= LDLM_FL_HAS_INTENT; - if (it->it_op & (IT_UNLINK | IT_GETATTR | IT_READDIR)) + if (it->it_op & (IT_GETATTR | IT_READDIR)) policy = &update_policy; else if (it->it_op & IT_LAYOUT) policy = &layout_policy; - else if (it->it_op & (IT_GETXATTR | IT_SETXATTR)) + else if (it->it_op & IT_GETXATTR) policy = &getxattr_policy; else policy = &lookup_policy; } generation = 
obddev->u.cli.cl_import->imp_generation; - if (!it || (it->it_op & (IT_CREAT | IT_OPEN_CREAT))) + if (!it || (it->it_op & (IT_OPEN | IT_CREAT))) acl_bufsize = imp->imp_connect_data.ocd_max_easize; else acl_bufsize = LUSTRE_POSIX_ACL_MAX_SIZE_OLD; @@ -845,8 +809,6 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, res_id.name[3] = LDLM_FLOCK; } else if (it->it_op & IT_OPEN) { req = mdc_intent_open_pack(exp, it, op_data, acl_bufsize); - } else if (it->it_op & IT_UNLINK) { - req = mdc_intent_unlink_pack(exp, it, op_data); } else if (it->it_op & (IT_GETATTR | IT_LOOKUP)) { req = mdc_intent_getattr_pack(exp, it, op_data, acl_bufsize); } else if (it->it_op & IT_READDIR) { diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index ae573a2..70344b9 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -462,15 +462,6 @@ &RMF_FILE_SECCTX }; -static const struct req_msg_field *ldlm_intent_unlink_client[] = { - &RMF_PTLRPC_BODY, - &RMF_DLM_REQ, - &RMF_LDLM_INTENT, - &RMF_REC_REINT, /* coincides with mds_reint_unlink_client[] */ - &RMF_CAPA1, - &RMF_NAME -}; - static const struct req_msg_field *ldlm_intent_getxattr_client[] = { &RMF_PTLRPC_BODY, &RMF_DLM_REQ, @@ -756,7 +747,6 @@ &RQF_LDLM_INTENT_GETATTR, &RQF_LDLM_INTENT_OPEN, &RQF_LDLM_INTENT_CREATE, - &RQF_LDLM_INTENT_UNLINK, &RQF_LDLM_INTENT_GETXATTR, &RQF_LLOG_ORIGIN_HANDLE_CREATE, &RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK, @@ -1431,11 +1421,6 @@ struct req_format RQF_LDLM_INTENT_CREATE = ldlm_intent_create_client, ldlm_intent_getattr_server); EXPORT_SYMBOL(RQF_LDLM_INTENT_CREATE); -struct req_format RQF_LDLM_INTENT_UNLINK = - DEFINE_REQ_FMT0("LDLM_INTENT_UNLINK", - ldlm_intent_unlink_client, ldlm_intent_server); -EXPORT_SYMBOL(RQF_LDLM_INTENT_UNLINK); - struct req_format RQF_LDLM_INTENT_GETXATTR = DEFINE_REQ_FMT0("LDLM_INTENT_GETXATTR", ldlm_intent_getxattr_client, diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 
dc9872cf3..249a3d5 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2190,19 +2190,19 @@ struct ldlm_flock_wire { enum ldlm_intent_flags { IT_OPEN = 0x00000001, IT_CREAT = 0x00000002, - IT_OPEN_CREAT = 0x00000003, - IT_READDIR = 0x00000004, + IT_OPEN_CREAT = IT_OPEN | IT_CREAT, /* To allow case label. */ + IT_READDIR = 0x00000004, /* Used by mdc, not put on the wire. */ IT_GETATTR = 0x00000008, IT_LOOKUP = 0x00000010, - IT_UNLINK = 0x00000020, - IT_TRUNC = 0x00000040, +/* IT_UNLINK = 0x00000020, Obsolete. */ +/* IT_TRUNC = 0x00000040, Obsolete. */ IT_GETXATTR = 0x00000080, - IT_EXEC = 0x00000100, - IT_PIN = 0x00000200, +/* IT_EXEC = 0x00000100, Obsolete. */ +/* IT_PIN = 0x00000200, Obsolete. */ IT_LAYOUT = 0x00000400, IT_QUOTA_DQACQ = 0x00000800, IT_QUOTA_CONN = 0x00001000, - IT_SETXATTR = 0x00002000, +/* IT_SETXATTR = 0x00002000, Obsolete. */ IT_GLIMPSE = 0x00004000, IT_BRW = 0x00008000, }; From patchwork Thu Feb 27 21:09:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410265 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DEF2138D for ; Thu, 27 Feb 2020 21:33:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 66723246A1 for ; Thu, 27 Feb 2020 21:33:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 66723246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BB05821F964; Thu, 27 Feb 2020 13:28:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6F4C921FAFB for ; Thu, 27 Feb 2020 13:18:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6F8A81025; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6D3E446C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:29 -0500 Message-Id: <1582838290-17243-102-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 101/622] lustre: llite: fix setstripe for specific osts upon dir X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong The LOV_USER_MAGIC_SPECIFIC functionality is broken and was not available for setting striping on a directory. 1) llite doesn't handle the LOV_USER_MAGIC_SPECIFIC case properly for dir {set,get}_stripe, and the LL_IOC_LOV_SETSTRIPE ioctl did not allocate a large enough buffer or copy the OST list from userspace. 2) lod_get_default_lov_striping() did not handle the LOV_USER_MAGIC_SPECIFIC type, so newly created files/dirs won't inherit the parent setting properly.
3) there is no test case covering the lfs setstripe '-o' interface, which makes it hard to figure out when this function was broken. WC-bug-id: https://jira.whamcloud.com/browse/LU-11146 Lustre-commit: 083d62ee6de5 ("LU-11146 lustre: fix setstripe for specific osts upon dir") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/32814 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 71 ++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 56 insertions(+), 15 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 751d0183..06f7bd3 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -541,6 +541,21 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump, lum_size = sizeof(struct lmv_user_md); break; } + case LOV_USER_MAGIC_SPECIFIC: { + struct lov_user_md_v3 *v3 = + (struct lov_user_md_v3 *)lump; + if (v3->lmm_stripe_count > LOV_MAX_STRIPE_COUNT) + return -EINVAL; + if (lump->lmm_magic != + cpu_to_le32(LOV_USER_MAGIC_SPECIFIC)) { + lustre_swab_lov_user_md_v3(v3); + lustre_swab_lov_user_md_objects(v3->lmm_objects, + v3->lmm_stripe_count); + } + lum_size = lov_user_md_size(v3->lmm_stripe_count, + LOV_USER_MAGIC_SPECIFIC); + break; + } default: { CDEBUG(D_IOCTL, "bad userland LOV MAGIC: %#08x != %#08x nor %#08x\n", @@ -695,6 +710,16 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC) lustre_swab_lmv_user_md((struct lmv_user_md *)lmm); break; + case LOV_USER_MAGIC_SPECIFIC: { + struct lov_user_md_v3 *v3 = (struct lov_user_md_v3 *)lmm; + + if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) { + lustre_swab_lov_user_md_v3(v3); + lustre_swab_lov_user_md_objects(v3->lmm_objects, + v3->lmm_stripe_count); + } + } + break; default: CERROR("unknown magic: %lX\n", (unsigned long)lmm->lmm_magic); rc = -EPROTO; @@ -1230,35 +1255,51 @@ static
long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) } case LL_IOC_LOV_SETSTRIPE_NEW: case LL_IOC_LOV_SETSTRIPE: { - struct lov_user_md_v3 lumv3; - struct lov_user_md_v1 *lumv1 = (struct lov_user_md_v1 *)&lumv3; + struct lov_user_md_v3 *lumv3 = NULL; + struct lov_user_md_v1 lumv1; + struct lov_user_md_v1 *lumv1_ptr = &lumv1; struct lov_user_md_v1 __user *lumv1p = (void __user *)arg; struct lov_user_md_v3 __user *lumv3p = (void __user *)arg; + int lum_size; int set_default = 0; BUILD_BUG_ON(sizeof(struct lov_user_md_v3) <= sizeof(struct lov_comp_md_v1)); - BUILD_BUG_ON(sizeof(lumv3) != sizeof(*lumv3p)); - BUILD_BUG_ON(sizeof(lumv3.lmm_objects[0]) != - sizeof(lumv3p->lmm_objects[0])); + BUILD_BUG_ON(sizeof(*lumv3) != sizeof(*lumv3p)); /* first try with v1 which is smaller than v3 */ - if (copy_from_user(lumv1, lumv1p, sizeof(*lumv1))) + if (copy_from_user(&lumv1, lumv1p, sizeof(lumv1))) return -EFAULT; - if (lumv1->lmm_magic == LOV_USER_MAGIC_V3) { - if (copy_from_user(&lumv3, lumv3p, sizeof(lumv3))) - return -EFAULT; - if (lumv3.lmm_magic != LOV_USER_MAGIC_V3) - return -EINVAL; - } - if (is_root_inode(inode)) set_default = 1; - /* in v1 and v3 cases lumv1 points to data */ - rc = ll_dir_setstripe(inode, lumv1, set_default); + switch (lumv1.lmm_magic) { + case LOV_USER_MAGIC_V3: + case LOV_USER_MAGIC_SPECIFIC: + lum_size = ll_lov_user_md_size(&lumv1); + if (lum_size < 0) + return lum_size; + lumv3 = kzalloc(lum_size, GFP_NOFS); + if (!lumv3) + return -ENOMEM; + if (copy_from_user(lumv3, lumv3p, lum_size)) { + rc = -EFAULT; + goto out; + } + lumv1_ptr = (struct lov_user_md_v1 *)lumv3; + break; + case LOV_USER_MAGIC_V1: + break; + default: + rc = -ENOTSUPP; + goto out; + } + /* in v1 and v3 cases lumv1 points to data */ + rc = ll_dir_setstripe(inode, lumv1_ptr, set_default); +out: + kfree(lumv3); return rc; } case LL_IOC_LMV_GETSTRIPE: { From patchwork Thu Feb 27 21:09:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409861 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 11D3B138D for ; Thu, 27 Feb 2020 21:24:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EEBB5246A0 for ; Thu, 27 Feb 2020 21:24:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EEBB5246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 50D9821FF8D; Thu, 27 Feb 2020 13:22:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C606221FAFB for ; Thu, 27 Feb 2020 13:18:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 723751026; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6F79046A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:30 -0500 Message-Id: <1582838290-17243-103-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 
102/622] lustre: osc: enable/disable OSC grant shrink X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam Add an OSC sysfs interface to enable/disable client's grant shrink feature. lctl get_param osc.*.grant_shrink lctl set_param osc.*.grant_shrink={0,1} WC-bug-id: https://jira.whamcloud.com/browse/LU-8708 Lustre-commit: 3e070e30a98d ("LU-8708 osc: enable/disable OSC grant shrink") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/23203 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index efb4998..16de266 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -674,6 +674,72 @@ static ssize_t idle_connect_store(struct kobject *kobj, struct attribute *attr, } LUSTRE_WO_ATTR(idle_connect); +static ssize_t grant_shrink_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); + struct client_obd *cli = &obd->u.cli; + struct obd_connect_data *ocd; + ssize_t len; + + len = lprocfs_climp_check(obd); + if (len) + return len; + + ocd = &cli->cl_import->imp_connect_data; + + len = snprintf(buf, PAGE_SIZE, "%d\n", + !!OCD_HAS_FLAG(ocd, GRANT_SHRINK)); + up_read(&obd->u.cli.cl_sem); + + return len; +} + +static ssize_t grant_shrink_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) +{ + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + struct 
client_obd *cli = &dev->u.cli; + struct obd_connect_data *ocd; + bool val; + int rc; + + if (!dev) + return 0; + + rc = kstrtobool(buffer, &val); + if (rc) + return rc; + + rc = lprocfs_climp_check(dev); + if (rc) + return rc; + + ocd = &cli->cl_import->imp_connect_data; + + if (!val) { + if (OCD_HAS_FLAG(ocd, GRANT_SHRINK)) + ocd->ocd_connect_flags &= ~OBD_CONNECT_GRANT_SHRINK; + } else { + /** + * server replied obd_connect_data is always bigger, so + * client's imp_connect_flags_orig are always supported + * by the server + */ + if (!OCD_HAS_FLAG(ocd, GRANT_SHRINK) && + cli->cl_import->imp_connect_flags_orig & + OBD_CONNECT_GRANT_SHRINK) + ocd->ocd_connect_flags |= OBD_CONNECT_GRANT_SHRINK; + } + + up_read(&dev->u.cli.cl_sem); + + return count; +} +LUSTRE_RW_ATTR(grant_shrink); + LPROC_SEQ_FOPS_RO_TYPE(osc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(osc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(osc, timeouts); @@ -889,6 +955,7 @@ void lproc_osc_attach_seqstat(struct obd_device *dev) &lustre_attr_ping.attr, &lustre_attr_idle_timeout.attr, &lustre_attr_idle_connect.attr, + &lustre_attr_grant_shrink.attr, NULL, }; From patchwork Thu Feb 27 21:09:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D6D7D14BC for ; Thu, 27 Feb 2020 21:24:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BF7E8246A0 for ; Thu, 27 Feb 2020 21:24:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BF7E8246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F0062348B56; Thu, 27 Feb 2020 13:21:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1386021FAFB for ; Thu, 27 Feb 2020 13:18:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 742611027; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 728D246D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:31 -0500 Message-Id: <1582838290-17243-104-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 103/622] lustre: protocol: MDT as a statfs proxy X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev MDT can act as a proxy for statfs data. this should make df faster (RTT vs RTT*(#MDTs+1)) and enable idling connections so that clients don't connect to each OST just to report statfs data. the protocol has been changing slightly to let MDT differentiate self and aggregated statfs. 
also, obd_statfs has got a new field "granted" where OST reports how much space has been granted to the requesting MDT so that space can be added to available space. client's NID is used to distribute MDS_STATFS among MDTS. WC-bug-id: https://jira.whamcloud.com/browse/LU-10018 Lustre-commit: b500d5193360 ("LU-10018 protocol: MDT as a statfs proxy") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/29136 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 1 + fs/lustre/include/obd_class.h | 7 +++- fs/lustre/include/obd_support.h | 2 + fs/lustre/llite/llite_lib.c | 9 ++++- fs/lustre/lmv/lmv_obd.c | 65 ++++++++++++++++++++++++++------- fs/lustre/mdc/mdc_request.c | 13 +++++++ fs/lustre/ptlrpc/layout.c | 2 +- fs/lustre/ptlrpc/pack_generic.c | 2 +- fs/lustre/ptlrpc/wiretest.c | 8 ++-- include/uapi/linux/lustre/lustre_idl.h | 3 +- include/uapi/linux/lustre/lustre_user.h | 7 ++-- 11 files changed, 92 insertions(+), 27 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 175a99f..9286755 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -442,6 +442,7 @@ struct lmv_obd { u32 tgts_size; /* size of tgts array */ struct lmv_tgt_desc **tgts; + int lmv_statfs_start; struct obd_connect_data conn_data; struct kobject *lmv_tgts_kobj; diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 0153c50..a3ef5d5 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -47,6 +47,8 @@ #define OBD_STATFS_FROM_CACHE 0x0002 /* the statfs is only for retrieving information from MDT0 */ #define OBD_STATFS_FOR_MDT0 0x0004 +/* get aggregated statfs from MDT */ +#define OBD_STATFS_SUM 0x0008 /* OBD Device Declarations */ extern rwlock_t obd_dev_lock; @@ -947,7 +949,10 @@ static inline int obd_statfs(const struct lu_env *env, struct obd_export *exp, CDEBUG(D_SUPER, "osfs %lld, max_age %lld\n", 
obd->obd_osfs_age, max_age); - if (obd->obd_osfs_age < max_age) { + /* ignore cache if aggregated isn't expected */ + if (obd->obd_osfs_age < max_age || + ((obd->obd_osfs.os_state & OS_STATE_SUM) && + !(flags & OBD_STATFS_SUM))) { rc = OBP(obd, statfs)(env, exp, osfs, max_age, flags); if (rc == 0) { spin_lock(&obd->obd_osfs_lock); diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 28becfa..3d14723 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -137,7 +137,9 @@ #define OBD_FAIL_MDS_GET_ROOT_NET 0x11b #define OBD_FAIL_MDS_GET_ROOT_PACK 0x11c #define OBD_FAIL_MDS_STATFS_PACK 0x11d +#define OBD_FAIL_MDS_STATFS_SUM_PACK 0x11d #define OBD_FAIL_MDS_STATFS_NET 0x11e +#define OBD_FAIL_MDS_STATFS_SUM_NET 0x11e #define OBD_FAIL_MDS_GETATTR_NAME_NET 0x11f #define OBD_FAIL_MDS_PIN_NET 0x120 #define OBD_FAIL_MDS_UNPIN_NET 0x121 diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index c04146f..8b3e2a3 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -211,7 +211,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_connect_flags2 = OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT | - OBD_CONNECT2_DIR_MIGRATE; + OBD_CONNECT2_DIR_MIGRATE | + OBD_CONNECT2_SUM_STATFS; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; @@ -1751,6 +1752,9 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, osfs->os_bavail, osfs->os_blocks, osfs->os_ffree, osfs->os_files); + if (osfs->os_state & OS_STATE_SUM) + goto out; + if (sbi->ll_flags & LL_SBI_LAZYSTATFS) flags |= OBD_STATFS_NODELAY; @@ -1779,6 +1783,7 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, osfs->os_ffree = obd_osfs.os_ffree; } +out: return rc; } @@ -1793,7 +1798,7 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STAFS, 1); /* Some amount of 
caching on the client is allowed */ - rc = ll_statfs_internal(ll_s2sbi(sb), &osfs, 0); + rc = ll_statfs_internal(ll_s2sbi(sb), &osfs, OBD_STATFS_SUM); if (rc) return rc; diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index c7bf8c7..90a46c4 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1325,6 +1325,33 @@ static int lmv_process_config(struct obd_device *obd, u32 len, void *buf) return rc; } +static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags) +{ + int i; + + if (flags & OBD_STATFS_FOR_MDT0) + return 0; + + if (lmv->lmv_statfs_start || lmv->desc.ld_tgt_count == 1) + return lmv->lmv_statfs_start; + + /* choose initial MDT for this client */ + for (i = 0;; i++) { + struct lnet_process_id lnet_id; + + if (LNetGetId(i, &lnet_id) == -ENOENT) + break; + + if (LNET_NETTYP(LNET_NIDNET(lnet_id.nid)) != LOLND) { + lmv->lmv_statfs_start = + lnet_id.nid % lmv->desc.ld_tgt_count; + break; + } + } + + return lmv->lmv_statfs_start; +} + static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, struct obd_statfs *osfs, time64_t max_age, u32 flags) { @@ -1332,41 +1359,51 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, struct lmv_obd *lmv = &obd->u.lmv; struct obd_statfs *temp; int rc = 0; - u32 i; + u32 i, idx; temp = kzalloc(sizeof(*temp), GFP_NOFS); if (!temp) return -ENOMEM; - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - if (!lmv->tgts[i] || !lmv->tgts[i]->ltd_exp) + /* distribute statfs among MDTs */ + idx = lmv_select_statfs_mdt(lmv, flags); + + for (i = 0; i < lmv->desc.ld_tgt_count; i++, idx++) { + idx = idx % lmv->desc.ld_tgt_count; + if (!lmv->tgts[idx] || !lmv->tgts[idx]->ltd_exp) continue; - rc = obd_statfs(env, lmv->tgts[i]->ltd_exp, temp, + rc = obd_statfs(env, lmv->tgts[idx]->ltd_exp, temp, max_age, flags); if (rc) { CERROR("can't stat MDS #%d (%s), error %d\n", i, - lmv->tgts[i]->ltd_exp->exp_obd->obd_name, + lmv->tgts[idx]->ltd_exp->exp_obd->obd_name, rc); goto 
out_free_temp; } + if (temp->os_state & OS_STATE_SUM || + flags == OBD_STATFS_FOR_MDT0) { + /* Reset to the last aggregated values + * and don't sum with non-aggrated data. + * If the statfs is from mount, it needs to retrieve + * necessary information from MDT0. i.e. mount does + * not need the merged osfs from all of MDT. Also + * clients can be mounted as long as MDT0 is in + * service + */ + *osfs = *temp; + break; + } + if (i == 0) { *osfs = *temp; - /* If the statfs is from mount, it will needs - * retrieve necessary information from MDT0. - * i.e. mount does not need the merged osfs - * from all of MDT. - * And also clients can be mounted as long as - * MDT0 is in service - */ - if (flags & OBD_STATFS_FOR_MDT0) - goto out_free_temp; } else { osfs->os_bavail += temp->os_bavail; osfs->os_blocks += temp->os_blocks; osfs->os_ffree += temp->os_ffree; osfs->os_files += temp->os_files; + osfs->os_granted += temp->os_granted; } } diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index b173937..3341761 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1495,6 +1495,19 @@ static int mdc_statfs(const struct lu_env *env, goto output; } + if ((flags & OBD_STATFS_SUM) && + (exp_connect_flags2(exp) & OBD_CONNECT2_SUM_STATFS)) { + /* request aggregated states */ + struct mdt_body *body; + + body = req_capsule_client_get(&req->rq_pill, &RMF_MDT_BODY); + if (!body) { + rc = -EPROTO; + goto out; + } + body->mbo_valid = OBD_MD_FLAGSTATFS; + } + ptlrpc_request_set_replen(req); if (flags & OBD_STATFS_NODELAY) { diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 70344b9..225a73e 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -1252,7 +1252,7 @@ struct req_format RQF_MDS_GET_ROOT = EXPORT_SYMBOL(RQF_MDS_GET_ROOT); struct req_format RQF_MDS_STATFS = - DEFINE_REQ_FMT0("MDS_STATFS", empty, obd_statfs_server); + DEFINE_REQ_FMT0("MDS_STATFS", mdt_body_only, obd_statfs_server); 
EXPORT_SYMBOL(RQF_MDS_STATFS); struct req_format RQF_MDS_SYNC = diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index d09cf3f..e71f79d 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1645,7 +1645,7 @@ void lustre_swab_obd_statfs(struct obd_statfs *os) __swab32s(&os->os_state); __swab32s(&os->os_fprecreated); BUILD_BUG_ON(offsetof(typeof(*os), os_fprecreated) == 0); - BUILD_BUG_ON(offsetof(typeof(*os), os_spare2) == 0); + __swab32s(&os->os_granted); BUILD_BUG_ON(offsetof(typeof(*os), os_spare3) == 0); BUILD_BUG_ON(offsetof(typeof(*os), os_spare4) == 0); BUILD_BUG_ON(offsetof(typeof(*os), os_spare5) == 0); diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 1afbb41..30083c2 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1696,10 +1696,10 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct obd_statfs, os_fprecreated)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_fprecreated) == 4, "found %lld\n", (long long)(int)sizeof(((struct obd_statfs *)0)->os_fprecreated)); - LASSERTF((int)offsetof(struct obd_statfs, os_spare2) == 112, "found %lld\n", - (long long)(int)offsetof(struct obd_statfs, os_spare2)); - LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_spare2) == 4, "found %lld\n", - (long long)(int)sizeof(((struct obd_statfs *)0)->os_spare2)); + LASSERTF((int)offsetof(struct obd_statfs, os_granted) == 112, "found %lld\n", + (long long)(int)offsetof(struct obd_statfs, os_granted)); + LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_granted) == 4, "found %lld\n", + (long long)(int)sizeof(((struct obd_statfs *)0)->os_granted)); LASSERTF((int)offsetof(struct obd_statfs, os_spare3) == 116, "found %lld\n", (long long)(int)offsetof(struct obd_statfs, os_spare3)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_spare3) == 4, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h 
b/include/uapi/linux/lustre/lustre_idl.h index 249a3d5..c65663a 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -793,6 +793,7 @@ struct ptlrpc_body_v2 { */ #define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir */ +#define OBD_CONNECT2_SUM_STATFS 0x8ULL /* MDT return aggregated stats */ #define OBD_CONNECT2_FLR 0x20ULL /* FLR support */ #define OBD_CONNECT2_WBC_INTENTS 0x40ULL /* create/unlink/... intents * for wbc, also operations @@ -1167,7 +1168,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) #define OBD_MD_FLXATTRLS (0x0000002000000000ULL) /* xattr list */ #define OBD_MD_FLXATTRRM (0x0000004000000000ULL) /* xattr remove */ #define OBD_MD_FLACL (0x0000008000000000ULL) /* ACL */ -/* OBD_MD_FLRMTPERM (0x0000010000000000ULL) remote perm, obsolete */ +#define OBD_MD_FLAGSTATFS (0x0000010000000000ULL) /* aggregated statfs */ #define OBD_MD_FLMDSCAPA (0x0000020000000000ULL) /* MDS capability */ #define OBD_MD_FLOSSCAPA (0x0000040000000000ULL) /* OSS capability */ /* OBD_MD_FLCKSPLIT (0x0000080000000000ULL) obsolete 2.3.58*/ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 421c977..f25bb9b 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -104,6 +104,7 @@ enum obd_statfs_state { OS_STATE_NOPRECREATE = 0x00000004, /**< no object precreation */ OS_STATE_ENOSPC = 0x00000020, /**< not enough free space */ OS_STATE_ENOINO = 0x00000040, /**< not enough inodes */ + OS_STATE_SUM = 0x00000100, /**< aggregated for all tagrets */ }; struct obd_statfs { @@ -121,9 +122,9 @@ struct obd_statfs { __u32 os_fprecreated; /* objs available now to the caller * used in QoS code to find preferred OSTs */ - __u32 os_spare2; /* Unused padding fields. 
Remember */ - __u32 os_spare3; /* to fix lustre_swab_obd_statfs() */ - __u32 os_spare4; + __u32 os_granted; /* space granted for MDS */ + __u32 os_spare3; /* Unused padding fields. Remember */ + __u32 os_spare4; /* to fix lustre_swab_obd_statfs() */ __u32 os_spare5; __u32 os_spare6; __u32 os_spare7; From patchwork Thu Feb 27 21:09:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410233 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C426138D for ; Thu, 27 Feb 2020 21:33:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E92B9246A1 for ; Thu, 27 Feb 2020 21:33:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E92B9246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 32EC93488D0; Thu, 27 Feb 2020 13:28:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69AE721FB55 for ; Thu, 27 Feb 2020 13:18:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 778241029; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7564046F; Thu, 27 Feb 2020 
16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:32 -0500 Message-Id: <1582838290-17243-105-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 104/622] lustre: ldlm: correct logic in ldlm_prepare_lru_list() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In ldlm_prepare_lru_list() fix an (x != a || x != b) type error and correct a use after free. WC-bug-id: https://jira.whamcloud.com/browse/LU-11075 Lustre-commit: aecafb57d5b6 ("LU-11075 ldlm: correct logic in ldlm_prepare_lru_list()") Signed-off-by: John L. Hammond Reviewed-on: https://review.whamcloud.com/32660 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index bc441f0..f045d30 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1643,7 +1643,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, /* No locks which got blocking requests. 
*/ LASSERT(!ldlm_is_bl_ast(lock)); - if (!ldlm_is_canceling(lock) || + if (!ldlm_is_canceling(lock) && !ldlm_is_converting(lock)) break; @@ -1686,7 +1686,6 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, if (result == LDLM_POLICY_SKIP_LOCK) { lu_ref_del(&lock->l_reference, __func__, current); - LDLM_LOCK_RELEASE(lock); if (no_wait) { spin_lock(&ns->ns_lock); if (!list_empty(&lock->l_lru) && @@ -1694,6 +1693,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, ns->ns_last_pos = &lock->l_lru; spin_unlock(&ns->ns_lock); } + + LDLM_LOCK_RELEASE(lock); continue; } From patchwork Thu Feb 27 21:09:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409855 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E050E138D for ; Thu, 27 Feb 2020 21:24:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C8E25246A0 for ; Thu, 27 Feb 2020 21:24:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C8E25246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6810E348B7F; Thu, 27 Feb 2020 13:21:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AAA8821F9C5 for ; Thu, 
27 Feb 2020 13:18:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 79C7A102D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7821047C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:33 -0500 Message-Id: <1582838290-17243-106-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 105/622] lustre: llite: check truncate race for DOM pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin In ll_dom_finish_open() check vmpage mapping still exists after locking and exit otherwise. This can happen if page has been truncated concurrently. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11275 Lustre-commit: 0f7d7b200b58 ("LU-11275 llite: check truncate race for DOM pages") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33087 Reviewed-by: Oleg Drokin Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 68fb623..ae39b2c 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -496,6 +496,13 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, break; } lock_page(vmpage); + if (!vmpage->mapping) { + unlock_page(vmpage); + put_page(vmpage); + /* page was truncated */ + rc = -ENODATA; + goto out_io; + } clp = cl_page_find(env, obj, vmpage->index, vmpage, CPT_CACHEABLE); if (IS_ERR(clp)) { From patchwork Thu Feb 27 21:09:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409859 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36D8414BC for ; Thu, 27 Feb 2020 21:24:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1F860246A0 for ; Thu, 27 Feb 2020 21:24:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1F860246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EB8B421FF75; Thu, 27 Feb 2020 13:22:02 -0800 
(PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EC8D421FB57 for ; Thu, 27 Feb 2020 13:18:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7C87D102E; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7B05B468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:34 -0500 Message-Id: <1582838290-17243-107-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 106/622] lnet: lnd: conditionally set health status X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata For specific error scenarios a more accurate health status is set per transmit. 
These statuses should not be overwritten in kiblnd_txlist_done(). WC-bug-id: https://jira.whamcloud.com/browse/LU-11271 Lustre-commit: cf3cc2c72e6e ("LU-11271 lnd: conditionally set health status") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33042 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 5680f2a..68ab7d5 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -110,7 +110,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, /* complete now */ tx->tx_waiting = 0; tx->tx_status = status; - tx->tx_hstatus = hstatus; + if (hstatus != LNET_MSG_STATUS_OK) + tx->tx_hstatus = hstatus; kiblnd_tx_done(tx); } } @@ -2108,9 +2109,11 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, spin_unlock(&conn->ibc_lock); /* aborting transmits occurs when finalizing the connection. - * The connection is finalized on error + * The connection is finalized on error. + * Passing LNET_MSG_STATUS_OK to txlist_done() will not + * override the value already set in tx->tx_hstatus above. 
*/ - kiblnd_txlist_done(&zombies, -ECONNABORTED, -1); + kiblnd_txlist_done(&zombies, -ECONNABORTED, LNET_MSG_STATUS_OK); } static void From patchwork Thu Feb 27 21:09:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409863 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E7E1014BC for ; Thu, 27 Feb 2020 21:24:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D0600246A0 for ; Thu, 27 Feb 2020 21:24:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D0600246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 735DF348BD8; Thu, 27 Feb 2020 13:22:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3B90F21FA6E for ; Thu, 27 Feb 2020 13:18:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7F085102F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7DEA746A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:35 -0500 Message-Id: 
<1582838290-17243-108-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 107/622] lnet: router handling X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Re-create the md and mdh if the router checker ping times out. When re-transmitting a message do so even if the peer is marked down to fulfill the message's retry quota. WC-bug-id: https://jira.whamcloud.com/browse/LU-11272 Lustre-commit: 05becd69bc0c ("LU-11272 lnet: router handling") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33043 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 12 ++++++++++-- net/lnet/lnet/router.c | 8 +++++++- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index eb0b48d..3cab970 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -678,7 +678,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * may drop the lnet_net_lock */ static int -lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lp) +lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lp, + struct lnet_msg *msg) { time64_t now = ktime_get_seconds(); @@ -689,6 +690,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return 1; /* + * If we're resending a message, let's attempt to send it even if + * the peer is down to fulfill our resend quota on the 
message + */ + if (msg->msg_retry_count > 0) + return 1; + + /* * Peer appears dead, but we should avoid frequent NI queries (at * most once per lnet_queryinterval seconds). */ @@ -746,7 +754,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, /* NB 'lp' is always the next hop */ if (!(msg->msg_target.pid & LNET_PID_USERFLAG) && - !lnet_peer_alive_locked(ni, lp)) { + !lnet_peer_alive_locked(ni, lp, msg)) { the_lnet.ln_counters[cpt]->drop_count++; the_lnet.ln_counters[cpt]->drop_length += msg->msg_len; lnet_net_unlock(cpt); diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 7c3bbd8..66a116c 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1042,7 +1042,13 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) } rcd = rtr->lpni_rcd; - if (!rcd || rcd->rcd_nnis > rcd->rcd_pingbuffer->pb_nnis) + + /* The response to the router checker ping could've timed out and + * the mdh might've been invalidated, so we need to update it + * again. 
+ */ + if (!rcd || rcd->rcd_nnis > rcd->rcd_pingbuffer->pb_nnis || + LNetMDHandleIsInvalid(rcd->rcd_mdh)) rcd = lnet_update_rc_data_locked(rtr); if (!rcd) return; From patchwork Thu Feb 27 21:09:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409865 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A815914BC for ; Thu, 27 Feb 2020 21:24:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 90D24246A0 for ; Thu, 27 Feb 2020 21:24:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 90D24246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B1B5F348C00; Thu, 27 Feb 2020 13:22:08 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F20721FA6E for ; Thu, 27 Feb 2020 13:18:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 826E41030; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 80CA746C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:36 -0500 Message-Id: 
<1582838290-17243-109-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 108/622] lustre: obd: check '-o network' and peer discovery conflict X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson "-o network=net" client mount option is not taken into account when LNet dynamic peer discovery is active. Check if LNet dynamic peer discovery is active on local node. If it is, return error if "-o network=net" option is specified. This patch will have to be reverted when the incompatibility between "-o network=net" client mount option and LNet dynamic peer discovery is resolved. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11057 Lustre-commit: 2269d27e07cb ("LU-11057 obd: check '-o network' and peer discovery conflict") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/32562 Reviewed-by: Andreas Dilger Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obd_mount.c | 7 +++++++ include/linux/lnet/api.h | 1 + net/lnet/lnet/api-ni.c | 13 +++++++++++++ 3 files changed, 21 insertions(+) diff --git a/fs/lustre/obdclass/obd_mount.c b/fs/lustre/obdclass/obd_mount.c index 5cf404c..d143112 100644 --- a/fs/lustre/obdclass/obd_mount.c +++ b/fs/lustre/obdclass/obd_mount.c @@ -1169,6 +1169,13 @@ int lmd_parse(char *options, struct lustre_mount_data *lmd) rc = lmd_parse_network(lmd, s1 + 8); if (rc) goto invalid; + + /* check if LNet dynamic peer discovery is activated */ + if (LNetGetPeerDiscoveryStatus()) { + CERROR("LNet Dynamic Peer Discovery is enabled on this node. 'network' mount option cannot be taken into account.\n"); + goto invalid; + } + clear++; } diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h index a57ecc8..4b152c8 100644 --- a/include/linux/lnet/api.h +++ b/include/linux/lnet/api.h @@ -207,6 +207,7 @@ int LNetGet(lnet_nid_t self, int LNetClearLazyPortal(int portal); int LNetCtl(unsigned int cmd, void *arg); void LNetDebugPeer(struct lnet_process_id id); +int LNetGetPeerDiscoveryStatus(void); /** @} lnet_misc */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 07bc29f..c81f46f 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -4038,3 +4038,16 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, kfree(buf); return rc; } + +/** + * Retrieve peer discovery status. 
+ * + * Return 1 if lnet_peer_discovery_disabled is 0 + * 0 if lnet_peer_discovery_disabled is 1 + */ +int +LNetGetPeerDiscoveryStatus(void) +{ + return !lnet_peer_discovery_disabled; +} +EXPORT_SYMBOL(LNetGetPeerDiscoveryStatus); From patchwork Thu Feb 27 21:09:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409869 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F1BA138D for ; Thu, 27 Feb 2020 21:24:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8795B246A0 for ; Thu, 27 Feb 2020 21:24:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8795B246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E897348C45; Thu, 27 Feb 2020 13:22:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C55B921FA6E for ; Thu, 27 Feb 2020 13:18:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 854771031; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 83AD946D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , 
NeilBrown Date: Thu, 27 Feb 2020 16:09:37 -0500 Message-Id: <1582838290-17243-110-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 109/622] lnet: update logging X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add the retry count when logging message sends and resends. Make timed-out responses visible as net errors. Log the cases in which a message is not resent. WC-bug-id: https://jira.whamcloud.com/browse/LU-11273 Lustre-commit: b9523f474346 ("LU-11273 lnet: update logging") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33044 Reviewed-by: Olaf Weber Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 13 +++++++------ net/lnet/lnet/lib-msg.c | 21 ++++++++++++++++++--- 2 files changed, 25 insertions(+), 9 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 3cab970..84a30e0 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1517,14 +1517,14 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, rc = lnet_post_send_locked(msg, 0); if (!rc) - CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s\n", + CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s try# %d\n", libcfs_nid2str(msg->msg_hdr.src_nid), libcfs_nid2str(msg->msg_txni->ni_nid), libcfs_nid2str(sd->sd_src_nid), libcfs_nid2str(msg->msg_hdr.dest_nid), libcfs_nid2str(sd->sd_dst_nid), 
libcfs_nid2str(msg->msg_txpeer->lpni_nid), - lnet_msgtyp2str(msg->msg_type)); + lnet_msgtyp2str(msg->msg_type), msg->msg_retry_count); return rc; } @@ -2515,8 +2515,7 @@ struct lnet_mt_event_info { list_del_init(&rspt->rspt_on_list); - CDEBUG(D_NET, - "Response timed out: md = %p\n", md); + CNETERR("Response timed out: md = %p\n", md); LNetMDUnlink(rspt->rspt_mdh); lnet_rspt_free(rspt, i); } else { @@ -2579,11 +2578,13 @@ struct lnet_mt_event_info { lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); - CDEBUG(D_NET, "resending %s->%s: %s recovery %d\n", + CDEBUG(D_NET, + "resending %s->%s: %s recovery %d try# %d\n", libcfs_nid2str(src_nid), libcfs_id2str(msg->msg_target), lnet_msgtyp2str(msg->msg_type), - msg->msg_recovery); + msg->msg_recovery, + msg->msg_retry_count); rc = lnet_send(src_nid, msg, LNET_NID_ANY); if (rc) { CERROR("Error sending %s to %s: %d\n", diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 5072238..9b52549 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -690,18 +690,33 @@ resend: /* don't resend recovery messages */ - if (msg->msg_recovery) + if (msg->msg_recovery) { + CDEBUG(D_NET, "msg %s->%s is a recovery ping. retry# %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); return -1; + } /* if we explicitly indicated we don't want to resend then just * return */ - if (msg->msg_no_resend) + if (msg->msg_no_resend) { + CDEBUG(D_NET, "msg %s->%s requested no resend. 
retry# %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); return -1; + } /* check if the message has exceeded the number of retries */ - if (msg->msg_retry_count >= lnet_retry_count) + if (msg->msg_retry_count >= lnet_retry_count) { + CNETERR("msg %s->%s exceeded retry count %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); return -1; + } msg->msg_retry_count++; lnet_net_lock(msg->msg_tx_cpt); From patchwork Thu Feb 27 21:09:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410015 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E16951580 for ; Thu, 27 Feb 2020 21:27:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C6C31246A0 for ; Thu, 27 Feb 2020 21:27:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C6C31246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 25636349241; Thu, 27 Feb 2020 13:24:30 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2991921FAF5 for ; Thu, 27 Feb 2020 13:18:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 87EE21032; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8677D46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:38 -0500 Message-Id: <1582838290-17243-111-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 110/622] lustre: ldlm: don't cancel DoM locks before replay X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Weigh DoM locks before lock replay, as is done for OSC EXTENT locks, and don't cancel locks with data. Add DoM replay tests for the file creation and write cases. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10961 Lustre-commit: b44b1ff8c7fc ("LU-10961 ldlm: don't cancel DoM locks before replay") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32791 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 1 + fs/lustre/mdc/mdc_request.c | 6 ++++++ fs/lustre/osc/osc_lock.c | 22 ++++++++++++++-------- 3 files changed, 21 insertions(+), 8 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 5ba4f97..dc8071a 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -714,6 +714,7 @@ void osc_lock_cancel(const struct lu_env *env, const struct cl_lock_slice *slice); void osc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice); int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data); +unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock); /**************************************************************************** * diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 3341761..0ee42dd 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -2510,6 +2510,12 @@ static int mdc_cancel_weight(struct ldlm_lock *lock) if (lock->l_policy_data.l_inodebits.bits & MDS_INODELOCK_OPEN) return 0; + /* Special case for DoM locks, cancel only unused and granted locks */ + if (ldlm_has_dom(lock) && + (lock->l_granted_mode != lock->l_req_mode || + osc_ldlm_weigh_ast(lock) != 0)) + return 0; + return 1; } diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index b7b33fb..1a2b0bd 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -608,8 +608,8 @@ static bool weigh_cb(const struct lu_env *env, struct cl_io *io, struct cl_page *page = ops->ops_cl.cpl_page; if (cl_page_is_vmlocked(env, page) || - PageDirty(page->cp_vmpage) || PageWriteback(page->cp_vmpage) - ) + 
PageDirty(page->cp_vmpage) || + PageWriteback(page->cp_vmpage)) return false; *(pgoff_t *)cbdata = osc_index(ops) + 1; @@ -618,7 +618,7 @@ static bool weigh_cb(const struct lu_env *env, struct cl_io *io, static unsigned long osc_lock_weight(const struct lu_env *env, struct osc_object *oscobj, - struct ldlm_extent *extent) + loff_t start, loff_t end) { struct cl_io *io = osc_env_thread_io(env); struct cl_object *obj = cl_object_top(&oscobj->oo_cl); @@ -631,11 +631,10 @@ static unsigned long osc_lock_weight(const struct lu_env *env, if (result != 0) return result; - page_index = cl_index(obj, extent->start); + page_index = cl_index(obj, start); if (!osc_page_gang_lookup(env, io, oscobj, - page_index, - cl_index(obj, extent->end), + page_index, cl_index(obj, end), weigh_cb, (void *)&page_index)) result = 1; cl_io_fini(env, io); @@ -668,7 +667,8 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) /* Mostly because lack of memory, do not eliminate this lock */ return 1; - LASSERT(dlmlock->l_resource->lr_type == LDLM_EXTENT); + LASSERT(dlmlock->l_resource->lr_type == LDLM_EXTENT || + ldlm_has_dom(dlmlock)); lock_res_and_lock(dlmlock); obj = dlmlock->l_ast_data; if (obj) @@ -695,7 +695,12 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) goto out; } - weight = osc_lock_weight(env, obj, &dlmlock->l_policy_data.l_extent); + if (ldlm_has_dom(dlmlock)) + weight = osc_lock_weight(env, obj, 0, OBD_OBJECT_EOF); + else + weight = osc_lock_weight(env, obj, + dlmlock->l_policy_data.l_extent.start, + dlmlock->l_policy_data.l_extent.end); out: if (obj) @@ -704,6 +709,7 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) cl_env_put(env, &refcheck); return weight; } +EXPORT_SYMBOL(osc_ldlm_weigh_ast); static void osc_lock_build_einfo(const struct lu_env *env, const struct cl_lock *lock, From patchwork Thu Feb 27 21:09:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James 
Simmons X-Patchwork-Id: 11409867 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CB245138D for ; Thu, 27 Feb 2020 21:24:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B3B2E246A0 for ; Thu, 27 Feb 2020 21:24:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B3B2E246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DFE0D348C35; Thu, 27 Feb 2020 13:22:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8015E21FA79 for ; Thu, 27 Feb 2020 13:18:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8AA961037; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 893FB468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:39 -0500 Message-Id: <1582838290-17243-112-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 111/622] lnet: lnd: Clean up logging X-BeenThere: 
lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata No need to output error in ksocknal_tx_done() as this error is tracked in lnet. No need to keep a cookie in the connection. It's always set to the message. This will allow us to set the msg's health status properly before calling lnet_finalize() WC-bug-id: https://jira.whamcloud.com/browse/LU-11309 Lustre-commit: cdf462b19345 ("LU-11309 lnd: Clean up logging") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33096 Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 5 ++++- net/lnet/klnds/socklnd/socklnd.h | 3 +-- net/lnet/klnds/socklnd/socklnd_cb.c | 10 +++++----- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 891d3bd..72ecf80 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1680,7 +1680,10 @@ struct ksock_peer * &conn->ksnc_ipaddr, conn->ksnc_port, iov_iter_count(&conn->ksnc_rx_to), conn->ksnc_rx_nob_left, ktime_get_seconds() - last_rcv); - lnet_finalize(conn->ksnc_cookie, -EIO); + if (conn->ksnc_lnet_msg) + conn->ksnc_lnet_msg->msg_health_status = + LNET_MSG_STATUS_REMOTE_ERROR; + lnet_finalize(conn->ksnc_lnet_msg, -EIO); break; case SOCKNAL_RX_LNET_HEADER: if (conn->ksnc_rx_started) diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 48884cf..c8d8acf 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -355,8 +355,7 @@ struct ksock_conn { u32 ksnc_rx_csum; /* partial checksum for 
incoming * data */ - void *ksnc_cookie; /* rx lnet_finalize passthru arg - */ + struct lnet_msg *ksnc_lnet_msg; /* rx lnet_finalize arg */ struct ksock_msg ksnc_msg; /* incoming message buffer: * V2.x message takes the * whole struct diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 057c7f3..10a1934 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -344,9 +344,6 @@ struct ksock_tx * ksocknal_free_tx(tx); if (lnetmsg) { /* KSOCK_MSG_NOOP go without lnetmsg */ - if (rc) - CERROR("tx failure rc = %d, hstatus = %d\n", rc, - hstatus); lnetmsg->msg_health_status = hstatus; lnet_finalize(lnetmsg, rc); } @@ -1266,7 +1263,10 @@ struct ksock_route * le64_to_cpu(lhdr->src_nid) != id->nid); } - lnet_finalize(conn->ksnc_cookie, rc); + if (rc && conn->ksnc_lnet_msg) + conn->ksnc_lnet_msg->msg_health_status = + LNET_MSG_STATUS_REMOTE_ERROR; + lnet_finalize(conn->ksnc_lnet_msg, rc); if (rc) { ksocknal_new_packet(conn, 0); @@ -1300,7 +1300,7 @@ struct ksock_route * LASSERT(iov_iter_count(to) <= rlen); LASSERT(to->nr_segs <= LNET_MAX_IOV); - conn->ksnc_cookie = msg; + conn->ksnc_lnet_msg = msg; conn->ksnc_rx_nob_left = rlen; conn->ksnc_rx_to = *to; From patchwork Thu Feb 27 21:09:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409807 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 731E5138D for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5B935246A1 for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 
5B935246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A423C348943; Thu, 27 Feb 2020 13:21:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D996B21FAEC for ; Thu, 27 Feb 2020 13:18:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8D3901038; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8C0D146A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:40 -0500 Message-Id: <1582838290-17243-113-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 112/622] lustre: mdt: revoke lease lock for truncate X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jian Yu Lustre lease lock is usually used to protect file data against concurrent access. Open lock used on MDT side is for this purpose. However, truncate will change file data but it doesn't revoke lease lock. 
This patch fixes the issue by acquiring open sem, checking lease count and revoking lease if there exists any pending lease on the file. WC-bug-id: https://jira.whamcloud.com/browse/LU-10660 Lustre-commit: e4c168165df2 ("LU-10660 mdt: revoke lease lock for truncate") Signed-off-by: Jian Yu Reviewed-on: https://review.whamcloud.com/33093 Reviewed-by: Andreas Dilger Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 7 +++++++ include/uapi/linux/lustre/lustre_idl.h | 1 + 2 files changed, 8 insertions(+) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 8b3e2a3..37558a8 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1616,6 +1616,13 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, clear_bit(LLIF_DATA_MODIFIED, &lli->lli_flags); } + if (attr->ia_valid & ATTR_FILE) { + struct ll_file_data *fd = LUSTRE_FPRIVATE(attr->ia_file); + + if (fd->fd_lease_och) + op_data->op_bias |= MDS_TRUNC_KEEP_LEASE; + } + op_data->op_attr = *attr; op_data->op_xvalid = xvalid; diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index c65663a..7f857be 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1700,6 +1700,7 @@ enum mds_op_bias { MDS_CLOSE_LAYOUT_MERGE = 1 << 15, MDS_CLOSE_RESYNC_DONE = 1 << 16, MDS_CLOSE_LAYOUT_SPLIT = 1 << 17, + MDS_TRUNC_KEEP_LEASE = 1 << 18, }; #define MDS_CLOSE_INTENT (MDS_HSM_RELEASE | MDS_CLOSE_LAYOUT_SWAP | \ From patchwork Thu Feb 27 21:09:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409873 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C576E14BC for ; Thu, 27 Feb 2020 21:24:33 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ABF48246A0 for ; Thu, 27 Feb 2020 21:24:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ABF48246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 76EA3348C82; Thu, 27 Feb 2020 13:22:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 26EBC21FA8C for ; Thu, 27 Feb 2020 13:18:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 906ED1039; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8EDD946C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:41 -0500 Message-Id: <1582838290-17243-114-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 113/622] lustre: ptlrpc: race in AT early reply X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang In ptlrpc_at_check_timed(), the refcount of the request may already have dropped to zero. ptlrpc_server_drop_request() can then continue without holding "scp_at_lock" and free the request, overwriting its memory with 0x5a5a5a5a5a5a5a5a. The subsequent "atomic_inc_not_zero(&rq->rq_refcount)" then returns nonzero, so the freed request is used in ptlrpc_at_send_early_reply(). WC-bug-id: https://jira.whamcloud.com/browse/LU-11281 Lustre-commit: 48e409e65edd ("LU-11281 ptlrpc: race in AT early reply") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/33071 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index cf920ae..a9155b2 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1224,14 +1224,18 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) break; } - ptlrpc_at_remove_timed(rq); /** * ptlrpc_server_drop_request() may drop * refcount to 0 already.
Let's check this and * don't add entry to work_list */ - if (likely(atomic_inc_not_zero(&rq->rq_refcount))) + if (likely(atomic_inc_not_zero(&rq->rq_refcount))) { + ptlrpc_at_remove_timed(rq); list_add(&rq->rq_timed_list, &work_list); + } else { + ptlrpc_at_remove_timed(rq); + } + counter++; } From patchwork Thu Feb 27 21:09:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409871 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D9A8414BC for ; Thu, 27 Feb 2020 21:24:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C21DA246A0 for ; Thu, 27 Feb 2020 21:24:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C21DA246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 830B3348C76; Thu, 27 Feb 2020 13:22:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6957421FADE for ; Thu, 27 Feb 2020 13:18:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 934CA103B; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 91F3A47C; Thu, 27 Feb 2020 16:18:14 
-0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:42 -0500 Message-Id: <1582838290-17243-115-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 114/622] lustre: migrate: migrate striped directory X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Migrate a striped directory in the following steps: 1. create the target object if needed: if the source is a directory, a target object is always created; otherwise, if the source is already located on the target MDT, or the source still has a link on the source MDT, creation is skipped. a) if the source is a directory, detach the source stripes and attach them to the target. b) migrate source xattrs to the target. c) if the source is a regular file, update PFID to the target fid. d) update fid to the target for all links of the source. 2. update the namespace: a) migrate the dirent from the source parent to the target parent. b) update the linkea parent fid to the target parent. c) destroy the source object. This implementation brings the following improvements: 1. all involved objects are locked to avoid races. 2. directory migration doesn't migrate its dir entries; instead, that is done as each sub file is migrated. This avoids timeouts when migrating the dir entries of a large directory, and also avoids touching dir entries without a lock. 3. a file/dir is migrated in one transaction, so migrate recovery is the same as for other operations. 4. a directory being migrated can be accessed (and modified) like a normal directory. 5.
if migration of sub files under a directory fails, user can redo migrate to finish migration of this directory. WC-bug-id: https://jira.whamcloud.com/browse/LU-4684 Lustre-commit: 169738e30a7e ("LU-4684 migrate: migrate striped directory") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/31427 Reviewed-by: Andreas Dilger Reviewed-by: Fan Yong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 24 ++- fs/lustre/include/lustre_lmv.h | 18 +- fs/lustre/llite/file.c | 11 + fs/lustre/llite/llite_lib.c | 90 +++++---- fs/lustre/lmv/lmv_internal.h | 15 +- fs/lustre/lmv/lmv_obd.c | 357 ++++++++++++++++++++++----------- fs/lustre/mdc/mdc_internal.h | 2 + fs/lustre/mdc/mdc_lib.c | 45 +++-- fs/lustre/mdc/mdc_reint.c | 5 +- fs/lustre/ptlrpc/wiretest.c | 16 +- include/uapi/linux/lustre/lustre_idl.h | 16 +- 11 files changed, 403 insertions(+), 196 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index e49954c..a709ad7 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1229,6 +1229,26 @@ struct lu_name { int ln_namelen; }; +static inline bool name_is_dot_or_dotdot(const char *name, int namelen) +{ + return name[0] == '.' 
&& + (namelen == 1 || (namelen == 2 && name[1] == '.')); +} + +static inline bool lu_name_is_dot_or_dotdot(const struct lu_name *lname) +{ + return name_is_dot_or_dotdot(lname->ln_name, lname->ln_namelen); +} + +static inline bool lu_name_is_valid_len(const char *name, size_t name_len) +{ + return name && + name_len > 0 && + name_len < INT_MAX && + strlen(name) == name_len && + memchr(name, '/', name_len) == NULL; +} + /** * Validate names (path components) * @@ -1240,9 +1260,7 @@ struct lu_name { */ static inline bool lu_name_is_valid_2(const char *name, size_t name_len) { - return name && name_len > 0 && name_len < INT_MAX && - name[name_len] == '\0' && strlen(name) == name_len && - !memchr(name, '/', name_len); + return lu_name_is_valid_len(name, name_len) && name[name_len] == '\0'; } /** diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index 5e15c62..ff279e1 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -47,6 +47,8 @@ struct lmv_stripe_md { u32 lsm_md_master_mdt_index; u32 lsm_md_hash_type; u32 lsm_md_layout_version; + u32 lsm_md_migrate_offset; + u32 lsm_md_migrate_hash; u32 lsm_md_default_count; u32 lsm_md_default_index; char lsm_md_pool_name[LOV_MAXPOOLNAME + 1]; @@ -63,6 +65,10 @@ struct lmv_stripe_md { lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index || lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type || lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version || + lsm1->lsm_md_migrate_offset != + lsm2->lsm_md_migrate_offset || + lsm1->lsm_md_migrate_hash != + lsm2->lsm_md_migrate_hash || strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name) != 0) return false; @@ -137,18 +143,14 @@ static inline int lmv_name_to_stripe_index(u32 lmv_hash_type, unsigned int stripe_count, const char *name, int namelen) { - u32 hash_type = lmv_hash_type & LMV_HASH_TYPE_MASK; int idx; LASSERT(namelen > 0); - if (stripe_count <= 1) - return 0; - /* for migrating object, always start from 0 stripe 
*/ - if (lmv_hash_type & LMV_HASH_FLAG_MIGRATION) + if (stripe_count <= 1) return 0; - switch (hash_type) { + switch (lmv_hash_type & LMV_HASH_TYPE_MASK) { case LMV_HASH_TYPE_ALL_CHARS: idx = lmv_hash_all_chars(stripe_count, name, namelen); break; @@ -159,8 +161,8 @@ static inline int lmv_name_to_stripe_index(u32 lmv_hash_type, idx = -EBADFD; break; } - CDEBUG(D_INFO, "name %.*s hash_type %d idx %d\n", namelen, name, - hash_type, idx); + CDEBUG(D_INFO, "name %.*s hash_type %#x idx %d/%u\n", namelen, name, + lmv_hash_type, idx, stripe_count); return idx; } diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index ae39b2c..fd39948 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3836,6 +3836,17 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, if (!child_inode) return -ENOENT; + if (!(exp_connect_flags2(ll_i2sbi(parent)->ll_md_exp) & + OBD_CONNECT2_DIR_MIGRATE)) { + if (le32_to_cpu(lum->lum_stripe_count) > 1 || + ll_i2info(child_inode)->lli_lsm_md) { + CERROR("%s: MDT doesn't support stripe directory migration!\n", + ll_get_fsname(parent->i_sb, NULL, 0)); + rc = -EOPNOTSUPP; + goto out_iput; + } + } + /* * lfs migrate command needs to be blocked on the client * by checking the migrate FID against the FID of the diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 37558a8..636ddf8 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1254,14 +1254,8 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) * different, so it reset lsm_md to NULL to avoid * initializing lsm for slave inode. 
*/ - /* For migrating inode, master stripe and master object will - * be same, so we only need assign this inode - */ - if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION && !i) - lsm->lsm_md_oinfo[i].lmo_root = inode; - else - lsm->lsm_md_oinfo[i].lmo_root = - ll_iget_anon_dir(inode->i_sb, fid, md); + lsm->lsm_md_oinfo[i].lmo_root = + ll_iget_anon_dir(inode->i_sb, fid, md); if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) { int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root); @@ -1273,20 +1267,6 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) return 0; } -static inline int lli_lsm_md_eq(const struct lmv_stripe_md *lsm_md1, - const struct lmv_stripe_md *lsm_md2) -{ - return lsm_md1->lsm_md_magic == lsm_md2->lsm_md_magic && - lsm_md1->lsm_md_stripe_count == lsm_md2->lsm_md_stripe_count && - lsm_md1->lsm_md_master_mdt_index == - lsm_md2->lsm_md_master_mdt_index && - lsm_md1->lsm_md_hash_type == lsm_md2->lsm_md_hash_type && - lsm_md1->lsm_md_layout_version == - lsm_md2->lsm_md_layout_version && - !strcmp(lsm_md1->lsm_md_pool_name, - lsm_md2->lsm_md_pool_name); -} - static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) { struct ll_inode_info *lli = ll_i2info(inode); @@ -1297,27 +1277,53 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) CDEBUG(D_INODE, "update lsm %p of " DFID "\n", lli->lli_lsm_md, PFID(ll_inode2fid(inode))); - /* no striped information from request. 
*/ - if (!lsm) { - if (!lli->lli_lsm_md) { - return 0; - } else if (lli->lli_lsm_md->lsm_md_hash_type & - LMV_HASH_FLAG_MIGRATION) { - /* - * migration is done, the temporay MIGRATE layout has - * been removed - */ - CDEBUG(D_INODE, DFID " finish migration.\n", - PFID(ll_inode2fid(inode))); - lmv_free_memmd(lli->lli_lsm_md); - lli->lli_lsm_md = NULL; - return 0; - } - /* - * The lustre_md from req does not include stripeEA, - * see ll_md_setattr - */ + /* + * no striped information from request, lustre_md from req does not + * include stripeEA, see ll_md_setattr() + */ + if (!lsm) return 0; + + /* Compare the old and new stripe information */ + if (lli->lli_lsm_md && !lsm_md_eq(lli->lli_lsm_md, lsm)) { + struct lmv_stripe_md *old_lsm = lli->lli_lsm_md; + bool layout_changed = lsm->lsm_md_layout_version > + old_lsm->lsm_md_layout_version; + int mask = layout_changed ? D_INODE : D_ERROR; + int idx; + + CDEBUG(mask, + "%s: inode@%p "DFID" lmv layout %s magic %#x/%#x stripe count %d/%d master_mdt %d/%d hash_type %#x/%#x version %d/%d migrate offset %d/%d migrate hash %#x/%#x pool %s/%s\n", + ll_get_fsname(inode->i_sb, NULL, 0), inode, + PFID(&lli->lli_fid), + layout_changed ? 
"changed" : "mismatch", + lsm->lsm_md_magic, old_lsm->lsm_md_magic, + lsm->lsm_md_stripe_count, + old_lsm->lsm_md_stripe_count, + lsm->lsm_md_master_mdt_index, + old_lsm->lsm_md_master_mdt_index, + lsm->lsm_md_hash_type, old_lsm->lsm_md_hash_type, + lsm->lsm_md_layout_version, + old_lsm->lsm_md_layout_version, + lsm->lsm_md_migrate_offset, + old_lsm->lsm_md_migrate_offset, + lsm->lsm_md_migrate_hash, + old_lsm->lsm_md_migrate_hash, + lsm->lsm_md_pool_name, + old_lsm->lsm_md_pool_name); + + for (idx = 0; idx < old_lsm->lsm_md_stripe_count; idx++) + CDEBUG(mask, "old stripe[%d] "DFID"\n", + idx, PFID(&old_lsm->lsm_md_oinfo[idx].lmo_fid)); + + for (idx = 0; idx < lsm->lsm_md_stripe_count; idx++) + CDEBUG(mask, "new stripe[%d] "DFID"\n", + idx, PFID(&lsm->lsm_md_oinfo[idx].lmo_fid)); + + if (!layout_changed) + return -EINVAL; + + ll_dir_clear_lsm_md(inode); } /* set the directory layout */ diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index 6794f11..c4a2fb8 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -123,18 +123,21 @@ static inline int lmv_stripe_md_size(int stripe_count) return sizeof(*lsm) + stripe_count * sizeof(lsm->lsm_md_oinfo[0]); } -int lmv_name_to_stripe_index(enum lmv_hash_type hashtype, - unsigned int max_mdt_index, - const char *name, int namelen); - +/* for file under migrating directory, return the target stripe info */ static inline const struct lmv_oinfo * lsm_name_to_stripe_info(const struct lmv_stripe_md *lsm, const char *name, int namelen) { + u32 hash_type = lsm->lsm_md_hash_type; + u32 stripe_count = lsm->lsm_md_stripe_count; int stripe_index; - stripe_index = lmv_name_to_stripe_index(lsm->lsm_md_hash_type, - lsm->lsm_md_stripe_count, + if (hash_type & LMV_HASH_FLAG_MIGRATION) { + hash_type &= ~LMV_HASH_FLAG_MIGRATION; + stripe_count = lsm->lsm_md_migrate_offset; + } + + stripe_index = lmv_name_to_stripe_index(hash_type, stripe_count, name, namelen); if (stripe_index < 0) return 
ERR_PTR(stripe_index); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 90a46c4..3ddffd8 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1836,154 +1836,284 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data, return md_link(tgt->ltd_exp, op_data, request); } -static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, - const char *old, size_t oldlen, - const char *new, size_t newlen, - struct ptlrpc_request **request) +static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, + const char *name, size_t namelen, + struct ptlrpc_request **request) { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - struct obd_export *target_exp; - struct lmv_tgt_desc *src_tgt; - struct lmv_tgt_desc *tgt_tgt; - struct mdt_body *body; + struct lmv_stripe_md *lsm = op_data->op_mea1; + struct lmv_tgt_desc *parent_tgt; + struct lmv_tgt_desc *sp_tgt; + struct lmv_tgt_desc *tp_tgt = NULL; + struct lmv_tgt_desc *child_tgt; + struct lmv_tgt_desc *tgt; + struct lu_fid target_fid; int rc; - LASSERT(oldlen != 0); + LASSERT(op_data->op_cli_flags & CLI_MIGRATE); + LASSERTF(fid_is_sane(&op_data->op_fid3), "invalid FID "DFID"\n", + PFID(&op_data->op_fid3)); - CDEBUG(D_INODE, "RENAME %.*s in " DFID ":%d to %.*s in " DFID ":%d\n", - (int)oldlen, old, PFID(&op_data->op_fid1), - op_data->op_mea1 ? op_data->op_mea1->lsm_md_stripe_count : 0, - (int)newlen, new, PFID(&op_data->op_fid2), - op_data->op_mea2 ? 
op_data->op_mea2->lsm_md_stripe_count : 0); + CDEBUG(D_INODE, "MIGRATE "DFID"/%.*s\n", + PFID(&op_data->op_fid1), (int)namelen, name); op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid()); op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); - if (op_data->op_cli_flags & CLI_MIGRATE) { - LASSERTF(fid_is_sane(&op_data->op_fid3), - "invalid FID " DFID "\n", - PFID(&op_data->op_fid3)); - - if (op_data->op_mea1) { - struct lmv_stripe_md *lsm = op_data->op_mea1; - struct lmv_tgt_desc *tmp; - - /* Fix the parent fid for striped dir */ - tmp = lmv_locate_target_for_name(lmv, lsm, old, - oldlen, - &op_data->op_fid1, - NULL); - if (IS_ERR(tmp)) - return PTR_ERR(tmp); + parent_tgt = lmv_find_target(lmv, &op_data->op_fid1); + if (IS_ERR(parent_tgt)) + return PTR_ERR(parent_tgt); + + if (lsm) { + u32 hash_type = lsm->lsm_md_hash_type; + u32 stripe_count = lsm->lsm_md_stripe_count; + + /* + * old stripes are appended after new stripes for migrating + * directory. 
+ */ + if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) { + hash_type = lsm->lsm_md_migrate_hash; + stripe_count -= lsm->lsm_md_migrate_offset; } - rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data); - if (rc) + rc = lmv_name_to_stripe_index(hash_type, stripe_count, name, + namelen); + if (rc < 0) return rc; - src_tgt = lmv_find_target(lmv, &op_data->op_fid3); - if (IS_ERR(src_tgt)) - return PTR_ERR(src_tgt); - target_exp = src_tgt->ltd_exp; - } else { - if (op_data->op_mea1) { - struct lmv_stripe_md *lsm = op_data->op_mea1; + if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) + rc += lsm->lsm_md_migrate_offset; - src_tgt = lmv_locate_target_for_name(lmv, lsm, old, - oldlen, - &op_data->op_fid1, - &op_data->op_mds); - } else { - src_tgt = lmv_find_target(lmv, &op_data->op_fid1); - } - if (IS_ERR(src_tgt)) - return PTR_ERR(src_tgt); + /* save it in fid4 temporarily for early cancel */ + op_data->op_fid4 = lsm->lsm_md_oinfo[rc].lmo_fid; + sp_tgt = lmv_get_target(lmv, lsm->lsm_md_oinfo[rc].lmo_mds, + NULL); + if (IS_ERR(sp_tgt)) + return PTR_ERR(sp_tgt); - if (op_data->op_mea2) { - struct lmv_stripe_md *lsm = op_data->op_mea2; - - tgt_tgt = lmv_locate_target_for_name(lmv, lsm, new, - newlen, - &op_data->op_fid2, - &op_data->op_mds); - } else { - tgt_tgt = lmv_find_target(lmv, &op_data->op_fid2); + /* + * if parent is being migrated too, fill op_fid2 with target + * stripe fid, otherwise the target stripe is not created yet. 
+ */ + if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) { + hash_type = lsm->lsm_md_hash_type & + ~LMV_HASH_FLAG_MIGRATION; + stripe_count = lsm->lsm_md_migrate_offset; + + rc = lmv_name_to_stripe_index(hash_type, stripe_count, + name, namelen); + if (rc < 0) + return rc; + + op_data->op_fid2 = lsm->lsm_md_oinfo[rc].lmo_fid; + tp_tgt = lmv_get_target(lmv, + lsm->lsm_md_oinfo[rc].lmo_mds, + NULL); + if (IS_ERR(tp_tgt)) + return PTR_ERR(tp_tgt); } - if (IS_ERR(tgt_tgt)) - return PTR_ERR(tgt_tgt); - - target_exp = tgt_tgt->ltd_exp; + } else { + sp_tgt = parent_tgt; } - /* - * LOOKUP lock on src child (fid3) should also be cancelled for - * src_tgt in mdc_rename. - */ - op_data->op_flags |= MF_MDC_CANCEL_FID1 | MF_MDC_CANCEL_FID3; + child_tgt = lmv_find_target(lmv, &op_data->op_fid3); + if (IS_ERR(child_tgt)) + return PTR_ERR(child_tgt); - /* - * Cancel UPDATE locks on tgt parent (fid2), tgt_tgt is its - * own target. - */ - rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx, - LCK_EX, MDS_INODELOCK_UPDATE, - MF_MDC_CANCEL_FID2); + rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data); if (rc) return rc; + /* - * Cancel LOOKUP locks on source child (fid3) for parent tgt_tgt. + * for directory, send migrate request to the MDT where the object will + * be migrated to, because we can't create a striped directory remotely. + * + * otherwise, send to the MDT where source is located because regular + * file may open lease. + * + * NB. if MDT doesn't support DIR_MIGRATE, send to source MDT too for + * backward compatibility. 
*/ - if (fid_is_sane(&op_data->op_fid3)) { - struct lmv_tgt_desc *tgt; - - tgt = lmv_find_target(lmv, &op_data->op_fid1); + if (S_ISDIR(op_data->op_mode) && + (exp_connect_flags2(exp) & OBD_CONNECT2_DIR_MIGRATE)) { + tgt = lmv_find_target(lmv, &target_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); + } else { + tgt = child_tgt; + } - /* Cancel LOOKUP lock on its parent */ - rc = lmv_early_cancel(exp, tgt, op_data, src_tgt->ltd_idx, - LCK_EX, MDS_INODELOCK_LOOKUP, - MF_MDC_CANCEL_FID3); + /* cancel UPDATE lock of parent master object */ + rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); + if (rc) + return rc; + + /* cancel UPDATE lock of source parent */ + if (sp_tgt != parent_tgt) { + /* + * migrate RPC packs master object FID, because we can only pack + * two FIDs in reint RPC, but MDS needs to know both source + * parent and target parent, and it will obtain them from master + * FID and LMV, the other FID in RPC is kept for target. + * + * since this FID is not passed to MDC, cancel it anyway. + */ + rc = lmv_ea