From patchwork Thu Feb 27 21:07:49 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409653
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4DE0314BC
	for ; Thu, 27 Feb 2020 21:18:30 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 36A00246A1
	for ; Thu, 27 Feb 2020 21:18:30 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 36A00246A1
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 201C021FACA;
	Thu, 27 Feb 2020 13:18:28 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4B5F321F906
	for ; Thu, 27 Feb 2020 13:18:15 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 37C006C5;
	Thu, 27 Feb 2020 16:18:13 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2DBBD46A;
	Thu, 27 Feb 2020 16:18:13 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:49 -0500
Message-Id: <1582838290-17243-2-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 001/622] lustre: always enable special debugging, fhandles, and quota support.
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

Lustre heavily depends on fhandles for its FID handling and needs
quota always enabled.

Signed-off-by: James Simmons
---
 fs/lustre/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/lustre/Kconfig b/fs/lustre/Kconfig
index 2ea3f24..2eb7e45 100644
--- a/fs/lustre/Kconfig
+++ b/fs/lustre/Kconfig
@@ -9,6 +9,9 @@ config LUSTRE_FS
 	select CRYPTO_SHA1
 	select CRYPTO_SHA256
 	select CRYPTO_SHA512
+	select DEBUG_FS
+	select FHANDLE
+	select QUOTA
 	depends on MULTIUSER
 	help
 	  This option enables Lustre file system client support. Choose Y
@@ -43,6 +46,7 @@ config LUSTRE_FS_POSIX_ACL
 
 config LUSTRE_DEBUG_EXPENSIVE_CHECK
 	bool "Enable Lustre DEBUG checks"
+	select REFCOUNT_FULL
 	depends on LUSTRE_FS
 	help
 	  This option is mainly for debug purpose.
It enables Lustre code to do

From patchwork Thu Feb 27 21:07:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409645
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:50 -0500
Message-Id: <1582838290-17243-3-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 002/622] lustre: osc_cache: remove __might_sleep()
Cc: Lustre Development List

The patch 'simplify osc_wake_cache_waiters()' created a new wrapper,
wait_event_idle_exclusive_timeout_cmd(), which includes a
__might_sleep() test. This was causing the following backtrace:

kernel: BUG: sleeping function called from invalid context at fs/lustre/osc/osc_cache.c:1635
kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 19374, name: cp
kernel: INFO: lockdep is turned off.
kernel: Preemption disabled at:
kernel: [<0000000000000000>] 0x0
kernel: CPU: 11 PID: 19374 Comm: cp Tainted: G W 5.4.0-rc5+ #1
kernel: Call Trace:
kernel: dump_stack+0x5e/0x8b
kernel: ___might_sleep+0x205/0x260
kernel: osc_queue_async_io+0x1104/0x1de0 [osc]
kernel: ? _raw_spin_unlock+0x2e/0x50
kernel: ? libcfs_debug_msg+0x6ab/0xc80 [libcfs]
kernel: ? vvp_io_setattr_start+0x200/0x200 [lustre]
kernel: osc_page_cache_add+0x2c/0xa0 [osc]
kernel: osc_io_commit_async+0x1a8/0x420 [osc]
kernel: cl_io_commit_async+0x58/0x80 [obdclass]
kernel: ? vvp_io_setattr_start+0x200/0x200 [lustre:1

The wrapper can be called from an atomic context, and examining the
code suggests we don't need __might_sleep(), so let's remove it.
Fixes: def8e96d4f3d ("lustre: osc_cache: simplify osc_wake_cache_waiters()")
Signed-off-by: James Simmons
---
 fs/lustre/osc/osc_cache.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 3189eb3..2ed7ca2 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -1570,7 +1570,6 @@ static bool osc_enter_cache_try(struct client_obd *cli,
 					      cmd1, cmd2) \
 ({ \
 	long __ret = timeout; \
-	might_sleep(); \
 	if (!___wait_cond_timeout(condition)) \
 		__ret = __wait_event_idle_exclusive_timeout_cmd( \
 			wq_head, condition, timeout, cmd1, cmd2); \

From patchwork Thu Feb 27 21:07:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409657
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:51 -0500
Message-Id: <1582838290-17243-4-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 003/622] lustre: uapi: remove enum hsm_progress_states
Cc: Lustre Development List

This enum is used only by server side code.
Signed-off-by: James Simmons
---
 include/uapi/linux/lustre/lustre_user.h | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 0566afad..f5474c5 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1532,27 +1532,6 @@ enum hsm_states {
  */
 #define HSM_FLAGS_MASK (HSM_USER_MASK | HSM_STATUS_MASK)
 
-/**
- * HSM request progress state
- */
-enum hsm_progress_states {
-	HPS_WAITING	= 1,
-	HPS_RUNNING	= 2,
-	HPS_DONE	= 3,
-};
-
-#define HPS_NONE	0
-
-static inline const char *hsm_progress_state2name(enum hsm_progress_states s)
-{
-	switch (s) {
-	case HPS_WAITING:	return "waiting";
-	case HPS_RUNNING:	return "running";
-	case HPS_DONE:		return "done";
-	default:		return "unknown";
-	}
-}
-
 struct hsm_extent {
 	__u64 offset;
 	__u64 length;

From patchwork Thu Feb 27 21:07:52 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409647
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:52 -0500
Message-Id: <1582838290-17243-5-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 004/622] lustre: uapi: sync enum obd_statfs_state
Cc: Lustre Development List

With the drift between the OpenSFS and Linux clients, various enum
obd_statfs_state values that are transmitted over the wire were
dropped. Sync the values.
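As an aside for readers, here is a rough userspace sketch of why the bit values need to match on both ends of the wire. It uses the flag values from this patch's diff; the sketch_* names and the decoding helper are hypothetical and are not part of Lustre. A peer that lacks a definition for a bit can only report it as unknown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Flag values as they appear in this patch; the names are local to the sketch. */
enum sketch_statfs_state {
	SKETCH_OS_STATE_DEGRADED    = 0x00000001,
	SKETCH_OS_STATE_READONLY    = 0x00000002,
	SKETCH_OS_STATE_NOPRECREATE = 0x00000004,
	SKETCH_OS_STATE_ENOSPC      = 0x00000020,
	SKETCH_OS_STATE_ENOINO      = 0x00000040,
};

struct sketch_flag_name {
	uint32_t bit;
	const char *name;
};

static const struct sketch_flag_name sketch_flag_names[] = {
	{ SKETCH_OS_STATE_DEGRADED,    "DEGRADED" },
	{ SKETCH_OS_STATE_READONLY,    "READONLY" },
	{ SKETCH_OS_STATE_NOPRECREATE, "NOPRECREATE" },
	{ SKETCH_OS_STATE_ENOSPC,      "ENOSPC" },
	{ SKETCH_OS_STATE_ENOINO,      "ENOINO" },
};

/*
 * Hypothetical helper: write the names of the known bits set in @state
 * into @buf, separated by '|', and return the bits this table does not
 * know about.
 */
static uint32_t sketch_state_to_str(uint32_t state, char *buf, size_t len)
{
	uint32_t unknown = state;
	size_t used = 0;
	size_t i;

	if (len == 0)
		return unknown;
	buf[0] = '\0';

	for (i = 0; i < sizeof(sketch_flag_names) / sizeof(sketch_flag_names[0]); i++) {
		const struct sketch_flag_name *f = &sketch_flag_names[i];
		int n;

		/* A bit listed in the table is never reported as unknown. */
		unknown &= ~f->bit;
		if (!(state & f->bit) || used >= len)
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", f->name);
		if (n > 0)
			used += (size_t)n; /* may reach len on truncation */
	}
	return unknown;
}
```

With this table, a state word of 0x24 decodes to "NOPRECREATE|ENOSPC", while 0x8 (one of the obsolete values removed here) comes back as an unknown bit.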
Signed-off-by: James Simmons
---
 include/uapi/linux/lustre/lustre_user.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index f5474c5..27501a2 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -101,9 +101,9 @@ enum obd_statfs_state {
 	OS_STATE_DEGRADED	= 0x00000001, /**< RAID degraded/rebuilding */
 	OS_STATE_READONLY	= 0x00000002, /**< filesystem is read-only */
-	OS_STATE_RDONLY_1	= 0x00000004, /**< obsolete 1.6, was EROFS=30 */
-	OS_STATE_RDONLY_2	= 0x00000008, /**< obsolete 1.6, was EROFS=30 */
-	OS_STATE_RDONLY_3	= 0x00000010, /**< obsolete 1.6, was EROFS=30 */
+	OS_STATE_NOPRECREATE	= 0x00000004, /**< no object precreation */
+	OS_STATE_ENOSPC		= 0x00000020, /**< not enough free space */
+	OS_STATE_ENOINO		= 0x00000040, /**< not enough inodes */
 };
 
 struct obd_statfs {

From patchwork Thu Feb 27 21:07:53 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409651
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:53 -0500
Message-Id: <1582838290-17243-6-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 005/622] lustre: llite: return compatible fsid for statfs
Cc: Fan Yong, Lustre Development List

From: Fan Yong

Lustre uses a 64-bit inode number to identify an object on the client
side. When Lustre is re-exported via NFS, NFS detects whether the file
system supports fsid via statfs(). If it does not, NFS recognizes and
packs only the low 32 bits of the inode number into the NFS handle.
Such a handle cannot be used to locate the object properly.

To avoid patching the Linux kernel, the Lustre client should generate
an fsid and return it via statfs() to the upper layer. To be compatible
with old Lustre clients (acting as NFS servers), the fsid is generated
from super_block::s_dev.
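For readers who want to see what the resulting fsid looks like, here is a minimal userspace sketch of the encoding. It assumes that huge_encode_dev() simply widens new_encode_dev(), as in include/linux/kdev_t.h, and that the in-kernel dev_t keeps a 20-bit minor below the major. Every sketch_* name is hypothetical and is not part of Lustre or the kernel.

```c
#include <stdint.h>

/* Assumed in-kernel dev_t layout: major in the bits above a 20-bit minor. */
#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1U << SKETCH_MINORBITS) - 1)

/* Hypothetical stand-in for the kernel's __kernel_fsid_t. */
struct sketch_fsid {
	uint32_t val[2];
};

/* Modelled on new_encode_dev(): major and minor packed into 32 bits. */
static uint32_t sketch_new_encode_dev(uint32_t dev)
{
	uint32_t major = dev >> SKETCH_MINORBITS;
	uint32_t minor = dev & SKETCH_MINORMASK;

	return (minor & 0xff) | (major << 8) | ((minor & ~0xffU) << 12);
}

/* Modelled on huge_encode_dev(): the 32-bit encoding widened to 64 bits. */
static uint64_t sketch_huge_encode_dev(uint32_t dev)
{
	return sketch_new_encode_dev(dev);
}

/* Split the 64-bit value into the two fsid words, as ll_statfs() does. */
static struct sketch_fsid sketch_fsid_from_dev(uint32_t dev)
{
	uint64_t fsid = sketch_huge_encode_dev(dev);
	struct sketch_fsid out;

	out.val[0] = (uint32_t)fsid;
	out.val[1] = (uint32_t)(fsid >> 32);
	return out;
}
```

Under these assumptions the encoded value always fits in 32 bits, so val[1] comes out as zero. For example, a device with major 8 and minor 1 would yield val[0] == 0x801.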
WC-bug-id: https://jira.whamcloud.com/browse/LU-2904
Lustre-commit: abe4d83fab00 ("LU-2904 llite: return compatible fsid for statfs")
Signed-off-by: Fan Yong
Reviewed-on: http://review.whamcloud.com/7434
Reviewed-by: Bobi Jam
Reviewed-by: Jian Yu
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/llite/llite_internal.h |  3 ---
 fs/lustre/llite/llite_lib.c      |  8 ++++----
 fs/lustre/llite/llite_nfs.c      | 16 ----------------
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index f0a50fc..3192340 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -538,8 +538,6 @@ struct ll_sb_info {
 	/* st_blksize returned by stat(2), when non-zero */
 	unsigned int ll_stat_blksize;
 
-	__kernel_fsid_t ll_fsid;
-
 	struct kset ll_kset; /* sysfs object */
 	struct completion ll_kobj_unregister;
 };
@@ -941,7 +939,6 @@ static inline ssize_t ll_lov_user_md_size(const struct lov_user_md *lum)
 /* llite/llite_nfs.c */
 extern const struct export_operations lustre_export_operations;
 u32 get_uuid2int(const char *name, int len);
-void get_uuid2fsid(const char *name, int len, __kernel_fsid_t *fsid);
 struct inode *search_inode_for_lustre(struct super_block *sb,
 				      const struct lu_fid *fid);
 int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index a48d753..e1932ae 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -591,10 +591,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	 * only a node-local comparison.
 	 */
 	uuid = obd_get_uuid(sbi->ll_md_exp);
-	if (uuid) {
+	if (uuid)
 		sb->s_dev = get_uuid2int(uuid->uuid, strlen(uuid->uuid));
-		get_uuid2fsid(uuid->uuid, strlen(uuid->uuid), &sbi->ll_fsid);
-	}
 
 	kfree(data);
 	kfree(osfs);
@@ -1775,6 +1773,7 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs)
 {
 	struct super_block *sb = de->d_sb;
 	struct obd_statfs osfs;
+	u64 fsid = huge_encode_dev(sb->s_dev);
 	int rc;
 
 	CDEBUG(D_VFSTRACE, "VFS Op: at %llu jiffies\n", get_jiffies_64());
@@ -1805,7 +1804,8 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs)
 	sfs->f_blocks = osfs.os_blocks;
 	sfs->f_bfree = osfs.os_bfree;
 	sfs->f_bavail = osfs.os_bavail;
-	sfs->f_fsid = ll_s2sbi(sb)->ll_fsid;
+	sfs->f_fsid.val[0] = (u32)fsid;
+	sfs->f_fsid.val[1] = (u32)(fsid >> 32);
 	return 0;
 }
 
diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c
index d6643d0..434f92b 100644
--- a/fs/lustre/llite/llite_nfs.c
+++ b/fs/lustre/llite/llite_nfs.c
@@ -57,22 +57,6 @@ u32 get_uuid2int(const char *name, int len)
 	return (key0 << 1);
 }
 
-void get_uuid2fsid(const char *name, int len, __kernel_fsid_t *fsid)
-{
-	u64 key = 0, key0 = 0x12a3fe2d, key1 = 0x37abe8f9;
-
-	while (len--) {
-		key = key1 + (key0 ^ (*name++ * 7152373));
-		if (key & 0x8000000000000000ULL)
-			key -= 0x7fffffffffffffffULL;
-		key1 = key0;
-		key0 = key;
-	}
-
-	fsid->val[0] = key;
-	fsid->val[1] = key >> 32;
-}
-
 struct inode *search_inode_for_lustre(struct super_block *sb,
 				      const struct lu_fid *fid)
 {

From patchwork Thu Feb 27 21:07:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409661
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:54 -0500
Message-Id: <1582838290-17243-7-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 006/622] lustre: ldlm: Make kvzalloc | kvfree use consistent
Cc: Lustre Development List

From: "Christopher J. Morrone"

struct ldlm_lock's l_lvb_data field is freed in ldlm_lock_put() using
kfree. However, some other code paths can attach a buffer to l_lvb_data
that was allocated using vmalloc(). This can lead to a kfree() of a
vmalloc()ed buffer, which can trigger a kernel Oops.

WC-bug-id: https://jira.whamcloud.com/browse/LU-4194
Lustre-commit: 9c4d506c5fea ("LU-4194 ldlm: Make OBD_[ALLOC|FREE]_LARGE use consistent")
Signed-off-by: Christopher J. Morrone
Reviewed-on: http://review.whamcloud.com/8298
Reviewed-by: Andreas Dilger
Reviewed-by: Faccini Bruno
Signed-off-by: James Simmons
---
 fs/lustre/ldlm/ldlm_lock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 6eebf5f..7242cd1 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -185,7 +185,7 @@ void ldlm_lock_put(struct ldlm_lock *lock)
 			lock->l_export = NULL;
 		}
 
-		kfree(lock->l_lvb_data);
+		kvfree(lock->l_lvb_data);
 
 		lu_ref_fini(&lock->l_reference);
 		OBD_FREE_RCU(lock, sizeof(*lock), &lock->l_handle);
@@ -1548,7 +1548,7 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns,
 
 	if (lvb_len) {
 		lock->l_lvb_len = lvb_len;
-		lock->l_lvb_data = kzalloc(lvb_len, GFP_NOFS);
+		lock->l_lvb_data = kvzalloc(lvb_len, GFP_NOFS);
 		if (!lock->l_lvb_data) {
 			rc = -ENOMEM;
 			goto out;

From patchwork Thu Feb 27 21:07:55 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409655
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:55 -0500
Message-Id: <1582838290-17243-8-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 007/622] lustre: llite: limit smallest max_cached_mb value
Cc: James Nunez, Lustre Development List

From: James Nunez

Currently, ost-survey hangs due to calling 'lfs setstripe' in an old
(positional) style and setting max_cached_mb to zero.
In ll_max_cached_mb_seq_write(), the number of pages requested is set
to the larger of the pages requested and PTLRPC_MAX_BRW_PAGES, so that
the client can make well-formed RPCs.

WC-bug-id: https://jira.whamcloud.com/browse/LU-4768
Lustre-commit: 46bec835ac72 ("LU-4768 tests: Update ost-survey script")
Signed-off-by: James Nunez
Reviewed-on: http://review.whamcloud.com/11971
Reviewed-by: Nathaniel Clark
Reviewed-by: Cliff White
Reviewed-by: Jian Yu
Reviewed-by: Jinshan Xiong
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/llite/lproc_llite.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index e108326..5ac6689 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -527,6 +527,8 @@ static ssize_t ll_max_cached_mb_seq_write(struct file *file,
 		       totalram_pages() >> (20 - PAGE_SHIFT));
 		return -ERANGE;
 	}
+	/* Allow enough cache so clients can make well-formed RPCs */
+	pages_number = max_t(long, pages_number, PTLRPC_MAX_BRW_PAGES);
 
 	spin_lock(&sbi->ll_lock);
 	diff = pages_number - cache->ccc_lru_max;

From patchwork Thu Feb 27 21:07:56 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409665
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:56 -0500
Message-Id: <1582838290-17243-9-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 008/622] lustre: obdecho: turn on async flag only for mode 3
Cc: Rahul Deshmukh, Lustre Development List

From: Rahul Deshmukh

There are a couple of problems in obdfilter-survey:
- The brw test type, i.e. "g", was not followed by npages,
- The target netdisk was not set properly, and
- The async flag should be turned on only for mode 3.

This patch fixes the last problem, which is on the kernel side.
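As a reading aid, the mode and async selection this patch arrives at can be sketched in plain C as follows. This is only a model of the control flow in echo_client_brw_ioctl() as shown in this patch's diff; the sketch_* names and the has_next_device parameter (standing in for ed->ed_next) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical result type: the effective test mode and the async flag. */
struct sketch_brw_plan {
	long test_mode;
	bool async;
};

static struct sketch_brw_plan sketch_pick_brw_mode(long requested_mode,
						    bool has_next_device)
{
	struct sketch_brw_plan plan = { requested_mode, false };

	/* Without a next device, only mode 3 (prep/commit) is usable. */
	if (!has_next_device && plan.test_mode != 3)
		plan.test_mode = 3;

	/* Async is decided from the effective mode, not the requested one. */
	if (plan.test_mode == 3)
		plan.async = true;

	return plan;
}
```

In this model a request for mode 2 with a next device present stays synchronous, whereas the code before the patch started from async = 1 and cleared it only for mode 1.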
WC-bug-id: https://jira.whamcloud.com/browse/LU-5031
Lustre-commit: 9f38647a7b24 ("LU-5031 tests: obdfilter-survey fixes")
Signed-off-by: Rahul Deshmukh
Reviewed-on: http://review.whamcloud.com/10264
Reviewed-by: Cliff White
Reviewed-by: Bob Glossman
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/obdecho/echo_client.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c
index ca963bb..3984cb4 100644
--- a/fs/lustre/obdecho/echo_client.c
+++ b/fs/lustre/obdecho/echo_client.c
@@ -1425,7 +1425,7 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw,
 	struct obdo *oa = &data->ioc_obdo1;
 	struct echo_object *eco;
 	int rc;
-	int async = 1;
+	int async = 0;
 	long test_mode;
 
 	LASSERT(oa->o_valid & OBD_MD_FLGROUP);
@@ -1438,14 +1438,14 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw,
 
 	/* OFD/obdfilter works only via prep/commit */
 	test_mode = (long)data->ioc_pbuf1;
-	if (test_mode == 1)
-		async = 0;
-
 	if (!ed->ed_next && test_mode != 3) {
 		test_mode = 3;
 		data->ioc_plen1 = data->ioc_count;
 	}
 
+	if (test_mode == 3)
+		async = 1;
+
 	/* Truncate batch size to maximum */
 	if (data->ioc_plen1 > PTLRPC_MAX_BRW_SIZE)
 		data->ioc_plen1 = PTLRPC_MAX_BRW_SIZE;

From patchwork Thu Feb 27 21:07:57 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409659
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:07:57 -0500
Message-Id: <1582838290-17243-10-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 009/622] lustre: llite: reorganize variable and data structures
Cc: Lustre Development List

From: "John L. Hammond"

This patch covers the bits missed in the patch series "Lustre IO stack
simplifications and cleanups" from the OpenSFS branch for the LU-5971
work.
Details of the original push can be viewed at https://lore.kernel.org/patchwork/cover/662900. No Fixes: tag is provided since the staging patch series was broken up into a much larger patch set. WC-bug-id: https://jira.whamcloud.com/browse/LU-5971 Lustre-commit: 6eda93c7b5f6 ("LU-5971 llite: reorganize variable and data structures") Signed-off-by: John L. Hammond Signed-off-by: Jinshan Xiong Reviewed-on: http://review.whamcloud.com/13714 Reviewed-by: Bobi Jam Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 1 + fs/lustre/llite/glimpse.c | 1 + fs/lustre/llite/lcommon_cl.c | 5 ++--- fs/lustre/llite/lcommon_misc.c | 24 ++++++++++++------------ fs/lustre/llite/llite_internal.h | 8 ++++---- fs/lustre/llite/llite_lib.c | 4 ++-- fs/lustre/llite/super25.c | 1 + fs/lustre/llite/vvp_dev.c | 1 + fs/lustre/llite/vvp_internal.h | 13 +++---------- fs/lustre/llite/vvp_io.c | 4 ++-- 10 files changed, 29 insertions(+), 33 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index fe4340d..fe965b1 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -49,6 +49,7 @@ #include #include "llite_internal.h" +#include "vvp_internal.h" struct split_param { struct inode *sp_inode; diff --git a/fs/lustre/llite/glimpse.c b/fs/lustre/llite/glimpse.c index de1a31f..3441904 100644 --- a/fs/lustre/llite/glimpse.c +++ b/fs/lustre/llite/glimpse.c @@ -47,6 +47,7 @@ #include #include "llite_internal.h" +#include "vvp_internal.h" static const struct cl_lock_descr whole_file = { .cld_start = 0, diff --git a/fs/lustre/llite/lcommon_cl.c b/fs/lustre/llite/lcommon_cl.c index 988855b..978e05b 100644 --- a/fs/lustre/llite/lcommon_cl.c +++ b/fs/lustre/llite/lcommon_cl.c @@ -30,8 +30,6 @@ * This file is part of Lustre, http://www.lustre.org/ * Lustre is a trademark of Sun Microsystems, Inc. * - * cl code used by vvp (and other Lustre clients in the future).
- * * Author: Nikita Danilov */ @@ -63,6 +61,7 @@ * Vvp device and device type functions. * */ +#include "vvp_internal.h" /** * An `emergency' environment used by cl_inode_fini() when cl_env_get() @@ -282,7 +281,7 @@ u64 cl_fid_build_ino(const struct lu_fid *fid, bool api32) return fid_flatten(fid); } -/** +/* * build inode generation from passed @fid. If our FID overflows the 32-bit * inode number then return a non-zero generation to distinguish them. */ diff --git a/fs/lustre/llite/lcommon_misc.c b/fs/lustre/llite/lcommon_misc.c index 29daf5b..48503d6 100644 --- a/fs/lustre/llite/lcommon_misc.c +++ b/fs/lustre/llite/lcommon_misc.c @@ -46,7 +46,7 @@ * maximum-sized (= maximum striped) EA and cookie without having to * calculate this (via a call into the LOV + OSCs) each time we make an RPC. */ -int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp) +static int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp) { u32 val_size, max_easize, def_easize; int rc; @@ -115,7 +115,7 @@ int cl_ocd_update(struct obd_device *host, struct obd_device *watched, #define GROUPLOCK_SCOPE "grouplock" int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, - struct ll_grouplock *cg) + struct ll_grouplock *lg) { struct lu_env *env; struct cl_io *io; @@ -160,22 +160,22 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, return rc; } - cg->lg_env = env; - cg->lg_io = io; - cg->lg_lock = lock; - cg->lg_gid = gid; + lg->lg_env = env; + lg->lg_io = io; + lg->lg_lock = lock; + lg->lg_gid = gid; return 0; } -void cl_put_grouplock(struct ll_grouplock *cg) +void cl_put_grouplock(struct ll_grouplock *lg) { - struct lu_env *env = cg->lg_env; - struct cl_io *io = cg->lg_io; - struct cl_lock *lock = cg->lg_lock; + struct lu_env *env = lg->lg_env; + struct cl_io *io = lg->lg_io; + struct cl_lock *lock = lg->lg_lock; - LASSERT(cg->lg_env); - LASSERT(cg->lg_gid); + LASSERT(lg->lg_env); + LASSERT(lg->lg_gid); 
cl_lock_release(env, lock); cl_io_fini(env, io); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 3192340..fbe93a4 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -707,7 +707,6 @@ static inline bool ll_sbi_has_tiny_write(struct ll_sb_info *sbi) void ll_ras_enter(struct file *f); /* llite/lcommon_misc.c */ -int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp); int cl_ocd_update(struct obd_device *host, struct obd_device *watched, enum obd_notify_event ev, void *owner); int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, @@ -975,9 +974,9 @@ struct ll_cl_context { struct ll_thread_info { struct iov_iter lti_iter; - struct vvp_io_args lti_args; - struct ra_io_arg lti_ria; - struct ll_cl_context lti_io_ctx; + struct vvp_io_args lti_args; + struct ra_io_arg lti_ria; + struct ll_cl_context lti_io_ctx; }; extern struct lu_context_key ll_thread_key; @@ -1165,6 +1164,7 @@ struct ll_statahead_info { blkcnt_t dirty_cnt(struct inode *inode); int __cl_glimpse_size(struct inode *inode, int agl); + int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io, struct inode *inode, struct cl_object *clob, int agl); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index e1932ae..aaa8ad2 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2542,7 +2542,7 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret) { char *buf, *path = NULL; struct dentry *dentry = NULL; - struct vvp_object *obj = cl_inode2vvp(page->mapping->host); + struct inode *inode = page->mapping->host; /* this can be called inside spin lock so use GFP_ATOMIC. 
*/ buf = (char *)__get_free_page(GFP_ATOMIC); @@ -2556,7 +2556,7 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret) "%s: dirty page discard: %s/fid: " DFID "/%s may get corrupted (rc %d)\n", ll_get_fsname(page->mapping->host->i_sb, NULL, 0), s2lsi(page->mapping->host->i_sb)->lsi_lmd->lmd_dev, - PFID(&obj->vob_header.coh_lu.loh_fid), + PFID(ll_inode2fid(inode)), (path && !IS_ERR(path)) ? path : "", ioret); if (dentry) diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c index 2b65e2f..133fe2a 100644 --- a/fs/lustre/llite/super25.c +++ b/fs/lustre/llite/super25.c @@ -42,6 +42,7 @@ #include #include #include "llite_internal.h" +#include "vvp_internal.h" static struct kmem_cache *ll_inode_cachep; diff --git a/fs/lustre/llite/vvp_dev.c b/fs/lustre/llite/vvp_dev.c index 9f793e9..e1d87f9 100644 --- a/fs/lustre/llite/vvp_dev.c +++ b/fs/lustre/llite/vvp_dev.c @@ -93,6 +93,7 @@ static void *ll_thread_key_init(const struct lu_context *ctx, info = kmem_cache_zalloc(ll_thread_kmem, GFP_NOFS); if (!info) info = ERR_PTR(-ENOMEM); + return info; } diff --git a/fs/lustre/llite/vvp_internal.h b/fs/lustre/llite/vvp_internal.h index 96f10d2..7a463cb 100644 --- a/fs/lustre/llite/vvp_internal.h +++ b/fs/lustre/llite/vvp_internal.h @@ -166,7 +166,7 @@ static inline struct cl_io *vvp_env_thread_io(const struct lu_env *env) } struct vvp_session { - struct vvp_io cs_ios; + struct vvp_io vs_ios; }; static inline struct vvp_session *vvp_env_session(const struct lu_env *env) @@ -181,11 +181,11 @@ static inline struct vvp_session *vvp_env_session(const struct lu_env *env) static inline struct vvp_io *vvp_env_io(const struct lu_env *env) { - return &vvp_env_session(env)->cs_ios; + return &vvp_env_session(env)->vs_ios; } /** - * ccc-private object state. + * VPP-private object state. 
*/ struct vvp_object { struct cl_object_header vob_header; @@ -246,13 +246,6 @@ struct vvp_device { struct cl_device *vdv_next; }; -void *ccc_key_init(const struct lu_context *ctx, - struct lu_context_key *key); -void ccc_key_fini(const struct lu_context *ctx, - struct lu_context_key *key, void *data); - -void ccc_umount(const struct lu_env *env, struct cl_device *dev); - static inline struct lu_device *vvp2lu_dev(struct vvp_device *vdv) { return &vdv->vdv_cl.cd_lu_dev; diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 6145064..37bf942 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -416,10 +416,10 @@ static enum cl_lock_mode vvp_mode_from_vma(struct vm_area_struct *vma) static int vvp_mmap_locks(const struct lu_env *env, struct vvp_io *vio, struct cl_io *io) { - struct vvp_thread_info *cti = vvp_env_info(env); + struct vvp_thread_info *vti = vvp_env_info(env); struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - struct cl_lock_descr *descr = &cti->vti_descr; + struct cl_lock_descr *descr = &vti->vti_descr; union ldlm_policy_data policy; unsigned long addr; ssize_t count; From patchwork Thu Feb 27 21:07:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409663 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A05ED138D for ; Thu, 27 Feb 2020 21:18:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 89031246A1 for ; Thu, 27 Feb 2020 21:18:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 89031246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C9CAF21FB18; Thu, 27 Feb 2020 13:18:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E5B3C21FA4E for ; Thu, 27 Feb 2020 13:18:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4B3CE6D1; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4825D46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:58 -0500 Message-Id: <1582838290-17243-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 010/622] lustre: llite: increase whole-file readahead to RPC size X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Increase the default whole-file readahead limit to match the current RPC size. That ensures that files smaller than the RPC size will be read in a single round-trip instead of sending multiple smaller RPCs. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-7990 Lustre-commit: 627d0133d9d7 ("LU-7990 llite: increase whole-file readahead to RPC size") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/26955 Reviewed-by: Patrick Farrell Reviewed-by: Dmitry Eremin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index aaa8ad2..12aafe0 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -465,6 +465,12 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) sbi->ll_dt_exp->exp_connect_data = *data; + /* Don't change value if it was specified in the config log */ + if (sbi->ll_ra_info.ra_max_read_ahead_whole_pages == -1) + sbi->ll_ra_info.ra_max_read_ahead_whole_pages = + max_t(unsigned long, SBI_DEFAULT_READAHEAD_WHOLE_MAX, + (data->ocd_brw_size >> PAGE_SHIFT)); + err = obd_fid_init(sbi->ll_dt_exp->exp_obd, sbi->ll_dt_exp, LUSTRE_SEQ_METADATA); if (err) { From patchwork Thu Feb 27 21:07:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409667 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EA9A614BC for ; Thu, 27 Feb 2020 21:18:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D30FA246A1 for ; Thu, 27 Feb 2020 21:18:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D30FA246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BDD2E21FA61; Thu, 27 Feb 2020 13:18:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36DEA21FA4E for ; Thu, 27 Feb 2020 13:18:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4F5E86D3; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4B21D46D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:07:59 -0500 Message-Id: <1582838290-17243-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 011/622] lustre: llite: handle ORPHAN/DEAD directories X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Di Wang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Di Wang Don't set the directory MDS striping if the parent is dead. To test this works add the OBD_FAIL_LLITE_NO_CHECK_DEAD injection fault. WC-bug-id: https://jira.whamcloud.com/browse/LU-7579 Lustre-commit: 098fb363c39 ("LU-7579 osd: move ORPHAN/DEAD flag to OSD") Signed-off-by: Di Wang Reviewed-on: http://review.whamcloud.com/18024 Reviewed-by: John L. 
Hammond Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/llite/dir.c | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index e10b372..653a456 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -442,6 +442,7 @@ #define OBD_FAIL_LLITE_XATTR_ENOMEM 0x1405 #define OBD_FAIL_MAKE_LOVEA_HOLE 0x1406 #define OBD_FAIL_LLITE_LOST_LAYOUT 0x1407 +#define OBD_FAIL_LLITE_NO_CHECK_DEAD 0x1408 #define OBD_FAIL_GETATTR_DELAY 0x1409 #define OBD_FAIL_LLITE_CREATE_NODE_PAUSE 0x140c #define OBD_FAIL_LLITE_IMUTEX_SEC 0x140e diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index d3ef669..f21727b 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -433,6 +433,10 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, !(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_DIR_STRIPE)) return -EINVAL; + if (IS_DEADDIR(parent) && + !OBD_FAIL_CHECK(OBD_FAIL_LLITE_NO_CHECK_DEAD)) + return -ENOENT; + if (lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC) && lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC_SPECIFIC)) lustre_swab_lmv_user_md(lump); From patchwork Thu Feb 27 21:08:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409669 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8FA26138D for ; Thu, 27 Feb 2020 21:18:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7845B246A1 for ; Thu, 27 Feb 2020 21:18:52 +0000 (UTC) DMARC-Filter: OpenDMARC 
Filter v1.3.2 mail.kernel.org 7845B246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E5EFA21FA63; Thu, 27 Feb 2020 13:18:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D2EF21FA4E for ; Thu, 27 Feb 2020 13:18:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 515F76D7; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4DF5446F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:00 -0500 Message-Id: <1582838290-17243-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 012/622] lustre: lov: protected ost pool count updation X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jadhav Vikram , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jadhav Vikram ASSERTION(iter->lpi_idx <= ((iter->lpi_pool)->pool_obds.op_count) was triggered because reads of the OST pool count were not protected in pool_proc_next and pool_proc_show, so pool_proc_show could be called when op_count was zero.
Fix this by taking the lock in the sequence start function pool_proc_start and releasing it in pool_proc_stop. Rather than using down_read / up_read pairs around pool_proc_next and pool_proc_show, this change makes sure the OST pool data is protected throughout the sequence operation. Seagate-bug-id: MRP-3629 WC-bug-id: https://jira.whamcloud.com/browse/LU-9620 Lustre-commit: 61c803319b91 ("LU-9620 lod: protected ost pool count updation") Signed-off-by: Jadhav Vikram Reviewed-by: Ashish Purkar Reviewed-by: Vladimir Saveliev Reviewed-on: https://review.whamcloud.com/27506 Reviewed-by: Fan Yong Reviewed-by: Niu Yawei Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_pool.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c index 60565b9..a0552fb 100644 --- a/fs/lustre/lov/lov_pool.c +++ b/fs/lustre/lov/lov_pool.c @@ -117,14 +117,11 @@ static void *pool_proc_next(struct seq_file *s, void *v, loff_t *pos) /* iterate to find a non empty entry */ prev_idx = iter->idx; - down_read(&pool_tgt_rw_sem(iter->pool)); iter->idx++; - if (iter->idx == pool_tgt_count(iter->pool)) { + if (iter->idx >= pool_tgt_count(iter->pool)) { iter->idx = prev_idx; /* we stay on the last entry */ - up_read(&pool_tgt_rw_sem(iter->pool)); return NULL; } - up_read(&pool_tgt_rw_sem(iter->pool)); (*pos)++; /* return != NULL to continue */ return iter; @@ -157,6 +154,7 @@ static void *pool_proc_start(struct seq_file *s, loff_t *pos) */ /* /!\ do not forget to restore it to pool before freeing it */ s->private = iter; + down_read(&pool_tgt_rw_sem(pool)); if (*pos > 0) { loff_t i; void *ptr; @@ -179,6 +177,7 @@ static void pool_proc_stop(struct seq_file *s, void *v) * we have to free only if s->private is an iterator */ if ((iter) && (iter->magic == POOL_IT_MAGIC)) { + up_read(&pool_tgt_rw_sem(iter->pool)); /* we restore s->private so next call to pool_proc_start() * will work */ @@ -197,9
+196,7 @@ static int pool_proc_show(struct seq_file *s, void *v) LASSERT(iter->pool); LASSERT(iter->idx <= pool_tgt_count(iter->pool)); - down_read(&pool_tgt_rw_sem(iter->pool)); tgt = pool_tgt(iter->pool, iter->idx); - up_read(&pool_tgt_rw_sem(iter->pool)); if (tgt) seq_printf(s, "%s\n", obd_uuid2str(&tgt->ltd_uuid)); From patchwork Thu Feb 27 21:08:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409671 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD0C014BC for ; Thu, 27 Feb 2020 21:18:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C1D11246A1 for ; Thu, 27 Feb 2020 21:18:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C1D11246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 93B5321FB4F; Thu, 27 Feb 2020 13:18:47 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB83F21FA4E for ; Thu, 27 Feb 2020 13:18:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 539496D9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 50F9747C; 
Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:01 -0500 Message-Id: <1582838290-17243-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 013/622] lustre: obdclass: fix llog_cat_cleanup() usage on Client X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini With patch/commit 3a83b4b9 for LU-5195, the LLOG code was strengthened against catalog inconsistency: it detects when a referenced plain LLOG is missing and clears the associated entry by calling llog_cat_cleanup(). That function now needs to handle the case where it is also executed on a Client (i.e., cathandle->lgh_obj == NULL) and thus must not attempt to update the on-disk catalog. WC-bug-id: https://jira.whamcloud.com/browse/LU-6471 Lustre-commit: 485f3ba87433 ("LU-6471 obdclass: fix llog_cat_cleanup() usage on Client") Signed-off-by: Bruno Faccini Reviewed-on: http://review.whamcloud.com/14489 Reviewed-by: Alex Zhuravlev Reviewed-by: John L.
Hammond Reviewed-by: Mikhail Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/llog_cat.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/lustre/obdclass/llog_cat.c b/fs/lustre/obdclass/llog_cat.c index 580d807..ca97e08 100644 --- a/fs/lustre/obdclass/llog_cat.c +++ b/fs/lustre/obdclass/llog_cat.c @@ -133,10 +133,8 @@ int llog_cat_close(const struct lu_env *env, struct llog_handle *cathandle) list_del_init(&loghandle->u.phd.phd_entry); llog_close(env, loghandle); } - /* if handle was stored in ctxt, remove it too */ - if (cathandle->lgh_ctxt->loc_handle == cathandle) - cathandle->lgh_ctxt->loc_handle = NULL; - return llog_close(env, cathandle); + + return 0; } EXPORT_SYMBOL(llog_cat_close); From patchwork Thu Feb 27 21:08:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409649 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F1B3138D for ; Thu, 27 Feb 2020 21:18:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 47865246A1 for ; Thu, 27 Feb 2020 21:18:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 47865246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A5FA121FA4B; Thu, 27 Feb 2020 13:18:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 17EE321FA60 for ; Thu, 27 Feb 2020 13:18:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 55C936DA; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 54011468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:02 -0500 Message-Id: <1582838290-17243-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 014/622] lustre: mdc: fix possible NULL pointer dereference X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Fix two static analysis errors. fs/lustre/mdc/mdc_dev.c: in mdc_enqueue_send(), pointer 'matched' return from call to function 'ldlm_handle2lock' at line 704 may be NULL and will be dereferenced at line 705. If client is evicted between ldlm_lock_match() and ldlm_handle2lock() the lock pointer could be NULL. fs/lustre/lov/lov_dev.c:488 in lov_process_config, sscanf format specification '%d' expects type 'int' for 'd', but parameter 3 has a different type '__u32'. 
Converting to kstrtou32() requires changing the "index" variable type from __u32 to u32, which is fine since it is only used internally, fix up the few functions that are also passing "__u32 index" and the resulting checkpatch.pl warnings. WC-bug-id: https://jira.whamcloud.com/browse/LU-10264 Lustre-commit: b89206476174 ("LU-10264 mdc: fix possible NULL pointer dereference") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31621 Reviewed-by: Dmitry Eremin Reviewed-by: Bob Glossman Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_obd.c | 45 ++++++++++++++++++++++++--------------------- fs/lustre/mdc/mdc_dev.c | 2 +- 2 files changed, 25 insertions(+), 22 deletions(-) diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 1708fa9..26637bc 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -312,7 +312,8 @@ static int lov_disconnect(struct obd_export *exp) { struct obd_device *obd = class_exp2obd(exp); struct lov_obd *lov = &obd->u.lov; - int i, rc; + u32 index; + int rc; if (!lov->lov_tgts) goto out; @@ -321,19 +322,19 @@ static int lov_disconnect(struct obd_export *exp) lov->lov_connects--; if (lov->lov_connects != 0) { /* why should there be more than 1 connect? 
*/ - CERROR("disconnect #%d\n", lov->lov_connects); + CWARN("%s: unexpected disconnect #%d\n", + obd->obd_name, lov->lov_connects); goto out; } - /* Let's hold another reference so lov_del_obd doesn't spin through - * putref every time - */ + /* hold another ref so lov_del_obd() doesn't spin in putref each time */ lov_tgts_getref(obd); - for (i = 0; i < lov->desc.ld_tgt_count; i++) { - if (lov->lov_tgts[i] && lov->lov_tgts[i]->ltd_exp) { - /* Disconnection is the last we know about an obd */ - lov_del_target(obd, i, NULL, lov->lov_tgts[i]->ltd_gen); + for (index = 0; index < lov->desc.ld_tgt_count; index++) { + if (lov->lov_tgts[index] && lov->lov_tgts[index]->ltd_exp) { + /* Disconnection is the last we know about an OBD */ + lov_del_target(obd, index, NULL, + lov->lov_tgts[index]->ltd_gen); } } @@ -490,13 +491,12 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, uuidp->uuid, index, gen, active); if (gen <= 0) { - CERROR("request to add OBD %s with invalid generation: %d\n", - uuidp->uuid, gen); + CERROR("%s: request to add '%s' with invalid generation: %d\n", + obd->obd_name, uuidp->uuid, gen); return -EINVAL; } - tgt_obd = class_find_client_obd(uuidp, LUSTRE_OSC_NAME, - &obd->obd_uuid); + tgt_obd = class_find_client_obd(uuidp, LUSTRE_OSC_NAME, &obd->obd_uuid); if (!tgt_obd) return -EINVAL; @@ -504,10 +504,11 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, if ((index < lov->lov_tgt_size) && lov->lov_tgts[index]) { tgt = lov->lov_tgts[index]; - CERROR("UUID %s already assigned at LOV target index %d\n", - obd_uuid2str(&tgt->ltd_uuid), index); + rc = -EEXIST; + CERROR("%s: UUID %s already assigned at index %d: rc = %d\n", + obd->obd_name, obd_uuid2str(&tgt->ltd_uuid), index, rc); mutex_unlock(&lov->lov_lock); - return -EEXIST; + return rc; } if (index >= lov->lov_tgt_size) { @@ -602,8 +603,8 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, out: if (rc) { - CERROR("add failed (%d), 
deleting %s\n", rc, - obd_uuid2str(&tgt->ltd_uuid)); + CERROR("%s: add failed, deleting %s: rc = %d\n", + obd->obd_name, obd_uuid2str(&tgt->ltd_uuid), rc); lov_del_target(obd, index, NULL, 0); } lov_tgts_putref(obd); @@ -860,6 +861,7 @@ int lov_process_config_base(struct obd_device *obd, struct lustre_cfg *lcfg, case LCFG_LOV_DEL_OBD: { u32 index; int gen; + /* lov_modify_tgts add 0:lov_mdsA 1:ost1_UUID 2:0 3:1 */ if (LUSTRE_CFG_BUFLEN(lcfg, 1) > sizeof(obd_uuid.uuid)) { rc = -EINVAL; @@ -868,11 +870,11 @@ int lov_process_config_base(struct obd_device *obd, struct lustre_cfg *lcfg, obd_str2uuid(&obd_uuid, lustre_cfg_buf(lcfg, 1)); - rc = kstrtoint(lustre_cfg_buf(lcfg, 2), 10, indexp); - if (rc < 0) + rc = kstrtou32(lustre_cfg_buf(lcfg, 2), 10, indexp); + if (rc) goto out; rc = kstrtoint(lustre_cfg_buf(lcfg, 3), 10, genp); - if (rc < 0) + if (rc) goto out; index = *indexp; gen = *genp; @@ -882,6 +884,7 @@ int lov_process_config_base(struct obd_device *obd, struct lustre_cfg *lcfg, rc = lov_add_target(obd, &obd_uuid, index, gen, 0); else rc = lov_del_target(obd, index, &obd_uuid, gen); + goto out; } case LCFG_PARAM: { diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index ca0822d..80e3120 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -684,7 +684,7 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, return ELDLM_OK; matched = ldlm_handle2lock(&lockh); - if (ldlm_is_kms_ignore(matched)) + if (!matched || ldlm_is_kms_ignore(matched)) goto no_match; if (mdc_set_dom_lock_data(env, matched, einfo->ei_cbdata)) { From patchwork Thu Feb 27 21:08:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409675 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 29310138D for ; Thu, 27 Feb 2020 21:19:02 +0000 
(UTC)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:03 -0500
Message-Id: <1582838290-17243-16-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 015/622] lustre: obdclass: allow specifying complex jobids
List-Id: "For discussing Lustre software development."
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Andreas Dilger

Allow specifying a format string for the jobid_name variable to create
a jobid for processes on the client. The jobid_name is used when
jobid_var=nodelocal, if jobid_name contains "%j", or as a fallback if
getting the specified jobid_var from the environment fails.

The jobid_name string allows the following escape sequences:

    %e = executable name
    %g = group ID
    %h = hostname (system utsname)
    %j = jobid from jobid_var environment variable
    %p = process ID
    %u = user ID

Any unknown escape sequences are dropped. Other arbitrary characters
pass through unmodified, up to the maximum jobid string size of 32,
though whitespace within the jobid is not copied.

This allows, for example, specifying an arbitrary prefix, such as the
cluster name, in addition to the traditional "procname.uid" format,
to distinguish between jobs running on clients in different clusters:

    lctl set_param jobid_var=nodelocal jobid_name=cluster2.%e.%u
or
    lctl set_param jobid_var=SLURM_JOB_ID jobid_name=cluster2.%j.%e

To use an environment-specified JobID, if available, but fall back to
a static string for all processes that do not have a valid JobID:

    lctl set_param jobid_var=SLURM_JOB_ID jobid_name=unknown

Implementation notes:

The LUSTRE_JOBID_SIZE includes a trailing NUL, so don't use
"LUSTRE_JOBID_SIZE + 1" anywhere, as that is misleading.

Rename the "obd_jobid_node" variable to "obd_jobid_name" to match the
sysfs "jobid_name" parameter name and avoid confusion.

Rename "struct jobid_to_pid_map" to "jobid_pid_map" since this is not
actually mapping from a jobid *to* a PID, but the reverse. Save jobid
length, and reorder fields to avoid holes in structure.

Consolidate PID->jobid cache handling in jobid_get_from_cache(), which
only does environment lookups and caches the results.
The fallback to using obd_jobid_name is handled by the caller. Rename check_job_name() to jobid_name_is_valid(), since that makes it clear to the reader a "true" return is a valid name. In jobid_cache_init() there is no benefit for locking the jobid_hash creation, since the spinlock is just initialized in this function, so multiple callers of this function would already be broken. Pass the buffer size from the callers (who know the buffer size) to lustre_get_jobid() instead of assuming it is LUSTRE_JOBID_SIZE. WC-bug-id: https://jira.whamcloud.com/browse/LU-10698 Lustre-commit: 6488c0ec57de ("LU-10698 obdclass: allow specifying complex jobids") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31691 Reviewed-by: Jinshan Xiong Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 4 +- fs/lustre/llite/llite_internal.h | 4 +- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/llite/vvp_io.c | 2 +- fs/lustre/llite/vvp_object.c | 3 +- fs/lustre/obdclass/jobid.c | 95 +++++++++++++++++++++++++++++++--- fs/lustre/obdclass/obd_sysfs.c | 10 ++-- fs/lustre/ptlrpc/pack_generic.c | 4 +- include/uapi/linux/lustre/lustre_idl.h | 2 +- 9 files changed, 105 insertions(+), 21 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 9e07853..146c37e 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -54,7 +54,7 @@ /* OBD Operations Declarations */ struct obd_device *class_exp2obd(struct obd_export *exp); int class_handle_ioctl(unsigned int cmd, unsigned long arg); -int lustre_get_jobid(char *jobid); +int lustre_get_jobid(char *jobid, size_t len); struct lu_device_type; @@ -1672,7 +1672,7 @@ static inline void class_uuid_unparse(class_uuid_t uu, struct obd_uuid *out) int class_check_uuid(struct obd_uuid *uuid, u64 nid); /* class_obd.c */ -extern char obd_jobid_node[]; +extern char obd_jobid_name[]; int class_procfs_init(void); int 
class_procfs_clean(void); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index fbe93a4..d0a703d 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -195,11 +195,11 @@ struct ll_inode_info { int lli_async_rc; /* - * whenever a process try to read/write the file, the + * Whenever a process try to read/write the file, the * jobid of the process will be saved here, and it'll * be packed into the write PRC when flush later. * - * so the read/write statistics for jobid will not be + * So the read/write statistics for jobid will not be * accurate if the file is shared by different jobs. */ char lli_jobid[LUSTRE_JOBID_SIZE]; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 12aafe0..7580d57 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -937,7 +937,7 @@ void ll_lli_init(struct ll_inode_info *lli) lli->lli_async_rc = 0; } mutex_init(&lli->lli_layout_mutex); - memset(lli->lli_jobid, 0, LUSTRE_JOBID_SIZE); + memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); } int ll_fill_super(struct super_block *sb) diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 37bf942..85bb3e0 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -1419,7 +1419,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj, * it's not accurate if the file is shared by different * jobs. 
*/ - lustre_get_jobid(lli->lli_jobid); + lustre_get_jobid(lli->lli_jobid, sizeof(lli->lli_jobid)); } else if (io->ci_type == CIT_SETATTR) { if (!cl_io_is_trunc(io)) io->ci_lockreq = CILR_MANDATORY; diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c index c750a80..24cde0d 100644 --- a/fs/lustre/llite/vvp_object.c +++ b/fs/lustre/llite/vvp_object.c @@ -212,7 +212,8 @@ static void vvp_req_attr_set(const struct lu_env *env, struct cl_object *obj, obdo_set_parent_fid(oa, &ll_i2info(inode)->lli_fid); if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_INVALID_PFID)) oa->o_parent_oid++; - memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid, LUSTRE_JOBID_SIZE); + memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid, + sizeof(attr->cra_jobid)); } static const struct cl_object_operations vvp_ops = { diff --git a/fs/lustre/obdclass/jobid.c b/fs/lustre/obdclass/jobid.c index 3655a2e..8bad859 100644 --- a/fs/lustre/obdclass/jobid.c +++ b/fs/lustre/obdclass/jobid.c @@ -32,17 +32,19 @@ */ #define DEBUG_SUBSYSTEM S_RPC +#include #include #ifdef HAVE_UIDGID_HEADER #include #endif +#include #include #include #include char obd_jobid_var[JOBSTATS_JOBID_VAR_MAX_LEN + 1] = JOBSTATS_DISABLE; -char obd_jobid_node[LUSTRE_JOBID_SIZE + 1]; +char obd_jobid_name[LUSTRE_JOBID_SIZE] = "%e.%u"; /* Get jobid of current process from stored variable or calculate * it from pid and user_id. @@ -52,9 +54,89 @@ * This is now deprecated. */ -int lustre_get_jobid(char *jobid) +/* + * jobid_interpret_string() + * + * Interpret the jobfmt string to expand specified fields, like coredumps do: + * %e = executable + * %g = gid + * %h = hostname + * %j = jobid from environment + * %p = pid + * %u = uid + * + * Unknown escape strings are dropped. Other characters are copied through, + * excluding whitespace (to avoid making jobid parsing difficult). 
+ * + * Return: -EOVERFLOW if the expanded string does not fit within @joblen + * 0 for success + */ +static int jobid_interpret_string(const char *jobfmt, char *jobid, + ssize_t joblen) +{ + char c; + + while ((c = *jobfmt++) && joblen > 1) { + char f; + int l; + + if (isspace(c)) /* Don't allow embedded spaces */ + continue; + + if (c != '%') { + *jobid = c; + joblen--; + jobid++; + continue; + } + + switch ((f = *jobfmt++)) { + case 'e': /* executable name */ + l = snprintf(jobid, joblen, "%s", current->comm); + break; + case 'g': /* group ID */ + l = snprintf(jobid, joblen, "%u", + from_kgid(&init_user_ns, current_fsgid())); + break; + case 'h': /* hostname */ + l = snprintf(jobid, joblen, "%s", + init_utsname()->nodename); + break; + case 'j': /* jobid requested by process + * - currently not supported + */ + l = snprintf(jobid, joblen, "%s", "jobid"); + break; + case 'p': /* process ID */ + l = snprintf(jobid, joblen, "%u", current->pid); + break; + case 'u': /* user ID */ + l = snprintf(jobid, joblen, "%u", + from_kuid(&init_user_ns, current_fsuid())); + break; + case '\0': /* '%' at end of format string */ + l = 0; + goto out; + default: /* drop unknown %x format strings */ + l = 0; + break; + } + jobid += l; + joblen -= l; + } + /* + * This points at the end of the buffer, so long as jobid is always + * incremented the same amount as joblen is decremented. + */ +out: + jobid[joblen - 1] = '\0'; + + return joblen < 0 ? 
-EOVERFLOW : 0; +} + +int lustre_get_jobid(char *jobid, size_t joblen) { - char tmp_jobid[LUSTRE_JOBID_SIZE] = { 0 }; + char tmp_jobid[LUSTRE_JOBID_SIZE] = ""; /* Jobstats isn't enabled */ if (strcmp(obd_jobid_var, JOBSTATS_DISABLE) == 0) @@ -70,10 +152,11 @@ int lustre_get_jobid(char *jobid) /* Whole node dedicated to single job */ if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0) { - strcpy(tmp_jobid, obd_jobid_node); - goto out_cache_jobid; + int rc2 = jobid_interpret_string(obd_jobid_name, + tmp_jobid, joblen); + if (!rc2) + goto out_cache_jobid; } - return -ENOENT; out_cache_jobid: diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c index bac8e7c5..cd2917e 100644 --- a/fs/lustre/obdclass/obd_sysfs.c +++ b/fs/lustre/obdclass/obd_sysfs.c @@ -233,7 +233,7 @@ static ssize_t jobid_var_store(struct kobject *kobj, struct attribute *attr, static ssize_t jobid_name_show(struct kobject *kobj, struct attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_node); + return snprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_name); } static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, @@ -243,13 +243,13 @@ static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, if (!count || count > LUSTRE_JOBID_SIZE) return -EINVAL; - memcpy(obd_jobid_node, buffer, count); + memcpy(obd_jobid_name, buffer, count); - obd_jobid_node[count] = 0; + obd_jobid_name[count] = 0; /* Trim the trailing '\n' if any */ - if (obd_jobid_node[count - 1] == '\n') - obd_jobid_node[count - 1] = 0; + if (obd_jobid_name[count - 1] == '\n') + obd_jobid_name[count - 1] = 0; return count; } diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index b6a4fd8..bc5e513 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1406,9 +1406,9 @@ void lustre_msg_set_jobid(struct lustre_msg *msg, char *jobid) LASSERTF(pb, "invalid msg %p: no ptlrpc body!\n", msg); if (jobid) - 
memcpy(pb->pb_jobid, jobid, LUSTRE_JOBID_SIZE); + memcpy(pb->pb_jobid, jobid, sizeof(pb->pb_jobid)); else if (pb->pb_jobid[0] == '\0') - lustre_get_jobid(pb->pb_jobid); + lustre_get_jobid(pb->pb_jobid, sizeof(pb->pb_jobid)); return; } default: diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 401f7ef..4e1605a2 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -635,7 +635,7 @@ struct ptlrpc_body_v3 { __u64 pb_padding64_0; __u64 pb_padding64_1; __u64 pb_padding64_2; - char pb_jobid[LUSTRE_JOBID_SIZE]; /* req: ASCII MPI jobid from env */ + char pb_jobid[LUSTRE_JOBID_SIZE]; /* req: ASCII jobid from env + NUL */ }; #define ptlrpc_body ptlrpc_body_v3 From patchwork Thu Feb 27 21:08:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409679 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E8A114BC for ; Thu, 27 Feb 2020 21:19:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1727D246A1 for ; Thu, 27 Feb 2020 21:19:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1727D246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AE39521FB82; Thu, 27 Feb 2020 13:18:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:04 -0500
Message-Id: <1582838290-17243-17-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 016/622] lustre: ldlm: don't disable softirq for exp_rpc_lock
List-Id: "For discussing Lustre software development."
Cc: Lustre Development List

From: Liang Zhen

It is not necessary to call ldlm_lock_busy() in the context of the
timer callback; it can be called in the thread context of
expired_lock_main instead. With this change, we don't need to disable
softirq for exp_rpc_lock.

Instead of moving busy locks to the end of the waiting list one at a
time in the context of the timer callback, move any locks that may be
expired onto the expired list. If these locks are still being used by
RPCs being processed, then put them back onto the end of the waiting
list instead of evicting the client.

For the Linux client, the impact of this change is that spin_lock_bh()
becomes spin_lock() for the exp_rpc_lock.
WC-bug-id: https://jira.whamcloud.com/browse/LU-6032 Lustre-commit: 292aa42e0897 ("LU-6032 ldlm: don't disable softirq for exp_rpc_lock") Signed-off-by: Liang Zhen Reviewed-on: https://review.whamcloud.com/12957 Reviewed-by: Dmitry Eremin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index d57df36..3c61e83 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1307,9 +1307,9 @@ static int ptlrpc_server_hpreq_init(struct ptlrpc_service_part *svcpt, LASSERT(rc <= 1); } - spin_lock_bh(&req->rq_export->exp_rpc_lock); + spin_lock(&req->rq_export->exp_rpc_lock); list_add(&req->rq_exp_list, &req->rq_export->exp_hp_rpcs); - spin_unlock_bh(&req->rq_export->exp_rpc_lock); + spin_unlock(&req->rq_export->exp_rpc_lock); } ptlrpc_nrs_req_initialize(svcpt, req, rc); @@ -1327,9 +1327,9 @@ static void ptlrpc_server_hpreq_fini(struct ptlrpc_request *req) if (req->rq_ops->hpreq_fini) req->rq_ops->hpreq_fini(req); - spin_lock_bh(&req->rq_export->exp_rpc_lock); + spin_lock(&req->rq_export->exp_rpc_lock); list_del_init(&req->rq_exp_list); - spin_unlock_bh(&req->rq_export->exp_rpc_lock); + spin_unlock(&req->rq_export->exp_rpc_lock); } } From patchwork Thu Feb 27 21:08:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409683 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 488D314BC for ; Thu, 27 Feb 2020 21:19:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS 
id 3155D246A1 for ; Thu, 27 Feb 2020 21:19:14 +0000 (UTC)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:05 -0500
Message-Id: <1582838290-17243-18-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 017/622] lustre: obdclass: new wrapper to convert NID to string
List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Liang Zhen This patch includes a couple of changes: - add new wrapper function obd_import_nid2str - use obd_import_nid2str and obd_export_nid2str to replace all libcfs_nid2str conversions for NID of export/import connection WC-bug-id: https://jira.whamcloud.com/browse/LU-6032 Lustre-commit: 61f9847a812f ("LU-6032 obdclass: new wrapper to convert NID to string") Signed-off-by: Liang Zhen Reviewed-on: https://review.whamcloud.com/12956 Reviewed-by: Dmitry Eremin Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 12 ++++++++++++ fs/lustre/ldlm/ldlm_lock.c | 4 ++-- fs/lustre/ptlrpc/client.c | 5 ++--- fs/lustre/ptlrpc/import.c | 6 +++--- 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 146c37e..d896049 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -86,6 +86,18 @@ struct obd_device *class_devices_in_group(struct obd_uuid *grp_uuid, int obd_connect_flags2str(char *page, int count, u64 flags, u64 flags2, const char *sep); +static inline char *obd_export_nid2str(struct obd_export *exp) +{ + return exp->exp_connection ? + libcfs_nid2str(exp->exp_connection->c_peer.nid) : ""; +} + +static inline char *obd_import_nid2str(struct obd_import *imp) +{ + return imp->imp_connection ? 
+ libcfs_nid2str(imp->imp_connection->c_peer.nid) : ""; +} + int obd_zombie_impexp_init(void); void obd_zombie_impexp_stop(void); void obd_zombie_barrier(void); diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 7242cd1..aa19b89 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -1987,11 +1987,11 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, vaf.va = &args; if (exp && exp->exp_connection) { - nid = libcfs_nid2str(exp->exp_connection->c_peer.nid); + nid = obd_export_nid2str(exp); } else if (exp && exp->exp_obd) { struct obd_import *imp = exp->exp_obd->u.cli.cl_import; - nid = libcfs_nid2str(imp->imp_connection->c_peer.nid); + nid = obd_import_nid2str(imp); } if (!resource) { diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index a533cbb..424db55 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1605,8 +1605,7 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req) current->comm, imp->imp_obd->obd_uuid.uuid, lustre_msg_get_status(req->rq_reqmsg), req->rq_xid, - libcfs_nid2str(imp->imp_connection->c_peer.nid), - lustre_msg_get_opc(req->rq_reqmsg)); + obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg)); rc = ptl_send_rpc(req, 0); if (rc == -ENOMEM) { @@ -2017,7 +2016,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) current->comm, imp->imp_obd->obd_uuid.uuid, lustre_msg_get_status(req->rq_reqmsg), req->rq_xid, - libcfs_nid2str(imp->imp_connection->c_peer.nid), + obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg)); spin_lock(&imp->imp_lock); diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index d032962..dca4aa0 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -171,13 +171,13 @@ int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt) LCONSOLE_WARN("%s: Connection to %.*s (at %s) was lost; in progress operations using this service will wait for recovery to 
complete\n", imp->imp_obd->obd_name, target_len, target_start, - libcfs_nid2str(imp->imp_connection->c_peer.nid)); + obd_import_nid2str(imp)); } else { LCONSOLE_ERROR_MSG(0x166, "%s: Connection to %.*s (at %s) was lost; in progress operations using this service will fail\n", imp->imp_obd->obd_name, target_len, target_start, - libcfs_nid2str(imp->imp_connection->c_peer.nid)); + obd_import_nid2str(imp)); } IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_DISCON); spin_unlock(&imp->imp_lock); @@ -1461,7 +1461,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) LCONSOLE_INFO("%s: Connection restored to %.*s (at %s)\n", imp->imp_obd->obd_name, target_len, target_start, - libcfs_nid2str(imp->imp_connection->c_peer.nid)); + obd_import_nid2str(imp)); } if (imp->imp_state == LUSTRE_IMP_FULL) { From patchwork Thu Feb 27 21:08:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409673 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5084514BC for ; Thu, 27 Feb 2020 21:18:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 361E9246A1 for ; Thu, 27 Feb 2020 21:18:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 361E9246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 12EC221FB5A; Thu, 27 Feb 2020 13:18:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org 
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:06 -0500
Message-Id: <1582838290-17243-19-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 018/622] lustre: ptlrpc: Add QoS for uid and gid in NRS-TBF
List-Id: "For discussing Lustre software development."
Cc: Wang Shilong, Li Xi, Qian Yingjin, Teddy Chan, Lustre Development List

From: Teddy Chan

This patch adds a new QoS feature to the TBF policy that can limit the
request rate based on uid or gid. The policy can limit the rate on
both the MDT and the OSS.
The commands for this feature look like:

Start the tbf uid QoS on OST:
    lctl set_param ost.OSS.*.nrs_policies="tbf uid"
Limit the rate of ptlrpc requests of the uid 500:
    lctl set_param ost.OSS.*.nrs_tbf_rule="start tbf_name uid={500} rate=100"

Start the tbf gid QoS on OST:
    lctl set_param ost.OSS.*.nrs_policies="tbf gid"
Limit the rate of ptlrpc requests of the gid 500:
    lctl set_param ost.OSS.*.nrs_tbf_rule="start tbf_name gid={500} rate=100"

or use the generic tbf rule to mix them on OST:
    lctl set_param ost.OSS.*.nrs_policies="tbf"
Limit the rate of ptlrpc requests of the uid 500 gid 500:
    lctl set_param ost.OSS.*.nrs_tbf_rule="start tbf_name uid={500}&gid={500} rate=100"

Also, you can use the following rule to control all reqs to mds:

Start the tbf uid QoS on MDS:
    lctl set_param mds.MDS.*.nrs_policies="tbf uid"
Limit the rate of ptlrpc requests of the uid 500:
    lctl set_param mds.MDS.*.nrs_tbf_rule="start tbf_name uid={500} rate=100"

For the Linux client we need to send the uid and gid information to
the NRS-TBF handling on the servers.
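
As a rough illustration of what a rule such as "uid={500} rate=100"
asks the server to enforce, the following is a minimal userspace
sketch of a token bucket keyed by one ID. It is not the Lustre NRS-TBF
code: the struct, the function names, the "depth" parameter and the
assumption that "rate" means requests per second are all hypothetical
and chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-ID bucket; NOT the Lustre NRS-TBF data structure. */
struct tbf_bucket {
	uint32_t id;      /* uid or gid that the rule matches */
	uint64_t rate;    /* tokens (requests) added per second, assumed */
	uint64_t depth;   /* maximum number of tokens the bucket holds */
	uint64_t tokens;  /* tokens currently available */
	uint64_t last_ns; /* time accounted for by the last refill */
};

/* Start with a full bucket so an initial burst of "depth" is allowed. */
static inline void tbf_bucket_init(struct tbf_bucket *b, uint32_t id,
				   uint64_t rate, uint64_t depth,
				   uint64_t now_ns)
{
	b->id = id;
	b->rate = rate;
	b->depth = depth;
	b->tokens = depth;
	b->last_ns = now_ns;
}

/*
 * Refill according to the elapsed time, then try to take one token.
 * Returns true if the request may proceed now, false if it must wait.
 */
static inline bool tbf_bucket_admit(struct tbf_bucket *b, uint64_t now_ns)
{
	uint64_t elapsed = now_ns - b->last_ns;
	uint64_t fresh = elapsed * b->rate / NSEC_PER_SEC;

	if (fresh > 0) {
		b->tokens += fresh;
		if (b->tokens > b->depth)
			b->tokens = b->depth;
		/* advance only by the time that produced whole tokens */
		b->last_ns += fresh * NSEC_PER_SEC / b->rate;
	}

	if (b->tokens == 0)
		return false;

	b->tokens--;
	return true;
}
```

With rate=100 and depth=3 in this sketch, three back-to-back requests
are admitted, a fourth at the same instant is held back, and one more
token becomes available every 10 ms.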
WC-bug-id: https://jira.whamcloud.com/browse/LU-9658 Lustre-commit: e0cdde123c14 ("LU-9658 ptlrpc: Add QoS for uid and gid in NRS-TBF") Signed-off-by: Teddy Chan Signed-off-by: Li Xi Signed-off-by: Wang Shilong Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/27608 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/vvp_object.c | 5 ++--- fs/lustre/obdclass/obdo.c | 5 +++++ fs/lustre/osc/osc_request.c | 10 ++++++++++ 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c index 24cde0d..eeb8823 100644 --- a/fs/lustre/llite/vvp_object.c +++ b/fs/lustre/llite/vvp_object.c @@ -196,7 +196,7 @@ static int vvp_object_glimpse(const struct lu_env *env, static void vvp_req_attr_set(const struct lu_env *env, struct cl_object *obj, struct cl_req_attr *attr) { - u64 valid_flags = OBD_MD_FLTYPE; + u64 valid_flags = OBD_MD_FLTYPE | OBD_MD_FLUID | OBD_MD_FLGID; struct inode *inode; struct obdo *oa; @@ -204,8 +204,7 @@ static void vvp_req_attr_set(const struct lu_env *env, struct cl_object *obj, inode = vvp_object_inode(obj); if (attr->cra_type == CRT_WRITE) { - valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME | - OBD_MD_FLUID | OBD_MD_FLGID; + valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME; obdo_set_o_projid(oa, ll_i2info(inode)->lli_projid); } obdo_from_inode(oa, inode, valid_flags & attr->cra_flags); diff --git a/fs/lustre/obdclass/obdo.c b/fs/lustre/obdclass/obdo.c index 1926896..e5475f1 100644 --- a/fs/lustre/obdclass/obdo.c +++ b/fs/lustre/obdclass/obdo.c @@ -144,6 +144,11 @@ void lustre_set_wire_obdo(const struct obd_connect_data *ocd, if (!ocd) return; + if (!(wobdo->o_valid & OBD_MD_FLUID)) + wobdo->o_uid = from_kuid(&init_user_ns, current_uid()); + if (!(wobdo->o_valid & OBD_MD_FLGID)) + wobdo->o_gid = from_kgid(&init_user_ns, current_gid()); + if (unlikely(!(ocd->ocd_connect_flags & OBD_CONNECT_FID)) && 
fid_seq_is_echo(ostid_seq(&lobdo->o_oi))) { /* diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 300dee5..99c9620 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1184,6 +1184,16 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, lustre_set_wire_obdo(&req->rq_import->imp_connect_data, &body->oa, oa); + /* For READ and WRITE, we can't fill o_uid and o_gid using from_kuid() + * and from_kgid(), because they are asynchronous. Fortunately, variable + * oa contains valid o_uid and o_gid in these two operations. + * Besides, filling o_uid and o_gid is enough for nrs-tbf, see LU-9658. + * OBD_MD_FLUID and OBD_MD_FLUID is not set in order to avoid breaking + * other process logic + */ + body->oa.o_uid = oa->o_uid; + body->oa.o_gid = oa->o_gid; + obdo_to_ioobj(oa, ioobj); ioobj->ioo_bufcnt = niocount; /* The high bits of ioo_max_brw tells server _maximum_ number of bulks From patchwork Thu Feb 27 21:08:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409677 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E0C314BC for ; Thu, 27 Feb 2020 21:19:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 46BE4246A1 for ; Thu, 27 Feb 2020 21:19:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 46BE4246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D083821FB70; Thu, 27 Feb 2020 13:18:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C484921FA44 for ; Thu, 27 Feb 2020 13:18:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6438091F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 632A446A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:07 -0500 Message-Id: <1582838290-17243-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 019/622] lustre: hsm: ignore compound_id X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" Ignore request compound ids in the HSM coordinator. Compound ids prevent batching of CDT to CT requests and degrade HSM performance. Use CT/archive id compatibility when deciding which HSM actions to put in a request. WC-bug-id: https://jira.whamcloud.com/browse/LU-10383 Lustre-commit: 9ee81f920bb3 ("LU-10383 hsm: ignore compound_id") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/30949 Reviewed-by: Quentin Bouget Reviewed-by: Faccini Bruno Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_idl.h | 2 +- include/uapi/linux/lustre/lustre_user.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 4e1605a2..307feb3 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2508,7 +2508,7 @@ struct llog_agent_req_rec { */ __u32 arr_archive_id; /**< backend archive number */ __u64 arr_flags; /**< req flags */ - __u64 arr_compound_id;/**< compound cookie */ + __u64 arr_compound_id;/**< compound cookie, ignored */ __u64 arr_req_create; /**< req. creation time */ __u64 arr_req_change; /**< req. status change time */ struct hsm_action_item arr_hai; /**< req. to the agent */ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 27501a2..5405e1b 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -1729,7 +1729,7 @@ static inline char *hai_dump_data_field(struct hsm_action_item *hai, struct hsm_action_list { __u32 hal_version; __u32 hal_count; /* number of hai's to follow */ - __u64 hal_compound_id; /* returned by coordinator */ + __u64 hal_compound_id; /* returned by coordinator, ignored */ __u64 hal_flags; __u32 hal_archive_id; /* which archive backend */ __u32 padding1; From patchwork Thu Feb 27 21:08:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409689 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B5A714BC for ; Thu, 27 Feb 2020 21:19:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2424A246A1 for ; Thu, 27 Feb 2020 21:19:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2424A246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9CD8521FBEC; Thu, 27 Feb 2020 13:19:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 120A421FA44 for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 672C79E0; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 65F8C46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:08 -0500 Message-Id: <1582838290-17243-21-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 020/622] lnet: libcfs: remove unnecessary set_fs(KERNEL_DS) X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mike Marciniszyn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mike Marciniszyn When we converted to using kernel_write(), we left some set_fs() calls that are unnecessary. Remove them. The original OpenSFS version of this patch, as mentioned below, did the full conversion to kernel_write. WC-bug-id: https://jira.whamcloud.com/browse/LU-10560 Lustre-commit: b9a32054600a ("LU-10560 libcfs: Use kernel_write when appropriate") Signed-off-by: Mike Marciniszyn Reviewed-on: https://review.whamcloud.com/31154 Reviewed-by: James Simmons Reviewed-by: Dmitry Eremin Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/tracefile.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c index 3b29116..6e4cc31 100644 --- a/net/lnet/libcfs/tracefile.c +++ b/net/lnet/libcfs/tracefile.c @@ -807,7 +807,6 @@ int cfs_tracefile_dump_all_pages(char *filename) struct cfs_trace_page *tage; struct cfs_trace_page *tmp; char *buf; - mm_segment_t __oldfs; int rc; down_write(&cfs_tracefile_sem); @@ -828,8 +827,6 @@ int cfs_tracefile_dump_all_pages(char *filename) rc = 0; goto close; } - __oldfs = get_fs(); - set_fs(KERNEL_DS); /* ok, for now, just write the pages. 
in the future we'll be building * iobufs with the pages and calling generic_direct_IO @@ -851,7 +848,7 @@ int cfs_tracefile_dump_all_pages(char *filename) list_del(&tage->linkage); cfs_tage_free(tage); } - set_fs(__oldfs); + rc = vfs_fsync(filp, 1); if (rc) pr_err("sync returns %d\n", rc); From patchwork Thu Feb 27 21:08:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409681 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D1B9138D for ; Thu, 27 Feb 2020 21:19:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 25DAB246A1 for ; Thu, 27 Feb 2020 21:19:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 25DAB246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36CBA21FB3B; Thu, 27 Feb 2020 13:18:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 52B5F21FA7D for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6A0389E1; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 68DFD46C; Thu, 27 Feb 2020 16:18:13 -0500 
(EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:09 -0500 Message-Id: <1582838290-17243-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 021/622] lustre: ptlrpc: ptlrpc_register_bulk() LBUG on ENOMEM X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh Assertion fails on !desc->bd_registered during retry after ENOMEM. Drop bd_registered flag and exit via cleanup_bulk to ensure that bulk is fully unregistered. Cray-bug-id: MRP-4733 WC-bug-id: https://jira.whamcloud.com/browse/LU-10643 Lustre-commit: 4a81be263079 ("LU-10643 ptlrpc: ptlrpc_register_bulk() LBUG on ENOMEM") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/31228 Reviewed-by: Alexandr Boyko Reviewed-by: Andrew Perepechko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/ptlrpc/niobuf.c | 12 +++++++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 653a456..67500b5 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -349,6 +349,7 @@ #define OBD_FAIL_PTLRPC_DROP_BULK 0x51a #define OBD_FAIL_PTLRPC_LONG_REQ_UNLINK 0x51b #define OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK 0x51c +#define OBD_FAIL_PTLRPC_BULK_ATTACH 0x521 #define OBD_FAIL_OBD_PING_NET 0x600 #define OBD_FAIL_OBD_LOG_CANCEL_NET 0x601 diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c 
index 02ed373..2e866fe 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -179,8 +179,13 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) LNET_MD_OP_GET : LNET_MD_OP_PUT); ptlrpc_fill_bulk_md(&md, desc, posted_md); - rc = LNetMEAttach(desc->bd_portal, peer, mbits, 0, - LNET_UNLINK, LNET_INS_AFTER, &me_h); + if (posted_md > 0 && posted_md + 1 == total_md && + OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_ATTACH)) { + rc = -ENOMEM; + } else { + rc = LNetMEAttach(desc->bd_portal, peer, mbits, 0, + LNET_UNLINK, LNET_INS_AFTER, &me_h); + } if (rc != 0) { CERROR("%s: LNetMEAttach failed x%llu/%d: rc = %d\n", desc->bd_import->imp_obd->obd_name, mbits, @@ -209,6 +214,7 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) LASSERT(desc->bd_md_count >= 0); mdunlink_iterate_helper(desc->bd_mds, desc->bd_md_max_brw); req->rq_status = -ENOMEM; + desc->bd_registered = 0; return -ENOMEM; } @@ -585,7 +591,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) if (request->rq_bulk) { rc = ptlrpc_register_bulk(request); if (rc != 0) - goto out; + goto cleanup_bulk; /* * All the mds in the request will have the same cpt * encoded in the cookie. 
So we can just get the first From patchwork Thu Feb 27 21:08:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409693 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 62A54138D for ; Thu, 27 Feb 2020 21:19:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4A962246A1 for ; Thu, 27 Feb 2020 21:19:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A962246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B798321FC22; Thu, 27 Feb 2020 13:19:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9482321FA3C for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6EACB9E3; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6BC7246D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:10 -0500 Message-Id: <1582838290-17243-23-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 022/622] lustre: llite: yield cpu after call to ll_agl_trigger X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler The statahead and agl threads loop over all entries in the directory without yielding the CPU. If the number of entries in the directory is large enough then these threads may trigger soft lockups. The fix is to add calls to cond_resched() after calling ll_agl_trigger(), which gets the glimpse lock for a file. Cray-bug-id: LUS-2584 WC-bug-id: https://jira.whamcloud.com/browse/LU-10649 Lustre-commit: 031001f0d438 ("LU-10649 llite: yield cpu after call to ll_agl_trigger") Signed-off-by: Ann Koehler Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/31240 Reviewed-by: Patrick Farrell Reviewed-by: Sergey Cheremencev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/statahead.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 99b3fee..4a61dac 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -907,6 +907,7 @@ static int ll_agl_thread(void *arg) list_del_init(&clli->lli_agl_list); spin_unlock(&plli->lli_agl_lock); ll_agl_trigger(&clli->lli_vfs_inode, sai); + cond_resched(); } else { spin_unlock(&plli->lli_agl_lock); } @@ -1071,7 +1072,7 @@ static int ll_statahead_thread(void *arg) ll_agl_trigger(&clli->lli_vfs_inode, sai); - + cond_resched(); spin_lock(&lli->lli_agl_lock); } spin_unlock(&lli->lli_agl_lock); From patchwork Thu Feb 27 21:08:11 
2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409697 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9ADE414BC for ; Thu, 27 Feb 2020 21:19:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 83E4C246A1 for ; Thu, 27 Feb 2020 21:19:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 83E4C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D33D621FB98; Thu, 27 Feb 2020 13:19:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D8D3621FA84 for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 70B869E8; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6F16E468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:11 -0500 Message-Id: <1582838290-17243-24-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 023/622] lustre: osc: Do not request more than 2GiB grant X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The server enforces a grant limit of 2 GiB, which the client must honor. The existing client code combined with 16 MiB RPCs make it possible for the client to ask for more than this limit. Make this limit explicit, and also fix an overflow bug in o_undirty calculation in osc_announce_cached. (o_undirty is a 32 bit value and 16 MiB*256 rpcs_in_flight = 4 GiB. 4 GiB + extra grant components overflows o_undirty.) Cray-bug-id: LUS-5750 WC-bug-id: https://jira.whamcloud.com/browse/LU-10776 Lustre-commit: c0246d887809 ("LU-10776 osc: Do not request more than 2GiB grant") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/31533 Reviewed-by: Nathaniel Clark Reviewed-by: Bobi Jam Reviewed-by: Andrew Perepechko Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 10 ++++++++-- include/uapi/linux/lustre/lustre_idl.h | 2 ++ 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 99c9620..c430239 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -664,11 +664,12 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa, oa->o_undirty = 0; } else { unsigned long nrpages; + unsigned long undirty; nrpages = cli->cl_max_pages_per_rpc; nrpages *= cli->cl_max_rpcs_in_flight + 1; nrpages = max(nrpages, cli->cl_dirty_max_pages); - oa->o_undirty = nrpages << PAGE_SHIFT; + undirty = nrpages << PAGE_SHIFT; if 
(OCD_HAS_FLAG(&cli->cl_import->imp_connect_data, GRANT_PARAM)) { int nrextents; @@ -679,8 +680,13 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa, */ nrextents = DIV_ROUND_UP(nrpages, cli->cl_max_extent_pages); - oa->o_undirty += nrextents * cli->cl_grant_extent_tax; + undirty += nrextents * cli->cl_grant_extent_tax; } + /* Do not ask for more than OBD_MAX_GRANT - a margin for server + * to add extent tax, etc. + */ + oa->o_undirty = min(undirty, OBD_MAX_GRANT - + (PTLRPC_MAX_BRW_PAGES << PAGE_SHIFT)*4UL); } oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant; oa->o_dropped = cli->cl_lost_grant; diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 307feb3..0bce63d 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1213,6 +1213,8 @@ struct hsm_state_set { * it to sync quickly */ +#define OBD_MAX_GRANT 0x7fffffffUL /* Max grant allowed to one client: 2 GiB */ + #define OBD_OBJECT_EOF LUSTRE_EOF #define OST_MIN_PRECREATE 32 From patchwork Thu Feb 27 21:08:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409701 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BB444138D for ; Thu, 27 Feb 2020 21:19:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A3D1C246A3 for ; Thu, 27 Feb 2020 21:19:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A3D1C246A3 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 134F921FAC8; Thu, 27 Feb 2020 13:19:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 27EF221FA84 for ; Thu, 27 Feb 2020 13:18:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 735FF9EA; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7204E46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:12 -0500 Message-Id: <1582838290-17243-25-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 024/622] lustre: llite: rename FSFILT_IOC_* to system flags X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jinshan Xiong Those definitions were probably created for compatibility. Now that the FS_IOC_* definitions have existed in the kernel for a long time, we should use them to avoid confusion. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10779 Lustre-commit: 7e3fc106d6e7 ("LU-10779 llite: rename FSFILT_IOC_* to system flags") Signed-off-by: Jinshan Xiong Reviewed-on: https://review.whamcloud.com/31546 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 13 +++++++------ fs/lustre/llite/file.c | 19 ++++++++++--------- fs/lustre/llite/llite_lib.c | 4 ++-- 3 files changed, 19 insertions(+), 17 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index f21727b..b006e32 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1108,18 +1108,19 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_IOCTL, 1); switch (cmd) { - case FSFILT_IOC_GETFLAGS: - case FSFILT_IOC_SETFLAGS: + case FS_IOC_GETFLAGS: + case FS_IOC_SETFLAGS: return ll_iocontrol(inode, file, cmd, arg); - case FSFILT_IOC_GETVERSION_OLD: case FSFILT_IOC_GETVERSION: + case FS_IOC_GETVERSION: return put_user(inode->i_generation, (int __user *)arg); /* We need to special case any other ioctls we want to handle, * to send them to the MDS/OST as appropriate and to properly * network encode the arg field. 
- case FSFILT_IOC_SETVERSION_OLD: - case FSFILT_IOC_SETVERSION: - */ + */ + case FS_IOC_SETVERSION: + return -ENOTSUPP; + case LL_IOC_GET_MDTIDX: { int mdtidx; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index fe965b1..c3fb104b 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3055,12 +3055,19 @@ static long ll_file_set_lease(struct file *file, struct ll_ioc_lease *ioc, case LL_IOC_LOV_GETSTRIPE: case LL_IOC_LOV_GETSTRIPE_NEW: return ll_file_getstripe(inode, (void __user *)arg, 0); - case FSFILT_IOC_GETFLAGS: - case FSFILT_IOC_SETFLAGS: + case FS_IOC_GETFLAGS: + case FS_IOC_SETFLAGS: return ll_iocontrol(inode, file, cmd, arg); - case FSFILT_IOC_GETVERSION_OLD: case FSFILT_IOC_GETVERSION: + case FS_IOC_GETVERSION: return put_user(inode->i_generation, (int __user *)arg); + /* We need to special case any other ioctls we want to handle, + * to send them to the MDS/OST as appropriate and to properly + * network encode the arg field. + */ + case FS_IOC_SETVERSION: + return -ENOTSUPP; + case LL_IOC_GROUP_LOCK: return ll_get_grouplock(inode, file, arg); case LL_IOC_GROUP_UNLOCK: @@ -3068,12 +3075,6 @@ static long ll_file_set_lease(struct file *file, struct ll_ioc_lease *ioc, case IOC_OBD_STATFS: return ll_obd_statfs(inode, (void __user *)arg); - /* We need to special case any other ioctls we want to handle, - * to send them to the MDS/OST as appropriate and to properly - * network encode the arg field. 
- case FSFILT_IOC_SETVERSION_OLD: - case FSFILT_IOC_SETVERSION: - */ case LL_IOC_FLUSHCTX: return ll_flush_ctx(inode); case LL_IOC_PATH2FID: { diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 7580d57..e2c7a4d 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2037,7 +2037,7 @@ int ll_iocontrol(struct inode *inode, struct file *file, int rc, flags = 0; switch (cmd) { - case FSFILT_IOC_GETFLAGS: { + case FS_IOC_GETFLAGS: { struct mdt_body *body; struct md_op_data *op_data; @@ -2065,7 +2065,7 @@ int ll_iocontrol(struct inode *inode, struct file *file, return put_user(flags, (int __user *)arg); } - case FSFILT_IOC_SETFLAGS: { + case FS_IOC_SETFLAGS: { struct md_op_data *op_data; struct cl_object *obj; struct iattr *attr; From patchwork Thu Feb 27 21:08:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409705 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 28D7014BC for ; Thu, 27 Feb 2020 21:19:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 118AF246A1 for ; Thu, 27 Feb 2020 21:19:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 118AF246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BFC1F21FBD5; Thu, 27 Feb 2020 13:19:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8386421FA84 for ; Thu, 27 Feb 2020 13:18:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 764989ED; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 74F3946C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:13 -0500 Message-Id: <1582838290-17243-26-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 025/622] lnet: fix nid range format '*@' support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Emoly Liu In cfs_ip_min_max(), (nidrange->nr_all == 1) means this nid range is a full IP address range(*.*.*.*). In this case, we don't need to compare it to any other nid range, but set min_nid to 0.0.0.0 and max_nid to 255.255.255.255 directly. 
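The special case described above can be sketched in plain C as follows. This is a simplified illustration, not the code in nidstrings.c: the struct ip_range type and the ip_min_max() helper are hypothetical stand-ins, and only the "match everything" flag is meant to mirror nr_all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a NID range. */
struct ip_range {
	int      all;   /* nonzero for a full range (*.*.*.*) */
	uint32_t lo;    /* lowest address, ignored when all is set */
	uint32_t hi;    /* highest address, ignored when all is set */
};

/* Computes the smallest and largest address covered by n ranges. */
static void ip_min_max(const struct ip_range *r, size_t n,
		       uint32_t *min_ip, uint32_t *max_ip)
{
	size_t i;

	assert(r != NULL && n > 0);
	*min_ip = UINT32_MAX;
	*max_ip = 0;
	for (i = 0; i < n; i++) {
		if (r[i].all) {
			/* full range: 0.0.0.0 through 255.255.255.255,
			 * so no comparison with other ranges is needed
			 */
			*min_ip = 0;
			*max_ip = UINT32_MAX;
			return;
		}
		if (r[i].lo < *min_ip)
			*min_ip = r[i].lo;
		if (r[i].hi > *max_ip)
			*max_ip = r[i].hi;
	}
}
```

Returning early on the wildcard keeps the result independent of whatever bounds the other ranges carry, which is the behavior the commit message asks for.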
WC-bug-id: https://jira.whamcloud.com/browse/LU-8913 Lustre-commit: 230266326f49 ("LU-8913 nodemap: fix nodemap range format '*@' support") Signed-off-by: Emoly Liu Reviewed-on: https://review.whamcloud.com/31684 Reviewed-by: Sebastien Buisson Reviewed-by: Fan Yong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/nidstrings.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c index b4e38e5..13338d0 100644 --- a/net/lnet/lnet/nidstrings.c +++ b/net/lnet/lnet/nidstrings.c @@ -680,6 +680,12 @@ static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, if (nidlist_count > 0) return -EINVAL; + if (nr->nr_all) { + min_ip_addr = 0; + max_ip_addr = 0xffffffff; + break; + } + list_for_each_entry(ar, &nr->nr_addrranges, ar_link) { rc = cfs_ip_ar_min_max(ar, &tmp_min_ip_addr, &tmp_max_ip_addr); From patchwork Thu Feb 27 21:08:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409709 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24382138D for ; Thu, 27 Feb 2020 21:19:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D6F6246A1 for ; Thu, 27 Feb 2020 21:19:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D6F6246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 89BD421FBB0; Thu, 
27 Feb 2020 13:19:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C9E8221FA93 for ; Thu, 27 Feb 2020 13:18:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 791F49EF; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 77B6D46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:14 -0500 Message-Id: <1582838290-17243-27-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 026/622] lustre: ptlrpc: fix test_req_buffer_pressure behavior X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini The second patch for LU-9372, which allows limiting the number of rqbd buffers, added a wrong and unnecessary test while enhancing the test_req_buffer_pressure feature. This patch fixes the issue by removing that test.
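For illustration only (not part of the patch), the stop condition that remains once the test_req_buffer_pressure check is dropped can be sketched as a userspace predicate. The struct rqbd_counters and the function stop_growing() are hypothetical; the fields are assumed to mirror scp_nrqbds_posted, scp_nrqbds_total, srv_nbuf_per_group, and srv_nrqbds_max.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the counters the allocation loop consults. */
struct rqbd_counters {
	unsigned int posted;    /* mirrors svcpt->scp_nrqbds_posted */
	unsigned int total;     /* mirrors svcpt->scp_nrqbds_total */
	unsigned int per_group; /* mirrors svc->srv_nbuf_per_group */
	unsigned int max;       /* mirrors svc->srv_nrqbds_max; 0 = no limit */
};

/*
 * Return true when the loop should stop allocating more request buffers:
 * either enough buffers are already posted, or a configured maximum has
 * been exceeded. The buffer-pressure test flag plays no part here.
 */
static bool stop_growing(const struct rqbd_counters *c)
{
	return c->posted >= c->per_group ||
	       (c->max != 0 && c->total > c->max);
}
```

With max set to 0 the second clause never fires, so only the number of posted buffers bounds the loop.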
WC-bug-id: https://jira.whamcloud.com/browse/LU-10826 Lustre-commit: 040eca67f8d5 ("LU-10826 ptlrpc: fix test_req_buffer_pressure behavior") Signed-off-by: Bruno Faccini Reviewed-on: https://review.whamcloud.com/31690 Reviewed-by: Wang Shilong Reviewed-by: Li Dongyang Reviewed-by: Dmitry Eremin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 3c61e83..8dae21a 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -150,8 +150,7 @@ /* NB: another thread might have recycled enough rqbds, we * need to make sure it wouldn't over-allocate, see LU-1212. */ - if (test_req_buffer_pressure || - svcpt->scp_nrqbds_posted >= svc->srv_nbuf_per_group || + if (svcpt->scp_nrqbds_posted >= svc->srv_nbuf_per_group || (svc->srv_nrqbds_max != 0 && svcpt->scp_nrqbds_total > svc->srv_nrqbds_max)) break; From patchwork Thu Feb 27 21:08:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409713 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8A1A14BC for ; Thu, 27 Feb 2020 21:19:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9174C246A2 for ; Thu, 27 Feb 2020 21:19:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9174C246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com 
(localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7A96C21FAB4; Thu, 27 Feb 2020 13:19:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1952521FA96 for ; Thu, 27 Feb 2020 13:18:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7C4C19F0; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7A7BD46D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:15 -0500 Message-Id: <1582838290-17243-28-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 027/622] lustre: lu_object: improve debug message for lu_object_put() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov Use the top-level object in the lu_object_put() debug message to match lu_object_get().
WC-bug-id: https://jira.whamcloud.com/browse/LU-10877 Lustre-commit: fd669eba1921 ("LU-10877 lu: fix reference leak") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/31870 Reviewed-by: Andrew Perepechko Reviewed-by: Sergey Cheremencev Reviewed-by: Alex Zhuravlev Reviewed-by: Mikhal Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lu_object.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index d8dfc721..2ab4977 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -184,8 +184,8 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) LASSERT(list_empty(&top->loh_lru)); list_add_tail(&top->loh_lru, &bkt->lsb_lru); percpu_counter_inc(&site->ls_lru_len_counter); - CDEBUG(D_INODE, "Add %p to site lru. hash: %p, bkt: %p\n", - o, site->ls_obj_hash, bkt); + CDEBUG(D_INODE, "Add %p/%p to site lru. hash: %p, bkt: %p\n", + orig, top, site->ls_obj_hash, bkt); cfs_hash_bd_unlock(site->ls_obj_hash, &bd, 1); return; } From patchwork Thu Feb 27 21:08:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409685 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F1C914BC for ; Thu, 27 Feb 2020 21:19:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14932246A2 for ; Thu, 27 Feb 2020 21:19:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14932246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 92CF021FB5C; Thu, 27 Feb 2020 13:19:02 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5AC7521FA96 for ; Thu, 27 Feb 2020 13:18:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7EBB79F1; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7D3F7468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:16 -0500 Message-Id: <1582838290-17243-29-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 028/622] lustre: idl: remove obsolete directory split flags X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The directory split functionality from the old CMD (pre-DNE) feature was never usable in production, and was removed before the DNE 2.4 release. Remove old flags relating to this feature. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-1187 Lustre-commit: 5c53c353fd82 ("LU-1187 idl: remove obsolete directory split flags") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31700 Reviewed-by: James Simmons Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_lib.c | 2 -- fs/lustre/ptlrpc/wiretest.c | 4 ---- include/uapi/linux/lustre/lustre_idl.h | 4 ++-- 3 files changed, 2 insertions(+), 8 deletions(-) diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index d4b2bb9..467503c 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -520,8 +520,6 @@ void mdc_getattr_pack(struct ptlrpc_request *req, u64 valid, u32 flags, &RMF_MDT_BODY); b->mbo_valid = valid; - if (op_data->op_bias & MDS_CHECK_SPLIT) - b->mbo_valid |= OBD_MD_FLCKSPLIT; if (op_data->op_bias & MDS_CROSS_REF) b->mbo_valid |= OBD_MD_FLCROSSREF; b->mbo_eadatasize = ea_size; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 21698cc..bcd0229 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1341,8 +1341,6 @@ void lustre_assert_wire_constants(void) OBD_MD_FLMDSCAPA); LASSERTF(OBD_MD_FLOSSCAPA == (0x0000040000000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLOSSCAPA); - LASSERTF(OBD_MD_FLCKSPLIT == (0x0000080000000000ULL), "found 0x%.16llxULL\n", - OBD_MD_FLCKSPLIT); LASSERTF(OBD_MD_FLCROSSREF == (0x0000100000000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLCROSSREF); LASSERTF(OBD_MD_FLGETATTRLOCK == (0x0000200000000000ULL), "found 0x%.16llxULL\n", @@ -1866,8 +1864,6 @@ void lustre_assert_wire_constants(void) LASSERTF((int)sizeof(((struct ll_fid *)0)->f_type) == 4, "found %lld\n", (long long)(int)sizeof(((struct ll_fid *)0)->f_type)); - LASSERTF(MDS_CHECK_SPLIT == 0x00000001UL, "found 0x%.8xUL\n", - (unsigned int)MDS_CHECK_SPLIT); LASSERTF(MDS_CROSS_REF == 0x00000002UL, "found 0x%.8xUL\n", (unsigned int)MDS_CROSS_REF); LASSERTF(MDS_VTX_BYPASS == 0x00000004UL, 
"found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 0bce63d..589bb81 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1131,7 +1131,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) /* OBD_MD_FLRMTPERM (0x0000010000000000ULL) remote perm, obsolete */ #define OBD_MD_FLMDSCAPA (0x0000020000000000ULL) /* MDS capability */ #define OBD_MD_FLOSSCAPA (0x0000040000000000ULL) /* OSS capability */ -#define OBD_MD_FLCKSPLIT (0x0000080000000000ULL) /* Check split on server */ +/* OBD_MD_FLCKSPLIT (0x0000080000000000ULL) obsolete 2.3.58*/ #define OBD_MD_FLCROSSREF (0x0000100000000000ULL) /* Cross-ref case */ #define OBD_MD_FLGETATTRLOCK (0x0000200000000000ULL) /* Get IOEpoch attributes * under lock; for xattr @@ -1640,7 +1640,7 @@ struct mdt_rec_setattr { #define MDS_ATTR_PROJID 0x10000ULL /* = 65536 */ enum mds_op_bias { - MDS_CHECK_SPLIT = 1 << 0, +/* MDS_CHECK_SPLIT = 1 << 0, obsolete before 2.3.58 */ MDS_CROSS_REF = 1 << 1, MDS_VTX_BYPASS = 1 << 2, MDS_PERM_BYPASS = 1 << 3, From patchwork Thu Feb 27 21:08:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409691 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1CC0614BC for ; Thu, 27 Feb 2020 21:19:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 05377246A1 for ; Thu, 27 Feb 2020 21:19:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 05377246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1188B21FAEC; Thu, 27 Feb 2020 13:19:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BA5FD21FA96 for ; Thu, 27 Feb 2020 13:18:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 819679F4; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8017D46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:17 -0500 Message-Id: <1582838290-17243-30-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 029/622] lustre: mdc: resend quotactl if needed X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang In mdc_quotactl, it is better to resend the quotactl request if reconnection or failover is triggered during the process. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10368 Lustre-commit: d511918e8eb7 ("LU-10368 mdc: resend quotactl if needed") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/31773 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_request.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 5718db2..feac374 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1867,7 +1867,7 @@ static int mdc_ioc_hsm_ct_start(struct obd_export *exp, struct lustre_kernelcomm *lk); static int mdc_quotactl(struct obd_device *unused, struct obd_export *exp, - struct obd_quotactl *oqctl) + struct obd_quotactl *oqctl) { struct ptlrpc_request *req; struct obd_quotactl *oqc; @@ -1884,7 +1884,6 @@ static int mdc_quotactl(struct obd_device *unused, struct obd_export *exp, ptlrpc_request_set_replen(req); ptlrpc_at_set_req_timeout(req); - req->rq_no_resend = 1; rc = ptlrpc_queue_wait(req); if (rc) From patchwork Thu Feb 27 21:08:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 357E914BC for ; Thu, 27 Feb 2020 21:22:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1DED9246A0 for ; Thu, 27 Feb 2020 21:22:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1DED9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; 
spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 155D321FF21; Thu, 27 Feb 2020 13:20:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0BD1A21FA58 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 843D39F5; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8301D46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:18 -0500 Message-Id: <1582838290-17243-31-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 030/622] lustre: obd: create ping sysfs file X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" We have ping in the lustre debugfs tree. It's a perfect fit for sysfs. Create a sysfs equivalent so we can in time remove the debugfs file.
WC-bug-id: https://jira.hpdd.intel.com/browse/LU-8066 Lustre-commit: 0100ab268c31 ("LU-8066 obd: final pieces for sysfs/debugfs support") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/28108 Lustre-commit: 6bbae72c6900 ("LU-8066 sysfs: make ping sysfs file read and writable") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33776 Reviewed-by: Dmitry Eremin Reviewed-by: Ben Evans Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 6 ++++-- fs/lustre/mdc/lproc_mdc.c | 7 +++---- fs/lustre/mgc/lproc_mgc.c | 7 +++---- fs/lustre/osc/lproc_osc.c | 7 +++---- fs/lustre/ptlrpc/lproc_ptlrpc.c | 18 ++++++++---------- 5 files changed, 21 insertions(+), 24 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 965f8a1..32d43fb 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -457,8 +457,10 @@ int lprocfs_wr_uint(struct file *file, const char __user *buffer, struct adaptive_timeout; int lprocfs_at_hist_helper(struct seq_file *m, struct adaptive_timeout *at); int lprocfs_rd_timeouts(struct seq_file *m, void *data); -int lprocfs_wr_ping(struct file *file, const char __user *buffer, - size_t count, loff_t *off); + +ssize_t ping_show(struct kobject *kobj, struct attribute *attr, + char *buffer); + int lprocfs_wr_import(struct file *file, const char __user *buffer, size_t count, loff_t *off); int lprocfs_rd_pinger_recov(struct seq_file *m, void *n); diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index f09292e..6b87e76 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -306,6 +306,8 @@ static ssize_t max_mod_rpcs_in_flight_store(struct kobject *kobj, #define mdc_conn_uuid_show conn_uuid_show LUSTRE_RO_ATTR(mdc_conn_uuid); +LUSTRE_RO_ATTR(ping); + static ssize_t mdc_rpc_stats_seq_write(struct file *file, const char 
__user *buf, size_t len, loff_t *off) @@ -454,8 +456,6 @@ static ssize_t mdc_stats_seq_write(struct file *file, } LPROC_SEQ_FOPS(mdc_stats); -LPROC_SEQ_FOPS_WR_ONLY(mdc, ping); - LPROC_SEQ_FOPS_RO_TYPE(mdc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(mdc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(mdc, timeouts); @@ -465,8 +465,6 @@ static ssize_t mdc_stats_seq_write(struct file *file, LPROC_SEQ_FOPS_RW_TYPE(mdc, pinger_recov); static struct lprocfs_vars lprocfs_mdc_obd_vars[] = { - { .name = "ping", - .fops = &mdc_ping_fops }, { .name = "connect_flags", .fops = &mdc_connect_flags_fops }, { .name = "mds_server_uuid", @@ -500,6 +498,7 @@ static ssize_t mdc_stats_seq_write(struct file *file, &lustre_attr_max_mod_rpcs_in_flight.attr, &lustre_attr_max_pages_per_rpc.attr, &lustre_attr_mdc_conn_uuid.attr, + &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/mgc/lproc_mgc.c b/fs/lustre/mgc/lproc_mgc.c index d977d51..4c276f9 100644 --- a/fs/lustre/mgc/lproc_mgc.c +++ b/fs/lustre/mgc/lproc_mgc.c @@ -45,8 +45,6 @@ LPROC_SEQ_FOPS_RO_TYPE(mgc, state); -LPROC_SEQ_FOPS_WR_ONLY(mgc, ping); - static int mgc_ir_state_seq_show(struct seq_file *m, void *v) { return lprocfs_mgc_rd_ir_state(m, m->private); @@ -55,8 +53,6 @@ static int mgc_ir_state_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO(mgc_ir_state); struct lprocfs_vars lprocfs_mgc_obd_vars[] = { - { .name = "ping", - .fops = &mgc_ping_fops }, { .name = "connect_flags", .fops = &mgc_connect_flags_fops }, { .name = "mgs_server_uuid", @@ -73,8 +69,11 @@ struct lprocfs_vars lprocfs_mgc_obd_vars[] = { #define mgs_conn_uuid_show conn_uuid_show LUSTRE_RO_ATTR(mgs_conn_uuid); +LUSTRE_RO_ATTR(ping); + static struct attribute *mgc_attrs[] = { &lustre_attr_mgs_conn_uuid.attr, + &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index df48138..605a236 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -176,6 +176,8 @@ static ssize_t max_dirty_mb_store(struct 
kobject *kobj, #define ost_conn_uuid_show conn_uuid_show LUSTRE_RO_ATTR(ost_conn_uuid); +LUSTRE_RO_ATTR(ping); + static int osc_cached_mb_seq_show(struct seq_file *m, void *v) { struct obd_device *dev = m->private; @@ -601,14 +603,10 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO_TYPE(osc, timeouts); LPROC_SEQ_FOPS_RO_TYPE(osc, state); -LPROC_SEQ_FOPS_WR_ONLY(osc, ping); - LPROC_SEQ_FOPS_RW_TYPE(osc, import); LPROC_SEQ_FOPS_RW_TYPE(osc, pinger_recov); static struct lprocfs_vars lprocfs_osc_obd_vars[] = { - { .name = "ping", - .fops = &osc_ping_fops }, { .name = "connect_flags", .fops = &osc_connect_flags_fops }, { .name = "ost_server_uuid", @@ -812,6 +810,7 @@ void lproc_osc_attach_seqstat(struct obd_device *dev) &lustre_attr_short_io_bytes.attr, &lustre_attr_resend_count.attr, &lustre_attr_ost_conn_uuid.attr, + &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index 3dc99d4..e48a4e8 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -1227,13 +1227,11 @@ void ptlrpc_lprocfs_unregister_obd(struct obd_device *obd) } EXPORT_SYMBOL(ptlrpc_lprocfs_unregister_obd); -#undef BUFLEN - -int lprocfs_wr_ping(struct file *file, const char __user *buffer, - size_t count, loff_t *off) +ssize_t ping_show(struct kobject *kobj, struct attribute *attr, + char *buffer) { - struct seq_file *m = file->private_data; - struct obd_device *obd = m->private; + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); struct ptlrpc_request *req; int rc; @@ -1249,13 +1247,13 @@ int lprocfs_wr_ping(struct file *file, const char __user *buffer, req->rq_send_state = LUSTRE_IMP_FULL; rc = ptlrpc_queue_wait(req); - ptlrpc_req_finished(req); - if (rc >= 0) - return count; + return rc; } -EXPORT_SYMBOL(lprocfs_wr_ping); +EXPORT_SYMBOL(ping_show); + +#undef BUFLEN /* Write the connection UUID to this file to attempt to connect to that node. 
* The connection UUID is a node's primary NID. For example, From patchwork Thu Feb 27 21:08:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409695 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2BF8B138D for ; Thu, 27 Feb 2020 21:19:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14A32246A1 for ; Thu, 27 Feb 2020 21:19:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14A32246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7E65621FB95; Thu, 27 Feb 2020 13:19:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 638CF21FAA9 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8901A9F6; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 85F2C46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:19 -0500 Message-Id: <1582838290-17243-32-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 031/622] lustre: ldlm: change LDLM_POOL_ADD_VAR macro to inline function X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Dmitry Eremin , Oleg Drokin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Simple cleanup to create inline function ldlm_add_var(). WC-bug-id: https://jira.hpdd.intel.com/browse/LU-8066 Lustre-commit: 05a36534ba2d ("LU-8066 ldlm: move all remaining files from procfs to debugfs") Signed-off-by: Dmitry Eremin Signed-off-by: Oleg Drokin Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/29255 WC-bug-id: https://jira.hpdd.intel.com/browse/LU-3319 Lustre-commit: 4ad445ccd54 ("LU-3319 procfs: move ldlm proc handling over to seq_file") Reviewed-on: http://review.whamcloud.com/7293 Reviewed-by: Dmitry Eremin Reviewed-by: Andreas Dilger Reviewed-by: Peng Tao Reviewed-by: Bob Glossman Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_internal.h | 10 ++++++++++ fs/lustre/ldlm/ldlm_pool.c | 11 ++--------- 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index 6e54521..96dff1d 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -292,6 +292,16 @@ enum ldlm_policy_res { } \ struct __##var##__dummy_write {; } /* semicolon catcher */ +static inline void +ldlm_add_var(struct lprocfs_vars *vars, struct dentry *debugfs_entry, + const char *name, void *data, const struct file_operations *ops) +{ + vars->name = name; + vars->data = data; + vars->fops = ops; +
ldebugfs_add_vars(debugfs_entry, vars, NULL); +} + static inline int is_granted_or_cancelled(struct ldlm_lock *lock) { int ret = 0; diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c index 04bf5de..d2149a6 100644 --- a/fs/lustre/ldlm/ldlm_pool.c +++ b/fs/lustre/ldlm/ldlm_pool.c @@ -504,14 +504,6 @@ static ssize_t grant_speed_show(struct kobject *kobj, struct attribute *attr, LDLM_POOL_SYSFS_WRITER_NOLOCK_STORE(lock_volume_factor, atomic); LUSTRE_RW_ATTR(lock_volume_factor); -#define LDLM_POOL_ADD_VAR(_name, var, ops) \ - do { \ - pool_vars[0].name = #_name; \ - pool_vars[0].data = var; \ - pool_vars[0].fops = ops; \ - ldebugfs_add_vars(pl->pl_debugfs_entry, pool_vars, NULL);\ - } while (0) - /* These are for pools in /sys/fs/lustre/ldlm/namespaces/.../pool */ static struct attribute *ldlm_pl_attrs[] = { &lustre_attr_grant_speed.attr, @@ -571,7 +563,8 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl) memset(pool_vars, 0, sizeof(pool_vars)); - LDLM_POOL_ADD_VAR(state, pl, &lprocfs_pool_state_fops); + ldlm_add_var(&pool_vars[0], pl->pl_debugfs_entry, "state", pl, + &lprocfs_pool_state_fops); pl->pl_stats = lprocfs_alloc_stats(LDLM_POOL_LAST_STAT - LDLM_POOL_FIRST_STAT, 0); From patchwork Thu Feb 27 21:08:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409717 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E8BA138D for ; Thu, 27 Feb 2020 21:20:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15B91246A1 for ; Thu, 27 Feb 2020 21:20:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15B91246A1 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EBEF221FC40; Thu, 27 Feb 2020 13:19:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A659C21FA75 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8BC6A9F7; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 890C846D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:20 -0500 Message-Id: <1582838290-17243-33-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 032/622] lustre: obdecho: use vmalloc for lnb X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger When allocating the niobuf_local, if there are a large number of (potential) fragments this allocation can be quite large. Use kvmalloc_array() and kvfree() to avoid allocation errors and console noise. 
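As a rough userspace illustration of the allocation pattern described above (an overflow-checked, zero-filled array allocation paired with a matching free), the following sketch may help. It is not the kernel code: `struct lnb_sketch` and `alloc_zeroed_array()` are hypothetical names invented here, and plain `malloc()`/`free()` stand in for the kernel allocators, so the vmalloc fallback itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct niobuf_local; fields are illustrative. */
struct lnb_sketch {
	uint64_t offset;
	uint32_t len;
	int rc;
};

/*
 * Userspace analogue of an overflow-checked, zero-filled array allocation.
 * Returns NULL if n * size would overflow or if the allocation fails.
 */
static void *alloc_zeroed_array(size_t n, size_t size)
{
	void *p;

	assert(size != 0);
	if (n > SIZE_MAX / size)
		return NULL;	/* n * size does not fit in size_t */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

A caller would check the result for NULL, as the patch does with -ENOMEM, and release the buffer with the free routine that matches the allocator (here `free()`).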
The allocation failure was causing sanity test_180c to fail in a VM on occasion, and could also be a problem in real use. WC-bug-id: https://jira.whamcloud.com/browse/LU-10903 Lustre-commit: 8878bab7ae5f ("LU-10903 obdecho: use OBD_ALLOC_LARGE for lnb") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31964 Reviewed-by: Emoly Liu Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdecho/echo_client.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 3984cb4..0735a5a 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1343,7 +1343,8 @@ static int echo_client_prep_commit(const struct lu_env *env, npages = batch >> PAGE_SHIFT; tot_pages = count >> PAGE_SHIFT; - lnb = kcalloc(npages, sizeof(struct niobuf_local), GFP_NOFS); + lnb = kvmalloc_array(npages, sizeof(struct niobuf_local), + GFP_NOFS | __GFP_ZERO); if (!lnb) { ret = -ENOMEM; goto out; @@ -1411,7 +1412,7 @@ static int echo_client_prep_commit(const struct lu_env *env, } out: - kfree(lnb); + kvfree(lnb); return ret; } From patchwork Thu Feb 27 21:08:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409721 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F8A214BC for ; Thu, 27 Feb 2020 21:20:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 77993246A1 for ; Thu, 27 Feb 2020 21:20:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 77993246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 35E5321FC5F; Thu, 27 Feb 2020 13:19:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E9DC721FA75 for ; Thu, 27 Feb 2020 13:18:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8DAE59F8; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8C9C5468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:21 -0500 Message-Id: <1582838290-17243-34-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 033/622] lustre: mdc: deny layout swap for DoM file X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Layout swap is prohibited for DoM files until LU-10177 is implemented. The only exception is a new layout that has the same DoM component. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10910 Lustre-commit: 51c11d7cfaff ("LU-10910 mdd: deny layout swap for DoM file") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32044 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_dev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 80e3120..21dc83e 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -149,7 +149,8 @@ struct ldlm_lock *mdc_dlmlock_at_pgoff(const struct lu_env *env, * writers can share a single PW lock. */ mode = mdc_dom_lock_match(env, osc_export(obj), resname, LDLM_IBITS, - policy, LCK_PR | LCK_PW, &flags, obj, &lockh, + policy, LCK_PR | LCK_PW | LCK_GROUP, &flags, + obj, &lockh, dap_flags & OSC_DAP_FL_CANCELING); if (mode) { lock = ldlm_handle2lock(&lockh); From patchwork Thu Feb 27 21:08:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409699 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2C5F514BC for ; Thu, 27 Feb 2020 21:19:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14489246A1 for ; Thu, 27 Feb 2020 21:19:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14489246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5E1BE21FC5F; Thu, 27 Feb 2020 13:19:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3641621FA63 for ; Thu, 27 Feb 2020 13:18:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 90E979F9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8F7A446A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:22 -0500 Message-Id: <1582838290-17243-35-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 034/622] lustre: mgc: remove obsolete IR swabbing workaround X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The OBD_CONNECT_MNE_SWAB check was added to the MGC for compatibility with servers in the 2.2.0-2.2.55 range (in 2012) with big-endian clients. 2.2 was not an LTS release and is no longer being used. Remove the checks on the client for OBD_CONNECT_MNE_SWAB being set, and assume that the server does not have this bug. This will allow the removal of the rest of this workaround from the server code once there are no more clients depending on the presence of this flag. 
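To make the effect of the removal concrete, here is a minimal userspace sketch of the swab decision before and after this patch, modelled on the mgc_request.c hunk further down in this mail. The function names are hypothetical; the two boolean parameters stand in for the result of ptlrpc_rep_need_swab() and for the removed imp_need_mne_swab import flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Old logic: a buggy 2.2 server swabbed once too often, so invert. */
static bool mne_swab_old(bool rep_need_swab, bool imp_need_mne_swab)
{
	bool mne_swab = rep_need_swab;

	if (imp_need_mne_swab)
		mne_swab = !mne_swab;
	return mne_swab;
}

/* New logic: trust the reply; the server is assumed not to have the bug. */
static bool mne_swab_new(bool rep_need_swab)
{
	return rep_need_swab;
}

/* With the flag clear, the old and new forms agree for every input. */
static void check_equivalent_without_flag(void)
{
	assert(mne_swab_old(true, false) == mne_swab_new(true));
	assert(mne_swab_old(false, false) == mne_swab_new(false));
}
```

The two forms differ only when the import flag was set, which by the reasoning above should no longer happen once the affected 2.2 servers are out of use.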
WC-bug-id: https://jira.whamcloud.com/browse/LU-1644 Lustre-commit: a0c644fde340 ("LU-1644 mgc: remove obsolete IR swabbing workaround") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32087 Reviewed-by: John L. Hammond Reviewed-by: Jinshan Xiong Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 4 ---- fs/lustre/mgc/mgc_request.c | 9 +-------- fs/lustre/ptlrpc/import.c | 21 --------------------- 3 files changed, 1 insertion(+), 33 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index 522e5b7..0d7bb0f 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -289,10 +289,6 @@ struct obd_import { imp_resend_replay:1, /* disable normal recovery, for test only. */ imp_no_pinger_recover:1, -#if OBD_OCD_VERSION(3, 0, 53, 0) > LUSTRE_VERSION_CODE - /* need IR MNE swab */ - imp_need_mne_swab:1, -#endif /* import must be reconnected instead of * chosing new connection */ diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index ca4b8a9..c114aa8 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -1436,14 +1436,7 @@ static int mgc_process_recover_log(struct obd_device *obd, goto out; } - mne_swab = !!ptlrpc_rep_need_swab(req); -#if OBD_OCD_VERSION(3, 0, 53, 0) > LUSTRE_VERSION_CODE - /* This import flag means the server did an extra swab of IR MNE - * records (fixed in LU-1252), reverse it here if needed. 
LU-1644 - */ - if (unlikely(req->rq_import->imp_need_mne_swab)) - mne_swab = !mne_swab; -#endif + mne_swab = ptlrpc_rep_need_swab(req); for (i = 0; i < nrpages && ealen > 0; i++) { int rc2; diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index dca4aa0..f69b907 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -780,27 +780,6 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, warned = true; } -#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0) - /* - * Check if server has LU-1252 fix applied to not always swab - * the IR MNE entries. Do this only once per connection. This - * fixup is version-limited, because we don't want to carry the - * OBD_CONNECT_MNE_SWAB flag around forever, just so long as we - * need interop with unpatched 2.2 servers. For newer servers, - * the client will do MNE swabbing only as needed. LU-1644 - */ - if (unlikely((ocd->ocd_connect_flags & OBD_CONNECT_VERSION) && - !(ocd->ocd_connect_flags & OBD_CONNECT_MNE_SWAB) && - OBD_OCD_VERSION_MAJOR(ocd->ocd_version) == 2 && - OBD_OCD_VERSION_MINOR(ocd->ocd_version) == 2 && - OBD_OCD_VERSION_PATCH(ocd->ocd_version) < 55 && - !strcmp(imp->imp_obd->obd_type->typ_name, - LUSTRE_MGC_NAME))) - imp->imp_need_mne_swab = 1; - else /* clear if server was upgraded since last connect */ - imp->imp_need_mne_swab = 0; -#endif - if (ocd->ocd_connect_flags & OBD_CONNECT_CKSUM) { /* * We sent to the server ocd_cksum_types with bits set From patchwork Thu Feb 27 21:08:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409703 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A3EC814BC for ; Thu, 27 Feb 2020 21:19:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 89CD4246A1 for ; Thu, 27 Feb 2020 21:19:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 89CD4246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A4F0A21FBC7; Thu, 27 Feb 2020 13:19:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B84921FAB0 for ; Thu, 27 Feb 2020 13:18:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9429B9FA; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 927E346C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:23 -0500 Message-Id: <1582838290-17243-36-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 035/622] lustre: ptlrpc: add dir migration connect flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Add dir migration connect flag to prevent collision with other features. Though dir migration code exists, it will be reworked, and the new RPC protocol won't be compatible with current one. Also handle the previously-added OBD_CONNECT2_FLR flag. WC-bug-id: https://jira.whamcloud.com/browse/LU-4684 Lustre-commit: 14b98596fa24 ("LU-4684 ptlrpc: add dir migration connect flag") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/31914 Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 8 ++++++-- fs/lustre/ptlrpc/wiretest.c | 4 ++++ include/uapi/linux/lustre/lustre_idl.h | 2 ++ 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 33c76c1..66d2679 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -111,8 +111,12 @@ "compact_obdo", "second_flags", /* flags2 names */ - "file_secctx", - "lockaheadv2", + "file_secctx", /* 0x01 */ + "lockaheadv2", /* 0x02 */ + "dir_migrate", /* 0x04 */ + "unknown", /* 0x08 */ + "unknown", /* 0x10 */ + "flr", /* 0x20 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index bcd0229..46d5e74 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1111,6 +1111,10 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_FILE_SECCTX); LASSERTF(OBD_CONNECT2_LOCKAHEAD == 0x2ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_LOCKAHEAD); + LASSERTF(OBD_CONNECT2_DIR_MIGRATE == 0x4ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_DIR_MIGRATE); + LASSERTF(OBD_CONNECT2_FLR == 0x20ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_FLR); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 
0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 589bb81..e898e67 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -791,6 +791,8 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_LOCKAHEAD 0x2ULL /* ladvise lockahead * v2 */ +#define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir + */ #define OBD_CONNECT2_FLR 0x20ULL /* FLR support */ /* XXX README XXX: From patchwork Thu Feb 27 21:08:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409797 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B707114BC for ; Thu, 27 Feb 2020 21:22:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9D913246A0 for ; Thu, 27 Feb 2020 21:22:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D913246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BE4DC3488DF; Thu, 27 Feb 2020 13:21:02 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D87F721FAB4 for ; Thu, 27 Feb 2020 13:18:25 
-0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 968379FE; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9549346F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:24 -0500 Message-Id: <1582838290-17243-37-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 036/622] lustre: mds: remove obsolete MDS_VTX_BYPASS flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The MDS_VTX_BYPASS flag is only set and never checked. This is true since 2.3.53-66-g54fe979 "LU-2216 mdt: remove obsolete DNE code", but it was already obsolete for a long time before that. WC-bug-id: https://jira.whamcloud.com/browse/LU-6349 Lustre-commit: b99344dda425 ("LU-6349 mds: remove obsolete MDS_VTX_BYPASS flag") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/31984 Reviewed-by: Lai Siyao Reviewed-by: John L. 
Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 2 -- include/uapi/linux/lustre/lustre_idl.h | 4 ++-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 46d5e74..c92663b 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1870,8 +1870,6 @@ void lustre_assert_wire_constants(void) LASSERTF(MDS_CROSS_REF == 0x00000002UL, "found 0x%.8xUL\n", (unsigned int)MDS_CROSS_REF); - LASSERTF(MDS_VTX_BYPASS == 0x00000004UL, "found 0x%.8xUL\n", - (unsigned int)MDS_VTX_BYPASS); LASSERTF(MDS_PERM_BYPASS == 0x00000008UL, "found 0x%.8xUL\n", (unsigned int)MDS_PERM_BYPASS); LASSERTF(MDS_QUOTA_IGNORE == 0x00000020UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index e898e67..794e6d6 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1644,11 +1644,11 @@ struct mdt_rec_setattr { enum mds_op_bias { /* MDS_CHECK_SPLIT = 1 << 0, obsolete before 2.3.58 */ MDS_CROSS_REF = 1 << 1, - MDS_VTX_BYPASS = 1 << 2, +/* MDS_VTX_BYPASS = 1 << 2, obsolete since 2.3.54 */ MDS_PERM_BYPASS = 1 << 3, /* MDS_SOM = 1 << 4, obsolete since 2.8.0 */ MDS_QUOTA_IGNORE = 1 << 5, - MDS_CLOSE_CLEANUP = 1 << 6, +/* MDS_CLOSE_CLEANUP = 1 << 6, obsolete since 2.3.51 */ MDS_KEEP_ORPHAN = 1 << 7, MDS_RECOV_OPEN = 1 << 8, MDS_DATA_MODIFIED = 1 << 9, From patchwork Thu Feb 27 21:08:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409725 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A6F9138D for ; Thu, 27 Feb 2020 21:20:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with 
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 43266246A1 for ; Thu, 27 Feb 2020 21:20:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 43266246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 258DE21FE4A; Thu, 27 Feb 2020 13:19:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2520721FABD for ; Thu, 27 Feb 2020 13:18:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9AD96A03; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 98831468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:25 -0500 Message-Id: <1582838290-17243-38-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 037/622] lustre: ldlm: expose dirty age limit for flush-on-glimpse X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin A glimpse request may cancel an old lock and cause a data flush, which helps the client cache stat results locally early. The time limit was hardcoded to 10s and is now exposed as the ns_dirty_age_limit namespace value; it can be set or checked via /sys/fs/lustre/ldlm/namespaces//dirty_age_limit WC-bug-id: https://jira.whamcloud.com/browse/LU-10413 Lustre-commit: 69727e45b4c0 ("LU-10413 ldlm: expose dirty age limit for flush-on-glimpse") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32113 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 12 +++++++++++- fs/lustre/ldlm/ldlm_lockd.c | 2 +- fs/lustre/ldlm/ldlm_resource.c | 28 ++++++++++++++++++++++++++++ 3 files changed, 40 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index b1a37f0..8dea9ab 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -60,6 +60,10 @@ #define LDLM_DEFAULT_LRU_SIZE (100 * num_online_cpus()) #define LDLM_DEFAULT_MAX_ALIVE (64 * 60) /* 65 min */ +/* if client lock is unused for that time it can be cancelled if any other + * client shows interest in that lock, e.g. glimpse is occurred. + */ +#define LDLM_DIRTY_AGE_LIMIT (10) #define LDLM_DEFAULT_PARALLEL_AST_LIMIT 1024 /** @@ -412,7 +416,13 @@ struct ldlm_namespace { /** Maximum allowed age (last used time) for locks in the LRU */ ktime_t ns_max_age; - + /** + * Number of seconds since the lock was last used. The client may + * cancel the lock limited by this age and flush related data if + * any other client shows interest in it doing glimpse request. + * This allows to cache stat data locally for such files early. 
+ */ + time64_t ns_dirty_age_limit; /** * Used to rate-limit ldlm_namespace_dump calls. * \see ldlm_namespace_dump. Increased by 10 seconds every time diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 84d73e6..481719b 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -305,7 +305,7 @@ static void ldlm_handle_gl_callback(struct ptlrpc_request *req, !lock->l_readers && !lock->l_writers && ktime_after(ktime_get(), ktime_add(lock->l_last_used, - ktime_set(10, 0)))) { + ktime_set(ns->ns_dirty_age_limit, 0)))) { unlock_res_and_lock(lock); if (ldlm_bl_to_thread_lock(ns, NULL, lock)) ldlm_handle_bl_callback(ns, NULL, lock); diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 4e3c6e7..5e0dd53 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -327,6 +327,32 @@ static ssize_t early_lock_cancel_store(struct kobject *kobj, } LUSTRE_RW_ATTR(early_lock_cancel); +static ssize_t dirty_age_limit_show(struct kobject *kobj, + struct attribute *attr, char *buf) +{ + struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace, + ns_kobj); + + return sprintf(buf, "%llu\n", ns->ns_dirty_age_limit); +} + +static ssize_t dirty_age_limit_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, size_t count) +{ + struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace, + ns_kobj); + unsigned long long tmp; + + if (kstrtoull(buffer, 10, &tmp)) + return -EINVAL; + + ns->ns_dirty_age_limit = tmp; + + return count; +} +LUSTRE_RW_ATTR(dirty_age_limit); + /* These are for namespaces in /sys/fs/lustre/ldlm/namespaces/ */ static struct attribute *ldlm_ns_attrs[] = { &lustre_attr_resource_count.attr, @@ -335,6 +361,7 @@ static ssize_t early_lock_cancel_store(struct kobject *kobj, &lustre_attr_lru_size.attr, &lustre_attr_lru_max_age.attr, &lustre_attr_early_lock_cancel.attr, + &lustre_attr_dirty_age_limit.attr, NULL, }; @@ -653,6 +680,7 @@ 
struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, ns->ns_max_age = ktime_set(LDLM_DEFAULT_MAX_ALIVE, 0); ns->ns_orig_connect_flags = 0; ns->ns_connect_flags = 0; + ns->ns_dirty_age_limit = LDLM_DIRTY_AGE_LIMIT; ns->ns_stopping = 0; rc = ldlm_namespace_sysfs_register(ns); From patchwork Thu Feb 27 21:08:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409707 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E02C138D for ; Thu, 27 Feb 2020 21:19:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 66D99246A1 for ; Thu, 27 Feb 2020 21:19:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 66D99246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E96A121FCB5; Thu, 27 Feb 2020 13:19:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7CF8F21FA96 for ; Thu, 27 Feb 2020 13:18:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9D55DA04; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9B7CC46D; Thu, 27 Feb 2020 
16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:26 -0500 Message-Id: <1582838290-17243-39-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 038/622] lustre: ldlm: IBITS lock convert instead of cancel X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin For IBITS lock it is possible to drop just conflicting bits and keep lock itself instead of cancelling it. Lock convert is only bits downgrade on client and then on server. Patch implements lock convert during blocking AST. 
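The bit-dropping idea can be sketched in a few lines of userspace C. This is only an illustration of the mask arithmetic, assuming a lock is represented by its inodebits alone; `struct ibits_lock_sketch` and `ibits_convert_sketch()` are hypothetical names, and locking, flags and the RPC to the server are left out. The -EINVAL return mirrors the check in ldlm_cli_dropbits() in the diff below, where an error tells the caller to continue with a normal cancel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of an IBITS lock: just the granted bits. */
struct ibits_lock_sketch {
	uint64_t bits;
};

/*
 * Drop the conflicting bits and keep the lock when something remains.
 * Returns 0 on a successful downgrade, or -EINVAL when every granted bit
 * conflicts, in which case the lock is left untouched for a full cancel.
 */
static int ibits_convert_sketch(struct ibits_lock_sketch *lock,
				uint64_t drop_bits)
{
	assert(lock != NULL);
	assert(drop_bits != 0);

	if (!(lock->bits & ~drop_bits))
		return -EINVAL;	/* nothing would be left to keep */

	lock->bits &= ~drop_bits;
	return 0;
}
```

For example, a lock holding bits 0x13 that conflicts on 0x2 would be downgraded to 0x11 and stay cached, while a lock holding only 0x2 would fall through to cancellation.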
WC-bug-id: https://jira.whamcloud.com/browse/LU-10175 Lustre-commit: 37932c4beb98 ("LU-10175 ldlm: IBITS lock convert instead of cancel") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/30202 Reviewed-by: Lai Siyao Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 6 + fs/lustre/include/lustre_dlm_flags.h | 16 +- fs/lustre/ldlm/ldlm_inodebits.c | 92 +++++++- fs/lustre/ldlm/ldlm_internal.h | 2 + fs/lustre/ldlm/ldlm_lock.c | 13 +- fs/lustre/ldlm/ldlm_lockd.c | 18 ++ fs/lustre/ldlm/ldlm_request.c | 198 ++++++++++++++++- fs/lustre/llite/namei.c | 383 ++++++++++++++++++++------------- fs/lustre/ptlrpc/wiretest.c | 2 +- include/uapi/linux/lustre/lustre_idl.h | 1 + 10 files changed, 569 insertions(+), 162 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 8dea9ab..66608a9 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -544,6 +544,7 @@ enum ldlm_cancel_flags { LCF_BL_AST = 0x4, /* Cancel locks marked as LDLM_FL_BL_AST * in the same RPC */ + LCF_CONVERT = 0x8, /* Try to convert IBITS lock before cancel */ }; struct ldlm_flock { @@ -1306,6 +1307,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, enum ldlm_mode mode, u64 *flags, void *lvb, u32 lvb_len, const struct lustre_handle *lockh, int rc); +int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags); int ldlm_cli_update_pool(struct ptlrpc_request *req); int ldlm_cli_cancel(const struct lustre_handle *lockh, enum ldlm_cancel_flags cancel_flags); @@ -1330,6 +1332,10 @@ int ldlm_cli_cancel_list(struct list_head *head, int count, enum ldlm_cancel_flags flags); /** @} ldlm_cli_api */ +int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop); +int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits); +int ldlm_cli_dropbits_list(struct list_head *converts, u64 drop_bits); + /* mds/handler.c */ /* This has to be here 
because recursive inclusion sucks. */ int intent_disposition(struct ldlm_reply *rep, int flag); diff --git a/fs/lustre/include/lustre_dlm_flags.h b/fs/lustre/include/lustre_dlm_flags.h index 22fb595..c8667c8 100644 --- a/fs/lustre/include/lustre_dlm_flags.h +++ b/fs/lustre/include/lustre_dlm_flags.h @@ -26,10 +26,10 @@ */ #ifndef LDLM_ALL_FLAGS_MASK -/** l_flags bits marked as "all_flags" bits */ -#define LDLM_FL_ALL_FLAGS_MASK 0x00FFFFFFC08F932FULL +/* l_flags bits marked as "all_flags" bits */ +#define LDLM_FL_ALL_FLAGS_MASK 0x00FFFFFFC28F932FULL -/** extent, mode, or resource changed */ +/* extent, mode, or resource changed */ #define LDLM_FL_LOCK_CHANGED 0x0000000000000001ULL /* bit 0 */ #define ldlm_is_lock_changed(_l) LDLM_TEST_FLAG((_l), 1ULL << 0) #define ldlm_set_lock_changed(_l) LDLM_SET_FLAG((_l), 1ULL << 0) @@ -146,6 +146,16 @@ #define ldlm_clear_cancel_on_block(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 23) /** + * Flag indicates that lock is being converted (downgraded) during the blocking + * AST instead of cancelling. Used for IBITS locks now and drops conflicting + * bits only keepeing other. + */ +#define LDLM_FL_CONVERTING 0x0000000002000000ULL /* bit 25 */ +#define ldlm_is_converting(_l) LDLM_TEST_FLAG((_l), 1ULL << 25) +#define ldlm_set_converting(_l) LDLM_SET_FLAG((_l), 1ULL << 25) +#define ldlm_clear_converting(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 25) + +/* * Part of original lockahead implementation, OBD_CONNECT_LOCKAHEAD_OLD. * Reserved temporarily to allow those implementations to keep working. * Will be removed after 2.12 release. 
diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c index ea63d9d..e74928e 100644 --- a/fs/lustre/ldlm/ldlm_inodebits.c +++ b/fs/lustre/ldlm/ldlm_inodebits.c @@ -68,7 +68,14 @@ void ldlm_ibits_policy_local_to_wire(const union ldlm_policy_data *lpolicy, wpolicy->l_inodebits.bits = lpolicy->l_inodebits.bits; } -int ldlm_inodebits_drop(struct ldlm_lock *lock, __u64 to_drop) +/** + * Attempt to convert already granted IBITS lock with several bits set to + * a lock with less bits (downgrade). + * + * Such lock conversion is used to keep lock with non-blocking bits instead of + * cancelling it, introduced for better support of DoM files. + */ +int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop) { check_res_locked(lock->l_resource); @@ -89,3 +96,86 @@ int ldlm_inodebits_drop(struct ldlm_lock *lock, __u64 to_drop) return 0; } EXPORT_SYMBOL(ldlm_inodebits_drop); + +/* convert single lock */ +int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) +{ + struct lustre_handle lockh; + u32 flags = 0; + int rc; + + LASSERT(drop_bits); + LASSERT(!lock->l_readers && !lock->l_writers); + + LDLM_DEBUG(lock, "client lock convert START"); + + ldlm_lock2handle(lock, &lockh); + lock_res_and_lock(lock); + /* check if all bits are cancelled */ + if (!(lock->l_policy_data.l_inodebits.bits & ~drop_bits)) { + unlock_res_and_lock(lock); + /* return error to continue with cancel */ + rc = -EINVAL; + goto exit; + } + + /* check if there is race with cancel */ + if (ldlm_is_canceling(lock) || ldlm_is_cancel(lock)) { + unlock_res_and_lock(lock); + rc = -EINVAL; + goto exit; + } + + /* clear cbpending flag early, it is safe to match lock right after + * client convert because it is downgrade always. 
+ */ + ldlm_clear_cbpending(lock); + ldlm_clear_bl_ast(lock); + + /* If lock is being converted already, check drop bits first */ + if (ldlm_is_converting(lock)) { + /* raced lock convert, lock inodebits are remaining bits + * so check if they are conflicting with new convert or not. + */ + if (!(lock->l_policy_data.l_inodebits.bits & drop_bits)) { + unlock_res_and_lock(lock); + rc = 0; + goto exit; + } + /* Otherwise drop new conflicting bits in new convert */ + } + ldlm_set_converting(lock); + /* from all bits of blocking lock leave only conflicting */ + drop_bits &= lock->l_policy_data.l_inodebits.bits; + /* save them in cancel_bits, so l_blocking_ast will know + * which bits from the current lock were dropped. + */ + lock->l_policy_data.l_inodebits.cancel_bits = drop_bits; + /* Finally clear these bits in lock ibits */ + ldlm_inodebits_drop(lock, drop_bits); + unlock_res_and_lock(lock); + /* Finally call cancel callback for remaining bits only. + * It is important to have converting flag during that + * so blocking_ast callback can distinguish convert from + * cancels. 
+ */ + if (lock->l_blocking_ast) + lock->l_blocking_ast(lock, NULL, lock->l_ast_data, + LDLM_CB_CANCELING); + + /* now notify server about convert */ + rc = ldlm_cli_convert(lock, &flags); + if (rc) { + lock_res_and_lock(lock); + ldlm_clear_converting(lock); + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + unlock_res_and_lock(lock); + LASSERT(list_empty(&lock->l_lru)); + goto exit; + } + +exit: + LDLM_DEBUG(lock, "client lock convert END"); + return rc; +} diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index 96dff1d..ec68713 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -153,7 +153,9 @@ int ldlm_run_ast_work(struct ldlm_namespace *ns, struct list_head *rpc_list, #define ldlm_lock_remove_from_lru(lock) \ ldlm_lock_remove_from_lru_check(lock, ktime_set(0, 0)) int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock); +void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock); void ldlm_lock_destroy_nolock(struct ldlm_lock *lock); +void ldlm_grant_lock_with_skiplist(struct ldlm_lock *lock); /* ldlm_lockd.c */ int ldlm_bl_to_thread_lock(struct ldlm_namespace *ns, struct ldlm_lock_desc *ld, diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index aa19b89..9847c43 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -241,7 +241,7 @@ int ldlm_lock_remove_from_lru_check(struct ldlm_lock *lock, ktime_t last_use) /** * Adds LDLM lock @lock to namespace LRU. Assumes LRU is already locked. 
*/ -static void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock) +void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock) { struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); @@ -791,7 +791,8 @@ void ldlm_lock_decref_internal(struct ldlm_lock *lock, enum ldlm_mode mode) ldlm_bl_to_thread_lock(ns, NULL, lock) != 0) ldlm_handle_bl_callback(ns, NULL, lock); } else if (!lock->l_readers && !lock->l_writers && - !ldlm_is_no_lru(lock) && !ldlm_is_bl_ast(lock)) { + !ldlm_is_no_lru(lock) && !ldlm_is_bl_ast(lock) && + !ldlm_is_converting(lock)) { LDLM_DEBUG(lock, "add lock into lru list"); /* If this is a client-side namespace and this was the last @@ -1648,6 +1649,13 @@ enum ldlm_error ldlm_lock_enqueue(struct ldlm_namespace *ns, unlock_res_and_lock(lock); ldlm_lock2desc(lock->l_blocking_lock, &d); + /* copy blocking lock ibits in cancel_bits as well, + * new client may use them for lock convert and it is + * important to use new field to convert locks from + * new servers only + */ + d.l_policy_data.l_inodebits.cancel_bits = + lock->l_blocking_lock->l_policy_data.l_inodebits.bits; rc = lock->l_blocking_ast(lock, &d, (void *)arg, LDLM_CB_BLOCKING); LDLM_LOCK_RELEASE(lock->l_blocking_lock); @@ -1896,6 +1904,7 @@ void ldlm_lock_cancel(struct ldlm_lock *lock) */ if (lock->l_readers || lock->l_writers) { LDLM_ERROR(lock, "lock still has references"); + unlock_res_and_lock(lock); LBUG(); } diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 481719b..b50a3f7 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -118,6 +118,24 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, LDLM_DEBUG(lock, "client blocking AST callback handler"); lock_res_and_lock(lock); + + /* set bits to cancel for this lock for possible lock convert */ + if (lock->l_resource->lr_type == LDLM_IBITS) { + /* Lock description contains policy of blocking lock, + * and its cancel_bits is used to pass conflicting bits. 
+ * NOTE: ld can be NULL or can be not NULL but zeroed if + * passed from ldlm_bl_thread_blwi(), check below used bits + * in ld to make sure it is valid description. + */ + if (ld && ld->l_policy_data.l_inodebits.bits) + lock->l_policy_data.l_inodebits.cancel_bits = + ld->l_policy_data.l_inodebits.cancel_bits; + /* if there is no valid ld and lock is cbpending already + * then cancel_bits should be kept, otherwise it is zeroed. + */ + else if (!ldlm_is_cbpending(lock)) + lock->l_policy_data.l_inodebits.cancel_bits = 0; + } ldlm_set_cbpending(lock); if (ldlm_is_cancel_on_block(lock)) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 92e4f69..5ec0da5 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -818,6 +818,177 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, EXPORT_SYMBOL(ldlm_cli_enqueue); /** + * Client-side lock convert reply handling. + * + * Finish client lock converting, checks for concurrent converts + * and clear 'converting' flag so lock can be placed back into LRU. 
+ */ +static int lock_convert_interpret(const struct lu_env *env, + struct ptlrpc_request *req, + struct ldlm_async_args *aa, int rc) +{ + struct ldlm_lock *lock; + struct ldlm_reply *reply; + + lock = ldlm_handle2lock(&aa->lock_handle); + if (!lock) { + LDLM_DEBUG_NOLOCK("convert ACK for unknown local cookie %#llx", + aa->lock_handle.cookie); + return -ESTALE; + } + + LDLM_DEBUG(lock, "CONVERTED lock:"); + + if (rc != ELDLM_OK) + goto out; + + reply = req_capsule_server_get(&req->rq_pill, &RMF_DLM_REP); + if (!reply) { + rc = -EPROTO; + goto out; + } + + if (reply->lock_handle.cookie != aa->lock_handle.cookie) { + LDLM_ERROR(lock, + "convert ACK with wrong lock cookie %#llx but cookie %#llx from server %s id %s\n", + aa->lock_handle.cookie, reply->lock_handle.cookie, + req->rq_export->exp_client_uuid.uuid, + libcfs_id2str(req->rq_peer)); + rc = -ESTALE; + goto out; + } + + lock_res_and_lock(lock); + /* Lock convert is sent for any new bits to drop, the converting flag + * is dropped when ibits on server are the same as on client. Meanwhile + * that can be so that more later convert will be replied first with + * and clear converting flag, so in case of such race just exit here. + * if lock has no converting bits then. + */ + if (!ldlm_is_converting(lock)) { + LDLM_DEBUG(lock, + "convert ACK for lock without converting flag, reply ibits %#llx", + reply->lock_desc.l_policy_data.l_inodebits.bits); + } else if (reply->lock_desc.l_policy_data.l_inodebits.bits != + lock->l_policy_data.l_inodebits.bits) { + /* Compare server returned lock ibits and local lock ibits + * if they are the same we consider conversion is done, + * otherwise we have more converts inflight and keep + * converting flag. + */ + LDLM_DEBUG(lock, "convert ACK with ibits %#llx\n", + reply->lock_desc.l_policy_data.l_inodebits.bits); + } else { + ldlm_clear_converting(lock); + + /* Concurrent BL AST has arrived, it may cause another convert + * or cancel so just exit here. 
+ */ + if (!ldlm_is_bl_ast(lock)) { + struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); + + /* Drop cancel_bits since there are no more converts + * and put lock into LRU if it is not there yet. + */ + lock->l_policy_data.l_inodebits.cancel_bits = 0; + spin_lock(&ns->ns_lock); + if (!list_empty(&lock->l_lru)) + ldlm_lock_remove_from_lru_nolock(lock); + ldlm_lock_add_to_lru_nolock(lock); + spin_unlock(&ns->ns_lock); + } + } + unlock_res_and_lock(lock); +out: + if (rc) { + lock_res_and_lock(lock); + if (ldlm_is_converting(lock)) { + LASSERT(list_empty(&lock->l_lru)); + ldlm_clear_converting(lock); + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + } + unlock_res_and_lock(lock); + } + + LDLM_LOCK_PUT(lock); + return rc; +} + +/** + * Client-side IBITS lock convert. + * + * Inform server that lock has been converted instead of canceling. + * Server finishes convert on own side and does reprocess to grant + * all related waiting locks. + * + * Since convert means only ibits downgrading, client doesn't need to + * wait for server reply to finish local converting process so this request + * is made asynchronous. 
+ * + */ +int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) +{ + struct ldlm_request *body; + struct ptlrpc_request *req; + struct ldlm_async_args *aa; + struct obd_export *exp = lock->l_conn_export; + + if (!exp) { + LDLM_ERROR(lock, "convert must not be called on local locks."); + return -EINVAL; + } + + if (lock->l_resource->lr_type != LDLM_IBITS) { + LDLM_ERROR(lock, "convert works with IBITS locks only."); + return -EINVAL; + } + + LDLM_DEBUG(lock, "client-side convert"); + + req = ptlrpc_request_alloc_pack(class_exp2cliimp(exp), + &RQF_LDLM_CONVERT, LUSTRE_DLM_VERSION, + LDLM_CONVERT); + if (!req) + return -ENOMEM; + + body = req_capsule_client_get(&req->rq_pill, &RMF_DLM_REQ); + body->lock_handle[0] = lock->l_remote_handle; + + body->lock_desc.l_req_mode = lock->l_req_mode; + body->lock_desc.l_granted_mode = lock->l_granted_mode; + + body->lock_desc.l_policy_data.l_inodebits.bits = + lock->l_policy_data.l_inodebits.bits; + body->lock_desc.l_policy_data.l_inodebits.cancel_bits = 0; + + body->lock_flags = ldlm_flags_to_wire(*flags); + body->lock_count = 1; + + ptlrpc_request_set_replen(req); + + /* That could be useful to use cancel portals for convert as well + * as high-priority handling. This will require changes in + * ldlm_cancel_handler to understand convert RPC as well. + * + * req->rq_request_portal = LDLM_CANCEL_REQUEST_PORTAL; + * req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; + */ + ptlrpc_at_set_req_timeout(req); + + if (exp->exp_obd->obd_svc_stats) + lprocfs_counter_incr(exp->exp_obd->obd_svc_stats, + LDLM_CONVERT - LDLM_FIRST_OPC); + + aa = ptlrpc_req_async_args(aa, req); + ldlm_lock2handle(lock, &aa->lock_handle); + req->rq_interpret_reply = (ptlrpc_interpterer_t)lock_convert_interpret; + + ptlrpcd_add_req(req); + return 0; +} + +/** * Cancel locks locally. 
* * Returns: LDLM_FL_LOCAL_ONLY if there is no need for a CANCEL RPC @@ -1057,6 +1228,19 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, return 0; } + /* Convert lock bits instead of cancel for IBITS locks */ + if (cancel_flags & LCF_CONVERT) { + LASSERT(lock->l_resource->lr_type == LDLM_IBITS); + LASSERT(lock->l_policy_data.l_inodebits.cancel_bits != 0); + + rc = ldlm_cli_dropbits(lock, + lock->l_policy_data.l_inodebits.cancel_bits); + if (rc == 0) { + LDLM_LOCK_RELEASE(lock); + return 0; + } + } + lock_res_and_lock(lock); /* Lock is being canceled and the caller doesn't want to wait */ if (ldlm_is_canceling(lock)) { @@ -1069,6 +1253,15 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, return 0; } + /* Lock is being converted, cancel it immediately. + * When convert will end, it releases lock and it will be gone. + */ + if (ldlm_is_converting(lock)) { + /* set back flags removed by convert */ + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + } + ldlm_set_canceling(lock); unlock_res_and_lock(lock); @@ -1439,7 +1632,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, /* Somebody is already doing CANCEL. No need for this * lock in LRU, do not traverse it again. */ - if (!ldlm_is_canceling(lock)) + if (!ldlm_is_canceling(lock) || + !ldlm_is_converting(lock)) break; ldlm_lock_remove_from_lru_nolock(lock); @@ -1483,7 +1677,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, lock_res_and_lock(lock); /* Check flags again under the lock. 
*/ - if (ldlm_is_canceling(lock) || + if (ldlm_is_canceling(lock) || ldlm_is_converting(lock) || (ldlm_lock_remove_from_lru_check(lock, last_use) == 0)) { /* Another thread is removing lock from LRU, or * somebody is already doing CANCEL, or there diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 1b5e270..8b1a1ca 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -213,184 +213,261 @@ int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) return rc; } -int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, - void *data, int flag) +void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) { - struct lustre_handle lockh; + struct inode *inode = ll_inode_from_resource_lock(lock); + u64 bits = to_cancel; int rc; - switch (flag) { - case LDLM_CB_BLOCKING: - ldlm_lock2handle(lock, &lockh); - rc = ldlm_cli_cancel(&lockh, LCF_ASYNC); - if (rc < 0) { - CDEBUG(D_INODE, "ldlm_cli_cancel: rc = %d\n", rc); - return rc; - } - break; - case LDLM_CB_CANCELING: { - struct inode *inode = ll_inode_from_resource_lock(lock); - u64 bits = lock->l_policy_data.l_inodebits.bits; + if (!inode) + return; - if (!inode) - break; + if (!fid_res_name_eq(ll_inode2fid(inode), + &lock->l_resource->lr_name)) { + LDLM_ERROR(lock, + "data mismatch with object " DFID "(%p)", + PFID(ll_inode2fid(inode)), inode); + LBUG(); + } - /* Invalidate all dentries associated with this inode */ - LASSERT(ldlm_is_canceling(lock)); + if (bits & MDS_INODELOCK_XATTR) { + if (S_ISDIR(inode->i_mode)) + ll_i2info(inode)->lli_def_stripe_offset = -1; + ll_xattr_cache_destroy(inode); + bits &= ~MDS_INODELOCK_XATTR; + } - if (!fid_res_name_eq(ll_inode2fid(inode), - &lock->l_resource->lr_name)) { - LDLM_ERROR(lock, - "data mismatch with object " DFID "(%p)", - PFID(ll_inode2fid(inode)), inode); + /* For OPEN locks we differentiate between lock modes + * LCK_CR, LCK_CW, LCK_PR - bug 22891 + */ + if (bits & MDS_INODELOCK_OPEN) + ll_have_md_lock(inode, 
&bits, lock->l_req_mode); + + if (bits & MDS_INODELOCK_OPEN) { + fmode_t fmode; + + switch (lock->l_req_mode) { + case LCK_CW: + fmode = FMODE_WRITE; + break; + case LCK_PR: + fmode = FMODE_EXEC; + break; + case LCK_CR: + fmode = FMODE_READ; + break; + default: + LDLM_ERROR(lock, "bad lock mode for OPEN lock"); LBUG(); } - if (bits & MDS_INODELOCK_XATTR) { - if (S_ISDIR(inode->i_mode)) - ll_i2info(inode)->lli_def_stripe_offset = -1; - ll_xattr_cache_destroy(inode); - bits &= ~MDS_INODELOCK_XATTR; - } + ll_md_real_close(inode, fmode); - /* For OPEN locks we differentiate between lock modes - * LCK_CR, LCK_CW, LCK_PR - bug 22891 - */ - if (bits & MDS_INODELOCK_OPEN) - ll_have_md_lock(inode, &bits, lock->l_req_mode); - - if (bits & MDS_INODELOCK_OPEN) { - fmode_t fmode; - - switch (lock->l_req_mode) { - case LCK_CW: - fmode = FMODE_WRITE; - break; - case LCK_PR: - fmode = FMODE_EXEC; - break; - case LCK_CR: - fmode = FMODE_READ; - break; - default: - LDLM_ERROR(lock, "bad lock mode for OPEN lock"); - LBUG(); - } + bits &= ~MDS_INODELOCK_OPEN; + } - ll_md_real_close(inode, fmode); - } + if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | + MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM | + MDS_INODELOCK_DOM)) + ll_have_md_lock(inode, &bits, LCK_MINMODE); + + if (bits & MDS_INODELOCK_DOM) { + rc = ll_dom_lock_cancel(inode, lock); + if (rc < 0) + CDEBUG(D_INODE, "cannot flush DoM data " + DFID": rc = %d\n", + PFID(ll_inode2fid(inode)), rc); + lock_res_and_lock(lock); + ldlm_set_kms_ignore(lock); + unlock_res_and_lock(lock); + } - if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | - MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM | - MDS_INODELOCK_DOM)) - ll_have_md_lock(inode, &bits, LCK_MINMODE); - - if (bits & MDS_INODELOCK_DOM) { - rc = ll_dom_lock_cancel(inode, lock); - if (rc < 0) - CDEBUG(D_INODE, "cannot flush DoM data " - DFID": rc = %d\n", - PFID(ll_inode2fid(inode)), rc); - lock_res_and_lock(lock); - ldlm_set_kms_ignore(lock); - unlock_res_and_lock(lock); - 
bits &= ~MDS_INODELOCK_DOM; - } + if (bits & MDS_INODELOCK_LAYOUT) { + struct cl_object_conf conf = { + .coc_opc = OBJECT_CONF_INVALIDATE, + .coc_inode = inode, + }; - if (bits & MDS_INODELOCK_LAYOUT) { - struct cl_object_conf conf = { - .coc_opc = OBJECT_CONF_INVALIDATE, - .coc_inode = inode, - }; - - rc = ll_layout_conf(inode, &conf); - if (rc < 0) - CDEBUG(D_INODE, "cannot invalidate layout of " - DFID ": rc = %d\n", - PFID(ll_inode2fid(inode)), rc); - } + rc = ll_layout_conf(inode, &conf); + if (rc < 0) + CDEBUG(D_INODE, "cannot invalidate layout of " + DFID ": rc = %d\n", + PFID(ll_inode2fid(inode)), rc); + } - if (bits & MDS_INODELOCK_UPDATE) { - set_bit(LLIF_UPDATE_ATIME, - &ll_i2info(inode)->lli_flags); - } + if (bits & MDS_INODELOCK_UPDATE) + set_bit(LLIF_UPDATE_ATIME, + &ll_i2info(inode)->lli_flags); - if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) { - struct ll_inode_info *lli = ll_i2info(inode); + if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) { + struct ll_inode_info *lli = ll_i2info(inode); - CDEBUG(D_INODE, - "invalidating inode " DFID " lli = %p, pfid = " DFID "\n", - PFID(ll_inode2fid(inode)), lli, - PFID(&lli->lli_pfid)); + CDEBUG(D_INODE, + "invalidating inode "DFID" lli = %p, pfid = "DFID"\n", + PFID(ll_inode2fid(inode)), + lli, PFID(&lli->lli_pfid)); + truncate_inode_pages(inode->i_mapping, 0); - truncate_inode_pages(inode->i_mapping, 0); + if (unlikely(!fid_is_zero(&lli->lli_pfid))) { + struct inode *master_inode = NULL; + unsigned long hash; - if (unlikely(!fid_is_zero(&lli->lli_pfid))) { - struct inode *master_inode = NULL; - unsigned long hash; + /* + * This is slave inode, since all of the child dentry + * is connected on the master inode, so we have to + * invalidate the negative children on master inode + */ + CDEBUG(D_INODE, + "Invalidate s" DFID " m" DFID "\n", + PFID(ll_inode2fid(inode)), PFID(&lli->lli_pfid)); - /* - * This is slave inode, since all of the child - * dentry is connected on the master inode, 
so - * we have to invalidate the negative children - * on master inode - */ - CDEBUG(D_INODE, - "Invalidate s" DFID " m" DFID "\n", - PFID(ll_inode2fid(inode)), - PFID(&lli->lli_pfid)); - - hash = cl_fid_build_ino(&lli->lli_pfid, - ll_need_32bit_api(ll_i2sbi(inode))); - /* - * Do not lookup the inode with ilookup5, - * otherwise it will cause dead lock, - * - * 1. Client1 send chmod req to the MDT0, then - * on MDT0, it enqueues master and all of its - * slaves lock, (mdt_attr_set() -> - * mdt_lock_slaves()), after gets master and - * stripe0 lock, it will send the enqueue req - * (for stripe1) to MDT1, then MDT1 finds the - * lock has been granted to client2. Then MDT1 - * sends blocking ast to client2. - * - * 2. At the same time, client2 tries to unlink - * the striped dir (rm -rf striped_dir), and - * during lookup, it will hold the master inode - * of the striped directory, whose inode state - * is NEW, then tries to revalidate all of its - * slaves, (ll_prep_inode()->ll_iget()-> - * ll_read_inode2()-> ll_update_inode().). And - * it will be blocked on the server side because - * of 1. - * - * 3. Then the client get the blocking_ast req, - * cancel the lock, but being blocked if using - * ->ilookup5()), because master inode state is - * NEW. - */ - master_inode = ilookup5_nowait(inode->i_sb, - hash, - ll_test_inode_by_fid, - (void *)&lli->lli_pfid); - if (master_inode) { - ll_invalidate_negative_children(master_inode); - iput(master_inode); - } - } else { - ll_invalidate_negative_children(inode); + hash = cl_fid_build_ino(&lli->lli_pfid, + ll_need_32bit_api( + ll_i2sbi(inode))); + /* + * Do not lookup the inode with ilookup5, otherwise + * it will cause dead lock, + * 1. 
Client1 send chmod req to the MDT0, then on MDT0, + * it enqueues master and all of its slaves lock, + * (mdt_attr_set() -> mdt_lock_slaves()), after gets + * master and stripe0 lock, it will send the enqueue + * req (for stripe1) to MDT1, then MDT1 finds the lock + * has been granted to client2. Then MDT1 sends blocking + * ast to client2. + * 2. At the same time, client2 tries to unlink + * the striped dir (rm -rf striped_dir), and during + * lookup, it will hold the master inode of the striped + * directory, whose inode state is NEW, then tries to + * revalidate all of its slaves, (ll_prep_inode()-> + * ll_iget()->ll_read_inode2()-> ll_update_inode().). + * And it will be blocked on the server side because + * of 1. + * 3. Then the client get the blocking_ast req, cancel + * the lock, but being blocked if using ->ilookup5()), + * because master inode state is NEW. + */ + master_inode = ilookup5_nowait(inode->i_sb, hash, + ll_test_inode_by_fid, + (void *)&lli->lli_pfid); + if (master_inode) { + ll_invalidate_negative_children(master_inode); + iput(master_inode); } + } else { + ll_invalidate_negative_children(inode); } + } - if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) && - inode->i_sb->s_root && - !is_root_inode(inode)) - ll_invalidate_aliases(inode); + if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) && + inode->i_sb->s_root && + !is_root_inode(inode)) + ll_invalidate_aliases(inode); - iput(inode); + iput(inode); +} + +/* Check if the given lock may be downgraded instead of canceling and + * that convert is really needed. 
+ */ +int ll_md_need_convert(struct ldlm_lock *lock) +{ + struct inode *inode; + u64 wanted = lock->l_policy_data.l_inodebits.cancel_bits; + u64 bits = lock->l_policy_data.l_inodebits.bits & ~wanted; + enum ldlm_mode mode = LCK_MINMODE; + + if (!wanted || !bits || ldlm_is_cancel(lock)) + return 0; + + /* do not convert locks other than DOM for now */ + if (!((bits | wanted) & MDS_INODELOCK_DOM)) + return 0; + + /* We may have already remaining bits in some other lock so + * lock convert will leave us just extra lock for the same bit. + * Check if client has other lock with the same bits and the same + * or lower mode and don't convert if any. + */ + switch (lock->l_req_mode) { + case LCK_PR: + mode = LCK_PR; + /* fall-through */ + case LCK_PW: + mode |= LCK_CR; + break; + case LCK_CW: + mode = LCK_CW; + /* fall-through */ + case LCK_CR: + mode |= LCK_CR; break; + default: + /* do not convert other modes */ + return 0; } + + /* is lock is too old to be converted? */ + lock_res_and_lock(lock); + if (ktime_after(ktime_get(), + ktime_add(lock->l_last_used, + ktime_set(10, 0)))) { + unlock_res_and_lock(lock); + return 0; + } + unlock_res_and_lock(lock); + + inode = ll_inode_from_resource_lock(lock); + ll_have_md_lock(inode, &bits, mode); + iput(inode); + return !!(bits); +} + +int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, + void *data, int flag) +{ + struct lustre_handle lockh; + u64 bits = lock->l_policy_data.l_inodebits.bits; + int rc; + + switch (flag) { + case LDLM_CB_BLOCKING: + { + u64 cancel_flags = LCF_ASYNC; + + if (ll_md_need_convert(lock)) { + cancel_flags |= LCF_CONVERT; + /* For lock convert some cancel actions may require + * this lock with non-dropped canceled bits, e.g. page + * flush for DOM lock. So call ll_lock_cancel_bits() + * here while canceled bits are still set. 
+ */ + bits = lock->l_policy_data.l_inodebits.cancel_bits; + if (bits & MDS_INODELOCK_DOM) + ll_lock_cancel_bits(lock, MDS_INODELOCK_DOM); + } + ldlm_lock2handle(lock, &lockh); + rc = ldlm_cli_cancel(&lockh, cancel_flags); + if (rc < 0) { + CDEBUG(D_INODE, "ldlm_cli_cancel: rc = %d\n", rc); + return rc; + } + break; + } + case LDLM_CB_CANCELING: + if (ldlm_is_converting(lock)) { + /* this is called on already converted lock, so + * ibits has remained bits only and cancel_bits + * are bits that were dropped. + * Note that DOM lock is handled prior lock convert + * and is excluded here. + */ + bits = lock->l_policy_data.l_inodebits.cancel_bits & + ~MDS_INODELOCK_DOM; + } else { + LASSERT(ldlm_is_canceling(lock)); + } + ll_lock_cancel_bits(lock, bits); + break; default: LBUG(); } diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index c92663b..b14d301c 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -3027,7 +3027,7 @@ void lustre_assert_wire_constants(void) (long long)(int)sizeof(((struct ldlm_extent *)0)->gid)); /* Checks for struct ldlm_inodebits */ - LASSERTF((int)sizeof(struct ldlm_inodebits) == 8, "found %lld\n", + LASSERTF((int)sizeof(struct ldlm_inodebits) == 16, "found %lld\n", (long long)(int)sizeof(struct ldlm_inodebits)); LASSERTF((int)offsetof(struct ldlm_inodebits, bits) == 0, "found %lld\n", (long long)(int)offsetof(struct ldlm_inodebits, bits)); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 794e6d6..2403b89 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2120,6 +2120,7 @@ static inline bool ldlm_extent_equal(const struct ldlm_extent *ex1, struct ldlm_inodebits { __u64 bits; + __u64 cancel_bits; /* for lock convert */ }; struct ldlm_flock_wire { From patchwork Thu Feb 27 21:08:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
James Simmons X-Patchwork-Id: 11409729 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:27 -0500 Message-Id: <1582838290-17243-40-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 039/622] lustre: ptlrpc: fix return type of boolean 
functions Cc: Lustre Development List From: Andreas Dilger Some functions are returning type int with values 0 or 1 when they could be returning bool. Fix up the return type of: lustre_req_swabbed() lustre_rep_swabbed() ptlrpc_req_need_swab() ptlrpc_rep_need_swab() ptlrpc_buf_need_swab() WC-bug-id: https://jira.whamcloud.com/browse/LU-1644 Lustre-commit: e2cac9fb9baf ("LU-1644 ptlrpc: fix return type of boolean functions") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32088 Reviewed-by: John L. Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 20 ++++++++++---------- fs/lustre/ptlrpc/pack_generic.c | 9 ++++----- fs/lustre/ptlrpc/sec_plain.c | 7 +++---- 3 files changed, 17 insertions(+), 19 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 961b8cb..0231011 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -953,35 +953,35 @@ static inline bool ptlrpc_nrs_req_can_move(struct ptlrpc_request *req) /** @} nrs */ /** - * Returns 1 if request buffer at offset @index was already swabbed + * Returns true if request buffer at offset @index was already swabbed */ -static inline int lustre_req_swabbed(struct ptlrpc_request *req, size_t index) +static inline bool lustre_req_swabbed(struct ptlrpc_request *req, size_t index) { LASSERT(index < sizeof(req->rq_req_swab_mask) * 8); return req->rq_req_swab_mask & (1 << index); } /** - * Returns 1 if request reply buffer at offset @index was already swabbed + * Returns true if request reply buffer at offset @index was already swabbed */ -static inline int 
lustre_rep_swabbed(struct ptlrpc_request *req, size_t index) +static inline bool lustre_rep_swabbed(struct ptlrpc_request *req, size_t index) { LASSERT(index < sizeof(req->rq_rep_swab_mask) * 8); return req->rq_rep_swab_mask & (1 << index); } /** - * Returns 1 if request needs to be swabbed into local cpu byteorder + * Returns true if request needs to be swabbed into local cpu byteorder */ -static inline int ptlrpc_req_need_swab(struct ptlrpc_request *req) +static inline bool ptlrpc_req_need_swab(struct ptlrpc_request *req) { return lustre_req_swabbed(req, MSG_PTLRPC_HEADER_OFF); } /** - * Returns 1 if request reply needs to be swabbed into local cpu byteorder + * Returns true if request reply needs to be swabbed into local cpu byteorder */ -static inline int ptlrpc_rep_need_swab(struct ptlrpc_request *req) +static inline bool ptlrpc_rep_need_swab(struct ptlrpc_request *req) { return lustre_rep_swabbed(req, MSG_PTLRPC_HEADER_OFF); } @@ -1999,8 +1999,8 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf, * * @{ */ -int ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, - u32 index); +bool ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, + u32 index); void ptlrpc_buf_set_swabbed(struct ptlrpc_request *req, const int inout, u32 index); int ptlrpc_unpack_rep_msg(struct ptlrpc_request *req, int len); diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index bc5e513..9cea826 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -78,15 +78,14 @@ void ptlrpc_buf_set_swabbed(struct ptlrpc_request *req, const int inout, lustre_set_rep_swabbed(req, index); } -int ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, - u32 index) +bool ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, + u32 index) { if (inout) return (ptlrpc_req_need_swab(req) && !lustre_req_swabbed(req, index)); - else - return (ptlrpc_rep_need_swab(req) && - 
!lustre_rep_swabbed(req, index)); + + return (ptlrpc_rep_need_swab(req) && !lustre_rep_swabbed(req, index)); } /* early reply size */ diff --git a/fs/lustre/ptlrpc/sec_plain.c b/fs/lustre/ptlrpc/sec_plain.c index 2358c3f..93a9a17 100644 --- a/fs/lustre/ptlrpc/sec_plain.c +++ b/fs/lustre/ptlrpc/sec_plain.c @@ -217,7 +217,7 @@ int plain_ctx_verify(struct ptlrpc_cli_ctx *ctx, struct ptlrpc_request *req) struct lustre_msg *msg = req->rq_repdata; struct plain_header *phdr; u32 cksum; - int swabbed; + bool swabbed; if (msg->lm_bufcount != PLAIN_PACK_SEGMENTS) { CERROR("unexpected reply buf count %u\n", msg->lm_bufcount); @@ -715,12 +715,11 @@ int plain_enlarge_reqbuf(struct ptlrpc_sec *sec, .sc_policy = &plain_policy, }; -static -int plain_accept(struct ptlrpc_request *req) +static int plain_accept(struct ptlrpc_request *req) { struct lustre_msg *msg = req->rq_reqbuf; struct plain_header *phdr; - int swabbed; + bool swabbed; LASSERT(SPTLRPC_FLVR_POLICY(req->rq_flvr.sf_rpc) == SPTLRPC_POLICY_PLAIN); From patchwork Thu Feb 27 21:08:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409733 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A0CE2138D for ; Thu, 27 Feb 2020 21:20:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 88ADE246A1 for ; Thu, 27 Feb 2020 21:20:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 88ADE246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 882E221CB88; Thu, 27 Feb 2020 13:19:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3D93821FA6E for ; Thu, 27 Feb 2020 13:18:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A3C3BA1C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A204746C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:28 -0500 Message-Id: <1582838290-17243-41-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 040/622] lustre: llite: decrease sa_running if fail to start statahead X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Fan Yong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Fan Yong Otherwise the ll_sb_info::ll_sa_running counter will leak, and the umount process will be blocked forever.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10992 Lustre-commit: 6b8638bf7920 ("LU-10992 llite: decrease sa_running if fail to start statahead") Signed-off-by: Fan Yong Reviewed-on: https://review.whamcloud.com/32287 Reviewed-by: Lai Siyao Reviewed-by: Bobi Jam Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/statahead.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 4a61dac..122b9d8 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -1566,6 +1566,7 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry) spin_lock(&lli->lli_sa_lock); lli->lli_sai = NULL; spin_unlock(&lli->lli_sa_lock); + atomic_dec(&ll_i2sbi(parent->d_inode)->ll_sa_running); rc = PTR_ERR(task); CERROR("can't start ll_sa thread, rc : %d\n", rc); goto out; From patchwork Thu Feb 27 21:08:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409737 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B7C2D14BC for ; Thu, 27 Feb 2020 21:20:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A02F5246A1 for ; Thu, 27 Feb 2020 21:20:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A02F5246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A7A0921FB29; Thu, 27 Feb 
2020 13:19:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8044521FAC3 for ; Thu, 27 Feb 2020 13:18:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A666AA1D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A501546F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:29 -0500 Message-Id: <1582838290-17243-42-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 041/622] lustre: lmv: dir page is released while in use X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao When popping a stripe dirent, if the end of the page is reached, stripe_dirent_next() releases the current page and then reads the next one. However, the current dirent is still in use, so wrong values are used and an assertion is triggered. This patch no longer reads the next page upon reaching the end of a page, but leaves that to the next dirent read. WC-bug-id: https://jira.whamcloud.com/browse/LU-9857 Lustre-commit: b51e8d6b53a3 ("LU-9857 lmv: dir page is released while in use") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/32180 Reviewed-by: Fan Yong Reviewed-by: John L.
Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 123 +++++++++++++++++++++++------------------------- 1 file changed, 60 insertions(+), 63 deletions(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index d0f626f..c7bf8c7 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -2016,7 +2016,7 @@ struct lmv_dir_ctxt { struct stripe_dirent ldc_stripes[0]; }; -static inline void put_stripe_dirent(struct stripe_dirent *stripe) +static inline void stripe_dirent_unload(struct stripe_dirent *stripe) { if (stripe->sd_page) { kunmap(stripe->sd_page); @@ -2031,62 +2031,77 @@ static inline void put_lmv_dir_ctxt(struct lmv_dir_ctxt *ctxt) int i; for (i = 0; i < ctxt->ldc_count; i++) - put_stripe_dirent(&ctxt->ldc_stripes[i]); + stripe_dirent_unload(&ctxt->ldc_stripes[i]); } -static struct lu_dirent *stripe_dirent_next(struct lmv_dir_ctxt *ctxt, +/* if @ent is dummy, or . .., get next */ +static struct lu_dirent *stripe_dirent_get(struct lmv_dir_ctxt *ctxt, + struct lu_dirent *ent, + int stripe_index) +{ + for (; ent; ent = lu_dirent_next(ent)) { + /* Skip dummy entry */ + if (le16_to_cpu(ent->lde_namelen) == 0) + continue; + + /* skip . and .. 
for other stripes */ + if (stripe_index && + (strncmp(ent->lde_name, ".", + le16_to_cpu(ent->lde_namelen)) == 0 || + strncmp(ent->lde_name, "..", + le16_to_cpu(ent->lde_namelen)) == 0)) + continue; + + if (le64_to_cpu(ent->lde_hash) >= ctxt->ldc_hash) + break; + } + + return ent; +} + +static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt, struct stripe_dirent *stripe, int stripe_index) { + struct md_op_data *op_data = ctxt->ldc_op_data; + struct lmv_oinfo *oinfo; + struct lu_fid fid = op_data->op_fid1; + struct inode *inode = op_data->op_data; + struct lmv_tgt_desc *tgt; struct lu_dirent *ent = stripe->sd_ent; u64 hash = ctxt->ldc_hash; - u64 end; int rc = 0; LASSERT(stripe == &ctxt->ldc_stripes[stripe_index]); - - if (stripe->sd_eof) - return NULL; - - if (ent) { - ent = lu_dirent_next(ent); - if (!ent) { -check_eof: - end = le64_to_cpu(stripe->sd_dp->ldp_hash_end); - - LASSERTF(hash <= end, "hash %llx end %llx\n", - hash, end); + LASSERT(!ent); + + do { + if (stripe->sd_page) { + u64 end = le64_to_cpu(stripe->sd_dp->ldp_hash_end); + + /* @hash should be the last dirent hash */ + LASSERTF(hash <= end, + "ctxt@%p stripe@%p hash %llx end %llx\n", + ctxt, stripe, hash, end); + /* unload last page */ + stripe_dirent_unload(stripe); + /* eof */ if (end == MDS_DIR_END_OFF) { stripe->sd_ent = NULL; stripe->sd_eof = true; - return NULL; + break; } - - put_stripe_dirent(stripe); hash = end; } - } - - if (!ent) { - struct md_op_data *op_data = ctxt->ldc_op_data; - struct lmv_oinfo *oinfo; - struct lu_fid fid = op_data->op_fid1; - struct inode *inode = op_data->op_data; - struct lmv_tgt_desc *tgt; - - LASSERT(!stripe->sd_page); oinfo = &op_data->op_mea1->lsm_md_oinfo[stripe_index]; tgt = lmv_get_target(ctxt->ldc_lmv, oinfo->lmo_mds, NULL); if (IS_ERR(tgt)) { rc = PTR_ERR(tgt); - goto out; + break; } - /* - * op_data will be shared by each stripe, so we need - * reset these value for each stripe - */ + /* op_data is shared by stripes, reset after use */ 
op_data->op_fid1 = oinfo->lmo_fid; op_data->op_fid2 = oinfo->lmo_fid; op_data->op_data = oinfo->lmo_root; @@ -2099,42 +2114,24 @@ static struct lu_dirent *stripe_dirent_next(struct lmv_dir_ctxt *ctxt, op_data->op_data = inode; if (rc) - goto out; - - stripe->sd_dp = page_address(stripe->sd_page); - ent = lu_dirent_start(stripe->sd_dp); - } - - for (; ent; ent = lu_dirent_next(ent)) { - /* Skip dummy entry */ - if (!le16_to_cpu(ent->lde_namelen)) - continue; - - /* skip . and .. for other stripes */ - if (stripe_index && - (strncmp(ent->lde_name, ".", - le16_to_cpu(ent->lde_namelen)) == 0 || - strncmp(ent->lde_name, "..", - le16_to_cpu(ent->lde_namelen)) == 0)) - continue; - - if (le64_to_cpu(ent->lde_hash) >= hash) break; - } - if (!ent) - goto check_eof; + stripe->sd_dp = page_address(stripe->sd_page); + ent = stripe_dirent_get(ctxt, lu_dirent_start(stripe->sd_dp), + stripe_index); + /* in case a page filled with ., .. and dummy, read next */ + } while (!ent); -out: stripe->sd_ent = ent; - /* treat error as eof, so dir can be partially accessed */ if (rc) { - put_stripe_dirent(stripe); + LASSERT(!ent); + /* treat error as eof, so dir can be partially accessed */ stripe->sd_eof = true; LCONSOLE_WARN("dir " DFID " stripe %d readdir failed: %d, directory is partially accessed!\n", PFID(&ctxt->ldc_op_data->op_fid1), stripe_index, rc); } + return ent; } @@ -2186,8 +2183,7 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt) continue; if (!stripe->sd_ent) { - /* locate starting entry */ - stripe_dirent_next(ctxt, stripe, i); + stripe_dirent_load(ctxt, stripe, i); if (!stripe->sd_ent) { LASSERT(stripe->sd_eof); continue; @@ -2208,7 +2204,8 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt) stripe = &ctxt->ldc_stripes[min]; ent = stripe->sd_ent; /* pop found dirent */ - stripe_dirent_next(ctxt, stripe, min); + stripe->sd_ent = stripe_dirent_get(ctxt, lu_dirent_next(ent), + min); } return ent; From patchwork Thu Feb 27 21:08:30 2020 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409741 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 762D0138D for ; Thu, 27 Feb 2020 21:20:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5F38B2469F for ; Thu, 27 Feb 2020 21:20:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5F38B2469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AC06C21FAC1; Thu, 27 Feb 2020 13:20:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D6AB021FA55 for ; Thu, 27 Feb 2020 13:18:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A9844A1E; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A827C468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:30 -0500 Message-Id: <1582838290-17243-43-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 042/622] lustre: ldlm: speed up preparation for list of lock cancel X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Sergey Cheremencev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng Keeping the skipped locks in the LRU list causes serious contention for ns_lock, since we have to traverse them every time in ldlm_prepare_lru_list(). So use a cursor to record the position of the last accessed lock in the LRU list. WC-bug-id: https://jira.whamcloud.com/browse/LU-9230 Lustre-commit: 651f2cdd2d8d ("LU-9230 ldlm: speed up preparation for list of lock cancel") Signed-off-by: Yang Sheng Signed-off-by: Sergey Cheremencev Reviewed-on: https://review.whamcloud.com/26327 Reviewed-by: Fan Yong Reviewed-by: Vitaly Fertman Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 1 + fs/lustre/include/lustre_dlm_flags.h | 9 ----- fs/lustre/ldlm/ldlm_lock.c | 3 +- fs/lustre/ldlm/ldlm_request.c | 72 ++++++++++++++++-------------------- fs/lustre/ldlm/ldlm_resource.c | 1 + 5 files changed, 35 insertions(+), 51 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 66608a9..1a19b35 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -406,6 +406,7 @@ struct ldlm_namespace { struct list_head ns_unused_list; /** Number of locks in the LRU list above */ int ns_nr_unused; + struct list_head *ns_last_pos; /** * Maximum number of locks permitted in the LRU.
If 0, means locks diff --git a/fs/lustre/include/lustre_dlm_flags.h b/fs/lustre/include/lustre_dlm_flags.h index c8667c8..3d69c49 100644 --- a/fs/lustre/include/lustre_dlm_flags.h +++ b/fs/lustre/include/lustre_dlm_flags.h @@ -200,15 +200,6 @@ #define ldlm_set_fail_loc(_l) LDLM_SET_FLAG((_l), 1ULL << 32) #define ldlm_clear_fail_loc(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 32) -/** - * Used while processing the unused list to know that we have already - * handled this lock and decided to skip it. - */ -#define LDLM_FL_SKIPPED 0x0000000200000000ULL /* bit 33 */ -#define ldlm_is_skipped(_l) LDLM_TEST_FLAG((_l), 1ULL << 33) -#define ldlm_set_skipped(_l) LDLM_SET_FLAG((_l), 1ULL << 33) -#define ldlm_clear_skipped(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 33) - /** this lock is being destroyed */ #define LDLM_FL_CBPENDING 0x0000000400000000ULL /* bit 34 */ #define ldlm_is_cbpending(_l) LDLM_TEST_FLAG((_l), 1ULL << 34) diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 9847c43..894b99b 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -204,6 +204,8 @@ int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock) struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); LASSERT(lock->l_resource->lr_type != LDLM_FLOCK); + if (ns->ns_last_pos == &lock->l_lru) + ns->ns_last_pos = lock->l_lru.prev; list_del_init(&lock->l_lru); LASSERT(ns->ns_nr_unused > 0); ns->ns_nr_unused--; @@ -249,7 +251,6 @@ void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock) LASSERT(list_empty(&lock->l_lru)); LASSERT(lock->l_resource->lr_type != LDLM_FLOCK); list_add_tail(&lock->l_lru, &ns->ns_unused_list); - ldlm_clear_skipped(lock); LASSERT(ns->ns_nr_unused >= 0); ns->ns_nr_unused++; } diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 5ec0da5..dd4d958 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1368,9 +1368,6 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, /* fall through */ 
default: result = LDLM_POLICY_SKIP_LOCK; - lock_res_and_lock(lock); - ldlm_set_skipped(lock); - unlock_res_and_lock(lock); break; } @@ -1592,54 +1589,47 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, int flags) { ldlm_cancel_lru_policy_t pf; - struct ldlm_lock *lock, *next; - int added = 0, unused, remained; + int added = 0; int no_wait = flags & LDLM_LRU_FLAG_NO_WAIT; - spin_lock(&ns->ns_lock); - unused = ns->ns_nr_unused; - remained = unused; - if (!ns_connect_lru_resize(ns)) - count += unused - ns->ns_max_unused; + count += ns->ns_nr_unused - ns->ns_max_unused; pf = ldlm_cancel_lru_policy(ns, flags); LASSERT(pf); - while (!list_empty(&ns->ns_unused_list)) { + /* For any flags, stop scanning if @max is reached. */ + while (!list_empty(&ns->ns_unused_list) && (max == 0 || added < max)) { + struct ldlm_lock *lock; + struct list_head *item, *next; enum ldlm_policy_res result; ktime_t last_use = ktime_set(0, 0); - /* all unused locks */ - if (remained-- <= 0) - break; - - /* For any flags, stop scanning if @max is reached. */ - if (max && added >= max) - break; + spin_lock(&ns->ns_lock); + item = no_wait ? ns->ns_last_pos : &ns->ns_unused_list; + for (item = item->next, next = item->next; + item != &ns->ns_unused_list; + item = next, next = item->next) { + lock = list_entry(item, struct ldlm_lock, l_lru); - list_for_each_entry_safe(lock, next, &ns->ns_unused_list, - l_lru) { /* No locks which got blocking requests. */ LASSERT(!ldlm_is_bl_ast(lock)); - if (no_wait && ldlm_is_skipped(lock)) - /* already processed */ - continue; - - last_use = lock->l_last_used; - - /* Somebody is already doing CANCEL. No need for this - * lock in LRU, do not traverse it again. - */ if (!ldlm_is_canceling(lock) || !ldlm_is_converting(lock)) break; + /* Somebody is already doing CANCEL. No need for this + * lock in LRU, do not traverse it again. 
+ */ ldlm_lock_remove_from_lru_nolock(lock); } - if (&lock->l_lru == &ns->ns_unused_list) + if (item == &ns->ns_unused_list) { + spin_unlock(&ns->ns_lock); break; + } + + last_use = lock->l_last_used; LDLM_LOCK_GET(lock); spin_unlock(&ns->ns_lock); @@ -1659,19 +1649,23 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, * their weight. Big extent locks will stay in * the cache. */ - result = pf(ns, lock, unused, added, count); + result = pf(ns, lock, ns->ns_nr_unused, added, count); if (result == LDLM_POLICY_KEEP_LOCK) { - lu_ref_del(&lock->l_reference, - __func__, current); + lu_ref_del(&lock->l_reference, __func__, current); LDLM_LOCK_RELEASE(lock); - spin_lock(&ns->ns_lock); break; } + if (result == LDLM_POLICY_SKIP_LOCK) { - lu_ref_del(&lock->l_reference, - __func__, current); + lu_ref_del(&lock->l_reference, __func__, current); LDLM_LOCK_RELEASE(lock); - spin_lock(&ns->ns_lock); + if (no_wait) { + spin_lock(&ns->ns_lock); + if (!list_empty(&lock->l_lru) && + lock->l_lru.prev == ns->ns_last_pos) + ns->ns_last_pos = &lock->l_lru; + spin_unlock(&ns->ns_lock); + } continue; } @@ -1690,7 +1684,6 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, lu_ref_del(&lock->l_reference, __func__, current); LDLM_LOCK_RELEASE(lock); - spin_lock(&ns->ns_lock); continue; } LASSERT(!lock->l_readers && !lock->l_writers); @@ -1728,11 +1721,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, list_add(&lock->l_bl_ast, cancels); unlock_res_and_lock(lock); lu_ref_del(&lock->l_reference, __func__, current); - spin_lock(&ns->ns_lock); added++; - unused--; } - spin_unlock(&ns->ns_lock); return added; } diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 5e0dd53..7fe8a8b 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -682,6 +682,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, ns->ns_connect_flags = 0; ns->ns_dirty_age_limit = LDLM_DIRTY_AGE_LIMIT; 
ns->ns_stopping = 0; + ns->ns_last_pos = &ns->ns_unused_list; rc = ldlm_namespace_sysfs_register(ns); if (rc != 0) { From patchwork Thu Feb 27 21:08:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409711 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0A2D159A for ; Thu, 27 Feb 2020 21:19:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D98EF246A1 for ; Thu, 27 Feb 2020 21:19:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D98EF246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69D9321FCDA; Thu, 27 Feb 2020 13:19:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3A89421FACC for ; Thu, 27 Feb 2020 13:18:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AC4C6B89; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AAF0D46D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:31 -0500 Message-Id: 
<1582838290-17243-44-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 043/622] lustre: checksum: enable/disable checksum correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Emoly Liu
There are three ways to set checksum support in Lustre. Their order during client mount is:
1. configure --enable/disable-checksum: this (ENABLE_CHECKSUM) only affects the default mount option and is set in client_obd_setup().
2. lctl set_param -P osc.*.checksums=0/1: when the llog is processed, this value is set by osc_checksum_seq_write().
3. mount option checksum/nochecksum: this is checked in ll_options() and set in client_common_fill_super()->obd_set_info_async().
This patch fixes one issue in 3: if the mount option "-o checksum/nochecksum" is specified, the checksum setting is changed accordingly, regardless of what was set by "set_param -P" or the default option; if no mount option is specified, the value set by "set_param -P" is kept. Also, test_77k is added to sanity.sh to verify this patch. In addition, a minor initialization issue of cl_supp_cksum_types is fixed: cl_supp_cksum_types should always be initialized, whether or not checksums are enabled.
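The mount-time ordering described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the Lustre code: struct client_cfg, struct mount_opts and all function names here are hypothetical, and only the checksum_set field mirrors the idea of the ll_checksum_set bit that the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the client state; not a Lustre structure. */
struct client_cfg {
	bool checksum;		/* effective checksum setting */
};

/* Hypothetical stand-in for the parsed mount options. */
struct mount_opts {
	bool checksum;		/* value requested by -o checksum/nochecksum */
	bool checksum_set;	/* true only if either option was given */
};

/* Step 1: build-time default. */
static void apply_default(struct client_cfg *cfg, bool build_default)
{
	cfg->checksum = build_default;
}

/* Step 2: persistent "set_param -P" value replayed from the config log. */
static void apply_set_param(struct client_cfg *cfg, bool value)
{
	cfg->checksum = value;
}

/* Step 3: the mount option overrides only when it was explicitly given. */
static void apply_mount_option(struct client_cfg *cfg,
			       const struct mount_opts *opts)
{
	assert(cfg != NULL && opts != NULL);
	if (opts->checksum_set)
		cfg->checksum = opts->checksum;
}

/* Run the three steps in mount order and return the effective setting. */
bool effective_checksum(bool build_default, bool have_param,
			bool param_value, const struct mount_opts *opts)
{
	struct client_cfg cfg;

	apply_default(&cfg, build_default);
	if (have_param)
		apply_set_param(&cfg, param_value);
	apply_mount_option(&cfg, opts);
	return cfg.checksum;
}
```

With no mount option given, the value from step 2 survives; with "-o nochecksum" given, step 3 wins regardless of the earlier steps.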
WC-bug-id: https://jira.whamcloud.com/browse/LU-10906 Lustre-commit: e9b13cd1daf9 ("LU-10906 checksum: enable/disable checksum correctly") Signed-off-by: Emoly Liu Reviewed-on: https://review.whamcloud.com/32095 Reviewed-by: Yingjin Qian Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lib.c | 5 +++-- fs/lustre/llite/llite_internal.h | 3 ++- fs/lustre/llite/llite_lib.c | 23 ++++++++++++++--------- 3 files changed, 19 insertions(+), 12 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 7bc1d10..2c0fad3 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -355,6 +355,8 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) init_waitqueue_head(&cli->cl_destroy_waitq); atomic_set(&cli->cl_destroy_in_flight, 0); + + cli->cl_supp_cksum_types = OBD_CKSUM_CRC32; /* Turn on checksumming by default. */ cli->cl_checksum = 1; /* @@ -362,8 +364,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) * Set cl_chksum* to CRC32 for now to avoid returning screwed info * through procfs. 
*/ - cli->cl_cksum_type = OBD_CKSUM_CRC32; - cli->cl_supp_cksum_types = OBD_CKSUM_CRC32; + cli->cl_cksum_type = cli->cl_supp_cksum_types; atomic_set(&cli->cl_resends, OSC_DEFAULT_RESENDS); /* diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index d0a703d..6bdbf28 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -479,7 +479,8 @@ struct ll_sb_info { unsigned int ll_umounting:1, ll_xattr_cache_enabled:1, ll_xattr_cache_set:1, /* already set to 0/1 */ - ll_client_common_fill_super_succeeded:1; + ll_client_common_fill_super_succeeded:1, + ll_checksum_set:1; struct lustre_client_ocd ll_lco; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index e2c7a4d..eb29064 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -560,13 +560,15 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) } checksum = sbi->ll_flags & LL_SBI_CHECKSUM; - err = obd_set_info_async(NULL, sbi->ll_dt_exp, sizeof(KEY_CHECKSUM), - KEY_CHECKSUM, sizeof(checksum), &checksum, - NULL); - if (err) { - CERROR("%s: Set checksum failed: rc = %d\n", - sbi->ll_dt_exp->exp_obd->obd_name, err); - goto out_root; + if (sbi->ll_checksum_set) { + err = obd_set_info_async(NULL, sbi->ll_dt_exp, + sizeof(KEY_CHECKSUM), KEY_CHECKSUM, + sizeof(checksum), &checksum, NULL); + if (err) { + CERROR("%s: Set checksum failed: rc = %d\n", + sbi->ll_dt_exp->exp_obd->obd_name, err); + goto out_root; + } } cl_sb_init(sb); @@ -763,10 +765,11 @@ static inline int ll_set_opt(const char *opt, char *data, int fl) } /* non-client-specific mount options are parsed in lmd_parse */ -static int ll_options(char *options, int *flags) +static int ll_options(char *options, struct ll_sb_info *sbi) { int tmp; char *s1 = options, *s2; + int *flags = &sbi->ll_flags; if (!options) return 0; @@ -832,11 +835,13 @@ static int ll_options(char *options, int *flags) tmp = ll_set_opt("checksum", s1, 
LL_SBI_CHECKSUM); if (tmp) { *flags |= tmp; + sbi->ll_checksum_set = 1; goto next; } tmp = ll_set_opt("nochecksum", s1, LL_SBI_CHECKSUM); if (tmp) { *flags &= ~tmp; + sbi->ll_checksum_set = 1; goto next; } tmp = ll_set_opt("lruresize", s1, LL_SBI_LRU_RESIZE); @@ -971,7 +976,7 @@ int ll_fill_super(struct super_block *sb) goto out_free; } - err = ll_options(lsi->lsi_lmd->lmd_opts, &sbi->ll_flags); + err = ll_options(lsi->lsi_lmd->lmd_opts, sbi); if (err) goto out_free; From patchwork Thu Feb 27 21:08:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409745 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 64F7014BC for ; Thu, 27 Feb 2020 21:21:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4D5A22469F for ; Thu, 27 Feb 2020 21:21:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4D5A22469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E507321FD07; Thu, 27 Feb 2020 13:20:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9041F21FA61 for ; Thu, 27 Feb 2020 13:18:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by 
smtp3.ccs.ornl.gov (Postfix) with ESMTP id AF50DBA9; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ADE5746A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:32 -0500 Message-Id: <1582838290-17243-45-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 044/622] lustre: build: armv7 client build fixes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Perepechko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andrew Perepechko This commit fixes the armv7 Lustre client build. Most of the changes replace plain 64-bit division with div_u64().
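As a rough illustration of why this matters: on 32-bit ARM, a plain `/` with a 64-bit operand is compiled into a call to a libgcc helper (such as `__aeabi_uldivmod`) that the kernel does not link against, so the build fails. The kernel's div_u64() divides a 64-bit value by a 32-bit one instead. The sketch below is a userspace stand-in, not kernel code: `sketch_div_u64()`, `idle_seconds()` and `SKETCH_NSEC_PER_SEC` are hypothetical names, and the code only mirrors the shape of the ldlm_request.c hunk.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000U

/* Hypothetical userspace stand-in for the kernel's div_u64():
 * 64-bit dividend, 32-bit divisor, 64-bit quotient.
 */
static uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Mirrors "la = div_u64(ktime_to_ns(...), NSEC_PER_SEC)": convert an
 * idle time in nanoseconds to whole seconds, truncating the remainder.
 */
static uint64_t idle_seconds(uint64_t idle_ns)
{
	return sketch_div_u64(idle_ns, SKETCH_NSEC_PER_SEC);
}
```

The result is truncated, exactly as with the `/` operator it replaces, so the arithmetic is unchanged and only the way the division is emitted differs.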
WC-bug-id: https://jira.whamcloud.com/browse/LU-10964 Lustre-commit: 0300a6efd226 ("LU-10964 build: armv7 client build fixes") Signed-off-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/32194 Reviewed-by: James Simmons Reviewed-by: Alexander Zarochentsev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 3 ++- fs/lustre/ptlrpc/import.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index dd4d958..3991a8f 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1408,7 +1408,8 @@ static enum ldlm_policy_res ldlm_cancel_lrur_policy(struct ldlm_namespace *ns, slv = ldlm_pool_get_slv(pl); lvf = ldlm_pool_get_lvf(pl); - la = ktime_to_ns(ktime_sub(cur, lock->l_last_used)) / NSEC_PER_SEC; + la = div_u64(ktime_to_ns(ktime_sub(cur, lock->l_last_used)), + NSEC_PER_SEC); lv = lvf * la * unused; /* Inform pool about current CLV to see it via debugfs. 
*/ diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index f69b907..5d6546d 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -289,7 +289,7 @@ void ptlrpc_invalidate_import(struct obd_import *imp) */ if (!OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_LONG_REPL_UNLINK)) { timeout = ptlrpc_inflight_timeout(imp); - timeout += timeout / 3; + timeout += div_u64(timeout, 3); if (timeout == 0) timeout = obd_timeout; From patchwork Thu Feb 27 21:08:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409715 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 428F214BC for ; Thu, 27 Feb 2020 21:20:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2B0B4246A1 for ; Thu, 27 Feb 2020 21:20:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2B0B4246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 002CE21FCFD; Thu, 27 Feb 2020 13:19:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D447121FA61 for ; Thu, 27 Feb 2020 13:18:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 
B244EE01; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B0D4F46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:33 -0500 Message-Id: <1582838290-17243-46-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 045/622] lustre: ldlm: fix l_last_activity usage X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko When a race happens between ldlm_server_blocking_ast() and ldlm_request_cancel(), at_measured() is called with a wrong value equal to the current time. Even worse, ldlm_bl_timeout() can return current_time*1.5. Before the time functions were fixed for 64-bit by LU-9019 (fdeeed2fb), this race led to ETIMEDOUT in ptlrpc_import_delay_req() and to client eviction while sending the blocking AST. The wrong type conversion takes place in ptlrpc_send_limit_expired(), at cfs_time_seconds(). We should not take cancels into account if the BLAST has not been sent, because last_activity is then not properly initialised; doing so destroys the AT completely. The patch divides l_last_activity into the client l_activity and the server l_blast_sent for better understanding. l_blast_sent is used only for blocking ASTs, to measure the time between the BLAST and the cancel request.
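A minimal sketch of the failure mode, in userspace C with hypothetical names (`sketch_lock`, `naive_delay()`, `blast_delay()`). It is not the server code, which is not part of this client patch. If the "BLAST sent" timestamp is still zero when the cancel arrives, `now - 0` yields the current time in seconds rather than a delay, which is the kind of value that shows up in the log excerpt that follows.

```c
#include <stdint.h>

/* Hypothetical, simplified lock: only the timestamp this sketch needs. */
struct sketch_lock {
	int64_t blast_sent;	/* seconds; 0 means no BLAST was sent */
};

/* Naive measurement: yields "now" when blast_sent was never set. */
static int64_t naive_delay(const struct sketch_lock *lock, int64_t now)
{
	return now - lock->blast_sent;
}

/* Guarded measurement: returns -1 when there is nothing to measure,
 * so the caller can skip feeding the adaptive timeout.
 */
static int64_t blast_delay(const struct sketch_lock *lock, int64_t now)
{
	if (lock->blast_sent == 0)
		return -1;
	return now - lock->blast_sent;
}
```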
For example: server cancels blocked lock after 1518731697s waiting_locks_callback()) ### lock callback timer expired after 0s: evicting client WC-bug-id: https://jira.whamcloud.com/browse/LU-10945 Lustre-commit: e09d273cb5f2 ("LU-10945 ldlm: fix l_last_activity usage") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-5736 Reviewed-on: https://review.whamcloud.com/32133 Reviewed-by: Andreas Dilger Reviewed-by: Vitaly Fertman Reviewed-by: James Simmons Reviewed-by: Mikhal Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 13 +++++++------ fs/lustre/ldlm/ldlm_lock.c | 1 + fs/lustre/ldlm/ldlm_request.c | 14 +++++++------- 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 1a19b35..6ad12a3 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -708,12 +708,6 @@ struct ldlm_lock { wait_queue_head_t l_waitq; /** - * Seconds. It will be updated if there is any activity related to - * the lock, e.g. enqueue the lock or send blocking AST. - */ - time64_t l_last_activity; - - /** * Time, in nanoseconds, last used by e.g. being matched by lock match. */ ktime_t l_last_used; @@ -735,6 +729,13 @@ struct ldlm_lock { /** Private storage for lock user. Opaque to LDLM. */ void *l_ast_data; + + /** + * Seconds. It will be updated if there is any activity related to + * the lock at client, e.g. enqueue the lock. + */ + time64_t l_activity; + /* Separate ost_lvb used mostly by Data-on-MDT for now. * It is introduced to don't mix with layout lock data. 
*/ diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 894b99b..1bf387a 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -420,6 +420,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource) lu_ref_init(&lock->l_reference); lu_ref_add(&lock->l_reference, "hash", lock); lock->l_callback_timeout = 0; + lock->l_activity = 0; #if LUSTRE_TRACKS_LOCK_EXP_REFS INIT_LIST_HEAD(&lock->l_exp_refs_link); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 3991a8f..67c23fc 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -114,9 +114,9 @@ static void ldlm_expired_completion_wait(struct ldlm_lock *lock, u32 conn_cnt) LDLM_ERROR(lock, "lock timed out (enqueued at %lld, %llds ago); not entering recovery in server code, just going back to sleep", - (s64)lock->l_last_activity, + (s64)lock->l_activity, (s64)(ktime_get_real_seconds() - - lock->l_last_activity)); + lock->l_activity)); if (ktime_get_seconds() > next_dump) { last_dump = next_dump; next_dump = ktime_get_seconds() + 300; @@ -133,8 +133,8 @@ static void ldlm_expired_completion_wait(struct ldlm_lock *lock, u32 conn_cnt) ptlrpc_fail_import(imp, conn_cnt); LDLM_ERROR(lock, "lock timed out (enqueued at %lld, %llds ago), entering recovery for %s@%s", - (s64)lock->l_last_activity, - (s64)(ktime_get_real_seconds() - lock->l_last_activity), + (s64)lock->l_activity, + (s64)(ktime_get_real_seconds() - lock->l_activity), obd2cli_tgt(obd), imp->imp_connection->c_remote_uuid.uuid); } @@ -182,7 +182,7 @@ static int ldlm_completion_tail(struct ldlm_lock *lock, void *data) LDLM_DEBUG(lock, "client-side enqueue: granted"); } else { /* Take into AT only CP RPC, not immediately granted locks */ - delay = ktime_get_real_seconds() - lock->l_last_activity; + delay = ktime_get_real_seconds() - lock->l_activity; LDLM_DEBUG(lock, "client-side enqueue: granted after %lds", delay); @@ -245,7 +245,7 @@ int 
ldlm_completion_ast(struct ldlm_lock *lock, u64 flags, void *data) timeout = ldlm_cp_timeout(lock); - lock->l_last_activity = ktime_get_real_seconds(); + lock->l_activity = ktime_get_real_seconds(); if (imp) { spin_lock(&imp->imp_lock); @@ -725,7 +725,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, lock->l_export = NULL; lock->l_blocking_ast = einfo->ei_cb_bl; lock->l_flags |= (*flags & (LDLM_FL_NO_LRU | LDLM_FL_EXCL)); - lock->l_last_activity = ktime_get_real_seconds(); + lock->l_activity = ktime_get_real_seconds(); /* lock not sent to server yet */ if (!reqp || !*reqp) { From patchwork Thu Feb 27 21:08:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409719 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 09AA6138D for ; Thu, 27 Feb 2020 21:20:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E506F246A1 for ; Thu, 27 Feb 2020 21:20:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E506F246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A232721FD53; Thu, 27 Feb 2020 13:19:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
38B6F21FAD6 for ; Thu, 27 Feb 2020 13:18:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B5447E02; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B39EE46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:34 -0500 Message-Id: <1582838290-17243-47-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 046/622] lustre: ptlrpc: Add WBC connect flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin The new flag denotes the ability of a node to understand additional types of intent requests, exclusive metadata locks issued to clients, and server operations performed under such locks while they are still held by clients.
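To illustrate how such a flag is typically consumed, here is a minimal sketch of a capability check. The values 0x20 and 0x40 come from this patch. The helper `peer_supports_wbc()`, and passing the negotiated second flags word as a plain integer, are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in lustre_idl.h after this patch. */
#define OBD_CONNECT2_FLR		0x20ULL
#define OBD_CONNECT2_WBC_INTENTS	0x40ULL

/* Hypothetical helper: true if the flags word negotiated at connect
 * time advertises WBC intent support.
 */
static bool peer_supports_wbc(uint64_t connect_flags2)
{
	return (connect_flags2 & OBD_CONNECT2_WBC_INTENTS) != 0;
}
```

Because each capability occupies its own bit, a peer that predates the flag simply leaves bit 0x40 clear and the check returns false.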
WC-bug-id: https://jira.whamcloud.com/browse/LU-10938 Lustre-commit: f024aabf8bbf ("LU-10938 ptlrpc: Add WBC connect flag") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/32241 Reviewed-by: Andreas Dilger Reviewed-by: Mikhal Pershin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 5 +++++ 3 files changed, 8 insertions(+) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 66d2679..e2575b4 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -117,6 +117,7 @@ "unknown", /* 0x08 */ "unknown", /* 0x10 */ "flr", /* 0x20 */ + "wbc", /* 0x40 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index b14d301c..c566dea 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1115,6 +1115,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_DIR_MIGRATE); LASSERTF(OBD_CONNECT2_FLR == 0x20ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_FLR); + LASSERTF(OBD_CONNECT2_WBC_INTENTS == 0x40ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_WBC_INTENTS); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 2403b89..f437614 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -794,6 +794,11 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir */ #define OBD_CONNECT2_FLR 0x20ULL /* FLR support */ +#define OBD_CONNECT2_WBC_INTENTS 0x40ULL /* create/unlink/... 
intents + * for wbc, also operations + * under client-held parent + * locks + */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:08:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409749 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A360514BC for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8BFD12469F for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8BFD12469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 779DA21FD4D; Thu, 27 Feb 2020 13:20:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8D1C521F982 for ; Thu, 27 Feb 2020 13:18:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B82C3E03; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B6AA5468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:35 -0500 
Message-Id: <1582838290-17243-48-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 047/622] lustre: llog: remove obsolete llog handlers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" Remove the obsolete llog RPC handling for cancel, close, and destroy. Remove llog handling from ldlm_callback_handler(). Remove the unused client side method llog_client_destroy(). WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 85011d372dfb ("LU-10855 llog: remove obsolete llog handlers") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/32202 Reviewed-by: Mikhal Pershin Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 3 --- fs/lustre/ptlrpc/layout.c | 26 -------------------------- include/uapi/linux/lustre/lustre_idl.h | 12 ++++++------ 3 files changed, 6 insertions(+), 35 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 2348569..2737240 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -212,13 +212,10 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_format RQF_LDLM_GL_CALLBACK; extern struct req_format RQF_LDLM_GL_CALLBACK_DESC; /* LOG req_format */ -extern struct req_format RQF_LOG_CANCEL; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_CREATE; -extern struct req_format RQF_LLOG_ORIGIN_HANDLE_DESTROY; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_PREV_BLOCK; extern struct req_format RQF_LLOG_ORIGIN_HANDLE_READ_HEADER; -extern struct req_format RQF_LLOG_ORIGIN_CONNECT; extern struct req_format RQF_CONNECT; diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 4909b30..8fe661d 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -88,11 +88,6 @@ &RMF_MGS_CONFIG_RES }; -static const struct req_msg_field *log_cancel_client[] = { - &RMF_PTLRPC_BODY, - &RMF_LOGCOOKIES -}; - static const struct req_msg_field *mdt_body_only[] = { &RMF_PTLRPC_BODY, &RMF_MDT_BODY @@ -547,11 +542,6 @@ &RMF_LLOG_LOG_HDR }; -static const struct req_msg_field *llogd_conn_body_only[] = { - &RMF_PTLRPC_BODY, - &RMF_LLOGD_CONN_BODY -}; - static const struct req_msg_field *llog_origin_handle_next_block_server[] = { &RMF_PTLRPC_BODY, &RMF_LLOGD_BODY, @@ -766,13 +756,10 @@ &RQF_LDLM_INTENT_CREATE, &RQF_LDLM_INTENT_UNLINK, &RQF_LDLM_INTENT_GETXATTR, - &RQF_LOG_CANCEL, 
&RQF_LLOG_ORIGIN_HANDLE_CREATE, - &RQF_LLOG_ORIGIN_HANDLE_DESTROY, &RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK, &RQF_LLOG_ORIGIN_HANDLE_PREV_BLOCK, &RQF_LLOG_ORIGIN_HANDLE_READ_HEADER, - &RQF_LLOG_ORIGIN_CONNECT, &RQF_CONNECT, }; @@ -1254,10 +1241,6 @@ struct req_format RQF_FLD_READ = DEFINE_REQ_FMT0("FLD_READ", fld_read_client, fld_read_server); EXPORT_SYMBOL(RQF_FLD_READ); -struct req_format RQF_LOG_CANCEL = - DEFINE_REQ_FMT0("OBD_LOG_CANCEL", log_cancel_client, empty); -EXPORT_SYMBOL(RQF_LOG_CANCEL); - struct req_format RQF_MDS_QUOTACTL = DEFINE_REQ_FMT0("MDS_QUOTACTL", quotactl_only, quotactl_only); EXPORT_SYMBOL(RQF_MDS_QUOTACTL); @@ -1511,11 +1494,6 @@ struct req_format RQF_LLOG_ORIGIN_HANDLE_CREATE = llog_origin_handle_create_client, llogd_body_only); EXPORT_SYMBOL(RQF_LLOG_ORIGIN_HANDLE_CREATE); -struct req_format RQF_LLOG_ORIGIN_HANDLE_DESTROY = - DEFINE_REQ_FMT0("LLOG_ORIGIN_HANDLE_DESTROY", - llogd_body_only, llogd_body_only); -EXPORT_SYMBOL(RQF_LLOG_ORIGIN_HANDLE_DESTROY); - struct req_format RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK = DEFINE_REQ_FMT0("LLOG_ORIGIN_HANDLE_NEXT_BLOCK", llogd_body_only, llog_origin_handle_next_block_server); @@ -1531,10 +1509,6 @@ struct req_format RQF_LLOG_ORIGIN_HANDLE_READ_HEADER = llogd_body_only, llog_log_hdr_only); EXPORT_SYMBOL(RQF_LLOG_ORIGIN_HANDLE_READ_HEADER); -struct req_format RQF_LLOG_ORIGIN_CONNECT = - DEFINE_REQ_FMT0("LLOG_ORIGIN_CONNECT", llogd_conn_body_only, empty); -EXPORT_SYMBOL(RQF_LLOG_ORIGIN_CONNECT); - struct req_format RQF_CONNECT = DEFINE_REQ_FMT0("CONNECT", obd_connect_client, obd_connect_server); EXPORT_SYMBOL(RQF_CONNECT); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index f437614..7cf7307 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2312,7 +2312,7 @@ struct cfg_marker { enum obd_cmd { OBD_PING = 400, - OBD_LOG_CANCEL, + OBD_LOG_CANCEL, /* Obsolete since 1.5. 
*/ OBD_QC_CALLBACK, /* not used since 2.4 */ OBD_IDX_READ, OBD_LAST_OPC @@ -2624,12 +2624,12 @@ enum llogd_rpc_ops { LLOG_ORIGIN_HANDLE_CREATE = 501, LLOG_ORIGIN_HANDLE_NEXT_BLOCK = 502, LLOG_ORIGIN_HANDLE_READ_HEADER = 503, - LLOG_ORIGIN_HANDLE_WRITE_REC = 504, - LLOG_ORIGIN_HANDLE_CLOSE = 505, - LLOG_ORIGIN_CONNECT = 506, - LLOG_CATINFO = 507, /* deprecated */ + LLOG_ORIGIN_HANDLE_WRITE_REC = 504, /* Obsolete by 2.1. */ + LLOG_ORIGIN_HANDLE_CLOSE = 505, /* Obsolete by 1.8. */ + LLOG_ORIGIN_CONNECT = 506, /* Obsolete by 2.4. */ + LLOG_CATINFO = 507, /* Obsolete by 2.3. */ LLOG_ORIGIN_HANDLE_PREV_BLOCK = 508, - LLOG_ORIGIN_HANDLE_DESTROY = 509, /* for destroy llog object*/ + LLOG_ORIGIN_HANDLE_DESTROY = 509, /* Obsolete. */ LLOG_LAST_OPC, LLOG_FIRST_OPC = LLOG_ORIGIN_HANDLE_CREATE }; From patchwork Thu Feb 27 21:08:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409723 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B86F14BC for ; Thu, 27 Feb 2020 21:20:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2307C246A1 for ; Thu, 27 Feb 2020 21:20:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2307C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A8F5721FE03; Thu, 27 Feb 2020 13:19:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E599D21F982 for ; Thu, 27 Feb 2020 13:18:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BB005E04; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B991646D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:36 -0500 Message-Id: <1582838290-17243-49-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 048/622] lustre: ldlm: fix for l_lru usage X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng Fixes for the lock convert code to prevent false assertions and busy locks in the LRU: - ensure there are no l_readers and l_writers when adding a lock to the LRU after convert. - don't verify l_lru without holding ns_lock.
WC-bug-id: https://jira.whamcloud.com/browse/LU-11003 Lustre-commit: 2a77dd3bee76 ("LU-11003 ldlm: fix for l_lru usage") Signed-off-by: Yang Sheng Reviewed-on: https://review.whamcloud.com/32309 Reviewed-by: Fan Yong Reviewed-by: Mikhal Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_inodebits.c | 1 - fs/lustre/ldlm/ldlm_request.c | 19 +++++++++++-------- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c index e74928e..ddbf8d4 100644 --- a/fs/lustre/ldlm/ldlm_inodebits.c +++ b/fs/lustre/ldlm/ldlm_inodebits.c @@ -171,7 +171,6 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) ldlm_set_cbpending(lock); ldlm_set_bl_ast(lock); unlock_res_and_lock(lock); - LASSERT(list_empty(&lock->l_lru)); goto exit; } diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 67c23fc..5833f59 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -881,21 +881,25 @@ static int lock_convert_interpret(const struct lu_env *env, } else { ldlm_clear_converting(lock); - /* Concurrent BL AST has arrived, it may cause another convert - * or cancel so just exit here. + /* Concurrent BL AST may arrive and cause another convert + * or cancel so just do nothing here if bl_ast is set, + * finish with convert otherwise. */ if (!ldlm_is_bl_ast(lock)) { struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); /* Drop cancel_bits since there are no more converts - * and put lock into LRU if it is not there yet. + * and put lock into LRU if it is still not used and + * is not there yet. 
*/ lock->l_policy_data.l_inodebits.cancel_bits = 0; - spin_lock(&ns->ns_lock); - if (!list_empty(&lock->l_lru)) + if (!lock->l_readers && !lock->l_writers) { + spin_lock(&ns->ns_lock); + /* there is check for list_empty() inside */ ldlm_lock_remove_from_lru_nolock(lock); - ldlm_lock_add_to_lru_nolock(lock); - spin_unlock(&ns->ns_lock); + ldlm_lock_add_to_lru_nolock(lock); + spin_unlock(&ns->ns_lock); + } } } unlock_res_and_lock(lock); @@ -903,7 +907,6 @@ static int lock_convert_interpret(const struct lu_env *env, if (rc) { lock_res_and_lock(lock); if (ldlm_is_converting(lock)) { - LASSERT(list_empty(&lock->l_lru)); ldlm_clear_converting(lock); ldlm_set_cbpending(lock); ldlm_set_bl_ast(lock); From patchwork Thu Feb 27 21:08:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409727 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0D2C138D for ; Thu, 27 Feb 2020 21:20:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A999E246A1 for ; Thu, 27 Feb 2020 21:20:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A999E246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73E8C21FE95; Thu, 27 Feb 2020 13:19:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 321A421F982 for ; Thu, 27 Feb 2020 13:18:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BE4B5E05; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BCA6046A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:37 -0500 Message-Id: <1582838290-17243-50-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 049/622] lustre: lov: Move lov_tgts_kobj init to lov_setup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin Move the lov_tgts_kobj initialisation to lov_setup and free it in lov_cleanup. This looks more robust than doing it in lov_tgts_putref, especially since we know the refcount there crosses 0 repeatedly, which confuses things. WC-bug-id: https://jira.whamcloud.com/browse/LU-11015 Lustre-commit: 313ac16698db ("LU-11015 lov: Move lov_tgts_kobj init to lov_setup") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/32367 Reviewed-by: James Simmons Reviewed-by: John L.
Hammond Signed-off-by: James Simmons --- fs/lustre/lov/lov_obd.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 26637bc..9449aa9 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -110,10 +110,6 @@ void lov_tgts_putref(struct obd_device *obd) /* Disconnect */ __lov_del_obd(obd, tgt); } - - if (lov->lov_tgts_kobj) - kobject_put(lov->lov_tgts_kobj); - } else { mutex_unlock(&lov->lov_lock); } @@ -235,9 +231,6 @@ static int lov_connect(const struct lu_env *env, lov_tgts_getref(obd); - lov->lov_tgts_kobj = kobject_create_and_add("target_obds", - &obd->obd_kset.kobj); - for (i = 0; i < lov->desc.ld_tgt_count; i++) { tgt = lov->lov_tgts[i]; if (!tgt || obd_uuid_empty(&tgt->ltd_uuid)) @@ -784,6 +777,9 @@ int lov_setup(struct obd_device *obd, struct lustre_cfg *lcfg) if (rc) goto out_tunables; + lov->lov_tgts_kobj = kobject_create_and_add("target_obds", + &obd->obd_kset.kobj); + return 0; out_tunables: @@ -799,6 +795,11 @@ static int lov_cleanup(struct obd_device *obd) struct lov_obd *lov = &obd->u.lov; struct pool_desc *pool, *tmp; + if (lov->lov_tgts_kobj) { + kobject_put(lov->lov_tgts_kobj); + lov->lov_tgts_kobj = NULL; + } + list_for_each_entry_safe(pool, tmp, &lov->lov_pool_list, pool_list) { /* free pool structs */ CDEBUG(D_INFO, "delete pool %p\n", pool); From patchwork Thu Feb 27 21:08:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409751 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20BFF138D for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.kernel.org (Postfix) with ESMTPS id 083092469F for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 083092469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AFB4A21FD8C; Thu, 27 Feb 2020 13:20:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 74F0921FA4B for ; Thu, 27 Feb 2020 13:18:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C14AAE07; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BFFCC46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:38 -0500 Message-Id: <1582838290-17243-51-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 050/622] lustre: osc: add T10PI support for RPC checksum X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Xi T10 Protection Information (T10 PI), previously known as Data Integrity Field (DIF), is a standard for end-to-end data integrity validation. T10 PI prevents silent data corruption, ensuring that incomplete and incorrect data cannot overwrite good data. The Lustre file system already supports an RPC level checksum which validates the data in bulk RPCs when writing/reading data to/from objects on OSTs. The RPC level checksum can detect data corruption that happens while the RPC is transferred over the wire. However, it cannot prevent silent data corruption that happens under other conditions, for example, memory corruption when data is cached in page cache. And by using the existing checksum mechanism, only disjoint protection coverage is provided. Thus, in order to provide end-to-end data protection, T10PI support for Lustre should be added. In order to provide end-to-end data integrity validation, the T10 PI checksum of data in a sector needs to be calculated on the Lustre client side and validated later on the Lustre OSS side. The T10 protection information should be sent together with the data in the RPC. However, in order to avoid significant performance degradation, instead of sending all original guard tags for all sectors in a bulk RPC, the existing checksum feature of bulk RPC will be integrated together with the new T10PI feature. When an OST starts, the necessary T10PI information will be extracted from storage, i.e. the T10PI DIF type and sector size. The DIF type could be one of TYPE1_IP, TYPE1_CRC, TYPE3_IP and TYPE3_CRC. The sector size could be either 512 or 4K bytes. When an OSC connects to an OST, the OSC and OST negotiate the checksum types. 
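For readers who want to experiment outside the kernel, the two-level scheme (one 16-bit guard tag per sector, then a single RPC checksum over the tags rather than over the data) can be sketched in userspace C. This is only an illustration under stated assumptions: every name below (crc_t10dif_sketch, adler32_sketch, guard_count, rpc_cksum_sketch) is hypothetical and not part of this patch, only the CRC guard variant is shown, the tags are hashed in one pass instead of one page at a time, and no claim is made that the result matches what Lustre puts on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* CRC-16/T10-DIF computed bit by bit: polynomial 0x8BB7, initial value 0,
 * most significant bit first, no final XOR. Hypothetical helper. */
static uint16_t crc_t10dif_sketch(const uint8_t *data, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)((uint16_t)data[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8BB7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/* Plain Adler-32 (a starts at 1, b at 0, both modulo 65521). */
static uint32_t adler32_sketch(const uint8_t *data, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + data[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

/* Number of 16-bit guard tags needed to cover len bytes of data. */
static size_t guard_count(size_t len, size_t sector_size)
{
	return (len + sector_size - 1) / sector_size;
}

/* Two-level checksum: one guard tag per sector, then Adler-32 over the
 * big-endian tags. Returns 0 on success, -1 if allocation fails. */
static int rpc_cksum_sketch(const uint8_t *data, size_t len,
			    size_t sector_size, uint32_t *cksum)
{
	size_t n = guard_count(len, sector_size);
	uint8_t *tags;

	assert(len > 0);
	assert(sector_size == 512 || sector_size == 4096);

	tags = malloc(n * 2);
	if (!tags)
		return -1;

	for (size_t i = 0; i < n; i++) {
		size_t off = i * sector_size;
		size_t chunk = len - off < sector_size ? len - off : sector_size;
		uint16_t guard = crc_t10dif_sketch(data + off, chunk);

		/* Store big-endian, as cpu_to_be16() would in the kernel. */
		tags[2 * i] = (uint8_t)(guard >> 8);
		tags[2 * i + 1] = (uint8_t)(guard & 0xff);
	}

	*cksum = adler32_sketch(tags, n * 2);
	free(tags);
	return 0;
}
```

A caller would pass the bulk data buffer and the negotiated sector size to rpc_cksum_sketch(); guard_count(len, sector_size) * 2 gives the number of bytes the second-level checksum actually has to process.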
New checksum types are added for T10PI support including OBD_CKSUM_T10IP512, OBD_CKSUM_T10IP4K, OBD_CKSUM_T10CRC512, and OBD_CKSUM_T10CRC4K. If the OST storage has T10PI support, the only selectable T10PI checksum type is the one that matches the T10PI type of the hardware. The other existing checksum types (crc32, crc32c, adler32) are still valid options for the RPC checksum type. When calculating the RPC checksum of T10PI, the T10PI checksums of all sectors will be calculated first using the T10PI checksum type, i.e. 16-bit crc or IP checksum. Then the RPC checksum will be calculated on all of the T10PI checksums. The RPC checksum type used in this step is always adler32. Considering that the checksum-of-checksums is only computed on a 4KB chunk of GRD tags for a 1MB RPC for 512B sectors, or 8KB of GRD tags for 16MB of 4KB sectors (each GRD tag is 16 bits), this is only 1/256 or 1/2048 of the total data being checksummed, so the checksum type used here should not affect overall system performance noticeably. obdfilter.*.enforce_t10pi_cksum can be used to tune whether to enforce T10-PI checksum or not. If the OST supports the T10-PI feature and T10-PI checksum is enforced, clients will have no choice for the RPC checksum type other than the T10PI checksum type. This is useful for enforcing end-to-end integrity in the whole system. If the OST doesn't support the T10-PI feature and T10-PI checksum is enforced, together with other checksums with reasonably good speeds (e.g. crc32, crc32c, adler, etc.), all the T10-PI checksum types (t10ip512, t10ip4K, t10crc512, t10crc4K) will be added to the available checksum types, regardless of the speeds of the T10-PI checksums. This is useful for testing T10-PI checksums of RPC. If the OST supports the T10-PI feature and T10-PI checksum is NOT enforced, the corresponding T10-PI checksum type will be added to the checksum type list, regardless of the speed of the T10-PI checksum. 
This gives the clients the flexibility to choose whether to enable end-to-end integrity or not. If the OST does NOT support the T10-PI feature and T10-PI checksum is NOT enforced, together with other checksums with reasonably good speeds, all the T10-PI checksum types with good speeds will be added into the checksum type list. Note that a T10-PI checksum type with a speed worse than half of Adler will NOT be added as an option. In this circumstance, T10-PI checksum types behave like other normal checksum types. Clients that have no T10-PI RPC checksum support will not be affected by the above-mentioned logic. And that logic will only be enforced for clients that connect after obdfilter.*.enforce_t10pi_cksum is changed on an OST. Following are the speeds of different checksum types on a server with CPU of Intel(R) Xeon(R) E5-2650 @ 2.00GHz: crc: 1575 MB/s, crc32c: 9763 MB/s, adler: 1255 MB/s, t10ip512: 6151 MB/s, t10ip4k: 7935 MB/s, t10crc512: 1119 MB/s, t10crc4k: 1531 MB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-10472 Lustre-commit: b1e7be00cb6e ("LU-10472 osc: add T10PI support for RPC checksum") Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/30980 Reviewed-by: Andreas Dilger Reviewed-by: Faccini Bruno Signed-off-by: James Simmons --- fs/lustre/include/obd_cksum.h | 123 +++++++++------ fs/lustre/include/obd_class.h | 1 - fs/lustre/llite/llite_lib.c | 4 +- fs/lustre/obdclass/Makefile | 2 +- fs/lustre/obdclass/integrity.c | 273 +++++++++++++++++++++++++++++++++ fs/lustre/obdclass/obd_cksum.c | 151 ++++++++++++++++++ fs/lustre/osc/osc_request.c | 214 +++++++++++++++++++++++--- fs/lustre/ptlrpc/import.c | 8 +- fs/lustre/ptlrpc/wiretest.c | 17 +- include/uapi/linux/lustre/lustre_idl.h | 48 ++++-- net/lnet/libcfs/linux-crypto.c | 3 + 11 files changed, 753 insertions(+), 91 deletions(-) create mode 100644 fs/lustre/obdclass/integrity.c create mode 100644 fs/lustre/obdclass/obd_cksum.c diff --git a/fs/lustre/include/obd_cksum.h 
b/fs/lustre/include/obd_cksum.h index 26a9555..cc47c44 100644 --- a/fs/lustre/include/obd_cksum.h +++ b/fs/lustre/include/obd_cksum.h @@ -35,6 +35,9 @@ #include #include +int obd_t10_cksum_speed(const char *obd_name, + enum cksum_type cksum_type); + static inline unsigned char cksum_obd2cfs(enum cksum_type cksum_type) { switch (cksum_type) { @@ -51,59 +54,23 @@ static inline unsigned char cksum_obd2cfs(enum cksum_type cksum_type) return 0; } -/* The OBD_FL_CKSUM_* flags is packed into 5 bits of o_flags, since there can - * only be a single checksum type per RPC. - * - * The OBD_CHECKSUM_* type bits passed in ocd_cksum_types are a 32-bit bitmask - * since they need to represent the full range of checksum algorithms that - * both the client and server can understand. - * - * In case of an unsupported types/flags we fall back to ADLER - * because that is supported by all clients since 1.8 - * - * In case multiple algorithms are supported the best one is used. - */ -static inline u32 cksum_type_pack(enum cksum_type cksum_type) -{ - unsigned int performance = 0, tmp; - u32 flag = OBD_FL_CKSUM_ADLER; - - if (cksum_type & OBD_CKSUM_CRC32) { - tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)); - if (tmp > performance) { - performance = tmp; - flag = OBD_FL_CKSUM_CRC32; - } - } - if (cksum_type & OBD_CKSUM_CRC32C) { - tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)); - if (tmp > performance) { - performance = tmp; - flag = OBD_FL_CKSUM_CRC32C; - } - } - if (cksum_type & OBD_CKSUM_ADLER) { - tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)); - if (tmp > performance) { - performance = tmp; - flag = OBD_FL_CKSUM_ADLER; - } - } - if (unlikely(cksum_type && !(cksum_type & (OBD_CKSUM_CRC32C | - OBD_CKSUM_CRC32 | - OBD_CKSUM_ADLER)))) - CWARN("unknown cksum type %x\n", cksum_type); - - return flag; -} +u32 obd_cksum_type_pack(const char *obd_name, enum cksum_type cksum_type); -static inline enum cksum_type cksum_type_unpack(u32 o_flags) +static 
inline enum cksum_type obd_cksum_type_unpack(u32 o_flags) { switch (o_flags & OBD_FL_CKSUM_ALL) { case OBD_FL_CKSUM_CRC32C: return OBD_CKSUM_CRC32C; case OBD_FL_CKSUM_CRC32: return OBD_CKSUM_CRC32; + case OBD_FL_CKSUM_T10IP512: + return OBD_CKSUM_T10IP512; + case OBD_FL_CKSUM_T10IP4K: + return OBD_CKSUM_T10IP4K; + case OBD_FL_CKSUM_T10CRC512: + return OBD_CKSUM_T10CRC512; + case OBD_FL_CKSUM_T10CRC4K: + return OBD_CKSUM_T10CRC4K; default: break; } @@ -115,7 +82,7 @@ static inline enum cksum_type cksum_type_unpack(u32 o_flags) * 1.8 supported ADLER it is base and not depend on hw * Client uses all available local algos */ -static inline enum cksum_type cksum_types_supported_client(void) +static inline enum cksum_type obd_cksum_types_supported_client(void) { enum cksum_type ret = OBD_CKSUM_ADLER; @@ -128,6 +95,8 @@ static inline enum cksum_type cksum_types_supported_client(void) ret |= OBD_CKSUM_CRC32C; if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)) > 0) ret |= OBD_CKSUM_CRC32; + /* Client support all kinds of T10 checksum */ + ret |= OBD_CKSUM_T10_ALL; return ret; } @@ -140,14 +109,68 @@ static inline enum cksum_type cksum_types_supported_client(void) * Caution is advised, however, since what is fastest on a single client may * not be the fastest or most efficient algorithm on the server. */ -static inline enum cksum_type cksum_type_select(enum cksum_type cksum_types) +static inline enum cksum_type +obd_cksum_type_select(const char *obd_name, enum cksum_type cksum_types) { - return cksum_type_unpack(cksum_type_pack(cksum_types)); + u32 flag = obd_cksum_type_pack(obd_name, cksum_types); + + return obd_cksum_type_unpack(flag); } /* Checksum algorithm names. Must be defined in the same order as the * OBD_CKSUM_* flags. 
*/ -#define DECLARE_CKSUM_NAME char *cksum_name[] = {"crc32", "adler", "crc32c"} +#define DECLARE_CKSUM_NAME const char *cksum_name[] = {"crc32", "adler", \ + "crc32c", "reserved", "t10ip512", "t10ip4K", "t10crc512", "t10crc4K"} + +typedef u16 (obd_dif_csum_fn) (void *, unsigned int); + +u16 obd_dif_crc_fn(void *data, unsigned int len); +u16 obd_dif_ip_fn(void *data, unsigned int len); +int obd_page_dif_generate_buffer(const char *obd_name, struct page *page, + u32 offset, u32 length, + u16 *guard_start, int guard_number, + int *used_number, int sector_size, + obd_dif_csum_fn *fn); +/* + * If checksum type is one T10 checksum types, init the csum_fn and sector + * size. Otherwise, init them to NULL/zero. + */ +static inline void obd_t10_cksum2dif(enum cksum_type cksum_type, + obd_dif_csum_fn **fn, int *sector_size) +{ + *fn = NULL; + *sector_size = 0; + + switch (cksum_type) { + case OBD_CKSUM_T10IP512: + *fn = obd_dif_ip_fn; + *sector_size = 512; + break; + case OBD_CKSUM_T10IP4K: + *fn = obd_dif_ip_fn; + *sector_size = 4096; + break; + case OBD_CKSUM_T10CRC512: + *fn = obd_dif_crc_fn; + *sector_size = 512; + break; + case OBD_CKSUM_T10CRC4K: + *fn = obd_dif_crc_fn; + *sector_size = 4096; + break; + default: + break; + } +} + +enum obd_t10_cksum_type { + OBD_T10_CKSUM_UNKNOWN = 0, + OBD_T10_CKSUM_IP512, + OBD_T10_CKSUM_IP4K, + OBD_T10_CKSUM_CRC512, + OBD_T10_CKSUM_CRC4K, + OBD_T10_CKSUM_MAX +}; #endif /* __OBD_H */ diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index d896049..0153c50 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1687,7 +1687,6 @@ static inline void class_uuid_unparse(class_uuid_t uu, struct obd_uuid *out) extern char obd_jobid_name[]; int class_procfs_init(void); int class_procfs_clean(void); - /* prng.c */ #define ll_generate_random_uuid(uuid_out) \ get_random_bytes(uuid_out, sizeof(class_uuid_t)) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 
eb29064..dff349f 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -218,7 +218,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_LARGE_ACL; #endif - data->ocd_cksum_types = cksum_types_supported_client(); + data->ocd_cksum_types = obd_cksum_types_supported_client(); if (OBD_FAIL_CHECK(OBD_FAIL_MDC_LIGHTWEIGHT)) /* flag mdc connection as lightweight, only used for test @@ -432,7 +432,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) if (OBD_FAIL_CHECK(OBD_FAIL_OSC_CKSUM_ADLER_ONLY)) data->ocd_cksum_types = OBD_CKSUM_ADLER; else - data->ocd_cksum_types = cksum_types_supported_client(); + data->ocd_cksum_types = obd_cksum_types_supported_client(); data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile index 96fce1b..25d2e1d 100644 --- a/fs/lustre/obdclass/Makefile +++ b/fs/lustre/obdclass/Makefile @@ -8,4 +8,4 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \ lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \ obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \ cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \ - jobid.o + jobid.o integrity.o obd_cksum.o diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c new file mode 100644 index 0000000..8348b16 --- /dev/null +++ b/fs/lustre/obdclass/integrity.c @@ -0,0 +1,273 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * Copyright (c) 2018, DataDirect Networks Storage. + * Author: Li Xi. + * + * General data integrity functions + */ +#include +#include +#include +#include +#include + +u16 obd_dif_crc_fn(void *data, unsigned int len) +{ + return cpu_to_be16(crc_t10dif(data, len)); +} +EXPORT_SYMBOL(obd_dif_crc_fn); + +u16 obd_dif_ip_fn(void *data, unsigned int len) +{ + return ip_compute_csum(data, len); +} +EXPORT_SYMBOL(obd_dif_ip_fn); + +int obd_page_dif_generate_buffer(const char *obd_name, struct page *page, + u32 offset, u32 length, + u16 *guard_start, int guard_number, + int *used_number, int sector_size, + obd_dif_csum_fn *fn) +{ + unsigned int i; + char *data_buf; + u16 *guard_buf = guard_start; + unsigned int data_size; + int used = 0; + + data_buf = kmap(page) + offset; + for (i = 0; i < length; i += sector_size) { + if (used >= guard_number) { + CERROR("%s: unexpected used guard number of DIF %u/%u, data length %u, sector size %u: rc = %d\n", + obd_name, used, guard_number, length, + sector_size, -E2BIG); + return -E2BIG; + } + data_size = length - i; + if (data_size > sector_size) + data_size = sector_size; + *guard_buf = fn(data_buf, data_size); + guard_buf++; + data_buf += data_size; + used++; + } + kunmap(page); + *used_number = used; + + return 0; +} +EXPORT_SYMBOL(obd_page_dif_generate_buffer); + +static int __obd_t10_performance_test(const char *obd_name, + enum cksum_type cksum_type, + struct page *data_page, + int repeat_number) +{ + unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP); + struct ahash_request *hdesc; + obd_dif_csum_fn *fn = NULL; + unsigned int bufsize; + unsigned char *buffer; + struct page *__page; + 
u16 *guard_start; + int guard_number; + int used_number = 0; + int sector_size = 0; + u32 cksum; + int rc = 0; + int rc2; + int used; + int i; + + obd_t10_cksum2dif(cksum_type, &fn, &sector_size); + if (!fn) + return -EINVAL; + + __page = alloc_page(GFP_KERNEL); + if (!__page) + return -ENOMEM; + + hdesc = cfs_crypto_hash_init(cfs_alg, NULL, 0); + if (IS_ERR(hdesc)) { + rc = PTR_ERR(hdesc); + CERROR("%s: unable to initialize checksum hash %s: rc = %d\n", + obd_name, cfs_crypto_hash_name(cfs_alg), rc); + goto out; + } + + buffer = kmap(__page); + guard_start = (u16 *)buffer; + guard_number = PAGE_SIZE / sizeof(*guard_start); + for (i = 0; i < repeat_number; i++) { + /* + * The left guard number should be able to hold checksums of a + * whole page + */ + rc = obd_page_dif_generate_buffer(obd_name, data_page, 0, + PAGE_SIZE, + guard_start + used_number, + guard_number - used_number, + &used, sector_size, fn); + if (rc) + break; + + used_number += used; + if (used_number == guard_number) { + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + used_number = 0; + } + } + kunmap(__page); + if (rc) + goto out_final; + + if (used_number != 0) + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + + bufsize = sizeof(cksum); +out_final: + rc2 = cfs_crypto_hash_final(hdesc, (unsigned char *)&cksum, &bufsize); + rc = rc ? 
rc : rc2; +out: + __free_page(__page); + + return rc; +} + +/** + * Array of T10PI checksum algorithm speed in MByte per second + */ +static int obd_t10_cksum_speeds[OBD_T10_CKSUM_MAX]; + +static enum obd_t10_cksum_type +obd_t10_cksum2type(enum cksum_type cksum_type) +{ + switch (cksum_type) { + case OBD_CKSUM_T10IP512: + return OBD_T10_CKSUM_IP512; + case OBD_CKSUM_T10IP4K: + return OBD_T10_CKSUM_IP4K; + case OBD_CKSUM_T10CRC512: + return OBD_T10_CKSUM_CRC512; + case OBD_CKSUM_T10CRC4K: + return OBD_T10_CKSUM_CRC4K; + default: + return OBD_T10_CKSUM_UNKNOWN; + } +} + +static const char *obd_t10_cksum_name(enum obd_t10_cksum_type index) +{ + DECLARE_CKSUM_NAME; + + /* Need to skip "crc32", "adler", "crc32c", "reserved" */ + return cksum_name[3 + index]; +} + +/** + * Compute the speed of specified T10PI checksum type + * + * Run a speed test on the given T10PI checksum on buffer using a 1MB buffer + * size. This is a reasonable buffer size for Lustre RPCs, even if the actual + * RPC size is larger or smaller. + * + * The speed is stored internally in the obd_t10_cksum_speeds[] array, and + * is available through the obd_t10_cksum_speed() function. + * + * This function needs to stay the same as cfs_crypto_performance_test() so + * that the speeds are comparable. And this function should reflect the real + * cost of the checksum calculation. 
+ * + * \param[in] obd_name name of the OBD device + * \param[in] cksum_type checksum type (OBD_CKSUM_T10*) + */ +static void obd_t10_performance_test(const char *obd_name, + enum cksum_type cksum_type) +{ + enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type); + const int buf_len = max(PAGE_SIZE, 1048576UL); + unsigned long bcount; + unsigned long start; + unsigned long end; + struct page *page; + int rc = 0; + void *buf; + + page = alloc_page(GFP_KERNEL); + if (!page) { + rc = -ENOMEM; + goto out; + } + + buf = kmap(page); + memset(buf, 0xAD, PAGE_SIZE); + kunmap(page); + + for (start = jiffies, end = start + msecs_to_jiffies(MSEC_PER_SEC / 4), + bcount = 0; time_before(jiffies, end) && rc == 0; bcount++) { + rc = __obd_t10_performance_test(obd_name, cksum_type, page, + buf_len / PAGE_SIZE); + if (rc) + break; + } + end = jiffies; + __free_page(page); +out: + if (rc) { + obd_t10_cksum_speeds[index] = rc; + CDEBUG(D_INFO, + "%s: T10 checksum algorithm %s test error: rc = %d\n", + obd_name, obd_t10_cksum_name(index), rc); + } else { + unsigned long tmp; + + tmp = ((bcount * buf_len / jiffies_to_msecs(end - start)) * + 1000) / (1024 * 1024); + obd_t10_cksum_speeds[index] = (int)tmp; + CDEBUG(D_CONFIG, + "%s: T10 checksum algorithm %s speed = %d MB/s\n", + obd_name, obd_t10_cksum_name(index), + obd_t10_cksum_speeds[index]); + } +} + +int obd_t10_cksum_speed(const char *obd_name, + enum cksum_type cksum_type) +{ + enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type); + + if (unlikely(obd_t10_cksum_speeds[index] == 0)) { + static DEFINE_MUTEX(obd_t10_cksum_speed_mutex); + + mutex_lock(&obd_t10_cksum_speed_mutex); + if (obd_t10_cksum_speeds[index] == 0) + obd_t10_performance_test(obd_name, cksum_type); + mutex_unlock(&obd_t10_cksum_speed_mutex); + } + + return obd_t10_cksum_speeds[index]; +} +EXPORT_SYMBOL(obd_t10_cksum_speed); diff --git a/fs/lustre/obdclass/obd_cksum.c b/fs/lustre/obdclass/obd_cksum.c new file mode 100644 index 0000000..601feb7 --- 
/dev/null +++ b/fs/lustre/obdclass/obd_cksum.c @@ -0,0 +1,151 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * Copyright (c) 2018, DataDirect Networks Storage. + * Author: Li Xi. 
+ * + * Checksum functions + */ +#include +#include + +/* Server uses algos that perform at 50% or better of the Adler */ +enum cksum_type obd_cksum_types_supported_server(const char *obd_name) +{ + enum cksum_type ret = OBD_CKSUM_ADLER; + int base_speed; + + CDEBUG(D_INFO, + "%s: checksum speed: crc %d, crc32c %d, adler %d, t10ip512 %d, t10ip4k %d, t10crc512 %d, t10crc4k %d\n", + obd_name, + cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)), + cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)), + cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512), + obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K)); + + base_speed = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)) / 2; + + if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)) >= + base_speed) + ret |= OBD_CKSUM_CRC32C; + + if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)) >= + base_speed) + ret |= OBD_CKSUM_CRC32; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512) >= base_speed) + ret |= OBD_CKSUM_T10IP512; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K) >= base_speed) + ret |= OBD_CKSUM_T10IP4K; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512) >= base_speed) + ret |= OBD_CKSUM_T10CRC512; + + if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K) >= base_speed) + ret |= OBD_CKSUM_T10CRC4K; + + return ret; +} +EXPORT_SYMBOL(obd_cksum_types_supported_server); + +/* The OBD_FL_CKSUM_* flags is packed into 5 bits of o_flags, since there can + * only be a single checksum type per RPC. + * + * The OBD_CKSUM_* type bits passed in ocd_cksum_types are a 32-bit bitmask + * since they need to represent the full range of checksum algorithms that + * both the client and server can understand. 
+ * + * In case of an unsupported types/flags we fall back to ADLER + * because that is supported by all clients since 1.8 + * + * In case multiple algorithms are supported the best one is used. + */ +u32 obd_cksum_type_pack(const char *obd_name, enum cksum_type cksum_type) +{ + unsigned int performance = 0, tmp; + u32 flag = OBD_FL_CKSUM_ADLER; + + if (cksum_type & OBD_CKSUM_CRC32) { + tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_CRC32; + } + } + if (cksum_type & OBD_CKSUM_CRC32C) { + tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_CRC32C; + } + } + if (cksum_type & OBD_CKSUM_ADLER) { + tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_ADLER; + } + } + + if (cksum_type & OBD_CKSUM_T10IP512) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10IP512; + } + } + + if (cksum_type & OBD_CKSUM_T10IP4K) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10IP4K; + } + } + + if (cksum_type & OBD_CKSUM_T10CRC512) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10CRC512; + } + } + + if (cksum_type & OBD_CKSUM_T10CRC4K) { + tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K); + if (tmp > performance) { + performance = tmp; + flag = OBD_FL_CKSUM_T10CRC4K; + } + } + + if (unlikely(cksum_type && !(cksum_type & OBD_CKSUM_ALL))) + CWARN("%s: unknown cksum type %x\n", obd_name, cksum_type); + + return flag; +} +EXPORT_SYMBOL(obd_cksum_type_pack); diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index c430239..9ac9c84 100644 --- a/fs/lustre/osc/osc_request.c +++ 
b/fs/lustre/osc/osc_request.c @@ -1030,6 +1030,105 @@ static inline int can_merge_pages(struct brw_page *p1, struct brw_page *p2) return (p1->off + p1->count == p2->off); } +static int osc_checksum_bulk_t10pi(const char *obd_name, int nob, + size_t pg_count, struct brw_page **pga, + int opc, obd_dif_csum_fn *fn, + int sector_size, + u32 *check_sum) +{ + struct ahash_request *hdesc; + /* Used Adler as the default checksum type on top of DIF tags */ + unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP); + struct page *__page; + unsigned char *buffer; + u16 *guard_start; + unsigned int bufsize; + int guard_number; + int used_number = 0; + int used; + u32 cksum; + int rc = 0; + int i = 0; + + LASSERT(pg_count > 0); + + __page = alloc_page(GFP_KERNEL); + if (!__page) + return -ENOMEM; + + hdesc = cfs_crypto_hash_init(cfs_alg, NULL, 0); + if (IS_ERR(hdesc)) { + rc = PTR_ERR(hdesc); + CERROR("%s: unable to initialize checksum hash %s: rc = %d\n", + obd_name, cfs_crypto_hash_name(cfs_alg), rc); + goto out; + } + + buffer = kmap(__page); + guard_start = (u16 *)buffer; + guard_number = PAGE_SIZE / sizeof(*guard_start); + while (nob > 0 && pg_count > 0) { + unsigned int count = pga[i]->count > nob ? 
nob : pga[i]->count; + + /* corrupt the data before we compute the checksum, to + * simulate an OST->client data error + */ + if (unlikely(i == 0 && opc == OST_READ && + OBD_FAIL_CHECK(OBD_FAIL_OSC_CHECKSUM_RECEIVE))) { + unsigned char *ptr = kmap(pga[i]->pg); + int off = pga[i]->off & ~PAGE_MASK; + + memcpy(ptr + off, "bad1", min_t(typeof(nob), 4, nob)); + kunmap(pga[i]->pg); + } + + /* + * The left guard number should be able to hold checksums of a + * whole page + */ + rc = obd_page_dif_generate_buffer(obd_name, pga[i]->pg, 0, + count, + guard_start + used_number, + guard_number - used_number, + &used, sector_size, + fn); + if (rc) + break; + + used_number += used; + if (used_number == guard_number) { + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + used_number = 0; + } + + nob -= pga[i]->count; + pg_count--; + i++; + } + kunmap(__page); + if (rc) + goto out; + + if (used_number != 0) + cfs_crypto_hash_update_page(hdesc, __page, 0, + used_number * sizeof(*guard_start)); + + bufsize = sizeof(cksum); + cfs_crypto_hash_final(hdesc, (unsigned char *)&cksum, &bufsize); + + /* For sending we only compute the wrong checksum instead + * of corrupting the data so it is still correct on a redo + */ + if (opc == OST_WRITE && OBD_FAIL_CHECK(OBD_FAIL_OSC_CHECKSUM_SEND)) + cksum++; + + *check_sum = cksum; +out: + __free_page(__page); + return rc; +} + static int osc_checksum_bulk(int nob, u32 pg_count, struct brw_page **pga, int opc, enum cksum_type cksum_type, @@ -1090,6 +1189,28 @@ static int osc_checksum_bulk(int nob, u32 pg_count, return 0; } +static int osc_checksum_bulk_rw(const char *obd_name, + enum cksum_type cksum_type, + int nob, size_t pg_count, + struct brw_page **pga, int opc, + u32 *check_sum) +{ + obd_dif_csum_fn *fn = NULL; + int sector_size = 0; + int rc; + + obd_t10_cksum2dif(cksum_type, &fn, &sector_size); + + if (fn) + rc = osc_checksum_bulk_t10pi(obd_name, nob, pg_count, pga, + opc, fn, sector_size, check_sum); + else 
+ rc = osc_checksum_bulk(nob, pg_count, pga, opc, cksum_type, + check_sum); + + return rc; +} + static int osc_brw_prep_request(int cmd, struct client_obd *cli, struct obdo *oa, u32 page_count, struct brw_page **pga, @@ -1107,6 +1228,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, struct req_capsule *pill; struct brw_page *pg_prev; void *short_io_buf; + const char *obd_name = cli->cl_import->imp_obd->obd_name; if (OBD_FAIL_CHECK(OBD_FAIL_OSC_BRW_PREP_REQ)) return -ENOMEM; /* Recoverable */ @@ -1306,12 +1428,14 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0) body->oa.o_flags = 0; - body->oa.o_flags |= cksum_type_pack(cksum_type); + body->oa.o_flags |= obd_cksum_type_pack(obd_name, + cksum_type); body->oa.o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS; - rc = osc_checksum_bulk(requested_nob, page_count, - pga, OST_WRITE, cksum_type, - &body->oa.o_cksum); + rc = osc_checksum_bulk_rw(obd_name, cksum_type, + requested_nob, page_count, + pga, OST_WRITE, + &body->oa.o_cksum); if (rc < 0) { CDEBUG(D_PAGE, "failed to checksum, rc = %d\n", rc); @@ -1322,7 +1446,8 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, /* save this in 'oa', too, for later checking */ oa->o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS; - oa->o_flags |= cksum_type_pack(cksum_type); + oa->o_flags |= obd_cksum_type_pack(obd_name, + cksum_type); } else { /* clear out the checksum flag, in case this is a * resend but cl_checksum is no longer set. 
b=11238 @@ -1338,7 +1463,8 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, !sptlrpc_flavor_has_bulk(&req->rq_flvr)) { if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0) body->oa.o_flags = 0; - body->oa.o_flags |= cksum_type_pack(cli->cl_cksum_type); + body->oa.o_flags |= obd_cksum_type_pack(obd_name, + cli->cl_cksum_type); body->oa.o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS; } @@ -1441,6 +1567,10 @@ static int check_write_checksum(struct obdo *oa, u32 client_cksum, u32 server_cksum, struct osc_brw_async_args *aa) { + const char *obd_name = aa->aa_cli->cl_import->imp_obd->obd_name; + obd_dif_csum_fn *fn = NULL; + int sector_size = 0; + bool t10pi = false; u32 new_cksum; char *msg; enum cksum_type cksum_type; @@ -1455,15 +1585,50 @@ static int check_write_checksum(struct obdo *oa, dump_all_bulk_pages(oa, aa->aa_page_count, aa->aa_ppga, server_cksum, client_cksum); - cksum_type = cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ? - oa->o_flags : 0); - rc = osc_checksum_bulk(aa->aa_requested_nob, aa->aa_page_count, - aa->aa_ppga, OST_WRITE, cksum_type, - &new_cksum); + cksum_type = obd_cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ? 
+ oa->o_flags : 0); + + switch (cksum_type) { + case OBD_CKSUM_T10IP512: + t10pi = true; + fn = obd_dif_ip_fn; + sector_size = 512; + break; + case OBD_CKSUM_T10IP4K: + t10pi = true; + fn = obd_dif_ip_fn; + sector_size = 4096; + break; + case OBD_CKSUM_T10CRC512: + t10pi = true; + fn = obd_dif_crc_fn; + sector_size = 512; + break; + case OBD_CKSUM_T10CRC4K: + t10pi = true; + fn = obd_dif_crc_fn; + sector_size = 4096; + break; + default: + break; + } + + if (t10pi) + rc = osc_checksum_bulk_t10pi(obd_name, aa->aa_requested_nob, + aa->aa_page_count, + aa->aa_ppga, + OST_WRITE, + fn, + sector_size, + &new_cksum); + else + rc = osc_checksum_bulk(aa->aa_requested_nob, aa->aa_page_count, + aa->aa_ppga, OST_WRITE, cksum_type, + &new_cksum); if (rc < 0) msg = "failed to calculate the client write checksum"; - else if (cksum_type != cksum_type_unpack(aa->aa_oa->o_flags)) + else if (cksum_type != obd_cksum_type_unpack(aa->aa_oa->o_flags)) msg = "the server did not use the checksum type specified in the original request - likely a protocol problem"; else if (new_cksum == server_cksum) msg = "changed on the client after we checksummed it - likely false positive due to mmap IO (bug 11742)"; @@ -1474,15 +1639,15 @@ static int check_write_checksum(struct obdo *oa, LCONSOLE_ERROR_MSG(0x132, "%s: BAD WRITE CHECKSUM: %s: from %s inode " DFID " object " DOSTID " extent [%llu-%llu], original client csum %x (type %x), server csum %x (type %x), client csum now %x\n", - aa->aa_cli->cl_import->imp_obd->obd_name, - msg, libcfs_nid2str(peer->nid), + obd_name, msg, libcfs_nid2str(peer->nid), oa->o_valid & OBD_MD_FLFID ? oa->o_parent_seq : (u64)0, oa->o_valid & OBD_MD_FLFID ? oa->o_parent_oid : 0, oa->o_valid & OBD_MD_FLFID ? 
oa->o_parent_ver : 0, POSTID(&oa->o_oi), aa->aa_ppga[0]->off, aa->aa_ppga[aa->aa_page_count - 1]->off + aa->aa_ppga[aa->aa_page_count - 1]->count - 1, - client_cksum, cksum_type_unpack(aa->aa_oa->o_flags), + client_cksum, + obd_cksum_type_unpack(aa->aa_oa->o_flags), server_cksum, cksum_type, new_cksum); return 1; @@ -1495,6 +1660,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) const struct lnet_process_id *peer = &req->rq_import->imp_connection->c_peer; struct client_obd *cli = aa->aa_cli; + const char *obd_name = cli->cl_import->imp_obd->obd_name; struct ost_body *body; u32 client_cksum = 0; @@ -1619,17 +1785,17 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) char *via = ""; char *router = ""; enum cksum_type cksum_type; + u32 o_flags = body->oa.o_valid & OBD_MD_FLFLAGS ? + body->oa.o_flags : 0; - cksum_type = cksum_type_unpack(body->oa.o_valid & OBD_MD_FLFLAGS ? - body->oa.o_flags : 0); + cksum_type = obd_cksum_type_unpack(o_flags); - rc = osc_checksum_bulk(rc, aa->aa_page_count, aa->aa_ppga, - OST_READ, cksum_type, &client_cksum); - if (rc < 0) { - CDEBUG(D_PAGE, - "failed to calculate checksum, rc = %d\n", rc); + rc = osc_checksum_bulk_rw(obd_name, cksum_type, rc, + aa->aa_page_count, aa->aa_ppga, + OST_READ, &client_cksum); + if (rc < 0) goto out; - } + if (req->rq_bulk && peer->nid != req->rq_bulk->bd_sender) { via = " via "; @@ -1652,7 +1818,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) "%s: BAD READ CHECKSUM: from %s%s%s inode " DFID " object " DOSTID " extent [%llu-%llu], client %x, server %x, cksum_type %x\n", - req->rq_import->imp_obd->obd_name, + obd_name, libcfs_nid2str(peer->nid), via, router, clbody->oa.o_valid & OBD_MD_FLFID ? 
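Note on the check_write_checksum() hunk above: the new switch maps each T10-PI checksum type to a guard-tag function and a sector size, then chooses between osc_checksum_bulk_t10pi() and osc_checksum_bulk(). Below is a minimal standalone sketch of that mapping. It assumes the enum values added to lustre_idl.h by this patch. The names t10_guard, t10_params and t10_params_for() are hypothetical and exist only for illustration; the obd_dif_ip_fn/obd_dif_crc_fn pointers are represented by an enum instead of being called.

```c
#include <assert.h>
#include <stdbool.h>

/* Wire values copied from enum cksum_type in lustre_idl.h (this patch). */
enum cksum_type {
	OBD_CKSUM_CRC32     = 0x00000001,
	OBD_CKSUM_ADLER     = 0x00000002,
	OBD_CKSUM_CRC32C    = 0x00000004,
	OBD_CKSUM_RESERVED  = 0x00000008,
	OBD_CKSUM_T10IP512  = 0x00000010,
	OBD_CKSUM_T10IP4K   = 0x00000020,
	OBD_CKSUM_T10CRC512 = 0x00000040,
	OBD_CKSUM_T10CRC4K  = 0x00000080,
};

/* Compile-time check in the spirit of the BUILD_BUG_ON lines in wiretest.c. */
static_assert(OBD_CKSUM_T10CRC4K == 0x80, "wire value must not change");

/* Hypothetical stand-in for the obd_dif_ip_fn / obd_dif_crc_fn pointers. */
enum t10_guard { T10_GUARD_NONE, T10_GUARD_IP, T10_GUARD_CRC };

/* Hypothetical result type; the patch keeps three local variables instead. */
struct t10_params {
	bool t10pi;           /* true when a T10-PI checksum type was requested */
	enum t10_guard guard; /* which guard-tag algorithm to apply per sector */
	int sector_size;      /* 512 or 4096 bytes, 0 for non-T10 types */
};

/* Mirror of the switch in check_write_checksum(): type -> (guard, sector). */
static struct t10_params t10_params_for(enum cksum_type type)
{
	struct t10_params p = { false, T10_GUARD_NONE, 0 };

	switch (type) {
	case OBD_CKSUM_T10IP512:
		p = (struct t10_params){ true, T10_GUARD_IP, 512 };
		break;
	case OBD_CKSUM_T10IP4K:
		p = (struct t10_params){ true, T10_GUARD_IP, 4096 };
		break;
	case OBD_CKSUM_T10CRC512:
		p = (struct t10_params){ true, T10_GUARD_CRC, 512 };
		break;
	case OBD_CKSUM_T10CRC4K:
		p = (struct t10_params){ true, T10_GUARD_CRC, 4096 };
		break;
	default:
		/* CRC32, ADLER, CRC32C: fall back to the non-T10 bulk path */
		break;
	}
	return p;
}
```

A caller would test p.t10pi first, as the patch does with its t10pi flag, and only then consult guard and sector_size; for every other checksum type the defaults leave the legacy path in effect.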
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 5d6546d..019648b 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -786,11 +786,12 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, * for algorithms we understand. The server masked off * the checksum types it doesn't support */ - if (!(ocd->ocd_cksum_types & cksum_types_supported_client())) { + if (!(ocd->ocd_cksum_types & + obd_cksum_types_supported_client())) { LCONSOLE_ERROR("The negotiation of the checksum algorithm to use with server %s failed (%x/%x), disabling checksums\n", obd2cli_tgt(imp->imp_obd), ocd->ocd_cksum_types, - cksum_types_supported_client()); + obd_cksum_types_supported_client()); return -EPROTO; } cli->cl_supp_cksum_types = ocd->ocd_cksum_types; @@ -801,7 +802,8 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, */ cli->cl_supp_cksum_types = OBD_CKSUM_ADLER; } - cli->cl_cksum_type = cksum_type_select(cli->cl_supp_cksum_types); + cli->cl_cksum_type = obd_cksum_type_select(imp->imp_obd->obd_name, + cli->cl_supp_cksum_types); if (ocd->ocd_connect_flags & OBD_CONNECT_BRW_SIZE) cli->cl_max_pages_per_rpc = diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index c566dea..01ddbee 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1123,6 +1123,18 @@ void lustre_assert_wire_constants(void) (unsigned int)OBD_CKSUM_ADLER); LASSERTF(OBD_CKSUM_CRC32C == 0x00000004UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32C); + LASSERTF(OBD_CKSUM_RESERVED == 0x00000008UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_RESERVED); + LASSERTF(OBD_CKSUM_T10IP512 == 0x00000010UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10IP512); + LASSERTF(OBD_CKSUM_T10IP4K == 0x00000020UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10IP4K); + LASSERTF(OBD_CKSUM_T10CRC512 == 0x00000040UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10CRC512); + LASSERTF(OBD_CKSUM_T10CRC4K == 0x00000080UL, "found 
0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10CRC4K); + LASSERTF(OBD_CKSUM_T10_TOP == 0x00000002UL, "found 0x%.8xUL\n", + (unsigned int)OBD_CKSUM_T10_TOP); /* Checks for struct ost_layout */ LASSERTF((int)sizeof(struct ost_layout) == 28, "found %lld\n", @@ -1372,7 +1384,10 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(OBD_FL_CKSUM_CRC32 != 0x00001000); BUILD_BUG_ON(OBD_FL_CKSUM_ADLER != 0x00002000); BUILD_BUG_ON(OBD_FL_CKSUM_CRC32C != 0x00004000); - BUILD_BUG_ON(OBD_FL_CKSUM_RSVD2 != 0x00008000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10IP512 != 0x00005000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10IP4K != 0x00006000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10CRC512 != 0x00007000); + BUILD_BUG_ON(OBD_FL_CKSUM_T10CRC4K != 0x00008000); BUILD_BUG_ON(OBD_FL_CKSUM_RSVD3 != 0x00010000); BUILD_BUG_ON(OBD_FL_SHRINK_GRANT != 0x00020000); BUILD_BUG_ON(OBD_FL_MMAP != 0x00040000); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 7cf7307..11df7b4 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -883,15 +883,37 @@ struct obd_connect_data { /* * Supported checksum algorithms. Up to 32 checksum types are supported. * (32-bit mask stored in obd_connect_data::ocd_cksum_types) - * Please update DECLARE_CKSUM_NAME/OBD_CKSUM_ALL in obd.h when adding a new - * algorithm and also the OBD_FL_CKSUM* flags. + * Please update DECLARE_CKSUM_NAME in obd_cksum.h when adding a new + * algorithm and also the OBD_FL_CKSUM* flags, OBD_CKSUM_ALL flag, + * OBD_FL_CKSUM_ALL flag and potentially OBD_CKSUM_T10_ALL flag. 
*/ enum cksum_type { - OBD_CKSUM_CRC32 = 0x00000001, - OBD_CKSUM_ADLER = 0x00000002, - OBD_CKSUM_CRC32C = 0x00000004, + OBD_CKSUM_CRC32 = 0x00000001, + OBD_CKSUM_ADLER = 0x00000002, + OBD_CKSUM_CRC32C = 0x00000004, + OBD_CKSUM_RESERVED = 0x00000008, + OBD_CKSUM_T10IP512 = 0x00000010, + OBD_CKSUM_T10IP4K = 0x00000020, + OBD_CKSUM_T10CRC512 = 0x00000040, + OBD_CKSUM_T10CRC4K = 0x00000080, }; +#define OBD_CKSUM_T10_ALL (OBD_CKSUM_T10IP512 | OBD_CKSUM_T10IP4K | \ + OBD_CKSUM_T10CRC512 | OBD_CKSUM_T10CRC4K) + +#define OBD_CKSUM_ALL (OBD_CKSUM_CRC32 | OBD_CKSUM_ADLER | OBD_CKSUM_CRC32C | \ + OBD_CKSUM_T10_ALL) + +/* + * The default checksum algorithm used on top of T10PI GRD tags for RPC. + * Considering that the checksum-of-checksums is only computing CRC32 on a + * 4KB chunk of GRD tags for a 1MB RPC for 512B sectors, or 16KB of GRD + * tags for 16MB of 4KB sectors, this is only 1/256 or 1/1024 of the + * total data being checksummed, so the checksum type used here should not + * affect overall system performance noticeably. + */ +#define OBD_CKSUM_T10_TOP OBD_CKSUM_ADLER + /* * OST requests: OBDO & OBD request records */ @@ -940,7 +962,10 @@ enum obdo_flags { OBD_FL_CKSUM_CRC32 = 0x00001000, /* CRC32 checksum type */ OBD_FL_CKSUM_ADLER = 0x00002000, /* ADLER checksum type */ OBD_FL_CKSUM_CRC32C = 0x00004000, /* CRC32C checksum type */ - OBD_FL_CKSUM_RSVD2 = 0x00008000, /* for future cksum types */ + OBD_FL_CKSUM_T10IP512 = 0x00005000, /* T10PI IP cksum, 512B sector */ + OBD_FL_CKSUM_T10IP4K = 0x00006000, /* T10PI IP cksum, 4KB sector */ + OBD_FL_CKSUM_T10CRC512 = 0x00007000, /* T10PI CRC cksum, 512B sector */ + OBD_FL_CKSUM_T10CRC4K = 0x00008000, /* T10PI CRC cksum, 4KB sector */ OBD_FL_CKSUM_RSVD3 = 0x00010000, /* for future cksum types */ OBD_FL_SHRINK_GRANT = 0x00020000, /* object shrink the grant */ OBD_FL_MMAP = 0x00040000, /* object is mmapped on the client. 
@@ -953,11 +978,16 @@ enum obdo_flags { OBD_FL_SHORT_IO = 0x00400000, /* short io request */ /* OBD_FL_LOCAL_MASK = 0xF0000000, was local-only flags until 2.10 */ - /* Note that while these checksum values are currently separate bits, - * in 2.x we can actually allow all values from 1-31 if we wanted. + /* + * Note that while the original checksum values were separate bits, + * in 2.x we can actually allow all values from 1-31. T10-PI checksum + * types already use values which are not separate bits. */ OBD_FL_CKSUM_ALL = (OBD_FL_CKSUM_CRC32 | OBD_FL_CKSUM_ADLER | - OBD_FL_CKSUM_CRC32C), + OBD_FL_CKSUM_CRC32C | OBD_FL_CKSUM_T10IP512 | + OBD_FL_CKSUM_T10IP4K | + OBD_FL_CKSUM_T10CRC512 | + OBD_FL_CKSUM_T10CRC4K), }; /* diff --git a/net/lnet/libcfs/linux-crypto.c b/net/lnet/libcfs/linux-crypto.c index 53285c2..532fab4 100644 --- a/net/lnet/libcfs/linux-crypto.c +++ b/net/lnet/libcfs/linux-crypto.c @@ -318,6 +318,9 @@ int cfs_crypto_hash_final(struct ahash_request *req, * The speed is stored internally in the cfs_crypto_hash_speeds[] array, and * is available through the cfs_crypto_hash_speed() function. * + * This function needs to stay the same as obd_t10_performance_test() so that + * the speeds are comparable. 
+ *
  * @hash_alg hash algorithm id (CFS_HASH_ALG_*)
  * @buf data buffer on which to compute the hash
  * @buf_len length of @buf on which to compute hash

From patchwork Thu Feb 27 21:08:39 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409731
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:39 -0500
Message-Id: <1582838290-17243-52-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 051/622] lustre: ldlm: Reduce debug to console during eviction
Cc: Lustre Development List

From: Patrick Farrell

During an eviction, Lustre calls ldlm_namespace_cleanup, and it
will sometimes end up dumping all of the locks on a particular
resource to the console log (ldlm_resource_complain), which is
very wasteful and only rarely helpful.

Move the debug level for this to D_NETERROR since it is in the
default debug mask.

Cray-bug-id: LUS-1418
WC-bug-id: https://jira.whamcloud.com/browse/LU-10648
Lustre-commit: f92fcb863cb9 ("LU-10648 ldlm: Reduce debug to console during eviction")
Signed-off-by: Chris Horn
Signed-off-by: Patrick Farrell
Reviewed-on: https://review.whamcloud.com/31237
Reviewed-by: Sergey Cheremencev
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/ldlm/ldlm_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 7fe8a8b..5d73132 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -819,7 +819,8 @@ static int ldlm_resource_complain(struct cfs_hash *hs, struct cfs_hash_bd *bd,
 	       ldlm_ns_name(ldlm_res_to_ns(res)), PLDLMRES(res), res,
 	       atomic_read(&res->lr_refcount) - 1);
 
-	ldlm_resource_dump(D_ERROR, res);
+	/* Use D_NETERROR since it is in the default mask */
+	ldlm_resource_dump(D_NETERROR, res);
 	unlock_res(res);
 	return 0;
 }

From patchwork Thu Feb 27 21:08:40 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409757
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:08:40 -0500
Message-Id: <1582838290-17243-53-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 052/622] lustre: ptlrpc: idle connections can disconnect
Cc: Lustre Development List

From: Alex Zhuravlev

- when new request is being allocated ptlrpc initiates connection
  if it's not connected yet
- if the import is idle (no locks, no active RPCs, no non-PING reply
  for last osc_idle_timeout seconds), then pinger tries to disconnect
  asynchronously
- currently only client-to-OST connections can be idle
- lctl set_param osc.*.idle_timeout=N controls new feature:
  N=0 - disable
  N>0 - seconds to idle before disconnect
- lctl set_param osc.*.idle_connect=N to reconnect if idle (N is
  positive number)
- OSC module parameter osc_idle_timeout controls default idle timeout
  and set to 20 seconds by default

WC-bug-id: https://jira.whamcloud.com/browse/LU-7236
Lustre-commit: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Signed-off-by: Alex Zhuravlev
Reviewed-on: https://review.whamcloud.com/16682
Reviewed-by: Dmitry Eremin
Reviewed-by: Andreas Dilger
Reviewed-by: James Simmons
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_import.h |  17 +++--
 fs/lustre/include/lustre_net.h    |   1 +
 fs/lustre/lov/lov_ea.c            |   3 +-
 fs/lustre/lov/lov_obd.c           |   8 ++-
 fs/lustre/lov/lov_request.c       |  25 ++++++--
 fs/lustre/osc/lproc_osc.c         |  66 +++++++++++++++++++
 fs/lustre/osc/osc_request.c       |   3 +
 fs/lustre/ptlrpc/client.c         |  32 +++++++++-
 fs/lustre/ptlrpc/events.c         |   3 +-
 fs/lustre/ptlrpc/import.c         | 130 ++++++++++++++++++++++++++++++--------
 fs/lustre/ptlrpc/pinger.c         |  30 +++++++++
 11 files changed, 275 insertions(+), 43 deletions(-)

diff --git a/fs/lustre/include/lustre_import.h
b/fs/lustre/include/lustre_import.h index 0d7bb0f..c4452e1 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -96,6 +96,8 @@ enum lustre_imp_state { LUSTRE_IMP_RECOVER = 8, LUSTRE_IMP_FULL = 9, LUSTRE_IMP_EVICTED = 10, + LUSTRE_IMP_IDLE = 11, + LUSTRE_IMP_LAST }; /** Returns test string representation of numeric import state @state */ @@ -104,10 +106,10 @@ static inline char *ptlrpc_import_state_name(enum lustre_imp_state state) static char *import_state_names[] = { "", "CLOSED", "NEW", "DISCONN", "CONNECTING", "REPLAY", "REPLAY_LOCKS", "REPLAY_WAIT", - "RECOVER", "FULL", "EVICTED", + "RECOVER", "FULL", "EVICTED", "IDLE", }; - LASSERT(state <= LUSTRE_IMP_EVICTED); + LASSERT(state < LUSTRE_IMP_LAST); return import_state_names[state]; } @@ -226,12 +228,14 @@ struct obd_import { int imp_state_hist_idx; /** Current import generation. Incremented on every reconnect */ int imp_generation; + /* Idle connection initiated at this generation */ + int imp_initiated_at; /** Incremented every time we send reconnection request */ u32 imp_conn_cnt; - /** - * \see ptlrpc_free_committed remembers imp_generation value here - * after a check to save on unnecessary replay list iterations - */ + /* + * \see ptlrpc_free_committed remembers imp_generation value here + * after a check to save on unnecessary replay list iterations + */ int imp_last_generation_checked; /** Last transno we replayed */ u64 imp_last_replay_transno; @@ -299,6 +303,7 @@ struct obd_import { imp_connected:1; u32 imp_connect_op; + u32 imp_idle_timeout; struct obd_connect_data imp_connect_data; u64 imp_connect_flags_orig; u64 imp_connect_flags2_orig; diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 0231011..674803c 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1988,6 +1988,7 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf, int ptlrpc_connect_import(struct obd_import 
*imp); int ptlrpc_init_import(struct obd_import *imp); int ptlrpc_disconnect_import(struct obd_import *imp, int noclose); +int ptlrpc_disconnect_and_idle_import(struct obd_import *imp); int ptlrpc_import_recovery_state_machine(struct obd_import *imp); /* ptlrpc/pack_generic.c */ diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c index 41308d3..edca3b0 100644 --- a/fs/lustre/lov/lov_ea.c +++ b/fs/lustre/lov/lov_ea.c @@ -70,7 +70,8 @@ static loff_t lov_tgt_maxbytes(struct lov_tgt_desc *tgt) return maxbytes; spin_lock(&imp->imp_lock); - if (imp->imp_state == LUSTRE_IMP_FULL && + if ((imp->imp_state == LUSTRE_IMP_FULL || + imp->imp_state == LUSTRE_IMP_IDLE) && (imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_MAXBYTES) && imp->imp_connect_data.ocd_maxbytes > 0) maxbytes = imp->imp_connect_data.ocd_maxbytes; diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 9449aa9..35eaa1f 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -977,17 +977,21 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len, struct obd_ioctl_data *data = karg; struct obd_device *osc_obd; struct obd_statfs stat_buf = { 0 }; + struct obd_import *imp; u32 index; u32 flags; - memcpy(&index, data->ioc_inlbuf2, sizeof(u32)); + memcpy(&index, data->ioc_inlbuf2, sizeof(index)); if (index >= count) return -ENODEV; if (!lov->lov_tgts[index]) /* Try again with the next index */ return -EAGAIN; - if (!lov->lov_tgts[index]->ltd_active) + + imp = lov->lov_tgts[index]->ltd_exp->exp_obd->u.cli.cl_import; + if (!lov->lov_tgts[index]->ltd_active && + imp->imp_state != LUSTRE_IMP_IDLE) return -ENODATA; osc_obd = class_exp2obd(lov->lov_tgts[index]->ltd_exp); diff --git a/fs/lustre/lov/lov_request.c b/fs/lustre/lov/lov_request.c index 864e410..added19 100644 --- a/fs/lustre/lov/lov_request.c +++ b/fs/lustre/lov/lov_request.c @@ -99,6 +99,7 @@ static int lov_check_and_wait_active(struct lov_obd *lov, int ost_idx) { int cnt = 0; struct lov_tgt_desc 
*tgt; + struct obd_import *imp = NULL; int rc = 0; mutex_lock(&lov->lov_lock); @@ -115,7 +116,13 @@ static int lov_check_and_wait_active(struct lov_obd *lov, int ost_idx) goto out; } - if (tgt->ltd_exp && class_exp2cliimp(tgt->ltd_exp)->imp_connect_tried) { + if (tgt->ltd_exp) + imp = class_exp2cliimp(tgt->ltd_exp); + if (imp && imp->imp_connect_tried) { + rc = 0; + goto out; + } + if (imp && imp->imp_state == LUSTRE_IMP_IDLE) { rc = 0; goto out; } @@ -302,11 +309,10 @@ int lov_prep_statfs_set(struct obd_device *obd, struct obd_info *oinfo, /* We only get block data from the OBD */ for (i = 0; i < lov->desc.ld_tgt_count; i++) { + struct lov_tgt_desc *ltd = lov->lov_tgts[i]; struct lov_request *req; - if (!lov->lov_tgts[i] || - (oinfo->oi_flags & OBD_STATFS_NODELAY && - !lov->lov_tgts[i]->ltd_active)) { + if (!ltd) { CDEBUG(D_HA, "lov idx %d inactive\n", i); continue; } @@ -314,13 +320,20 @@ int lov_prep_statfs_set(struct obd_device *obd, struct obd_info *oinfo, /* skip targets that have been explicitly disabled by the * administrator */ - if (!lov->lov_tgts[i]->ltd_exp) { + if (!ltd->ltd_exp) { CDEBUG(D_HA, "lov idx %d administratively disabled\n", i); continue; } - if (!lov->lov_tgts[i]->ltd_active) + if (oinfo->oi_flags & OBD_STATFS_NODELAY && + class_exp2cliimp(ltd->ltd_exp)->imp_state != + LUSTRE_IMP_IDLE && !ltd->ltd_active) { + CDEBUG(D_HA, "lov idx %d inactive\n", i); + continue; + } + + if (!ltd->ltd_active) lov_check_and_wait_active(lov, i); req = kzalloc(sizeof(*req), GFP_NOFS); diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 605a236..fd84393 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -598,6 +598,68 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO(osc_unstable_stats); +static int osc_idle_timeout_seq_show(struct seq_file *m, void *v) +{ + struct obd_device *obd = m->private; + struct client_obd *cli = &obd->u.cli; + + seq_printf(m, "%u\n", 
cli->cl_import->imp_idle_timeout); + return 0; +} + +static ssize_t osc_idle_timeout_seq_write(struct file *f, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct obd_device *obd = ((struct seq_file *)f->private_data)->private; + struct client_obd *cli = &obd->u.cli; + struct ptlrpc_request *req; + unsigned int val; + int rc; + + rc = kstrtouint_from_user(buffer, count, 0, &val); + if (rc) + return rc; + + if (val > CONNECTION_SWITCH_MAX) + return -ERANGE; + + cli->cl_import->imp_idle_timeout = val; + + /* to initiate the connection if it's in IDLE state */ + if (!val) { + req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_STATFS); + if (req) + ptlrpc_req_finished(req); + } + + return count; +} +LPROC_SEQ_FOPS(osc_idle_timeout); + +static int osc_idle_connect_seq_show(struct seq_file *m, void *v) +{ + return 0; +} + +static ssize_t osc_idle_connect_seq_write(struct file *f, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct obd_device *dev = ((struct seq_file *)f->private_data)->private; + struct client_obd *cli = &dev->u.cli; + struct ptlrpc_request *req; + + /* to initiate the connection if it's in IDLE state */ + req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_STATFS); + if (req) + ptlrpc_req_finished(req); + ptlrpc_pinger_force(cli->cl_import); + + return count; +} +LPROC_SEQ_FOPS(osc_idle_connect); + LPROC_SEQ_FOPS_RO_TYPE(osc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(osc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(osc, timeouts); @@ -625,6 +687,10 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) .fops = &osc_pinger_recov_fops }, { .name = "unstable_stats", .fops = &osc_unstable_stats_fops }, + { .name = "idle_timeout", + .fops = &osc_idle_timeout_fops }, + { .name = "idle_connect", + .fops = &osc_idle_connect_fops }, { NULL } }; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 9ac9c84..e341fcc 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -61,6 
+61,8 @@ /* max memory used for request pool, unit is MB */ static unsigned int osc_reqpool_mem_max = 5; module_param(osc_reqpool_mem_max, uint, 0444); +static int osc_idle_timeout = 20; +module_param(osc_idle_timeout, uint, 0644); struct osc_async_args { struct obd_info *aa_oi; @@ -3214,6 +3216,7 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg) spin_lock(&osc_shrink_lock); list_add_tail(&cli->cl_shrink_list, &osc_shrink_list); spin_unlock(&osc_shrink_lock); + cli->cl_import->imp_idle_timeout = osc_idle_timeout; return rc; diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 424db55..9b41c12 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -885,6 +885,28 @@ struct ptlrpc_request *__ptlrpc_request_alloc(struct obd_import *imp, const struct req_format *format) { struct ptlrpc_request *request; + int connect = 0; + + if (unlikely(imp->imp_state == LUSTRE_IMP_IDLE)) { + int rc; + + CDEBUG(D_INFO, "%s: connect at new req\n", + imp->imp_obd->obd_name); + spin_lock(&imp->imp_lock); + if (imp->imp_state == LUSTRE_IMP_IDLE) { + imp->imp_generation++; + imp->imp_initiated_at = imp->imp_generation; + imp->imp_state = LUSTRE_IMP_NEW; + connect = 1; + } + spin_unlock(&imp->imp_lock); + if (connect) { + rc = ptlrpc_connect_import(imp); + if (rc < 0) + return NULL; + ptlrpc_pinger_add_import(imp); + } + } request = __ptlrpc_request_alloc(imp, pool); if (!request) @@ -1075,6 +1097,7 @@ void ptlrpc_set_add_req(struct ptlrpc_request_set *set, return; } + LASSERT(req->rq_import->imp_state != LUSTRE_IMP_IDLE); LASSERT(list_empty(&req->rq_set_chain)); /* The set takes over the caller's request reference */ @@ -1183,7 +1206,9 @@ static int ptlrpc_import_delay_req(struct obd_import *imp, if (atomic_read(&imp->imp_inval_count) != 0) { DEBUG_REQ(D_ERROR, req, "invalidate in flight"); *status = -EIO; - } else if (req->rq_no_delay) { + } else if (req->rq_no_delay && + imp->imp_generation != imp->imp_initiated_at) { + /* ignore 
nodelay for requests initiating connections */ *status = -EWOULDBLOCK; } else if (req->rq_allow_replay && (imp->imp_state == LUSTRE_IMP_REPLAY || @@ -1842,8 +1867,11 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) spin_unlock(&imp->imp_lock); goto interpret; } + /* ignore on just initiated connections */ if (ptlrpc_no_resend(req) && - !req->rq_wait_ctx) { + !req->rq_wait_ctx && + imp->imp_generation != + imp->imp_initiated_at) { req->rq_status = -ENOTCONN; ptlrpc_rqphase_move(req, RQ_PHASE_INTERPRET); diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c index 93a59b8..87c0ab7 100644 --- a/fs/lustre/ptlrpc/events.c +++ b/fs/lustre/ptlrpc/events.c @@ -164,7 +164,8 @@ void reply_in_callback(struct lnet_event *ev) ev->mlength, ev->offset, req->rq_replen); } - req->rq_import->imp_last_reply_time = ktime_get_real_seconds(); + if (lustre_msg_get_opc(req->rq_reqmsg) != OBD_PING) + req->rq_import->imp_last_reply_time = ktime_get_real_seconds(); out_wake: /* NB don't unlock till after wakeup; req can disappear under us diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 019648b..b90f78c 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -925,6 +925,21 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, } if (rc) { + struct ptlrpc_request *free_req; + struct ptlrpc_request *tmp; + + /* abort all delayed requests initiated connection */ + list_for_each_entry_safe(free_req, tmp, &imp->imp_delayed_list, + rq_list) { + spin_lock(&free_req->rq_lock); + if (free_req->rq_no_resend) { + free_req->rq_err = 1; + free_req->rq_status = -EIO; + ptlrpc_client_wake_req(free_req); + } + spin_unlock(&free_req->rq_lock); + } + /* if this reconnect to busy export - not need select new target * for connecting */ @@ -1454,14 +1469,11 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) return rc; } -int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) +static struct 
ptlrpc_request *ptlrpc_disconnect_prep_req(struct obd_import *imp) { struct ptlrpc_request *req; int rq_opc, rc = 0; - if (imp->imp_obd->obd_force) - goto set_state; - switch (imp->imp_connect_op) { case OST_CONNECT: rq_opc = OST_DISCONNECT; @@ -1477,9 +1489,47 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) CERROR("%s: don't know how to disconnect from %s (connect_op %d): rc = %d\n", imp->imp_obd->obd_name, obd2cli_tgt(imp->imp_obd), imp->imp_connect_op, rc); - return rc; + return ERR_PTR(rc); } + req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_DISCONNECT, + LUSTRE_OBD_VERSION, rq_opc); + if (!req) + return NULL; + + /* We are disconnecting, do not retry a failed DISCONNECT rpc if + * it fails. We can get through the above with a down server + * if the client doesn't know the server is gone yet. + */ + req->rq_no_resend = 1; + + /* We want client umounts to happen quickly, no matter the + * server state... + */ + req->rq_timeout = min_t(int, req->rq_timeout, + INITIAL_CONNECT_TIMEOUT); + + IMPORT_SET_STATE(imp, LUSTRE_IMP_CONNECTING); + req->rq_send_state = LUSTRE_IMP_CONNECTING; + ptlrpc_request_set_replen(req); + + return req; +} + +int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) +{ + struct ptlrpc_request *req; + int rc = 0; + + if (imp->imp_obd->obd_force) + goto set_state; + + /* probably the import has been disconnected already being idle */ + spin_lock(&imp->imp_lock); + if (imp->imp_state == LUSTRE_IMP_IDLE) + goto out; + spin_unlock(&imp->imp_lock); + if (ptlrpc_import_in_recovery(imp)) { long timeout_jiffies; time64_t timeout; @@ -1512,27 +1562,13 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) goto out; spin_unlock(&imp->imp_lock); - req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_DISCONNECT, - LUSTRE_OBD_VERSION, rq_opc); - if (req) { - /* We are disconnecting, do not retry a failed DISCONNECT rpc if - * it fails. 
We can get through the above with a down server - * if the client doesn't know the server is gone yet. - */ - req->rq_no_resend = 1; - - /* We want client umounts to happen quickly, no matter the - * server state... - */ - req->rq_timeout = min_t(int, req->rq_timeout, - INITIAL_CONNECT_TIMEOUT); - - IMPORT_SET_STATE(imp, LUSTRE_IMP_CONNECTING); - req->rq_send_state = LUSTRE_IMP_CONNECTING; - ptlrpc_request_set_replen(req); - rc = ptlrpc_queue_wait(req); - ptlrpc_req_finished(req); + req = ptlrpc_disconnect_prep_req(imp); + if (IS_ERR(req)) { + rc = PTR_ERR(req); + goto set_state; } + rc = ptlrpc_queue_wait(req); + ptlrpc_req_finished(req); set_state: spin_lock(&imp->imp_lock); @@ -1551,6 +1587,50 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) } EXPORT_SYMBOL(ptlrpc_disconnect_import); +static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, + struct ptlrpc_request *req, + void *data, int rc) +{ + struct obd_import *imp = req->rq_import; + + LASSERT(imp->imp_state == LUSTRE_IMP_CONNECTING); + spin_lock(&imp->imp_lock); + IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_IDLE); + memset(&imp->imp_remote_handle, 0, sizeof(imp->imp_remote_handle)); + spin_unlock(&imp->imp_lock); + + return 0; +} + +int ptlrpc_disconnect_and_idle_import(struct obd_import *imp) +{ + struct ptlrpc_request *req; + + if (imp->imp_obd->obd_force) + return 0; + + if (ptlrpc_import_in_recovery(imp)) + return 0; + + spin_lock(&imp->imp_lock); + if (imp->imp_state != LUSTRE_IMP_FULL) { + spin_unlock(&imp->imp_lock); + return 0; + } + spin_unlock(&imp->imp_lock); + + req = ptlrpc_disconnect_prep_req(imp); + if (IS_ERR(req)) + return PTR_ERR(req); + + CDEBUG(D_INFO, "%s: disconnect\n", imp->imp_obd->obd_name); + req->rq_interpret_reply = ptlrpc_disconnect_idle_interpret; + ptlrpcd_add_req(req); + + return 0; +} +EXPORT_SYMBOL(ptlrpc_disconnect_and_idle_import); + /* Adaptive Timeout utils */ /* diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c index 
762fd0e..c565e2d 100644 --- a/fs/lustre/ptlrpc/pinger.c +++ b/fs/lustre/ptlrpc/pinger.c @@ -79,10 +79,40 @@ int ptlrpc_obd_ping(struct obd_device *obd) } EXPORT_SYMBOL(ptlrpc_obd_ping); +static bool ptlrpc_check_import_is_idle(struct obd_import *imp) +{ + struct ldlm_namespace *ns = imp->imp_obd->obd_namespace; + time64_t now; + + if (!imp->imp_idle_timeout) + return false; + /* 4 comes from: + * - client_obd_setup() - hashed import + * - ptlrpcd_alloc_work() + * - ptlrpcd_alloc_work() + * - ptlrpc_pinger_add_import + */ + if (atomic_read(&imp->imp_refcount) > 4) + return false; + + /* any lock increases ns_bref being a resource holder */ + if (ns && atomic_read(&ns->ns_bref) > 0) + return false; + + now = ktime_get_real_seconds(); + if (now - imp->imp_last_reply_time < imp->imp_idle_timeout) + return false; + + return true; +} + static int ptlrpc_ping(struct obd_import *imp) { struct ptlrpc_request *req; + if (ptlrpc_check_import_is_idle(imp)) + return ptlrpc_disconnect_and_idle_import(imp); + req = ptlrpc_prep_ping(imp); if (!req) { CERROR("OOM trying to ping %s->%s\n", From patchwork Thu Feb 27 21:08:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409735 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 08043138D for ; Thu, 27 Feb 2020 21:20:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E4178246A1 for ; Thu, 27 Feb 2020 21:20:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E4178246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; 
spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ADD0520111E; Thu, 27 Feb 2020 13:19:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73AE221FA5B for ; Thu, 27 Feb 2020 13:18:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CA475E0E; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C8A0246F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:41 -0500 Message-Id: <1582838290-17243-54-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 053/622] lustre: osc: truncate does not update blocks count on client X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Abrarahmed Momin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain 'truncate' call correctly updates the server side with correct size and blocks count. However, on the client side all the metadata are correctly updated except the blocks count, which still reflects the old count prior to truncate call. 
This patch fixes the issue on the client by modifying osc_io_setattr_end() to update attr with the new block count. A new test case is added under sanity to verify that the block counts are correctly updated after a truncate call.

Co-authored-by: Abrarahmed Momin
WC-bug-id: https://jira.whamcloud.com/browse/LU-10370
Lustre-commit: 6115eb7fd55a ("LU-10370 ofd: truncate does not update blocks count on client")
Signed-off-by: Abrarahmed Momin
Signed-off-by: Arshad Hussain
Reviewed-on: https://review.whamcloud.com/31073
Reviewed-by: Jinshan Xiong
Reviewed-by: Andreas Dilger
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
fs/lustre/osc/osc_io.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index 970e8a7..1485962 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -588,6 +588,9 @@ void osc_io_setattr_end(const struct lu_env *env, struct osc_io *oio = cl2osc_io(env, slice); struct cl_object *obj = slice->cis_obj; struct osc_async_cbargs *cbargs = &oio->oi_cbarg; + struct cl_attr *attr = &osc_env_info(env)->oti_attr; + struct obdo *oa = &oio->oi_oa; + unsigned int cl_valid = 0; int result = 0; if (cbargs->opc_rpc_sent) {
@@ -609,6 +612,14 @@ void osc_io_setattr_end(const struct lu_env *env, if (cl_io_is_trunc(io)) { u64 size = io->u.ci_setattr.sa_attr.lvb_size; + cl_object_attr_lock(obj); + if (oa->o_valid & OBD_MD_FLBLOCKS) { + attr->cat_blocks = oa->o_blocks; + cl_valid |= CAT_BLOCKS; + } + + cl_object_attr_update(env, obj, attr, cl_valid); + cl_object_attr_unlock(obj); osc_trunc_check(env, io, oio, size); osc_cache_truncate_end(env, oio->oi_trunc); oio->oi_trunc = NULL;

From patchwork Thu Feb 27 21:08:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409761 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 80E49159A for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 69B13246A0 for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 69B13246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B2DF421FDF9; Thu, 27 Feb 2020 13:20:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C00A921FAE0 for ; Thu, 27 Feb 2020 13:18:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CD70EE11; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CBB9246A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:42 -0500 Message-Id: <1582838290-17243-55-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 054/622] lustre: ptlrpc: add LOCK_CONVERT connection flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Mikhail Pershin

Add a LOCK_CONVERT connection flag so that the lock convert feature is not used with old servers.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10175
Lustre-commit: 44a2092f08ca ("LU-10175 ptlrpc: add LOCK_CONVERT connection flag")
Signed-off-by: Mikhail Pershin
Reviewed-on: https://review.whamcloud.com/32593
Reviewed-by: Andreas Dilger
Signed-off-by: James Simmons
---
fs/lustre/obdclass/lprocfs_status.c | 1 +
fs/lustre/ptlrpc/wiretest.c | 2 ++
include/uapi/linux/lustre/lustre_idl.h | 1 +
3 files changed, 4 insertions(+)

diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index e2575b4..385359f 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -118,6 +118,7 @@ "unknown", /* 0x10 */ "flr", /* 0x20 */ "wbc", /* 0x40 */ + "lock_convert", /* 0x80 */ NULL };
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 01ddbee..202c5ab 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1117,6 +1117,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_FLR); LASSERTF(OBD_CONNECT2_WBC_INTENTS == 0x40ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_WBC_INTENTS); + LASSERTF(OBD_CONNECT2_LOCK_CONVERT == 0x80ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_LOCK_CONVERT); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 11df7b4..798aa57 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -799,6 +799,7 @@ struct ptlrpc_body_v2 { * under client-held parent * locks */ +#define OBD_CONNECT2_LOCK_CONVERT
0x80ULL /* IBITS lock convert support */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:08:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409739 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9E3DB138D for ; Thu, 27 Feb 2020 21:20:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8661D2084E for ; Thu, 27 Feb 2020 21:20:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8661D2084E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DF37F21FF16; Thu, 27 Feb 2020 13:20:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0FDB521FAAA for ; Thu, 27 Feb 2020 13:18:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D11ADE1E; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CEECB46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:43 -0500 Message-Id: 
<1582838290-17243-56-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 055/622] lustre: ldlm: handle lock converts in cancel handler X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Mikhail Pershin

- Use cancel portals and high-priority handling for lock converts. Update ldlm_cancel_handler to understand the LDLM_CONVERT RPC for that.
- Use ns_dirty_age_limit for lock converts: don't convert locks that are too old.
- Check for empty converts and skip them.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10175
Lustre-commit: 541902a3f934 ("LU-10175 ldlm: handle lock converts in cancel handler")
Signed-off-by: Mikhail Pershin
Reviewed-on: https://review.whamcloud.com/32314
Reviewed-by: Fan Yong
Reviewed-by: Andreas Dilger
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
fs/lustre/include/lustre_export.h | 6 ++++++
fs/lustre/ldlm/ldlm_inodebits.c | 19 ++++++++++++++-----
fs/lustre/ldlm/ldlm_request.c | 39 +++++++++++++++++++++++++++++++--------
fs/lustre/llite/llite_lib.c | 2 +-
fs/lustre/llite/namei.c | 7 ++++++-
5 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h
index de3b109..57cf68b 100644
--- a/fs/lustre/include/lustre_export.h
+++ b/fs/lustre/include/lustre_export.h
@@ -269,9 +269,15 @@ static inline int exp_connect_flr(struct obd_export *exp) return !!(exp_connect_flags2(exp) & OBD_CONNECT2_FLR); } +static inline int exp_connect_lock_convert(struct obd_export *exp) +{ + return
!!(exp_connect_flags2(exp) & OBD_CONNECT2_LOCK_CONVERT); +} + struct obd_export *class_conn2export(struct lustre_handle *conn); #define KKUC_CT_DATA_MAGIC 0x092013cea + struct kkuc_ct_data { u32 kcd_magic; u32 kcd_archive; diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c index ddbf8d4..9cf3c5f 100644 --- a/fs/lustre/ldlm/ldlm_inodebits.c +++ b/fs/lustre/ldlm/ldlm_inodebits.c @@ -81,7 +81,7 @@ int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop) /* Just return if there are no conflicting bits */ if ((lock->l_policy_data.l_inodebits.bits & to_drop) == 0) { - LDLM_WARN(lock, "try to drop unset bits %#llx/%#llx\n", + LDLM_WARN(lock, "try to drop unset bits %#llx/%#llx", lock->l_policy_data.l_inodebits.bits, to_drop); /* nothing to do */ return 0; @@ -111,7 +111,7 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) ldlm_lock2handle(lock, &lockh); lock_res_and_lock(lock); - /* check if all bits are cancelled */ + /* check if all bits are blocked */ if (!(lock->l_policy_data.l_inodebits.bits & ~drop_bits)) { unlock_res_and_lock(lock); /* return error to continue with cancel */ @@ -119,6 +119,13 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) goto exit; } + /* check if no common bits, consider this as successful convert */ + if (!(lock->l_policy_data.l_inodebits.bits & drop_bits)) { + unlock_res_and_lock(lock); + rc = 0; + goto exit; + } + /* check if there is race with cancel */ if (ldlm_is_canceling(lock) || ldlm_is_cancel(lock)) { unlock_res_and_lock(lock); @@ -167,9 +174,11 @@ int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) rc = ldlm_cli_convert(lock, &flags); if (rc) { lock_res_and_lock(lock); - ldlm_clear_converting(lock); - ldlm_set_cbpending(lock); - ldlm_set_bl_ast(lock); + if (ldlm_is_converting(lock)) { + ldlm_clear_converting(lock); + ldlm_set_cbpending(lock); + ldlm_set_bl_ast(lock); + } unlock_res_and_lock(lock); goto exit; } diff --git a/fs/lustre/ldlm/ldlm_request.c 
b/fs/lustre/ldlm/ldlm_request.c index 5833f59..ad54bd2 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -854,7 +854,7 @@ static int lock_convert_interpret(const struct lu_env *env, aa->lock_handle.cookie, reply->lock_handle.cookie, req->rq_export->exp_client_uuid.uuid, libcfs_id2str(req->rq_peer)); - rc = -ESTALE; + rc = ELDLM_NO_LOCK_DATA; goto out; } @@ -905,15 +905,30 @@ static int lock_convert_interpret(const struct lu_env *env, unlock_res_and_lock(lock); out: if (rc) { + int flag; + lock_res_and_lock(lock); if (ldlm_is_converting(lock)) { ldlm_clear_converting(lock); ldlm_set_cbpending(lock); ldlm_set_bl_ast(lock); + lock->l_policy_data.l_inodebits.cancel_bits = 0; } unlock_res_and_lock(lock); - } + /* fallback to normal lock cancel. If rc means there is no + * valid lock on server, do only local cancel + */ + if (rc == ELDLM_NO_LOCK_DATA) + flag = LCF_LOCAL; + else + flag = LCF_ASYNC; + + rc = ldlm_cli_cancel(&aa->lock_handle, flag); + if (rc < 0) + LDLM_DEBUG(lock, "failed to cancel lock: rc = %d\n", + rc); + } LDLM_LOCK_PUT(lock); return rc; } @@ -942,6 +957,15 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) return -EINVAL; } + /* this is better to check earlier and it is done so already, + * but this check is kept too as final one to issue an error + * if any new code will miss such check. + */ + if (!exp_connect_lock_convert(exp)) { + LDLM_ERROR(lock, "server doesn't support lock convert\n"); + return -EPROTO; + } + if (lock->l_resource->lr_type != LDLM_IBITS) { LDLM_ERROR(lock, "convert works with IBITS locks only."); return -EINVAL; @@ -970,13 +994,12 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) ptlrpc_request_set_replen(req); - /* That could be useful to use cancel portals for convert as well - * as high-priority handling. This will require changes in - * ldlm_cancel_handler to understand convert RPC as well. 
- * - * req->rq_request_portal = LDLM_CANCEL_REQUEST_PORTAL; - * req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; + /* + * Use cancel portals for convert as well as high-priority handling. */ + req->rq_request_portal = LDLM_CANCEL_REQUEST_PORTAL; + req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; + ptlrpc_at_set_req_timeout(req); if (exp->exp_obd->obd_svc_stats) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index dff349f..0844318 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -209,7 +209,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_GRANT_PARAM | OBD_CONNECT_SHORTIO | OBD_CONNECT_FLAGS2; - data->ocd_connect_flags2 = OBD_CONNECT2_FLR; + data->ocd_connect_flags2 = OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 8b1a1ca..f835abb 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -371,11 +371,16 @@ void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) */ int ll_md_need_convert(struct ldlm_lock *lock) { + struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); struct inode *inode; u64 wanted = lock->l_policy_data.l_inodebits.cancel_bits; u64 bits = lock->l_policy_data.l_inodebits.bits & ~wanted; enum ldlm_mode mode = LCK_MINMODE; + if (!lock->l_conn_export || + !exp_connect_lock_convert(lock->l_conn_export)) + return 0; + if (!wanted || !bits || ldlm_is_cancel(lock)) return 0; @@ -410,7 +415,7 @@ int ll_md_need_convert(struct ldlm_lock *lock) lock_res_and_lock(lock); if (ktime_after(ktime_get(), ktime_add(lock->l_last_used, - ktime_set(10, 0)))) { + ktime_set(ns->ns_dirty_age_limit, 0)))) { unlock_res_and_lock(lock); return 0; } From patchwork Thu Feb 27 21:08:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James 
Simmons X-Patchwork-Id: 11409743 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2832514BC for ; Thu, 27 Feb 2020 21:21:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1078E2469F for ; Thu, 27 Feb 2020 21:21:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1078E2469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3469921FD1F; Thu, 27 Feb 2020 13:20:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 690E321FAAA for ; Thu, 27 Feb 2020 13:18:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D4A06E1F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D1AD0468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:44 -0500 Message-Id: <1582838290-17243-57-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 056/622] lustre: ptlrpc: Serialize procfs access to scp_hist_reqs 
using mutex X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Andriy Skulysh

The scp_hist_reqs list can be quite long, so a lot of userland processes can waste CPU cycles spinning on the spinlock.

Cray-bug-id: LUS-5833
WC-bug-id: https://jira.whamcloud.com/browse/LU-11004
Lustre-commit: 413a738a37d7 ("LU-11004 ptlrpc: Serialize procfs access to scp_hist_reqs using mutex")
Signed-off-by: Andriy Skulysh
Reviewed-by: Andrew Perepechko
Reviewed-by: Alexander Boyko
Reviewed-on: https://review.whamcloud.com/32307
Reviewed-by: Alexandr Boyko
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
fs/lustre/include/lustre_net.h | 2 ++
fs/lustre/ptlrpc/lproc_ptlrpc.c | 7 +++++++
fs/lustre/ptlrpc/service.c | 1 +
3 files changed, 10 insertions(+)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index 674803c..cf13555 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -1543,6 +1543,8 @@ struct ptlrpc_service_part { * threads starting & stopping are also protected by this lock.
*/ spinlock_t scp_lock __cfs_cacheline_aligned; + /* userland serialization */ + struct mutex scp_mutex; /** total # req buffer descs allocated */ int scp_nrqbds_total; /** # posted request buffers for receiving */ diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index e48a4e8..0efbcfc 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -869,10 +869,12 @@ struct ptlrpc_srh_iterator { if (i > cpt) /* make up the lowest position for this CPT */ *pos = PTLRPC_REQ_CPT2POS(svc, i); + mutex_lock(&svcpt->scp_mutex); spin_lock(&svcpt->scp_lock); rc = ptlrpc_lprocfs_svc_req_history_seek(svcpt, srhi, PTLRPC_REQ_POS2SEQ(svc, *pos)); spin_unlock(&svcpt->scp_lock); + mutex_unlock(&svcpt->scp_mutex); if (rc == 0) { *pos = PTLRPC_REQ_SEQ2POS(svc, srhi->srhi_seq); srhi->srhi_idx = i; @@ -914,9 +916,11 @@ struct ptlrpc_srh_iterator { seq = srhi->srhi_seq + (1 << svc->srv_cpt_bits); } + mutex_lock(&svcpt->scp_mutex); spin_lock(&svcpt->scp_lock); rc = ptlrpc_lprocfs_svc_req_history_seek(svcpt, srhi, seq); spin_unlock(&svcpt->scp_lock); + mutex_unlock(&svcpt->scp_mutex); if (rc == 0) { *pos = PTLRPC_REQ_SEQ2POS(svc, srhi->srhi_seq); srhi->srhi_idx = i; @@ -940,6 +944,7 @@ static int ptlrpc_lprocfs_svc_req_history_show(struct seq_file *s, void *iter) svcpt = svc->srv_parts[srhi->srhi_idx]; + mutex_lock(&svcpt->scp_mutex); spin_lock(&svcpt->scp_lock); rc = ptlrpc_lprocfs_svc_req_history_seek(svcpt, srhi, srhi->srhi_seq); @@ -980,6 +985,8 @@ static int ptlrpc_lprocfs_svc_req_history_show(struct seq_file *s, void *iter) } spin_unlock(&svcpt->scp_lock); + mutex_unlock(&svcpt->scp_mutex); + return rc; } diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 8dae21a..cf920ae 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -471,6 +471,7 @@ static void ptlrpc_at_timer(struct timer_list *t) /* rqbd and incoming request queue */ spin_lock_init(&svcpt->scp_lock); + 
mutex_init(&svcpt->scp_mutex); INIT_LIST_HEAD(&svcpt->scp_rqbd_idle); INIT_LIST_HEAD(&svcpt->scp_rqbd_posted); INIT_LIST_HEAD(&svcpt->scp_req_incoming); From patchwork Thu Feb 27 21:08:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409803 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E34A4138D for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB5B9246A0 for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB5B9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A294734890A; Thu, 27 Feb 2020 13:21:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C109421FA65 for ; Thu, 27 Feb 2020 13:18:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D62BBE20; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D4CC046D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:45 -0500 Message-Id: 
<1582838290-17243-58-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 057/622] lustre: ldlm: don't add canceling lock back to LRU X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Mikhail Pershin

When a lock is converted, check that it is not being canceled before adding it back to the LRU.

Lustre-commit: ad52f394bd82 ("LU-11003 ldlm: don't add canceling lock back to LRU")
Signed-off-by: Mikhail Pershin
Reviewed-on: https://review.whamcloud.com/32692
Reviewed-by: Andreas Dilger
Reviewed-by: John L. Hammond
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
fs/lustre/ldlm/ldlm_request.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index ad54bd2..bc441f0 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -893,7 +893,8 @@ static int lock_convert_interpret(const struct lu_env *env, * is not there yet.
*/ lock->l_policy_data.l_inodebits.cancel_bits = 0; - if (!lock->l_readers && !lock->l_writers) { + if (!lock->l_readers && !lock->l_writers && + !ldlm_is_canceling(lock)) { spin_lock(&ns->ns_lock); /* there is check for list_empty() inside */ ldlm_lock_remove_from_lru_nolock(lock); From patchwork Thu Feb 27 21:08:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409765 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B3C1314BC for ; Thu, 27 Feb 2020 21:21:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9C5F8246A0 for ; Thu, 27 Feb 2020 21:21:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9C5F8246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2408B21F8F8; Thu, 27 Feb 2020 13:20:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0F39E21FA65 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DA9EDE21; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D7D6146F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) 
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:46 -0500 Message-Id: <1582838290-17243-59-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 058/622] lustre: quota: add default quota setting support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang This adds default quota support, a feature modeled on a similar one in GPFS that makes quota management easier for cluster administrators. The basic idea of lazy default quota is as follows. A default quota is a global quota setting for user, group, or project quotas. If a default quota is set for a quota type, newly created users, groups, or projects inherit that setting automatically. Lustre itself has no way of knowing when a new user is created; it only finds out when that user first tries to acquire space from Lustre. So the inheritance is implemented lazily: the slave first checks whether a default quota setting exists and, if so, is forced to acquire quota from the master. The master then detects whether the default quota is set, applies it to this ID, and returns the proper grant space to the slave. To implement this while reusing the existing quota APIs, the default quota is managed in the quota record of ID 0, and the quota check is enforced when reading the quota record from disk.
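The lookup this implies can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: the record layout, the `use_default` field and every `demo_*` name are hypothetical, and the real implementation works on on-disk quota records between quota slave and master.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory quota record; NOT the Lustre on-disk format. */
struct demo_quota_rec {
	uint64_t id;		/* quota ID; ID 0 holds the default setting */
	uint64_t hardlimit;	/* 0 means "no limit" in this sketch */
	uint64_t softlimit;
	bool	 use_default;	/* this ID defers to the record of ID 0 */
};

/* Linear search for the record of a given ID; returns NULL if absent. */
static const struct demo_quota_rec *
demo_find(const struct demo_quota_rec *tbl, size_t n, uint64_t id)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

/*
 * Hard limit in effect for @id: its own limit if it has an explicit
 * record, otherwise whatever the record of ID 0 says (lazy inheritance).
 */
uint64_t demo_effective_hardlimit(const struct demo_quota_rec *tbl,
				  size_t n, uint64_t id)
{
	const struct demo_quota_rec *rec = demo_find(tbl, n, id);
	const struct demo_quota_rec *def = demo_find(tbl, n, 0);

	if (rec && !rec->use_default)
		return rec->hardlimit;
	return def ? def->hardlimit : 0;
}
```

In this sketch an ID that has never been seen before and an ID explicitly marked to use the default both resolve to the limits stored under ID 0, which is the "inherit on first access" behaviour described above.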
In the current Lustre implementation, the grace time is either a time period or a timestamp used after a quota ID exceeds its soft limit, so 48 bits are enough to hold it and its high 16 bits can be used for quota flags. This patch uses one of them as the default quota flag. A global quota record that uses the default quota has its soft and hard limits set to zero, and its grace time contains the default flag. Use lfs setquota -U/-G/-P to set default quota. Use lfs setquota -u/-g/-p foo -d to set foo to use default quota. Use lfs quota -U/-G/-P to show default quota. WC-bug-id: https://jira.whamcloud.com/browse/LU-7816 Lustre-commit: 530881fe4ee2 ("LU-7816 quota: add default quota setting support") Signed-off-by: Wang Shilong Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/32306 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 4 +++- include/uapi/linux/lustre/lustre_user.h | 22 ++++++++++++++++++++++ 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index b006e32..c0c3bf0 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -949,10 +949,12 @@ static int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl) switch (cmd) { case Q_SETQUOTA: case Q_SETINFO: + case LUSTRE_Q_SETDEFAULT: if (!capable(CAP_SYS_ADMIN)) return -EPERM; break; case Q_GETQUOTA: + case LUSTRE_Q_GETDEFAULT: if (check_owner(type, id) && !capable(CAP_SYS_ADMIN)) return -EPERM; break; @@ -960,7 +962,7 @@ static int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl) break; default: CERROR("unsupported quotactl op: %#x\n", cmd); - return -ENOTTY; + return -ENOTSUPP; } if (valid != QC_GENERAL) { diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 5405e1b..5956f33 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ 
b/include/uapi/linux/lustre/lustre_user.h @@ -728,6 +728,28 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen) /* lustre-specific control commands */ #define LUSTRE_Q_INVALIDATE 0x80000b /* deprecated as of 2.4 */ #define LUSTRE_Q_FINVALIDATE 0x80000c /* deprecated as of 2.4 */ +#define LUSTRE_Q_GETDEFAULT 0x80000d /* get default quota */ +#define LUSTRE_Q_SETDEFAULT 0x80000e /* set default quota */ + +/* In the current Lustre implementation, the grace time is either the time + * or the timestamp to be used after some quota ID exceeds the soft limt, + * 48 bits should be enough, its high 16 bits can be used as quota flags. + */ +#define LQUOTA_GRACE_BITS 48 +#define LQUOTA_GRACE_MASK ((1ULL << LQUOTA_GRACE_BITS) - 1) +#define LQUOTA_GRACE_MAX LQUOTA_GRACE_MASK +#define LQUOTA_GRACE(t) (t & LQUOTA_GRACE_MASK) +#define LQUOTA_FLAG(t) (t >> LQUOTA_GRACE_BITS) +#define LQUOTA_GRACE_FLAG(t, f) ((__u64)t | (__u64)f << LQUOTA_GRACE_BITS) + +/* different quota flags */ + +/* the default quota flag, the corresponding quota ID will use the default + * quota setting, the hardlimit and softlimit of its quota record in the global + * quota file will be set to 0, the low 48 bits of the grace will be set to 0 + * and high 16 bits will contain this flag (see above comment). 
+ */ +#define LQUOTA_FLAG_DEFAULT 0x0001 #define ALLQUOTA 255 /* set all quota */ From patchwork Thu Feb 27 21:08:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409747 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1F2EA138D for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 07AD62469F for ; Thu, 27 Feb 2020 21:21:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 07AD62469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D224B21FB99; Thu, 27 Feb 2020 13:20:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 681BC21FA65 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DC388E22; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DAEBF46A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:47 -0500 Message-Id: <1582838290-17243-60-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 
1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 059/622] lustre: ptlrpc: don't zero request handle X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko LNet can retransmit a request at any time if no reply has arrived. ptlrpc_resend_req() zeroes the request handle and ptlrpc_send_rpc() sets it again. If a retransmission happens while the handle is zeroed, the client cannot find a valid export by handle, sets rq_export to NULL, and replies with ENOTCONN. The server then evicts the client with this error: client (nid x.x.x.x@tcp) returned error from blocking AST (req status -107 rc -107), evict it WC-bug-id: https://jira.whamcloud.com/browse/LU-11117 Lustre-commit: 00c72ab6bb43 ("LU-11117 ptlrpc: don't zero request handle") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-6037 Reviewed-on: https://review.whamcloud.com/32781 Reviewed-by: Mikhail Pershin Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 9b41c12..d28a9cd 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -2728,7 +2728,6 @@ void ptlrpc_resend_req(struct ptlrpc_request *req) return; } - lustre_msg_set_handle(req->rq_reqmsg, &(struct lustre_handle){ 0 }); req->rq_status = -EAGAIN; req->rq_resend = 1; From patchwork Thu Feb 27 21:08:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409753 Return-Path: 
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 23F3A159A for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0966D246A1 for ; Thu, 27 Feb 2020 21:21:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0966D246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A072A21FD8A; Thu, 27 Feb 2020 13:20:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A8F9521FA65 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E5DB0E27; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E4381468; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:48 -0500 Message-Id: <1582838290-17243-61-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 060/622] lnet: ko2iblnd: determine gaps correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata We're allowed to start at a non-aligned page offset in the first fragment and end at a non-aligned page offset in the last fragment. When checking the iovec exclude both of the first and last fragments from the tx_gaps check. WC-bug-id: https://jira.whamcloud.com/browse/LU-11064 Lustre-commit: e40ea6fd4494 ("LU-11064 lnd: determine gaps correctly") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32586 Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index c2ce3b9..60706b4 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -737,6 +737,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, struct kib_net *net = ni->ni_data; struct scatterlist *sg; int fragnob; + int max_nkiov; CDEBUG(D_NET, "niov %d offset %d nob %d\n", nkiov, offset, nob); @@ -751,16 +752,24 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(nkiov > 0); } + max_nkiov = nkiov; + sg = tx->tx_frags; do { LASSERT(nkiov > 0); fragnob = min((int)(kiov->bv_len - offset), nob); - if ((fragnob < (int)(kiov->bv_len - offset)) && nkiov > 1) { + /* We're allowed to start at a non-aligned page offset in + * the first fragment and end at a non-aligned page offset + * in the last fragment. 
+ */ + if ((fragnob < (int)(kiov->bv_len - offset)) && + nkiov < max_nkiov && nob > fragnob) { CDEBUG(D_NET, - "fragnob %d < available page %d: with remaining %d kiovs\n", - fragnob, (int)(kiov->bv_len - offset), nkiov); + "fragnob %d < available page %d: with remaining %d kiovs with %d nob left\n", + fragnob, (int)(kiov->bv_len - offset), + nkiov, nob); tx->tx_gaps = true; } From patchwork Thu Feb 27 21:08:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409755 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48DE314BC for ; Thu, 27 Feb 2020 21:21:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 30ECA2469F for ; Thu, 27 Feb 2020 21:21:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 30ECA2469F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AABBC21FDBE; Thu, 27 Feb 2020 13:20:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EC15721FAF6 for ; Thu, 27 Feb 2020 13:18:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E85BAE28; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id E74A546A; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:49 -0500 Message-Id: <1582838290-17243-62-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 061/622] lustre: osc: increase default max_dirty_mb to 2G X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin Ideally we want to move away from the max_dirty_mb setting completely and let the grant code take over most of its role. However, Andreas raises a somewhat valid point: on certain system configurations with high-latency links, system administrators might want the ability to limit the number of dirty pages just for those OSCs, to bound the time it might take to flush that dirty data.
So a good compromise is to lift the max_dirty_mb default value first while we work out the current grant code deficiencies WC-bug-id: https://jira.whamcloud.com/browse/LU-10990 Lustre-commit: 92e2b514e06c ("LU-10990 osc: increase default max_dirty_mb to 2G") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/32288 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 99577e4..d2bd234 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -127,7 +127,7 @@ struct timeout_item { #define OBD_MAX_RIF_DEFAULT 8 #define OBD_MAX_RIF_MAX 512 #define OSC_MAX_RIF_MAX 256 -#define OSC_MAX_DIRTY_DEFAULT (OBD_MAX_RIF_DEFAULT * 4) +#define OSC_MAX_DIRTY_DEFAULT 2000 /* Arbitrary large value */ #define OSC_MAX_DIRTY_MB_MAX 2048 /* arbitrary, but < MAX_LONG bytes */ #define OSC_DEFAULT_RESENDS 10 From patchwork Thu Feb 27 21:08:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409759 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49D49138D for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 328DA246A0 for ; Thu, 27 Feb 2020 21:21:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 328DA246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 699CD21FDF3; Thu, 27 Feb 2020 13:20:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 38D9D21FA93 for ; Thu, 27 Feb 2020 13:18:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EB4A7E2B; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EA1C346C; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:50 -0500 Message-Id: <1582838290-17243-63-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 062/622] lustre: ptlrpc: remove obsolete OBD RPC opcodes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove the obsolete OBD_LOG_CANCEL (since Lustre 1.5) and OBD_QC_CALLBACK (since Lustre 2.4) RPC opcodes. Assign OBD_IDX_READ an explicit opcode (as should be done with all enums in lustre_idl.h) so that the value does not change if some prior field is removed. Also remove the OBD_FAIL checks that were used to test them. The setting in conf_sanity.sh test_58 was unused for many years. 
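The reason for giving OBD_IDX_READ an explicit value can be shown with a small standalone sketch. The `DEMO_*` names below are hypothetical stand-ins that only mirror the shape of enum obd_cmd; they are not the identifiers from lustre_idl.h.

```c
#include <assert.h>

/*
 * Implicit numbering: every value depends on how many enumerators
 * precede it. With the two obsolete entries deleted, the read opcode
 * silently moves from 403 to 401.
 */
enum demo_cmd_implicit {
	DEMO_PING_I = 400,
	DEMO_IDX_READ_I,	/* now 401: the wire value has changed */
	DEMO_LAST_OPC_I
};

/*
 * Explicit numbering: removed entries leave a documented hole and the
 * remaining opcodes keep their wire values.
 */
enum demo_cmd_explicit {
	DEMO_PING	= 400,
/*	DEMO_LOG_CANCEL	= 401, obsolete */
/*	DEMO_QC_CALLBACK = 402, obsolete */
	DEMO_IDX_READ	= 403,
	DEMO_LAST_OPC,
	DEMO_FIRST_OPC	= DEMO_PING
};

/* Compile-time checks, in the spirit of BUILD_BUG_ON() in wiretest.c. */
_Static_assert(DEMO_IDX_READ == 403, "wire opcode must not move");
_Static_assert(DEMO_LAST_OPC == 404, "last opcode follows the read opcode");

/* Runtime counterpart, in the spirit of the LASSERTF() checks. */
void demo_assert_wire_constants(void)
{
	assert(DEMO_PING == 400);
	assert(DEMO_IDX_READ == 403);
	assert(DEMO_LAST_OPC == 404);
}
```

Both peers of a wire protocol must agree on these numbers, so a value that shifts when an unrelated entry is deleted would break interoperability between versions.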
WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 7d89a5b8aefc ("LU-10855 ptlrpc: remove obsolete OBD RPC opcodes") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32651 Reviewed-by: John L. Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 6 +++--- fs/lustre/ptlrpc/lproc_ptlrpc.c | 4 ++-- fs/lustre/ptlrpc/wiretest.c | 4 ---- include/uapi/linux/lustre/lustre_idl.h | 12 ++++++------ 4 files changed, 11 insertions(+), 15 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 67500b5..99b4f1f 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -352,12 +352,12 @@ #define OBD_FAIL_PTLRPC_BULK_ATTACH 0x521 #define OBD_FAIL_OBD_PING_NET 0x600 -#define OBD_FAIL_OBD_LOG_CANCEL_NET 0x601 +/* OBD_FAIL_OBD_LOG_CANCEL_NET 0x601 obsolete since 1.5 */ #define OBD_FAIL_OBD_LOGD_NET 0x602 -/* OBD_FAIL_OBD_QC_CALLBACK_NET 0x603 obsolete since 2.4 */ +/* OBD_FAIL_OBD_QC_CALLBACK_NET 0x603 obsolete since 2.4 */ #define OBD_FAIL_OBD_DQACQ 0x604 #define OBD_FAIL_OBD_LLOG_SETUP 0x605 -#define OBD_FAIL_OBD_LOG_CANCEL_REP 0x606 +/* OBD_FAIL_OBD_LOG_CANCEL_REP 0x606 obsolete since 1.5 */ #define OBD_FAIL_OBD_IDX_READ_NET 0x607 #define OBD_FAIL_OBD_IDX_READ_BREAK 0x608 #define OBD_FAIL_OBD_NO_LRU 0x609 diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index 0efbcfc..b70a1c7 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -111,8 +111,8 @@ { MGS_SET_INFO, "mgs_set_info" }, { MGS_CONFIG_READ, "mgs_config_read" }, { OBD_PING, "obd_ping" }, - { OBD_LOG_CANCEL, "llog_cancel" }, - { OBD_QC_CALLBACK, "obd_quota_callback" }, + { 401, /* was OBD_LOG_CANCEL */ "llog_cancel" }, + { 402, /* was OBD_QC_CALLBACK */ "obd_quota_callback" }, { OBD_IDX_READ, "dt_index_read" }, { LLOG_ORIGIN_HANDLE_CREATE, "llog_origin_handle_open" }, { 
LLOG_ORIGIN_HANDLE_NEXT_BLOCK, "llog_origin_handle_next_block" }, diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 202c5ab..015c5bd 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -326,10 +326,6 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(LUSTRE_RES_ID_HSH_OFF != 3); LASSERTF(OBD_PING == 400, "found %lld\n", (long long)OBD_PING); - LASSERTF(OBD_LOG_CANCEL == 401, "found %lld\n", - (long long)OBD_LOG_CANCEL); - LASSERTF(OBD_QC_CALLBACK == 402, "found %lld\n", - (long long)OBD_QC_CALLBACK); LASSERTF(OBD_IDX_READ == 403, "found %lld\n", (long long)OBD_IDX_READ); LASSERTF(OBD_LAST_OPC == 404, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 798aa57..adaa994 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2342,13 +2342,13 @@ struct cfg_marker { */ enum obd_cmd { - OBD_PING = 400, - OBD_LOG_CANCEL, /* Obsolete since 1.5. */ - OBD_QC_CALLBACK, /* not used since 2.4 */ - OBD_IDX_READ, - OBD_LAST_OPC + OBD_PING = 400, +/* OBD_LOG_CANCEL = 401, Obsolete since 1.5 */ +/* OBD_QC_CALLBACK = 402, not used since 2.4 */ + OBD_IDX_READ = 403, + OBD_LAST_OPC, + OBD_FIRST_OPC = OBD_PING }; -#define OBD_FIRST_OPC OBD_PING /** * llog contexts indices. 
From patchwork Thu Feb 27 21:08:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409805 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7321514BC for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5AF99246A0 for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5AF99246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 820D9348940; Thu, 27 Feb 2020 13:21:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F8B821F982 for ; Thu, 27 Feb 2020 13:18:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EF4C9E33; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ECEC646D; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:51 -0500 Message-Id: <1582838290-17243-64-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 063/622] lustre: ptlrpc: assign specific values to MGS opcodes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Assign specific values to all of the MGS opcodes in enum mgs_cmd so that these values do not change if a new item is added or one is removed in the future. These opcodes are part of the wire protocol and need to remain constant. WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 12c5a26609f1 ("LU-10855 ptlrpc: assign specific values to MGS opcodes") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32653 Reviewed-by: John L. Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 20 ++++++++++---------- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 015c5bd..ef07975 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -348,6 +348,8 @@ void lustre_assert_wire_constants(void) (long long)MGS_TARGET_DEL); LASSERTF(MGS_SET_INFO == 255, "found %lld\n", (long long)MGS_SET_INFO); + LASSERTF(MGS_CONFIG_READ == 256, "found %lld\n", + (long long)MGS_CONFIG_READ); LASSERTF(MGS_LAST_OPC == 257, "found %lld\n", (long long)MGS_LAST_OPC); LASSERTF(SEC_CTX_INIT == 801, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index adaa994..1b5794a 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2247,16 +2247,16 @@ 
struct ldlm_reply { * Opcodes for mountconf (mgs and mgc) */ enum mgs_cmd { - MGS_CONNECT = 250, - MGS_DISCONNECT, - MGS_EXCEPTION, /* node died, etc. */ - MGS_TARGET_REG, /* whenever target starts up */ - MGS_TARGET_DEL, - MGS_SET_INFO, - MGS_CONFIG_READ, - MGS_LAST_OPC -}; -#define MGS_FIRST_OPC MGS_CONNECT + MGS_CONNECT = 250, + MGS_DISCONNECT = 251, + MGS_EXCEPTION = 252, /* node died, etc. */ + MGS_TARGET_REG = 253, /* whenever target starts up */ + MGS_TARGET_DEL = 254, + MGS_SET_INFO = 255, + MGS_CONFIG_READ = 256, + MGS_LAST_OPC, + MGS_FIRST_OPC = MGS_CONNECT +}; #define MGS_PARAM_MAXLEN 1024 #define KEY_SET_INFO "set_info" From patchwork Thu Feb 27 21:08:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409769 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5C3BD159A for ; Thu, 27 Feb 2020 21:21:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 44D98246A0 for ; Thu, 27 Feb 2020 21:21:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 44D98246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ED69921C905; Thu, 27 Feb 2020 13:20:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id D2CBE21FA80 for ; Thu, 27 Feb 2020 13:18:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F16CEE35; Thu, 27 Feb 2020 16:18:13 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EFD6A46F; Thu, 27 Feb 2020 16:18:13 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:52 -0500 Message-Id: <1582838290-17243-65-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 064/622] lustre: ptlrpc: remove obsolete LLOG_ORIGIN_* RPCs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove the obsolete RPC opcodes LLOG_ORIGIN_HANDLE_WRITE_REC, LLOG_ORIGIN_HANDLE_CLOSE, LLOG_ORIGIN_CONNECT, LLOG_CATINFO along with their unused OBD_FAIL counterparts. WC-bug-id: https://jira.whamcloud.com/browse/LU-10855 Lustre-commit: 830ce1b10f3a ("LU-10855 ptlrpc: remove obsolete LLOG_ORIGIN_* RPCs") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32654 Reviewed-by: John L. 
Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 10 +++++----- fs/lustre/ptlrpc/lproc_ptlrpc.c | 8 ++++---- fs/lustre/ptlrpc/wiretest.c | 5 ----- include/uapi/linux/lustre/lustre_idl.h | 10 +++++----- 4 files changed, 14 insertions(+), 19 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 99b4f1f..28becfa 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -423,15 +423,15 @@ #define OBD_FAIL_SEC_CTX_HDL_PAUSE 0x1204 #define OBD_FAIL_LLOG 0x1300 -#define OBD_FAIL_LLOG_ORIGIN_CONNECT_NET 0x1301 +/* was OBD_FAIL_LLOG_ORIGIN_CONNECT_NET 0x1301 until 2.4 */ #define OBD_FAIL_LLOG_ORIGIN_HANDLE_CREATE_NET 0x1302 -#define OBD_FAIL_LLOG_ORIGIN_HANDLE_DESTROY_NET 0x1303 +/* was OBD_FAIL_LLOG_ORIGIN_HANDLE_DESTROY_NET 0x1303 until 2.11 */ #define OBD_FAIL_LLOG_ORIGIN_HANDLE_READ_HEADER_NET 0x1304 #define OBD_FAIL_LLOG_ORIGIN_HANDLE_NEXT_BLOCK_NET 0x1305 #define OBD_FAIL_LLOG_ORIGIN_HANDLE_PREV_BLOCK_NET 0x1306 -#define OBD_FAIL_LLOG_ORIGIN_HANDLE_WRITE_REC_NET 0x1307 -#define OBD_FAIL_LLOG_ORIGIN_HANDLE_CLOSE_NET 0x1308 -#define OBD_FAIL_LLOG_CATINFO_NET 0x1309 +/* was OBD_FAIL_LLOG_ORIGIN_HANDLE_WRITE_REC_NET 0x1307 until 2.1 */ +/* was OBD_FAIL_LLOG_ORIGIN_HANDLE_CLOSE_NET 0x1308 until 1.8 */ +/* was OBD_FAIL_LLOG_CATINFO_NET 0x1309 until 2.3 */ #define OBD_FAIL_MDS_SYNC_CAPA_SL 0x1310 #define OBD_FAIL_SEQ_ALLOC 0x1311 diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index b70a1c7..6af3384 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -117,10 +117,10 @@ { LLOG_ORIGIN_HANDLE_CREATE, "llog_origin_handle_open" }, { LLOG_ORIGIN_HANDLE_NEXT_BLOCK, "llog_origin_handle_next_block" }, { LLOG_ORIGIN_HANDLE_READ_HEADER, "llog_origin_handle_read_header" }, - { LLOG_ORIGIN_HANDLE_WRITE_REC, "llog_origin_handle_write_rec" }, - { LLOG_ORIGIN_HANDLE_CLOSE, 
"llog_origin_handle_close" }, - { LLOG_ORIGIN_CONNECT, "llog_origin_connect" }, - { LLOG_CATINFO, "llog_catinfo" }, + { 504, /*LLOG_ORIGIN_HANDLE_WRITE_REC*/ "llog_origin_handle_write_rec" }, + { 505, /* was LLOG_ORIGIN_HANDLE_CLOSE */"llog_origin_handle_close" }, + { 506, /* was LLOG_ORIGIN_CONNECT */ "llog_origin_connect" }, + { 507, /* was LLOG_CATINFO */ "llog_catinfo" }, { LLOG_ORIGIN_HANDLE_PREV_BLOCK, "llog_origin_handle_prev_block" }, { LLOG_ORIGIN_HANDLE_DESTROY, "llog_origin_handle_destroy" }, { QUOTA_DQACQ, "quota_acquire" }, diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index ef07975..7b6ea86 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -3757,12 +3757,7 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_CREATE != 501); BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_NEXT_BLOCK != 502); BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_READ_HEADER != 503); - BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_WRITE_REC != 504); - BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_CLOSE != 505); - BUILD_BUG_ON(LLOG_ORIGIN_CONNECT != 506); - BUILD_BUG_ON(LLOG_CATINFO != 507); BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_PREV_BLOCK != 508); - BUILD_BUG_ON(LLOG_ORIGIN_HANDLE_DESTROY != 509); BUILD_BUG_ON(LLOG_FIRST_OPC != 501); BUILD_BUG_ON(LLOG_LAST_OPC != 510); BUILD_BUG_ON(LLOG_CONFIG_ORIG_CTXT != 0); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 1b5794a..5db742f 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2655,12 +2655,12 @@ enum llogd_rpc_ops { LLOG_ORIGIN_HANDLE_CREATE = 501, LLOG_ORIGIN_HANDLE_NEXT_BLOCK = 502, LLOG_ORIGIN_HANDLE_READ_HEADER = 503, - LLOG_ORIGIN_HANDLE_WRITE_REC = 504, /* Obsolete by 2.1. */ - LLOG_ORIGIN_HANDLE_CLOSE = 505, /* Obsolete by 1.8. */ - LLOG_ORIGIN_CONNECT = 506, /* Obsolete by 2.4. */ - LLOG_CATINFO = 507, /* Obsolete by 2.3. */ +/* LLOG_ORIGIN_HANDLE_WRITE_REC = 504, Obsolete by 2.1. 
*/ +/* LLOG_ORIGIN_HANDLE_CLOSE = 505, Obsolete by 1.8. */ +/* LLOG_ORIGIN_CONNECT = 506, Obsolete by 2.4. */ +/* LLOG_CATINFO = 507, Obsolete by 2.3. */ LLOG_ORIGIN_HANDLE_PREV_BLOCK = 508, - LLOG_ORIGIN_HANDLE_DESTROY = 509, /* Obsolete. */ + LLOG_ORIGIN_HANDLE_DESTROY = 509, /* Obsolete by 2.11. */ LLOG_LAST_OPC, LLOG_FIRST_OPC = LLOG_ORIGIN_HANDLE_CREATE }; From patchwork Thu Feb 27 21:08:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409763 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E176D14BC for ; Thu, 27 Feb 2020 21:21:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CA075246A0 for ; Thu, 27 Feb 2020 21:21:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CA075246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 83C4E21FE4B; Thu, 27 Feb 2020 13:20:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 34EF221FA8C for ; Thu, 27 Feb 2020 13:18:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 00D7EE3B; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id F2AAE468; Thu, 27 Feb 2020 16:18:13 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:08:53 -0500
Message-Id: <1582838290-17243-66-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 065/622] lustre: osc: fix idle_timeout handling
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: James Simmons , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

The patch that landed for LU-7236 introduced new sysfs entries which were done wrong.

1) For idle_timeout it returns -ERANGE for any value passed in except when setting idle_timeout to zero. This does not match what the commit message said for LU-7236. So I changed lprocfs_str_with_units_to_s64() into kstrtouint() since a signed 64-bit timeout is not needed. Using kstrtouint() ensures that negative values are not possible, and the value is also capped at CONNECTION_SWITCH_MAX since the max of 4 billion seconds is overkill.

2) The next procfs file, idle_connect, is really a write-only file but it was treated as both read and write. There is no need for the osc_idle_connect_seq_show() function.

3) Lastly, no more stuffing new entries into proc or debugfs. This patch converts these new proc entries to sysfs. It seems to be a common occurrence, so add LPROC_SEQ_* to spelling.txt so checkpatch will complain about using LPROC_SEQ_*, which will go away.
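The validation described in point 1 can be sketched in userspace C. This is only an illustration, not the kernel code: strtoul() stands in for kstrtouint(), the helper name parse_idle_timeout() is hypothetical, and ASSUMED_TIMEOUT_MAX is a placeholder for CONNECTION_SWITCH_MAX, whose real value is not shown in this patch. The sketch rejects out-of-range input with -ERANGE; the hunk below does not show whether the kernel code rejects or clamps.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Placeholder bound; stands in for CONNECTION_SWITCH_MAX (value assumed). */
#define ASSUMED_TIMEOUT_MAX 50U

/*
 * Hypothetical userspace analogue of the idle_timeout store path:
 * parse an unsigned integer, refuse negative or malformed input,
 * and refuse values above the assumed maximum.
 * Returns 0 on success or a negative errno value.
 */
int parse_idle_timeout(const char *buf, unsigned int *out)
{
	char *end;
	unsigned long val;

	/* strtoul() silently accepts a leading '-', so reject it up front. */
	while (isspace((unsigned char)*buf))
		buf++;
	if (*buf == '-' || *buf == '\0')
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == buf)
		return -EINVAL;

	/* Tolerate the single trailing newline that "echo" appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val > ASSUMED_TIMEOUT_MAX)
		return -ERANGE;

	*out = (unsigned int)val;
	return 0;
}
```

With this shape, zero stays a valid input (it disables the idle timeout in the description above), while "-1" and oversized values fail before they can reach the import.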
WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: 406cd8a74d84 ("LU-8066 osc: fix idle_timeout handling") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/32719 Reviewed-by: Alex Zhuravlev Reviewed-by: John L. Hammond Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 42 ++++++++++++++++++------------------------ 1 file changed, 18 insertions(+), 24 deletions(-) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index fd84393..0a12079 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -598,26 +598,27 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v) LPROC_SEQ_FOPS_RO(osc_unstable_stats); -static int osc_idle_timeout_seq_show(struct seq_file *m, void *v) +static ssize_t idle_timeout_show(struct kobject *kobj, struct attribute *attr, + char *buf) { - struct obd_device *obd = m->private; + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); struct client_obd *cli = &obd->u.cli; - seq_printf(m, "%u\n", cli->cl_import->imp_idle_timeout); - return 0; + return sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout); } -static ssize_t osc_idle_timeout_seq_write(struct file *f, - const char __user *buffer, - size_t count, loff_t *off) +static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) { - struct obd_device *obd = ((struct seq_file *)f->private_data)->private; + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); struct client_obd *cli = &obd->u.cli; struct ptlrpc_request *req; unsigned int val; int rc; - rc = kstrtouint_from_user(buffer, count, 0, &val); + rc = kstrtouint(buffer, 0, &val); if (rc) return rc; @@ -635,18 +636,13 @@ static ssize_t osc_idle_timeout_seq_write(struct file *f, return count; } -LPROC_SEQ_FOPS(osc_idle_timeout); +LUSTRE_RW_ATTR(idle_timeout); -static int 
osc_idle_connect_seq_show(struct seq_file *m, void *v) +static ssize_t idle_connect_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) { - return 0; -} - -static ssize_t osc_idle_connect_seq_write(struct file *f, - const char __user *buffer, - size_t count, loff_t *off) -{ - struct obd_device *dev = ((struct seq_file *)f->private_data)->private; + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); struct client_obd *cli = &dev->u.cli; struct ptlrpc_request *req; @@ -658,7 +654,7 @@ static ssize_t osc_idle_connect_seq_write(struct file *f, return count; } -LPROC_SEQ_FOPS(osc_idle_connect); +LUSTRE_WO_ATTR(idle_connect); LPROC_SEQ_FOPS_RO_TYPE(osc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(osc, server_uuid); @@ -687,10 +683,6 @@ static ssize_t osc_idle_connect_seq_write(struct file *f, .fops = &osc_pinger_recov_fops }, { .name = "unstable_stats", .fops = &osc_unstable_stats_fops }, - { .name = "idle_timeout", - .fops = &osc_idle_timeout_fops }, - { .name = "idle_connect", - .fops = &osc_idle_connect_fops }, { NULL } }; @@ -877,6 +869,8 @@ void lproc_osc_attach_seqstat(struct obd_device *dev) &lustre_attr_resend_count.attr, &lustre_attr_ost_conn_uuid.attr, &lustre_attr_ping.attr, + &lustre_attr_idle_timeout.attr, + &lustre_attr_idle_connect.attr, NULL, }; From patchwork Thu Feb 27 21:08:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409767 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C8A37138D for ; Thu, 27 Feb 2020 21:21:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B17A2246A0 
for ; Thu, 27 Feb 2020 21:21:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B17A2246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3F26021FE9B; Thu, 27 Feb 2020 13:20:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8E74121FABD for ; Thu, 27 Feb 2020 13:18:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 02F5FE3D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 016C346A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:54 -0500 Message-Id: <1582838290-17243-67-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 066/622] lustre: ptlrpc: ASSERTION(!list_empty(imp->imp_replay_cursor)) X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh It's ptlrpc_replay_next() vs close race. 
ll_close_inode_openhandle() calls mdc_free_open() -> ptlrpc_request_committed() -> ptlrpc_free_request(). The imp_replay_cursor needs to be reset when a request is dropped from the replay list. Cray-bug-id: LUS-2455 WC-bug-id: https://jira.whamcloud.com/browse/LU-11098 Lustre-commit: d69d488e1778 ("LU-11098 ptlrpc: ASSERTION(!list_empty(imp->imp_replay_cursor))") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/32727 Reviewed-by: Andreas Dilger Reviewed-by: Vladimir Saveliev Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index d28a9cd..57b08de 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -2613,8 +2613,11 @@ void ptlrpc_request_committed(struct ptlrpc_request *req, int force) return; } - if (force || req->rq_transno <= imp->imp_peer_committed_transno) + if (force || req->rq_transno <= imp->imp_peer_committed_transno) { + if (imp->imp_replay_cursor == &req->rq_replay_list) + imp->imp_replay_cursor = req->rq_replay_list.next; ptlrpc_free_request(req); + } spin_unlock(&imp->imp_lock); } From patchwork Thu Feb 27 21:08:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409773 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2592214BC for ; Thu, 27 Feb 2020 21:21:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D0F8246A0 for ; Thu, 27 Feb 2020 21:21:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D0F8246A0 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6667B21FC83; Thu, 27 Feb 2020 13:20:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D3D7421FA5B for ; Thu, 27 Feb 2020 13:18:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 059B8E3E; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0444646C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:55 -0500 Message-Id: <1582838290-17243-68-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 067/622] lustre: obd: keep dirty_max_pages a round number of MB X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In client_adjust_max_dirty() ensure that the dirty pages limit is always divisible by 256 so that it may faithfully be represented in MB as is the case when the max_dirty_mb parameters are used. 
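The rounding described above can be illustrated with a small standalone sketch. It assumes 4 KiB pages (SKETCH_PAGE_SHIFT of 12), which is where the figure of 256 pages per MB comes from; the patch itself computes the granule as 1 << (20 - PAGE_SHIFT), so other page sizes give a different multiple. The helper names are hypothetical, and round_up_pow2() is a plain reimplementation of the power-of-two round-up idiom rather than the kernel macro.

```c
/* Assumed page size exponent: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Pages per MiB: 1 << (20 - PAGE_SHIFT), i.e. 256 for 4 KiB pages. */
#define PAGES_PER_MB (1UL << (20 - SKETCH_PAGE_SHIFT))

/* Round x up to the next multiple of y; y must be a power of two. */
unsigned long round_up_pow2(unsigned long x, unsigned long y)
{
	return (x + y - 1) & ~(y - 1);
}

/*
 * Hypothetical helper mirroring the final step added to
 * client_adjust_max_dirty(): make the dirty page limit a whole
 * number of megabytes so max_dirty_mb can represent it exactly.
 */
unsigned long dirty_pages_to_whole_mb(unsigned long pages)
{
	return round_up_pow2(pages, PAGES_PER_MB);
}
```

Because the value is rounded up rather than down, a limit that was already a whole number of MB is left unchanged, and any other limit grows by less than one MB.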
WC-bug-id: https://jira.whamcloud.com/browse/LU-11157 Lustre-commit: d3f88d376c49 ("LU-11157 obd: keep dirty_max_pages a round number of MB") Signed-off-by: John L. Hammond Reviewed-on: https://review.whamcloud.com/32831 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index d2bd234..5656eb0 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -1106,7 +1106,7 @@ static inline int cli_brw_size(struct obd_device *obd) } /* - * when RPC size or the max RPCs in flight is increased, the max dirty pages + * When RPC size or the max RPCs in flight is increased, the max dirty pages * of the client should be increased accordingly to avoid sending fragmented * RPCs over the network when the client runs out of the maximum dirty space * when so many RPCs are being generated. @@ -1114,10 +1114,10 @@ static inline int cli_brw_size(struct obd_device *obd) static inline void client_adjust_max_dirty(struct client_obd *cli) { /* initializing */ - if (cli->cl_dirty_max_pages <= 0) + if (cli->cl_dirty_max_pages <= 0) { cli->cl_dirty_max_pages = (OSC_MAX_DIRTY_DEFAULT * 1024 * 1024) >> PAGE_SHIFT; - else { + } else { unsigned long dirty_max = cli->cl_max_rpcs_in_flight * cli->cl_max_pages_per_rpc; @@ -1127,6 +1127,13 @@ static inline void client_adjust_max_dirty(struct client_obd *cli) if (cli->cl_dirty_max_pages > totalram_pages() / 8) cli->cl_dirty_max_pages = totalram_pages() / 8; + + /* This value is exported to userspace through the max_dirty_mb + * parameter. So we round up the number of pages to make it a round + * number of MBs. 
+ */ + cli->cl_dirty_max_pages = round_up(cli->cl_dirty_max_pages, + 1 << (20 - PAGE_SHIFT)); } #endif /* __OBD_H */ From patchwork Thu Feb 27 21:08:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409771 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0DD69138D for ; Thu, 27 Feb 2020 21:21:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EA2A5246A1 for ; Thu, 27 Feb 2020 21:21:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA2A5246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E59221FEB4; Thu, 27 Feb 2020 13:20:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2666521FAF6 for ; Thu, 27 Feb 2020 13:18:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 08891E3F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 072BA46D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:56 -0500 Message-Id: 
<1582838290-17243-69-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 068/622] lustre: osc: depart grant shrinking from pinger X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Bobi Jam , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam * Removing grant shrinking code outside of pinger, use a workqueue to handle grant shrinking timer. * Enable OSC grant shrinking by default. bugzilla: 19507 WC-bug-id: https://jira.whamcloud.com/browse/LU-8708 Lustre-commit: fc915a43786e ("LU-8708 osc: depart grant shrinking from pinger") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/23202 Reviewed-by: Hongchao Zhang Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lib.c | 1 + fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/osc/osc_request.c | 155 ++++++++++++++++++++++++++++++-------------- 3 files changed, 110 insertions(+), 48 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 2c0fad3..838ddb3 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -349,6 +349,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) spin_lock_init(&cli->cl_lru_list_lock); atomic_long_set(&cli->cl_unstable_count, 0); INIT_LIST_HEAD(&cli->cl_shrink_list); + INIT_LIST_HEAD(&cli->cl_grant_chain); INIT_LIST_HEAD(&cli->cl_flight_waiters); cli->cl_rpcs_in_flight = 0; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 0844318..56624e8 100644 --- a/fs/lustre/llite/llite_lib.c +++ 
b/fs/lustre/llite/llite_lib.c @@ -399,7 +399,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_LAYOUTLOCK | OBD_CONNECT_PINGLESS | OBD_CONNECT_LFSCK | OBD_CONNECT_BULK_MBITS | OBD_CONNECT_SHORTIO | - OBD_CONNECT_FLAGS2; + OBD_CONNECT_FLAGS2 | OBD_CONNECT_GRANT_SHRINK; /* The client currently advertises support for OBD_CONNECT_LOCKAHEAD_OLD * so it can interoperate with an older version of lockahead which was diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index e341fcc..1a9ed8d 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -33,6 +33,7 @@ #define DEBUG_SUBSYSTEM S_OSC +#include #include #include #include @@ -721,6 +722,16 @@ static void osc_update_grant(struct client_obd *cli, struct ost_body *body) } } +/** + * grant thread data for shrinking space. + */ +struct grant_thread_data { + struct list_head gtd_clients; + struct mutex gtd_mutex; + unsigned long gtd_stopped:1; +}; +static struct grant_thread_data client_gtd; + static int osc_shrink_grant_interpret(const struct lu_env *env, struct ptlrpc_request *req, void *aa, int rc) @@ -823,6 +834,9 @@ static int osc_should_shrink_grant(struct client_obd *client) { time64_t next_shrink = client->cl_next_shrink_grant; + if (!client->cl_import) + return 0; + if ((client->cl_import->imp_connect_data.ocd_connect_flags & OBD_CONNECT_GRANT_SHRINK) == 0) return 0; @@ -843,38 +857,83 @@ static int osc_should_shrink_grant(struct client_obd *client) return 0; } -static int osc_grant_shrink_grant_cb(struct timeout_item *item, void *data) -{ - struct client_obd *client; +#define GRANT_SHRINK_RPC_BATCH 100 + +static void osc_grant_work_handler(struct work_struct *data); +static DECLARE_DELAYED_WORK(work, osc_grant_work_handler); - list_for_each_entry(client, &item->ti_obd_list, cl_grant_shrink_list) { - if (osc_should_shrink_grant(client)) - osc_shrink_grant(client); +static void osc_grant_work_handler(struct work_struct *data) +{ + 
struct client_obd *cli; + int rpc_sent; + bool init_next_shrink = true; + time64_t next_shrink = ktime_get_seconds() + GRANT_SHRINK_INTERVAL; + + rpc_sent = 0; + mutex_lock(&client_gtd.gtd_mutex); + list_for_each_entry(cli, &client_gtd.gtd_clients, + cl_grant_chain) { + if (++rpc_sent < GRANT_SHRINK_RPC_BATCH && + osc_should_shrink_grant(cli)) + osc_shrink_grant(cli); + + if (!init_next_shrink) { + if (cli->cl_next_shrink_grant < next_shrink && + cli->cl_next_shrink_grant > ktime_get_seconds()) + next_shrink = cli->cl_next_shrink_grant; + } else { + init_next_shrink = false; + next_shrink = cli->cl_next_shrink_grant; + } } - return 0; + mutex_unlock(&client_gtd.gtd_mutex); + + if (client_gtd.gtd_stopped == 1) + return; + + if (next_shrink > ktime_get_seconds()) + schedule_delayed_work(&work, msecs_to_jiffies( + (next_shrink - ktime_get_seconds()) * + MSEC_PER_SEC)); + else + schedule_work(&work.work); } -static int osc_add_shrink_grant(struct client_obd *client) +/** + * Start grant thread for returing grant to server for idle clients. 
+ */ +static int osc_start_grant_work(void) { - int rc; + client_gtd.gtd_stopped = 0; + mutex_init(&client_gtd.gtd_mutex); + INIT_LIST_HEAD(&client_gtd.gtd_clients); + + schedule_work(&work.work); - rc = ptlrpc_add_timeout_client(client->cl_grant_shrink_interval, - TIMEOUT_GRANT, - osc_grant_shrink_grant_cb, NULL, - &client->cl_grant_shrink_list); - if (rc) { - CERROR("add grant client %s error %d\n", cli_name(client), rc); - return rc; - } - CDEBUG(D_CACHE, "add grant client %s\n", cli_name(client)); - osc_update_next_shrink(client); return 0; } -static int osc_del_shrink_grant(struct client_obd *client) +static void osc_stop_grant_work(void) +{ + client_gtd.gtd_stopped = 1; + cancel_delayed_work_sync(&work); +} + +static void osc_add_grant_list(struct client_obd *client) { - return ptlrpc_del_timeout_client(&client->cl_grant_shrink_list, - TIMEOUT_GRANT); + mutex_lock(&client_gtd.gtd_mutex); + list_add(&client->cl_grant_chain, &client_gtd.gtd_clients); + mutex_unlock(&client_gtd.gtd_mutex); +} + +static void osc_del_grant_list(struct client_obd *client) +{ + if (list_empty(&client->cl_grant_chain)) + return; + + mutex_lock(&client_gtd.gtd_mutex); + list_del_init(&client->cl_grant_chain); + mutex_unlock(&client_gtd.gtd_mutex); } void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd) @@ -929,9 +988,8 @@ void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd) cli_name(cli), cli->cl_avail_grant, cli->cl_lost_grant, cli->cl_chunkbits, cli->cl_max_extent_pages); - if (ocd->ocd_connect_flags & OBD_CONNECT_GRANT_SHRINK && - list_empty(&cli->cl_grant_shrink_list)) - osc_add_shrink_grant(cli); + if (OCD_HAS_FLAG(ocd, GRANT_SHRINK) && list_empty(&cli->cl_grant_chain)) + osc_add_grant_list(cli); } EXPORT_SYMBOL(osc_init_grant); @@ -2971,15 +3029,12 @@ int osc_disconnect(struct obd_export *exp) * osc_disconnect * del_shrink_grant * ptlrpc_connect_interrupt - * init_grant_shrink + * osc_init_grant * add this client to shrink list - * 
cleanup_osc - * Bang! pinger trigger the shrink. - * So the osc should be disconnected from the shrink list, after we - * are sure the import has been destroyed. BUG18662 + * cleanup_osc + * Bang! grant shrink thread trigger the shrink. BUG18662 */ - if (!obd->u.cli.cl_import) - osc_del_shrink_grant(&obd->u.cli); + osc_del_grant_list(&obd->u.cli); return rc; } EXPORT_SYMBOL(osc_disconnect); @@ -3159,8 +3214,8 @@ int osc_setup_common(struct obd_device *obd, struct lustre_cfg *lcfg) goto out_ptlrpcd_work; cli->cl_grant_shrink_interval = GRANT_SHRINK_INTERVAL; + osc_update_next_shrink(cli); - INIT_LIST_HEAD(&cli->cl_grant_shrink_list); return 0; out_ptlrpcd_work: @@ -3210,7 +3265,6 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg) atomic_add(added, &osc_pool_req_count); } - INIT_LIST_HEAD(&cli->cl_grant_shrink_list); ns_register_cancel(obd->obd_namespace, osc_cancel_weight); spin_lock(&osc_shrink_lock); @@ -3356,14 +3410,19 @@ static int __init osc_init(void) if (rc) return rc; + rc = class_register_type(&osc_obd_ops, NULL, + LUSTRE_OSC_NAME, &osc_device_type); + if (rc) + goto out_kmem; + rc = register_shrinker(&osc_cache_shrinker); if (rc) - goto err; + goto out_type; /* This is obviously too much memory, only prevent overflow here */ if (osc_reqpool_mem_max >= 1 << 12 || osc_reqpool_mem_max == 0) { rc = -EINVAL; - goto err; + goto out_shrinker; } reqpool_size = osc_reqpool_mem_max << 20; @@ -3383,29 +3442,31 @@ static int __init osc_init(void) atomic_set(&osc_pool_req_count, 0); osc_rq_pool = ptlrpc_init_rq_pool(0, OST_MAXREQSIZE, ptlrpc_add_rqs_to_pool); + if (!osc_rq_pool) { + rc = -ENOMEM; + goto out_shrinker; + } - rc = -ENOMEM; - - if (!osc_rq_pool) - goto err; - - rc = class_register_type(&osc_obd_ops, NULL, - LUSTRE_OSC_NAME, &osc_device_type); + rc = osc_start_grant_work(); if (rc) - goto err; + goto out_req_pool; return rc; -err: - if (osc_rq_pool) - ptlrpc_free_rq_pool(osc_rq_pool); +out_req_pool: + ptlrpc_free_rq_pool(osc_rq_pool); 
+out_type: + class_unregister_type(LUSTRE_OSC_NAME); +out_shrinker: unregister_shrinker(&osc_cache_shrinker); +out_kmem: lu_kmem_fini(osc_caches); return rc; } static void /*__exit*/ osc_exit(void) { + osc_stop_grant_work(); unregister_shrinker(&osc_cache_shrinker); class_unregister_type(LUSTRE_OSC_NAME); lu_kmem_fini(osc_caches); From patchwork Thu Feb 27 21:08:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409777 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 30C7D159A for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 19B1C246A1 for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 19B1C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8BBB421FF8F; Thu, 27 Feb 2020 13:20:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D84A21FA75 for ; Thu, 27 Feb 2020 13:18:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0CAC1EC0; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 
0A27646F; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:08:57 -0500
Message-Id: <1582838290-17243-70-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 069/622] lustre: mdt: Lazy size on MDT
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Qian Yingjin , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Qian Yingjin

The design of Lazy size on MDT (LSOM) does not guarantee accuracy. A file that is held open for a long time might cause inaccurate LSOM for a very long time. Eviction or a crash of a client might also leave the close of a file incomplete, and thus cause inaccurate LSOM. A precise LSOM can only be read from the MDT when 1) all possible corruption and inconsistency caused by client eviction or client/server crash have been fixed by LFSCK and 2) the file is not open for write.

In the first step of implementing LSOM, LSOM will not be accessible from the client. Instead, LSOM values can only be accessed on the MDT. Thus, no interface or logic will be added on the client side to enable access to LSOM from the client.

The LSOM will be saved as an EA value on the MDT. LSOM includes both the apparent size and the disk usage of the file.

Whenever a file is truncated, the LSOM of the file on the MDT will be updated.

Whenever a client closes a file, ll_prepare_close() will send the size and blocks to the MDS. The MDS will update the LSOM of the file if the file size or block count has increased.
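How the close request tags size and blocks as "lazy" can be sketched as a standalone C fragment. The flag values are taken from this patch (OP_XVALID_LAZYSIZE and OP_XVALID_LAZYBLOCKS from enum op_xvalid, MDS_ATTR_LSIZE and MDS_ATTR_LBLOCKS from the wiretest.c assertions); the two helper functions and their boolean parameters are hypothetical simplifications of the checks in ll_close_inode_openhandle() and attr_pack().

```c
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Client-side flags, values as added to enum op_xvalid by this patch. */
#define OP_XVALID_LAZYSIZE	BIT(5)	/* 0x0020 */
#define OP_XVALID_LAZYBLOCKS	BIT(6)	/* 0x0040 */

/* Wire flags, values as asserted in wiretest.c by this patch. */
#define MDS_ATTR_LSIZE		0x0000000000020000ULL
#define MDS_ATTR_LBLOCKS	0x0000000000040000ULL

/*
 * Hypothetical helper: at close time, mark size and/or blocks as lazy
 * unless the close already carries an authoritative value for them.
 */
unsigned int close_lazy_xvalid(int size_is_set, int blocks_is_set)
{
	unsigned int xvalid = 0;

	if (!size_is_set)
		xvalid |= OP_XVALID_LAZYSIZE;
	if (!blocks_is_set)
		xvalid |= OP_XVALID_LAZYBLOCKS;
	return xvalid;
}

/* Hypothetical helper: translate the client flags into wire flags. */
uint64_t pack_lazy_attrs(unsigned int xvalid)
{
	uint64_t sa_valid = 0;

	if (xvalid & OP_XVALID_LAZYSIZE)
		sa_valid |= MDS_ATTR_LSIZE;
	if (xvalid & OP_XVALID_LAZYBLOCKS)
		sa_valid |= MDS_ATTR_LBLOCKS;
	return sa_valid;
}
```

Keeping separate bits for lazy size and lazy blocks lets the MDS tell a best-effort value sent at close apart from an attribute the client set explicitly.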
WC-bug-id: https://jira.whamcloud.com/browse/LU-9538 Lustre-commit: f1ebf88aef21 ("LU-9538 mdt: Lazy size on MDT") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/29960 Reviewed-by: Vitaly Fertman Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 4 +++- fs/lustre/llite/file.c | 5 +++++ fs/lustre/mdc/mdc_lib.c | 4 ++++ fs/lustre/ptlrpc/wiretest.c | 24 ++++++++++++++++++++++++ include/uapi/linux/lustre/lustre_idl.h | 2 ++ include/uapi/linux/lustre/lustre_user.h | 17 +++++++++++++++-- 6 files changed, 53 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 5656eb0..c712979 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -204,7 +204,7 @@ struct client_obd { long cl_reserved_grant; wait_queue_head_t cl_cache_waiters; /* waiting for cache/grant */ time64_t cl_next_shrink_grant; /* seconds */ - struct list_head cl_grant_shrink_list; /* Timeout event list */ + struct list_head cl_grant_chain; time64_t cl_grant_shrink_interval; /* seconds */ /* A chunk is an optimal size used by osc_extent to determine @@ -670,6 +670,8 @@ enum op_xvalid { OP_XVALID_OWNEROVERRIDE = BIT(2), /* 0x0004 */ OP_XVALID_FLAGS = BIT(3), /* 0x0008 */ OP_XVALID_PROJID = BIT(4), /* 0x0010 */ + OP_XVALID_LAZYSIZE = BIT(5), /* 0x0020 */ + OP_XVALID_LAZYBLOCKS = BIT(6), /* 0x0040 */ }; struct lu_context; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index c3fb104b..837add1 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -207,6 +207,11 @@ static int ll_close_inode_openhandle(struct inode *inode, break; } + if (!(op_data->op_attr.ia_valid & ATTR_SIZE)) + op_data->op_xvalid |= OP_XVALID_LAZYSIZE; + if (!(op_data->op_xvalid & OP_XVALID_BLOCKS)) + op_data->op_xvalid |= OP_XVALID_LAZYBLOCKS; + rc = md_close(md_exp, op_data, och->och_mod, &req); if (rc && rc != -EINTR) { CERROR("%s: inode " DFID " mdc close failed: rc = %d\n", 
diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index 467503c..e2f1a49 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -317,6 +317,10 @@ static inline u64 attr_pack(unsigned int ia_valid, enum op_xvalid ia_xvalid) sa_valid |= MDS_OPEN_OWNEROVERRIDE; if (ia_xvalid & OP_XVALID_PROJID) sa_valid |= MDS_ATTR_PROJID; + if (ia_xvalid & OP_XVALID_LAZYSIZE) + sa_valid |= MDS_ATTR_LSIZE; + if (ia_xvalid & OP_XVALID_LAZYBLOCKS) + sa_valid |= MDS_ATTR_LBLOCKS; return sa_valid; } diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 7b6ea86..b4bb30d 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -258,6 +258,10 @@ void lustre_assert_wire_constants(void) LASSERTF(MDS_ATTR_PROJID == 0x0000000000010000ULL, "found 0x%.16llxULL\n", (long long)MDS_ATTR_PROJID); + LASSERTF(MDS_ATTR_LSIZE == 0x0000000000020000ULL, "found 0x%.16llxULL\n", + (long long)MDS_ATTR_LSIZE); + LASSERTF(MDS_ATTR_LBLOCKS == 0x0000000000040000ULL, "found 0x%.16llxULL\n", + (long long)MDS_ATTR_LBLOCKS); LASSERTF(FLD_QUERY == 900, "found %lld\n", (long long)FLD_QUERY); LASSERTF(FLD_FIRST_OPC == 900, "found %lld\n", @@ -390,6 +394,26 @@ void lustre_assert_wire_constants(void) LASSERTF(LU_SEQ_RANGE_OST == 1, "found %lld\n", (long long)LU_SEQ_RANGE_OST); + /* Checks for struct lustre_som_attrs */ + LASSERTF((int)sizeof(struct lustre_som_attrs) == 24, "found %lld\n", + (long long)(int)sizeof(struct lustre_som_attrs)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_valid) == 0, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_valid)); + LASSERTF((int)sizeof(((struct lustre_som_attrs *)0)->lsa_valid) == 2, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_valid)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_reserved) == 2, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_reserved)); + LASSERTF((int)sizeof(((struct lustre_som_attrs 
*)0)->lsa_reserved) == 6, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_reserved)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_size) == 8, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_size)); + LASSERTF((int)sizeof(((struct lustre_som_attrs *)0)->lsa_size) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_size)); + LASSERTF((int)offsetof(struct lustre_som_attrs, lsa_blocks) == 16, "found %lld\n", + (long long)(int)offsetof(struct lustre_som_attrs, lsa_blocks)); + LASSERTF((int)sizeof(((struct lustre_som_attrs *)0)->lsa_blocks) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lustre_som_attrs *)0)->lsa_blocks)); + /* Checks for struct lustre_mdt_attrs */ LASSERTF((int)sizeof(struct lustre_mdt_attrs) == 24, "found %lld\n", (long long)(int)sizeof(struct lustre_mdt_attrs)); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 5db742f..9f8d65d 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1676,6 +1676,8 @@ struct mdt_rec_setattr { */ #define MDS_ATTR_BLOCKS 0x8000ULL /* = 32768 */ #define MDS_ATTR_PROJID 0x10000ULL /* = 65536 */ +#define MDS_ATTR_LSIZE 0x20000ULL /* = 131072 */ +#define MDS_ATTR_LBLOCKS 0x40000ULL /* = 262144 */ enum mds_op_bias { /* MDS_CHECK_SPLIT = 1 << 0, obsolete before 2.3.58 */ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 5956f33..b2f5b57 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -202,8 +202,19 @@ struct lustre_mdt_attrs { */ #define LMA_OLD_SIZE (sizeof(struct lustre_mdt_attrs) + 5 * sizeof(__u64)) -enum { - LSOM_FL_VALID = 1 << 0, +enum lustre_som_flags { + /* Unknown or no SoM data, must get size from OSTs. */ + SOM_FL_UNKNOWN = 0x0000, + /* Known strictly correct, FLR or DoM file (SoM guaranteed). 
*/ + SOM_FL_STRICT = 0x0001, + /* Known stale - was right at some point in the past, but it is + * known (or likely) to be incorrect now (e.g. opened for write). + */ + SOM_FL_STALE = 0x0002, + /* Approximate, may never have been strictly correct, + * need to sync SOM data to achieve eventual consistency. + */ + SOM_FL_LAZY = 0x0004, }; struct lustre_som_attrs { @@ -882,6 +893,8 @@ enum la_valid { LA_KILL_SGID = 1 << 14, LA_PROJID = 1 << 15, LA_LAYOUT_VERSION = 1 << 16, + LA_LSIZE = 1 << 17, + LA_LBLOCKS = 1 << 18, /** * Attributes must be transmitted to OST objects */ From patchwork Thu Feb 27 21:08:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 30C5014BC for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 18AA9246A0 for ; Thu, 27 Feb 2020 21:21:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 18AA9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73A0F21FF8D; Thu, 27 Feb 2020 13:20:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D4D0C21FA75 for ; Thu, 27 Feb 2020 
13:18:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0E444EC1; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0D11F468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:58 -0500 Message-Id: <1582838290-17243-71-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 070/622] lustre: lfsck: layout LFSCK for mirrored file X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Fan Yong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Fan Yong This patch makes the layout LFSCK support mirrored files as follows: 1. Verify the mirrored file's LOV EA and PFID EA, including all kinds of inconsistencies that a non-mirrored file may hit. 2. Rebuild the mirrored file's LOV EA from orphan OST-objects, recovering the component's status/flags from before the crash: init, stale, and so on. 3. For a mirrored file with a dangling reference (OST object), it does NOT rebuild the lost OST-object from another replica; instead, it either reports the corruption or re-creates an empty OST-object, following the same rules as the non-mirrored case. Some code cleanup and new test cases for LFSCK against mirrored files. For the linux client we want to keep the wire protocol in sync.
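Keeping the wire protocol in sync here comes down to carving a 32-bit generation field out of what used to be padding without moving anything else. A minimal sketch of that constraint, assuming a 32-byte opaque prefix in place of the real leading fields; the `sketch_` names are hypothetical, and the offsets and sizes are the ones asserted by this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lov_comp_md_entry_v1: the first 32
 * bytes are kept opaque because only the tail changes in this patch.
 */
struct sketch_comp_entry {
	uint8_t  head[32];         /* fields before offset 32, not modelled */
	uint32_t lcme_layout_gen;  /* new: carved out of the old padding */
	uint32_t lcme_padding_1;
	uint64_t lcme_padding_2;
} __attribute__((packed));

/* Same checks the wire test performs, done at compile time. */
_Static_assert(offsetof(struct sketch_comp_entry, lcme_layout_gen) == 32,
	       "lcme_layout_gen must sit where the padding started");
_Static_assert(offsetof(struct sketch_comp_entry, lcme_padding_1) == 36,
	       "lcme_padding_1 offset");
_Static_assert(offsetof(struct sketch_comp_entry, lcme_padding_2) == 40,
	       "lcme_padding_2 offset");
_Static_assert(sizeof(struct sketch_comp_entry) == 48,
	       "entry size must match the old 32 + 2 * 8 layout");

/* Portable 32-bit byte swap, standing in for __swab32s(). */
static uint32_t sketch_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Once the field carries data it has to be byte-swapped like any
 * other wire field; padding stays untouched.
 */
static void sketch_swab_layout_gen(struct sketch_comp_entry *ent)
{
	ent->lcme_layout_gen = sketch_swab32(ent->lcme_layout_gen);
}
```

Because the total size and every earlier offset stay the same, peers that still treat the tail as padding should keep parsing the entry unchanged.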
WC-bug-id: https://jira.whamcloud.com/browse/LU-10288 Lustre-commit: 36ba989752c6 ("LU-10288 lfsck: layout LFSCK for mirrored file") Signed-off-by: Fan Yong Reviewed-on: https://review.whamcloud.com/32705 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/pack_generic.c | 4 +++- fs/lustre/ptlrpc/wiretest.c | 16 ++++++++++++---- include/uapi/linux/lustre/lustre_user.h | 4 +++- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 9cea826..d09cf3f 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -2066,7 +2066,9 @@ void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum) __swab64s(&ent->lcme_extent.e_end); __swab32s(&ent->lcme_offset); __swab32s(&ent->lcme_size); - BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding) == 0); + __swab32s(&ent->lcme_layout_gen); + BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding_1) == 0); + BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding_2) == 0); v1 = (struct lov_user_md_v1 *)((char *)lum + off); stripe_count = v1->lmm_stripe_count; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index b4bb30d..e22f8f8 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1536,10 +1536,18 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_size)); LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_size) == 4, "found %lld\n", (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_size)); - LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding) == 32, "found %lld\n", - (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding)); - LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding) == 16, "found %lld\n", - (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding)); + LASSERTF((int)offsetof(struct 
lov_comp_md_entry_v1, lcme_layout_gen) == 32, "found %lld\n", + (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_layout_gen)); + LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_layout_gen) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_layout_gen)); + LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_1) == 36, "found %lld\n", + (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_1)); + LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_1) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_1)); + LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_2) == 40, "found %lld\n", + (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_2)); + LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_2) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_2)); LASSERTF(LCME_FL_INIT == 0x00000010UL, "found 0x%.8xUL\n", (unsigned int)LCME_FL_INIT); LASSERTF(LCME_FL_NEG == 0x80000000UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index b2f5b57..8fd5b26 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -517,7 +517,9 @@ struct lov_comp_md_entry_v1 { * start from lov_comp_md_v1 */ __u32 lcme_size; /* size of component blob */ - __u64 lcme_padding[2]; + __u32 lcme_layout_gen; + __u32 lcme_padding_1; + __u64 lcme_padding_2; } __packed; #define SEQ_ID_MAX 0x0000FFFF From patchwork Thu Feb 27 21:08:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409779 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3F11A138D 
for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 27C92246A0 for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 27C92246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D8A5721F5DB; Thu, 27 Feb 2020 13:20:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3627121FA75 for ; Thu, 27 Feb 2020 13:18:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 11B0DED7; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 107B846A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:08:59 -0500 Message-Id: <1582838290-17243-72-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 071/622] lustre: mdt: read on open for DoM files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Read file data upon open and return it in reply. That works only for file with Data-on-MDT layout and no OST components initialized. There are three possible cases may occur: 1) file data fits in already allocated reply buffer (~9K) and is returned in that buffer in OPEN reply. 2) File fits in the maximum reply buffer (128K) and reply is returned with larger size to the client causing resend with re-allocated buffer. 3) File doesn't fit in reply buffer but its tail fills page partially then that tail is returned. This can be useful for an append case WC-bug-id: https://jira.whamcloud.com/browse/LU-10181 Lustre-commit: 13372d6c243c ("LU-10181 mdt: read on open for DoM files") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/23011 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 1 + fs/lustre/include/obd.h | 11 +++ fs/lustre/llite/file.c | 131 +++++++++++++++++++++++++++++++++- fs/lustre/llite/llite_internal.h | 3 + fs/lustre/llite/namei.c | 3 + fs/lustre/mdc/lproc_mdc.c | 32 +++++++++ fs/lustre/mdc/mdc_internal.h | 4 ++ fs/lustre/mdc/mdc_locks.c | 28 +++++++- fs/lustre/mdc/mdc_request.c | 2 + fs/lustre/ptlrpc/layout.c | 11 ++- fs/lustre/ptlrpc/niobuf.c | 5 ++ 11 files changed, 227 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 2737240..807d080 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -291,6 +291,7 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_msg_field RMF_OBD_ID; extern struct req_msg_field RMF_FID; extern struct req_msg_field RMF_NIOBUF_REMOTE; +extern 
struct req_msg_field RMF_NIOBUF_INLINE; extern struct req_msg_field RMF_RCS; extern struct req_msg_field RMF_FIEMAP_KEY; extern struct req_msg_field RMF_FIEMAP_VAL; diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index c712979..de9642f 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -184,6 +184,17 @@ struct client_obd { */ u32 cl_max_mds_easize; + /* Data-on-MDT specific value to set larger reply buffer for possible + * data read along with open/stat requests. By default it tries to use + * unused space in reply buffer. + * This value is used to ensure that reply buffer has at least as + * much free space as value indicates. That free space is gained from + * LOV EA buffer which is small for DoM files and on big systems can + * provide up to 32KB of extra space in reply buffer. + * Default value is 8K now. + */ + u32 cl_dom_min_inline_repsize; + enum lustre_sec_part cl_sp_me; enum lustre_sec_part cl_sp_to; struct sptlrpc_flavor cl_flvr_mgc; /* fixed flavor of mgc->mgs */ diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 837add1..7657c79 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -393,6 +393,132 @@ int ll_file_release(struct inode *inode, struct file *file) return rc; } +static inline int ll_dom_readpage(void *data, struct page *page) +{ + struct niobuf_local *lnb = data; + void *kaddr; + + kaddr = kmap_atomic(page); + memcpy(kaddr, lnb->lnb_data, lnb->lnb_len); + if (lnb->lnb_len < PAGE_SIZE) + memset(kaddr + lnb->lnb_len, 0, + PAGE_SIZE - lnb->lnb_len); + flush_dcache_page(page); + SetPageUptodate(page); + kunmap_atomic(kaddr); + unlock_page(page); + + return 0; +} + +void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, + struct lookup_intent *it) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct cl_object *obj = lli->lli_clob; + struct address_space *mapping = inode->i_mapping; + struct page *vmpage; + struct niobuf_remote *rnb; + char *data; + struct 
lu_env *env; + struct cl_io *io; + u16 refcheck; + struct lustre_handle lockh; + struct ldlm_lock *lock; + unsigned long index, start; + struct niobuf_local lnb; + int rc; + bool dom_lock = false; + + if (!obj) + return; + + if (it->it_lock_mode != 0) { + lockh.cookie = it->it_lock_handle; + lock = ldlm_handle2lock(&lockh); + if (lock) + dom_lock = ldlm_has_dom(lock); + LDLM_LOCK_PUT(lock); + } + + if (!dom_lock) + return; + + env = cl_env_get(&refcheck); + if (IS_ERR(env)) + return; + + if (!req_capsule_has_field(&req->rq_pill, &RMF_NIOBUF_INLINE, + RCL_SERVER)) { + rc = -ENODATA; + goto out_env; + } + + rnb = req_capsule_server_get(&req->rq_pill, &RMF_NIOBUF_INLINE); + data = (char *)rnb + sizeof(*rnb); + + if (!rnb || rnb->rnb_len == 0) { + rc = 0; + goto out_env; + } + + CDEBUG(D_INFO, "Get data buffer along with open, len %i, i_size %llu\n", + rnb->rnb_len, i_size_read(inode)); + + io = vvp_env_thread_io(env); + io->ci_obj = obj; + io->ci_ignore_layout = 1; + rc = cl_io_init(env, io, CIT_MISC, obj); + if (rc) + goto out_io; + + lnb.lnb_file_offset = rnb->rnb_offset; + start = lnb.lnb_file_offset / PAGE_SIZE; + index = 0; + LASSERT(lnb.lnb_file_offset % PAGE_SIZE == 0); + lnb.lnb_page_offset = 0; + do { + struct cl_page *clp; + + lnb.lnb_data = data + (index << PAGE_SHIFT); + lnb.lnb_len = rnb->rnb_len - (index << PAGE_SHIFT); + if (lnb.lnb_len > PAGE_SIZE) + lnb.lnb_len = PAGE_SIZE; + + vmpage = read_cache_page(mapping, index + start, + ll_dom_readpage, &lnb); + if (IS_ERR(vmpage)) { + CWARN("%s: cannot fill page %lu for "DFID + " with data: rc = %li\n", + ll_get_fsname(inode->i_sb, NULL, 0), + index + start, PFID(lu_object_fid(&obj->co_lu)), + PTR_ERR(vmpage)); + break; + } + lock_page(vmpage); + clp = cl_page_find(env, obj, vmpage->index, vmpage, + CPT_CACHEABLE); + if (IS_ERR(clp)) { + unlock_page(vmpage); + put_page(vmpage); + rc = PTR_ERR(clp); + goto out_io; + } + + /* export page */ + cl_page_export(env, clp, 1); + cl_page_put(env, clp); + 
unlock_page(vmpage); + put_page(vmpage); + index++; + } while (rnb->rnb_len > (index << PAGE_SHIFT)); + rc = 0; +out_io: + cl_io_fini(env, io); +out_env: + cl_env_put(env, &refcheck); +} + static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, struct lookup_intent *itp) { @@ -450,8 +576,11 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, } rc = ll_prep_inode(&inode, req, NULL, itp); - if (!rc && itp->it_lock_mode) + + if (!rc && itp->it_lock_mode) { + ll_dom_finish_open(d_inode(de), req, itp); ll_set_lock_data(sbi->ll_md_exp, inode, itp, NULL); + } out: ptlrpc_req_finished(req); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 6bdbf28..7491397 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -916,6 +916,9 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, ssize_t ll_copy_user_md(const struct lov_user_md __user *md, struct lov_user_md **kbuf); +void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, + struct lookup_intent *it); + /* Compute expected user md size when passing in a md from user space */ static inline ssize_t ll_lov_user_md_size(const struct lov_user_md *lum) { diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index f835abb..4ac62b2 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -600,6 +600,9 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, if (rc) return rc; + if (it->it_op & IT_OPEN) + ll_dom_finish_open(inode, request, it); + ll_set_lock_data(ll_i2sbi(parent)->ll_md_exp, inode, it, &bits); /* We used to query real size from OSTs here, but actually diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 6b87e76..0c52bcf 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -456,6 +456,36 @@ static ssize_t mdc_stats_seq_write(struct file *file, } LPROC_SEQ_FOPS(mdc_stats); +static int 
mdc_dom_min_repsize_seq_show(struct seq_file *m, void *v) +{ + struct obd_device *dev = m->private; + + seq_printf(m, "%u\n", dev->u.cli.cl_dom_min_inline_repsize); + + return 0; +} + +static ssize_t mdc_dom_min_repsize_seq_write(struct file *file, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct obd_device *dev; + unsigned int val; + int rc; + + dev = ((struct seq_file *)file->private_data)->private; + rc = kstrtouint_from_user(buffer, count, 0, &val); + if (rc) + return rc; + + if (val > MDC_DOM_MAX_INLINE_REPSIZE) + return -ERANGE; + + dev->u.cli.cl_dom_min_inline_repsize = val; + return count; +} +LPROC_SEQ_FOPS(mdc_dom_min_repsize); + LPROC_SEQ_FOPS_RO_TYPE(mdc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(mdc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(mdc, timeouts); @@ -489,6 +519,8 @@ static ssize_t mdc_stats_seq_write(struct file *file, .fops = &mdc_unstable_stats_fops }, { .name = "mdc_stats", .fops = &mdc_stats_fops }, + { .name = "mdc_dom_min_repsize", + .fops = &mdc_dom_min_repsize_fops }, { NULL } }; diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index 079539d..6cfa79c 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -159,4 +159,8 @@ int mdc_ldlm_blocking_ast(struct ldlm_lock *dlmlock, struct ldlm_lock_desc *new, void *data, int flag); int mdc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data); int mdc_fill_lvb(struct ptlrpc_request *req, struct ost_lvb *lvb); + +#define MDC_DOM_DEF_INLINE_REPSIZE 8192 +#define MDC_DOM_MAX_INLINE_REPSIZE XATTR_SIZE_MAX + #endif diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 2e4a5c6..abbc908 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -254,8 +254,9 @@ static int mdc_save_lovea(struct ptlrpc_request *req, u32 lmmsize = op_data->op_data_size; LIST_HEAD(cancels); int count = 0; - int mode; + enum ldlm_mode mode; int rc; + int repsize; it->it_create_mode = (it->it_create_mode & ~S_IFMT) | S_IFREG; 
@@ -336,7 +337,32 @@ static int mdc_save_lovea(struct ptlrpc_request *req, obddev->u.cli.cl_max_mds_easize); req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize); + /** + * Inline buffer for possible data from Data-on-MDT files. + */ + req_capsule_set_size(&req->rq_pill, &RMF_NIOBUF_INLINE, RCL_SERVER, + sizeof(struct niobuf_remote)); ptlrpc_request_set_replen(req); + + /* Get real repbuf allocated size as rounded up power of 2 */ + repsize = size_roundup_power2(req->rq_replen + + lustre_msg_early_size()); + + /* Estimate free space for DoM files in repbuf */ + repsize -= req->rq_replen - obddev->u.cli.cl_max_mds_easize + + sizeof(struct lov_comp_md_v1) + + sizeof(struct lov_comp_md_entry_v1) + + lov_mds_md_size(0, LOV_MAGIC_V3); + + if (repsize < obddev->u.cli.cl_dom_min_inline_repsize) { + repsize = obddev->u.cli.cl_dom_min_inline_repsize - repsize; + req_capsule_set_size(&req->rq_pill, &RMF_NIOBUF_INLINE, + RCL_SERVER, + sizeof(struct niobuf_remote) + repsize); + ptlrpc_request_set_replen(req); + CDEBUG(D_INFO, "Increase repbuf by %d bytes, total: %d\n", + repsize, req->rq_replen); + } return req; } diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index feac374..b173937 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -2551,6 +2551,8 @@ int mdc_setup(struct obd_device *obd, struct lustre_cfg *cfg) if (rc) goto err_osc_cleanup; + obd->u.cli.cl_dom_min_inline_repsize = MDC_DOM_DEF_INLINE_REPSIZE; + ns_register_cancel(obd->obd_namespace, mdc_cancel_weight); obd->obd_namespace->ns_lvbo = &inode_lvbo; diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 8fe661d..c11b1b0 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -414,7 +414,8 @@ &RMF_MDT_MD, &RMF_ACL, &RMF_CAPA1, - &RMF_CAPA2 + &RMF_CAPA2, + &RMF_NIOBUF_INLINE, }; static const struct req_msg_field *ldlm_intent_getattr_client[] = { @@ -1065,8 +1066,14 @@ struct req_msg_field RMF_NIOBUF_REMOTE = 
dump_rniobuf); EXPORT_SYMBOL(RMF_NIOBUF_REMOTE); +struct req_msg_field RMF_NIOBUF_INLINE = + DEFINE_MSGF("niobuf_inline", RMF_F_NO_SIZE_CHECK, + sizeof(struct niobuf_remote), lustre_swab_niobuf_remote, + dump_rniobuf); +EXPORT_SYMBOL(RMF_NIOBUF_INLINE); + struct req_msg_field RMF_RCS = - DEFINE_MSGF("niobuf_remote", RMF_F_STRUCT_ARRAY, sizeof(u32), + DEFINE_MSGF("niobuf_rcs", RMF_F_STRUCT_ARRAY, sizeof(u32), lustre_swab_generic_32s, dump_rcs); EXPORT_SYMBOL(RMF_RCS); diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index 2e866fe..e8ba57b 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -617,6 +617,11 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) request->rq_status = rc; goto cleanup_bulk; } + /* Use real allocated value in lm_repsize, + * so the server may use whole reply buffer + * without resends where it is needed. + */ + request->rq_reqmsg->lm_repsize = request->rq_repbuf_len; } else { request->rq_repdata = NULL; request->rq_repmsg = NULL; From patchwork Thu Feb 27 21:09:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409781 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AC60159A for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 51991246A0 for ; Thu, 27 Feb 2020 21:22:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 51991246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: 
from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E24021FA9B; Thu, 27 Feb 2020 13:20:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8DF0621FA75 for ; Thu, 27 Feb 2020 13:18:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 15A08ED8; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1378046C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:00 -0500 Message-Id: <1582838290-17243-73-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 072/622] lustre: migrate: pack lmv ea in migrate rpc X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao To support striped directory migration, pack lmv_user_md in the migrate RPC. Add the arguments 'mdt-count' and 'mdt-hash' for 'lfs migrate'. Disable directory migration related tests temporarily; we'll enable them later in the last patch of this set.
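Carrying a whole lmv_user_md in the request, instead of a bare MDT index, means the receiving code has to sanity-check the blob before using it. A rough sketch of such a check, under loud assumptions: the struct below is a cut-down hypothetical stand-in, and the two magic constants are placeholder values chosen for illustration, not taken from the headers in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder magics (assumed values, for illustration only). */
#define SKETCH_LMV_MAGIC_DEFAULT  0x0CD30CD0u
#define SKETCH_LMV_MAGIC_SPECIFIC 0x0CD40CD0u

/* Hypothetical, cut-down stand-in for struct lmv_user_md. */
struct sketch_lmv_user_md {
	uint32_t lum_magic;
	uint32_t lum_stripe_count;
	int32_t  lum_stripe_offset;  /* target MDT index */
};

/* Reject a migrate request whose layout blob is too short or carries
 * an unknown magic; returns 0 on success or -EINVAL.
 */
static int sketch_check_migrate_lum(const struct sketch_lmv_user_md *lum,
				    size_t buflen)
{
	if (!lum || buflen < sizeof(*lum))
		return -EINVAL;

	if (lum->lum_magic != SKETCH_LMV_MAGIC_DEFAULT &&
	    lum->lum_magic != SKETCH_LMV_MAGIC_SPECIFIC)
		return -EINVAL;

	return 0;
}
```

The length check comes first so that the magic field is never read from a buffer that is too small to hold it.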
WC-bug-id: https://jira.whamcloud.com/browse/LU-4684 Lustre-commit: 470bdeec6ca5 ("LU-4684 migrate: pack lmv ea in migrate rpc") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/31424 Reviewed-by: Andreas Dilger Reviewed-by: Fan Yong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 19 ++++++---- fs/lustre/llite/file.c | 67 +++++++++++++++++---------------- fs/lustre/llite/llite_internal.h | 4 +- fs/lustre/llite/llite_lib.c | 4 +- fs/lustre/mdc/mdc_lib.c | 21 +++++++---- fs/lustre/mdc/mdc_reint.c | 20 ++-------- fs/lustre/ptlrpc/layout.c | 3 +- include/uapi/linux/lustre/lustre_idl.h | 2 +- include/uapi/linux/lustre/lustre_user.h | 8 +++- 9 files changed, 77 insertions(+), 71 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index c0c3bf0..751d0183 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1322,7 +1322,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) goto finish_req; } - lum_size = lmv_user_md_size(stripe_count, LMV_MAGIC_V1); + lum_size = lmv_user_md_size(stripe_count, + LMV_USER_MAGIC_SPECIFIC); tmp = kzalloc(lum_size, GFP_NOFS); if (!tmp) { rc = -ENOMEM; @@ -1655,14 +1656,14 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return rc; } case LL_IOC_MIGRATE: { - const char *filename; + struct lmv_user_md *lum; + char *filename; int namelen = 0; int len; int rc; - int mdtidx; rc = obd_ioctl_getdata(&data, &len, (void __user *)arg); - if (rc < 0) + if (rc) return rc; if (!data->ioc_inlbuf1 || !data->ioc_inlbuf2 || @@ -1674,17 +1675,21 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) filename = data->ioc_inlbuf1; namelen = data->ioc_inllen1; if (namelen < 1 || namelen != strlen(filename) + 1) { + CDEBUG(D_INFO, "IOC_MDC_LOOKUP missing filename\n"); rc = -EINVAL; goto migrate_free; } - if (data->ioc_inllen2 != sizeof(mdtidx)) { + lum = (struct lmv_user_md 
*)data->ioc_inlbuf2; + if (lum->lum_magic != LMV_USER_MAGIC && + lum->lum_magic != LMV_USER_MAGIC_SPECIFIC) { rc = -EINVAL; + CERROR("%s: wrong lum magic %x: rc = %d\n", + filename, lum->lum_magic, rc); goto migrate_free; } - mdtidx = *(int *)data->ioc_inlbuf2; - rc = ll_migrate(inode, file, mdtidx, filename, namelen - 1); + rc = ll_migrate(inode, file, lum, filename); migrate_free: kvfree(data); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 7657c79..68fb623 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3785,8 +3785,8 @@ int ll_get_fid_by_name(struct inode *parent, const char *name, return rc; } -int ll_migrate(struct inode *parent, struct file *file, int mdtidx, - const char *name, int namelen) +int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, + const char *name) { struct ptlrpc_request *request = NULL; struct obd_client_handle *och = NULL; @@ -3795,16 +3795,18 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, struct md_op_data *op_data; struct mdt_body *body; u64 data_version = 0; + size_t namelen = strlen(name); + int lumlen = lmv_user_md_size(lum->lum_stripe_count, lum->lum_magic); struct qstr qstr; int rc; - CDEBUG(D_VFSTRACE, "migrate %s under " DFID " to MDT%d\n", - name, PFID(ll_inode2fid(parent)), mdtidx); + CDEBUG(D_VFSTRACE, "migrate " DFID "/%s to MDT%d stripe count %d\n", + PFID(ll_inode2fid(parent)), name, + lum->lum_stripe_offset, lum->lum_stripe_count); - op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen, - 0, LUSTRE_OPC_ANY, NULL); - if (IS_ERR(op_data)) - return PTR_ERR(op_data); + if (lum->lum_magic != cpu_to_le32(LMV_USER_MAGIC) && + lum->lum_magic != cpu_to_le32(LMV_USER_MAGIC_SPECIFIC)) + lustre_swab_lmv_user_md(lum); /* Get child FID first */ qstr.hash = full_name_hash(file_dentry(file), name, namelen); @@ -3818,16 +3820,14 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, } if (!child_inode) { - rc = 
ll_get_fid_by_name(parent, name, namelen, - &op_data->op_fid3, &child_inode); + rc = ll_get_fid_by_name(parent, name, namelen, NULL, + &child_inode); if (rc) - goto out_free; + return rc; } - if (!child_inode) { - rc = -EINVAL; - goto out_free; - } + if (!child_inode) + return -ENOENT; /* * lfs migrate command needs to be blocked on the client @@ -3839,6 +3839,13 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto out_iput; } + op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen, + child_inode->i_mode, LUSTRE_OPC_ANY, NULL); + if (IS_ERR(op_data)) { + rc = PTR_ERR(op_data); + goto out_iput; + } + inode_lock(child_inode); op_data->op_fid3 = *ll_inode2fid(child_inode); if (!fid_is_sane(&op_data->op_fid3)) { @@ -3849,16 +3856,10 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto out_unlock; } - rc = ll_get_mdt_idx_by_fid(ll_i2sbi(parent), &op_data->op_fid3); - if (rc < 0) - goto out_unlock; + op_data->op_cli_flags |= CLI_MIGRATE | CLI_SET_MEA; + op_data->op_data = lum; + op_data->op_data_size = lumlen; - if (rc == mdtidx) { - CDEBUG(D_INFO, "%s: " DFID " is already on MDT%d.\n", name, - PFID(&op_data->op_fid3), mdtidx); - rc = 0; - goto out_unlock; - } again: if (S_ISREG(child_inode->i_mode)) { och = ll_lease_open(child_inode, NULL, FMODE_WRITE, 0); @@ -3874,16 +3875,17 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto out_close; op_data->op_handle = och->och_fh; - op_data->op_data = och->och_mod; op_data->op_data_version = data_version; op_data->op_lease_handle = och->och_lease_handle; - op_data->op_bias |= MDS_RENAME_MIGRATE; + op_data->op_bias |= MDS_CLOSE_MIGRATE; + + spin_lock(&och->och_mod->mod_open_req->rq_lock); + och->och_mod->mod_open_req->rq_replay = 0; + spin_unlock(&och->och_mod->mod_open_req->rq_lock); } - op_data->op_mds = mdtidx; - op_data->op_cli_flags = CLI_MIGRATE; - rc = md_rename(ll_i2sbi(parent)->ll_md_exp, op_data, name, - namelen, name, namelen, &request); + 
rc = md_rename(ll_i2sbi(parent)->ll_md_exp, op_data, name, namelen, + name, namelen, &request); if (!rc) { LASSERT(request); ll_update_times(request, parent); @@ -3915,16 +3917,15 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx, goto again; out_close: - if (och) /* close the file */ + if (och) ll_lease_close(och, child_inode, NULL); if (!rc) clear_nlink(child_inode); out_unlock: inode_unlock(child_inode); + ll_finish_md_op_data(op_data); out_iput: iput(child_inode); -out_free: - ll_finish_md_op_data(op_data); return rc; } diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 7491397..edb5f2a 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -824,8 +824,8 @@ int ll_getattr(const struct path *path, struct kstat *stat, #define ll_set_acl NULL #endif /* CONFIG_LUSTRE_FS_POSIX_ACL */ -int ll_migrate(struct inode *parent, struct file *file, int mdtidx, - const char *name, int namelen); +int ll_migrate(struct inode *parent, struct file *file, + struct lmv_user_md *lum, const char *name); int ll_get_fid_by_name(struct inode *parent, const char *name, int namelen, struct lu_fid *fid, struct inode **inode); int ll_inode_permission(struct inode *inode, int mask); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 56624e8..c04146f 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -209,7 +209,9 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_GRANT_PARAM | OBD_CONNECT_SHORTIO | OBD_CONNECT_FLAGS2; - data->ocd_connect_flags2 = OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT; + data->ocd_connect_flags2 = OBD_CONNECT2_FLR | + OBD_CONNECT2_LOCK_CONVERT | + OBD_CONNECT2_DIR_MIGRATE; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index e2f1a49..1d38574 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ 
b/fs/lustre/mdc/mdc_lib.c @@ -443,7 +443,7 @@ static void mdc_close_intent_pack(struct ptlrpc_request *req, struct close_data *data; struct ldlm_lock *lock; - if (!(bias & (MDS_CLOSE_INTENT | MDS_RENAME_MIGRATE))) + if (!(bias & (MDS_CLOSE_INTENT | MDS_CLOSE_MIGRATE))) return; data = req_capsule_client_get(&req->rq_pill, &RMF_CLOSE_DATA); @@ -507,13 +507,20 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data, if (new) mdc_pack_name(req, &RMF_SYMTGT, new, newlen); - if (op_data->op_cli_flags & CLI_MIGRATE && - op_data->op_bias & MDS_RENAME_MIGRATE) { - struct mdt_ioepoch *epoch; + if (op_data->op_cli_flags & CLI_MIGRATE) { + char *tmp; - mdc_close_intent_pack(req, op_data); - epoch = req_capsule_client_get(&req->rq_pill, &RMF_MDT_EPOCH); - mdc_ioepoch_pack(epoch, op_data); + if (op_data->op_bias & MDS_CLOSE_MIGRATE) { + struct mdt_ioepoch *epoch; + + mdc_close_intent_pack(req, op_data); + epoch = req_capsule_client_get(&req->rq_pill, + &RMF_MDT_EPOCH); + mdc_ioepoch_pack(epoch, op_data); + } + + tmp = req_capsule_client_get(&req->rq_pill, &RMF_EADATA); + memcpy(tmp, op_data->op_data, op_data->op_data_size); } } diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index d326962..030c247 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -390,6 +390,9 @@ int mdc_rename(struct obd_export *exp, struct md_op_data *op_data, req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, oldlen + 1); req_capsule_set_size(&req->rq_pill, &RMF_SYMTGT, RCL_CLIENT, newlen + 1); + if (op_data->op_cli_flags & CLI_MIGRATE) + req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT, + op_data->op_data_size); rc = mdc_prep_elc_req(exp, req, MDS_REINT, &cancels, count); if (rc) { @@ -397,23 +400,6 @@ int mdc_rename(struct obd_export *exp, struct md_op_data *op_data, return rc; } - if (op_data->op_cli_flags & CLI_MIGRATE && op_data->op_data) { - struct md_open_data *mod = op_data->op_data; - - LASSERTF(mod->mod_open_req && 
- mod->mod_open_req->rq_type != LI_POISON, - "POISONED open %p!\n", mod->mod_open_req); - - DEBUG_REQ(D_HA, mod->mod_open_req, "matched open"); - /* - * We no longer want to preserve this open for replay even - * though the open was committed. b=3632, b=3633 - */ - spin_lock(&mod->mod_open_req->rq_lock); - mod->mod_open_req->rq_replay = 0; - spin_unlock(&mod->mod_open_req->rq_lock); - } - if (exp_connect_cancelset(exp) && req) ldlm_cli_cancel_list(&cancels, count, req, 0); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index c11b1b0..ae573a2 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -263,7 +263,8 @@ &RMF_SYMTGT, &RMF_DLM_REQ, &RMF_MDT_EPOCH, - &RMF_CLOSE_DATA + &RMF_CLOSE_DATA, + &RMF_EADATA }; static const struct req_msg_field *mds_last_unlink_server[] = { diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 9f8d65d..75326c0 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1693,7 +1693,7 @@ enum mds_op_bias { MDS_CREATE_VOLATILE = 1 << 10, MDS_OWNEROVERRIDE = 1 << 11, MDS_HSM_RELEASE = 1 << 12, - MDS_RENAME_MIGRATE = 1 << 13, + MDS_CLOSE_MIGRATE = 1 << 13, MDS_CLOSE_LAYOUT_SWAP = 1 << 14, MDS_CLOSE_LAYOUT_MERGE = 1 << 15, MDS_CLOSE_RESYNC_DONE = 1 << 16, diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 8fd5b26..421c977 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -632,8 +632,12 @@ struct lmv_user_md_v1 { static inline int lmv_user_md_size(int stripes, int lmm_magic) { - return sizeof(struct lmv_user_md) + - stripes * sizeof(struct lmv_user_mds_data); + int size = sizeof(struct lmv_user_md); + + if (lmm_magic == LMV_USER_MAGIC_SPECIFIC) + size += stripes * sizeof(struct lmv_user_mds_data); + + return size; } struct ll_recreate_obj { From patchwork Thu Feb 27 21:09:01 2020 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409783 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 81E8214BC for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6A7B7246A0 for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6A7B7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B2ED034879C; Thu, 27 Feb 2020 13:20:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E35CB21FA75 for ; Thu, 27 Feb 2020 13:18:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 17B14ED9; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1678046D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:01 -0500 Message-Id: <1582838290-17243-74-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
Subject: [lustre-devel] [PATCH 073/622] lustre: hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Teddy Zheng , Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Teddy Zheng Clients that register with the MDS using OBD_CONNECT2_ARCHIVE_ID_ARRAY will use an array to pass archive IDs, while clients without it still use a bitmap. This flag allows old clients to connect to new MDSs. WC-bug-id: https://jira.whamcloud.com/browse/LU-10114 Lustre-commit: 1c7e7d1243f7 ("LU-10114 hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array") Signed-off-by: Teddy Zheng Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/32806 Reviewed-by: Andreas Dilger Reviewed-by: John L. Hammond Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 3 files changed, 4 insertions(+) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 385359f..fbd46df 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -119,6 +119,7 @@ "flr", /* 0x20 */ "wbc", /* 0x40 */ "lock_convert", /* 0x80 */ + "archive_id_array", /* 0x100 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index e22f8f8..1afbb41 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1141,6 +1141,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_WBC_INTENTS); LASSERTF(OBD_CONNECT2_LOCK_CONVERT == 0x80ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_LOCK_CONVERT); + LASSERTF(OBD_CONNECT2_ARCHIVE_ID_ARRAY == 0x100ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_ARCHIVE_ID_ARRAY);
LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 75326c0..dc9872cf3 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -800,6 +800,7 @@ struct ptlrpc_body_v2 { * locks */ #define OBD_CONNECT2_LOCK_CONVERT 0x80ULL /* IBITS lock convert support */ +#define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* store HSM archive_id in array */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:09:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409809 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F35614BC for ; Thu, 27 Feb 2020 21:22:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 538E5246A0 for ; Thu, 27 Feb 2020 21:22:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 538E5246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2B994348973; Thu, 27 Feb 2020 13:21:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 309A521FA64 for ; Thu, 27 Feb 2020 13:18:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1B570EDA; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 19D71468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:02 -0500 Message-Id: <1582838290-17243-75-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 074/622] lustre: llite: handle zero length xattr values correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In mdt_getxattr(), set OBD_MD_FLXATTR in mbo_valid of the reply's MDT body so that the client can distinguish between nonexistent extended attributes and zero length values. In ll_xattr_list() and ll_getxattr_common() test for OBD_MD_FLXATTR and return 0 rather than -ENODATA in the appropriate cases. Add sanity test_102t() to test that zero length values are handled correctly. Lustre-commit: 1e4164a1254d ("LU-11109 mdt: handle zero length xattr values correctly") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/32755 Reviewed-by: Andreas Dilger Reviewed-by: Mikhail Pershin Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/llite/xattr.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index f25ae59..636334e 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -363,6 +363,11 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, /* only detect the xattr size */ if (size == 0) { + /* LU-11109: Older MDTs do not distinguish + * between nonexistent xattrs and zero length + * values in this case. Newer MDTs will return + * -ENODATA or set OBD_MD_FLXATTR. + */ rc = body->mbo_eadatasize; goto out; } @@ -375,7 +380,22 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, } if (body->mbo_eadatasize == 0) { - rc = -ENODATA; + /* LU-11109: Newer MDTs set OBD_MD_FLXATTR on + * success so that we can distinguish between + * zero length value and nonexistent xattr. + * + * If OBD_MD_FLXATTR is not set then we keep + * the old behavior and return -ENODATA for + * getxattr() when mbo_eadatasize is 0. But + * -ENODATA only makes sense for getxattr() + * and not for listxattr(). 
+ */ + if (body->mbo_valid & OBD_MD_FLXATTR) + rc = 0; + else if (valid == OBD_MD_FLXATTR) + rc = -ENODATA; + else + rc = 0; goto out; } From patchwork Thu Feb 27 21:09:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409787 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A889B138D for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8FA85246A0 for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8FA85246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 04FA8348832; Thu, 27 Feb 2020 13:20:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7730221FAAF for ; Thu, 27 Feb 2020 13:18:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1FF10EE3; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1D23946A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:03 -0500 Message-Id: 
<1582838290-17243-76-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 075/622] lnet: refactor lnet_select_pathway() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata lnet_select_pathway() is a complex monolithic function which handles many send cases. Broke down lnet_select_pathway() to multiple functions. Each function handles a different send case. This will make it easier to add the handling of the different health cases in future patches. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 4e48761a5719 ("LU-9120 lnet: refactor lnet_select_pathway()") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32760 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 13 + net/lnet/lnet/lib-move.c | 1398 ++++++++++++++++++++++++++--------------- 2 files changed, 911 insertions(+), 500 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 22c6152..20b4660 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -827,6 +827,19 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return false; } +static inline struct lnet_peer_net * +lnet_find_peer_net_locked(struct lnet_peer *peer, u32 net_id) +{ + struct lnet_peer_net *peer_net; + + list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { + if (peer_net->lpn_net_id == net_id) + return peer_net; + } + + return NULL; +} + 
static inline void lnet_peer_set_alive(struct lnet_peer_ni *lp) { diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index cab830a..10aa753 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -45,6 +45,23 @@ module_param(local_nid_dist_zero, int, 0444); MODULE_PARM_DESC(local_nid_dist_zero, "Reserved"); +struct lnet_send_data { + struct lnet_ni *sd_best_ni; + struct lnet_peer_ni *sd_best_lpni; + struct lnet_peer_ni *sd_final_dst_lpni; + struct lnet_peer *sd_peer; + struct lnet_peer *sd_gw_peer; + struct lnet_peer_ni *sd_gw_lpni; + struct lnet_peer_net *sd_peer_net; + struct lnet_msg *sd_msg; + lnet_nid_t sd_dst_nid; + lnet_nid_t sd_src_nid; + lnet_nid_t sd_rtr_nid; + int sd_cpt; + int sd_md_cpt; + u32 sd_send_case; +}; + static inline struct lnet_comm_count * get_stats_counts(struct lnet_element_stats *stats, enum lnet_stats_type stats_type) @@ -1188,7 +1205,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } static struct lnet_peer_ni * -lnet_find_route_locked(struct lnet_net *net, lnet_nid_t target, +lnet_find_route_locked(struct lnet_net *net, u32 remote_net, lnet_nid_t rtr_nid) { struct lnet_remotenet *rnet; @@ -1203,7 +1220,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * If @rtr_nid is not LNET_NID_ANY, return the gateway with * rtr_nid nid, otherwise find the best gateway I can use */ - rnet = lnet_find_rnet_locked(LNET_NIDNET(target)); + rnet = lnet_find_rnet_locked(remote_net); if (!rnet) return NULL; @@ -1252,13 +1269,20 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } static struct lnet_ni * -lnet_get_best_ni(struct lnet_net *local_net, struct lnet_ni *cur_ni, +lnet_get_best_ni(struct lnet_net *local_net, struct lnet_ni *best_ni, + struct lnet_peer *peer, struct lnet_peer_net *peer_net, int md_cpt) { - struct lnet_ni *ni = NULL, *best_ni = cur_ni; + struct lnet_ni *ni = NULL; unsigned int shortest_distance; int 
best_credits; + /* If there is no peer_ni that we can send to on this network, + * then there is no point in looking for a new best_ni here. + */ + if (!lnet_get_next_peer_ni_locked(peer, peer_net, NULL)) + return best_ni; + if (!best_ni) { shortest_distance = UINT_MAX; best_credits = INT_MIN; @@ -1286,6 +1310,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, md_cpt, ni->ni_dev_cpt); + CDEBUG(D_NET, + "compare ni %s [c:%d, d:%d, s:%d] with best_ni %s [c:%d, d:%d, s:%d]\n", + libcfs_nid2str(ni->ni_nid), ni_credits, distance, + ni->ni_seq, (best_ni) ? libcfs_nid2str(best_ni->ni_nid) + : "not seleced", best_credits, shortest_distance, + (best_ni) ? best_ni->ni_seq : 0); + /* * All distances smaller than the NUMA range * are treated equally. @@ -1311,6 +1342,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, best_credits = ni_credits; } + CDEBUG(D_NET, "selected best_ni %s\n", + (best_ni) ? libcfs_nid2str(best_ni->ni_nid) : "no selection"); + return best_ni; } @@ -1335,421 +1369,140 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return false; } +#define SRC_SPEC 0x0001 +#define SRC_ANY 0x0002 +#define LOCAL_DST 0x0004 +#define REMOTE_DST 0x0008 +#define MR_DST 0x0010 +#define NMR_DST 0x0020 +#define SND_RESP 0x0040 + +/* The following to defines are used for return codes */ +#define REPEAT_SEND 0x1000 +#define PASS_THROUGH 0x2000 + +/* The different cases lnet_select pathway needs to handle */ +#define SRC_SPEC_LOCAL_MR_DST (SRC_SPEC | LOCAL_DST | MR_DST) +#define SRC_SPEC_ROUTER_MR_DST (SRC_SPEC | REMOTE_DST | MR_DST) +#define SRC_SPEC_LOCAL_NMR_DST (SRC_SPEC | LOCAL_DST | NMR_DST) +#define SRC_SPEC_ROUTER_NMR_DST (SRC_SPEC | REMOTE_DST | NMR_DST) +#define SRC_ANY_LOCAL_MR_DST (SRC_ANY | LOCAL_DST | MR_DST) +#define SRC_ANY_ROUTER_MR_DST (SRC_ANY | REMOTE_DST | MR_DST) +#define SRC_ANY_LOCAL_NMR_DST (SRC_ANY | LOCAL_DST | NMR_DST) +#define SRC_ANY_ROUTER_NMR_DST 
(SRC_ANY | REMOTE_DST | NMR_DST) + static int -lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, - struct lnet_msg *msg, lnet_nid_t rtr_nid) +lnet_handle_send(struct lnet_send_data *sd) { - struct lnet_ni *best_ni = NULL; - struct lnet_peer_ni *best_lpni = NULL; - struct lnet_peer_ni *best_gw = NULL; - struct lnet_peer_ni *lpni; - struct lnet_peer_ni *final_dst; - struct lnet_peer *peer; - struct lnet_peer_net *peer_net; - struct lnet_net *local_net; - int cpt, cpt2, rc; - bool routing; - bool routing2; - bool ni_is_pref; - bool preferred; - bool local_found; - int best_lpni_credits; - int md_cpt; - - /* - * get an initial CPT to use for locking. The idea here is not to - * serialize the calls to select_pathway, so that as many - * operations can run concurrently as possible. To do that we use - * the CPT where this call is being executed. Later on when we - * determine the CPT to use in lnet_message_commit, we switch the - * lock and check if there was any configuration change. If none, - * then we proceed, if there is, then we restart the operation. - */ - cpt = lnet_net_lock_current(); - - md_cpt = lnet_cpt_of_md(msg->msg_md, msg->msg_offset); - if (md_cpt == CFS_CPT_ANY) - md_cpt = cpt; - -again: - best_ni = NULL; - best_lpni = NULL; - best_gw = NULL; - final_dst = NULL; - local_net = NULL; - routing = false; - routing2 = false; - local_found = false; - - /* - * lnet_nid2peerni_locked() is the path that will find an - * existing peer_ni, or create one and mark it as having been - * created due to network traffic. 
- */ - lpni = lnet_nid2peerni_locked(dst_nid, LNET_NID_ANY, cpt); - if (IS_ERR(lpni)) { - lnet_net_unlock(cpt); - return PTR_ERR(lpni); - } + struct lnet_ni *best_ni = sd->sd_best_ni; + struct lnet_peer_ni *best_lpni = sd->sd_best_lpni; + struct lnet_peer_ni *final_dst_lpni = sd->sd_final_dst_lpni; + struct lnet_msg *msg = sd->sd_msg; + int cpt2; + u32 send_case = sd->sd_send_case; + int rc; + u32 routing = send_case & REMOTE_DST; - /* If we're being asked to send to the loopback interface, there - * is no need to go through any selection. We can just shortcut - * the entire process and send over lolnd + /* Increment sequence number of the selected peer so that we + * pick the next one in Round Robin. */ - if (LNET_NETTYP(LNET_NIDNET(dst_nid)) == LOLND) { - lnet_peer_ni_decref_locked(lpni); - best_ni = the_lnet.ln_loni; - goto send; - } + best_lpni->lpni_seq++; - /* - * Now that we have a peer_ni, check if we want to discover - * the peer. Traffic to the LNET_RESERVED_PORTAL should not - * trigger discovery. + /* grab a reference on the peer_ni so it sticks around even if + * we need to drop and relock the lnet_net_lock below. */ - peer = lpni->lpni_peer_net->lpn_peer; - if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { - rc = lnet_discover_peer_locked(lpni, cpt, false); - if (rc) { - lnet_peer_ni_decref_locked(lpni); - lnet_net_unlock(cpt); - return rc; - } - /* The peer may have changed. 
*/ - peer = lpni->lpni_peer_net->lpn_peer; - /* queue message and return */ - msg->msg_src_nid_param = src_nid; - msg->msg_rtr_nid_param = rtr_nid; - msg->msg_sending = 0; - list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); - CDEBUG(D_NET, "%s pending discovery\n", - libcfs_nid2str(peer->lp_primary_nid)); - lnet_peer_ni_decref_locked(lpni); - lnet_net_unlock(cpt); - - return LNET_DC_WAIT; - } - lnet_peer_ni_decref_locked(lpni); - - /* If peer is not healthy then can not send anything to it */ - if (!lnet_is_peer_healthy_locked(peer)) { - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } + lnet_peer_ni_addref_locked(best_lpni); - /* - * STEP 1: first jab at determining best_ni - * if src_nid is explicitly specified, then best_ni is already - * pre-determiend for us. Otherwise we need to select the best - * one to use later on + /* Use lnet_cpt_of_nid() to determine the CPT used to commit the + * message. This ensures that we get a CPT that is correct for + * the NI when the NI has been restricted to a subset of all CPTs. + * If the selected CPT differs from the one currently locked, we + * must unlock and relock the lnet_net_lock(), and then check whether + * the configuration has changed. We don't have a hold on the best_ni + * yet, and it may have vanished. */ - if (src_nid != LNET_NID_ANY) { - best_ni = lnet_nid2ni_locked(src_nid, cpt); - if (!best_ni) { - lnet_net_unlock(cpt); - LCONSOLE_WARN("Can't send to %s: src %s is not a local nid\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(src_nid)); - return -EINVAL; - } - } + cpt2 = lnet_cpt_of_nid_locked(best_lpni->lpni_nid, best_ni); + if (sd->sd_cpt != cpt2) { + u32 seq = lnet_get_dlc_seq_locked(); - if (msg->msg_type == LNET_MSG_REPLY || - msg->msg_type == LNET_MSG_ACK || - !lnet_peer_is_multi_rail(peer) || - best_ni) { - /* - * for replies we want to respond on the same peer_ni we - * received the message on if possible. 
If not, then pick - * a peer_ni to send to - * - * if the peer is non-multi-rail then you want to send to - * the dst_nid provided as well. - * - * If the best_ni has already been determined, IE the - * src_nid has been specified, then use the - * destination_nid provided as well, since we're - * continuing a series of related messages for the same - * RPC. - * - * It is expected to find the lpni using dst_nid, since we - * created it earlier. - */ - best_lpni = lnet_find_peer_ni_locked(dst_nid); - if (best_lpni) + lnet_net_unlock(sd->sd_cpt); + sd->sd_cpt = cpt2; + lnet_net_lock(sd->sd_cpt); + if (seq != lnet_get_dlc_seq_locked()) { lnet_peer_ni_decref_locked(best_lpni); - - if (best_lpni && !lnet_get_net_locked(LNET_NIDNET(dst_nid))) { - /* - * this lpni is not on a local network so we need - * to route this reply. - */ - best_gw = lnet_find_route_locked(NULL, - best_lpni->lpni_nid, - rtr_nid); - if (best_gw) { - /* - * RULE: Each node considers only the next-hop - * - * We're going to route the message, - * so change the peer to the router. - */ - LASSERT(best_gw->lpni_peer_net); - LASSERT(best_gw->lpni_peer_net->lpn_peer); - peer = best_gw->lpni_peer_net->lpn_peer; - - /* - * if the router is not multi-rail - * then use the best_gw found to send - * the message to - */ - if (!lnet_peer_is_multi_rail(peer)) - best_lpni = best_gw; - else - best_lpni = NULL; - - routing = true; - } else { - best_lpni = NULL; - } - } else if (!best_lpni) { - lnet_net_unlock(cpt); - CERROR("unable to send msg_type %d to originating %s. Destination NID not in DB\n", - msg->msg_type, libcfs_nid2str(dst_nid)); - return -EINVAL; - } - } - - /* - * We must use a consistent source address when sending to a - * non-MR peer. However, a non-MR peer can have multiple NIDs - * on multiple networks, and we may even need to talk to this - * peer on multiple networks -- certain types of - * load-balancing configuration do this. 
- * - * So we need to pick the NI the peer prefers for this - * particular network. - */ - if (!lnet_peer_is_multi_rail(peer)) { - if (!best_lpni) { - lnet_net_unlock(cpt); - CERROR("no route to %s\n", - libcfs_nid2str(dst_nid)); - return -EHOSTUNREACH; - } - - /* best ni is already set if src_nid was provided */ - if (!best_ni) { - /* Get the target peer_ni */ - peer_net = lnet_peer_get_net_locked( - peer, LNET_NIDNET(best_lpni->lpni_nid)); - list_for_each_entry(lpni, &peer_net->lpn_peer_nis, - lpni_peer_nis) { - if (lpni->lpni_pref_nnids == 0) - continue; - LASSERT(lpni->lpni_pref_nnids == 1); - best_ni = lnet_nid2ni_locked( - lpni->lpni_pref.nid, cpt); - break; - } + return REPEAT_SEND; } - /* if best_ni is still not set just pick one */ - if (!best_ni) { - best_ni = lnet_net2ni_locked( - best_lpni->lpni_net->net_id, cpt); - /* If there is no best_ni we don't have a route */ - if (!best_ni) { - CERROR("no path to %s from net %s\n", - libcfs_nid2str(best_lpni->lpni_nid), - libcfs_net2str(best_lpni->lpni_net->net_id)); - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } - lpni = list_first_entry(&peer_net->lpn_peer_nis, - struct lnet_peer_ni, - lpni_peer_nis); - } - /* Set preferred NI if necessary. */ - if (lpni->lpni_pref_nnids == 0) - lnet_peer_ni_set_non_mr_pref_nid(lpni, best_ni->ni_nid); } - /* - * if we already found a best_ni because src_nid is specified and - * best_lpni because we are replying to a message then just send - * the message + /* store the best_lpni in the message right away to avoid having + * to do the same operation under different conditions */ - if (best_ni && best_lpni) - goto send; + msg->msg_txpeer = best_lpni; + msg->msg_txni = best_ni; - /* - * If we already found a best_ni because src_nid is specified then - * pick the peer then send the message + /* grab a reference for the best_ni since now it's in use in this + * send. 
The reference will be dropped in lnet_finalize() */ - if (best_ni) - goto pick_peer; + lnet_ni_addref_locked(msg->msg_txni, sd->sd_cpt); - /* - * pick the best_ni by going through all the possible networks of - * that peer and see which local NI is best suited to talk to that - * peer. - * - * Locally connected networks will always be preferred over - * a routed network. If there are only routed paths to the peer, - * then the best route is chosen. If all routes are equal then - * they are used in round robin. + /* Always set the target.nid to the best peer picked. Either the + * NID will be one of the peer NIDs selected, or the same NID as + * what was originally set in the target or it will be the NID of + * a router if this message should be routed */ - list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { - if (!lnet_is_peer_net_healthy_locked(peer_net)) - continue; - - local_net = lnet_get_net_locked(peer_net->lpn_net_id); - if (!local_net && !routing && !local_found) { - struct lnet_peer_ni *net_gw; - - lpni = list_first_entry(&peer_net->lpn_peer_nis, - struct lnet_peer_ni, - lpni_peer_nis); - - net_gw = lnet_find_route_locked(NULL, - lpni->lpni_nid, - rtr_nid); - if (!net_gw) - continue; - - if (best_gw) { - /* - * lnet_find_route_locked() call - * will return the best_Gw on the - * lpni->lpni_nid network. - * However, best_gw and net_gw can - * be on different networks. - * Therefore need to compare them - * to pick the better of either. - */ - if (lnet_compare_peers(best_gw, net_gw) > 0) - continue; - if (best_gw->lpni_gw_seq <= net_gw->lpni_gw_seq) - continue; - } - best_gw = net_gw; - final_dst = lpni; - - routing2 = true; - } else { - best_gw = NULL; - final_dst = NULL; - routing2 = false; - local_found = true; - } - - /* - * a gw on this network is found, but there could be - * other better gateways on other networks. So don't pick - * the best_ni until we determine the best_gw. 
- */ - if (best_gw) - continue; - - /* if no local_net found continue */ - if (!local_net) - continue; - - /* - * Iterate through the NIs in this local Net and select - * the NI to send from. The selection is determined by - * these 3 criterion in the following priority: - * 1. NUMA - * 2. NI available credits - * 3. Round Robin - */ - best_ni = lnet_get_best_ni(local_net, best_ni, md_cpt); - } - - if (!best_ni && !best_gw) { - lnet_net_unlock(cpt); - LCONSOLE_WARN("No local ni found to send from to %s\n", - libcfs_nid2str(dst_nid)); - return -EINVAL; - } - - if (!best_ni) { - best_ni = lnet_get_best_ni(best_gw->lpni_net, best_ni, md_cpt); - LASSERT(best_gw && best_ni); - - /* - * We're going to route the message, so change the peer to - * the router. - */ - LASSERT(best_gw->lpni_peer_net); - LASSERT(best_gw->lpni_peer_net->lpn_peer); - best_gw->lpni_gw_seq++; - peer = best_gw->lpni_peer_net->lpn_peer; - } + msg->msg_target.nid = msg->msg_txpeer->lpni_nid; - /* - * Now that we selected the NI to use increment its sequence - * number so the Round Robin algorithm will detect that it has - * been used and pick the next NI. + /* lnet_msg_commit assigns the correct cpt to the message, which + * is used to decrement the correct refcount on the ni when it's + * time to return the credits */ - best_ni->ni_seq++; + lnet_msg_commit(msg, sd->sd_cpt); -pick_peer: - /* - * At this point the best_ni is on a local network on which - * the peer has a peer_ni as well - */ - peer_net = lnet_peer_get_net_locked(peer, - best_ni->ni_net->net_id); - /* - * peer_net is not available or the src_nid is explicitly defined - * and the peer_net for that src_nid is unhealthy. find a route to - * the destination nid. + /* If we are routing the message then we keep the src_nid that was + * set by the originator. If we are not routing then we are the + * originator and set it here. 
*/ - if (!peer_net || - (src_nid != LNET_NID_ANY && - !lnet_is_peer_net_healthy_locked(peer_net))) { - best_gw = lnet_find_route_locked(best_ni->ni_net, - dst_nid, - rtr_nid); - /* - * if no route is found for that network then - * move onto the next peer_ni in the peer - */ - if (!best_gw) { - LCONSOLE_WARN("No route to peer from %s\n", - libcfs_nid2str(best_ni->ni_nid)); - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } - - CDEBUG(D_NET, "Best route to %s via %s for %s %d\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(best_gw->lpni_nid), - lnet_msgtyp2str(msg->msg_type), msg->msg_len); + if (!msg->msg_routing) + msg->msg_hdr.src_nid = cpu_to_le64(msg->msg_txni->ni_nid); - routing2 = true; - /* - * RULE: Each node considers only the next-hop + if (routing) { + msg->msg_target_is_router = 1; + msg->msg_target.pid = LNET_PID_LUSTRE; + /* since we're routing we want to ensure that the + * msg_hdr.dest_nid is set to the final destination. When + * the router receives this message it knows how to route + * it. * - * We're going to route the message, so change the peer to - * the router. + * final_dst_lpni is set at the beginning of the + * lnet_select_pathway() function and is never changed. + * It's safe to use it here. */ - LASSERT(best_gw->lpni_peer_net); - LASSERT(best_gw->lpni_peer_net->lpn_peer); - peer = best_gw->lpni_peer_net->lpn_peer; - } else if (!lnet_is_peer_net_healthy_locked(peer_net)) { - /* - * this peer_net is unhealthy but we still have an opportunity - * to find another peer_net that we can use + msg->msg_hdr.dest_nid = cpu_to_le64(final_dst_lpni->lpni_nid); + } else { + /* if we're not routing set the dest_nid to the best peer + * ni NID that we picked earlier in the algorithm. 
*/ - u32 net_id = peer_net->lpn_net_id; - - LCONSOLE_WARN("peer net %s unhealthy\n", - libcfs_net2str(net_id)); - goto again; + msg->msg_hdr.dest_nid = cpu_to_le64(msg->msg_txpeer->lpni_nid); } + rc = lnet_post_send_locked(msg, 0); + if (!rc) + CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_txni->ni_nid), + libcfs_nid2str(sd->sd_src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(msg->msg_txpeer->lpni_nid), + lnet_msgtyp2str(msg->msg_type)); + + return rc; +} + +static struct lnet_peer_ni * +lnet_select_peer_ni(struct lnet_send_data *sd, struct lnet_peer *peer, + struct lnet_peer_net *peer_net) +{ /* * Look at the peer NIs for the destination peer that connect * to the chosen net. If a peer_ni is preferred when using the @@ -1758,20 +1511,30 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * the available transmit credits are used. If the transmit * credits are equal, we round-robin over the peer_ni. 
*/ - lpni = NULL; - best_lpni_credits = INT_MIN; - preferred = false; - best_lpni = NULL; + struct lnet_peer_ni *lpni = NULL; + struct lnet_peer_ni *best_lpni = NULL; + struct lnet_ni *best_ni = sd->sd_best_ni; + lnet_nid_t dst_nid = sd->sd_dst_nid; + int best_lpni_credits = INT_MIN; + bool preferred = false; + bool ni_is_pref; + while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni))) { - /* - * if this peer ni is not healthy just skip it, no point in - * examining it further + /* if the best_ni we've chosen already has this lpni + * preferred, then let's use it */ - if (!lnet_is_peer_ni_healthy_locked(lpni)) - continue; ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, best_ni->ni_nid); + CDEBUG(D_NET, "%s ni_is_pref = %d\n", + libcfs_nid2str(best_ni->ni_nid), ni_is_pref); + + if (best_lpni) + CDEBUG(D_NET, "%s c:[%d, %d], s:[%d, %d]\n", + libcfs_nid2str(lpni->lpni_nid), + lpni->lpni_txcredits, best_lpni_credits, + lpni->lpni_seq, best_lpni->lpni_seq); + /* if this is a preferred peer use it */ if (!preferred && ni_is_pref) { preferred = true; @@ -1810,131 +1573,766 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, u32 net_id = peer_net ? peer_net->lpn_net_id : LNET_NIDNET(dst_nid); - lnet_net_unlock(cpt); - LCONSOLE_WARN("no peer_ni found on peer net %s\n", - libcfs_net2str(net_id)); - return -EHOSTUNREACH; + CDEBUG(D_NET, "no peer_ni found on peer net %s\n", + libcfs_net2str(net_id)); + return NULL; } -send: - /* Shortcut for loopback.
*/ - if (best_ni == the_lnet.ln_loni) { - /* No send credit hassles with LOLND */ - lnet_ni_addref_locked(best_ni, cpt); - msg->msg_hdr.dest_nid = cpu_to_le64(best_ni->ni_nid); - if (!msg->msg_routing) - msg->msg_hdr.src_nid = cpu_to_le64(best_ni->ni_nid); - msg->msg_target.nid = best_ni->ni_nid; - lnet_msg_commit(msg, cpt); - msg->msg_txni = best_ni; - lnet_net_unlock(cpt); - - return LNET_CREDIT_OK; - } + CDEBUG(D_NET, "sd_best_lpni = %s\n", + libcfs_nid2str(best_lpni->lpni_nid)); - routing = routing || routing2; + return best_lpni; +} - /* - * Increment sequence number of the peer selected so that we - * pick the next one in Round Robin. - */ - best_lpni->lpni_seq++; +/* Prerequisite: the best_ni should already be set in the sd + */ +static inline struct lnet_peer_ni * +lnet_find_best_lpni_on_net(struct lnet_send_data *sd, struct lnet_peer *peer, + u32 net_id) +{ + struct lnet_peer_net *peer_net; - /* - * grab a reference on the peer_ni so it sticks around even if - * we need to drop and relock the lnet_net_lock below. + /* The gateway is Multi-Rail capable so now we must select the + * proper peer_ni */ - lnet_peer_ni_addref_locked(best_lpni); + peer_net = lnet_peer_get_net_locked(peer, net_id); - /* - * Use lnet_cpt_of_nid() to determine the CPT used to commit the - * message. This ensures that we get a CPT that is correct for - * the NI when the NI has been restricted to a subset of all CPTs. - * If the selected CPT differs from the one currently locked, we - * must unlock and relock the lnet_net_lock(), and then check whether - * the configuration has changed. We don't have a hold on the best_ni - * yet, and it may have vanished. 
+ if (!peer_net) { + CERROR("gateway peer %s has no NI on net %s\n", + libcfs_nid2str(peer->lp_primary_nid), + libcfs_net2str(net_id)); + return NULL; + } + + return lnet_select_peer_ni(sd, peer, peer_net); +} + +static inline void +lnet_set_non_mr_pref_nid(struct lnet_send_data *sd) +{ + if (sd->sd_send_case & NMR_DST && + sd->sd_msg->msg_type != LNET_MSG_REPLY && + sd->sd_msg->msg_type != LNET_MSG_ACK && + sd->sd_best_lpni->lpni_pref_nnids == 0) { + CDEBUG(D_NET, "Setting preferred local NID %s on NMR peer %s\n", + libcfs_nid2str(sd->sd_best_ni->ni_nid), + libcfs_nid2str(sd->sd_best_lpni->lpni_nid)); + lnet_peer_ni_set_non_mr_pref_nid(sd->sd_best_lpni, + sd->sd_best_ni->ni_nid); + } +} + +/* Source Specified + * Local Destination + * non-mr peer + * + * use the source and destination NIDs as the pathway + */ +static int +lnet_handle_spec_local_nmr_dst(struct lnet_send_data *sd) +{ + /* the destination lpni is set before we get here. */ + + /* find local NI */ + sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt); + if (!sd->sd_best_ni) { + CERROR("Can't send to %s: src %s is not a local nid\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(sd->sd_src_nid)); + return -EINVAL; + } + + /* the preferred NID will only be set for NMR peers */ - cpt2 = lnet_cpt_of_nid_locked(best_lpni->lpni_nid, best_ni); - if (cpt != cpt2) { - u32 seq = lnet_get_dlc_seq_locked(); - lnet_net_unlock(cpt); - cpt = cpt2; - lnet_net_lock(cpt); - if (seq != lnet_get_dlc_seq_locked()) { - lnet_peer_ni_decref_locked(best_lpni); - goto again; - } + lnet_set_non_mr_pref_nid(sd); + + return lnet_handle_send(sd); +} + +/* Source Specified + * Local Destination + * MR Peer + * + * Run the selection algorithm on the peer NIs unless we're sending + * a response, in this case just send to the destination + */ +static int +lnet_handle_spec_local_mr_dst(struct lnet_send_data *sd) +{ + sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt); + if (!sd->sd_best_ni) { + 
CERROR("Can't send to %s: src %s is not a local nid\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(sd->sd_src_nid)); + return -EINVAL; } - /* - * store the best_lpni in the message right away to avoid having - * to do the same operation under different conditions + /* only run the selection algorithm to pick the peer_ni if we're + * sending a GET or a PUT. Responses are sent to the same + * destination NID provided. */ - msg->msg_txpeer = best_lpni; - msg->msg_txni = best_ni; + if (!(sd->sd_send_case & SND_RESP)) { + sd->sd_best_lpni = + lnet_find_best_lpni_on_net(sd, sd->sd_peer, + sd->sd_best_ni->ni_net->net_id); + } - /* - * grab a reference for the best_ni since now it's in use in this - * send. the reference will need to be dropped when the message is - * finished in lnet_finalize() + if (sd->sd_best_lpni) + return lnet_handle_send(sd); + + CERROR("can't send to %s. no NI on %s\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_net2str(sd->sd_best_ni->ni_net->net_id)); + + return -EHOSTUNREACH; +} + +struct lnet_ni * +lnet_find_best_ni_on_spec_net(struct lnet_ni *cur_best_ni, + struct lnet_peer *peer, + struct lnet_peer_net *peer_net, + int cpt, + bool incr_seq) +{ + struct lnet_net *local_net; + struct lnet_ni *best_ni; + + local_net = lnet_get_net_locked(peer_net->lpn_net_id); + if (!local_net) + return NULL; + + /* Iterate through the NIs in this local Net and select + * the NI to send from. The selection is determined by + * these 3 criterion in the following priority: + * 1. NUMA + * 2. NI available credits + * 3. Round Robin */ - lnet_ni_addref_locked(msg->msg_txni, cpt); + best_ni = lnet_get_best_ni(local_net, cur_best_ni, + peer, peer_net, cpt); - /* - * Always set the target.nid to the best peer picked. 
Either the - * nid will be one of the preconfigured NIDs, or the same NID as - * what was originally set in the target or it will be the NID of - * a router if this message should be routed + if (incr_seq && best_ni) + best_ni->ni_seq++; + + return best_ni; +} + +static int +lnet_handle_find_routed_path(struct lnet_send_data *sd, + lnet_nid_t dst_nid, + struct lnet_peer_ni **gw_lpni, + struct lnet_peer **gw_peer) +{ + struct lnet_peer_ni *gw; + lnet_nid_t src_nid = sd->sd_src_nid; + + gw = lnet_find_route_locked(NULL, LNET_NIDNET(dst_nid), + sd->sd_rtr_nid); + if (!gw) { + CERROR("no route to %s from %s\n", + libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); + return -EHOSTUNREACH; + } + + /* get the peer of the gw_ni */ + LASSERT(gw->lpni_peer_net); + LASSERT(gw->lpni_peer_net->lpn_peer); + + *gw_peer = gw->lpni_peer_net->lpn_peer; + + if (!sd->sd_best_ni) + sd->sd_best_ni = + lnet_find_best_ni_on_spec_net(NULL, *gw_peer, + gw->lpni_peer_net, + sd->sd_md_cpt, + true); + + if (!sd->sd_best_ni) { + CERROR("Internal Error. Expected local ni on %s but non found :%s\n", + libcfs_net2str(gw->lpni_peer_net->lpn_net_id), + libcfs_nid2str(sd->sd_src_nid)); + return -EFAULT; + } + + /* if gw is MR let's find its best peer_ni */ - msg->msg_target.nid = msg->msg_txpeer->lpni_nid; + if (lnet_peer_is_multi_rail(*gw_peer)) { + gw = lnet_find_best_lpni_on_net(sd, *gw_peer, + sd->sd_best_ni->ni_net->net_id); + /* We've already verified that the gw has an NI on that + * desired net, but we're not finding it. Something is + * wrong. + */ + if (!gw) { + CERROR("Internal Error. 
Route expected to %s from %s\n", + libcfs_nid2str(dst_nid), + libcfs_nid2str(src_nid)); + return -EFAULT; + } + } - /* - * lnet_msg_commit assigns the correct cpt to the message, which - * is used to decrement the correct refcount on the ni when it's - * time to return the credits + *gw_lpni = gw; + + return 0; +} + +/* Handle two cases: + * + * Case 1: + * Source specified + * Remote destination + * Non-MR destination + * + * Case 2: + * Source specified + * Remote destination + * MR destination + * + * The handling of these two cases is similar. Even though the destination + * can be MR or non-MR, we'll deal directly with the router. + */ +static int +lnet_handle_spec_router_dst(struct lnet_send_data *sd) +{ + int rc; + struct lnet_peer_ni *gw_lpni = NULL; + struct lnet_peer *gw_peer = NULL; + + /* find local NI */ + sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt); + if (!sd->sd_best_ni) { + CERROR("Can't send to %s: src %s is not a local nid\n", + libcfs_nid2str(sd->sd_dst_nid), + libcfs_nid2str(sd->sd_src_nid)); + return -EINVAL; + } + + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, + &gw_peer); + if (rc < 0) + return rc; + + if (sd->sd_send_case & NMR_DST) + /* since the final destination is non-MR let's set its preferred + * NID before we send + */ + lnet_set_non_mr_pref_nid(sd); + + /* We're going to send to the gw found so let's set its + * info */ - lnet_msg_commit(msg, cpt); + sd->sd_peer = gw_peer; + sd->sd_best_lpni = gw_lpni; - /* - * If we are routing the message then we don't need to overwrite - * the src_nid since it would've been set at the origin. Otherwise - * we are the originator so we need to set it. 
+ return lnet_handle_send(sd); +} + +struct lnet_ni * +lnet_find_best_ni_on_local_net(struct lnet_peer *peer, int md_cpt) +{ + struct lnet_peer_net *peer_net = NULL; + struct lnet_ni *best_ni = NULL; + + /* The peer can have multiple interfaces, some of them can be on + * the local network and others on a routed network. We should + * prefer the local network. However if the local network is not + * available then we need to try the routed network */ - if (!msg->msg_routing) - msg->msg_hdr.src_nid = cpu_to_le64(msg->msg_txni->ni_nid); - if (routing) { - msg->msg_target_is_router = 1; - msg->msg_target.pid = LNET_PID_LUSTRE; - /* - * since we're routing we want to ensure that the - * msg_hdr.dest_nid is set to the final destination. When - * the router receives this message it knows how to route - * it. - */ - msg->msg_hdr.dest_nid = - cpu_to_le64(final_dst ? final_dst->lpni_nid : dst_nid); - } else { - /* - * if we're not routing set the dest_nid to the best peer - * ni that we picked earlier in the algorithm. + /* go through all the peer nets and find the best_ni */ + list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { + /* The peer's list of nets can contain non-local nets. We + * want to only examine the local ones. 
*/ - msg->msg_hdr.dest_nid = cpu_to_le64(msg->msg_txpeer->lpni_nid); + if (!lnet_get_net_locked(peer_net->lpn_net_id)) + continue; + best_ni = lnet_find_best_ni_on_spec_net(best_ni, peer, + peer_net, md_cpt, + false); } - rc = lnet_post_send_locked(msg, 0); + if (best_ni) + /* increment sequence number so we can round robin */ + best_ni->ni_seq++; + + return best_ni; +} + +static struct lnet_ni * +lnet_find_existing_preferred_best_ni(struct lnet_send_data *sd) +{ + struct lnet_ni *best_ni = NULL; + struct lnet_peer_net *peer_net; + struct lnet_peer *peer = sd->sd_peer; + struct lnet_peer_ni *best_lpni = sd->sd_best_lpni; + struct lnet_peer_ni *lpni; + int cpt = sd->sd_cpt; + + /* We must use a consistent source address when sending to a + * non-MR peer. However, a non-MR peer can have multiple NIDs + * on multiple networks, and we may even need to talk to this + * peer on multiple networks -- certain types of + * load-balancing configuration do this. + * + * So we need to pick the NI the peer prefers for this + * particular network. + */ + + /* Get the target peer_ni */ + peer_net = lnet_peer_get_net_locked(peer, + LNET_NIDNET(best_lpni->lpni_nid)); + LASSERT(peer_net); + list_for_each_entry(lpni, &peer_net->lpn_peer_nis, + lpni_peer_nis) { + if (lpni->lpni_pref_nnids == 0) + continue; + LASSERT(lpni->lpni_pref_nnids == 1); + best_ni = lnet_nid2ni_locked(lpni->lpni_pref.nid, cpt); + break; + } + + return best_ni; +} + +/* Prerequisite: sd->sd_peer and sd->sd_best_lpni should be set */ +static int +lnet_select_preferred_best_ni(struct lnet_send_data *sd) +{ + struct lnet_ni *best_ni = NULL; + struct lnet_peer_ni *best_lpni = sd->sd_best_lpni; + + /* We must use a consistent source address when sending to a + * non-MR peer. However, a non-MR peer can have multiple NIDs + * on multiple networks, and we may even need to talk to this + * peer on multiple networks -- certain types of + * load-balancing configuration do this. 
+ * + * So we need to pick the NI the peer prefers for this + * particular network. + */ + + best_ni = lnet_find_existing_preferred_best_ni(sd); + + /* if best_ni is still not set just pick one */ + if (!best_ni) { + best_ni = + lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer, + sd->sd_best_lpni->lpni_peer_net, + sd->sd_md_cpt, true); + /* If there is no best_ni we don't have a route */ + if (!best_ni) { + CERROR("no path to %s from net %s\n", + libcfs_nid2str(best_lpni->lpni_nid), + libcfs_net2str(best_lpni->lpni_net->net_id)); + return -EHOSTUNREACH; + } + } + + sd->sd_best_ni = best_ni; + + /* Set preferred NI if necessary. */ + lnet_set_non_mr_pref_nid(sd); + + return 0; +} + +/* Source not specified + * Local destination + * Non-MR Peer + * + * always use the same source NID for NMR peers + * If we've talked to that peer before then we already have a preferred + * source NI associated with it. Otherwise, we select a preferred local NI + * and store it in the peer + */ +static int +lnet_handle_any_local_nmr_dst(struct lnet_send_data *sd) +{ + int rc; + + /* sd->sd_best_lpni is already set to the final destination */ + + /* At this point we should've created the peer ni and peer. If we + * can't find it, then something went wrong. Instead of assert + * output a relevant message and fail the send + */ + if (!sd->sd_best_lpni) { + CERROR("Internal fault. Unable to send msg %s to %s. 
NID not known\n", + lnet_msgtyp2str(sd->sd_msg->msg_type), + libcfs_nid2str(sd->sd_dst_nid)); + return -EFAULT; + } + + rc = lnet_select_preferred_best_ni(sd); if (!rc) - CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s\n", - libcfs_nid2str(msg->msg_hdr.src_nid), - libcfs_nid2str(msg->msg_txni->ni_nid), - libcfs_nid2str(src_nid), - libcfs_nid2str(msg->msg_hdr.dest_nid), - libcfs_nid2str(dst_nid), - libcfs_nid2str(msg->msg_txpeer->lpni_nid), - lnet_msgtyp2str(msg->msg_type)); + rc = lnet_handle_send(sd); - lnet_net_unlock(cpt); + return rc; +} + +static int +lnet_handle_any_mr_dsta(struct lnet_send_data *sd) +{ + /* NOTE we've already handled the remote peer case. So we only + * need to worry about the local case here. + * + * if we're sending a response, ACK or reply, we need to send it + * to the destination NID given to us. At this point we already + * have the peer_ni we're supposed to send to, so just find the + * best_ni on the peer net and use that. Since we're sending to an + * MR peer then we can just run the selection algorithm on our + * local NIs and pick the best one. + */ + if (sd->sd_send_case & SND_RESP) { + sd->sd_best_ni = + lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer, + sd->sd_best_lpni->lpni_peer_net, + sd->sd_md_cpt, true); + + if (!sd->sd_best_ni) { + /* We're not going to deal with not able to send + * a response to the provided final destination + */ + CERROR("Can't send response to %s. No local NI available\n", + libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } + + return lnet_handle_send(sd); + } + + /* If we get here that means we're sending a fresh request, PUT or + * GET, so we need to run our standard selection algorithm. + * First find the best local interface that's on any of the peer's + * networks.
+ */ + sd->sd_best_ni = lnet_find_best_ni_on_local_net(sd->sd_peer, + sd->sd_md_cpt); + if (sd->sd_best_ni) { + sd->sd_best_lpni = + lnet_find_best_lpni_on_net(sd, sd->sd_peer, + sd->sd_best_ni->ni_net->net_id); + + /* if we're successful in selecting a peer_ni on the local + * network, then send to it. Otherwise fall through and + * try and see if we can reach it over another routed + * network + */ + if (sd->sd_best_lpni) { + /* in case we initially started with a routed + * destination, let's reset to local + */ + sd->sd_send_case &= ~REMOTE_DST; + sd->sd_send_case |= LOCAL_DST; + return lnet_handle_send(sd); + } + + CERROR("Internal Error. Expected to have a best_lpni: %s -> %s\n", + libcfs_nid2str(sd->sd_src_nid), + libcfs_nid2str(sd->sd_dst_nid)); + + return -EFAULT; + } + + /* Peer doesn't have a local network. Let's see if there is + * a remote network we can reach it on. + */ + return PASS_THROUGH; +} + +/* Case 1: + * Source NID not specified + * Local destination + * MR peer + * + * Case 2: + * Source NID not specified + * Remote destination + * MR peer + * + * In both of these cases if we're sending a response, ACK or REPLY, then + * we need to send to the destination NID provided. + * + * In the remote case let's deal with MR routers. + * + */ +static int +lnet_handle_any_mr_dst(struct lnet_send_data *sd) +{ + int rc = 0; + struct lnet_peer *gw_peer = NULL; + struct lnet_peer_ni *gw_lpni = NULL; + + /* handle sending a response to a remote peer here so we don't + * have to worry about it if we hit lnet_handle_any_mr_dsta() + */ + if (sd->sd_send_case & REMOTE_DST && + sd->sd_send_case & SND_RESP) { + struct lnet_peer_ni *gw; + struct lnet_peer *gw_peer; + + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw, + &gw_peer); + if (rc < 0) { + CERROR("Can't send response to %s.
No route available\n", + libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } + + sd->sd_best_lpni = gw; + sd->sd_peer = gw_peer; + + return lnet_handle_send(sd); + } + + /* Even though the NID for the peer might not be on a local network, + * since the peer is MR there could be other interfaces on the + * local network. In that case we'd still like to prefer the local + * network over the routed network. If we're unable to do that + * then we select the best router among the different routed networks, + * and if the router is MR then we can deal with it as such. + */ + rc = lnet_handle_any_mr_dsta(sd); + if (rc != PASS_THROUGH) + return rc; + + /* TODO; One possible enhancement is to run the selection + * algorithm on the peer. However for remote peers the credits are + * not decremented, so we'll be basically going over the peer NIs + * in round robin. An MR router will run the selection algorithm + * on the next-hop interfaces. + */ + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, + &gw_peer); + if (rc < 0) + return rc; + + sd->sd_send_case &= ~LOCAL_DST; + sd->sd_send_case |= REMOTE_DST; + + sd->sd_peer = gw_peer; + sd->sd_best_lpni = gw_lpni; + + return lnet_handle_send(sd); +} + +/* Source not specified + * Remote destination + * Non-MR peer + * + * Must send to the specified peer NID using the same source NID that + * we've used before. If it's the first time to talk to that peer then + * find the source NI and assign it as preferred to that peer + */ +static int +lnet_handle_any_router_nmr_dst(struct lnet_send_data *sd) +{ + int rc; + struct lnet_peer_ni *gw_lpni = NULL; + struct lnet_peer *gw_peer = NULL; + + /* Let's set if we have a preferred NI to talk to this NMR peer + */ + sd->sd_best_ni = lnet_find_existing_preferred_best_ni(sd); + + /* find the router and that'll find the best NI if we didn't find + * it already. 
+ */ + rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, + &gw_peer); + if (rc < 0) + return rc; + + /* set the best_ni we've chosen as the preferred one for + * this peer + */ + lnet_set_non_mr_pref_nid(sd); + + /* we'll be sending to the gw */ + sd->sd_best_lpni = gw_lpni; + sd->sd_peer = gw_peer; + + return lnet_handle_send(sd); +} + +static int +lnet_handle_send_case_locked(struct lnet_send_data *sd) +{ + /* Turn off the SND_RESP bit. + * It will be checked in the case handling + */ + u32 send_case = sd->sd_send_case &= ~SND_RESP; + + CDEBUG(D_NET, "Source %s%s to %s %s %s destination\n", + (send_case & SRC_SPEC) ? "Specified: " : "ANY", + (send_case & SRC_SPEC) ? libcfs_nid2str(sd->sd_src_nid) : "", + (send_case & MR_DST) ? "MR: " : "NMR: ", + libcfs_nid2str(sd->sd_dst_nid), + (send_case & LOCAL_DST) ? "local" : "routed"); + + switch (send_case) { + /* For all cases where the source is specified, we should always + * use the destination NID, whether it's an MR destination or not, + * since we're continuing a series of related messages for the + * same RPC + */ + case SRC_SPEC_LOCAL_NMR_DST: + return lnet_handle_spec_local_nmr_dst(sd); + case SRC_SPEC_LOCAL_MR_DST: + return lnet_handle_spec_local_mr_dst(sd); + case SRC_SPEC_ROUTER_NMR_DST: + case SRC_SPEC_ROUTER_MR_DST: + return lnet_handle_spec_router_dst(sd); + case SRC_ANY_LOCAL_NMR_DST: + return lnet_handle_any_local_nmr_dst(sd); + case SRC_ANY_LOCAL_MR_DST: + case SRC_ANY_ROUTER_MR_DST: + return lnet_handle_any_mr_dst(sd); + case SRC_ANY_ROUTER_NMR_DST: + return lnet_handle_any_router_nmr_dst(sd); + default: + CERROR("Unknown send case\n"); + return -1; + } +} + +static int +lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, + struct lnet_msg *msg, lnet_nid_t rtr_nid) +{ + struct lnet_peer_ni *lpni; + struct lnet_peer *peer; + struct lnet_send_data send_data; + int cpt, rc; + int md_cpt; + u32 send_case = 0; + + memset(&send_data, 0, sizeof(send_data)); + + /* get an initial CPT to 
use for locking. The idea here is not to + * serialize the calls to select_pathway, so that as many + * operations can run concurrently as possible. To do that we use + * the CPT where this call is being executed. Later on when we + * determine the CPT to use in lnet_message_commit, we switch the + * lock and check if there was any configuration change. If none, + * then we proceed, if there is, then we restart the operation. + */ + cpt = lnet_net_lock_current(); + + md_cpt = lnet_cpt_of_md(msg->msg_md, msg->msg_offset); + if (md_cpt == CFS_CPT_ANY) + md_cpt = cpt; + +again: + /* If we're being asked to send to the loopback interface, there + * is no need to go through any selection. We can just shortcut + * the entire process and send over lolnd + */ + if (LNET_NETTYP(LNET_NIDNET(dst_nid)) == LOLND) { + /* No send credit hassles with LOLND */ + lnet_ni_addref_locked(the_lnet.ln_loni, cpt); + msg->msg_hdr.dest_nid = cpu_to_le64(the_lnet.ln_loni->ni_nid); + if (!msg->msg_routing) + msg->msg_hdr.src_nid = + cpu_to_le64(the_lnet.ln_loni->ni_nid); + msg->msg_target.nid = the_lnet.ln_loni->ni_nid; + lnet_msg_commit(msg, cpt); + msg->msg_txni = the_lnet.ln_loni; + lnet_net_unlock(cpt); + + return LNET_CREDIT_OK; + } + + /* find an existing peer_ni, or create one and mark it as having been + * created due to network traffic. This call will create the + * peer->peer_net->peer_ni tree. + */ + lpni = lnet_nid2peerni_locked(dst_nid, LNET_NID_ANY, cpt); + if (IS_ERR(lpni)) { + lnet_net_unlock(cpt); + return PTR_ERR(lpni); + } + + /* Now that we have a peer_ni, check if we want to discover + * the peer. Traffic to the LNET_RESERVED_PORTAL should not + * trigger discovery. + */ + peer = lpni->lpni_peer_net->lpn_peer; + if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { + lnet_nid_t primary_nid; + + rc = lnet_discover_peer_locked(lpni, cpt, false); + if (rc) { + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(cpt); + return rc; + } + /* The peer may have changed. 
*/ + peer = lpni->lpni_peer_net->lpn_peer; + /* queue message and return */ + msg->msg_src_nid_param = src_nid; + msg->msg_rtr_nid_param = rtr_nid; + msg->msg_sending = 0; + list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); + lnet_peer_ni_decref_locked(lpni); + primary_nid = peer->lp_primary_nid; + lnet_net_unlock(cpt); + + CDEBUG(D_NET, "%s pending discovery\n", + libcfs_nid2str(primary_nid)); + + return LNET_DC_WAIT; + } + lnet_peer_ni_decref_locked(lpni); + + /* If peer is not healthy then can not send anything to it */ + if (!lnet_is_peer_healthy_locked(peer)) { + lnet_net_unlock(cpt); + return -EHOSTUNREACH; + } + + /* Identify the different send cases + */ + if (src_nid == LNET_NID_ANY) + send_case |= SRC_ANY; + else + send_case |= SRC_SPEC; + + if (lnet_get_net_locked(LNET_NIDNET(dst_nid))) + send_case |= LOCAL_DST; + else + send_case |= REMOTE_DST; + + if (!lnet_peer_is_multi_rail(peer)) + send_case |= NMR_DST; + else + send_case |= MR_DST; + + if (msg->msg_type == LNET_MSG_REPLY || + msg->msg_type == LNET_MSG_ACK) + send_case |= SND_RESP; + + /* assign parameters to the send_data */ + send_data.sd_msg = msg; + send_data.sd_rtr_nid = rtr_nid; + send_data.sd_src_nid = src_nid; + send_data.sd_dst_nid = dst_nid; + send_data.sd_best_lpni = lpni; + /* keep a pointer to the final destination in case we're going to + * route, so we'll need to access it later + */ + send_data.sd_final_dst_lpni = lpni; + send_data.sd_peer = peer; + send_data.sd_md_cpt = md_cpt; + send_data.sd_cpt = cpt; + send_data.sd_send_case = send_case; + + rc = lnet_handle_send_case_locked(&send_data); + + if (rc == REPEAT_SEND) + goto again; + + lnet_net_unlock(send_data.sd_cpt); return rc; } From patchwork Thu Feb 27 21:09:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409791 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 89B1E14BC for ; Thu, 27 Feb 2020 21:22:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 71F99246A0 for ; Thu, 27 Feb 2020 21:22:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 71F99246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9826B34886C; Thu, 27 Feb 2020 13:20:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EB09921FB11 for ; Thu, 27 Feb 2020 13:18:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 21A7BEE4; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2018646F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:04 -0500 Message-Id: <1582838290-17243-77-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 076/622] lnet: add health value per ni X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add a health value per local network interface. The health value reflects the health of the NI. It is initialized to 1000. 1000 is chosen to be able to granularly decrement the health value on error. If the NI is absolutely not healthy that will be indicated by an LND event, which will flag that the NI is down and should never be used. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: d54afb86116c ("LU-9120 lnet: add health value per ni") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32761 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 15 +++++++++++++++ net/lnet/lnet/api-ni.c | 1 + net/lnet/lnet/lib-move.c | 17 +++++++++++------ 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index e9560a9..0ed325a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -52,6 +52,12 @@ #define LNET_MAX_IOV (LNET_MAX_PAYLOAD >> PAGE_SHIFT) +/* + * This is the maximum health value. + * All local and peer NIs created have their health default to this value. + */ +#define LNET_MAX_HEALTH_VALUE 1000 + /* forward refs */ struct lnet_libmd; @@ -388,6 +394,15 @@ struct lnet_ni { u32 ni_seq; /* + * health value + * initialized to LNET_MAX_HEALTH_VALUE + * Value is decremented every time we fail to send a message over + * this NI because of a NI specific failure. + * Value is incremented if we successfully send a message. 
+ */ + atomic_t ni_healthv; + + /* * equivalent interfaces to use * This is an array because socklnd bonding can still be configured */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 8be3354..4e83fa8 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1817,6 +1817,7 @@ static void lnet_push_target_fini(void) atomic_set(&ni->ni_tx_credits, lnet_ni_tq_credits(ni) * ni->ni_ncpts); + atomic_set(&ni->ni_healthv, LNET_MAX_HEALTH_VALUE); CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n", libcfs_nid2str(ni->ni_nid), diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 10aa753..ab32c6f 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1276,6 +1276,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_ni *ni = NULL; unsigned int shortest_distance; int best_credits; + int best_healthv; /* If there is no peer_ni that we can send to on this network, * then there is no point in looking for a new best_ni here. @@ -1286,20 +1287,21 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (!best_ni) { shortest_distance = UINT_MAX; best_credits = INT_MIN; + best_healthv = 0; } else { shortest_distance = cfs_cpt_distance(lnet_cpt_table(), md_cpt, best_ni->ni_dev_cpt); best_credits = atomic_read(&best_ni->ni_tx_credits); + best_healthv = atomic_read(&best_ni->ni_healthv); } while ((ni = lnet_get_next_ni_locked(local_net, ni))) { unsigned int distance; int ni_credits; - - if (!lnet_is_ni_healthy_locked(ni)) - continue; + int ni_healthv; ni_credits = atomic_read(&ni->ni_tx_credits); + ni_healthv = atomic_read(&ni->ni_healthv); /* * calculate the distance from the CPT on which @@ -1325,21 +1327,24 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, distance = lnet_numa_range; /* - * Select on shorter distance, then available + * Select on health, shorter distance, available * credits, then round-robin. 
*/ - if (distance > shortest_distance) { + if (ni_healthv < best_healthv) { + continue; + } else if (distance > shortest_distance) { continue; } else if (distance < shortest_distance) { shortest_distance = distance; } else if (ni_credits < best_credits) { continue; } else if (ni_credits == best_credits) { - if (best_ni && (best_ni)->ni_seq <= ni->ni_seq) + if (best_ni && best_ni->ni_seq <= ni->ni_seq) continue; } best_ni = ni; best_credits = ni_credits; + best_healthv = ni_healthv; } CDEBUG(D_NET, "selected best_ni %s\n", From patchwork Thu Feb 27 21:09:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409813 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56C1E14BC for ; Thu, 27 Feb 2020 21:23:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3F410246A0 for ; Thu, 27 Feb 2020 21:23:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3F410246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E40F83489A8; Thu, 27 Feb 2020 13:21:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4FC7021FAAE for ; Thu, 27 Feb 2020 13:18:39 -0800 (PST) Received: from star.ccs.ornl.gov 
(star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 24D17EE5; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2362A46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:05 -0500 Message-Id: <1582838290-17243-78-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 077/622] lnet: add lnet_health_sensitivity X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add the lnet_health_sensitivity value. This value determines the amount by which the NI health value is decremented on a send error. The value defaults to 0, which turns off the health feature by default. The user needs to explicitly turn on this feature. The assumption is that many sites will only have one interface in their nodes. In this case the health feature will not increase the resiliency of their system. 
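For readers following the series, the intended effect of the sensitivity value can be illustrated with a small user-space sketch. This is not code from the patch: struct ni_health and both helper functions are hypothetical names, the clamp at zero and the recovery step of one point per successful send are assumptions, and only the 1000 maximum and the "0 disables the feature" behaviour come from the series itself.

```c
#define MAX_HEALTH_VALUE 1000	/* mirrors LNET_MAX_HEALTH_VALUE from patch 076 */

/* Hypothetical user-space stand-in for the per-NI health counter. */
struct ni_health {
	int healthv;
};

/*
 * Decrement the health value on a send failure.
 * A sensitivity of 0 leaves the value untouched, which is how the
 * default setting effectively turns the feature off.
 */
static void ni_health_on_failure(struct ni_health *ni, unsigned int sensitivity)
{
	if (sensitivity >= (unsigned int)ni->healthv)
		ni->healthv = 0;	/* assumption: clamp at zero */
	else
		ni->healthv -= (int)sensitivity;
}

/* Assumption: recover one point per successful send, capped at the maximum. */
static void ni_health_on_success(struct ni_health *ni)
{
	if (ni->healthv < MAX_HEALTH_VALUE)
		ni->healthv++;
}
```

With a sensitivity of 100, a fully healthy NI would drop to 900 after one failure in this sketch; with the default of 0 it stays at 1000, so health never influences NI selection.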
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 63cf744d0fdf ("LU-9120 lnet: add lnet_health_sensitivity") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32762 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + net/lnet/lnet/api-ni.c | 52 +++++++++++++++++++++++++++++++++++++++++++ net/lnet/lnet/lib-move.c | 11 ++++++++- 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 20b4660..5e13d32 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -479,6 +479,7 @@ struct lnet_ni * extern unsigned int lnet_transaction_timeout; extern unsigned int lnet_numa_range; +extern unsigned int lnet_health_sensitivity; extern unsigned int lnet_peer_discovery_disabled; extern int portal_rotor; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 4e83fa8..9d68434 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -78,6 +78,23 @@ struct lnet the_lnet = { MODULE_PARM_DESC(lnet_numa_range, "NUMA range to consider during Multi-Rail selection"); +/* lnet_health_sensitivity determines by how much we decrement the health + * value on sending error. The value defaults to 0, which means health + * checking is turned off by default. 
+ */ +unsigned int lnet_health_sensitivity; +static int sensitivity_set(const char *val, const struct kernel_param *kp); +static struct kernel_param_ops param_ops_health_sensitivity = { + .set = sensitivity_set, + .get = param_get_int, +}; + +#define param_check_health_sensitivity(name, p) \ + __param_check(name, p, int) +module_param(lnet_health_sensitivity, health_sensitivity, 0644); +MODULE_PARM_DESC(lnet_health_sensitivity, + "Value to decrement the health value by on error"); + static int lnet_interfaces_max = LNET_INTERFACES_MAX_DEFAULT; static int intf_max_set(const char *val, const struct kernel_param *kp); module_param_call(lnet_interfaces_max, intf_max_set, param_get_int, @@ -115,6 +132,41 @@ static int lnet_discover(struct lnet_process_id id, u32 force, struct lnet_process_id __user *ids, int n_ids); static int +sensitivity_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned int *sensitivity = (unsigned int *)kp->arg; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_health_sensitivity'\n"); + return rc; + } + + /* The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. 
+ */ + mutex_lock(&the_lnet.ln_api_mutex); + + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + if (value == *sensitivity) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + *sensitivity = value; + + mutex_unlock(&the_lnet.ln_api_mutex); + + return 0; +} + +static int discovery_set(const char *val, const struct kernel_param *kp) { int rc; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index ab32c6f..38815fd 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1332,6 +1332,16 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, */ if (ni_healthv < best_healthv) { continue; + } else if (ni_healthv > best_healthv) { + best_healthv = ni_healthv; + /* If we're going to prefer this ni because it's + * the healthiest, then we should set the + * shortest_distance in the algorithm in case + * there are multiple NIs with the same health but + * different distances. 
+ */ + if (distance < shortest_distance) + shortest_distance = distance; } else if (distance > shortest_distance) { continue; } else if (distance < shortest_distance) { @@ -1344,7 +1354,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } best_ni = ni; best_credits = ni_credits; - best_healthv = ni_healthv; } CDEBUG(D_NET, "selected best_ni %s\n", From patchwork Thu Feb 27 21:09:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409785 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9EDBE159A for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 87CA1246A0 for ; Thu, 27 Feb 2020 21:22:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 87CA1246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0593C34880A; Thu, 27 Feb 2020 13:20:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A52CB21FAAE for ; Thu, 27 Feb 2020 13:18:39 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 27DA6EE6; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id 2655C468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:06 -0500 Message-Id: <1582838290-17243-79-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 078/622] lnet: add monitor thread X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Refactor the router checker thread into the monitor thread. The monitor thread will check router aliveness, expire messages on the active list, recover local and remote NIs, and resend messages. In this patch it only checks router aliveness. A deadline is also added to each message to keep track of when that message should expire. 
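As a rough illustration of the message deadline described above, the following user-space sketch computes a deadline from a commit timestamp and a timeout in seconds, in the same way the patch combines lnet_transaction_timeout with NSEC_PER_SEC. The function names are hypothetical, and the expiry check is an assumption about how a later patch might consume the deadline; this patch only records it.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Deadline = commit time + transaction timeout.
 * The timeout is widened to 64 bits before multiplying so that large
 * values in seconds cannot overflow a 32-bit intermediate.
 */
static int64_t msg_deadline_ns(int64_t commit_ns, unsigned int timeout_sec)
{
	return commit_ns + (int64_t)timeout_sec * NSEC_PER_SEC;
}

/* Hypothetical check a monitor loop might apply to each active message. */
static bool msg_expired(int64_t deadline_ns, int64_t current_ns)
{
	return current_ns >= deadline_ns;
}
```

Passing timestamps in as parameters keeps the sketch independent of any particular clock source; the kernel code uses ktime_get() for the commit time.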
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: b01e6fce1c98 ("LU-9120 lnet: add monitor thread") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32763 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 11 ++- include/linux/lnet/lib-types.h | 27 +++---- net/lnet/lnet/api-ni.c | 12 ++-- net/lnet/lnet/lib-move.c | 98 ++++++++++++++++++++++++++ net/lnet/lnet/lib-msg.c | 9 ++- net/lnet/lnet/router.c | 156 +++++++++++++---------------------------- 6 files changed, 185 insertions(+), 128 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 5e13d32..2c3f665 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -714,8 +714,15 @@ int lnet_sock_connect(struct socket **sockp, int *fatal, int lnet_peers_start_down(void); int lnet_peer_buffer_credits(struct lnet_net *net); -int lnet_router_checker_start(void); -void lnet_router_checker_stop(void); +int lnet_monitor_thr_start(void); +void lnet_monitor_thr_stop(void); + +bool lnet_router_checker_active(void); +void lnet_check_routers(void); +int lnet_router_pre_mt_start(void); +void lnet_router_post_mt_start(void); +void lnet_prune_rc_data(int wait_unlink); +void lnet_router_cleanup(void); void lnet_router_ni_update_locked(struct lnet_peer_ni *gw, u32 net); void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 0ed325a..e1a56a1 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -79,6 +79,12 @@ struct lnet_msg { lnet_nid_t msg_src_nid_param; lnet_nid_t msg_rtr_nid_param; + /* + * Deadline for the message after which it will be finalized if it + * has not completed. 
+ */ + ktime_t msg_deadline; + /* committed for sending */ unsigned int msg_tx_committed:1; /* CPT # this message committed for sending */ @@ -905,9 +911,9 @@ struct lnet_msg_container { /* Router Checker states */ enum lnet_rc_state { - LNET_RC_STATE_SHUTDOWN, /* not started */ - LNET_RC_STATE_RUNNING, /* started up OK */ - LNET_RC_STATE_STOPPING, /* telling thread to stop */ + LNET_MT_STATE_SHUTDOWN, /* not started */ + LNET_MT_STATE_RUNNING, /* started up OK */ + LNET_MT_STATE_STOPPING, /* telling thread to stop */ }; /* LNet states */ @@ -1014,8 +1020,8 @@ struct lnet { /* discovery startup/shutdown state */ int ln_dc_state; - /* router checker startup/shutdown state */ - enum lnet_rc_state ln_rc_state; + /* monitor thread startup/shutdown state */ + enum lnet_rc_state ln_mt_state; /* router checker's event queue */ struct lnet_handle_eq ln_rc_eqh; /* rcd still pending on net */ @@ -1023,7 +1029,7 @@ struct lnet { /* rcd ready for free */ struct list_head ln_rcd_zombie; /* serialise startup/shutdown */ - struct completion ln_rc_signal; + struct completion ln_mt_signal; struct mutex ln_api_mutex; struct mutex ln_lnd_mutex; @@ -1053,13 +1059,10 @@ struct lnet { */ bool ln_nis_from_mod_params; - /* - * waitq for router checker. As long as there are no routes in - * the list, the router checker will sleep on this queue. when - * routes are added the thread will wake up + /* waitq for the monitor thread. The monitor thread takes care of + * checking routes, timedout messages and resending messages. 
*/ - wait_queue_head_t ln_rc_waitq; - + wait_queue_head_t ln_mt_waitq; }; #endif diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 9d68434..418d65e 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -309,7 +309,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force, spin_lock_init(&the_lnet.ln_eq_wait_lock); spin_lock_init(&the_lnet.ln_msg_resend_lock); init_waitqueue_head(&the_lnet.ln_eq_waitq); - init_waitqueue_head(&the_lnet.ln_rc_waitq); + init_waitqueue_head(&the_lnet.ln_mt_waitq); mutex_init(&the_lnet.ln_lnd_mutex); } @@ -2281,13 +2281,13 @@ void lnet_lib_exit(void) lnet_ping_target_update(pbuf, ping_mdh); - rc = lnet_router_checker_start(); + rc = lnet_monitor_thr_start(); if (rc) goto err_stop_ping; rc = lnet_push_target_init(); if (rc != 0) - goto err_stop_router_checker; + goto err_stop_monitor_thr; rc = lnet_peer_discovery_start(); if (rc != 0) @@ -2302,8 +2302,8 @@ void lnet_lib_exit(void) err_destroy_push_target: lnet_push_target_fini(); -err_stop_router_checker: - lnet_router_checker_stop(); +err_stop_monitor_thr: + lnet_monitor_thr_stop(); err_stop_ping: lnet_ping_target_fini(); err_acceptor_stop: @@ -2353,7 +2353,7 @@ void lnet_lib_exit(void) lnet_router_debugfs_fini(); lnet_peer_discovery_stop(); lnet_push_target_fini(); - lnet_router_checker_stop(); + lnet_monitor_thr_stop(); lnet_ping_target_fini(); /* Teardown fns that use my own API functions BEFORE here */ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 38815fd..418e3ad 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -818,6 +818,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } } + /* unset the tx_delay flag as we're going to send it now */ + msg->msg_tx_delayed = 0; + if (do_send) { lnet_net_unlock(cpt); lnet_ni_send(ni, msg); @@ -914,6 +917,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, msg->msg_niov = rbp->rbp_npages; 
msg->msg_kiov = &rb->rb_kiov[0]; + /* unset the msg-rx_delayed flag since we're receiving the message */ + msg->msg_rx_delayed = 0; + if (do_recv) { int cpt = msg->msg_rx_cpt; @@ -2383,6 +2389,98 @@ struct lnet_ni * return 0; } +static int +lnet_monitor_thread(void *arg) +{ + /* The monitor thread takes care of the following: + * 1. Checks the aliveness of routers + * 2. Checks if there are messages on the resend queue to resend + * them. + * 3. Check if there are any NIs on the local recovery queue and + * pings them + * 4. Checks if there are any NIs on the remote recovery queue + * and pings them. + */ + while (the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING) { + if (lnet_router_checker_active()) + lnet_check_routers(); + + /* TODO do we need to check if we should sleep without + * timeout? Technically, an active system will always + * have messages in flight so this check will always + * evaluate to false. And on an idle system do we care + * if we wake up every 1 second? Although, we've seen + * cases where we get a complaint that an idle thread + * is waking up unnecessarily. 
+ */ + wait_event_interruptible_timeout(the_lnet.ln_mt_waitq, + false, HZ); + } + + /* clean up the router checker */ + lnet_prune_rc_data(1); + + /* Shutting down */ + the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + + /* signal that the monitor thread is exiting */ + complete(&the_lnet.ln_mt_signal); + + return 0; +} + +int lnet_monitor_thr_start(void) +{ + int rc; + struct task_struct *task; + + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + + init_completion(&the_lnet.ln_mt_signal); + + /* Pre monitor thread start processing */ + rc = lnet_router_pre_mt_start(); + if (rc) + return rc; + + the_lnet.ln_mt_state = LNET_MT_STATE_RUNNING; + task = kthread_run(lnet_monitor_thread, NULL, "monitor_thread"); + if (IS_ERR(task)) { + rc = PTR_ERR(task); + CERROR("Can't start monitor thread: %d\n", rc); + /* block until event callback signals exit */ + wait_for_completion(&the_lnet.ln_mt_signal); + + /* clean up */ + lnet_router_cleanup(); + the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + return -ENOMEM; + } + + /* post monitor thread start processing */ + lnet_router_post_mt_start(); + + return 0; +} + +void lnet_monitor_thr_stop(void) +{ + if (the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN) + return; + + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING); + the_lnet.ln_mt_state = LNET_MT_STATE_STOPPING; + + /* tell the monitor thread that we're shutting down */ + wake_up(&the_lnet.ln_mt_waitq); + + /* block until monitor thread signals that it's done */ + wait_for_completion(&the_lnet.ln_mt_signal); + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + + lnet_router_cleanup(); +} + void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob, u32 msg_type) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index a7062f6..7869b96 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -141,13 +141,17 @@ { struct lnet_msg_container *container = the_lnet.ln_msg_containers[cpt]; struct lnet_counters 
*counters = the_lnet.ln_counters[cpt]; + s64 timeout_ns; + + /* set the message deadline */ + timeout_ns = lnet_transaction_timeout * NSEC_PER_SEC; + msg->msg_deadline = ktime_add_ns(ktime_get(), timeout_ns); /* routed message can be committed for both receiving and sending */ LASSERT(!msg->msg_tx_committed); if (msg->msg_sending) { LASSERT(!msg->msg_receiving); - msg->msg_tx_cpt = cpt; msg->msg_tx_committed = 1; if (msg->msg_rx_committed) { /* routed message REPLY */ @@ -161,8 +165,9 @@ } LASSERT(!msg->msg_onactivelist); + msg->msg_onactivelist = 1; - list_add(&msg->msg_activelist, &container->msc_active); + list_add_tail(&msg->msg_activelist, &container->msc_active); counters->msgs_alloc++; if (counters->msgs_alloc > counters->msgs_max) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 278807d..3f9d8c5 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -70,9 +70,6 @@ return net->net_tunables.lct_peer_tx_credits; } -/* forward ref's */ -static int lnet_router_checker(void *); - static int check_routers_before_use; module_param(check_routers_before_use, int, 0444); MODULE_PARM_DESC(check_routers_before_use, "Assume routers are down and ping them before use"); @@ -423,8 +420,8 @@ static void lnet_shuffle_seed(void) if (rnet != rnet2) kfree(rnet); - /* indicate to startup the router checker if configured */ - wake_up(&the_lnet.ln_rc_waitq); + /* kick start the monitor thread to handle the added route */ + wake_up(&the_lnet.ln_mt_waitq); return rc; } @@ -809,7 +806,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) struct lnet_peer_ni *rtr; int all_known; - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING); + LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING); for (;;) { int cpt = lnet_net_lock_current(); @@ -1038,7 +1035,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_ni_notify_locked(ni, rtr); if (!lnet_isrouter(rtr) || - the_lnet.ln_rc_state != 
LNET_RC_STATE_RUNNING) { + the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { /* router table changed or router checker is shutting down */ lnet_peer_ni_decref_locked(rtr); return; @@ -1092,14 +1089,9 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_peer_ni_decref_locked(rtr); } -int -lnet_router_checker_start(void) +int lnet_router_pre_mt_start(void) { - struct task_struct *task; int rc; - int eqsz = 0; - - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_SHUTDOWN); if (check_routers_before_use && dead_router_check_interval <= 0) { @@ -1107,27 +1099,17 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) return -EINVAL; } - init_completion(&the_lnet.ln_rc_signal); - rc = LNetEQAlloc(0, lnet_router_checker_event, &the_lnet.ln_rc_eqh); if (rc) { - CERROR("Can't allocate EQ(%d): %d\n", eqsz, rc); + CERROR("Can't allocate EQ(0): %d\n", rc); return -ENOMEM; } - the_lnet.ln_rc_state = LNET_RC_STATE_RUNNING; - task = kthread_run(lnet_router_checker, NULL, "router_checker"); - if (IS_ERR(task)) { - rc = PTR_ERR(task); - CERROR("Can't start router checker thread: %d\n", rc); - /* block until event callback signals exit */ - wait_for_completion(&the_lnet.ln_rc_signal); - rc = LNetEQFree(the_lnet.ln_rc_eqh); - LASSERT(!rc); - the_lnet.ln_rc_state = LNET_RC_STATE_SHUTDOWN; - return -ENOMEM; - } + return 0; +} +void lnet_router_post_mt_start(void) +{ if (check_routers_before_use) { /* * Note that a helpful side-effect of pinging all known routers @@ -1136,33 +1118,17 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) */ lnet_wait_known_routerstate(); } - - return 0; } -void -lnet_router_checker_stop(void) +void lnet_router_cleanup(void) { int rc; - if (the_lnet.ln_rc_state == LNET_RC_STATE_SHUTDOWN) - return; - - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING); - the_lnet.ln_rc_state = LNET_RC_STATE_STOPPING; - /* wakeup the RC thread if it's sleeping */ - wake_up(&the_lnet.ln_rc_waitq); - - /* 
block until event callback signals exit */ - wait_for_completion(&the_lnet.ln_rc_signal); - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_SHUTDOWN); - rc = LNetEQFree(the_lnet.ln_rc_eqh); - LASSERT(!rc); + LASSERT(rc == 0); } -static void -lnet_prune_rc_data(int wait_unlink) +void lnet_prune_rc_data(int wait_unlink) { struct lnet_rc_data *rcd; struct lnet_rc_data *tmp; @@ -1170,7 +1136,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) struct list_head head; int i = 2; - if (likely(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING && + if (likely(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING && list_empty(&the_lnet.ln_rcd_deathrow) && list_empty(&the_lnet.ln_rcd_zombie))) return; @@ -1179,7 +1145,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_net_lock(LNET_LOCK_EX); - if (the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING) { + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { /* router checker is stopping, prune all */ list_for_each_entry(lp, &the_lnet.ln_routers, lpni_rtr_list) { @@ -1242,18 +1208,12 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) } /* - * This function is called to check if the RC should block indefinitely. - * It's called from lnet_router_checker() as well as being passed to - * wait_event_interruptible() to avoid the lost wake_up problem. - * - * When it's called from wait_event_interruptible() it is necessary to - * also not sleep if the rc state is not running to avoid a deadlock - * when the system is shutting down + * This function is called from the monitor thread to check if there are + * any active routers that need to be checked. 
*/ -static inline bool -lnet_router_checker_active(void) +bool lnet_router_checker_active(void) { - if (the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING) + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) return true; /* @@ -1263,70 +1223,54 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) if (the_lnet.ln_routing) return true; + /* if there are routers that need to be cleaned up then do so */ + if (!list_empty(&the_lnet.ln_rcd_deathrow) || + !list_empty(&the_lnet.ln_rcd_zombie)) + return true; + return !list_empty(&the_lnet.ln_routers) && (live_router_check_interval > 0 || dead_router_check_interval > 0); } -static int -lnet_router_checker(void *arg) +void +lnet_check_routers(void) { struct lnet_peer_ni *rtr; + u64 version; + int cpt; + int cpt2; - while (the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING) { - u64 version; - int cpt; - int cpt2; - - cpt = lnet_net_lock_current(); + cpt = lnet_net_lock_current(); rescan: - version = the_lnet.ln_routers_version; + version = the_lnet.ln_routers_version; - list_for_each_entry(rtr, &the_lnet.ln_routers, lpni_rtr_list) { - cpt2 = rtr->lpni_cpt; - if (cpt != cpt2) { - lnet_net_unlock(cpt); - cpt = cpt2; - lnet_net_lock(cpt); - /* the routers list has changed */ - if (version != the_lnet.ln_routers_version) - goto rescan; - } - - lnet_ping_router_locked(rtr); - - /* NB dropped lock */ - if (version != the_lnet.ln_routers_version) { - /* the routers list has changed */ + list_for_each_entry(rtr, &the_lnet.ln_routers, lpni_rtr_list) { + cpt2 = rtr->lpni_cpt; + if (cpt != cpt2) { + lnet_net_unlock(cpt); + cpt = cpt2; + lnet_net_lock(cpt); + /* the routers list has changed */ + if (version != the_lnet.ln_routers_version) goto rescan; - } } - if (the_lnet.ln_routing) - lnet_update_ni_status_locked(); - - lnet_net_unlock(cpt); - - lnet_prune_rc_data(0); /* don't wait for UNLINK */ + lnet_ping_router_locked(rtr); - /* - * if there are any routes then wakeup every second. 
If - * there are no routes then sleep indefinitely until woken - * up by a user adding a route - */ - if (!lnet_router_checker_active()) - wait_event_idle(the_lnet.ln_rc_waitq, - lnet_router_checker_active()); - else - schedule_timeout_idle(HZ); + /* NB dropped lock */ + if (version != the_lnet.ln_routers_version) { + /* the routers list has changed */ + goto rescan; + } } - lnet_prune_rc_data(1); /* wait for UNLINK */ + if (the_lnet.ln_routing) + lnet_update_ni_status_locked(); - the_lnet.ln_rc_state = LNET_RC_STATE_SHUTDOWN; - complete(&the_lnet.ln_rc_signal); - /* The unlink event callback will signal final completion */ - return 0; + lnet_net_unlock(cpt); + + lnet_prune_rc_data(0); /* don't wait for UNLINK */ } void From patchwork Thu Feb 27 21:09:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409819 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1B040159A for ; Thu, 27 Feb 2020 21:23:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 039C3246A0 for ; Thu, 27 Feb 2020 21:23:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 039C3246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4E21B3489E6; Thu, 27 Feb 2020 13:21:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: 
from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0810921FAF2
	for ; Thu, 27 Feb 2020 13:18:40 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2C9D2EEB;
	Thu, 27 Feb 2020 16:18:14 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id 2969A46D; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:09:07 -0500
Message-Id: <1582838290-17243-80-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 079/622] lnet: handle local ni failure
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Amir Shehata, Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Amir Shehata

Added an enumerated type listing the different errors which the LND
can propagate up to LNet for further handling.

All local timeout errors will trigger a resend if the system is
configured for resends. Remote errors will not trigger a resend, to
avoid creating a duplicate message scenario on the receiving end. If a
transmit error is encountered where we're sure the message wasn't
received by the remote end, we will attempt a resend.

Add LNet level logic to handle local NI failure. When the LND
finalizes a message, lnet_finalize() will check if the message
completed successfully. If so, it increments the healthv of the local
NI, but not beyond the max. If it failed, it decrements the healthv,
but not below 0, and puts the message on the resend queue.
On local NI failure the local NI is placed on a recovery queue. The
monitor thread will wake up and resend all the pending messages. The
selection algorithm will properly select the local and remote NIs
based on the new healthv.

The monitor thread will ping each NI on the local recovery queue. On
reply it will check if the NI's healthv is back to maximum. If it is,
it will remove the NI from the recovery queue; otherwise it will keep
it there until it's fully recovered.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9120
Lustre-commit: 70616605dd44 ("LU-9120 lnet: handle local ni failure")
Signed-off-by: Amir Shehata
Reviewed-on: https://review.whamcloud.com/32764
Reviewed-by: Sonia Sharma
Reviewed-by: Olaf Weber
Signed-off-by: James Simmons
---
 include/linux/lnet/api.h       |   3 +-
 include/linux/lnet/lib-lnet.h  |   3 +
 include/linux/lnet/lib-types.h |  54 +++--
 net/lnet/lnet/api-ni.c         |  30 ++-
 net/lnet/lnet/config.c         |   3 +-
 net/lnet/lnet/lib-move.c       | 516 +++++++++++++++++++++++++++++++++++++++--
 net/lnet/lnet/lib-msg.c        | 281 +++++++++++++++++++++-
 net/lnet/lnet/peer.c           |  57 ++---
 net/lnet/lnet/router.c         |   2 +-
 net/lnet/selftest/rpc.c        |   2 +-
 10 files changed, 862 insertions(+), 89 deletions(-)

diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h
index 7cc1d04..a57ecc8 100644
--- a/include/linux/lnet/api.h
+++ b/include/linux/lnet/api.h
@@ -195,7 +195,8 @@ int LNetGet(lnet_nid_t self,
 	    struct lnet_process_id target_in,
 	    unsigned int portal_in,
 	    u64 match_bits_in,
-	    unsigned int offset_in);
+	    unsigned int offset_in,
+	    bool recovery);
 /** @} lnet_data */
 
 /** \defgroup lnet_misc Miscellaneous operations.
diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 2c3f665..965fc5f 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -536,6 +536,8 @@ void lnet_prep_send(struct lnet_msg *msg, int type, struct lnet_process_id target, unsigned int offset, unsigned int len); int lnet_send(lnet_nid_t nid, struct lnet_msg *msg, lnet_nid_t rtr_nid); +int lnet_send_ping(lnet_nid_t dest_nid, struct lnet_handle_md *mdh, int nnis, + void *user_ptr, struct lnet_handle_eq eqh, bool recovery); void lnet_return_tx_credits_locked(struct lnet_msg *msg); void lnet_return_rx_credits_locked(struct lnet_msg *msg); void lnet_schedule_blocked_locked(struct lnet_rtrbufpool *rbp); @@ -623,6 +625,7 @@ void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, void lnet_msg_containers_destroy(void); int lnet_msg_containers_create(void); +char *lnet_health_error2str(enum lnet_msg_hstatus hstatus); char *lnet_msgtyp2str(int type); void lnet_print_hdr(struct lnet_hdr *hdr); int lnet_fail_nid(lnet_nid_t nid, unsigned int threshold); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index e1a56a1..8c3bf34 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -61,6 +61,20 @@ /* forward refs */ struct lnet_libmd; +enum lnet_msg_hstatus { + LNET_MSG_STATUS_OK = 0, + LNET_MSG_STATUS_LOCAL_INTERRUPT, + LNET_MSG_STATUS_LOCAL_DROPPED, + LNET_MSG_STATUS_LOCAL_ABORTED, + LNET_MSG_STATUS_LOCAL_NO_ROUTE, + LNET_MSG_STATUS_LOCAL_ERROR, + LNET_MSG_STATUS_LOCAL_TIMEOUT, + LNET_MSG_STATUS_REMOTE_ERROR, + LNET_MSG_STATUS_REMOTE_DROPPED, + LNET_MSG_STATUS_REMOTE_TIMEOUT, + LNET_MSG_STATUS_NETWORK_TIMEOUT +}; + struct lnet_msg { struct list_head msg_activelist; struct list_head msg_list; /* Q for credits/MD */ @@ -85,6 +99,13 @@ struct lnet_msg { */ ktime_t msg_deadline; + /* The message health status. 
*/ + enum lnet_msg_hstatus msg_health_status; + /* This is a recovery message */ + bool msg_recovery; + /* flag to indicate that we do not want to resend this message */ + bool msg_no_resend; + /* committed for sending */ unsigned int msg_tx_committed:1; /* CPT # this message committed for sending */ @@ -277,18 +298,11 @@ struct lnet_tx_queue { struct list_head tq_delayed; /* delayed TXs */ }; -enum lnet_ni_state { - /* set when NI block is allocated */ - LNET_NI_STATE_INIT = 0, - /* set when NI is started successfully */ - LNET_NI_STATE_ACTIVE, - /* set when LND notifies NI failed */ - LNET_NI_STATE_FAILED, - /* set when LND notifies NI degraded */ - LNET_NI_STATE_DEGRADED, - /* set when shuttding down NI */ - LNET_NI_STATE_DELETING -}; +#define LNET_NI_STATE_INIT (1 << 0) +#define LNET_NI_STATE_ACTIVE (1 << 1) +#define LNET_NI_STATE_FAILED (1 << 2) +#define LNET_NI_STATE_RECOVERY_PENDING (1 << 3) +#define LNET_NI_STATE_DELETING (1 << 4) enum lnet_stats_type { LNET_STATS_TYPE_SEND = 0, @@ -351,6 +365,12 @@ struct lnet_ni { /* chain on the lnet_net structure */ struct list_head ni_netlist; + /* chain on the recovery queue */ + struct list_head ni_recovery; + + /* MD handle for recovery ping */ + struct lnet_handle_md ni_ping_mdh; + /* number of CPTs */ int ni_ncpts; @@ -382,7 +402,7 @@ struct lnet_ni { struct lnet_ni_status *ni_status; /* NI FSM */ - enum lnet_ni_state ni_state; + u32 ni_state; /* per NI LND tunables */ struct lnet_lnd_tunables ni_lnd_tunables; @@ -1063,6 +1083,14 @@ struct lnet { * checking routes, timedout messages and resending messages. 
*/ wait_queue_head_t ln_mt_waitq; + + /* per-cpt resend queues */ + struct list_head **ln_mt_resendqs; + /* local NIs to recover */ + struct list_head ln_mt_localNIRecovq; + /* recovery eq handler */ + struct lnet_handle_eq ln_mt_eqh; + }; #endif diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 418d65e..deef404 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -831,6 +831,7 @@ struct lnet_libhandle * INIT_LIST_HEAD(&the_lnet.ln_dc_request); INIT_LIST_HEAD(&the_lnet.ln_dc_working); INIT_LIST_HEAD(&the_lnet.ln_dc_expired); + INIT_LIST_HEAD(&the_lnet.ln_mt_localNIRecovq); init_waitqueue_head(&the_lnet.ln_dc_waitq); rc = lnet_descriptor_setup(); @@ -1072,8 +1073,7 @@ struct lnet_net * bool lnet_is_ni_healthy_locked(struct lnet_ni *ni) { - if (ni->ni_state == LNET_NI_STATE_ACTIVE || - ni->ni_state == LNET_NI_STATE_DEGRADED) + if (ni->ni_state & LNET_NI_STATE_ACTIVE) return true; return false; @@ -1650,7 +1650,7 @@ static void lnet_push_target_fini(void) list_del_init(&ni->ni_netlist); /* the ni should be in deleting state. 
If it's not it's * a bug */ - LASSERT(ni->ni_state == LNET_NI_STATE_DELETING); + LASSERT(ni->ni_state & LNET_NI_STATE_DELETING); cfs_percpt_for_each(ref, j, ni->ni_refs) { if (!*ref) continue; @@ -1697,7 +1697,10 @@ static void lnet_push_target_fini(void) struct lnet_net *net = ni->ni_net; lnet_net_lock(LNET_LOCK_EX); - ni->ni_state = LNET_NI_STATE_DELETING; + lnet_ni_lock(ni); + ni->ni_state |= LNET_NI_STATE_DELETING; + ni->ni_state &= ~LNET_NI_STATE_ACTIVE; + lnet_ni_unlock(ni); lnet_ni_unlink_locked(ni); lnet_incr_dlc_seq(); lnet_net_unlock(LNET_LOCK_EX); @@ -1789,6 +1792,7 @@ static void lnet_push_target_fini(void) list_for_each_entry_safe(msg, tmp, &resend, msg_list) { list_del_init(&msg->msg_list); + msg->msg_no_resend = true; lnet_finalize(msg, -ECANCELED); } @@ -1827,7 +1831,10 @@ static void lnet_push_target_fini(void) goto failed0; } - ni->ni_state = LNET_NI_STATE_ACTIVE; + lnet_ni_lock(ni); + ni->ni_state |= LNET_NI_STATE_ACTIVE; + ni->ni_state &= ~LNET_NI_STATE_INIT; + lnet_ni_unlock(ni); /* We keep a reference on the loopback net through the loopback NI */ if (net->net_lnd->lnd_type == LOLND) { @@ -2554,11 +2561,17 @@ struct lnet_ni * struct lnet_ni *ni; struct lnet_net *net = mynet; + /* It is possible that the net has been cleaned out while there is + * a message being sent. 
This function accessed the net without + * checking if the list is empty + */ if (!prev) { if (!net) net = list_first_entry(&the_lnet.ln_nets, struct lnet_net, net_list); + if (list_empty(&net->net_ni_list)) + return NULL; ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist); @@ -2580,6 +2593,8 @@ struct lnet_ni * /* get the next net */ net = list_first_entry(&prev->ni_net->net_list, struct lnet_net, net_list); + if (list_empty(&net->net_ni_list)) + return NULL; /* get the ni on it */ ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist); @@ -2587,6 +2602,9 @@ struct lnet_ni * return ni; } + if (list_empty(&prev->ni_netlist)) + return NULL; + /* there are more nis left */ ni = list_first_entry(&prev->ni_netlist, struct lnet_ni, ni_netlist); @@ -3571,7 +3589,7 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, rc = LNetGet(LNET_NID_ANY, mdh, id, LNET_RESERVED_PORTAL, - LNET_PROTO_PING_MATCHBITS, 0); + LNET_PROTO_PING_MATCHBITS, 0, false); if (rc) { /* Don't CERROR; this could be deliberate! 
*/ rc2 = LNetMDUnlink(mdh); diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 0560215..ea62d36 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -442,6 +442,7 @@ struct lnet_net * spin_lock_init(&ni->ni_lock); INIT_LIST_HEAD(&ni->ni_netlist); + INIT_LIST_HEAD(&ni->ni_recovery); ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(), sizeof(*ni->ni_refs[0])); if (!ni->ni_refs) @@ -466,7 +467,7 @@ struct lnet_net * ni->ni_net_ns = NULL; ni->ni_last_alive = ktime_get_real_seconds(); - ni->ni_state = LNET_NI_STATE_INIT; + ni->ni_state |= LNET_NI_STATE_INIT; list_add_tail(&ni->ni_netlist, &net->net_ni_added); /* diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 418e3ad..f3f4b84 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -579,8 +579,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, (msg->msg_txcredit && msg->msg_peertxcredit)); rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg); - if (rc < 0) + if (rc < 0) { + msg->msg_no_resend = true; lnet_finalize(msg, rc); + } } static int @@ -759,8 +761,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, CNETERR("Dropping message for %s: peer not alive\n", libcfs_id2str(msg->msg_target)); - if (do_send) + if (do_send) { + msg->msg_health_status = LNET_MSG_STATUS_LOCAL_DROPPED; lnet_finalize(msg, -EHOSTUNREACH); + } lnet_net_lock(cpt); return -EHOSTUNREACH; @@ -772,8 +776,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, CNETERR("Aborting message for %s: LNetM[DE]Unlink() already called on the MD/ME.\n", libcfs_id2str(msg->msg_target)); - if (do_send) + if (do_send) { + msg->msg_no_resend = true; lnet_finalize(msg, -ECANCELED); + } lnet_net_lock(cpt); return -ECANCELED; @@ -1059,6 +1065,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, lnet_ni_recv(msg->msg_rxni, msg->msg_private, NULL, 0, 0, 0, msg->msg_hdr.payload_length); 
list_del_init(&msg->msg_list); + msg->msg_no_resend = true; lnet_finalize(msg, -ECANCELED); } @@ -2273,6 +2280,14 @@ struct lnet_ni * return PTR_ERR(lpni); } + /* Cache the original src_nid. If we need to resend the message + * then we'll need to know whether the src_nid was originally + * specified for this message. If it was originally specified, + * then we need to keep using the same src_nid since it's + * continuing the same sequence of messages. + */ + msg->msg_src_nid_param = src_nid; + /* Now that we have a peer_ni, check if we want to discover * the peer. Traffic to the LNET_RESERVED_PORTAL should not * trigger discovery. @@ -2290,7 +2305,6 @@ struct lnet_ni * /* The peer may have changed. */ peer = lpni->lpni_peer_net->lpn_peer; /* queue message and return */ - msg->msg_src_nid_param = src_nid; msg->msg_rtr_nid_param = rtr_nid; msg->msg_sending = 0; list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); @@ -2323,7 +2337,11 @@ struct lnet_ni * else send_case |= REMOTE_DST; - if (!lnet_peer_is_multi_rail(peer)) + /* if this is a non-MR peer or if we're recovering a peer ni then + * let's consider this an NMR case so we can hit the destination + * NID. 
+ */ + if (!lnet_peer_is_multi_rail(peer) || msg->msg_recovery) send_case |= NMR_DST; else send_case |= MR_DST; @@ -2370,6 +2388,7 @@ struct lnet_ni * */ /* NB: !ni == interface pre-determined (ACK/REPLY) */ LASSERT(!msg->msg_txpeer); + LASSERT(!msg->msg_txni); LASSERT(!msg->msg_sending); LASSERT(!msg->msg_target_is_router); LASSERT(!msg->msg_receiving); @@ -2389,6 +2408,314 @@ struct lnet_ni * return 0; } +static void +lnet_resend_pending_msgs_locked(struct list_head *resendq, int cpt) +{ + struct lnet_msg *msg; + + while (!list_empty(resendq)) { + struct lnet_peer_ni *lpni; + + msg = list_entry(resendq->next, struct lnet_msg, + msg_list); + + list_del_init(&msg->msg_list); + + lpni = lnet_find_peer_ni_locked(msg->msg_hdr.dest_nid); + if (!lpni) { + lnet_net_unlock(cpt); + CERROR("Expected that a peer is already created for %s\n", + libcfs_nid2str(msg->msg_hdr.dest_nid)); + msg->msg_no_resend = true; + lnet_finalize(msg, -EFAULT); + lnet_net_lock(cpt); + } else { + struct lnet_peer *peer; + int rc; + lnet_nid_t src_nid = LNET_NID_ANY; + + /* if this message is not being routed and the + * peer is non-MR then we must use the same + * src_nid that was used in the original send. + * Otherwise if we're routing the message (IE + * we're a router) then we can use any of our + * local interfaces. It doesn't matter to the + * final destination. + */ + peer = lpni->lpni_peer_net->lpn_peer; + if (!msg->msg_routing && + !lnet_peer_is_multi_rail(peer)) + src_nid = le64_to_cpu(msg->msg_hdr.src_nid); + + /* If we originally specified a src NID, then we + * must attempt to reuse it in the resend as well. 
+			 */
+			if (msg->msg_src_nid_param != LNET_NID_ANY)
+				src_nid = msg->msg_src_nid_param;
+			lnet_peer_ni_decref_locked(lpni);
+
+			lnet_net_unlock(cpt);
+			rc = lnet_send(src_nid, msg, LNET_NID_ANY);
+			if (rc) {
+				CERROR("Error sending %s to %s: %d\n",
+				       lnet_msgtyp2str(msg->msg_type),
+				       libcfs_id2str(msg->msg_target), rc);
+				msg->msg_no_resend = true;
+				lnet_finalize(msg, rc);
+			}
+			lnet_net_lock(cpt);
+		}
+	}
+}
+
+static void
+lnet_resend_pending_msgs(void)
+{
+	int i;
+
+	cfs_cpt_for_each(i, lnet_cpt_table()) {
+		lnet_net_lock(i);
+		lnet_resend_pending_msgs_locked(the_lnet.ln_mt_resendqs[i], i);
+		lnet_net_unlock(i);
+	}
+}
+
+/* called with cpt and ni_lock held */
+static void
+lnet_unlink_ni_recovery_mdh_locked(struct lnet_ni *ni, int cpt)
+{
+	struct lnet_handle_md recovery_mdh;
+
+	LNetInvalidateMDHandle(&recovery_mdh);
+
+	if (ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING) {
+		recovery_mdh = ni->ni_ping_mdh;
+		LNetInvalidateMDHandle(&ni->ni_ping_mdh);
+	}
+	lnet_ni_unlock(ni);
+	lnet_net_unlock(cpt);
+	if (!LNetMDHandleIsInvalid(recovery_mdh))
+		LNetMDUnlink(recovery_mdh);
+	lnet_net_lock(cpt);
+	lnet_ni_lock(ni);
+}
+
+static void
+lnet_recover_local_nis(void)
+{
+	struct list_head processed_list;
+	struct list_head local_queue;
+	struct lnet_handle_md mdh;
+	struct lnet_ni *tmp;
+	struct lnet_ni *ni;
+	lnet_nid_t nid;
+	int healthv;
+	int rc;
+
+	INIT_LIST_HEAD(&local_queue);
+	INIT_LIST_HEAD(&processed_list);
+
+	/* splice the recovery queue on a local queue. We will iterate
+	 * through the local queue and update it as needed. Once we're
+	 * done with the traversal, we'll splice the local queue back on
+	 * the head of the ln_mt_localNIRecovq. Any newly added local NIs
+	 * will be traversed in the next iteration.
+	 */
+	lnet_net_lock(0);
+	list_splice_init(&the_lnet.ln_mt_localNIRecovq,
+			 &local_queue);
+	lnet_net_unlock(0);
+
+	list_for_each_entry_safe(ni, tmp, &local_queue, ni_recovery) {
+		/* if an NI is being deleted or it is now healthy, there
+		 * is no need to keep it around in the recovery queue.
+		 * The monitor thread is the only thread responsible for
+		 * removing the NI from the recovery queue.
+		 * Multiple threads can be adding NIs to the recovery
+		 * queue.
+		 */
+		healthv = atomic_read(&ni->ni_healthv);
+
+		lnet_net_lock(0);
+		lnet_ni_lock(ni);
+		if (!(ni->ni_state & LNET_NI_STATE_ACTIVE) ||
+		    healthv == LNET_MAX_HEALTH_VALUE) {
+			list_del_init(&ni->ni_recovery);
+			lnet_unlink_ni_recovery_mdh_locked(ni, 0);
+			lnet_ni_unlock(ni);
+			lnet_ni_decref_locked(ni, 0);
+			lnet_net_unlock(0);
+			continue;
+		}
+		lnet_ni_unlock(ni);
+		lnet_net_unlock(0);
+
+		/* protect the ni->ni_state field. Once we call the
+		 * lnet_send_ping function it's possible we receive
+		 * a response before we check the rc. The lock ensures
+		 * a stable value for the ni_state RECOVERY_PENDING bit
+		 */
+		lnet_ni_lock(ni);
+		if (!(ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING)) {
+			ni->ni_state |= LNET_NI_STATE_RECOVERY_PENDING;
+			lnet_ni_unlock(ni);
+			mdh = ni->ni_ping_mdh;
+			/* Invalidate the ni mdh in case it's deleted.
+			 * We'll unlink the mdh in this case below.
+			 */
+			LNetInvalidateMDHandle(&ni->ni_ping_mdh);
+			nid = ni->ni_nid;
+
+			/* remove the NI from the local queue and drop the
+			 * reference count to it while we're recovering
+			 * it. The reason for that, is that the NI could
+			 * be deleted, and the way the code is structured
+			 * is if we don't drop the NI, then the deletion
+			 * code will enter a loop waiting for the
+			 * reference count to be removed while holding the
+			 * ln_mutex_lock(). When we look up the peer to
+			 * send to in lnet_select_pathway() we will try to
+			 * lock the ln_mutex_lock() as well, leading to
+			 * a deadlock. By dropping the refcount and
+			 * removing it from the list, we allow for the NI
+			 * to be removed, then we use the cached NID to
+			 * look it up again. If it's gone, then we just
+			 * continue examining the rest of the queue.
+			 */
+			lnet_net_lock(0);
+			list_del_init(&ni->ni_recovery);
+			lnet_ni_decref_locked(ni, 0);
+			lnet_net_unlock(0);
+
+			rc = lnet_send_ping(nid, &mdh,
+					    LNET_INTERFACES_MIN, (void *)nid,
+					    the_lnet.ln_mt_eqh, true);
+			/* lookup the nid again */
+			lnet_net_lock(0);
+			ni = lnet_nid2ni_locked(nid, 0);
+			if (!ni) {
+				/* the NI has been deleted when we dropped
+				 * the ref count
+				 */
+				lnet_net_unlock(0);
+				LNetMDUnlink(mdh);
+				continue;
+			}
+			/* Same note as in lnet_recover_peer_nis(). When
+			 * we're sending the ping, the NI is free to be
+			 * deleted or manipulated. By this point it
+			 * could've been added back on the recovery queue,
+			 * and a refcount taken on it.
+			 * So we can't just add it blindly again or we'll
+			 * corrupt the queue. We must check under lock if
+			 * it's not on any list and if not then add it
+			 * to the processed list, which will eventually be
+			 * spliced back on to the recovery queue.
+			 */
+			ni->ni_ping_mdh = mdh;
+			if (list_empty(&ni->ni_recovery)) {
+				list_add_tail(&ni->ni_recovery,
+					      &processed_list);
+				lnet_ni_addref_locked(ni, 0);
+			}
+			lnet_net_unlock(0);
+
+			lnet_ni_lock(ni);
+			if (rc)
+				ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING;
+		}
+		lnet_ni_unlock(ni);
+	}
+
+	/* put back the remaining NIs on the ln_mt_localNIRecovq to be
+	 * reexamined in the next iteration.
+ */ + list_splice_init(&processed_list, &local_queue); + lnet_net_lock(0); + list_splice(&local_queue, &the_lnet.ln_mt_localNIRecovq); + lnet_net_unlock(0); +} + +static struct list_head ** +lnet_create_array_of_queues(void) +{ + struct list_head **qs; + struct list_head *q; + int i; + + qs = cfs_percpt_alloc(lnet_cpt_table(), + sizeof(struct list_head)); + if (!qs) { + CERROR("Failed to allocate queues\n"); + return NULL; + } + + cfs_percpt_for_each(q, i, qs) + INIT_LIST_HEAD(q); + + return qs; +} + +static int +lnet_resendqs_create(void) +{ + struct list_head **resendqs; + + resendqs = lnet_create_array_of_queues(); + if (!resendqs) + return -ENOMEM; + + lnet_net_lock(LNET_LOCK_EX); + the_lnet.ln_mt_resendqs = resendqs; + lnet_net_unlock(LNET_LOCK_EX); + + return 0; +} + +static void +lnet_clean_local_ni_recoveryq(void) +{ + struct lnet_ni *ni; + + /* This is only called when the monitor thread has stopped */ + lnet_net_lock(0); + + while (!list_empty(&the_lnet.ln_mt_localNIRecovq)) { + ni = list_entry(the_lnet.ln_mt_localNIRecovq.next, + struct lnet_ni, ni_recovery); + list_del_init(&ni->ni_recovery); + lnet_ni_lock(ni); + lnet_unlink_ni_recovery_mdh_locked(ni, 0); + lnet_ni_unlock(ni); + lnet_ni_decref_locked(ni, 0); + } + + lnet_net_unlock(0); +} + +static void +lnet_clean_resendqs(void) +{ + struct lnet_msg *msg, *tmp; + struct list_head msgs; + int i; + + INIT_LIST_HEAD(&msgs); + + cfs_cpt_for_each(i, lnet_cpt_table()) { + lnet_net_lock(i); + list_splice_init(the_lnet.ln_mt_resendqs[i], &msgs); + lnet_net_unlock(i); + list_for_each_entry_safe(msg, tmp, &msgs, msg_list) { + list_del_init(&msg->msg_list); + msg->msg_no_resend = true; + lnet_finalize(msg, -ESHUTDOWN); + } + } + + cfs_percpt_free(the_lnet.ln_mt_resendqs); +} + static int lnet_monitor_thread(void *arg) { @@ -2405,6 +2732,10 @@ struct lnet_ni * if (lnet_router_checker_active()) lnet_check_routers(); + lnet_resend_pending_msgs(); + + lnet_recover_local_nis(); + /* TODO do we need to check if we 
should sleep without * timeout? Technically, an active system will always * have messages in flight so this check will always @@ -2429,42 +2760,180 @@ struct lnet_ni * return 0; } -int lnet_monitor_thr_start(void) +/* lnet_send_ping + * Sends a ping. + * Returns == 0 if success + * Returns > 0 if LNetMDBind or prior fails + * Returns < 0 if LNetGet fails + */ +int +lnet_send_ping(lnet_nid_t dest_nid, + struct lnet_handle_md *mdh, int nnis, + void *user_data, struct lnet_handle_eq eqh, bool recovery) { + struct lnet_md md = { NULL }; + struct lnet_process_id id; + struct lnet_ping_buffer *pbuf; int rc; + + if (dest_nid == LNET_NID_ANY) { + rc = -EHOSTUNREACH; + goto fail_error; + } + + pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); + if (!pbuf) { + rc = ENOMEM; + goto fail_error; + } + + /* initialize md content */ + md.start = &pbuf->pb_info; + md.length = LNET_PING_INFO_SIZE(nnis); + md.threshold = 2; /* GET/REPLY */ + md.max_size = 0; + md.options = LNET_MD_TRUNCATE; + md.user_ptr = user_data; + md.eq_handle = eqh; + + rc = LNetMDBind(md, LNET_UNLINK, mdh); + if (rc) { + lnet_ping_buffer_decref(pbuf); + CERROR("Can't bind MD: %d\n", rc); + rc = -rc; /* change the rc to positive */ + goto fail_error; + } + id.pid = LNET_PID_LUSTRE; + id.nid = dest_nid; + + rc = LNetGet(LNET_NID_ANY, *mdh, id, + LNET_RESERVED_PORTAL, + LNET_PROTO_PING_MATCHBITS, 0, recovery); + if (rc) + goto fail_unlink_md; + + return 0; + +fail_unlink_md: + LNetMDUnlink(*mdh); + LNetInvalidateMDHandle(mdh); +fail_error: + return rc; +} + +static void +lnet_mt_event_handler(struct lnet_event *event) +{ + lnet_nid_t nid = (lnet_nid_t)event->md.user_ptr; + struct lnet_ni *ni; + struct lnet_ping_buffer *pbuf; + + /* TODO: remove assert */ + LASSERT(event->type == LNET_EVENT_REPLY || + event->type == LNET_EVENT_SEND || + event->type == LNET_EVENT_UNLINK); + + CDEBUG(D_NET, "Received event: %d status: %d\n", event->type, + event->status); + + switch (event->type) { + case LNET_EVENT_REPLY: + /* If the 
NI has been restored completely then remove from + * the recovery queue + */ + lnet_net_lock(0); + ni = lnet_nid2ni_locked(nid, 0); + if (!ni) { + lnet_net_unlock(0); + break; + } + lnet_ni_lock(ni); + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + lnet_ni_unlock(ni); + lnet_net_unlock(0); + break; + case LNET_EVENT_SEND: + CDEBUG(D_NET, "%s recovery message sent %s:%d\n", + libcfs_nid2str(nid), + (event->status) ? "unsuccessfully" : + "successfully", event->status); + break; + case LNET_EVENT_UNLINK: + /* nothing to do */ + CDEBUG(D_NET, "%s recovery ping unlinked\n", + libcfs_nid2str(nid)); + break; + default: + CERROR("Unexpected event: %d\n", event->type); + return; + } + if (event->unlinked) { + pbuf = LNET_PING_INFO_TO_BUFFER(event->md.start); + lnet_ping_buffer_decref(pbuf); + } +} + +int lnet_monitor_thr_start(void) +{ + int rc = 0; struct task_struct *task; - LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + if (the_lnet.ln_mt_state != LNET_MT_STATE_SHUTDOWN) + return -EALREADY; - init_completion(&the_lnet.ln_mt_signal); + rc = lnet_resendqs_create(); + if (rc) + return rc; + + rc = LNetEQAlloc(0, lnet_mt_event_handler, &the_lnet.ln_mt_eqh); + if (rc != 0) { + CERROR("Can't allocate monitor thread EQ: %d\n", rc); + goto clean_queues; + } /* Pre monitor thread start processing */ rc = lnet_router_pre_mt_start(); - if (!rc) - return rc; + if (rc) + goto free_mem; + + init_completion(&the_lnet.ln_mt_signal); the_lnet.ln_mt_state = LNET_MT_STATE_RUNNING; task = kthread_run(lnet_monitor_thread, NULL, "monitor_thread"); if (IS_ERR(task)) { rc = PTR_ERR(task); CERROR("Can't start monitor thread: %d\n", rc); - /* block until event callback signals exit */ - wait_for_completion(&the_lnet.ln_mt_signal); - - /* clean up */ - lnet_router_cleanup(); - the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; - return -ENOMEM; + goto clean_thread; } /* post monitor thread start processing */ lnet_router_post_mt_start(); return 0; + +clean_thread: + 
the_lnet.ln_mt_state = LNET_MT_STATE_STOPPING; + /* block until event callback signals exit */ + wait_for_completion(&the_lnet.ln_mt_signal); + /* clean up */ + lnet_router_cleanup(); +free_mem: + the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + lnet_clean_resendqs(); + lnet_clean_local_ni_recoveryq(); + LNetEQFree(the_lnet.ln_mt_eqh); + LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); + return rc; +clean_queues: + lnet_clean_resendqs(); + lnet_clean_local_ni_recoveryq(); + return rc; } void lnet_monitor_thr_stop(void) { + int rc; + if (the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN) return; @@ -2478,7 +2947,12 @@ void lnet_monitor_thr_stop(void) wait_for_completion(&the_lnet.ln_mt_signal); LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); + /* perform cleanup tasks */ lnet_router_cleanup(); + lnet_clean_resendqs(); + lnet_clean_local_ni_recoveryq(); + rc = LNetEQFree(the_lnet.ln_mt_eqh); + LASSERT(rc == 0); } void @@ -3173,6 +3647,8 @@ void lnet_monitor_thr_stop(void) lnet_drop_message(msg->msg_rxni, msg->msg_rx_cpt, msg->msg_private, msg->msg_len, msg->msg_type); + + msg->msg_no_resend = true; /* * NB: message will not generate event because w/o attached MD, * but we still should give error code so lnet_msg_decommit() @@ -3338,6 +3814,7 @@ void lnet_monitor_thr_stop(void) if (rc) { CNETERR("Error sending PUT to %s: %d\n", libcfs_id2str(target), rc); + msg->msg_no_resend = true; lnet_finalize(msg, rc); } @@ -3476,7 +3953,7 @@ struct lnet_msg * int LNetGet(lnet_nid_t self, struct lnet_handle_md mdh, struct lnet_process_id target, unsigned int portal, - u64 match_bits, unsigned int offset) + u64 match_bits, unsigned int offset, bool recovery) { struct lnet_msg *msg; struct lnet_libmd *md; @@ -3499,6 +3976,8 @@ struct lnet_msg * return -ENOMEM; } + msg->msg_recovery = recovery; + cpt = lnet_cpt_of_cookie(mdh.cookie); lnet_res_lock(cpt); @@ -3542,6 +4021,7 @@ struct lnet_msg * if (rc < 0) { CNETERR("Error sending GET to %s: %d\n", libcfs_id2str(target), rc); + 
msg->msg_no_resend = true; lnet_finalize(msg, rc); } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 7869b96..e7f7469 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -469,6 +469,234 @@ return 0; } +static void +lnet_dec_healthv_locked(atomic_t *healthv) +{ + int h = atomic_read(healthv); + + if (h < lnet_health_sensitivity) { + atomic_set(healthv, 0); + } else { + h -= lnet_health_sensitivity; + atomic_set(healthv, h); + } +} + +static inline void +lnet_inc_healthv(atomic_t *healthv) +{ + atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE); +} + +static void +lnet_handle_local_failure(struct lnet_msg *msg) +{ + struct lnet_ni *local_ni; + + local_ni = msg->msg_txni; + + /* the lnet_net_lock(0) is used to protect the addref on the ni + * and the recovery queue. + */ + lnet_net_lock(0); + /* the mt could've shutdown and cleaned up the queues */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { + lnet_net_unlock(0); + return; + } + + lnet_dec_healthv_locked(&local_ni->ni_healthv); + /* add the NI to the recovery queue if it's not already there + * and its health value is actually below the maximum. It's + * possible that the sensitivity might be set to 0, and the health + * value will not be reduced. In this case, there is no reason to + * invoke recovery + */ + if (list_empty(&local_ni->ni_recovery) && + atomic_read(&local_ni->ni_healthv) < LNET_MAX_HEALTH_VALUE) { + CERROR("ni %s added to recovery queue. 
Health = %d\n", + libcfs_nid2str(local_ni->ni_nid), + atomic_read(&local_ni->ni_healthv)); + list_add_tail(&local_ni->ni_recovery, + &the_lnet.ln_mt_localNIRecovq); + lnet_ni_addref_locked(local_ni, 0); + } + lnet_net_unlock(0); +} + +/* Do a health check on the message: + * return -1 if we're not going to handle the error + * success case will return -1 as well + * return 0 if the message is requeued for send + */ +static int +lnet_health_check(struct lnet_msg *msg) +{ + enum lnet_msg_hstatus hstatus = msg->msg_health_status; + + /* TODO: lnet_incr_hstats(hstatus); */ + + LASSERT(msg->msg_txni); + + if (hstatus != LNET_MSG_STATUS_OK && + ktime_compare(ktime_get(), msg->msg_deadline) >= 0) + return -1; + + /* if we're shutting down no point in handling health. */ + if (the_lnet.ln_state != LNET_STATE_RUNNING) + return -1; + + switch (hstatus) { + case LNET_MSG_STATUS_OK: + lnet_inc_healthv(&msg->msg_txni->ni_healthv); + /* we can finalize this message */ + return -1; + case LNET_MSG_STATUS_LOCAL_INTERRUPT: + case LNET_MSG_STATUS_LOCAL_DROPPED: + case LNET_MSG_STATUS_LOCAL_ABORTED: + case LNET_MSG_STATUS_LOCAL_NO_ROUTE: + case LNET_MSG_STATUS_LOCAL_TIMEOUT: + lnet_handle_local_failure(msg); + /* add to the re-send queue */ + goto resend; + + /* TODO: since the remote dropped the message we can + * attempt a resend safely. 
+ */ + case LNET_MSG_STATUS_REMOTE_DROPPED: + break; + + /* These errors will not trigger a resend so simply + * finalize the message + */ + case LNET_MSG_STATUS_LOCAL_ERROR: + lnet_handle_local_failure(msg); + return -1; + case LNET_MSG_STATUS_REMOTE_ERROR: + case LNET_MSG_STATUS_REMOTE_TIMEOUT: + case LNET_MSG_STATUS_NETWORK_TIMEOUT: + return -1; + } + +resend: + /* don't resend recovery messages */ + if (msg->msg_recovery) + return -1; + + /* if we explicitly indicated we don't want to resend then just + * return + */ + if (msg->msg_no_resend) + return -1; + + lnet_net_lock(msg->msg_tx_cpt); + + /* remove message from the active list and reset it in preparation + * for a resend. Two exceptions to this: + * + * 1. the router case, when a message is committed for rx when + * received, then tx when it is sent. When committed to both tx and + * rx we don't want to remove it from the active list. + * + * 2. The REPLY case since it uses the same msg block for the GET + * that was received. + */ + if (!msg->msg_routing && msg->msg_type != LNET_MSG_REPLY) { + list_del_init(&msg->msg_activelist); + msg->msg_onactivelist = 0; + } + + /* The msg_target.nid which was originally set + * when calling LNetGet() or LNetPut() might've + * been overwritten if we're routing this message. + * Call lnet_return_tx_credits_locked() to return + * the credit this message consumed. The message will + * consume another credit when it gets resent. 
+ */ + msg->msg_target.nid = msg->msg_hdr.dest_nid; + lnet_msg_decommit_tx(msg, -EAGAIN); + msg->msg_sending = 0; + msg->msg_receiving = 0; + msg->msg_target_is_router = 0; + + CDEBUG(D_NET, "%s->%s:%s:%s - queuing for resend\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + lnet_msgtyp2str(msg->msg_type), + lnet_health_error2str(hstatus)); + + list_add_tail(&msg->msg_list, the_lnet.ln_mt_resendqs[msg->msg_tx_cpt]); + lnet_net_unlock(msg->msg_tx_cpt); + + wake_up(&the_lnet.ln_mt_waitq); + return 0; +} + +static void +lnet_detach_md(struct lnet_msg *msg, int status) +{ + int cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); + + lnet_res_lock(cpt); + lnet_msg_detach_md(msg, status); + lnet_res_unlock(cpt); +} + +static bool +lnet_is_health_check(struct lnet_msg *msg) +{ + bool hc; + int status = msg->msg_ev.status; + + /* perform a health check for any message committed for transmit */ + hc = msg->msg_tx_committed; + + /* Check for status inconsistencies */ + if (hc && + ((!status && msg->msg_health_status != LNET_MSG_STATUS_OK) || + (status && msg->msg_health_status == LNET_MSG_STATUS_OK))) { + CERROR("Msg is in inconsistent state, don't perform health checking (%d, %d)\n", + status, msg->msg_health_status); + hc = false; + } + + CDEBUG(D_NET, "health check = %d, status = %d, hstatus = %d\n", + hc, status, msg->msg_health_status); + + return hc; +} + +char * +lnet_health_error2str(enum lnet_msg_hstatus hstatus) +{ + switch (hstatus) { + case LNET_MSG_STATUS_LOCAL_INTERRUPT: + return "LOCAL_INTERRUPT"; + case LNET_MSG_STATUS_LOCAL_DROPPED: + return "LOCAL_DROPPED"; + case LNET_MSG_STATUS_LOCAL_ABORTED: + return "LOCAL_ABORTED"; + case LNET_MSG_STATUS_LOCAL_NO_ROUTE: + return "LOCAL_NO_ROUTE"; + case LNET_MSG_STATUS_LOCAL_TIMEOUT: + return "LOCAL_TIMEOUT"; + case LNET_MSG_STATUS_LOCAL_ERROR: + return "LOCAL_ERROR"; + case LNET_MSG_STATUS_REMOTE_DROPPED: + return "REMOTE_DROPPED"; + case LNET_MSG_STATUS_REMOTE_ERROR: + return 
"REMOTE_ERROR"; + case LNET_MSG_STATUS_REMOTE_TIMEOUT: + return "REMOTE_TIMEOUT"; + case LNET_MSG_STATUS_NETWORK_TIMEOUT: + return "NETWORK_TIMEOUT"; + case LNET_MSG_STATUS_OK: + return "OK"; + default: + return ""; + } +} + void lnet_finalize(struct lnet_msg *msg, int status) { @@ -477,6 +705,7 @@ int cpt; int rc; int i; + bool hc; LASSERT(!in_interrupt()); @@ -485,15 +714,27 @@ msg->msg_ev.status = status; - if (msg->msg_md) { - cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); - - lnet_res_lock(cpt); - lnet_msg_detach_md(msg, status); - lnet_res_unlock(cpt); - } + /* if the message is successfully sent, no need to keep the MD around */ + if (msg->msg_md && !status) + lnet_detach_md(msg, status); again: + hc = lnet_is_health_check(msg); + + /* the MD would've been detached from the message if it was + * successfully sent. However, if it wasn't successfully sent the + * MD would be around. And since we recalculate whether to + * health check or not, it's possible that we change our minds and + * we don't want to health check this message. In this case also + * free the MD. + * + * If the message is successful we're going to + * go through the lnet_health_check() function, but that'll just + * increment the appropriate health value and return. + */ + if (msg->msg_md && !hc) + lnet_detach_md(msg, status); + rc = 0; if (!msg->msg_tx_committed && !msg->msg_rx_committed) { /* not committed to network yet */ @@ -502,6 +743,28 @@ return; } + if (hc) { + /* Check the health status of the message. If it has one + * of the errors that we're supposed to handle, and it has + * not timed out, then + * 1. Decrement the appropriate health_value + * 2. queue the message on the resend queue + * + * if the message send is success, timed out or failed in the + * health check for any reason then we'll just finalize the + * message. Otherwise just return since the message has been + * put on the resend queue. 
+ */ + if (!lnet_health_check(msg)) + return; + + /* if we get here then we need to clean up the md because we're + * finalizing the message. + */ + if (msg->msg_md) + lnet_detach_md(msg, status); + } + /* * NB: routed message can be committed for both receiving and sending, * we should finalize in LIFO order and keep counters correct. @@ -536,7 +799,7 @@ while ((msg = list_first_entry_or_null(&container->msc_finalizing, struct lnet_msg, msg_list)) != NULL) { - list_del(&msg->msg_list); + list_del_init(&msg->msg_list); /* * NB drops and regains the lnet lock if it actually does @@ -575,7 +838,7 @@ msg_activelist)) != NULL) { LASSERT(msg->msg_onactivelist); msg->msg_onactivelist = 0; - list_del(&msg->msg_activelist); + list_del_init(&msg->msg_activelist); kfree(msg); count++; } diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 1534ab2..121876e 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2713,9 +2713,7 @@ static lnet_nid_t lnet_peer_select_nid(struct lnet_peer *lp) static int lnet_peer_send_ping(struct lnet_peer *lp) __must_hold(&lp->lp_lock) { - struct lnet_md md = { NULL }; - struct lnet_process_id id; - struct lnet_ping_buffer *pbuf; + lnet_nid_t pnid; int nnis; int rc; int cpt; @@ -2724,54 +2722,35 @@ static int lnet_peer_send_ping(struct lnet_peer *lp) lp->lp_state &= ~LNET_PEER_FORCE_PING; spin_unlock(&lp->lp_lock); - nnis = max_t(int, lp->lp_data_nnis, LNET_INTERFACES_MIN); - pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); - if (!pbuf) { - rc = -ENOMEM; - goto fail_error; - } - - /* initialize md content */ - md.start = &pbuf->pb_info; - md.length = LNET_PING_INFO_SIZE(nnis); - md.threshold = 2; /* GET/REPLY */ - md.max_size = 0; - md.options = LNET_MD_TRUNCATE; - md.user_ptr = lp; - md.eq_handle = the_lnet.ln_dc_eqh; - - rc = LNetMDBind(md, LNET_UNLINK, &lp->lp_ping_mdh); - if (rc != 0) { - lnet_ping_buffer_decref(pbuf); - CERROR("Can't bind MD: %d\n", rc); - goto fail_error; - } cpt = lnet_net_lock_current(); /* Refcount for 
MD. */ lnet_peer_addref_locked(lp); - id.pid = LNET_PID_LUSTRE; - id.nid = lnet_peer_select_nid(lp); + pnid = lnet_peer_select_nid(lp); lnet_net_unlock(cpt); - if (id.nid == LNET_NID_ANY) { - rc = -EHOSTUNREACH; - goto fail_unlink_md; - } + nnis = max_t(int, lp->lp_data_nnis, LNET_INTERFACES_MIN); - rc = LNetGet(LNET_NID_ANY, lp->lp_ping_mdh, id, - LNET_RESERVED_PORTAL, - LNET_PROTO_PING_MATCHBITS, 0); - if (rc) - goto fail_unlink_md; + rc = lnet_send_ping(pnid, &lp->lp_ping_mdh, nnis, lp, + the_lnet.ln_dc_eqh, false); + /* if LNetMDBind in lnet_send_ping fails we need to decrement the + * refcount on the peer, otherwise LNetMDUnlink will be called + * which will eventually do that. + */ + if (rc > 0) { + lnet_net_lock(cpt); + lnet_peer_decref_locked(lp); + lnet_net_unlock(cpt); + rc = -rc; /* change the rc to negative value */ + goto fail_error; + } else if (rc < 0) { + goto fail_error; + } CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); spin_lock(&lp->lp_lock); return 0; -fail_unlink_md: - LNetMDUnlink(lp->lp_ping_mdh); - LNetInvalidateMDHandle(&lp->lp_ping_mdh); fail_error: CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); /* diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 3f9d8c5..7c3bbd8 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1079,7 +1079,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_net_unlock(rtr->lpni_cpt); rc = LNetGet(LNET_NID_ANY, mdh, id, LNET_RESERVED_PORTAL, - LNET_PROTO_PING_MATCHBITS, 0); + LNET_PROTO_PING_MATCHBITS, 0, false); lnet_net_lock(rtr->lpni_cpt); if (rc) diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index 295d704..a5941e4 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -425,7 +425,7 @@ struct srpc_bulk * } else { LASSERT(options & LNET_MD_OP_GET); - rc = LNetGet(self, *mdh, peer, portal, matchbits, 0); + rc = LNetGet(self, *mdh, peer, portal, matchbits, 0, false); } if (rc) { 
From patchwork Thu Feb 27 21:09:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409795 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A575138D for ; Thu, 27 Feb 2020 21:22:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 40507246A0 for ; Thu, 27 Feb 2020 21:22:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 40507246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6456E3488AC; Thu, 27 Feb 2020 13:20:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5DAF221FAF1 for ; Thu, 27 Feb 2020 13:18:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2DB40EEC; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2C66E46A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:08 -0500 Message-Id: <1582838290-17243-81-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 080/622] lnet: handle o2iblnd tx failure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Monitor the different types of failures that might occur on the transmit and flag the type of failure to be propagated to LNet which will handle either by attempting a resend or simply finalizing the message and propagating a failure to the ULP. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 8cf835e425d8 ("LU-9120 lnet: handle o2iblnd tx failure") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32765 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 2 +- net/lnet/klnds/o2iblnd/o2iblnd.h | 4 ++- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 59 ++++++++++++++++++++++++++++++++----- 3 files changed, 55 insertions(+), 10 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 825fe30..017fe5f 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -519,7 +519,7 @@ static int kiblnd_del_peer(struct lnet_ni *ni, lnet_nid_t nid) write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); - kiblnd_txlist_done(&zombies, -EIO); + kiblnd_txlist_done(&zombies, -EIO, LNET_MSG_STATUS_LOCAL_ERROR); return rc; } diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 9021051..999b58d 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -515,6 +515,7 @@ struct kib_tx { /* transmit message */ short tx_queued; /* queued for sending */ short 
tx_waiting; /* waiting for peer_ni */ int tx_status; /* LNET completion status */ + enum lnet_msg_hstatus tx_hstatus; /* health status of the transmit */ ktime_t tx_deadline; /* completion deadline */ u64 tx_cookie; /* completion cookie */ struct lnet_msg *tx_lntmsg[2]; /* lnet msgs to finalize on completion */ @@ -1027,7 +1028,8 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, void kiblnd_close_conn_locked(struct kib_conn *conn, int error); void kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid); -void kiblnd_txlist_done(struct list_head *txlist, int status); +void kiblnd_txlist_done(struct list_head *txlist, int status, + enum lnet_msg_hstatus hstatus); void kiblnd_qp_event(struct ib_event *event, void *arg); void kiblnd_cq_event(struct ib_event *event, void *arg); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 60706b4..007058a 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -89,12 +89,17 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, if (!lntmsg[i]) continue; + /* propagate health status to LNet for requests */ + if (i == 0 && lntmsg[i]) + lntmsg[i]->msg_health_status = tx->tx_hstatus; + lnet_finalize(lntmsg[i], rc); } } void -kiblnd_txlist_done(struct list_head *txlist, int status) +kiblnd_txlist_done(struct list_head *txlist, int status, + enum lnet_msg_hstatus hstatus) { struct kib_tx *tx; @@ -105,6 +110,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, /* complete now */ tx->tx_waiting = 0; tx->tx_status = status; + tx->tx_hstatus = hstatus; kiblnd_tx_done(tx); } } @@ -134,6 +140,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, LASSERT(!tx->tx_nfrags); tx->tx_gaps = false; + tx->tx_hstatus = LNET_MSG_STATUS_OK; return tx; } @@ -265,10 +272,12 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, 
} if (!tx->tx_status) { /* success so far */ - if (status < 0) /* failed? */ + if (status < 0) { /* failed? */ tx->tx_status = status; - else if (txtype == IBLND_MSG_GET_REQ) + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_ERROR; + } else if (txtype == IBLND_MSG_GET_REQ) { lnet_set_reply_msg_len(ni, tx->tx_lntmsg[1], status); + } } tx->tx_waiting = 0; @@ -846,6 +855,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, * posted NOOPs complete */ spin_unlock(&conn->ibc_lock); + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); spin_lock(&conn->ibc_lock); CDEBUG(D_NET, "%s(%d): redundant or enough NOOP\n", @@ -1045,6 +1055,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, conn->ibc_noops_posted--; if (failed) { + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_DROPPED; tx->tx_waiting = 0; /* don't wait for peer_ni */ tx->tx_status = -EIO; } @@ -1393,7 +1404,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, CWARN("Abort reconnection of %s: %s\n", libcfs_nid2str(peer_ni->ibp_nid), reason); - kiblnd_txlist_done(&txs, -ECONNABORTED); + kiblnd_txlist_done(&txs, -ECONNABORTED, + LNET_MSG_STATUS_LOCAL_ABORTED); return false; } @@ -1471,6 +1483,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (tx) { tx->tx_status = -EHOSTUNREACH; tx->tx_waiting = 0; + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); } return; @@ -1607,6 +1620,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (rc) { CERROR("Can't setup GET sink for %s: %d\n", libcfs_nid2str(target.nid), rc); + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); return -EIO; } @@ -1757,6 +1771,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, return; failed_1: + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); failed_0: lnet_finalize(lntmsg, -EIO); @@ -1839,6 +1854,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (rc) { CERROR("Can't setup PUT sink for %s: %d\n", 
libcfs_nid2str(conn->ibc_peer->ibp_nid), rc); + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); /* tell peer_ni it's over */ kiblnd_send_completion(rx->rx_conn, IBLND_MSG_PUT_NAK, @@ -2050,13 +2066,34 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (txs == &conn->ibc_active_txs) { LASSERT(!tx->tx_queued); LASSERT(tx->tx_waiting || tx->tx_sending); + if (conn->ibc_comms_error == -ETIMEDOUT) { + if (tx->tx_waiting && !tx->tx_sending) + tx->tx_hstatus = + LNET_MSG_STATUS_REMOTE_TIMEOUT; + else if (tx->tx_sending) + tx->tx_hstatus = + LNET_MSG_STATUS_NETWORK_TIMEOUT; + } } else { LASSERT(tx->tx_queued); + if (conn->ibc_comms_error == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; + else + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; } tx->tx_status = -ECONNABORTED; tx->tx_waiting = 0; + /* TODO: This makes an assumption that + * kiblnd_tx_complete() will be called for each tx. If + * that event is dropped we could end up with stale + * connections floating around. We'd like to deal with + * that in a better way. + * + * Also that means we can exceed the timeout by many + * seconds. + */ if (!tx->tx_sending) { tx->tx_queued = 0; list_del(&tx->tx_list); @@ -2066,7 +2103,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, spin_unlock(&conn->ibc_lock); - kiblnd_txlist_done(&zombies, -ECONNABORTED); + /* aborting transmits occurs when finalizing the connection. 
+ * The connection is finalized on error + */ + kiblnd_txlist_done(&zombies, -ECONNABORTED, -1); } static void @@ -2147,7 +2187,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, CNETERR("Deleting messages for %s: connection failed\n", libcfs_nid2str(peer_ni->ibp_nid)); - kiblnd_txlist_done(&zombies, -EHOSTUNREACH); + kiblnd_txlist_done(&zombies, error, + LNET_MSG_STATUS_LOCAL_DROPPED); } static void @@ -2223,7 +2264,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, kiblnd_close_conn_locked(conn, -ECONNABORTED); write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); - kiblnd_txlist_done(&txs, -ECONNABORTED); + kiblnd_txlist_done(&txs, -ECONNABORTED, + LNET_MSG_STATUS_LOCAL_ERROR); return; } @@ -3300,7 +3342,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); if (!list_empty(&timedout_txs)) - kiblnd_txlist_done(&timedout_txs, -ETIMEDOUT); + kiblnd_txlist_done(&timedout_txs, -ETIMEDOUT, + LNET_MSG_STATUS_LOCAL_TIMEOUT); /* * Handle timeout by closing the whole From patchwork Thu Feb 27 21:09:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409823 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A503E159A for ; Thu, 27 Feb 2020 21:23:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8DEB1246A0 for ; Thu, 27 Feb 2020 21:23:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DEB1246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 81F91348A0C; Thu, 27 Feb 2020 13:21:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B378221FAF1 for ; Thu, 27 Feb 2020 13:18:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 30E2CEEE; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2FC5A46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:09 -0500 Message-Id: <1582838290-17243-82-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 081/622] lnet: handle socklnd tx failure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Update the socklnd to propagate the health status up to LNet for handling. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 25c1cb2c4d6f ("LU-9120 lnet: handle socklnd tx failure") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32766 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.h | 1 + net/lnet/klnds/socklnd/socklnd_cb.c | 49 ++++++++++++++++++++++++++++++++++--- 2 files changed, 47 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 04381a0..48884cf 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -289,6 +289,7 @@ struct ksock_tx { /* transmit packet */ time64_t tx_deadline; /* when (in secs) tx times out */ struct ksock_msg tx_msg; /* socklnd message buffer */ int tx_desc_size; /* size of this descriptor */ + enum lnet_msg_hstatus tx_hstatus; /* health status of tx */ union { struct { struct kvec iov; /* virt hdr */ diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 5b75ea6..d50e0d2 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -56,6 +56,7 @@ struct ksock_tx * tx->tx_zc_aborted = 0; tx->tx_zc_capable = 0; tx->tx_zc_checked = 0; + tx->tx_hstatus = LNET_MSG_STATUS_OK; tx->tx_desc_size = size; atomic_inc(&ksocknal_data.ksnd_nactive_txs); @@ -328,18 +329,26 @@ struct ksock_tx * ksocknal_tx_done(struct lnet_ni *ni, struct ksock_tx *tx, int rc) { struct lnet_msg *lnetmsg = tx->tx_lnetmsg; + enum lnet_msg_hstatus hstatus = tx->tx_hstatus; LASSERT(ni || tx->tx_conn); - if (!rc && (tx->tx_resid != 0 || tx->tx_zc_aborted)) + if (!rc && (tx->tx_resid != 0 || tx->tx_zc_aborted)) { rc = -EIO; + hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + } if (tx->tx_conn) ksocknal_conn_decref(tx->tx_conn); ksocknal_free_tx(tx); - if (lnetmsg) /* KSOCK_MSG_NOOP go without lnetmsg */ + if (lnetmsg) { /* KSOCK_MSG_NOOP go without lnetmsg */ + if (rc) + CERROR("tx failure 
rc = %d, hstatus = %d\n", rc, + hstatus); + lnetmsg->msg_health_status = hstatus; lnet_finalize(lnetmsg, rc); + } } void @@ -362,6 +371,20 @@ struct ksock_tx * list_del(&tx->tx_list); + if (tx->tx_hstatus == LNET_MSG_STATUS_OK) { + if (error == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; + else if (error == -ENETDOWN || + error == -EHOSTUNREACH || + error == -ENETUNREACH) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_DROPPED; + /* for all other errors we don't want to + * retransmit + */ + else if (error) + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + } + LASSERT(atomic_read(&tx->tx_refcount) == 1); ksocknal_tx_done(ni, tx, error); } @@ -481,12 +504,25 @@ struct ksock_tx * wake_up(&ksocknal_data.ksnd_reaper_waitq); spin_unlock_bh(&ksocknal_data.ksnd_reaper_lock); + + /* set the health status of the message which determines + * whether we should retry the transmit + */ + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; return rc; } /* Actual error */ LASSERT(rc < 0); + /* set the health status of the message which determines + * whether we should retry the transmit + */ + if (rc == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_TIMEOUT; + else + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + if (!conn->ksnc_closing) { switch (rc) { case -ECONNRESET: @@ -509,7 +545,7 @@ struct ksock_tx * ksocknal_uncheck_zc_req(tx); /* it's not an error if conn is being closed */ - ksocknal_close_conn_and_siblings(conn, (conn->ksnc_closing) ? 0 : rc); + ksocknal_close_conn_and_siblings(conn, conn->ksnc_closing ? 
0 : rc); return rc; } @@ -2167,6 +2203,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) { /* We're called with a shared lock on ksnd_global_lock */ struct ksock_conn *conn; + struct ksock_tx *tx; list_for_each_entry(conn, &peer_ni->ksnp_conns, ksnc_list) { int error; @@ -2229,6 +2266,10 @@ void ksocknal_write_callback(struct ksock_conn *conn) * buffered in the socket's send buffer */ ksocknal_conn_addref(conn); + list_for_each_entry(tx, &conn->ksnc_tx_queue, + tx_list) + tx->tx_hstatus = + LNET_MSG_STATUS_LOCAL_TIMEOUT; CNETERR("Timeout sending data to %s (%pI4h:%d) the network or that node may be down.\n", libcfs_id2str(peer_ni->ksnp_id), &conn->ksnc_ipaddr, @@ -2255,6 +2296,8 @@ void ksocknal_write_callback(struct ksock_conn *conn) if (ktime_get_seconds() < tx->tx_deadline) break; + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; + list_del(&tx->tx_list); list_add_tail(&tx->tx_list, &stale_txs); } From patchwork Thu Feb 27 21:09:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409839 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A55B14BC for ; Thu, 27 Feb 2020 21:23:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E710F246A0 for ; Thu, 27 Feb 2020 21:23:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E710F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id B0DF3348AE1; Thu, 27 Feb 2020 13:21:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1719321FA7D for ; Thu, 27 Feb 2020 13:18:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3514AEF1; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 32E2A468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:10 -0500 Message-Id: <1582838290-17243-83-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 082/622] lnet: handle remote errors in LNet X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add health value in the peer NI structure. Decrement the value whenever there is an error sending to the peer. Modify the selection algorithm to look at the peer NI health value when selecting the best peer NI to send to. Put the peer NI on the recovery queue whenever there is an error sending to it. Attempt only to resend on REMOTE DROPPED since we're sure the message was never received by the peer. For other errors finalize the message. 
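As a rough illustration of health-aware selection, the sketch below picks the candidate with the highest health value. This is an assumption about the shape of the rule, not the patch's selection code, which weighs further criteria that are not reproduced here. The struct and function names are hypothetical; healthv stands in for the lpni_healthv field added to struct lnet_peer_ni.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct lnet_peer_ni. */
struct sketch_peer_ni {
	int healthv;	/* plays the role of lpni_healthv */
};

/*
 * Return the index of the healthiest candidate, or -1 if there are none.
 * On a tie the earlier candidate is kept.
 */
static int sketch_select_peer_ni(const struct sketch_peer_ni *nis,
				 size_t count)
{
	int best = -1;
	size_t i;

	for (i = 0; i < count; i++) {
		if (best < 0 || nis[i].healthv > nis[best].healthv)
			best = (int)i;
	}
	return best;
}
```

Because every send error lowers the value, a peer NI that keeps failing drops out of contention until recovery raises its health again.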
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 76fad19c2dea ("LU-9120 lnet: handle remote errors in LNet") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32767 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 6 + include/linux/lnet/lib-types.h | 12 ++ net/lnet/lnet/api-ni.c | 1 + net/lnet/lnet/lib-move.c | 311 +++++++++++++++++++++++++++++++++++------ net/lnet/lnet/lib-msg.c | 87 ++++++++++-- net/lnet/lnet/peer.c | 9 ++ 6 files changed, 368 insertions(+), 58 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 965fc5f..b8ca114 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -894,6 +894,12 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return false; } +static inline void +lnet_inc_healthv(atomic_t *healthv) +{ + atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE); +} + void lnet_incr_stats(struct lnet_element_stats *stats, enum lnet_msg_type msg_type, enum lnet_stats_type stats_type); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 8c3bf34..19b83a4 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -478,6 +478,8 @@ struct lnet_peer_ni { struct list_head lpni_peer_nis; /* chain on remote peer list */ struct list_head lpni_on_remote_peer_ni_list; + /* chain on recovery queue */ + struct list_head lpni_recovery; /* chain on peer hash */ struct list_head lpni_hashlist; /* messages blocking for tx credits */ @@ -529,6 +531,10 @@ struct lnet_peer_ni { lnet_nid_t lpni_nid; /* # refs */ atomic_t lpni_refcount; + /* health value for the peer */ + atomic_t lpni_healthv; + /* recovery ping mdh */ + struct lnet_handle_md lpni_recovery_ping_mdh; /* CPT this peer attached on */ int lpni_cpt; /* state flags -- protected by lpni_lock */ @@ -558,6 +564,10 @@ struct lnet_peer_ni { /* Preferred path added due to traffic on 
non-MR peer_ni */ #define LNET_PEER_NI_NON_MR_PREF BIT(0) +/* peer is being recovered. */ +#define LNET_PEER_NI_RECOVERY_PENDING BIT(1) +/* peer is being deleted */ +#define LNET_PEER_NI_DELETING BIT(2) struct lnet_peer { /* chain on pt_peer_list */ @@ -1088,6 +1098,8 @@ struct lnet { struct list_head **ln_mt_resendqs; /* local NIs to recover */ struct list_head ln_mt_localNIRecovq; + /* peer NIs to recover */ + struct list_head ln_mt_peerNIRecovq; /* recovery eq handler */ struct lnet_handle_eq ln_mt_eqh; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index deef404..97d9be5 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -832,6 +832,7 @@ struct lnet_libhandle * INIT_LIST_HEAD(&the_lnet.ln_dc_working); INIT_LIST_HEAD(&the_lnet.ln_dc_expired); INIT_LIST_HEAD(&the_lnet.ln_mt_localNIRecovq); + INIT_LIST_HEAD(&the_lnet.ln_mt_peerNIRecovq); init_waitqueue_head(&the_lnet.ln_dc_waitq); rc = lnet_descriptor_setup(); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index f3f4b84..5224490 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1025,15 +1025,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } if (txpeer) { - /* - * TODO: - * Once the patch for the health comes in we need to set - * the health of the peer ni to bad when we fail to send - * a message.
- * int status = msg->msg_ev.status; - * if (status != 0) - * lnet_set_peer_ni_health_locked(txpeer, false) - */ msg->msg_txpeer = NULL; lnet_peer_ni_decref_locked(txpeer); } @@ -1545,6 +1536,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, int best_lpni_credits = INT_MIN; bool preferred = false; bool ni_is_pref; + int best_lpni_healthv = 0; + int lpni_healthv; while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni))) { /* if the best_ni we've chosen aleady has this lpni @@ -1553,6 +1546,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, best_ni->ni_nid); + lpni_healthv = atomic_read(&lpni->lpni_healthv); + CDEBUG(D_NET, "%s ni_is_pref = %d\n", libcfs_nid2str(best_ni->ni_nid), ni_is_pref); @@ -1562,8 +1557,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, lpni->lpni_txcredits, best_lpni_credits, lpni->lpni_seq, best_lpni->lpni_seq); + /* pick the healthiest peer ni */ + if (lpni_healthv < best_lpni_healthv) { + continue; + } else if (lpni_healthv > best_lpni_healthv) { + best_lpni_healthv = lpni_healthv; /* if this is a preferred peer use it */ - if (!preferred && ni_is_pref) { + } else if (!preferred && ni_is_pref) { preferred = true; } else if (preferred && !ni_is_pref) { /* @@ -2408,6 +2408,16 @@ struct lnet_ni * return 0; } +enum lnet_mt_event_type { + MT_TYPE_LOCAL_NI = 0, + MT_TYPE_PEER_NI +}; + +struct lnet_mt_event_info { + enum lnet_mt_event_type mt_type; + lnet_nid_t mt_nid; +}; + static void lnet_resend_pending_msgs_locked(struct list_head *resendq, int cpt) { @@ -2503,6 +2513,7 @@ struct lnet_ni * static void lnet_recover_local_nis(void) { + struct lnet_mt_event_info *ev_info; struct list_head processed_list; struct list_head local_queue; struct lnet_handle_md mdh; @@ -2550,15 +2561,24 @@ struct lnet_ni * lnet_ni_unlock(ni); lnet_net_unlock(0); - /* protect the ni->ni_state field. 
Once we call the - * lnet_send_ping function it's possible we receive - * a response before we check the rc. The lock ensures - * a stable value for the ni_state RECOVERY_PENDING bit - */ + CDEBUG(D_NET, "attempting to recover local ni: %s\n", + libcfs_nid2str(ni->ni_nid)); + lnet_ni_lock(ni); if (!(ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING)) { ni->ni_state |= LNET_NI_STATE_RECOVERY_PENDING; lnet_ni_unlock(ni); + + ev_info = kzalloc(sizeof(*ev_info), GFP_NOFS); + if (!ev_info) { + CERROR("out of memory. Can't recover %s\n", + libcfs_nid2str(ni->ni_nid)); + lnet_ni_lock(ni); + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + lnet_ni_unlock(ni); + continue; + } + mdh = ni->ni_ping_mdh; /* Invalidate the ni mdh in case it's deleted. * We'll unlink the mdh in this case below. @@ -2587,9 +2607,10 @@ struct lnet_ni * lnet_ni_decref_locked(ni, 0); lnet_net_unlock(0); - rc = lnet_send_ping(nid, &mdh, - LNET_INTERFACES_MIN, (void *)nid, - the_lnet.ln_mt_eqh, true); + ev_info->mt_type = MT_TYPE_LOCAL_NI; + ev_info->mt_nid = nid; + rc = lnet_send_ping(nid, &mdh, LNET_INTERFACES_MIN, + ev_info, the_lnet.ln_mt_eqh, true); /* lookup the nid again */ lnet_net_lock(0); ni = lnet_nid2ni_locked(nid, 0); @@ -2694,6 +2715,44 @@ struct lnet_ni * } static void +lnet_unlink_lpni_recovery_mdh_locked(struct lnet_peer_ni *lpni, int cpt) +{ + struct lnet_handle_md recovery_mdh; + + LNetInvalidateMDHandle(&recovery_mdh); + + if (lpni->lpni_state & LNET_PEER_NI_RECOVERY_PENDING) { + recovery_mdh = lpni->lpni_recovery_ping_mdh; + LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh); + } + spin_unlock(&lpni->lpni_lock); + lnet_net_unlock(cpt); + if (!LNetMDHandleIsInvalid(recovery_mdh)) + LNetMDUnlink(recovery_mdh); + lnet_net_lock(cpt); + spin_lock(&lpni->lpni_lock); +} + +static void +lnet_clean_peer_ni_recoveryq(void) +{ + struct lnet_peer_ni *lpni, *tmp; + + lnet_net_lock(LNET_LOCK_EX); + + list_for_each_entry_safe(lpni, tmp, &the_lnet.ln_mt_peerNIRecovq, + lpni_recovery) { + 
list_del_init(&lpni->lpni_recovery); + spin_lock(&lpni->lpni_lock); + lnet_unlink_lpni_recovery_mdh_locked(lpni, LNET_LOCK_EX); + spin_unlock(&lpni->lpni_lock); + lnet_peer_ni_decref_locked(lpni); + } + + lnet_net_unlock(LNET_LOCK_EX); +} + +static void lnet_clean_resendqs(void) { struct lnet_msg *msg, *tmp; @@ -2716,6 +2775,128 @@ struct lnet_ni * cfs_percpt_free(the_lnet.ln_mt_resendqs); } +static void +lnet_recover_peer_nis(void) +{ + struct lnet_mt_event_info *ev_info; + struct list_head processed_list; + struct list_head local_queue; + struct lnet_handle_md mdh; + struct lnet_peer_ni *lpni; + struct lnet_peer_ni *tmp; + lnet_nid_t nid; + int healthv; + int rc; + + INIT_LIST_HEAD(&local_queue); + INIT_LIST_HEAD(&processed_list); + + /* Always use cpt 0 for locking across all interactions with + * ln_mt_peerNIRecovq + */ + lnet_net_lock(0); + list_splice_init(&the_lnet.ln_mt_peerNIRecovq, + &local_queue); + lnet_net_unlock(0); + + list_for_each_entry_safe(lpni, tmp, &local_queue, + lpni_recovery) { + /* The same protection strategy is used here as is in the + * local recovery case. + */ + lnet_net_lock(0); + healthv = atomic_read(&lpni->lpni_healthv); + spin_lock(&lpni->lpni_lock); + if (lpni->lpni_state & LNET_PEER_NI_DELETING || + healthv == LNET_MAX_HEALTH_VALUE) { + list_del_init(&lpni->lpni_recovery); + lnet_unlink_lpni_recovery_mdh_locked(lpni, 0); + spin_unlock(&lpni->lpni_lock); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(0); + continue; + } + spin_unlock(&lpni->lpni_lock); + lnet_net_unlock(0); + + /* NOTE: we're racing with peer deletion from user space. + * It's possible that a peer is deleted after we check its + * state. 
In this case the recovery can create a new peer + */ + spin_lock(&lpni->lpni_lock); + if (!(lpni->lpni_state & LNET_PEER_NI_RECOVERY_PENDING) && + !(lpni->lpni_state & LNET_PEER_NI_DELETING)) { + lpni->lpni_state |= LNET_PEER_NI_RECOVERY_PENDING; + spin_unlock(&lpni->lpni_lock); + + ev_info = kzalloc(sizeof(*ev_info), GFP_NOFS); + if (!ev_info) { + CERROR("out of memory. Can't recover %s\n", + libcfs_nid2str(lpni->lpni_nid)); + spin_lock(&lpni->lpni_lock); + lpni->lpni_state &= + ~LNET_PEER_NI_RECOVERY_PENDING; + spin_unlock(&lpni->lpni_lock); + continue; + } + + /* look at the comments in lnet_recover_local_nis() */ + mdh = lpni->lpni_recovery_ping_mdh; + LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh); + nid = lpni->lpni_nid; + lnet_net_lock(0); + list_del_init(&lpni->lpni_recovery); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(0); + + ev_info->mt_type = MT_TYPE_PEER_NI; + ev_info->mt_nid = nid; + rc = lnet_send_ping(nid, &mdh, LNET_INTERFACES_MIN, + ev_info, the_lnet.ln_mt_eqh, true); + lnet_net_lock(0); + /* lnet_find_peer_ni_locked() grabs a refcount for + * us. No need to take it explicitly. + */ + lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + lnet_net_unlock(0); + LNetMDUnlink(mdh); + continue; + } + + lpni->lpni_recovery_ping_mdh = mdh; + /* While we're unlocked the lpni could've been + * readded on the recovery queue. In this case we + * don't need to add it to the local queue, since + * it's already on there and the thread that added + * it would've incremented the refcount on the + * peer, which means we need to decref the refcount + * that was implicitly grabbed by find_peer_ni_locked. + * Otherwise, if the lpni is still not on + * the recovery queue, then we'll add it to the + * processed list. 
+ */ + if (list_empty(&lpni->lpni_recovery)) + list_add_tail(&lpni->lpni_recovery, + &processed_list); + else + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(0); + + spin_lock(&lpni->lpni_lock); + if (rc) + lpni->lpni_state &= + ~LNET_PEER_NI_RECOVERY_PENDING; + } + spin_unlock(&lpni->lpni_lock); + } + + list_splice_init(&processed_list, &local_queue); + lnet_net_lock(0); + list_splice(&local_queue, &the_lnet.ln_mt_peerNIRecovq); + lnet_net_unlock(0); +} + static int lnet_monitor_thread(void *arg) { @@ -2736,6 +2917,8 @@ struct lnet_ni * lnet_recover_local_nis(); + lnet_recover_peer_nis(); + /* TODO do we need to check if we should sleep without * timeout? Technically, an active system will always * have messages in flight so this check will always @@ -2822,10 +3005,61 @@ struct lnet_ni * } static void +lnet_handle_recovery_reply(struct lnet_mt_event_info *ev_info, + int status) +{ + lnet_nid_t nid = ev_info->mt_nid; + + if (ev_info->mt_type == MT_TYPE_LOCAL_NI) { + struct lnet_ni *ni; + + lnet_net_lock(0); + ni = lnet_nid2ni_locked(nid, 0); + if (!ni) { + lnet_net_unlock(0); + return; + } + lnet_ni_lock(ni); + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + lnet_ni_unlock(ni); + lnet_net_unlock(0); + + if (status != 0) { + CERROR("local NI recovery failed with %d\n", status); + return; + } + /* need to increment healthv for the ni here, because in + * the lnet_finalize() path we don't have access to this + * NI. And in order to get access to it, we'll need to + * carry forward too much information. 
+ * In the peer case, it'll naturally be incremented + */ + lnet_inc_healthv(&ni->ni_healthv); + } else { + struct lnet_peer_ni *lpni; + int cpt; + + cpt = lnet_net_lock_current(); + lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + lnet_net_unlock(cpt); + return; + } + spin_lock(&lpni->lpni_lock); + lpni->lpni_state &= ~LNET_PEER_NI_RECOVERY_PENDING; + spin_unlock(&lpni->lpni_lock); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(cpt); + + if (status != 0) + CERROR("peer NI recovery failed with %d\n", status); + } +} + +static void lnet_mt_event_handler(struct lnet_event *event) { - lnet_nid_t nid = (lnet_nid_t)event->md.user_ptr; - struct lnet_ni *ni; + struct lnet_mt_event_info *ev_info = event->md.user_ptr; struct lnet_ping_buffer *pbuf; /* TODO: remove assert */ @@ -2837,37 +3071,25 @@ struct lnet_ni * event->status); switch (event->type) { + case LNET_EVENT_UNLINK: + CDEBUG(D_NET, "%s recovery ping unlinked\n", + libcfs_nid2str(ev_info->mt_nid)); + /* fall-through */ case LNET_EVENT_REPLY: - /* If the NI has been restored completely then remove from - * the recovery queue - */ - lnet_net_lock(0); - ni = lnet_nid2ni_locked(nid, 0); - if (!ni) { - lnet_net_unlock(0); - break; - } - lnet_ni_lock(ni); - ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; - lnet_ni_unlock(ni); - lnet_net_unlock(0); + lnet_handle_recovery_reply(ev_info, event->status); break; case LNET_EVENT_SEND: CDEBUG(D_NET, "%s recovery message sent %s:%d\n", - libcfs_nid2str(nid), + libcfs_nid2str(ev_info->mt_nid), (event->status) ? 
"unsuccessfully" : "successfully", event->status); break; - case LNET_EVENT_UNLINK: - /* nothing to do */ - CDEBUG(D_NET, "%s recovery ping unlinked\n", - libcfs_nid2str(nid)); - break; default: CERROR("Unexpected event: %d\n", event->type); - return; + break; } if (event->unlinked) { + kfree(ev_info); pbuf = LNET_PING_INFO_TO_BUFFER(event->md.start); lnet_ping_buffer_decref(pbuf); } @@ -2919,14 +3141,16 @@ int lnet_monitor_thr_start(void) lnet_router_cleanup(); free_mem: the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; - lnet_clean_resendqs(); lnet_clean_local_ni_recoveryq(); + lnet_clean_peer_ni_recoveryq(); + lnet_clean_resendqs(); LNetEQFree(the_lnet.ln_mt_eqh); LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); return rc; clean_queues: - lnet_clean_resendqs(); lnet_clean_local_ni_recoveryq(); + lnet_clean_peer_ni_recoveryq(); + lnet_clean_resendqs(); return rc; } @@ -2949,8 +3173,9 @@ void lnet_monitor_thr_stop(void) /* perform cleanup tasks */ lnet_router_cleanup(); - lnet_clean_resendqs(); lnet_clean_local_ni_recoveryq(); + lnet_clean_peer_ni_recoveryq(); + lnet_clean_resendqs(); rc = LNetEQFree(the_lnet.ln_mt_eqh); LASSERT(rc == 0); } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index e7f7469..046923b 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -482,12 +482,6 @@ } } -static inline void -lnet_inc_healthv(atomic_t *healthv) -{ - atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE); -} - static void lnet_handle_local_failure(struct lnet_msg *msg) { @@ -524,6 +518,43 @@ lnet_net_unlock(0); } +static void +lnet_handle_remote_failure(struct lnet_msg *msg) +{ + struct lnet_peer_ni *lpni; + + lpni = msg->msg_txpeer; + + /* lpni could be NULL if we're in the LOLND case */ + if (!lpni) + return; + + lnet_net_lock(0); + /* the mt could've shutdown and cleaned up the queues */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { + lnet_net_unlock(0); + return; + } + + lnet_dec_healthv_locked(&lpni->lpni_healthv); + /* add the peer 
NI to the recovery queue if it's not already there + * and its health value is actually below the maximum. It's + * possible that the sensitivity might be set to 0, and the health + * value will not be reduced. In this case, there is no reason to + * invoke recovery + */ + if (list_empty(&lpni->lpni_recovery) && + atomic_read(&lpni->lpni_healthv) < LNET_MAX_HEALTH_VALUE) { + CERROR("lpni %s added to recovery queue. Health = %d\n", + libcfs_nid2str(lpni->lpni_nid), + atomic_read(&lpni->lpni_healthv)); + list_add_tail(&lpni->lpni_recovery, + &the_lnet.ln_mt_peerNIRecovq); + lnet_peer_ni_addref_locked(lpni); + } + lnet_net_unlock(0); +} + /* Do a health check on the message: * return -1 if we're not going to handle the error * success case will return -1 as well @@ -533,11 +564,20 @@ lnet_health_check(struct lnet_msg *msg) { enum lnet_msg_hstatus hstatus = msg->msg_health_status; + bool lo = false; /* TODO: lnet_incr_hstats(hstatus); */ LASSERT(msg->msg_txni); + /* if we're sending to the LOLND then the msg_txpeer will not be + * set. So no need to sanity check it. + */ + if (LNET_NETTYP(LNET_NIDNET(msg->msg_txni->ni_nid)) != LOLND) + LASSERT(msg->msg_txpeer); + else + lo = true; + if (hstatus != LNET_MSG_STATUS_OK && ktime_compare(ktime_get(), msg->msg_deadline) >= 0) return -1; @@ -546,9 +586,21 @@ if (the_lnet.ln_state != LNET_STATE_RUNNING) return -1; + CDEBUG(D_NET, "health check: %s->%s: %s: %s\n", + libcfs_nid2str(msg->msg_txni->ni_nid), + (lo) ? "self" : libcfs_nid2str(msg->msg_txpeer->lpni_nid), + lnet_msgtyp2str(msg->msg_type), + lnet_health_error2str(hstatus)); + switch (hstatus) { case LNET_MSG_STATUS_OK: lnet_inc_healthv(&msg->msg_txni->ni_healthv); + /* It's possible msg_txpeer is NULL in the LOLND + * case.
+ */ + if (msg->msg_txpeer) + lnet_inc_healthv(&msg->msg_txpeer->lpni_healthv); + /* we can finalize this message */ return -1; case LNET_MSG_STATUS_LOCAL_INTERRUPT: @@ -560,22 +612,27 @@ /* add to the re-send queue */ goto resend; - /* TODO: since the remote dropped the message we can - * attempt a resend safely. - */ - case LNET_MSG_STATUS_REMOTE_DROPPED: - break; - - /* These errors will not trigger a resend so simply - * finalize the message - */ + /* These errors will not trigger a resend so simply + * finalize the message + */ case LNET_MSG_STATUS_LOCAL_ERROR: lnet_handle_local_failure(msg); return -1; + + /* TODO: since the remote dropped the message we can + * attempt a resend safely. + */ + case LNET_MSG_STATUS_REMOTE_DROPPED: + lnet_handle_remote_failure(msg); + goto resend; + case LNET_MSG_STATUS_REMOTE_ERROR: case LNET_MSG_STATUS_REMOTE_TIMEOUT: case LNET_MSG_STATUS_NETWORK_TIMEOUT: + lnet_handle_remote_failure(msg); return -1; + default: + LBUG(); } resend: diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 121876e..4a62f9a 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -124,6 +124,7 @@ INIT_LIST_HEAD(&lpni->lpni_routes); INIT_LIST_HEAD(&lpni->lpni_hashlist); INIT_LIST_HEAD(&lpni->lpni_peer_nis); + INIT_LIST_HEAD(&lpni->lpni_recovery); INIT_LIST_HEAD(&lpni->lpni_on_remote_peer_ni_list); spin_lock_init(&lpni->lpni_lock); @@ -133,6 +134,7 @@ lpni->lpni_ping_feats = LNET_PING_FEAT_INVAL; lpni->lpni_nid = nid; lpni->lpni_cpt = cpt; + atomic_set(&lpni->lpni_healthv, LNET_MAX_HEALTH_VALUE); lnet_set_peer_ni_health_locked(lpni, true); net = lnet_get_net_locked(LNET_NIDNET(nid)); @@ -331,6 +333,13 @@ /* remove peer ni from the hash list. */ list_del_init(&lpni->lpni_hashlist); + /* indicate the peer is being deleted so the monitor thread can + * remove it from the recovery queue. 
+ */ + spin_lock(&lpni->lpni_lock); + lpni->lpni_state |= LNET_PEER_NI_DELETING; + spin_unlock(&lpni->lpni_lock); + /* decrement the ref count on the peer table */ ptable = the_lnet.ln_peer_tables[lpni->lpni_cpt]; LASSERT(atomic_read(&ptable->pt_number) > 0); From patchwork Thu Feb 27 21:09:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409789 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BB713159A for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A43D5246A1 for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A43D5246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 70E10348845; Thu, 27 Feb 2020 13:20:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6C43B21FA7D for ; Thu, 27 Feb 2020 13:18:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 37816EF3; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 35DA546F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: 
Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:11 -0500 Message-Id: <1582838290-17243-84-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 083/622] lnet: add retry count X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Added a module parameter to define the number of retries on a message. It defaults to 0, which means no retries will be attempted. Each message will keep track of the number of times it has been retransmitted. When queuing it on the resend queue, the retry count will be checked and if it's exceeded, then the message will be finalized. 
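The retry check described above can be modelled in user space roughly as follows. This is a sketch, not the kernel code: msg_model, retry_limit and try_requeue are hypothetical names standing in for struct lnet_msg, lnet_retry_count and the resend path of lnet_health_check(), and the counter is unsigned here only to keep the comparison free of sign conversions.

```c
#include <stdbool.h>

/* Stand-in for the lnet_retry_count module parameter (default 0). */
static unsigned int retry_limit;

/* Minimal model of the per-message fields the retry check touches. */
struct msg_model {
	unsigned int retry_count;	/* times this message was retransmitted */
	bool no_resend;			/* caller asked for no resend */
};

/*
 * Decide whether a failed message may be queued for another attempt.
 * Returns -1 if the message should be finalized, 0 if it is requeued.
 */
static int try_requeue(struct msg_model *msg)
{
	if (msg->no_resend)
		return -1;

	/* The budget is spent once the count reaches the limit. */
	if (msg->retry_count >= retry_limit)
		return -1;

	msg->retry_count++;
	return 0;
}
```

With the default limit of 0 the first failure already finalizes the message, which matches the "no retries" behaviour stated above.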
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 20e23980eae2 ("LU-9120 lnet: add retry count") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32769 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + include/linux/lnet/lib-types.h | 2 ++ net/lnet/lnet/api-ni.c | 5 +++++ net/lnet/lnet/lib-msg.c | 8 +++++++- 4 files changed, 15 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index b8ca114..ace0d51 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -478,6 +478,7 @@ struct lnet_ni * struct lnet_net *lnet_get_net_locked(u32 net_id); extern unsigned int lnet_transaction_timeout; +extern unsigned int lnet_retry_count; extern unsigned int lnet_numa_range; extern unsigned int lnet_health_sensitivity; extern unsigned int lnet_peer_discovery_disabled; diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 19b83a4..1108e3b 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -103,6 +103,8 @@ struct lnet_msg { enum lnet_msg_hstatus msg_health_status; /* This is a recovery message */ bool msg_recovery; + /* the number of times a transmission has been retried */ + int msg_retry_count; /* flag to indicate that we do not want to resend this message */ bool msg_no_resend; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 97d9be5..a54fe2c 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -116,6 +116,11 @@ struct lnet the_lnet = { MODULE_PARM_DESC(lnet_transaction_timeout, "Time in seconds to wait for a REPLY or an ACK"); +unsigned int lnet_retry_count; +module_param(lnet_retry_count, uint, 0444); +MODULE_PARM_DESC(lnet_retry_count, + "Maximum number of times to retry transmitting a message"); + /* * This sequence number keeps track of how many times DLC was used to * update the local NIs. 
It is incremented when a NI is added or diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 046923b..9841e14 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -556,7 +556,8 @@ } /* Do a health check on the message: - * return -1 if we're not going to handle the error + * return -1 if we're not going to handle the error or + * if we've reached the maximum number of retries. * success case will return -1 as well * return 0 if it the message is requeued for send */ @@ -646,6 +647,11 @@ if (msg->msg_no_resend) return -1; + /* check if the message has exceeded the number of retries */ + if (msg->msg_retry_count >= lnet_retry_count) + return -1; + msg->msg_retry_count++; + lnet_net_lock(msg->msg_tx_cpt); /* remove message from the active list and reset it in preparation From patchwork Thu Feb 27 21:09:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409827 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB225159A for ; Thu, 27 Feb 2020 21:23:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D3E88246A0 for ; Thu, 27 Feb 2020 21:23:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D3E88246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D41B43487E7; Thu, 27 Feb 2020 13:21:30 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C342921FA7D for ; Thu, 27 Feb 2020 13:18:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3A5E0EF4; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 38DDF46A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:12 -0500 Message-Id: <1582838290-17243-85-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 084/622] lnet: calculate the lnd timeout X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Calculate the LND timeout based on the transaction timeout and the retry count. Both of these are user defined values. Whenever they are set the lnd timeout is calculated. The LNDs use these timeouts instead of the LND timeout module parameter. Retry count can be set to 0, which means no retries. In that case the LND timeout will default to 5 seconds, which is the same as the default transaction timeout. 
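One way the calculation could look is sketched below in user-space C. The description above only names the inputs and the 5 second fallback, so the even split of the transaction timeout across retries and the one second floor are assumptions made for illustration, and calc_lnd_timeout is a hypothetical name, not a function from this patch.

```c
/* Matches the value of LNET_LND_DEFAULT_TIMEOUT in the patch (seconds). */
#define LND_DEFAULT_TIMEOUT 5

/*
 * ASSUMPTION: the per-attempt LND timeout is the transaction timeout
 * divided evenly by the retry count. Only the inputs and the fallback
 * for a retry count of 0 come from the patch description.
 */
static unsigned int calc_lnd_timeout(unsigned int transaction_timeout,
				     unsigned int retry_count)
{
	unsigned int t;

	/* No retries configured: fall back to the default. */
	if (retry_count == 0)
		return LND_DEFAULT_TIMEOUT;

	t = transaction_timeout / retry_count;

	/* Assumed floor of one second so a deadline never expires at once. */
	return t ? t : 1;
}
```

Under these assumptions a transaction timeout of 10 seconds with 2 retries gives each LND attempt 5 seconds.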
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 84f3af43c4bd ("LU-9120 lnet: calculate the lnd timeout") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32770 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 20 +++++++++++--------- net/lnet/klnds/socklnd/socklnd.c | 6 +++--- net/lnet/klnds/socklnd/socklnd_cb.c | 22 ++++++++++++---------- net/lnet/lnet/api-ni.c | 9 +++++++++ 5 files changed, 37 insertions(+), 22 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index ace0d51..5500e3f 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -85,6 +85,7 @@ extern struct kmem_cache *lnet_small_mds_cachep; /* <= LNET_SMALL_MD_SIZE bytes * MDs kmem_cache */ +#define LNET_LND_DEFAULT_TIMEOUT 5 static inline int lnet_is_route_alive(struct lnet_route *route) { @@ -676,6 +677,7 @@ void lnet_copy_kiov2iter(struct iov_iter *to, struct page *lnet_kvaddr_to_page(unsigned long vaddr); int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset); +unsigned int lnet_get_lnd_timeout(void); void lnet_register_lnd(struct lnet_lnd *lnd); void lnet_unregister_lnd(struct lnet_lnd *lnd); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 007058a..c6e8e73 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1205,7 +1205,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(!tx->tx_queued); /* not queued for sending already */ LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED); - timeout_ns = *kiblnd_tunables.kib_timeout * NSEC_PER_SEC; + timeout_ns = lnet_get_lnd_timeout() * NSEC_PER_SEC; tx->tx_queued = 1; tx->tx_deadline = ktime_add_ns(ktime_get(), timeout_ns); @@ -1333,14 +1333,14 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if 
(*kiblnd_tunables.kib_use_priv_port) { rc = kiblnd_resolve_addr(cmid, &srcaddr, &dstaddr, - *kiblnd_tunables.kib_timeout * 1000); + lnet_get_lnd_timeout() * 1000); } else { rc = rdma_resolve_addr(cmid, (struct sockaddr *)&srcaddr, (struct sockaddr *)&dstaddr, - *kiblnd_tunables.kib_timeout * 1000); + lnet_get_lnd_timeout() * 1000); } - if (rc) { + if (rc != 0) { /* Can't initiate address resolution: */ CERROR("Can't resolve addr for %s: %d\n", libcfs_nid2str(peer_ni->ibp_nid), rc); @@ -3097,8 +3097,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, event->status); rc = event->status; } else { - rc = rdma_resolve_route( - cmid, *kiblnd_tunables.kib_timeout * 1000); + rc = rdma_resolve_route(cmid, + lnet_get_lnd_timeout() * 1000); if (!rc) { struct kib_net *net = peer_ni->ibp_ni->ni_data; struct kib_dev *dev = net->ibn_dev; @@ -3499,6 +3499,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, const int n = 4; const int p = 1; int chunk = kiblnd_data.kib_peer_hash_size; + unsigned int lnd_timeout; spin_unlock_irqrestore(lock, flags); dropped_lock = 1; @@ -3512,9 +3513,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, * connection within (n+1)/n times the timeout * interval. 
*/ - if (*kiblnd_tunables.kib_timeout > n * p) - chunk = (chunk * n * p) / - *kiblnd_tunables.kib_timeout; + + lnd_timeout = lnet_get_lnd_timeout(); + if (lnd_timeout > n * p) + chunk = (chunk * n * p) / lnd_timeout; if (!chunk) chunk = 1; diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 03fa706..891d3bd 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1284,7 +1284,7 @@ struct ksock_peer * /* Set the deadline for the outgoing HELLO to drain */ conn->ksnc_tx_bufnob = sock->sk->sk_wmem_queued; conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); mb(); /* order with adding to peer_ni's conn list */ list_add(&conn->ksnc_list, &peer_ni->ksnp_conns); @@ -1674,7 +1674,7 @@ struct ksock_peer * switch (conn->ksnc_rx_state) { case SOCKNAL_RX_LNET_PAYLOAD: last_rcv = conn->ksnc_rx_deadline - - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); CERROR("Completing partial receive from %s[%d], ip %pI4h:%d, with error, wanted: %zd, left: %d, last alive is %lld secs ago\n", libcfs_id2str(conn->ksnc_peer->ksnp_id), conn->ksnc_type, &conn->ksnc_ipaddr, conn->ksnc_port, @@ -1849,7 +1849,7 @@ struct ksock_peer * if (bufnob < conn->ksnc_tx_bufnob) { /* something got ACKed */ conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); peer_ni->ksnp_last_alive = now; conn->ksnc_tx_bufnob = bufnob; } diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index d50e0d2..8bc23d2 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -222,7 +222,7 @@ struct ksock_tx * * something got ACKed */ conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); conn->ksnc_peer->ksnp_last_alive = ktime_get_seconds(); conn->ksnc_tx_bufnob = bufnob; mb(); @@ -268,7 +268,7 @@ struct ksock_tx * 
conn->ksnc_peer->ksnp_last_alive = ktime_get_seconds(); conn->ksnc_rx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); mb(); /* order with setting rx_started */ conn->ksnc_rx_started = 1; @@ -423,7 +423,7 @@ struct ksock_tx * /* ZC_REQ is going to be pinned to the peer_ni */ tx->tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); LASSERT(!tx->tx_msg.ksm_zc_cookies[0]); @@ -705,7 +705,7 @@ struct ksock_conn * if (list_empty(&conn->ksnc_tx_queue) && !bufnob) { /* First packet starts the timeout */ conn->ksnc_tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); if (conn->ksnc_tx_bufnob > 0) /* something got ACKed */ conn->ksnc_peer->ksnp_last_alive = ktime_get_seconds(); conn->ksnc_tx_bufnob = 0; @@ -881,7 +881,7 @@ struct ksock_route * ksocknal_find_connecting_route_locked(peer_ni)) { /* the message is going to be pinned to the peer_ni */ tx->tx_deadline = ktime_get_seconds() + - *ksocknal_tunables.ksnd_timeout; + lnet_get_lnd_timeout(); /* Queue the message until a connection is established */ list_add_tail(&tx->tx_list, &peer_ni->ksnp_tx_queue); @@ -1663,7 +1663,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) /* socket type set on active connections - not set on passive */ LASSERT(!active == !(conn->ksnc_type != SOCKLND_CONN_NONE)); - timeout = active ? *ksocknal_tunables.ksnd_timeout : + timeout = active ? 
lnet_get_lnd_timeout() : lnet_acceptor_timeout(); rc = lnet_sock_read(sock, &hello->kshm_magic, @@ -1801,7 +1801,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) int retry_later = 0; int rc = 0; - deadline = ktime_get_seconds() + *ksocknal_tunables.ksnd_timeout; + deadline = ktime_get_seconds() + lnet_get_lnd_timeout(); write_lock_bh(&ksocknal_data.ksnd_global_lock); @@ -2552,6 +2552,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) const int n = 4; const int p = 1; int chunk = ksocknal_data.ksnd_peer_hash_size; + unsigned int lnd_timeout; /* * Time to check for timeouts on a few more peers: I do @@ -2561,9 +2562,10 @@ void ksocknal_write_callback(struct ksock_conn *conn) * timeout on any connection within (n+1)/n times the * timeout interval. */ - if (*ksocknal_tunables.ksnd_timeout > n * p) - chunk = (chunk * n * p) / - *ksocknal_tunables.ksnd_timeout; + + lnd_timeout = lnet_get_lnd_timeout(); + if (lnd_timeout > n * p) + chunk = (chunk * n * p) / lnd_timeout; if (!chunk) chunk = 1; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index a54fe2c..e467d64 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -121,6 +121,8 @@ struct lnet the_lnet = { MODULE_PARM_DESC(lnet_retry_count, "Maximum number of times to retry transmitting a message"); +unsigned int lnet_lnd_timeout = LNET_LND_DEFAULT_TIMEOUT; + /* * This sequence number keeps track of how many times DLC was used to * update the local NIs. 
It is incremented when a NI is added or @@ -570,6 +572,13 @@ static void lnet_assert_wire_constants(void) return NULL; } +unsigned int +lnet_get_lnd_timeout(void) +{ + return lnet_lnd_timeout; +} +EXPORT_SYMBOL(lnet_get_lnd_timeout); + void lnet_register_lnd(struct lnet_lnd *lnd) { From patchwork Thu Feb 27 21:09:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409831
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:13 -0500 Message-Id: <1582838290-17243-86-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 085/622] lnet: sysfs functions for module params X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Allow transaction timeout and retry count module parameters to be set and shown via sysfs. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 5169827bf790 ("LU-9120 lnet: sysfs functions for module params") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32861 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 84 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 77 insertions(+), 7 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index e467d64..38e35bb 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -111,13 +111,27 @@ struct lnet the_lnet = { unsigned int lnet_transaction_timeout = 5; static int transaction_to_set(const char *val, const struct kernel_param *kp); -module_param_call(lnet_transaction_timeout, transaction_to_set, param_get_int, - &lnet_transaction_timeout, 0444); +static struct kernel_param_ops param_ops_transaction_timeout = { + .set = transaction_to_set, + .get = param_get_int, +}; + +#define param_check_transaction_timeout(name, p) \ + __param_check(name, p, int) +module_param(lnet_transaction_timeout, 
transaction_timeout, 0644); MODULE_PARM_DESC(lnet_transaction_timeout, - "Time in seconds to wait for a REPLY or an ACK"); + "Maximum number of seconds to wait for a peer response."); unsigned int lnet_retry_count; -module_param(lnet_retry_count, uint, 0444); +static int retry_count_set(const char *val, const struct kernel_param *kp); +static struct kernel_param_ops param_ops_retry_count = { + .set = retry_count_set, + .get = param_get_int, +}; + +#define param_check_retry_count(name, p) \ + __param_check(name, p, int) +module_param(lnet_retry_count, retry_count, 0644); MODULE_PARM_DESC(lnet_retry_count, "Maximum number of times to retry transmitting a message"); @@ -241,10 +255,15 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (value == 0) { + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + if (value < lnet_retry_count || value == 0) { mutex_unlock(&the_lnet.ln_api_mutex); - CERROR("Invalid value for lnet_transaction_timeout (%lu).\n", - value); + CERROR("Invalid value for lnet_transaction_timeout (%lu). Has to be greater than lnet_retry_count (%u)\n", + value, lnet_retry_count); return -EINVAL; } @@ -254,6 +273,57 @@ static int lnet_discover(struct lnet_process_id id, u32 force, } *transaction_to = value; + if (lnet_retry_count == 0) + lnet_lnd_timeout = value; + else + lnet_lnd_timeout = value / lnet_retry_count; + + mutex_unlock(&the_lnet.ln_api_mutex); + + return 0; +} + +static int +retry_count_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned int *retry_count = (unsigned int *)kp->arg; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_retry_count'\n"); + return rc; + } + + /* The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. 
+ */ + mutex_lock(&the_lnet.ln_api_mutex); + + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + if (value > lnet_transaction_timeout) { + mutex_unlock(&the_lnet.ln_api_mutex); + CERROR("Invalid value for lnet_retry_count (%lu). Has to be smaller than lnet_transaction_timeout (%u)\n", + value, lnet_transaction_timeout); + return -EINVAL; + } + + if (value == *retry_count) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + *retry_count = value; + + if (value == 0) + lnet_lnd_timeout = lnet_transaction_timeout; + else + lnet_lnd_timeout = lnet_transaction_timeout / value; mutex_unlock(&the_lnet.ln_api_mutex); From patchwork Thu Feb 27 21:09:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 221D3159A for ; Thu, 27 Feb 2020 21:22:37 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0AE60246A0 for ; Thu, 27 Feb 2020 21:22:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0AE60246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 16EED21FF2D; Thu, 27 Feb 2020 13:21:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F80421FA53 for ; Thu, 27 Feb 2020 13:18:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3FEF2EF6; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3EBB446D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:14 -0500 Message-Id: <1582838290-17243-87-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 086/622] lnet: timeout delayed REPLYs and ACKs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Amir Shehata

When a GET, or a PUT that requires an ACK, is sent, add a response tracker block to a percpt queue. When the REPLY or ACK is received, remove the block from the percpt queue. The monitor thread wakes up periodically to check whether any of the blocks have expired; if so, it sends a timeout event to the ULP, flags the MD as stale, and unlinks it.

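The tracking scheme described above can be sketched in plain userspace C. This is an illustration only, not the code in this patch: `struct tracker`, `tracker_attach()`, `tracker_detach()` and `tracker_expire()` are hypothetical names, there is a single queue instead of one per CPT, there is no locking, and time is a plain integer in seconds. A boolean stands in for invalidating the MD handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct lnet_rsp_tracker. */
struct tracker {
	struct tracker *next;	/* singly linked FIFO, oldest entry first */
	int64_t deadline;	/* absolute expiry time, in seconds */
	bool handle_valid;	/* cleared once the REPLY/ACK has arrived */
};

/* Hypothetical single queue; the patch keeps one queue per CPT. */
struct tracker_queue {
	struct tracker *head;
	struct tracker *tail;
};

/* Send path: append at the tail so that older entries expire first. */
static struct tracker *
tracker_attach(struct tracker_queue *q, int64_t now, int64_t timeout)
{
	struct tracker *t;

	assert(timeout > 0);
	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;

	t->deadline = now + timeout;
	t->handle_valid = true;
	if (q->tail)
		q->tail->next = t;
	else
		q->head = t;
	q->tail = t;
	return t;
}

/*
 * Receive path: only mark the tracker as answered; the monitor pass
 * frees it later. The caller must not detach a tracker that the
 * monitor pass has already freed (the patch prevents this by clearing
 * md_rspt_ptr; the sketch leaves it to the caller).
 */
static void
tracker_detach(struct tracker *t)
{
	t->handle_valid = false;
}

/*
 * Monitor pass: walk from the head, free answered trackers, time out
 * expired ones, and stop at the first tracker that is still pending.
 * With force set, every remaining tracker is timed out.
 * Returns the number of trackers that timed out.
 */
static int
tracker_expire(struct tracker_queue *q, int64_t now, bool force)
{
	int timed_out = 0;

	while (q->head) {
		struct tracker *t = q->head;

		if (t->handle_valid && !force && now < t->deadline)
			break;	/* everything behind this entry is newer */

		if (t->handle_valid)
			timed_out++;	/* the patch unlinks the MD here */

		q->head = t->next;
		if (!q->head)
			q->tail = NULL;
		free(t);
	}
	return timed_out;
}
```

Appending at the tail keeps the queue ordered by deadline as long as the timeout does not change, which is why the pass can stop at the first unexpired entry. A side effect, also mentioned in the patch comments, is that answered trackers queued behind an unexpired one linger until a later pass.
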
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: a57fa1176e74 ("LU-9120 lnet: timeout delayed REPLYs and ACKs") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32771 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 20 ++++ include/linux/lnet/lib-types.h | 20 ++++ net/lnet/lnet/lib-move.c | 210 ++++++++++++++++++++++++++++++++++++++++- net/lnet/lnet/lib-msg.c | 9 ++ 4 files changed, 258 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 5500e3f..c2191e5 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -438,6 +438,25 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, lnet_net_unlock(0); } +static inline struct lnet_rsp_tracker * +lnet_rspt_alloc(int cpt) +{ + struct lnet_rsp_tracker *rspt; + + rspt = kzalloc(sizeof(*rspt), GFP_NOFS); + lnet_net_lock(cpt); + lnet_net_unlock(cpt); + return rspt; +} + +static inline void +lnet_rspt_free(struct lnet_rsp_tracker *rspt, int cpt) +{ + kfree(rspt); + lnet_net_lock(cpt); + lnet_net_unlock(cpt); +} + void lnet_ni_free(struct lnet_ni *ni); void lnet_net_free(struct lnet_net *net); @@ -614,6 +633,7 @@ struct lnet_msg *lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *get_msg); void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg, unsigned int len); +void lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt); void lnet_finalize(struct lnet_msg *msg, int rc); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 1108e3b..d815a87 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -75,6 +75,17 @@ enum lnet_msg_hstatus { LNET_MSG_STATUS_NETWORK_TIMEOUT }; +struct lnet_rsp_tracker { + /* chain on the waiting list */ + struct list_head rspt_on_list; + /* cpt to lock */ + int rspt_cpt; + /* deadline of the REPLY/ACK */ + ktime_t 
rspt_deadline; + /* parent MD */ + struct lnet_handle_md rspt_mdh; +}; + struct lnet_msg { struct list_head msg_activelist; struct list_head msg_list; /* Q for credits/MD */ @@ -201,6 +212,7 @@ struct lnet_libmd { unsigned int md_flags; unsigned int md_niov; /* # frags at end of struct */ void *md_user_ptr; + struct lnet_rsp_tracker *md_rspt_ptr; struct lnet_eq *md_eq; struct lnet_handle_md md_bulk_handle; union { @@ -1102,6 +1114,14 @@ struct lnet { struct list_head ln_mt_localNIRecovq; /* local NIs to recover */ struct list_head ln_mt_peerNIRecovq; + /* + * An array of queues for GET/PUT waiting for REPLY/ACK respectively. + * There are CPT number of queues. Since response trackers will be + * added on the fast path we can't afford to grab the exclusive + * net lock to protect these queues. The CPT will be calculated + * based on the mdh cookie. + */ + struct list_head **ln_mt_rstq; /* recovery eq handler */ struct lnet_handle_eq ln_mt_eqh; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 5224490..55cbf57 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2418,6 +2418,110 @@ struct lnet_mt_event_info { lnet_nid_t mt_nid; }; +void +lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt) +{ + struct lnet_rsp_tracker *rspt; + + /* msg has a refcount on the MD so the MD is not going away. + * The rspt queue for the cpt is protected by + * the lnet_net_lock(cpt). cpt is the cpt of the MD cookie. + */ + lnet_res_lock(cpt); + if (!md->md_rspt_ptr) { + lnet_res_unlock(cpt); + return; + } + rspt = md->md_rspt_ptr; + md->md_rspt_ptr = NULL; + + /* debug code */ + LASSERT(rspt->rspt_cpt == cpt); + + /* invalidate the handle to indicate that a response has been + * received, which will then lead the monitor thread to clean up + * the rspt block. 
+ */ + LNetInvalidateMDHandle(&rspt->rspt_mdh); + lnet_res_unlock(cpt); +} + +static void +lnet_finalize_expired_responses(bool force) +{ + struct lnet_libmd *md; + struct list_head local_queue; + struct lnet_rsp_tracker *rspt, *tmp; + int i; + + if (!the_lnet.ln_mt_rstq) + return; + + cfs_cpt_for_each(i, lnet_cpt_table()) { + INIT_LIST_HEAD(&local_queue); + + lnet_net_lock(i); + if (!the_lnet.ln_mt_rstq[i]) { + lnet_net_unlock(i); + continue; + } + list_splice_init(the_lnet.ln_mt_rstq[i], &local_queue); + lnet_net_unlock(i); + + list_for_each_entry_safe(rspt, tmp, &local_queue, + rspt_on_list) { + /* The rspt mdh will be invalidated when a response + * is received or whenever we want to discard the + * block the monitor thread will walk the queue + * and clean up any rsts with an invalid mdh. + * The monitor thread will walk the queue until + * the first unexpired rspt block. This means that + * some rspt blocks which received their + * corresponding responses will linger in the + * queue until they are cleaned up eventually. 
+ */ + lnet_res_lock(i); + if (LNetMDHandleIsInvalid(rspt->rspt_mdh)) { + lnet_res_unlock(i); + list_del_init(&rspt->rspt_on_list); + lnet_rspt_free(rspt, i); + continue; + } + + if (ktime_compare(ktime_get(), + rspt->rspt_deadline) >= 0 || + force) { + md = lnet_handle2md(&rspt->rspt_mdh); + if (!md) { + LNetInvalidateMDHandle(&rspt->rspt_mdh); + lnet_res_unlock(i); + list_del_init(&rspt->rspt_on_list); + lnet_rspt_free(rspt, i); + continue; + } + LASSERT(md->md_rspt_ptr == rspt); + md->md_rspt_ptr = NULL; + lnet_res_unlock(i); + + list_del_init(&rspt->rspt_on_list); + + CDEBUG(D_NET, + "Response timed out: md = %p\n", md); + LNetMDUnlink(rspt->rspt_mdh); + lnet_rspt_free(rspt, i); + } else { + lnet_res_unlock(i); + break; + } + } + + lnet_net_lock(i); + if (!list_empty(&local_queue)) + list_splice(&local_queue, the_lnet.ln_mt_rstq[i]); + lnet_net_unlock(i); + } +} + static void lnet_resend_pending_msgs_locked(struct list_head *resendq, int cpt) { @@ -2900,6 +3004,8 @@ struct lnet_mt_event_info { static int lnet_monitor_thread(void *arg) { + int wakeup_counter = 0; + /* The monitor thread takes care of the following: * 1. Checks the aliveness of routers * 2. 
Checks if there are messages on the resend queue to resend @@ -2915,6 +3021,12 @@ struct lnet_mt_event_info { lnet_resend_pending_msgs(); + wakeup_counter++; + if (wakeup_counter >= lnet_transaction_timeout / 2) { + lnet_finalize_expired_responses(false); + wakeup_counter = 0; + } + lnet_recover_local_nis(); lnet_recover_peer_nis(); @@ -3095,6 +3207,29 @@ struct lnet_mt_event_info { } } +static int +lnet_rsp_tracker_create(void) +{ + struct list_head **rstqs; + + rstqs = lnet_create_array_of_queues(); + if (!rstqs) + return -ENOMEM; + + the_lnet.ln_mt_rstq = rstqs; + + return 0; +} + +static void +lnet_rsp_tracker_clean(void) +{ + lnet_finalize_expired_responses(true); + + cfs_percpt_free(the_lnet.ln_mt_rstq); + the_lnet.ln_mt_rstq = NULL; +} + int lnet_monitor_thr_start(void) { int rc = 0; @@ -3107,6 +3242,10 @@ int lnet_monitor_thr_start(void) if (rc) return rc; + rc = lnet_rsp_tracker_create(); + if (rc) + goto clean_queues; + rc = LNetEQAlloc(0, lnet_mt_event_handler, &the_lnet.ln_mt_eqh); if (rc != 0) { CERROR("Can't allocate monitor thread EQ: %d\n", rc); @@ -3141,6 +3280,7 @@ int lnet_monitor_thr_start(void) lnet_router_cleanup(); free_mem: the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); @@ -3148,6 +3288,7 @@ int lnet_monitor_thr_start(void) LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); return rc; clean_queues: + lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); @@ -3173,6 +3314,7 @@ void lnet_monitor_thr_stop(void) /* perform cleanup tasks */ lnet_router_cleanup(); + lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); @@ -3917,6 +4059,41 @@ void lnet_monitor_thr_stop(void) } } +static void +lnet_attach_rsp_tracker(struct lnet_rsp_tracker *rspt, int cpt, + struct lnet_libmd *md, struct lnet_handle_md mdh) +{ + s64 
timeout_ns; + + /* MD has a refcount taken by message so it's not going away. + * The MD however can be looked up. We need to secure the access + * to the md_rspt_ptr by taking the res_lock. + * The rspt can be accessed without protection up to when it gets + * added to the list. + */ + + /* debug code */ + LASSERT(!md->md_rspt_ptr); + + /* we'll use that same event in case we never get a response */ + rspt->rspt_mdh = mdh; + rspt->rspt_cpt = cpt; + timeout_ns = lnet_transaction_timeout * NSEC_PER_SEC; + rspt->rspt_deadline = ktime_add_ns(ktime_get(), timeout_ns); + + lnet_res_lock(cpt); + /* store the rspt so we can access it when we get the REPLY */ + md->md_rspt_ptr = rspt; + lnet_res_unlock(cpt); + + /* add to the list of tracked responses. It's added to tail of the + * list in order to expire all the older entries first. + */ + lnet_net_lock(cpt); + list_add_tail(&rspt->rspt_on_list, the_lnet.ln_mt_rstq[cpt]); + lnet_net_unlock(cpt); +} + /** * Initiate an asynchronous PUT operation. 
* @@ -3968,6 +4145,7 @@ void lnet_monitor_thr_stop(void) u64 match_bits, unsigned int offset, u64 hdr_data) { + struct lnet_rsp_tracker *rspt = NULL; struct lnet_msg *msg; struct lnet_libmd *md; int cpt; @@ -3991,6 +4169,17 @@ void lnet_monitor_thr_stop(void) msg->msg_vmflush = !!(current->flags & PF_MEMALLOC); cpt = lnet_cpt_of_cookie(mdh.cookie); + + if (ack == LNET_ACK_REQ) { + rspt = lnet_rspt_alloc(cpt); + if (!rspt) { + CERROR("Dropping PUT to %s: ENOMEM on response tracker\n", + libcfs_id2str(target)); + return -ENOMEM; + } + INIT_LIST_HEAD(&rspt->rspt_on_list); + } + lnet_res_lock(cpt); md = lnet_handle2md(&mdh); @@ -4003,6 +4192,7 @@ void lnet_monitor_thr_stop(void) md->md_me->me_portal); lnet_res_unlock(cpt); + kfree(rspt); kfree(msg); return -ENOENT; } @@ -4035,11 +4225,15 @@ void lnet_monitor_thr_stop(void) lnet_build_msg_event(msg, LNET_EVENT_SEND); + if (ack == LNET_ACK_REQ) + lnet_attach_rsp_tracker(rspt, cpt, md, mdh); + rc = lnet_send(self, msg, LNET_NID_ANY); if (rc) { CNETERR("Error sending PUT to %s: %d\n", libcfs_id2str(target), rc); msg->msg_no_resend = true; + lnet_detach_rsp_tracker(msg->msg_md, cpt); lnet_finalize(msg, rc); } @@ -4180,6 +4374,7 @@ struct lnet_msg * struct lnet_process_id target, unsigned int portal, u64 match_bits, unsigned int offset, bool recovery) { + struct lnet_rsp_tracker *rspt; struct lnet_msg *msg; struct lnet_libmd *md; int cpt; @@ -4201,9 +4396,18 @@ struct lnet_msg * return -ENOMEM; } + cpt = lnet_cpt_of_cookie(mdh.cookie); + + rspt = lnet_rspt_alloc(cpt); + if (!rspt) { + CERROR("Dropping GET to %s: ENOMEM on response tracker\n", + libcfs_id2str(target)); + return -ENOMEM; + } + INIT_LIST_HEAD(&rspt->rspt_on_list); + msg->msg_recovery = recovery; - cpt = lnet_cpt_of_cookie(mdh.cookie); lnet_res_lock(cpt); md = lnet_handle2md(&mdh); @@ -4218,6 +4422,7 @@ struct lnet_msg * lnet_res_unlock(cpt); kfree(msg); + kfree(rspt); return -ENOENT; } @@ -4242,11 +4447,14 @@ struct lnet_msg * lnet_build_msg_event(msg, 
LNET_EVENT_SEND); + lnet_attach_rsp_tracker(rspt, cpt, md, mdh); + rc = lnet_send(self, msg, LNET_NID_ANY); if (rc < 0) { CNETERR("Error sending GET to %s: %d\n", libcfs_id2str(target), rc); msg->msg_no_resend = true; + lnet_detach_rsp_tracker(msg->msg_md, cpt); lnet_finalize(msg, rc); } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 9841e14..5046648 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -777,6 +777,15 @@ msg->msg_ev.status = status; + /* if this is an ACK or a REPLY then make sure to remove the + * response tracker. + */ + if (msg->msg_ev.type == LNET_EVENT_REPLY || + msg->msg_ev.type == LNET_EVENT_ACK) { + cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); + lnet_detach_rsp_tracker(msg->msg_md, cpt); + } + /* if the message is successfully sent, no need to keep the MD around */ if (msg->msg_md && !status) lnet_detach_md(msg, status); From patchwork Thu Feb 27 21:09:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409835 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4351138D for ; Thu, 27 Feb 2020 21:23:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BCF59246A0 for ; Thu, 27 Feb 2020 21:23:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BCF59246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
0786221FE17; Thu, 27 Feb 2020 13:21:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D9D0B21FA5D for ; Thu, 27 Feb 2020 13:18:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 42A8EEF7; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4191D468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:15 -0500 Message-Id: <1582838290-17243-88-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 087/622] lnet: remove duplicate timeout mechanism X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Remove the duplicate GET/PUT timeout mechanism currently implemented for discovery, as it has been replaced by a more generic timeout mechanism for all GET/PUT messages. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 0b1947d14188 ("LU-9120 lnet: remove duplicate timeout mechanism") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32992 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 39 --------------------------------------- 1 file changed, 39 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 4a62f9a..ca9b90b 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2925,25 +2925,6 @@ static int lnet_peer_rediscover(struct lnet_peer *lp) } /* - * Returns the first peer on the ln_dc_working queue if its timeout - * has expired. Takes the current time as an argument so as to not - * obsessively re-check the clock. The oldest discovery request will - * be at the head of the queue. - */ -static struct lnet_peer *lnet_peer_get_dc_timed_out(time64_t now) -{ - struct lnet_peer *lp; - - if (list_empty(&the_lnet.ln_dc_working)) - return NULL; - lp = list_first_entry(&the_lnet.ln_dc_working, - struct lnet_peer, lp_dc_list); - if (now < lp->lp_last_queued + lnet_transaction_timeout) - return NULL; - return lp; -} - -/* * Discovering this peer is taking too long. Cancel any Ping or Push * that discovery is waiting on by unlinking the relevant MDs. The * lnet_discovery_event_handler() will proceed from here and complete @@ -2998,8 +2979,6 @@ static int lnet_peer_discovery_wait_for_work(void) break; if (!list_empty(&the_lnet.ln_msg_resend)) break; - if (lnet_peer_get_dc_timed_out(ktime_get_real_seconds())) - break; lnet_net_unlock(cpt); /* @@ -3068,7 +3047,6 @@ static void lnet_resend_msgs(void) static int lnet_peer_discovery(void *arg) { struct lnet_peer *lp; - time64_t now; int rc; CDEBUG(D_NET, "started\n"); @@ -3159,23 +3137,6 @@ static int lnet_peer_discovery(void *arg) break; } - /* - * Now that the ln_dc_request queue has been emptied - * check the ln_dc_working queue for peers that are - * taking too long. 
Move all that are found to the - * ln_dc_expired queue and time out any pending - * Ping or Push. We have to drop the lnet_net_lock - * in the loop because lnet_peer_cancel_discovery() - * calls LNetMDUnlink(). - */ - now = ktime_get_real_seconds(); - while ((lp = lnet_peer_get_dc_timed_out(now)) != NULL) { - list_move(&lp->lp_dc_list, &the_lnet.ln_dc_expired); - lnet_net_unlock(LNET_LOCK_EX); - lnet_peer_cancel_discovery(lp); - lnet_net_lock(LNET_LOCK_EX); - } - lnet_net_unlock(LNET_LOCK_EX); } From patchwork Thu Feb 27 21:09:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409843 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BA2F138D for ; Thu, 27 Feb 2020 21:23:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 541BF246A0 for ; Thu, 27 Feb 2020 21:23:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 541BF246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 799B9348B06; Thu, 27 Feb 2020 13:21:47 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 27E3621FA9F for ; Thu, 27 Feb 2020 13:18:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 45ED4EF8; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 447A346F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:16 -0500 Message-Id: <1582838290-17243-89-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 088/622] lnet: handle fatal device error X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Amir Shehata

The o2iblnd can receive device status events in the QP event handler. This patch handles three of them:

  IB_EVENT_DEVICE_FATAL
  IB_EVENT_PORT_ERR
  IB_EVENT_PORT_ACTIVE

For DEVICE_FATAL and PORT_ERR, the NI associated with the QP is put into fatal error mode and is no longer selected when sending messages. When PORT_ACTIVE is received, the fatal error is cleared on the NI associated with the QP, and future messages can use that NI again.

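The effect of the flag on NI selection can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch itself: `struct ni`, `enum port_event`, `ni_handle_event()` and `ni_select()` are hypothetical names, C11 `atomic_int` stands in for the kernel's `atomic_t`, and selection looks only at the health value, whereas the real selection code also weighs distance, credits and round-robin order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical event codes standing in for the IB_EVENT_* values. */
enum port_event {
	EV_PORT_ERR,
	EV_DEVICE_FATAL,
	EV_PORT_ACTIVE,
	EV_OTHER,
};

/* Hypothetical, pared-down network interface. */
struct ni {
	atomic_int fatal_error_on;	/* 1 while the device is unusable */
	int healthv;			/* higher means healthier */
};

/* Event handler: set the flag on fatal events, clear it on recovery. */
static void
ni_handle_event(struct ni *ni, enum port_event ev)
{
	switch (ev) {
	case EV_PORT_ERR:
	case EV_DEVICE_FATAL:
		atomic_store(&ni->fatal_error_on, 1);
		break;
	case EV_PORT_ACTIVE:
		atomic_store(&ni->fatal_error_on, 0);
		break;
	default:
		break;
	}
}

/*
 * Selection: skip every NI whose fatal flag is set, then prefer the
 * highest health value. Returns NULL if no NI is usable.
 */
static struct ni *
ni_select(struct ni *nis, size_t count)
{
	struct ni *best = NULL;
	size_t i;

	assert(nis != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (atomic_load(&nis[i].fatal_error_on))
			continue;
		if (!best || nis[i].healthv > best->healthv)
			best = &nis[i];
	}
	return best;
}
```

Because the flag is checked before any other criterion, an NI in fatal error mode is never chosen regardless of its health value, and it becomes eligible again as soon as the flag is cleared.
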
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 6b1571209a99 ("LU-9120 lnet: handle fatal device error") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32772 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 7 +++++++ net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 13 +++++++++++++ net/lnet/lnet/lib-move.c | 6 +++++- 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index d815a87..2b3e76a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -443,6 +443,13 @@ struct lnet_ni { atomic_t ni_healthv; /* + * Set to 1 by the LND when it receives an event telling it the device + * has gone into a fatal state. Set to 0 when the LND receives an + * even telling it the device is back online. + */ + atomic_t ni_fatal_error_on; + + /* * equivalent interfaces to use * This is an array because socklnd bonding can still be configured */ diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index c6e8e73..293a859 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -3567,6 +3567,19 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, rdma_notify(conn->ibc_cmid, IB_EVENT_COMM_EST); return; + case IB_EVENT_PORT_ERR: + case IB_EVENT_DEVICE_FATAL: + CERROR("Fatal device error for NI %s\n", + libcfs_nid2str(conn->ibc_peer->ibp_ni->ni_nid)); + atomic_set(&conn->ibc_peer->ibp_ni->ni_fatal_error_on, 1); + return; + + case IB_EVENT_PORT_ACTIVE: + CERROR("Port reactivated for NI %s\n", + libcfs_nid2str(conn->ibc_peer->ibp_ni->ni_nid)); + atomic_set(&conn->ibc_peer->ibp_ni->ni_fatal_error_on, 0); + return; + default: CERROR("%s: Async QP event type %d\n", libcfs_nid2str(conn->ibc_peer->ibp_nid), event->event); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 55cbf57..8d5f1e5 100644 --- 
a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1303,9 +1303,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, unsigned int distance; int ni_credits; int ni_healthv; + int ni_fatal; ni_credits = atomic_read(&ni->ni_tx_credits); ni_healthv = atomic_read(&ni->ni_healthv); + ni_fatal = atomic_read(&ni->ni_fatal_error_on); /* * calculate the distance from the CPT on which @@ -1334,7 +1336,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * Select on health, shorter distance, available * credits, then round-robin. */ - if (ni_healthv < best_healthv) { + if (ni_fatal) { + continue; + } else if (ni_healthv < best_healthv) { continue; } else if (ni_healthv > best_healthv) { best_healthv = ni_healthv; From patchwork Thu Feb 27 21:09:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409837 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B71A6138D for ; Thu, 27 Feb 2020 21:23:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9F991246A0 for ; Thu, 27 Feb 2020 21:23:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9F991246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88500348AD7; Thu, 27 Feb 2020 13:21:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7E27421FACC for ; Thu, 27 Feb 2020 13:18:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 48C73EF9; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 474DA46A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:17 -0500 Message-Id: <1582838290-17243-90-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 089/622] lnet: reset health value X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Added an IOCTL to set the local or peer ni health value. This would be useful in debugging where we can test the selection algorithm and recovery mechanism by reducing the health of an interface. If the value specified is -1 then reset the health value to maximum. This is useful to reset the system once a network issue has been resolved. There would be no need to wait for the interface to go to fully healthy on its own. It might be desirable to shortcut the process. 
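A minimal sketch of the value handling, assuming a placeholder maximum: the IOC_LIBCFS_SET_HEALHV handler in this patch maps any negative or too-large request (not only -1) to LNET_MAX_HEALTH_VALUE, and an NI is queued for recovery only when its health is below the maximum and it is not already queued. The helper names and SKETCH_MAX_HEALTHV below are hypothetical and exist only for illustration; locking and the recovery queues themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder for LNET_MAX_HEALTH_VALUE; the value here is an assumption. */
#define SKETCH_MAX_HEALTHV 1000

/* Map a user-requested health value onto the valid range.
 * Out-of-range requests (including -1) mean "reset to maximum".
 */
static int sketch_normalize_healthv(int requested)
{
	if (requested < 0 || requested > SKETCH_MAX_HEALTHV)
		return SKETCH_MAX_HEALTHV;
	return requested;
}

/* An NI needs to be queued for recovery only when its health is below
 * the maximum and it is not already on the recovery queue.
 */
static bool sketch_needs_recovery(int healthv, bool already_queued)
{
	assert(healthv >= 0 && healthv <= SKETCH_MAX_HEALTHV);
	return !already_queued && healthv < SKETCH_MAX_HEALTHV;
}
```

For example, a request of -1 normalizes to the maximum and therefore never triggers recovery, while a request of 500 is kept as-is and queues the NI if it is not queued yet.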
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 2f5a6d1233ac ("LU-9120 lnet: reset health value") Lustre-commit: b04c35874dca ("LU-11283 lnet: fix setting health value manually") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32773 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ include/uapi/linux/lnet/libcfs_ioctl.h | 3 +- include/uapi/linux/lnet/lnet-dlc.h | 14 ++++++++ net/lnet/lnet/api-ni.c | 51 +++++++++++++++++++++++++++ net/lnet/lnet/lib-msg.c | 16 +-------- net/lnet/lnet/peer.c | 64 ++++++++++++++++++++++++++++++++++ 6 files changed, 134 insertions(+), 16 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index c2191e5..bd6ea90 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -524,6 +524,8 @@ struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *lnet_get_ni_idx_locked(int idx); int lnet_get_peer_list(u32 *countp, u32 *sizep, struct lnet_process_id __user *ids); +extern void lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all); +extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni); void lnet_router_debugfs_init(void); void lnet_router_debugfs_fini(void); diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 4396d26..458a634 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -148,6 +148,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_NUMA_RANGE _IOWR(IOC_LIBCFS_TYPE, 99, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_PEER_LIST _IOWR(IOC_LIBCFS_TYPE, 100, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS _IOWR(IOC_LIBCFS_TYPE, 101, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 101 +#define IOC_LIBCFS_SET_HEALHV _IOWR(IOC_LIBCFS_TYPE, 102, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 102 #endif /* 
__LIBCFS_IOCTL_H__ */ diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 484435d..2d3aad8 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -230,6 +230,20 @@ struct lnet_ioctl_peer_cfg { void __user *prcfg_bulk; }; + +enum lnet_health_type { + LNET_HEALTH_TYPE_LOCAL_NI = 0, + LNET_HEALTH_TYPE_PEER_NI, +}; + +struct lnet_ioctl_reset_health_cfg { + struct libcfs_ioctl_hdr rh_hdr; + enum lnet_health_type rh_type; + bool rh_all; + int rh_value; + lnet_nid_t rh_nid; +}; + struct lnet_ioctl_set_value { struct libcfs_ioctl_hdr sv_hdr; __u32 sv_value; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 38e35bb..0cadb2a 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3163,6 +3163,35 @@ u32 lnet_get_dlc_seq_locked(void) return atomic_read(&lnet_dlc_seq_no); } +static void +lnet_ni_set_healthv(lnet_nid_t nid, int value, bool all) +{ + struct lnet_net *net; + struct lnet_ni *ni; + + lnet_net_lock(LNET_LOCK_EX); + list_for_each_entry(net, &the_lnet.ln_nets, net_list) { + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { + if (ni->ni_nid == nid || all) { + atomic_set(&ni->ni_healthv, value); + if (list_empty(&ni->ni_recovery) && + value < LNET_MAX_HEALTH_VALUE) { + CERROR("manually adding local NI %s to recovery\n", + libcfs_nid2str(ni->ni_nid)); + list_add_tail(&ni->ni_recovery, + &the_lnet.ln_mt_localNIRecovq); + lnet_ni_addref_locked(ni, 0); + } + if (!all) { + lnet_net_unlock(LNET_LOCK_EX); + return; + } + } + } + } + lnet_net_unlock(LNET_LOCK_EX); +} + /** * LNet ioctl handler. 
* @@ -3446,6 +3475,28 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_SET_HEALHV: { + struct lnet_ioctl_reset_health_cfg *cfg = arg; + int value; + + if (cfg->rh_hdr.ioc_len < sizeof(*cfg)) + return -EINVAL; + if (cfg->rh_value < 0 || + cfg->rh_value > LNET_MAX_HEALTH_VALUE) + value = LNET_MAX_HEALTH_VALUE; + else + value = cfg->rh_value; + mutex_lock(&the_lnet.ln_api_mutex); + if (cfg->rh_type == LNET_HEALTH_TYPE_LOCAL_NI) + lnet_ni_set_healthv(cfg->rh_nid, value, + cfg->rh_all); + else + lnet_peer_ni_set_healthv(cfg->rh_nid, value, + cfg->rh_all); + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + case IOC_LIBCFS_NOTIFY_ROUTER: { time64_t deadline = ktime_get_real_seconds() - data->ioc_u64[0]; diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 5046648..32d49e9 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -530,12 +530,6 @@ return; lnet_net_lock(0); - /* the mt could've shutdown and cleaned up the queues */ - if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { - lnet_net_unlock(0); - return; - } - lnet_dec_healthv_locked(&lpni->lpni_healthv); /* add the peer NI to the recovery queue if it's not already there * and it's health value is actually below the maximum. It's @@ -543,15 +537,7 @@ * value will not be reduced. In this case, there is no reason to * invoke recovery */ - if (list_empty(&lpni->lpni_recovery) && - atomic_read(&lpni->lpni_healthv) < LNET_MAX_HEALTH_VALUE) { - CERROR("lpni %s added to recovery queue. 
Health = %d\n", - libcfs_nid2str(lpni->lpni_nid), - atomic_read(&lpni->lpni_healthv)); - list_add_tail(&lpni->lpni_recovery, - &the_lnet.ln_mt_peerNIRecovq); - lnet_peer_ni_addref_locked(lpni); - } + lnet_peer_ni_add_to_recoveryq_locked(lpni); lnet_net_unlock(0); } diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index ca9b90b..9dbb3bd4 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3437,3 +3437,67 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) out: return rc; } + +void +lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni) +{ + /* the mt could've shutdown and cleaned up the queues */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) + return; + + if (list_empty(&lpni->lpni_recovery) && + atomic_read(&lpni->lpni_healthv) < LNET_MAX_HEALTH_VALUE) { + CERROR("lpni %s added to recovery queue. Health = %d\n", + libcfs_nid2str(lpni->lpni_nid), + atomic_read(&lpni->lpni_healthv)); + list_add_tail(&lpni->lpni_recovery, + &the_lnet.ln_mt_peerNIRecovq); + lnet_peer_ni_addref_locked(lpni); + } +} + +/* Call with the ln_api_mutex held */ +void +lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all) +{ + struct lnet_peer_table *ptable; + struct lnet_peer *lp; + struct lnet_peer_net *lpn; + struct lnet_peer_ni *lpni; + int lncpt; + int cpt; + + if (the_lnet.ln_state != LNET_STATE_RUNNING) + return; + + if (!all) { + lnet_net_lock(LNET_LOCK_EX); + lpni = lnet_find_peer_ni_locked(nid); + atomic_set(&lpni->lpni_healthv, value); + lnet_peer_ni_add_to_recoveryq_locked(lpni); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(LNET_LOCK_EX); + return; + } + + lncpt = cfs_percpt_number(the_lnet.ln_peer_tables); + + /* Walk all the peers and reset the healhv for each one to the + * maximum value. 
+ */ + lnet_net_lock(LNET_LOCK_EX); + for (cpt = 0; cpt < lncpt; cpt++) { + ptable = the_lnet.ln_peer_tables[cpt]; + list_for_each_entry(lp, &ptable->pt_peer_list, lp_peer_list) { + list_for_each_entry(lpn, &lp->lp_peer_nets, + lpn_peer_nets) { + list_for_each_entry(lpni, &lpn->lpn_peer_nis, + lpni_peer_nis) { + atomic_set(&lpni->lpni_healthv, value); + lnet_peer_ni_add_to_recoveryq_locked(lpni); + } + } + } + } + lnet_net_unlock(LNET_LOCK_EX); +} From patchwork Thu Feb 27 21:09:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409801 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E35F814BC for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB705246A1 for ; Thu, 27 Feb 2020 21:22:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB705246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E0E69348910; Thu, 27 Feb 2020 13:21:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D76AD21FA63 for ; Thu, 27 Feb 2020 13:18:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with 
ESMTP id 4C0A7EFA; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4A2BC46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:18 -0500 Message-Id: <1582838290-17243-91-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 090/622] lnet: add health statistics X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add a health statistics block for each local and peer NI. 
These statistics will be incremented when processing errors reported by lnet_finalize() WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 67908ab34371 ("LU-9120 lnet: add health statistics") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32775 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 18 +++++++++++++++ net/lnet/lnet/lib-msg.c | 52 ++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 68 insertions(+), 2 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 2b3e76a..e5d4128 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -338,6 +338,22 @@ struct lnet_element_stats { struct lnet_comm_count el_drop_stats; }; +struct lnet_health_local_stats { + atomic_t hlt_local_interrupt; + atomic_t hlt_local_dropped; + atomic_t hlt_local_aborted; + atomic_t hlt_local_no_route; + atomic_t hlt_local_timeout; + atomic_t hlt_local_error; +}; + +struct lnet_health_remote_stats { + atomic_t hlt_remote_dropped; + atomic_t hlt_remote_timeout; + atomic_t hlt_remote_error; + atomic_t hlt_network_timeout; +}; + struct lnet_net { /* chain on the ln_nets */ struct list_head net_list; @@ -426,6 +442,7 @@ struct lnet_ni { /* NI statistics */ struct lnet_element_stats ni_stats; + struct lnet_health_local_stats ni_hstats; /* physical device CPT */ int ni_dev_cpt; @@ -511,6 +528,7 @@ struct lnet_peer_ni { struct list_head lpni_rtr_list; /* statistics kept on each peer NI */ struct lnet_element_stats lpni_stats; + struct lnet_health_remote_stats lpni_hstats; /* spin lock protecting credits and lpni_txq / lpni_rtrq */ spinlock_t lpni_lock; /* # tx credits available */ diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 32d49e9..dc51a17 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -541,6 +541,54 @@ lnet_net_unlock(0); } +static void +lnet_incr_hstats(struct lnet_msg 
*msg, enum lnet_msg_hstatus hstatus) +{ + struct lnet_ni *ni = msg->msg_txni; + struct lnet_peer_ni *lpni = msg->msg_txpeer; + + switch (hstatus) { + case LNET_MSG_STATUS_LOCAL_INTERRUPT: + atomic_inc(&ni->ni_hstats.hlt_local_interrupt); + break; + case LNET_MSG_STATUS_LOCAL_DROPPED: + atomic_inc(&ni->ni_hstats.hlt_local_dropped); + break; + case LNET_MSG_STATUS_LOCAL_ABORTED: + atomic_inc(&ni->ni_hstats.hlt_local_aborted); + break; + case LNET_MSG_STATUS_LOCAL_NO_ROUTE: + atomic_inc(&ni->ni_hstats.hlt_local_no_route); + break; + case LNET_MSG_STATUS_LOCAL_TIMEOUT: + atomic_inc(&ni->ni_hstats.hlt_local_timeout); + break; + case LNET_MSG_STATUS_LOCAL_ERROR: + atomic_inc(&ni->ni_hstats.hlt_local_error); + break; + case LNET_MSG_STATUS_REMOTE_DROPPED: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_remote_dropped); + break; + case LNET_MSG_STATUS_REMOTE_ERROR: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_remote_error); + break; + case LNET_MSG_STATUS_REMOTE_TIMEOUT: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_remote_timeout); + break; + case LNET_MSG_STATUS_NETWORK_TIMEOUT: + if (lpni) + atomic_inc(&lpni->lpni_hstats.hlt_network_timeout); + break; + case LNET_MSG_STATUS_OK: + break; + default: + LBUG(); + } +} + /* Do a health check on the message: * return -1 if we're not going to handle the error or * if we've reached the maximum number of retries. 
@@ -553,8 +601,6 @@ enum lnet_msg_hstatus hstatus = msg->msg_health_status; bool lo = false; - /* TODO: lnet_incr_hstats(hstatus); */ - LASSERT(msg->msg_txni); /* if we're sending to the LOLND then the msg_txpeer will not be @@ -565,6 +611,8 @@ else lo = true; + lnet_incr_hstats(msg, hstatus); + if (hstatus != LNET_MSG_STATUS_OK && ktime_compare(ktime_get(), msg->msg_deadline) >= 0) return -1; From patchwork Thu Feb 27 21:09:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409841 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 38DEC14BC for ; Thu, 27 Feb 2020 21:23:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 20966246A0 for ; Thu, 27 Feb 2020 21:23:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 20966246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 74056348AFD; Thu, 27 Feb 2020 13:21:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3B1DE21FA63 for ; Thu, 27 Feb 2020 13:18:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4E806EFB; Thu, 27 Feb 2020 16:18:14 -0500 (EST) 
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4D4A7468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:19 -0500 Message-Id: <1582838290-17243-92-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 091/622] lnet: Add ioctl to get health stats X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata At the time of this patch the sysfs statistics feature is still in development. Therefore, an ioctl is used to get the stats from LNet.
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 10958cac798d ("LU-9120 lnet: Add ioctl to get health stats") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32776 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + include/uapi/linux/lnet/libcfs_ioctl.h | 3 ++- include/uapi/linux/lnet/lnet-dlc.h | 31 ++++++++++++++++----- net/lnet/lnet/api-ni.c | 49 ++++++++++++++++++++++++++++++++++ net/lnet/lnet/peer.c | 29 ++++++++++++++++---- 5 files changed, 101 insertions(+), 12 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index bd6ea90..ba237df 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -823,6 +823,7 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, u32 *ni_peer_tx_credits, u32 *peer_tx_credits, u32 *peer_rtr_credits, u32 *peer_min_rtr_credtis, u32 *peer_tx_qnob); +int lnet_get_peer_ni_hstats(struct lnet_ioctl_peer_ni_hstats *stats); static inline bool lnet_is_peer_ni_healthy_locked(struct lnet_peer_ni *lpni) diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 458a634..683d508 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -149,6 +149,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_PEER_LIST _IOWR(IOC_LIBCFS_TYPE, 100, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS _IOWR(IOC_LIBCFS_TYPE, 101, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_SET_HEALHV _IOWR(IOC_LIBCFS_TYPE, 102, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 102 +#define IOC_LIBCFS_GET_LOCAL_HSTATS _IOWR(IOC_LIBCFS_TYPE, 103, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 103 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 2d3aad8..8e9850c 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ 
-163,6 +163,31 @@ struct lnet_ioctl_element_stats { __u32 iel_drop_count; }; +enum lnet_health_type { + LNET_HEALTH_TYPE_LOCAL_NI = 0, + LNET_HEALTH_TYPE_PEER_NI, +}; + +struct lnet_ioctl_local_ni_hstats { + struct libcfs_ioctl_hdr hlni_hdr; + lnet_nid_t hlni_nid; + __u32 hlni_local_interrupt; + __u32 hlni_local_dropped; + __u32 hlni_local_aborted; + __u32 hlni_local_no_route; + __u32 hlni_local_timeout; + __u32 hlni_local_error; + __s32 hlni_health_value; +}; + +struct lnet_ioctl_peer_ni_hstats { + __u32 hlpni_remote_dropped; + __u32 hlpni_remote_timeout; + __u32 hlpni_remote_error; + __u32 hlpni_network_timeout; + __s32 hlpni_health_value; +}; + struct lnet_ioctl_element_msg_stats { struct libcfs_ioctl_hdr im_hdr; __u32 im_idx; @@ -230,12 +255,6 @@ struct lnet_ioctl_peer_cfg { void __user *prcfg_bulk; }; - -enum lnet_health_type { - LNET_HEALTH_TYPE_LOCAL_NI = 0, - LNET_HEALTH_TYPE_PEER_NI, -}; - struct lnet_ioctl_reset_health_cfg { struct libcfs_ioctl_hdr rh_hdr; enum lnet_health_type rh_type; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 0cadb2a..14a8f2c 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3192,6 +3192,42 @@ u32 lnet_get_dlc_seq_locked(void) lnet_net_unlock(LNET_LOCK_EX); } +static int +lnet_get_local_ni_hstats(struct lnet_ioctl_local_ni_hstats *stats) +{ + int cpt, rc = 0; + struct lnet_ni *ni; + lnet_nid_t nid = stats->hlni_nid; + + cpt = lnet_net_lock_current(); + ni = lnet_nid2ni_locked(nid, cpt); + + if (!ni) { + rc = -ENOENT; + goto unlock; + } + + stats->hlni_local_interrupt = + atomic_read(&ni->ni_hstats.hlt_local_interrupt); + stats->hlni_local_dropped = + atomic_read(&ni->ni_hstats.hlt_local_dropped); + stats->hlni_local_aborted = + atomic_read(&ni->ni_hstats.hlt_local_aborted); + stats->hlni_local_no_route = + atomic_read(&ni->ni_hstats.hlt_local_no_route); + stats->hlni_local_timeout = + atomic_read(&ni->ni_hstats.hlt_local_timeout); + stats->hlni_local_error = + 
atomic_read(&ni->ni_hstats.hlt_local_error); + stats->hlni_health_value = + atomic_read(&ni->ni_healthv); + +unlock: + lnet_net_unlock(cpt); + + return rc; +} + /** * LNet ioctl handler. * @@ -3399,6 +3435,19 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_GET_LOCAL_HSTATS: { + struct lnet_ioctl_local_ni_hstats *stats = arg; + + if (stats->hlni_hdr.ioc_len < sizeof(*stats)) + return -EINVAL; + + mutex_lock(&the_lnet.ln_api_mutex); + rc = lnet_get_local_ni_hstats(stats); + mutex_unlock(&the_lnet.ln_api_mutex); + + return rc; + } + case IOC_LIBCFS_ADD_PEER_NI: { struct lnet_ioctl_peer_cfg *cfg = arg; diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 9dbb3bd4..4a38ca6 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3339,6 +3339,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) { struct lnet_ioctl_element_stats *lpni_stats; struct lnet_ioctl_element_msg_stats *lpni_msg_stats; + struct lnet_ioctl_peer_ni_hstats *lpni_hstats; struct lnet_peer_ni_credit_info *lpni_info; struct lnet_peer_ni *lpni; struct lnet_peer *lp; @@ -3354,7 +3355,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) } size = sizeof(nid) + sizeof(*lpni_info) + sizeof(*lpni_stats) + - sizeof(*lpni_msg_stats); + sizeof(*lpni_msg_stats) + sizeof(*lpni_hstats); size *= lp->lp_nnis; if (size > cfg->prcfg_size) { cfg->prcfg_size = size; @@ -3380,6 +3381,9 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lpni_msg_stats = kzalloc(sizeof(*lpni_msg_stats), GFP_KERNEL); if (!lpni_msg_stats) goto out_free_stats; + lpni_hstats = kzalloc(sizeof(*lpni_hstats), GFP_NOFS); + if (!lpni_hstats) + goto out_free_msg_stats; lpni = NULL; @@ -3387,7 +3391,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { nid = lpni->lpni_nid; if (copy_to_user(bulk, &nid, sizeof(nid))) - goto 
out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(nid); memset(lpni_info, 0, sizeof(*lpni_info)); @@ -3406,7 +3410,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lpni_info->cr_peer_min_tx_credits = lpni->lpni_mintxcredits; lpni_info->cr_peer_tx_qnob = lpni->lpni_txqnob; if (copy_to_user(bulk, lpni_info, sizeof(*lpni_info))) - goto out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(*lpni_info); memset(lpni_stats, 0, sizeof(*lpni_stats)); @@ -3417,15 +3421,30 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lpni_stats->iel_drop_count = lnet_sum_stats(&lpni->lpni_stats, LNET_STATS_TYPE_DROP); if (copy_to_user(bulk, lpni_stats, sizeof(*lpni_stats))) - goto out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(*lpni_stats); lnet_usr_translate_stats(lpni_msg_stats, &lpni->lpni_stats); if (copy_to_user(bulk, lpni_msg_stats, sizeof(*lpni_msg_stats))) - goto out_free_msg_stats; + goto out_free_hstats; bulk += sizeof(*lpni_msg_stats); + lpni_hstats->hlpni_network_timeout = + atomic_read(&lpni->lpni_hstats.hlt_network_timeout); + lpni_hstats->hlpni_remote_dropped = + atomic_read(&lpni->lpni_hstats.hlt_remote_dropped); + lpni_hstats->hlpni_remote_timeout = + atomic_read(&lpni->lpni_hstats.hlt_remote_timeout); + lpni_hstats->hlpni_remote_error = + atomic_read(&lpni->lpni_hstats.hlt_remote_error); + lpni_hstats->hlpni_health_value = + atomic_read(&lpni->lpni_healthv); + if (copy_to_user(bulk, lpni_hstats, sizeof(*lpni_hstats))) + goto out_free_hstats; + bulk += sizeof(*lpni_hstats); } rc = 0; +out_free_hstats: + kfree(lpni_hstats); out_free_msg_stats: kfree(lpni_msg_stats); out_free_stats: From patchwork Thu Feb 27 21:09:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409845 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7FE5414BC for ; Thu, 27 Feb 2020 21:23:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 68F3D246A0 for ; Thu, 27 Feb 2020 21:23:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 68F3D246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 547BD21FED4; Thu, 27 Feb 2020 13:21:51 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 916AE21FA63 for ; Thu, 27 Feb 2020 13:18:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 52C78EFC; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5043746D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:20 -0500 Message-Id: <1582838290-17243-93-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 092/622] lnet: remove obsolete health functions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Removed obsolete health functions that were originally added during the Multi-Rail project. Some assumptions were made about the health implementation back then, that are no longer true. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: ba05b3a98a0c ("LU-9120 lnet: remove obsolete health functions") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32862 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 40 ---------------------------------------- net/lnet/lnet/api-ni.c | 9 --------- net/lnet/lnet/lib-move.c | 6 ------ net/lnet/lnet/peer.c | 8 -------- 4 files changed, 63 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index ba237df..74660d3 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -494,7 +494,6 @@ struct lnet_ni * struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid); struct lnet_ni *lnet_net2ni_locked(u32 net, int cpt); struct lnet_ni *lnet_net2ni_addref(u32 net); -bool lnet_is_ni_healthy_locked(struct lnet_ni *ni); struct lnet_net *lnet_get_net_locked(u32 net_id); extern unsigned int lnet_transaction_timeout; @@ -825,45 +824,6 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, u32 *peer_tx_qnob); int lnet_get_peer_ni_hstats(struct lnet_ioctl_peer_ni_hstats *stats); -static inline bool -lnet_is_peer_ni_healthy_locked(struct lnet_peer_ni *lpni) -{ - return lpni->lpni_healthy; -} - -static inline void -lnet_set_peer_ni_health_locked(struct lnet_peer_ni *lpni, bool health) -{ - lpni->lpni_healthy = health; -} - -static inline bool -lnet_is_peer_net_healthy_locked(struct lnet_peer_net *peer_net) -{ - struct lnet_peer_ni *lpni; - - list_for_each_entry(lpni, 
&peer_net->lpn_peer_nis, - lpni_peer_nis) { - if (lnet_is_peer_ni_healthy_locked(lpni)) - return true; - } - - return false; -} - -static inline bool -lnet_is_peer_healthy_locked(struct lnet_peer *peer) -{ - struct lnet_peer_net *peer_net; - - list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { - if (lnet_is_peer_net_healthy_locked(peer_net)) - return true; - } - - return false; -} - static inline struct lnet_peer_net * lnet_find_peer_net_locked(struct lnet_peer *peer, u32 net_id) { diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 14a8f2c..1ee24c7 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1155,15 +1155,6 @@ struct lnet_net * return !!net; } -bool -lnet_is_ni_healthy_locked(struct lnet_ni *ni) -{ - if (ni->ni_state & LNET_NI_STATE_ACTIVE) - return true; - - return false; -} - struct lnet_ni * lnet_nid2ni_locked(lnet_nid_t nid, int cpt) { diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 8d5f1e5..c33cf8d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2323,12 +2323,6 @@ struct lnet_ni * } lnet_peer_ni_decref_locked(lpni); - /* If peer is not healthy then can not send anything to it */ - if (!lnet_is_peer_healthy_locked(peer)) { - lnet_net_unlock(cpt); - return -EHOSTUNREACH; - } - /* Identify the different send cases */ if (src_nid == LNET_NID_ANY) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 4a38ca6..b20230b 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -135,7 +135,6 @@ lpni->lpni_nid = nid; lpni->lpni_cpt = cpt; atomic_set(&lpni->lpni_healthv, LNET_MAX_HEALTH_VALUE); - lnet_set_peer_ni_health_locked(lpni, true); net = lnet_get_net_locked(LNET_NIDNET(nid)); lpni->lpni_net = net; @@ -2694,8 +2693,6 @@ static lnet_nid_t lnet_peer_select_nid(struct lnet_peer *lp) /* Look for a direct-connected NID for this peer. 
*/ lpni = NULL; while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { - if (!lnet_is_peer_ni_healthy_locked(lpni)) - continue; if (!lnet_get_net_locked(lpni->lpni_peer_net->lpn_net_id)) continue; break; @@ -2706,8 +2703,6 @@ static lnet_nid_t lnet_peer_select_nid(struct lnet_peer *lp) /* Look for a routed-connected NID for this peer. */ lpni = NULL; while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { - if (!lnet_is_peer_ni_healthy_locked(lpni)) - continue; if (!lnet_find_rnet_locked(lpni->lpni_peer_net->lpn_net_id)) continue; break; @@ -3082,9 +3077,6 @@ static int lnet_peer_discovery(void *arg) * forever, in case the GET message (for ping) * doesn't get a REPLY or the PUT message (for * push) doesn't get an ACK. - * - * TODO: LNet Health will deal with this scenario - * in a generic way. */ lp->lp_last_queued = ktime_get_real_seconds(); lnet_net_unlock(LNET_LOCK_EX); From patchwork Thu Feb 27 21:09:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409847 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B053114BC for ; Thu, 27 Feb 2020 21:23:55 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 99043246A0 for ; Thu, 27 Feb 2020 21:23:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 99043246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 45F5D348B2A; Thu, 27 Feb 2020 13:21:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E764121FAD6 for ; Thu, 27 Feb 2020 13:18:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 54E48EFD; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 535AD46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:21 -0500 Message-Id: <1582838290-17243-94-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 093/622] lnet: set health value from user space X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Collect debugging information when the ioctl manually sets the health value. Check whether a peer is returned by lnet_find_peer_ni_locked() when lnet_get_peer_info() is called. The missing check was discovered when the userland tools were updated to set the health value.
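The check described above follows a common pattern: a lookup done under a lock can fail, so the result has to be tested, and the lock dropped, before the entry is touched. A minimal user-space sketch of that pattern follows. It is not the LNet code: the peer_ni table, table_lock()/table_unlock(), find_peer_ni_locked() and set_healthv() are hypothetical stand-ins, the lock is only a depth counter, and MAX_HEALTH_VALUE is an assumed bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HEALTH_VALUE 1000	/* assumed upper bound, for illustration only */

/* Hypothetical stand-in for a peer NI entry; not the LNet structure. */
struct peer_ni {
	uint64_t nid;
	int healthv;
};

static struct peer_ni peer_table[] = {
	{ .nid = 1, .healthv = MAX_HEALTH_VALUE },
	{ .nid = 2, .healthv = MAX_HEALTH_VALUE },
};

/* Toy lock: a depth counter that only models lock/unlock pairing. */
static int lock_depth;

void table_lock(void)
{
	lock_depth++;
}

void table_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Caller must hold the table lock. Returns NULL if the NID is unknown. */
struct peer_ni *find_peer_ni_locked(uint64_t nid)
{
	size_t i;

	assert(lock_depth > 0);
	for (i = 0; i < sizeof(peer_table) / sizeof(peer_table[0]); i++) {
		if (peer_table[i].nid == nid)
			return &peer_table[i];
	}
	return NULL;
}

/* Returns 0 on success, -1 if no entry exists for the NID. */
int set_healthv(uint64_t nid, int value)
{
	struct peer_ni *lpni;

	table_lock();
	lpni = find_peer_ni_locked(nid);
	if (!lpni) {
		/* Nothing to update: release the lock before bailing out. */
		table_unlock();
		return -1;
	}
	lpni->healthv = value;
	table_unlock();
	return 0;
}
```

In this sketch, calling set_healthv() with an unknown NID returns -1 with the lock released instead of dereferencing a NULL pointer.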
WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: c0ad398fd716 ("LU-9120 lnet: set health value from user space") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32863 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 6 ++++++ net/lnet/lnet/peer.c | 4 ++++ 2 files changed, 10 insertions(+) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 1ee24c7..82703dd 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3526,6 +3526,12 @@ u32 lnet_get_dlc_seq_locked(void) value = LNET_MAX_HEALTH_VALUE; else value = cfg->rh_value; + CDEBUG(D_NET, + "Manually setting healthv to %d for %s:%s. all = %d\n", + value, + (cfg->rh_type == LNET_HEALTH_TYPE_LOCAL_NI) ? + "local" : "peer", + libcfs_nid2str(cfg->rh_nid), cfg->rh_all); mutex_lock(&the_lnet.ln_api_mutex); if (cfg->rh_type == LNET_HEALTH_TYPE_LOCAL_NI) lnet_ni_set_healthv(cfg->rh_nid, value, diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index b20230b..2fc5dfc 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3484,6 +3484,10 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) if (!all) { lnet_net_lock(LNET_LOCK_EX); lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + lnet_net_unlock(LNET_LOCK_EX); + return; + } atomic_set(&lpni->lpni_healthv, value); lnet_peer_ni_add_to_recoveryq_locked(lpni); lnet_peer_ni_decref_locked(lpni); From patchwork Thu Feb 27 21:09:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410251 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D8C1892A for ; Thu, 27 Feb 2020 21:33:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C14E024677 for ; Thu, 27 Feb 2020 21:33:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C14E024677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 23E82349D36; Thu, 27 Feb 2020 13:28:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3616A21FB39 for ; Thu, 27 Feb 2020 13:18:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 57D69EFE; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5698B46A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:22 -0500 Message-Id: <1582838290-17243-95-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 094/622] lnet: add global health statistics X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Added global health statistics. They can be printed from lnetctl with: lnetctl stats show. lnet_selftest passes the statistics block over the wire. This, unfortunately, creates an unnecessary backwards compatibility link for lnet_selftest, which shouldn't be there. This patch breaks that backwards compatibility, which means lnet_selftest will not work with older selftest modules. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 15020fd977af ("LU-9120 lnet: add global health statistics") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32949 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ include/uapi/linux/lnet/lnet-types.h | 13 +++++++++++++ net/lnet/lnet/api-ni.c | 13 +++++++++++++ net/lnet/lnet/lib-move.c | 11 +++++++++++ net/lnet/lnet/lib-msg.c | 28 +++++++++++++++++++++++----- 5 files changed, 62 insertions(+), 5 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 74660d3..e4d9ccc 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -445,6 +445,7 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, rspt = kzalloc(sizeof(*rspt), GFP_NOFS); lnet_net_lock(cpt); + the_lnet.ln_counters[cpt]->rst_alloc++; lnet_net_unlock(cpt); return rspt; } @@ -454,6 +455,7 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, { kfree(rspt); lnet_net_lock(cpt); + the_lnet.ln_counters[cpt]->rst_alloc--; lnet_net_unlock(cpt); } diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h index 2afdd83..1da72c4 100644 --- a/include/uapi/linux/lnet/lnet-types.h +++ b/include/uapi/linux/lnet/lnet-types.h @@ -278,11 +278,24 @@ struct lnet_ping_info {
struct lnet_counters { __u32 msgs_alloc; __u32 msgs_max; + __u32 rst_alloc; __u32 errors; __u32 send_count; __u32 recv_count; __u32 route_count; __u32 drop_count; + __u32 resend_count; + __u32 response_timeout_count; + __u32 local_interrupt_count; + __u32 local_dropped_count; + __u32 local_aborted_count; + __u32 local_no_route_count; + __u32 local_timeout_count; + __u32 local_error_count; + __u32 remote_dropped_count; + __u32 remote_error_count; + __u32 remote_timeout_count; + __u32 network_timeout_count; __u64 send_length; __u64 recv_length; __u64 route_length; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 82703dd..d58006d 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -694,7 +694,20 @@ static void lnet_assert_wire_constants(void) cfs_percpt_for_each(ctr, i, the_lnet.ln_counters) { counters->msgs_max += ctr->msgs_max; counters->msgs_alloc += ctr->msgs_alloc; + counters->rst_alloc += ctr->rst_alloc; counters->errors += ctr->errors; + counters->resend_count += ctr->resend_count; + counters->response_timeout_count += ctr->response_timeout_count; + counters->local_interrupt_count += ctr->local_interrupt_count; + counters->local_dropped_count += ctr->local_dropped_count; + counters->local_aborted_count += ctr->local_aborted_count; + counters->local_no_route_count += ctr->local_no_route_count; + counters->local_timeout_count += ctr->local_timeout_count; + counters->local_error_count += ctr->local_error_count; + counters->remote_dropped_count += ctr->remote_dropped_count; + counters->remote_error_count += ctr->remote_error_count; + counters->remote_timeout_count += ctr->remote_timeout_count; + counters->network_timeout_count += ctr->network_timeout_count; counters->send_count += ctr->send_count; counters->recv_count += ctr->recv_count; counters->route_count += ctr->route_count; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index c33cf8d..6a3704d 100644 --- a/net/lnet/lnet/lib-move.c +++ 
b/net/lnet/lnet/lib-move.c @@ -2501,6 +2501,10 @@ struct lnet_mt_event_info { md->md_rspt_ptr = NULL; lnet_res_unlock(i); + lnet_net_lock(i); + the_lnet.ln_counters[i]->response_timeout_count++; + lnet_net_unlock(i); + list_del_init(&rspt->rspt_on_list); CDEBUG(D_NET, @@ -2567,6 +2571,11 @@ struct lnet_mt_event_info { lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); + CDEBUG(D_NET, "resending %s->%s: %s recovery %d\n", + libcfs_nid2str(src_nid), + libcfs_id2str(msg->msg_target), + lnet_msgtyp2str(msg->msg_type), + msg->msg_recovery); rc = lnet_send(src_nid, msg, LNET_NID_ANY); if (rc) { CERROR("Error sending %s to %s: %d\n", @@ -2576,6 +2585,8 @@ struct lnet_mt_event_info { lnet_finalize(msg, rc); } lnet_net_lock(cpt); + if (!rc) + the_lnet.ln_counters[cpt]->resend_count++; } } } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index dc51a17..70decc7 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -546,41 +546,52 @@ { struct lnet_ni *ni = msg->msg_txni; struct lnet_peer_ni *lpni = msg->msg_txpeer; + struct lnet_counters *counters = the_lnet.ln_counters[0]; switch (hstatus) { case LNET_MSG_STATUS_LOCAL_INTERRUPT: atomic_inc(&ni->ni_hstats.hlt_local_interrupt); + counters->local_interrupt_count++; break; case LNET_MSG_STATUS_LOCAL_DROPPED: atomic_inc(&ni->ni_hstats.hlt_local_dropped); + counters->local_dropped_count++; break; case LNET_MSG_STATUS_LOCAL_ABORTED: atomic_inc(&ni->ni_hstats.hlt_local_aborted); + counters->local_aborted_count++; break; case LNET_MSG_STATUS_LOCAL_NO_ROUTE: atomic_inc(&ni->ni_hstats.hlt_local_no_route); + counters->local_no_route_count++; break; case LNET_MSG_STATUS_LOCAL_TIMEOUT: atomic_inc(&ni->ni_hstats.hlt_local_timeout); + counters->local_timeout_count++; break; case LNET_MSG_STATUS_LOCAL_ERROR: atomic_inc(&ni->ni_hstats.hlt_local_error); + counters->local_error_count++; break; case LNET_MSG_STATUS_REMOTE_DROPPED: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_dropped); + 
counters->remote_dropped_count++; break; case LNET_MSG_STATUS_REMOTE_ERROR: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_error); + counters->remote_error_count++; break; case LNET_MSG_STATUS_REMOTE_TIMEOUT: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_timeout); + counters->remote_timeout_count++; break; case LNET_MSG_STATUS_NETWORK_TIMEOUT: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_network_timeout); + counters->network_timeout_count++; break; case LNET_MSG_STATUS_OK: break; @@ -601,6 +612,10 @@ enum lnet_msg_hstatus hstatus = msg->msg_health_status; bool lo = false; + /* if we're shutting down no point in handling health. */ + if (the_lnet.ln_state != LNET_STATE_RUNNING) + return -1; + LASSERT(msg->msg_txni); /* if we're sending to the LOLND then the msg_txpeer will not be @@ -611,15 +626,18 @@ else lo = true; - lnet_incr_hstats(msg, hstatus); - if (hstatus != LNET_MSG_STATUS_OK && ktime_compare(ktime_get(), msg->msg_deadline) >= 0) return -1; - /* if we're shutting down no point in handling health. */ - if (the_lnet.ln_state != LNET_STATE_RUNNING) - return -1; + /* stats are only incremented for errors so avoid wasting time + * incrementing statistics if there is no error. 
+ */ + if (hstatus != LNET_MSG_STATUS_OK) { + lnet_net_lock(0); + lnet_incr_hstats(msg, hstatus); + lnet_net_unlock(0); + } CDEBUG(D_NET, "health check: %s->%s: %s: %s\n", libcfs_nid2str(msg->msg_txni->ni_nid), From patchwork Thu Feb 27 21:09:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409849 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD4F5138D for ; Thu, 27 Feb 2020 21:24:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A4F5F246A0 for ; Thu, 27 Feb 2020 21:24:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A4F5F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3AE19348B49; Thu, 27 Feb 2020 13:21:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8CBE721FB39 for ; Thu, 27 Feb 2020 13:18:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5AE3BEFF; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 59B07468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 
27 Feb 2020 16:09:23 -0500 Message-Id: <1582838290-17243-96-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 095/622] lnet: print recovery queues content X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add commands to lnetctl to print recovery queues content from user space. Associated code to handle the IOCTL added in LNet module. for local NIs: lnetctl debug recovery --local for peer NIs: lnetctl debug recovery --peer WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 826ea19c077b ("LU-9120 lnet: print recovery queues content") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32950 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/uapi/linux/lnet/libcfs_ioctl.h | 3 +- include/uapi/linux/lnet/lnet-dlc.h | 8 +++++ net/lnet/lnet/api-ni.c | 53 ++++++++++++++++++++++++++++++++++ 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 683d508..dfb73f7 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -150,6 +150,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS _IOWR(IOC_LIBCFS_TYPE, 101, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_SET_HEALHV _IOWR(IOC_LIBCFS_TYPE, 102, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_LOCAL_HSTATS _IOWR(IOC_LIBCFS_TYPE, 103, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 103 +#define 
IOC_LIBCFS_GET_RECOVERY_QUEUE _IOWR(IOC_LIBCFS_TYPE, 104, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 104 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 8e9850c..87f7680 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -35,6 +35,7 @@ #define MAX_NUM_SHOW_ENTRIES 32 #define LNET_MAX_STR_LEN 128 #define LNET_MAX_SHOW_NUM_CPT 128 +#define LNET_MAX_SHOW_NUM_NID 128 #define LNET_UNDEFINED_HOPS ((__u32)(-1)) /* @@ -263,6 +264,13 @@ struct lnet_ioctl_reset_health_cfg { lnet_nid_t rh_nid; }; +struct lnet_ioctl_recovery_list { + struct libcfs_ioctl_hdr rlst_hdr; + enum lnet_health_type rlst_type; + int rlst_num_nids; + lnet_nid_t rlst_nid_array[LNET_MAX_SHOW_NUM_NID]; +}; + struct lnet_ioctl_set_value { struct libcfs_ioctl_hdr sv_hdr; __u32 sv_value; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index d58006d..07bc29f 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3232,6 +3232,44 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } +static int +lnet_get_local_ni_recovery_list(struct lnet_ioctl_recovery_list *list) +{ + struct lnet_ni *ni; + int i = 0; + + lnet_net_lock(LNET_LOCK_EX); + list_for_each_entry(ni, &the_lnet.ln_mt_localNIRecovq, ni_recovery) { + list->rlst_nid_array[i] = ni->ni_nid; + i++; + if (i >= LNET_MAX_SHOW_NUM_NID) + break; + } + lnet_net_unlock(LNET_LOCK_EX); + list->rlst_num_nids = i; + + return 0; +} + +static int +lnet_get_peer_ni_recovery_list(struct lnet_ioctl_recovery_list *list) +{ + struct lnet_peer_ni *lpni; + int i = 0; + + lnet_net_lock(LNET_LOCK_EX); + list_for_each_entry(lpni, &the_lnet.ln_mt_peerNIRecovq, lpni_recovery) { + list->rlst_nid_array[i] = lpni->lpni_nid; + i++; + if (i >= LNET_MAX_SHOW_NUM_NID) + break; + } + lnet_net_unlock(LNET_LOCK_EX); + list->rlst_num_nids = i; + + return 0; +} + /** * LNet ioctl handler. 
* @@ -3452,6 +3490,21 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_GET_RECOVERY_QUEUE: { + struct lnet_ioctl_recovery_list *list = arg; + + if (list->rlst_hdr.ioc_len < sizeof(*list)) + return -EINVAL; + + mutex_lock(&the_lnet.ln_api_mutex); + if (list->rlst_type == LNET_HEALTH_TYPE_LOCAL_NI) + rc = lnet_get_local_ni_recovery_list(list); + else + rc = lnet_get_peer_ni_recovery_list(list); + mutex_unlock(&the_lnet.ln_api_mutex); + return rc; + } + case IOC_LIBCFS_ADD_PEER_NI: { struct lnet_ioctl_peer_cfg *cfg = arg; From patchwork Thu Feb 27 21:09:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410007 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D2691580 for ; Thu, 27 Feb 2020 21:27:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 05B41246A0 for ; Thu, 27 Feb 2020 21:27:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 05B41246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BDEBD348BDA; Thu, 27 Feb 2020 13:24:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E2A6321FB39 for ; Thu, 27 Feb 2020 13:18:45 -0800 (PST) Received: from 
star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5F5541020; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5CA4846C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:24 -0500 Message-Id: <1582838290-17243-97-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 096/622] lnet: health error simulation X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Modified the error simulation code to simulate health errors for testing purposes. The specific error to simulate can be set. If multiple errors are configured, one is chosen at random from the set. Example:
    lctl net_drop_add -s *@tcp -d *@tcp -m GET -i 1 -e local_interrupt
The -e option can be repeated to specify different errors to simulate. The available errors are local_interrupt, local_dropped, local_aborted, local_no_route, local_error, local_timeout, remote_error, remote_dropped, remote_timeout, network_timeout and random. A -n ("--random") option has been added to randomize error generation for drop rules. It relies on the interval value provided via -i: a random number no bigger than the interval is generated. If the number is smaller than half of the interval then the rule isn't matched, otherwise it is.
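The randomized matching and the choice of an error from the configured set, as described above, can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel implementation: random_rule_matches(), random_rule_check() and pick_error() are hypothetical names, rand() stands in for the kernel's random number helper, and treating "half of the interval" as integer division by two is an assumption. pick_error() is loosely modelled on the nearest-bit rounding in the lnet_fault_match_health() helper added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define HSTATUS_END 11	/* bits 1..10 each name one health error */

/*
 * Decide whether a randomized drop rule matches.
 * value is assumed to be uniformly distributed in [0, interval).
 * The rule matches only when value falls in the upper half.
 */
bool random_rule_matches(uint32_t value, uint32_t interval)
{
	assert(interval > 0 && value < interval);
	return value >= interval / 2;
}

/* Draw the random value with rand(); an interval of 0 never matches. */
bool random_rule_check(uint32_t interval)
{
	if (interval == 0)
		return false;
	return random_rule_matches((uint32_t)rand() % interval, interval);
}

/*
 * Pick the error to simulate. choice is a random index in
 * 1..HSTATUS_END-1. If its bit is set in mask it is used directly,
 * otherwise the nearest set bit is used. On a tie the higher bit wins
 * because the scan runs from high to low with a strict comparison.
 * If no bit is set, choice is returned unchanged.
 */
int pick_error(int choice, uint32_t mask)
{
	int best_delta = HSTATUS_END;
	int best = choice;
	int i;

	assert(choice > 0 && choice < HSTATUS_END);

	if (mask & (1u << choice))
		return choice;

	for (i = HSTATUS_END - 1; i > 0; i--) {
		int delta = abs(choice - i);

		if ((mask & (1u << i)) && delta < best_delta) {
			best_delta = delta;
			best = i;
		}
	}
	return best;
}
```

With an interval of 10, values 0 to 4 leave the rule unmatched and values 5 to 9 match it, so roughly half of the calls generate an error regardless of how many times matching runs for one message.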
The --random option is needed because drop matching can happen multiple times on the path of sending a message, and time-based or rate-based matching would not generate errors evenly across those calls. WC-bug-id: https://jira.whamcloud.com/browse/LU-9120 Lustre-commit: 5c17777d97bd ("LU-9120 lnet: health error simulation") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/32951 Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 4 +- include/linux/lnet/lib-types.h | 3 +- include/uapi/linux/lnet/lnetctl.h | 17 +++++++++ net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 ++- net/lnet/klnds/socklnd/socklnd_cb.c | 27 ++++++++++---- net/lnet/lnet/lib-move.c | 2 +- net/lnet/lnet/lib-msg.c | 24 ++++++++++++ net/lnet/lnet/net_fault.c | 73 ++++++++++++++++++++++++++++++++++--- 8 files changed, 138 insertions(+), 18 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index e4d9ccc..4915a87 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -639,6 +639,8 @@ void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg, void lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt); void lnet_finalize(struct lnet_msg *msg, int rc); +bool lnet_send_error_simulation(struct lnet_msg *msg, + enum lnet_msg_hstatus *hstatus); void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob, u32 msg_type); @@ -661,7 +663,7 @@ void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, int lnet_fault_init(void); void lnet_fault_fini(void); -bool lnet_drop_rule_match(struct lnet_hdr *hdr); +bool lnet_drop_rule_match(struct lnet_hdr *hdr, enum lnet_msg_hstatus *hstatus); int lnet_delay_rule_add(struct lnet_fault_attr *attr); int lnet_delay_rule_del(lnet_nid_t src, lnet_nid_t dst, bool shutdown); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index e5d4128..f82ebb6 100644 ---
a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -72,7 +72,8 @@ enum lnet_msg_hstatus { LNET_MSG_STATUS_REMOTE_ERROR, LNET_MSG_STATUS_REMOTE_DROPPED, LNET_MSG_STATUS_REMOTE_TIMEOUT, - LNET_MSG_STATUS_NETWORK_TIMEOUT + LNET_MSG_STATUS_NETWORK_TIMEOUT, + LNET_MSG_STATUS_END, }; struct lnet_rsp_tracker { diff --git a/include/uapi/linux/lnet/lnetctl.h b/include/uapi/linux/lnet/lnetctl.h index 191689c..2eb9c82 100644 --- a/include/uapi/linux/lnet/lnetctl.h +++ b/include/uapi/linux/lnet/lnetctl.h @@ -41,6 +41,19 @@ enum { #define LNET_GET_BIT (1 << 2) #define LNET_REPLY_BIT (1 << 3) +#define HSTATUS_END 11 +#define HSTATUS_LOCAL_INTERRUPT_BIT (1 << 1) +#define HSTATUS_LOCAL_DROPPED_BIT (1 << 2) +#define HSTATUS_LOCAL_ABORTED_BIT (1 << 3) +#define HSTATUS_LOCAL_NO_ROUTE_BIT (1 << 4) +#define HSTATUS_LOCAL_ERROR_BIT (1 << 5) +#define HSTATUS_LOCAL_TIMEOUT_BIT (1 << 6) +#define HSTATUS_REMOTE_ERROR_BIT (1 << 7) +#define HSTATUS_REMOTE_DROPPED_BIT (1 << 8) +#define HSTATUS_REMOTE_TIMEOUT_BIT (1 << 9) +#define HSTATUS_NETWORK_TIMEOUT_BIT (1 << 10) +#define HSTATUS_RANDOM 0xffffffff + /** ioctl parameter for LNet fault simulation */ struct lnet_fault_attr { /** @@ -78,6 +91,10 @@ struct lnet_fault_attr { * with da_rate */ __u32 da_interval; + /** error type mask */ + __u32 da_health_error_mask; + /** randomize error generation */ + bool da_random; } drop; /** message latency simulation */ struct { diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 293a859..5680f2a 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -912,7 +912,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, bad->wr_id, bad->opcode, bad->send_flags, libcfs_nid2str(conn->ibc_peer->ibp_nid)); bad = NULL; - rc = ib_post_send(conn->ibc_cmid->qp, wrq, &bad); + if (lnet_send_error_simulation(tx->tx_lntmsg[0], + &tx->tx_hstatus)) + rc = -EINVAL; + else + rc = 
ib_post_send(conn->ibc_cmid->qp, wrq, &bad); } conn->ibc_last_send = ktime_get(); diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 8bc23d2..057c7f3 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -335,7 +335,8 @@ struct ksock_tx * if (!rc && (tx->tx_resid != 0 || tx->tx_zc_aborted)) { rc = -EIO; - hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + if (hstatus == LNET_MSG_STATUS_OK) + hstatus = LNET_MSG_STATUS_LOCAL_ERROR; } if (tx->tx_conn) @@ -467,6 +468,13 @@ struct ksock_tx * ksocknal_process_transmit(struct ksock_conn *conn, struct ksock_tx *tx) { int rc; + bool error_sim = false; + + if (lnet_send_error_simulation(tx->tx_lnetmsg, &tx->tx_hstatus)) { + error_sim = true; + rc = -EINVAL; + goto simulate_error; + } if (tx->tx_zc_capable && !tx->tx_zc_checked) ksocknal_check_zc_req(tx); @@ -512,16 +520,19 @@ struct ksock_tx * return rc; } +simulate_error: /* Actual error */ LASSERT(rc < 0); - /* set the health status of the message which determines - * whether we should retry the transmit - */ - if (rc == -ETIMEDOUT) - tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_TIMEOUT; - else - tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + if (!error_sim) { + /* set the health status of the message which determines + * whether we should retry the transmit + */ + if (rc == -ETIMEDOUT) + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_TIMEOUT; + else + tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; + } if (!conn->ksnc_closing) { switch (rc) { diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 6a3704d..eb0b48d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3875,7 +3875,7 @@ void lnet_monitor_thr_stop(void) } if (!list_empty(&the_lnet.ln_drop_rules) && - lnet_drop_rule_match(hdr)) { + lnet_drop_rule_match(hdr, NULL)) { CDEBUG(D_NET, "%s, src %s, dst %s: Dropping %s to simulate silent message loss\n", libcfs_nid2str(from_nid), libcfs_nid2str(src_nid), 
libcfs_nid2str(dest_nid), lnet_msgtyp2str(type)); diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 70decc7..5072238 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -812,6 +812,30 @@ } } +bool +lnet_send_error_simulation(struct lnet_msg *msg, + enum lnet_msg_hstatus *hstatus) +{ + if (!msg) + return false; + + if (list_empty(&the_lnet.ln_drop_rules)) + return false; + + /* match only health rules */ + if (!lnet_drop_rule_match(&msg->msg_hdr, hstatus)) + return false; + + CDEBUG(D_NET, "src %s, dst %s: %s simulate health error: %s\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + lnet_msgtyp2str(msg->msg_type), + lnet_health_error2str(*hstatus)); + + return true; +} +EXPORT_SYMBOL(lnet_send_error_simulation); + void lnet_finalize(struct lnet_msg *msg, int status) { diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c index 4589b17..becb709 100644 --- a/net/lnet/lnet/net_fault.c +++ b/net/lnet/lnet/net_fault.c @@ -292,13 +292,56 @@ struct lnet_drop_rule { lnet_net_unlock(cpt); } +static void +lnet_fault_match_health(enum lnet_msg_hstatus *hstatus, __u32 mask) +{ + int choice; + int delta; + int best_delta; + int i; + + /* assign a random failure */ + choice = prandom_u32_max(LNET_MSG_STATUS_END - LNET_MSG_STATUS_OK); + if (choice == 0) + choice++; + + if (mask == HSTATUS_RANDOM) { + *hstatus = choice; + return; + } + + if (mask & (1 << choice)) { + *hstatus = choice; + return; + } + + /* round to the closest ON bit */ + i = HSTATUS_END; + best_delta = HSTATUS_END; + while (i > 0) { + if (mask & (1 << i)) { + delta = choice - i; + if (delta < 0) + delta *= -1; + if (delta < best_delta) { + best_delta = delta; + choice = i; + } + } + i--; + } + + *hstatus = choice; +} + /** * check source/destination NID, portal, message type and drop rate, * decide whether should drop this message or not */ static bool drop_rule_match(struct lnet_drop_rule *rule, lnet_nid_t src, - lnet_nid_t 
dst, unsigned int type, unsigned int portal) + lnet_nid_t dst, unsigned int type, unsigned int portal, + enum lnet_msg_hstatus *hstatus) { struct lnet_fault_attr *attr = &rule->dr_attr; bool drop; @@ -306,9 +349,23 @@ struct lnet_drop_rule { if (!lnet_fault_attr_match(attr, src, dst, type, portal)) return false; + /* if we're trying to match a health status error but it hasn't + * been set in the rule, then don't match + */ + if ((hstatus && !attr->u.drop.da_health_error_mask) || + (!hstatus && attr->u.drop.da_health_error_mask)) + return false; + /* match this rule, check drop rate now */ spin_lock(&rule->dr_lock); - if (rule->dr_drop_time) { /* time based drop */ + if (attr->u.drop.da_random) { + int value = prandom_u32_max(attr->u.drop.da_interval); + + if (value >= (attr->u.drop.da_interval / 2)) + drop = true; + else + drop = false; + } else if (rule->dr_drop_time) { /* time based drop */ time64_t now = ktime_get_seconds(); rule->dr_stat.fs_count++; @@ -340,6 +397,9 @@ struct lnet_drop_rule { } if (drop) { /* drop this message, update counters */ + if (hstatus) + lnet_fault_match_health(hstatus, + attr->u.drop.da_health_error_mask); lnet_fault_stat_inc(&rule->dr_stat, type); rule->dr_stat.u.drop.ds_dropped++; } @@ -352,12 +412,12 @@ struct lnet_drop_rule { * Check if message from @src to @dst can match any existed drop rule */ bool -lnet_drop_rule_match(struct lnet_hdr *hdr) +lnet_drop_rule_match(struct lnet_hdr *hdr, enum lnet_msg_hstatus *hstatus) { - struct lnet_drop_rule *rule; lnet_nid_t src = le64_to_cpu(hdr->src_nid); lnet_nid_t dst = le64_to_cpu(hdr->dest_nid); unsigned int typ = le32_to_cpu(hdr->type); + struct lnet_drop_rule *rule; unsigned int ptl = -1; bool drop = false; int cpt; @@ -373,12 +433,13 @@ struct lnet_drop_rule { cpt = lnet_net_lock_current(); list_for_each_entry(rule, &the_lnet.ln_drop_rules, dr_link) { - drop = drop_rule_match(rule, src, dst, typ, ptl); + drop = drop_rule_match(rule, src, dst, typ, ptl, + hstatus); if (drop) break; } 
- lnet_net_unlock(cpt); + return drop; } From patchwork Thu Feb 27 21:09:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409853 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CDB414BC for ; Thu, 27 Feb 2020 21:24:06 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 656A5246A0 for ; Thu, 27 Feb 2020 21:24:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 656A5246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 76DB021FF70; Thu, 27 Feb 2020 13:21:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 44E0521FB39 for ; Thu, 27 Feb 2020 13:18:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 626A11021; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5F94646D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:25 -0500 Message-Id: <1582838290-17243-98-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 097/622] lustre: ptlrpc: replace simple_strtol with kstrtol X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Eventually simple_strtol() will be removed so replace its use in the ptlrpc with kstrtoXXX() class of functions. WC-bug-id: https://jira.whamcloud.com/browse/LU-9325 Lustre-commit: 8f37d64b6bc9 ("LU-9325 ptlrpc: replace simple_strtol with kstrtol") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/32785 Reviewed-by: Andreas Dilger Reviewed-by: Nikitas Angelinas Signed-off-by: James Simmons --- fs/lustre/ptlrpc/lproc_ptlrpc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index 6af3384..eb0ecc0 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -1303,13 +1303,13 @@ int lprocfs_wr_import(struct file *file, const char __user *buffer, ptr = strstr(uuid, "::"); if (ptr) { u32 inst; - char *endptr; + int rc; *ptr = 0; do_reconn = 0; ptr += strlen("::"); - inst = simple_strtoul(ptr, &endptr, 10); - if (*endptr) { + rc = kstrtouint(ptr, 10, &inst); + if (rc) { CERROR("config: wrong instance # %s\n", ptr); } else if (inst != imp->imp_connect_data.ocd_instance) { CDEBUG(D_INFO, From patchwork Thu Feb 27 21:09:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410011 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2C9881580 for ; Thu, 27 Feb 2020 21:27:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15303246A0 for ; Thu, 27 Feb 2020 21:27:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15303246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B9D8349218; Thu, 27 Feb 2020 13:24:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88C7121FAD5 for ; Thu, 27 Feb 2020 13:18:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 63EBC1022; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 62A2446A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:26 -0500 Message-Id: <1582838290-17243-99-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 098/622] lustre: obd: use correct ip_compute_csum() version X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software 
development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The Linux kernel provides a generic, platform-independent version of ip_compute_csum() as well as platform-optimized versions. Some platforms disable the generic version in favor of the optimized one. If the generic version is disabled and the checksum.h header from asm-generic is used, we end up with an undefined symbol error when loading the obdclass module. The solution is to use the platform-specific checksum.h header, which selects the generic or optimized version for us. As a bonus, we get better performance with the right kernel configuration. WC-bug-id: https://jira.whamcloud.com/browse/LU-11224 Lustre-commit: 82fe90a1d07d ("LU-11224 obd: use correct ip_compute_csum() version") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/32953 Reviewed-by: Li Xi Reviewed-by: Li Dongyang Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/obdclass/integrity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c index 8348b16..5cb9a25 100644 --- a/fs/lustre/obdclass/integrity.c +++ b/fs/lustre/obdclass/integrity.c @@ -28,7 +28,7 @@ */ #include #include -#include +#include #include #include From patchwork Thu Feb 27 21:09:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410259 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03A08138D for ; Thu, 27 Feb 2020 21:33:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E02FA24677 for ; Thu, 27 Feb 2020 21:33:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E02FA24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 38C71349267; Thu, 27 Feb 2020 13:28:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CCEF821FAD5 for ; Thu, 27 Feb 2020 13:18:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 66F6D1023; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6572E46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:27 -0500 Message-Id: <1582838290-17243-100-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 099/622] lustre: osc: serialize access to idle_timeout vs cleanup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev use lprocfs_climp_check() and up_read() as cl_import can disappear due to umount. WC-bug-id: https://jira.whamcloud.com/browse/LU-11175 Lustre-commit: 5874da0b670b ("LU-11175 osc: serialize access to idle_timeout vs cleanup") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/32883 Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 0a12079..efb4998 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -604,8 +604,15 @@ static ssize_t idle_timeout_show(struct kobject *kobj, struct attribute *attr, struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); struct client_obd *cli = &obd->u.cli; + int ret; - return sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout); + ret = lprocfs_climp_check(obd); + if (ret) + return ret; + ret = sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout); + up_read(&obd->u.cli.cl_sem); + + return ret; } static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, @@ -625,6 +632,10 @@ static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, if (val > CONNECTION_SWITCH_MAX) return -ERANGE; + rc = lprocfs_climp_check(obd); + if (rc) + return rc; + cli->cl_import->imp_idle_timeout = val; /* to initiate the connection if it's in IDLE state */ @@ -633,6 +644,7 @@ static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, if (req) ptlrpc_req_finished(req); } + up_read(&obd->u.cli.cl_sem); return count; } @@ -645,12 +657,18 @@ static ssize_t idle_connect_store(struct kobject 
*kobj, struct attribute *attr, obd_kset.kobj); struct client_obd *cli = &dev->u.cli; struct ptlrpc_request *req; + int rc; + + rc = lprocfs_climp_check(dev); + if (rc) + return rc; /* to initiate the connection if it's in IDLE state */ req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_STATFS); if (req) ptlrpc_req_finished(req); ptlrpc_pinger_force(cli->cl_import); + up_read(&dev->u.cli.cl_sem); return count; } From patchwork Thu Feb 27 21:09:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409857 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24AB014BC for ; Thu, 27 Feb 2020 21:24:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D0B7246A0 for ; Thu, 27 Feb 2020 21:24:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D0B7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 01A8E348B9C; Thu, 27 Feb 2020 13:22:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1A0C421FB4B for ; Thu, 27 Feb 2020 13:18:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6E16E1024; Thu, 27 Feb 2020 
16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6C584468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:28 -0500 Message-Id: <1582838290-17243-101-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 100/622] lustre: mdc: remove obsolete intent opcodes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In enum ldlm_intent_flags, remove the obsolete constants IT_UNLINK, IT_TRUNC, IT_EXEC, IT_PIN, IT_SETXATTR. Remove any handling code for these opcodes. WC-bug-id: https://jira.whamcloud.com/browse/LU-11014 Lustre-commit: 511ea5850f25 ("LU-11014 mdc: remove obsolete intent opcodes") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/32361 Reviewed-by: Fan Yong Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 1 - fs/lustre/include/obd.h | 4 +--- fs/lustre/ldlm/ldlm_lock.c | 2 -- fs/lustre/mdc/mdc_locks.c | 44 +++------------------------------- fs/lustre/ptlrpc/layout.c | 15 ------------ include/uapi/linux/lustre/lustre_idl.h | 14 +++++------ 6 files changed, 11 insertions(+), 69 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 807d080..ed4fc42 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -203,7 +203,6 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_format RQF_LDLM_INTENT_GETATTR; extern struct req_format RQF_LDLM_INTENT_OPEN; extern struct req_format RQF_LDLM_INTENT_CREATE; -extern struct req_format RQF_LDLM_INTENT_UNLINK; extern struct req_format RQF_LDLM_INTENT_GETXATTR; extern struct req_format RQF_LDLM_CANCEL; extern struct req_format RQF_LDLM_CALLBACK; diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index de9642f..175a99f 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -700,8 +700,6 @@ static inline int it_to_lock_mode(struct lookup_intent *it) return LCK_PR; else if (it->it_op & IT_GETXATTR) return LCK_PR; - else if (it->it_op & IT_SETXATTR) - return LCK_PW; LASSERTF(0, "Invalid it_op: %d\n", it->it_op); return -EINVAL; @@ -730,7 +728,7 @@ enum md_cli_flags { */ static inline bool it_has_reply_body(const struct lookup_intent *it) { - return it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR); + return it->it_op & (IT_OPEN | IT_LOOKUP | IT_GETATTR); } struct md_op_data { diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 1bf387a..4f746ad 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -123,8 +123,6 @@ const char *ldlm_it2str(enum 
ldlm_intent_flags it) return "getattr"; case IT_LOOKUP: return "lookup"; - case IT_UNLINK: - return "unlink"; case IT_GETXATTR: return "getxattr"; case IT_LAYOUT: diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index abbc908..80f2e10 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -430,42 +430,6 @@ static int mdc_save_lovea(struct ptlrpc_request *req, return req; } -static struct ptlrpc_request *mdc_intent_unlink_pack(struct obd_export *exp, - struct lookup_intent *it, - struct md_op_data *op_data) -{ - struct ptlrpc_request *req; - struct obd_device *obddev = class_exp2obd(exp); - struct ldlm_intent *lit; - int rc; - - req = ptlrpc_request_alloc(class_exp2cliimp(exp), - &RQF_LDLM_INTENT_UNLINK); - if (!req) - return ERR_PTR(-ENOMEM); - - req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, - op_data->op_namelen + 1); - - rc = ldlm_prep_enqueue_req(exp, req, NULL, 0); - if (rc) { - ptlrpc_request_free(req); - return ERR_PTR(rc); - } - - /* pack the intent */ - lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT); - lit->opc = (u64)it->it_op; - - /* pack the intended request */ - mdc_unlink_pack(req, op_data); - - req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, - obddev->u.cli.cl_default_mds_easize); - ptlrpc_request_set_replen(req); - return req; -} - static struct ptlrpc_request * mdc_intent_getattr_pack(struct obd_export *exp, struct lookup_intent *it, struct md_op_data *op_data, u32 acl_bufsize) @@ -820,18 +784,18 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, LASSERT(!policy); saved_flags |= LDLM_FL_HAS_INTENT; - if (it->it_op & (IT_UNLINK | IT_GETATTR | IT_READDIR)) + if (it->it_op & (IT_GETATTR | IT_READDIR)) policy = &update_policy; else if (it->it_op & IT_LAYOUT) policy = &layout_policy; - else if (it->it_op & (IT_GETXATTR | IT_SETXATTR)) + else if (it->it_op & IT_GETXATTR) policy = &getxattr_policy; else policy = &lookup_policy; } generation = 
obddev->u.cli.cl_import->imp_generation; - if (!it || (it->it_op & (IT_CREAT | IT_OPEN_CREAT))) + if (!it || (it->it_op & (IT_OPEN | IT_CREAT))) acl_bufsize = imp->imp_connect_data.ocd_max_easize; else acl_bufsize = LUSTRE_POSIX_ACL_MAX_SIZE_OLD; @@ -845,8 +809,6 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, res_id.name[3] = LDLM_FLOCK; } else if (it->it_op & IT_OPEN) { req = mdc_intent_open_pack(exp, it, op_data, acl_bufsize); - } else if (it->it_op & IT_UNLINK) { - req = mdc_intent_unlink_pack(exp, it, op_data); } else if (it->it_op & (IT_GETATTR | IT_LOOKUP)) { req = mdc_intent_getattr_pack(exp, it, op_data, acl_bufsize); } else if (it->it_op & IT_READDIR) { diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index ae573a2..70344b9 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -462,15 +462,6 @@ &RMF_FILE_SECCTX }; -static const struct req_msg_field *ldlm_intent_unlink_client[] = { - &RMF_PTLRPC_BODY, - &RMF_DLM_REQ, - &RMF_LDLM_INTENT, - &RMF_REC_REINT, /* coincides with mds_reint_unlink_client[] */ - &RMF_CAPA1, - &RMF_NAME -}; - static const struct req_msg_field *ldlm_intent_getxattr_client[] = { &RMF_PTLRPC_BODY, &RMF_DLM_REQ, @@ -756,7 +747,6 @@ &RQF_LDLM_INTENT_GETATTR, &RQF_LDLM_INTENT_OPEN, &RQF_LDLM_INTENT_CREATE, - &RQF_LDLM_INTENT_UNLINK, &RQF_LDLM_INTENT_GETXATTR, &RQF_LLOG_ORIGIN_HANDLE_CREATE, &RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK, @@ -1431,11 +1421,6 @@ struct req_format RQF_LDLM_INTENT_CREATE = ldlm_intent_create_client, ldlm_intent_getattr_server); EXPORT_SYMBOL(RQF_LDLM_INTENT_CREATE); -struct req_format RQF_LDLM_INTENT_UNLINK = - DEFINE_REQ_FMT0("LDLM_INTENT_UNLINK", - ldlm_intent_unlink_client, ldlm_intent_server); -EXPORT_SYMBOL(RQF_LDLM_INTENT_UNLINK); - struct req_format RQF_LDLM_INTENT_GETXATTR = DEFINE_REQ_FMT0("LDLM_INTENT_GETXATTR", ldlm_intent_getxattr_client, diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 
dc9872cf3..249a3d5 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2190,19 +2190,19 @@ struct ldlm_flock_wire { enum ldlm_intent_flags { IT_OPEN = 0x00000001, IT_CREAT = 0x00000002, - IT_OPEN_CREAT = 0x00000003, - IT_READDIR = 0x00000004, + IT_OPEN_CREAT = IT_OPEN | IT_CREAT, /* To allow case label. */ + IT_READDIR = 0x00000004, /* Used by mdc, not put on the wire. */ IT_GETATTR = 0x00000008, IT_LOOKUP = 0x00000010, - IT_UNLINK = 0x00000020, - IT_TRUNC = 0x00000040, +/* IT_UNLINK = 0x00000020, Obsolete. */ +/* IT_TRUNC = 0x00000040, Obsolete. */ IT_GETXATTR = 0x00000080, - IT_EXEC = 0x00000100, - IT_PIN = 0x00000200, +/* IT_EXEC = 0x00000100, Obsolete. */ +/* IT_PIN = 0x00000200, Obsolete. */ IT_LAYOUT = 0x00000400, IT_QUOTA_DQACQ = 0x00000800, IT_QUOTA_CONN = 0x00001000, - IT_SETXATTR = 0x00002000, +/* IT_SETXATTR = 0x00002000, Obsolete. */ IT_GLIMPSE = 0x00004000, IT_BRW = 0x00008000, }; From patchwork Thu Feb 27 21:09:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410265 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DEF2138D for ; Thu, 27 Feb 2020 21:33:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 66723246A1 for ; Thu, 27 Feb 2020 21:33:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 66723246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BB05821F964; Thu, 27 Feb 2020 13:28:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6F4C921FAFB for ; Thu, 27 Feb 2020 13:18:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6F8A81025; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6D3E446C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:29 -0500 Message-Id: <1582838290-17243-102-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 101/622] lustre: llite: fix setstripe for specific osts upon dir X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong The LOV_USER_MAGIC_SPECIFIC functionality is broken and was not available for directories. 1) llite doesn't handle the LOV_USER_MAGIC_SPECIFIC case properly for dir {set,get}_stripe, and the LL_IOC_LOV_SETSTRIPE ioctl did not allocate a large enough buffer to copy the OST list from userspace. 2) lod_get_default_lov_striping() did not handle the LOV_USER_MAGIC_SPECIFIC type, so newly created files/dirs did not inherit the parent setting properly.
3) There is no test case covering the lfs setstripe '-o' interface, which makes it hard to figure out when this function was broken. WC-bug-id: https://jira.whamcloud.com/browse/LU-11146 Lustre-commit: 083d62ee6de5 ("LU-11146 lustre: fix setstripe for specific osts upon dir") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/32814 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 71 ++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 56 insertions(+), 15 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 751d0183..06f7bd3 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -541,6 +541,21 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump, lum_size = sizeof(struct lmv_user_md); break; } + case LOV_USER_MAGIC_SPECIFIC: { + struct lov_user_md_v3 *v3 = + (struct lov_user_md_v3 *)lump; + if (v3->lmm_stripe_count > LOV_MAX_STRIPE_COUNT) + return -EINVAL; + if (lump->lmm_magic != + cpu_to_le32(LOV_USER_MAGIC_SPECIFIC)) { + lustre_swab_lov_user_md_v3(v3); + lustre_swab_lov_user_md_objects(v3->lmm_objects, + v3->lmm_stripe_count); + } + lum_size = lov_user_md_size(v3->lmm_stripe_count, + LOV_USER_MAGIC_SPECIFIC); + break; + } default: { CDEBUG(D_IOCTL, "bad userland LOV MAGIC: %#08x != %#08x nor %#08x\n", @@ -695,6 +710,16 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC) lustre_swab_lmv_user_md((struct lmv_user_md *)lmm); break; + case LOV_USER_MAGIC_SPECIFIC: { + struct lov_user_md_v3 *v3 = (struct lov_user_md_v3 *)lmm; + + if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) { + lustre_swab_lov_user_md_v3(v3); + lustre_swab_lov_user_md_objects(v3->lmm_objects, + v3->lmm_stripe_count); + } + } + break; default: CERROR("unknown magic: %lX\n", (unsigned long)lmm->lmm_magic); rc = -EPROTO; @@ -1230,35 +1255,51 @@ static
long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) } case LL_IOC_LOV_SETSTRIPE_NEW: case LL_IOC_LOV_SETSTRIPE: { - struct lov_user_md_v3 lumv3; - struct lov_user_md_v1 *lumv1 = (struct lov_user_md_v1 *)&lumv3; + struct lov_user_md_v3 *lumv3 = NULL; + struct lov_user_md_v1 lumv1; + struct lov_user_md_v1 *lumv1_ptr = &lumv1; struct lov_user_md_v1 __user *lumv1p = (void __user *)arg; struct lov_user_md_v3 __user *lumv3p = (void __user *)arg; + int lum_size; int set_default = 0; BUILD_BUG_ON(sizeof(struct lov_user_md_v3) <= sizeof(struct lov_comp_md_v1)); - BUILD_BUG_ON(sizeof(lumv3) != sizeof(*lumv3p)); - BUILD_BUG_ON(sizeof(lumv3.lmm_objects[0]) != - sizeof(lumv3p->lmm_objects[0])); + BUILD_BUG_ON(sizeof(*lumv3) != sizeof(*lumv3p)); /* first try with v1 which is smaller than v3 */ - if (copy_from_user(lumv1, lumv1p, sizeof(*lumv1))) + if (copy_from_user(&lumv1, lumv1p, sizeof(lumv1))) return -EFAULT; - if (lumv1->lmm_magic == LOV_USER_MAGIC_V3) { - if (copy_from_user(&lumv3, lumv3p, sizeof(lumv3))) - return -EFAULT; - if (lumv3.lmm_magic != LOV_USER_MAGIC_V3) - return -EINVAL; - } - if (is_root_inode(inode)) set_default = 1; - /* in v1 and v3 cases lumv1 points to data */ - rc = ll_dir_setstripe(inode, lumv1, set_default); + switch (lumv1.lmm_magic) { + case LOV_USER_MAGIC_V3: + case LOV_USER_MAGIC_SPECIFIC: + lum_size = ll_lov_user_md_size(&lumv1); + if (lum_size < 0) + return lum_size; + lumv3 = kzalloc(lum_size, GFP_NOFS); + if (!lumv3) + return -ENOMEM; + if (copy_from_user(lumv3, lumv3p, lum_size)) { + rc = -EFAULT; + goto out; + } + lumv1_ptr = (struct lov_user_md_v1 *)lumv3; + break; + case LOV_USER_MAGIC_V1: + break; + default: + rc = -ENOTSUPP; + goto out; + } + /* in v1 and v3 cases lumv1 points to data */ + rc = ll_dir_setstripe(inode, lumv1_ptr, set_default); +out: + kfree(lumv3); return rc; } case LL_IOC_LMV_GETSTRIPE: { From patchwork Thu Feb 27 21:09:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409861 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:30 -0500 Message-Id: <1582838290-17243-103-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH
102/622] lustre: osc: enable/disable OSC grant shrink X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam Add an OSC sysfs interface to enable/disable client's grant shrink feature. lctl get_param osc.*.grant_shrink lctl set_param osc.*.grant_shrink={0,1} WC-bug-id: https://jira.whamcloud.com/browse/LU-8708 Lustre-commit: 3e070e30a98d ("LU-8708 osc: enable/disable OSC grant shrink") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/23203 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index efb4998..16de266 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -674,6 +674,72 @@ static ssize_t idle_connect_store(struct kobject *kobj, struct attribute *attr, } LUSTRE_WO_ATTR(idle_connect); +static ssize_t grant_shrink_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + struct obd_device *obd = container_of(kobj, struct obd_device, + obd_kset.kobj); + struct client_obd *cli = &obd->u.cli; + struct obd_connect_data *ocd; + ssize_t len; + + len = lprocfs_climp_check(obd); + if (len) + return len; + + ocd = &cli->cl_import->imp_connect_data; + + len = snprintf(buf, PAGE_SIZE, "%d\n", + !!OCD_HAS_FLAG(ocd, GRANT_SHRINK)); + up_read(&obd->u.cli.cl_sem); + + return len; +} + +static ssize_t grant_shrink_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) +{ + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + struct 
client_obd *cli = &dev->u.cli; + struct obd_connect_data *ocd; + bool val; + int rc; + + if (!dev) + return 0; + + rc = kstrtobool(buffer, &val); + if (rc) + return rc; + + rc = lprocfs_climp_check(dev); + if (rc) + return rc; + + ocd = &cli->cl_import->imp_connect_data; + + if (!val) { + if (OCD_HAS_FLAG(ocd, GRANT_SHRINK)) + ocd->ocd_connect_flags &= ~OBD_CONNECT_GRANT_SHRINK; + } else { + /** + * server replied obd_connect_data is always bigger, so + * client's imp_connect_flags_orig are always supported + * by the server + */ + if (!OCD_HAS_FLAG(ocd, GRANT_SHRINK) && + cli->cl_import->imp_connect_flags_orig & + OBD_CONNECT_GRANT_SHRINK) + ocd->ocd_connect_flags |= OBD_CONNECT_GRANT_SHRINK; + } + + up_read(&dev->u.cli.cl_sem); + + return count; +} +LUSTRE_RW_ATTR(grant_shrink); + LPROC_SEQ_FOPS_RO_TYPE(osc, connect_flags); LPROC_SEQ_FOPS_RO_TYPE(osc, server_uuid); LPROC_SEQ_FOPS_RO_TYPE(osc, timeouts); @@ -889,6 +955,7 @@ void lproc_osc_attach_seqstat(struct obd_device *dev) &lustre_attr_ping.attr, &lustre_attr_idle_timeout.attr, &lustre_attr_idle_connect.attr, + &lustre_attr_grant_shrink.attr, NULL, }; From patchwork Thu Feb 27 21:09:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D6D7D14BC for ; Thu, 27 Feb 2020 21:24:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BF7E8246A0 for ; Thu, 27 Feb 2020 21:24:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BF7E8246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:31 -0500 Message-Id: <1582838290-17243-104-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 103/622] lustre: protocol: MDT as a statfs proxy Cc: Lustre Development List From: Alex Zhuravlev The MDT can act as a proxy for statfs data. This should make df faster (RTT vs RTT*(#MDTs+1)) and enable idling connections, so that clients don't connect to each OST just to report statfs data. The protocol has been changed slightly to let the MDT differentiate between its own and aggregated statfs.
also, obd_statfs has got a new field "granted" where OST reports how much space has been granted to the requesting MDT so that space can be added to available space. client's NID is used to distribute MDS_STATFS among MDTS. WC-bug-id: https://jira.whamcloud.com/browse/LU-10018 Lustre-commit: b500d5193360 ("LU-10018 protocol: MDT as a statfs proxy") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/29136 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 1 + fs/lustre/include/obd_class.h | 7 +++- fs/lustre/include/obd_support.h | 2 + fs/lustre/llite/llite_lib.c | 9 ++++- fs/lustre/lmv/lmv_obd.c | 65 ++++++++++++++++++++++++++------- fs/lustre/mdc/mdc_request.c | 13 +++++++ fs/lustre/ptlrpc/layout.c | 2 +- fs/lustre/ptlrpc/pack_generic.c | 2 +- fs/lustre/ptlrpc/wiretest.c | 8 ++-- include/uapi/linux/lustre/lustre_idl.h | 3 +- include/uapi/linux/lustre/lustre_user.h | 7 ++-- 11 files changed, 92 insertions(+), 27 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 175a99f..9286755 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -442,6 +442,7 @@ struct lmv_obd { u32 tgts_size; /* size of tgts array */ struct lmv_tgt_desc **tgts; + int lmv_statfs_start; struct obd_connect_data conn_data; struct kobject *lmv_tgts_kobj; diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 0153c50..a3ef5d5 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -47,6 +47,8 @@ #define OBD_STATFS_FROM_CACHE 0x0002 /* the statfs is only for retrieving information from MDT0 */ #define OBD_STATFS_FOR_MDT0 0x0004 +/* get aggregated statfs from MDT */ +#define OBD_STATFS_SUM 0x0008 /* OBD Device Declarations */ extern rwlock_t obd_dev_lock; @@ -947,7 +949,10 @@ static inline int obd_statfs(const struct lu_env *env, struct obd_export *exp, CDEBUG(D_SUPER, "osfs %lld, max_age %lld\n", 
obd->obd_osfs_age, max_age); - if (obd->obd_osfs_age < max_age) { + /* ignore cache if aggregated isn't expected */ + if (obd->obd_osfs_age < max_age || + ((obd->obd_osfs.os_state & OS_STATE_SUM) && + !(flags & OBD_STATFS_SUM))) { rc = OBP(obd, statfs)(env, exp, osfs, max_age, flags); if (rc == 0) { spin_lock(&obd->obd_osfs_lock); diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 28becfa..3d14723 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -137,7 +137,9 @@ #define OBD_FAIL_MDS_GET_ROOT_NET 0x11b #define OBD_FAIL_MDS_GET_ROOT_PACK 0x11c #define OBD_FAIL_MDS_STATFS_PACK 0x11d +#define OBD_FAIL_MDS_STATFS_SUM_PACK 0x11d #define OBD_FAIL_MDS_STATFS_NET 0x11e +#define OBD_FAIL_MDS_STATFS_SUM_NET 0x11e #define OBD_FAIL_MDS_GETATTR_NAME_NET 0x11f #define OBD_FAIL_MDS_PIN_NET 0x120 #define OBD_FAIL_MDS_UNPIN_NET 0x121 diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index c04146f..8b3e2a3 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -211,7 +211,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_connect_flags2 = OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT | - OBD_CONNECT2_DIR_MIGRATE; + OBD_CONNECT2_DIR_MIGRATE | + OBD_CONNECT2_SUM_STATFS; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; @@ -1751,6 +1752,9 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, osfs->os_bavail, osfs->os_blocks, osfs->os_ffree, osfs->os_files); + if (osfs->os_state & OS_STATE_SUM) + goto out; + if (sbi->ll_flags & LL_SBI_LAZYSTATFS) flags |= OBD_STATFS_NODELAY; @@ -1779,6 +1783,7 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, osfs->os_ffree = obd_osfs.os_ffree; } +out: return rc; } @@ -1793,7 +1798,7 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STAFS, 1); /* Some amount of 
caching on the client is allowed */ - rc = ll_statfs_internal(ll_s2sbi(sb), &osfs, 0); + rc = ll_statfs_internal(ll_s2sbi(sb), &osfs, OBD_STATFS_SUM); if (rc) return rc; diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index c7bf8c7..90a46c4 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1325,6 +1325,33 @@ static int lmv_process_config(struct obd_device *obd, u32 len, void *buf) return rc; } +static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags) +{ + int i; + + if (flags & OBD_STATFS_FOR_MDT0) + return 0; + + if (lmv->lmv_statfs_start || lmv->desc.ld_tgt_count == 1) + return lmv->lmv_statfs_start; + + /* choose initial MDT for this client */ + for (i = 0;; i++) { + struct lnet_process_id lnet_id; + + if (LNetGetId(i, &lnet_id) == -ENOENT) + break; + + if (LNET_NETTYP(LNET_NIDNET(lnet_id.nid)) != LOLND) { + lmv->lmv_statfs_start = + lnet_id.nid % lmv->desc.ld_tgt_count; + break; + } + } + + return lmv->lmv_statfs_start; +} + static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, struct obd_statfs *osfs, time64_t max_age, u32 flags) { @@ -1332,41 +1359,51 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, struct lmv_obd *lmv = &obd->u.lmv; struct obd_statfs *temp; int rc = 0; - u32 i; + u32 i, idx; temp = kzalloc(sizeof(*temp), GFP_NOFS); if (!temp) return -ENOMEM; - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - if (!lmv->tgts[i] || !lmv->tgts[i]->ltd_exp) + /* distribute statfs among MDTs */ + idx = lmv_select_statfs_mdt(lmv, flags); + + for (i = 0; i < lmv->desc.ld_tgt_count; i++, idx++) { + idx = idx % lmv->desc.ld_tgt_count; + if (!lmv->tgts[idx] || !lmv->tgts[idx]->ltd_exp) continue; - rc = obd_statfs(env, lmv->tgts[i]->ltd_exp, temp, + rc = obd_statfs(env, lmv->tgts[idx]->ltd_exp, temp, max_age, flags); if (rc) { CERROR("can't stat MDS #%d (%s), error %d\n", i, - lmv->tgts[i]->ltd_exp->exp_obd->obd_name, + lmv->tgts[idx]->ltd_exp->exp_obd->obd_name, rc); goto 
out_free_temp; } + if (temp->os_state & OS_STATE_SUM || + flags == OBD_STATFS_FOR_MDT0) { + /* Reset to the last aggregated values + * and don't sum with non-aggregated data. + * If the statfs is from mount, it needs to retrieve + * necessary information from MDT0. i.e. mount does + * not need the merged osfs from all of MDT. Also + * clients can be mounted as long as MDT0 is in + * service + */ + *osfs = *temp; + break; + } + if (i == 0) { *osfs = *temp; - /* If the statfs is from mount, it will needs - * retrieve necessary information from MDT0. - * i.e. mount does not need the merged osfs - * from all of MDT. - * And also clients can be mounted as long as - * MDT0 is in service - */ - if (flags & OBD_STATFS_FOR_MDT0) - goto out_free_temp; } else { osfs->os_bavail += temp->os_bavail; osfs->os_blocks += temp->os_blocks; osfs->os_ffree += temp->os_ffree; osfs->os_files += temp->os_files; + osfs->os_granted += temp->os_granted; } } diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index b173937..3341761 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1495,6 +1495,19 @@ static int mdc_statfs(const struct lu_env *env, goto output; } + if ((flags & OBD_STATFS_SUM) && + (exp_connect_flags2(exp) & OBD_CONNECT2_SUM_STATFS)) { + /* request aggregated states */ + struct mdt_body *body; + + body = req_capsule_client_get(&req->rq_pill, &RMF_MDT_BODY); + if (!body) { + rc = -EPROTO; + goto out; + } + body->mbo_valid = OBD_MD_FLAGSTATFS; + } + ptlrpc_request_set_replen(req); if (flags & OBD_STATFS_NODELAY) { diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 70344b9..225a73e 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -1252,7 +1252,7 @@ struct req_format RQF_MDS_GET_ROOT = EXPORT_SYMBOL(RQF_MDS_GET_ROOT); struct req_format RQF_MDS_STATFS = - DEFINE_REQ_FMT0("MDS_STATFS", empty, obd_statfs_server); + DEFINE_REQ_FMT0("MDS_STATFS", mdt_body_only, obd_statfs_server);
EXPORT_SYMBOL(RQF_MDS_STATFS); struct req_format RQF_MDS_SYNC = diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index d09cf3f..e71f79d 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1645,7 +1645,7 @@ void lustre_swab_obd_statfs(struct obd_statfs *os) __swab32s(&os->os_state); __swab32s(&os->os_fprecreated); BUILD_BUG_ON(offsetof(typeof(*os), os_fprecreated) == 0); - BUILD_BUG_ON(offsetof(typeof(*os), os_spare2) == 0); + __swab32s(&os->os_granted); BUILD_BUG_ON(offsetof(typeof(*os), os_spare3) == 0); BUILD_BUG_ON(offsetof(typeof(*os), os_spare4) == 0); BUILD_BUG_ON(offsetof(typeof(*os), os_spare5) == 0); diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 1afbb41..30083c2 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1696,10 +1696,10 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct obd_statfs, os_fprecreated)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_fprecreated) == 4, "found %lld\n", (long long)(int)sizeof(((struct obd_statfs *)0)->os_fprecreated)); - LASSERTF((int)offsetof(struct obd_statfs, os_spare2) == 112, "found %lld\n", - (long long)(int)offsetof(struct obd_statfs, os_spare2)); - LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_spare2) == 4, "found %lld\n", - (long long)(int)sizeof(((struct obd_statfs *)0)->os_spare2)); + LASSERTF((int)offsetof(struct obd_statfs, os_granted) == 112, "found %lld\n", + (long long)(int)offsetof(struct obd_statfs, os_granted)); + LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_granted) == 4, "found %lld\n", + (long long)(int)sizeof(((struct obd_statfs *)0)->os_granted)); LASSERTF((int)offsetof(struct obd_statfs, os_spare3) == 116, "found %lld\n", (long long)(int)offsetof(struct obd_statfs, os_spare3)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_spare3) == 4, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h 
b/include/uapi/linux/lustre/lustre_idl.h index 249a3d5..c65663a 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -793,6 +793,7 @@ struct ptlrpc_body_v2 { */ #define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir */ +#define OBD_CONNECT2_SUM_STATFS 0x8ULL /* MDT return aggregated stats */ #define OBD_CONNECT2_FLR 0x20ULL /* FLR support */ #define OBD_CONNECT2_WBC_INTENTS 0x40ULL /* create/unlink/... intents * for wbc, also operations @@ -1167,7 +1168,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) #define OBD_MD_FLXATTRLS (0x0000002000000000ULL) /* xattr list */ #define OBD_MD_FLXATTRRM (0x0000004000000000ULL) /* xattr remove */ #define OBD_MD_FLACL (0x0000008000000000ULL) /* ACL */ -/* OBD_MD_FLRMTPERM (0x0000010000000000ULL) remote perm, obsolete */ +#define OBD_MD_FLAGSTATFS (0x0000010000000000ULL) /* aggregated statfs */ #define OBD_MD_FLMDSCAPA (0x0000020000000000ULL) /* MDS capability */ #define OBD_MD_FLOSSCAPA (0x0000040000000000ULL) /* OSS capability */ /* OBD_MD_FLCKSPLIT (0x0000080000000000ULL) obsolete 2.3.58*/ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 421c977..f25bb9b 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -104,6 +104,7 @@ enum obd_statfs_state { OS_STATE_NOPRECREATE = 0x00000004, /**< no object precreation */ OS_STATE_ENOSPC = 0x00000020, /**< not enough free space */ OS_STATE_ENOINO = 0x00000040, /**< not enough inodes */ + OS_STATE_SUM = 0x00000100, /**< aggregated for all targets */ }; struct obd_statfs { @@ -121,9 +122,9 @@ struct obd_statfs { __u32 os_fprecreated; /* objs available now to the caller * used in QoS code to find preferred OSTs */ - __u32 os_spare2; /* Unused padding fields.
Remember */ - __u32 os_spare3; /* to fix lustre_swab_obd_statfs() */ - __u32 os_spare4; + __u32 os_granted; /* space granted for MDS */ + __u32 os_spare3; /* Unused padding fields. Remember */ + __u32 os_spare4; /* to fix lustre_swab_obd_statfs() */ __u32 os_spare5; __u32 os_spare6; __u32 os_spare7; From patchwork Thu Feb 27 21:09:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410233 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C426138D for ; Thu, 27 Feb 2020 21:33:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E92B9246A1 for ; Thu, 27 Feb 2020 21:33:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E92B9246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 32EC93488D0; Thu, 27 Feb 2020 13:28:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69AE721FB55 for ; Thu, 27 Feb 2020 13:18:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 778241029; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7564046F; Thu, 27 Feb 2020 
16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:32 -0500 Message-Id: <1582838290-17243-105-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 104/622] lustre: ldlm: correct logic in ldlm_prepare_lru_list() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In ldlm_prepare_lru_list() fix an (x != a || x != b) type error and correct a use after free. WC-bug-id: https://jira.whamcloud.com/browse/LU-11075 Lustre-commit: aecafb57d5b6 ("LU-11075 ldlm: correct logic in ldlm_prepare_lru_list()") Signed-off-by: John L. Hammond Reviewed-on: https://review.whamcloud.com/32660 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index bc441f0..f045d30 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1643,7 +1643,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, /* No locks which got blocking requests. 
*/ LASSERT(!ldlm_is_bl_ast(lock)); - if (!ldlm_is_canceling(lock) || + if (!ldlm_is_canceling(lock) && !ldlm_is_converting(lock)) break; @@ -1686,7 +1686,6 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, if (result == LDLM_POLICY_SKIP_LOCK) { lu_ref_del(&lock->l_reference, __func__, current); - LDLM_LOCK_RELEASE(lock); if (no_wait) { spin_lock(&ns->ns_lock); if (!list_empty(&lock->l_lru) && @@ -1694,6 +1693,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, ns->ns_last_pos = &lock->l_lru; spin_unlock(&ns->ns_lock); } + + LDLM_LOCK_RELEASE(lock); continue; } From patchwork Thu Feb 27 21:09:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409855 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E050E138D for ; Thu, 27 Feb 2020 21:24:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C8E25246A0 for ; Thu, 27 Feb 2020 21:24:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C8E25246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6810E348B7F; Thu, 27 Feb 2020 13:21:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AAA8821F9C5 for ; Thu, 
27 Feb 2020 13:18:48 -0800 (PST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:33 -0500 Message-Id: <1582838290-17243-106-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 105/622] lustre: llite: check truncate race for DOM pages Cc: Mikhail Pershin , Lustre Development List From: Mikhail Pershin In ll_dom_finish_open(), check that the vmpage mapping still exists after locking the page, and exit otherwise. The mapping can be gone if the page has been truncated concurrently.
WC-bug-id: https://jira.whamcloud.com/browse/LU-11275 Lustre-commit: 0f7d7b200b58 ("LU-11275 llite: check truncate race for DOM pages") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33087 Reviewed-by: Oleg Drokin Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 68fb623..ae39b2c 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -496,6 +496,13 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, break; } lock_page(vmpage); + if (!vmpage->mapping) { + unlock_page(vmpage); + put_page(vmpage); + /* page was truncated */ + rc = -ENODATA; + goto out_io; + } clp = cl_page_find(env, obj, vmpage->index, vmpage, CPT_CACHEABLE); if (IS_ERR(clp)) { From patchwork Thu Feb 27 21:09:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409859 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36D8414BC for ; Thu, 27 Feb 2020 21:24:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1F860246A0 for ; Thu, 27 Feb 2020 21:24:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1F860246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EB8B421FF75; Thu, 27 Feb 2020 13:22:02 -0800 
(PST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:34 -0500 Message-Id: <1582838290-17243-107-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 106/622] lnet: lnd: conditionally set health status Cc: Amir Shehata , Lustre Development List From: Amir Shehata For specific error scenarios a more accurate health status is set per transmit.
These shouldn't be overwritten in kiblnd_txlist_done() WC-bug-id: https://jira.whamcloud.com/browse/LU-11271 Lustre-commit: cf3cc2c72e6e ("LU-11271 lnd: conditionally set health status") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33042 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 5680f2a..68ab7d5 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -110,7 +110,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, /* complete now */ tx->tx_waiting = 0; tx->tx_status = status; - tx->tx_hstatus = hstatus; + if (hstatus != LNET_MSG_STATUS_OK) + tx->tx_hstatus = hstatus; kiblnd_tx_done(tx); } } @@ -2108,9 +2109,11 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, spin_unlock(&conn->ibc_lock); /* aborting transmits occurs when finalizing the connection. - * The connection is finalized on error + * The connection is finalized on error. + * Passing LNET_MSG_STATUS_OK to txlist_done() will not + * override the value already set in tx->tx_hstatus above. 
*/ - kiblnd_txlist_done(&zombies, -ECONNABORTED, -1); + kiblnd_txlist_done(&zombies, -ECONNABORTED, LNET_MSG_STATUS_OK); } static void From patchwork Thu Feb 27 21:09:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409863 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E7E1014BC for ; Thu, 27 Feb 2020 21:24:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D0600246A0 for ; Thu, 27 Feb 2020 21:24:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D0600246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 735DF348BD8; Thu, 27 Feb 2020 13:22:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3B90F21FA6E for ; Thu, 27 Feb 2020 13:18:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7F085102F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7DEA746A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:35 -0500 Message-Id: 
<1582838290-17243-108-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 107/622] lnet: router handling X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Re-create the md and mdh if the router checker ping times out. When re-transmitting a message do so even if the peer is marked down to fulfill the message's retry quota. WC-bug-id: https://jira.whamcloud.com/browse/LU-11272 Lustre-commit: 05becd69bc0c ("LU-11272 lnet: router handling") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33043 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 12 ++++++++++-- net/lnet/lnet/router.c | 8 +++++++- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index eb0b48d..3cab970 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -678,7 +678,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * may drop the lnet_net_lock */ static int -lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lp) +lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lp, + struct lnet_msg *msg) { time64_t now = ktime_get_seconds(); @@ -689,6 +690,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return 1; /* + * If we're resending a message, let's attempt to send it even if + * the peer is down to fulfill our resend quota on the 
message + */ + if (msg->msg_retry_count > 0) + return 1; + + /* * Peer appears dead, but we should avoid frequent NI queries (at * most once per lnet_queryinterval seconds). */ @@ -746,7 +754,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, /* NB 'lp' is always the next hop */ if (!(msg->msg_target.pid & LNET_PID_USERFLAG) && - !lnet_peer_alive_locked(ni, lp)) { + !lnet_peer_alive_locked(ni, lp, msg)) { the_lnet.ln_counters[cpt]->drop_count++; the_lnet.ln_counters[cpt]->drop_length += msg->msg_len; lnet_net_unlock(cpt); diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 7c3bbd8..66a116c 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1042,7 +1042,13 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) } rcd = rtr->lpni_rcd; - if (!rcd || rcd->rcd_nnis > rcd->rcd_pingbuffer->pb_nnis) + + /* The response to the router checker ping could've timed out and + * the mdh might've been invalidated, so we need to update it + * again. 
+ */ + if (!rcd || rcd->rcd_nnis > rcd->rcd_pingbuffer->pb_nnis || + LNetMDHandleIsInvalid(rcd->rcd_mdh)) rcd = lnet_update_rc_data_locked(rtr); if (!rcd) return; From patchwork Thu Feb 27 21:09:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409865 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A815914BC for ; Thu, 27 Feb 2020 21:24:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 90D24246A0 for ; Thu, 27 Feb 2020 21:24:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 90D24246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B1B5F348C00; Thu, 27 Feb 2020 13:22:08 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F20721FA6E for ; Thu, 27 Feb 2020 13:18:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 826E41030; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 80CA746C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:36 -0500 Message-Id: 
<1582838290-17243-109-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 108/622] lustre: obd: check '-o network' and peer discovery conflict X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson The "-o network=net" client mount option is not taken into account when LNet dynamic peer discovery is active. Check whether LNet dynamic peer discovery is active on the local node and, if it is, return an error when the "-o network=net" option is specified. This patch will have to be reverted once the incompatibility between the "-o network=net" client mount option and LNet dynamic peer discovery is resolved.
WC-bug-id: https://jira.whamcloud.com/browse/LU-11057 Lustre-commit: 2269d27e07cb ("LU-11057 obd: check '-o network' and peer discovery conflict") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/32562 Reviewed-by: Andreas Dilger Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obd_mount.c | 7 +++++++ include/linux/lnet/api.h | 1 + net/lnet/lnet/api-ni.c | 13 +++++++++++++ 3 files changed, 21 insertions(+) diff --git a/fs/lustre/obdclass/obd_mount.c b/fs/lustre/obdclass/obd_mount.c index 5cf404c..d143112 100644 --- a/fs/lustre/obdclass/obd_mount.c +++ b/fs/lustre/obdclass/obd_mount.c @@ -1169,6 +1169,13 @@ int lmd_parse(char *options, struct lustre_mount_data *lmd) rc = lmd_parse_network(lmd, s1 + 8); if (rc) goto invalid; + + /* check if LNet dynamic peer discovery is activated */ + if (LNetGetPeerDiscoveryStatus()) { + CERROR("LNet Dynamic Peer Discovery is enabled on this node. 'network' mount option cannot be taken into account.\n"); + goto invalid; + } + clear++; } diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h index a57ecc8..4b152c8 100644 --- a/include/linux/lnet/api.h +++ b/include/linux/lnet/api.h @@ -207,6 +207,7 @@ int LNetGet(lnet_nid_t self, int LNetClearLazyPortal(int portal); int LNetCtl(unsigned int cmd, void *arg); void LNetDebugPeer(struct lnet_process_id id); +int LNetGetPeerDiscoveryStatus(void); /** @} lnet_misc */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 07bc29f..c81f46f 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -4038,3 +4038,16 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, kfree(buf); return rc; } + +/** + * Retrieve peer discovery status. 
+ * + * Return 1 if lnet_peer_discovery_disabled is 0 + * 0 if lnet_peer_discovery_disabled is 1 + */ +int +LNetGetPeerDiscoveryStatus(void) +{ + return !lnet_peer_discovery_disabled; +} +EXPORT_SYMBOL(LNetGetPeerDiscoveryStatus); From patchwork Thu Feb 27 21:09:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409869 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F1BA138D for ; Thu, 27 Feb 2020 21:24:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8795B246A0 for ; Thu, 27 Feb 2020 21:24:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8795B246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E897348C45; Thu, 27 Feb 2020 13:22:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C55B921FA6E for ; Thu, 27 Feb 2020 13:18:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 854771031; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 83AD946D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , 
NeilBrown Date: Thu, 27 Feb 2020 16:09:37 -0500 Message-Id: <1582838290-17243-110-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 109/622] lnet: update logging X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add the retry count when logging message sending/resending. Make timed out responses visible on net error. Log cases when a message is not resent WC-bug-id: https://jira.whamcloud.com/browse/LU-11273 Lustre-commit: b9523f474346 ("LU-11273 lnet: update logging") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33044 Reviewed-by: Olaf Weber Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 13 +++++++------ net/lnet/lnet/lib-msg.c | 21 ++++++++++++++++++--- 2 files changed, 25 insertions(+), 9 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 3cab970..84a30e0 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1517,14 +1517,14 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, rc = lnet_post_send_locked(msg, 0); if (!rc) - CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s\n", + CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s try# %d\n", libcfs_nid2str(msg->msg_hdr.src_nid), libcfs_nid2str(msg->msg_txni->ni_nid), libcfs_nid2str(sd->sd_src_nid), libcfs_nid2str(msg->msg_hdr.dest_nid), libcfs_nid2str(sd->sd_dst_nid), 
libcfs_nid2str(msg->msg_txpeer->lpni_nid), - lnet_msgtyp2str(msg->msg_type)); + lnet_msgtyp2str(msg->msg_type), msg->msg_retry_count); return rc; } @@ -2515,8 +2515,7 @@ struct lnet_mt_event_info { list_del_init(&rspt->rspt_on_list); - CDEBUG(D_NET, - "Response timed out: md = %p\n", md); + CNETERR("Response timed out: md = %p\n", md); LNetMDUnlink(rspt->rspt_mdh); lnet_rspt_free(rspt, i); } else { @@ -2579,11 +2578,13 @@ struct lnet_mt_event_info { lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); - CDEBUG(D_NET, "resending %s->%s: %s recovery %d\n", + CDEBUG(D_NET, + "resending %s->%s: %s recovery %d try# %d\n", libcfs_nid2str(src_nid), libcfs_id2str(msg->msg_target), lnet_msgtyp2str(msg->msg_type), - msg->msg_recovery); + msg->msg_recovery, + msg->msg_retry_count); rc = lnet_send(src_nid, msg, LNET_NID_ANY); if (rc) { CERROR("Error sending %s to %s: %d\n", diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 5072238..9b52549 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -690,18 +690,33 @@ resend: /* don't resend recovery messages */ - if (msg->msg_recovery) + if (msg->msg_recovery) { + CDEBUG(D_NET, "msg %s->%s is a recovery ping. retry# %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); return -1; + } /* if we explicitly indicated we don't want to resend then just * return */ - if (msg->msg_no_resend) + if (msg->msg_no_resend) { + CDEBUG(D_NET, "msg %s->%s requested no resend. 
retry# %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); return -1; + } /* check if the message has exceeded the number of retries */ - if (msg->msg_retry_count >= lnet_retry_count) + if (msg->msg_retry_count >= lnet_retry_count) { + CNETERR("msg %s->%s exceeded retry count %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); return -1; + } msg->msg_retry_count++; lnet_net_lock(msg->msg_tx_cpt); From patchwork Thu Feb 27 21:09:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410015 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E16951580 for ; Thu, 27 Feb 2020 21:27:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C6C31246A0 for ; Thu, 27 Feb 2020 21:27:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C6C31246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 25636349241; Thu, 27 Feb 2020 13:24:30 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2991921FAF5 for ; Thu, 27 Feb 2020 13:18:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 87EE21032; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8677D46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:38 -0500 Message-Id: <1582838290-17243-111-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 110/622] lustre: ldlm: don't cancel DoM locks before replay X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Weigh DoM locks before lock replay, as is done for OSC EXTENT locks, and don't cancel locks with data. Add DoM replay tests for file creation and write cases.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10961 Lustre-commit: b44b1ff8c7fc ("LU-10961 ldlm: don't cancel DoM locks before replay") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/32791 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 1 + fs/lustre/mdc/mdc_request.c | 6 ++++++ fs/lustre/osc/osc_lock.c | 22 ++++++++++++++-------- 3 files changed, 21 insertions(+), 8 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 5ba4f97..dc8071a 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -714,6 +714,7 @@ void osc_lock_cancel(const struct lu_env *env, const struct cl_lock_slice *slice); void osc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice); int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data); +unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock); /**************************************************************************** * diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 3341761..0ee42dd 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -2510,6 +2510,12 @@ static int mdc_cancel_weight(struct ldlm_lock *lock) if (lock->l_policy_data.l_inodebits.bits & MDS_INODELOCK_OPEN) return 0; + /* Special case for DoM locks, cancel only unused and granted locks */ + if (ldlm_has_dom(lock) && + (lock->l_granted_mode != lock->l_req_mode || + osc_ldlm_weigh_ast(lock) != 0)) + return 0; + return 1; } diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index b7b33fb..1a2b0bd 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -608,8 +608,8 @@ static bool weigh_cb(const struct lu_env *env, struct cl_io *io, struct cl_page *page = ops->ops_cl.cpl_page; if (cl_page_is_vmlocked(env, page) || - PageDirty(page->cp_vmpage) || PageWriteback(page->cp_vmpage) - ) + 
PageDirty(page->cp_vmpage) || + PageWriteback(page->cp_vmpage)) return false; *(pgoff_t *)cbdata = osc_index(ops) + 1; @@ -618,7 +618,7 @@ static bool weigh_cb(const struct lu_env *env, struct cl_io *io, static unsigned long osc_lock_weight(const struct lu_env *env, struct osc_object *oscobj, - struct ldlm_extent *extent) + loff_t start, loff_t end) { struct cl_io *io = osc_env_thread_io(env); struct cl_object *obj = cl_object_top(&oscobj->oo_cl); @@ -631,11 +631,10 @@ static unsigned long osc_lock_weight(const struct lu_env *env, if (result != 0) return result; - page_index = cl_index(obj, extent->start); + page_index = cl_index(obj, start); if (!osc_page_gang_lookup(env, io, oscobj, - page_index, - cl_index(obj, extent->end), + page_index, cl_index(obj, end), weigh_cb, (void *)&page_index)) result = 1; cl_io_fini(env, io); @@ -668,7 +667,8 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) /* Mostly because lack of memory, do not eliminate this lock */ return 1; - LASSERT(dlmlock->l_resource->lr_type == LDLM_EXTENT); + LASSERT(dlmlock->l_resource->lr_type == LDLM_EXTENT || + ldlm_has_dom(dlmlock)); lock_res_and_lock(dlmlock); obj = dlmlock->l_ast_data; if (obj) @@ -695,7 +695,12 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) goto out; } - weight = osc_lock_weight(env, obj, &dlmlock->l_policy_data.l_extent); + if (ldlm_has_dom(dlmlock)) + weight = osc_lock_weight(env, obj, 0, OBD_OBJECT_EOF); + else + weight = osc_lock_weight(env, obj, + dlmlock->l_policy_data.l_extent.start, + dlmlock->l_policy_data.l_extent.end); out: if (obj) @@ -704,6 +709,7 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) cl_env_put(env, &refcheck); return weight; } +EXPORT_SYMBOL(osc_ldlm_weigh_ast); static void osc_lock_build_einfo(const struct lu_env *env, const struct cl_lock *lock, From patchwork Thu Feb 27 21:09:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James 
Simmons X-Patchwork-Id: 11409867 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CB245138D for ; Thu, 27 Feb 2020 21:24:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B3B2E246A0 for ; Thu, 27 Feb 2020 21:24:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B3B2E246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DFE0D348C35; Thu, 27 Feb 2020 13:22:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8015E21FA79 for ; Thu, 27 Feb 2020 13:18:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8AA961037; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 893FB468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:39 -0500 Message-Id: <1582838290-17243-112-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 111/622] lnet: lnd: Clean up logging X-BeenThere: 
lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata No need to output error in ksocknal_tx_done() as this error is tracked in lnet. No need to keep a cookie in the connection. It's always set to the message. This will allow us to set the msg's health status properly before calling lnet_finalize() WC-bug-id: https://jira.whamcloud.com/browse/LU-11309 Lustre-commit: cdf462b19345 ("LU-11309 lnd: Clean up logging") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33096 Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 5 ++++- net/lnet/klnds/socklnd/socklnd.h | 3 +-- net/lnet/klnds/socklnd/socklnd_cb.c | 10 +++++----- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 891d3bd..72ecf80 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1680,7 +1680,10 @@ struct ksock_peer * &conn->ksnc_ipaddr, conn->ksnc_port, iov_iter_count(&conn->ksnc_rx_to), conn->ksnc_rx_nob_left, ktime_get_seconds() - last_rcv); - lnet_finalize(conn->ksnc_cookie, -EIO); + if (conn->ksnc_lnet_msg) + conn->ksnc_lnet_msg->msg_health_status = + LNET_MSG_STATUS_REMOTE_ERROR; + lnet_finalize(conn->ksnc_lnet_msg, -EIO); break; case SOCKNAL_RX_LNET_HEADER: if (conn->ksnc_rx_started) diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 48884cf..c8d8acf 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -355,8 +355,7 @@ struct ksock_conn { u32 ksnc_rx_csum; /* partial checksum for 
incoming * data */ - void *ksnc_cookie; /* rx lnet_finalize passthru arg - */ + struct lnet_msg *ksnc_lnet_msg; /* rx lnet_finalize arg */ struct ksock_msg ksnc_msg; /* incoming message buffer: * V2.x message takes the * whole struct diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 057c7f3..10a1934 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -344,9 +344,6 @@ struct ksock_tx * ksocknal_free_tx(tx); if (lnetmsg) { /* KSOCK_MSG_NOOP go without lnetmsg */ - if (rc) - CERROR("tx failure rc = %d, hstatus = %d\n", rc, - hstatus); lnetmsg->msg_health_status = hstatus; lnet_finalize(lnetmsg, rc); } @@ -1266,7 +1263,10 @@ struct ksock_route * le64_to_cpu(lhdr->src_nid) != id->nid); } - lnet_finalize(conn->ksnc_cookie, rc); + if (rc && conn->ksnc_lnet_msg) + conn->ksnc_lnet_msg->msg_health_status = + LNET_MSG_STATUS_REMOTE_ERROR; + lnet_finalize(conn->ksnc_lnet_msg, rc); if (rc) { ksocknal_new_packet(conn, 0); @@ -1300,7 +1300,7 @@ struct ksock_route * LASSERT(iov_iter_count(to) <= rlen); LASSERT(to->nr_segs <= LNET_MAX_IOV); - conn->ksnc_cookie = msg; + conn->ksnc_lnet_msg = msg; conn->ksnc_rx_nob_left = rlen; conn->ksnc_rx_to = *to; From patchwork Thu Feb 27 21:09:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409807 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 731E5138D for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5B935246A1 for ; Thu, 27 Feb 2020 21:22:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 
5B935246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A423C348943; Thu, 27 Feb 2020 13:21:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D996B21FAEC for ; Thu, 27 Feb 2020 13:18:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8D3901038; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8C0D146A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:40 -0500 Message-Id: <1582838290-17243-113-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 112/622] lustre: mdt: revoke lease lock for truncate X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jian Yu A Lustre lease lock is usually used to protect file data against concurrent access; the open lock used on the MDT side serves this purpose. However, truncate changes file data but doesn't revoke the lease lock.
This patch fixes the issue by acquiring open sem, checking lease count and revoking lease if there exists any pending lease on the file. WC-bug-id: https://jira.whamcloud.com/browse/LU-10660 Lustre-commit: e4c168165df2 ("LU-10660 mdt: revoke lease lock for truncate") Signed-off-by: Jian Yu Reviewed-on: https://review.whamcloud.com/33093 Reviewed-by: Andreas Dilger Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 7 +++++++ include/uapi/linux/lustre/lustre_idl.h | 1 + 2 files changed, 8 insertions(+) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 8b3e2a3..37558a8 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1616,6 +1616,13 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, clear_bit(LLIF_DATA_MODIFIED, &lli->lli_flags); } + if (attr->ia_valid & ATTR_FILE) { + struct ll_file_data *fd = LUSTRE_FPRIVATE(attr->ia_file); + + if (fd->fd_lease_och) + op_data->op_bias |= MDS_TRUNC_KEEP_LEASE; + } + op_data->op_attr = *attr; op_data->op_xvalid = xvalid; diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index c65663a..7f857be 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1700,6 +1700,7 @@ enum mds_op_bias { MDS_CLOSE_LAYOUT_MERGE = 1 << 15, MDS_CLOSE_RESYNC_DONE = 1 << 16, MDS_CLOSE_LAYOUT_SPLIT = 1 << 17, + MDS_TRUNC_KEEP_LEASE = 1 << 18, }; #define MDS_CLOSE_INTENT (MDS_HSM_RELEASE | MDS_CLOSE_LAYOUT_SWAP | \ From patchwork Thu Feb 27 21:09:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409873 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C576E14BC for ; Thu, 27 Feb 2020 21:24:33 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ABF48246A0 for ; Thu, 27 Feb 2020 21:24:33 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ABF48246A0
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 76EA3348C82; Thu, 27 Feb 2020 13:22:16 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 26EBC21FA8C for ; Thu, 27 Feb 2020 13:18:51 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 906ED1039; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8EDD946C; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:09:41 -0500
Message-Id: <1582838290-17243-114-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 113/622] lustre: ptlrpc: race in AT early reply
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Hongchao Zhang , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Hongchao Zhang

In ptlrpc_at_check_timed(), the refcount of the request may already
have dropped to zero. ptlrpc_server_drop_request() can then continue
without holding "scp_at_lock" and free the request, writing
0x5a5a5a5a5a5a5a5a to its memory. The following
"atomic_inc_not_zero(&rq->rq_refcount)" then reads that nonzero value,
returns nonzero, and causes the freed request to be used in
ptlrpc_at_send_early_reply().

WC-bug-id: https://jira.whamcloud.com/browse/LU-11281
Lustre-commit: 48e409e65edd ("LU-11281 ptlrpc: race in AT early reply")
Signed-off-by: Hongchao Zhang
Reviewed-on: https://review.whamcloud.com/33071
Reviewed-by: Andreas Dilger
Reviewed-by: Lai Siyao
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/ptlrpc/service.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index cf920ae..a9155b2 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -1224,14 +1224,18 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt)
 				break;
 			}
 
-			ptlrpc_at_remove_timed(rq);
 			/**
 			 * ptlrpc_server_drop_request() may drop
 			 * refcount to 0 already. Let's check this and
 			 * don't add entry to work_list
 			 */
-			if (likely(atomic_inc_not_zero(&rq->rq_refcount)))
+			if (likely(atomic_inc_not_zero(&rq->rq_refcount))) {
+				ptlrpc_at_remove_timed(rq);
 				list_add(&rq->rq_timed_list, &work_list);
+			} else {
+				ptlrpc_at_remove_timed(rq);
+			}
+
 			counter++;
 		}
 

From patchwork Thu Feb 27 21:09:42 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409871
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D9A8414BC for ; Thu, 27 Feb 2020 21:24:32 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C21DA246A0 for ; Thu, 27 Feb 2020 21:24:32 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C21DA246A0
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 830B3348C76; Thu, 27 Feb 2020 13:22:15 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6957421FADE for ; Thu, 27 Feb 2020 13:18:51 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 934CA103B; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 91F3A47C; Thu, 27 Feb 2020 16:18:14
-0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:09:42 -0500
Message-Id: <1582838290-17243-115-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 114/622] lustre: migrate: migrate striped directory
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lai Siyao , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Lai Siyao

Migrate a striped directory in the following steps:
1. create the target object if needed: if the source is a directory, a
   target object is always created; otherwise, if the source is already
   located on the target MDT, or the source still has a link on the
   source MDT, creation is skipped.
   a) if the source is a directory, detach the source stripes and
      attach them to the target.
   b) migrate the source xattrs to the target.
   c) if the source is a regular file, update PFID to the target fid.
   d) update fid to the target for all links of the source.
2. update the namespace
   a) migrate the dirent from the source parent to the target parent.
   b) update the linkea parent fid to the target parent.
   c) destroy the source object.

This implementation improves on the following points:
1. all involved objects are locked to avoid races.
2. directory migration doesn't migrate its dir entries; instead, each
   entry is migrated as part of the corresponding sub file migration.
   This avoids timeouts when migrating the dir entries of a large
   directory, and also avoids touching dir entries without a lock.
3. a file or directory is migrated in one transaction, so migrate
   recovery is the same as for other operations.
4. a directory under migration can be accessed (and modified) like a
   normal directory.
5. if migration of sub files under a directory fails, the user can
   redo the migrate to finish migration of this directory.

WC-bug-id: https://jira.whamcloud.com/browse/LU-4684
Lustre-commit: 169738e30a7e ("LU-4684 migrate: migrate striped directory")
Signed-off-by: Lai Siyao
Reviewed-on: https://review.whamcloud.com/31427
Reviewed-by: Andreas Dilger
Reviewed-by: Fan Yong
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/lu_object.h          |  24 ++-
 fs/lustre/include/lustre_lmv.h         |  18 +-
 fs/lustre/llite/file.c                 |  11 +
 fs/lustre/llite/llite_lib.c            |  90 +++++----
 fs/lustre/lmv/lmv_internal.h           |  15 +-
 fs/lustre/lmv/lmv_obd.c                | 357 ++++++++++++++++++++++-----------
 fs/lustre/mdc/mdc_internal.h           |   2 +
 fs/lustre/mdc/mdc_lib.c                |  45 +++--
 fs/lustre/mdc/mdc_reint.c              |   5 +-
 fs/lustre/ptlrpc/wiretest.c            |  16 +-
 include/uapi/linux/lustre/lustre_idl.h |  16 +-
 11 files changed, 403 insertions(+), 196 deletions(-)

diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h
index e49954c..a709ad7 100644
--- a/fs/lustre/include/lu_object.h
+++ b/fs/lustre/include/lu_object.h
@@ -1229,6 +1229,26 @@ struct lu_name {
 	int ln_namelen;
 };
 
+static inline bool name_is_dot_or_dotdot(const char *name, int namelen)
+{
+	return name[0] == '.'
&& + (namelen == 1 || (namelen == 2 && name[1] == '.')); +} + +static inline bool lu_name_is_dot_or_dotdot(const struct lu_name *lname) +{ + return name_is_dot_or_dotdot(lname->ln_name, lname->ln_namelen); +} + +static inline bool lu_name_is_valid_len(const char *name, size_t name_len) +{ + return name && + name_len > 0 && + name_len < INT_MAX && + strlen(name) == name_len && + memchr(name, '/', name_len) == NULL; +} + /** * Validate names (path components) * @@ -1240,9 +1260,7 @@ struct lu_name { */ static inline bool lu_name_is_valid_2(const char *name, size_t name_len) { - return name && name_len > 0 && name_len < INT_MAX && - name[name_len] == '\0' && strlen(name) == name_len && - !memchr(name, '/', name_len); + return lu_name_is_valid_len(name, name_len) && name[name_len] == '\0'; } /** diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index 5e15c62..ff279e1 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -47,6 +47,8 @@ struct lmv_stripe_md { u32 lsm_md_master_mdt_index; u32 lsm_md_hash_type; u32 lsm_md_layout_version; + u32 lsm_md_migrate_offset; + u32 lsm_md_migrate_hash; u32 lsm_md_default_count; u32 lsm_md_default_index; char lsm_md_pool_name[LOV_MAXPOOLNAME + 1]; @@ -63,6 +65,10 @@ struct lmv_stripe_md { lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index || lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type || lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version || + lsm1->lsm_md_migrate_offset != + lsm2->lsm_md_migrate_offset || + lsm1->lsm_md_migrate_hash != + lsm2->lsm_md_migrate_hash || strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name) != 0) return false; @@ -137,18 +143,14 @@ static inline int lmv_name_to_stripe_index(u32 lmv_hash_type, unsigned int stripe_count, const char *name, int namelen) { - u32 hash_type = lmv_hash_type & LMV_HASH_TYPE_MASK; int idx; LASSERT(namelen > 0); - if (stripe_count <= 1) - return 0; - /* for migrating object, always start from 0 stripe 
*/ - if (lmv_hash_type & LMV_HASH_FLAG_MIGRATION) + if (stripe_count <= 1) return 0; - switch (hash_type) { + switch (lmv_hash_type & LMV_HASH_TYPE_MASK) { case LMV_HASH_TYPE_ALL_CHARS: idx = lmv_hash_all_chars(stripe_count, name, namelen); break; @@ -159,8 +161,8 @@ static inline int lmv_name_to_stripe_index(u32 lmv_hash_type, idx = -EBADFD; break; } - CDEBUG(D_INFO, "name %.*s hash_type %d idx %d\n", namelen, name, - hash_type, idx); + CDEBUG(D_INFO, "name %.*s hash_type %#x idx %d/%u\n", namelen, name, + lmv_hash_type, idx, stripe_count); return idx; } diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index ae39b2c..fd39948 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3836,6 +3836,17 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, if (!child_inode) return -ENOENT; + if (!(exp_connect_flags2(ll_i2sbi(parent)->ll_md_exp) & + OBD_CONNECT2_DIR_MIGRATE)) { + if (le32_to_cpu(lum->lum_stripe_count) > 1 || + ll_i2info(child_inode)->lli_lsm_md) { + CERROR("%s: MDT doesn't support stripe directory migration!\n", + ll_get_fsname(parent->i_sb, NULL, 0)); + rc = -EOPNOTSUPP; + goto out_iput; + } + } + /* * lfs migrate command needs to be blocked on the client * by checking the migrate FID against the FID of the diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 37558a8..636ddf8 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1254,14 +1254,8 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) * different, so it reset lsm_md to NULL to avoid * initializing lsm for slave inode. 
*/ - /* For migrating inode, master stripe and master object will - * be same, so we only need assign this inode - */ - if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION && !i) - lsm->lsm_md_oinfo[i].lmo_root = inode; - else - lsm->lsm_md_oinfo[i].lmo_root = - ll_iget_anon_dir(inode->i_sb, fid, md); + lsm->lsm_md_oinfo[i].lmo_root = + ll_iget_anon_dir(inode->i_sb, fid, md); if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) { int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root); @@ -1273,20 +1267,6 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) return 0; } -static inline int lli_lsm_md_eq(const struct lmv_stripe_md *lsm_md1, - const struct lmv_stripe_md *lsm_md2) -{ - return lsm_md1->lsm_md_magic == lsm_md2->lsm_md_magic && - lsm_md1->lsm_md_stripe_count == lsm_md2->lsm_md_stripe_count && - lsm_md1->lsm_md_master_mdt_index == - lsm_md2->lsm_md_master_mdt_index && - lsm_md1->lsm_md_hash_type == lsm_md2->lsm_md_hash_type && - lsm_md1->lsm_md_layout_version == - lsm_md2->lsm_md_layout_version && - !strcmp(lsm_md1->lsm_md_pool_name, - lsm_md2->lsm_md_pool_name); -} - static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) { struct ll_inode_info *lli = ll_i2info(inode); @@ -1297,27 +1277,53 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) CDEBUG(D_INODE, "update lsm %p of " DFID "\n", lli->lli_lsm_md, PFID(ll_inode2fid(inode))); - /* no striped information from request. 
*/ - if (!lsm) { - if (!lli->lli_lsm_md) { - return 0; - } else if (lli->lli_lsm_md->lsm_md_hash_type & - LMV_HASH_FLAG_MIGRATION) { - /* - * migration is done, the temporay MIGRATE layout has - * been removed - */ - CDEBUG(D_INODE, DFID " finish migration.\n", - PFID(ll_inode2fid(inode))); - lmv_free_memmd(lli->lli_lsm_md); - lli->lli_lsm_md = NULL; - return 0; - } - /* - * The lustre_md from req does not include stripeEA, - * see ll_md_setattr - */ + /* + * no striped information from request, lustre_md from req does not + * include stripeEA, see ll_md_setattr() + */ + if (!lsm) return 0; + + /* Compare the old and new stripe information */ + if (lli->lli_lsm_md && !lsm_md_eq(lli->lli_lsm_md, lsm)) { + struct lmv_stripe_md *old_lsm = lli->lli_lsm_md; + bool layout_changed = lsm->lsm_md_layout_version > + old_lsm->lsm_md_layout_version; + int mask = layout_changed ? D_INODE : D_ERROR; + int idx; + + CDEBUG(mask, + "%s: inode@%p "DFID" lmv layout %s magic %#x/%#x stripe count %d/%d master_mdt %d/%d hash_type %#x/%#x version %d/%d migrate offset %d/%d migrate hash %#x/%#x pool %s/%s\n", + ll_get_fsname(inode->i_sb, NULL, 0), inode, + PFID(&lli->lli_fid), + layout_changed ? 
"changed" : "mismatch", + lsm->lsm_md_magic, old_lsm->lsm_md_magic, + lsm->lsm_md_stripe_count, + old_lsm->lsm_md_stripe_count, + lsm->lsm_md_master_mdt_index, + old_lsm->lsm_md_master_mdt_index, + lsm->lsm_md_hash_type, old_lsm->lsm_md_hash_type, + lsm->lsm_md_layout_version, + old_lsm->lsm_md_layout_version, + lsm->lsm_md_migrate_offset, + old_lsm->lsm_md_migrate_offset, + lsm->lsm_md_migrate_hash, + old_lsm->lsm_md_migrate_hash, + lsm->lsm_md_pool_name, + old_lsm->lsm_md_pool_name); + + for (idx = 0; idx < old_lsm->lsm_md_stripe_count; idx++) + CDEBUG(mask, "old stripe[%d] "DFID"\n", + idx, PFID(&old_lsm->lsm_md_oinfo[idx].lmo_fid)); + + for (idx = 0; idx < lsm->lsm_md_stripe_count; idx++) + CDEBUG(mask, "new stripe[%d] "DFID"\n", + idx, PFID(&lsm->lsm_md_oinfo[idx].lmo_fid)); + + if (!layout_changed) + return -EINVAL; + + ll_dir_clear_lsm_md(inode); } /* set the directory layout */ diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index 6794f11..c4a2fb8 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -123,18 +123,21 @@ static inline int lmv_stripe_md_size(int stripe_count) return sizeof(*lsm) + stripe_count * sizeof(lsm->lsm_md_oinfo[0]); } -int lmv_name_to_stripe_index(enum lmv_hash_type hashtype, - unsigned int max_mdt_index, - const char *name, int namelen); - +/* for file under migrating directory, return the target stripe info */ static inline const struct lmv_oinfo * lsm_name_to_stripe_info(const struct lmv_stripe_md *lsm, const char *name, int namelen) { + u32 hash_type = lsm->lsm_md_hash_type; + u32 stripe_count = lsm->lsm_md_stripe_count; int stripe_index; - stripe_index = lmv_name_to_stripe_index(lsm->lsm_md_hash_type, - lsm->lsm_md_stripe_count, + if (hash_type & LMV_HASH_FLAG_MIGRATION) { + hash_type &= ~LMV_HASH_FLAG_MIGRATION; + stripe_count = lsm->lsm_md_migrate_offset; + } + + stripe_index = lmv_name_to_stripe_index(hash_type, stripe_count, name, namelen); if (stripe_index < 0) return 
ERR_PTR(stripe_index); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 90a46c4..3ddffd8 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1836,154 +1836,284 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data, return md_link(tgt->ltd_exp, op_data, request); } -static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, - const char *old, size_t oldlen, - const char *new, size_t newlen, - struct ptlrpc_request **request) +static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, + const char *name, size_t namelen, + struct ptlrpc_request **request) { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - struct obd_export *target_exp; - struct lmv_tgt_desc *src_tgt; - struct lmv_tgt_desc *tgt_tgt; - struct mdt_body *body; + struct lmv_stripe_md *lsm = op_data->op_mea1; + struct lmv_tgt_desc *parent_tgt; + struct lmv_tgt_desc *sp_tgt; + struct lmv_tgt_desc *tp_tgt = NULL; + struct lmv_tgt_desc *child_tgt; + struct lmv_tgt_desc *tgt; + struct lu_fid target_fid; int rc; - LASSERT(oldlen != 0); + LASSERT(op_data->op_cli_flags & CLI_MIGRATE); + LASSERTF(fid_is_sane(&op_data->op_fid3), "invalid FID "DFID"\n", + PFID(&op_data->op_fid3)); - CDEBUG(D_INODE, "RENAME %.*s in " DFID ":%d to %.*s in " DFID ":%d\n", - (int)oldlen, old, PFID(&op_data->op_fid1), - op_data->op_mea1 ? op_data->op_mea1->lsm_md_stripe_count : 0, - (int)newlen, new, PFID(&op_data->op_fid2), - op_data->op_mea2 ? 
op_data->op_mea2->lsm_md_stripe_count : 0); + CDEBUG(D_INODE, "MIGRATE "DFID"/%.*s\n", + PFID(&op_data->op_fid1), (int)namelen, name); op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid()); op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); - if (op_data->op_cli_flags & CLI_MIGRATE) { - LASSERTF(fid_is_sane(&op_data->op_fid3), - "invalid FID " DFID "\n", - PFID(&op_data->op_fid3)); - - if (op_data->op_mea1) { - struct lmv_stripe_md *lsm = op_data->op_mea1; - struct lmv_tgt_desc *tmp; - - /* Fix the parent fid for striped dir */ - tmp = lmv_locate_target_for_name(lmv, lsm, old, - oldlen, - &op_data->op_fid1, - NULL); - if (IS_ERR(tmp)) - return PTR_ERR(tmp); + parent_tgt = lmv_find_target(lmv, &op_data->op_fid1); + if (IS_ERR(parent_tgt)) + return PTR_ERR(parent_tgt); + + if (lsm) { + u32 hash_type = lsm->lsm_md_hash_type; + u32 stripe_count = lsm->lsm_md_stripe_count; + + /* + * old stripes are appended after new stripes for migrating + * directory. 
+ */ + if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) { + hash_type = lsm->lsm_md_migrate_hash; + stripe_count -= lsm->lsm_md_migrate_offset; } - rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data); - if (rc) + rc = lmv_name_to_stripe_index(hash_type, stripe_count, name, + namelen); + if (rc < 0) return rc; - src_tgt = lmv_find_target(lmv, &op_data->op_fid3); - if (IS_ERR(src_tgt)) - return PTR_ERR(src_tgt); - target_exp = src_tgt->ltd_exp; - } else { - if (op_data->op_mea1) { - struct lmv_stripe_md *lsm = op_data->op_mea1; + if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) + rc += lsm->lsm_md_migrate_offset; - src_tgt = lmv_locate_target_for_name(lmv, lsm, old, - oldlen, - &op_data->op_fid1, - &op_data->op_mds); - } else { - src_tgt = lmv_find_target(lmv, &op_data->op_fid1); - } - if (IS_ERR(src_tgt)) - return PTR_ERR(src_tgt); + /* save it in fid4 temporarily for early cancel */ + op_data->op_fid4 = lsm->lsm_md_oinfo[rc].lmo_fid; + sp_tgt = lmv_get_target(lmv, lsm->lsm_md_oinfo[rc].lmo_mds, + NULL); + if (IS_ERR(sp_tgt)) + return PTR_ERR(sp_tgt); - if (op_data->op_mea2) { - struct lmv_stripe_md *lsm = op_data->op_mea2; - - tgt_tgt = lmv_locate_target_for_name(lmv, lsm, new, - newlen, - &op_data->op_fid2, - &op_data->op_mds); - } else { - tgt_tgt = lmv_find_target(lmv, &op_data->op_fid2); + /* + * if parent is being migrated too, fill op_fid2 with target + * stripe fid, otherwise the target stripe is not created yet. 
+ */ + if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) { + hash_type = lsm->lsm_md_hash_type & + ~LMV_HASH_FLAG_MIGRATION; + stripe_count = lsm->lsm_md_migrate_offset; + + rc = lmv_name_to_stripe_index(hash_type, stripe_count, + name, namelen); + if (rc < 0) + return rc; + + op_data->op_fid2 = lsm->lsm_md_oinfo[rc].lmo_fid; + tp_tgt = lmv_get_target(lmv, + lsm->lsm_md_oinfo[rc].lmo_mds, + NULL); + if (IS_ERR(tp_tgt)) + return PTR_ERR(tp_tgt); } - if (IS_ERR(tgt_tgt)) - return PTR_ERR(tgt_tgt); - - target_exp = tgt_tgt->ltd_exp; + } else { + sp_tgt = parent_tgt; } - /* - * LOOKUP lock on src child (fid3) should also be cancelled for - * src_tgt in mdc_rename. - */ - op_data->op_flags |= MF_MDC_CANCEL_FID1 | MF_MDC_CANCEL_FID3; + child_tgt = lmv_find_target(lmv, &op_data->op_fid3); + if (IS_ERR(child_tgt)) + return PTR_ERR(child_tgt); - /* - * Cancel UPDATE locks on tgt parent (fid2), tgt_tgt is its - * own target. - */ - rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx, - LCK_EX, MDS_INODELOCK_UPDATE, - MF_MDC_CANCEL_FID2); + rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data); if (rc) return rc; + /* - * Cancel LOOKUP locks on source child (fid3) for parent tgt_tgt. + * for directory, send migrate request to the MDT where the object will + * be migrated to, because we can't create a striped directory remotely. + * + * otherwise, send to the MDT where source is located because regular + * file may open lease. + * + * NB. if MDT doesn't support DIR_MIGRATE, send to source MDT too for + * backward compatibility. 
*/ - if (fid_is_sane(&op_data->op_fid3)) { - struct lmv_tgt_desc *tgt; - - tgt = lmv_find_target(lmv, &op_data->op_fid1); + if (S_ISDIR(op_data->op_mode) && + (exp_connect_flags2(exp) & OBD_CONNECT2_DIR_MIGRATE)) { + tgt = lmv_find_target(lmv, &target_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); + } else { + tgt = child_tgt; + } - /* Cancel LOOKUP lock on its parent */ - rc = lmv_early_cancel(exp, tgt, op_data, src_tgt->ltd_idx, - LCK_EX, MDS_INODELOCK_LOOKUP, - MF_MDC_CANCEL_FID3); + /* cancel UPDATE lock of parent master object */ + rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); + if (rc) + return rc; + + /* cancel UPDATE lock of source parent */ + if (sp_tgt != parent_tgt) { + /* + * migrate RPC packs master object FID, because we can only pack + * two FIDs in reint RPC, but MDS needs to know both source + * parent and target parent, and it will obtain them from master + * FID and LMV, the other FID in RPC is kept for target. + * + * since this FID is not passed to MDC, cancel it anyway. 
+ */ + rc = lmv_early_cancel(exp, sp_tgt, op_data, -1, LCK_EX, + MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID4); if (rc) return rc; - rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx, - LCK_EX, MDS_INODELOCK_ELC, + op_data->op_flags &= ~MF_MDC_CANCEL_FID4; + } + op_data->op_fid4 = target_fid; + + /* cancel UPDATE locks of target parent */ + rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID2); + if (rc) + return rc; + + /* cancel LOOKUP lock of source if source is remote object */ + if (child_tgt != sp_tgt) { + rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_idx, + LCK_EX, MDS_INODELOCK_LOOKUP, MF_MDC_CANCEL_FID3); if (rc) return rc; } -retry_rename: + /* cancel ELC locks of source */ + rc = lmv_early_cancel(exp, child_tgt, op_data, tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_ELC, MF_MDC_CANCEL_FID3); + if (rc) + return rc; + + rc = md_rename(tgt->ltd_exp, op_data, name, namelen, NULL, 0, request); + + return rc; +} + +static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, + const char *old, size_t oldlen, + const char *new, size_t newlen, + struct ptlrpc_request **request) +{ + struct obd_device *obd = exp->exp_obd; + struct lmv_obd *lmv = &obd->u.lmv; + struct lmv_stripe_md *lsm = op_data->op_mea1; + struct lmv_tgt_desc *sp_tgt; + struct lmv_tgt_desc *tp_tgt = NULL; + struct lmv_tgt_desc *tgt; + struct mdt_body *body; + int rc; + + LASSERT(oldlen != 0); + + if (op_data->op_cli_flags & CLI_MIGRATE) { + rc = lmv_migrate(exp, op_data, old, oldlen, request); + return rc; + } + + op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid()); + op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); + op_data->op_cap = current_cap(); + + CDEBUG(D_INODE, "RENAME "DFID"/%.*s to "DFID"/%.*s\n", + PFID(&op_data->op_fid1), (int)oldlen, old, + PFID(&op_data->op_fid2), (int)newlen, new); + + if (lsm) + sp_tgt = lmv_locate_target_for_name(lmv, lsm, old, oldlen, + &op_data->op_fid1, + 
&op_data->op_mds); + else + sp_tgt = lmv_find_target(lmv, &op_data->op_fid1); + if (IS_ERR(sp_tgt)) + return PTR_ERR(sp_tgt); + + lsm = op_data->op_mea2; + if (lsm) + tp_tgt = lmv_locate_target_for_name(lmv, lsm, new, newlen, + &op_data->op_fid2, + &op_data->op_mds); + else + tp_tgt = lmv_find_target(lmv, &op_data->op_fid2); + if (IS_ERR(tp_tgt)) + return PTR_ERR(tp_tgt); + /* - * Cancel all the locks on tgt child (fid4). + * Since the target child might be destroyed, and it might + * become orphan, and we can only check orphan on the local + * MDT right now, so we send rename request to the MDT where + * target child is located. If target child does not exist, + * then it will send the request to the target parent */ if (fid_is_sane(&op_data->op_fid4)) { - struct lmv_tgt_desc *tgt; - - rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx, - LCK_EX, MDS_INODELOCK_ELC, - MF_MDC_CANCEL_FID4); - if (rc) - return rc; - tgt = lmv_find_target(lmv, &op_data->op_fid4); if (IS_ERR(tgt)) return PTR_ERR(tgt); + } else { + tgt = tp_tgt; + } - /* - * Since the target child might be destroyed, and it might - * become orphan, and we can only check orphan on the local - * MDT right now, so we send rename request to the MDT where - * target child is located. 
If target child does not exist, - * then it will send the request to the target parent - */ - target_exp = tgt->ltd_exp; + op_data->op_flags |= MF_MDC_CANCEL_FID4; + + /* cancel UPDATE locks of source parent */ + rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); + if (rc != 0) + return rc; + + /* cancel UPDATE locks of target parent */ + rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID2); + if (rc != 0) + return rc; + + if (fid_is_sane(&op_data->op_fid3)) { + struct lmv_tgt_desc *src_tgt; + + src_tgt = lmv_find_target(lmv, &op_data->op_fid3); + if (IS_ERR(src_tgt)) + return PTR_ERR(src_tgt); + + /* cancel LOOKUP lock of source on source parent */ + if (src_tgt != sp_tgt) { + rc = lmv_early_cancel(exp, sp_tgt, op_data, + tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_LOOKUP, + MF_MDC_CANCEL_FID3); + if (rc != 0) + return rc; + } + + /* cancel ELC locks of source */ + rc = lmv_early_cancel(exp, src_tgt, op_data, tgt->ltd_idx, + LCK_EX, MDS_INODELOCK_ELC, + MF_MDC_CANCEL_FID3); + if (rc != 0) + return rc; + } + +retry_rename: + if (fid_is_sane(&op_data->op_fid4)) { + /* cancel LOOKUP lock of target on target parent */ + if (tgt != tp_tgt) { + rc = lmv_early_cancel(exp, tp_tgt, op_data, + tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_LOOKUP, + MF_MDC_CANCEL_FID4); + if (rc != 0) + return rc; + } } - rc = md_rename(target_exp, op_data, old, oldlen, new, newlen, request); + rc = md_rename(tgt->ltd_exp, op_data, old, oldlen, new, newlen, + request); if (rc && rc != -EXDEV) return rc; @@ -2001,6 +2131,11 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, op_data->op_fid4 = body->mbo_fid1; ptlrpc_req_finished(*request); *request = NULL; + + tgt = lmv_find_target(lmv, &op_data->op_fid4); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + goto retry_rename; } @@ -2743,6 +2878,8 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md 
*lsm, else lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type); lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version); + lsm->lsm_md_migrate_offset = le32_to_cpu(lmm1->lmv_migrate_offset); + lsm->lsm_md_migrate_hash = le32_to_cpu(lmm1->lmv_migrate_hash); cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name, sizeof(lsm->lsm_md_pool_name)); @@ -2750,7 +2887,7 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm, return -E2BIG; CDEBUG(D_INFO, - "unpack lsm count %d, master %d hash_type %d layout_version %d\n", + "unpack lsm count %d, master %d hash_type %#x layout_version %d\n", lsm->lsm_md_stripe_count, lsm->lsm_md_master_mdt_index, lsm->lsm_md_hash_type, lsm->lsm_md_layout_version); @@ -2783,16 +2920,8 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, if (lsm && !lmm) { int i; - for (i = 0; i < lsm->lsm_md_stripe_count; i++) { - /* - * For migrating inode, the master stripe and master - * object will be the same, so do not need iput, see - * ll_update_lsm_md - */ - if (!(lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION && - !i)) - iput(lsm->lsm_md_oinfo[i].lmo_root); - } + for (i = 0; i < lsm->lsm_md_stripe_count; i++) + iput(lsm->lsm_md_oinfo[i].lmo_root); kvfree(lsm); *lsmp = NULL; diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index 6cfa79c..b4af9778 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -63,6 +63,8 @@ void mdc_file_secctx_pack(struct ptlrpc_request *req, void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data, const char *old, size_t oldlen, const char *new, size_t newlen); +void mdc_migrate_pack(struct ptlrpc_request *req, struct md_op_data *op_data, + const char *name, size_t namelen); void mdc_close_pack(struct ptlrpc_request *req, struct md_op_data *op_data); /* mdc/mdc_locks.c */ diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index 1d38574..5b1691e 100644 --- 
a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -489,8 +489,7 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data, rec = req_capsule_client_get(&req->rq_pill, &RMF_REC_REINT); /* XXX do something about time, uid, gid */ - rec->rn_opcode = op_data->op_cli_flags & CLI_MIGRATE ? - REINT_MIGRATE : REINT_RENAME; + rec->rn_opcode = REINT_RENAME; rec->rn_fsuid = op_data->op_fsuid; rec->rn_fsgid = op_data->op_fsgid; rec->rn_cap = op_data->op_cap.cap[0]; @@ -506,22 +505,42 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data, if (new) mdc_pack_name(req, &RMF_SYMTGT, new, newlen); +} - if (op_data->op_cli_flags & CLI_MIGRATE) { - char *tmp; +void mdc_migrate_pack(struct ptlrpc_request *req, struct md_op_data *op_data, + const char *name, size_t namelen) +{ + struct mdt_rec_rename *rec; + char *ea; - if (op_data->op_bias & MDS_CLOSE_MIGRATE) { - struct mdt_ioepoch *epoch; + BUILD_BUG_ON(sizeof(struct mdt_rec_reint) != + sizeof(struct mdt_rec_rename)); + rec = req_capsule_client_get(&req->rq_pill, &RMF_REC_REINT); - mdc_close_intent_pack(req, op_data); - epoch = req_capsule_client_get(&req->rq_pill, - &RMF_MDT_EPOCH); - mdc_ioepoch_pack(epoch, op_data); - } + rec->rn_opcode = REINT_MIGRATE; + rec->rn_fsuid = op_data->op_fsuid; + rec->rn_fsgid = op_data->op_fsgid; + rec->rn_cap = op_data->op_cap.cap[0]; + rec->rn_suppgid1 = op_data->op_suppgids[0]; + rec->rn_suppgid2 = op_data->op_suppgids[1]; + rec->rn_fid1 = op_data->op_fid1; + rec->rn_fid2 = op_data->op_fid4; + rec->rn_time = op_data->op_mod_time; + rec->rn_mode = op_data->op_mode; + rec->rn_bias = op_data->op_bias; - tmp = req_capsule_client_get(&req->rq_pill, &RMF_EADATA); - memcpy(tmp, op_data->op_data, op_data->op_data_size); + mdc_pack_name(req, &RMF_NAME, name, namelen); + + if (op_data->op_bias & MDS_CLOSE_MIGRATE) { + struct mdt_ioepoch *epoch; + + mdc_close_intent_pack(req, op_data); + epoch = req_capsule_client_get(&req->rq_pill, &RMF_MDT_EPOCH); + 
mdc_ioepoch_pack(epoch, op_data); } + + ea = req_capsule_client_get(&req->rq_pill, &RMF_EADATA); + memcpy(ea, op_data->op_data, op_data->op_data_size); } void mdc_getattr_pack(struct ptlrpc_request *req, u64 valid, u32 flags, diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 030c247..355cee1 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -403,7 +403,10 @@ int mdc_rename(struct obd_export *exp, struct md_op_data *op_data, if (exp_connect_cancelset(exp) && req) ldlm_cli_cancel_list(&cancels, count, req, 0); - mdc_rename_pack(req, op_data, old, oldlen, new, newlen); + if (op_data->op_cli_flags & CLI_MIGRATE) + mdc_migrate_pack(req, op_data, old, oldlen); + else + mdc_rename_pack(req, op_data, old, oldlen, new, newlen); req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, obd->u.cli.cl_default_mds_easize); diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 30083c2..4095767 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1627,13 +1627,17 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_layout_version)); LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_layout_version) == 4, "found %lld\n", (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_layout_version)); - LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding1) == 20, "found %lld\n", - (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding1)); - LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding1) == 4, "found %lld\n", - (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding1)); - LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding2) == 24, "found %lld\n", + LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_migrate_offset) == 20, "found %lld\n", + (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_migrate_offset)); + LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_migrate_offset) == 4, "found %lld\n", + (long 
long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_migrate_offset)); + LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_migrate_hash) == 24, "found %lld\n", + (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_migrate_hash)); + LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_migrate_hash) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_migrate_hash)); + LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding2) == 28, "found %lld\n", (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding2)); - LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding2) == 8, "found %lld\n", + LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding2) == 4, "found %lld\n", (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding2)); LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding3) == 32, "found %lld\n", (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding3)); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 7f857be..522bd52 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1941,9 +1941,19 @@ struct lmv_mds_md_v1 { * be used to mark the object status, * for example migrating or dead. */ - __u32 lmv_layout_version; /* Used for directory restriping */ - __u32 lmv_padding1; - __u64 lmv_padding2; + __u32 lmv_layout_version; /* increased each time layout changed, + * by directory migration, restripe + * and LFSCK. + */ + __u32 lmv_migrate_offset; /* once this is set, it means this + * directory is been migrated, stripes + * before this offset belong to target, + * from this to source. 
+ */ + __u32 lmv_migrate_hash; /* hash type of source stripes of + * migrating directory + */ + __u32 lmv_padding2; __u64 lmv_padding3; char lmv_pool_name[LOV_MAXPOOLNAME + 1];/* pool name */ struct lu_fid lmv_stripe_fids[0]; /* FIDs for each stripe */ From patchwork Thu Feb 27 21:09:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409875 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56A7A138D for ; Thu, 27 Feb 2020 21:24:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3F890246A0 for ; Thu, 27 Feb 2020 21:24:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3F890246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ADEEE348830; Thu, 27 Feb 2020 13:22:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C09DF21FA7D for ; Thu, 27 Feb 2020 13:18:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 96A60103C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 957B1468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas 
Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:43 -0500 Message-Id: <1582838290-17243-116-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 115/622] lustre: obdclass: remove unused ll_import_cachep X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The ll_import_cache is not used anywhere, and can be removed. WC-bug-id: https://jira.whamcloud.com/browse/LU-10899 Lustre-commit: e23250110729 ("LU-10899 obdclass: remove unused ll_import_cachep") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33119 Reviewed-by: John L. 
Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/genops.c | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index fc50aba..a122332 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -48,7 +48,6 @@ static struct kmem_cache *obd_device_cachep; struct kmem_cache *obdo_cachep; EXPORT_SYMBOL(obdo_cachep); -static struct kmem_cache *import_cachep; static struct kobj_type class_ktype; static struct workqueue_struct *zombie_wq; @@ -648,8 +647,6 @@ void obd_cleanup_caches(void) obd_device_cachep = NULL; kmem_cache_destroy(obdo_cachep); obdo_cachep = NULL; - kmem_cache_destroy(import_cachep); - import_cachep = NULL; } int obd_init_caches(void) @@ -667,13 +664,6 @@ int obd_init_caches(void) if (!obdo_cachep) goto out; - LASSERT(!import_cachep); - import_cachep = kmem_cache_create("ll_import_cache", - sizeof(struct obd_import), - 0, 0, NULL); - if (!import_cachep) - goto out; - return 0; out: obd_cleanup_caches(); From patchwork Thu Feb 27 21:09:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409811 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 30BCE159A for ; Thu, 27 Feb 2020 21:22:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 197D4246A0 for ; Thu, 27 Feb 2020 21:22:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 197D4246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C181821FF31; Thu, 27 Feb 2020 13:21:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0EE9921FA7D for ; Thu, 27 Feb 2020 13:18:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 99662103D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9859E46A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:44 -0500 Message-Id: <1582838290-17243-117-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 116/622] lustre: ptlrpc: add debugging for idle connections X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Add a "debug" parameter for the idle client disconnection so that it can log disconnect/reconnect events to the console. Print the idle time in the "import" file. Enable the connection debugging for all test runs. 
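The keyword handling described above can be sketched in userspace C roughly as follows. This is only an illustration of the intended behaviour, not the code carried by the patch: the `idle_cfg` structure, the `idle_setting_store()` helper, the `SKETCH_*` constants and the cap of 50 are all hypothetical stand-ins, and `strtoul()` is used in place of the kernel's `kstrtouint()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel debug masks; values are arbitrary. */
#define SKETCH_D_HA		0x1u
#define SKETCH_D_CONSOLE	0x2u
/* Hypothetical upper bound on the idle timeout, in seconds. */
#define SKETCH_TIMEOUT_MAX	50u

struct idle_cfg {
	unsigned int timeout;		/* idle timeout in seconds, 0 = never idle */
	unsigned int debug_mask;	/* mask used when logging idle events */
};

/*
 * Accept "debug", "nodebug" or a number. The keywords only change the
 * debug mask; a number only changes the timeout. Returns 0 on success
 * or a negative errno value, leaving cfg untouched on failure.
 */
static int idle_setting_store(struct idle_cfg *cfg, const char *buf)
{
	unsigned long val;
	char *end;

	assert(cfg && buf);

	/* Compare all 7 characters of "nodebug", 5 of "debug". */
	if (strncmp(buf, "nodebug", 7) == 0) {
		cfg->debug_mask = SKETCH_D_HA;
		return 0;
	}
	if (strncmp(buf, "debug", 5) == 0) {
		cfg->debug_mask = SKETCH_D_CONSOLE;
		return 0;
	}

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf)	/* overflow, or no digits at all */
		return -EINVAL;
	if (val > SKETCH_TIMEOUT_MAX)
		return -ERANGE;

	cfg->timeout = (unsigned int)val;
	return 0;
}
```

Keeping the keyword branches separate from the numeric branch means a write of "debug" never touches the timeout, and a numeric write never touches the debug mask.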
WC-bug-id: https://jira.whamcloud.com/browse/LU-11128 Lustre-commit: 0aa58d26f5df ("LU-11128 ptlrpc: add debugging for idle connections") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33168 Reviewed-by: Alex Zhuravlev Reviewed-by: Nathaniel Clark Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 1 + fs/lustre/obdclass/lprocfs_status.c | 6 ++++-- fs/lustre/osc/lproc_osc.c | 34 ++++++++++++++++++++++------------ fs/lustre/osc/osc_request.c | 1 + fs/lustre/ptlrpc/client.c | 6 ++++-- fs/lustre/ptlrpc/import.c | 4 +++- 6 files changed, 35 insertions(+), 17 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index c4452e1..1fd6246 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -304,6 +304,7 @@ struct obd_import { u32 imp_connect_op; u32 imp_idle_timeout; + u32 imp_idle_debug; struct obd_connect_data imp_connect_data; u64 imp_connect_flags_orig; u64 imp_connect_flags2_orig; diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index fbd46df..747baff 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -802,11 +802,13 @@ int lprocfs_rd_import(struct seq_file *m, void *data) " current_connection: %s\n" " connection_attempts: %u\n" " generation: %u\n" - " in-progress_invalidations: %u\n", + " in-progress_invalidations: %u\n" + " idle: %lld sec\n", nidstr, imp->imp_conn_cnt, imp->imp_generation, - atomic_read(&imp->imp_inval_count)); + atomic_read(&imp->imp_inval_count), + ktime_get_real_seconds() - imp->imp_last_reply_time); spin_unlock(&imp->imp_lock); if (!obd->obd_svc_stats) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 16de266..f025275 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -622,27 +622,37 @@ static ssize_t idle_timeout_store(struct kobject *kobj, struct attribute *attr, 
obd_kset.kobj); struct client_obd *cli = &obd->u.cli; struct ptlrpc_request *req; + unsigned int idle_debug = 0; unsigned int val; int rc; - rc = kstrtouint(buffer, 0, &val); - if (rc) - return rc; + if (strncmp(buffer, "debug", 5) == 0) { + idle_debug = D_CONSOLE; + } else if (strncmp(buffer, "nodebug", 6) == 0) { + idle_debug = D_HA; + } else { + rc = kstrtouint(buffer, 0, &val); + if (rc) + return rc; - if (val > CONNECTION_SWITCH_MAX) - return -ERANGE; + if (val > CONNECTION_SWITCH_MAX) + return -ERANGE; + } rc = lprocfs_climp_check(obd); if (rc) return rc; - cli->cl_import->imp_idle_timeout = val; - - /* to initiate the connection if it's in IDLE state */ - if (!val) { - req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_STATFS); - if (req) - ptlrpc_req_finished(req); + if (idle_debug) { + cli->cl_import->imp_idle_timeout = val; + } else { + /* to initiate the connection if it's in IDLE state */ + if (!val) { + req = ptlrpc_request_alloc(cli->cl_import, + &RQF_OST_STATFS); + if (req) + ptlrpc_req_finished(req); + } } up_read(&obd->u.cli.cl_sem); diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 1a9ed8d..2784e1e 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -3271,6 +3271,7 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg) list_add_tail(&cli->cl_shrink_list, &osc_shrink_list); spin_unlock(&osc_shrink_lock); cli->cl_import->imp_idle_timeout = osc_idle_timeout; + cli->cl_import->imp_idle_debug = D_HA; return rc; diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 57b08de..691df1a 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -890,8 +890,10 @@ struct ptlrpc_request *__ptlrpc_request_alloc(struct obd_import *imp, if (unlikely(imp->imp_state == LUSTRE_IMP_IDLE)) { int rc; - CDEBUG(D_INFO, "%s: connect at new req\n", - imp->imp_obd->obd_name); + CDEBUG_LIMIT(imp->imp_idle_debug, + "%s: reconnect after %llds idle\n", + imp->imp_obd->obd_name, 
ktime_get_real_seconds() - + imp->imp_last_reply_time); spin_lock(&imp->imp_lock); if (imp->imp_state == LUSTRE_IMP_IDLE) { imp->imp_generation++; diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index b90f78c..b11bb2f 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1623,7 +1623,9 @@ int ptlrpc_disconnect_and_idle_import(struct obd_import *imp) if (IS_ERR(req)) return PTR_ERR(req); - CDEBUG(D_INFO, "%s: disconnect\n", imp->imp_obd->obd_name); + CDEBUG_LIMIT(imp->imp_idle_debug, "%s: disconnect after %llus idle\n", + imp->imp_obd->obd_name, + ktime_get_real_seconds() - imp->imp_last_reply_time); req->rq_interpret_reply = ptlrpc_disconnect_idle_interpret; ptlrpcd_add_req(req); From patchwork Thu Feb 27 21:09:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409877 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36BAE14BC for ; Thu, 27 Feb 2020 21:24:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1F64C246A0 for ; Thu, 27 Feb 2020 21:24:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1F64C246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0BF8C348CEA; Thu, 27 Feb 2020 13:22:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69B9821FAF4 for ; Thu, 27 Feb 2020 13:18:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9CA68103E; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9B47646C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:45 -0500 Message-Id: <1582838290-17243-118-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 117/622] lustre: obdclass: Add lbug_on_eviction option X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ryan Haasken , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ryan Haasken Add an lbug_on_eviction sysfs interface. When it is set to a non-zero value on a client, it will cause the client to LBUG whenever it is evicted by the server. Note that an MDS is a client of the OSTs, and every server is a client of the MGS. Thus, it is probably desirable to leave this set to zero on servers. 
Cray-bug-id: LUS-2591 WC-bug-id: https://jira.whamcloud.com/browse/LU-5026 Lustre-commit: 97381ffc9231 ("LU-5026 obdclass: Add lbug_on_eviction option") Signed-off-by: Ryan Haasken Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/10257 Reviewed-by: Andreas Dilger Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/obdclass/class_obd.c | 2 ++ fs/lustre/obdclass/obd_sysfs.c | 2 ++ fs/lustre/ptlrpc/import.c | 1 + 4 files changed, 6 insertions(+) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 3d14723..04ef76f 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -43,6 +43,7 @@ extern unsigned int obd_debug_peer_on_timeout; extern unsigned int obd_dump_on_timeout; extern unsigned int obd_dump_on_eviction; +extern unsigned int obd_lbug_on_eviction; /* obd_timeout should only be used for recovery, not for * networking / disk / timings affected by load (use Adaptive Timeouts) */ diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c index 7e436af..4ef9cca 100644 --- a/fs/lustre/obdclass/class_obd.c +++ b/fs/lustre/obdclass/class_obd.c @@ -56,6 +56,8 @@ EXPORT_SYMBOL(obd_dump_on_timeout); unsigned int obd_dump_on_eviction; EXPORT_SYMBOL(obd_dump_on_eviction); +unsigned int obd_lbug_on_eviction; +EXPORT_SYMBOL(obd_lbug_on_eviction); unsigned long obd_max_dirty_pages; EXPORT_SYMBOL(obd_max_dirty_pages); atomic_long_t obd_dirty_pages; diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c index cd2917e..73e44e7 100644 --- a/fs/lustre/obdclass/obd_sysfs.c +++ b/fs/lustre/obdclass/obd_sysfs.c @@ -118,6 +118,7 @@ static ssize_t static_uintvalue_store(struct kobject *kobj, LUSTRE_STATIC_UINT_ATTR(at_extra, &at_extra); LUSTRE_STATIC_UINT_ATTR(at_early_margin, &at_early_margin); LUSTRE_STATIC_UINT_ATTR(at_history, &at_history); +LUSTRE_STATIC_UINT_ATTR(lbug_on_eviction, 
&obd_lbug_on_eviction); static ssize_t max_dirty_mb_show(struct kobject *kobj, struct attribute *attr, char *buf) @@ -280,6 +281,7 @@ static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, &lustre_sattr_at_extra.u.attr, &lustre_sattr_at_early_margin.u.attr, &lustre_sattr_at_history.u.attr, + &lustre_sattr_lbug_on_eviction.u.attr, NULL, }; diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index b11bb2f..73a345f 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1385,6 +1385,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) "%s: This client was evicted by %.*s; in progress operations using this service will fail.\n", imp->imp_obd->obd_name, target_len, target_start); + LASSERTF(!obd_lbug_on_eviction, "LBUG upon eviction"); } CDEBUG(D_HA, "evicted from %s@%s; invalidating\n", obd2cli_tgt(imp->imp_obd), From patchwork Thu Feb 27 21:09:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409881 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4AF80138D for ; Thu, 27 Feb 2020 21:24:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 33912246A0 for ; Thu, 27 Feb 2020 21:24:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 33912246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
8B4DD348D21; Thu, 27 Feb 2020 13:22:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BF59421FB61 for ; Thu, 27 Feb 2020 13:18:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A04D4103F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9E40946D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:46 -0500 Message-Id: <1582838290-17243-119-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 118/622] lustre: lmv: support accessing migrating directory X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Migrating directory contains stripes of both old and new layout, and its sub files may be located on either one. To avoid race between access and new creations, there are 4 rules to access migrating directory: 1. always create new file under new layout. 2. any operation that tries to create new file under old layout will be rejected, e.g., 'mv a /b', if b exists and is under old layout, this rename should fail with -EBUSY. 3. 
operations that access a file by name should try the old layout first and, if the file doesn't exist there, retry the new layout. Such operations include: lookup, getattr_name, unlink, open-by-name, link, rename. 4. according to rule 1, open(O_CREAT | O_EXCL) and create() will create the new file under the new layout, but they should check for an existing file in the same transaction. This can't be done for the old layout, so check for an existing file under the old layout on the client side, then issue the open/create request to the new layout. Disable sanity 230d for the ZFS backend because it triggers a lot of syncs, which may cause the system to hang. WC-bug-id: https://jira.whamcloud.com/browse/LU-4684 Lustre-commit: 976b609abcdf ("LU-4684 lmv: support accessing migrating directory") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/31504 Reviewed-by: Fan Yong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 12 ++ fs/lustre/lmv/lmv_intent.c | 132 +++++++------ fs/lustre/lmv/lmv_internal.h | 75 +++++-- fs/lustre/lmv/lmv_obd.c | 453 ++++++++++++++++++++++--------------------- 4 files changed, 381 insertions(+), 291 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 9286755..b404391 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -787,6 +787,18 @@ struct md_op_data { u32 op_projid; u16 op_mirror_id; + + /* + * used to access migrating dir: if it's set, assume migration is + * finished, use the new layout to access dir, otherwise use old layout. + * By default it's not set, because new files are created under new + * layout, if we can't find file with name under both old and new + * layout, we are sure file with name doesn't exist, but in reverse + * order there may be a race with creation by others. 
+ */ + bool op_post_migrate; + /* used to access dir with bash hash */ + u32 op_stripe_index; }; struct md_callback { diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 355a2af..3f51032 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -191,7 +191,7 @@ int lmv_revalidate_slaves(struct obd_export *exp, op_data->op_fid1 = fid; op_data->op_fid2 = fid; - tgt = lmv_locate_mds(lmv, op_data, &fid); + tgt = lmv_get_target(lmv, lsm->lsm_md_oinfo[i].lmo_mds, NULL); if (IS_ERR(tgt)) { rc = PTR_ERR(tgt); goto cleanup; @@ -269,8 +269,52 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; struct mdt_body *body; + u64 flags = it->it_flags; int rc; + if ((it->it_op & IT_CREAT) && !(flags & MDS_OPEN_BY_FID)) { + /* don't allow create under dir with bad hash */ + if (lmv_is_dir_bad_hash(op_data->op_mea1)) + return -EBADF; + + if (lmv_is_dir_migrating(op_data->op_mea1)) { + if (flags & O_EXCL) { + /* + * open(O_CREAT | O_EXCL) needs to check + * existing name, which should be done on both + * old and new layout, to avoid creating new + * file under old layout, check old layout on + * client side. + */ + tgt = lmv_locate_tgt(lmv, op_data, + &op_data->op_fid1); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + + rc = md_getattr_name(tgt->ltd_exp, op_data, + reqp); + if (!rc) { + ptlrpc_req_finished(*reqp); + *reqp = NULL; + return -EEXIST; + } + + if (rc != -ENOENT) + return rc; + + op_data->op_post_migrate = true; + } else { + /* + * open(O_CREAT) will be sent to MDT in old + * layout first, to avoid creating new file + * under old layout, clear O_CREAT. 
+ */ + it->it_flags &= ~O_CREAT; + } + } + } + +retry: if (it->it_flags & MDS_OPEN_BY_FID) { LASSERT(fid_is_sane(&op_data->op_fid2)); @@ -292,7 +336,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, LASSERT(fid_is_zero(&op_data->op_fid2)); LASSERT(op_data->op_name); - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1); + tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); } @@ -325,8 +369,21 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, */ if ((it->it_disposition & DISP_LOOKUP_NEG) && !(it->it_disposition & DISP_OPEN_CREATE) && - !(it->it_disposition & DISP_OPEN_OPEN)) + !(it->it_disposition & DISP_OPEN_OPEN)) { + if (!(it->it_flags & MDS_OPEN_BY_FID) && + lmv_dir_retry_check_update(op_data)) { + ptlrpc_req_finished(*reqp); + it->it_request = NULL; + it->it_disposition = 0; + *reqp = NULL; + + it->it_flags = flags; + fid_zero(&op_data->op_fid2); + goto retry; + } + return rc; + } body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY); if (!body) @@ -357,43 +414,25 @@ static int lmv_intent_lookup(struct obd_export *exp, ldlm_blocking_callback cb_blocking, u64 extra_lock_flags) { - struct lmv_stripe_md *lsm = op_data->op_mea1; struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt = NULL; struct mdt_body *body; - int rc = 0; + int rc; - /* - * If it returns ERR_PTR(-EBADFD) then it is an unknown hash type - * it will try all stripes to locate the object - */ - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1); - if (IS_ERR(tgt) && (PTR_ERR(tgt) != -EBADFD)) +retry: + tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + if (IS_ERR(tgt)) return PTR_ERR(tgt); - /* - * Both migrating dir and unknown hash dir need to try - * all of sub-stripes - */ - if (lsm && !lmv_is_known_hash_type(lsm->lsm_md_hash_type)) { - struct lmv_oinfo *oinfo = &lsm->lsm_md_oinfo[0]; - - op_data->op_fid1 = oinfo->lmo_fid; - 
op_data->op_mds = oinfo->lmo_mds; - tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); - } - if (!fid_is_sane(&op_data->op_fid2)) fid_zero(&op_data->op_fid2); CDEBUG(D_INODE, - "LOOKUP_INTENT with fid1=" DFID ", fid2=" DFID ", name='%s' -> mds #%u lsm=%p lsm_magic=%x\n", + "LOOKUP_INTENT with fid1=" DFID ", fid2=" DFID ", name='%s' -> mds #%u\n", PFID(&op_data->op_fid1), PFID(&op_data->op_fid2), op_data->op_name ? op_data->op_name : "", - tgt->ltd_idx, lsm, !lsm ? -1 : lsm->lsm_md_magic); + tgt->ltd_idx); op_data->op_bias &= ~MDS_CROSS_REF; @@ -415,39 +454,14 @@ static int lmv_intent_lookup(struct obd_export *exp, return rc; } return rc; - } else if (it_disposition(it, DISP_LOOKUP_NEG) && lsm && - lmv_need_try_all_stripes(lsm)) { - /* - * For migrating and unknown hash type directory, it will - * try to target the entry on other stripes - */ - int stripe_index; - - for (stripe_index = 1; - stripe_index < lsm->lsm_md_stripe_count && - it_disposition(it, DISP_LOOKUP_NEG); stripe_index++) { - struct lmv_oinfo *oinfo; - - /* release the previous request */ - ptlrpc_req_finished(*reqp); - it->it_request = NULL; - *reqp = NULL; - - oinfo = &lsm->lsm_md_oinfo[stripe_index]; - tgt = lmv_find_target(lmv, &oinfo->lmo_fid); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); - - CDEBUG(D_INODE, "Try other stripes " DFID "\n", - PFID(&oinfo->lmo_fid)); + } else if (it_disposition(it, DISP_LOOKUP_NEG) && + lmv_dir_retry_check_update(op_data)) { + ptlrpc_req_finished(*reqp); + it->it_request = NULL; + it->it_disposition = 0; + *reqp = NULL; - op_data->op_fid1 = oinfo->lmo_fid; - it->it_disposition &= ~DISP_ENQ_COMPLETE; - rc = md_intent_lock(tgt->ltd_exp, op_data, it, reqp, - cb_blocking, extra_lock_flags); - if (rc) - return rc; - } + goto retry; } if (!it_has_reply_body(it)) diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index c4a2fb8..e434919 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ 
-58,6 +58,9 @@ int lmv_revalidate_slaves(struct obd_export *exp, ldlm_blocking_callback cb_blocking, int extra_lock_flags); +int lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data, + struct ptlrpc_request **preq); + static inline struct obd_device *lmv2obd_dev(struct lmv_obd *lmv) { return container_of_safe(lmv, struct obd_device, u.lmv); @@ -126,15 +129,20 @@ static inline int lmv_stripe_md_size(int stripe_count) /* for file under migrating directory, return the target stripe info */ static inline const struct lmv_oinfo * lsm_name_to_stripe_info(const struct lmv_stripe_md *lsm, const char *name, - int namelen) + int namelen, bool post_migrate) { u32 hash_type = lsm->lsm_md_hash_type; u32 stripe_count = lsm->lsm_md_stripe_count; int stripe_index; if (hash_type & LMV_HASH_FLAG_MIGRATION) { - hash_type &= ~LMV_HASH_FLAG_MIGRATION; - stripe_count = lsm->lsm_md_migrate_offset; + if (post_migrate) { + hash_type &= ~LMV_HASH_FLAG_MIGRATION; + stripe_count = lsm->lsm_md_migrate_offset; + } else { + hash_type = lsm->lsm_md_migrate_hash; + stripe_count -= lsm->lsm_md_migrate_offset; + } } stripe_index = lmv_name_to_stripe_index(hash_type, stripe_count, @@ -142,23 +150,64 @@ static inline int lmv_stripe_md_size(int stripe_count) if (stripe_index < 0) return ERR_PTR(stripe_index); - LASSERTF(stripe_index < lsm->lsm_md_stripe_count, - "stripe_index = %d, stripe_count = %d hash_type = %x name = %.*s\n", - stripe_index, lsm->lsm_md_stripe_count, - lsm->lsm_md_hash_type, namelen, name); + if ((lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) && !post_migrate) + stripe_index += lsm->lsm_md_migrate_offset; + + if (stripe_index >= lsm->lsm_md_stripe_count) { + CERROR("stripe_index %d stripe_count %d hash_type %#x migrate_offset %d migrate_hash %#x name %.*s\n", + stripe_index, lsm->lsm_md_stripe_count, + lsm->lsm_md_hash_type, lsm->lsm_md_migrate_offset, + lsm->lsm_md_migrate_hash, namelen, name); + return ERR_PTR(-EBADF); + } return 
&lsm->lsm_md_oinfo[stripe_index]; } -static inline bool lmv_need_try_all_stripes(const struct lmv_stripe_md *lsm) +static inline bool lmv_is_dir_migrating(const struct lmv_stripe_md *lsm) +{ + return lsm ? lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION : false; +} + +static inline bool lmv_is_dir_bad_hash(const struct lmv_stripe_md *lsm) +{ + if (!lsm) + return false; + + if (lmv_is_dir_migrating(lsm)) { + if (lsm->lsm_md_stripe_count - lsm->lsm_md_migrate_offset > 1) + return !lmv_is_known_hash_type( + lsm->lsm_md_migrate_hash); + return false; + } + + return !lmv_is_known_hash_type(lsm->lsm_md_hash_type); +} + +static inline bool lmv_dir_retry_check_update(struct md_op_data *op_data) { - return !lmv_is_known_hash_type(lsm->lsm_md_hash_type) || - lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION; + const struct lmv_stripe_md *lsm = op_data->op_mea1; + + if (!lsm) + return false; + + if (lmv_is_dir_migrating(lsm) && !op_data->op_post_migrate) { + op_data->op_post_migrate = true; + return true; + } + + if (lmv_is_dir_bad_hash(lsm) && + op_data->op_stripe_index < lsm->lsm_md_stripe_count - 1) { + op_data->op_stripe_index++; + return true; + } + + return false; } -struct lmv_tgt_desc -*lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data, - struct lu_fid *fid); +struct lmv_tgt_desc *lmv_locate_tgt(struct lmv_obd *lmv, + struct md_op_data *op_data, + struct lu_fid *fid); /* lproc_lmv.c */ int lmv_tunables_init(struct obd_device *obd); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 3ddffd8..0da9269 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1141,7 +1141,7 @@ static int lmv_placement_policy(struct obd_device *obd, * 1. See if the stripe offset is specified by lum. * 2. Then check if there is default stripe offset. * 3. Finally choose MDS by name hash if the parent - * is striped directory. (see lmv_locate_mds()). + * is striped directory. (see lmv_locate_tgt()). 
*/ if (op_data->op_cli_flags & CLI_SET_MEA && lum && le32_to_cpu(lum->lum_stripe_offset) != (u32)-1) { @@ -1511,26 +1511,31 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, return md_close(tgt->ltd_exp, op_data, mod, request); } -/** - * Choosing the MDT by name or FID in @op_data. - * For non-striped directory, it will locate MDT by fid. - * For striped-directory, it will locate MDT by name. And also - * it will reset op_fid1 with the FID of the chosen stripe. - **/ -static struct lmv_tgt_desc * -lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm, - const char *name, int namelen, struct lu_fid *fid, - u32 *mds) +struct lmv_tgt_desc* +__lmv_locate_tgt(struct lmv_obd *lmv, struct lmv_stripe_md *lsm, + const char *name, int namelen, struct lu_fid *fid, u32 *mds, + bool post_migrate) { const struct lmv_oinfo *oinfo; struct lmv_tgt_desc *tgt; + if (!lsm || namelen == 0) { + tgt = lmv_find_target(lmv, fid); + if (IS_ERR(tgt)) + return tgt; + + LASSERT(mds); + *mds = tgt->ltd_idx; + return tgt; + } + if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_BAD_NAME_HASH)) { if (cfs_fail_val >= lsm->lsm_md_stripe_count) return ERR_PTR(-EBADF); oinfo = &lsm->lsm_md_oinfo[cfs_fail_val]; } else { - oinfo = lsm_name_to_stripe_info(lsm, name, namelen); + oinfo = lsm_name_to_stripe_info(lsm, name, namelen, + post_migrate); if (IS_ERR(oinfo)) return ERR_CAST(oinfo); } @@ -1544,16 +1549,17 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, CDEBUG(D_INFO, "locate on mds %u " DFID "\n", oinfo->lmo_mds, PFID(&oinfo->lmo_fid)); + return tgt; } /** - * Locate mds by fid or name + * Locate mdt by fid or name * - * For striped directory (lsm != NULL), it will locate the stripe - * by name hash (see lsm_name_to_stripe_info()). Note: if the hash_type - * is unknown, it will return -EBADFD, and lmv_intent_lookup might need - * walk through all of stripes to locate the entry. 
+ * For striped directory, it will locate the stripe by name hash, if hash_type + * is unknown, it will return the stripe specified by 'op_data->op_stripe_index' + * which is set outside, and if dir is migrating, 'op_data->op_post_migrate' + * indicates whether old or new layout is used to locate. * * For normal direcotry, it will locate MDS by FID directly. * @@ -1566,10 +1572,11 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, * ERR_PTR(errno) if failed. */ struct lmv_tgt_desc* -lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data, +lmv_locate_tgt(struct lmv_obd *lmv, struct md_op_data *op_data, struct lu_fid *fid) { struct lmv_stripe_md *lsm = op_data->op_mea1; + struct lmv_oinfo *oinfo; struct lmv_tgt_desc *tgt; /* @@ -1579,17 +1586,15 @@ struct lmv_tgt_desc* */ if (op_data->op_bias & MDS_CREATE_VOLATILE && (int)op_data->op_mds != -1) { - int i; - tgt = lmv_get_target(lmv, op_data->op_mds, NULL); if (IS_ERR(tgt)) return tgt; if (lsm) { + int i; + /* refill the right parent fid */ for (i = 0; i < lsm->lsm_md_stripe_count; i++) { - struct lmv_oinfo *oinfo; - oinfo = &lsm->lsm_md_oinfo[i]; if (oinfo->lmo_mds == op_data->op_mds) { *fid = oinfo->lmo_fid; @@ -1600,23 +1605,22 @@ struct lmv_tgt_desc* if (i == lsm->lsm_md_stripe_count) *fid = lsm->lsm_md_oinfo[0].lmo_fid; } + } else if (lmv_is_dir_bad_hash(lsm)) { + LASSERT(op_data->op_stripe_index < lsm->lsm_md_stripe_count); + oinfo = &lsm->lsm_md_oinfo[op_data->op_stripe_index]; - return tgt; - } - - if (!lsm || !op_data->op_namelen) { - tgt = lmv_find_target(lmv, fid); - if (IS_ERR(tgt)) - return tgt; - - op_data->op_mds = tgt->ltd_idx; + *fid = oinfo->lmo_fid; + op_data->op_mds = oinfo->lmo_mds; - return tgt; + tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL); + } else { + tgt = __lmv_locate_tgt(lmv, lsm, op_data->op_name, + op_data->op_namelen, fid, + &op_data->op_mds, + op_data->op_post_migrate); } - return lmv_locate_target_for_name(lmv, lsm, op_data->op_name, - 
op_data->op_namelen, fid, - &op_data->op_mds); + return tgt; } static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, @@ -1632,7 +1636,33 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, if (!lmv->desc.ld_active_tgt_count) return -EIO; - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1); + if (lmv_is_dir_bad_hash(op_data->op_mea1)) + return -EBADF; + + if (lmv_is_dir_migrating(op_data->op_mea1)) { + /* + * if parent is migrating, create() needs to lookup existing + * name, to avoid creating new file under old layout of + * migrating directory, check old layout here. + */ + tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + + rc = md_getattr_name(tgt->ltd_exp, op_data, request); + if (!rc) { + ptlrpc_req_finished(*request); + *request = NULL; + return -EEXIST; + } + + if (rc != -ENOENT) + return rc; + + op_data->op_post_migrate = true; + } + + tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1685,7 +1715,7 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, CDEBUG(D_INODE, "ENQUEUE on " DFID "\n", PFID(&op_data->op_fid1)); - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1); + tgt = lmv_find_target(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1696,18 +1726,18 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, extra_lock_flags); } -static int +int lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data, struct ptlrpc_request **preq) { - struct ptlrpc_request *req = NULL; struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; struct mdt_body *body; int rc; - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1); +retry: + tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1716,30 +1746,26 @@ static int lmv_create(struct obd_export *exp, struct 
md_op_data *op_data, PFID(&op_data->op_fid1), tgt->ltd_idx); rc = md_getattr_name(tgt->ltd_exp, op_data, preq); - if (rc != 0) + if (rc == -ENOENT && lmv_dir_retry_check_update(op_data)) { + ptlrpc_req_finished(*preq); + *preq = NULL; + goto retry; + } + + if (rc) return rc; body = req_capsule_server_get(&(*preq)->rq_pill, &RMF_MDT_BODY); if (body->mbo_valid & OBD_MD_MDS) { - struct lu_fid rid = body->mbo_fid1; - - CDEBUG(D_INODE, "Request attrs for " DFID "\n", - PFID(&rid)); - - tgt = lmv_find_target(lmv, &rid); - if (IS_ERR(tgt)) { - ptlrpc_req_finished(*preq); - *preq = NULL; - return PTR_ERR(tgt); - } - - op_data->op_fid1 = rid; + op_data->op_fid1 = body->mbo_fid1; op_data->op_valid |= OBD_MD_FLCROSSREF; op_data->op_namelen = 0; op_data->op_name = NULL; - rc = md_getattr_name(tgt->ltd_exp, op_data, &req); + ptlrpc_req_finished(*preq); - *preq = req; + *preq = NULL; + + goto retry; } return rc; @@ -1808,19 +1834,40 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data, op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid()); op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); - if (op_data->op_mea2) { - struct lmv_stripe_md *lsm = op_data->op_mea2; - const struct lmv_oinfo *oinfo; - oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name, - op_data->op_namelen); - if (IS_ERR(oinfo)) - return PTR_ERR(oinfo); + if (lmv_is_dir_migrating(op_data->op_mea2)) { + struct lu_fid fid1 = op_data->op_fid1; + struct lmv_stripe_md *lsm1 = op_data->op_mea1; - op_data->op_fid2 = oinfo->lmo_fid; + /* + * avoid creating new file under old layout of migrating + * directory, check it here. 
+ */ + tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, op_data->op_name, + op_data->op_namelen, &op_data->op_fid2, + &op_data->op_mds, false); + tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + + op_data->op_fid1 = op_data->op_fid2; + op_data->op_mea1 = op_data->op_mea2; + rc = md_getattr_name(tgt->ltd_exp, op_data, request); + op_data->op_fid1 = fid1; + op_data->op_mea1 = lsm1; + if (!rc) { + ptlrpc_req_finished(*request); + *request = NULL; + return -EEXIST; + } + + if (rc != -ENOENT) + return rc; } - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid2); + tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, op_data->op_name, + op_data->op_namelen, &op_data->op_fid2, + &op_data->op_mds, true); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -2004,9 +2051,9 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - struct lmv_stripe_md *lsm = op_data->op_mea1; struct lmv_tgt_desc *sp_tgt; struct lmv_tgt_desc *tp_tgt = NULL; + struct lmv_tgt_desc *src_tgt = NULL; struct lmv_tgt_desc *tgt; struct mdt_body *body; int rc; @@ -2022,26 +2069,44 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); - CDEBUG(D_INODE, "RENAME "DFID"/%.*s to "DFID"/%.*s\n", - PFID(&op_data->op_fid1), (int)oldlen, old, - PFID(&op_data->op_fid2), (int)newlen, new); + if (lmv_is_dir_migrating(op_data->op_mea2)) { + struct lu_fid fid1 = op_data->op_fid1; + struct lmv_stripe_md *lsm1 = op_data->op_mea1; - if (lsm) - sp_tgt = lmv_locate_target_for_name(lmv, lsm, old, oldlen, - &op_data->op_fid1, - &op_data->op_mds); - else - sp_tgt = lmv_find_target(lmv, &op_data->op_fid1); - if (IS_ERR(sp_tgt)) - return PTR_ERR(sp_tgt); + /* + * we avoid creating new file under old layout of migrating + * directory, if there is an existing file with new name under + 
* old layout, we can't unlink file in old layout and rename to + * new layout in one transaction, so return -EBUSY here. + */ + tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, new, newlen, + &op_data->op_fid2, &op_data->op_mds, + false); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); - lsm = op_data->op_mea2; - if (lsm) - tp_tgt = lmv_locate_target_for_name(lmv, lsm, new, newlen, - &op_data->op_fid2, - &op_data->op_mds); - else - tp_tgt = lmv_find_target(lmv, &op_data->op_fid2); + op_data->op_fid1 = op_data->op_fid2; + op_data->op_mea1 = op_data->op_mea2; + op_data->op_name = new; + op_data->op_namelen = newlen; + rc = md_getattr_name(tgt->ltd_exp, op_data, request); + op_data->op_fid1 = fid1; + op_data->op_mea1 = lsm1; + op_data->op_name = NULL; + op_data->op_namelen = 0; + if (!rc) { + ptlrpc_req_finished(*request); + *request = NULL; + return -EBUSY; + } + + if (rc != -ENOENT) + return rc; + } + + /* rename to new layout for migrating directory */ + tp_tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, new, newlen, + &op_data->op_fid2, &op_data->op_mds, true); if (IS_ERR(tp_tgt)) return PTR_ERR(tp_tgt); @@ -2062,34 +2127,28 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, op_data->op_flags |= MF_MDC_CANCEL_FID4; - /* cancel UPDATE locks of source parent */ - rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_idx, LCK_EX, - MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); - if (rc != 0) - return rc; - /* cancel UPDATE locks of target parent */ rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_idx, LCK_EX, MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID2); if (rc != 0) return rc; - if (fid_is_sane(&op_data->op_fid3)) { - struct lmv_tgt_desc *src_tgt; - - src_tgt = lmv_find_target(lmv, &op_data->op_fid3); - if (IS_ERR(src_tgt)) - return PTR_ERR(src_tgt); - - /* cancel LOOKUP lock of source on source parent */ - if (src_tgt != sp_tgt) { - rc = lmv_early_cancel(exp, sp_tgt, op_data, + if (fid_is_sane(&op_data->op_fid4)) { + /* cancel LOOKUP lock of target on
target parent */ + if (tgt != tp_tgt) { + rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_idx, LCK_EX, MDS_INODELOCK_LOOKUP, - MF_MDC_CANCEL_FID3); + MF_MDC_CANCEL_FID4); if (rc != 0) return rc; } + } + + if (fid_is_sane(&op_data->op_fid3)) { + src_tgt = lmv_find_target(lmv, &op_data->op_fid3); + if (IS_ERR(src_tgt)) + return PTR_ERR(src_tgt); /* cancel ELC locks of source */ rc = lmv_early_cancel(exp, src_tgt, op_data, tgt->ltd_idx, @@ -2099,21 +2158,44 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, return rc; } -retry_rename: - if (fid_is_sane(&op_data->op_fid4)) { - /* cancel LOOKUP lock of target on target parent */ - if (tgt != tp_tgt) { - rc = lmv_early_cancel(exp, tp_tgt, op_data, +retry: + sp_tgt = __lmv_locate_tgt(lmv, op_data->op_mea1, old, oldlen, + &op_data->op_fid1, &op_data->op_mds, + op_data->op_post_migrate); + if (IS_ERR(sp_tgt)) + return PTR_ERR(sp_tgt); + + /* cancel UPDATE locks of source parent */ + rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); + if (rc != 0) + return rc; + + if (fid_is_sane(&op_data->op_fid3)) { + /* cancel LOOKUP lock of source on source parent */ + if (src_tgt != sp_tgt) { + rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_idx, LCK_EX, MDS_INODELOCK_LOOKUP, - MF_MDC_CANCEL_FID4); + MF_MDC_CANCEL_FID3); if (rc != 0) return rc; } } +rename: + CDEBUG(D_INODE, "RENAME " DFID "/%.*s to " DFID "/%.*s\n", + PFID(&op_data->op_fid1), (int)oldlen, old, + PFID(&op_data->op_fid2), (int)newlen, new); + rc = md_rename(tgt->ltd_exp, op_data, old, oldlen, new, newlen, request); + if (rc == -ENOENT && lmv_dir_retry_check_update(op_data)) { + ptlrpc_req_finished(*request); + *request = NULL; + goto retry; + } + if (rc && rc != -EXDEV) return rc; @@ -2125,10 +2207,8 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, if (likely(!(body->mbo_valid & OBD_MD_MDS))) return rc; - CDEBUG(D_INODE, "%s: try rename to another 
MDT for " DFID "\n", - exp->exp_obd->obd_name, PFID(&body->mbo_fid1)); - op_data->op_fid4 = body->mbo_fid1; + ptlrpc_req_finished(*request); *request = NULL; @@ -2136,7 +2216,19 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, if (IS_ERR(tgt)) return PTR_ERR(tgt); - goto retry_rename; + if (fid_is_sane(&op_data->op_fid4)) { + /* cancel LOOKUP lock of target on target parent */ + if (tgt != tp_tgt) { + rc = lmv_early_cancel(exp, tp_tgt, op_data, + tgt->ltd_idx, LCK_EX, + MDS_INODELOCK_LOOKUP, + MF_MDC_CANCEL_FID4); + if (rc != 0) + return rc; + } + } + + goto rename; } static int lmv_setattr(struct obd_export *exp, struct md_op_data *op_data, @@ -2575,68 +2667,30 @@ static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data, static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data, struct ptlrpc_request **request) { - struct lmv_stripe_md *lsm = op_data->op_mea1; struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - struct lmv_tgt_desc *parent_tgt = NULL; - struct lmv_tgt_desc *tgt = NULL; - struct mdt_body *body; - int stripe_index = 0; + struct lmv_tgt_desc *tgt; + struct lmv_tgt_desc *parent_tgt; + struct mdt_body *body; int rc; -retry_unlink: - /* For striped dir, we need to locate the parent as well */ - if (lsm) { - struct lmv_tgt_desc *tmp; - - LASSERT(op_data->op_name && op_data->op_namelen); - - tmp = lmv_locate_target_for_name(lmv, lsm, - op_data->op_name, - op_data->op_namelen, - &op_data->op_fid1, - &op_data->op_mds); - - /* - * return -EBADFD means unknown hash type, might - * need try all sub-stripe here - */ - if (IS_ERR(tmp) && PTR_ERR(tmp) != -EBADFD) - return PTR_ERR(tmp); - - /* - * Note: both migrating dir and unknown hash dir need to - * try all of sub-stripes, so we need start search the - * name from stripe 0, but migrating dir is already handled - * inside lmv_locate_target_for_name(), so we only check - * unknown hash type directory here - */ - if 
(!lmv_is_known_hash_type(lsm->lsm_md_hash_type)) { - struct lmv_oinfo *oinfo; - - oinfo = &lsm->lsm_md_oinfo[stripe_index]; - - op_data->op_fid1 = oinfo->lmo_fid; - op_data->op_mds = oinfo->lmo_mds; - } - } - -try_next_stripe: - /* Send unlink requests to the MDT where the child is located */ - if (likely(!fid_is_zero(&op_data->op_fid2))) - tgt = lmv_find_target(lmv, &op_data->op_fid2); - else if (lsm) - tgt = lmv_get_target(lmv, op_data->op_mds, NULL); - else - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1); - - if (IS_ERR(tgt)) - return PTR_ERR(tgt); - op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid()); op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); +retry: + parent_tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + if (IS_ERR(parent_tgt)) + return PTR_ERR(parent_tgt); + + if (likely(!fid_is_zero(&op_data->op_fid2))) { + tgt = lmv_find_target(lmv, &op_data->op_fid2); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + } else { + tgt = parent_tgt; + } + /* * If child's fid is given, cancel unused locks for it if it is from * another export than parent. @@ -2646,50 +2700,29 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data, */ op_data->op_flags |= MF_MDC_CANCEL_FID1 | MF_MDC_CANCEL_FID3; - /* - * Cancel FULL locks on child (fid3). 
- */ - parent_tgt = lmv_find_target(lmv, &op_data->op_fid1); - if (IS_ERR(parent_tgt)) - return PTR_ERR(parent_tgt); - - if (parent_tgt != tgt) { + if (parent_tgt != tgt) rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_idx, LCK_EX, MDS_INODELOCK_LOOKUP, MF_MDC_CANCEL_FID3); - } rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_idx, LCK_EX, MDS_INODELOCK_ELC, MF_MDC_CANCEL_FID3); - if (rc != 0) + if (rc) return rc; CDEBUG(D_INODE, "unlink with fid=" DFID "/" DFID " -> mds #%u\n", PFID(&op_data->op_fid1), PFID(&op_data->op_fid2), tgt->ltd_idx); rc = md_unlink(tgt->ltd_exp, op_data, request); - if (rc != 0 && rc != -EREMOTE && rc != -ENOENT) - return rc; - - /* Try next stripe if it is needed. */ - if (rc == -ENOENT && lsm && lmv_need_try_all_stripes(lsm)) { - struct lmv_oinfo *oinfo; - - stripe_index++; - if (stripe_index >= lsm->lsm_md_stripe_count) - return rc; - - oinfo = &lsm->lsm_md_oinfo[stripe_index]; - - op_data->op_fid1 = oinfo->lmo_fid; - op_data->op_mds = oinfo->lmo_mds; - + if (rc == -ENOENT && lmv_dir_retry_check_update(op_data)) { ptlrpc_req_finished(*request); *request = NULL; - - goto try_next_stripe; + goto retry; } + if (rc != -EREMOTE) + return rc; + body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY); if (!body) return -EPROTO; @@ -2698,34 +2731,16 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data, if (likely(!(body->mbo_valid & OBD_MD_MDS))) return rc; - CDEBUG(D_INODE, "%s: try unlink to another MDT for " DFID "\n", - exp->exp_obd->obd_name, PFID(&body->mbo_fid1)); - - /* This is a remote object, try remote MDT, Note: it may - * try more than 1 time here, Considering following case - * /mnt/lustre is root on MDT0, remote1 is on MDT1 - * 1. Initially A does not know where remote1 is, it send - * unlink RPC to MDT0, MDT0 return -EREMOTE, it will - * resend unlink RPC to MDT1 (retry 1st time). - * - * 2. 
During the unlink RPC in flight, - * client B mv /mnt/lustre/remote1 /mnt/lustre/remote2 - * and create new remote1, but on MDT0 - * - * 3. MDT1 get unlink RPC(from A), then do remote lock on - * /mnt/lustre, then lookup get fid of remote1, and find - * it is remote dir again, and replay -EREMOTE again. - * - * 4. Then A will resend unlink RPC to MDT0. (retry 2nd times). - * - * In theory, it might try unlimited time here, but it should - * be very rare case. - */ + /* This is a remote object, try remote MDT. */ op_data->op_fid2 = body->mbo_fid1; ptlrpc_req_finished(*request); *request = NULL; - goto retry_unlink; + tgt = lmv_find_target(lmv, &op_data->op_fid2); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + + goto retry; } static int lmv_precleanup(struct obd_device *obd) @@ -3134,7 +3149,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp, if (!fid_is_sane(&op_data->op_fid2)) return -EINVAL; - tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1); + tgt = lmv_find_target(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -3172,7 +3187,7 @@ static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, const struct lmv_oinfo *oinfo; LASSERT(lsm); - oinfo = lsm_name_to_stripe_info(lsm, name, namelen); + oinfo = lsm_name_to_stripe_info(lsm, name, namelen, false); if (IS_ERR(oinfo)) return PTR_ERR(oinfo); From patchwork Thu Feb 27 21:09:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409879 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9917714BC for ; Thu, 27 Feb 2020 21:24:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org 
(Postfix) with ESMTPS id 81A7F246A0 for ; Thu, 27 Feb 2020 21:24:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 81A7F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2B786348CFE; Thu, 27 Feb 2020 13:22:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2150721FB61 for ; Thu, 27 Feb 2020 13:18:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A27BC1040; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A118546F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:47 -0500 Message-Id: <1582838290-17243-120-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 119/622] lustre: mdc: move RPC semaphore code to lustre/osp X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The "MDC RPC semaphore" is no longer used by MDC code since patch http://review.whamcloud.com/14374 "LU-5319 mdc: manage number of modify RPCs in flight" landed. It is only still used by the OSP currently in the OpenSFS branch. While there are plans to remove this from the OSP as well, it makes sense to move all of this code from MDC to OSP so that it will also be cleaned up when that functionality lands. WC-bug-id: https://jira.whamcloud.com/browse/LU-6864 Lustre-commit: 040ca57f2ebd ("LU-6864 mdc: move RPC semaphore code to lustre/osp") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32412 Reviewed-by: Lai Siyao Reviewed-by: James Simmons Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_mdc.h | 96 ----------------------------------------- fs/lustre/include/obd.h | 2 - fs/lustre/include/obd_support.h | 2 +- 3 files changed, 1 insertion(+), 99 deletions(-) diff --git a/fs/lustre/include/lustre_mdc.h b/fs/lustre/include/lustre_mdc.h index 208989f..aecb6ee 100644 --- a/fs/lustre/include/lustre_mdc.h +++ b/fs/lustre/include/lustre_mdc.h @@ -60,102 +60,6 @@ struct ptlrpc_request; struct obd_device; -/** - * Serializes in-flight MDT-modifying RPC requests to preserve idempotency. - * - * This mutex is used to implement execute-once semantics on the MDT. - * The MDT stores the last transaction ID and result for every client in - * its last_rcvd file. If the client doesn't get a reply, it can safely - * resend the request and the MDT will reconstruct the reply being aware - * that the request has already been executed. Without this lock, - * execution status of concurrent in-flight requests would be - * overwritten. 
- * - * This design limits the extent to which we can keep a full pipeline of - * in-flight requests from a single client. This limitation could be - * overcome by allowing multiple slots per client in the last_rcvd file. - */ -struct mdc_rpc_lock { - /** Lock protecting in-flight RPC concurrency. */ - struct mutex rpcl_mutex; - /** Intent associated with currently executing request. */ - struct lookup_intent *rpcl_it; - /** Used for MDS/RPC load testing purposes. */ - int rpcl_fakes; -}; - -#define MDC_FAKE_RPCL_IT ((void *)0x2c0012bfUL) - -static inline void mdc_init_rpc_lock(struct mdc_rpc_lock *lck) -{ - mutex_init(&lck->rpcl_mutex); - lck->rpcl_it = NULL; -} - -static inline void mdc_get_rpc_lock(struct mdc_rpc_lock *lck, - struct lookup_intent *it) -{ - if (it && (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP || - it->it_op == IT_LAYOUT || it->it_op == IT_READDIR)) - return; - - /* This would normally block until the existing request finishes. - * If fail_loc is set it will block until the regular request is - * done, then set rpcl_it to MDC_FAKE_RPCL_IT. Once that is set - * it will only be cleared when all fake requests are finished. - * Only when all fake requests are finished can normal requests - * be sent, to ensure they are recoverable again. - */ -again: - mutex_lock(&lck->rpcl_mutex); - - if (CFS_FAIL_CHECK_QUIET(OBD_FAIL_MDC_RPCS_SEM)) { - lck->rpcl_it = MDC_FAKE_RPCL_IT; - lck->rpcl_fakes++; - mutex_unlock(&lck->rpcl_mutex); - return; - } - - /* This will only happen when the CFS_FAIL_CHECK() was - * just turned off but there are still requests in progress. - * Wait until they finish. It doesn't need to be efficient - * in this extremely rare case, just have low overhead in - * the common case when it isn't true. 
- */ - while (unlikely(lck->rpcl_it == MDC_FAKE_RPCL_IT)) { - mutex_unlock(&lck->rpcl_mutex); - schedule_timeout_uninterruptible(HZ / 4); - goto again; - } - - LASSERT(!lck->rpcl_it); - lck->rpcl_it = it; -} - -static inline void mdc_put_rpc_lock(struct mdc_rpc_lock *lck, - struct lookup_intent *it) -{ - if (it && (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP || - it->it_op == IT_LAYOUT || it->it_op == IT_READDIR)) - return; - - if (lck->rpcl_it == MDC_FAKE_RPCL_IT) { /* OBD_FAIL_MDC_RPCS_SEM */ - mutex_lock(&lck->rpcl_mutex); - - LASSERTF(lck->rpcl_fakes > 0, "%d\n", lck->rpcl_fakes); - lck->rpcl_fakes--; - - if (lck->rpcl_fakes == 0) - lck->rpcl_it = NULL; - - } else { - LASSERTF(it == lck->rpcl_it, "%p != %p\n", it, lck->rpcl_it); - lck->rpcl_it = NULL; - } - - mutex_unlock(&lck->rpcl_mutex); -} - static inline void mdc_get_mod_rpc_slot(struct ptlrpc_request *req, struct lookup_intent *it) { diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index b404391..3910c10 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -304,8 +304,6 @@ struct client_obd { atomic_t cl_destroy_in_flight; wait_queue_head_t cl_destroy_waitq; - struct mdc_rpc_lock *cl_rpc_lock; - /* modify rpcs in flight * currently used for metadata only */ diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 04ef76f..c2db38f 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -385,7 +385,7 @@ #define OBD_FAIL_MDC_ENQUEUE_PAUSE 0x801 #define OBD_FAIL_MDC_OLD_EXT_FLAGS 0x802 #define OBD_FAIL_MDC_GETATTR_ENQUEUE 0x803 -#define OBD_FAIL_MDC_RPCS_SEM 0x804 +#define OBD_FAIL_MDC_RPCS_SEM 0x804 /* deprecated */ #define OBD_FAIL_MDC_LIGHTWEIGHT 0x805 #define OBD_FAIL_MDC_CLOSE 0x806 #define OBD_FAIL_MDC_MERGE 0x807 From patchwork Thu Feb 27 21:09:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409883 
Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E085314BC for ; Thu, 27 Feb 2020 21:24:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C80C1246A0 for ; Thu, 27 Feb 2020 21:24:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C80C1246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9587134888F; Thu, 27 Feb 2020 13:22:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 774EB21FB61 for ; Thu, 27 Feb 2020 13:18:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A56081041; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A43F4468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:48 -0500 Message-Id: <1582838290-17243-121-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 120/622] lnet: libcfs: fix wrong check in libcfs_debug_vmsg2() X-BeenThere: 
lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong The intended logic is to skip the console message and increment @cdls_count while the current time is still before @cdls_next. However, the check was done the opposite way: 1) If libcfs_debug_vmsg2() is called after a long time, the current check succeeds, so we skip printing the message and return, and we will skip all later messages too. 2) If libcfs_debug_vmsg2() is called frequently, the current check fails every time and the message is always printed. In the worst case we never skip any messages. Also fix the test case to cover this; the following description is from Andreas: The test_60a() llog test is being run on the MGS, while the check in test_60b() to confirm that CDEBUG_LIMIT() works properly is being run on the client. There has been a breakage in CDEBUG_LIMIT() that this test failed to catch, and now we need to track it down. Change test_60b to dump the dmesg logs on the MGS.
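The rate-limit check described above can be sketched in userspace as follows. This is only an illustration, not the libcfs code: `struct limit_state`, `should_skip()` and the local `time_before()` helper are hypothetical stand-ins, and the `now` argument stands in for jiffies. The sketch assumes the same wraparound-safe signed comparison that the kernel's jiffies helpers rely on.

```c
#include <stdbool.h>

/* Wraparound-safe "a is earlier than b". Userspace stand-in modeled on
 * the kernel's jiffies comparison helpers, not the kernel macro itself.
 */
static bool time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical, simplified stand-in for the per-call-site limit state. */
struct limit_state {
	unsigned long next;	/* earliest time the next message may print,
				 * 0 means nothing was printed yet
				 */
	unsigned int count;	/* messages skipped since the last print */
};

/* Return true if a message arriving at time @now should be skipped.
 * A message is skipped only while @now is still before @ls->next.
 */
static bool should_skip(struct limit_state *ls, unsigned long now)
{
	if (ls->next && time_before(now, ls->next)) {
		ls->count++;	/* account for the suppressed message */
		return true;
	}
	return false;
}
```

With `next` set to 100, a call at time 50 is skipped and bumps `count`, while a call at time 150 is not skipped. A state whose `next` is still 0 never skips, which matches the "not first time ever" guard in the patched condition.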
Fixes: b49946b2e ("staging: lustre: libcfs: discard cfs_time_after()") WC-bug-id: https://jira.whamcloud.com/browse/LU-11373 Lustre-commit: 4037c1462730 ("LU-11373 libcfs: fix wrong check in libcfs_debug_vmsg2()") Signed-off-by: Andreas Dilger Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/33154 Reviewed-by: James Simmons Signed-off-by: James Simmons --- net/lnet/libcfs/tracefile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c index 6e4cc31..bda3523 100644 --- a/net/lnet/libcfs/tracefile.c +++ b/net/lnet/libcfs/tracefile.c @@ -544,7 +544,7 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata, if (cdls) { if (libcfs_console_ratelimit && cdls->cdls_next && /* not first time ever */ - !time_after(jiffies, cdls->cdls_next)) { + time_before(jiffies, cdls->cdls_next)) { /* skipping a console message */ cdls->cdls_count++; if (tcd) From patchwork Thu Feb 27 21:09:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409815 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56BD9138D for ; Thu, 27 Feb 2020 21:23:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3FE0C246A2 for ; Thu, 27 Feb 2020 21:23:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3FE0C246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3A14021FD46; Thu, 27 Feb 2020 13:21:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BA51321FADA for ; Thu, 27 Feb 2020 13:18:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A85611042; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A724346A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:49 -0500 Message-Id: <1582838290-17243-122-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 121/622] lustre: ptlrpc: new request vs disconnect race X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev A new request can race with the disconnect-by-idle process. The disconnect code now detects this state and initiates a new connection.
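The decision the disconnect reply handler has to make can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the ptlrpc implementation: `struct import_model`, `enum imp_state` and `disconnect_reply()` are hypothetical names, locking is left out, and the state assertion done by the real handler is omitted.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of import states for this sketch. */
enum imp_state { IMP_NEW, IMP_CONNECTING, IMP_IDLE, IMP_CLOSED };

/* Hypothetical stand-in for the fields of the import that matter here. */
struct import_model {
	enum imp_state state;
	int generation;	/* bumped whenever a new connection is initiated */
	int inflight;	/* requests in flight, the DISCONNECT included */
};

/* Model of handling the reply to a DISCONNECT that was sent while the
 * import had generation @req_generation.
 * Returns true if the caller should initiate a new connection.
 */
static bool disconnect_reply(struct import_model *imp, int req_generation)
{
	/* The reply is late (another connection was already initiated)
	 * or the import was closed: leave the import untouched.
	 */
	if (req_generation != imp->generation || imp->state == IMP_CLOSED)
		return false;

	imp->state = IMP_IDLE;

	/* Anything beyond our own DISCONNECT means a new request raced
	 * with the idle disconnect, so reconnect on its behalf.
	 */
	if (imp->inflight > 1) {
		imp->generation++;
		imp->state = IMP_NEW;
		return true;
	}
	return false;
}
```

In this model an import with only the DISCONNECT in flight simply becomes idle, while one with a second request in flight gets a new generation and asks the caller to reconnect. A reply carrying a stale generation changes nothing.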
WC-bug-id: https://jira.whamcloud.com/browse/LU-11128 Lustre-commit: 93d20d171c20 ("LU-11128 ptlrpc: new request vs disconnect race") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/32980 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 15 ++++++++++----- fs/lustre/ptlrpc/import.c | 32 +++++++++++++++++++++++++++++--- 2 files changed, 39 insertions(+), 8 deletions(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 691df1a..7be597c 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -887,6 +887,13 @@ struct ptlrpc_request *__ptlrpc_request_alloc(struct obd_import *imp, struct ptlrpc_request *request; int connect = 0; + request = __ptlrpc_request_alloc(imp, pool); + if (!request) + return NULL; + + /* initiate connection if needed when the import has been + * referenced by the new request to avoid races with disconnect + */ if (unlikely(imp->imp_state == LUSTRE_IMP_IDLE)) { int rc; @@ -904,16 +911,14 @@ struct ptlrpc_request *__ptlrpc_request_alloc(struct obd_import *imp, spin_unlock(&imp->imp_lock); if (connect) { rc = ptlrpc_connect_import(imp); - if (rc < 0) + if (rc < 0) { + ptlrpc_request_free(request); return NULL; + } ptlrpc_pinger_add_import(imp); } } - request = __ptlrpc_request_alloc(imp, pool); - if (!request) - return NULL; - req_capsule_init(&request->rq_pill, request, RCL_CLIENT); req_capsule_set(&request->rq_pill, format); return request; diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 73a345f..f59af80 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1593,13 +1593,39 @@ static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, void *data, int rc) { struct obd_import *imp = req->rq_import; + int connect = 0; + + DEBUG_REQ(D_HA, req, "inflight=%d, refcount=%d: rc = %d\n", + atomic_read(&imp->imp_inflight), + 
atomic_read(&imp->imp_refcount), rc); - LASSERT(imp->imp_state == LUSTRE_IMP_CONNECTING); spin_lock(&imp->imp_lock); - IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_IDLE); - memset(&imp->imp_remote_handle, 0, sizeof(imp->imp_remote_handle)); + /* DISCONNECT reply can be late and another connection can just + * be initiated. so we have to abort disconnection. + */ + if (req->rq_import_generation == imp->imp_generation && + imp->imp_state != LUSTRE_IMP_CLOSED) { + LASSERTF(imp->imp_state == LUSTRE_IMP_CONNECTING, + "%s\n", ptlrpc_import_state_name(imp->imp_state)); + imp->imp_state = LUSTRE_IMP_IDLE; + memset(&imp->imp_remote_handle, 0, + sizeof(imp->imp_remote_handle)); + /* take our DISCONNECT into account */ + if (atomic_read(&imp->imp_inflight) > 1) { + imp->imp_generation++; + imp->imp_initiated_at = imp->imp_generation; + IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_NEW); + connect = 1; + } + } spin_unlock(&imp->imp_lock); + if (connect) { + rc = ptlrpc_connect_import(imp); + if (rc >= 0) + ptlrpc_pinger_add_import(imp); + } + return 0; } From patchwork Thu Feb 27 21:09:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409887 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E945138D for ; Thu, 27 Feb 2020 21:24:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2755B246A0 for ; Thu, 27 Feb 2020 21:24:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2755B246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 14AE3348D5F; Thu, 27 Feb 2020 13:22:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1CF9A21FADA for ; Thu, 27 Feb 2020 13:18:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id ABC211043; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AA66546C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:50 -0500 Message-Id: <1582838290-17243-123-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 122/622] lustre: misc: name open file handles as such X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger In a number of places in the code, rename variables from "*_handle" or "*_fh" to "*_open_handle" so that it is clear this is referencing an open file handle rather than something else (e.g. a lock handle). Also rename the confusingly-named mti_close_handle to mti_open_handle, since this is referencing an open file handle, even if it is used at close time to close the file. 
mfd_handle2mfd() -> mfd_open_handle2mfd() mdt_file_data.mfd_handle -> mfd_open_handle mdt_file_data.mfd_old_handle -> mfd_open_handle_old mdt_thread_info.mti_close_handle -> mti_open_handle mdt_body.mbo_handle -> mbo_open_handle mdt_io_epoch.mio_handle -> mio_open_handle md_op_data.op_handle -> op_open_handle mdt_rec_create.cr_old_handle -> cr_open_handle_old mdt_reint_record.rr_handle -> rr_open_handle obd_client_handle.och_fh -> och_open_handle Change the resync code path to use a "lease_handle" to avoid confusion with an open handle: mdt_rec_resync.rs_handle -> rs_lease_handle use md_op_data.op_lease_handle add mdt_reint_record.rr_lease_handle WC-bug-id: https://jira.whamcloud.com/browse/LU-8174 Lustre-commit: ccb133fd2266 ("LU-8174 misc: name open file handles as such") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/26953 Reviewed-by: Lai Siyao Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 4 ++-- fs/lustre/llite/file.c | 22 +++++++++++----------- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/mdc/mdc_lib.c | 4 ++-- fs/lustre/mdc/mdc_reint.c | 4 ++-- fs/lustre/mdc/mdc_request.c | 26 ++++++++++++++------------ fs/lustre/ptlrpc/pack_generic.c | 2 +- fs/lustre/ptlrpc/wiretest.c | 24 ++++++++++++------------ include/uapi/linux/lustre/lustre_idl.h | 12 ++++++------ 9 files changed, 51 insertions(+), 49 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 3910c10..7cf9745 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -737,7 +737,7 @@ struct md_op_data { struct lu_fid op_fid4; /* to the operation locks. 
*/ u32 op_mds; /* what mds server open will go to */ u32 op_mode; - struct lustre_handle op_handle; + struct lustre_handle op_open_handle; s64 op_mod_time; const char *op_name; size_t op_namelen; @@ -933,7 +933,7 @@ struct md_open_data { }; struct obd_client_handle { - struct lustre_handle och_fh; + struct lustre_handle och_open_handle; struct lu_fid och_fid; struct md_open_data *och_mod; struct lustre_handle och_lease_handle; /* open lock for lease */ diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index fd39948..a46f5d3 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -103,7 +103,7 @@ static void ll_prepare_close(struct inode *inode, struct md_op_data *op_data, op_data->op_attr_flags = ll_inode_to_ext_flags(inode->i_flags); if (test_bit(LLIF_PROJECT_INHERIT, &lli->lli_flags)) op_data->op_attr_flags |= LUSTRE_PROJINHERIT_FL; - op_data->op_handle = och->och_fh; + op_data->op_open_handle = och->och_open_handle; /* * For HSM: if inode data has been modified, pack it so that @@ -230,7 +230,7 @@ static int ll_close_inode_openhandle(struct inode *inode, out: md_clear_open_replay_data(md_exp, och); - och->och_fh.cookie = DEAD_HANDLE_MAGIC; + och->och_open_handle.cookie = DEAD_HANDLE_MAGIC; kfree(och); ptlrpc_req_finished(req); @@ -613,7 +613,7 @@ static int ll_och_fill(struct obd_export *md_exp, struct lookup_intent *it, struct mdt_body *body; body = req_capsule_server_get(&it->it_request->rq_pill, &RMF_MDT_BODY); - och->och_fh = body->mbo_handle; + och->och_open_handle = body->mbo_open_handle; och->och_fid = body->mbo_fid1; och->och_lease_handle.cookie = it->it_lock_handle; och->och_magic = OBD_CLIENT_HANDLE_MAGIC; @@ -903,7 +903,7 @@ static int ll_md_blocking_lease_ast(struct ldlm_lock *lock, * if it has an open lock in cache already. 
*/ static int ll_lease_och_acquire(struct inode *inode, struct file *file, - struct lustre_handle *old_handle) + struct lustre_handle *old_open_handle) { struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct ll_inode_info *lli = ll_i2info(inode); @@ -939,7 +939,7 @@ static int ll_lease_och_acquire(struct inode *inode, struct file *file, *och_p = NULL; } - *old_handle = fd->fd_och->och_fh; + *old_open_handle = fd->fd_och->och_open_handle; out_unlock: mutex_unlock(&lli->lli_och_mutex); @@ -999,7 +999,7 @@ static int ll_lease_och_release(struct inode *inode, struct file *file) struct ll_sb_info *sbi = ll_i2sbi(inode); struct md_op_data *op_data; struct ptlrpc_request *req = NULL; - struct lustre_handle old_handle = { 0 }; + struct lustre_handle old_open_handle = { 0 }; struct obd_client_handle *och = NULL; int rc; int rc2; @@ -1011,7 +1011,7 @@ static int ll_lease_och_release(struct inode *inode, struct file *file) if (!(fmode & file->f_mode) || (file->f_mode & FMODE_EXEC)) return ERR_PTR(-EPERM); - rc = ll_lease_och_acquire(inode, file, &old_handle); + rc = ll_lease_och_acquire(inode, file, &old_open_handle); if (rc) return ERR_PTR(rc); } @@ -1028,7 +1028,7 @@ static int ll_lease_och_release(struct inode *inode, struct file *file) } /* To tell the MDT this openhandle is from the same owner */ - op_data->op_handle = old_handle; + op_data->op_open_handle = old_open_handle; it.it_flags = fmode | open_flags; it.it_flags |= MDS_OPEN_LOCK | MDS_OPEN_BY_FID | MDS_OPEN_LEASE; @@ -1230,7 +1230,7 @@ static int ll_lease_file_resync(struct obd_client_handle *och, if (rc) goto out; - op_data->op_handle = och->och_lease_handle; + op_data->op_lease_handle = och->och_lease_handle; rc = md_file_resync(sbi->ll_md_exp, op_data); if (rc) goto out; @@ -3892,7 +3892,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, if (rc) goto out_close; - op_data->op_handle = och->och_fh; + op_data->op_open_handle = och->och_open_handle; op_data->op_data_version = 
data_version; op_data->op_lease_handle = och->och_lease_handle; op_data->op_bias |= MDS_CLOSE_MIGRATE; @@ -3919,7 +3919,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, obd_mod_put(och->och_mod); md_clear_open_replay_data(ll_i2sbi(parent)->ll_md_exp, och); - och->och_fh.cookie = DEAD_HANDLE_MAGIC; + och->och_open_handle.cookie = DEAD_HANDLE_MAGIC; kfree(och); och = NULL; } diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 636ddf8..be67652 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2258,7 +2258,7 @@ void ll_open_cleanup(struct super_block *sb, struct ptlrpc_request *open_req) return; op_data->op_fid1 = body->mbo_fid1; - op_data->op_handle = body->mbo_handle; + op_data->op_open_handle = body->mbo_open_handle; op_data->op_mod_time = get_seconds(); md_close(exp, op_data, NULL, &close_req); ptlrpc_req_finished(close_req); diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index 5b1691e..00a6be4 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -254,7 +254,7 @@ void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data, rec->cr_suppgid2 = op_data->op_suppgids[1]; rec->cr_bias = op_data->op_bias; rec->cr_umask = current_umask(); - rec->cr_old_handle = op_data->op_handle; + rec->cr_open_handle_old = op_data->op_open_handle; if (op_data->op_name) { mdc_pack_name(req, &RMF_NAME, op_data->op_name, @@ -359,7 +359,7 @@ static void mdc_setattr_pack_rec(struct mdt_rec_setattr *rec, static void mdc_ioepoch_pack(struct mdt_ioepoch *epoch, struct md_op_data *op_data) { - epoch->mio_handle = op_data->op_handle; + epoch->mio_open_handle = op_data->op_open_handle; epoch->mio_unused1 = 0; epoch->mio_unused2 = 0; epoch->mio_padding = 0; diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 355cee1..5d82449 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -456,9 +456,9 @@ int mdc_file_resync(struct 
obd_export *exp, struct md_op_data *op_data) rec->rs_fid = op_data->op_fid1; rec->rs_bias = op_data->op_bias; - lock = ldlm_handle2lock(&op_data->op_handle); + lock = ldlm_handle2lock(&op_data->op_lease_handle); if (lock) { - rec->rs_handle = lock->l_remote_handle; + rec->rs_lease_handle = lock->l_remote_handle; LDLM_LOCK_PUT(lock); } diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 0ee42dd..15f94ea 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -593,7 +593,7 @@ void mdc_replay_open(struct ptlrpc_request *req) struct md_open_data *mod = req->rq_cb_data; struct ptlrpc_request *close_req; struct obd_client_handle *och; - struct lustre_handle old; + struct lustre_handle old_open_handle = { }; struct mdt_body *body; if (!mod) { @@ -606,22 +606,22 @@ void mdc_replay_open(struct ptlrpc_request *req) spin_lock(&req->rq_lock); och = mod->mod_och; - if (och && och->och_fh.cookie) + if (och && och->och_open_handle.cookie) req->rq_early_free_repbuf = 1; else req->rq_early_free_repbuf = 0; spin_unlock(&req->rq_lock); if (req->rq_early_free_repbuf) { - struct lustre_handle *file_fh; + struct lustre_handle *file_open_handle; LASSERT(och->och_magic == OBD_CLIENT_HANDLE_MAGIC); - file_fh = &och->och_fh; + file_open_handle = &och->och_open_handle; CDEBUG(D_HA, "updating handle from %#llx to %#llx\n", - file_fh->cookie, body->mbo_handle.cookie); - old = *file_fh; - *file_fh = body->mbo_handle; + file_open_handle->cookie, body->mbo_open_handle.cookie); + old_open_handle = *file_open_handle; + *file_open_handle = body->mbo_open_handle; } close_req = mod->mod_close_req; @@ -635,10 +635,11 @@ void mdc_replay_open(struct ptlrpc_request *req) LASSERT(epoch); if (req->rq_early_free_repbuf) - LASSERT(!memcmp(&old, &epoch->mio_handle, sizeof(old))); + LASSERT(old_open_handle.cookie == + epoch->mio_open_handle.cookie); DEBUG_REQ(D_HA, close_req, "updating close body with new fh"); - epoch->mio_handle = body->mbo_handle; + 
epoch->mio_open_handle = body->mbo_open_handle; } } @@ -722,11 +723,12 @@ int mdc_set_open_replay_data(struct obd_export *exp, } rec->cr_fid2 = body->mbo_fid1; - rec->cr_old_handle.cookie = body->mbo_handle.cookie; + rec->cr_open_handle_old = body->mbo_open_handle; open_req->rq_replay_cb = mdc_replay_open; if (!fid_is_sane(&body->mbo_fid1)) { DEBUG_REQ(D_ERROR, open_req, - "Saving replay request with insane fid"); + "saving replay request with insane FID " DFID, + PFID(&body->mbo_fid1)); LBUG(); } @@ -774,7 +776,7 @@ static int mdc_clear_open_replay_data(struct obd_export *exp, spin_lock(&mod->mod_open_req->rq_lock); if (mod->mod_och) - mod->mod_och->och_fh.cookie = 0; + mod->mod_och->och_open_handle.cookie = 0; mod->mod_open_req->rq_early_free_repbuf = 0; spin_unlock(&mod->mod_open_req->rq_lock); mdc_free_open(mod); diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index e71f79d..653a8d7 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1770,7 +1770,7 @@ void lustre_swab_mdt_body(struct mdt_body *b) void lustre_swab_mdt_ioepoch(struct mdt_ioepoch *b) { /* handle is opaque */ - /* mio_handle is opaque */ + /* mio_open_handle is opaque */ BUILD_BUG_ON(!offsetof(typeof(*b), mio_unused1)); BUILD_BUG_ON(!offsetof(typeof(*b), mio_unused2)); BUILD_BUG_ON(!offsetof(typeof(*b), mio_padding)); diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 4095767..845aff4 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1961,10 +1961,10 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct mdt_body, mbo_fid2)); LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fid2) == 16, "found %lld\n", (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fid2)); - LASSERTF((int)offsetof(struct mdt_body, mbo_handle) == 32, "found %lld\n", - (long long)(int)offsetof(struct mdt_body, mbo_handle)); - LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_handle) == 8, 
"found %lld\n", - (long long)(int)sizeof(((struct mdt_body *)0)->mbo_handle)); + LASSERTF((int)offsetof(struct mdt_body, mbo_open_handle) == 32, "found %lld\n", + (long long)(int)offsetof(struct mdt_body, mbo_open_handle)); + LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_open_handle) == 8, "found %lld\n", + (long long)(int)sizeof(((struct mdt_body *)0)->mbo_open_handle)); LASSERTF((int)offsetof(struct mdt_body, mbo_valid) == 40, "found %lld\n", (long long)(int)offsetof(struct mdt_body, mbo_valid)); LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_valid) == 8, "found %lld\n", @@ -2162,10 +2162,10 @@ void lustre_assert_wire_constants(void) /* Checks for struct mdt_ioepoch */ LASSERTF((int)sizeof(struct mdt_ioepoch) == 24, "found %lld\n", (long long)(int)sizeof(struct mdt_ioepoch)); - LASSERTF((int)offsetof(struct mdt_ioepoch, mio_handle) == 0, "found %lld\n", - (long long)(int)offsetof(struct mdt_ioepoch, mio_handle)); - LASSERTF((int)sizeof(((struct mdt_ioepoch *)0)->mio_handle) == 8, "found %lld\n", - (long long)(int)sizeof(((struct mdt_ioepoch *)0)->mio_handle)); + LASSERTF((int)offsetof(struct mdt_ioepoch, mio_open_handle) == 0, "found %lld\n", + (long long)(int)offsetof(struct mdt_ioepoch, mio_open_handle)); + LASSERTF((int)sizeof(((struct mdt_ioepoch *)0)->mio_open_handle) == 8, "found %lld\n", + (long long)(int)sizeof(((struct mdt_ioepoch *)0)->mio_open_handle)); LASSERTF((int)offsetof(struct mdt_ioepoch, mio_unused1) == 8, "found %lld\n", (long long)(int)offsetof(struct mdt_ioepoch, mio_unused1)); LASSERTF((int)sizeof(((struct mdt_ioepoch *)0)->mio_unused1) == 8, "found %lld\n", @@ -2334,10 +2334,10 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct mdt_rec_create, cr_fid2)); LASSERTF((int)sizeof(((struct mdt_rec_create *)0)->cr_fid2) == 16, "found %lld\n", (long long)(int)sizeof(((struct mdt_rec_create *)0)->cr_fid2)); - LASSERTF((int)offsetof(struct mdt_rec_create, cr_old_handle) == 72, "found %lld\n", - (long 
long)(int)offsetof(struct mdt_rec_create, cr_old_handle)); - LASSERTF((int)sizeof(((struct mdt_rec_create *)0)->cr_old_handle) == 8, "found %lld\n", - (long long)(int)sizeof(((struct mdt_rec_create *)0)->cr_old_handle)); + LASSERTF((int)offsetof(struct mdt_rec_create, cr_open_handle_old) == 72, "found %lld\n", + (long long)(int)offsetof(struct mdt_rec_create, cr_open_handle_old)); + LASSERTF((int)sizeof(((struct mdt_rec_create *)0)->cr_open_handle_old) == 8, "found %lld\n", + (long long)(int)sizeof(((struct mdt_rec_create *)0)->cr_open_handle_old)); LASSERTF((int)offsetof(struct mdt_rec_create, cr_time) == 80, "found %lld\n", (long long)(int)offsetof(struct mdt_rec_create, cr_time)); LASSERTF((int)sizeof(((struct mdt_rec_create *)0)->cr_time) == 8, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 522bd52..39f2d3b 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1574,7 +1574,7 @@ enum md_transient_state { struct mdt_body { struct lu_fid mbo_fid1; struct lu_fid mbo_fid2; - struct lustre_handle mbo_handle; + struct lustre_handle mbo_open_handle; __u64 mbo_valid; __u64 mbo_size; /* Offset, in the case of MDS_READPAGE */ __s64 mbo_mtime; @@ -1612,7 +1612,7 @@ struct mdt_body { }; /* 216 */ struct mdt_ioepoch { - struct lustre_handle mio_handle; + struct lustre_handle mio_open_handle; __u64 mio_unused1; /* was ioepoch */ __u32 mio_unused2; /* was flags */ __u32 mio_padding; @@ -1719,9 +1719,9 @@ struct mdt_rec_create { __u32 cr_suppgid1_h; __u32 cr_suppgid2; __u32 cr_suppgid2_h; - struct lu_fid cr_fid1; - struct lu_fid cr_fid2; - struct lustre_handle cr_old_handle; /* handle in case of open replay */ + struct lu_fid cr_fid1; + struct lu_fid cr_fid2; + struct lustre_handle cr_open_handle_old; /* in case of open replay */ __s64 cr_time; __u64 cr_rdev; __u64 cr_ioepoch; @@ -1864,7 +1864,7 @@ struct mdt_rec_resync { __u32 rs_suppgid2_h; struct lu_fid rs_fid; 
__u8 rs_padding0[sizeof(struct lu_fid)]; - struct lustre_handle rs_handle; /* rr_mtime */ + struct lustre_handle rs_lease_handle; /* rr_mtime */ __s64 rs_padding1; /* rr_atime */ __s64 rs_padding2; /* rr_ctime */ __u64 rs_padding3; /* rr_size */ From patchwork Thu Feb 27 21:09:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409891 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B446E14BC for ; Thu, 27 Feb 2020 21:25:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9CD97246A0 for ; Thu, 27 Feb 2020 21:25:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9CD97246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AB2AB348D85; Thu, 27 Feb 2020 13:22:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 767C021FB35 for ; Thu, 27 Feb 2020 13:18:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AE7D21048; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AD79746D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger 
, Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:51 -0500 Message-Id: <1582838290-17243-124-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 123/622] lustre: ldlm: cleanup LVB handling X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini For the client side LVB handling is barely used. In the OpenSFS tree lvbo handling was reworked for LU-5042. Merge those changes as well as remove all server related code. WC-bug-id: https://jira.whamcloud.com/browse/LU-5042 Lustre-commit: 8739f13233e ("LU-5042 ldlm: delay filling resource's LVB upon replay") Signed-off-by: Bruno Faccini Reviewed-on: http://review.whamcloud.com/10845 Reviewed-by: Jinshan Xiong Reviewed-by: Niu Yawei Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 62 ++---------------------------------------- fs/lustre/ldlm/ldlm_resource.c | 39 ++++---------------------- 2 files changed, 8 insertions(+), 93 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 6ad12a3..1133e20 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -280,16 +280,12 @@ struct ldlm_pool { * Currently LVBs are used by: * - OSC-OST code to maintain current object size/times * - layout lock code to return the layout when the layout lock is granted + * + * To ensure delayed LVB initialization, it is highly recommended to use the set + * of ldlm_[res_]lvbo_[init,update,fill]() functions. 
*/ struct ldlm_valblock_ops { - int (*lvbo_init)(struct ldlm_resource *res); - int (*lvbo_update)(struct ldlm_resource *res, struct ldlm_lock *lock, - struct ptlrpc_request *r, int increase); int (*lvbo_free)(struct ldlm_resource *res); - /* Return size of lvb data appropriate RPC size can be reserved */ - int (*lvbo_size)(struct ldlm_lock *lock); - /* Called to fill in lvb data to RPC buffer @buf */ - int (*lvbo_fill)(struct ldlm_lock *lock, void *buf, int buflen); }; /** @@ -922,36 +918,6 @@ static inline bool ldlm_has_dom(struct ldlm_lock *lock) return &lock->l_resource->lr_ns_bucket->nsb_at_estimate; } -static inline int ldlm_lvbo_init(struct ldlm_resource *res) -{ - struct ldlm_namespace *ns = ldlm_res_to_ns(res); - - if (ns->ns_lvbo && ns->ns_lvbo->lvbo_init) - return ns->ns_lvbo->lvbo_init(res); - - return 0; -} - -static inline int ldlm_lvbo_size(struct ldlm_lock *lock) -{ - struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); - - if (ns->ns_lvbo && ns->ns_lvbo->lvbo_size) - return ns->ns_lvbo->lvbo_size(lock); - - return 0; -} - -static inline int ldlm_lvbo_fill(struct ldlm_lock *lock, void *buf, int len) -{ - struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); - - if (ns->ns_lvbo) - return ns->ns_lvbo->lvbo_fill(lock, buf, len); - - return 0; -} - struct ldlm_ast_work { struct ldlm_lock *w_lock; int w_blocking; @@ -1111,28 +1077,6 @@ static inline struct ldlm_lock *ldlm_handle2lock(const struct lustre_handle *h) return lock; } -/** - * Update Lock Value Block Operations (LVBO) on a resource taking into account - * data from request @r - */ -static inline int ldlm_lvbo_update(struct ldlm_resource *res, - struct ldlm_lock *lock, - struct ptlrpc_request *req, int increase) -{ - struct ldlm_namespace *ns = ldlm_res_to_ns(res); - - if (ns->ns_lvbo && ns->ns_lvbo->lvbo_update) - return ns->ns_lvbo->lvbo_update(res, lock, req, increase); - - return 0; -} - -static inline int ldlm_res_lvbo_update(struct ldlm_resource *res, - struct ptlrpc_request *req, int increase) 
-{ - return ldlm_lvbo_update(res, NULL, req, increase); -} - int ldlm_error2errno(enum ldlm_error error); #if LUSTRE_TRACKS_LOCK_EXP_REFS diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 5d73132..59b17b5 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -1062,11 +1062,10 @@ static struct ldlm_resource *ldlm_resource_new(enum ldlm_type ldlm_type) spin_lock_init(&res->lr_lock); lu_ref_init(&res->lr_reference); - /* The creator of the resource must unlock the mutex after LVB - * initialization. + /* Since LVB init can be delayed now, there is no longer need to + * immediately acquire mutex here. */ mutex_init(&res->lr_lvb_mutex); - mutex_lock(&res->lr_lvb_mutex); return res; } @@ -1087,7 +1086,6 @@ struct ldlm_resource * struct cfs_hash_bd bd; u64 version; int ns_refcount = 0; - int rc; LASSERT(!parent); LASSERT(ns->ns_rs_hash); @@ -1097,7 +1095,7 @@ struct ldlm_resource * hnode = cfs_hash_bd_lookup_locked(ns->ns_rs_hash, &bd, (void *)name); if (hnode) { cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 0); - goto lvbo_init; + goto found; } version = cfs_hash_bd_version_get(&bd); @@ -1125,25 +1123,12 @@ struct ldlm_resource * cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 1); /* Clean lu_ref for failed resource. */ lu_ref_fini(&res->lr_reference); - /* We have taken lr_lvb_mutex. Drop it. */ - mutex_unlock(&res->lr_lvb_mutex); if (res->lr_itree) kmem_cache_free(ldlm_interval_tree_slab, res->lr_itree); kmem_cache_free(ldlm_resource_slab, res); -lvbo_init: +found: res = hlist_entry(hnode, struct ldlm_resource, lr_hash); - /* Synchronize with regard to resource creation. */ - if (ns->ns_lvbo && ns->ns_lvbo->lvbo_init) { - mutex_lock(&res->lr_lvb_mutex); - mutex_unlock(&res->lr_lvb_mutex); - } - - if (unlikely(res->lr_lvb_len < 0)) { - rc = res->lr_lvb_len; - ldlm_resource_putref(res); - res = ERR_PTR(rc); - } return res; } /* We won! Let's add the resource. 
*/ @@ -1152,22 +1137,8 @@ struct ldlm_resource * ns_refcount = ldlm_namespace_get_return(ns); cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 1); - if (ns->ns_lvbo && ns->ns_lvbo->lvbo_init) { - OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CREATE_RESOURCE, 2); - rc = ns->ns_lvbo->lvbo_init(res); - if (rc < 0) { - CERROR("%s: lvbo_init failed for resource %#llx:%#llx: rc = %d\n", - ns->ns_obd->obd_name, name->name[0], - name->name[1], rc); - res->lr_lvb_len = rc; - mutex_unlock(&res->lr_lvb_mutex); - ldlm_resource_putref(res); - return ERR_PTR(rc); - } - } - /* We create resource with locked lr_lvb_mutex. */ - mutex_unlock(&res->lr_lvb_mutex); + OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CREATE_RESOURCE, 2); /* Let's see if we happened to be the very first resource in this * namespace. If so, and this is a client namespace, we need to move From patchwork Thu Feb 27 21:09:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409895 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 26B5A138D for ; Thu, 27 Feb 2020 21:25:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0B915246A0 for ; Thu, 27 Feb 2020 21:25:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0B915246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EEAD4348DB7; Thu, 27 Feb 2020 13:22:38 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D3EA321FA55 for ; Thu, 27 Feb 2020 13:18:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B184D104A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B07AB468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:52 -0500 Message-Id: <1582838290-17243-125-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 124/622] lustre: ldlm: pass preallocated env to methods X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Pass a preallocated env to the methods to save on env allocation. Benchmarks run by Shuichi Ihara demonstrated a 13% improvement for small I/Os: 564k vs 639k IOPS. The details can be found at https://jira.whamcloud.com/browse/LU-11164.
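For readers skimming the series, the pattern this patch applies to ptlrpc_set_wait() can be sketched in plain userspace C: use the env the caller hands in, and only fall back to setting up a local one when the caller passes NULL. This is an illustrative sketch, not the Lustre code. The demo_* names, the struct layout, and the init counter are hypothetical stand-ins for struct lu_env, lu_env_init() and lu_env_fini().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct lu_env; not the Lustre definition. */
struct demo_env {
	int initialized;
};

/* Counts how many times an env had to be set up (illustration only). */
static int demo_env_inits;

static int demo_env_init(struct demo_env *env)
{
	env->initialized = 1;
	demo_env_inits++;
	return 0;
}

static void demo_env_fini(struct demo_env *env)
{
	env->initialized = 0;
}

/*
 * Use the caller's env when one is provided. Otherwise initialize a
 * local env, use it for the duration of the call, and tear it down
 * before returning.
 */
static int demo_set_wait(const struct demo_env *env)
{
	struct demo_env local;
	int rc;

	if (!env) {
		rc = demo_env_init(&local);
		if (rc)
			return rc;
		env = &local;
	}

	/* The real wait loop would run here and pass env down. */
	rc = env->initialized ? 0 : -1;

	/* Only finalize the env if it is the one created above. */
	if (env == &local)
		demo_env_fini(&local);

	return rc;
}
```

A caller that already owns an env passes it in and pays no setup cost per call; a caller without one passes NULL and gets the slower fallback path.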
Lustre-commit:e02cb40761ff8 ("LU-11164 ldlm: pass env to lvbo methods") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/32832 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 2 +- fs/lustre/ldlm/ldlm_internal.h | 3 ++- fs/lustre/ldlm/ldlm_lock.c | 6 ++++-- fs/lustre/ldlm/ldlm_request.c | 3 ++- fs/lustre/lov/lov_obd.c | 4 ++-- fs/lustre/ptlrpc/client.c | 23 ++++++++++++++++++++--- fs/lustre/ptlrpc/ptlrpcd.c | 2 +- 7 files changed, 32 insertions(+), 11 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index cf13555..cbd524c 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1842,7 +1842,7 @@ struct ptlrpc_connection *ptlrpc_uuid_to_connection(struct obd_uuid *uuid, struct ptlrpc_request_set *ptlrpc_prep_fcset(int max, set_producer_func func, void *arg); int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set); -int ptlrpc_set_wait(struct ptlrpc_request_set *); +int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set); void ptlrpc_set_destroy(struct ptlrpc_request_set *); void ptlrpc_set_add_req(struct ptlrpc_request_set *, struct ptlrpc_request *); #define PTLRPCD_SET ((struct ptlrpc_request_set *)1) diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index ec68713..df57c02 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -137,7 +137,8 @@ struct ldlm_lock * enum ldlm_type type, enum ldlm_mode mode, const struct ldlm_callback_suite *cbs, void *data, u32 lvb_len, enum lvb_type lvb_type); -enum ldlm_error ldlm_lock_enqueue(struct ldlm_namespace *ns, +enum ldlm_error ldlm_lock_enqueue(const struct lu_env *env, + struct ldlm_namespace *ns, struct ldlm_lock **lock, void *cookie, u64 *flags); void ldlm_lock_addref_internal(struct ldlm_lock *lock, enum ldlm_mode mode); diff --git 
a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 4f746ad..bdbbfec 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -1578,7 +1578,8 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns, * Does not block. As a result of enqueue the lock would be put * into granted or waiting list. */ -enum ldlm_error ldlm_lock_enqueue(struct ldlm_namespace *ns, +enum ldlm_error ldlm_lock_enqueue(const struct lu_env *env, + struct ldlm_namespace *ns, struct ldlm_lock **lockp, void *cookie, u64 *flags) { @@ -1832,7 +1833,7 @@ int ldlm_run_ast_work(struct ldlm_namespace *ns, struct list_head *rpc_list, goto out; } - ptlrpc_set_wait(arg->set); + ptlrpc_set_wait(NULL, arg->set); ptlrpc_set_destroy(arg->set); rc = atomic_read(&arg->restart) ? -ERESTART : 0; @@ -1945,6 +1946,7 @@ int ldlm_lock_set_data(const struct lustre_handle *lockh, void *data) EXPORT_SYMBOL(ldlm_lock_set_data); struct export_cl_data { + const struct lu_env *ecl_env; struct obd_export *ecl_exp; int ecl_loop; }; diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index f045d30..9d3330c 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -343,6 +343,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, const struct lustre_handle *lockh, int rc) { struct ldlm_namespace *ns = exp->exp_obd->obd_namespace; + const struct lu_env *env = NULL; int is_replay = *flags & LDLM_FL_REPLAY; struct ldlm_lock *lock; struct ldlm_reply *reply; @@ -487,7 +488,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, } if (!is_replay) { - rc = ldlm_lock_enqueue(ns, &lock, NULL, flags); + rc = ldlm_lock_enqueue(env, ns, &lock, NULL, flags); if (lock->l_completion_ast) { int err = lock->l_completion_ast(lock, *flags, NULL); diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 35eaa1f..9a6ffe8 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -948,7 
+948,7 @@ static int lov_statfs(const struct lu_env *env, struct obd_export *exp, goto out_set; } - rc = ptlrpc_set_wait(rqset); + rc = ptlrpc_set_wait(env, rqset); out_set: if (rc < 0) @@ -1249,7 +1249,7 @@ static int lov_set_info_async(const struct lu_env *env, struct obd_export *exp, lov_tgts_putref(obddev); if (no_set) { - err = ptlrpc_set_wait(set); + err = ptlrpc_set_wait(env, set); if (rc == 0) rc = err; ptlrpc_set_destroy(set); diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 7be597c..fabe675 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -2278,9 +2278,10 @@ time64_t ptlrpc_set_next_timeout(struct ptlrpc_request_set *set) * error or otherwise be interrupted). * Returns 0 on success or error code otherwise. */ -int ptlrpc_set_wait(struct ptlrpc_request_set *set) +int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set) { struct ptlrpc_request *req; + struct lu_env _env; time64_t timeout; int rc; @@ -2295,6 +2296,19 @@ int ptlrpc_set_wait(struct ptlrpc_request_set *set) if (list_empty(&set->set_requests)) return 0; + /* ideally we want env provide by the caller all the time, + * but at the moment that would mean a massive change in + * LDLM while benefits would be close to zero, so just + * initialize env here for those rare cases + */ + if (!env) { + /* XXX: skip on the client side? */ + rc = lu_env_init(&_env, LCT_DT_THREAD); + if (rc) + return rc; + env = &_env; + } + do { timeout = ptlrpc_set_next_timeout(set); @@ -2313,7 +2327,7 @@ int ptlrpc_set_wait(struct ptlrpc_request_set *set) * so we allow interrupts during the timeout. 
*/ rc = l_wait_event_abortable_timeout(set->set_waitq, - ptlrpc_check_set(NULL, set), + ptlrpc_check_set(env, set), HZ); if (rc == 0) { rc = -ETIMEDOUT; @@ -2380,6 +2394,9 @@ int ptlrpc_set_wait(struct ptlrpc_request_set *set) rc = req->rq_status; } + if (env && env == &_env) + lu_env_fini(&_env); + return rc; } EXPORT_SYMBOL(ptlrpc_set_wait); @@ -2841,7 +2858,7 @@ int ptlrpc_queue_wait(struct ptlrpc_request *req) /* add a ref for the set (see comment in ptlrpc_set_add_req) */ ptlrpc_request_addref(req); ptlrpc_set_add_req(set, req); - rc = ptlrpc_set_wait(set); + rc = ptlrpc_set_wait(NULL, set); ptlrpc_set_destroy(set); return rc; diff --git a/fs/lustre/ptlrpc/ptlrpcd.c b/fs/lustre/ptlrpc/ptlrpcd.c index c0b091c..e9c03ba 100644 --- a/fs/lustre/ptlrpc/ptlrpcd.c +++ b/fs/lustre/ptlrpc/ptlrpcd.c @@ -469,7 +469,7 @@ static int ptlrpcd(void *arg) * Wait for inflight requests to drain. */ if (!list_empty(&set->set_requests)) - ptlrpc_set_wait(set); + ptlrpc_set_wait(&env, set); lu_context_fini(&env.le_ctx); lu_context_fini(env.le_ses); From patchwork Thu Feb 27 21:09:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409817 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 887D714BC for ; Thu, 27 Feb 2020 21:23:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7163D246A0 for ; Thu, 27 Feb 2020 21:23:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7163D246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D89AA3489E0; Thu, 27 Feb 2020 13:21:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 362C721FB7B for ; Thu, 27 Feb 2020 13:18:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B4E031053; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B37A346F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:53 -0500 Message-Id: <1582838290-17243-126-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 125/622] lustre: osc: move obdo_cache to OSC code X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The obdo_cache slab is only used by the OSC code today, so it does not need to be allocated in obdclass on servers. Move it to only be allocated when the OSC module is loaded. Rename obdo_cachep to osc_obdo_kmem to match other slab caches created by the OSC. 
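As a rough illustration of why adding one entry to a descriptor table is enough to create and destroy the new cache with the module, here is a userspace C sketch of a NULL-terminated cache descriptor table. It is not the Lustre implementation. The demo_* names are hypothetical, the object sizes are arbitrary, and malloc()/free() stand in for real slab cache creation and destruction.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache: it only records name and size. */
struct demo_cache {
	const char *name;
	size_t size;
};

/* One table entry per cache; slot says where to store the result. */
struct demo_cache_descr {
	struct demo_cache **slot;
	const char *name;
	size_t size;
};

static struct demo_cache *demo_quota_cache;
static struct demo_cache *demo_obdo_cache;

/* Sizes are arbitrary placeholders. The table ends with a NULL slot. */
static struct demo_cache_descr demo_caches[] = {
	{ .slot = &demo_quota_cache, .name = "demo_quota", .size = 32 },
	{ .slot = &demo_obdo_cache, .name = "demo_obdo", .size = 208 },
	{ .slot = NULL }
};

/* Destroy every cache in the table; safe on partially built tables. */
static void demo_caches_fini(struct demo_cache_descr *d)
{
	for (; d->slot; d++) {
		free(*d->slot);
		*d->slot = NULL;
	}
}

/* Create every cache in the table, undoing all of them on failure. */
static int demo_caches_init(struct demo_cache_descr *d)
{
	struct demo_cache_descr *start = d;

	for (; d->slot; d++) {
		struct demo_cache *c = malloc(sizeof(*c));

		if (!c) {
			demo_caches_fini(start);
			return -1;
		}
		c->name = d->name;
		c->size = d->size;
		*d->slot = c;
	}
	return 0;
}
```

With this shape, moving a cache between modules is a matter of moving its table entry and its pointer variable; the init and cleanup loops do not change.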
WC-bug-id: https://jira.whamcloud.com/browse/LU-10899 Lustre-commit: 48df66be72c9 ("LU-10899 osc: move obdo_cache to OSC code") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33141 Reviewed-by: James Simmons Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 1 + fs/lustre/include/obd_class.h | 3 --- fs/lustre/obdclass/genops.c | 10 ---------- fs/lustre/osc/osc_dev.c | 8 ++++++-- fs/lustre/osc/osc_request.c | 11 +++++------ 5 files changed, 12 insertions(+), 21 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index dc8071a..dabcee0 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -557,6 +557,7 @@ struct osc_brw_async_args { extern struct kmem_cache *osc_session_kmem; extern struct kmem_cache *osc_extent_kmem; extern struct kmem_cache *osc_quota_kmem; +extern struct kmem_cache *osc_obdo_kmem; extern struct lu_context_key osc_key; extern struct lu_context_key osc_session_key; diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index a3ef5d5..01eb385 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1651,9 +1651,6 @@ static inline int md_unpackmd(struct obd_export *exp, int obd_init_caches(void); void obd_cleanup_caches(void); -/* support routines */ -extern struct kmem_cache *obdo_cachep; - typedef int (*register_lwp_cb)(void *data); struct lwp_register_item { diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index a122332..e5e2f73 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -46,8 +46,6 @@ static struct obd_device *obd_devs[MAX_OBD_DEVICES]; static struct kmem_cache *obd_device_cachep; -struct kmem_cache *obdo_cachep; -EXPORT_SYMBOL(obdo_cachep); static struct kobj_type class_ktype; static struct workqueue_struct *zombie_wq; @@ -645,8 +643,6 @@ void obd_cleanup_caches(void) { 
kmem_cache_destroy(obd_device_cachep); obd_device_cachep = NULL; - kmem_cache_destroy(obdo_cachep); - obdo_cachep = NULL; } int obd_init_caches(void) @@ -658,12 +654,6 @@ int obd_init_caches(void) if (!obd_device_cachep) goto out; - LASSERT(!obdo_cachep); - obdo_cachep = kmem_cache_create("ll_obdo_cache", sizeof(struct obdo), - 0, 0, NULL); - if (!obdo_cachep) - goto out; - return 0; out: obd_cleanup_caches(); diff --git a/fs/lustre/osc/osc_dev.c b/fs/lustre/osc/osc_dev.c index 3d0687a..b8bf75a 100644 --- a/fs/lustre/osc/osc_dev.c +++ b/fs/lustre/osc/osc_dev.c @@ -55,9 +55,8 @@ struct kmem_cache *osc_thread_kmem; struct kmem_cache *osc_session_kmem; struct kmem_cache *osc_extent_kmem; -EXPORT_SYMBOL(osc_extent_kmem); struct kmem_cache *osc_quota_kmem; -EXPORT_SYMBOL(osc_quota_kmem); +struct kmem_cache *osc_obdo_kmem; struct lu_kmem_descr osc_caches[] = { { @@ -91,6 +90,11 @@ struct lu_kmem_descr osc_caches[] = { .ckd_size = sizeof(struct osc_quota_info) }, { + .ckd_cache = &osc_obdo_kmem, + .ckd_name = "osc_obdo_kmem", + .ckd_size = sizeof(struct obdo) + }, + { .ckd_cache = NULL } }; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 2784e1e..e968360 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -749,7 +749,7 @@ static int osc_shrink_grant_interpret(const struct lu_env *env, LASSERT(body); osc_update_grant(cli, body); out: - kmem_cache_free(obdo_cachep, oa); + kmem_cache_free(osc_obdo_kmem, oa); return rc; } @@ -2115,7 +2115,7 @@ static int brw_interpret(const struct lu_env *env, cl_object_attr_update(env, obj, attr, valid); cl_object_attr_unlock(obj); } - kmem_cache_free(obdo_cachep, aa->aa_oa); + kmem_cache_free(osc_obdo_kmem, aa->aa_oa); if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE && rc == 0) osc_inc_unstable_pages(req); @@ -2223,7 +2223,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli, goto out; } - oa = kmem_cache_zalloc(obdo_cachep, GFP_NOFS); + oa = 
kmem_cache_zalloc(osc_obdo_kmem, GFP_NOFS); if (!oa) { rc = -ENOMEM; goto out; @@ -2349,8 +2349,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli, if (rc != 0) { LASSERT(!req); - if (oa) - kmem_cache_free(obdo_cachep, oa); + kmem_cache_free(osc_obdo_kmem, oa); kfree(pga); /* this should happen rarely and is pretty bad, it makes the * pending list not follow the dirty order @@ -2960,7 +2959,7 @@ int osc_set_info_async(const struct lu_env *env, struct obd_export *exp, struct obdo *oa; aa = ptlrpc_req_async_args(aa, req); - oa = kmem_cache_zalloc(obdo_cachep, GFP_NOFS); + oa = kmem_cache_zalloc(osc_obdo_kmem, GFP_NOFS); if (!oa) { ptlrpc_req_finished(req); return -ENOMEM; From patchwork Thu Feb 27 21:09:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409821 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 44588138D for ; Thu, 27 Feb 2020 21:23:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2D0E0246A0 for ; Thu, 27 Feb 2020 21:23:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2D0E0246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 11ED5348A09; Thu, 27 Feb 2020 13:21:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8C62B21FB7B for ; Thu, 27 Feb 2020 13:18:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B92971054; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B67F746A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:54 -0500 Message-Id: <1582838290-17243-127-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 126/622] lustre: llite: zero lum for stripeless files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In the IOC_MDC_GETFILEINFO/LL_IOC_MDC_GETINFO case of ll_dir_ioctl(), if the file has no striping then zero out the lum buffer so that userspace won't be confused by garbage. WC-bug-id: https://jira.whamcloud.com/browse/LU-11380 Lustre-commit: fab95b4345db ("LU-11380 llite: zero lum for stripeless files") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/33172 Reviewed-by: Andreas Dilger Reviewed-by: Jian Yu Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 06f7bd3..55a1efb 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1442,15 +1442,14 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) goto out_req; } - if (rc < 0) { - if (rc == -ENODATA && (cmd == IOC_MDC_GETFILEINFO || - cmd == LL_IOC_MDC_GETINFO)) { - rc = 0; - goto skip_lmm; - } + if (rc == -ENODATA && (cmd == IOC_MDC_GETFILEINFO || + cmd == LL_IOC_MDC_GETINFO)) { + lmmsize = 0; + rc = 0; + } + if (rc < 0) goto out_req; - } if (cmd == IOC_MDC_GETFILESTRIPE || cmd == LL_IOC_LOV_GETSTRIPE || @@ -1462,14 +1461,23 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) lmdp = (struct lov_user_mds_data __user *)arg; lump = &lmdp->lmd_lmm; } - if (copy_to_user(lump, lmm, lmmsize)) { + + if (lmmsize == 0) { + /* If the file has no striping then zero out *lump so + * that the caller isn't confused by garbage. 
+ */ + if (clear_user(lump, sizeof(*lump))) { + rc = -EFAULT; + goto out_req; + } + } else if (copy_to_user(lump, lmm, lmmsize)) { if (copy_to_user(lump, lmm, sizeof(*lump))) { rc = -EFAULT; goto out_req; } rc = -EOVERFLOW; } -skip_lmm: + if (cmd == IOC_MDC_GETFILEINFO || cmd == LL_IOC_MDC_GETINFO) { struct lov_user_mds_data __user *lmdp; lstat_t st = { 0 }; From patchwork Thu Feb 27 21:09:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409899 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7EA6A14BC for ; Thu, 27 Feb 2020 21:25:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6747E246A0 for ; Thu, 27 Feb 2020 21:25:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6747E246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B3F05348DF0; Thu, 27 Feb 2020 13:22:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D63F721FB80 for ; Thu, 27 Feb 2020 13:18:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BAF401055; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id B9C7D46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:55 -0500 Message-Id: <1582838290-17243-128-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 127/622] lustre: idl: remove obsolete RPC flags X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove RPC flags that are no longer in use: - OBD_MD_FLQOS has never been used in master branch - OBD_MD_FLEPOCH unused since v2_7_50_0-38-gd5d5b349f2 - OBD_MD_REINT unused since before 1.6 - OBD_MD_FLMDSCAPA unused since v2_7_55_0-15-g353ef58b1d - OBD_MD_FLOSSCAPA unused since v2_7_55_0-15-g353ef58b1d Rename OBD_MD_FLGENER to OBD_MD_FLPARENT to more accurately describe that this flag is only used to mark the parent FID in the OST obdo. WC-bug-id: https://jira.whamcloud.com/browse/LU-11397 Lustre-commit: f63366a3c285 ("LU-11397 idl: remove obsolete RPC flags") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33202 Reviewed-by: Mike Pershin Reviewed-by: John L. 
Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obdo.c | 2 +- fs/lustre/ptlrpc/layout.c | 6 ++---- fs/lustre/ptlrpc/pack_generic.c | 5 +---- fs/lustre/ptlrpc/wiretest.c | 14 ++------------ include/uapi/linux/lustre/lustre_idl.h | 18 ++++++++---------- 5 files changed, 14 insertions(+), 31 deletions(-) diff --git a/fs/lustre/obdclass/obdo.c b/fs/lustre/obdclass/obdo.c index e5475f1..8fd2922 100644 --- a/fs/lustre/obdclass/obdo.c +++ b/fs/lustre/obdclass/obdo.c @@ -48,7 +48,7 @@ void obdo_set_parent_fid(struct obdo *dst, const struct lu_fid *parent) dst->o_parent_oid = fid_oid(parent); dst->o_parent_seq = fid_seq(parent); dst->o_parent_ver = fid_ver(parent); - dst->o_valid |= OBD_MD_FLGENER | OBD_MD_FLFID; + dst->o_valid |= OBD_MD_FLPARENT | OBD_MD_FLFID; } EXPORT_SYMBOL(obdo_set_parent_fid); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 225a73e..efbff69 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -1022,13 +1022,11 @@ struct req_msg_field RMF_LOGCOOKIES = EXPORT_SYMBOL(RMF_LOGCOOKIES); struct req_msg_field RMF_CAPA1 = - DEFINE_MSGF("capa", 0, sizeof(struct lustre_capa), - lustre_swab_lustre_capa, NULL); + DEFINE_MSGF("capa", 0, 0, NULL, NULL); EXPORT_SYMBOL(RMF_CAPA1); struct req_msg_field RMF_CAPA2 = - DEFINE_MSGF("capa", 0, sizeof(struct lustre_capa), - lustre_swab_lustre_capa, NULL); + DEFINE_MSGF("capa", 0, 0, NULL, NULL); EXPORT_SYMBOL(RMF_CAPA2); struct req_msg_field RMF_LAYOUT_INTENT = diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 653a8d7..6da9aca 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -2242,12 +2242,9 @@ static void dump_obdo(struct obdo *oa) else if (valid & OBD_MD_FLCKSUM) CDEBUG(D_RPCTRACE, "obdo: o_checksum (o_nlink) = %u\n", oa->o_nlink); - if (valid & OBD_MD_FLGENER) + if (valid & OBD_MD_FLPARENT) CDEBUG(D_RPCTRACE, "obdo: o_parent_oid = %x\n", 
oa->o_parent_oid); - if (valid & OBD_MD_FLEPOCH) - CDEBUG(D_RPCTRACE, "obdo: o_ioepoch = %lld\n", - oa->o_ioepoch); if (valid & OBD_MD_FLFID) { CDEBUG(D_RPCTRACE, "obdo: o_stripe_idx = %u\n", oa->o_stripe_idx); diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 845aff4..42af0b8 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1335,8 +1335,8 @@ void lustre_assert_wire_constants(void) OBD_MD_FLFLAGS); LASSERTF(OBD_MD_FLNLINK == (0x00002000ULL), "found 0x%.16llxULL\n", OBD_MD_FLNLINK); - LASSERTF(OBD_MD_FLGENER == (0x00004000ULL), "found 0x%.16llxULL\n", - OBD_MD_FLGENER); + LASSERTF(OBD_MD_FLPARENT == (0x00004000ULL), "found 0x%.16llxULL\n", + OBD_MD_FLPARENT); LASSERTF(OBD_MD_FLRDEV == (0x00010000ULL), "found 0x%.16llxULL\n", OBD_MD_FLRDEV); LASSERTF(OBD_MD_FLEASIZE == (0x00020000ULL), "found 0x%.16llxULL\n", @@ -1347,14 +1347,10 @@ void lustre_assert_wire_constants(void) OBD_MD_FLHANDLE); LASSERTF(OBD_MD_FLCKSUM == (0x00100000ULL), "found 0x%.16llxULL\n", OBD_MD_FLCKSUM); - LASSERTF(OBD_MD_FLQOS == (0x00200000ULL), "found 0x%.16llxULL\n", - OBD_MD_FLQOS); LASSERTF(OBD_MD_FLGROUP == (0x01000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLGROUP); LASSERTF(OBD_MD_FLFID == (0x02000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLFID); - LASSERTF(OBD_MD_FLEPOCH == (0x04000000ULL), "found 0x%.16llxULL\n", - OBD_MD_FLEPOCH); LASSERTF(OBD_MD_FLGRANT == (0x08000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLGRANT); LASSERTF(OBD_MD_FLDIREA == (0x10000000ULL), "found 0x%.16llxULL\n", @@ -1367,8 +1363,6 @@ void lustre_assert_wire_constants(void) OBD_MD_FLMODEASIZE); LASSERTF(OBD_MD_MDS == (0x0000000100000000ULL), "found 0x%.16llxULL\n", OBD_MD_MDS); - LASSERTF(OBD_MD_REINT == (0x0000000200000000ULL), "found 0x%.16llxULL\n", - OBD_MD_REINT); LASSERTF(OBD_MD_MEA == (0x0000000400000000ULL), "found 0x%.16llxULL\n", OBD_MD_MEA); LASSERTF(OBD_MD_TSTATE == (0x0000000800000000ULL), @@ -1381,10 +1375,6 @@ void lustre_assert_wire_constants(void) 
OBD_MD_FLXATTRRM); LASSERTF(OBD_MD_FLACL == (0x0000008000000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLACL); - LASSERTF(OBD_MD_FLMDSCAPA == (0x0000020000000000ULL), "found 0x%.16llxULL\n", - OBD_MD_FLMDSCAPA); - LASSERTF(OBD_MD_FLOSSCAPA == (0x0000040000000000ULL), "found 0x%.16llxULL\n", - OBD_MD_FLOSSCAPA); LASSERTF(OBD_MD_FLCROSSREF == (0x0000100000000000ULL), "found 0x%.16llxULL\n", OBD_MD_FLCROSSREF); LASSERTF(OBD_MD_FLGETATTRLOCK == (0x0000200000000000ULL), "found 0x%.16llxULL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 39f2d3b..8002e046 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1137,21 +1137,19 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) #define OBD_MD_FLFLAGS (0x00000800ULL) /* flags word */ #define OBD_MD_DOM_SIZE (0x00001000ULL) /* Data-on-MDT component size */ #define OBD_MD_FLNLINK (0x00002000ULL) /* link count */ -#define OBD_MD_FLGENER (0x00004000ULL) /* generation number */ -#define OBD_MD_LAYOUT_VERSION (0x00008000ULL) /* layout version for - * OST objects - */ +#define OBD_MD_FLPARENT (0x00004000ULL) /* parent FID */ +#define OBD_MD_LAYOUT_VERSION (0x00008000ULL) /* OST object layout version */ #define OBD_MD_FLRDEV (0x00010000ULL) /* device number */ #define OBD_MD_FLEASIZE (0x00020000ULL) /* extended attribute data */ #define OBD_MD_LINKNAME (0x00040000ULL) /* symbolic link target */ #define OBD_MD_FLHANDLE (0x00080000ULL) /* file/lock handle */ #define OBD_MD_FLCKSUM (0x00100000ULL) /* bulk data checksum */ -#define OBD_MD_FLQOS (0x00200000ULL) /* quality of service stats */ +/* OBD_MD_FLQOS (0x00200000ULL) has never been used */ #define OBD_MD_FLPRJQUOTA (0x00400000ULL) /* over quota flags sent from ost */ /* OBD_MD_FLCOOKIE (0x00800000ULL) obsolete in 2.8 */ #define OBD_MD_FLGROUP (0x01000000ULL) /* group */ #define OBD_MD_FLFID (0x02000000ULL) /* ->ost write inline fid */ -#define OBD_MD_FLEPOCH 
(0x04000000ULL) /* ->ost write with ioepoch */ +/* OBD_MD_FLEPOCH (0x04000000ULL) obsolete 2.7.50 */ /* ->mds if epoch opens or closes */ #define OBD_MD_FLGRANT (0x08000000ULL) /* ost preallocation space grant */ #define OBD_MD_FLDIREA (0x10000000ULL) /* dir's extended attribute data */ @@ -1160,7 +1158,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) #define OBD_MD_FLMODEASIZE (0x80000000ULL) /* EA size will be changed */ #define OBD_MD_MDS (0x0000000100000000ULL) /* where an inode lives on */ -#define OBD_MD_REINT (0x0000000200000000ULL) /* reintegrate oa */ +/* OBD_MD_REINT (0x0000000200000000ULL) obsolete 1.8 */ #define OBD_MD_MEA (0x0000000400000000ULL) /* CMD split EA */ #define OBD_MD_TSTATE (0x0000000800000000ULL) /* transient state field */ @@ -1169,8 +1167,8 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) #define OBD_MD_FLXATTRRM (0x0000004000000000ULL) /* xattr remove */ #define OBD_MD_FLACL (0x0000008000000000ULL) /* ACL */ #define OBD_MD_FLAGSTATFS (0x0000010000000000ULL) /* aggregated statfs */ -#define OBD_MD_FLMDSCAPA (0x0000020000000000ULL) /* MDS capability */ -#define OBD_MD_FLOSSCAPA (0x0000040000000000ULL) /* OSS capability */ +/* OBD_MD_FLMDSCAPA (0x0000020000000000ULL) obsolete 2.7.54 */ +/* OBD_MD_FLOSSCAPA (0x0000040000000000ULL) obsolete 2.7.54 */ /* OBD_MD_FLCKSPLIT (0x0000080000000000ULL) obsolete 2.3.58*/ #define OBD_MD_FLCROSSREF (0x0000100000000000ULL) /* Cross-ref case */ #define OBD_MD_FLGETATTRLOCK (0x0000200000000000ULL) /* Get IOEpoch attributes @@ -1202,7 +1200,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) OBD_MD_FLCTIME | OBD_MD_FLSIZE | OBD_MD_FLBLKSZ | \ OBD_MD_FLMODE | OBD_MD_FLTYPE | OBD_MD_FLUID | \ OBD_MD_FLGID | OBD_MD_FLFLAGS | OBD_MD_FLNLINK | \ - OBD_MD_FLGENER | OBD_MD_FLRDEV | OBD_MD_FLGROUP | \ + OBD_MD_FLPARENT | OBD_MD_FLRDEV | OBD_MD_FLGROUP | \ OBD_MD_FLPROJID) #define OBD_MD_FLXATTRALL (OBD_MD_FLXATTR | OBD_MD_FLXATTRLS) From patchwork 
Thu Feb 27 21:09:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409885 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 13FDB14BC for ; Thu, 27 Feb 2020 21:24:55 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F0F88246A0 for ; Thu, 27 Feb 2020 21:24:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F0F88246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 24280348D51; Thu, 27 Feb 2020 13:22:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3B1C121F982 for ; Thu, 27 Feb 2020 13:18:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BED8A1056; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BCD9A468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:56 -0500 Message-Id: <1582838290-17243-129-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 128/622] lustre: flr: add 'nosync' flag for FLR mirrors X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam This patch allows the 'nosync' flag to be set on FLR mirror components. With the flag set, lfs mirror resync skips those mirrors unless the resync request explicitly names them. The flag can be cleared by setting '^nosync' on any component of the mirror. WC-bug-id: https://jira.whamcloud.com/browse/LU-11400 Lustre-commit: 8a0554450eaa ("LU-11400 flr: add 'nosync' flag for FLR mirrors") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/33205 Reviewed-by: Andreas Dilger Reviewed-by: Jian Yu Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_ea.c | 3 +++ fs/lustre/lov/lov_internal.h | 1 + fs/lustre/lov/lov_pack.c | 3 +++ fs/lustre/ptlrpc/pack_generic.c | 2 +- fs/lustre/ptlrpc/wiretest.c | 18 +++++++++++++----- include/uapi/linux/lustre/lustre_user.h | 8 ++++++-- 6 files changed, 27 insertions(+), 8 deletions(-) diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c index edca3b0..31a18d0 100644 --- a/fs/lustre/lov/lov_ea.c +++ b/fs/lustre/lov/lov_ea.c @@ -478,6 +478,9 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm, lsm->lsm_entries[i] = lsme; lsme->lsme_id = le32_to_cpu(lcme->lcme_id); lsme->lsme_flags = le32_to_cpu(lcme->lcme_flags); + if (lsme->lsme_flags & LCME_FL_NOSYNC) + lsme->lsme_timestamp = + le64_to_cpu(lcme->lcme_timestamp); lu_extent_le_to_cpu(&lsme->lsme_extent, &lcme->lcme_extent); if (i == entry_count - 1) { diff --git a/fs/lustre/lov/lov_internal.h 
b/fs/lustre/lov/lov_internal.h index 5dba8d3..376ac52 100644 --- a/fs/lustre/lov/lov_internal.h +++ b/fs/lustre/lov/lov_internal.h @@ -50,6 +50,7 @@ struct lov_stripe_md_entry { u32 lsme_magic; u32 lsme_flags; u32 lsme_pattern; + u64 lsme_timestamp; u32 lsme_stripe_size; u16 lsme_stripe_count; u16 lsme_layout_gen; diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c index 3dbc6aa..5f8b281 100644 --- a/fs/lustre/lov/lov_pack.c +++ b/fs/lustre/lov/lov_pack.c @@ -201,6 +201,9 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf, lcme->lcme_id = cpu_to_le32(lsme->lsme_id); lcme->lcme_flags = cpu_to_le32(lsme->lsme_flags); + if (lsme->lsme_flags & LCME_FL_NOSYNC) + lcme->lcme_timestamp = + cpu_to_le64(lsme->lsme_timestamp); lcme->lcme_extent.e_start = cpu_to_le64(lsme->lsme_extent.e_start); lcme->lcme_extent.e_end = diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 6da9aca..d93dbe1 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -2062,13 +2062,13 @@ void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum) } __swab32s(&ent->lcme_id); __swab32s(&ent->lcme_flags); + __swab64s(&ent->lcme_timestamp); __swab64s(&ent->lcme_extent.e_start); __swab64s(&ent->lcme_extent.e_end); __swab32s(&ent->lcme_offset); __swab32s(&ent->lcme_size); __swab32s(&ent->lcme_layout_gen); BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding_1) == 0); - BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding_2) == 0); v1 = (struct lov_user_md_v1 *)((char *)lum + off); stripe_count = v1->lmm_stripe_count; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 42af0b8..c6dd256 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1532,14 +1532,14 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_layout_gen)); LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_layout_gen) == 4, "found %lld\n", (long 
long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_layout_gen)); - LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_1) == 36, "found %lld\n", + LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_timestamp) == 36, "found %lld\n", + (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_timestamp)); + LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_timestamp) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_timestamp)); + LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_1) == 44, "found %lld\n", (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_1)); LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_1) == 4, "found %lld\n", (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_1)); - LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_2) == 40, "found %lld\n", - (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding_2)); - LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_2) == 8, "found %lld\n", - (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding_2)); LASSERTF(LCME_FL_INIT == 0x00000010UL, "found 0x%.8xUL\n", (unsigned int)LCME_FL_INIT); LASSERTF(LCME_FL_NEG == 0x80000000UL, "found 0x%.8xUL\n", @@ -1666,6 +1666,10 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct obd_statfs, os_bavail)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_bavail) == 8, "found %lld\n", (long long)(int)sizeof(((struct obd_statfs *)0)->os_bavail)); + LASSERTF((int)offsetof(struct obd_statfs, os_files) == 32, "found %lld\n", + (long long)(int)offsetof(struct obd_statfs, os_files)); + LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_files) == 8, "found %lld\n", + (long long)(int)sizeof(((struct obd_statfs *)0)->os_files)); LASSERTF((int)offsetof(struct obd_statfs, os_ffree) == 40, "found %lld\n", (long long)(int)offsetof(struct obd_statfs, 
os_ffree)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_ffree) == 8, "found %lld\n", @@ -1682,6 +1686,10 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct obd_statfs, os_namelen)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_namelen) == 4, "found %lld\n", (long long)(int)sizeof(((struct obd_statfs *)0)->os_namelen)); + LASSERTF((int)offsetof(struct obd_statfs, os_maxbytes) == 96, "found %lld\n", + (long long)(int)offsetof(struct obd_statfs, os_maxbytes)); + LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_maxbytes) == 8, "found %lld\n", + (long long)(int)sizeof(((struct obd_statfs *)0)->os_maxbytes)); LASSERTF((int)offsetof(struct obd_statfs, os_state) == 104, "found %lld\n", (long long)(int)offsetof(struct obd_statfs, os_state)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_state) == 4, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index f25bb9b..bff6f76 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -483,16 +483,20 @@ enum lov_comp_md_entry_flags { LCME_FL_PREF_RW = LCME_FL_PREF_RD | LCME_FL_PREF_WR, LCME_FL_OFFLINE = 0x00000008, /* Not used */ LCME_FL_INIT = 0x00000010, /* instantiated */ + LCME_FL_NOSYNC = 0x00000020, /* FLR: no sync for the mirror */ LCME_FL_NEG = 0x80000000, /* used to indicate a negative * flag, won't be stored on disk */ }; #define LCME_KNOWN_FLAGS (LCME_FL_NEG | LCME_FL_INIT | LCME_FL_STALE | \ - LCME_FL_PREF_RW) + LCME_FL_PREF_RW | LCME_FL_NOSYNC) /* The flags can be set by users at mirror creation time. */ #define LCME_USER_FLAGS (LCME_FL_PREF_RW) +/* The flags are for mirrors */ +#define LCME_MIRROR_FLAGS (LCME_FL_NOSYNC) + /* the highest bit in obdo::o_layout_version is used to mark if the file is * being resynced. 
*/ @@ -519,8 +523,8 @@ struct lov_comp_md_entry_v1 { */ __u32 lcme_size; /* size of component blob */ __u32 lcme_layout_gen; + __u64 lcme_timestamp; /* snapshot time if applicable*/ __u32 lcme_padding_1; - __u64 lcme_padding_2; } __packed; #define SEQ_ID_MAX 0x0000FFFF From patchwork Thu Feb 27 21:09:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409903 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4C7D4138D for ; Thu, 27 Feb 2020 21:25:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 34CC3246A0 for ; Thu, 27 Feb 2020 21:25:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 34CC3246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 65D95348E20; Thu, 27 Feb 2020 13:22:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 902B521F982 for ; Thu, 27 Feb 2020 13:18:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C243E1058; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BFB7346D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James 
Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:57 -0500 Message-Id: <1582838290-17243-130-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 129/622] lustre: llite: create checksums to replace checksum_pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Create llite.*.checksums, which matches llite.*.checksum_pages in functionality. The llite layer now has a parameter that matches osc.*.checksums. In time we can retire checksum_pages and return it to its original purpose of enabling per-page checksums (which was not implemented in the CLIO development). 
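For illustration only (not part of the patch): the aliasing described above can be sketched in userspace C as two attribute names that share one show/store pair, so the old name keeps working while the new one is introduced. The names `struct attr`, `attrs`, `attr_find`, `sb_flags` and `SBI_CHECKSUM` are hypothetical stand-ins and are not the kernel kobject or Lustre API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the superblock flag word. */
static unsigned int sb_flags;
#define SBI_CHECKSUM 0x1u

/* Print 1 or 0 depending on whether the checksum flag is set. */
static int checksums_show(char *buf, size_t len)
{
	return snprintf(buf, len, "%u\n", (sb_flags & SBI_CHECKSUM) ? 1 : 0);
}

/* Accept "1" or "0" and update the flag; reject anything else. */
static int checksums_store(const char *buf)
{
	if (buf[0] == '1')
		sb_flags |= SBI_CHECKSUM;
	else if (buf[0] == '0')
		sb_flags &= ~SBI_CHECKSUM;
	else
		return -1;
	return 0;
}

struct attr {
	const char *name;
	int (*show)(char *buf, size_t len);
	int (*store)(const char *buf);
};

/* Two names, one implementation: the old name stays usable. */
static const struct attr attrs[] = {
	{ "checksums",      checksums_show, checksums_store },
	{ "checksum_pages", checksums_show, checksums_store },
};

/* Look an attribute up by name; returns NULL if it does not exist. */
static const struct attr *attr_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
		if (strcmp(attrs[i].name, name) == 0)
			return &attrs[i];
	return NULL;
}
```

Writing through either name changes what the other name reports, which is the behaviour the patch keeps for existing users of checksum_pages.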
WC-bug-id: https://jira.whamcloud.com/browse/LU-10906 Lustre-commit: 123ee3cf96dd ("LU-10906 llite: create checksums to replace checksum_pages") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33222 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Emoly Liu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/lproc_llite.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 5ac6689..5fc7705 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -599,8 +599,8 @@ static ssize_t ll_max_cached_mb_seq_write(struct file *file, LPROC_SEQ_FOPS(ll_max_cached_mb); -static ssize_t checksum_pages_show(struct kobject *kobj, struct attribute *attr, - char *buf) +static ssize_t checksums_show(struct kobject *kobj, struct attribute *attr, + char *buf) { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); @@ -608,10 +608,8 @@ static ssize_t checksum_pages_show(struct kobject *kobj, struct attribute *attr, return sprintf(buf, "%u\n", (sbi->ll_flags & LL_SBI_CHECKSUM) ? 
1 : 0); } -static ssize_t checksum_pages_store(struct kobject *kobj, - struct attribute *attr, - const char *buffer, - size_t count) +static ssize_t checksums_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); @@ -642,7 +640,9 @@ static ssize_t checksum_pages_store(struct kobject *kobj, return count; } -LUSTRE_RW_ATTR(checksum_pages); +LUSTRE_RW_ATTR(checksums); + +LUSTRE_ATTR(checksum_pages, 0644, checksums_show, checksums_store); static ssize_t ll_rd_track_id(struct kobject *kobj, char *buf, enum stats_track_type type) @@ -1250,6 +1250,7 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file, &lustre_attr_max_read_ahead_mb.attr, &lustre_attr_max_read_ahead_per_file_mb.attr, &lustre_attr_max_read_ahead_whole_mb.attr, + &lustre_attr_checksums.attr, &lustre_attr_checksum_pages.attr, &lustre_attr_stats_track_pid.attr, &lustre_attr_stats_track_ppid.attr, From patchwork Thu Feb 27 21:09:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409889 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 34B94138D for ; Thu, 27 Feb 2020 21:25:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1D3D6246A0 for ; Thu, 27 Feb 2020 21:25:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1D3D6246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9A288348D79; Thu, 27 Feb 2020 13:22:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D969921F982 for ; Thu, 27 Feb 2020 13:18:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C3B7D1059; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C29FF46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:58 -0500 Message-Id: <1582838290-17243-131-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 130/622] lustre: ptlrpc: don't change buffer when signature is ready X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin The lm_repsize is part of buffer being used in signature calculation and must not be changed after calculation is done. 
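For illustration only (not part of the patch): a toy userspace sketch of why a field covered by a signature must not change once the signature has been computed. FNV-1a is used here purely as a stand-in for the real signature algorithm, and `struct msg_hdr`, `toy_sign` and `toy_verify` are hypothetical names that do not exist in Lustre.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified request header. */
struct msg_hdr {
	uint32_t opc;
	uint32_t repsize;	/* plays the role of lm_repsize */
};

/* FNV-1a over raw bytes; a stand-in for the real signature algorithm. */
static uint32_t toy_sign(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Returns 1 if the header still matches the signature taken earlier. */
static int toy_verify(const struct msg_hdr *hdr, uint32_t sig)
{
	return toy_sign(hdr, sizeof(*hdr)) == sig;
}
```

If repsize is rewritten after toy_sign() has run, toy_verify() fails on the receiving side, so in this model the size has to be settled before signing.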
Patch reverts related changes from commit 13372d6c and moves related lm_repsize update into MDC where DOM read-on-open buffer is prepared WC-bug-id: https://jira.whamcloud.com/browse/LU-11414 Lustre-commit: cf503e047c7f ("LU-11414 ptlrpc: don't change buffer when signature is ready") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33223 Reviewed-by: Andreas Dilger Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_locks.c | 30 +++++++++++++++++++++--------- fs/lustre/ptlrpc/niobuf.c | 5 ----- 2 files changed, 21 insertions(+), 14 deletions(-) diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 80f2e10..09f9bc5 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -256,7 +256,7 @@ static int mdc_save_lovea(struct ptlrpc_request *req, int count = 0; enum ldlm_mode mode; int rc; - int repsize; + int repsize, repsize_estimate; it->it_create_mode = (it->it_create_mode & ~S_IFMT) | S_IFREG; @@ -347,22 +347,34 @@ static int mdc_save_lovea(struct ptlrpc_request *req, /* Get real repbuf allocated size as rounded up power of 2 */ repsize = size_roundup_power2(req->rq_replen + lustre_msg_early_size()); - /* Estimate free space for DoM files in repbuf */ - repsize -= req->rq_replen - obddev->u.cli.cl_max_mds_easize + - sizeof(struct lov_comp_md_v1) + - sizeof(struct lov_comp_md_entry_v1) + - lov_mds_md_size(0, LOV_MAGIC_V3); - - if (repsize < obddev->u.cli.cl_dom_min_inline_repsize) { - repsize = obddev->u.cli.cl_dom_min_inline_repsize - repsize; + repsize_estimate = repsize - (req->rq_replen - + obddev->u.cli.cl_max_mds_easize + + sizeof(struct lov_comp_md_v1) + + sizeof(struct lov_comp_md_entry_v1) + + lov_mds_md_size(0, LOV_MAGIC_V3)); + + if (repsize_estimate < obddev->u.cli.cl_dom_min_inline_repsize) { + repsize = obddev->u.cli.cl_dom_min_inline_repsize - + repsize_estimate + sizeof(struct niobuf_remote); req_capsule_set_size(&req->rq_pill, 
&RMF_NIOBUF_INLINE, RCL_SERVER, sizeof(struct niobuf_remote) + repsize); ptlrpc_request_set_replen(req); CDEBUG(D_INFO, "Increase repbuf by %d bytes, total: %d\n", repsize, req->rq_replen); + repsize = size_roundup_power2(req->rq_replen + + lustre_msg_early_size()); } + /* The only way to report real allocated repbuf size to the server + * is the lm_repsize but it must be set prior buffer allocation itself + * due to security reasons - it is part of buffer used in signature + * calculation (see LU-11414). Therefore the saved size is predicted + * value as rq_replen rounded to the next higher power of 2. + * Such estimation is safe. Though the final allocated buffer might + * be even larger, it is not possible to know that at this point. + */ + req->rq_reqmsg->lm_repsize = repsize; return req; } diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index e8ba57b..2e866fe 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -617,11 +617,6 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) request->rq_status = rc; goto cleanup_bulk; } - /* Use real allocated value in lm_repsize, - * so the server may use whole reply buffer - * without resends where it is needed. 
- */ - request->rq_reqmsg->lm_repsize = request->rq_repbuf_len; } else { request->rq_repdata = NULL; request->rq_repmsg = NULL; From patchwork Thu Feb 27 21:09:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409907 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A6A67138D for ; Thu, 27 Feb 2020 21:25:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8F554246A0 for ; Thu, 27 Feb 2020 21:25:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8F554246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28205348E47; Thu, 27 Feb 2020 13:22:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3A8FC21FADC for ; Thu, 27 Feb 2020 13:18:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C6D7A105A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C5D4446A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:09:59 -0500 Message-Id: 
<1582838290-17243-132-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 131/622] lustre: ldlm: update l_blocking_lock under lock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Update l_blocking_lock under the lock to prevent a race between the lock_handle_convert0() and ldlm_work_bl_ast() code. WC-bug-id: https://jira.whamcloud.com/browse/LU-11287 Lustre-commit: 2a520282888d ("LU-11287 ldlm: update l_blocking_lock under lock") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33124 Reviewed-by: Lai Siyao Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lock.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index bdbbfec..869d664 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -1639,16 +1639,7 @@ enum ldlm_error ldlm_lock_enqueue(const struct lu_env *env, lock = list_first_entry(arg->list, struct ldlm_lock, l_bl_ast); - /* nobody should touch l_bl_ast */ - lock_res_and_lock(lock); - list_del_init(&lock->l_bl_ast); - - LASSERT(ldlm_is_ast_sent(lock)); - LASSERT(lock->l_bl_ast_run == 0); LASSERT(lock->l_blocking_lock); - lock->l_bl_ast_run++; - unlock_res_and_lock(lock); - ldlm_lock2desc(lock->l_blocking_lock, &d); /* copy blocking lock ibits in cancel_bits as well, * new client may use them for lock convert and it is @@ -1658,9 +1649,16 @@ enum ldlm_error 
ldlm_lock_enqueue(const struct lu_env *env, d.l_policy_data.l_inodebits.cancel_bits = lock->l_blocking_lock->l_policy_data.l_inodebits.bits; + /* nobody should touch l_bl_ast */ + lock_res_and_lock(lock); + list_del_init(&lock->l_bl_ast); + + LASSERT(ldlm_is_ast_sent(lock)); + LASSERT(lock->l_bl_ast_run == 0); + lock->l_bl_ast_run++; + unlock_res_and_lock(lock); + rc = lock->l_blocking_ast(lock, &d, (void *)arg, LDLM_CB_BLOCKING); - LDLM_LOCK_RELEASE(lock->l_blocking_lock); - lock->l_blocking_lock = NULL; LDLM_LOCK_RELEASE(lock); return rc; From patchwork Thu Feb 27 21:10:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409911 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8917D159A for ; Thu, 27 Feb 2020 21:25:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 71D6B246A0 for ; Thu, 27 Feb 2020 21:25:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 71D6B246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D06D23487C3; Thu, 27 Feb 2020 13:22:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7B3D421FB8B for ; Thu, 27 Feb 2020 13:18:57 -0800 (PST) Received: 
from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CA4AC105B; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C8AFB46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:00 -0500 Message-Id: <1582838290-17243-133-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 132/622] lustre: mgc: don't proccess cld during stopping X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko This patch fixes log processing during stopping. A general protection fault occurred in mgc_process_cfg_log() on the lsi access: the lsi pointer held the invalid value 38323172756f6663, and all of cld->cld_cfg.cfg_sb contained invalid data. 
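For illustration only (not part of the patch): the diff in this patch re-checks the stopping flag after the mutex has been dropped and taken again. A minimal userspace sketch of that pattern follows, using a pthread mutex in place of the kernel mutex. The names `struct cfg_log` and `process_log_retry`, and the return values, are hypothetical; only the re-check after re-locking mirrors the patch.

```c
#include <pthread.h>
#include <stdbool.h>

struct cfg_log {
	pthread_mutex_t lock;
	bool stopping;	/* set by the teardown path while holding lock */
	bool lostlock;	/* marks the log for another processing attempt */
};

/*
 * Called after the lock was released earlier in the retry path.
 * The state may have changed while the lock was not held, so the
 * stopping flag is tested again before touching anything else.
 * Returns 0 if the log is being stopped, 1 if it was marked for retry.
 */
static int process_log_retry(struct cfg_log *cld)
{
	pthread_mutex_lock(&cld->lock);
	if (cld->stopping) {
		pthread_mutex_unlock(&cld->lock);
		return 0;
	}
	cld->lostlock = true;
	pthread_mutex_unlock(&cld->lock);
	return 1;
}
```

The point of the sketch is that a check made before the unlock says nothing about the state after the re-lock, so the check has to be repeated.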
WC-bug-id: https://jira.whamcloud.com/browse/LU-10595 Lustre-commit: bda43cbe369a ("LU-10595 mgc: don't proccess cld during stopping") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-6199 Reviewed-on: https://review.whamcloud.com/33190 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mgc/mgc_request.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index c114aa8..785461b 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -1651,6 +1651,11 @@ int mgc_process_log(struct obd_device *mgc, struct config_llog_data *cld) goto restart; } else { mutex_lock(&cld->cld_lock); + /* unlock/lock mutex, so check stopping again */ + if (cld->cld_stopping) { + mutex_unlock(&cld->cld_lock); + return 0; + } spin_lock(&config_list_lock); cld->cld_lostlock = 1; spin_unlock(&config_list_lock); From patchwork Thu Feb 27 21:10:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409915 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E18F2138D for ; Thu, 27 Feb 2020 21:25:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C9C9E246A1 for ; Thu, 27 Feb 2020 21:25:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C9C9E246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2B483348EA1; Thu, 27 Feb 2020 13:22:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BF9A221FB8B for ; Thu, 27 Feb 2020 13:18:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CCCE4105D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CB92A468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:01 -0500 Message-Id: <1582838290-17243-134-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 133/622] lustre: obdclass: make mod rpc slot wait queue FIFO X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vladimir Saveliev , Alexander Zarochentsev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vladimir Saveliev Under relatively heavy load a process may spin for a long time without successfully grabbing a free mod rpc slot. A process was observed spinning for more than 100 seconds when 72 mdtest and 8 IOR instances were running. Make the mod rpc slot wait queue FIFO so that waiting threads get free mod rpc slots in the order in which they entered the queue.
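The ordering this patch aims for can be modeled in a few lines of userspace C. The sketch below is not the kernel wait queue: struct fifo_waitq and its helpers are hypothetical names, and the model only captures the idea that exclusive waiters are appended at the tail and a wake-up hands the free slot to the oldest waiter only.

```c
#include <assert.h>
#include <stddef.h>

#define WAITQ_MAX 16

/* Hypothetical model of an exclusive wait queue, kept as a ring buffer
 * of waiter ids. New waiters go to the tail, wake-ups take the head.
 */
struct fifo_waitq {
	int waiters[WAITQ_MAX];
	size_t head;	/* index of the oldest waiter */
	size_t count;	/* number of queued waiters */
};

static void fifo_waitq_init(struct fifo_waitq *q)
{
	assert(q);
	q->head = 0;
	q->count = 0;
}

/* Queue a waiter at the tail. Returns 0 on success, -1 if full. */
static int fifo_waitq_add(struct fifo_waitq *q, int id)
{
	assert(q);
	if (q->count == WAITQ_MAX)
		return -1;
	q->waiters[(q->head + q->count) % WAITQ_MAX] = id;
	q->count++;
	return 0;
}

/* Wake exactly one waiter, the oldest. Returns its id, or -1 if the
 * queue is empty.
 */
static int fifo_waitq_wake_one(struct fifo_waitq *q)
{
	int id;

	assert(q);
	if (q->count == 0)
		return -1;
	id = q->waiters[q->head];
	q->head = (q->head + 1) % WAITQ_MAX;
	q->count--;
	return id;
}
```

With a non-exclusive wait every sleeper is woken for each freed slot and all of them race for it, so a given thread can lose repeatedly; waking only the head of the queue removes that race.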
Cray-bug-id: LUS-6380 WC-bug-id: https://jira.whamcloud.com/browse/LU-11441 Lustre-commit: 7fa0fd415770 ("LU-11441 obdclass: make mod rpc slot wait queue FIFO") Signed-off-by: Alexander Zarochentsev Signed-off-by: Vladimir Saveliev Reviewed-on: https://review.whamcloud.com/33282 Reviewed-by: Alex Zhuravlev Reviewed-by: Alexandr Boyko Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/genops.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index e5e2f73..da53572 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -1574,8 +1574,9 @@ u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc, CDEBUG(D_RPCTRACE, "%s: sleeping for a modify RPC slot opc %u, max %hu\n", cli->cl_import->imp_obd->obd_name, opc, max); - wait_event_idle(cli->cl_mod_rpcs_waitq, - obd_mod_rpc_slot_avail(cli, close_req)); + wait_event_idle_exclusive(cli->cl_mod_rpcs_waitq, + obd_mod_rpc_slot_avail(cli, + close_req)); } while (true); } EXPORT_SYMBOL(obd_get_mod_rpc_slot); From patchwork Thu Feb 27 21:10:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409893 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B623014BC for ; Thu, 27 Feb 2020 21:25:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9ED96246A0 for ; Thu, 27 Feb 2020 21:25:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9ED96246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EFFCC34885A; Thu, 27 Feb 2020 13:22:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0D73B21FB92 for ; Thu, 27 Feb 2020 13:18:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D061713C8; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CE5D246D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:02 -0500 Message-Id: <1582838290-17243-135-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 134/622] lustre: mdc: use old statfs format X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev when the client talks to old server with no support for aggregated statfs WC-bug-id: https://jira.whamcloud.com/browse/LU-11375 Lustre-commit: e70a6fd8a640 ("LU-11375 mdc: use old statfs format") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/33162 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 1 + fs/lustre/mdc/mdc_request.c | 9 +++++++-- fs/lustre/ptlrpc/layout.c | 8 +++++++- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index ed4fc42..36656c6 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -133,6 +133,7 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_format RQF_MDS_CONNECT; extern struct req_format RQF_MDS_DISCONNECT; extern struct req_format RQF_MDS_STATFS; +extern struct req_format RQF_MDS_STATFS_NEW; extern struct req_format RQF_MDS_GET_ROOT; extern struct req_format RQF_MDS_SYNC; extern struct req_format RQF_MDS_GETXATTR; diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 15f94ea..5cc1e1f 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1474,6 +1474,7 @@ static int mdc_statfs(const struct lu_env *env, time64_t max_age, u32 flags) { struct obd_device *obd = class_exp2obd(exp); + struct req_format *fmt; struct ptlrpc_request *req; struct obd_statfs *msfs; struct obd_import *imp = NULL; @@ -1490,8 +1491,12 @@ static int mdc_statfs(const struct lu_env *env, if (!imp) return -ENODEV; - req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_STATFS, - LUSTRE_MDS_VERSION, MDS_STATFS); + fmt = &RQF_MDS_STATFS; + if ((exp_connect_flags2(exp) & 
OBD_CONNECT2_SUM_STATFS) && + (flags & OBD_STATFS_SUM)) + fmt = &RQF_MDS_STATFS_NEW; + req = ptlrpc_request_alloc_pack(imp, fmt, LUSTRE_MDS_VERSION, + MDS_STATFS); if (!req) { rc = -ENOMEM; goto output; diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index efbff69..92d2fc2 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -683,6 +683,7 @@ &RQF_MDS_GET_INFO, &RQF_MDS_GET_ROOT, &RQF_MDS_STATFS, + &RQF_MDS_STATFS_NEW, &RQF_MDS_GETATTR, &RQF_MDS_GETATTR_NAME, &RQF_MDS_GETXATTR, @@ -1250,9 +1251,13 @@ struct req_format RQF_MDS_GET_ROOT = EXPORT_SYMBOL(RQF_MDS_GET_ROOT); struct req_format RQF_MDS_STATFS = - DEFINE_REQ_FMT0("MDS_STATFS", mdt_body_only, obd_statfs_server); + DEFINE_REQ_FMT0("MDS_STATFS", empty, obd_statfs_server); EXPORT_SYMBOL(RQF_MDS_STATFS); +struct req_format RQF_MDS_STATFS_NEW = + DEFINE_REQ_FMT0("MDS_STATFS_NEW", mdt_body_only, obd_statfs_server); +EXPORT_SYMBOL(RQF_MDS_STATFS_NEW); + struct req_format RQF_MDS_SYNC = DEFINE_REQ_FMT0("MDS_SYNC", mdt_body_capa, mdt_body_only); EXPORT_SYMBOL(RQF_MDS_SYNC); @@ -2134,6 +2139,7 @@ u32 req_capsule_fmt_size(u32 magic, const struct req_format *fmt, size += cfs_size_round(fmt->rf_fields[loc].d[i]->rmf_size); return size; } +EXPORT_SYMBOL(req_capsule_fmt_size); /** * Changes the format of an RPC. 
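The format selection that patch 134 adds to mdc_statfs() can be reduced to a small standalone sketch. The bit values and the enum below are hypothetical placeholders chosen for illustration (the real OBD_CONNECT2_SUM_STATFS and OBD_STATFS_SUM constants live in the Lustre headers); only the decision itself mirrors the diff: the new request format, which carries an mdt_body, is used only when the server advertised aggregated statfs and the caller asked for the summed result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_CONNECT2_SUM_STATFS	(1ULL << 0)
#define SKETCH_STATFS_SUM		(1U << 0)

/* Stand-ins for &RQF_MDS_STATFS and &RQF_MDS_STATFS_NEW. */
enum statfs_fmt {
	STATFS_FMT_OLD,	/* empty request body, understood by old servers */
	STATFS_FMT_NEW,	/* request carries an mdt_body */
};

static enum statfs_fmt pick_statfs_fmt(uint64_t connect_flags2,
				       uint32_t flags)
{
	if ((connect_flags2 & SKETCH_CONNECT2_SUM_STATFS) &&
	    (flags & SKETCH_STATFS_SUM))
		return STATFS_FMT_NEW;
	return STATFS_FMT_OLD;
}

/* Usage: an old server never sees the new format, whatever the caller
 * asks for.
 */
static void pick_statfs_fmt_example(void)
{
	assert(pick_statfs_fmt(0, SKETCH_STATFS_SUM) == STATFS_FMT_OLD);
}
```

Defaulting to the old format and upgrading only when both conditions hold keeps the client compatible with servers that predate aggregated statfs.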
From patchwork Thu Feb 27 21:10:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409897 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1F55014BC for ; Thu, 27 Feb 2020 21:25:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 08179246A0 for ; Thu, 27 Feb 2020 21:25:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 08179246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6DA8C348DE6; Thu, 27 Feb 2020 13:22:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 635EB21FB92 for ; Thu, 27 Feb 2020 13:18:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D3C6513CA; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D1A7646A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:03 -0500 Message-Id: <1582838290-17243-136-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 135/622] lnet: Fix selftest backward compatibility post health X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sonia Sharma Post LNet health feature landing, lnet-selftest loses backward compatibility. This patch fixes that by adding a new structure lnet_counters_common similar to lnet_counters(pre-Health version). Now, lnet_counters_common is the struct that selftest depends on. Also, adds a struct lnet_counters_health specifically for health stats. WC-bug-id: https://jira.whamcloud.com/browse/LU-11422 Lustre-commit: 60f6f2b480b4 ("LU-11422 lnet: Fix selftest backward compatibility post health") Signed-off-by: Sonia Sharma Reviewed-on: https://review.whamcloud.com/33242 Reviewed-by: Andreas Dilger Reviewed-by: Amir Shehata Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 5 ++- include/uapi/linux/lnet/lnet-types.h | 58 +++++++++++++++------------ net/lnet/lnet/api-ni.c | 78 +++++++++++++++++++++++++----------- net/lnet/lnet/lib-move.c | 18 +++++---- net/lnet/lnet/lib-msg.c | 57 +++++++++++++------------- net/lnet/lnet/router_proc.c | 14 ++++--- net/lnet/selftest/framework.c | 28 ++++++------- net/lnet/selftest/rpc.h | 10 ++--- 8 files changed, 157 insertions(+), 111 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 4915a87..a1dad9f 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -445,7 +445,7 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, rspt = kzalloc(sizeof(*rspt), GFP_NOFS); lnet_net_lock(cpt); - the_lnet.ln_counters[cpt]->rst_alloc++; + 
the_lnet.ln_counters[cpt]->lct_health.lch_rst_alloc++; lnet_net_unlock(cpt); return rspt; } @@ -455,7 +455,7 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, { kfree(rspt); lnet_net_lock(cpt); - the_lnet.ln_counters[cpt]->rst_alloc--; + the_lnet.ln_counters[cpt]->lct_health.lch_rst_alloc--; lnet_net_unlock(cpt); } @@ -675,6 +675,7 @@ int lnet_delay_rule_list(int pos, struct lnet_fault_attr *attr, /** @} lnet_fault_simulation */ +void lnet_counters_get_common(struct lnet_counters_common *common); void lnet_counters_get(struct lnet_counters *counters); void lnet_counters_reset(void); diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h index 1da72c4..cf263b9 100644 --- a/include/uapi/linux/lnet/lnet-types.h +++ b/include/uapi/linux/lnet/lnet-types.h @@ -275,33 +275,41 @@ struct lnet_ping_info { #define LNET_PING_INFO_LONI(PINFO) ((PINFO)->pi_ni[0].ns_nid) #define LNET_PING_INFO_SEQNO(PINFO) ((PINFO)->pi_ni[0].ns_status) -struct lnet_counters { - __u32 msgs_alloc; - __u32 msgs_max; - __u32 rst_alloc; - __u32 errors; - __u32 send_count; - __u32 recv_count; - __u32 route_count; - __u32 drop_count; - __u32 resend_count; - __u32 response_timeout_count; - __u32 local_interrupt_count; - __u32 local_dropped_count; - __u32 local_aborted_count; - __u32 local_no_route_count; - __u32 local_timeout_count; - __u32 local_error_count; - __u32 remote_dropped_count; - __u32 remote_error_count; - __u32 remote_timeout_count; - __u32 network_timeout_count; - __u64 send_length; - __u64 recv_length; - __u64 route_length; - __u64 drop_length; +struct lnet_counters_common { + __u32 lcc_msgs_alloc; + __u32 lcc_msgs_max; + __u32 lcc_errors; + __u32 lcc_send_count; + __u32 lcc_recv_count; + __u32 lcc_route_count; + __u32 lcc_drop_count; + __u64 lcc_send_length; + __u64 lcc_recv_length; + __u64 lcc_route_length; + __u64 lcc_drop_length; } __packed; +struct lnet_counters_health { + __u32 lch_rst_alloc; + __u32 lch_resend_count; + __u32 
lch_response_timeout_count; + __u32 lch_local_interrupt_count; + __u32 lch_local_dropped_count; + __u32 lch_local_aborted_count; + __u32 lch_local_no_route_count; + __u32 lch_local_timeout_count; + __u32 lch_local_error_count; + __u32 lch_remote_dropped_count; + __u32 lch_remote_error_count; + __u32 lch_remote_timeout_count; + __u32 lch_network_timeout_count; +}; + +struct lnet_counters { + struct lnet_counters_common lct_common; + struct lnet_counters_health lct_health; +}; + #define LNET_NI_STATUS_UP 0x15aac0de #define LNET_NI_STATUS_DOWN 0xdeadface #define LNET_NI_STATUS_INVALID 0x00000000 diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index c81f46f..21e0175 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -682,40 +682,70 @@ static void lnet_assert_wire_constants(void) EXPORT_SYMBOL(lnet_unregister_lnd); void +lnet_counters_get_common(struct lnet_counters_common *common) +{ + struct lnet_counters *ctr; + int i; + + memset(common, 0, sizeof(*common)); + + lnet_net_lock(LNET_LOCK_EX); + + cfs_percpt_for_each(ctr, i, the_lnet.ln_counters) { + common->lcc_msgs_max += ctr->lct_common.lcc_msgs_max; + common->lcc_msgs_alloc += ctr->lct_common.lcc_msgs_alloc; + common->lcc_errors += ctr->lct_common.lcc_errors; + common->lcc_send_count += ctr->lct_common.lcc_send_count; + common->lcc_recv_count += ctr->lct_common.lcc_recv_count; + common->lcc_route_count += ctr->lct_common.lcc_route_count; + common->lcc_drop_count += ctr->lct_common.lcc_drop_count; + common->lcc_send_length += ctr->lct_common.lcc_send_length; + common->lcc_recv_length += ctr->lct_common.lcc_recv_length; + common->lcc_route_length += ctr->lct_common.lcc_route_length; + common->lcc_drop_length += ctr->lct_common.lcc_drop_length; + } + lnet_net_unlock(LNET_LOCK_EX); +} +EXPORT_SYMBOL(lnet_counters_get_common); + +void lnet_counters_get(struct lnet_counters *counters) { struct lnet_counters *ctr; + struct lnet_counters_health *health = &counters->lct_health; int i; 
memset(counters, 0, sizeof(*counters)); + lnet_counters_get_common(&counters->lct_common); + lnet_net_lock(LNET_LOCK_EX); cfs_percpt_for_each(ctr, i, the_lnet.ln_counters) { - counters->msgs_max += ctr->msgs_max; - counters->msgs_alloc += ctr->msgs_alloc; - counters->rst_alloc += ctr->rst_alloc; - counters->errors += ctr->errors; - counters->resend_count += ctr->resend_count; - counters->response_timeout_count += ctr->response_timeout_count; - counters->local_interrupt_count += ctr->local_interrupt_count; - counters->local_dropped_count += ctr->local_dropped_count; - counters->local_aborted_count += ctr->local_aborted_count; - counters->local_no_route_count += ctr->local_no_route_count; - counters->local_timeout_count += ctr->local_timeout_count; - counters->local_error_count += ctr->local_error_count; - counters->remote_dropped_count += ctr->remote_dropped_count; - counters->remote_error_count += ctr->remote_error_count; - counters->remote_timeout_count += ctr->remote_timeout_count; - counters->network_timeout_count += ctr->network_timeout_count; - counters->send_count += ctr->send_count; - counters->recv_count += ctr->recv_count; - counters->route_count += ctr->route_count; - counters->drop_count += ctr->drop_count; - counters->send_length += ctr->send_length; - counters->recv_length += ctr->recv_length; - counters->route_length += ctr->route_length; - counters->drop_length += ctr->drop_length; + health->lch_rst_alloc += ctr->lct_health.lch_rst_alloc; + health->lch_resend_count += ctr->lct_health.lch_resend_count; + health->lch_response_timeout_count += + ctr->lct_health.lch_response_timeout_count; + health->lch_local_interrupt_count += + ctr->lct_health.lch_local_interrupt_count; + health->lch_local_dropped_count += + ctr->lct_health.lch_local_dropped_count; + health->lch_local_aborted_count += + ctr->lct_health.lch_local_aborted_count; + health->lch_local_no_route_count += + ctr->lct_health.lch_local_no_route_count; + health->lch_local_timeout_count += + 
ctr->lct_health.lch_local_timeout_count; + health->lch_local_error_count += + ctr->lct_health.lch_local_error_count; + health->lch_remote_dropped_count += + ctr->lct_health.lch_remote_dropped_count; + health->lch_remote_error_count += + ctr->lct_health.lch_remote_error_count; + health->lch_remote_timeout_count += + ctr->lct_health.lch_remote_timeout_count; + health->lch_network_timeout_count += + ctr->lct_health.lch_network_timeout_count; } lnet_net_unlock(LNET_LOCK_EX); } diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 84a30e0..38ee970 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -755,8 +755,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, /* NB 'lp' is always the next hop */ if (!(msg->msg_target.pid & LNET_PID_USERFLAG) && !lnet_peer_alive_locked(ni, lp, msg)) { - the_lnet.ln_counters[cpt]->drop_count++; - the_lnet.ln_counters[cpt]->drop_length += msg->msg_len; + the_lnet.ln_counters[cpt]->lct_common.lcc_drop_count++; + the_lnet.ln_counters[cpt]->lct_common.lcc_drop_length += + msg->msg_len; lnet_net_unlock(cpt); if (msg->msg_txpeer) lnet_incr_stats(&msg->msg_txpeer->lpni_stats, @@ -2510,7 +2511,7 @@ struct lnet_mt_event_info { lnet_res_unlock(i); lnet_net_lock(i); - the_lnet.ln_counters[i]->response_timeout_count++; + the_lnet.ln_counters[i]->lct_health.lch_response_timeout_count++; lnet_net_unlock(i); list_del_init(&rspt->rspt_on_list); @@ -2595,7 +2596,7 @@ struct lnet_mt_event_info { } lnet_net_lock(cpt); if (!rc) - the_lnet.ln_counters[cpt]->resend_count++; + the_lnet.ln_counters[cpt]->lct_health.lch_resend_count++; } } } @@ -3346,8 +3347,8 @@ void lnet_monitor_thr_stop(void) { lnet_net_lock(cpt); lnet_incr_stats(&ni->ni_stats, msg_type, LNET_STATS_TYPE_DROP); - the_lnet.ln_counters[cpt]->drop_count++; - the_lnet.ln_counters[cpt]->drop_length += nob; + the_lnet.ln_counters[cpt]->lct_common.lcc_drop_count++; + the_lnet.ln_counters[cpt]->lct_common.lcc_drop_length += nob; 
lnet_net_unlock(cpt); lnet_ni_recv(ni, private, NULL, 0, 0, 0, nob); @@ -4329,8 +4330,9 @@ struct lnet_msg * lnet_net_lock(cpt); lnet_incr_stats(&ni->ni_stats, LNET_MSG_GET, LNET_STATS_TYPE_DROP); - the_lnet.ln_counters[cpt]->drop_count++; - the_lnet.ln_counters[cpt]->drop_length += getmd->md_length; + the_lnet.ln_counters[cpt]->lct_common.lcc_drop_count++; + the_lnet.ln_counters[cpt]->lct_common.lcc_drop_length += + getmd->md_length; lnet_net_unlock(cpt); kfree(msg); diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 9b52549..433401f 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -140,7 +140,7 @@ lnet_msg_commit(struct lnet_msg *msg, int cpt) { struct lnet_msg_container *container = the_lnet.ln_msg_containers[cpt]; - struct lnet_counters *counters = the_lnet.ln_counters[cpt]; + struct lnet_counters_common *common; s64 timeout_ns; /* set the message deadline */ @@ -169,30 +169,31 @@ msg->msg_onactivelist = 1; list_add_tail(&msg->msg_activelist, &container->msc_active); - counters->msgs_alloc++; - if (counters->msgs_alloc > counters->msgs_max) - counters->msgs_max = counters->msgs_alloc; + common = &the_lnet.ln_counters[cpt]->lct_common; + common->lcc_msgs_alloc++; + if (common->lcc_msgs_alloc > common->lcc_msgs_max) + common->lcc_msgs_max = common->lcc_msgs_alloc; } static void lnet_msg_decommit_tx(struct lnet_msg *msg, int status) { - struct lnet_counters *counters; + struct lnet_counters_common *common; struct lnet_event *ev = &msg->msg_ev; LASSERT(msg->msg_tx_committed); if (status) goto out; - counters = the_lnet.ln_counters[msg->msg_tx_cpt]; + common = &the_lnet.ln_counters[msg->msg_tx_cpt]->lct_common; switch (ev->type) { default: /* routed message */ LASSERT(msg->msg_routing); LASSERT(msg->msg_rx_committed); LASSERT(!ev->type); - counters->route_length += msg->msg_len; - counters->route_count++; + common->lcc_route_length += msg->msg_len; + common->lcc_route_count++; goto incr_stats; case LNET_EVENT_PUT: @@ -206,7 
+207,7 @@ case LNET_EVENT_SEND: LASSERT(!msg->msg_rx_committed); if (msg->msg_type == LNET_MSG_PUT) - counters->send_length += msg->msg_len; + common->lcc_send_length += msg->msg_len; break; case LNET_EVENT_GET: @@ -220,7 +221,7 @@ break; } - counters->send_count++; + common->lcc_send_count++; incr_stats: if (msg->msg_txpeer) @@ -239,7 +240,7 @@ static void lnet_msg_decommit_rx(struct lnet_msg *msg, int status) { - struct lnet_counters *counters; + struct lnet_counters_common *common; struct lnet_event *ev = &msg->msg_ev; LASSERT(!msg->msg_tx_committed); /* decommitted or never committed */ @@ -248,7 +249,7 @@ if (status) goto out; - counters = the_lnet.ln_counters[msg->msg_rx_cpt]; + common = &the_lnet.ln_counters[msg->msg_rx_cpt]->lct_common; switch (ev->type) { default: LASSERT(!ev->type); @@ -268,7 +269,7 @@ */ LASSERT(msg->msg_type == LNET_MSG_REPLY || msg->msg_type == LNET_MSG_GET); - counters->send_length += msg->msg_wanted; + common->lcc_send_length += msg->msg_wanted; break; case LNET_EVENT_PUT: @@ -285,7 +286,7 @@ break; } - counters->recv_count++; + common->lcc_recv_count++; incr_stats: if (msg->msg_rxpeer) @@ -297,7 +298,7 @@ msg->msg_type, LNET_STATS_TYPE_RECV); if (ev->type == LNET_EVENT_PUT || ev->type == LNET_EVENT_REPLY) - counters->recv_length += msg->msg_wanted; + common->lcc_recv_length += msg->msg_wanted; out: lnet_return_rx_credits_locked(msg); @@ -330,7 +331,7 @@ list_del(&msg->msg_activelist); msg->msg_onactivelist = 0; - the_lnet.ln_counters[cpt2]->msgs_alloc--; + the_lnet.ln_counters[cpt2]->lct_common.lcc_msgs_alloc--; if (cpt2 != cpt) { lnet_net_unlock(cpt2); @@ -546,52 +547,54 @@ { struct lnet_ni *ni = msg->msg_txni; struct lnet_peer_ni *lpni = msg->msg_txpeer; - struct lnet_counters *counters = the_lnet.ln_counters[0]; + struct lnet_counters_health *health; + + health = &the_lnet.ln_counters[0]->lct_health; switch (hstatus) { case LNET_MSG_STATUS_LOCAL_INTERRUPT: atomic_inc(&ni->ni_hstats.hlt_local_interrupt); - 
counters->local_interrupt_count++; + health->lch_local_interrupt_count++; break; case LNET_MSG_STATUS_LOCAL_DROPPED: atomic_inc(&ni->ni_hstats.hlt_local_dropped); - counters->local_dropped_count++; + health->lch_local_dropped_count++; break; case LNET_MSG_STATUS_LOCAL_ABORTED: atomic_inc(&ni->ni_hstats.hlt_local_aborted); - counters->local_aborted_count++; + health->lch_local_aborted_count++; break; case LNET_MSG_STATUS_LOCAL_NO_ROUTE: atomic_inc(&ni->ni_hstats.hlt_local_no_route); - counters->local_no_route_count++; + health->lch_local_no_route_count++; break; case LNET_MSG_STATUS_LOCAL_TIMEOUT: atomic_inc(&ni->ni_hstats.hlt_local_timeout); - counters->local_timeout_count++; + health->lch_local_timeout_count++; break; case LNET_MSG_STATUS_LOCAL_ERROR: atomic_inc(&ni->ni_hstats.hlt_local_error); - counters->local_error_count++; + health->lch_local_error_count++; break; case LNET_MSG_STATUS_REMOTE_DROPPED: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_dropped); - counters->remote_dropped_count++; + health->lch_remote_dropped_count++; break; case LNET_MSG_STATUS_REMOTE_ERROR: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_error); - counters->remote_error_count++; + health->lch_remote_error_count++; break; case LNET_MSG_STATUS_REMOTE_TIMEOUT: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_remote_timeout); - counters->remote_timeout_count++; + health->lch_remote_timeout_count++; break; case LNET_MSG_STATUS_NETWORK_TIMEOUT: if (lpni) atomic_inc(&lpni->lpni_hstats.hlt_network_timeout); - counters->network_timeout_count++; + health->lch_network_timeout_count++; break; case LNET_MSG_STATUS_OK: break; diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index ebe7993..45abcfb 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -79,6 +79,7 @@ static int proc_lnet_stats(struct ctl_table *table, int write, { int rc; struct lnet_counters *ctrs; + struct lnet_counters_common common; size_t nob = *lenp; loff_t pos = *ppos; int 
len; @@ -102,15 +103,16 @@ static int proc_lnet_stats(struct ctl_table *table, int write, } lnet_counters_get(ctrs); + common = ctrs->lct_common; len = snprintf(tmpstr, tmpsiz, "%u %u %u %u %u %u %u %llu %llu %llu %llu", - ctrs->msgs_alloc, ctrs->msgs_max, - ctrs->errors, - ctrs->send_count, ctrs->recv_count, - ctrs->route_count, ctrs->drop_count, - ctrs->send_length, ctrs->recv_length, - ctrs->route_length, ctrs->drop_length); + common.lcc_msgs_alloc, common.lcc_msgs_max, + common.lcc_errors, + common.lcc_send_count, common.lcc_recv_count, + common.lcc_route_count, common.lcc_drop_count, + common.lcc_send_length, common.lcc_recv_length, + common.lcc_route_length, common.lcc_drop_length); if (pos >= min_t(int, len, strlen(tmpstr))) rc = 0; diff --git a/net/lnet/selftest/framework.c b/net/lnet/selftest/framework.c index c8c42b9..00e7363 100644 --- a/net/lnet/selftest/framework.c +++ b/net/lnet/selftest/framework.c @@ -82,19 +82,19 @@ __swab64s(&(rc).bulk_put); \ } while (0) -#define sfw_unpack_lnet_counters(lc) \ -do { \ - __swab32s(&(lc).errors); \ - __swab32s(&(lc).msgs_max); \ - __swab32s(&(lc).msgs_alloc); \ - __swab32s(&(lc).send_count); \ - __swab32s(&(lc).recv_count); \ - __swab32s(&(lc).drop_count); \ - __swab32s(&(lc).route_count); \ - __swab64s(&(lc).send_length); \ - __swab64s(&(lc).recv_length); \ - __swab64s(&(lc).drop_length); \ - __swab64s(&(lc).route_length); \ +#define sfw_unpack_lnet_counters(lc) \ +do { \ + __swab32s(&(lc).lcc_errors); \ + __swab32s(&(lc).lcc_msgs_max); \ + __swab32s(&(lc).lcc_msgs_alloc); \ + __swab32s(&(lc).lcc_send_count); \ + __swab32s(&(lc).lcc_recv_count); \ + __swab32s(&(lc).lcc_drop_count); \ + __swab32s(&(lc).lcc_route_count); \ + __swab64s(&(lc).lcc_send_length); \ + __swab64s(&(lc).lcc_recv_length); \ + __swab64s(&(lc).lcc_drop_length); \ + __swab64s(&(lc).lcc_route_length); \ } while (0) #define sfw_test_active(t) (atomic_read(&(t)->tsi_nactive)) @@ -377,7 +377,7 @@ return 0; } - lnet_counters_get(&reply->str_lnet); + 
lnet_counters_get_common(&reply->str_lnet); srpc_get_counters(&reply->str_rpc); /* diff --git a/net/lnet/selftest/rpc.h b/net/lnet/selftest/rpc.h index 8ccae3a..6d07452 100644 --- a/net/lnet/selftest/rpc.h +++ b/net/lnet/selftest/rpc.h @@ -160,11 +160,11 @@ struct srpc_stat_reqst { } __packed; struct srpc_stat_reply { - u32 str_status; - struct lst_sid str_sid; - struct sfw_counters str_fw; - struct srpc_counters str_rpc; - struct lnet_counters str_lnet; + u32 str_status; + struct lst_sid str_sid; + struct sfw_counters str_fw; + struct srpc_counters str_rpc; + struct lnet_counters_common str_lnet; } __packed; struct test_bulk_req { From patchwork Thu Feb 27 21:10:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409901 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4EE06138D for ; Thu, 27 Feb 2020 21:25:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 372A0246A0 for ; Thu, 27 Feb 2020 21:25:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 372A0246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 43B10348E12; Thu, 27 Feb 2020 13:22:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id BA3E721FB50 for ; Thu, 27 Feb 2020 13:18:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D5C6213CB; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D473D46F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:04 -0500 Message-Id: <1582838290-17243-137-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 136/622] lustre: osc: clarify short_io_bytes is maximum value X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Clarify in the code that the "osc.*.short_io_bytes" parameter is the maximum IO size to pack into request/reply not the minimum. Allow short_io to be disabled completely if it is set to zero. It would be nice to also change the name of the /sysfs functions in a similar manner but that also changes the /sysfs tunable name (via LUSTRE_RW_ATTR() macro) and has compatibility implications for sites that may have changed this value. 
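The new range check in short_io_bytes_store() can be illustrated with a standalone sketch. The limits below are hypothetical values picked for the example (the real MIN_SHORT_IO_BYTES and OBD_MAX_SHORT_IO_BYTES are defined in the Lustre headers), and the helper name is made up; the sketch also leaves out the separate check against cl_max_pages_per_rpc. It shows only the rule stated above: zero is accepted and disables short I/O, any other value has to fall inside the allowed range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical limits, for illustration only. */
#define SKETCH_MIN_SHORT_IO_BYTES	64U
#define SKETCH_MAX_SHORT_IO_BYTES	(64U * 1024U)

/* Validate and store a new maximum short I/O size. Returns 0 on
 * success and -ERANGE if a non-zero value is outside the limits, in
 * which case the stored value is left unchanged.
 */
static int short_io_bytes_store_sketch(uint32_t *max_short_io_bytes,
					uint64_t val)
{
	assert(max_short_io_bytes);
	if (val && (val < SKETCH_MIN_SHORT_IO_BYTES ||
		    val > SKETCH_MAX_SHORT_IO_BYTES))
		return -ERANGE;
	*max_short_io_bytes = (uint32_t)val;	/* 0 means disabled */
	return 0;
}
```

Testing for zero first is what makes the "disabled" setting reachable; before this change a value of zero was below the minimum and was rejected like any other out-of-range input.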
WC-bug-id: https://jira.whamcloud.com/browse/LU-1757 Lustre-commit: b90812a674f6 ("LU-1757 osc: clarify short_io_bytes is maximum value") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33173 Reviewed-by: Mike Pershin Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 2 +- fs/lustre/ldlm/ldlm_lib.c | 2 ++ fs/lustre/obdclass/lprocfs_status.c | 6 +++--- fs/lustre/osc/osc_request.c | 7 +++---- 4 files changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 7cf9745..2587136 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -252,7 +252,7 @@ struct client_obd { atomic_t cl_pending_r_pages; u32 cl_max_pages_per_rpc; u32 cl_max_rpcs_in_flight; - u32 cl_short_io_bytes; + u32 cl_max_short_io_bytes; struct obd_histogram cl_read_rpc_hist; struct obd_histogram cl_write_rpc_hist; struct obd_histogram cl_read_page_hist; diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 838ddb3..5fe5711 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -374,6 +374,8 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) */ cli->cl_max_pages_per_rpc = PTLRPC_MAX_BRW_PAGES; + cli->cl_max_short_io_bytes = OBD_MAX_SHORT_IO_BYTES; + /* * set cl_chunkbits default value to PAGE_CACHE_SHIFT, * it will be updated at OSC connection time. 
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 747baff..b3dbe85 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -1896,7 +1896,7 @@ ssize_t short_io_bytes_show(struct kobject *kobj, struct attribute *attr, int rc; spin_lock(&cli->cl_loi_list_lock); - rc = sprintf(buf, "%d\n", cli->cl_short_io_bytes); + rc = sprintf(buf, "%d\n", cli->cl_max_short_io_bytes); spin_unlock(&cli->cl_loi_list_lock); return rc; } @@ -1922,7 +1922,7 @@ ssize_t short_io_bytes_store(struct kobject *kobj, struct attribute *attr, if (rc) goto out; - if (val > OBD_MAX_SHORT_IO_BYTES || val < MIN_SHORT_IO_BYTES) { + if (val && (val < MIN_SHORT_IO_BYTES || val > OBD_MAX_SHORT_IO_BYTES)) { rc = -ERANGE; goto out; } @@ -1933,7 +1933,7 @@ ssize_t short_io_bytes_store(struct kobject *kobj, struct attribute *attr, if (val > (cli->cl_max_pages_per_rpc << PAGE_SHIFT)) rc = -ERANGE; else - cli->cl_short_io_bytes = val; + cli->cl_max_short_io_bytes = val; spin_unlock(&cli->cl_loi_list_lock); out: diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index e968360..4524a98 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1321,9 +1321,9 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, for (i = 0; i < page_count; i++) short_io_size += pga[i]->count; - /* Check if we can do a short io. */ - if (!(short_io_size <= cli->cl_short_io_bytes && niocount == 1 && - imp_connect_shortio(cli->cl_import))) + /* Check if read/write is small enough to be a short io. 
*/ + if (short_io_size > cli->cl_max_short_io_bytes || niocount > 1 || + !imp_connect_shortio(cli->cl_import)) short_io_size = 0; req_capsule_set_size(pill, &RMF_SHORT_IO, RCL_CLIENT, @@ -1762,7 +1762,6 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) CERROR("Unexpected +ve rc %d\n", rc); return -EPROTO; } - LASSERT(req->rq_bulk->bd_nob == aa->aa_requested_nob); if (req->rq_bulk && sptlrpc_cli_unwrap_bulk_write(req, req->rq_bulk)) From patchwork Thu Feb 27 21:10:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409905 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 66D3A138D for ; Thu, 27 Feb 2020 21:25:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4DC35246A0 for ; Thu, 27 Feb 2020 21:25:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4DC35246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F1B08348927; Thu, 27 Feb 2020 13:22:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C54921FAAC for ; Thu, 27 Feb 2020 13:18:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) 
with ESMTP id DA21913CC; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D7BC046C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:05 -0500 Message-Id: <1582838290-17243-138-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 137/622] lustre: ptlrpc: Make CPU binding switchable X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell LU-6325 added CPT binding to the ptlrpc worker threads on the servers. This is often desirable, especially where NUMA latencies are high, but it is not always beneficial. If NUMA latencies are low, there is little benefit, and sometimes it can be quite costly: In particular, if NID-CPT hashing with routers leads to an unbalanced workload by CPT, it is easy to end up in a situation where the CPUs in one CPT are maxed out but others are idle. To this end, we add module parameters to allow disabling the strict binding behavior, allowing threads to use all CPUs. This is complicated a bit because we still want separate service partitions - The existing "no affinity" behavior places all service threads in a single service partition, which gives only one queue for service wakeups. So we separate binding behavior from CPT association, allowing us to keep multiple service partitions where desired. Module parameters are added to ldlm, mdt, and ost, of the form "servicename_cpu_bind", such as "mds_rdpg_cpu_bind". 
Setting them to "0" will disable the strict CPU binding behavior for the threads in that service. Parameters were not added for certain minor services which do not have any CPT affinity/binding behavior today. (This appears to be because they are not expected to be performance sensitive.) cray-bug-id: LUS-6518 WC-bug-id: https://jira.whamcloud.com/browse/LU-11454 Lustre-commit: 3eb7a1dfc3e7 ("LU-11454 ptlrpc: Make CPU binding switchable") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/33262 Reviewed-by: Andreas Dilger Reviewed-by: Chris Horn Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 12 ++++++++---- fs/lustre/ldlm/ldlm_lockd.c | 8 +++++++- fs/lustre/ptlrpc/service.c | 25 +++++++++++++++---------- 3 files changed, 30 insertions(+), 15 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index cbd524c..81a6ac9 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1480,14 +1480,16 @@ struct ptlrpc_service { int srv_watchdog_factor; /** under unregister_service */ unsigned srv_is_stopping:1; + /** Whether or not to restrict service threads to CPUs in this CPT */ + unsigned srv_cpt_bind:1; /** max # request buffers */ int srv_nrqbds_max; /** max # request buffers in history per partition */ int srv_hist_nrqbds_cpt_max; - /** number of CPTs this service bound on */ + /** number of CPTs this service associated with */ int srv_ncpts; - /** CPTs array this service bound on */ + /** CPTs array this service associated with */ u32 *srv_cpts; /** 2^srv_cptab_bits >= cfs_cpt_numbert(srv_cptable) */ int srv_cpt_bits; @@ -1934,8 +1936,8 @@ struct ptlrpc_service_thr_conf { * other members of this structure. 
*/ unsigned int tc_nthrs_user; - /* set NUMA node affinity for service threads */ - unsigned int tc_cpu_affinity; + /* bind service threads to only CPUs in their associated CPT */ + unsigned int tc_cpu_bind; /* Tags for lu_context associated with service thread */ u32 tc_ctx_tags; }; @@ -1944,6 +1946,8 @@ struct ptlrpc_service_cpt_conf { struct cfs_cpt_table *cc_cptable; /* string pattern to describe CPTs for a service */ char *cc_pattern; + /* whether or not to have per-CPT service partitions */ + bool cc_affinity; }; struct ptlrpc_service_conf { diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index b50a3f7..204b11b 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -49,6 +49,11 @@ module_param(ldlm_num_threads, int, 0444); MODULE_PARM_DESC(ldlm_num_threads, "number of DLM service threads to start"); +static unsigned int ldlm_cpu_bind = 1; +module_param(ldlm_cpu_bind, uint, 0444); +MODULE_PARM_DESC(ldlm_cpu_bind, + "bind DLM service threads to particular CPU partitions"); + static char *ldlm_cpts; module_param(ldlm_cpts, charp, 0444); MODULE_PARM_DESC(ldlm_cpts, "CPU partitions ldlm threads should run on"); @@ -1006,11 +1011,12 @@ static int ldlm_setup(void) .tc_nthrs_base = LDLM_NTHRS_BASE, .tc_nthrs_max = LDLM_NTHRS_MAX, .tc_nthrs_user = ldlm_num_threads, - .tc_cpu_affinity = 1, + .tc_cpu_bind = ldlm_cpu_bind, .tc_ctx_tags = LCT_MD_THREAD | LCT_DT_THREAD, }, .psc_cpt = { .cc_pattern = ldlm_cpts, + .cc_affinity = true, }, .psc_ops = { .so_req_handler = ldlm_callback_handler, diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index a9155b2..b94ed6a 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -573,7 +573,13 @@ struct ptlrpc_service * if (!cptable) cptable = cfs_cpt_tab; - if (!conf->psc_thr.tc_cpu_affinity) { + if (conf->psc_thr.tc_cpu_bind > 1) { + CERROR("%s: Invalid cpu bind value %d, only 1 or 0 allowed\n", + conf->psc_name, conf->psc_thr.tc_cpu_bind); + return 
ERR_PTR(-EINVAL); + } + + if (!cconf->cc_affinity) { ncpts = 1; } else { ncpts = cfs_cpt_number(cptable); @@ -611,6 +617,7 @@ struct ptlrpc_service * service->srv_cptable = cptable; service->srv_cpts = cpts; service->srv_ncpts = ncpts; + service->srv_cpt_bind = conf->psc_thr.tc_cpu_bind; service->srv_cpt_bits = 0; /* it's zero already, easy to read... */ while ((1 << service->srv_cpt_bits) < cfs_cpt_number(cptable)) @@ -646,7 +653,7 @@ struct ptlrpc_service * service->srv_ops = conf->psc_ops; for (i = 0; i < ncpts; i++) { - if (!conf->psc_thr.tc_cpu_affinity) + if (!cconf->cc_affinity) cpt = CFS_CPT_ANY; else cpt = cpts ? cpts[i] : i; @@ -2105,14 +2112,12 @@ static int ptlrpc_main(void *arg) thread->t_pid = current->pid; unshare_fs_struct(); - /* NB: we will call cfs_cpt_bind() for all threads, because we - * might want to run lustre server only on a subset of system CPUs, - * in that case ->scp_cpt is CFS_CPT_ANY - */ - rc = cfs_cpt_bind(svc->srv_cptable, svcpt->scp_cpt); - if (rc != 0) { - CWARN("%s: failed to bind %s on CPT %d\n", - svc->srv_name, thread->t_name, svcpt->scp_cpt); + if (svc->srv_cpt_bind) { + rc = cfs_cpt_bind(svc->srv_cptable, svcpt->scp_cpt); + if (rc != 0) { + CWARN("%s: failed to bind %s on CPT %d\n", + svc->srv_name, thread->t_name, svcpt->scp_cpt); + } } ginfo = groups_alloc(0); From patchwork Thu Feb 27 21:10:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409825 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B71A814BC for ; Thu, 27 Feb 2020 21:23:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9FEEC246A0 for ; Thu, 
27 Feb 2020 21:23:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9FEEC246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7C0D521FDC6; Thu, 27 Feb 2020 13:21:30 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 748FF21FAAC for ; Thu, 27 Feb 2020 13:18:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DC55213CD; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DAA26468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:06 -0500 Message-Id: <1582838290-17243-139-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 138/622] lustre: misc: quiet console messages at startup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Some modules print less-than-useful messages on every load. Turn these into internal debug messages to reduce noise. 
The message in gss_init_svc_upcall() should also be quieted, but it exposes that this function is waiting 1.5s on each module load for lsvcgssd to start. This should be fixed separately. WC-bug-id: https://jira.whamcloud.com/browse/LU-1095 Lustre-commit: ed0c19d250f6 ("LU-1095 misc: quiet console messages at startup") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33281 Reviewed-by: Nathaniel Clark Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 0da9269..81b86a0 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -257,10 +257,12 @@ static int lmv_init_ea_size(struct obd_export *exp, u32 easize, u32 def_easize) for (i = 0; i < lmv->desc.ld_tgt_count; i++) { struct lmv_tgt_desc *tgt = lmv->tgts[i]; - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) { + if (!tgt || !tgt->ltd_exp) { CWARN("%s: NULL export for %d\n", obd->obd_name, i); continue; } + if (!tgt->ltd_active) + continue; rc = md_init_ea_size(tgt->ltd_exp, easize, def_easize); if (rc) { From patchwork Thu Feb 27 21:10:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409919 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E239A14BC for ; Thu, 27 Feb 2020 21:25:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB36C246A0 for ; Thu, 27 Feb 2020 21:25:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB36C246A0 Authentication-Results: 
mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8E529348ED2; Thu, 27 Feb 2020 13:23:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B8CBF21FAAC for ; Thu, 27 Feb 2020 13:18:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E0FEF13D5; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DDFB646D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:07 -0500 Message-Id: <1582838290-17243-140-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 139/622] lustre: ldlm: don't apply ELC to converting and DOM locks X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Prevent ELC for locks being converted and for locks having DOM bit set to avoid data flush without need. 
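As a rough user-space sketch of the rule stated above, the check for whether early lock cancel (ELC) should skip a lock might look like the following. The struct, the `sketch_` names, and the DoM bit value are hypothetical stand-ins for the real ldlm types and flag helpers; the lock mode compatibility check done by the real function is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the Data-on-MDT inodebit; the value is an assumption. */
#define SKETCH_IBIT_DOM 0x40u

/* Hypothetical, simplified view of the lock state relevant to ELC. */
struct sketch_lock {
	bool bl_ast;      /* a blocking AST has already arrived */
	bool canceling;   /* somebody is already cancelling the lock */
	bool converting;  /* the lock is being converted */
	bool is_ibits;    /* the lock is an inodebits (IBITS) lock */
	uint64_t ibits;   /* inodebits held by the lock */
};

/*
 * Return true when ELC should leave this lock alone:
 *  - it is already being cancelled, blocked, or converted, or
 *  - a policy is given for an IBITS lock and either no bits match
 *    or the lock carries the DoM bit (cancelling would flush data).
 */
static bool sketch_elc_skip(const struct sketch_lock *lock,
			    bool have_policy, uint64_t policy_bits)
{
	if (lock->bl_ast || lock->canceling || lock->converting)
		return true;

	if (have_policy && lock->is_ibits &&
	    (!(lock->ibits & policy_bits) || (lock->ibits & SKETCH_IBIT_DOM)))
		return true;

	return false;
}
```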
WC-bug-id: https://jira.whamcloud.com/browse/LU-11276 Lustre-commit: 70a01a6c9c7c ("LU-11276 ldlm: don't apply ELC to converting and DOM locks") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33125 Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 9d3330c..1afe9a5 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1823,7 +1823,8 @@ int ldlm_cancel_resource_local(struct ldlm_resource *res, /* If somebody is already doing CANCEL, or blocking AST came, * skip this lock. */ - if (ldlm_is_bl_ast(lock) || ldlm_is_canceling(lock)) + if (ldlm_is_bl_ast(lock) || ldlm_is_canceling(lock) || + ldlm_is_converting(lock)) continue; if (lockmode_compat(lock->l_granted_mode, mode)) @@ -1831,10 +1832,11 @@ int ldlm_cancel_resource_local(struct ldlm_resource *res, /* If policy is given and this is IBITS lock, add to list only * those locks that match by policy. + * Skip locks with DoM bit always to don't flush data. 
*/ if (policy && (lock->l_resource->lr_type == LDLM_IBITS) && - !(lock->l_policy_data.l_inodebits.bits & - policy->l_inodebits.bits)) + (!(lock->l_policy_data.l_inodebits.bits & + policy->l_inodebits.bits) || ldlm_has_dom(lock))) continue; /* See CBPENDING comment in ldlm_cancel_lru */ From patchwork Thu Feb 27 21:10:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409829 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECFFB138D for ; Thu, 27 Feb 2020 21:23:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D54B2246A0 for ; Thu, 27 Feb 2020 21:23:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D54B2246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 62F38348A4F; Thu, 27 Feb 2020 13:21:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 11DB921FAAC for ; Thu, 27 Feb 2020 13:19:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E247513D6; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E0FFA46A; Thu, 27 Feb 2020 16:18:14 -0500 
(EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:08 -0500 Message-Id: <1582838290-17243-141-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 140/622] lustre: class: use INIT_LIST_HEAD_RCU instead INIT_LIST_HEAD X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng Use INIT_LIST_HEAD_RCU to avoid excessive compiler optimization in some cases. WC-bug-id: https://jira.whamcloud.com/browse/LU-11453 Lustre-commit: 68bc3984975b ("LU-11453 class: use INIT_LIST_HEAD_RCU instead INIT_LIST_HEAD") Signed-off-by: Yang Sheng Reviewed-on: https://review.whamcloud.com/33317 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: John L.
Hammond Signed-off-by: James Simmons --- fs/lustre/obdclass/genops.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index da53572..4465dd9 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -821,7 +821,7 @@ static struct obd_export *__class_new_export(struct obd_device *obd, spin_lock_init(&export->exp_uncommitted_replies_lock); INIT_LIST_HEAD(&export->exp_uncommitted_replies); INIT_LIST_HEAD(&export->exp_req_replay_queue); - INIT_LIST_HEAD(&export->exp_handle.h_link); + INIT_LIST_HEAD_RCU(&export->exp_handle.h_link); INIT_LIST_HEAD(&export->exp_hp_rpcs); class_handle_hash(&export->exp_handle, &export_handle_ops); spin_lock_init(&export->exp_lock); @@ -1018,7 +1018,7 @@ struct obd_import *class_new_import(struct obd_device *obd) atomic_set(&imp->imp_replay_inflight, 0); atomic_set(&imp->imp_inval_count, 0); INIT_LIST_HEAD(&imp->imp_conn_list); - INIT_LIST_HEAD(&imp->imp_handle.h_link); + INIT_LIST_HEAD_RCU(&imp->imp_handle.h_link); class_handle_hash(&imp->imp_handle, &import_handle_ops); init_imp_at(&imp->imp_at); From patchwork Thu Feb 27 21:10:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409833 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D671914BC for ; Thu, 27 Feb 2020 21:23:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BEF88246A0 for ; Thu, 27 Feb 2020 21:23:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BEF88246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1ABE321FCB4; Thu, 27 Feb 2020 13:21:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 52A0221FB3F for ; Thu, 27 Feb 2020 13:19:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E538B13D7; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E3AF346F; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:09 -0500 Message-Id: <1582838290-17243-142-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 141/622] lustre: uapi: add new changerec_type X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin The Lazy Size on MDT feature causes the trusted.som xattr to be logged in the changelog whenever a file open/close or truncate operation requires this xattr data to be updated. The original patch that landed on the OpenSFS branch fixes this problem so that this xattr is not logged for every file.
This introduces a new changelog_rec_type that the mdc changelog code needs to be aware of. WC-bug-id: https://jira.whamcloud.com/browse/LU-11450 Lustre-commit: faf6f514c172 ("LU-11450 mdd: avoid logging trusted.som xattr in changelogs") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/33323 Reviewed-by: Andreas Dilger Reviewed-by: Wang Shilong Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index bff6f76..844e50e 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -966,6 +966,7 @@ enum la_valid { /********* Changelogs **********/ /** Changelog record types */ enum changelog_rec_type { + CL_NONE = -1, CL_MARK = 0, CL_CREATE = 1, /* namespace */ CL_MKDIR = 2, /* namespace */ From patchwork Thu Feb 27 21:10:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409923 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F1806138D for ; Thu, 27 Feb 2020 21:25:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D9480246A0 for ; Thu, 27 Feb 2020 21:25:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D9480246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost 
[IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1A3F2348EFE; Thu, 27 Feb 2020 13:23:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 93FD621FB3F for ; Thu, 27 Feb 2020 13:19:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E7DB013D8; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E681C46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:10 -0500 Message-Id: <1582838290-17243-143-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 142/622] lustre: ldlm: check double grant race after resource change X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang In ldlm_handle_cp_callback(), we call lock_res_and_lock() and then check if the ldlm lock has already been granted. If the lock resource has changed, we release the lock, allocate a new resource, and then grab the lock again before calling ldlm_grant_lock(). However, this gives another thread an opportunity to grab the lock and pass the check while we change the resource. Eventually the other thread calls ldlm_grant_lock() on the same ldlm lock and triggers a LASSERT.
Fix the issue by doing double grant race check after changing the lock resource. WC-bug-id: https://jira.whamcloud.com/browse/LU-8391 Lustre-commit: fef1020406a0 ("LU-8391 ldlm: check double grant race after resource change") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/21275 Reviewed-by: Andreas Dilger Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lockd.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 204b11b..6905ee5 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -214,6 +214,21 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, } lock_res_and_lock(lock); + + if (!ldlm_res_eq(&dlm_req->lock_desc.l_resource.lr_name, + &lock->l_resource->lr_name)) { + ldlm_resource_unlink_lock(lock); + unlock_res_and_lock(lock); + rc = ldlm_lock_change_resource(ns, lock, + &dlm_req->lock_desc.l_resource.lr_name); + if (rc < 0) { + LDLM_ERROR(lock, "Failed to allocate resource"); + goto out; + } + LDLM_DEBUG(lock, "completion AST, new resource"); + lock_res_and_lock(lock); + } + if (ldlm_is_destroyed(lock) || lock->l_granted_mode == lock->l_req_mode) { /* bug 11300: the lock has already been granted */ @@ -240,20 +255,6 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, } ldlm_resource_unlink_lock(lock); - if (memcmp(&dlm_req->lock_desc.l_resource.lr_name, - &lock->l_resource->lr_name, - sizeof(lock->l_resource->lr_name)) != 0) { - unlock_res_and_lock(lock); - rc = ldlm_lock_change_resource(ns, lock, - &dlm_req->lock_desc.l_resource.lr_name); - if (rc < 0) { - LDLM_ERROR(lock, "Failed to allocate resource"); - goto out; - } - LDLM_DEBUG(lock, "completion AST, new resource"); - CERROR("change resource!\n"); - lock_res_and_lock(lock); - } if (dlm_req->lock_flags & LDLM_FL_AST_SENT) { /* BL_AST locks are not needed in LRU. 
From patchwork Thu Feb 27 21:10:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409927 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D09E614BC for ; Thu, 27 Feb 2020 21:25:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B96DD246A0 for ; Thu, 27 Feb 2020 21:25:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B96DD246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 55CCF348F25; Thu, 27 Feb 2020 13:23:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D4CD821FBAD for ; Thu, 27 Feb 2020 13:19:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EAAA813E8; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E970D468; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:11 -0500 Message-Id: <1582838290-17243-144-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 143/622] lustre: mdc: grow lvb buffer to hold layout X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam A write intent RPC could generate a layout bigger than the initial mdt_max_mdsize, so that the new layout cannot be returned to the client. This patch fixes the client side issue by: * defining a new MAX_MD_SIZE to hold a reasonable composite layout, and keeping the old MAX_MD_SIZE as MAX_MD_SIZE_OLD. WC-bug-id: https://jira.whamcloud.com/browse/LU-11158 Lustre-commit: e5abcf83c057 ("LU-11158 mdt: grow lvb buffer to hold layout") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/32847 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_locks.c | 4 +++- include/uapi/linux/lustre/lustre_idl.h | 5 ++++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 09f9bc5..f9d66a4 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -614,7 +614,7 @@ static int mdc_finish_enqueue(struct obd_export *exp, (!it_disposition(it, DISP_OPEN_OPEN) || it->it_status != 0)) mdc_clear_replay_flag(req, it->it_status); - DEBUG_REQ(D_RPCTRACE, req, "op: %d disposition: %x, status: %d", + DEBUG_REQ(D_RPCTRACE, req, "op: %x disposition: %x, status: %d", it->it_op, it->it_disposition, it->it_status); /* We know what to expect, so we do any byte flipping required here */ @@ -680,6 +680,8 @@ static int mdc_finish_enqueue(struct obd_export *exp, * is packed into RMF_DLM_LVB of req */ lvb_len = req_capsule_get_size(pill, &RMF_DLM_LVB, 
RCL_SERVER); + CDEBUG(D_INFO, "%s: layout return lvb %d transno %lld\n", + class_exp2obd(exp)->obd_name, lvb_len, req->rq_transno); if (lvb_len > 0) { lvb_data = req_capsule_server_sized_get(pill, &RMF_DLM_LVB, diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 8002e046..2f15671 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1049,8 +1049,11 @@ struct lov_mds_md_v1 { /* LOV EA mds/wire data (little-endian) */ struct lov_ost_data_v1 lmm_objects[0]; /* per-stripe data */ }; -#define MAX_MD_SIZE \ +#define MAX_MD_SIZE_OLD \ (sizeof(struct lov_mds_md) + 4 * sizeof(struct lov_ost_data)) +#define MAX_MD_SIZE \ + (sizeof(struct lov_comp_md_v1) + \ + 4 * (sizeof(struct lov_comp_md_entry_v1) + MAX_MD_SIZE_OLD)) #define MIN_MD_SIZE \ (sizeof(struct lov_mds_md) + 1 * sizeof(struct lov_ost_data)) From patchwork Thu Feb 27 21:10:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409931 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4C711138D for ; Thu, 27 Feb 2020 21:25:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 353A9246A0 for ; Thu, 27 Feb 2020 21:25:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 353A9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
A8A66348F56; Thu, 27 Feb 2020 13:23:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 258CC21FBAD for ; Thu, 27 Feb 2020 13:19:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EDA8713E9; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EC4FA46D; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:12 -0500 Message-Id: <1582838290-17243-145-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 144/622] lustre: osc: re-check target versus available grant X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev - under the spinlock, otherwise it's possible that available grant has changed since target calculation and bytes to shrink go negative. 
- tgt_grant_alloc() should avoid negative grants WC-bug-id: https://jira.whamcloud.com/browse/LU-11288 Lustre-commit: fcbd8c981239 ("LU-11288 osc: re-check target versus available grant") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/33226 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 4524a98..18b99a9 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -811,6 +811,12 @@ int osc_shrink_grant_to_target(struct client_obd *cli, u64 target_bytes) osc_announce_cached(cli, &body->oa, 0); spin_lock(&cli->cl_loi_list_lock); + if (target_bytes >= cli->cl_avail_grant) { + /* available grant has changed since target calculation */ + spin_unlock(&cli->cl_loi_list_lock); + rc = 0; + goto out_free; + } body->oa.o_grant = cli->cl_avail_grant - target_bytes; cli->cl_avail_grant = target_bytes; spin_unlock(&cli->cl_loi_list_lock); @@ -826,6 +832,7 @@ int osc_shrink_grant_to_target(struct client_obd *cli, u64 target_bytes) sizeof(*body), body, NULL); if (rc != 0) __osc_update_grant(cli, body->oa.o_grant); +out_free: kfree(body); return rc; } From patchwork Thu Feb 27 21:10:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409935 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A91F514BC for ; Thu, 27 Feb 2020 21:26:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 92067246A0 for ; Thu, 27 Feb 2020 21:26:03 
+0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 92067246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 563EE348F80; Thu, 27 Feb 2020 13:23:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6A4EC21FBB4 for ; Thu, 27 Feb 2020 13:19:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F0E7613EA; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EFAC546A; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:13 -0500 Message-Id: <1582838290-17243-146-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 145/622] lnet: unlink md if fail to send recovery X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata MD for recovery ping should be unlinked if we fail to send the GET. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11474 Lustre-commit: e0132e16df15 ("LU-11474 lnet: unlink md if fail to send recovery") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33306 Reviewed-by: Sonia Sharma Reviewed-by: Doug Oucharek Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 7 ++++-- net/lnet/lnet/lib-move.c | 48 +++++++++++++++++++++++++++++++++--------- 2 files changed, 43 insertions(+), 12 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index f82ebb6..b2159b0 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -317,7 +317,8 @@ struct lnet_tx_queue { #define LNET_NI_STATE_ACTIVE (1 << 1) #define LNET_NI_STATE_FAILED (1 << 2) #define LNET_NI_STATE_RECOVERY_PENDING (1 << 3) -#define LNET_NI_STATE_DELETING (1 << 4) +#define LNET_NI_STATE_RECOVERY_FAILED BIT(4) +#define LNET_NI_STATE_DELETING BIT(5) enum lnet_stats_type { LNET_STATS_TYPE_SEND = 0, @@ -606,8 +607,10 @@ struct lnet_peer_ni { #define LNET_PEER_NI_NON_MR_PREF BIT(0) /* peer is being recovered. 
*/ #define LNET_PEER_NI_RECOVERY_PENDING BIT(1) +/* recovery ping failed */ +#define LNET_PEER_NI_RECOVERY_FAILED BIT(2) /* peer is being deleted */ -#define LNET_PEER_NI_DELETING BIT(2) +#define LNET_PEER_NI_DELETING BIT(3) struct lnet_peer { /* chain on pt_peer_list */ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 38ee970..b54fbab 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2615,13 +2615,13 @@ struct lnet_mt_event_info { /* called with cpt and ni_lock held */ static void -lnet_unlink_ni_recovery_mdh_locked(struct lnet_ni *ni, int cpt) +lnet_unlink_ni_recovery_mdh_locked(struct lnet_ni *ni, int cpt, bool force) { struct lnet_handle_md recovery_mdh; LNetInvalidateMDHandle(&recovery_mdh); - if (ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING) { + if (ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING || force) { recovery_mdh = ni->ni_ping_mdh; LNetInvalidateMDHandle(&ni->ni_ping_mdh); } @@ -2675,12 +2675,22 @@ struct lnet_mt_event_info { if (!(ni->ni_state & LNET_NI_STATE_ACTIVE) || healthv == LNET_MAX_HEALTH_VALUE) { list_del_init(&ni->ni_recovery); - lnet_unlink_ni_recovery_mdh_locked(ni, 0); + lnet_unlink_ni_recovery_mdh_locked(ni, 0, false); lnet_ni_unlock(ni); lnet_ni_decref_locked(ni, 0); lnet_net_unlock(0); continue; } + + /* if the local NI failed recovery we must unlink the md. + * But we want to keep the local_ni on the recovery queue + * so we can continue the attempts to recover it. 
+ */ + if (ni->ni_state & LNET_NI_STATE_RECOVERY_FAILED) { + lnet_unlink_ni_recovery_mdh_locked(ni, 0, true); + ni->ni_state &= ~LNET_NI_STATE_RECOVERY_FAILED; + } + lnet_ni_unlock(ni); lnet_net_unlock(0); @@ -2829,7 +2839,7 @@ struct lnet_mt_event_info { struct lnet_ni, ni_recovery); list_del_init(&ni->ni_recovery); lnet_ni_lock(ni); - lnet_unlink_ni_recovery_mdh_locked(ni, 0); + lnet_unlink_ni_recovery_mdh_locked(ni, 0, true); lnet_ni_unlock(ni); lnet_ni_decref_locked(ni, 0); } @@ -2838,13 +2848,14 @@ struct lnet_mt_event_info { } static void -lnet_unlink_lpni_recovery_mdh_locked(struct lnet_peer_ni *lpni, int cpt) +lnet_unlink_lpni_recovery_mdh_locked(struct lnet_peer_ni *lpni, int cpt, + bool force) { struct lnet_handle_md recovery_mdh; LNetInvalidateMDHandle(&recovery_mdh); - if (lpni->lpni_state & LNET_PEER_NI_RECOVERY_PENDING) { + if (lpni->lpni_state & LNET_PEER_NI_RECOVERY_PENDING || force) { recovery_mdh = lpni->lpni_recovery_ping_mdh; LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh); } @@ -2867,7 +2878,7 @@ struct lnet_mt_event_info { lpni_recovery) { list_del_init(&lpni->lpni_recovery); spin_lock(&lpni->lpni_lock); - lnet_unlink_lpni_recovery_mdh_locked(lpni, LNET_LOCK_EX); + lnet_unlink_lpni_recovery_mdh_locked(lpni, LNET_LOCK_EX, true); spin_unlock(&lpni->lpni_lock); lnet_peer_ni_decref_locked(lpni); } @@ -2933,12 +2944,22 @@ struct lnet_mt_event_info { if (lpni->lpni_state & LNET_PEER_NI_DELETING || healthv == LNET_MAX_HEALTH_VALUE) { list_del_init(&lpni->lpni_recovery); - lnet_unlink_lpni_recovery_mdh_locked(lpni, 0); + lnet_unlink_lpni_recovery_mdh_locked(lpni, 0, false); spin_unlock(&lpni->lpni_lock); lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(0); continue; } + + /* If the peer NI has failed recovery we must unlink the + * md. 
But we want to keep the peer ni on the recovery + * queue so we can try to continue recovering it + */ + if (lpni->lpni_state & LNET_PEER_NI_RECOVERY_FAILED) { + lnet_unlink_lpni_recovery_mdh_locked(lpni, 0, true); + lpni->lpni_state &= ~LNET_PEER_NI_RECOVERY_FAILED; + } + spin_unlock(&lpni->lpni_lock); lnet_net_unlock(0); @@ -3152,11 +3173,14 @@ struct lnet_mt_event_info { } lnet_ni_lock(ni); ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + if (status) + ni->ni_state |= LNET_NI_STATE_RECOVERY_FAILED; lnet_ni_unlock(ni); lnet_net_unlock(0); if (status != 0) { - CERROR("local NI recovery failed with %d\n", status); + CERROR("local NI (%s) recovery failed with %d\n", + libcfs_nid2str(nid), status); return; } /* need to increment healthv for the ni here, because in @@ -3178,12 +3202,15 @@ struct lnet_mt_event_info { } spin_lock(&lpni->lpni_lock); lpni->lpni_state &= ~LNET_PEER_NI_RECOVERY_PENDING; + if (status) + lpni->lpni_state |= LNET_PEER_NI_RECOVERY_FAILED; spin_unlock(&lpni->lpni_lock); lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); if (status != 0) - CERROR("peer NI recovery failed with %d\n", status); + CERROR("peer NI (%s) recovery failed with %d\n", + libcfs_nid2str(nid), status); } } @@ -3214,6 +3241,7 @@ struct lnet_mt_event_info { libcfs_nid2str(ev_info->mt_nid), (event->status) ? 
"unsuccessfully" : "successfully", event->status); + lnet_handle_recovery_reply(ev_info, event->status); break; default: CERROR("Unexpected event: %d\n", event->type); From patchwork Thu Feb 27 21:10:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409939 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EEB9B138D for ; Thu, 27 Feb 2020 21:26:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D55E4246A0 for ; Thu, 27 Feb 2020 21:26:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D55E4246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 01EB5348FB1; Thu, 27 Feb 2020 13:23:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C298221FB5C for ; Thu, 27 Feb 2020 13:19:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F41151E80; Thu, 27 Feb 2020 16:18:14 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F2D1A46C; Thu, 27 Feb 2020 16:18:14 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:14 -0500 Message-Id: 
<1582838290-17243-147-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 146/622] lustre: obd: use correct names for conn_uuid X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The LUSTRE_R[OW]_ATTR() macros assume that the name of the sysfs file to create matches the beginning of the function names. In the case of LUSTRE_RO_ATTR(conn_uuid) this maps to the function conn_uuid_show() and generated sysfs files "conn_uuid". While it makes sense to standardize this interface we need to keep the old xxx_conn_uuid. We can create these xxx_conn_uuid sysfs files by using the base sysfs attr macro LUSTRE_ATTR(). WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: f2bf876ef77e ("LU-8066 obd: use correct names for conn_uuid") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33213 Reviewed-by: Andreas Dilger Reviewed-by: John L. 
Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/lproc_mdc.c | 7 ++++--- fs/lustre/mgc/lproc_mgc.c | 5 +++-- fs/lustre/obdclass/lprocfs_status.c | 24 ------------------------ fs/lustre/osc/lproc_osc.c | 5 +++-- 4 files changed, 10 insertions(+), 31 deletions(-) diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 0c52bcf..746dd21 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -303,8 +303,8 @@ static ssize_t max_mod_rpcs_in_flight_store(struct kobject *kobj, LUSTRE_RW_ATTR(max_pages_per_rpc); -#define mdc_conn_uuid_show conn_uuid_show -LUSTRE_RO_ATTR(mdc_conn_uuid); +LUSTRE_ATTR(mds_conn_uuid, 0444, conn_uuid_show, NULL); +LUSTRE_RO_ATTR(conn_uuid); LUSTRE_RO_ATTR(ping); @@ -529,7 +529,8 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file, &lustre_attr_max_rpcs_in_flight.attr, &lustre_attr_max_mod_rpcs_in_flight.attr, &lustre_attr_max_pages_per_rpc.attr, - &lustre_attr_mdc_conn_uuid.attr, + &lustre_attr_mds_conn_uuid.attr, + &lustre_attr_conn_uuid.attr, &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/mgc/lproc_mgc.c b/fs/lustre/mgc/lproc_mgc.c index 4c276f9..676d479 100644 --- a/fs/lustre/mgc/lproc_mgc.c +++ b/fs/lustre/mgc/lproc_mgc.c @@ -66,13 +66,14 @@ struct lprocfs_vars lprocfs_mgc_obd_vars[] = { { NULL } }; -#define mgs_conn_uuid_show conn_uuid_show -LUSTRE_RO_ATTR(mgs_conn_uuid); +LUSTRE_ATTR(mgs_conn_uuid, 0444, conn_uuid_show, NULL); +LUSTRE_RO_ATTR(conn_uuid); LUSTRE_RO_ATTR(ping); static struct attribute *mgc_attrs[] = { &lustre_attr_mgs_conn_uuid.attr, + &lustre_attr_conn_uuid.attr, &lustre_attr_ping.attr, NULL, }; diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index b3dbe85..cce9bec 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -524,30 +524,6 @@ int lprocfs_rd_server_uuid(struct seq_file *m, void *data) } EXPORT_SYMBOL(lprocfs_rd_server_uuid); -int 
lprocfs_rd_conn_uuid(struct seq_file *m, void *data) -{ - struct obd_device *obd = data; - struct ptlrpc_connection *conn; - int rc; - - LASSERT(obd); - - rc = lprocfs_climp_check(obd); - if (rc) - return rc; - - conn = obd->u.cli.cl_import->imp_connection; - if (conn && obd->u.cli.cl_import) - seq_printf(m, "%s\n", conn->c_remote_uuid.uuid); - else - seq_puts(m, "\n"); - - up_read(&obd->u.cli.cl_sem); - - return 0; -} -EXPORT_SYMBOL(lprocfs_rd_conn_uuid); - /** * Lock statistics structure for access, possibly only on this CPU. * diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index f025275..d9030b7 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -173,8 +173,8 @@ static ssize_t max_dirty_mb_store(struct kobject *kobj, } LUSTRE_RW_ATTR(max_dirty_mb); -#define ost_conn_uuid_show conn_uuid_show -LUSTRE_RO_ATTR(ost_conn_uuid); +LUSTRE_ATTR(ost_conn_uuid, 0444, conn_uuid_show, NULL); +LUSTRE_RO_ATTR(conn_uuid); LUSTRE_RO_ATTR(ping); @@ -962,6 +962,7 @@ void lproc_osc_attach_seqstat(struct obd_device *dev) &lustre_attr_short_io_bytes.attr, &lustre_attr_resend_count.attr, &lustre_attr_ost_conn_uuid.attr, + &lustre_attr_conn_uuid.attr, &lustre_attr_ping.attr, &lustre_attr_idle_timeout.attr, &lustre_attr_idle_connect.attr, From patchwork Thu Feb 27 21:10:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409943 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F34B14BC for ; Thu, 27 Feb 2020 21:26:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 47BA4246A0 for ; Thu, 27 Feb 2020 21:26:14 +0000 (UTC) DMARC-Filter: 
OpenDMARC Filter v1.3.2 mail.kernel.org 47BA4246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A8E96348FD4; Thu, 27 Feb 2020 13:23:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2512621FB5C for ; Thu, 27 Feb 2020 13:19:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 02A2F1E85; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 01983468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:15 -0500 Message-Id: <1582838290-17243-148-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 147/622] lustre: idl: use proper ATTR/MDS_ATTR/MDS_OPEN flags X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Add proper MDS_ATTR_* and MDS_OPEN_* flags for different flags namespaces. The MDS_OPEN_OWNEROVERRIDE was being mapped into the MDS_ATTR_* flags in some cases. 
This did not conflict yet, but add separate ATTR_OVERRIDE and MDS_ATTR_OVERRIDE flags for this use so they don't conflict in the future. Remove the MDS_OPEN_CROSS flag, since this was only used internally as a hack to pass open flags to mdd_permission(), but was truncating the u64 open flags to a 32-bit int in the process. WC-bug-id: https://jira.whamcloud.com/browse/LU-10030 Lustre-commit: 9c2ffe39bd32 ("LU-10030 idl: use proper ATTR/MDS_ATTR/MDS_OPEN flags") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32107 Reviewed-by: James Simmons Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 6 ++---- include/uapi/linux/lustre/lustre_idl.h | 1 + include/uapi/linux/lustre/lustre_user.h | 2 +- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index c6dd256..f72e5fc 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -251,8 +251,6 @@ void lustre_assert_wire_constants(void) (long long)MDS_ATTR_KILL_SGID); LASSERTF(MDS_ATTR_CTIME_SET == 0x0000000000002000ULL, "found 0x%.16llxULL\n", (long long)MDS_ATTR_CTIME_SET); - LASSERTF(MDS_ATTR_FROM_OPEN == 0x0000000000004000ULL, "found 0x%.16llxULL\n", - (long long)MDS_ATTR_FROM_OPEN); LASSERTF(MDS_ATTR_BLOCKS == 0x0000000000008000ULL, "found 0x%.16llxULL\n", (long long)MDS_ATTR_BLOCKS); LASSERTF(MDS_ATTR_PROJID == 0x0000000000010000ULL, "found 0x%.16llxULL\n", @@ -262,6 +260,8 @@ void lustre_assert_wire_constants(void) (long long)MDS_ATTR_LSIZE); LASSERTF(MDS_ATTR_LBLOCKS == 0x0000000000040000ULL, "found 0x%.16llxULL\n", (long long)MDS_ATTR_LBLOCKS); + LASSERTF(MDS_ATTR_OVERRIDE == 0x0000000002000000ULL, "found 0x%.16llxULL\n", + (long long)MDS_ATTR_OVERRIDE); LASSERTF(FLD_QUERY == 900, "found %lld\n", (long long)FLD_QUERY); LASSERTF(FLD_FIRST_OPC == 900, "found %lld\n", @@ -2094,8 +2094,6 @@ void lustre_assert_wire_constants(void) MDS_FMODE_EXEC); 
LASSERTF(MDS_OPEN_CREATED == 000000000010UL, "found 0%.11oUL\n", MDS_OPEN_CREATED); - LASSERTF(MDS_OPEN_CROSS == 000000000020UL, "found 0%.11oUL\n", - MDS_OPEN_CROSS); LASSERTF(MDS_OPEN_CREAT == 000000000100UL, "found 0%.11oUL\n", MDS_OPEN_CREAT); LASSERTF(MDS_OPEN_EXCL == 000000000200UL, "found 0%.11oUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 2f15671..d46a921 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1681,6 +1681,7 @@ struct mdt_rec_setattr { #define MDS_ATTR_PROJID 0x10000ULL /* = 65536 */ #define MDS_ATTR_LSIZE 0x20000ULL /* = 131072 */ #define MDS_ATTR_LBLOCKS 0x40000ULL /* = 262144 */ +#define MDS_ATTR_OVERRIDE 0x2000000ULL /* = 33554432 */ enum mds_op_bias { /* MDS_CHECK_SPLIT = 1 << 0, obsolete before 2.3.58 */ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 844e50e..db751d8 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -922,7 +922,7 @@ enum la_valid { /* MDS_FMODE_SOM 04000000 obsolete since 2.8.0 */ #define MDS_OPEN_CREATED 00000010 -#define MDS_OPEN_CROSS 00000020 +/* MDS_OPEN_CROSS 00000020 obsolete in 2.12, internal use only */ #define MDS_OPEN_CREAT 00000100 #define MDS_OPEN_EXCL 00000200 From patchwork Thu Feb 27 21:10:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410121 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0AD1C92A for ; Thu, 27 Feb 2020 21:30:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS 
id E7A5C20801 for ; Thu, 27 Feb 2020 21:30:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E7A5C20801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 943B33496D7; Thu, 27 Feb 2020 13:26:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 82C4E21FBBA for ; Thu, 27 Feb 2020 13:19:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 060861E86; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 047EA46D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:16 -0500 Message-Id: <1582838290-17243-149-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 148/622] lustre: llite: optimize read on open pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jinshan Xiong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jinshan Xiong Current read-on-open implementation does allocate cl_page after data are piggied back by open request, which is expensive and not necessary. This patch improves the case by just adding the pages into page cache. As long as those pages will be discarded at lock revocation, there should be no concerns. WC-bug-id: https://jira.whamcloud.com/browse/LU-11427 Lustre-commit: 02e766f5ed95 ("LU-11427 llite: optimize read on open pages") Signed-off-by: Jinshan Xiong Reviewed-on: https://review.whamcloud.com/33234 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 58 +++++-------------------------------------------- fs/lustre/llite/namei.c | 7 +++++- 2 files changed, 11 insertions(+), 54 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a46f5d3..2fd906f 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -420,14 +420,10 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, struct page *vmpage; struct niobuf_remote *rnb; char *data; - struct lu_env *env; - struct cl_io *io; - u16 refcheck; struct lustre_handle lockh; struct ldlm_lock *lock; unsigned long index, start; struct niobuf_local lnb; - int rc; bool dom_lock = false; if (!obj) @@ -440,37 +436,21 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, dom_lock = ldlm_has_dom(lock); LDLM_LOCK_PUT(lock); } - if (!dom_lock) return; - env = cl_env_get(&refcheck); - if (IS_ERR(env)) - return; - if (!req_capsule_has_field(&req->rq_pill, &RMF_NIOBUF_INLINE, - RCL_SERVER)) { - rc = -ENODATA; - goto out_env; - } + RCL_SERVER)) + return; rnb = req_capsule_server_get(&req->rq_pill, &RMF_NIOBUF_INLINE); - data = (char *)rnb + 
sizeof(*rnb); - - if (!rnb || rnb->rnb_len == 0) { - rc = 0; - goto out_env; - } + if (!rnb || rnb->rnb_len == 0) + return; CDEBUG(D_INFO, "Get data buffer along with open, len %i, i_size %llu\n", rnb->rnb_len, i_size_read(inode)); - io = vvp_env_thread_io(env); - io->ci_obj = obj; - io->ci_ignore_layout = 1; - rc = cl_io_init(env, io, CIT_MISC, obj); - if (rc) - goto out_io; + data = (char *)rnb + sizeof(*rnb); lnb.lnb_file_offset = rnb->rnb_offset; start = lnb.lnb_file_offset / PAGE_SIZE; @@ -478,8 +458,6 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, LASSERT(lnb.lnb_file_offset % PAGE_SIZE == 0); lnb.lnb_page_offset = 0; do { - struct cl_page *clp; - lnb.lnb_data = data + (index << PAGE_SHIFT); lnb.lnb_len = rnb->rnb_len - (index << PAGE_SHIFT); if (lnb.lnb_len > PAGE_SIZE) @@ -495,35 +473,9 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, PTR_ERR(vmpage)); break; } - lock_page(vmpage); - if (!vmpage->mapping) { - unlock_page(vmpage); - put_page(vmpage); - /* page was truncated */ - rc = -ENODATA; - goto out_io; - } - clp = cl_page_find(env, obj, vmpage->index, vmpage, - CPT_CACHEABLE); - if (IS_ERR(clp)) { - unlock_page(vmpage); - put_page(vmpage); - rc = PTR_ERR(clp); - goto out_io; - } - - /* export page */ - cl_page_export(env, clp, 1); - cl_page_put(env, clp); - unlock_page(vmpage); put_page(vmpage); index++; } while (rnb->rnb_len > (index << PAGE_SHIFT)); - rc = 0; -out_io: - cl_io_fini(env, io); -out_env: - cl_env_put(env, &refcheck); } static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 4ac62b2..530c2df 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -185,8 +185,13 @@ int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) int rc; u16 refcheck; - if (!lli->lli_clob) + if (!lli->lli_clob) { + /* Due to DoM read on open, there may exist pages for Lustre + * regular file even 
though cl_object is not set up yet. + */ + truncate_inode_pages(inode->i_mapping, 0); return 0; + } env = cl_env_get(&refcheck); if (IS_ERR(env)) From patchwork Thu Feb 27 21:10:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409909 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2B9F4138D for ; Thu, 27 Feb 2020 21:25:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14479246A0 for ; Thu, 27 Feb 2020 21:25:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14479246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BBB8C348E6A; Thu, 27 Feb 2020 13:22:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D974721FBBA for ; Thu, 27 Feb 2020 13:19:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 08E671E87; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0741946F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:17 -0500 Message-Id: 
<1582838290-17243-150-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 149/622] lnet: set the health status correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata There are cases where the health status wasn't set properly. Most notably in the tx_done we need to deal with a specific set of errno: ENETDOWN, EHOSTUNREACH, ENETUNREACH, ECONNREFUSED, ECONNRESET. In all those cases we can try and resend to other available peer NIs. WC-bug-id: https://jira.whamcloud.com/browse/LU-11476 Lustre-commit: 5d77f0d8dc74 ("LU-11476 lnet: set the health status correctly") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33307 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd_cb.c | 8 ++++++-- net/lnet/lnet/lib-move.c | 5 ++--- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 10a1934..abb3529 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -374,8 +374,10 @@ struct ksock_tx * tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; else if (error == -ENETDOWN || error == -EHOSTUNREACH || - error == -ENETUNREACH) - tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_DROPPED; + error == -ENETUNREACH || + error == -ECONNREFUSED || + error == -ECONNRESET) + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_DROPPED; /* for all other errors we don't want to * 
retransmit */ @@ -901,6 +903,7 @@ struct ksock_route * /* NB Routes may be ignored if connections to them failed recently */ CNETERR("No usable routes to %s\n", libcfs_id2str(id)); + tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_ERROR; return -EHOSTUNREACH; } @@ -986,6 +989,7 @@ struct ksock_route * if (!rc) return 0; + lntmsg->msg_health_status = tx->tx_hstatus; ksocknal_free_tx(tx); return -EIO; } diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index b54fbab..bbbcd8d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -770,10 +770,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, CNETERR("Dropping message for %s: peer not alive\n", libcfs_id2str(msg->msg_target)); - if (do_send) { - msg->msg_health_status = LNET_MSG_STATUS_LOCAL_DROPPED; + msg->msg_health_status = LNET_MSG_STATUS_LOCAL_DROPPED; + if (do_send) lnet_finalize(msg, -EHOSTUNREACH); - } lnet_net_lock(cpt); return -EHOSTUNREACH; From patchwork Thu Feb 27 21:10:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409913 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9100138D for ; Thu, 27 Feb 2020 21:25:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A1CEB246A0 for ; Thu, 27 Feb 2020 21:25:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A1CEB246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost 
[IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 46959348E95; Thu, 27 Feb 2020 13:22:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28FB121FBBA for ; Thu, 27 Feb 2020 13:19:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0BE5E1E88; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0A4AF46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:18 -0500 Message-Id: <1582838290-17243-151-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 150/622] lustre: lov: add debugging info for statfs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger In obd_statfs() print the device name in the debug logs for clarity. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-7770 Lustre-commit: b917406a7f0a ("LU-7770 lov: fix statfs for conf-sanity test_50b") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33369 Reviewed-by: Ben Evans Reviewed-by: Alex Zhuravlev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 14 +++++++------- fs/lustre/lov/lov_obd.c | 4 +--- 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 01eb385..742e92a 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -891,8 +891,8 @@ static inline int obd_statfs_async(struct obd_export *exp, time64_t max_age, struct ptlrpc_request_set *rqset) { - int rc = 0; struct obd_device *obd; + int rc = 0; if (!exp || !exp->exp_obd) return -EINVAL; @@ -903,8 +903,8 @@ static inline int obd_statfs_async(struct obd_export *exp, return -EOPNOTSUPP; } - CDEBUG(D_SUPER, "%s: osfs %p age %lld, max_age %lld\n", - obd->obd_name, &obd->obd_osfs, obd->obd_osfs_age, max_age); + CDEBUG(D_SUPER, "%s: age %lld, max_age %lld\n", + obd->obd_name, obd->obd_osfs_age, max_age); if (obd->obd_osfs_age < max_age) { rc = OBP(obd, statfs_async)(exp, oinfo, max_age, rqset); } else { @@ -935,20 +935,20 @@ static inline int obd_statfs(const struct lu_env *env, struct obd_export *exp, struct obd_device *obd = exp->exp_obd; int rc = 0; - if (!obd) + if (unlikely(!obd)) return -EINVAL; rc = obd_check_dev_active(obd); if (rc) return rc; - if (!obd->obd_type || !obd->obd_type->typ_dt_ops->statfs) { + if (unlikely(!obd->obd_type || !obd->obd_type->typ_dt_ops->statfs)) { CERROR("%s: no %s operation\n", obd->obd_name, __func__); return -EOPNOTSUPP; } - CDEBUG(D_SUPER, "osfs %lld, max_age %lld\n", - obd->obd_osfs_age, max_age); + CDEBUG(D_SUPER, "%s: age %lld, max_age %lld\n", + obd->obd_name, obd->obd_osfs_age, max_age); /* ignore cache if aggregated isn't expected */ if (obd->obd_osfs_age < max_age 
|| ((obd->obd_osfs.os_state & OS_STATE_SUM) && diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 9a6ffe8..a16c663 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -1122,9 +1122,7 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len, if (!lov->lov_tgts[i] || !lov->lov_tgts[i]->ltd_exp) continue; - /* ll_umount_begin() sets force flag but for lov, not - * osc. Let's pass it through - */ + /* ll_umount_begin() sets force on lov, pass to osc */ osc_obd = class_exp2obd(lov->lov_tgts[i]->ltd_exp); osc_obd->obd_force = obddev->obd_force; err = obd_iocontrol(cmd, lov->lov_tgts[i]->ltd_exp, From patchwork Thu Feb 27 21:10:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409917 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A273714BC for ; Thu, 27 Feb 2020 21:25:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8ADD7246A0 for ; Thu, 27 Feb 2020 21:25:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8ADD7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AC03B348935; Thu, 27 Feb 2020 13:22:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6BF7B21FB51 for ; Thu, 27 Feb 2020 13:19:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0EF751E89; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0D50B468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:19 -0500 Message-Id: <1582838290-17243-152-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 151/622] lnet: Decrement health on timeout X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When a response times out we want to decrement the health of the immediate next hop peer ni, so we don't use that interface if there are others available. When sending a message if there is a response tracker associated with the MD, store the next-hop-nid there. If the response times out then we can look up the peer_ni using the cached NID, and decrement its health value. 
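The mechanism described above (cache the next-hop NID in the response tracker at send time, then penalise that peer when the response times out) can be sketched in isolation. This is only a toy model under stated assumptions: every `toy_` name, the health ceiling, and the penalty step are hypothetical and are not LNet's real types or values, and locking and reference counting are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_HEALTH  1000	/* assumed ceiling, not LNet's value */
#define TOY_HEALTH_STEP 100	/* assumed penalty per timeout */

/* Hypothetical stand-in for a peer NI: a NID plus a health score. */
struct toy_peer_ni {
	uint64_t nid;
	int healthv;
};

/* Hypothetical stand-in for the response tracker attached to an MD. */
struct toy_rsp_tracker {
	uint64_t next_hop_nid;	/* cached at send time */
};

/* Send path: remember which next hop the message was handed to. */
static void toy_record_next_hop(struct toy_rsp_tracker *rspt,
				const struct toy_peer_ni *txpeer)
{
	if (rspt)
		rspt->next_hop_nid = txpeer->nid;
}

/* Linear lookup by NID; returns NULL when the peer is no longer known. */
static struct toy_peer_ni *toy_find_peer(struct toy_peer_ni *peers,
					 size_t count, uint64_t nid)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (peers[i].nid == nid)
			return &peers[i];
	}
	return NULL;
}

/*
 * Timeout path: look up the cached next hop and lower its health.
 * Returns 1 if a peer was penalised, 0 if the NID was not found.
 */
static int toy_response_timed_out(struct toy_peer_ni *peers, size_t count,
				  const struct toy_rsp_tracker *rspt)
{
	struct toy_peer_ni *lpni;

	lpni = toy_find_peer(peers, count, rspt->next_hop_nid);
	if (!lpni)
		return 0;

	lpni->healthv -= TOY_HEALTH_STEP;
	if (lpni->healthv < 0)
		lpni->healthv = 0;
	return 1;
}
```

The point of caching the NID rather than a pointer is that the peer may have gone away by the time the deadline expires, so the timeout path has to tolerate a failed lookup.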
WC-bug-id: https://jira.whamcloud.com/browse/LU-11472 Lustre-commit: 139d69141b73 ("LU-11472 lnet: Decrement health on timeout") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33308 Reviewed-by: Sonia Sharma Reviewed-by: Doug Oucharek Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + include/linux/lnet/lib-types.h | 2 ++ net/lnet/lnet/lib-move.c | 33 ++++++++++++++++++++++++++++++++- net/lnet/lnet/lib-msg.c | 24 +++++++++++++++--------- 4 files changed, 50 insertions(+), 10 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index a1dad9f..ecacd65 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -641,6 +641,7 @@ void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg, void lnet_finalize(struct lnet_msg *msg, int rc); bool lnet_send_error_simulation(struct lnet_msg *msg, enum lnet_msg_hstatus *hstatus); +void lnet_handle_remote_failure_locked(struct lnet_peer_ni *lpni); void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob, u32 msg_type); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index b2159b0..ce0caa9 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -81,6 +81,8 @@ struct lnet_rsp_tracker { struct list_head rspt_on_list; /* cpt to lock */ int rspt_cpt; + /* nid of next hop */ + lnet_nid_t rspt_next_hop_nid; /* deadline of the REPLY/ACK */ ktime_t rspt_deadline; /* parent MD */ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index bbbcd8d..548ea88 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1432,6 +1432,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, u32 send_case = sd->sd_send_case; int rc; u32 routing = send_case & REMOTE_DST; + struct lnet_rsp_tracker *rspt; /* Increment sequence number of the selected peer so that 
we * pick the next one in Round Robin. @@ -1515,6 +1516,18 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, msg->msg_hdr.dest_nid = cpu_to_le64(msg->msg_txpeer->lpni_nid); } + /* if we have response tracker block update it with the next hop + * nid + */ + if (msg->msg_md) { + rspt = msg->msg_md->md_rspt_ptr; + if (rspt) { + rspt->rspt_next_hop_nid = msg->msg_txpeer->lpni_nid; + CDEBUG(D_NET, "rspt_next_hop_nid = %s\n", + libcfs_nid2str(rspt->rspt_next_hop_nid)); + } + } + rc = lnet_post_send_locked(msg, 0); if (!rc) CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s try# %d\n", @@ -2497,6 +2510,9 @@ struct lnet_mt_event_info { if (ktime_compare(ktime_get(), rspt->rspt_deadline) >= 0 || force) { + struct lnet_peer_ni *lpni; + lnet_nid_t nid; + md = lnet_handle2md(&rspt->rspt_mdh); if (!md) { LNetInvalidateMDHandle(&rspt->rspt_mdh); @@ -2515,9 +2531,24 @@ struct lnet_mt_event_info { list_del_init(&rspt->rspt_on_list); - CNETERR("Response timed out: md = %p\n", md); + nid = rspt->rspt_next_hop_nid; + + CNETERR("Response timed out: md = %p: nid = %s\n", + md, libcfs_nid2str(nid)); LNetMDUnlink(rspt->rspt_mdh); lnet_rspt_free(rspt, i); + + /* If there is a timeout on the response + * from the next hop decrement its health + * value so that we don't use it + */ + lnet_net_lock(0); + lpni = lnet_find_peer_ni_locked(nid); + if (lpni) { + lnet_handle_remote_failure_locked(lpni); + lnet_peer_ni_decref_locked(lpni); + } + lnet_net_unlock(0); } else { lnet_res_unlock(i); break; diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 433401f..f626ca3 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -519,18 +519,13 @@ lnet_net_unlock(0); } -static void -lnet_handle_remote_failure(struct lnet_msg *msg) +void +lnet_handle_remote_failure_locked(struct lnet_peer_ni *lpni) { - struct lnet_peer_ni *lpni; - - lpni = msg->msg_txpeer; - /* lpni could be NULL if we're in the LOLND case */ if (!lpni) return; - 
lnet_net_lock(0); lnet_dec_healthv_locked(&lpni->lpni_healthv); /* add the peer NI to the recovery queue if it's not already there * and it's health value is actually below the maximum. It's @@ -539,6 +534,17 @@ * invoke recovery */ lnet_peer_ni_add_to_recoveryq_locked(lpni); +} + +static void +lnet_handle_remote_failure(struct lnet_peer_ni *lpni) +{ + /* lpni could be NULL if we're in the LOLND case */ + if (!lpni) + return; + + lnet_net_lock(0); + lnet_handle_remote_failure_locked(lpni); lnet_net_unlock(0); } @@ -679,13 +685,13 @@ * attempt a resend safely. */ case LNET_MSG_STATUS_REMOTE_DROPPED: - lnet_handle_remote_failure(msg); + lnet_handle_remote_failure(msg->msg_txpeer); goto resend; case LNET_MSG_STATUS_REMOTE_ERROR: case LNET_MSG_STATUS_REMOTE_TIMEOUT: case LNET_MSG_STATUS_NETWORK_TIMEOUT: - lnet_handle_remote_failure(msg); + lnet_handle_remote_failure(msg->msg_txpeer); return -1; default: LBUG(); From patchwork Thu Feb 27 21:10:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409921 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C9622138D for ; Thu, 27 Feb 2020 21:25:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B214B246A0 for ; Thu, 27 Feb 2020 21:25:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B214B246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 243FC21FF40; Thu, 27 Feb 2020 13:23:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C659321FB3F for ; Thu, 27 Feb 2020 13:19:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1227C1E8A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1046146C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:20 -0500 Message-Id: <1582838290-17243-153-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 152/622] lustre: quota: fix setattr project check X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong This patch is modeled on the upstream patch: ext4: fix setattr project check in fssetxattr ioctl Currently, the project quota can be changed through the fssetxattr ioctl, and the existing permission check, inode_owner_or_capable(), is clearly not enough: an ordinary user could change the project ID of a file and thereby evade project quota easily. This patch follows the same rule as xfs project quota: "Project Quota ID state is only allowed to change from within the init namespace. 
Enforce that restriction only if we are trying to change the quota ID state. Everything else is allowed in user namespaces." WC-bug-id: https://jira.whamcloud.com/browse/LU-11101 Lustre-commit: 2d3bbce0c9f3 ("LU-11101 quota: fix setattr project check") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/32730 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 42 ++++++++++++++++++++++++++++++---------- fs/lustre/llite/llite_internal.h | 1 + fs/lustre/llite/llite_lib.c | 9 +++++++++ 3 files changed, 42 insertions(+), 10 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 2fd906f..ed0470d 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -2780,6 +2780,30 @@ int ll_ioctl_fsgetxattr(struct inode *inode, unsigned int cmd, return 0; } +int ll_ioctl_check_project(struct inode *inode, struct fsxattr *fa) +{ + /* + * Project Quota ID state is only allowed to change from within the init + * namespace. Enforce that restriction only if we are trying to change + * the quota ID state. Everything else is allowed in user namespaces. 
+ */ + if (current_user_ns() == &init_user_ns) + return 0; + + if (ll_i2info(inode)->lli_projid != fa->fsx_projid) + return -EINVAL; + + if (test_bit(LLIF_PROJECT_INHERIT, &ll_i2info(inode)->lli_flags)) { + if (!(fa->fsx_xflags & FS_XFLAG_PROJINHERIT)) + return -EINVAL; + } else { + if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT) + return -EINVAL; + } + + return 0; +} + int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd, unsigned long arg) { @@ -2791,22 +2815,20 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd, int rc = 0; int flags; - /* only root could change project ID */ - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; + if (copy_from_user(&fsxattr, + (const struct fsxattr __user *)arg, + sizeof(fsxattr))) + return -EFAULT; + + rc = ll_ioctl_check_project(inode, &fsxattr); + if (rc) + return rc; op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0, 0, LUSTRE_OPC_ANY, NULL); if (IS_ERR(op_data)) return PTR_ERR(op_data); - if (copy_from_user(&fsxattr, - (const struct fsxattr __user *)arg, - sizeof(fsxattr))) { - rc = -EFAULT; - goto out_fsxattr; - } - flags = ll_xflags_to_inode_flags(fsxattr.fsx_xflags); op_data->op_attr_flags = ll_inode_to_ext_flags(flags); if (fsxattr.fsx_xflags & FS_XFLAG_PROJINHERIT) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index edb5f2a..d6fc6a29 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -829,6 +829,7 @@ int ll_migrate(struct inode *parent, struct file *file, int ll_get_fid_by_name(struct inode *parent, const char *name, int namelen, struct lu_fid *fid, struct inode **inode); int ll_inode_permission(struct inode *inode, int mask); +int ll_ioctl_check_project(struct inode *inode, struct fsxattr *fa); int ll_ioctl_fsgetxattr(struct inode *inode, unsigned int cmd, unsigned long arg); int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 
be67652..859fdf4 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2094,10 +2094,19 @@ int ll_iocontrol(struct inode *inode, struct file *file, struct md_op_data *op_data; struct cl_object *obj; struct iattr *attr; + struct fsxattr fa = { 0 }; if (get_user(flags, (int __user *)arg)) return -EFAULT; + fa.fsx_projid = ll_i2info(inode)->lli_projid; + if (flags & LUSTRE_PROJINHERIT_FL) + fa.fsx_xflags = FS_XFLAG_PROJINHERIT; + + rc = ll_ioctl_check_project(inode, &fa); + if (rc) + return rc; + op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0, 0, LUSTRE_OPC_ANY, NULL); if (IS_ERR(op_data)) From patchwork Thu Feb 27 21:10:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409947 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C776814BC for ; Thu, 27 Feb 2020 21:26:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B0276246A0 for ; Thu, 27 Feb 2020 21:26:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B0276246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C87E21FDA6; Thu, 27 Feb 2020 13:23:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 2A2C021FBDB for ; Thu, 27 Feb 2020 13:19:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 15B841E8D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1363046D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:21 -0500 Message-Id: <1582838290-17243-154-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 153/622] lnet: socklnd: dynamically set LND parameters X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sonia Sharma Currently, the socklnd parameters cannot be set dynamically. Only the default values are set which cannot be changed by deleting and re-adding the net with DLC. This patch allows setting socklnd parameters dynamically. 
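As a rough illustration of the approach taken in the diff below, each tunable is treated as "not set by the user" when it holds -1 and is then filled from the module default, and the per-peer transmit credits are capped at the per-NI maximum. The sketch is standalone and simplified: the `toy_` struct, its field names, and `TOY_UNSET` are hypothetical stand-ins, not the kernel's `struct lnet_ioctl_config_lnd_cmn_tunables`.

```c
/* Sentinel meaning "the user did not supply this value". */
#define TOY_UNSET (-1)

/* Hypothetical, simplified stand-in for the common LND tunables. */
struct toy_tunables {
	int peer_timeout;
	int max_tx_credits;
	int peer_tx_credits;
	int peer_rtr_credits;
};

/*
 * Fill every unset field from the defaults, keep user-supplied values,
 * and make sure a single peer can never hold more transmit credits
 * than the interface has in total.
 */
static void toy_resolve_tunables(struct toy_tunables *t,
				 const struct toy_tunables *defaults)
{
	if (t->peer_timeout == TOY_UNSET)
		t->peer_timeout = defaults->peer_timeout;

	if (t->max_tx_credits == TOY_UNSET)
		t->max_tx_credits = defaults->max_tx_credits;

	if (t->peer_tx_credits == TOY_UNSET)
		t->peer_tx_credits = defaults->peer_tx_credits;

	if (t->peer_tx_credits > t->max_tx_credits)
		t->peer_tx_credits = t->max_tx_credits;

	if (t->peer_rtr_credits == TOY_UNSET)
		t->peer_rtr_credits = defaults->peer_rtr_credits;
}
```

Checking each field separately, rather than guarding the whole block with a single "tunables set" flag, is what lets a user override one parameter while the others still fall back to their defaults.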
WC-bug-id: https://jira.whamcloud.com/browse/LU-11371 Lustre-commit: 1d94072c63f5 ("LU-11371 socklnd: dynamically set LND parameters") Signed-off-by: Sonia Sharma Reviewed-on: https://review.whamcloud.com/33191 Reviewed-by: James Simmons Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 72ecf80..ba5623a 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -2723,6 +2723,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) ksocknal_startup(struct lnet_ni *ni) { struct ksock_net *net; + struct lnet_ioctl_config_lnd_cmn_tunables *net_tunables; struct ksock_interface *ksi = NULL; struct lnet_inetdev *ifaces = NULL; int i = 0; @@ -2745,17 +2746,28 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) spin_lock_init(&net->ksnn_lock); net->ksnn_incarnation = ktime_get_real_ns(); ni->ni_data = net; - if (!ni->ni_net->net_tunables_set) { - ni->ni_net->net_tunables.lct_peer_timeout = + net_tunables = &ni->ni_net->net_tunables; + + if (net_tunables->lct_peer_timeout == -1) + net_tunables->lct_peer_timeout = *ksocknal_tunables.ksnd_peertimeout; - ni->ni_net->net_tunables.lct_max_tx_credits = + + if (net_tunables->lct_max_tx_credits == -1) + net_tunables->lct_max_tx_credits = *ksocknal_tunables.ksnd_credits; - ni->ni_net->net_tunables.lct_peer_tx_credits = + + if (net_tunables->lct_peer_tx_credits == -1) + net_tunables->lct_peer_tx_credits = *ksocknal_tunables.ksnd_peertxcredits; - ni->ni_net->net_tunables.lct_peer_rtr_credits = + + if (net_tunables->lct_peer_tx_credits > + net_tunables->lct_max_tx_credits) + net_tunables->lct_peer_tx_credits = + net_tunables->lct_max_tx_credits; + + if (net_tunables->lct_peer_rtr_credits == -1) + net_tunables->lct_peer_rtr_credits = 
*ksocknal_tunables.ksnd_peerrtrcredits; - ni->ni_net->net_tunables_set = true; - } rc = lnet_inet_enumerate(&ifaces); if (rc < 0) From patchwork Thu Feb 27 21:10:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409925 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2D1DE138D for ; Thu, 27 Feb 2020 21:25:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15846246A0 for ; Thu, 27 Feb 2020 21:25:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15846246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6AC40348F1C; Thu, 27 Feb 2020 13:23:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6E79521FB2D for ; Thu, 27 Feb 2020 13:19:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 18E351E96; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1694346A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:22 -0500 Message-Id: 
<1582838290-17243-155-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 154/622] lustre: flr: add mirror write command X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam This change allows issuing a RESYNC lease write lock to notify the MDS to prepare the destination mirror for the write (instantiating the components of the mirror); the client then copies data from a file or STDIN to the specified mirror of the mirrored file. After the data copy, a RESYNC_DONE lease unlock is issued to the MDS to update the layout of the mirrored file.
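
The mirror ID travels in what used to be a 32-bit padding field of the reint records, which is now a 16-bit ID followed by 16 bits of padding. The sketch below illustrates why that keeps the wire layout compatible. It is a simplified assumption-laden model: `struct tail_old`, `struct tail_new` and `swab16()` are hypothetical and cover only the last two fields, not the full `mdt_rec_reint` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tail of the record before the change. */
struct tail_old {
	uint32_t umask;
	uint32_t padding;
};

/* Hypothetical tail after the change: padding split into id + padding. */
struct tail_new {
	uint32_t umask;
	uint16_t mirror_id;
	uint16_t padding;
};

/* The record size must not change, or old and new peers would disagree. */
static_assert(sizeof(struct tail_old) == sizeof(struct tail_new),
	      "record size changed");

/* The new field starts exactly where the old padding started. */
static_assert(offsetof(struct tail_new, mirror_id) ==
	      offsetof(struct tail_old, padding),
	      "mirror_id is not at the old padding offset");

/* The remaining padding follows the 2-byte id. */
static_assert(offsetof(struct tail_new, padding) ==
	      offsetof(struct tail_new, mirror_id) + 2,
	      "padding does not follow mirror_id");

/* A 16-bit field needs a 16-bit byte swap on mixed-endian links. */
static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}
```

The compile-time checks play the same role as the `LASSERTF` offset and size checks updated in `wiretest.c`: a layout mistake fails early instead of corrupting RPCs.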
WC-bug-id: https://jira.whamcloud.com/browse/LU-10258 Lustre-commit: 14171e787dd0 ("LU-10258 lfs: lfs mirror write command") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/33219 Reviewed-by: Andreas Dilger Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 10 ++++++++-- fs/lustre/mdc/mdc_reint.c | 1 + fs/lustre/ptlrpc/pack_generic.c | 1 + fs/lustre/ptlrpc/wiretest.c | 16 ++++++++++++---- include/uapi/linux/lustre/lustre_idl.h | 6 ++++-- include/uapi/linux/lustre/lustre_user.h | 10 ++++++++++ 6 files changed, 36 insertions(+), 8 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index ed0470d..9de37d2 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1162,10 +1162,11 @@ static int ll_lease_close(struct obd_client_handle *och, struct inode *inode, * After lease is taken, send the RPC MDS_REINT_RESYNC to the MDT */ static int ll_lease_file_resync(struct obd_client_handle *och, - struct inode *inode) + struct inode *inode, unsigned long arg) { struct ll_sb_info *sbi = ll_i2sbi(inode); struct md_op_data *op_data; + struct ll_ioc_lease_id ioc; u64 data_version_unused; int rc; @@ -1174,6 +1175,10 @@ static int ll_lease_file_resync(struct obd_client_handle *och, if (IS_ERR(op_data)) return PTR_ERR(op_data); + if (copy_from_user(&ioc, (struct ll_ioc_lease_id __user *)arg, + sizeof(ioc))) + return -EFAULT; + /* before starting file resync, it's necessary to clean up page cache * in client memory, otherwise once the layout version is increased, * writing back cached data will be denied the OSTs. 
@@ -1183,6 +1188,7 @@ static int ll_lease_file_resync(struct obd_client_handle *och, goto out; op_data->op_lease_handle = och->och_lease_handle; + op_data->op_mirror_id = ioc.lil_mirror_id; rc = md_file_resync(sbi->ll_md_exp, op_data); if (rc) goto out; @@ -3048,7 +3054,7 @@ static long ll_file_set_lease(struct file *file, struct ll_ioc_lease *ioc, return PTR_ERR(och); if (ioc->lil_flags & LL_LEASE_RESYNC) { - rc = ll_lease_file_resync(och, inode); + rc = ll_lease_file_resync(och, inode, arg); if (rc) { ll_lease_close(och, inode, NULL); return rc; diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 5d82449..062685c 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -455,6 +455,7 @@ int mdc_file_resync(struct obd_export *exp, struct md_op_data *op_data) rec->rs_cap = op_data->op_cap.cap[0]; rec->rs_fid = op_data->op_fid1; rec->rs_bias = op_data->op_bias; + rec->rs_mirror_id = op_data->op_mirror_id; lock = ldlm_handle2lock(&op_data->op_lease_handle); if (lock) { diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index d93dbe1..231cb26 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1917,6 +1917,7 @@ void lustre_swab_mdt_rec_reint (struct mdt_rec_reint *rr) __swab32s(&rr->rr_flags); __swab32s(&rr->rr_flags_h); __swab32s(&rr->rr_umask); + __swab16s(&rr->rr_mirror_id); BUILD_BUG_ON(offsetof(typeof(*rr), rr_padding_4) == 0); }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index f72e5fc..66dce80 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -2854,9 +2854,13 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct mdt_rec_resync, rs_padding8)); LASSERTF((int)sizeof(((struct mdt_rec_resync *)0)->rs_padding8) == 4, "found %lld\n", (long long)(int)sizeof(((struct mdt_rec_resync *)0)->rs_padding8)); - LASSERTF((int)offsetof(struct mdt_rec_resync, rs_padding9) == 132, "found %lld\n", + 
LASSERTF((int)offsetof(struct mdt_rec_resync, rs_mirror_id) == 132, "found %lld\n", + (long long)(int)offsetof(struct mdt_rec_resync, rs_mirror_id)); + LASSERTF((int)sizeof(((struct mdt_rec_resync *)0)->rs_mirror_id) == 2, "found %lld\n", + (long long)(int)sizeof(((struct mdt_rec_resync *)0)->rs_mirror_id)); + LASSERTF((int)offsetof(struct mdt_rec_resync, rs_padding9) == 134, "found %lld\n", (long long)(int)offsetof(struct mdt_rec_resync, rs_padding9)); - LASSERTF((int)sizeof(((struct mdt_rec_resync *)0)->rs_padding9) == 4, "found %lld\n", + LASSERTF((int)sizeof(((struct mdt_rec_resync *)0)->rs_padding9) == 2, "found %lld\n", (long long)(int)sizeof(((struct mdt_rec_resync *)0)->rs_padding9)); /* Checks for struct mdt_rec_reint */ @@ -2950,9 +2954,13 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct mdt_rec_reint, rr_umask)); LASSERTF((int)sizeof(((struct mdt_rec_reint *)0)->rr_umask) == 4, "found %lld\n", (long long)(int)sizeof(((struct mdt_rec_reint *)0)->rr_umask)); - LASSERTF((int)offsetof(struct mdt_rec_reint, rr_padding_4) == 132, "found %lld\n", + LASSERTF((int)offsetof(struct mdt_rec_reint, rr_mirror_id) == 132, "found %lld\n", + (long long)(int)offsetof(struct mdt_rec_reint, rr_mirror_id)); + LASSERTF((int)sizeof(((struct mdt_rec_reint *)0)->rr_mirror_id) == 2, "found %lld\n", + (long long)(int)sizeof(((struct mdt_rec_reint *)0)->rr_mirror_id)); + LASSERTF((int)offsetof(struct mdt_rec_reint, rr_padding_4) == 134, "found %lld\n", (long long)(int)offsetof(struct mdt_rec_reint, rr_padding_4)); - LASSERTF((int)sizeof(((struct mdt_rec_reint *)0)->rr_padding_4) == 4, "found %lld\n", + LASSERTF((int)sizeof(((struct mdt_rec_reint *)0)->rr_padding_4) == 2, "found %lld\n", (long long)(int)sizeof(((struct mdt_rec_reint *)0)->rr_padding_4)); /* Checks for struct lmv_desc */ diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index d46a921..8330fe1 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ 
b/include/uapi/linux/lustre/lustre_idl.h @@ -1876,7 +1876,8 @@ struct mdt_rec_resync { __u32 rs_padding6; /* rr_flags */ __u32 rs_padding7; /* rr_flags_h */ __u32 rs_padding8; /* rr_umask */ - __u32 rs_padding9; /* rr_padding_4 */ + __u16 rs_mirror_id; + __u16 rs_padding9; /* rr_padding_4 */ }; /* @@ -1910,7 +1911,8 @@ struct mdt_rec_reint { __u32 rr_flags; __u32 rr_flags_h; __u32 rr_umask; - __u32 rr_padding_4; /* also fix lustre_swab_mdt_rec_reint */ + __u16 rr_mirror_id; + __u16 rr_padding_4; /* also fix lustre_swab_mdt_rec_reint */ }; /* lmv structures */ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index db751d8..5551cbf 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -277,6 +277,16 @@ struct ll_ioc_lease { __u32 lil_ids[0]; }; +struct ll_ioc_lease_id { + __u32 lil_mode; + __u32 lil_flags; + __u32 lil_count; + __u16 lil_mirror_id; + __u16 lil_padding1; + __u64 lil_padding2; + __u32 lil_ids[0]; +}; + /* * The ioctl naming rules: * LL_* - works on the currently opened filehandle instead of parent dir From patchwork Thu Feb 27 21:10:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409929 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 024E414BC for ; Thu, 27 Feb 2020 21:25:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DF018246A0 for ; Thu, 27 Feb 2020 21:25:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DF018246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EA332348F48; Thu, 27 Feb 2020 13:23:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C465821FB2D for ; Thu, 27 Feb 2020 13:19:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1AE251EAF; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 19E2C468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:23 -0500 Message-Id: <1582838290-17243-156-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 155/622] lnet: properly error check sensitivity X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Reject setting health sensitivity greater than the maximum health value. 
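
The check amounts to validating the value before storing it and returning an error otherwise. A minimal sketch, assuming a placeholder limit: `EXAMPLE_MAX_HEALTH_VALUE` and `set_sensitivity()` are hypothetical names, and the limit is not claimed to match `LNET_MAX_HEALTH_VALUE`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder limit for illustration only. */
#define EXAMPLE_MAX_HEALTH_VALUE 1000UL

/*
 * Store a new sensitivity only if it is within range.
 * Returns 0 on success and -EINVAL if the value is too large,
 * in which case the stored value is left untouched.
 */
static int set_sensitivity(unsigned long *sensitivity, unsigned long value)
{
	assert(sensitivity != NULL);

	if (value > EXAMPLE_MAX_HEALTH_VALUE)
		return -EINVAL;

	*sensitivity = value;
	return 0;
}
```

Rejecting the value up front means a bad setting is reported to the caller rather than silently stored.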
WC-bug-id: https://jira.whamcloud.com/browse/LU-11530 Lustre-commit: a5c1cd5ec240 ("LU-11530 lnet: properly error check sensitivity") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33392 Reviewed-by: Sonia Sharma Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 21e0175..a2c648e 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -175,9 +175,11 @@ static int lnet_discover(struct lnet_process_id id, u32 force, return 0; } - if (value == *sensitivity) { + if (value > LNET_MAX_HEALTH_VALUE) { mutex_unlock(&the_lnet.ln_api_mutex); - return 0; + CERROR("Invalid health value. Maximum: %d value = %lu\n", + LNET_MAX_HEALTH_VALUE, value); + return -EINVAL; } *sensitivity = value; From patchwork Thu Feb 27 21:10:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409951 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 79DB314E3 for ; Thu, 27 Feb 2020 21:26:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 627FA246A0 for ; Thu, 27 Feb 2020 21:26:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 627FA246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
C2EDF349026; Thu, 27 Feb 2020 13:23:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 162B121FB2D for ; Thu, 27 Feb 2020 13:19:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1E1C21EB0; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1C87C46F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:24 -0500 Message-Id: <1582838290-17243-157-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 156/622] lustre: llite: add lock for dir layout data X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Directory layout data should be accessed under a lock, because directory migration may change it; accessing it without a lock may cause a crash. Introduce an rw_semaphore 'lli_lsm_sem': any MD operation that uses directory layout data takes the read lock, and ll_update_lsm_md() takes the write lock when setting the lsm.
WC-bug-id: https://jira.whamcloud.com/browse/LU-4684 Lustre-commit: ae828cd3b092 ("LU-4684 llite: add lock for dir layout data") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/32946 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_lmv.h | 16 ++++ fs/lustre/include/obd.h | 2 + fs/lustre/llite/dir.c | 29 +++---- fs/lustre/llite/file.c | 5 +- fs/lustre/llite/llite_internal.h | 3 + fs/lustre/llite/llite_lib.c | 168 ++++++++++++++++++++------------------- fs/lustre/llite/namei.c | 2 + fs/lustre/llite/statahead.c | 137 ++++++++++++++++--------------- fs/lustre/lmv/lmv_obd.c | 2 - 9 files changed, 199 insertions(+), 165 deletions(-) diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index ff279e1..1246c25 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -81,6 +81,22 @@ struct lmv_stripe_md { return true; } +static inline void lsm_md_dump(int mask, const struct lmv_stripe_md *lsm) +{ + int i; + + CDEBUG(mask, + "magic %#x stripe count %d master mdt %d hash type %#x version %d migrate offset %d migrate hash %#x pool %s\n", + lsm->lsm_md_magic, lsm->lsm_md_stripe_count, + lsm->lsm_md_master_mdt_index, lsm->lsm_md_hash_type, + lsm->lsm_md_layout_version, lsm->lsm_md_migrate_offset, + lsm->lsm_md_migrate_hash, lsm->lsm_md_pool_name); + + for (i = 0; i < lsm->lsm_md_stripe_count; i++) + CDEBUG(mask, "stripe[%d] "DFID"\n", + i, PFID(&lsm->lsm_md_oinfo[i].lmo_fid)); +} + union lmv_mds_md; void lmv_free_memmd(struct lmv_stripe_md *lsm); diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 2587136..4829e11 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -741,6 +741,8 @@ struct md_op_data { s64 op_mod_time; const char *op_name; size_t op_namelen; + struct rw_semaphore *op_mea1_sem; + struct rw_semaphore *op_mea2_sem; struct lmv_stripe_md *op_mea1; struct lmv_stripe_md *op_mea2; 
u32 op_suppgids[2]; diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 55a1efb..3da9d14 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -298,6 +298,7 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) int hash64 = sbi->ll_flags & LL_SBI_64BIT_HASH; bool api32 = ll_need_32bit_api(sbi); struct md_op_data *op_data; + struct lu_fid pfid = { 0 }; int rc; CDEBUG(D_VFSTRACE, @@ -313,14 +314,7 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) goto out; } - op_data = ll_prep_md_op_data(NULL, inode, inode, NULL, 0, 0, - LUSTRE_OPC_ANY, inode); - if (IS_ERR(op_data)) { - rc = PTR_ERR(op_data); - goto out; - } - - if (unlikely(op_data->op_mea1)) { + if (unlikely(ll_i2info(inode)->lli_lsm_md)) { /* * This is only needed for striped dir to fill .., * see lmv_read_page @@ -332,21 +326,28 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) parent = file_dentry(filp)->d_parent->d_inode; if (ll_have_md_lock(parent, &ibits, LCK_MINMODE)) - op_data->op_fid3 = *ll_inode2fid(parent); + pfid = *ll_inode2fid(parent); } /* * If it can not find in cache, do lookup .. 
on the master * object */ - if (fid_is_zero(&op_data->op_fid3)) { - rc = ll_dir_get_parent_fid(inode, &op_data->op_fid3); - if (rc) { - ll_finish_md_op_data(op_data); + if (fid_is_zero(&pfid)) { + rc = ll_dir_get_parent_fid(inode, &pfid); + if (rc) return rc; - } } } + + op_data = ll_prep_md_op_data(NULL, inode, inode, NULL, 0, 0, + LUSTRE_OPC_ANY, inode); + if (IS_ERR(op_data)) { + rc = PTR_ERR(op_data); + goto out; + } + op_data->op_fid3 = pfid; + ctx->pos = pos; rc = ll_dir_read(inode, &pos, op_data, ctx); pos = ctx->pos; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 9de37d2..e1fba1c 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4080,12 +4080,15 @@ static int ll_inode_revalidate(struct dentry *dentry, enum ldlm_intent_flags op) static int ll_merge_md_attr(struct inode *inode) { + struct ll_inode_info *lli = ll_i2info(inode); struct cl_attr attr = { 0 }; int rc; - LASSERT(ll_i2info(inode)->lli_lsm_md); + LASSERT(lli->lli_lsm_md); + down_read(&lli->lli_lsm_sem); rc = md_merge_attr(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md, &attr, ll_md_blocking_ast); + up_read(&lli->lli_lsm_sem); if (rc) return rc; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index d6fc6a29..d41531b 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -168,6 +168,8 @@ struct ll_inode_info { unsigned int lli_sa_enabled:1; /* generation for statahead */ unsigned int lli_sa_generation; + /* rw lock protects lli_lsm_md */ + struct rw_semaphore lli_lsm_sem; /* directory stripe information */ struct lmv_stripe_md *lli_lsm_md; /* default directory stripe offset. 
This is extracted @@ -905,6 +907,7 @@ enum { LUSTRE_OPC_ANY = 5, }; +void ll_unlock_md_op_lsm(struct md_op_data *op_data); struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, struct inode *i1, struct inode *i2, const char *name, size_t namelen, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 859fdf4..ed2d1c6 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -933,6 +933,7 @@ void ll_lli_init(struct ll_inode_info *lli) lli->lli_opendir_pid = 0; lli->lli_sa_enabled = 0; lli->lli_def_stripe_offset = -1; + init_rwsem(&lli->lli_lsm_sem); } else { mutex_init(&lli->lli_size_mutex); lli->lli_symlink_name = NULL; @@ -1237,10 +1238,17 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb, static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) { struct lmv_stripe_md *lsm = md->lmv; + struct ll_inode_info *lli = ll_i2info(inode); struct lu_fid *fid; int i; LASSERT(lsm); + + CDEBUG(D_INODE, "%s: "DFID" set dir layout:\n", + ll_get_fsname(inode->i_sb, NULL, 0), + PFID(&lli->lli_fid)); + lsm_md_dump(D_INODE, lsm); + /* * XXX sigh, this lsm_root initialization should be in * LMV layer, but it needs ll_iget right now, so we @@ -1260,10 +1268,16 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root); lsm->lsm_md_oinfo[i].lmo_root = NULL; + while (i-- > 0) { + iput(lsm->lsm_md_oinfo[i].lmo_root); + lsm->lsm_md_oinfo[i].lmo_root = NULL; + } return rc; } } + lli->lli_lsm_md = lsm; + return 0; } @@ -1271,7 +1285,7 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) { struct ll_inode_info *lli = ll_i2info(inode); struct lmv_stripe_md *lsm = md->lmv; - int rc; + int rc = 0; LASSERT(S_ISDIR(inode->i_mode)); CDEBUG(D_INODE, "update lsm %p of " DFID "\n", lli->lli_lsm_md, @@ -1284,53 +1298,43 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) if (!lsm) return 0; - /* Compare the old and new 
stripe information */ + /* + * normally dir layout doesn't change, only take read lock to check + * that to avoid blocking other MD operations. + */ + if (lli->lli_lsm_md) + down_read(&lli->lli_lsm_sem); + else + down_write(&lli->lli_lsm_sem); + + /* + * if dir layout mismatch, check whether version is increased, which + * means layout is changed, this happens in dir migration and lfsck. + */ if (lli->lli_lsm_md && !lsm_md_eq(lli->lli_lsm_md, lsm)) { - struct lmv_stripe_md *old_lsm = lli->lli_lsm_md; - bool layout_changed = lsm->lsm_md_layout_version > - old_lsm->lsm_md_layout_version; - int mask = layout_changed ? D_INODE : D_ERROR; - int idx; - - CDEBUG(mask, - "%s: inode@%p "DFID" lmv layout %s magic %#x/%#x stripe count %d/%d master_mdt %d/%d hash_type %#x/%#x version %d/%d migrate offset %d/%d migrate hash %#x/%#x pool %s/%s\n", - ll_get_fsname(inode->i_sb, NULL, 0), inode, - PFID(&lli->lli_fid), - layout_changed ? "changed" : "mismatch", - lsm->lsm_md_magic, old_lsm->lsm_md_magic, - lsm->lsm_md_stripe_count, - old_lsm->lsm_md_stripe_count, - lsm->lsm_md_master_mdt_index, - old_lsm->lsm_md_master_mdt_index, - lsm->lsm_md_hash_type, old_lsm->lsm_md_hash_type, - lsm->lsm_md_layout_version, - old_lsm->lsm_md_layout_version, - lsm->lsm_md_migrate_offset, - old_lsm->lsm_md_migrate_offset, - lsm->lsm_md_migrate_hash, - old_lsm->lsm_md_migrate_hash, - lsm->lsm_md_pool_name, - old_lsm->lsm_md_pool_name); - - for (idx = 0; idx < old_lsm->lsm_md_stripe_count; idx++) - CDEBUG(mask, "old stripe[%d] "DFID"\n", - idx, PFID(&old_lsm->lsm_md_oinfo[idx].lmo_fid)); - - for (idx = 0; idx < lsm->lsm_md_stripe_count; idx++) - CDEBUG(mask, "new stripe[%d] "DFID"\n", - idx, PFID(&lsm->lsm_md_oinfo[idx].lmo_fid)); - - if (!layout_changed) - return -EINVAL; + if (lsm->lsm_md_layout_version <= + lli->lli_lsm_md->lsm_md_layout_version) { + CERROR("%s: " DFID " dir layout mismatch:\n", + ll_get_fsname(inode->i_sb, NULL, 0), + PFID(&lli->lli_fid)); + lsm_md_dump(D_ERROR, lli->lli_lsm_md); 
+ lsm_md_dump(D_ERROR, lsm); + rc = -EINVAL; + goto unlock; + } + /* layout changed, switch to write lock */ + up_read(&lli->lli_lsm_sem); + down_write(&lli->lli_lsm_sem); ll_dir_clear_lsm_md(inode); } - /* set the directory layout */ + /* set directory layout */ if (!lli->lli_lsm_md) { struct cl_attr *attr; rc = ll_init_lsm_md(inode, md); + up_write(&lli->lli_lsm_sem); if (rc) return rc; @@ -1339,18 +1343,25 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) * will not free this lsm */ md->lmv = NULL; - lli->lli_lsm_md = lsm; + + /* + * md_merge_attr() may take long, since lsm is already set, + * switch to read lock. + */ + down_read(&lli->lli_lsm_sem); attr = kzalloc(sizeof(*attr), GFP_NOFS); - if (!attr) - return -ENOMEM; + if (!attr) { + rc = -ENOMEM; + goto unlock; + } /* validate the lsm */ rc = md_merge_attr(ll_i2mdexp(inode), lsm, attr, ll_md_blocking_ast); if (rc) { kfree(attr); - return rc; + goto unlock; } if (md->body->mbo_valid & OBD_MD_FLNLINK) @@ -1365,47 +1376,11 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) md->body->mbo_mtime = attr->cat_mtime; kfree(attr); - - CDEBUG(D_INODE, "Set lsm %p magic %x to " DFID "\n", lsm, - lsm->lsm_md_magic, PFID(ll_inode2fid(inode))); - return 0; } +unlock: + up_read(&lli->lli_lsm_sem); - /* Compare the old and new stripe information */ - if (!lsm_md_eq(lli->lli_lsm_md, lsm)) { - struct lmv_stripe_md *old_lsm = lli->lli_lsm_md; - int idx; - - CERROR("%s: inode " DFID "(%p)'s lmv layout mismatch (%p)/(%p) magic:0x%x/0x%x stripe count: %d/%d master_mdt: %d/%d hash_type:0x%x/0x%x layout: 0x%x/0x%x pool:%s/%s\n", - ll_get_fsname(inode->i_sb, NULL, 0), PFID(&lli->lli_fid), - inode, lsm, old_lsm, - lsm->lsm_md_magic, old_lsm->lsm_md_magic, - lsm->lsm_md_stripe_count, - old_lsm->lsm_md_stripe_count, - lsm->lsm_md_master_mdt_index, - old_lsm->lsm_md_master_mdt_index, - lsm->lsm_md_hash_type, old_lsm->lsm_md_hash_type, - lsm->lsm_md_layout_version, - 
old_lsm->lsm_md_layout_version, - lsm->lsm_md_pool_name, - old_lsm->lsm_md_pool_name); - - for (idx = 0; idx < old_lsm->lsm_md_stripe_count; idx++) { - CERROR("%s: sub FIDs in old lsm idx %d, old: " DFID "\n", - ll_get_fsname(inode->i_sb, NULL, 0), idx, - PFID(&old_lsm->lsm_md_oinfo[idx].lmo_fid)); - } - - for (idx = 0; idx < lsm->lsm_md_stripe_count; idx++) { - CERROR("%s: sub FIDs in new lsm idx %d, new: " DFID "\n", - ll_get_fsname(inode->i_sb, NULL, 0), idx, - PFID(&lsm->lsm_md_oinfo[idx].lmo_fid)); - } - - return -EIO; - } - - return 0; + return rc; } void ll_clear_inode(struct inode *inode) @@ -2417,6 +2392,23 @@ int ll_obd_statfs(struct inode *inode, void __user *arg) return rc; } +/* + * this is normally called in ll_fini_md_op_data(), but sometimes it needs to + * be called early to avoid deadlock. + */ +void ll_unlock_md_op_lsm(struct md_op_data *op_data) +{ + if (op_data->op_mea2_sem) { + up_read(op_data->op_mea2_sem); + op_data->op_mea2_sem = NULL; + } + + if (op_data->op_mea1_sem) { + up_read(op_data->op_mea1_sem); + op_data->op_mea1_sem = NULL; + } +} + /* this function prepares md_op_data hint for passing ot down to MD stack. 
*/ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, struct inode *i1, struct inode *i2, @@ -2444,7 +2436,10 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, ll_i2gids(op_data->op_suppgids, i1, i2); op_data->op_fid1 = *ll_inode2fid(i1); op_data->op_default_stripe_offset = -1; + if (S_ISDIR(i1->i_mode)) { + down_read(&ll_i2info(i1)->lli_lsm_sem); + op_data->op_mea1_sem = &ll_i2info(i1)->lli_lsm_sem; op_data->op_mea1 = ll_i2info(i1)->lli_lsm_md; if (opc == LUSTRE_OPC_MKDIR) op_data->op_default_stripe_offset = @@ -2453,8 +2448,14 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, if (i2) { op_data->op_fid2 = *ll_inode2fid(i2); - if (S_ISDIR(i2->i_mode)) + if (S_ISDIR(i2->i_mode)) { + if (i2 != i1) { + down_read(&ll_i2info(i2)->lli_lsm_sem); + op_data->op_mea2_sem = + &ll_i2info(i2)->lli_lsm_sem; + } op_data->op_mea2 = ll_i2info(i2)->lli_lsm_md; + } } else { fid_zero(&op_data->op_fid2); } @@ -2483,6 +2484,7 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, void ll_finish_md_op_data(struct md_op_data *op_data) { + ll_unlock_md_op_lsm(op_data); security_release_secctx(op_data->op_file_secctx, op_data->op_file_secctx_size); kfree(op_data); diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 530c2df..3e3fbd9 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -777,6 +777,8 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, goto out; } + /* dir layout may change */ + ll_unlock_md_op_lsm(op_data); rc = ll_lookup_it_finish(req, it, parent, &dentry); if (rc != 0) { ll_intent_release(it); diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 122b9d8..1de62b5 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -332,6 +332,58 @@ static void sa_put(struct ll_statahead_info *sai, struct sa_entry *entry, return (index == sai->sai_index_wait); } +/* finish async stat RPC arguments */ +static void 
sa_fini_data(struct md_enqueue_info *minfo) +{ + ll_unlock_md_op_lsm(&minfo->mi_data); + iput(minfo->mi_dir); + kfree(minfo); +} + +static int ll_statahead_interpret(struct ptlrpc_request *req, + struct md_enqueue_info *minfo, int rc); + +/* + * prepare arguments for async stat RPC. + */ +static struct md_enqueue_info * +sa_prep_data(struct inode *dir, struct inode *child, struct sa_entry *entry) +{ + struct md_enqueue_info *minfo; + struct ldlm_enqueue_info *einfo; + struct md_op_data *op_data; + + minfo = kzalloc(sizeof(*minfo), GFP_NOFS); + if (!minfo) + return ERR_PTR(-ENOMEM); + + op_data = ll_prep_md_op_data(&minfo->mi_data, dir, child, + entry->se_qstr.name, entry->se_qstr.len, 0, + LUSTRE_OPC_ANY, NULL); + if (IS_ERR(op_data)) { + kfree(minfo); + return (struct md_enqueue_info *)op_data; + } + + if (!child) + op_data->op_fid2 = entry->se_fid; + + minfo->mi_it.it_op = IT_GETATTR; + minfo->mi_dir = igrab(dir); + minfo->mi_cb = ll_statahead_interpret; + minfo->mi_cbdata = entry; + + einfo = &minfo->mi_einfo; + einfo->ei_type = LDLM_IBITS; + einfo->ei_mode = it_to_lock_mode(&minfo->mi_it); + einfo->ei_cb_bl = ll_md_blocking_ast; + einfo->ei_cb_cp = ldlm_completion_ast; + einfo->ei_cb_gl = NULL; + einfo->ei_cbdata = NULL; + + return minfo; +} + /* * release resources used in async stat RPC, update entry state and wakeup if * scanner process it waiting on this entry. 
@@ -348,8 +400,7 @@ static void sa_put(struct ll_statahead_info *sai, struct sa_entry *entry, if (minfo) { entry->se_minfo = NULL; ll_intent_release(&minfo->mi_it); - iput(minfo->mi_dir); - kfree(minfo); + sa_fini_data(minfo); } if (req) { @@ -685,17 +736,16 @@ static int ll_statahead_interpret(struct ptlrpc_request *req, if (rc) { ll_intent_release(it); - iput(dir); - kfree(minfo); + sa_fini_data(minfo); } else { - /* - * release ibits lock ASAP to avoid deadlock when statahead + /* release ibits lock ASAP to avoid deadlock when statahead * thread enqueues lock on parent in readdir and another * process enqueues lock on child with parent lock held, eg. * unlink. */ handle = it->it_lock_handle; ll_intent_drop_lock(it); + ll_unlock_md_op_lsm(&minfo->mi_data); } spin_lock(&lli->lli_sa_lock); @@ -729,54 +779,6 @@ static int ll_statahead_interpret(struct ptlrpc_request *req, return rc; } -/* finish async stat RPC arguments */ -static void sa_fini_data(struct md_enqueue_info *minfo) -{ - iput(minfo->mi_dir); - kfree(minfo); -} - -/** - * prepare arguments for async stat RPC. 
- */ -static struct md_enqueue_info * -sa_prep_data(struct inode *dir, struct inode *child, struct sa_entry *entry) -{ - struct md_enqueue_info *minfo; - struct ldlm_enqueue_info *einfo; - struct md_op_data *op_data; - - minfo = kzalloc(sizeof(*minfo), GFP_NOFS); - if (!minfo) - return ERR_PTR(-ENOMEM); - - op_data = ll_prep_md_op_data(&minfo->mi_data, dir, child, - entry->se_qstr.name, entry->se_qstr.len, 0, - LUSTRE_OPC_ANY, NULL); - if (IS_ERR(op_data)) { - kfree(minfo); - return (struct md_enqueue_info *)op_data; - } - - if (!child) - op_data->op_fid2 = entry->se_fid; - - minfo->mi_it.it_op = IT_GETATTR; - minfo->mi_dir = igrab(dir); - minfo->mi_cb = ll_statahead_interpret; - minfo->mi_cbdata = entry; - - einfo = &minfo->mi_einfo; - einfo->ei_type = LDLM_IBITS; - einfo->ei_mode = it_to_lock_mode(&minfo->mi_it); - einfo->ei_cb_bl = ll_md_blocking_ast; - einfo->ei_cb_cp = ldlm_completion_ast; - einfo->ei_cb_gl = NULL; - einfo->ei_cbdata = NULL; - - return minfo; -} - /* async stat for file not found in dcache */ static int sa_lookup(struct inode *dir, struct sa_entry *entry) { @@ -818,22 +820,20 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry, if (d_mountpoint(dentry)) return 1; + minfo = sa_prep_data(dir, inode, entry); + if (IS_ERR(minfo)) + return PTR_ERR(minfo); + entry->se_inode = igrab(inode); rc = md_revalidate_lock(ll_i2mdexp(dir), &it, ll_inode2fid(inode), NULL); if (rc == 1) { entry->se_handle = it.it_lock_handle; ll_intent_release(&it); + sa_fini_data(minfo); return 1; } - minfo = sa_prep_data(dir, inode, entry); - if (IS_ERR(minfo)) { - entry->se_inode = NULL; - iput(inode); - return PTR_ERR(minfo); - } - rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo); if (rc) { entry->se_inode = NULL; @@ -982,10 +982,9 @@ static int ll_statahead_thread(void *arg) CDEBUG(D_READA, "statahead thread starting: sai %p, parent %pd\n", sai, parent); - op_data = ll_prep_md_op_data(NULL, dir, dir, NULL, 0, 0, - LUSTRE_OPC_ANY, dir); - if 
(IS_ERR(op_data)) { - rc = PTR_ERR(op_data); + op_data = kzalloc(sizeof(*op_data), GFP_NOFS); + if (!op_data) { + rc = -ENOMEM; goto out; } @@ -993,8 +992,16 @@ static int ll_statahead_thread(void *arg) struct lu_dirpage *dp; struct lu_dirent *ent; + op_data = ll_prep_md_op_data(op_data, dir, dir, NULL, 0, 0, + LUSTRE_OPC_ANY, dir); + if (IS_ERR(op_data)) { + rc = PTR_ERR(op_data); + break; + } + sai->sai_in_readpage = 1; page = ll_get_dir_page(dir, op_data, pos); + ll_unlock_md_op_lsm(op_data); sai->sai_in_readpage = 0; if (IS_ERR(page)) { rc = PTR_ERR(page); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 81b86a0..e98f33d 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1901,8 +1901,6 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, int rc; LASSERT(op_data->op_cli_flags & CLI_MIGRATE); - LASSERTF(fid_is_sane(&op_data->op_fid3), "invalid FID "DFID"\n", - PFID(&op_data->op_fid3)); CDEBUG(D_INODE, "MIGRATE "DFID"/%.*s\n", PFID(&op_data->op_fid1), (int)namelen, name); From patchwork Thu Feb 27 21:10:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410237 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40B7C92A for ; Thu, 27 Feb 2020 21:33:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2986224677 for ; Thu, 27 Feb 2020 21:33:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2986224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9F4BE34927B; Thu, 27 Feb 2020 13:28:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6DF9721FB2D for ; Thu, 27 Feb 2020 13:19:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 21BA81EB2; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1F69B46C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:25 -0500 Message-Id: <1582838290-17243-158-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 157/622] lnet: configure recovery interval X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Added a module parameter to configure the interval between each recovery ping. Some sites might not want to ping failed NIDs once a second and might desire a longer interval. The interval defaults to 1 second. 
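The scheduling idea described here can be sketched in plain C. This is only an illustration, not the code in this patch: struct recovery_timer and both helper functions are hypothetical names, and time is passed in as plain seconds instead of being read from the kernel clock. It assumes the behaviour the patch describes: intervals below 1 second are rejected, and recovery runs only once the configured interval has elapsed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the monitor thread's recovery bookkeeping. */
struct recovery_timer {
	long long next_due;	/* time in seconds of the next recovery pass */
	unsigned int interval;	/* seconds between passes, at least 1 */
};

/* Reject intervals shorter than one second and keep the old value. */
static bool recovery_timer_set_interval(struct recovery_timer *t,
					unsigned int secs)
{
	assert(t);
	if (secs < 1)
		return false;
	t->interval = secs;
	return true;
}

/* Return true if a recovery pass is due at 'now', and arm the next one. */
static bool recovery_timer_due(struct recovery_timer *t, long long now)
{
	assert(t);
	if (now < t->next_due)
		return false;
	t->next_due = now + t->interval;
	return true;
}
```

With an interval of 5, a pass that runs at t=100 is not repeated at t=104 and becomes due again at t=105, so a site can reduce recovery pings just by raising the interval.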
The monitor thread now wakes up at the smallest interval it needs to monitor. WC-bug-id: https://jira.whamcloud.com/browse/LU-11468 Lustre-commit: dc1f5f08b420 ("LU-11468 lnet: configure recovery interval") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33309 Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + net/lnet/lnet/api-ni.c | 52 +++++++++++++++++++++++++++++++++++++++++++ net/lnet/lnet/lib-move.c | 24 +++++++++++++------- 3 files changed, 69 insertions(+), 8 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index ecacd65..26095a6 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -502,6 +502,7 @@ struct lnet_ni * extern unsigned int lnet_retry_count; extern unsigned int lnet_numa_range; extern unsigned int lnet_health_sensitivity; +extern unsigned int lnet_recovery_interval; extern unsigned int lnet_peer_discovery_disabled; extern int portal_rotor; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index a2c648e..c4f698d 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -95,6 +95,23 @@ struct lnet the_lnet = { MODULE_PARM_DESC(lnet_health_sensitivity, "Value to decrement the health value by on error"); +/* lnet_recovery_interval determines how often we should perform recovery + * on unhealthy interfaces. 
+ */ +unsigned int lnet_recovery_interval = 1; +static int recovery_interval_set(const char *val, + const struct kernel_param *kp); +static struct kernel_param_ops param_ops_recovery_interval = { + .set = recovery_interval_set, + .get = param_get_int, +}; + +#define param_check_recovery_interval(name, p) \ + __param_check(name, p, int) +module_param(lnet_recovery_interval, recovery_interval, 0644); +MODULE_PARM_DESC(lnet_recovery_interval, + "Interval to recover unhealthy interfaces in seconds"); + static int lnet_interfaces_max = LNET_INTERFACES_MAX_DEFAULT; static int intf_max_set(const char *val, const struct kernel_param *kp); module_param_call(lnet_interfaces_max, intf_max_set, param_get_int, @@ -190,6 +207,41 @@ static int lnet_discover(struct lnet_process_id id, u32 force, } static int +recovery_interval_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned int *interval = (unsigned int *)kp->arg; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_recovery_interval'\n"); + return rc; + } + + if (value < 1) { + CERROR("lnet_recovery_interval must be at least 1 second\n"); + return -EINVAL; + } + + /* The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. 
+ */ + mutex_lock(&the_lnet.ln_api_mutex); + + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + *interval = value; + + mutex_unlock(&the_lnet.ln_api_mutex); + + return 0; +} + +static int discovery_set(const char *val, const struct kernel_param *kp) { int rc; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 548ea88..434aa09 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3074,7 +3074,10 @@ struct lnet_mt_event_info { static int lnet_monitor_thread(void *arg) { - int wakeup_counter = 0; + time64_t recovery_timeout = 0; + time64_t rsp_timeout = 0; + int interval; + time64_t now; /* The monitor thread takes care of the following: * 1. Checks the aliveness of routers @@ -3086,20 +3089,23 @@ struct lnet_mt_event_info { * and pings them. */ while (the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING) { + now = ktime_get_real_seconds(); + if (lnet_router_checker_active()) lnet_check_routers(); lnet_resend_pending_msgs(); - wakeup_counter++; - if (wakeup_counter >= lnet_transaction_timeout / 2) { + if (now >= rsp_timeout) { lnet_finalize_expired_responses(false); - wakeup_counter = 0; + rsp_timeout = now + (lnet_transaction_timeout / 2); } - lnet_recover_local_nis(); - - lnet_recover_peer_nis(); + if (now >= recovery_timeout) { + lnet_recover_local_nis(); + lnet_recover_peer_nis(); + recovery_timeout = now + lnet_recovery_interval; + } /* TODO do we need to check if we should sleep without * timeout? Technically, an active system will always @@ -3109,8 +3115,10 @@ struct lnet_mt_event_info { * cases where we get a complaint that an idle thread * is waking up unnecessarily. 
*/ + interval = min(lnet_recovery_interval, + lnet_transaction_timeout / 2); wait_event_interruptible_timeout(the_lnet.ln_mt_waitq, - false, HZ); + false, HZ * interval); } /* clean up the router checker */ From patchwork Thu Feb 27 21:10:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409955 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB0CF14E3 for ; Thu, 27 Feb 2020 21:26:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D38B7246A0 for ; Thu, 27 Feb 2020 21:26:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D38B7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5707134904E; Thu, 27 Feb 2020 13:23:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C526421FBEF for ; Thu, 27 Feb 2020 13:19:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 237F91EB3; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 224D346D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 
Feb 2020 16:10:26 -0500 Message-Id: <1582838290-17243-159-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 158/622] lustre: osc: Do not walk full extent list X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell It is only possible to merge with the extent immediately before or immediately after the one we are trying to add, so do not continue to walk the extent list after passing that extent. This has a significant impact when writing large sparse files, where most writes create a new extent, and many extents are too distant to be merged with their neighbors. Writing 2 GiB of data randomly 4K at a time, we see an improvement of about 15% with this patch. 
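The early-exit idea can be shown with a small stand-alone sketch. It is a simplified illustration, not osc_extent_find(): struct extent and find_merge_candidate() are hypothetical, and the extents are assumed to be a non-overlapping array sorted by start, with each extent an inclusive range of chunk indices. Once the walk reaches an extent that starts more than one chunk past the target, no later extent can be adjacent, so the walk stops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified extent: an inclusive range of chunk indices. */
struct extent {
	unsigned long start;
	unsigned long end;
};

/*
 * Walk non-overlapping extents sorted by start and return the index of the
 * first extent that contains or is adjacent to 'chunk', or -1 if there is
 * none. '*visited' reports how many extents were examined.
 */
static int find_merge_candidate(const struct extent *exts, size_t n,
				unsigned long chunk, size_t *visited)
{
	size_t i;

	assert(exts || n == 0);
	assert(visited);
	*visited = 0;

	for (i = 0; i < n; i++) {
		(*visited)++;

		/* Sorted input: every later extent starts even further
		 * past 'chunk', so none of them can be merged with it.
		 */
		if (exts[i].start > chunk + 1)
			break;

		/* 'chunk' touches or falls inside this extent. */
		if (chunk <= exts[i].end + 1)
			return (int)i;
	}

	return -1;
}
```

For extents {0-1}, {10-11}, {20-21}, {30-31} and chunk 5, the walk examines two extents and gives up. Without the early break it would examine all four before reaching the same answer, and that wasted walk grows with the number of extents in a sparse file.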
mpirun -n 1 $IOR -w -t 4K -b 2G -o ./file -z w/o patch: write 285.86 MiB/s w/patch: write 324.03 MiB/s Cray-bug-id: LUS-6523 WC-bug-id: https://jira.whamcloud.com/browse/LU-11423 Lustre-commit: 7f8143cf85b7 ("LU-11423 osc: Do not walk full extent list") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/33227 Reviewed-by: Jinshan Xiong Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 2ed7ca2..961fc6bf 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -746,7 +746,7 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env, pgoff_t ext_chk_end = ext->oe_end >> ppc_bits; LASSERT(osc_extent_sanity_check_nolock(ext) == 0); - if (chunk > ext_chk_end + 1) + if (chunk > ext_chk_end + 1 || chunk < ext_chk_start) break; /* if covering by different locks, no chance to match */ From patchwork Thu Feb 27 21:10:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409933 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9417414BC for ; Thu, 27 Feb 2020 21:26:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7CD30246A0 for ; Thu, 27 Feb 2020 21:26:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7CD30246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: 
from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8750C21FC24; Thu, 27 Feb 2020 13:23:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1531521FB5F for ; Thu, 27 Feb 2020 13:19:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 269E2223F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 252F346A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:27 -0500 Message-Id: <1582838290-17243-160-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 159/622] lnet: separate ni state from recovery X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata To make the code more readable we make the ni_state an enumerated type, and create a separate bit field to track the recovery state. 
Both of these are protected by the lnet_ni_lock() WC-bug-id: https://jira.whamcloud.com/browse/LU-11514 Lustre-commit: 2be10428ac22 ("LU-11514 lnet: separate ni state from recovery") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33361 Reviewed-by: Sonia Sharma Reviewed-by: Doug Oucharek Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 24 ++++++++++++++++-------- net/lnet/lnet/api-ni.c | 8 +++----- net/lnet/lnet/config.c | 2 +- net/lnet/lnet/lib-move.c | 23 +++++++++++++---------- 4 files changed, 33 insertions(+), 24 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index ce0caa9..b1a6f6a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -315,12 +315,17 @@ struct lnet_tx_queue { struct list_head tq_delayed; /* delayed TXs */ }; -#define LNET_NI_STATE_INIT (1 << 0) -#define LNET_NI_STATE_ACTIVE (1 << 1) -#define LNET_NI_STATE_FAILED (1 << 2) -#define LNET_NI_STATE_RECOVERY_PENDING (1 << 3) -#define LNET_NI_STATE_RECOVERY_FAILED BIT(4) -#define LNET_NI_STATE_DELETING BIT(5) +enum lnet_ni_state { + /* initial state when NI is created */ + LNET_NI_STATE_INIT = 0, + /* set when NI is brought up */ + LNET_NI_STATE_ACTIVE, + /* set when NI is being shutdown */ + LNET_NI_STATE_DELETING, +}; + +#define LNET_NI_RECOVERY_PENDING BIT(0) +#define LNET_NI_RECOVERY_FAILED BIT(1) enum lnet_stats_type { LNET_STATS_TYPE_SEND = 0, @@ -435,8 +440,11 @@ struct lnet_ni { /* my health status */ struct lnet_ni_status *ni_status; - /* NI FSM */ - u32 ni_state; + /* NI FSM. Protected by lnet_ni_lock() */ + enum lnet_ni_state ni_state; + + /* Recovery state. 
Protected by lnet_ni_lock() */ + u32 ni_recovery_state; /* per NI LND tunables */ struct lnet_lnd_tunables ni_lnd_tunables; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index c4f698d..25592db 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1823,7 +1823,7 @@ static void lnet_push_target_fini(void) list_del_init(&ni->ni_netlist); /* the ni should be in deleting state. If it's not it's * a bug */ - LASSERT(ni->ni_state & LNET_NI_STATE_DELETING); + LASSERT(ni->ni_state == LNET_NI_STATE_DELETING); cfs_percpt_for_each(ref, j, ni->ni_refs) { if (!*ref) continue; @@ -1871,8 +1871,7 @@ static void lnet_push_target_fini(void) lnet_net_lock(LNET_LOCK_EX); lnet_ni_lock(ni); - ni->ni_state |= LNET_NI_STATE_DELETING; - ni->ni_state &= ~LNET_NI_STATE_ACTIVE; + ni->ni_state = LNET_NI_STATE_DELETING; lnet_ni_unlock(ni); lnet_ni_unlink_locked(ni); lnet_incr_dlc_seq(); @@ -2005,8 +2004,7 @@ static void lnet_push_target_fini(void) } lnet_ni_lock(ni); - ni->ni_state |= LNET_NI_STATE_ACTIVE; - ni->ni_state &= ~LNET_NI_STATE_INIT; + ni->ni_state = LNET_NI_STATE_ACTIVE; lnet_ni_unlock(ni); /* We keep a reference on the loopback net through the loopback NI */ diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index ea62d36..5e0831a 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -467,7 +467,7 @@ struct lnet_net * ni->ni_net_ns = NULL; ni->ni_last_alive = ktime_get_real_seconds(); - ni->ni_state |= LNET_NI_STATE_INIT; + ni->ni_state = LNET_NI_STATE_INIT; list_add_tail(&ni->ni_netlist, &net->net_ni_added); /* diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 434aa09..eacda4c 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2651,7 +2651,8 @@ struct lnet_mt_event_info { LNetInvalidateMDHandle(&recovery_mdh); - if (ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING || force) { + if (ni->ni_recovery_state & LNET_NI_RECOVERY_PENDING || + force) { recovery_mdh = ni->ni_ping_mdh; 
LNetInvalidateMDHandle(&ni->ni_ping_mdh); } @@ -2702,7 +2703,7 @@ struct lnet_mt_event_info { lnet_net_lock(0); lnet_ni_lock(ni); - if (!(ni->ni_state & LNET_NI_STATE_ACTIVE) || + if (ni->ni_state != LNET_NI_STATE_ACTIVE || healthv == LNET_MAX_HEALTH_VALUE) { list_del_init(&ni->ni_recovery); lnet_unlink_ni_recovery_mdh_locked(ni, 0, false); @@ -2716,9 +2717,9 @@ struct lnet_mt_event_info { * But we want to keep the local_ni on the recovery queue * so we can continue the attempts to recover it. */ - if (ni->ni_state & LNET_NI_STATE_RECOVERY_FAILED) { + if (ni->ni_recovery_state & LNET_NI_RECOVERY_FAILED) { lnet_unlink_ni_recovery_mdh_locked(ni, 0, true); - ni->ni_state &= ~LNET_NI_STATE_RECOVERY_FAILED; + ni->ni_recovery_state &= ~LNET_NI_RECOVERY_FAILED; } lnet_ni_unlock(ni); @@ -2728,8 +2729,8 @@ struct lnet_mt_event_info { libcfs_nid2str(ni->ni_nid)); lnet_ni_lock(ni); - if (!(ni->ni_state & LNET_NI_STATE_RECOVERY_PENDING)) { - ni->ni_state |= LNET_NI_STATE_RECOVERY_PENDING; + if (!(ni->ni_recovery_state & LNET_NI_RECOVERY_PENDING)) { + ni->ni_recovery_state |= LNET_NI_RECOVERY_PENDING; lnet_ni_unlock(ni); ev_info = kzalloc(sizeof(*ev_info), GFP_NOFS); @@ -2737,7 +2738,8 @@ struct lnet_mt_event_info { CERROR("out of memory. 
Can't recover %s\n", libcfs_nid2str(ni->ni_nid)); lnet_ni_lock(ni); - ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + ni->ni_recovery_state &= + ~LNET_NI_RECOVERY_PENDING; lnet_ni_unlock(ni); continue; } @@ -2806,7 +2808,8 @@ struct lnet_mt_event_info { lnet_ni_lock(ni); if (rc) - ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + ni->ni_recovery_state &= + ~LNET_NI_RECOVERY_PENDING; } lnet_ni_unlock(ni); } @@ -3210,9 +3213,9 @@ struct lnet_mt_event_info { return; } lnet_ni_lock(ni); - ni->ni_state &= ~LNET_NI_STATE_RECOVERY_PENDING; + ni->ni_recovery_state &= ~LNET_NI_RECOVERY_PENDING; if (status) - ni->ni_state |= LNET_NI_STATE_RECOVERY_FAILED; + ni->ni_recovery_state |= LNET_NI_RECOVERY_FAILED; lnet_ni_unlock(ni); lnet_net_unlock(0); From patchwork Thu Feb 27 21:10:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410019 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E821E1580 for ; Thu, 27 Feb 2020 21:28:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D05B9246A0 for ; Thu, 27 Feb 2020 21:28:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D05B9246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 70EB934926C; Thu, 27 Feb 2020 13:24:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6BB0A21FB5F for ; Thu, 27 Feb 2020 13:19:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 29ADA2240; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 285A2468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:28 -0500 Message-Id: <1582838290-17243-161-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 160/622] lustre: mdc: move empty xattr handling to mdc layer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" Extract duplicated logic around empty xattr handling from several places in llite and consolidate it in mdc_getxattr(). WC-bug-id: https://jira.whamcloud.com/browse/LU-11380 Lustre-commit: 0f42b388432c ("LU-11380 mdc: move empty xattr handling to mdc layer") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/33198 Reviewed-by: Mike Pershin Reviewed-by: Emoly Liu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 16 +++++-------- fs/lustre/llite/xattr.c | 44 ++++------------------------------ fs/lustre/mdc/mdc_request.c | 57 ++++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 65 insertions(+), 52 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index e1fba1c..246d5de 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4391,7 +4391,6 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock) { struct ll_sb_info *sbi = ll_i2sbi(inode); struct ptlrpc_request *req; - struct mdt_body *body; void *lvbdata; void *lmm; int lmmsize; @@ -4411,19 +4410,16 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock) * completion AST because it doesn't have a large enough buffer */ rc = ll_get_default_mdsize(sbi, &lmmsize); - if (rc == 0) - rc = md_getxattr(sbi->ll_md_exp, ll_inode2fid(inode), - OBD_MD_FLXATTR, XATTR_NAME_LOV, lmmsize, &req); if (rc < 0) return rc; - body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY); - if (!body) { - rc = -EPROTO; - goto out; - } + rc = md_getxattr(sbi->ll_md_exp, ll_inode2fid(inode), OBD_MD_FLXATTR, + XATTR_NAME_LOV, lmmsize, &req); + if (rc < 0) + return rc; - lmmsize = body->mbo_eadatasize; + lmmsize = rc; + rc = 0; if (lmmsize == 0) /* empty layout */ { rc = 0; goto out; diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index 636334e..948aaf6 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -326,7 +326,6 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, struct ll_inode_info *lli = ll_i2info(inode); struct ll_sb_info *sbi = ll_i2sbi(inode); struct ptlrpc_request *req = NULL; - struct mdt_body *body; void *xdata; int rc; @@ -358,57 +357,24 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, 
void *buffer, if (rc < 0) goto out_xattr; - body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY); - LASSERT(body); - /* only detect the xattr size */ - if (size == 0) { - /* LU-11109: Older MDTs do not distinguish - * between nonexistent xattrs and zero length - * values in this case. Newer MDTs will return - * -ENODATA or set OBD_MD_FLXATTR. - */ - rc = body->mbo_eadatasize; + if (size == 0) goto out; - } - if (size < body->mbo_eadatasize) { - CERROR("server bug: replied size %u > %u\n", - body->mbo_eadatasize, (int)size); + if (size < rc) { rc = -ERANGE; goto out; } - if (body->mbo_eadatasize == 0) { - /* LU-11109: Newer MDTs set OBD_MD_FLXATTR on - * success so that we can distinguish between - * zero length value and nonexistent xattr. - * - * If OBD_MD_FLXATTR is not set then we keep - * the old behavior and return -ENODATA for - * getxattr() when mbo_eadatasize is 0. But - * -ENODATA only makes sense for getxattr() - * and not for listxattr(). - */ - if (body->mbo_valid & OBD_MD_FLXATTR) - rc = 0; - else if (valid == OBD_MD_FLXATTR) - rc = -ENODATA; - else - rc = 0; - goto out; - } - /* do not need swab xattr data */ xdata = req_capsule_server_sized_get(&req->rq_pill, &RMF_EADATA, - body->mbo_eadatasize); + rc); if (!xdata) { - rc = -EFAULT; + rc = -EPROTO; goto out; } - memcpy(buffer, xdata, body->mbo_eadatasize); - rc = body->mbo_eadatasize; + memcpy(buffer, xdata, rc); } out_xattr: diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 5cc1e1f..6934e57 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -432,12 +432,63 @@ static int mdc_getxattr(struct obd_export *exp, const struct lu_fid *fid, u64 obd_md_valid, const char *name, size_t buf_size, struct ptlrpc_request **req) { + struct mdt_body *body; + int rc; + LASSERT(obd_md_valid == OBD_MD_FLXATTR || obd_md_valid == OBD_MD_FLXATTRLS); - return mdc_xattr_common(exp, &RQF_MDS_GETXATTR, fid, MDS_GETXATTR, - obd_md_valid, name, NULL, 0, buf_size, 0, 
-1, - req); + rc = mdc_xattr_common(exp, &RQF_MDS_GETXATTR, fid, MDS_GETXATTR, + obd_md_valid, name, NULL, 0, buf_size, 0, -1, + req); + if (rc < 0) + goto out; + + body = req_capsule_server_get(&(*req)->rq_pill, &RMF_MDT_BODY); + if (!body) { + rc = -EPROTO; + goto out; + } + + /* only detect the xattr size */ + if (buf_size == 0) { + /* LU-11109: Older MDTs do not distinguish + * between nonexistent xattrs and zero length + * values in this case. Newer MDTs will return + * -ENODATA or set OBD_MD_FLXATTR. + */ + rc = body->mbo_eadatasize; + goto out; + } + + if (body->mbo_eadatasize == 0) { + /* LU-11109: Newer MDTs set OBD_MD_FLXATTR on + * success so that we can distinguish between + * zero length value and nonexistent xattr. + * + * If OBD_MD_FLXATTR is not set then we keep + * the old behavior and return -ENODATA for + * getxattr() when mbo_eadatasize is 0. But + * -ENODATA only makes sense for getxattr() + * and not for listxattr(). + */ + if (body->mbo_valid & OBD_MD_FLXATTR) + rc = 0; + else if (obd_md_valid == OBD_MD_FLXATTR) + rc = -ENODATA; + else + rc = 0; + goto out; + } + + rc = body->mbo_eadatasize; +out: + if (rc < 0) { + ptlrpc_req_finished(*req); + *req = NULL; + } + + return rc; } #ifdef CONFIG_LUSTRE_FS_POSIX_ACL From patchwork Thu Feb 27 21:10:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409959 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F99D1871 for ; Thu, 27 Feb 2020 21:26:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7808F246A0 for ; Thu, 27 Feb 2020 21:26:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 7808F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DBDE4348A6B; Thu, 27 Feb 2020 13:23:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C12EA21FBFC for ; Thu, 27 Feb 2020 13:19:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2D6782241; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2BBA246F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:29 -0500 Message-Id: <1582838290-17243-162-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 161/622] lustre: obd: remove portals handle from OBD import X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" OBD imports are never looked up using the portals handle (imp_handle) they contain, so remove it. Also remove the unused functions class_conn2obd() and class_conn2cliimp(). 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11445 Lustre-commit: 59729e4c0867 ("LU-11445 obd: remove portals handle from OBD import") Signed-off-by: John L. Hammond Reviewed-on: https://review.whamcloud.com/33250 Reviewed-by: Mike Pershin Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 10 +++++++--- fs/lustre/obdclass/genops.c | 21 +++------------------ 2 files changed, 10 insertions(+), 21 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index 1fd6246..f16d621 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -43,9 +43,15 @@ * * @{ */ +#include +#include +#include +#include +#include +#include +#include #include -#include #include /** @@ -154,8 +160,6 @@ struct import_state_hist { * Imports are representing client-side view to remote target. */ struct obd_import { - /** Local handle (== id) for this import. */ - struct portals_handle imp_handle; /** Reference counter */ atomic_t imp_refcount; struct lustre_handle imp_dlm_handle; /* client's ldlm export */ diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 4465dd9..2254943 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -863,7 +863,6 @@ static struct obd_export *__class_new_export(struct obd_device *obd, exit_unlock: spin_unlock(&obd->obd_dev_lock); - class_handle_unhash(&export->exp_handle); obd_destroy_export(export); kfree(export); return ERR_PTR(rc); @@ -903,7 +902,7 @@ void class_unlink_export(struct obd_export *exp) } /* Import management functions */ -static void class_import_destroy(struct obd_import *imp) +static void obd_zombie_import_free(struct obd_import *imp) { struct obd_import_conn *imp_conn; @@ -924,19 +923,9 @@ static void class_import_destroy(struct obd_import *imp) LASSERT(!imp->imp_sec); class_decref(imp->imp_obd, "import", imp); - OBD_FREE_RCU(imp, sizeof(*imp), 
&imp->imp_handle); + kfree(imp); } -static void import_handle_addref(void *import) -{ - class_import_get(import); -} - -static struct portals_handle_ops import_handle_ops = { - .hop_addref = import_handle_addref, - .hop_free = NULL, -}; - struct obd_import *class_import_get(struct obd_import *import) { atomic_inc(&import->imp_refcount); @@ -985,7 +974,7 @@ static void obd_zombie_imp_cull(struct work_struct *ws) struct obd_import *import = container_of(ws, struct obd_import, imp_zombie_work); - class_import_destroy(import); + obd_zombie_import_free(import); } struct obd_import *class_new_import(struct obd_device *obd) @@ -1018,8 +1007,6 @@ struct obd_import *class_new_import(struct obd_device *obd) atomic_set(&imp->imp_replay_inflight, 0); atomic_set(&imp->imp_inval_count, 0); INIT_LIST_HEAD(&imp->imp_conn_list); - INIT_LIST_HEAD_RCU(&imp->imp_handle.h_link); - class_handle_hash(&imp->imp_handle, &import_handle_ops); init_imp_at(&imp->imp_at); /* the default magic is V2, will be used in connect RPC, and @@ -1036,8 +1023,6 @@ void class_destroy_import(struct obd_import *import) LASSERT(import); LASSERT(import != LP_POISON); - class_handle_unhash(&import->imp_handle); - spin_lock(&import->imp_lock); import->imp_generation++; spin_unlock(&import->imp_lock);
From patchwork Thu Feb 27 21:10:30 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409937
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:10:30 -0500
Message-Id: <1582838290-17243-163-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 162/622] lustre: mgc: restore mgc binding for sptlrpc
Cc: James Simmons, Lustre Development List

The work for LU-9034 mapped config logs to separate mgc devices. This change prevented the ability to configure sptlrpc. A later work around was introduced in LU-9567.
Recently it was reported that the work around introduced can now cause an MGC failover panic. This patch is the proper fix in that the sptlrpc is properly bound to an mgc device. The sptlrpc config record expects 2 pieces of data:
 * [0]: fs_name/target_name,
 * [1]: rule string
What was happening is that when you set cfg_instance it was used to create a new instance name of the form fsname-%p. sptlrpc, however, expects this field to be only fsname. The solution is to test if the config record is for sptlrpc and in that case keep the first record field as is. With this change we can drop cfg_obdname which only sptlrpc used.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10937 Lustre-commit: ca9300e53dc2 ("LU-10937 mgc: restore mgc binding for sptlrpc") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33311 Reviewed-by: Sebastien Buisson Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 1 - fs/lustre/mgc/mgc_request.c | 7 +------ fs/lustre/obdclass/obd_config.c | 5 ++++- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 742e92a..434bb79 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -166,7 +166,6 @@ int class_config_llog_handler(const struct lu_env *env, /* Passed as data param to class_config_parse_llog */ struct config_llog_instance { - char *cfg_obdname; void *cfg_instance; struct super_block *cfg_sb; struct obd_uuid cfg_uuid; diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index 785461b..5bfa1b7 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -224,10 +224,8 @@ struct config_llog_data *do_config_log_add(struct obd_device *obd, /* Keep the mgc around until we are done */ cld->cld_mgcexp = class_export_get(obd->obd_self_export); - if (cld_is_sptlrpc(cld)) { + if (cld_is_sptlrpc(cld)) sptlrpc_conf_log_start(logname); -
cld->cld_cfg.cfg_obdname = obd->obd_name; - } spin_lock(&config_list_lock); list_add(&cld->cld_list_chain, &config_llog_list); @@ -273,9 +271,6 @@ struct config_llog_data *do_config_log_add(struct obd_device *obd, lcfg.cfg_instance = sb ? (void *)sb : (void *)obd; - if (type == CONFIG_T_SPTLRPC) - lcfg.cfg_instance = NULL; - cld = config_log_find(logname, &lcfg); if (unlikely(cld)) return cld; diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c index 550cee0..398f888 100644 --- a/fs/lustre/obdclass/obd_config.c +++ b/fs/lustre/obdclass/obd_config.c @@ -1357,6 +1357,7 @@ int class_config_llog_handler(const struct lu_env *env, lustre_cfg_bufs_init(&bufs, lcfg); if (clli && clli->cfg_instance && + lcfg->lcfg_command != LCFG_SPTLRPC_CONF && LUSTRE_CFG_BUFLEN(lcfg, 0) > 0) { inst_len = LUSTRE_CFG_BUFLEN(lcfg, 0) + sizeof(clli->cfg_instance) * 2 + 4; @@ -1389,12 +1390,14 @@ int class_config_llog_handler(const struct lu_env *env, */ if (clli && !clli->cfg_instance && lcfg->lcfg_command == LCFG_SPTLRPC_CONF) { + struct obd_device *obd = clli->cfg_instance; + lustre_cfg_bufs_set(&bufs, 2, bufs.lcfg_buf[1], bufs.lcfg_buflen[1]); lustre_cfg_bufs_set(&bufs, 1, bufs.lcfg_buf[0], bufs.lcfg_buflen[0]); lustre_cfg_bufs_set_string(&bufs, 0, - clli->cfg_obdname); + obd->obd_name); } /* Add net info to setup command
From patchwork Thu Feb 27 21:10:31 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409941
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:10:31 -0500
Message-Id: <1582838290-17243-164-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 163/622] lnet: peer deletion code may hide error
Cc: Lustre Development List

From: Sonia Sharma

lnet_peer_ni_del_locked might return -EBUSY if the NID to be deleted is a gateway.
Check for the return value of lnet_peer_ni_del_locked in lnet_peer_del_nid. WC-bug-id: https://jira.whamcloud.com/browse/LU-10876 Lustre-commit: a3b6109705dc ("LU-10876 lnet: peer deletion code may hide error") Signed-off-by: Sonia Sharma Reviewed-on: https://review.whamcloud.com/31861 Reviewed-by: Dmitry Eremin Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 2fc5dfc..24a5cd3 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -494,7 +494,9 @@ void lnet_peer_uninit(void) } lnet_net_lock(LNET_LOCK_EX); - lnet_peer_ni_del_locked(lpni); + + rc = lnet_peer_ni_del_locked(lpni); + lnet_net_unlock(LNET_LOCK_EX); out:
From patchwork Thu Feb 27 21:10:32 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409963
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:10:32 -0500
Message-Id: <1582838290-17243-165-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 164/622] lustre: hsm: make changelog flag argument an enum
Cc: Lustre Development List

From: Andreas Dilger

Since the changelog record flag is being stored on disk, pass it around as an enum instead of a signed int. Also make it clear at the caller that only the low 12 bits of the flag are normally being stored in the changelog records, since this isn't obvious to the reader. For open and close records, the bottom 32 bits of open flags are recorded.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10030 Lustre-commit: 2496089a0017 ("LU-10030 hsm: make changelog flag argument an enum") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32112 Reviewed-by: Sebastien Buisson Reviewed-by: John L.
Hammond Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 34 ++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 5551cbf..3bd6fc7 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -1019,16 +1019,17 @@ static inline const char *changelog_type2str(int type) return NULL; } -/* per-record flags */ +/* 12 bits of per-record data can be stored in the bottom of the flags */ #define CLF_FLAGSHIFT 12 -#define CLF_FLAGMASK ((1U << CLF_FLAGSHIFT) - 1) -#define CLF_VERMASK (~CLF_FLAGMASK) enum changelog_rec_flags { CLF_VERSION = 0x1000, CLF_RENAME = 0x2000, CLF_JOBID = 0x4000, CLF_EXTRA_FLAGS = 0x8000, - CLF_SUPPORTED = CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS + CLF_SUPPORTED = CLF_VERSION | CLF_RENAME | CLF_JOBID | + CLF_EXTRA_FLAGS, + CLF_FLAGMASK = (1U << CLF_FLAGSHIFT) - 1, + CLF_VERMASK = ~CLF_FLAGMASK, }; /* Anything under the flagmask may be per-type (if desired) */ @@ -1089,29 +1090,32 @@ static inline enum hsm_event hsm_get_cl_event(__u16 flags) return CLF_GET_BITS(flags, CLF_HSM_EVENT_H, CLF_HSM_EVENT_L); } -static inline void hsm_set_cl_event(int *flags, enum hsm_event he) +static inline void hsm_set_cl_event(enum changelog_rec_flags *clf_flags, + enum hsm_event he) { - *flags |= (he << CLF_HSM_EVENT_L); + *clf_flags |= (he << CLF_HSM_EVENT_L); } -static inline __u16 hsm_get_cl_flags(int flags) +static inline __u16 hsm_get_cl_flags(enum changelog_rec_flags clf_flags) { - return CLF_GET_BITS(flags, CLF_HSM_FLAG_H, CLF_HSM_FLAG_L); + return CLF_GET_BITS(clf_flags, CLF_HSM_FLAG_H, CLF_HSM_FLAG_L); } -static inline void hsm_set_cl_flags(int *flags, int bits) +static inline void hsm_set_cl_flags(enum changelog_rec_flags *clf_flags, + unsigned int bits) { - *flags |= (bits << CLF_HSM_FLAG_L); + 
*clf_flags |= (bits << CLF_HSM_FLAG_L); } -static inline int hsm_get_cl_error(int flags) +static inline int hsm_get_cl_error(enum changelog_rec_flags clf_flags) { - return CLF_GET_BITS(flags, CLF_HSM_ERR_H, CLF_HSM_ERR_L); + return CLF_GET_BITS(clf_flags, CLF_HSM_ERR_H, CLF_HSM_ERR_L); } -static inline void hsm_set_cl_error(int *flags, int error) +static inline void hsm_set_cl_error(enum changelog_rec_flags *clf_flags, + unsigned int error) { - *flags |= (error << CLF_HSM_ERR_L); + *clf_flags |= (error << CLF_HSM_ERR_L); } enum changelog_rec_extra_flags { @@ -1198,7 +1202,7 @@ struct changelog_ext_nid { __u32 padding; }; -/* Changelog extra extension to include OPEN mode. */ +/* Changelog extra extension to include low 32 bits of MDS_OPEN_* flags. */ struct changelog_ext_openmode { __u32 cr_openflags; };
From patchwork Thu Feb 27 21:10:33 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409967
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:10:33 -0500
Message-Id: <1582838290-17243-166-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 165/622] lustre: ldlm: don't skip bl_ast for local lock
Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin

During downgrade to COS the lock renews its own blocking AST states and starts reprocessing. Any new lock conflict will cause a new blocking AST and a related async commit as needed. For the Linux client we can remove the server-specific code.
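For readers outside the ldlm code, the guard this patch adds to ldlm_add_bl_work_item() (queue the lock only if its l_bl_ast linkage is empty, instead of asserting that it is empty) can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the Lustre implementation: demo_list, demo_lock and demo_add_bl_work_item() are hypothetical stand-ins for struct list_head, struct ldlm_lock and the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *n)
{
	n->next = n;
	n->prev = n;
}

/* A node that points at itself is not linked into any list. */
static bool demo_list_empty(const struct demo_list *n)
{
	return n->next == n;
}

/* Insert item right after head. */
static void demo_list_add(struct demo_list *item, struct demo_list *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Count the entries linked on head (head itself is not counted). */
static size_t demo_list_len(const struct demo_list *head)
{
	size_t n = 0;
	const struct demo_list *p;

	for (p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical lock carrying only the fields this sketch needs. */
struct demo_lock {
	struct demo_list bl_ast;	/* linkage on a blocking-AST work list */
	int refcount;
};

/*
 * Queue the lock for a blocking AST unless it is already queued.
 * Returns true when the lock was newly added (and a reference taken).
 */
static bool demo_add_bl_work_item(struct demo_lock *lock,
				  struct demo_list *work_list)
{
	if (!demo_list_empty(&lock->bl_ast))
		return false;	/* still on an older list; leave it there */

	demo_list_add(&lock->bl_ast, work_list);
	lock->refcount++;	/* the work list now holds a reference */
	return true;
}
```

Calling demo_add_bl_work_item() twice on the same lock queues it once and takes one reference, where an assertion on an empty linkage would have tripped on the second call.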
WC-bug-id: https://jira.whamcloud.com/browse/LU-11102 Lustre-commit: 75a417fa0065 ("LU-11102 ldlm: don't skip bl_ast for local lock") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33458 Reviewed-by: Vitaly Fertman Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lock.c | 130 ++++----------------------------------------- 1 file changed, 9 insertions(+), 121 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 869d664..b9771ef 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -595,9 +595,15 @@ static void ldlm_add_bl_work_item(struct ldlm_lock *lock, struct ldlm_lock *new, */ if (ldlm_is_ast_discard_data(new)) ldlm_set_discard_data(lock); - LASSERT(list_empty(&lock->l_bl_ast)); - list_add(&lock->l_bl_ast, work_list); - LDLM_LOCK_GET(lock); + /* Lock can be converted from a blocking state back to granted + * after lock convert or COS downgrade but still be in an + * older bl_list because it is controlled only by + * ldlm_work_bl_ast_lock(), let it be processed there. 
+ */ + if (list_empty(&lock->l_bl_ast)) { + list_add(&lock->l_bl_ast, work_list); + LDLM_LOCK_GET(lock); + } LASSERT(!lock->l_blocking_lock); lock->l_blocking_lock = LDLM_LOCK_GET(new); } @@ -1624,47 +1630,6 @@ enum ldlm_error ldlm_lock_enqueue(const struct lu_env *env, } /** - * Process a call to blocking AST callback for a lock in ast_work list - */ -static int -ldlm_work_bl_ast_lock(struct ptlrpc_request_set *rqset, void *opaq) -{ - struct ldlm_cb_set_arg *arg = opaq; - struct ldlm_lock_desc d; - int rc; - struct ldlm_lock *lock; - - if (list_empty(arg->list)) - return -ENOENT; - - lock = list_first_entry(arg->list, struct ldlm_lock, l_bl_ast); - - LASSERT(lock->l_blocking_lock); - ldlm_lock2desc(lock->l_blocking_lock, &d); - /* copy blocking lock ibits in cancel_bits as well, - * new client may use them for lock convert and it is - * important to use new field to convert locks from - * new servers only - */ - d.l_policy_data.l_inodebits.cancel_bits = - lock->l_blocking_lock->l_policy_data.l_inodebits.bits; - - /* nobody should touch l_bl_ast */ - lock_res_and_lock(lock); - list_del_init(&lock->l_bl_ast); - - LASSERT(ldlm_is_ast_sent(lock)); - LASSERT(lock->l_bl_ast_run == 0); - lock->l_bl_ast_run++; - unlock_res_and_lock(lock); - - rc = lock->l_blocking_ast(lock, &d, (void *)arg, LDLM_CB_BLOCKING); - LDLM_LOCK_RELEASE(lock); - - return rc; -} - -/** * Process a call to completion AST callback for a lock in ast_work list */ static int @@ -1711,71 +1676,6 @@ enum ldlm_error ldlm_lock_enqueue(const struct lu_env *env, } /** - * Process a call to revocation AST callback for a lock in ast_work list - */ -static int -ldlm_work_revoke_ast_lock(struct ptlrpc_request_set *rqset, void *opaq) -{ - struct ldlm_cb_set_arg *arg = opaq; - struct ldlm_lock_desc desc; - int rc; - struct ldlm_lock *lock; - - if (list_empty(arg->list)) - return -ENOENT; - - lock = list_first_entry(arg->list, struct ldlm_lock, l_rk_ast); - list_del_init(&lock->l_rk_ast); - - /* the desc just 
pretend to exclusive */ - ldlm_lock2desc(lock, &desc); - desc.l_req_mode = LCK_EX; - desc.l_granted_mode = 0; - - rc = lock->l_blocking_ast(lock, &desc, (void *)arg, LDLM_CB_BLOCKING); - LDLM_LOCK_RELEASE(lock); - - return rc; -} - -/** - * Process a call to glimpse AST callback for a lock in ast_work list - */ -static int ldlm_work_gl_ast_lock(struct ptlrpc_request_set *rqset, void *opaq) -{ - struct ldlm_cb_set_arg *arg = opaq; - struct ldlm_glimpse_work *gl_work; - struct ldlm_lock *lock; - int rc = 0; - - if (list_empty(arg->list)) - return -ENOENT; - - gl_work = list_first_entry(arg->list, struct ldlm_glimpse_work, - gl_list); - list_del_init(&gl_work->gl_list); - - lock = gl_work->gl_lock; - - /* transfer the glimpse descriptor to ldlm_cb_set_arg */ - arg->gl_desc = gl_work->gl_desc; - - /* invoke the actual glimpse callback */ - if (lock->l_glimpse_ast(lock, (void *)arg) == 0) - rc = 1; - - LDLM_LOCK_RELEASE(lock); - - if (gl_work->gl_flags & LDLM_GL_WORK_SLAB_ALLOCATED) - kmem_cache_free(ldlm_glimpse_work_kmem, gl_work); - else - kfree(gl_work); - gl_work = NULL; - - return rc; -} - -/** * Process list of locks in need of ASTs being sent. 
* * Used on server to send multiple ASTs together instead of sending one by @@ -1799,22 +1699,10 @@ int ldlm_run_ast_work(struct ldlm_namespace *ns, struct list_head *rpc_list, arg->list = rpc_list; switch (ast_type) { - case LDLM_WORK_BL_AST: - arg->type = LDLM_BL_CALLBACK; - work_ast_lock = ldlm_work_bl_ast_lock; - break; case LDLM_WORK_CP_AST: arg->type = LDLM_CP_CALLBACK; work_ast_lock = ldlm_work_cp_ast_lock; break; - case LDLM_WORK_REVOKE_AST: - arg->type = LDLM_BL_CALLBACK; - work_ast_lock = ldlm_work_revoke_ast_lock; - break; - case LDLM_WORK_GL_AST: - arg->type = LDLM_GL_CALLBACK; - work_ast_lock = ldlm_work_gl_ast_lock; - break; default: LBUG(); }
From patchwork Thu Feb 27 21:10:34 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409971
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:10:34 -0500
Message-Id: <1582838290-17243-167-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 166/622] lustre: clio: use pagevec_release for many pages
Cc: Li Dongyang, Lustre Development List

From: Li Dongyang

When Lustre releases cached pages, it always uses page_release, even when releasing many pages. When clearing OST ldlm lock lrus in parallel with lots of cached data, the ldlm_bl threads spend most of their time contending for the zone lock taken by page_release. Also, when osc_lru_reclaim kicks in because there are not enough LRU slots during I/O, the contention on the zone lock kills I/O performance.

Switching to pagevec when we expect to actually release the pages (discard_pages, truncate, lru reclaim) brings significant performance benefits as shown below. This patch introduces cl_pagevec_put() to release the pages in batches using pagevec, which is essentially calling release_pages().

mpirun -np 48 ior -w -r -t 16m -b 16g -F -e -vv -o ...
-i 1 [-B]

mode                     write (GB/s)   read (GB/s)
master O_DIRECT          20.8           21.8
master+patch O_DIRECT    20.7           22.2
master Buffered          11.6           12.3
master+patch Buffered    15.3           19.6

Also clean up the dead lovsub_page related code.
WC-bug-id: https://jira.whamcloud.com/browse/LU-9906 Lustre-commit: b4a959eb61bc ("LU-9906 clio: use pagevec_release for many pages") Signed-off-by: Patrick Farrell Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/28667 Reviewed-by: Andreas Dilger Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 7 ++++- fs/lustre/include/lustre_osc.h | 1 + fs/lustre/llite/vvp_page.c | 19 ++++++++---- fs/lustre/lov/Makefile | 2 +- fs/lustre/lov/lov_cl_internal.h | 13 -------- fs/lustre/lov/lovsub_page.c | 68 ----------------------------------------- fs/lustre/obdclass/cl_page.c | 36 +++++++++++++++------- fs/lustre/obdecho/echo_client.c | 3 +- fs/lustre/osc/osc_cache.c | 14 +++++++-- fs/lustre/osc/osc_page.c | 5 ++- 10 files changed, 64 insertions(+), 104 deletions(-) delete mode 100644 fs/lustre/lov/lovsub_page.c diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index c96a5b7..3337bbf 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -95,6 +95,7 @@ #include #include #include +#include struct inode; @@ -896,7 +897,8 @@ struct cl_page_operations { const struct cl_page_slice *slice); /** Destructor. Frees resources and slice itself. */ void (*cpo_fini)(const struct lu_env *env, - struct cl_page_slice *slice); + struct cl_page_slice *slice, + struct pagevec *pvec); /** * Optional debugging helper. Prints given page slice.
* @@ -2147,6 +2149,9 @@ struct cl_page *cl_page_alloc(const struct lu_env *env, enum cl_page_type type); void cl_page_get(struct cl_page *page); void cl_page_put(const struct lu_env *env, struct cl_page *page); +void cl_pagevec_put(const struct lu_env *env, + struct cl_page *page, + struct pagevec *pvec); void cl_page_print(const struct lu_env *env, void *cookie, lu_printer_t printer, const struct cl_page *pg); void cl_page_header_print(const struct lu_env *env, void *cookie, diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index dabcee0..aa3d4c3 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -179,6 +179,7 @@ struct osc_thread_info { struct lustre_handle oti_handle; struct cl_page_list oti_plist; struct cl_io oti_io; + struct pagevec oti_pagevec; void *oti_pvec[OTI_PVEC_SIZE]; /* * Fields used by cl_lock_discard_pages(). diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c index 78a70b5..bd4ec85 100644 --- a/fs/lustre/llite/vvp_page.c +++ b/fs/lustre/llite/vvp_page.c @@ -54,16 +54,22 @@ * */ -static void vvp_page_fini_common(struct vvp_page *vpg) +static void vvp_page_fini_common(struct vvp_page *vpg, struct pagevec *pvec) { struct page *vmpage = vpg->vpg_page; LASSERT(vmpage); - put_page(vmpage); + if (pvec) { + if (!pagevec_add(pvec, vmpage)) + pagevec_release(pvec); + } else { + put_page(vmpage); + } } static void vvp_page_fini(const struct lu_env *env, - struct cl_page_slice *slice) + struct cl_page_slice *slice, + struct pagevec *pvec) { struct vvp_page *vpg = cl2vvp_page(slice); struct page *vmpage = vpg->vpg_page; @@ -73,7 +79,7 @@ static void vvp_page_fini(const struct lu_env *env, * VPG_FREEING state. 
*/ LASSERT((struct cl_page *)vmpage->private != slice->cpl_page); - vvp_page_fini_common(vpg); + vvp_page_fini_common(vpg, pvec); } static int vvp_page_own(const struct lu_env *env, @@ -471,13 +477,14 @@ static int vvp_transient_page_is_vmlocked(const struct lu_env *env, } static void vvp_transient_page_fini(const struct lu_env *env, - struct cl_page_slice *slice) + struct cl_page_slice *slice, + struct pagevec *pvec) { struct vvp_page *vpg = cl2vvp_page(slice); struct cl_page *clp = slice->cpl_page; struct vvp_object *clobj = cl2vvp(clp->cp_obj); - vvp_page_fini_common(vpg); + vvp_page_fini_common(vpg, pvec); atomic_dec(&clobj->vob_transient_pages); } diff --git a/fs/lustre/lov/Makefile b/fs/lustre/lov/Makefile index abdaac0..2f0b761 100644 --- a/fs/lustre/lov/Makefile +++ b/fs/lustre/lov/Makefile @@ -4,5 +4,5 @@ ccflags-y += -I$(srctree)/$(src)/../include obj-$(CONFIG_LUSTRE_FS) += lov.o lov-y := lov_obd.o lov_pack.o lov_offset.o lov_merge.o \ lov_request.o lov_ea.o lov_dev.o lov_object.o lov_page.o \ - lov_lock.o lov_io.o lovsub_dev.o lovsub_object.o lovsub_page.o \ + lov_lock.o lov_io.o lovsub_dev.o lovsub_object.o \ lov_pool.o lproc_lov.o diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index 875af37..e14567d 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -466,10 +466,6 @@ struct lov_sublock_env { struct cl_io *lse_io; }; -struct lovsub_page { - struct cl_page_slice lsb_cl; -}; - struct lov_thread_info { struct cl_object_conf lti_stripe_conf; struct lu_fid lti_fid; @@ -626,8 +622,6 @@ struct lov_io_sub *lov_sub_get(const struct lu_env *env, struct lov_io *lio, int lov_page_init(const struct lu_env *env, struct cl_object *ob, struct cl_page *page, pgoff_t index); -int lovsub_page_init(const struct lu_env *env, struct cl_object *ob, - struct cl_page *page, pgoff_t index); int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj, struct cl_page *page, pgoff_t index); int 
lov_page_init_composite(const struct lu_env *env, struct cl_object *obj, @@ -782,13 +776,6 @@ static inline struct lov_page *cl2lov_page(const struct cl_page_slice *slice) return container_of(slice, struct lov_page, lps_cl); } -static inline struct lovsub_page * -cl2lovsub_page(const struct cl_page_slice *slice) -{ - LINVRNT(lovsub_is_object(&slice->cpl_obj->co_lu)); - return container_of(slice, struct lovsub_page, lsb_cl); -} - static inline struct lov_io *cl2lov_io(const struct lu_env *env, const struct cl_io_slice *ios) { diff --git a/fs/lustre/lov/lovsub_page.c b/fs/lustre/lov/lovsub_page.c deleted file mode 100644 index a8aa583..0000000 --- a/fs/lustre/lov/lovsub_page.c +++ /dev/null @@ -1,68 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * GPL HEADER START - * - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 only, - * as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License version 2 for more details (a copy is included - * in the LICENSE file that accompanied this code). - * - * You should have received a copy of the GNU General Public License - * version 2 along with this program; If not, see - * http://www.gnu.org/licenses/gpl-2.0.html - * - * GPL HEADER END - */ -/* - * Copyright (c) 2002, 2010, Oracle and/or its affiliates. All rights reserved. - * Use is subject to license terms. - */ -/* - * This file is part of Lustre, http://www.lustre.org/ - * Lustre is a trademark of Sun Microsystems, Inc. - * - * Implementation of cl_page for LOVSUB layer. 
- * - * Author: Nikita Danilov - */ - -#define DEBUG_SUBSYSTEM S_LOV - -#include "lov_cl_internal.h" - -/** \addtogroup lov - * @{ - */ - -/***************************************************************************** - * - * Lovsub page operations. - * - */ - -static void lovsub_page_fini(const struct lu_env *env, - struct cl_page_slice *slice) -{ -} - -static const struct cl_page_operations lovsub_page_ops = { - .cpo_fini = lovsub_page_fini -}; - -int lovsub_page_init(const struct lu_env *env, struct cl_object *obj, - struct cl_page *page, pgoff_t index) -{ - struct lovsub_page *lsb = cl_object_page_slice(obj, page); - - cl_page_slice_add(page, &lsb->lsb_cl, obj, index, &lovsub_page_ops); - return 0; -} - -/** @} lov */ diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c index 8dbd312..3076f8c 100644 --- a/fs/lustre/obdclass/cl_page.c +++ b/fs/lustre/obdclass/cl_page.c @@ -90,7 +90,8 @@ static void cl_page_get_trust(struct cl_page *page) return NULL; } -static void cl_page_free(const struct lu_env *env, struct cl_page *page) +static void cl_page_free(const struct lu_env *env, struct cl_page *page, + struct pagevec *pvec) { struct cl_object *obj = page->cp_obj; struct cl_page_slice *slice; @@ -104,7 +105,7 @@ static void cl_page_free(const struct lu_env *env, struct cl_page *page) cpl_linkage)) != NULL) { list_del_init(page->cp_layers.next); if (unlikely(slice->cpl_ops->cpo_fini)) - slice->cpl_ops->cpo_fini(env, slice); + slice->cpl_ops->cpo_fini(env, slice, pvec); } lu_object_ref_del_at(&obj->co_lu, &page->cp_obj_ref, "cl_page", page); cl_object_put(env, obj); @@ -152,7 +153,7 @@ struct cl_page *cl_page_alloc(const struct lu_env *env, page, ind); if (result != 0) { __cl_page_delete(env, page); - cl_page_free(env, page); + cl_page_free(env, page, NULL); page = ERR_PTR(result); break; } @@ -299,15 +300,13 @@ void cl_page_get(struct cl_page *page) EXPORT_SYMBOL(cl_page_get); /** - * Releases a reference to a page. 
+ * Releases a reference to a page, use the pagevec to release the pages + * in batch if provided. * - * When last reference is released, page is returned to the cache, unless it - * is in cl_page_state::CPS_FREEING state, in which case it is immediately - * destroyed. - * - * \see cl_object_put(), cl_lock_put(). + * Users need to do a final pagevec_release() to release any trailing pages. */ -void cl_page_put(const struct lu_env *env, struct cl_page *page) +void cl_pagevec_put(const struct lu_env *env, struct cl_page *page, + struct pagevec *pvec) { CL_PAGE_HEADER(D_TRACE, env, page, "%d\n", refcount_read(&page->cp_ref)); @@ -322,9 +321,24 @@ void cl_page_put(const struct lu_env *env, struct cl_page *page) * Page is no longer reachable by other threads. Tear * it down. */ - cl_page_free(env, page); + cl_page_free(env, page, pvec); } } +EXPORT_SYMBOL(cl_pagevec_put); + +/** + * Releases a reference to a page, wrapper to cl_pagevec_put + * + * When last reference is released, page is returned to the cache, unless it + * is in cl_page_state::CPS_FREEING state, in which case it is immediately + * destroyed. + * + * \see cl_object_put(), cl_lock_put(). 
+ */ +void cl_page_put(const struct lu_env *env, struct cl_page *page) +{ + cl_pagevec_put(env, page, NULL); +} EXPORT_SYMBOL(cl_page_put); /** diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 0735a5a..5ac4519 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -259,7 +259,8 @@ static void echo_page_completion(const struct lu_env *env, } static void echo_page_fini(const struct lu_env *env, - struct cl_page_slice *slice) + struct cl_page_slice *slice, + struct pagevec *pvec) { struct echo_object *eco = cl2echo_obj(slice->cpl_obj); diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 961fc6bf..47aee99 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -985,6 +985,7 @@ static int osc_extent_truncate(struct osc_extent *ext, pgoff_t trunc_index, struct client_obd *cli = osc_cli(obj); struct osc_async_page *oap; struct osc_async_page *tmp; + struct pagevec *pvec; int pages_in_chunk = 0; int ppc_bits = cli->cl_chunkbits - PAGE_SHIFT; u64 trunc_chunk = trunc_index >> ppc_bits; @@ -1008,6 +1009,8 @@ static int osc_extent_truncate(struct osc_extent *ext, pgoff_t trunc_index, io = osc_env_thread_io(env); io->ci_obj = cl_object_top(osc2cl(obj)); io->ci_ignore_layout = 1; + pvec = &osc_env_info(env)->oti_pagevec; + pagevec_init(pvec); rc = cl_io_init(env, io, CIT_MISC, io->ci_obj); if (rc < 0) goto out; @@ -1046,11 +1049,13 @@ static int osc_extent_truncate(struct osc_extent *ext, pgoff_t trunc_index, } lu_ref_del(&page->cp_reference, "truncate", current); - cl_page_put(env, page); + cl_pagevec_put(env, page, pvec); --ext->oe_nr_pages; ++nr_pages; } + pagevec_release(pvec); + EASSERTF(ergo(ext->oe_start >= trunc_index + !!partial, ext->oe_nr_pages == 0), ext, "trunc_index %lu, partial %d\n", trunc_index, partial); @@ -3030,6 +3035,7 @@ bool osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io, osc_page_gang_cbt cb, void *cbdata) { struct osc_page *ops; + 
struct pagevec *pagevec; void **pvec; pgoff_t idx; unsigned int nr; @@ -3040,6 +3046,8 @@ bool osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io, idx = start; pvec = osc_env_info(env)->oti_pvec; + pagevec = &osc_env_info(env)->oti_pagevec; + pagevec_init(pagevec); spin_lock(&osc->oo_tree_lock); while ((nr = radix_tree_gang_lookup(&osc->oo_tree, pvec, idx, OTI_PVEC_SIZE)) > 0) { @@ -3086,8 +3094,10 @@ bool osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io, page = ops->ops_cl.cpl_page; lu_ref_del(&page->cp_reference, "gang_lookup", current); - cl_page_put(env, page); + cl_pagevec_put(env, page, pagevec); } + pagevec_release(pagevec); + if (nr < OTI_PVEC_SIZE || end_of_region) break; diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c index 9236e02..4dc6c18 100644 --- a/fs/lustre/osc/osc_page.c +++ b/fs/lustre/osc/osc_page.c @@ -506,8 +506,10 @@ static void osc_lru_use(struct client_obd *cli, struct osc_page *opg) static void discard_pagevec(const struct lu_env *env, struct cl_io *io, struct cl_page **pvec, int max_index) { + struct pagevec *pagevec = &osc_env_info(env)->oti_pagevec; int i; + pagevec_init(pagevec); for (i = 0; i < max_index; i++) { struct cl_page *page = pvec[i]; @@ -515,10 +517,11 @@ static void discard_pagevec(const struct lu_env *env, struct cl_io *io, cl_page_delete(env, page); cl_page_discard(env, io, page); cl_page_disown(env, io, page); - cl_page_put(env, page); + cl_pagevec_put(env, page, pagevec); pvec[i] = NULL; } + pagevec_release(pagevec); } /** From patchwork Thu Feb 27 21:10:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409945 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2BA77138D for ; Thu, 27 Feb 2020 21:26:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1434D246A0 for ; Thu, 27 Feb 2020 21:26:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1434D246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 67C0121FBDF; Thu, 27 Feb 2020 13:23:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB0DF21FA58 for ; Thu, 27 Feb 2020 13:19:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3E5E5237E; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3D62946C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:35 -0500 Message-Id: <1582838290-17243-168-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 167/622] lustre: lmv: allocate fid on parent MDT in migrate X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Lai Siyao

During directory migration, if the migrated file is not a directory, the
target should be allocated on its parent MDT, not the user-specified MDT.
If its parent is striped, the file should be migrated to the MDT selected
by its name hash, not the starting MDT of its parent.

Add sanity 230k to check that file data is not changed after migration.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11642
Lustre-commit: a857446dc648 ("LU-11642 lmv: allocate fid on parent MDT in migrate")
Signed-off-by: Lai Siyao
Reviewed-on: https://review.whamcloud.com/33641
Reviewed-by: Andreas Dilger
Reviewed-by: Mike Pershin
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/lmv/lmv_obd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index e98f33d..428904c 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1970,7 +1970,10 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data,
 	if (IS_ERR(child_tgt))
 		return PTR_ERR(child_tgt);
 
-	rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data);
+	if (!S_ISDIR(op_data->op_mode) && tp_tgt)
+		rc = __lmv_fid_alloc(lmv, &target_fid, tp_tgt->ltd_idx);
+	else
+		rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data);
 	if (rc)
 		return rc;
 

From patchwork Thu Feb 27 21:10:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409975 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C20214E3 for ; Thu, 27 Feb 2020 21:26:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com
[64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E8EC1246A0 for ; Thu, 27 Feb 2020 21:26:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E8EC1246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 700AE3490EF; Thu, 27 Feb 2020 13:23:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 18BCF21FA58 for ; Thu, 27 Feb 2020 13:19:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 41D28237F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 40628468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:36 -0500 Message-Id: <1582838290-17243-169-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 168/622] lustre: ptlrpc: Do not map unrecognized ELDLM errnos to EIO X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler The lustre_errno_hton and lustre_errno_ntoh functions map between host and network error numbers before they are sent over the network. If an errno is unrecognized then it is mapped to EIO. However an optimization for x86 and i386 architectures replaced the functions with macros that simply return the original errno. The result is that x86 and i386 return the original values for ELDLM errnos and all other architectures return EIO. This difference is known to break glimpse lock callback handling which depends on clients responding with ELDLM_NO_LOCK_DATA. The difference in errnos may result in other as yet unidentified bugs. The fix defines mappings for the ELDLM errors that leaves the values unchanged. Error numbers not found in the mapping tables are still mapped to EIO. Cray-bug-id: LUS-6057 WC-bug-id: https://jira.whamcloud.com/browse/LU-9793 Lustre-commit: 641e1d546742 ("LU-9793 ptlrpc: Do not map unrecognized ELDLM errnos to EIO") Signed-off-by: Ann Koehler Reviewed-on: https://review.whamcloud.com/33471 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/errno.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/fs/lustre/ptlrpc/errno.c b/fs/lustre/ptlrpc/errno.c index b904524..2975010 100644 --- a/fs/lustre/ptlrpc/errno.c +++ b/fs/lustre/ptlrpc/errno.c @@ -30,6 +30,7 @@ #include #include #include +#include /* * The two translation tables below must define a one-to-one mapping between @@ -187,6 +188,19 @@ [EBADTYPE] = LUSTRE_EBADTYPE, [EJUKEBOX] = LUSTRE_EJUKEBOX, [EIOCBQUEUED] = LUSTRE_EIOCBQUEUED, + + /* + * The ELDLM errors are Lustre specific errors whose ranges + * lie in the middle of the above system errors. 
The ELDLM + * numbers must be preserved to avoid LU-9793. + */ + [ELDLM_LOCK_CHANGED] = ELDLM_LOCK_CHANGED, + [ELDLM_LOCK_ABORTED] = ELDLM_LOCK_ABORTED, + [ELDLM_LOCK_REPLACED] = ELDLM_LOCK_REPLACED, + [ELDLM_NO_LOCK_DATA] = ELDLM_NO_LOCK_DATA, + [ELDLM_LOCK_WOULDBLOCK] = ELDLM_LOCK_WOULDBLOCK, + [ELDLM_NAMESPACE_EXISTS] = ELDLM_NAMESPACE_EXISTS, + [ELDLM_BAD_NAMESPACE] = ELDLM_BAD_NAMESPACE, }; static int lustre_errno_ntoh_mapping[] = { @@ -333,6 +347,19 @@ [LUSTRE_EBADTYPE] = EBADTYPE, [LUSTRE_EJUKEBOX] = EJUKEBOX, [LUSTRE_EIOCBQUEUED] = EIOCBQUEUED, + + /* + * The ELDLM errors are Lustre specific errors whose ranges + * lie in the middle of the above system errors. The ELDLM + * numbers must be preserved to avoid LU-9793. + */ + [ELDLM_LOCK_CHANGED] = ELDLM_LOCK_CHANGED, + [ELDLM_LOCK_ABORTED] = ELDLM_LOCK_ABORTED, + [ELDLM_LOCK_REPLACED] = ELDLM_LOCK_REPLACED, + [ELDLM_NO_LOCK_DATA] = ELDLM_NO_LOCK_DATA, + [ELDLM_LOCK_WOULDBLOCK] = ELDLM_LOCK_WOULDBLOCK, + [ELDLM_NAMESPACE_EXISTS] = ELDLM_NAMESPACE_EXISTS, + [ELDLM_BAD_NAMESPACE] = ELDLM_BAD_NAMESPACE, }; unsigned int lustre_errno_hton(unsigned int h) From patchwork Thu Feb 27 21:10:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409979 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 86ABC1580 for ; Thu, 27 Feb 2020 21:27:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6F355246A0 for ; Thu, 27 Feb 2020 21:27:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6F355246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 30C3621FE0C; Thu, 27 Feb 2020 13:23:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 70C2C21FA58 for ; Thu, 27 Feb 2020 13:19:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 448A62380; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 435B746A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:37 -0500 Message-Id: <1582838290-17243-170-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 169/622] lustre: llite: protect reading inode->i_data.nrpages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam truncate_inode_pages() looks up pages in the radix tree without lock, and could miss finding pages removed from the radix tree by __remove_mapping(), so that after calling truncate_inode_pages() we need to read the nrpages of the inode->i_data with the protection of tree_lock. 
This is because a page could still be in the race window of
__remove_mapping()->__delete_from_page_cache()->page_cache_tree_delete(),
before nrpages has been decreased.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11582
Lustre-commit: 04c172b68676 ("LU-11582 llite: protect reading inode->i_data.nrpages")
Signed-off-by: Bobi Jam
Reviewed-on: https://review.whamcloud.com/33639
Reviewed-by: Andreas Dilger
Reviewed-by: Patrick Farrell
Signed-off-by: James Simmons
---
 fs/lustre/llite/llite_lib.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index ed2d1c6..b766402 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2011,6 +2011,8 @@ int ll_read_inode2(struct inode *inode, void *opaque)
 void ll_delete_inode(struct inode *inode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
+	struct address_space *mapping = &inode->i_data;
+	unsigned long nrpages;
 
 	if (S_ISREG(inode->i_mode) && lli->lli_clob)
 		/* discard all dirty pages before truncating them, required by
@@ -2019,11 +2021,26 @@ void ll_delete_inode(struct inode *inode)
 		cl_sync_file_range(inode, 0, OBD_OBJECT_EOF,
 				   CL_FSYNC_LOCAL, 1);
 
-	truncate_inode_pages_final(&inode->i_data);
+	truncate_inode_pages_final(mapping);
 
-	LASSERTF(!inode->i_data.nrpages,
-		 "inode=" DFID "(%p) nrpages=%lu, see http://jira.whamcloud.com/browse/LU-118\n",
-		 PFID(ll_inode2fid(inode)), inode, inode->i_data.nrpages);
+	/* Workaround for LU-118: Note nrpages may not be totally updated when
+	 * truncate_inode_pages() returns, as there can be a page in the process
+	 * of deletion (inside __delete_from_page_cache()) in the specified
+	 * range. Thus mapping->nrpages can be non-zero when this function
+	 * returns even after truncation of the whole mapping. Only do this if
+	 * npages isn't already zero.
+	 */
+	nrpages = mapping->nrpages;
+	if (nrpages) {
+		xa_lock_irq(&mapping->i_pages);
+		nrpages = mapping->nrpages;
+		xa_unlock_irq(&mapping->i_pages);
+	} /* Workaround end */
+
+	LASSERTF(nrpages == 0,
+		 "%s: inode="DFID"(%p) nrpages=%lu, see https://jira.whamcloud.com/browse/LU-118\n",
+		 ll_get_fsname(inode->i_sb, NULL, 0),
+		 PFID(ll_inode2fid(inode)), inode, nrpages);
 
 	ll_clear_inode(inode);
 	clear_inode(inode);

From patchwork Thu Feb 27 21:10:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409949 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 14FCD138D for ; Thu, 27 Feb 2020 21:26:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EF287246A0 for ; Thu, 27 Feb 2020 21:26:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EF287246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1CA3721FDC6; Thu, 27 Feb 2020 13:23:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B5B6B21FC20 for ; Thu, 27 Feb 2020 13:19:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4839B2381; Thu, 27 Feb
2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 461F746D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:38 -0500 Message-Id: <1582838290-17243-171-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 170/622] lustre: mdt: fix read-on-open for big PAGE_SIZE X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Client PAGE_SIZE can be larger than server one so data returned from server along with OPEN can be misaligned on client. Patch replaces assertion on client with check and graceful exit, changes MDC_DOM_DEF_INLINE_REPSIZE to be PAGE_SIZE at least and updates mdt_dom_read_on_open() to return file tail for maximum possible page size that can fit into reply. 
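
As a rough illustration of the client-side checks described above, here is a minimal user-space sketch. It is not the code from this patch: struct dom_reply and both helper names are hypothetical stand-ins for the rnb_offset/rnb_len fields and for the MDC_DOM_DEF_INLINE_REPSIZE macro, and the page size is passed as a parameter so that the small-page and large-page cases can be compared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the reply buffer descriptor. */
struct dom_reply {
	uint64_t offset;	/* file offset of the returned data */
	uint32_t len;		/* number of bytes returned */
};

/*
 * Return true if data returned along with OPEN can be used by a client
 * with the given page size. A misaligned or short reply is ignored
 * rather than asserted on.
 */
static bool dom_reply_usable(const struct dom_reply *r,
			     uint64_t client_page_size, uint64_t i_size)
{
	if (r->len == 0)
		return false;
	/* a file tail must start on a client page boundary */
	if (r->offset % client_page_size)
		return false;
	/* whole file or file tail: the data must reach the file size */
	if (r->offset + r->len < i_size)
		return false;
	return true;
}

/* Default inline reply size: 8192 bytes, but never below one page. */
static uint64_t dom_def_inline_repsize(uint64_t page_size)
{
	return page_size > 8192 ? page_size : 8192;
}
```

With these assumptions, a tail returned at offset 4096 is accepted by a client with 4 KiB pages and ignored by one with 64 KiB pages, while a reply starting at offset 0 is accepted by both.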
WC-bug-id: https://jira.whamcloud.com/browse/LU-11595 Lustre-commit: 4d7b022e373d ("LU-11595 mdt: fix read-on-open for big PAGE_SIZE") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33606 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 22 ++++++++++++++++++++-- fs/lustre/mdc/mdc_internal.h | 3 ++- 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 246d5de..44337a2 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -447,8 +447,26 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, if (!rnb || rnb->rnb_len == 0) return; - CDEBUG(D_INFO, "Get data buffer along with open, len %i, i_size %llu\n", - rnb->rnb_len, i_size_read(inode)); + /* LU-11595: Server may return whole file and that is OK always or + * it may return just file tail and its offset must be aligned with + * client PAGE_SIZE to be used on that client, if server's PAGE_SIZE is + * smaller then offset may be not aligned and that data is just ignored. + */ + if (rnb->rnb_offset % PAGE_SIZE) + return; + + /* Server returns whole file or just file tail if it fills in + * reply buffer, in both cases total size should be inode size. 
+ */ + if (rnb->rnb_offset + rnb->rnb_len < i_size_read(inode)) { + CERROR("%s: server returns off/len %llu/%u < i_size %llu\n", + ll_get_fsname(inode->i_sb, NULL, 0), rnb->rnb_offset, + rnb->rnb_len, i_size_read(inode)); + return; + } + + CDEBUG(D_INFO, "Get data along with open at %llu len %i, i_size %llu\n", + rnb->rnb_offset, rnb->rnb_len, i_size_read(inode)); data = (char *)rnb + sizeof(*rnb); diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index b4af9778..7a6ec81 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -162,7 +162,8 @@ int mdc_ldlm_blocking_ast(struct ldlm_lock *dlmlock, int mdc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data); int mdc_fill_lvb(struct ptlrpc_request *req, struct ost_lvb *lvb); -#define MDC_DOM_DEF_INLINE_REPSIZE 8192 +/* the minimum inline repsize should be PAGE_SIZE at least */ +#define MDC_DOM_DEF_INLINE_REPSIZE max(8192UL, PAGE_SIZE) #define MDC_DOM_MAX_INLINE_REPSIZE XATTR_SIZE_MAX #endif From patchwork Thu Feb 27 21:10:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409953 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6458814BC for ; Thu, 27 Feb 2020 21:26:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4CDA2246A0 for ; Thu, 27 Feb 2020 21:26:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4CDA2246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A7A9B349047; Thu, 27 Feb 2020 13:23:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0212D21FC27 for ; Thu, 27 Feb 2020 13:19:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4A6C62386; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 496A346C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:39 -0500 Message-Id: <1582838290-17243-172-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 171/622] lustre: llite: handle -ENODATA in ll_layout_fetch() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" In ll_layout_fetch() handle -ENODATA returns from mdc_getxattr(). This is needed for interop and restores the behavior from before commit 0f42b388432c (LU-11380 mdc: move empty xattr to mdc layer) landed. WC-bug-id: https://jira.whamcloud.com/browse/LU-11662 Lustre-commit: e3f367f3660d ("LU-11662 llite: handle -ENODATA in ll_layout_fetch()") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/33665 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 44337a2..25d7986 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4433,8 +4433,13 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock) rc = md_getxattr(sbi->ll_md_exp, ll_inode2fid(inode), OBD_MD_FLXATTR, XATTR_NAME_LOV, lmmsize, &req); - if (rc < 0) + if (rc < 0) { + if (rc == -ENODATA) { + rc = 0; + goto out; /* empty layout */ + } return rc; + } lmmsize = rc; rc = 0; From patchwork Thu Feb 27 21:10:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409983 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F27511871 for ; Thu, 27 Feb 2020 21:27:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DB529246A0 for ; Thu, 27 Feb 2020 21:27:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DB529246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2873C21CA80; Thu, 27 Feb 2020 13:23:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 449B021FC2D for ; Thu, 27 Feb 2020 13:19:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4E4E12387; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4CAFC468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:40 -0500 Message-Id: <1582838290-17243-173-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 172/622] lustre: hsm: increase upper limit of maximum HSM backends registered with MDT X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Teddy Zheng , Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Teddy Zheng Lustre supports at most 32 HSM backends, which prevents HSM from being applied to other features, such as LPCC. This patch removes the limitation by allowing the system to take any integer as a valid archive-id.
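For reviewers, the compatibility mapping between the legacy 32-bit archive bitmap and the new archive-id array can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: the names archive_mask_to_ids(), archive_ids_to_mask() and ORIGIN_MAX_ARCHIVE are hypothetical stand-ins, and the logic loosely follows what copy_and_ct_start() does in the diff (bit i of the bitmap corresponds to archive id i + 1, and an id of 0 means "all archives").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LL_HSM_ORIGIN_MAX_ARCHIVE (32 bits in a __u32). */
#define ORIGIN_MAX_ARCHIVE 32u

/*
 * Expand a legacy archive bitmap into an array of archive ids.
 * Bit i set means archive id i + 1. The caller must provide room
 * for ORIGIN_MAX_ARCHIVE entries. Returns the number of ids written.
 */
size_t archive_mask_to_ids(uint32_t mask, uint32_t *ids)
{
	size_t count = 0;
	unsigned int i;

	assert(ids != NULL);
	for (i = 0; i < ORIGIN_MAX_ARCHIVE; i++) {
		if (mask & (UINT32_C(1) << i))
			ids[count++] = i + 1;
	}
	return count;
}

/*
 * Fold an array of archive ids back into the legacy bitmap.
 * Returns 0 on success, -1 if the ids cannot be represented.
 * An id of 0 ("all archives") yields an empty mask.
 */
int archive_ids_to_mask(const uint32_t *ids, size_t count, uint32_t *mask)
{
	uint32_t out = 0;
	size_t i;

	assert(ids != NULL && mask != NULL);
	if (count > ORIGIN_MAX_ARCHIVE)
		return -1;

	for (i = 0; i < count; i++) {
		if (ids[i] > ORIGIN_MAX_ARCHIVE)
			return -1;
		if (ids[i] == 0) {
			out = 0;
			break;
		}
		out |= UINT32_C(1) << (ids[i] - 1);
	}
	*mask = out;
	return 0;
}
```

The round trip only works for ids in the range 1 to 32, which is why a new agent talking to an old MDS has to be rejected when it asks for a larger id.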
WC-bug-id: https://jira.whamcloud.com/browse/LU-10114 Lustre-commit: 3bfb6107ba4e ("LU-10114 hsm: increase upper limit of maximum HSM backends registered with MDT") Signed-off-by: Teddy Zheng Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/32197 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_export.h | 15 +++- fs/lustre/llite/dir.c | 115 +++++++++++++++++++++++--- fs/lustre/llite/file.c | 15 ++-- fs/lustre/llite/llite_lib.c | 3 +- fs/lustre/lmv/lmv_obd.c | 31 +++++-- fs/lustre/mdc/mdc_request.c | 81 +++++++++++++----- fs/lustre/ptlrpc/layout.c | 2 +- include/uapi/linux/lustre/lustre_idl.h | 10 ++- include/uapi/linux/lustre/lustre_kernelcomm.h | 15 +++- 9 files changed, 235 insertions(+), 52 deletions(-) diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h index 57cf68b..c94efb0 100644 --- a/fs/lustre/include/lustre_export.h +++ b/fs/lustre/include/lustre_export.h @@ -276,11 +276,22 @@ static inline int exp_connect_lock_convert(struct obd_export *exp) struct obd_export *class_conn2export(struct lustre_handle *conn); -#define KKUC_CT_DATA_MAGIC 0x092013cea +static inline int exp_connect_archive_id_array(struct obd_export *exp) +{ + return !!(exp_connect_flags2(exp) & OBD_CONNECT2_ARCHIVE_ID_ARRAY); +} + +enum { + /* archive_ids in array format */ + KKUC_CT_DATA_ARRAY_MAGIC = 0x092013cea, + /* archive_ids in bitmap format */ + KKUC_CT_DATA_BITMAP_MAGIC = 0x082018cea, +}; struct kkuc_ct_data { u32 kcd_magic; - u32 kcd_archive; + u32 kcd_nr_archives; + u32 kcd_archives[0]; }; /** @} export */ diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 3da9d14..f54987a 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -931,19 +931,114 @@ static int ll_ioc_copy_end(struct super_block *sb, struct hsm_copy *copy) return rc ? 
rc : rc2; } -static int copy_and_ioctl(int cmd, struct obd_export *exp, - const void __user *data, size_t size) +static int copy_and_ct_start(int cmd, struct obd_export *exp, + const struct lustre_kernelcomm __user *data) { - void *copy; + struct lustre_kernelcomm *lk; + struct lustre_kernelcomm *tmp; + size_t size = sizeof(*lk); + size_t new_size; int rc; + int i; - copy = memdup_user(data, size); - if (IS_ERR(copy)) - return PTR_ERR(copy); + lk = memdup_user(data, size); + if (IS_ERR(lk)) { + rc = PTR_ERR(lk); + goto out_lk; + } + + if (lk->lk_flags & LK_FLG_STOP) + goto do_ioctl; + + if (!(lk->lk_flags & LK_FLG_DATANR)) { + u32 archive_mask = lk->lk_data_count; + int count; + + /* old hsm agent to old MDS */ + if (!exp_connect_archive_id_array(exp)) + goto do_ioctl; + + /* old hsm agent to new MDS */ + lk->lk_flags |= LK_FLG_DATANR; + + if (archive_mask == 0) + goto do_ioctl; + + count = hweight32(archive_mask); + new_size = offsetof(struct lustre_kernelcomm, lk_data[count]); + tmp = kmalloc(new_size, GFP_KERNEL); + if (!tmp) { + rc = -ENOMEM; + goto out_lk; + } + memcpy(tmp, lk, size); + tmp->lk_data_count = count; + kfree(lk); + lk = tmp; + size = new_size; + + count = 0; + for (i = 0; i < sizeof(archive_mask) * 8; i++) { + if (BIT(i) & archive_mask) { + lk->lk_data[count] = i + 1; + count++; + } + } + goto do_ioctl; + } + + /* new hsm agent to new mds */ + if (lk->lk_data_count > 0) { + new_size = offsetof(struct lustre_kernelcomm, + lk_data[lk->lk_data_count]); + tmp = kmalloc(new_size, GFP_KERNEL); + if (!tmp) { + rc = -ENOMEM; + goto out_lk; + } + + kfree(lk); + lk = tmp; + size = new_size; + + if (copy_from_user(lk, data, size)) { + rc = -EFAULT; + goto out_lk; + } + } + + /* new hsm agent to old MDS */ + if (!exp_connect_archive_id_array(exp)) { + u32 archives = 0; + + if (lk->lk_data_count > LL_HSM_ORIGIN_MAX_ARCHIVE) { + rc = -EINVAL; + goto out_lk; + } + + for (i = 0; i < lk->lk_data_count; i++) { + if (lk->lk_data[i] > LL_HSM_ORIGIN_MAX_ARCHIVE) { + 
rc = -EINVAL; + CERROR("%s: archive id %d requested but only [0 - %zu] supported: rc = %d\n", + exp->exp_obd->obd_name, lk->lk_data[i], + LL_HSM_ORIGIN_MAX_ARCHIVE, rc); + goto out_lk; + } - rc = obd_iocontrol(cmd, exp, size, copy, NULL); - kfree(copy); + if (lk->lk_data[i] == 0) { + archives = 0; + break; + } + archives |= BIT(lk->lk_data[i] - 1); + } + lk->lk_flags &= ~LK_FLG_DATANR; + lk->lk_data_count = archives; + } +do_ioctl: + rc = obd_iocontrol(cmd, exp, size, lk, NULL); +out_lk: + kfree(lk); return rc; } @@ -1671,8 +1766,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) if (!capable(CAP_SYS_ADMIN)) return -EPERM; - rc = copy_and_ioctl(cmd, sbi->ll_md_exp, (void __user *)arg, - sizeof(struct lustre_kernelcomm)); + rc = copy_and_ct_start(cmd, sbi->ll_md_exp, + (struct lustre_kernelcomm __user *)arg); return rc; case LL_IOC_HSM_COPY_START: { diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 25d7986..7078734 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -2397,6 +2397,7 @@ static int ll_swap_layouts(struct file *file1, struct file *file2, int ll_hsm_state_set(struct inode *inode, struct hsm_state_set *hss) { + struct obd_export *exp = ll_i2mdexp(inode); struct md_op_data *op_data; int rc; @@ -2411,18 +2412,20 @@ int ll_hsm_state_set(struct inode *inode, struct hsm_state_set *hss) !capable(CAP_SYS_ADMIN)) return -EPERM; - /* Detect out-of range archive id */ - if ((hss->hss_valid & HSS_ARCHIVE_ID) && - (hss->hss_archive_id > LL_HSM_MAX_ARCHIVE)) - return -EINVAL; + if (!exp_connect_archive_id_array(exp)) { + /* Detect out-of range archive id */ + if ((hss->hss_valid & HSS_ARCHIVE_ID) && + (hss->hss_archive_id > LL_HSM_ORIGIN_MAX_ARCHIVE)) + return -EINVAL; + } op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0, 0, LUSTRE_OPC_ANY, hss); if (IS_ERR(op_data)) return PTR_ERR(op_data); - rc = obd_iocontrol(LL_IOC_HSM_STATE_SET, ll_i2mdexp(inode), - sizeof(*op_data), op_data, NULL); + rc = 
obd_iocontrol(LL_IOC_HSM_STATE_SET, exp, sizeof(*op_data), + op_data, NULL); ll_finish_md_op_data(op_data); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index b766402..4797ee9 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -212,7 +212,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_connect_flags2 = OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT | OBD_CONNECT2_DIR_MIGRATE | - OBD_CONNECT2_SUM_STATFS; + OBD_CONNECT2_SUM_STATFS | + OBD_CONNECT2_ARCHIVE_ID_ARRAY; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 428904c..9f9abd3 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -788,18 +788,39 @@ static int lmv_hsm_ct_register(struct obd_device *obd, unsigned int cmd, u32 i, j; int err; bool any_set = false; - struct kkuc_ct_data kcd = { - .kcd_magic = KKUC_CT_DATA_MAGIC, - .kcd_archive = lk->lk_data, - }; + struct kkuc_ct_data *kcd; + size_t kcd_size; int rc = 0; filp = fget(lk->lk_wfd); if (!filp) return -EBADF; + if (lk->lk_flags & LK_FLG_DATANR) + kcd_size = offsetof(struct kkuc_ct_data, + kcd_archives[lk->lk_data_count]); + else + kcd_size = sizeof(*kcd); + + kcd = kmalloc(kcd_size, GFP_KERNEL); + if (!kcd) { + rc = -ENOMEM; + goto err_fput; + } + + kcd->kcd_nr_archives = lk->lk_data_count; + if (lk->lk_flags & LK_FLG_DATANR) { + kcd->kcd_magic = KKUC_CT_DATA_ARRAY_MAGIC; + if (lk->lk_data_count > 0) + memcpy(kcd->kcd_archives, lk->lk_data, + sizeof(*kcd->kcd_archives) * lk->lk_data_count); + } else { + kcd->kcd_magic = KKUC_CT_DATA_BITMAP_MAGIC; + } + rc = libcfs_kkuc_group_add(filp, &obd->obd_uuid, lk->lk_uid, - lk->lk_group, &kcd, sizeof(kcd)); + lk->lk_group, kcd, kcd_size); + kfree(kcd); if (rc) goto err_fput; diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 6934e57..d702fd1 100644 --- 
a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1689,31 +1689,56 @@ static int mdc_ioc_hsm_progress(struct obd_export *exp, return rc; } -static int mdc_ioc_hsm_ct_register(struct obd_import *imp, u32 archives) +/** + * Send hsm_ct_register to MDS + * + * @imp import + * @ archive_count if in bitmap format, it is the bitmap, + * else it is the count of archive_ids + * @archives if in bitmap format, it is NULL, + * else it is archive_id lists + * + * Return: 0 on success, negated error code on failure. + */ +static int mdc_ioc_hsm_ct_register(struct obd_import *imp, u32 archive_count, + u32 *archives) { - u32 *archive_mask; + u32 *archive_array; struct ptlrpc_request *req; + size_t archives_size; int rc; - req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_HSM_CT_REGISTER, - LUSTRE_MDS_VERSION, - MDS_HSM_CT_REGISTER); - if (!req) { - rc = -ENOMEM; - goto out; + req = ptlrpc_request_alloc(imp, &RQF_MDS_HSM_CT_REGISTER); + if (!req) + return -ENOMEM; + + if (archives) + archives_size = sizeof(*archive_array) * archive_count; + else + archives_size = sizeof(archive_count); + + req_capsule_set_size(&req->rq_pill, &RMF_MDS_HSM_ARCHIVE, + RCL_CLIENT, archives_size); + + rc = ptlrpc_request_pack(req, LUSTRE_MDS_VERSION, MDS_HSM_CT_REGISTER); + if (rc) { + ptlrpc_request_free(req); + return -ENOMEM; } mdc_pack_body(req, NULL, 0, 0, -1, 0); - /* Copy hsm_progress struct */ - archive_mask = req_capsule_client_get(&req->rq_pill, - &RMF_MDS_HSM_ARCHIVE); - if (!archive_mask) { + archive_array = req_capsule_client_get(&req->rq_pill, + &RMF_MDS_HSM_ARCHIVE); + if (!archive_array) { rc = -EPROTO; goto out; } - *archive_mask = archives; + if (archives) + memcpy(archive_array, archives, archives_size); + else + *archive_array = archive_count; ptlrpc_request_set_replen(req); @@ -2249,7 +2274,6 @@ static int mdc_ioc_hsm_ct_start(struct obd_export *exp, struct lustre_kernelcomm *lk) { struct obd_import *imp = class_exp2cliimp(exp); - u32 archive = lk->lk_data; int rc = 
0; if (lk->lk_group != KUC_GRP_HSM) { @@ -2264,7 +2288,12 @@ static int mdc_ioc_hsm_ct_start(struct obd_export *exp, /* Unregister with the coordinator */ rc = mdc_ioc_hsm_ct_unregister(imp); } else { - rc = mdc_ioc_hsm_ct_register(imp, archive); + u32 *archives = NULL; + + if ((lk->lk_flags & LK_FLG_DATANR) && lk->lk_data_count > 0) + archives = lk->lk_data; + + rc = mdc_ioc_hsm_ct_register(imp, lk->lk_data_count, archives); } return rc; @@ -2314,17 +2343,29 @@ static int mdc_hsm_copytool_send(const struct obd_uuid *uuid, */ static int mdc_hsm_ct_reregister(void *data, void *cb_arg) { - struct kkuc_ct_data *kcd = data; struct obd_import *imp = (struct obd_import *)cb_arg; + struct kkuc_ct_data *kcd = data; + u32 *archives = NULL; int rc; - if (!kcd || kcd->kcd_magic != KKUC_CT_DATA_MAGIC) + if (!kcd || + (kcd->kcd_magic != KKUC_CT_DATA_ARRAY_MAGIC && + kcd->kcd_magic != KKUC_CT_DATA_BITMAP_MAGIC)) return -EPROTO; - CDEBUG(D_HA, "%s: recover copytool registration to MDT (archive=%#x)\n", - imp->imp_obd->obd_name, kcd->kcd_archive); - rc = mdc_ioc_hsm_ct_register(imp, kcd->kcd_archive); + if (kcd->kcd_magic == KKUC_CT_DATA_BITMAP_MAGIC) { + CDEBUG(D_HA, + "%s: recover copytool registration to MDT (archive=%#x)\n", + imp->imp_obd->obd_name, kcd->kcd_nr_archives); + } else { + CDEBUG(D_HA, + "%s: recover copytool registration to MDT (archive nr = %u)\n", + imp->imp_obd->obd_name, kcd->kcd_nr_archives); + if (kcd->kcd_nr_archives != 0) + archives = kcd->kcd_archives; + } + rc = mdc_ioc_hsm_ct_register(imp, kcd->kcd_nr_archives, archives); /* ignore error if the copytool is already registered */ return (rc == -EEXIST) ? 
0 : rc; } diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 92d2fc2..2e74ae1b 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -1127,7 +1127,7 @@ struct req_msg_field RMF_MDS_HSM_USER_ITEM = EXPORT_SYMBOL(RMF_MDS_HSM_USER_ITEM); struct req_msg_field RMF_MDS_HSM_ARCHIVE = - DEFINE_MSGF("hsm_archive", 0, + DEFINE_MSGF("hsm_archive", RMF_F_STRUCT_ARRAY, sizeof(u32), lustre_swab_generic_32s, NULL); EXPORT_SYMBOL(RMF_MDS_HSM_ARCHIVE); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 8330fe1..599fe86 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -194,12 +194,14 @@ enum { LUSTRE_FID_INIT_OID = 1UL }; -/* copytool uses a 32b bitmask field to encode archive-Ids during register - * with MDT thru kuc. +/* copytool can use any nonnegative integer to represent archive-Ids during + * register with MDT thru kuc. * archive num = 0 => all - * archive num from 1 to 32 + * archive num from 1 to MAX_U32 */ -#define LL_HSM_MAX_ARCHIVE (sizeof(__u32) * 8) +#define LL_HSM_ORIGIN_MAX_ARCHIVE (sizeof(__u32) * 8) +/* the max count of archive ids that one agent can support */ +#define LL_HSM_MAX_ARCHIVES_PER_AGENT 1024 /** * Different FID Format diff --git a/include/uapi/linux/lustre/lustre_kernelcomm.h b/include/uapi/linux/lustre/lustre_kernelcomm.h index d84a8fc..8c5dec7 100644 --- a/include/uapi/linux/lustre/lustre_kernelcomm.h +++ b/include/uapi/linux/lustre/lustre_kernelcomm.h @@ -75,17 +75,26 @@ enum kuc_generic_message_type { #define KUC_GRP_HSM 0x02 #define KUC_GRP_MAX KUC_GRP_HSM -#define LK_FLG_STOP 0x01 +enum lk_flags { + LK_FLG_STOP = 0x0001, + LK_FLG_DATANR = 0x0002, +}; #define LK_NOFD -1U -/* kernelcomm control structure, passed from userspace to kernel */ +/* kernelcomm control structure, passed from userspace to kernel. 
+ * For compatibility with old copytools, users who pass ARCHIVE_IDs + * to kernel using lk_data_count and lk_data should fill lk_flags with + * LK_FLG_DATANR. Otherwise kernel will take lk_data_count as bitmap of + * ARCHIVE IDs. + */ struct lustre_kernelcomm { __u32 lk_wfd; __u32 lk_rfd; __u32 lk_uid; __u32 lk_group; - __u32 lk_data; + __u32 lk_data_count; __u32 lk_flags; + __u32 lk_data[0]; } __packed; #endif /* __UAPI_LUSTRE_KERNELCOMM_H__ */ From patchwork Thu Feb 27 21:10:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409957 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1CAFF14BC for ; Thu, 27 Feb 2020 21:26:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 04D29246A0 for ; Thu, 27 Feb 2020 21:26:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 04D29246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 27C1134906B; Thu, 27 Feb 2020 13:23:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9A3D221FABE for ; Thu, 27 Feb 2020 13:19:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with 
ESMTP id 50C7C2388; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4F7FA46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:41 -0500 Message-Id: <1582838290-17243-174-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 173/622] lustre: osc: wrong page offset for T10PI checksum X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Xi The page offset might be a non-zero value. Thus, when calculating the T10PI checksum, the correct offset within the page should be used instead of 0.
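As a small illustration of the expression used in the fix (off & ~PAGE_MASK), the following userspace sketch computes the offset of a file position within its page. It assumes a 4096-byte page and uses hypothetical names (SKETCH_PAGE_SIZE, SKETCH_PAGE_MASK, offset_in_page_sketch()); in the kernel, PAGE_MASK is defined as ~(PAGE_SIZE - 1), so its complement keeps only the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch; real kernels may use other sizes. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)
/* Mirrors the kernel convention: the mask selects the page-aligned part. */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Return the byte offset of file_off inside the page that contains it. */
uint64_t offset_in_page_sketch(uint64_t file_off)
{
	/* The masking trick only works for power-of-two page sizes. */
	assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);

	return file_off & ~SKETCH_PAGE_MASK;
}
```

Passing 0 instead of this value is only correct when every bulk page starts at a page-aligned file offset.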
WC-bug-id: https://jira.whamcloud.com/browse/LU-11697 Lustre-commit: c1f052055446 ("LU-11697 osc: wrong page offset for T10PI checksum") Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/33727 Reviewed-by: Alex Zhuravlev Reviewed-by: Andreas Dilger Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 18b99a9..1fc7a57 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1153,7 +1153,8 @@ static int osc_checksum_bulk_t10pi(const char *obd_name, int nob, * The left guard number should be able to hold checksums of a * whole page */ - rc = obd_page_dif_generate_buffer(obd_name, pga[i]->pg, 0, + rc = obd_page_dif_generate_buffer(obd_name, pga[i]->pg, + pga[i]->off & ~PAGE_MASK, count, guard_start + used_number, guard_number - used_number, From patchwork Thu Feb 27 21:10:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409985 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 86E021580 for ; Thu, 27 Feb 2020 21:27:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6CFCF246A0 for ; Thu, 27 Feb 2020 21:27:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6CFCF246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost 
[IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7A98B34912B; Thu, 27 Feb 2020 13:24:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB1E421FAC1 for ; Thu, 27 Feb 2020 13:19:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 53EAA2389; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5264146F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:42 -0500 Message-Id: <1582838290-17243-175-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 174/622] lnet: increase lnet transaction timeout X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sonia Sharma Increase the new LNet Health transaction timeout to the original 50s value, to avoid spurious lnet-selftest failures and expected false timeouts under load. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11389 Lustre-commit: 73fdd1579d87 ("LU-11389 lnet: increase lnet transaction timeout") Signed-off-by: Sonia Sharma Reviewed-on: https://review.whamcloud.com/33231 Reviewed-by: Andreas Dilger Reviewed-by: James Nunez Reviewed-by: Amir Shehata Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 25592db..3ee10da 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -126,7 +126,7 @@ static int recovery_interval_set(const char *val, MODULE_PARM_DESC(lnet_peer_discovery_disabled, "Set to 1 to disable peer discovery on this node."); -unsigned int lnet_transaction_timeout = 5; +unsigned int lnet_transaction_timeout = 50; static int transaction_to_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_transaction_timeout = { .set = transaction_to_set, From patchwork Thu Feb 27 21:10:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409961 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3CFF714E3 for ; Thu, 27 Feb 2020 21:26:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 23DE1246A0 for ; Thu, 27 Feb 2020 21:26:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 23DE1246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com 
(localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B9D13349093; Thu, 27 Feb 2020 13:23:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2C76C21FB8F for ; Thu, 27 Feb 2020 13:19:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 56D812481; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 55CBD46C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:43 -0500 Message-Id: <1582838290-17243-176-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 175/622] lnet: handle multi-md usage X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata The MD can be used multiple times. The response tracker needs to have the same lifespan as the MD. If we re-use the MD and a response tracker has already been attached to it, then we'll update the deadline for the response tracker. This means the deadline on the MD is for its last user. 
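The reuse rule described above can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the LNet implementation: the struct and function names (md_sketch, rsp_tracker_sketch, attach_rsp_tracker_sketch()) are hypothetical, timestamps are plain nanosecond counters, and the resource lock and the per-CPT tracker list handled by the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct lnet_rsp_tracker. */
struct rsp_tracker_sketch {
	uint64_t deadline_ns;
};

/* Hypothetical, reduced stand-in for struct lnet_libmd. */
struct md_sketch {
	struct rsp_tracker_sketch *rspt;
};

/*
 * Attach rspt to md, or refresh the deadline of the tracker that md
 * already carries. Takes ownership of rspt: it is freed when md already
 * has a tracker. Returns true if rspt became the tracker of md.
 */
bool attach_rsp_tracker_sketch(struct md_sketch *md,
			       struct rsp_tracker_sketch *rspt,
			       uint64_t now_ns, uint64_t timeout_ns)
{
	struct rsp_tracker_sketch *local;
	bool new_entry = true;

	assert(md != NULL && rspt != NULL);

	local = md->rspt;
	if (local) {
		/* MD is being reused: keep the existing tracker. */
		free(rspt);
		new_entry = false;
	} else {
		/* First use of this MD: remember the new tracker. */
		md->rspt = rspt;
		local = rspt;
	}

	/* Either way, the deadline now reflects the latest user of the MD. */
	local->deadline_ns = now_ns + timeout_ns;
	return new_entry;
}
```

In the patch itself the same decision is made under lnet_res_lock(cpt), and a reused tracker is also moved to the tail of the per-CPT queue so that older entries still expire first.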
WC-bug-id: https://jira.whamcloud.com/browse/LU-11734 Lustre-commit: 8c249097e627 ("LU-11734 lnet: handle multi-md usage") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33794 Reviewed-by: James Simmons Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 - net/lnet/lnet/lib-move.c | 47 +++++++++++++++++-------------- net/lnet/lnet/lib-msg.c | 64 +++++++++++++++++++++---------------------- 3 files changed, 57 insertions(+), 55 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 26095a6..bbb678f 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -550,7 +550,6 @@ int lnet_get_peer_list(u32 *countp, u32 *sizep, void lnet_msg_attach_md(struct lnet_msg *msg, struct lnet_libmd *md, unsigned int offset, unsigned int mlen); -void lnet_msg_detach_md(struct lnet_msg *msg, int status); void lnet_build_unlink_event(struct lnet_libmd *md, struct lnet_event *ev); void lnet_build_msg_event(struct lnet_msg *msg, enum lnet_event_kind ev_type); void lnet_msg_commit(struct lnet_msg *msg, int cpt); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index eacda4c..3bcac03 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2437,6 +2437,7 @@ struct lnet_mt_event_info { lnet_nid_t mt_nid; }; +/* called with res_lock held */ void lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt) { @@ -2446,11 +2447,9 @@ struct lnet_mt_event_info { * The rspt queue for the cpt is protected by * the lnet_net_lock(cpt). cpt is the cpt of the MD cookie. */ - lnet_res_lock(cpt); - if (!md->md_rspt_ptr) { - lnet_res_unlock(cpt); + if (!md->md_rspt_ptr) return; - } + rspt = md->md_rspt_ptr; md->md_rspt_ptr = NULL; @@ -2462,7 +2461,6 @@ struct lnet_mt_event_info { * the rspt block. 
*/ LNetInvalidateMDHandle(&rspt->rspt_mdh); - lnet_res_unlock(cpt); } static void @@ -4152,6 +4150,8 @@ void lnet_monitor_thr_stop(void) struct lnet_libmd *md, struct lnet_handle_md mdh) { s64 timeout_ns; + bool new_entry = true; + struct lnet_rsp_tracker *local_rspt; /* MD has a refcount taken by message so it's not going away. * The MD however can be looked up. We need to secure the access @@ -4159,27 +4159,34 @@ void lnet_monitor_thr_stop(void) * The rspt can be accessed without protection up to when it gets * added to the list. */ - - /* debug code */ - LASSERT(!md->md_rspt_ptr); - - /* we'll use that same event in case we never get a response */ - rspt->rspt_mdh = mdh; - rspt->rspt_cpt = cpt; - timeout_ns = lnet_transaction_timeout * NSEC_PER_SEC; - rspt->rspt_deadline = ktime_add_ns(ktime_get(), timeout_ns); - lnet_res_lock(cpt); - /* store the rspt so we can access it when we get the REPLY */ - md->md_rspt_ptr = rspt; - lnet_res_unlock(cpt); + local_rspt = md->md_rspt_ptr; + timeout_ns = lnet_transaction_timeout * NSEC_PER_SEC; + if (local_rspt) { + /* we already have an rspt attached to the md, so we'll + * update the deadline on that one. + */ + kfree(rspt); + new_entry = false; + } else { + /* new md */ + rspt->rspt_mdh = mdh; + rspt->rspt_cpt = cpt; + /* store the rspt so we can access it when we get the REPLY */ + md->md_rspt_ptr = rspt; + local_rspt = rspt; + } + local_rspt->rspt_deadline = ktime_add_ns(ktime_get(), timeout_ns); /* add to the list of tracked responses. It's added to tail of the * list in order to expire all the older entries first. 
*/ lnet_net_lock(cpt); - list_add_tail(&rspt->rspt_on_list, the_lnet.ln_mt_rstq[cpt]); + if (!new_entry && !list_empty(&local_rspt->rspt_on_list)) + list_del_init(&local_rspt->rspt_on_list); + list_add_tail(&local_rspt->rspt_on_list, the_lnet.ln_mt_rstq[cpt]); lnet_net_unlock(cpt); + lnet_res_unlock(cpt); } /** @@ -4321,7 +4328,6 @@ void lnet_monitor_thr_stop(void) CNETERR("Error sending PUT to %s: %d\n", libcfs_id2str(target), rc); msg->msg_no_resend = true; - lnet_detach_rsp_tracker(msg->msg_md, cpt); lnet_finalize(msg, rc); } @@ -4543,7 +4549,6 @@ struct lnet_msg * CNETERR("Error sending GET to %s: %d\n", libcfs_id2str(target), rc); msg->msg_no_resend = true; - lnet_detach_rsp_tracker(msg->msg_md, cpt); lnet_finalize(msg, rc); } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index f626ca3..af0675e 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -369,29 +369,6 @@ lnet_md_deconstruct(md, &msg->msg_ev.md); } -void -lnet_msg_detach_md(struct lnet_msg *msg, int status) -{ - struct lnet_libmd *md = msg->msg_md; - int unlink; - - /* Now it's safe to drop my caller's ref */ - md->md_refcount--; - LASSERT(md->md_refcount >= 0); - - unlink = lnet_md_unlinkable(md); - if (md->md_eq) { - msg->msg_ev.status = status; - msg->msg_ev.unlinked = unlink; - lnet_eq_enqueue_event(md->md_eq, &msg->msg_ev); - } - - if (unlink) - lnet_md_unlink(md); - - msg->msg_md = NULL; -} - static int lnet_complete_msg_locked(struct lnet_msg *msg, int cpt) { @@ -772,12 +749,42 @@ } static void +lnet_msg_detach_md(struct lnet_msg *msg, int cpt, int status) +{ + struct lnet_libmd *md = msg->msg_md; + int unlink; + + /* Now it's safe to drop my caller's ref */ + md->md_refcount--; + LASSERT(md->md_refcount >= 0); + + unlink = lnet_md_unlinkable(md); + if (md->md_eq) { + msg->msg_ev.status = status; + msg->msg_ev.unlinked = unlink; + lnet_eq_enqueue_event(md->md_eq, &msg->msg_ev); + } + + if (unlink) { + /* if this is an ACK or a REPLY then make sure to remove the 
+ * response tracker. + */ + if (msg->msg_ev.type == LNET_EVENT_REPLY || + msg->msg_ev.type == LNET_EVENT_ACK) + lnet_detach_rsp_tracker(msg->msg_md, cpt); + lnet_md_unlink(md); + } + + msg->msg_md = NULL; +} + +static void lnet_detach_md(struct lnet_msg *msg, int status) { int cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); lnet_res_lock(cpt); - lnet_msg_detach_md(msg, status); + lnet_msg_detach_md(msg, cpt, status); lnet_res_unlock(cpt); } @@ -877,15 +884,6 @@ msg->msg_ev.status = status; - /* if this is an ACK or a REPLY then make sure to remove the - * response tracker. - */ - if (msg->msg_ev.type == LNET_EVENT_REPLY || - msg->msg_ev.type == LNET_EVENT_ACK) { - cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); - lnet_detach_rsp_tracker(msg->msg_md, cpt); - } - /* if the message is successfully sent, no need to keep the MD around */ if (msg->msg_md && !status) lnet_detach_md(msg, status); From patchwork Thu Feb 27 21:10:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409989 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8D8571580 for ; Thu, 27 Feb 2020 21:27:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 765CB246A0 for ; Thu, 27 Feb 2020 21:27:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 765CB246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 78B9A21FF61; Thu, 27 Feb 2020 13:24:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8234321FC35 for ; Thu, 27 Feb 2020 13:19:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5A04F2482; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5892046D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:44 -0500 Message-Id: <1582838290-17243-177-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 176/622] lustre: uapi: fix warnings when lustre_user.h included X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Checking for lustre/lustre_user.h in a configure script generates a warning because of the included checking lustre/lustre_user.h usability... no checking lustre/lustre_user.h presence... yes WARNING: present but cannot be compiled WARNING: check for missing prerequisite headers? 
WARNING: see the Autoconf documentation WARNING: section "Present But Cannot Be Compiled" WARNING: proceeding with the preprocessor's result WARNING: in the future, the compiler will take precedence Looking into config.log it shows: In file included from /usr/include/lustre/lustre_user.h:59, from conftest.c:91: /usr/include/sys/quota.h:221: error: expected declaration specifiers or '...' before 'caddr_t' Since we don't really need much from the header, just use the default linux UAPI quota header. Fix an unused variable warning in ll_dir_ioctl(). Lustre-commit: db0592145574 ("LU-11783 utils: fix warnings when lustre_user.h included") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33876 Reviewed-by: Wang Shilong Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 2 +- include/uapi/linux/lustre/lustre_user.h | 3 +-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index f54987a..ef4fa36 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1356,7 +1356,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct lov_user_md_v1 *lumv1_ptr = &lumv1; struct lov_user_md_v1 __user *lumv1p = (void __user *)arg; struct lov_user_md_v3 __user *lumv3p = (void __user *)arg; - int lum_size; + int lum_size = 0; int set_default = 0; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 3bd6fc7..649aeeb 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -44,10 +44,10 @@ #include #include +#include #ifdef __KERNEL__ # include -# include # include # include /* snprintf() */ # include @@ -57,7 +57,6 @@ # include # include /* snprintf() */ # include -# include # include #endif /* __KERNEL__ */ #include From patchwork Thu Feb 27 21:10:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409993 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E316E14E3 for ; Thu, 27 Feb 2020 21:27:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CBB83246A1 for ; Thu, 27 Feb 2020 21:27:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CBB83246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3526B348B63; Thu, 27 Feb 2020 13:24:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C72C721FC09 for ; Thu, 27 Feb 2020 13:19:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5CEE62487; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5BEF8468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:45 -0500 Message-Id: <1582838290-17243-178-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 
177/622] lustre: obdclass: lu_dirent record length missing '0' X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao In lu_dirent packing, a '0' is appended after the name, but it is not counted in the size calculation, which may cause a crash. WC-bug-id: https://jira.whamcloud.com/browse/LU-11753 Lustre-commit: 77f01308c509 ("LU-11753 obdclass: lu_dirent record length missing '0'") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/33865 Reviewed-by: Stephan Thiell Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_idl.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 599fe86..4236a43 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -480,10 +480,11 @@ static inline size_t lu_dirent_calc_size(size_t namelen, __u16 attr) if (attr & LUDA_TYPE) { const size_t align = sizeof(struct luda_type) - 1; - size = (sizeof(struct lu_dirent) + namelen + align) & ~align; + size = (sizeof(struct lu_dirent) + namelen + 1 + align) & + ~align; size += sizeof(struct luda_type); } else { - size = sizeof(struct lu_dirent) + namelen; + size = sizeof(struct lu_dirent) + namelen + 1; } return (size + 7) & ~7; From patchwork Thu Feb 27 21:10:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409997 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id
39A361580 for ; Thu, 27 Feb 2020 21:27:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 22696246A0 for ; Thu, 27 Feb 2020 21:27:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 22696246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AC819348AD1; Thu, 27 Feb 2020 13:24:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 16BC421FB0C for ; Thu, 27 Feb 2020 13:19:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5FD6E2488; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5EE3146A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:46 -0500 Message-Id: <1582838290-17243-179-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 178/622] lustre: update version to 2.11.99 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" With nearly all of the missing patches from the lustre 2.12 version merged upstream, it's time to update the upstream client's version. Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_ver.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h index d7c53c5..8ceb57d 100644 --- a/include/uapi/linux/lustre/lustre_ver.h +++ b/include/uapi/linux/lustre/lustre_ver.h @@ -2,10 +2,10 @@ #define _LUSTRE_VER_H_ #define LUSTRE_MAJOR 2 -#define LUSTRE_MINOR 10 +#define LUSTRE_MINOR 11 #define LUSTRE_PATCH 99 #define LUSTRE_FIX 0 -#define LUSTRE_VERSION_STRING "2.10.99" +#define LUSTRE_VERSION_STRING "2.11.99" #define OBD_OCD_VERSION(major, minor, patch, fix) \ (((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix)) From patchwork Thu Feb 27 21:10:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410001 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D635714E3 for ; Thu, 27 Feb 2020 21:27:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BEE39246A0 for ; Thu, 27 Feb 2020 21:27:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BEE39246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9E676348BB4; Thu, 27 Feb 2020 13:24:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 58A7621FB0C for ; Thu, 27 Feb 2020 13:19:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 633C82489; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 61E4046C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:47 -0500 Message-Id: <1582838290-17243-180-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 179/622] lustre: osc: limit chunk number of write submit X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam Don't queue too many pages in an extent for a write RPC; we need to take care of the chunk limit in write submit as well (refer to LU-8135 for more details).
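
Illustration only, not part of the patch: a minimal user-space sketch of the chunk arithmetic that the write-submit loop relies on. The name demo_should_sync() and the DEMO_PAGE_SHIFT constant are hypothetical stand-ins (4 KiB pages are assumed); in the kernel the values come from PAGE_SHIFT, cli->cl_chunkbits, cli->cl_max_pages_per_rpc and osc_max_write_chunks(). Like the patch, the sketch counts chunks from the number of queued pages only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PAGE_SHIFT; assumes 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12u

/*
 * Return true when the pages queued so far should be submitted before
 * another page is added. That is the case when the per-RPC page limit is
 * reached, or when the queue already spans max_chunks chunks and one more
 * page would start a new chunk.
 */
static bool demo_should_sync(unsigned int queued, unsigned int max_pages,
			     unsigned int chunkbits, unsigned int max_chunks)
{
	unsigned int ppc_bits = chunkbits - DEMO_PAGE_SHIFT; /* log2(pages per chunk) */
	unsigned int ppc = 1u << ppc_bits;                   /* pages per chunk */
	unsigned int chunks;
	unsigned int next_chunks;

	if (queued == max_pages)
		return true;

	/* chunks covered by the pages queued so far, rounded up */
	chunks = (queued + ppc - 1) >> ppc_bits;
	/* chunks covered if one more page were added, rounded up */
	next_chunks = (queued + ppc) >> ppc_bits;

	return chunks == max_chunks && next_chunks > chunks;
}
```

With 16 KiB chunks (chunkbits = 14, so 4 pages per chunk) and a limit of 2 chunks, the queue is submitted once it holds 8 pages: a 9th page would open a third chunk, while at 7 pages the next page still fits in the second chunk.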
WC-bug-id: https://jira.whamcloud.com/browse/LU-10239 Lustre-commit: 93ef6e7863b4 ("LU-10239 osc: limit chunk number of write submit") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/30627 Reviewed-by: Andreas Dilger Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 30 ------------------------------ fs/lustre/osc/osc_internal.h | 30 ++++++++++++++++++++++++++++++ fs/lustre/osc/osc_io.c | 27 +++++++++++++++++++++++++-- 3 files changed, 55 insertions(+), 32 deletions(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 47aee99..1ff258c 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -1937,36 +1937,6 @@ static int try_to_add_extent_for_io(struct client_obd *cli, return 1; } -static inline unsigned int osc_max_write_chunks(const struct client_obd *cli) -{ - /* - * LU-8135: - * - * The maximum size of a single transaction is about 64MB in ZFS. - * #define DMU_MAX_ACCESS (64 * 1024 * 1024) - * - * Since ZFS is a copy-on-write file system, a single dirty page in - * a chunk will result in the rewrite of the whole chunk, therefore - * an RPC shouldn't be allowed to contain too many chunks otherwise - * it will make transaction size much bigger than 64MB, especially - * with big block size for ZFS. - * - * This piece of code is to make sure that OSC won't send write RPCs - * with too many chunks. The maximum chunk size that an RPC can cover - * is set to PTLRPC_MAX_BRW_SIZE, which is defined to 16MB. Ideally - * OST should tell the client what the biggest transaction size is, - * but it's good enough for now. - * - * This limitation doesn't apply to ldiskfs, which allows as many - * chunks in one RPC as we want. However, it won't have any benefits - * to have too many discontiguous pages in one RPC. - * - * An osc_extent won't cover over a RPC size, so the chunks in an - * osc_extent won't bigger than PTLRPC_MAX_BRW_SIZE >> chunkbits. 
- */ - return PTLRPC_MAX_BRW_SIZE >> cli->cl_chunkbits; -} - /** * In order to prevent multiple ptlrpcd from breaking contiguous extents, * get_write_extent() takes all appropriate extents in atomic. diff --git a/fs/lustre/osc/osc_internal.h b/fs/lustre/osc/osc_internal.h index 3ba209f..2cb737b 100644 --- a/fs/lustre/osc/osc_internal.h +++ b/fs/lustre/osc/osc_internal.h @@ -162,6 +162,36 @@ unsigned long osc_cache_shrink_count(struct shrinker *sk, unsigned long osc_cache_shrink_scan(struct shrinker *sk, struct shrink_control *sc); +static inline unsigned int osc_max_write_chunks(const struct client_obd *cli) +{ + /* + * LU-8135: + * + * The maximum size of a single transaction is about 64MB in ZFS. + * #define DMU_MAX_ACCESS (64 * 1024 * 1024) + * + * Since ZFS is a copy-on-write file system, a single dirty page in + * a chunk will result in the rewrite of the whole chunk, therefore + * an RPC shouldn't be allowed to contain too many chunks otherwise + * it will make transaction size much bigger than 64MB, especially + * with big block size for ZFS. + * + * This piece of code is to make sure that OSC won't send write RPCs + * with too many chunks. The maximum chunk size that an RPC can cover + * is set to PTLRPC_MAX_BRW_SIZE, which is defined to 16MB. Ideally + * OST should tell the client what the biggest transaction size is, + * but it's good enough for now. + * + * This limitation doesn't apply to ldiskfs, which allows as many + * chunks in one RPC as we want. However, it won't have any benefits + * to have too many discontiguous pages in one RPC. + * + * An osc_extent won't cover over a RPC size, so the chunks in an + * osc_extent won't bigger than PTLRPC_MAX_BRW_SIZE >> chunkbits. 
+ */ + return PTLRPC_MAX_BRW_SIZE >> cli->cl_chunkbits; +} + static inline void osc_set_io_portal(struct ptlrpc_request *req) { struct obd_import *imp = req->rq_import; diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 1485962..56f30cb 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -122,6 +122,9 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios, int result = 0; int brw_flags; unsigned int max_pages; + unsigned int ppc_bits; /* pages per chunk bits */ + unsigned int ppc; + bool sync_queue = false; LASSERT(qin->pl_nr > 0); @@ -130,6 +133,8 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios, osc = cl2osc(ios->cis_obj); cli = osc_cli(osc); max_pages = cli->cl_max_pages_per_rpc; + ppc_bits = cli->cl_chunkbits - PAGE_SHIFT; + ppc = 1 << ppc_bits; brw_flags = osc_io_srvlock(cl2osc_io(env, ios)) ? OBD_BRW_SRVLOCK : 0; brw_flags |= crt == CRT_WRITE ? OBD_BRW_WRITE : OBD_BRW_READ; @@ -186,12 +191,30 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios, else /* async IO */ cl_page_list_del(env, qin, page); - if (++queued == max_pages) { - queued = 0; + queued++; + if (queued == max_pages) { + sync_queue = true; + } else if (crt == CRT_WRITE) { + unsigned int chunks; + unsigned int next_chunks; + + chunks = (queued + ppc - 1) >> ppc_bits; + /* chunk number if add another page */ + next_chunks = (queued + ppc) >> ppc_bits; + + /* next page will excceed write chunk limit */ + if (chunks == osc_max_write_chunks(cli) && + next_chunks > chunks) + sync_queue = true; + } + + if (sync_queue) { result = osc_queue_sync_pages(env, io, osc, &list, brw_flags); if (result < 0) break; + queued = 0; + sync_queue = false; } } From patchwork Thu Feb 27 21:10:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410005 Return-Path: Received: from mail.kernel.org 
(pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A6C0C1580 for ; Thu, 27 Feb 2020 21:27:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8F4F8246A0 for ; Thu, 27 Feb 2020 21:27:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8F4F8246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4ECF13491CF; Thu, 27 Feb 2020 13:24:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ADF8F21FB0C for ; Thu, 27 Feb 2020 13:19:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 65E0D248A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 64D6C46D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:48 -0500 Message-Id: <1582838290-17243-181-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 180/622] lustre: osc: speed up page cache cleanup during blocking ASTs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 
Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Perepechko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andrew Perepechko While we are cleaning a write lock, we don't need to check if page cache pages under this lock are covered by another lock. If a client needs to give up its lock, cleaning gigabytes of page cache can take quite a long time. WC-bug-id: https://jira.whamcloud.com/browse/LU-11296 Lustre-commit: b9ebb17277c7 ("LU-11296 osc: speed up page cache cleanup during blocking ASTs") Signed-off-by: Andrew Perepechko Cray-bug-id: LUS-6352 Reviewed-by: Patrick Farrell Reviewed-by: Alexander Zarochentsev Reviewed-on: https://review.whamcloud.com/33090 Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_lock.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index 1a2b0bd..eccea37 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -372,7 +372,12 @@ static int osc_lock_flush(struct osc_object *obj, pgoff_t start, pgoff_t end, rc = 0; } - rc2 = osc_lock_discard_pages(env, obj, start, end, discard); + /* + * Do not try to match other locks with CLM_WRITE since we already + * know there're none + */ + rc2 = osc_lock_discard_pages(env, obj, start, end, + mode == CLM_WRITE || discard); if (rc == 0 && rc2 < 0) rc = rc2; From patchwork Thu Feb 27 21:10:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410009 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E786814E3 for ; Thu, 27 Feb 2020 21:27:47 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CE85E246A0 for ; Thu, 27 Feb 2020 21:27:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE85E246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 082C8348BCD; Thu, 27 Feb 2020 13:24:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F11EC21FB16 for ; Thu, 27 Feb 2020 13:19:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 692FA2498; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 67C42468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:49 -0500 Message-Id: <1582838290-17243-182-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 181/622] lustre: lmv: Fix style issues for lmv_fld.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/lmv/lmv_fld.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 72ee63625055 ("LU-6142 lmv: Fix style issues for lmv_fld.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/33566 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_fld.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/lustre/lmv/lmv_fld.c b/fs/lustre/lmv/lmv_fld.c index 00dc858..ef2c866 100644 --- a/fs/lustre/lmv/lmv_fld.c +++ b/fs/lustre/lmv/lmv_fld.c @@ -58,15 +58,17 @@ int lmv_fld_lookup(struct lmv_obd *lmv, const struct lu_fid *fid, u32 *mds) */ if (!fid_is_sane(fid) || !(fid_seq_in_fldb(fid_seq(fid)) || fid_seq_is_local_file(fid_seq(fid)))) { - CERROR("%s: invalid FID " DFID "\n", obd->obd_name, PFID(fid)); - return -EINVAL; + rc = -EINVAL; + CERROR("%s: invalid FID " DFID ": rc = %d\n", obd->obd_name, + PFID(fid), rc); + return rc; } rc = fld_client_lookup(&lmv->lmv_fld, fid_seq(fid), mds, LU_SEQ_RANGE_MDT, NULL); if (rc) { - CERROR("Error while looking for mds number. Seq %#llx, err = %d\n", - fid_seq(fid), rc); + CERROR("%s: Error while looking for mds number. 
Seq %#llx: rc = %d\n", + obd->obd_name, fid_seq(fid), rc); return rc; } @@ -74,9 +76,10 @@ int lmv_fld_lookup(struct lmv_obd *lmv, const struct lu_fid *fid, u32 *mds) *mds, PFID(fid)); if (*mds >= lmv->desc.ld_tgt_count) { - CERROR("FLD lookup got invalid mds #%x (max: %x) for fid=" DFID "\n", *mds, lmv->desc.ld_tgt_count, - PFID(fid)); rc = -EINVAL; + CERROR("%s: FLD lookup got invalid mds #%x (max: %x) for fid=" DFID ": rc = %d\n", + obd->obd_name, *mds, lmv->desc.ld_tgt_count, PFID(fid), + rc); } return rc; } From patchwork Thu Feb 27 21:10:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409965 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F33414E3 for ; Thu, 27 Feb 2020 21:26:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4437C246A0 for ; Thu, 27 Feb 2020 21:26:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4437C246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C4C83490B7; Thu, 27 Feb 2020 13:23:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 42D9921FC2B for ; Thu, 27 Feb 2020 13:19:13 -0800 (PST) Received: from star.ccs.ornl.gov 
(star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6C40E2499; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6AEFA46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:50 -0500 Message-Id: <1582838290-17243-183-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 182/622] lustre: llite: Fix style issues for llite_nfs.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/llite/llite_nfs.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: c648f5ddc3e8 ("LU-6142 llite: Fix style issues for llite_nfs.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/33809 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/llite/llite_nfs.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c index 434f92b..de8f707 100644 --- a/fs/lustre/llite/llite_nfs.c +++ b/fs/lustre/llite/llite_nfs.c @@ -64,12 +64,11 @@ struct inode *search_inode_for_lustre(struct super_block *sb, struct ptlrpc_request *req = NULL; struct inode *inode = NULL; int eadatalen = 0; - unsigned long hash = cl_fid_build_ino(fid, - ll_need_32bit_api(sbi)); + unsigned long hash = cl_fid_build_ino(fid, 
ll_need_32bit_api(sbi)); struct md_op_data *op_data; int rc; - CDEBUG(D_INFO, "searching inode for:(%lu," DFID ")\n", hash, PFID(fid)); + CDEBUG(D_INFO, "searching inode for:(%lu,"DFID")\n", hash, PFID(fid)); inode = ilookup5(sb, hash, ll_test_inode_by_fid, (void *)fid); if (inode) @@ -79,7 +78,8 @@ struct inode *search_inode_for_lustre(struct super_block *sb, if (rc) return ERR_PTR(rc); - /* Because inode is NULL, ll_prep_md_op_data can not + /* + * Because inode is NULL, ll_prep_md_op_data can not * be used here. So we allocate op_data ourselves */ op_data = kzalloc(sizeof(*op_data), GFP_NOFS); @@ -94,6 +94,10 @@ struct inode *search_inode_for_lustre(struct super_block *sb, rc = md_getattr(sbi->ll_md_exp, op_data, &req); kfree(op_data); if (rc) { + /* + * Suppress erroneous/confusing messages when NFS + * is out of sync and requests old data. + */ CDEBUG(D_INFO, "can't get object attrs, fid " DFID ", rc %d\n", PFID(fid), rc); return ERR_PTR(rc); @@ -107,8 +111,8 @@ struct inode *search_inode_for_lustre(struct super_block *sb, } struct lustre_nfs_fid { - struct lu_fid lnf_child; - struct lu_fid lnf_parent; + struct lu_fid lnf_child; + struct lu_fid lnf_parent; }; static struct dentry * From patchwork Thu Feb 27 21:10:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410013 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A64441580 for ; Thu, 27 Feb 2020 21:27:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8EF7B246A0 for ; Thu, 27 Feb 2020 21:27:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8EF7B246A0 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B35D349222; Thu, 27 Feb 2020 13:24:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 84A9221FC4C for ; Thu, 27 Feb 2020 13:19:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6F32D249A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6DC7746C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:51 -0500 Message-Id: <1582838290-17243-184-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 183/622] lustre: llite: Fix style issues for lcommon_misc.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/llite/lcommon_misc.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: aac46ee4f871 ("LU-6142 llite: Fix style issues for lcommon_misc.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/33810 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/llite/lcommon_misc.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/fs/lustre/llite/lcommon_misc.c b/fs/lustre/llite/lcommon_misc.c index 48503d6..d833a16 100644 --- a/fs/lustre/llite/lcommon_misc.c +++ b/fs/lustre/llite/lcommon_misc.c @@ -41,14 +41,17 @@ #include "llite_internal.h" -/* Initialize the default and maximum LOV EA and cookie sizes. This allows +/* + * Initialize the default and maximum LOV EA and cookie sizes. This allows * us to make MDS RPCs with large enough reply buffers to hold the * maximum-sized (= maximum striped) EA and cookie without having to * calculate this (via a call into the LOV + OSCs) each time we make an RPC. 
*/ static int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp) { - u32 val_size, max_easize, def_easize; + u32 val_size; + u32 max_easize; + u32 def_easize; int rc; val_size = sizeof(max_easize); @@ -83,9 +86,9 @@ int cl_ocd_update(struct obd_device *host, struct obd_device *watched, enum obd_notify_event ev, void *owner) { struct lustre_client_ocd *lco; - struct client_obd *cli; + struct client_obd *cli; u64 flags; - int result; + int result; if (!strcmp(watched->obd_type->typ_name, LUSTRE_OSC_NAME) && watched->obd_set_up && !watched->obd_stopping) { @@ -117,13 +120,13 @@ int cl_ocd_update(struct obd_device *host, struct obd_device *watched, int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, struct ll_grouplock *lg) { - struct lu_env *env; - struct cl_io *io; - struct cl_lock *lock; - struct cl_lock_descr *descr; - u32 enqflags; + struct lu_env *env; + struct cl_io *io; + struct cl_lock *lock; + struct cl_lock_descr *descr; + u32 enqflags; u16 refcheck; - int rc; + int rc; env = cl_env_get(&refcheck); if (IS_ERR(env)) From patchwork Thu Feb 27 21:10:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409969 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5359714BC for ; Thu, 27 Feb 2020 21:26:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3C202246A0 for ; Thu, 27 Feb 2020 21:26:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3C202246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B204B348AE5; Thu, 27 Feb 2020 13:23:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C653921FC4C for ; Thu, 27 Feb 2020 13:19:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 722F3249B; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7106D46D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:52 -0500 Message-Id: <1582838290-17243-185-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 184/622] lustre: llite: Fix style issues for symlink.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/llite/symlink.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: e486703b5278 ("LU-6142 llite: Fix style issues for symlink.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/33811 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/llite/symlink.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/fs/lustre/llite/symlink.c b/fs/lustre/llite/symlink.c index 0690fdb..d2922d1 100644 --- a/fs/lustre/llite/symlink.c +++ b/fs/lustre/llite/symlink.c @@ -53,7 +53,8 @@ static int ll_readlink_internal(struct inode *inode, int print_limit = min_t(int, PAGE_SIZE - 128, symlen); *symname = lli->lli_symlink_name; - /* If the total CDEBUG() size is larger than a page, it + /* + * If the total CDEBUG() size is larger than a page, it * will print a warning to the console, avoid this by * printing just the last part of the symlink. 
*/ @@ -97,11 +98,11 @@ static int ll_readlink_internal(struct inode *inode, } *symname = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_MD); - if (!*symname || - strnlen(*symname, symlen) != symlen - 1) { + if (!*symname || strnlen(*symname, symlen) != symlen - 1) { /* not full/NULL terminated */ - CERROR("inode %lu: symlink not NULL terminated string of length %d\n", - inode->i_ino, symlen - 1); + CERROR("%s: inode " DFID ": symlink not NULL terminated string of length %d\n", + ll_get_fsname(inode->i_sb, NULL, 0), + PFID(ll_inode2fid(inode)), symlen - 1); rc = -EPROTO; goto failed; } @@ -143,7 +144,8 @@ static const char *ll_get_link(struct dentry *dentry, return ERR_PTR(rc); } - /* symname may contain a pointer to the request message buffer, + /* + * symname may contain a pointer to the request message buffer, * we delay request releasing then. */ set_delayed_call(done, ll_put_link, request); From patchwork Thu Feb 27 21:10:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409973 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CD1671871 for ; Thu, 27 Feb 2020 21:26:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B5B84246A0 for ; Thu, 27 Feb 2020 21:26:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B5B84246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with 
ESMTP id 0B15F3490EC; Thu, 27 Feb 2020 13:23:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1340021FC50 for ; Thu, 27 Feb 2020 13:19:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 75214249C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 73FFD468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:53 -0500 Message-Id: <1582838290-17243-186-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 185/622] lustre: headers: define pct(a, b) once X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ben Evans pct is defined 6 times in different places. Define it in one. Also change it to a static inline to do a better job of enforcing types. 
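As a rough illustration of the type-enforcement point above (a userspace sketch, not part of the patch): the typedefs stand in for the kernel's u32/s64, and PCT_MACRO and pct_demo are hypothetical names used only here so that the old macro form and the new inline form can sit side by side.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's u32/s64 typedefs (assumption). */
typedef uint32_t u32;
typedef int64_t s64;

/*
 * Old form, shaped like the macro removed from lproc_llite.c. It is
 * renamed here (hypothetical name) so both forms can coexist. The
 * arguments are pasted in textually, unparenthesized and untyped.
 */
#define PCT_MACRO(a, b) (b ? a * 100 / b : 0)

/* New form, matching the helper added to lprocfs_status.h. */
static inline u32 pct(s64 a, s64 b)
{
	return b ? a * 100 / b : 0;
}

/* Hypothetical helper showing where the two forms differ. */
void pct_demo(void)
{
	/* Macro: expands to (4 ? 1 + 1 * 100 / 4 : 0), i.e. 1 + 25. */
	assert(PCT_MACRO(1 + 1, 4) == 26);

	/* Inline: arguments are evaluated first, so 2 * 100 / 4. */
	assert(pct(1 + 1, 4) == 50);

	/* A zero denominator yields 0 instead of dividing. */
	assert(pct(5, 0) == 0);
}
```

Because the inline version always returns a u32, callers can print the result with a single format specifier, which appears to be why the seq_printf() formats in this patch move from %lu to %u.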
WC-bug-id: https://jira.whamcloud.com/browse/LU-10171 Lustre-commit: 9b924e86b27d ("LU-10171 headers: define pct(a,b) once") Signed-off-by: Ben Evans Reviewed-on: https://review.whamcloud.com/29852 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/fld/fld_cache.c | 12 ++---------- fs/lustre/include/lprocfs_status.h | 5 +++++ fs/lustre/llite/lproc_llite.c | 7 +++---- fs/lustre/obdclass/genops.c | 6 +----- fs/lustre/osc/lproc_osc.c | 10 +++------- 5 files changed, 14 insertions(+), 26 deletions(-) diff --git a/fs/lustre/fld/fld_cache.c b/fs/lustre/fld/fld_cache.c index d289c29..96be544 100644 --- a/fs/lustre/fld/fld_cache.c +++ b/fs/lustre/fld/fld_cache.c @@ -94,22 +94,14 @@ struct fld_cache *fld_cache_init(const char *name, */ void fld_cache_fini(struct fld_cache *cache) { - u64 pct; - LASSERT(cache); fld_cache_flush(cache); - if (cache->fci_stat.fst_count > 0) { - pct = cache->fci_stat.fst_cache * 100; - do_div(pct, cache->fci_stat.fst_count); - } else { - pct = 0; - } - CDEBUG(D_INFO, "FLD cache statistics (%s):\n", cache->fci_name); CDEBUG(D_INFO, " Total reqs: %llu\n", cache->fci_stat.fst_count); CDEBUG(D_INFO, " Cache reqs: %llu\n", cache->fci_stat.fst_cache); - CDEBUG(D_INFO, " Cache hits: %llu%%\n", pct); + CDEBUG(D_INFO, " Cache hits: %u%%\n", + pct(cache->fci_stat.fst_cache, cache->fci_stat.fst_count)); kfree(cache); } diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 32d43fb..1ef548ae 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -58,6 +58,11 @@ struct lprocfs_vars { umode_t proc_mode; }; +static inline u32 pct(s64 a, s64 b) +{ + return b ? 
a * 100 / b : 0; +} + struct lprocfs_static_vars { struct lprocfs_vars *obd_vars; const struct attribute_group *sysfs_vars; diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 5fc7705..4060271 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1483,8 +1483,6 @@ void ll_debugfs_unregister_super(struct super_block *sb) lprocfs_free_stats(&sbi->ll_stats); } -#define pct(a, b) (b ? a * 100 / b : 0) - static void ll_display_extents_info(struct ll_rw_extents_info *io_extents, struct seq_file *seq, int which) { @@ -1508,8 +1506,9 @@ static void ll_display_extents_info(struct ll_rw_extents_info *io_extents, w = pp_info->pp_w_hist.oh_buckets[i]; read_cum += r; write_cum += w; - end = 1 << (i + LL_HIST_START - units); - seq_printf(seq, "%4lu%c - %4lu%c%c: %14lu %4lu %4lu | %14lu %4lu %4lu\n", + end = BIT(i + LL_HIST_START - units); + seq_printf(seq, + "%4lu%c - %4lu%c%c: %14lu %4u %4u | %14lu %4u %4u\n", start, *unitp, end, *unitp, (i == LL_HIST_MAX - 1) ? '+' : ' ', r, pct(r, read_tot), pct(read_cum, read_tot), diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 2254943..fd9dd96 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -1425,8 +1425,6 @@ int obd_set_max_mod_rpcs_in_flight(struct client_obd *cli, u16 max) } EXPORT_SYMBOL(obd_set_max_mod_rpcs_in_flight); -#define pct(a, b) (b ? 
(a * 100) / b : 0) - int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq) { unsigned long mod_tot = 0, mod_cum; @@ -1452,7 +1450,7 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq) unsigned long mod = cli->cl_mod_rpcs_hist.oh_buckets[i]; mod_cum += mod; - seq_printf(seq, "%d:\t\t%10lu %3lu %3lu\n", + seq_printf(seq, "%d:\t\t%10lu %3u %3u\n", i, mod, pct(mod, mod_tot), pct(mod_cum, mod_tot)); if (mod_cum == mod_tot) @@ -1464,8 +1462,6 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq) return 0; } EXPORT_SYMBOL(obd_mod_rpc_stats_seq_show); -#undef pct - /* * The number of modify RPCs sent in parallel is limited * because the server has a finite number of slots per client to diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index d9030b7..ac64724 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -780,8 +780,6 @@ static ssize_t grant_shrink_store(struct kobject *kobj, struct attribute *attr, { NULL } }; -#define pct(a, b) (b ? 
a * 100 / b : 0) - static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v) { struct timespec64 now; @@ -820,7 +818,7 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v) read_cum += r; write_cum += w; - seq_printf(seq, "%d:\t\t%10lu %3lu %3lu | %10lu %3lu %3lu\n", + seq_printf(seq, "%d:\t\t%10lu %3u %3u | %10lu %3u %3u\n", 1 << i, r, pct(r, read_tot), pct(read_cum, read_tot), w, pct(w, write_tot), @@ -844,7 +842,7 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v) read_cum += r; write_cum += w; - seq_printf(seq, "%d:\t\t%10lu %3lu %3lu | %10lu %3lu %3lu\n", + seq_printf(seq, "%d:\t\t%10lu %3u %3u | %10lu %3u %3u\n", i, r, pct(r, read_tot), pct(read_cum, read_tot), w, pct(w, write_tot), @@ -868,7 +866,7 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v) read_cum += r; write_cum += w; - seq_printf(seq, "%d:\t\t%10lu %3lu %3lu | %10lu %3lu %3lu\n", + seq_printf(seq, "%d:\t\t%10lu %3u %3u | %10lu %3u %3u\n", (i == 0) ? 0 : 1 << (i - 1), r, pct(r, read_tot), pct(read_cum, read_tot), w, pct(w, write_tot), pct(write_cum, write_tot)); @@ -881,8 +879,6 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v) return 0; } -#undef pct - static ssize_t osc_rpc_stats_seq_write(struct file *file, const char __user *buf, size_t len, loff_t *off) From patchwork Thu Feb 27 21:10:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409977 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5475514E3 for ; Thu, 27 Feb 2020 21:27:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3959E246A0 for ; Thu, 27 Feb 
2020 21:27:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3959E246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AB71D348B12; Thu, 27 Feb 2020 13:23:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 68EF521FC50 for ; Thu, 27 Feb 2020 13:19:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 78796249D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 76DA446F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:54 -0500 Message-Id: <1582838290-17243-187-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 186/622] lustre: obdclass: report all obd states for OBD_IOC_GETDEVICE X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The wrong state '--' is reported when the obd device is inactive. 
Reporting the "IN" state covers all the information that is provided by the 'devices' debugfs file. Now all the information from 'devices' can be collected from the lustre sysfs tree. WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: adfec49f334d ("LU-8066 obdclass: report all obd states for OBD_IOC_GETDEVICE") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33774 Reviewed-by: Ben Evans Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/obdclass/class_obd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c index 4ef9cca..0435f62 100644 --- a/fs/lustre/obdclass/class_obd.c +++ b/fs/lustre/obdclass/class_obd.c @@ -427,6 +427,8 @@ int class_handle_ioctl(unsigned int cmd, unsigned long arg) if (obd->obd_stopping) status = "ST"; + else if (obd->obd_inactive) + status = "IN"; else if (obd->obd_set_up) status = "UP"; else if (obd->obd_attached) From patchwork Thu Feb 27 21:10:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409981 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A20C614E3 for ; Thu, 27 Feb 2020 21:27:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8AB91246A0 for ; Thu, 27 Feb 2020 21:27:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8AB91246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost 
[IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A50FC21FF2F; Thu, 27 Feb 2020 13:23:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A964D21FC59 for ; Thu, 27 Feb 2020 13:19:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7D31D24AE; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7A2DE46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:55 -0500 Message-Id: <1582838290-17243-188-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 187/622] lustre: ldlm: remove trace from ldlm_pool_count() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: "John L. Hammond" The trace in ldlm_pool_count() is too noisy given its information value so remove it. WC-bug-id: https://jira.whamcloud.com/browse/LU-10862 Lustre-commit: 3fe2096dfc30 ("LU-10862 ldlm: remove trace from ldlm_pool_{count,skrink}()") Signed-off-by: John L. 
Hammond Reviewed-on: https://review.whamcloud.com/31820 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_pool.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c index d2149a6..b2b3ead 100644 --- a/fs/lustre/ldlm/ldlm_pool.c +++ b/fs/lustre/ldlm/ldlm_pool.c @@ -773,9 +773,6 @@ static unsigned long ldlm_pools_count(enum ldlm_side client, gfp_t gfp_mask) if (client == LDLM_NAMESPACE_CLIENT && !(gfp_mask & __GFP_FS)) return 0; - CDEBUG(D_DLMTRACE, "Request to count %s locks from all pools\n", - client == LDLM_NAMESPACE_CLIENT ? "client" : "server"); - /* * Find out how many resources we may release. */ From patchwork Thu Feb 27 21:10:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410241 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 05176138D for ; Thu, 27 Feb 2020 21:33:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E188824677 for ; Thu, 27 Feb 2020 21:33:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E188824677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0C31E348CB5; Thu, 27 Feb 2020 13:28:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EBC3E21FBA3 for ; Thu, 27 Feb 2020 13:19:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7EA3524AF; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7D51446C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:56 -0500 Message-Id: <1582838290-17243-189-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 188/622] lustre: ptlrpc: clean up rq_interpret_reply callbacks X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Clean up the function prototypes of several rq_interpret_reply callback functions to match the function pointer type instead of using typecasting to avoid the risk of bad function pointers. Clean up related code to match code style. 
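The pattern described above can be sketched in plain C as follows. This is a minimal userspace illustration under stated assumptions, not the Lustre code: struct request, interpret_t, struct setattr_args, run_interpret() and interpret_demo() are hypothetical stand-ins for struct ptlrpc_request, the interpret callback typedef, and the per-request async args, and the env parameter is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct request;

/* Hypothetical callback type: opaque args, like the patched prototypes. */
typedef int (*interpret_t)(struct request *req, void *args, int rc);

/* Hypothetical stand-in for a request carrying a callback and its args. */
struct request {
	interpret_t interpret;
	void *async_args;
};

/* Hypothetical per-request argument block. */
struct setattr_args {
	int calls;
	int last_rc;
};

/*
 * The callback matches interpret_t exactly and recovers its typed
 * arguments from the void pointer, so the assignment to the function
 * pointer below needs no cast.
 */
static int setattr_interpret(struct request *req, void *args, int rc)
{
	struct setattr_args *sa = args;

	(void)req;
	sa->calls++;
	sa->last_rc = rc;

	return rc;
}

/* Invoke the callback if one is set, otherwise pass rc through. */
int run_interpret(struct request *req, int rc)
{
	return req->interpret ?
	       req->interpret(req, req->async_args, rc) : rc;
}

/* Hypothetical driver: wires up one request and runs its callback. */
int interpret_demo(void)
{
	struct setattr_args sa = { 0, 0 };
	struct request req = { setattr_interpret, &sa };
	int rc = run_interpret(&req, -5);

	assert(sa.calls == 1);
	assert(sa.last_rc == -5);

	return rc;
}
```

With the prototype matching the pointer type, the compiler checks the assignment itself. With a cast in place, a callback whose signature drifts would still compile, and calling it through the mismatched pointer type is undefined behaviour in C.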
WC-bug-id: https://jira.whamcloud.com/browse/LU-11398 Lustre-commit: 4014ddbb2350 ("LU-11398 ptlrpc: clean up rq_interpret_reply callbacks") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33203 Reviewed-by: Arshad Hussain Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 11 +++++---- fs/lustre/mdc/mdc_dev.c | 10 ++++---- fs/lustre/osc/osc_io.c | 4 ++-- fs/lustre/osc/osc_request.c | 53 +++++++++++++++++++++++-------------------- fs/lustre/ptlrpc/client.c | 13 ++++------- fs/lustre/ptlrpc/import.c | 9 ++++---- 6 files changed, 50 insertions(+), 50 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 1afe9a5..b9e9ae9 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -826,8 +826,9 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, */ static int lock_convert_interpret(const struct lu_env *env, struct ptlrpc_request *req, - struct ldlm_async_args *aa, int rc) + void *args, int rc) { + struct ldlm_async_args *aa = args; struct ldlm_lock *lock; struct ldlm_reply *reply; @@ -1010,7 +1011,7 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) aa = ptlrpc_req_async_args(aa, req); ldlm_lock2handle(lock, &aa->lock_handle); - req->rq_interpret_reply = (ptlrpc_interpterer_t)lock_convert_interpret; + req->rq_interpret_reply = lock_convert_interpret; ptlrpcd_add_req(req); return 0; @@ -2117,9 +2118,9 @@ static int ldlm_chain_lock_for_replay(struct ldlm_lock *lock, void *closure) } static int replay_lock_interpret(const struct lu_env *env, - struct ptlrpc_request *req, - struct ldlm_async_args *aa, int rc) + struct ptlrpc_request *req, void *args, int rc) { + struct ldlm_async_args *aa = args; struct ldlm_lock *lock; struct ldlm_reply *reply; struct obd_export *exp; @@ -2234,7 +2235,7 @@ static int replay_one_lock(struct obd_import *imp, struct ldlm_lock *lock) 
atomic_inc(&req->rq_import->imp_replay_inflight); aa = ptlrpc_req_async_args(aa, req); aa->lock_handle = body->lock_handle[0]; - req->rq_interpret_reply = (ptlrpc_interpterer_t)replay_lock_interpret; + req->rq_interpret_reply = replay_lock_interpret; ptlrpcd_add_req(req); return 0; diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 21dc83e..306b917 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -602,8 +602,9 @@ int mdc_enqueue_fini(struct ptlrpc_request *req, osc_enqueue_upcall_f upcall, } int mdc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req, - struct osc_enqueue_args *aa, int rc) + void *args, int rc) { + struct osc_enqueue_args *aa = args; struct ldlm_lock *lock; struct lustre_handle *lockh = &aa->oa_lockh; enum ldlm_mode mode = aa->oa_mode; @@ -745,8 +746,7 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, aa->oa_flags = flags; aa->oa_lvb = lvb; - req->rq_interpret_reply = - (ptlrpc_interpterer_t)mdc_enqueue_interpret; + req->rq_interpret_reply = mdc_enqueue_interpret; ptlrpcd_add_req(req); } else { ptlrpc_req_finished(req); @@ -1121,9 +1121,9 @@ struct mdc_data_version_args { static int mdc_data_version_interpret(const struct lu_env *env, struct ptlrpc_request *req, - void *arg, int rc) + void *args, int rc) { - struct mdc_data_version_args *dva = arg; + struct mdc_data_version_args *dva = args; struct osc_io *oio = dva->dva_oio; const struct mdt_body *body; diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 56f30cb..76657f3 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -656,9 +656,9 @@ struct osc_data_version_args { static int osc_data_version_interpret(const struct lu_env *env, struct ptlrpc_request *req, - void *arg, int rc) + void *args, int rc) { - struct osc_data_version_args *dva = arg; + struct osc_data_version_args *dva = args; struct osc_io *oio = dva->dva_oio; const struct ost_body *body; diff --git 
a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 1fc7a57..ba84bd1 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -188,9 +188,9 @@ static int osc_setattr(const struct lu_env *env, struct obd_export *exp, } static int osc_setattr_interpret(const struct lu_env *env, - struct ptlrpc_request *req, - struct osc_setattr_args *sa, int rc) + struct ptlrpc_request *req, void *args, int rc) { + struct osc_setattr_args *sa = args; struct ost_body *body; if (rc != 0) @@ -236,8 +236,7 @@ int osc_setattr_async(struct obd_export *exp, struct obdo *oa, /* Do not wait for response. */ ptlrpcd_add_req(req); } else { - req->rq_interpret_reply = - (ptlrpc_interpterer_t)osc_setattr_interpret; + req->rq_interpret_reply = osc_setattr_interpret; sa = ptlrpc_req_async_args(sa, req); sa->sa_oa = oa; @@ -417,7 +416,7 @@ int osc_punch_send(struct obd_export *exp, struct obdo *oa, ptlrpc_request_set_replen(req); - req->rq_interpret_reply = (ptlrpc_interpterer_t)osc_setattr_interpret; + req->rq_interpret_reply = osc_setattr_interpret; sa = ptlrpc_req_async_args(sa, req); sa->sa_oa = oa; sa->sa_upcall = upcall; @@ -545,13 +544,13 @@ static int osc_resource_get_unused(struct obd_export *exp, struct obdo *oa, } static int osc_destroy_interpret(const struct lu_env *env, - struct ptlrpc_request *req, void *data, - int rc) + struct ptlrpc_request *req, void *args, int rc) { struct client_obd *cli = &req->rq_import->imp_obd->u.cli; atomic_dec(&cli->cl_destroy_in_flight); wake_up(&cli->cl_destroy_waitq); + return 0; } @@ -734,14 +733,14 @@ struct grant_thread_data { static int osc_shrink_grant_interpret(const struct lu_env *env, struct ptlrpc_request *req, - void *aa, int rc) + void *args, int rc) { + struct osc_brw_async_args *aa = args; struct client_obd *cli = &req->rq_import->imp_obd->u.cli; - struct obdo *oa = ((struct osc_brw_async_args *)aa)->aa_oa; struct ost_body *body; if (rc != 0) { - __osc_update_grant(cli, oa->o_grant); + 
__osc_update_grant(cli, aa->aa_oa->o_grant); goto out; } @@ -749,7 +748,8 @@ static int osc_shrink_grant_interpret(const struct lu_env *env, LASSERT(body); osc_update_grant(cli, body); out: - kmem_cache_free(osc_obdo_kmem, oa); + kmem_cache_free(osc_obdo_kmem, aa->aa_oa); + return rc; } @@ -1951,7 +1951,8 @@ static int osc_brw_redo_request(struct ptlrpc_request *request, request, oap->oap_request); } } - /* New request takes over pga and oaps from old request. + /* + * New request takes over pga and oaps from old request. * Note that copying a list_head doesn't work, need to move it... */ aa->aa_resends++; @@ -2034,9 +2035,9 @@ static void osc_release_ppga(struct brw_page **ppga, u32 count) } static int brw_interpret(const struct lu_env *env, - struct ptlrpc_request *req, void *data, int rc) + struct ptlrpc_request *req, void *args, int rc) { - struct osc_brw_async_args *aa = data; + struct osc_brw_async_args *aa = args; struct osc_extent *ext; struct osc_extent *tmp; struct client_obd *cli = aa->aa_cli; @@ -2044,7 +2045,8 @@ static int brw_interpret(const struct lu_env *env, rc = osc_brw_fini_request(req, rc); CDEBUG(D_INODE, "request %p aa %p rc %d\n", req, aa, rc); - /* When server return -EINPROGRESS, client should always retry + /* + * When server returns -EINPROGRESS, client should always retry * regardless of the number of times the bulk was resent already. 
*/ if (osc_recoverable_error(rc) && !req->rq_no_delay) { @@ -2425,8 +2427,9 @@ int osc_enqueue_fini(struct ptlrpc_request *req, osc_enqueue_upcall_f upcall, } int osc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req, - struct osc_enqueue_args *aa, int rc) + void *args, int rc) { + struct osc_enqueue_args *aa = args; struct ldlm_lock *lock; struct lustre_handle *lockh = &aa->oa_lockh; enum ldlm_mode mode = aa->oa_mode; @@ -2627,8 +2630,7 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, aa->oa_flags = NULL; } - req->rq_interpret_reply = - (ptlrpc_interpterer_t)osc_enqueue_interpret; + req->rq_interpret_reply = osc_enqueue_interpret; ptlrpc_set_add_req(rqset, req); } else if (intent) { ptlrpc_req_finished(req); @@ -2690,16 +2692,16 @@ int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id, static int osc_statfs_interpret(const struct lu_env *env, struct ptlrpc_request *req, - struct osc_async_args *aa, int rc) + void *args, int rc) { + struct osc_async_args *aa = args; struct obd_statfs *msfs; if (rc == -EBADR) - /* The request has in fact never been sent - * due to issues at a higher level (LOV). - * Exit immediately since the caller is - * aware of the problem and takes care - * of the clean up + /* The request has in fact never been sent due to + * issues at a higher level (LOV). 
Exit immediately + * since the caller is aware of the problem and takes + * care of the clean up */ return rc; @@ -2721,6 +2723,7 @@ static int osc_statfs_interpret(const struct lu_env *env, *aa->aa_oi->oi_osfs = *msfs; out: rc = aa->aa_oi->oi_cb_up(aa->aa_oi, rc); + return rc; } @@ -2759,7 +2762,7 @@ static int osc_statfs_async(struct obd_export *exp, req->rq_no_delay = 1; } - req->rq_interpret_reply = (ptlrpc_interpterer_t)osc_statfs_interpret; + req->rq_interpret_reply = osc_statfs_interpret; aa = ptlrpc_req_async_args(aa, req); aa->aa_oi = oinfo; diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index fabe675..ff212a3 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -2872,9 +2872,9 @@ int ptlrpc_queue_wait(struct ptlrpc_request *req) */ static int ptlrpc_replay_interpret(const struct lu_env *env, struct ptlrpc_request *req, - void *data, int rc) + void *args, int rc) { - struct ptlrpc_replay_async_args *aa = data; + struct ptlrpc_replay_async_args *aa = args; struct obd_import *imp = req->rq_import; atomic_dec(&imp->imp_replay_inflight); @@ -2993,10 +2993,7 @@ int ptlrpc_replay_req(struct ptlrpc_request *req) /* Re-adjust the timeout for current conditions */ ptlrpc_at_set_req_timeout(req); - /* - * Tell server the net_latency, so the server can calculate how long - * it should wait for next replay - */ + /* Tell server net_latency to calculate how long to wait for reply. 
*/ lustre_msg_set_service_time(req->rq_reqmsg, ptlrpc_at_get_net_latency(req)); DEBUG_REQ(D_HA, req, "REPLAY"); @@ -3252,9 +3249,9 @@ static void ptlrpcd_add_work_req(struct ptlrpc_request *req) } static int work_interpreter(const struct lu_env *env, - struct ptlrpc_request *req, void *data, int rc) + struct ptlrpc_request *req, void *args, int rc) { - struct ptlrpc_work_async_args *arg = data; + struct ptlrpc_work_async_args *arg = args; LASSERT(ptlrpcd_check_work(req)); diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index f59af80..867aff6 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -104,7 +104,7 @@ static void __import_set_state(struct obd_import *imp, static int ptlrpc_connect_interpret(const struct lu_env *env, struct ptlrpc_request *request, - void *data, int rc); + void *args, int rc); /* Only this function is allowed to change the import state when it is * CLOSED. I would rather refcount the import and free it after @@ -1263,11 +1263,10 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, */ static int completed_replay_interpret(const struct lu_env *env, struct ptlrpc_request *req, - void *data, int rc) + void *args, int rc) { atomic_dec(&req->rq_import->imp_replay_inflight); - if (req->rq_status == 0 && - !req->rq_import->imp_vbr_failed) { + if (req->rq_status == 0 && !req->rq_import->imp_vbr_failed) { ptlrpc_import_recovery_state_machine(req->rq_import); } else { if (req->rq_import->imp_vbr_failed) { @@ -1590,7 +1589,7 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, struct ptlrpc_request *req, - void *data, int rc) + void *args, int rc) { struct obd_import *imp = req->rq_import; int connect = 0; From patchwork Thu Feb 27 21:10:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410245 Return-Path: 
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83A4F92A for ; Thu, 27 Feb 2020 21:33:28 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6C5C824677 for ; Thu, 27 Feb 2020 21:33:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6C5C824677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BDBAF3492D5; Thu, 27 Feb 2020 13:28:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4C93521FBA3 for ; Thu, 27 Feb 2020 13:19:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8168E24B0; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 80400468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:57 -0500 Message-Id: <1582838290-17243-190-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 189/622] lustre: lov: quiet lov_dump_lmm_ console messages X-BeenThere: lustre-devel@lists.lustre.org 
X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Limit messages in lov_dump_lmm_objects() and lov_dump_lmm_common() printing to the console repeatedly when D_ERROR is used. Change CDEBUG() to CDEBUG_LIMIT() so that rate-limiting is applied. WC-bug-id: https://jira.whamcloud.com/browse/LU-11579 Lustre-commit: d9ef75eb8226 ("LU-11579 lov: quiet lov_dump_lmm_ console messages") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33513 Reviewed-by: Bobi Jam Reviewed-by: Patrick Farrell Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_pack.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c index 5f8b281..c6dec2d 100644 --- a/fs/lustre/lov/lov_pack.c +++ b/fs/lustre/lov/lov_pack.c @@ -55,13 +55,13 @@ void lov_dump_lmm_common(int level, void *lmmp) struct ost_id oi; lmm_oi_le_to_cpu(&oi, &lmm->lmm_oi); - CDEBUG(level, "objid " DOSTID ", magic 0x%08x, pattern %#x\n", - POSTID(&oi), le32_to_cpu(lmm->lmm_magic), - le32_to_cpu(lmm->lmm_pattern)); - CDEBUG(level, "stripe_size %u, stripe_count %u, layout_gen %u\n", - le32_to_cpu(lmm->lmm_stripe_size), - le16_to_cpu(lmm->lmm_stripe_count), - le16_to_cpu(lmm->lmm_layout_gen)); + CDEBUG_LIMIT(level, "objid " DOSTID ", magic 0x%08x, pattern %#x\n", + POSTID(&oi), le32_to_cpu(lmm->lmm_magic), + le32_to_cpu(lmm->lmm_pattern)); + CDEBUG_LIMIT(level, "stripe_size %u, stripe_count %u, layout_gen %u\n", + le32_to_cpu(lmm->lmm_stripe_size), + le16_to_cpu(lmm->lmm_stripe_count), + le16_to_cpu(lmm->lmm_layout_gen)); } static void lov_dump_lmm_objects(int level, struct lov_ost_data *lod, @@ -70,8 +70,9 @@ static void 
lov_dump_lmm_objects(int level, struct lov_ost_data *lod, int i; if (stripe_count > LOV_V1_INSANE_STRIPE_COUNT) { - CDEBUG(level, "bad stripe_count %u > max_stripe_count %u\n", - stripe_count, LOV_V1_INSANE_STRIPE_COUNT); + CDEBUG_LIMIT(level, + "bad stripe_count %u > max_stripe_count %u\n", + stripe_count, LOV_V1_INSANE_STRIPE_COUNT); return; } @@ -79,8 +80,8 @@ static void lov_dump_lmm_objects(int level, struct lov_ost_data *lod, struct ost_id oi; ostid_le_to_cpu(&lod->l_ost_oi, &oi); - CDEBUG(level, "stripe %u idx %u subobj " DOSTID "\n", i, - le32_to_cpu(lod->l_ost_idx), POSTID(&oi)); + CDEBUG_LIMIT(level, "stripe %u idx %u subobj " DOSTID "\n", i, + le32_to_cpu(lod->l_ost_idx), POSTID(&oi)); } } @@ -94,7 +95,8 @@ void lov_dump_lmm_v1(int level, struct lov_mds_md_v1 *lmm) void lov_dump_lmm_v3(int level, struct lov_mds_md_v3 *lmm) { lov_dump_lmm_common(level, lmm); - CDEBUG(level, "pool_name " LOV_POOLNAMEF "\n", lmm->lmm_pool_name); + CDEBUG_LIMIT(level, "pool_name " LOV_POOLNAMEF "\n", + lmm->lmm_pool_name); lov_dump_lmm_objects(level, lmm->lmm_objects, le16_to_cpu(lmm->lmm_stripe_count)); } From patchwork Thu Feb 27 21:10:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410023 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 93C6B138D for ; Thu, 27 Feb 2020 21:28:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7890B246A0 for ; Thu, 27 Feb 2020 21:28:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7890B246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 479BF3487A0; Thu, 27 Feb 2020 13:24:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B0C1021FBA3 for ; Thu, 27 Feb 2020 13:19:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8446224B1; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8337146D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:58 -0500 Message-Id: <1582838290-17243-191-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 190/622] lustre: lov: cl_cache could miss initialize X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng The cl_cache may be left uninitialized when a client is mounted with a deactivated OSC that is activated later.
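As a rough illustration of the ordering this fix relies on, the sketch below connects a target on first activation and only then hands it the shared cache. Everything here (demo_target, demo_connect, demo_activate, the connect_rc field) is a hypothetical stand-in, not the Lustre API; the real change uses obd_connect() followed by obd_set_info_async() with KEY_CACHE_SET.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one LOV target; not a Lustre structure. */
struct demo_target {
	int connected;       /* non-zero once the export exists */
	int connect_rc;      /* simulated result of the connect call */
	const void *cache;   /* shared client cache, NULL until set */
};

/* Simulated connect: succeeds unless connect_rc is non-zero. */
static int demo_connect(struct demo_target *tgt)
{
	if (tgt->connect_rc)
		return tgt->connect_rc;
	tgt->connected = 1;
	return 0;
}

/*
 * Activate a target.  A target that was inactive at mount time has
 * never been connected, so connect it now and pass it the shared
 * cache before it is used.  Errors are returned to the caller.
 */
static int demo_activate(struct demo_target *tgt, const void *shared_cache)
{
	int rc;

	assert(tgt != NULL);

	if (!tgt->connected) {
		rc = demo_connect(tgt);
		if (rc)
			return rc;
		tgt->cache = shared_cache;
	}
	return 0;
}
```

The point of the sketch is only the sequence: the cache is set in the same branch that makes up for the missed connect, so an initially inactive target cannot be activated without it.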
WC-bug-id: https://jira.whamcloud.com/browse/LU-11658 Lustre-commit: 42e83c44eb5a ("LU-11658 lov: cl_cache could miss initialize") Signed-off-by: Yang Sheng Reviewed-on: https://review.whamcloud.com/33650 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_obd.c | 46 +++++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index a16c663..08d7edc 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -360,23 +360,6 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid, tgt = lov->lov_tgts[index]; if (!tgt) continue; - /* - * LU-642, initially inactive OSC could miss the obd_connect, - * we make up for it here. - */ - if (ev == OBD_NOTIFY_ACTIVATE && !tgt->ltd_exp && - obd_uuid_equals(uuid, &tgt->ltd_uuid)) { - struct obd_uuid lov_osc_uuid = {"LOV_OSC_UUID"}; - - obd_connect(NULL, &tgt->ltd_exp, tgt->ltd_obd, - &lov_osc_uuid, &lov->lov_ocd, NULL); - } - if (!tgt->ltd_exp) - continue; - - CDEBUG(D_INFO, "lov idx %d is %s conn %#llx\n", - index, obd_uuid2str(&tgt->ltd_uuid), - tgt->ltd_exp->exp_handle.h_cookie); if (obd_uuid_equals(uuid, &tgt->ltd_uuid)) break; } @@ -389,6 +372,31 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid, if (ev == OBD_NOTIFY_DEACTIVATE || ev == OBD_NOTIFY_ACTIVATE) { activate = (ev == OBD_NOTIFY_ACTIVATE) ? 1 : 0; + /* + * LU-642, initially inactive OSC could miss the obd_connect, + * we make up for it here. 
+ */ + if (activate && !tgt->ltd_exp) { + int rc; + struct obd_uuid lov_osc_uuid = {"LOV_OSC_UUID"}; + + rc = obd_connect(NULL, &tgt->ltd_exp, tgt->ltd_obd, + &lov_osc_uuid, &lov->lov_ocd, NULL); + if (rc || !tgt->ltd_exp) { + index = rc; + goto out; + } + rc = obd_set_info_async(NULL, tgt->ltd_exp, + sizeof(KEY_CACHE_SET), + KEY_CACHE_SET, + sizeof(struct cl_client_cache), + lov->lov_cache, NULL); + if (rc < 0) { + index = rc; + goto out; + } + } + if (lov->lov_tgts[index]->ltd_activate == activate) { CDEBUG(D_INFO, "OSC %s already %sactivate!\n", uuid->uuid, activate ? "" : "de"); @@ -421,6 +429,10 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid, CERROR("Unknown event(%d) for uuid %s", ev, uuid->uuid); } + if (tgt->ltd_exp) + CDEBUG(D_INFO, "%s: lov idx %d conn %llx\n", obd_uuid2str(uuid), + index, tgt->ltd_exp->exp_handle.h_cookie); + out: lov_tgts_putref(obd); return index; From patchwork Thu Feb 27 21:10:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410027 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5DEC81580 for ; Thu, 27 Feb 2020 21:28:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 467BE246A0 for ; Thu, 27 Feb 2020 21:28:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 467BE246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id D95CE3492B9; Thu, 27 Feb 2020 13:24:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F2C7A21FC63 for ; Thu, 27 Feb 2020 13:19:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8836C24BA; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8610A46F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:10:59 -0500 Message-Id: <1582838290-17243-192-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 191/622] lnet: socklnd: improve scheduling algorithm X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Modify the scheduling algorithm to use all available scheduler threads. Previously, a connection was assigned a single thread and could only use that one. With this patch, any scheduler thread available on the assigned CPT can pick up and work on requests queued on the connection.
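The selection rule that results from having one scheduler block per CPT can be sketched as follows. This is a simplified model with hypothetical names (demo_sched, demo_choose_scheduler) and a plain array in place of the per-CPT allocation; it is meant to mirror the shape of ksocknal_choose_scheduler_locked() after this patch, not to reproduce it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one per-CPT scheduler block. */
struct demo_sched {
	int cpt;        /* CPT this scheduler serves */
	int nthreads;   /* threads currently servicing its queues */
};

/*
 * Pick the scheduler for a CPT.  If that CPT has no threads, fall back
 * to the first scheduler that does.  Returns NULL when no scheduler
 * has any threads at all.
 */
static struct demo_sched *demo_choose_scheduler(struct demo_sched *scheds,
						size_t ncpts, size_t cpt)
{
	size_t i;

	assert(scheds != NULL && cpt < ncpts);

	if (scheds[cpt].nthreads > 0)
		return &scheds[cpt];

	for (i = 0; i < ncpts; i++) {
		if (scheds[i].nthreads > 0)
			return &scheds[i];
	}

	return NULL;
}
```

Because a connection is now attached to the per-CPT block rather than to one thread's private queue, there is no longer a "least loaded thread" search; every thread bound to that CPT services the same lists.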
WC-bug-id: https://jira.whamcloud.com/browse/LU-11415 Lustre-commit: 89df5e712ffd ("LU-11415 socklnd: improve scheduling algorithm") Reviewed-on: https://review.whamcloud.com/33740 Signed-off-by: Amir Shehata Reviewed-by: Jinshan Xiong Reviewed-by: Olaf Weber Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 156 +++++++++++++----------------------- net/lnet/klnds/socklnd/socklnd.h | 18 ++--- net/lnet/klnds/socklnd/socklnd_cb.c | 8 +- 3 files changed, 65 insertions(+), 117 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index ba5623a..8b283ac 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -648,34 +648,21 @@ struct ksock_peer * static struct ksock_sched * ksocknal_choose_scheduler_locked(unsigned int cpt) { - struct ksock_sched_info *info = ksocknal_data.ksnd_sched_info[cpt]; - struct ksock_sched *sched; + struct ksock_sched *sched = ksocknal_data.ksnd_schedulers[cpt]; int i; - if (info->ksi_nthreads == 0) { - cfs_percpt_for_each(info, i, ksocknal_data.ksnd_sched_info) { - if (info->ksi_nthreads > 0) { + if (sched->kss_nthreads == 0) { + cfs_percpt_for_each(sched, i, ksocknal_data.ksnd_schedulers) { + if (sched->kss_nthreads > 0) { CDEBUG(D_NET, "scheduler[%d] has no threads. selected scheduler[%d]\n", - cpt, info->ksi_cpt); - goto select_sched; + cpt, sched->kss_cpt); + return sched; } } return NULL; } -select_sched: - sched = &info->ksi_scheds[0]; - /* - * NB: it's safe so far, but info->ksi_nthreads could be changed - * at runtime when we have dynamic LNet configuration, then we - * need to take care of this. - */ - for (i = 1; i < info->ksi_nthreads; i++) { - if (sched->kss_nconns > info->ksi_scheds[i].kss_nconns) - sched = &info->ksi_scheds[i]; - } - return sched; } @@ -1276,7 +1263,7 @@ struct ksock_peer * * The cpt might have changed if we ended up selecting a non cpt * native scheduler. 
So use the scheduler's cpt instead. */ - cpt = sched->kss_info->ksi_cpt; + cpt = sched->kss_cpt; sched->kss_nconns++; conn->ksnc_scheduler = sched; @@ -1316,11 +1303,11 @@ struct ksock_peer * * (b) normal I/O on the conn is blocked until I setup and call the * socket callbacks. */ - CDEBUG(D_NET, "New conn %s p %d.x %pI4h -> %pI4h/%d incarnation:%lld sched[%d:%d]\n", + CDEBUG(D_NET, + "New conn %s p %d.x %pI4h -> %pI4h/%d incarnation:%lld sched[%d]\n", libcfs_id2str(peerid), conn->ksnc_proto->pro_version, &conn->ksnc_myipaddr, &conn->ksnc_ipaddr, - conn->ksnc_port, incarnation, cpt, - (int)(sched - &sched->kss_info->ksi_scheds[0])); + conn->ksnc_port, incarnation, cpt); if (active) { /* additional routes after interface exchange? */ @@ -2209,7 +2196,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) data->ioc_u32[1] = conn->ksnc_port; data->ioc_u32[2] = conn->ksnc_myipaddr; data->ioc_u32[3] = conn->ksnc_type; - data->ioc_u32[4] = conn->ksnc_scheduler->kss_info->ksi_cpt; + data->ioc_u32[4] = conn->ksnc_scheduler->kss_cpt; data->ioc_u32[5] = rxmem; data->ioc_u32[6] = conn->ksnc_peer->ksnp_id.pid; ksocknal_conn_decref(conn); @@ -2248,14 +2235,8 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) { LASSERT(!atomic_read(&ksocknal_data.ksnd_nactive_txs)); - if (ksocknal_data.ksnd_sched_info) { - struct ksock_sched_info *info; - int i; - - cfs_percpt_for_each(info, i, ksocknal_data.ksnd_sched_info) - kfree(info->ksi_scheds); - cfs_percpt_free(ksocknal_data.ksnd_sched_info); - } + if (ksocknal_data.ksnd_schedulers) + cfs_percpt_free(ksocknal_data.ksnd_schedulers); kvfree(ksocknal_data.ksnd_peers); @@ -2282,10 +2263,8 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) static void ksocknal_base_shutdown(void) { - struct ksock_sched_info *info; struct ksock_sched *sched; int i; - int j; LASSERT(!ksocknal_data.ksnd_nnets); @@ -2305,22 +2284,14 @@ static int ksocknal_push(struct lnet_ni *ni, struct 
lnet_process_id id) LASSERT(list_empty(&ksocknal_data.ksnd_connd_connreqs)); LASSERT(list_empty(&ksocknal_data.ksnd_connd_routes)); - if (ksocknal_data.ksnd_sched_info) { - cfs_percpt_for_each(info, i, - ksocknal_data.ksnd_sched_info) { - if (!info->ksi_scheds) - continue; + if (ksocknal_data.ksnd_schedulers) { + cfs_percpt_for_each(sched, i, + ksocknal_data.ksnd_schedulers) { - for (j = 0; j < info->ksi_nthreads_max; j++) { - sched = &info->ksi_scheds[j]; - LASSERT(list_empty( - &sched->kss_tx_conns)); - LASSERT(list_empty( - &sched->kss_rx_conns)); - LASSERT(list_empty( - &sched->kss_zombie_noop_txs)); - LASSERT(!sched->kss_nconns); - } + LASSERT(list_empty(&sched->kss_tx_conns)); + LASSERT(list_empty(&sched->kss_rx_conns)); + LASSERT(list_empty(&sched->kss_zombie_noop_txs)); + LASSERT(!sched->kss_nconns); } } @@ -2329,17 +2300,10 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) wake_up_all(&ksocknal_data.ksnd_connd_waitq); wake_up_all(&ksocknal_data.ksnd_reaper_waitq); - if (ksocknal_data.ksnd_sched_info) { - cfs_percpt_for_each(info, i, - ksocknal_data.ksnd_sched_info) { - if (!info->ksi_scheds) - continue; - - for (j = 0; j < info->ksi_nthreads_max; j++) { - sched = &info->ksi_scheds[j]; + if (ksocknal_data.ksnd_schedulers) { + cfs_percpt_for_each(sched, i, + ksocknal_data.ksnd_schedulers) wake_up_all(&sched->kss_waitq); - } - } } i = 4; @@ -2367,7 +2331,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) static int ksocknal_base_startup(void) { - struct ksock_sched_info *info; + struct ksock_sched *sched; int rc; int i; @@ -2409,15 +2373,18 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) ksocknal_data.ksnd_init = SOCKNAL_INIT_DATA; try_module_get(THIS_MODULE); - ksocknal_data.ksnd_sched_info = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(*info)); - if (!ksocknal_data.ksnd_sched_info) + /* Create a scheduler block per available CPT */ + ksocknal_data.ksnd_schedulers = 
cfs_percpt_alloc(lnet_cpt_table(), + sizeof(*sched)); + if (!ksocknal_data.ksnd_schedulers) goto failed; - cfs_percpt_for_each(info, i, ksocknal_data.ksnd_sched_info) { - struct ksock_sched *sched; + cfs_percpt_for_each(sched, i, ksocknal_data.ksnd_schedulers) { int nthrs; + /* make sure not to allocate more threads than there are + * cores/CPUs in the CPT + */ nthrs = cfs_cpt_weight(lnet_cpt_table(), i); if (*ksocknal_tunables.ksnd_nscheds > 0) { nthrs = min(nthrs, *ksocknal_tunables.ksnd_nscheds); @@ -2429,27 +2396,14 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) nthrs = min(max(SOCKNAL_NSCHEDS, nthrs >> 1), nthrs); } - info->ksi_nthreads_max = nthrs; - info->ksi_cpt = i; - - if (nthrs == 0) - continue; - - info->ksi_scheds = kzalloc_cpt(info->ksi_nthreads_max * sizeof(*sched), - GFP_NOFS, i); - if (!info->ksi_scheds) - goto failed; - - for (; nthrs > 0; nthrs--) { - sched = &info->ksi_scheds[nthrs - 1]; + sched->kss_nthreads_max = nthrs; + sched->kss_cpt = i; - sched->kss_info = info; - spin_lock_init(&sched->kss_lock); - INIT_LIST_HEAD(&sched->kss_rx_conns); - INIT_LIST_HEAD(&sched->kss_tx_conns); - INIT_LIST_HEAD(&sched->kss_zombie_noop_txs); - init_waitqueue_head(&sched->kss_waitq); - } + spin_lock_init(&sched->kss_lock); + INIT_LIST_HEAD(&sched->kss_rx_conns); + INIT_LIST_HEAD(&sched->kss_tx_conns); + INIT_LIST_HEAD(&sched->kss_zombie_noop_txs); + init_waitqueue_head(&sched->kss_waitq); } ksocknal_data.ksnd_connd_starting = 0; @@ -2646,37 +2600,35 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) } static int -ksocknal_start_schedulers(struct ksock_sched_info *info) +ksocknal_start_schedulers(struct ksock_sched *sched) { int nthrs; int rc = 0; int i; - if (!info->ksi_nthreads) { + if (sched->kss_nthreads == 0) { if (*ksocknal_tunables.ksnd_nscheds > 0) { - nthrs = info->ksi_nthreads_max; + nthrs = sched->kss_nthreads_max; } else { nthrs = cfs_cpt_weight(lnet_cpt_table(), - info->ksi_cpt); + 
sched->kss_cpt); nthrs = min(max(SOCKNAL_NSCHEDS, nthrs >> 1), nthrs); nthrs = min(SOCKNAL_NSCHEDS_HIGH, nthrs); } - nthrs = min(nthrs, info->ksi_nthreads_max); + nthrs = min(nthrs, sched->kss_nthreads_max); } else { - LASSERT(info->ksi_nthreads <= info->ksi_nthreads_max); + LASSERT(sched->kss_nthreads <= sched->kss_nthreads_max); /* increase two threads if there is new interface */ - nthrs = min(2, info->ksi_nthreads_max - info->ksi_nthreads); + nthrs = min(2, sched->kss_nthreads_max - sched->kss_nthreads); } for (i = 0; i < nthrs; i++) { long id; char name[20]; - struct ksock_sched *sched; - id = KSOCK_THREAD_ID(info->ksi_cpt, info->ksi_nthreads + i); - sched = &info->ksi_scheds[KSOCK_THREAD_SID(id)]; + id = KSOCK_THREAD_ID(sched->kss_cpt, sched->kss_nthreads + i); snprintf(name, sizeof(name), "socknal_sd%02d_%02d", - info->ksi_cpt, (int)(sched - &info->ksi_scheds[0])); + sched->kss_cpt, (int)KSOCK_THREAD_SID(id)); rc = ksocknal_thread_start(ksocknal_scheduler, (void *)id, name); @@ -2684,11 +2636,11 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) continue; CERROR("Can't spawn thread %d for scheduler[%d]: %d\n", - info->ksi_cpt, info->ksi_nthreads + i, rc); + sched->kss_cpt, (int)KSOCK_THREAD_SID(id), rc); break; } - info->ksi_nthreads += i; + sched->kss_nthreads += i; return rc; } @@ -2703,16 +2655,16 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) return -EINVAL; for (i = 0; i < ncpts; i++) { - struct ksock_sched_info *info; + struct ksock_sched *sched; int cpt = !cpts ? 
i : cpts[i]; LASSERT(cpt < cfs_cpt_number(lnet_cpt_table())); - info = ksocknal_data.ksnd_sched_info[cpt]; + sched = ksocknal_data.ksnd_schedulers[cpt]; - if (!newif && info->ksi_nthreads > 0) + if (!newif && sched->kss_nthreads > 0) continue; - rc = ksocknal_start_schedulers(info); + rc = ksocknal_start_schedulers(sched); if (rc) return rc; } diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index c8d8acf..2e292f0 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -74,8 +74,7 @@ # define SOCKNAL_RISK_KMAP_DEADLOCK 1 #endif -struct ksock_sched_info; - +/* per scheduler state */ struct ksock_sched { /* per scheduler state */ spinlock_t kss_lock; /* serialise */ struct list_head kss_rx_conns; /* conn waiting to be read */ @@ -85,15 +84,14 @@ struct ksock_sched { /* per scheduler state */ int kss_nconns; /* # connections assigned to * this scheduler */ - struct ksock_sched_info *kss_info; /* owner of it */ + /* max allowed threads */ + int kss_nthreads_max; + /* number of threads */ + int kss_nthreads; + /* CPT id */ + int kss_cpt; }; -struct ksock_sched_info { - int ksi_nthreads_max; /* max allowed threads */ - int ksi_nthreads; /* number of threads */ - int ksi_cpt; /* CPT id */ - struct ksock_sched *ksi_scheds; /* array of schedulers */ -}; #define KSOCK_CPT_SHIFT 16 #define KSOCK_THREAD_ID(cpt, sid) (((cpt) << KSOCK_CPT_SHIFT) | (sid)) @@ -197,7 +195,7 @@ struct ksock_nal_data { int ksnd_nthreads; /* # live threads */ int ksnd_shuttingdown; /* tell threads to exit */ - struct ksock_sched_info **ksnd_sched_info; /* schedulers info */ + struct ksock_sched **ksnd_schedulers; /* schedulers info */ atomic_t ksnd_nactive_txs; /* #active txs */ diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index abb3529..581f734 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -1349,7 +1349,6 @@ struct ksock_route * int 
ksocknal_scheduler(void *arg) { - struct ksock_sched_info *info; struct ksock_sched *sched; struct ksock_conn *conn; struct ksock_tx *tx; @@ -1357,13 +1356,12 @@ int ksocknal_scheduler(void *arg) int nloops = 0; long id = (long)arg; - info = ksocknal_data.ksnd_sched_info[KSOCK_THREAD_CPT(id)]; - sched = &info->ksi_scheds[KSOCK_THREAD_SID(id)]; + sched = ksocknal_data.ksnd_schedulers[KSOCK_THREAD_CPT(id)]; - rc = cfs_cpt_bind(lnet_cpt_table(), info->ksi_cpt); + rc = cfs_cpt_bind(lnet_cpt_table(), sched->kss_cpt); if (rc) { CWARN("Can't set CPU partition affinity to %d: %d\n", - info->ksi_cpt, rc); + sched->kss_cpt, rc); } spin_lock_bh(&sched->kss_lock); From patchwork Thu Feb 27 21:11:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410125 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DF54C138D for ; Thu, 27 Feb 2020 21:30:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C826F20801 for ; Thu, 27 Feb 2020 21:30:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C826F20801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D050934970A; Thu, 27 Feb 2020 13:26:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 58C2221FA80 for ; Thu, 27 Feb 2020 13:19:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8B9F024BB; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8918E46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:00 -0500 Message-Id: <1582838290-17243-193-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 192/622] lustre: ldlm: Adjust search_* functions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The search_itree and search_queue functions should both return either a pointer to a found lock or NULL. Currently, search_itree just returns the contents of data->lmd_lock, whether or not a lock was found. search_queue will do the same under certain circumstances. Zero lmd_lock in both search_* functions, and also stop searching in search_itree once a lock is found.
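The contract described above (clear the result slot first, return on the first hit, otherwise return NULL) can be illustrated with a minimal userspace sketch. It is not the ldlm code: demo_lock, demo_match_data, demo_lock_matches, demo_search and run_search_demo are hypothetical stand-ins for struct ldlm_lock, the match data holding lmd_lock, lock_matches() and the search_* functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ldlm_lock. */
struct demo_lock {
	int mode;
};

/* Hypothetical stand-in for the match data; 'found' plays the role of lmd_lock. */
struct demo_match_data {
	int wanted_mode;
	struct demo_lock *found;
};

/* Record the lock in data->found and return 1 on a match, 0 otherwise. */
static int demo_lock_matches(struct demo_lock *lock,
			     struct demo_match_data *data)
{
	if (lock->mode != data->wanted_mode)
		return 0;
	data->found = lock;
	return 1;
}

/* Return the first matching lock, or NULL if none matches. */
static struct demo_lock *demo_search(struct demo_lock *locks, size_t n,
				     struct demo_match_data *data)
{
	size_t i;

	/* Drop any stale result left over from an earlier search. */
	data->found = NULL;

	for (i = 0; i < n; i++)
		if (demo_lock_matches(&locks[i], data))
			return data->found;	/* stop at the first hit */

	return NULL;
}

/* Small self-check: a hit returns the lock, a miss returns NULL. */
static int run_search_demo(void)
{
	struct demo_lock locks[] = { { 1 }, { 2 }, { 3 } };
	struct demo_lock stale = { 9 };
	struct demo_match_data data = { .wanted_mode = 2, .found = &stale };

	assert(demo_search(locks, 3, &data) == &locks[1]);

	data.wanted_mode = 7;
	data.found = &stale;
	assert(demo_search(locks, 3, &data) == NULL);
	assert(data.found == NULL);	/* stale pointer was cleared */

	return 0;
}
```

Without the initial reset, the second search in run_search_demo() would leave the stale pointer in data.found, which is the kind of leftover value the commit message says search_itree could return.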
cray-bug-id: LUS-6783 WC-bug-id: https://jira.whamcloud.com/browse/LU-11719 Lustre-commit: a231148843bd ("LU-11719 ldlm: Adjust search_* functions") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/33754 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lock.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index b9771ef..06690a6 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -1159,6 +1159,8 @@ static struct ldlm_lock *search_itree(struct ldlm_resource *res, { int idx; + data->lmd_lock = NULL; + for (idx = 0; idx < LCK_MODE_NUM; idx++) { struct ldlm_interval_tree *tree = &res->lr_itree[idx]; @@ -1172,11 +1174,14 @@ static struct ldlm_lock *search_itree(struct ldlm_resource *res, data->lmd_policy->l_extent.start, data->lmd_policy->l_extent.end, lock_matches, data); + if (data->lmd_lock) + return data->lmd_lock; } - return data->lmd_lock; + + return NULL; } -/** +/* * Search for a lock with given properties in a queue. 
* * @queue search for a lock in this queue @@ -1189,9 +1194,12 @@ static struct ldlm_lock *search_queue(struct list_head *queue, { struct ldlm_lock *lock; + data->lmd_lock = NULL; + list_for_each_entry(lock, queue, l_res_link) if (lock_matches(lock, data)) return data->lmd_lock; + return NULL; } From patchwork Thu Feb 27 21:11:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410271 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8333E92A for ; Thu, 27 Feb 2020 21:33:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6BE1324677 for ; Thu, 27 Feb 2020 21:33:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6BE1324677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A109C349DD0; Thu, 27 Feb 2020 13:28:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9F76621FB94 for ; Thu, 27 Feb 2020 13:19:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8D87D24BC; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8C1D746C; Thu, 27 Feb 2020 16:18:15 
-0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:01 -0500 Message-Id: <1582838290-17243-194-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 193/622] lustre: sysfs: make ping sysfs file read and writable X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Starting with 4.15 kernels, any read-only sysfs file is limited to root access. To retain the ability for non-root users to detect whether a remote server is alive using the 'ping' sysfs file, we need to make it writable. Retain the read ability so older tools will still work.
WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: 6bbae72c6900 ("LU-8066 sysfs: make ping sysfs file read and writable") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33776 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 3 ++- fs/lustre/mdc/lproc_mdc.c | 2 +- fs/lustre/mgc/lproc_mgc.c | 2 +- fs/lustre/osc/lproc_osc.c | 2 +- fs/lustre/ptlrpc/lproc_ptlrpc.c | 9 +++++++++ 5 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 1ef548ae..c1079f1 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -462,7 +462,8 @@ int lprocfs_wr_uint(struct file *file, const char __user *buffer, struct adaptive_timeout; int lprocfs_at_hist_helper(struct seq_file *m, struct adaptive_timeout *at); int lprocfs_rd_timeouts(struct seq_file *m, void *data); - +ssize_t ping_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count); ssize_t ping_show(struct kobject *kobj, struct attribute *attr, char *buffer); diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 746dd21..70c9eaf 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -306,7 +306,7 @@ static ssize_t max_mod_rpcs_in_flight_store(struct kobject *kobj, LUSTRE_ATTR(mds_conn_uuid, 0444, conn_uuid_show, NULL); LUSTRE_RO_ATTR(conn_uuid); -LUSTRE_RO_ATTR(ping); +LUSTRE_RW_ATTR(ping); static ssize_t mdc_rpc_stats_seq_write(struct file *file, const char __user *buf, diff --git a/fs/lustre/mgc/lproc_mgc.c b/fs/lustre/mgc/lproc_mgc.c index 676d479..0c716df 100644 --- a/fs/lustre/mgc/lproc_mgc.c +++ b/fs/lustre/mgc/lproc_mgc.c @@ -69,7 +69,7 @@ struct lprocfs_vars lprocfs_mgc_obd_vars[] = { LUSTRE_ATTR(mgs_conn_uuid, 0444, conn_uuid_show, NULL); LUSTRE_RO_ATTR(conn_uuid); -LUSTRE_RO_ATTR(ping); +LUSTRE_RW_ATTR(ping); static struct 
attribute *mgc_attrs[] = { &lustre_attr_mgs_conn_uuid.attr, diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index ac64724..ea67d20 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -176,7 +176,7 @@ static ssize_t max_dirty_mb_store(struct kobject *kobj, LUSTRE_ATTR(ost_conn_uuid, 0444, conn_uuid_show, NULL); LUSTRE_RO_ATTR(conn_uuid); -LUSTRE_RO_ATTR(ping); +LUSTRE_RW_ATTR(ping); static int osc_cached_mb_seq_show(struct seq_file *m, void *v) { diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index eb0ecc0..700e109 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -1234,6 +1234,7 @@ void ptlrpc_lprocfs_unregister_obd(struct obd_device *obd) } EXPORT_SYMBOL(ptlrpc_lprocfs_unregister_obd); +/* Kept for older tools */ ssize_t ping_show(struct kobject *kobj, struct attribute *attr, char *buffer) { @@ -1260,6 +1261,14 @@ ssize_t ping_show(struct kobject *kobj, struct attribute *attr, } EXPORT_SYMBOL(ping_show); +ssize_t ping_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t count) +{ + return ping_show(kobj, attr, (char *)buffer); +} +EXPORT_SYMBOL(ping_store); + + #undef BUFLEN /* Write the connection UUID to this file to attempt to connect to that node. 
From patchwork Thu Feb 27 21:11:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410249 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31497138D for ; Thu, 27 Feb 2020 21:33:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 19D4924677 for ; Thu, 27 Feb 2020 21:33:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 19D4924677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB0F4349370; Thu, 27 Feb 2020 13:28:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0368D21FB94 for ; Thu, 27 Feb 2020 13:19:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 90BCC24BD; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8F469468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:02 -0500 Message-Id: <1582838290-17243-195-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 194/622] lustre: ptlrpc: connect vs import invalidate race X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh Connect can't be sent while import invalidate is in progress, thus it leaves the import in not initialized state. Don't allow reconnect in evicted state. Cray-bug-id: LUS-6322 WC-bug-id: https://jira.whamcloud.com/browse/LU-7558 Lustre-commit: b1827ff1da82 ("LU-7558 ptlrpc: connect vs import invalidate race") Signed-off-by: Andriy Skulysh Reviewed-by: Alexander Boyko Reviewed-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/33718 Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/ptlrpc/import.c | 6 ++++++ fs/lustre/ptlrpc/recover.c | 2 ++ 3 files changed, 9 insertions(+) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index c2db38f..5ff270a 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -353,6 +353,7 @@ #define OBD_FAIL_PTLRPC_LONG_REQ_UNLINK 0x51b #define OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK 0x51c #define OBD_FAIL_PTLRPC_BULK_ATTACH 0x521 +#define OBD_FAIL_PTLRPC_CONNECT_RACE 0x531 #define OBD_FAIL_OBD_PING_NET 0x600 /* OBD_FAIL_OBD_LOG_CANCEL_NET 0x601 obsolete since 1.5 */ diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 867aff6..df6c459 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -38,6 +38,7 @@ #define DEBUG_SUBSYSTEM S_RPC #include +#include #include #include #include @@ -273,6 +274,10 @@ void ptlrpc_invalidate_import(struct 
obd_import *imp) if (!imp->imp_invalid || imp->imp_obd->obd_no_recov) ptlrpc_deactivate_import(imp); + if (OBD_FAIL_PRECHECK(OBD_FAIL_PTLRPC_CONNECT_RACE)) { + OBD_RACE(OBD_FAIL_PTLRPC_CONNECT_RACE); + msleep(10 * MSEC_PER_SEC); + } CFS_FAIL_TIMEOUT(OBD_FAIL_MGS_CONNECT_NET, 3 * cfs_fail_val / 2); LASSERT(imp->imp_invalid); @@ -615,6 +620,7 @@ int ptlrpc_connect_import(struct obd_import *imp) CERROR("already connected\n"); return 0; } else if (imp->imp_state == LUSTRE_IMP_CONNECTING || + imp->imp_state == LUSTRE_IMP_EVICTED || imp->imp_connected) { spin_unlock(&imp->imp_lock); CERROR("already connecting\n"); diff --git a/fs/lustre/ptlrpc/recover.c b/fs/lustre/ptlrpc/recover.c index 7c09c4e..ceab288 100644 --- a/fs/lustre/ptlrpc/recover.c +++ b/fs/lustre/ptlrpc/recover.c @@ -339,6 +339,8 @@ int ptlrpc_recover_import(struct obd_import *imp, char *new_uuid, int async) if (rc) goto out; + OBD_RACE(OBD_FAIL_PTLRPC_CONNECT_RACE); + rc = ptlrpc_connect_import(imp); if (rc) goto out; From patchwork Thu Feb 27 21:11:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410031 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F689138D for ; Thu, 27 Feb 2020 21:28:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 769AC246A0 for ; Thu, 27 Feb 2020 21:28:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 769AC246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0608B34935A; Thu, 27 Feb 2020 13:24:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 446DF21FC4C for ; Thu, 27 Feb 2020 13:19:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 93CCC2AC7; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9216A46D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:03 -0500 Message-Id: <1582838290-17243-196-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 195/622] lustre: ptlrpc: always unregister bulk X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang In ptlrpc_check_set, the bulk should be unregistered before ptl_send_rpc in any case. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11647 Lustre-commit: 21c53b18a1bc ("LU-11647 ptlrpc: always unregister bulk") Signed-off-by: Hongchao Zhang Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/22378 Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index ff212a3..f57ec1883 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1902,9 +1902,6 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) spin_lock(&req->rq_lock); req->rq_resend = 1; spin_unlock(&req->rq_lock); - if (req->rq_bulk && - !ptlrpc_unregister_bulk(req, 1)) - continue; } /* * rq_wait_ctx is only touched by ptlrpcd, @@ -1931,6 +1928,13 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) spin_unlock(&req->rq_lock); } + /* In any case, the previous bulk should be + * cleaned up to prepare for the new sending + */ + if (req->rq_bulk && + !ptlrpc_unregister_bulk(req, 1)) + continue; + rc = ptl_send_rpc(req, 0); if (rc == -ENOMEM) { spin_lock(&imp->imp_lock); From patchwork Thu Feb 27 21:11:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410279 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0456B92A for ; Thu, 27 Feb 2020 21:33:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E151B24677 for ; Thu, 27 Feb 2020 21:33:57 +0000 (UTC) DMARC-Filter: OpenDMARC 
Filter v1.3.2 mail.kernel.org E151B24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E9E3521FA17; Thu, 27 Feb 2020 13:28:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 885AA21FC4C for ; Thu, 27 Feb 2020 13:19:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 964AB2ACF; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 950A746F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:04 -0500 Message-Id: <1582838290-17243-197-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 196/622] lustre: sptlrpc: split sptlrpc_process_config() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Make sptlrpc_process_config() more than a single line wrapper exporting function.
Instead migrate the lcfg parsing out of __sptlrpc_process_config() so that we can use this function for both LCFG_PARAM and LCFG_SET_PARAM handling. The first field parsed from struct lustre_cfg *lcfg is the target. This can be "_mgs", file system name, or an obd target i.e fsname-MDT0000. We can move to extracting the file system name out of the target string using server_name2fsname(). WC-bug-id: https://jira.whamcloud.com/browse/LU-10937 Lustre-commit: 0ff7d548eb7b ("LU-10937 sptlrpc: split sptlrpc_process_config()") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33760 Reviewed-by: Andreas Dilger Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_disk.h | 1 + fs/lustre/obdclass/obd_mount.c | 5 ++- fs/lustre/ptlrpc/sec_config.c | 85 +++++++++++++++++++++++++---------------- 3 files changed, 57 insertions(+), 34 deletions(-) diff --git a/fs/lustre/include/lustre_disk.h b/fs/lustre/include/lustre_disk.h index 92618e8..b6b693f 100644 --- a/fs/lustre/include/lustre_disk.h +++ b/fs/lustre/include/lustre_disk.h @@ -145,6 +145,7 @@ struct lustre_sb_info { /****************** prototypes *********************/ /* obd_mount.c */ +int server_name2fsname(const char *svname, char *fsname, const char **endptr); int lustre_start_mgc(struct super_block *sb); int lustre_common_put_super(struct super_block *sb); diff --git a/fs/lustre/obdclass/obd_mount.c b/fs/lustre/obdclass/obd_mount.c index d143112..6c68bc7 100644 --- a/fs/lustre/obdclass/obd_mount.c +++ b/fs/lustre/obdclass/obd_mount.c @@ -597,8 +597,8 @@ int lustre_put_lsi(struct super_block *sb) * * Returns: rc < 0 on error */ -static int server_name2fsname(const char *svname, char *fsname, - const char **endptr) +int server_name2fsname(const char *svname, char *fsname, + const char **endptr) { const char *dash; @@ -618,6 +618,7 @@ static int server_name2fsname(const char *svname, char *fsname, return 0; } 
+EXPORT_SYMBOL(server_name2fsname); /* Get the index from the obd name. * rc = server type, or diff --git a/fs/lustre/ptlrpc/sec_config.c b/fs/lustre/ptlrpc/sec_config.c index 135ce99..e4b1a075 100644 --- a/fs/lustre/ptlrpc/sec_config.c +++ b/fs/lustre/ptlrpc/sec_config.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -577,14 +578,45 @@ static int sptlrpc_conf_merge_rule(struct sptlrpc_conf *conf, * find one through the target name in the record inside conf_lock; * otherwise means caller already hold conf_lock. */ -static int __sptlrpc_process_config(struct lustre_cfg *lcfg, +static int __sptlrpc_process_config(char *target, const char *fsname, + struct sptlrpc_rule *rule, struct sptlrpc_conf *conf) { - char *target, *param; + int rc; + + if (!conf) { + if (!fsname) + return -ENODEV; + + mutex_lock(&sptlrpc_conf_lock); + conf = sptlrpc_conf_get(fsname, 0); + if (!conf) { + CERROR("can't find conf\n"); + rc = -ENOMEM; + } else { + rc = sptlrpc_conf_merge_rule(conf, target, rule); + } + mutex_unlock(&sptlrpc_conf_lock); + } else { + LASSERT(mutex_is_locked(&sptlrpc_conf_lock)); + rc = sptlrpc_conf_merge_rule(conf, target, rule); + } + + if (!rc) + conf->sc_modified++; + + return rc; +} + +int sptlrpc_process_config(struct lustre_cfg *lcfg) +{ char fsname[MTI_NAME_MAXLEN]; struct sptlrpc_rule rule; + char *target, *param; int rc; + print_lustre_cfg(lcfg); + target = lustre_cfg_string(lcfg, 1); if (!target) { CERROR("missing target name\n"); @@ -597,45 +629,34 @@ static int __sptlrpc_process_config(struct lustre_cfg *lcfg, return -EINVAL; } - CDEBUG(D_SEC, "processing rule: %s.%s\n", target, param); - /* parse rule to make sure the format is correct */ - if (strncmp(param, PARAM_SRPC_FLVR, sizeof(PARAM_SRPC_FLVR) - 1) != 0) { + if (strncmp(param, PARAM_SRPC_FLVR, + sizeof(PARAM_SRPC_FLVR) - 1) != 0) { CERROR("Invalid sptlrpc parameter: %s\n", param); return -EINVAL; } param += sizeof(PARAM_SRPC_FLVR) - 1; - rc = sptlrpc_parse_rule(param, 
&rule); - if (rc) - return -EINVAL; - - if (!conf) { - target2fsname(target, fsname, sizeof(fsname)); - - mutex_lock(&sptlrpc_conf_lock); - conf = sptlrpc_conf_get(fsname, 0); - if (!conf) { - CERROR("can't find conf\n"); - rc = -ENOMEM; - } else { - rc = sptlrpc_conf_merge_rule(conf, target, &rule); - } - mutex_unlock(&sptlrpc_conf_lock); - } else { - LASSERT(mutex_is_locked(&sptlrpc_conf_lock)); - rc = sptlrpc_conf_merge_rule(conf, target, &rule); - } + CDEBUG(D_SEC, "processing rule: %s.%s\n", target, param); - if (rc == 0) - conf->sc_modified++; + /* + * Three types of targets exist for sptlrpc using conf_param + * 1. '_mgs' which targets mgc srpc settings. Treat it as + * a special file system name. + * 2. target is a device which can be fsname-MDTXXXX or + * fsname-OSTXXXX. This can be verified by the function + * server_name2fsname. + * 3. If both above conditions are not met then the target + * is an actual filesystem. + */ + if (server_name2fsname(target, fsname, NULL)) + strlcpy(fsname, target, sizeof(fsname)); - return rc; -} + rc = sptlrpc_parse_rule(param, &rule); + if (rc) + return rc; -int sptlrpc_process_config(struct lustre_cfg *lcfg) -{ - return __sptlrpc_process_config(lcfg, NULL); + return __sptlrpc_process_config(target, fsname, &rule, NULL); } EXPORT_SYMBOL(sptlrpc_process_config); From patchwork Thu Feb 27 21:11:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409987 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE54314E3 for ; Thu, 27 Feb 2020 21:27:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D6F91246A0 for ;
Thu, 27 Feb 2020 21:27:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6F91246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 385AA348A16; Thu, 27 Feb 2020 13:24:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DDB8B21FC4C for ; Thu, 27 Feb 2020 13:19:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9984B2AD0; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9849F46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:05 -0500 Message-Id: <1582838290-17243-198-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 197/622] lustre: cfg: reserve flags for SELinux status checking X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson Reserve LCFG_NODEMAP_SET_SEPOL config flag that will be used to define sepol parameter on nodemap entries. 
Reserve OBD_CONNECT2_SELINUX_POLICY connection flag that will be set (in ocd_connect_flags2) if a client supports sending the SELinux policy status info. Add checks for all lcfg_command_type constants, along with lustre_cfg and cfg_record_type. WC-bug-id: https://jira.whamcloud.com/browse/LU-8955 Lustre-commit: e71a77ba8d47 ("LU-8955 cfg: reserve flags for SELinux status checking") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/33797 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 115 +++++++++++++++++++++++++++++++-- include/uapi/linux/lustre/lustre_cfg.h | 1 + include/uapi/linux/lustre/lustre_idl.h | 1 + 4 files changed, 112 insertions(+), 6 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index cce9bec..7701bc3 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -120,6 +120,7 @@ "wbc", /* 0x40 */ "lock_convert", /* 0x80 */ "archive_id_array", /* 0x100 */ + "selinux_policy", /* 0x200 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 66dce80..bf79b8b 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -41,6 +41,7 @@ #include #include #include +#include #include "ptlrpc_internal.h" @@ -1143,6 +1144,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_LOCK_CONVERT); LASSERTF(OBD_CONNECT2_ARCHIVE_ID_ARRAY == 0x100ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_ARCHIVE_ID_ARRAY); + LASSERTF(OBD_CONNECT2_SELINUX_POLICY == 0x400ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_SELINUX_POLICY); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", @@ -1150,17 +1153,17 @@ void lustre_assert_wire_constants(void) LASSERTF(OBD_CKSUM_CRC32C == 0x00000004UL, "found 0x%.8xUL\n", (unsigned 
int)OBD_CKSUM_CRC32C); LASSERTF(OBD_CKSUM_RESERVED == 0x00000008UL, "found 0x%.8xUL\n", - (unsigned int)OBD_CKSUM_RESERVED); + (unsigned int)OBD_CKSUM_RESERVED); LASSERTF(OBD_CKSUM_T10IP512 == 0x00000010UL, "found 0x%.8xUL\n", - (unsigned int)OBD_CKSUM_T10IP512); + (unsigned int)OBD_CKSUM_T10IP512); LASSERTF(OBD_CKSUM_T10IP4K == 0x00000020UL, "found 0x%.8xUL\n", - (unsigned int)OBD_CKSUM_T10IP4K); + (unsigned int)OBD_CKSUM_T10IP4K); LASSERTF(OBD_CKSUM_T10CRC512 == 0x00000040UL, "found 0x%.8xUL\n", - (unsigned int)OBD_CKSUM_T10CRC512); + (unsigned int)OBD_CKSUM_T10CRC512); LASSERTF(OBD_CKSUM_T10CRC4K == 0x00000080UL, "found 0x%.8xUL\n", - (unsigned int)OBD_CKSUM_T10CRC4K); + (unsigned int)OBD_CKSUM_T10CRC4K); LASSERTF(OBD_CKSUM_T10_TOP == 0x00000002UL, "found 0x%.8xUL\n", - (unsigned int)OBD_CKSUM_T10_TOP); + (unsigned int)OBD_CKSUM_T10_TOP); /* Checks for struct ost_layout */ LASSERTF((int)sizeof(struct ost_layout) == 28, "found %lld\n", @@ -4633,4 +4636,104 @@ void lustre_assert_wire_constants(void) (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_advise)); LASSERTF(LF_ASYNC == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)LF_ASYNC); + + /* Checks for struct lustre_cfg */ + LASSERTF((int)sizeof(struct lustre_cfg) == 32, "found %lld\n", + (long long)(int)sizeof(struct lustre_cfg)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_version) == 0, "found %lld\n", + (long long)(int)offsetof(struct lustre_cfg, lcfg_version)); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_version) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_version)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_command) == 4, "found %lld\n", + (long long)(int)offsetof(struct lustre_cfg, lcfg_command)); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_command) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_command)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_num) == 8, "found %lld\n", + (long 
long)(int)offsetof(struct lustre_cfg, lcfg_num)); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_num) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_num)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_flags) == 12, "found %lld\n", + (long long)(int)offsetof(struct lustre_cfg, lcfg_flags)); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_flags) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_flags)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_nid) == 16, "found %lld\n", + (long long)(int)offsetof(struct lustre_cfg, lcfg_nid)); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_nid) == 8, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_nid)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_nal) == 24, "found %lld\n", + (long long)(int)offsetof(struct lustre_cfg, lcfg_nal)); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_nal) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_nal)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_bufcount) == 28, "found %lld\n", + (long long)(int)offsetof(struct lustre_cfg, lcfg_bufcount)); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_bufcount) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_bufcount)); + LASSERTF((int)offsetof(struct lustre_cfg, lcfg_buflens[0]) == 32, "found %lld\n", + (long long)(int)offsetof(struct lustre_cfg, lcfg_buflens[0])); + LASSERTF((int)sizeof(((struct lustre_cfg *)0)->lcfg_buflens[0]) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_cfg *)0)->lcfg_buflens[0])); + LASSERTF(LCFG_ATTACH == 0x000cf001UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_ATTACH); + LASSERTF(LCFG_DETACH == 0x000cf002UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_DETACH); + LASSERTF(LCFG_SETUP == 0x000cf003UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_SETUP); + LASSERTF(LCFG_CLEANUP == 0x000cf004UL, "found 0x%.8xUL\n", + (unsigned 
int)LCFG_CLEANUP); + LASSERTF(LCFG_ADD_UUID == 0x000cf005UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_ADD_UUID); + LASSERTF(LCFG_DEL_UUID == 0x000cf006UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_DEL_UUID); + LASSERTF(LCFG_MOUNTOPT == 0x000cf007UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_MOUNTOPT); + LASSERTF(LCFG_DEL_MOUNTOPT == 0x000cf008UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_DEL_MOUNTOPT); + LASSERTF(LCFG_SET_TIMEOUT == 0x000cf009UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_SET_TIMEOUT); + LASSERTF(LCFG_SET_UPCALL == 0x000cf00aUL, "found 0x%.8xUL\n", + (unsigned int)LCFG_SET_UPCALL); + LASSERTF(LCFG_ADD_CONN == 0x000cf00bUL, "found 0x%.8xUL\n", + (unsigned int)LCFG_ADD_CONN); + LASSERTF(LCFG_DEL_CONN == 0x000cf00cUL, "found 0x%.8xUL\n", + (unsigned int)LCFG_DEL_CONN); + LASSERTF(LCFG_LOV_ADD_OBD == 0x000cf00dUL, "found 0x%.8xUL\n", + (unsigned int)LCFG_LOV_ADD_OBD); + LASSERTF(LCFG_LOV_DEL_OBD == 0x000cf00eUL, "found 0x%.8xUL\n", + (unsigned int)LCFG_LOV_DEL_OBD); + LASSERTF(LCFG_PARAM == 0x000cf00fUL, "found 0x%.8xUL\n", + (unsigned int)LCFG_PARAM); + LASSERTF(LCFG_MARKER == 0x000cf010UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_MARKER); + LASSERTF(LCFG_LOG_START == 0x000ce011UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_LOG_START); + LASSERTF(LCFG_LOG_END == 0x000ce012UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_LOG_END); + LASSERTF(LCFG_LOV_ADD_INA == 0x000ce013UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_LOV_ADD_INA); + LASSERTF(LCFG_ADD_MDC == 0x000cf014UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_ADD_MDC); + LASSERTF(LCFG_DEL_MDC == 0x000cf015UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_DEL_MDC); + LASSERTF(LCFG_SPTLRPC_CONF == 0x000ce016UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_SPTLRPC_CONF); + LASSERTF(LCFG_POOL_NEW == 0x000ce020UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_POOL_NEW); + LASSERTF(LCFG_POOL_ADD == 0x000ce021UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_POOL_ADD); + LASSERTF(LCFG_POOL_REM == 0x000ce022UL, "found 0x%.8xUL\n", + 
(unsigned int)LCFG_POOL_REM); + LASSERTF(LCFG_POOL_DEL == 0x000ce023UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_POOL_DEL); + LASSERTF(LCFG_SET_LDLM_TIMEOUT == 0x000ce030UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_SET_LDLM_TIMEOUT); + LASSERTF(LCFG_PRE_CLEANUP == 0x000cf031UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_PRE_CLEANUP); + LASSERTF(LCFG_SET_PARAM == 0x000ce032UL, "found 0x%.8xUL\n", + (unsigned int)LCFG_SET_PARAM); + LASSERTF(LCFG_NODEMAP_SET_SEPOL == 0x000ce05bUL, "found 0x%.8xUL\n", + (unsigned int)LCFG_NODEMAP_SET_SEPOL); + LASSERTF(PORTALS_CFG_TYPE == 1, "found %lld\n", + (long long)PORTALS_CFG_TYPE); + LASSERTF(LUSTRE_CFG_TYPE == 123, "found %lld\n", + (long long)LUSTRE_CFG_TYPE); } diff --git a/include/uapi/linux/lustre/lustre_cfg.h b/include/uapi/linux/lustre/lustre_cfg.h index 0620e49..5d6b585 100644 --- a/include/uapi/linux/lustre/lustre_cfg.h +++ b/include/uapi/linux/lustre/lustre_cfg.h @@ -107,6 +107,7 @@ enum lcfg_command_type { LCFG_SET_PARAM = 0x00ce032, /**< use set_param syntax to set * a proc parameters */ + LCFG_NODEMAP_SET_SEPOL = 0x00ce05b, /**< set SELinux policy */ }; struct lustre_cfg_bufs { diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 4236a43..f723d7b 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -805,6 +805,7 @@ struct ptlrpc_body_v2 { */ #define OBD_CONNECT2_LOCK_CONVERT 0x80ULL /* IBITS lock convert support */ #define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* store HSM archive_id in array */ +#define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:11:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409991 Return-Path: Received: from mail.kernel.org 
(pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F70D1580 for ; Thu, 27 Feb 2020 21:27:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 882B4246A0 for ; Thu, 27 Feb 2020 21:27:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 882B4246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ED4D9349140; Thu, 27 Feb 2020 13:24:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8529421FC4C for ; Thu, 27 Feb 2020 13:19:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9C61D2AD1; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9AEB746C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:06 -0500 Message-Id: <1582838290-17243-199-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 198/622] lustre: llite: remove cl_file_inode_init() LASSERT X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: 
list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger If there is some corruption or other reason that the file layout cannot be used, the first call to cl_file_inode_init() will fail. If it is called a second time on the same file then it will hit an LASSERT() since I_NEW is no longer set on the inode. It would be good to handle the error in lov_init_raid0() better, but we still want to avoid this LASSERT() if there is an error. Convert the LASSERT() in cl_file_inode_init() into a CERROR() and error return. This is being triggered due to corruption on the server, but that shouldn't cause the client to assert. lov_dump_lmm_common() oid 0xdf4e:311367, magic 0x0bd10bd0 lov_dump_lmm_common() stripe_size 1048576, stripe_count 4 lov_dump_lmm_objects() stripe 0 idx 10 subobj 0x0:151194471 lov_dump_lmm_objects() stripe 1 idx 12 subobj 0x0:152477530 lov_dump_lmm_objects() stripe 2 idx 25 subobj 0x0:151589797 lov_dump_lmm_objects() stripe 3 idx 2 subobj 0x0:150332564 lov_init_raid0() fsname-clilov: OST0019 is not initialized cl_file_inode_init() Failure to initialize cl object [0x20004c047:0xdf4e:0x0]: -5 cl_file_inode_init() ASSERTION(inode->i_state & (1 << 3) ) failed cl_file_inode_init() LBUG Pid: 37233, comm: ll_sa_4709 3.10.0-862.14.4.el7.x86_64 #1 SMP Call Trace: libcfs_call_trace+0x8c/0xc0 [libcfs] lbug_with_loc+0x4c/0xa0 [libcfs] cl_file_inode_init+0x2ac/0x300 [lustre] ll_update_inode+0x315/0x600 [lustre] ll_iget+0x163/0x350 [lustre] ll_prep_inode+0x232/0xc80 [lustre] sa_handle_callback+0x3a4/0xf70 [lustre] ll_statahead_thread+0x40e/0x2080 [lustre] Return an I/O error instead of killing the client.
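To illustrate the conversion described above, here is a minimal userspace sketch of replacing a fatal assertion with an error return that leaves through a single exit label. Every sketch_* name is a hypothetical stand-in and not the Lustre or kernel definition; the only value taken from the report is the (1 << 3) state bit shown in the assertion text.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for I_NEW; the bit value mirrors the
 * "(1 << 3)" printed in the ASSERTION message above. */
#define SKETCH_I_NEW (1u << 3)

/* Hypothetical, reduced inode: only the state word matters here. */
struct sketch_inode {
	unsigned int i_state;
};

/* Counts outstanding environment references so a leak on the
 * error path would be visible to the caller. */
static int sketch_env_refs;

static void sketch_env_get(void) { sketch_env_refs++; }
static void sketch_env_put(void) { sketch_env_refs--; }

/* Returns 0 on success or -EIO when the inode is not in the NEW state.
 * An assertion at the check below would abort the whole program;
 * returning an error lets the caller fail just this one operation. */
static int sketch_file_inode_init(const struct sketch_inode *inode)
{
	int result = 0;

	sketch_env_get();

	if (!(inode->i_state & SKETCH_I_NEW)) {
		result = -EIO;
		fprintf(stderr, "unexpected not-NEW inode: rc = %d\n", result);
		goto out;
	}

	/* ... object lookup and initialization would go here ... */

out:
	/* Single exit: the reference is dropped on success and failure. */
	sketch_env_put();
	return result;
}
```

The single exit label keeps the release call in one place, so the early error return cannot skip it.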
WC-bug-id: https://jira.whamcloud.com/browse/LU-11579 Lustre-commit: 0baa3eb1a4ab ("LU-11579 llite: remove cl_file_inode_init() LASSERT") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33505 Reviewed-by: Patrick Farrell Reviewed-by: Bobi Jam Signed-off-by: James Simmons --- fs/lustre/llite/lcommon_cl.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/lustre/llite/lcommon_cl.c b/fs/lustre/llite/lcommon_cl.c index 978e05b..9ac80e0 100644 --- a/fs/lustre/llite/lcommon_cl.c +++ b/fs/lustre/llite/lcommon_cl.c @@ -171,7 +171,14 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md) * unnecessary to perform lookup-alloc-lookup-insert, just * alloc and insert directly. */ - LASSERT(inode->i_state & I_NEW); + if (!(inode->i_state & I_NEW)) { + result = -EIO; + CERROR("%s: unexpected not-NEW inode "DFID": rc = %d\n", + ll_get_fsname(inode->i_sb, NULL, 0), PFID(fid), + result); + goto out; + } + conf.coc_lu.loc_flags = LOC_F_NEW; clob = cl_object_find(env, lu2cl_dev(site->ls_top_dev), fid, &conf); @@ -193,11 +200,13 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md) } } + if (result) + CERROR("%s: failed to initialize cl_object "DFID": rc = %d\n", + ll_get_fsname(inode->i_sb, NULL, 0), PFID(fid), result); + +out: cl_env_put(env, &refcheck); - if (result != 0) - CERROR("Failure to initialize cl object " DFID ": %d\n", - PFID(fid), result); return result; } From patchwork Thu Feb 27 21:11:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410017 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0DB4D138D for ; Thu, 27 Feb 2020 21:28:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EA54A246A0 for ; Thu, 27 Feb 2020 21:27:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA54A246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6BC8434921B; Thu, 27 Feb 2020 13:24:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DABBB21FBC9 for ; Thu, 27 Feb 2020 13:19:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9FC792AD2; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9DEF8468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:07 -0500 Message-Id: <1582838290-17243-200-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 199/622] lnet: add fault injection for bulk transfers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Artem Blagodarenko An internal test was always passing because no fault injection was happening. Add CFS_FAIL_PTLRPC_OST_BULK_CB2 to simulate a bulk transfer timeout. WC-bug-id: https://jira.whamcloud.com/browse/LU-7159 Lustre-commit: 707820692275 ("LU-7159 tests: fix 224c fault injection") Signed-off-by: Artem Blagodarenko Xyratex-bug-id: MRP-2472 Reviewed-on: https://review.whamcloud.com/16426 Reviewed-by: Alexander Zarochentsev Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + include/linux/libcfs/libcfs_fail.h | 6 ++++++ include/linux/lnet/lib-lnet.h | 3 +++ net/lnet/lnet/lib-move.c | 6 +++++- 4 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 5ff270a..d9a0395 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -487,6 +487,7 @@ #define OBD_FAIL_FLR_LV_INC 0x1A02 #define OBD_FAIL_FLR_RANDOM_PICK_MIRROR 0x1A03 +/* LNet is allocated failure locations 0xe000 to 0xffff */ /* Assign references to moved code to reduce code changes */ #define OBD_FAIL_PRECHECK(id) CFS_FAIL_PRECHECK(id) #define OBD_FAIL_CHECK(id) CFS_FAIL_CHECK(id) diff --git a/include/linux/libcfs/libcfs_fail.h b/include/linux/libcfs/libcfs_fail.h index f52a82a..c341567 100644 --- a/include/linux/libcfs/libcfs_fail.h +++ b/include/linux/libcfs/libcfs_fail.h @@ -54,6 +54,12 @@ enum { CFS_FAIL_LOC_VALUE = 3 }; +/* Failure ranges + * "0x0100 - 0x3fff" for Lustre + * "0xe000 - 0xefff" for LNet + * "0xf000 - 0xffff" for LNDs + */ + /* Failure injection control */ #define CFS_FAIL_MASK_SYS 0x0000FF00 #define CFS_FAIL_MASK_LOC (0x000000FF | CFS_FAIL_MASK_SYS) diff --git a/include/linux/lnet/lib-lnet.h
b/include/linux/lnet/lib-lnet.h index bbb678f..d09fb4c 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -49,6 +49,9 @@ #include #include +/* LNET has 0xeXXX */ +#define CFS_FAIL_PTLRPC_OST_BULK_CB2 0xe000 + extern struct lnet the_lnet; /* THE network */ #if (BITS_PER_LONG == 32) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 3bcac03..f5548eb 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -4323,7 +4323,11 @@ void lnet_monitor_thr_stop(void) if (ack == LNET_ACK_REQ) lnet_attach_rsp_tracker(rspt, cpt, md, mdh); - rc = lnet_send(self, msg, LNET_NID_ANY); + if (CFS_FAIL_CHECK_ORSET(CFS_FAIL_PTLRPC_OST_BULK_CB2, + CFS_FAIL_ONCE)) + rc = -EIO; + else + rc = lnet_send(self, msg, LNET_NID_ANY); if (rc) { CNETERR("Error sending PUT to %s: %d\n", libcfs_id2str(target), rc); From patchwork Thu Feb 27 21:11:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410035 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E8341580 for ; Thu, 27 Feb 2020 21:28:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7719F246A0 for ; Thu, 27 Feb 2020 21:28:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7719F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0DD22349377; Thu, 27 Feb 2020 13:24:50 -0800 (PST) 
X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2A7F021FB5E for ; Thu, 27 Feb 2020 13:19:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A57242AD3; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A3BC146D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:08 -0500 Message-Id: <1582838290-17243-201-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 200/622] lnet: remove .nf_min_max handling X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Kit Westneat The .nf_min_max handling was only used for server-side debugging. It has also been removed in the OpenSFS tree, so let's remove it here, since the code related to nf_min_max handling is unused.
WC-bug-id: https://jira.whamcloud.com/browse/LU-8939 Lustre-commit: a9b830da51bd ("LU-8939 nodemap: remove deprecated lproc files") Signed-off-by: Kit Westneat Reviewed-on: https://review.whamcloud.com/24352 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/nidstrings.c | 278 ++------------------------------------------- 1 file changed, 10 insertions(+), 268 deletions(-) diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c index 13338d0..eca5092 100644 --- a/net/lnet/lnet/nidstrings.c +++ b/net/lnet/lnet/nidstrings.c @@ -451,264 +451,6 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist) } EXPORT_SYMBOL(cfs_print_nidlist); -/** - * Determines minimum and maximum addresses for a single - * numeric address range - * - * @ar - * @min_nid *min_nid __u32 representation of min NID - * @max_nid *max_nid __u32 representation of max NID - * - * Return: -EINVAL unsupported LNET range - * -ERANGE non-contiguous LNET range - */ -static int cfs_ip_ar_min_max(struct addrrange *ar, u32 *min_nid, - u32 *max_nid) -{ - struct cfs_expr_list *expr_list; - struct cfs_range_expr *range; - unsigned int min_ip[4] = { 0 }; - unsigned int max_ip[4] = { 0 }; - int cur_octet = 0; - bool expect_full_octet = false; - - list_for_each_entry(expr_list, &ar->ar_numaddr_ranges, el_link) { - int re_count = 0; - - list_for_each_entry(range, &expr_list->el_exprs, re_link) { - /* XXX: add support for multiple & non-contig. 
re's */ - if (re_count > 0) - return -EINVAL; - - /* if a previous octet was ranged, then all remaining - * octets must be full for contiguous range - */ - if (expect_full_octet && (range->re_lo != 0 || - range->re_hi != 255)) - return -ERANGE; - - if (range->re_stride != 1) - return -ERANGE; - - if (range->re_lo > range->re_hi) - return -EINVAL; - - if (range->re_lo != range->re_hi) - expect_full_octet = true; - - min_ip[cur_octet] = range->re_lo; - max_ip[cur_octet] = range->re_hi; - - re_count++; - } - - cur_octet++; - } - - if (min_nid) - *min_nid = ((min_ip[0] << 24) | (min_ip[1] << 16) | - (min_ip[2] << 8) | min_ip[3]); - - if (max_nid) - *max_nid = ((max_ip[0] << 24) | (max_ip[1] << 16) | - (max_ip[2] << 8) | max_ip[3]); - - return 0; -} - -/** - * Determines minimum and maximum addresses for a single - * numeric address range - * - * @ar - * @min_nid *min_nid __u32 representation of min NID - * @max_nid *max_nid __u32 representation of max NID - * - * Return: -EINVAL unsupported LNET range - */ -static int cfs_num_ar_min_max(struct addrrange *ar, u32 *min_nid, - u32 *max_nid) -{ - struct cfs_expr_list *el; - struct cfs_range_expr *re; - unsigned int min_addr = 0; - unsigned int max_addr = 0; - - list_for_each_entry(el, &ar->ar_numaddr_ranges, el_link) { - int re_count = 0; - - list_for_each_entry(re, &el->el_exprs, re_link) { - if (re_count > 0) - return -EINVAL; - if (re->re_lo > re->re_hi) - return -EINVAL; - - if (re->re_lo < min_addr || !min_addr) - min_addr = re->re_lo; - if (re->re_hi > max_addr) - max_addr = re->re_hi; - - re_count++; - } - } - - if (min_nid) - *min_nid = min_addr; - if (max_nid) - *max_nid = max_addr; - - return 0; -} - -/** - * Takes a linked list of nidrange expressions, determines the minimum - * and maximum nid and creates appropriate nid structures - * - * @nidlist - * @min_nid *min_nid string representation of min NID - * @max_nid *max_nid string representation of max NID - * - * Return: -EINVAL unsupported LNET range - * 
-ERANGE non-contiguous LNET range - */ -int cfs_nidrange_find_min_max(struct list_head *nidlist, char *min_nid, - char *max_nid, size_t nidstr_length) -{ - struct nidrange *first_nidrange; - int netnum; - struct netstrfns *nf; - char *lndname; - u32 min_addr; - u32 max_addr; - char min_addr_str[IPSTRING_LENGTH]; - char max_addr_str[IPSTRING_LENGTH]; - int rc; - - first_nidrange = list_entry(nidlist->next, struct nidrange, nr_link); - - netnum = first_nidrange->nr_netnum; - nf = first_nidrange->nr_netstrfns; - lndname = nf->nf_name; - - rc = nf->nf_min_max(nidlist, &min_addr, &max_addr); - if (rc < 0) - return rc; - - nf->nf_addr2str(min_addr, min_addr_str, sizeof(min_addr_str)); - nf->nf_addr2str(max_addr, max_addr_str, sizeof(max_addr_str)); - - snprintf(min_nid, nidstr_length, "%s@%s%d", min_addr_str, lndname, - netnum); - snprintf(max_nid, nidstr_length, "%s@%s%d", max_addr_str, lndname, - netnum); - - return 0; -} -EXPORT_SYMBOL(cfs_nidrange_find_min_max); - -/** - * Determines the min and max NID values for num LNDs - * - * @nidlist - * @min_nid *min_nid if provided, returns string representation of min NID - * @max_nid *max_nid if provided, returns string representation of max NID - * - * Return: -EINVAL unsupported LNET range - * -ERANGE non-contiguous LNET range - */ -static int cfs_num_min_max(struct list_head *nidlist, u32 *min_nid, - u32 *max_nid) -{ - struct nidrange *nr; - struct addrrange *ar; - unsigned int tmp_min_addr = 0; - unsigned int tmp_max_addr = 0; - unsigned int min_addr = 0; - unsigned int max_addr = 0; - int nidlist_count = 0; - int rc; - - list_for_each_entry(nr, nidlist, nr_link) { - if (nidlist_count > 0) - return -EINVAL; - - list_for_each_entry(ar, &nr->nr_addrranges, ar_link) { - rc = cfs_num_ar_min_max(ar, &tmp_min_addr, - &tmp_max_addr); - if (rc) - return rc; - - if (tmp_min_addr < min_addr || !min_addr) - min_addr = tmp_min_addr; - if (tmp_max_addr > max_addr) - max_addr = tmp_min_addr; - } - } - - if (max_nid) - *max_nid = 
max_addr; - if (min_nid) - *min_nid = min_addr; - - return 0; -} - -/** - * Takes an nidlist and determines the minimum and maximum - * ip addresses. - * - * @nidlist - * @min_nid *min_nid if provided, returns string representation of min NID - * @max_nid *max_nid if provided, returns string representation of max NID - * - * Return: -EINVAL unsupported LNET range - * -ERANGE non-contiguous LNET range - */ -static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, - u32 *max_nid) -{ - struct nidrange *nr; - struct addrrange *ar; - u32 tmp_min_ip_addr = 0; - u32 tmp_max_ip_addr = 0; - u32 min_ip_addr = 0; - u32 max_ip_addr = 0; - int nidlist_count = 0; - int rc; - - list_for_each_entry(nr, nidlist, nr_link) { - if (nidlist_count > 0) - return -EINVAL; - - if (nr->nr_all) { - min_ip_addr = 0; - max_ip_addr = 0xffffffff; - break; - } - - list_for_each_entry(ar, &nr->nr_addrranges, ar_link) { - rc = cfs_ip_ar_min_max(ar, &tmp_min_ip_addr, - &tmp_max_ip_addr); - if (rc) - return rc; - - if (tmp_min_ip_addr < min_ip_addr || !min_ip_addr) - min_ip_addr = tmp_min_ip_addr; - if (tmp_max_ip_addr > max_ip_addr) - max_ip_addr = tmp_max_ip_addr; - } - - nidlist_count++; - } - - if (min_nid) - *min_nid = min_ip_addr; - if (max_nid) - *max_nid = max_ip_addr; - - return 0; -} - static int libcfs_lo_str2addr(const char *str, int nob, u32 *addr) { @@ -912,8 +654,8 @@ static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, .nf_str2addr = libcfs_lo_str2addr, .nf_parse_addrlist = libcfs_num_parse, .nf_print_addrlist = libcfs_num_addr_range_print, - .nf_match_addr = libcfs_num_match, - .nf_min_max = cfs_num_min_max }, + .nf_match_addr = libcfs_num_match + }, { .nf_type = SOCKLND, .nf_name = "tcp", .nf_modname = "ksocklnd", @@ -921,8 +663,8 @@ static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, .nf_str2addr = libcfs_ip_str2addr, .nf_parse_addrlist = cfs_ip_addr_parse, .nf_print_addrlist = libcfs_ip_addr_range_print, - .nf_match_addr = cfs_ip_addr_match, 
- .nf_min_max = cfs_ip_min_max }, + .nf_match_addr = cfs_ip_addr_match + }, { .nf_type = O2IBLND, .nf_name = "o2ib", .nf_modname = "ko2iblnd", @@ -930,8 +672,8 @@ static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, .nf_str2addr = libcfs_ip_str2addr, .nf_parse_addrlist = cfs_ip_addr_parse, .nf_print_addrlist = libcfs_ip_addr_range_print, - .nf_match_addr = cfs_ip_addr_match, - .nf_min_max = cfs_ip_min_max }, + .nf_match_addr = cfs_ip_addr_match + }, { .nf_type = GNILND, .nf_name = "gni", .nf_modname = "kgnilnd", @@ -939,8 +681,8 @@ static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, .nf_str2addr = libcfs_num_str2addr, .nf_parse_addrlist = libcfs_num_parse, .nf_print_addrlist = libcfs_num_addr_range_print, - .nf_match_addr = libcfs_num_match, - .nf_min_max = cfs_num_min_max }, + .nf_match_addr = libcfs_num_match + }, { .nf_type = GNIIPLND, .nf_name = "gip", .nf_modname = "kgnilnd", @@ -948,8 +690,8 @@ static int cfs_ip_min_max(struct list_head *nidlist, u32 *min_nid, .nf_str2addr = libcfs_ip_str2addr, .nf_parse_addrlist = cfs_ip_addr_parse, .nf_print_addrlist = libcfs_ip_addr_range_print, - .nf_match_addr = cfs_ip_addr_match, - .nf_min_max = cfs_ip_min_max }, + .nf_match_addr = cfs_ip_addr_match + }, }; static const size_t libcfs_nnetstrfns = ARRAY_SIZE(libcfs_netstrfns); From patchwork Thu Feb 27 21:11:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410021 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 543851580 for ; Thu, 27 Feb 2020 21:28:06 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3C92F246A0 for ; Thu, 27 
Feb 2020 21:28:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3C92F246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 83391349281; Thu, 27 Feb 2020 13:24:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 82BC921FBD9 for ; Thu, 27 Feb 2020 13:19:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A6AFE2AD4; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A4DF146A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:09 -0500 Message-Id: <1582838290-17243-202-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 201/622] lustre: sec: create new function sptlrpc_get_sepol() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson Create new function sptlrpc_get_sepol() in ptlrpc/sec.c to compute SELinux policy info, by calling new userland command l_getsepol. 
The SELinux policy info syntax is the following:
<mode>:<name>:<version>:<hash>
where:
- <mode> is a digit telling if SELinux is in Permissive mode (0)
  or Enforcing mode (1)
- <name> is the name of the SELinux policy
- <version> is the version of the SELinux policy
- <hash> is the computed hash of the binary representation of the
  policy, as exported in /etc/selinux/<name>/policy/policy.<version>

Userland command l_getsepol can be called on the command line by a
security administrator to get SELinux status information to store
into the 'sepol' field of a nodemap.

SELinux status information is reported by the Lustre client only if
the new 'send_sepol' ptlrpc kernel module parameter is not zero, and
SELinux is enabled on the client.
'send_sepol' accepts the following values:
- 0: do not send SELinux policy info;
- -1: send SELinux policy info for every request;
- N > 0: only send SELinux policy info every N seconds. Use max
  value 2^31-1 (signed int on 32 bits) to make sure SELinux policy
  info is only checked at mount time.

Independently of the 'send_sepol' value, SELinux policy info has an
associated mtime. l_getsepol checks the mtime and recalculates the
whole SELinux policy info (including the SHA) only if the mtime has
changed.
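
As an illustration of the string layout described above, the following
is a minimal userspace sketch that splits a sepol string into its four
colon-separated fields (mode, policy name, policy version, hash). It is
not part of this patch: struct sepol_info, sepol_parse() and the buffer
sizes are hypothetical names and values chosen for the example, and the
sketch assumes the policy name itself contains no ':' character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the four sepol fields (sizes are illustrative). */
struct sepol_info {
	int  mode;		/* 0 = Permissive, 1 = Enforcing */
	char name[256];		/* SELinux policy name */
	char version[4];	/* SELinux policy version */
	char hash[129];		/* hash of the binary policy, as text */
};

/* Copy len bytes of src into dst and NUL-terminate; reject empty/oversized. */
static int copy_field(char *dst, size_t dstsz, const char *src, size_t len)
{
	if (len == 0 || len >= dstsz)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/* Parse "mode:name:version:hash". Returns 0 on success, -1 if malformed. */
static int sepol_parse(const char *s, struct sepol_info *out)
{
	const char *name, *ver, *hash;

	assert(out != NULL);

	/* The first field is a single digit, 0 or 1, followed by ':'. */
	if (!s || (s[0] != '0' && s[0] != '1') || s[1] != ':')
		return -1;
	out->mode = s[0] - '0';

	name = s + 2;
	ver = strchr(name, ':');
	if (!ver ||
	    copy_field(out->name, sizeof(out->name), name, ver - name))
		return -1;
	ver++;

	hash = strchr(ver, ':');
	if (!hash ||
	    copy_field(out->version, sizeof(out->version), ver, hash - ver))
		return -1;
	hash++;

	/* Everything after the third ':' is taken as the hash. */
	return copy_field(out->hash, sizeof(out->hash), hash, strlen(hash));
}
```

Calling sepol_parse("1:targeted:31:abcdef", &si) on a struct sepol_info
would fill in mode 1, name "targeted", version "31" and hash "abcdef";
a string with a missing field or a first digit other than 0 or 1 is
rejected.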
WC-bug-id: https://jira.whamcloud.com/browse/LU-8955 Lustre-commit: c61168239eff ("LU-8955 sec: create new function sptlrpc_get_sepol()") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/24421 Reviewed-by: Patrick Farrell Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 7 ++ fs/lustre/include/lustre_sec.h | 12 +++ fs/lustre/ptlrpc/sec.c | 125 ++++++++++++++++++++++++++++++++ fs/lustre/ptlrpc/sec_lproc.c | 74 +++++++++++++++++++ include/uapi/linux/lustre/lustre_idl.h | 13 ++++ include/uapi/linux/lustre/lustre_user.h | 9 +++ 6 files changed, 240 insertions(+) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 81a6ac9..36de665 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -845,6 +845,13 @@ struct ptlrpc_request { /** description of flavors for client & server */ struct sptlrpc_flavor rq_flvr; + /** + * SELinux policy info at the time of the request + * sepol string format is: + * ::: + */ + char rq_sepol[LUSTRE_NODEMAP_SEPOL_LENGTH + 1]; + /* client/server security flags */ unsigned int rq_ctx_init:1, /* context initiation */ diff --git a/fs/lustre/include/lustre_sec.h b/fs/lustre/include/lustre_sec.h index 99702fd..00710d6 100644 --- a/fs/lustre/include/lustre_sec.h +++ b/fs/lustre/include/lustre_sec.h @@ -792,6 +792,17 @@ struct ptlrpc_sec { /** owning import */ struct obd_import *ps_import; spinlock_t ps_lock; + /** mtime of SELinux policy file */ + time_t ps_sepol_mtime; + /** next check time of SELinux policy file */ + ktime_t ps_sepol_checknext; + /** + * SELinux policy info + * sepol string format is: + * ::: + */ + char ps_sepol[LUSTRE_NODEMAP_SEPOL_LENGTH + + 1]; /* * garbage collection @@ -987,6 +998,7 @@ int sptlrpc_cli_unwrap_early_reply(struct ptlrpc_request *req, void sptlrpc_cli_finish_early_reply(struct ptlrpc_request *early_req); void sptlrpc_request_out_callback(struct ptlrpc_request *req); 
+int sptlrpc_get_sepol(struct ptlrpc_request *req); /* * exported higher interface of import & request diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c index 54ca97c..789b5cb 100644 --- a/fs/lustre/ptlrpc/sec.c +++ b/fs/lustre/ptlrpc/sec.c @@ -53,6 +53,10 @@ #include "ptlrpc_internal.h" +static int send_sepol; +module_param(send_sepol, int, 0644); +MODULE_PARM_DESC(send_sepol, "Client sends SELinux policy status"); + /*********************************************** * policy registers * ***********************************************/ @@ -1692,6 +1696,127 @@ static int sptlrpc_svc_install_rvs_ctx(struct obd_import *imp, return policy->sp_sops->install_rctx(imp, ctx); } +#ifdef CONFIG_SECURITY_SELINUX +/* Get SELinux policy info from userspace */ +static int sepol_helper(struct obd_import *imp) +{ + char mtime_str[21] = { 0 }, mode_str[2] = { 0 }; + char *argv[] = { + [0] = "/usr/sbin/l_getsepol", + [1] = "-o", + [2] = NULL, /* obd type */ + [3] = "-n", + [4] = NULL, /* obd name */ + [5] = "-t", + [6] = mtime_str, /* policy mtime */ + [7] = "-m", + [8] = mode_str, /* enforcing mode */ + [9] = NULL + }; + static char *envp[] = { + [0] = "HOME=/", + [1] = "PATH=/sbin:/usr/sbin", + [2] = NULL + }; + signed short ret; + int rc = 0; + + if (!imp || !imp->imp_obd || + !imp->imp_obd->obd_type) { + rc = -EINVAL; + } else { + argv[2] = (char *)imp->imp_obd->obd_type->typ_name; + argv[4] = imp->imp_obd->obd_name; + spin_lock(&imp->imp_sec->ps_lock); + if (imp->imp_sec->ps_sepol_mtime == 0 && + imp->imp_sec->ps_sepol[0] == '\0') { + /* ps_sepol has not been initialized */ + argv[5] = NULL; + argv[7] = NULL; + } else { + snprintf(mtime_str, sizeof(mtime_str), "%lu", + imp->imp_sec->ps_sepol_mtime); + mode_str[0] = imp->imp_sec->ps_sepol[0]; + } + spin_unlock(&imp->imp_sec->ps_lock); + ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC); + rc = ret>>8; + } + + return rc; +} +#endif + +static inline int sptlrpc_sepol_needs_check(struct ptlrpc_sec *imp_sec) +{ 
+ ktime_t checknext; + + if (send_sepol == 0) + return 0; + + if (send_sepol == -1) + /* send_sepol == -1 means fetch sepol status every time */ + return 1; + + spin_lock(&imp_sec->ps_lock); + checknext = imp_sec->ps_sepol_checknext; + spin_unlock(&imp_sec->ps_lock); + + /* next check is too far in time, please update */ + if (ktime_after(checknext, + ktime_add(ktime_get(), ktime_set(send_sepol, 0)))) + goto setnext; + + if (ktime_before(ktime_get(), checknext)) + /* too early to fetch sepol status */ + return 0; + +setnext: + /* define new sepol_checknext time */ + spin_lock(&imp_sec->ps_lock); + imp_sec->ps_sepol_checknext = ktime_add(ktime_get(), + ktime_set(send_sepol, 0)); + spin_unlock(&imp_sec->ps_lock); + + return 1; +} + +int sptlrpc_get_sepol(struct ptlrpc_request *req) +{ +#ifndef CONFIG_SECURITY_SELINUX + (req->rq_sepol)[0] = '\0'; + + if (unlikely(send_sepol != 0)) + CDEBUG(D_SEC, + "Client cannot report SELinux status, it was not built against libselinux.\n"); + return 0; +#else + struct ptlrpc_sec *imp_sec = req->rq_import->imp_sec; + int rc = 0; + + (req->rq_sepol)[0] = '\0'; + + if (send_sepol == 0) + return 0; + + if (!imp_sec) + return -EINVAL; + + /* Retrieve SELinux status info */ + if (sptlrpc_sepol_needs_check(imp_sec)) + rc = sepol_helper(req->rq_import); + if (likely(rc == 0)) { + spin_lock(&imp_sec->ps_lock); + memcpy(req->rq_sepol, imp_sec->ps_sepol, + sizeof(req->rq_sepol)); + spin_unlock(&imp_sec->ps_lock); + } + + return rc; +#endif +} +EXPORT_SYMBOL(sptlrpc_get_sepol); + /**************************************** * server side security * ****************************************/ diff --git a/fs/lustre/ptlrpc/sec_lproc.c b/fs/lustre/ptlrpc/sec_lproc.c index df7c667..04e421d 100644 --- a/fs/lustre/ptlrpc/sec_lproc.c +++ b/fs/lustre/ptlrpc/sec_lproc.c @@ -131,6 +131,78 @@ static int sptlrpc_ctxs_lprocfs_seq_show(struct seq_file *seq, void *v) LPROC_SEQ_FOPS_RO(sptlrpc_ctxs_lprocfs); +static ssize_t +lprocfs_wr_sptlrpc_sepol(struct file 
*file, const char __user *buffer, + size_t count, void *data) +{ + struct seq_file *seq = file->private_data; + struct obd_device *dev = seq->private; + struct client_obd *cli = &dev->u.cli; + struct obd_import *imp = cli->cl_import; + struct sepol_downcall_data *param; + int size = sizeof(*param); + int rc = 0; + + if (count < size) { + CERROR("%s: invalid data count = %lu, size = %d\n", + dev->obd_name, (unsigned long) count, size); + return -EINVAL; + } + + param = kzalloc(size, GFP_KERNEL); + if (!param) + return -ENOMEM; + + if (copy_from_user(param, buffer, size)) { + CERROR("%s: bad sepol data\n", dev->obd_name); + rc = -EFAULT; + goto out; + } + + if (param->sdd_magic != SEPOL_DOWNCALL_MAGIC) { + CERROR("%s: sepol downcall bad params\n", + dev->obd_name); + rc = -EINVAL; + goto out; + } + + if (param->sdd_sepol_len == 0 || + param->sdd_sepol_len >= sizeof(imp->imp_sec->ps_sepol)) { + CERROR("%s: invalid sepol data returned\n", + dev->obd_name); + rc = -EINVAL; + goto out; + } + rc = param->sdd_sepol_len; /* save sdd_sepol_len */ + kfree(param); + size = offsetof(struct sepol_downcall_data, + sdd_sepol[rc]); + + /* alloc again with real size */ + rc = 0; + param = kzalloc(size, GFP_KERNEL); + if (!param) + return -ENOMEM; + + if (copy_from_user(param, buffer, size)) { + CERROR("%s: bad sepol data\n", dev->obd_name); + rc = -EFAULT; + goto out; + } + + spin_lock(&imp->imp_sec->ps_lock); + snprintf(imp->imp_sec->ps_sepol, param->sdd_sepol_len + 1, "%s", + param->sdd_sepol); + imp->imp_sec->ps_sepol_mtime = param->sdd_sepol_mtime; + spin_unlock(&imp->imp_sec->ps_lock); + +out: + kfree(param); + + return rc ? 
rc : count; +} +LPROC_SEQ_FOPS_WR_ONLY(srpc, sptlrpc_sepol); + int sptlrpc_lprocfs_cliobd_attach(struct obd_device *dev) { if (strcmp(dev->obd_type->typ_name, LUSTRE_OSC_NAME) != 0 && @@ -145,6 +217,8 @@ int sptlrpc_lprocfs_cliobd_attach(struct obd_device *dev) &sptlrpc_info_lprocfs_fops); debugfs_create_file("srpc_contexts", 0444, dev->obd_debugfs_entry, dev, &sptlrpc_ctxs_lprocfs_fops); + debugfs_create_file("srpc_sepol", 0200, dev->obd_debugfs_entry, dev, + &srpc_sptlrpc_sepol_fops); return 0; } diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index f723d7b..77b9539 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2936,6 +2936,19 @@ struct close_data { }; }; +/* sepol string format is: + * <1-digit for SELinux status>::: + */ +/* Max length of the sepol string + * Should be large enough to contain a sha512sum of the policy + */ +#define SELINUX_MODE_LEN 1 +#define SELINUX_POLICY_VER_LEN 3 /* 3 chars to leave room for the future */ +#define SELINUX_POLICY_HASH_LEN 64 +#define LUSTRE_NODEMAP_SEPOL_LENGTH (SELINUX_MODE_LEN + NAME_MAX + \ + SELINUX_POLICY_VER_LEN + \ + SELINUX_POLICY_HASH_LEN + 3) + /* * This is the lu_ladvise struct which goes out on the wire. * Corresponds to the userspace arg llapi_lu_ladvise. 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 649aeeb..c1e9dca 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -798,6 +798,7 @@ static inline char *qtype_name(int qtype) } #define IDENTITY_DOWNCALL_MAGIC 0x6d6dd629 +#define SEPOL_DOWNCALL_MAGIC 0x8b8bb842 /* permission */ #define N_PERMS_MAX 64 @@ -819,6 +820,14 @@ struct identity_downcall_data { __u32 idd_groups[0]; }; +struct sepol_downcall_data { + __u32 sdd_magic; + time_t sdd_sepol_mtime; + __u16 sdd_sepol_len; + char sdd_sepol[0]; +}; + + /* lustre volatile file support * file name header: ".^L^S^T^R:volatile" */ From patchwork Thu Feb 27 21:11:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1B10B92A for ; Thu, 27 Feb 2020 21:33:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0237E24677 for ; Thu, 27 Feb 2020 21:33:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0237E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6197C349D5D; Thu, 27 Feb 2020 13:28:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D8B4A21FBD9 for ; Thu, 27 Feb 2020 13:19:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A99E42AD5; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A7FA846C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:10 -0500 Message-Id: <1582838290-17243-203-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 202/622] lustre: clio: fix incorrect invariant in cl_io_iter_fini() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" It was discovered during PFL testing that if LINVRNT() is enabled that cl_io_iter_fini() will crash with the following backtrace: kernel: LustreError: 16009:0:(cl_io.c:439:cl_io_iter_fini()) ASSERTION( io->ci_state == CIS_UNLOCKED ) failed kernel: cl_io_iter_fini+0x10c/0x110 [obdclass] kernel: cl_io_loop+0x46/0x220 [obdclass] kernel: cl_setattr_ost+0x1ed/0x2a0 [lustre] kernel: ll_setattr_raw+0x7b0/0x9a0 [lustre] kernel: notify_change+0x1dc/0x430 kernel: do_truncate+0x72/0xc0 kernel: do_sys_ftruncate+0xf5/0x160 This is due to the incorrect assumption that the ci_state will always be CIS_UNLOCKED, but by looking at the behavior of cl_io_loop() it can be seen that is not the case with PFL. 
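
To illustrate why an equality check on the state is too strict, here is
a minimal standalone sketch comparing it with a range-based check. The
enum ordering below is an assumption made for this example (the real
definition lives in the Lustre headers and may differ), and the helper
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering of the I/O states, for illustration only. */
enum io_state {
	CIS_ZERO,
	CIS_INIT,
	CIS_IT_STARTED,
	CIS_LOCKED,
	CIS_IO_GOING,
	CIS_IO_FINISHED,
	CIS_UNLOCKED,
	CIS_IT_ENDED,
	CIS_FINI
};

/* Strict check: only an I/O that went through lock/unlock passes. */
static bool iter_fini_ok_strict(enum io_state s)
{
	return s == CIS_UNLOCKED;
}

/*
 * Range check: accept any state that is not in the middle of locking
 * or of the I/O itself, i.e. up to CIS_IT_STARTED, or past
 * CIS_IO_FINISHED.
 */
static bool iter_fini_ok_relaxed(enum io_state s)
{
	return s <= CIS_IT_STARTED || s > CIS_IO_FINISHED;
}

/* Userspace stand-in for an invariant check on entry to iter_fini. */
static void iter_fini_check(enum io_state s)
{
	assert(iter_fini_ok_relaxed(s));
}
```

With this ordering, an iteration that ends before taking locks (state
CIS_IT_STARTED) fails the strict check but passes the range check, while
CIS_LOCKED, CIS_IO_GOING and CIS_IO_FINISHED are still rejected.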
We do want to make sure the IO state is not in the middle of some other action (up to CIS_IT_STARTED or CIS_IO_FINISHED or later) when cl_io_iter_fini() is called. WC-bug-id: https://jira.whamcloud.com/browse/LU-11828 Lustre-commit: 8160b9bdf16c ("LU-11828 clio: fix incorrect invariant in cl_io_iter_fini()") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33915 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/cl_io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index a98be15..4278bc0 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -412,7 +412,8 @@ void cl_io_iter_fini(const struct lu_env *env, struct cl_io *io) const struct cl_io_slice *scan; LINVRNT(cl_io_is_loopable(io)); - LINVRNT(io->ci_state < CIS_LOCKED || io->ci_state > CIS_IO_FINISHED); + LINVRNT(io->ci_state <= CIS_IT_STARTED || + io->ci_state > CIS_IO_FINISHED); LINVRNT(cl_io_invariant(io)); list_for_each_entry_reverse(scan, &io->ci_layers, cis_linkage) { From patchwork Thu Feb 27 21:11:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410261 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C33C592A for ; Thu, 27 Feb 2020 21:33:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AB5BB24677 for ; Thu, 27 Feb 2020 21:33:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AB5BB24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none 
dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F4203349D9A; Thu, 27 Feb 2020 13:28:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2724D21FBA6 for ; Thu, 27 Feb 2020 13:19:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AC3C22AD6; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AAEC1468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:11 -0500 Message-Id: <1582838290-17243-204-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 203/622] lustre: mdc: Improve xattr buffer allocations X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Many of the xattr related buffers in the mdc/mdt code are allocated at max_easize, but they are used for normal POSIX xattrs (primarily ACLs) and so they are guaranteed not to exceed XATTR_SIZE_MAX. HSM xattrs should also be less than XATTR_SIZE_MAX. Reduce allocations to MIN(XATTR_SIZE_MAX, max_easize). 
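
The clamping described above can be sketched in plain userspace C as
follows. This is only an illustration: xattr_bufsize() is a hypothetical
helper, the patch itself uses the kernel's min_t(), and the fallback
value for XATTR_SIZE_MAX is the 64 KiB limit from the Linux
<linux/limits.h> header.

```c
#include <stdint.h>

#ifndef XATTR_SIZE_MAX
#define XATTR_SIZE_MAX 65536	/* Linux limit on one xattr value (64 KiB) */
#endif

/*
 * Size of the reply buffer to reserve for POSIX xattrs such as ACLs:
 * never more than the negotiated maximum EA size, and never more than
 * the largest xattr value the VFS accepts.
 */
static uint32_t xattr_bufsize(uint32_t max_easize)
{
	return max_easize < XATTR_SIZE_MAX ? max_easize : XATTR_SIZE_MAX;
}
```

For example, a negotiated max_easize of 1 MiB yields a 64 KiB buffer,
while a max_easize of 4 KiB is used unchanged.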
WC-bug-id: https://jira.whamcloud.com/browse/LU-11868 Lustre-commit: 4f78164f8748 ("LU-11868 mdc: Improve xattr buffer allocations") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/34059 Reviewed-by: Andreas Dilger Reviewed-by: Li Dongyang Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_locks.c | 9 ++++++--- fs/lustre/mdc/mdc_reint.c | 4 +++- fs/lustre/mdc/mdc_request.c | 14 ++++++++------ 3 files changed, 17 insertions(+), 10 deletions(-) diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index f9d66a4..9898b6a 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -810,7 +810,9 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, generation = obddev->u.cli.cl_import->imp_generation; if (!it || (it->it_op & (IT_OPEN | IT_CREAT))) - acl_bufsize = imp->imp_connect_data.ocd_max_easize; + acl_bufsize = min_t(u32, + imp->imp_connect_data.ocd_max_easize, + XATTR_SIZE_MAX); else acl_bufsize = LUSTRE_POSIX_ACL_MAX_SIZE_OLD; @@ -936,10 +938,11 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, if ((int)lockrep->lock_policy_res2 == -ERANGE && it->it_op & (IT_OPEN | IT_GETATTR | IT_LOOKUP) && - acl_bufsize != imp->imp_connect_data.ocd_max_easize) { + acl_bufsize == LUSTRE_POSIX_ACL_MAX_SIZE_OLD) { mdc_clear_replay_flag(req, -ERANGE); ptlrpc_req_finished(req); - acl_bufsize = imp->imp_connect_data.ocd_max_easize; + acl_bufsize = min_t(u32, imp->imp_connect_data.ocd_max_easize, + XATTR_SIZE_MAX); goto resend; } diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 062685c..2611fc4 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -135,7 +135,9 @@ int mdc_setattr(struct obd_export *exp, struct md_op_data *op_data, mdc_setattr_pack(req, op_data, ea, ealen); req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, - req->rq_import->imp_connect_data.ocd_max_easize); + min_t(u32, + 
req->rq_import->imp_connect_data.ocd_max_easize, + XATTR_SIZE_MAX)); ptlrpc_request_set_replen(req); rc = mdc_reint(req, LUSTRE_IMP_FULL); diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index d702fd1..4711288 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -234,9 +234,10 @@ static int mdc_getattr(struct obd_export *exp, struct md_op_data *op_data, rc = mdc_getattr_common(exp, req); if (rc) { - if (rc == -ERANGE && - acl_bufsize != imp->imp_connect_data.ocd_max_easize) { - acl_bufsize = imp->imp_connect_data.ocd_max_easize; + if (rc == -ERANGE) { + acl_bufsize = min_t(u32, + imp->imp_connect_data.ocd_max_easize, + XATTR_SIZE_MAX); mdc_reset_acl_req(req); goto again; } @@ -289,9 +290,10 @@ static int mdc_getattr_name(struct obd_export *exp, struct md_op_data *op_data, rc = mdc_getattr_common(exp, req); if (rc) { - if (rc == -ERANGE && - acl_bufsize != imp->imp_connect_data.ocd_max_easize) { - acl_bufsize = imp->imp_connect_data.ocd_max_easize; + if (rc == -ERANGE) { + acl_bufsize = min_t(u32, + imp->imp_connect_data.ocd_max_easize, + XATTR_SIZE_MAX); mdc_reset_acl_req(req); goto again; } From patchwork Thu Feb 27 21:11:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410287 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2544B138D for ; Thu, 27 Feb 2020 21:34:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0E29824677 for ; Thu, 27 Feb 2020 21:34:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0E29824677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D98F621FBB6; Thu, 27 Feb 2020 13:28:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 805C621FBA6 for ; Thu, 27 Feb 2020 13:19:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AF6B32AD7; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ADC8246F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:12 -0500 Message-Id: <1582838290-17243-205-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 204/622] lnet: libcfs: allow file/func/line passed to CDEBUG() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Allow the file, function, and line number to be passed to CDEBUG() messages so that they are not duplicated in helper functions that may be called from multiple places. 
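
The idea can be sketched outside of libcfs with a small userspace
example. All names here (log_msg(), LOG_WITH_LOC(), LOG(),
check_value_loc(), check_value(), last_log) are hypothetical and only
mirror the pattern; they are not the libcfs API. The sketch relies on
the GNU "## __VA_ARGS__" extension, as the existing macros do.

```c
#include <stdarg.h>
#include <stdio.h>

/* Last formatted message, kept so the result can be inspected. */
static char last_log[1024];

/* Format "file:line:func(): message" and print it to stderr. */
static void log_msg(const char *file, const char *func, int line,
		    const char *fmt, ...)
{
	va_list ap;
	int n;

	n = snprintf(last_log, sizeof(last_log), "%s:%d:%s(): ",
		     file, line, func);
	if (n < 0 || (size_t)n >= sizeof(last_log))
		n = 0;	/* prefix did not fit: keep the message only */

	va_start(ap, fmt);
	vsnprintf(last_log + n, sizeof(last_log) - n, fmt, ap);
	va_end(ap);
	fputs(last_log, stderr);
}

/* Variant taking an explicit location, for use inside helpers. */
#define LOG_WITH_LOC(file, func, line, fmt, ...) \
	log_msg(file, func, line, fmt, ## __VA_ARGS__)

/* Usual entry point: the location is the place where LOG() is used. */
#define LOG(fmt, ...) \
	LOG_WITH_LOC(__FILE__, __func__, __LINE__, fmt, ## __VA_ARGS__)

/* Helper shared by many callers; reports the caller's location. */
static void check_value_loc(int v, const char *func, int line)
{
	if (v < 0)
		LOG_WITH_LOC(__FILE__, func, line, "bad value %d\n", v);
}
#define check_value(v) check_value_loc(v, __func__, __LINE__)

static void caller_a(void)
{
	check_value(-1);	/* logged as coming from caller_a() */
}
```

Because check_value() captures __func__ and __LINE__ at the call site,
the message is attributed to caller_a() rather than to the helper, and
the format string no longer needs its own "%s:%d" prefix.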
This patch is largely a no-op in terms of code, with the exception of one call in osc_extent_sanity_check0() to OSC_EXTENT_DUMP() that is changed to OSC_EXTENT_DUMP_WITH_LOC(). WC-bug-id: https://jira.whamcloud.com/browse/LU-4664 Lustre-commit: 8503e73bd936 ("LU-4664 libcfs: allow file/func/line passed to CDEBUG()") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33588 Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 43 +++++++++++++---------- include/linux/libcfs/libcfs_debug.h | 69 ++++++++++++++++++++++--------------- 2 files changed, 65 insertions(+), 47 deletions(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 1ff258c..a18e791 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -58,10 +58,10 @@ static int osc_io_unplug_async(const struct lu_env *env, static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages, unsigned int lost_grant, unsigned int dirty_grant); -static void __osc_extent_tree_dump(int level, struct osc_object *obj, +static void __osc_extent_tree_dump(int mask, struct osc_object *obj, const char *func, int line); -#define osc_extent_tree_dump(lvl, obj) \ - __osc_extent_tree_dump(lvl, obj, __func__, __LINE__) +#define osc_extent_tree_dump(mask, obj) \ + __osc_extent_tree_dump(mask, obj, __func__, __LINE__) static void osc_unreserve_grant(struct client_obd *cli, unsigned int reserved, unsigned int unused); @@ -106,18 +106,19 @@ static inline char list_empty_marker(struct list_head *list) static const char * const oes_strings[] = { "inv", "active", "cache", "locking", "lockdone", "rpc", "trunc", NULL }; -#define OSC_EXTENT_DUMP(lvl, extent, fmt, ...) do { \ +#define OSC_EXTENT_DUMP_WITH_LOC(file, func, line, mask, extent, fmt, ...) 
do {\ + static struct cfs_debug_limit_state cdls; \ struct osc_extent *__ext = (extent); \ char __buf[16]; \ \ - CDEBUG(lvl, \ + __CDEBUG_WITH_LOC(file, func, line, mask, &cdls, \ "extent %p@{" EXTSTR ", " \ "[%d|%d|%c|%s|%s|%p], [%d|%d|%c|%c|%p|%u|%p]} " fmt, \ /* ----- extent part 0 ----- */ \ __ext, EXTPARA(__ext), \ /* ----- part 1 ----- */ \ kref_read(&__ext->oe_refc), \ - atomic_read(&__ext->oe_users), \ + atomic_read(&__ext->oe_users), \ list_empty_marker(&__ext->oe_link), \ oes_strings[__ext->oe_state], ext_flags(__ext, __buf), \ __ext->oe_obj, \ @@ -128,12 +129,16 @@ static inline char list_empty_marker(struct list_head *list) __ext->oe_dlmlock, __ext->oe_mppr, __ext->oe_owner, \ /* ----- part 4 ----- */ \ ## __VA_ARGS__); \ - if (lvl == D_ERROR && __ext->oe_dlmlock) \ + if (mask == D_ERROR && __ext->oe_dlmlock) \ LDLM_ERROR(__ext->oe_dlmlock, "extent: %p", __ext); \ else \ LDLM_DEBUG(__ext->oe_dlmlock, "extent: %p", __ext); \ } while (0) +#define OSC_EXTENT_DUMP(mask, ext, fmt, ...) \ + OSC_EXTENT_DUMP_WITH_LOC(__FILE__, __func__, __LINE__, \ + mask, ext, fmt, ## __VA_ARGS__) + #undef EASSERTF #define EASSERTF(expr, ext, fmt, args...) 
do { \ if (!(expr)) { \ @@ -300,9 +305,9 @@ static int __osc_extent_sanity_check(struct osc_extent *ext, out: if (rc != 0) - OSC_EXTENT_DUMP(D_ERROR, ext, - "%s:%d sanity check %p failed with rc = %d\n", - func, line, ext, rc); + OSC_EXTENT_DUMP_WITH_LOC(__FILE__, func, line, D_ERROR, ext, + "sanity check %p failed: rc = %d\n", + ext, rc); return rc; } @@ -1250,34 +1255,34 @@ static int osc_extent_expand(struct osc_extent *ext, pgoff_t index, return rc; } -static void __osc_extent_tree_dump(int level, struct osc_object *obj, +static void __osc_extent_tree_dump(int mask, struct osc_object *obj, const char *func, int line) { struct osc_extent *ext; int cnt; - if (!cfs_cdebug_show(level, DEBUG_SUBSYSTEM)) + if (!cfs_cdebug_show(mask, DEBUG_SUBSYSTEM)) return; - CDEBUG(level, "Dump object %p extents at %s:%d, mppr: %u.\n", + CDEBUG(mask, "Dump object %p extents at %s:%d, mppr: %u.\n", obj, func, line, osc_cli(obj)->cl_max_pages_per_rpc); /* osc_object_lock(obj); */ cnt = 1; for (ext = first_extent(obj); ext; ext = next_extent(ext)) - OSC_EXTENT_DUMP(level, ext, "in tree %d.\n", cnt++); + OSC_EXTENT_DUMP(mask, ext, "in tree %d.\n", cnt++); cnt = 1; list_for_each_entry(ext, &obj->oo_hp_exts, oe_link) - OSC_EXTENT_DUMP(level, ext, "hp %d.\n", cnt++); + OSC_EXTENT_DUMP(mask, ext, "hp %d.\n", cnt++); cnt = 1; list_for_each_entry(ext, &obj->oo_urgent_exts, oe_link) - OSC_EXTENT_DUMP(level, ext, "urgent %d.\n", cnt++); + OSC_EXTENT_DUMP(mask, ext, "urgent %d.\n", cnt++); cnt = 1; list_for_each_entry(ext, &obj->oo_reading_exts, oe_link) - OSC_EXTENT_DUMP(level, ext, "reading %d.\n", cnt++); + OSC_EXTENT_DUMP(mask, ext, "reading %d.\n", cnt++); /* osc_object_unlock(obj); */ } @@ -1395,9 +1400,9 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap, return 0; } -#define OSC_DUMP_GRANT(lvl, cli, fmt, args...) do { \ +#define OSC_DUMP_GRANT(mask, cli, fmt, args...) 
do { \ struct client_obd *__tmp = (cli); \ - CDEBUG(lvl, "%s: grant { dirty: %ld/%ld dirty_pages: %ld/%lu " \ + CDEBUG(mask, "%s: grant { dirty: %ld/%ld dirty_pages: %ld/%lu " \ "dropped: %ld avail: %ld, dirty_grant: %ld, " \ "reserved: %ld, flight: %d } lru {in list: %ld, " \ "left: %ld, waiters: %d }" fmt "\n", \ diff --git a/include/linux/libcfs/libcfs_debug.h b/include/linux/libcfs/libcfs_debug.h index 31a97ec..99905f7 100644 --- a/include/linux/libcfs/libcfs_debug.h +++ b/include/linux/libcfs/libcfs_debug.h @@ -79,26 +79,29 @@ (THREAD_SIZE - 1))) # endif /* __ia64__ */ -#define __CHECK_STACK(msgdata, mask, cdls) \ +#define __CHECK_STACK_WITH_LOC(file, func, line, msgdata, mask, cdls) \ do { \ if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \ - LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \ + LIBCFS_DEBUG_MSG_DATA_INIT(file, func, line, msgdata, \ + D_WARNING, NULL); \ libcfs_stack = CDEBUG_STACK(); \ - libcfs_debug_msg(msgdata, \ - "maximum lustre stack %lu\n", \ - CDEBUG_STACK()); \ + libcfs_debug_msg(msgdata, "maximum lustre stack %u\n", \ + libcfs_stack); \ (msgdata)->msg_mask = mask; \ (msgdata)->msg_cdls = cdls; \ dump_stack(); \ /*panic("LBUG");*/ \ } \ } while (0) -#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls) #else /* __x86_64__ */ -#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0) #define CDEBUG_STACK() (0L) +#define __CHECK_STACK_WITH_LOC(file, func, line, msgdata, mask, cdls) \ + do {} while (0) #endif /* __x86_64__ */ +#define CFS_CHECK_STACK(msgdata, mask, cdls) \ + __CHECK_STACK_WITH_LOC(__FILE__, __func__, __LINE__, \ + msgdata, mask, cdls) #ifndef DEBUG_SUBSYSTEM # define DEBUG_SUBSYSTEM S_UNDEFINED #endif @@ -121,24 +124,28 @@ struct libcfs_debug_msg_data { struct cfs_debug_limit_state *msg_cdls; }; -#define LIBCFS_DEBUG_MSG_DATA_INIT(data, mask, cdls) \ +#define LIBCFS_DEBUG_MSG_DATA_INIT(file, func, line, msgdata, mask, cdls)\ do { \ - (data)->msg_subsys = DEBUG_SUBSYSTEM; \ - 
(data)->msg_file = __FILE__; \ - (data)->msg_fn = __func__; \ - (data)->msg_line = __LINE__; \ - (data)->msg_cdls = (cdls); \ - (data)->msg_mask = (mask); \ + (msgdata)->msg_subsys = DEBUG_SUBSYSTEM; \ + (msgdata)->msg_file = (file); \ + (msgdata)->msg_fn = (func); \ + (msgdata)->msg_line = (line); \ + (msgdata)->msg_mask = (mask); \ + (msgdata)->msg_cdls = (cdls); \ } while (0) -#define LIBCFS_DEBUG_MSG_DATA_DECL(dataname, mask, cdls) \ - static struct libcfs_debug_msg_data dataname = { \ - .msg_subsys = DEBUG_SUBSYSTEM, \ - .msg_file = __FILE__, \ - .msg_fn = __func__, \ - .msg_line = __LINE__, \ - .msg_cdls = (cdls) }; \ - dataname.msg_mask = (mask) +#define LIBCFS_DEBUG_MSG_DATA_DECL_LOC(file, func, line, msgdata, mask, cdls)\ + static struct libcfs_debug_msg_data msgdata = { \ + .msg_subsys = DEBUG_SUBSYSTEM, \ + .msg_file = (file), \ + .msg_fn = (func), \ + .msg_line = (line), \ + .msg_cdls = (cdls) }; \ + msgdata.msg_mask = (mask) + +#define LIBCFS_DEBUG_MSG_DATA_DECL(msgdata, mask, cdls) \ + LIBCFS_DEBUG_MSG_DATA_DECL_LOC(__FILE__, __func__, __LINE__, \ + msgdata, mask, cdls) /** * Filters out logging messages based on mask and subsystem. @@ -147,27 +154,32 @@ static inline int cfs_cdebug_show(unsigned int mask, unsigned int subsystem) { return mask & D_CANTMASK || ((libcfs_debug & mask) && (libcfs_subsystem_debug & subsystem)); + } -#define __CDEBUG(cdls, mask, format, ...) \ +#define __CDEBUG_WITH_LOC(file, func, line, mask, cdls, format, ...) \ do { \ static struct libcfs_debug_msg_data msgdata; \ \ - CFS_CHECK_STACK(&msgdata, mask, cdls); \ + __CHECK_STACK_WITH_LOC(file, func, line, &msgdata, mask, cdls); \ \ if (cfs_cdebug_show(mask, DEBUG_SUBSYSTEM)) { \ - LIBCFS_DEBUG_MSG_DATA_INIT(&msgdata, mask, cdls); \ + LIBCFS_DEBUG_MSG_DATA_INIT(file, func, line, \ + &msgdata, mask, cdls); \ libcfs_debug_msg(&msgdata, format, ## __VA_ARGS__); \ } \ } while (0) -#define CDEBUG(mask, format, ...) 
__CDEBUG(NULL, mask, format, ## __VA_ARGS__) +#define CDEBUG(mask, format, ...) \ + __CDEBUG_WITH_LOC(__FILE__, __func__, __LINE__, \ + mask, NULL, format, ## __VA_ARGS__) #define CDEBUG_LIMIT(mask, format, ...) \ do { \ static struct cfs_debug_limit_state cdls; \ \ - __CDEBUG(&cdls, mask, format, ## __VA_ARGS__); \ + __CDEBUG_WITH_LOC(__FILE__, __func__, __LINE__, \ + mask, &cdls, format, ## __VA_ARGS__); \ } while (0) /* @@ -189,7 +201,8 @@ static inline int cfs_cdebug_show(unsigned int mask, unsigned int subsystem) "%x-%x: " format, errnum, LERRCHKSUM(errnum), ## __VA_ARGS__) #define LCONSOLE_ERROR(format, ...) LCONSOLE_ERROR_MSG(0x00, format, ## __VA_ARGS__) -#define LCONSOLE_EMERG(format, ...) CDEBUG(D_CONSOLE | D_EMERG, format, ## __VA_ARGS__) +#define LCONSOLE_EMERG(format, ...) \ + CDEBUG(D_CONSOLE | D_EMERG, format, ## __VA_ARGS__) int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata, const char *format1, ...) From patchwork Thu Feb 27 21:11:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410291 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0F30F92A for ; Thu, 27 Feb 2020 21:34:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EC02324677 for ; Thu, 27 Feb 2020 21:34:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EC02324677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 44E703487EC; Thu, 27 Feb 2020 13:28:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D55CF21FBA6 for ; Thu, 27 Feb 2020 13:19:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B1EF42AD8; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B0CBB46D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:13 -0500 Message-Id: <1582838290-17243-206-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 205/622] lustre: llog: add startcat for wrapped catalog X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko The osp_sync_thread loop around llog_cat_process has a mistake. When llog_cat_process reaches the bottom of the catalog, processing restarts with index 0, which means default processing. In this case the catalog is wrapped and processing starts from llh_cat_idx. But the records at the bottom were already processed and have not been cancelled yet. The following message then appears in the log. osp_sync_interpret()) reply req ffff8800123e3600/1, rc -2, transno 0 llog_cat_process supports a startcat index for processing a catalog.
In this case the processing starts from the startcat index. But if the catalog is wrapped, the startcat index is ignored. This patch adds support for the startcat index in a wrapped catalog. WC-bug-id: https://jira.whamcloud.com/browse/LU-10913 Cray-bug-id: LUS-6765 Lustre-commit: 8109c9e1718d ("LU-10913 llog: add startcat for wrapped catalog") Signed-off-by: Alexander Boyko Reviewed-on: https://review.whamcloud.com/33749 Reviewed-by: Sergey Cheremencev Reviewed-by: Alexander Zarochentsev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/llog_cat.c | 33 ++++++++++++++++++++++++--------- include/uapi/linux/lustre/lustre_idl.h | 5 +++++ 2 files changed, 29 insertions(+), 9 deletions(-) diff --git a/fs/lustre/obdclass/llog_cat.c b/fs/lustre/obdclass/llog_cat.c index ca97e08..30b0ac5 100644 --- a/fs/lustre/obdclass/llog_cat.c +++ b/fs/lustre/obdclass/llog_cat.c @@ -222,7 +222,7 @@ static int llog_cat_process_or_fork(const struct lu_env *env, LASSERT(llh->llh_flags & LLOG_F_IS_CAT); d.lpd_data = data; d.lpd_cb = cb; - d.lpd_startcat = startcat; + d.lpd_startcat = (startcat == LLOG_CAT_FIRST ? 0 : startcat); d.lpd_startidx = startidx; if (llh->llh_cat_idx > cat_llh->lgh_last_idx) { @@ -231,14 +231,29 @@ static int llog_cat_process_or_fork(const struct lu_env *env, CWARN("%s: catlog " DFID " crosses index zero\n", cat_llh->lgh_ctxt->loc_obd->obd_name, PFID(&cat_llh->lgh_id.lgl_oi.oi_fid)); - - cd.lpcd_first_idx = llh->llh_cat_idx; - cd.lpcd_last_idx = 0; - rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, &cd, fork); - if (rc != 0) - return rc; - - cd.lpcd_first_idx = 0; + /*startcat = 0 is default value for general processing */ + if ((startcat != LLOG_CAT_FIRST && + startcat >= llh->llh_cat_idx) || !startcat) { + /* processing the catalog part at the end */ + cd.lpcd_first_idx = (startcat ? 
startcat : + llh->llh_cat_idx); + cd.lpcd_last_idx = 0; + rc = llog_process_or_fork(env, cat_llh, cat_cb, + &d, &cd, fork); + /* Reset the startcat because it has already reached + * catalog bottom. + */ + startcat = 0; + if (rc != 0) + return rc; + } + /* processing the catalog part at the beginning */ + cd.lpcd_first_idx = (startcat == LLOG_CAT_FIRST) ? 0 : startcat; + /* Note, the processing will stop at the lgh_last_idx value, + * and it could be increased during processing. So records + * between current lgh_last_idx and lgh_last_idx in future + * would left unprocessed. + */ cd.lpcd_last_idx = cat_llh->lgh_last_idx; rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, &cd, fork); } else { diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 77b9539..76068ee 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2618,6 +2618,11 @@ enum llog_flag { LLOG_F_EXT_X_OMODE | LLOG_F_EXT_X_XATTR, }; +/* means first record of catalog */ +enum { + LLOG_CAT_FIRST = -1, +}; + /* On-disk header structure of each log object, stored in little endian order */ #define LLOG_MIN_CHUNK_SIZE 8192 #define LLOG_HEADER_SIZE (96) /* sizeof (llog_log_hdr) + From patchwork Thu Feb 27 21:11:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410295 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 38A2A138D for ; Thu, 27 Feb 2020 21:34:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 21BAD24677 for ; Thu, 27 Feb 2020 21:34:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 21BAD24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 650F9349E9D; Thu, 27 Feb 2020 13:28:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36D9521FAF1 for ; Thu, 27 Feb 2020 13:19:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B65972AD9; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B3C9346A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:14 -0500 Message-Id: <1582838290-17243-207-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 206/622] lustre: llog: add synchronization for the last record X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko The initial problem was a race between llog_process_thread and llog_osd_write_rec for a last record with lgh_last_idx. The catalog should be wrapped for the problem. 
The lgh_last_idx could be increased together with a modification of the llog bitmap, while the record itself is written a bit later. When llog_process_thread processes lgh_last_idx after the modification and before the write, it operates on old record data. The Lustre client is only a consumer of llog records, but we still need the changes to better handle consumption of the llog records. WC-bug-id: https://jira.whamcloud.com/browse/LU-11591 Lustre-commit: ec4194e4e78c ("LU-11591 llog: add synchronization for the last record") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-6683 Reviewed-on: https://review.whamcloud.com/33683 Reviewed-by: Andreas Dilger Reviewed-by: Alexander Zarochentsev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/llog.c | 68 ++++++++++++++++++++++++++++++++++------------- 1 file changed, 50 insertions(+), 18 deletions(-) diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c index 65384ded..4e9fd17 100644 --- a/fs/lustre/obdclass/llog.c +++ b/fs/lustre/obdclass/llog.c @@ -230,10 +230,11 @@ static int llog_process_thread(void *arg) struct llog_process_cat_data *cd = lpi->lpi_catdata; char *buf; u64 cur_offset, tmp_offset; - int chunk_size; + size_t chunk_size; int rc = 0, index = 1, last_index; int saved_index = 0; int last_called_index = 0; + bool repeated = false; if (!llh) return -EINVAL; @@ -261,8 +262,10 @@ static int llog_process_thread(void *arg) while (rc == 0) { unsigned int buf_offset = 0; struct llog_rec_hdr *rec; + off_t chunk_offset = 0; bool partial_chunk; - off_t chunk_offset; + int synced_idx = 0; + int lh_last_idx; /* skip records not set in bitmap */ while (index <= last_index && @@ -277,8 +280,23 @@ static int llog_process_thread(void *arg) repeat: /* get the buf with our target record; avoid old garbage */ memset(buf, 0, chunk_size); + /* the record index for outdated chunk data */ + /* it is safe to process buffer until saved lgh_last_idx */ + lh_last_idx = LLOG_HDR_TAIL(llh)->lrt_index; rc = 
llog_next_block(lpi->lpi_env, loghandle, &saved_index, index, &cur_offset, buf, chunk_size); + if (repeated && rc) + CDEBUG(D_OTHER, + "cur_offset %llu, chunk_offset %llu, buf_offset %u, rc = %d\n", + cur_offset, (u64)chunk_offset, buf_offset, rc); + /* we`ve tried to reread the chunk, but there is no + * new records + */ + if (rc == -EIO && repeated && (chunk_offset + buf_offset) == + cur_offset) { + rc = 0; + goto out; + } if (rc) goto out; @@ -313,29 +331,43 @@ static int llog_process_thread(void *arg) CDEBUG(D_OTHER, "after swabbing, type=%#x idx=%d\n", rec->lrh_type, rec->lrh_index); - /* - * for partial chunk the end of it is zeroed, check - * for index 0 to distinguish it. + if (index == (synced_idx + 1) && + synced_idx == LLOG_HDR_TAIL(llh)->lrt_index) { + rc = 0; + goto out; + } + + /* the bitmap could be changed during processing + * records from the chunk. For wrapped catalog + * it means we can read deleted record and try to + * process it. Check this case and reread the chunk. + * It is safe to process to lh_last_idx, including + * lh_last_idx if it was synced. We can not do <= + * comparison, cause for wrapped catalog lgh_last_idx + * could be less than index. So we detect last index + * for processing as index == lh_last_idx+1. But when + * catalog is wrapped and full lgh_last_idx=llh_cat_idx, + * the first processing index is llh_cat_idx+1. */ - if (partial_chunk && !rec->lrh_index) { - /* concurrent llog_add() might add new records - * while llog_processing, check this is not - * the case and re-read the current chunk - * otherwise. 
- */ - if (index > loghandle->lgh_last_idx) { - rc = 0; - goto out; - } - CDEBUG(D_OTHER, - "Re-read last llog buffer for new records, index %u, last %u\n", - index, loghandle->lgh_last_idx); + if ((index == lh_last_idx && synced_idx != index) || + (index == (lh_last_idx + 1) && + !(index == (llh->llh_cat_idx + 1) && + (llh->llh_flags & LLOG_F_IS_CAT))) || + (rec->lrh_index == 0 && !repeated)) { /* save offset inside buffer for the re-read */ buf_offset = (char *)rec - (char *)buf; cur_offset = chunk_offset; + repeated = true; + /* We need to be sure lgh_last_idx + * record was saved to disk + */ + synced_idx = LLOG_HDR_TAIL(llh)->lrt_index; + CDEBUG(D_OTHER, "synced_idx: %d\n", synced_idx); goto repeat; } + repeated = false; + if (!rec->lrh_len || rec->lrh_len > chunk_size) { CWARN("invalid length %d in llog record for index %d/%d\n", rec->lrh_len, From patchwork Thu Feb 27 21:11:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410039 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1806A1580 for ; Thu, 27 Feb 2020 21:28:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 00A41246A0 for ; Thu, 27 Feb 2020 21:28:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 00A41246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3602D21F9F2; Thu, 27 Feb 2020 13:24:54 -0800 
(PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8FA2821FAF1 for ; Thu, 27 Feb 2020 13:19:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B9BEC2ADA; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B6A9D46C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:15 -0500 Message-Id: <1582838290-17243-208-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 207/622] lustre: ptlrpc: improve memory allocation for service RPCs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Perepechko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andrew Perepechko The buffers for service RPCs do not always have a page-aligned size, e.g. 17KiB. Round the size up to the next power of 2 so that the whole allocated buffer can be used.
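For illustration only, the round-up described above can be sketched in plain userspace C. The helper name roundup_pow2_sketch() is hypothetical and is not the size_roundup_power2() helper that the patch calls; the overflow guard is likewise an assumption of this sketch rather than something taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's size_roundup_power2()):
 * round size up to the next power of two.
 */
static size_t roundup_pow2_sketch(size_t size)
{
	size_t rounded = 1;

	/* Zero or an exact power of two: same test as in the patch. */
	if ((size & (size - 1)) == 0)
		return size;

	/* Assumption of this sketch: report 0 if the result would overflow. */
	if (size > (SIZE_MAX >> 1) + 1)
		return 0;

	/* Double until the value covers the requested size. */
	while (rounded < size)
		rounded <<= 1;

	return rounded;
}
```

With this sketch, a 17KiB request becomes 32KiB, which matches the amount an allocator that rounds to 2^n would hand out anyway.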
WC-bug-id: https://jira.whamcloud.com/browse/LU-11897 Cray-bug-id: LUS-6657 Lustre-commit: 3a90458bd84d ("LU-11897 ost: improve memory allocation for ost") Signed-off-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/34127 Reviewed-by: Alexey Lyashkov Reviewed-by: Alexander Zarochentsev Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index b94ed6a..7bc578c 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -641,6 +641,13 @@ struct ptlrpc_service * service->srv_rep_portal = conf->psc_buf.bc_rep_portal; service->srv_req_portal = conf->psc_buf.bc_req_portal; + /* With slab/alloc_pages buffer size will be rounded up to 2^n */ + if (service->srv_buf_size & (service->srv_buf_size - 1)) { + int round = size_roundup_power2(service->srv_buf_size); + + service->srv_buf_size = round; + } + /* Increase max reply size to next power of two */ service->srv_max_reply_size = 1; while (service->srv_max_reply_size < From patchwork Thu Feb 27 21:11:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410129 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 690B9138D for ; Thu, 27 Feb 2020 21:30:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 51B2120801 for ; Thu, 27 Feb 2020 21:30:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 51B2120801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: 
mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 053B534973C; Thu, 27 Feb 2020 13:26:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D034021FAF1 for ; Thu, 27 Feb 2020 13:19:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BCF1A2ADF; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B9EBF468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:16 -0500 Message-Id: <1582838290-17243-209-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 208/622] lustre: llite: enable flock mount option by default X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The "flock" mount option has been optional for many years, initially because of potential stability issues, and also to provide a choice for administrators to select between "flock" and "localflock" options. 
However, the large number of problems that users report when trying to use applications that depend on this feature (typically databases and other cloud stacks) shows that disabling flock by default causes more problems than it solves. Enable the "flock" (distributed coherent userspace locking) feature by default. If applications do not need this functionality, then it will not affect them. If applications *do* need this functionality, they will get it. If administrators really know what they are doing, then they can use the "localflock" feature to enable client-local flock functionality, possibly only on select nodes that need this. Users wanting to disable this functionality should mount with the existing "-o noflock" mount option. If clients are already using "-o {flock|localflock|noflock}" then their existing options will be handled appropriately. WC-bug-id: https://jira.whamcloud.com/browse/LU-10885 Lustre-commit: 3613af3e15cb ("LU-10885 llite: enable flock mount option by default") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/32091 Reviewed-by: Patrick Farrell Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 4797ee9..84fc54d 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -104,7 +104,7 @@ static struct ll_sb_info *ll_init_sbi(void) sbi->ll_flags |= LL_SBI_VERBOSE; sbi->ll_flags |= LL_SBI_CHECKSUM; - + sbi->ll_flags |= LL_SBI_FLOCK; sbi->ll_flags |= LL_SBI_LRU_RESIZE; sbi->ll_flags |= LL_SBI_LAZYSTATFS; From patchwork Thu Feb 27 21:11:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410043 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1DC47138D for ; Thu, 27 Feb 2020 21:28:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 05CA7246A0 for ; Thu, 27 Feb 2020 21:28:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 05CA7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 349F621FCD8; Thu, 27 Feb 2020 13:24:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1DED321FAF1 for ; Thu, 27 Feb 2020 13:19:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BE64A2AE0; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BD23A46D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:17 -0500 Message-Id: <1582838290-17243-210-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 209/622] lustre: lmv: avoid gratuitous 64-bit modulus X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Fix the pct() calculation to use unsigned long arguments, since this is what callers use. Remove duplicate pct() definition in lproc_mdc. Don't do a 64-bit modulus of the LNet NID to find the starting MDT index when this isn't really needed. Similarly, don't compute the FLD cache usage percentage for a debug message that is never used. Fixes: fed15ee3b3f2 ("lustre: headers: define pct(a,b) once") WC-bug-id: https://jira.whamcloud.com/browse/LU-10171 Lustre-commit: e1b63fd21177 ("LU-10171 lmv: avoid gratuitous 64-bit modulus") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33922 Reviewed-by: Ben Evans Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/fld/fld_cache.c | 4 +--- fs/lustre/include/lprocfs_status.h | 2 +- fs/lustre/include/obd.h | 3 +-- fs/lustre/lmv/lmv_obd.c | 5 ++++- fs/lustre/mdc/lproc_mdc.c | 8 +++----- 5 files changed, 10 insertions(+), 12 deletions(-) diff --git a/fs/lustre/fld/fld_cache.c b/fs/lustre/fld/fld_cache.c index 96be544..5267ba2 100644 --- a/fs/lustre/fld/fld_cache.c +++ b/fs/lustre/fld/fld_cache.c @@ -98,10 +98,8 @@ void fld_cache_fini(struct fld_cache *cache) fld_cache_flush(cache); CDEBUG(D_INFO, "FLD cache statistics (%s):\n", cache->fci_name); - CDEBUG(D_INFO, " Total reqs: %llu\n", cache->fci_stat.fst_count); CDEBUG(D_INFO, " Cache reqs: %llu\n", cache->fci_stat.fst_cache); - CDEBUG(D_INFO, " Cache hits: %u%%\n", - pct(cache->fci_stat.fst_cache, cache->fci_stat.fst_count)); + CDEBUG(D_INFO, " Total reqs: %llu\n", cache->fci_stat.fst_count); kfree(cache); } diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index c1079f1..8d74822 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ 
-58,7 +58,7 @@ struct lprocfs_vars { umode_t proc_mode; }; -static inline u32 pct(s64 a, s64 b) +static inline unsigned int pct(unsigned long a, unsigned long b) { return b ? a * 100 / b : 0; } diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 4829e11..bf0bf97 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -437,11 +437,10 @@ struct lmv_obd { int connected; int max_easize; int max_def_easize; + u32 lmv_statfs_start; u32 tgts_size; /* size of tgts array */ struct lmv_tgt_desc **tgts; - int lmv_statfs_start; - struct obd_connect_data conn_data; struct kobject *lmv_tgts_kobj; }; diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 9f9abd3..0685925 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1366,8 +1366,11 @@ static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags) break; if (LNET_NETTYP(LNET_NIDNET(lnet_id.nid)) != LOLND) { + /* We dont need a full 64-bit modulus, just enough + * to distribute the requests across MDTs evenly. + */ lmv->lmv_statfs_start = - lnet_id.nid % lmv->desc.ld_tgt_count; + (u32)lnet_id.nid % lmv->desc.ld_tgt_count; break; } } diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 70c9eaf..81167bbd 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -328,7 +328,6 @@ static ssize_t mdc_rpc_stats_seq_write(struct file *file, return len; } -#define pct(a, b) (b ? 
a * 100 / b : 0) static int mdc_rpc_stats_seq_show(struct seq_file *seq, void *v) { struct obd_device *dev = seq->private; @@ -364,7 +363,7 @@ static int mdc_rpc_stats_seq_show(struct seq_file *seq, void *v) read_cum += r; write_cum += w; - seq_printf(seq, "%d:\t\t%10lu %3lu %3lu | %10lu %3lu %3lu\n", + seq_printf(seq, "%d:\t\t%10lu %3u %3u | %10lu %3u %3u\n", 1 << i, r, pct(r, read_tot), pct(read_cum, read_tot), w, pct(w, write_tot), @@ -388,7 +387,7 @@ static int mdc_rpc_stats_seq_show(struct seq_file *seq, void *v) read_cum += r; write_cum += w; - seq_printf(seq, "%d:\t\t%10lu %3lu %3lu | %10lu %3lu %3lu\n", + seq_printf(seq, "%d:\t\t%10lu %3u %3u | %10lu %3u %3u\n", i, r, pct(r, read_tot), pct(read_cum, read_tot), w, pct(w, write_tot), pct(write_cum, write_tot)); if (read_cum == read_tot && write_cum == write_tot) @@ -410,7 +409,7 @@ static int mdc_rpc_stats_seq_show(struct seq_file *seq, void *v) read_cum += r; write_cum += w; - seq_printf(seq, "%d:\t\t%10lu %3lu %3lu | %10lu %3lu %3lu\n", + seq_printf(seq, "%d:\t\t%10lu %3u %3u | %10lu %3u %3u\n", (i == 0) ? 
0 : 1 << (i - 1), r, pct(r, read_tot), pct(read_cum, read_tot), w, pct(w, write_tot), pct(write_cum, write_tot)); @@ -421,7 +420,6 @@ static int mdc_rpc_stats_seq_show(struct seq_file *seq, void *v) return 0; } -#undef pct LPROC_SEQ_FOPS(mdc_rpc_stats); static int mdc_stats_seq_show(struct seq_file *seq, void *v) From patchwork Thu Feb 27 21:11:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410133 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0585A92A for ; Thu, 27 Feb 2020 21:31:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E257C20801 for ; Thu, 27 Feb 2020 21:31:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E257C20801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4D4B534975F; Thu, 27 Feb 2020 13:26:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 74D1221FBC5 for ; Thu, 27 Feb 2020 13:19:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C22F42AE1; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C003446F; Thu, 27 
Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:18 -0500 Message-Id: <1582838290-17243-211-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 210/622] lustre: Ensure crc-t10pi is enabled. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Also simplify check_write_checksum code a little - the var isn't needed. Fixes: 86e186db3ed ("lustre: osc: T10PI between RPC and BIO") WC-bug-id: https://jira.whamcloud.com/browse/LU-11770 Lustre-commit: e0fb3133372e ("LU-11770 osc: allow build without blk_integrity or crc-t10pi") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33923 Reviewed-by: Li Dongyang Reviewed-by: Patrick Farrell Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/Kconfig | 1 + fs/lustre/osc/osc_request.c | 14 +++----------- 2 files changed, 4 insertions(+), 11 deletions(-) diff --git a/fs/lustre/Kconfig b/fs/lustre/Kconfig index 2eb7e45..bc89565 100644 --- a/fs/lustre/Kconfig +++ b/fs/lustre/Kconfig @@ -9,6 +9,7 @@ config LUSTRE_FS select CRYPTO_SHA1 select CRYPTO_SHA256 select CRYPTO_SHA512 + select CRC_T10DIF select DEBUG_FS select FHANDLE select QUOTA diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index ba84bd1..6ce22c3 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1638,7 +1638,6 @@ static int check_write_checksum(struct obdo *oa, const char *obd_name = 
aa->aa_cli->cl_import->imp_obd->obd_name; obd_dif_csum_fn *fn = NULL; int sector_size = 0; - bool t10pi = false; u32 new_cksum; char *msg; enum cksum_type cksum_type; @@ -1658,22 +1657,18 @@ static int check_write_checksum(struct obdo *oa, switch (cksum_type) { case OBD_CKSUM_T10IP512: - t10pi = true; fn = obd_dif_ip_fn; sector_size = 512; break; case OBD_CKSUM_T10IP4K: - t10pi = true; fn = obd_dif_ip_fn; sector_size = 4096; break; case OBD_CKSUM_T10CRC512: - t10pi = true; fn = obd_dif_crc_fn; sector_size = 512; break; case OBD_CKSUM_T10CRC4K: - t10pi = true; fn = obd_dif_crc_fn; sector_size = 4096; break; @@ -1681,13 +1676,10 @@ static int check_write_checksum(struct obdo *oa, break; } - if (t10pi) + if (fn) rc = osc_checksum_bulk_t10pi(obd_name, aa->aa_requested_nob, - aa->aa_page_count, - aa->aa_ppga, - OST_WRITE, - fn, - sector_size, + aa->aa_page_count, aa->aa_ppga, + OST_WRITE, fn, sector_size, &new_cksum); else rc = osc_checksum_bulk(aa->aa_requested_nob, aa->aa_page_count, From patchwork Thu Feb 27 21:11:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410269 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 05D9692A for ; Thu, 27 Feb 2020 21:33:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E181D24677 for ; Thu, 27 Feb 2020 21:33:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E181D24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 509A9349DBD; Thu, 27 Feb 2020 13:28:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C281F21FBC5 for ; Thu, 27 Feb 2020 13:19:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C51512AE2; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C30CD46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:19 -0500 Message-Id: <1582838290-17243-212-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 211/622] lustre: lov: fix lov_iocontrol for inactive OST case X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vladimir Saveliev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vladimir Saveliev For inactive OSTs, lov->lov_tgts[index]->ltd_exp is NULL. lov_iocontrol() must check for that before dereferencing lov->lov_tgts[index]->ltd_exp->exp_obd. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11911 Lustre-commit: 0facd12afa33 ("LU-11911 lov: fix lov_iocontrol for inactive OST case") Signed-off-by: Vladimir Saveliev Cray-bug-id: LUS-6937 Reviewed-on: https://review.whamcloud.com/34148 Reviewed-by: Andreas Dilger Reviewed-by: Alexandr Boyko Signed-off-by: James Simmons --- fs/lustre/lov/lov_obd.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 08d7edc..cc0ca1c 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -1001,15 +1001,15 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len, /* Try again with the next index */ return -EAGAIN; - imp = lov->lov_tgts[index]->ltd_exp->exp_obd->u.cli.cl_import; - if (!lov->lov_tgts[index]->ltd_active && - imp->imp_state != LUSTRE_IMP_IDLE) - return -ENODATA; - osc_obd = class_exp2obd(lov->lov_tgts[index]->ltd_exp); if (!osc_obd) return -EINVAL; + imp = osc_obd->u.cli.cl_import; + if (!lov->lov_tgts[index]->ltd_active && + imp->imp_state != LUSTRE_IMP_IDLE) + return -ENODATA; + /* copy UUID */ if (copy_to_user(data->ioc_pbuf2, obd2cli_tgt(osc_obd), min_t(unsigned long, data->ioc_plen2, From patchwork Thu Feb 27 21:11:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410025 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 475161580 for ; Thu, 27 Feb 2020 21:28:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2FC2E246A0 for ; Thu, 27 Feb 2020 21:28:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2FC2E246A0 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3E9193492AC; Thu, 27 Feb 2020 13:24:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1244621FBF3 for ; Thu, 27 Feb 2020 13:19:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C72B02AEF; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C615B46C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:20 -0500 Message-Id: <1582838290-17243-213-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 212/622] lustre: llite: Initialize cl_dirty_max_pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell cl_dirty_max_pages must be initialized to zero before calling client_adjust_max_dirty. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11919 Lustre-commit: 2e9c896dec6d ("LU-11919 llite: Initialize cl_dirty_max_pages") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/34173 Reviewed-by: James Simmons Reviewed-by: Li Xi Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lib.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 5fe5711..11955b1 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -315,6 +315,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) sizeof(server_uuid))); cli->cl_dirty_pages = 0; + cli->cl_dirty_max_pages = 0; cli->cl_avail_grant = 0; /* FIXME: Should limit this for the sum of all cl_dirty_max_pages. */ /* From patchwork Thu Feb 27 21:11:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410299 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4BF7138D for ; Thu, 27 Feb 2020 21:34:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9D1A924677 for ; Thu, 27 Feb 2020 21:34:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D1A924677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6041E349ED1; Thu, 27 Feb 2020 13:29:02 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 53AB221FBF3 for ; Thu, 27 Feb 2020 13:19:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CA5592AF0; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C907D468; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:21 -0500 Message-Id: <1582838290-17243-214-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 213/622] lustre: mdc: don't use ACL at setattr X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko For ldiskfs with large_ea, the maximum EA size is 1MB. In mdc_setattr the ptlrpc reply size is 1.1MB, which is rounded up to 2MB, so each REINT_SETATTR request takes about 2MB of memory on the client. In an MDS failover case many requests stay on the reply queue, which can lead to OOM. This patch sets the ACL size to zero because the server doesn't fill in the ACL for a setattr request. 
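As a rough illustration of the sizes quoted above, the sketch below rounds a reply size up to the next power of two. The power-of-two rounding rule and the helper name round_up_pow2() are assumptions made for illustration only; the description states just that a 1.1MB reply ends up occupying 2MB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Lustre function): round n up to the next
 * power of two. Assumes 1 <= n <= 2^63 so the shift cannot overflow.
 */
static uint64_t round_up_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n >= 1 && n <= (UINT64_C(1) << 63));
	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes in 1.1MB, rounded to a whole byte (1.1 * 1024 * 1024). */
#define SKETCH_REPLY_1_1MB UINT64_C(1153434)
```

Under that assumption, round_up_pow2(SKETCH_REPLY_1_1MB) yields 2097152 bytes (2MB), which matches the per-request footprint described above. With the ACL size set to zero, the reply no longer reserves room for a maximum-size EA.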
WC-bug-id: https://jira.whamcloud.com/browse/LU-11934 Lustre-commit: e7f6f870c356 ("LU-11934 mdc: don't use ACL at setattr") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-6938 Reviewed-on: https://review.whamcloud.com/34194 Reviewed-by: Andrew Perepechko Reviewed-by: Andriy Skulysh Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_reint.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 2611fc4..0e5f012 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -134,10 +134,8 @@ int mdc_setattr(struct obd_export *exp, struct md_op_data *op_data, op_data->op_attr.ia_ctime.tv_sec); mdc_setattr_pack(req, op_data, ea, ealen); - req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, - min_t(u32, - req->rq_import->imp_connect_data.ocd_max_easize, - XATTR_SIZE_MAX)); + req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0); + ptlrpc_request_set_replen(req); rc = mdc_reint(req, LUSTRE_IMP_FULL); From patchwork Thu Feb 27 21:11:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410047 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9852A1580 for ; Thu, 27 Feb 2020 21:28:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8108A246A0 for ; Thu, 27 Feb 2020 21:28:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8108A246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4DC6021FBC0; Thu, 27 Feb 2020 13:25:02 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9536721FBF3 for ; Thu, 27 Feb 2020 13:19:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CD5FB2AF1; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CBDD946D; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:22 -0500 Message-Id: <1582838290-17243-215-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 214/622] lnet: o2iblnd: ibc_rxs is created and freed with different size X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh kiblnd_create_conn()) alloc '(conn->ibc_rxs)': 26832 at ffffc90012e69000 kiblnd_destroy_conn()) kfreed 'conn->ibc_rxs': 4576 at ffffc90012e69000 The size changed by kiblnd_create_conn() : "peer 172.18.2.3@o2ib - queue depth reduced from 128 to 21" Based on size LIBCFS_FREE() decides whether to use kfree or vfree and accounts memory usage. 
Allocate ibc_rxs after rdma_create_qp() Cray-bug-id: LUS-6339 WC-bug-id: https://jira.whamcloud.com/browse/LU-11702 Lustre-commit: 277a6faa5b16 ("LU-11702 o2iblnd: ibc_rxs is created and freed with different size") Signed-off-by: Andriy Skulysh Reviewed-by: Andrew Perepechko Reviewed-by: Chris Horn Reviewed-on: https://review.whamcloud.com/33721 Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 017fe5f..0e207ef 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -735,6 +735,8 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, conn->ibc_cmid = cmid; conn->ibc_max_frags = peer_ni->ibp_max_frags; conn->ibc_queue_depth = peer_ni->ibp_queue_depth; + conn->ibc_rxs = NULL; + conn->ibc_rx_pages = NULL; INIT_LIST_HEAD(&conn->ibc_early_rxs); INIT_LIST_HEAD(&conn->ibc_tx_noops); @@ -778,20 +780,6 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, write_unlock_irqrestore(glock, flags); - conn->ibc_rxs = kzalloc_cpt(IBLND_RX_MSGS(conn) * sizeof(struct kib_rx), - GFP_NOFS, cpt); - if (!conn->ibc_rxs) { - CERROR("Cannot allocate RX buffers\n"); - goto failed_2; - } - - rc = kiblnd_alloc_pages(&conn->ibc_rx_pages, cpt, - IBLND_RX_MSG_PAGES(conn)); - if (rc) - goto failed_2; - - kiblnd_map_rx_descs(conn); - cq_attr.cqe = IBLND_CQ_ENTRIES(conn); cq_attr.comp_vector = kiblnd_get_completion_vector(conn, cpt); cq = ib_create_cq(cmid->device, @@ -856,6 +844,20 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, kfree(init_qp_attr); + conn->ibc_rxs = kzalloc_cpt(IBLND_RX_MSGS(conn) * sizeof(struct kib_rx), + GFP_NOFS, cpt); + if (!conn->ibc_rxs) { + CERROR("Cannot allocate RX buffers\n"); + goto failed_2; + } + + rc = kiblnd_alloc_pages(&conn->ibc_rx_pages, cpt, + IBLND_RX_MSG_PAGES(conn)); + 
if (rc) + goto failed_2; + + kiblnd_map_rx_descs(conn); + /* 1 ref for caller and each rxmsg */ atomic_set(&conn->ibc_refcount, 1 + IBLND_RX_MSGS(conn)); conn->ibc_nrx = IBLND_RX_MSGS(conn); From patchwork Thu Feb 27 21:11:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410029 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AC147138D for ; Thu, 27 Feb 2020 21:28:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 949D7246A0 for ; Thu, 27 Feb 2020 21:28:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 949D7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 14BCA3492CC; Thu, 27 Feb 2020 13:24:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D6B8021FCB4 for ; Thu, 27 Feb 2020 13:19:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D04602AF2; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CEAC346F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 
16:11:23 -0500 Message-Id: <1582838290-17243-216-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 215/622] lustre: osc: reduce atomic ops in osc_enter_cache_try X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang We can reduce the number of atomic ops performed on obd_dirty_pages for the common case. WC-bug-id: https://jira.whamcloud.com/browse/LU-11775 Lustre-commit: 8b364fbd6bd9 ("LU-11775 osc: reduce atomic ops in osc_enter_cache_try") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/33859 Reviewed-by: Patrick Farrell Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index a18e791..bdaf65f 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -1423,7 +1423,6 @@ static void osc_consume_write_grant(struct client_obd *cli, { assert_spin_locked(&cli->cl_loi_list_lock); LASSERT(!(pga->flag & OBD_BRW_FROM_GRANT)); - atomic_long_inc(&obd_dirty_pages); cli->cl_dirty_pages++; pga->flag |= OBD_BRW_FROM_GRANT; CDEBUG(D_CACHE, "using %lu grant credits for brw %p page %p\n", @@ -1560,13 +1559,18 @@ static bool osc_enter_cache_try(struct client_obd *cli, if (osc_reserve_grant(cli, bytes) < 0) return rc; - if (cli->cl_dirty_pages < cli->cl_dirty_max_pages && - atomic_long_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) { - 
osc_consume_write_grant(cli, &oap->oap_brw_page); - rc = true; - } else { - __osc_unreserve_grant(cli, bytes, bytes); + if (cli->cl_dirty_pages < cli->cl_dirty_max_pages) { + if (atomic_long_add_return(1, &obd_dirty_pages) <= + obd_max_dirty_pages) { + osc_consume_write_grant(cli, &oap->oap_brw_page); + rc = true; + goto out; + } else + atomic_long_dec(&obd_dirty_pages); } + __osc_unreserve_grant(cli, bytes, bytes); + +out: return rc; } From patchwork Thu Feb 27 21:11:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410033 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A46C51580 for ; Thu, 27 Feb 2020 21:28:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 86C16246A0 for ; Thu, 27 Feb 2020 21:28:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 86C16246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 35C9A3492D3; Thu, 27 Feb 2020 13:24:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 241EF21FCB4 for ; Thu, 27 Feb 2020 13:19:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 
D2FAE2C4A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D184146A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:24 -0500 Message-Id: <1582838290-17243-217-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 216/622] lustre: llite: ll_fault should fail for insane file offsets X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Zarochentsev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Zarochentsev A page fault for a mmapped Lustre file at an offset larger than 2^63 causes the Lustre client to hang due to wrong page index calculations from signed loff_t. There is no need to do such calculations; instead, perform page offset sanity checks in ll_fault(). 
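The sanity check described above can be sketched in userspace as follows. The SKETCH_* constants and pgoff_is_sane() are illustrative names, and the values assume a 64-bit build with 4KiB pages where the maximum file size is the largest non-negative signed 64-bit offset; the kernel's own MAX_LFS_FILESIZE and PAGE_SHIFT definitions are authoritative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration: 4KiB pages, signed 64-bit offsets. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_MAX_FILESIZE	INT64_MAX

/*
 * Hypothetical helper: returns true when the page index still maps to a
 * byte offset that fits in a non-negative signed 64-bit value.
 */
static bool pgoff_is_sane(uint64_t pgoff)
{
	return pgoff <= ((uint64_t)SKETCH_MAX_FILESIZE >> SKETCH_PAGE_SHIFT);
}
```

Comparing page indices directly avoids converting the index back into a byte offset, which is where the sign problem would appear: a page index at or above 2^63 >> 12 would become a negative loff_t after shifting.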
Cray-bug-id: LUS-1392 WC-bug-id: https://jira.whamcloud.com/browse/LU-8299 Lustre-commit: ada3b33b52cd ("LU-8299 llite: ll_fault should fail for insane file offsets") Signed-off-by: Alexander Zarochentsev Reviewed-on: https://review.whamcloud.com/34242 Reviewed-by: Andrew Perepechko Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/llite/llite_mmap.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 14080b6..236d1d2 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -373,6 +373,9 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), LPROC_LL_FAULT, 1); + /* make sure offset is not a negative number */ + if (vmf->pgoff > (MAX_LFS_FILESIZE >> PAGE_SHIFT)) + return VM_FAULT_SIGBUS; restart: result = __ll_fault(vmf->vma, vmf); if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) { From patchwork Thu Feb 27 21:11:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11409995 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CF56C14E3 for ; Thu, 27 Feb 2020 21:27:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B7C05246A0 for ; Thu, 27 Feb 2020 21:27:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B7C05246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8CD3D349160; Thu, 27 Feb 2020 13:24:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6531121FCB4 for ; Thu, 27 Feb 2020 13:19:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D5FE32C4B; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D473D46C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:25 -0500 Message-Id: <1582838290-17243-218-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 217/622] lustre: ptlrpc: reset generation for old requests X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev All requests generated while the import is changing from FULL to IDLE need to be moved to the new generation. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11951 Lustre-commit: 42d8cb04637b ("LU-11951 ptlrpc: reset generation for old requests") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/34221 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/ptlrpc/import.c | 20 +++++++++++++++++++- 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index d9a0395..5e5cf3a 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -263,6 +263,7 @@ #define OBD_FAIL_OST_DQACQ_NET 0x230 #define OBD_FAIL_OST_STATFS_EINPROGRESS 0x231 #define OBD_FAIL_OST_SET_INFO_NET 0x232 +#define OBD_FAIL_OST_DISCONNECT_DELAY 0x245 #define OBD_FAIL_LDLM 0x300 #define OBD_FAIL_LDLM_NAMESPACE_NEW 0x301 diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index df6c459..34a2cb0 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1593,6 +1593,23 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) } EXPORT_SYMBOL(ptlrpc_disconnect_import); +static void ptlrpc_reset_reqs_generation(struct obd_import *imp) +{ + struct ptlrpc_request *old, *tmp; + + /* tag all resendable requests generated before disconnection + * notice this code is part of disconnect-at-idle path only + */ + list_for_each_entry_safe(old, tmp, &imp->imp_delayed_list, + rq_list) { + spin_lock(&old->rq_lock); + if (old->rq_import_generation == imp->imp_generation - 1 && + !old->rq_no_resend) + old->rq_import_generation = imp->imp_generation; + spin_unlock(&old->rq_lock); + } +} + static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, struct ptlrpc_request *req, void *args, int rc) @@ -1600,7 +1617,7 @@ static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, struct obd_import *imp = req->rq_import; int connect = 0; - 
DEBUG_REQ(D_HA, req, "inflight=%d, refcount=%d: rc = %d\n", + DEBUG_REQ(D_HA, req, "inflight=%d, refcount=%d: rc = %d ", atomic_read(&imp->imp_inflight), atomic_read(&imp->imp_refcount), rc); @@ -1620,6 +1637,7 @@ static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, imp->imp_generation++; imp->imp_initiated_at = imp->imp_generation; IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_NEW); + ptlrpc_reset_reqs_generation(imp); connect = 1; } } From patchwork Thu Feb 27 21:11:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410049 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 11C75138D for ; Thu, 27 Feb 2020 21:28:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EE7D5246A1 for ; Thu, 27 Feb 2020 21:28:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE7D5246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28A3F348D72; Thu, 27 Feb 2020 13:25:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A5D3921FCB4 for ; Thu, 27 Feb 2020 13:19:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with 
ESMTP id D8EDB2C4C; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D7522468; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:11:26 -0500
Message-Id: <1582838290-17243-219-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 218/622] lustre: osc: check if opg is in lru list without locking
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Li Dongyang , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Li Dongyang

osc_lru_use() is called for every page queued for I/O. As a fast path,
check whether the osc_page is in the LRU list without taking
cl_lru_list_lock, and return immediately if it is not. The check must
still be repeated after taking the lock, because another thread could
remove the page from the LRU list in the meantime.
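The fast path described above is a check, lock, re-check pattern. The following is a minimal userspace sketch of that pattern, not the Lustre code: it assumes a pthread mutex in place of cl_lru_list_lock and a hand-rolled list in place of the kernel's list_head, and the names lru_cache, lru_item, lru_add and lru_use are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

static void list_node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* A node that points at itself is not on any list. */
static bool list_node_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_node_init(n);
}

/* Hypothetical stand-ins for client_obd and osc_page. */
struct lru_cache {
	pthread_mutex_t lock;	/* plays the role of cl_lru_list_lock */
	struct list_node head;
};

struct lru_item {
	struct list_node link;	/* plays the role of ops_lru */
};

static void lru_cache_init(struct lru_cache *c)
{
	pthread_mutex_init(&c->lock, NULL);
	list_node_init(&c->head);
}

static void lru_add(struct lru_cache *c, struct lru_item *it)
{
	pthread_mutex_lock(&c->lock);
	list_node_add_tail(&it->link, &c->head);
	pthread_mutex_unlock(&c->lock);
}

/* Returns true if this call took the item off the LRU list. */
static bool lru_use(struct lru_cache *c, struct lru_item *it)
{
	bool removed = false;

	/*
	 * Fast path: peek without the lock and leave if the item is not
	 * on the list. Portable multi-threaded C would need an atomic
	 * load here; this sketch keeps the plain read for brevity.
	 */
	if (list_node_empty(&it->link))
		return false;

	pthread_mutex_lock(&c->lock);
	/* Re-check under the lock: another thread may have removed it. */
	if (!list_node_empty(&it->link)) {
		list_node_del_init(&it->link);
		removed = true;
	}
	pthread_mutex_unlock(&c->lock);

	return removed;
}
```

The unlocked test only ever skips work; the decision to unlink is still made under the lock, so a stale peek costs at most one unnecessary lock acquisition.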
WC-bug-id: https://jira.whamcloud.com/browse/LU-11775
Lustre-commit: b3af0798682b ("LU-11775 osc: check if opg is in lru list without locking")
Signed-off-by: Li Dongyang
Reviewed-on: https://review.whamcloud.com/33860
Reviewed-by: Patrick Farrell
Reviewed-by: Alexey Lyashkov
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/osc/osc_page.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c
index 4dc6c18..7382e0d 100644
--- a/fs/lustre/osc/osc_page.c
+++ b/fs/lustre/osc/osc_page.c
@@ -494,6 +494,9 @@ static void osc_lru_use(struct client_obd *cli, struct osc_page *opg)
 	 * ops_lru should be empty
 	 */
 	if (opg->ops_in_lru) {
+		if (list_empty(&opg->ops_lru))
+			return;
+
 		spin_lock(&cli->cl_lru_list_lock);
 		if (!list_empty(&opg->ops_lru)) {
 			__osc_lru_del(cli, opg);

From patchwork Thu Feb 27 21:11:27 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409999
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 22E971580 for ; Thu, 27 Feb 2020 21:27:33 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0BCCE246A0 for ; Thu, 27 Feb 2020 21:27:33 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0BCCE246A0
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2D56A349187; Thu, 27 Feb 2020 13:24:14 -0800 (PST)
X-Original-To:
lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E6D6D21FC0A for ; Thu, 27 Feb 2020 13:19:24 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DBA7F2C4D; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DA1A346D; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:11:27 -0500
Message-Id: <1582838290-17243-220-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 219/622] lnet: use right rtr address
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alexey Lyashkov , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Alexey Lyashkov

Use the router that sent the message (msg->msg_from) when sending the
response, to avoid a credit distribution problem. The sending router
is now the preferred router.
Cray-bug-id: LUS-6490
WC-bug-id: https://jira.whamcloud.com/browse/LU-11413
Lustre-commit: 3f4520608130 ("LU-11413 lnet: use right rtr address")
Signed-off-by: Alexey Lyashkov
Reviewed-on: https://review.whamcloud.com/34031
Reviewed-by: Chris Horn
Reviewed-by: Olaf Weber
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/lnet/lib-move.c | 2 +-
 net/lnet/lnet/lib-msg.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index f5548eb..468de06 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3558,7 +3558,7 @@ void lnet_monitor_thr_stop(void)
 	lnet_ni_recv(ni, msg->msg_private, NULL, 0, 0, 0, 0);
 	msg->msg_receiving = 0;

-	rc = lnet_send(ni->ni_nid, msg, LNET_NID_ANY);
+	rc = lnet_send(ni->ni_nid, msg, msg->msg_from);
 	if (rc < 0) {
 		/* didn't get as far as lnet_ni_send() */
 		CERROR("%s: Unable to send REPLY for GET from %s: %d\n",
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index af0675e..0738bf7 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -401,7 +401,7 @@
 		 * NB: we probably want to use NID of msg::msg_from as 3rd
 		 * parameter (router NID) if it's routed message
 		 */
-		rc = lnet_send(msg->msg_ev.target.nid, msg, LNET_NID_ANY);
+		rc = lnet_send(msg->msg_ev.target.nid, msg, msg->msg_from);

 		lnet_net_lock(cpt);
 		/*

From patchwork Thu Feb 27 21:11:28 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410037
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD323138D for ; Thu, 27 Feb 2020 21:28:29 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org
(Postfix) with ESMTPS id A55E9246A0 for ; Thu, 27 Feb 2020 21:28:29 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A55E9246A0
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4CFFB201353; Thu, 27 Feb 2020 13:24:52 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 356B621FC0A for ; Thu, 27 Feb 2020 13:19:25 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DF0922C4E; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DCE6946F; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:11:28 -0500
Message-Id: <1582838290-17243-221-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 220/622] lnet: use right address for routing message
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alexey Lyashkov , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Alexey Lyashkov

msg_initiator is the real sender's address, so use it as the hash
source for better distribution across CPTs on the server side.

Cray-bug-id: LUS-6841
WC-bug-id: https://jira.whamcloud.com/browse/LU-11413
Lustre-commit: ad263e5d6e93 ("LU-11413 lnet: use right address for routing message")
Signed-off-by: Alexey Lyashkov
Reviewed-on: https://review.whamcloud.com/34032
Reviewed-by: Chris Horn
Reviewed-by: Olaf Weber
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/lnet/lib-move.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 468de06..185c31a 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3463,7 +3463,7 @@ void lnet_monitor_thr_stop(void)
 	info.mi_rlength = hdr->payload_length;
 	info.mi_roffset = hdr->msg.put.offset;
 	info.mi_mbits = hdr->msg.put.match_bits;
-	info.mi_cpt = lnet_cpt_of_nid(msg->msg_rxpeer->lpni_nid, ni);
+	info.mi_cpt = lnet_cpt_of_nid(msg->msg_initiator, ni);

 	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
 	ready_delay = msg->msg_rx_ready_delay;
@@ -3527,7 +3527,7 @@ void lnet_monitor_thr_stop(void)
 	info.mi_rlength = hdr->msg.get.sink_length;
 	info.mi_roffset = hdr->msg.get.src_offset;
 	info.mi_mbits = hdr->msg.get.match_bits;
-	info.mi_cpt = lnet_cpt_of_nid(msg->msg_rxpeer->lpni_nid, ni);
+	info.mi_cpt = lnet_cpt_of_nid(msg->msg_initiator, ni);

 	rc = lnet_ptl_match_md(&info, msg);
 	if (rc == LNET_MATCHMD_DROP) {

From patchwork Thu Feb 27 21:11:29 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410523
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3A87A92A for ; Thu, 27 Feb 2020 21:40:28 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 23BD9246A1 for ; Thu, 27 Feb 2020 21:40:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 23BD9246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F8CA34A797; Thu, 27 Feb 2020 13:32:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 76F7421FCC5 for ; Thu, 27 Feb 2020 13:19:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E15F92C4F; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DFB6746A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:29 -0500 Message-Id: <1582838290-17243-222-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 221/622] lustre: lov: avoid signed vs. 
unsigned comparison X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger In the expansion of do_div64() GCC complains about pointer comparison because loff_t is not a u64 variable as it should be. lov_do_div64() also has signed vs. unsigned comparisons due to a signed loff_t. Change lov_do_div() to use a 64-bit variable for do_div() instead of loff_t to avoid these warnings. Change OST_MAXREQSIZE and friends to be consistently unsigned values to avoid compiler warnings. WC-bug-id: https://jira.whamcloud.com/browse/LU-11830 Lustre-commit: 632b3591b6ea ("LU-11830 lov: avoid signed vs. unsigned comparison") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33921 Reviewed-by: Jian Yu Reviewed-by: Alex Zhuravlev Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 15 ++++++++------- fs/lustre/lov/lov_internal.h | 15 +++++++++------ 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 36de665..8d71559 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -281,21 +281,22 @@ * - OST_IO_MAXREQSIZE must be at least 1 page of cookies plus some spillover * - Must be a multiple of 1024 */ -#define _OST_MAXREQSIZE_BASE (sizeof(struct lustre_msg) + \ +#define _OST_MAXREQSIZE_BASE ((unsigned long)(sizeof(struct lustre_msg) + \ sizeof(struct ptlrpc_body) + \ sizeof(struct obdo) + \ sizeof(struct obd_ioobj) + \ - sizeof(struct niobuf_remote)) -#define _OST_MAXREQSIZE_SUM (_OST_MAXREQSIZE_BASE + \ + sizeof(struct niobuf_remote))) +#define _OST_MAXREQSIZE_SUM ((unsigned 
long)(_OST_MAXREQSIZE_BASE + \ sizeof(struct niobuf_remote) * \ - (DT_MAX_BRW_PAGES - 1)) + (DT_MAX_BRW_PAGES - 1))) /** * FIEMAP request can be 4K+ for now */ -#define OST_MAXREQSIZE (16 * 1024) -#define OST_IO_MAXREQSIZE max_t(int, OST_MAXREQSIZE, \ - (((_OST_MAXREQSIZE_SUM - 1) | (1024 - 1)) + 1)) +#define OST_MAXREQSIZE (16UL * 1024UL) +#define OST_IO_MAXREQSIZE max(OST_MAXREQSIZE, \ + ((_OST_MAXREQSIZE_SUM - 1) | \ + (1024 - 1)) + 1) /* Safe estimate of free space in standard RPC, provides upper limit for # of * bytes of i/o to pack in RPC (skipping bulk transfer). diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h index 376ac52..36586b3 100644 --- a/fs/lustre/lov/lov_internal.h +++ b/fs/lustre/lov/lov_internal.h @@ -186,19 +186,22 @@ struct lsm_operations { }) #elif BITS_PER_LONG == 32 # define lov_do_div64(n, base) ({ \ + u64 __num = (n); \ u64 __rem; \ if ((sizeof(base) > 4) && (((base) & 0xffffffff00000000ULL) != 0)) { \ int __remainder; \ - LASSERTF(!((base) & (LOV_MIN_STRIPE_SIZE - 1)), "64 bit lov " \ - "division %llu / %llu\n", (n), (u64)(base)); \ - __remainder = (n) & (LOV_MIN_STRIPE_SIZE - 1); \ - (n) >>= LOV_MIN_STRIPE_BITS; \ - __rem = do_div(n, (base) >> LOV_MIN_STRIPE_BITS); \ + LASSERTF(!((base) & (LOV_MIN_STRIPE_SIZE - 1)), \ + "64 bit lov division %llu / %llu\n", \ + __num, (u64)(base)); \ + __remainder = __num & (LOV_MIN_STRIPE_SIZE - 1); \ + __num >>= LOV_MIN_STRIPE_BITS; \ + __rem = do_div(__num, (base) >> LOV_MIN_STRIPE_BITS); \ __rem <<= LOV_MIN_STRIPE_BITS; \ __rem += __remainder; \ } else { \ - __rem = do_div(n, base); \ + __rem = do_div(__num, base); \ } \ + (n) = __num; \ __rem; \ }) #endif From patchwork Thu Feb 27 21:11:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410277 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7BFD392A for ; Thu, 27 Feb 2020 21:33:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 64A4F24677 for ; Thu, 27 Feb 2020 21:33:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 64A4F24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2DC6C21F8EF; Thu, 27 Feb 2020 13:28:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CEAAE21FB1B for ; Thu, 27 Feb 2020 13:19:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E43702C50; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E29AD46C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:30 -0500 Message-Id: <1582838290-17243-223-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 222/622] lustre: obd: use ldo_process_config for mdc and osc layer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software 
development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Both the mdc and osc layer use the lu_device infrastructure but we don't use ldo_process_config() which is preferred over the currently used obd_process_config() handling. Migrate to the lu_device ldo_process_config() for both mdc and osc layer. WC-bug-id: https://jira.whamcloud.com/browse/LU-9855 Lustre-commit: d12959c69fd4 ("LU-9855 obd: use ldo_process_config for mdc and osc layer") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/34106 Reviewed-by: Ben Evans Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_dev.c | 11 +++++++---- fs/lustre/mdc/mdc_internal.h | 1 - fs/lustre/mdc/mdc_request.c | 11 ----------- fs/lustre/osc/osc_dev.c | 11 +++++++---- fs/lustre/osc/osc_request.c | 14 -------------- 5 files changed, 14 insertions(+), 34 deletions(-) diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 306b917..f23f6cf 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -35,6 +35,7 @@ #include #include +#include #include "mdc_internal.h" @@ -1422,15 +1423,17 @@ struct lu_object *mdc_object_alloc(const struct lu_env *env, return obj; } -static int mdc_cl_process_config(const struct lu_env *env, - struct lu_device *d, struct lustre_cfg *cfg) +static int mdc_process_config(const struct lu_env *env, struct lu_device *d, + struct lustre_cfg *cfg) { - return mdc_process_config(d->ld_obd, 0, cfg); + size_t count = class_modify_config(cfg, PARAM_MDC, + &d->ld_obd->obd_kset.kobj); + return count > 0 ? 
0 : count; } const struct lu_device_operations mdc_lu_ops = { .ldo_object_alloc = mdc_object_alloc, - .ldo_process_config = mdc_cl_process_config, + .ldo_process_config = mdc_process_config, .ldo_recovery_complete = NULL, }; diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index 7a6ec81..a5fe164 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -93,7 +93,6 @@ int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid, int mdc_fid_alloc(const struct lu_env *env, struct obd_export *exp, struct lu_fid *fid, struct md_op_data *op_data); int mdc_setup(struct obd_device *obd, struct lustre_cfg *cfg); -int mdc_process_config(struct obd_device *obd, u32 len, void *buf); struct obd_client_handle; diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 4711288..c08a6ee 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -51,7 +51,6 @@ #include #include #include -#include #include #include #include @@ -2743,15 +2742,6 @@ static int mdc_cleanup(struct obd_device *obd) return osc_cleanup_common(obd); } -int mdc_process_config(struct obd_device *obd, u32 len, void *buf) -{ - struct lustre_cfg *lcfg = buf; - size_t count = class_modify_config(lcfg, PARAM_MDC, - &obd->obd_kset.kobj); - - return count > 0 ? 
0 : count; -} - static const struct obd_ops mdc_obd_ops = { .owner = THIS_MODULE, .setup = mdc_setup, @@ -2770,7 +2760,6 @@ int mdc_process_config(struct obd_device *obd, u32 len, void *buf) .fid_alloc = mdc_fid_alloc, .import_event = mdc_import_event, .get_info = mdc_get_info, - .process_config = mdc_process_config, .get_uuid = mdc_get_uuid, .quotactl = mdc_quotactl, }; diff --git a/fs/lustre/osc/osc_dev.c b/fs/lustre/osc/osc_dev.c index b8bf75a..6469973 100644 --- a/fs/lustre/osc/osc_dev.c +++ b/fs/lustre/osc/osc_dev.c @@ -40,6 +40,7 @@ /* class_name2obd() */ #include #include +#include #include "osc_internal.h" @@ -161,15 +162,17 @@ struct lu_context_key osc_session_key = { /* type constructor/destructor: osc_type_{init,fini,start,stop}(). */ LU_TYPE_INIT_FINI(osc, &osc_key, &osc_session_key); -static int osc_cl_process_config(const struct lu_env *env, - struct lu_device *d, struct lustre_cfg *cfg) +static int osc_process_config(const struct lu_env *env, struct lu_device *d, + struct lustre_cfg *cfg) { - return osc_process_config_base(d->ld_obd, cfg); + ssize_t count = class_modify_config(cfg, PARAM_OSC, + &d->ld_obd->obd_kset.kobj); + return count > 0 ? 0 : count; } static const struct lu_device_operations osc_lu_ops = { .ldo_object_alloc = osc_object_alloc, - .ldo_process_config = osc_cl_process_config, + .ldo_process_config = osc_process_config, .ldo_recovery_complete = NULL }; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 6ce22c3..c55d5a9 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -47,7 +47,6 @@ #include #include #include -#include #include #include #include @@ -3348,18 +3347,6 @@ int osc_cleanup_common(struct obd_device *obd) } EXPORT_SYMBOL(osc_cleanup_common); -int osc_process_config_base(struct obd_device *obd, struct lustre_cfg *lcfg) -{ - ssize_t count = class_modify_config(lcfg, PARAM_OSC, - &obd->obd_kset.kobj); - return count > 0 ? 
0 : count; -} - -static int osc_process_config(struct obd_device *obd, u32 len, void *buf) -{ - return osc_process_config_base(obd, buf); -} - static const struct obd_ops osc_obd_ops = { .owner = THIS_MODULE, .setup = osc_setup, @@ -3379,7 +3366,6 @@ static int osc_process_config(struct obd_device *obd, u32 len, void *buf) .iocontrol = osc_iocontrol, .set_info_async = osc_set_info_async, .import_event = osc_import_event, - .process_config = osc_process_config, .quotactl = osc_quotactl, }; From patchwork Thu Feb 27 21:11:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410711 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B662924 for ; Thu, 27 Feb 2020 21:45:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4312024690 for ; Thu, 27 Feb 2020 21:45:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4312024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BA7E134AF36; Thu, 27 Feb 2020 13:35:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 338A721FB1B for ; Thu, 27 Feb 2020 13:19:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E715B2C51; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E5800468; Thu, 27 Feb 2020 16:18:15 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:11:31 -0500
Message-Id: <1582838290-17243-224-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 223/622] lnet: check for asymmetrical route messages
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Sebastien Buisson

Asymmetrical routes can be an issue when debugging the network, and
allowing them also opens the door to attacks in which hostile clients
inject data into the servers.

To prevent asymmetrical routes, add a new lnet kernel module option
named 'lnet_drop_asym_route'. When it is set to non-zero, lnet_parse()
checks whether a message received from a remote peer came through a
router that this node would normally use to reach that peer. If not,
the message is an asymmetrical route message and is dropped.

The check for asymmetrical routes can also be switched on or off with
the command 'lnetctl set drop_asym_route 0|1'. The parameter is also
exported and imported in YAML.
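The check described above can be sketched in plain C as follows. This is an illustration under stated assumptions rather than the LNet implementation: the NID layout (network number in the upper 32 bits), the flattened struct route, and the names make_nid, nid_net and is_asym_route are all hypothetical, and the route array stands for the routes this node holds towards the source's network.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed NID layout for this sketch: network number in the upper
 * 32 bits, address within the network in the lower 32 bits.
 */
typedef uint64_t nid_t;

static nid_t make_nid(uint32_t net, uint32_t addr)
{
	return ((nid_t)net << 32) | addr;
}

static uint32_t nid_net(nid_t nid)
{
	return (uint32_t)(nid >> 32);
}

/* Hypothetical, flattened view of one route to a remote network. */
struct route {
	nid_t gateway;
};

/*
 * Returns true when a message from src_nid, delivered by from_nid,
 * should be dropped as asymmetrically routed. local_net is the
 * network of the receiving interface.
 */
static bool is_asym_route(nid_t src_nid, nid_t from_nid, uint32_t local_net,
			  const struct route *routes, size_t nroutes)
{
	size_t i;

	/* Same network: the message was not routed at all. */
	if (nid_net(src_nid) == nid_net(from_nid))
		return false;

	/* No routes known for that network: nothing to compare against. */
	if (nroutes == 0)
		return false;

	for (i = 0; i < nroutes; i++) {
		/* Only gateways on the receiving network are candidates. */
		if (nid_net(routes[i].gateway) != local_net)
			continue;
		if (routes[i].gateway == from_nid)
			return false;
	}

	/* We would not use from_nid to reach src_nid: asymmetrical. */
	return true;
}
```

As in the patch, a source network with no known routes is accepted rather than dropped; only a delivering router that is absent from an existing route list counts as asymmetrical.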
WC-bug-id: https://jira.whamcloud.com/browse/LU-11894 Lustre-commit: 4932febc1213 ("LU-11894 lnet: check for asymmetrical route messages") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/34119 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + net/lnet/lnet/api-ni.c | 44 ++++++++++++++++++++++++++++++++++++++++ net/lnet/lnet/lib-move.c | 47 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 92 insertions(+) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index d09fb4c..a6e64f6 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -507,6 +507,7 @@ struct lnet_ni * extern unsigned int lnet_health_sensitivity; extern unsigned int lnet_recovery_interval; extern unsigned int lnet_peer_discovery_disabled; +extern unsigned int lnet_drop_asym_route; extern int portal_rotor; int lnet_lib_init(void); diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 3ee10da..e5f5c6c 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -126,6 +126,20 @@ static int recovery_interval_set(const char *val, MODULE_PARM_DESC(lnet_peer_discovery_disabled, "Set to 1 to disable peer discovery on this node."); +unsigned int lnet_drop_asym_route; +static int drop_asym_route_set(const char *val, const struct kernel_param *kp); + +static struct kernel_param_ops param_ops_drop_asym_route = { + .set = drop_asym_route_set, + .get = param_get_int, +}; + +#define param_check_drop_asym_route(name, p) \ + __param_check(name, p, int) +module_param(lnet_drop_asym_route, drop_asym_route, 0644); +MODULE_PARM_DESC(lnet_drop_asym_route, + "Set to 1 to drop asymmetrical route messages."); + unsigned int lnet_transaction_timeout = 50; static int transaction_to_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_transaction_timeout = { @@ -292,6 +306,36 @@ static int 
lnet_discover(struct lnet_process_id id, u32 force, } static int +drop_asym_route_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned int *drop_asym_route = (unsigned int *)kp->arg; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_drop_asym_route'\n"); + return rc; + } + + /* The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. + */ + mutex_lock(&the_lnet.ln_api_mutex); + + if (value == *drop_asym_route) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + *drop_asym_route = value; + + mutex_unlock(&the_lnet.ln_api_mutex); + + return 0; +} + +static int transaction_to_set(const char *val, const struct kernel_param *kp) { unsigned int *transaction_to = (unsigned int *)kp->arg; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 185c31a..809d2b6 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3959,6 +3959,53 @@ void lnet_monitor_thr_stop(void) goto drop; } + if (lnet_drop_asym_route && for_me && + LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid)) { + struct lnet_net *net; + struct lnet_remotenet *rnet; + bool found = true; + + /* we are dealing with a routed message, + * so see if route to reach src_nid goes through from_nid + */ + lnet_net_lock(cpt); + net = lnet_get_net_locked(LNET_NIDNET(ni->ni_nid)); + if (!net) { + lnet_net_unlock(cpt); + CERROR("net %s not found\n", + libcfs_net2str(LNET_NIDNET(ni->ni_nid))); + return -EPROTO; + } + + rnet = lnet_find_rnet_locked(LNET_NIDNET(src_nid)); + if (rnet) { + struct lnet_peer_ni *gw = NULL; + struct lnet_route *route; + + list_for_each_entry(route, &rnet->lrn_routes, lr_list) { + found = false; + gw = route->lr_gateway; + if (gw->lpni_net != net) + continue; + if (gw->lpni_nid == from_nid) { + found = true; + break; + } + } + } + lnet_net_unlock(cpt); + if (!found) { + /* we would not use from_nid to route a message to + * 
src_nid + * => asymmetric routing detected but forbidden + */ + CERROR("%s, src %s: Dropping asymmetrical route %s\n", + libcfs_nid2str(from_nid), + libcfs_nid2str(src_nid), lnet_msgtyp2str(type)); + goto drop; + } + } + msg = kzalloc(sizeof(*msg), GFP_NOFS); if (!msg) { CERROR("%s, src %s: Dropping %s (out of memory)\n", From patchwork Thu Feb 27 21:11:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410053 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E854138D for ; Thu, 27 Feb 2020 21:29:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 772F8246A1 for ; Thu, 27 Feb 2020 21:29:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 772F8246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B04A348C21; Thu, 27 Feb 2020 13:25:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88A9E21FCD2 for ; Thu, 27 Feb 2020 13:19:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E9AC92C52; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E859A46D; 
Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:32 -0500 Message-Id: <1582838290-17243-225-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 224/622] lustre: llite: Lock inode on tiny write if setuid/setgid set X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler During a write, the setuid/setgid bits must be reset if they are enabled and the user does not have the correct permissions. Setting any file attributes, including setuid and setgid, requires the inode to be locked. Writes became lockless with the introduction of LU-1669. Locking the inode in the setuid/setgid case was added to vvp_io_write_start() as a special case. The inode locking was not included when support for tiny writes was added with LU-9409. This mod adds the necessary inode lock/unlock calls to ll_do_tiny_write(). If the inode is not locked when setuid/setgid are reset, the kernel will issue a one time warning and Lustre may hang trying to get the inode lock in ll_setattr_raw(). 
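The locking rule described above can be modelled in userspace. This is only a sketch under stated assumptions: struct toy_inode, toy_inode_lock(), toy_inode_unlock(), toy_generic_write() and toy_tiny_write() are hypothetical stand-ins, not kernel or Lustre APIs, and the nosec field merely imitates what IS_NOSEC() reports. The real kernel rules for when setgid is stripped are more detailed than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ISUID 04000u	/* same octal values as S_ISUID / S_ISGID */
#define TOY_ISGID 02000u

/* Hypothetical userspace stand-in for a kernel inode. */
struct toy_inode {
	unsigned int mode;	/* permission bits, may carry setuid/setgid */
	bool nosec;		/* imitates IS_NOSEC(): nothing left to strip */
	int lock_depth;		/* imitates the inode lock (0 = unlocked) */
	size_t size;
};

static void toy_inode_lock(struct toy_inode *inode)
{
	assert(inode->lock_depth == 0);
	inode->lock_depth++;
}

static void toy_inode_unlock(struct toy_inode *inode)
{
	assert(inode->lock_depth == 1);
	inode->lock_depth--;
}

/*
 * Imitates the generic write path: it must clear setuid/setgid, and
 * changing attributes is only legal with the inode lock held.
 * Returns -1 when the caller forgot the lock (the case this patch fixes).
 */
static long toy_generic_write(struct toy_inode *inode, size_t count)
{
	if (!inode->nosec) {
		if (inode->lock_depth == 0)
			return -1;
		inode->mode &= ~(TOY_ISUID | TOY_ISGID);
		inode->nosec = true;
	}
	inode->size += count;
	return (long)count;
}

/* Lockless fast path that takes the lock only in the rare setuid case. */
static long toy_tiny_write(struct toy_inode *inode, size_t count)
{
	bool lock_inode = !inode->nosec;	/* decide once, before writing */
	long result;

	if (lock_inode)
		toy_inode_lock(inode);
	result = toy_generic_write(inode, count);
	if (lock_inode)
		toy_inode_unlock(inode);
	return result;
}
```

The decision is sampled into lock_inode before the write because the write itself changes the condition; testing it again afterwards would skip the unlock.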
WC-bug-id: https://jira.whamcloud.com/browse/LU-11944 Lustre-commit: f39a552922ca ("LU-11944 llite: Lock inode on tiny write if setuid/setgid set") Signed-off-by: Ann Koehler Reviewed-on: https://review.whamcloud.com/34218 Reviewed-by: Patrick Farrell Reviewed-by: Ben Evans Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 6 ++++++ fs/lustre/llite/vvp_io.c | 6 +++--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 7078734..a73d11f 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1616,6 +1616,7 @@ static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter) ssize_t count = iov_iter_count(iter); struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); + bool lock_inode = !IS_NOSEC(inode); ssize_t result = 0; /* Restrict writes to single page and < PAGE_SIZE. See comment at top @@ -1625,8 +1626,13 @@ static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter) (iocb->ki_pos & (PAGE_SIZE-1)) + count > PAGE_SIZE) return 0; + if (unlikely(lock_inode)) + inode_lock(inode); result = __generic_file_write_iter(iocb, iter); + if (unlikely(lock_inode)) + inode_unlock(inode); + /* If the page is not already dirty, ll_tiny_write_begin returns * -ENODATA. We continue on to normal write. */ diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 85bb3e0..ad4b39e 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -1037,13 +1037,13 @@ static int vvp_io_write_start(const struct lu_env *env, * consistency, proper locking to protect against writes, * trucates, etc. is handled in the higher layers of lustre. 
*/ - bool lock_node = !IS_NOSEC(inode); + lock_inode = !IS_NOSEC(inode); - if (lock_node) + if (unlikely(lock_inode)) inode_lock(inode); result = __generic_file_write_iter(vio->vui_iocb, vio->vui_iter); - if (lock_node) + if (unlikely(lock_inode)) inode_unlock(inode); if (result > 0 || result == -EIOCBQUEUED) From patchwork Thu Feb 27 21:11:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410041 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 35F0E1580 for ; Thu, 27 Feb 2020 21:28:37 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1E320246A0 for ; Thu, 27 Feb 2020 21:28:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1E320246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F29721FC84; Thu, 27 Feb 2020 13:24:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C94AD21FCD5 for ; Thu, 27 Feb 2020 13:19:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EC80D2C53; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EB20E46F; Thu, 27 Feb 
2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:33 -0500 Message-Id: <1582838290-17243-226-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 225/622] lustre: llite: make sure name pack atomic X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong We access the dentry name directly and pass it down without holding @d_lock. This is racy and can trigger assertions: (mdc_lib.c:137:mdc_pack_name()) ASSERTION( lu_name_is_valid_2(buf, cpy_len) ) failed: Fix the problem by allocating memory and copying the name with @d_lock held. 
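The copy-under-lock approach can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch itself: struct toy_dentry, toy_dentry_init(), toy_lock(), toy_unlock() and toy_snapshot_name() are hypothetical names, and a C11 atomic_flag spinlock stands in for the kernel's d_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dentry whose name may change under a lock. */
struct toy_dentry {
	atomic_flag d_lock;	/* imitates the d_lock spinlock */
	const char *name;	/* treated as len bytes, not as a C string */
	size_t len;
};

static void toy_dentry_init(struct toy_dentry *de, const char *name,
			    size_t len)
{
	atomic_flag_clear(&de->d_lock);
	de->name = name;
	de->len = len;
}

static void toy_lock(struct toy_dentry *de)
{
	while (atomic_flag_test_and_set_explicit(&de->d_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void toy_unlock(struct toy_dentry *de)
{
	atomic_flag_clear_explicit(&de->d_lock, memory_order_release);
}

/*
 * Return a private heap copy of the name and store its length in *lenp.
 * The buffer is allocated before taking the lock, because allocation
 * may block and a spinlock must not be held across a blocking call.
 * Returns NULL if allocation fails; the caller frees the copy.
 */
static char *toy_snapshot_name(struct toy_dentry *de, size_t *lenp)
{
	for (;;) {
		size_t len = de->len;	/* unlocked read: only a size hint */
		char *copy = malloc(len ? len : 1);

		if (!copy)
			return NULL;

		toy_lock(de);
		if (len != de->len) {
			/* Renamed in the meantime: the hint is stale. */
			toy_unlock(de);
			free(copy);
			continue;
		}
		memcpy(copy, de->name, len);
		toy_unlock(de);

		*lenp = len;
		return copy;
	}
}
```

Validation of the name then runs on the private copy, which can no longer change underneath the caller. Every exit path after a successful snapshot has to free the copy.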
WC-bug-id: https://jira.whamcloud.com/browse/LU-12020 Lustre-Commit: f575b6551b2b ("LU-12020 llite: make sure name pack atomic") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/34330 Reviewed-by: Patrick Farrell Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a73d11f..4560ae0 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -502,7 +502,7 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, struct inode *inode = d_inode(de); struct ll_sb_info *sbi = ll_i2sbi(inode); struct dentry *parent = de->d_parent; - const char *name = NULL; + char *name = NULL; struct md_op_data *op_data; struct ptlrpc_request *req = NULL; int len = 0, rc; @@ -514,21 +514,41 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, * if server supports open-by-fid, or file name is invalid, don't pack * name in open request */ - if (!(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_OPEN_BY_FID) && - lu_name_is_valid_2(de->d_name.name, de->d_name.len)) { - name = de->d_name.name; + if (!(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_OPEN_BY_FID)) { +retry: len = de->d_name.len; + name = kmalloc(len, GFP_NOFS); + if (!name) + return -ENOMEM; + /* race here */ + spin_lock(&de->d_lock); + if (len != de->d_name.len) { + spin_unlock(&de->d_lock); + kfree(name); + goto retry; + } + memcpy(name, de->d_name.name, len); + spin_unlock(&de->d_lock); + + if (!lu_name_is_valid_2(name, len)) { + kfree(name); + name = NULL; + len = 0; + } } op_data = ll_prep_md_op_data(NULL, d_inode(parent), inode, name, len, O_RDWR, LUSTRE_OPC_ANY, NULL); - if (IS_ERR(op_data)) + if (IS_ERR(op_data)) { + kfree(name); return PTR_ERR(op_data); + } op_data->op_data = lmm; op_data->op_data_size = lmmsize; rc = md_intent_lock(sbi->ll_md_exp, 
op_data, itp, &req, &ll_md_blocking_ast, 0); + kfree(name); ll_finish_md_op_data(op_data); if (rc == -ESTALE) { /* reason for keep own exit path - don`t flood log From patchwork Thu Feb 27 21:11:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410045 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B7AA138D for ; Thu, 27 Feb 2020 21:28:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 840C3246A0 for ; Thu, 27 Feb 2020 21:28:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 840C3246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9423821F873; Thu, 27 Feb 2020 13:25:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1726821FCD8 for ; Thu, 27 Feb 2020 13:19:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F00A62C54; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EE0CE46A; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:34 -0500 Message-Id: 
<1582838290-17243-227-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 226/622] lustre: ptlrpc: handle proper import states for recovery X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong There are two problems: See following assertion: lod_add_device() lustre-OSTe42a-osc-MDT0000: can't set up pool, failed with -12 osp_disconnect() ASSERTION( imp != ((void *)0) ) failed: osp_disconnect() LBUG CPU: 1 PID: 10059 Comm: llog_process_th Problem is obd_disconnect() will cleanup @imp and set NULL. ->osp_obd_disconnect ->class_manual_cleanup ->class_process_config ->class_cleanup ->obd_precleanup ->osp_device_fini ->client_obd_cleanup While ldo_process_config() will try to access @imp again: ->ldo_process_config ->osp_shutdown ->osp_disconnect ->LASSERT(imp != NULL) Another problem is if we failed before obd_connect(). we will hang on with mount: ->ldo_process_config ->osp_shutdown ->osp_disconnect ->ptlrpc_disconnect_import ->rc = l_wait_event(imp->imp_recovery_waitq, !ptlrpc_import_in_recovery(imp), &lwi); Since connect is not called, imp state will stay LUSTRE_IMP_NEW. 
Fix this by checking the import state properly: the import is considered to be in recovery only if it is in one of the following states: LUSTRE_IMP_CONNECTING = 4, LUSTRE_IMP_REPLAY = 5, LUSTRE_IMP_REPLAY_LOCKS = 6, LUSTRE_IMP_REPLAY_WAIT = 7, LUSTRE_IMP_RECOVER = 8. WC-bug-id: https://jira.whamcloud.com/browse/LU-11243 Lustre-commit: f28353b3d810 ("LU-11243 lod: fix assertion and hang upon lod_add_device failure") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/32994 Reviewed-by: Andreas Dilger Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/recover.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ptlrpc/recover.c b/fs/lustre/ptlrpc/recover.c index ceab288..e26612d 100644 --- a/fs/lustre/ptlrpc/recover.c +++ b/fs/lustre/ptlrpc/recover.c @@ -367,9 +367,8 @@ int ptlrpc_import_in_recovery(struct obd_import *imp) int in_recovery = 1; spin_lock(&imp->imp_lock); - if (imp->imp_state == LUSTRE_IMP_FULL || - imp->imp_state == LUSTRE_IMP_CLOSED || - imp->imp_state == LUSTRE_IMP_DISCON || + if (imp->imp_state <= LUSTRE_IMP_DISCON || + imp->imp_state >= LUSTRE_IMP_FULL || imp->imp_obd->obd_no_recov) in_recovery = 0; spin_unlock(&imp->imp_lock); From patchwork Thu Feb 27 21:11:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410051 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 25CC81580 for ; Thu, 27 Feb 2020 21:28:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0EE6B246A3 for ; Thu, 27 Feb 2020 21:28:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 0EE6B246A3 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6AAA8348D79; Thu, 27 Feb 2020 13:25:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5AFEE21FAE5 for ; Thu, 27 Feb 2020 13:19:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F271D2C55; Thu, 27 Feb 2020 16:18:15 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F0DC246C; Thu, 27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:35 -0500 Message-Id: <1582838290-17243-228-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 227/622] lustre: ldlm: don't convert wrong resource X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin During enqueue the returned lock may have different resource and local client lock replaces resource too. 
But there is a valid race between the BL AST and the reply from the server, so the BL AST may arrive first and find the client lock still attached to the old resource. In that case ldlm_handle_bl_callback() should proceed with a normal cancel and not use cancel_bits for lock conversion. WC-bug-id: https://jira.whamcloud.com/browse/LU-11836 Lustre-commit: 2bc71659db69 ("LU-11836 ldlm: don't convert wrong resource") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/34264 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lockd.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 6905ee5..2985e37 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -131,8 +131,14 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, * NOTE: ld can be NULL or can be not NULL but zeroed if * passed from ldlm_bl_thread_blwi(), check below used bits * in ld to make sure it is valid description. + * + * If server may replace lock resource keeping the same cookie, + * never use cancel bits from different resource, full cancel + * is to be used. 
*/ - if (ld && ld->l_policy_data.l_inodebits.bits) + if (ld && ld->l_policy_data.l_inodebits.bits && + ldlm_res_eq(&ld->l_resource.lr_name, + &lock->l_resource->lr_name)) lock->l_policy_data.l_inodebits.cancel_bits = ld->l_policy_data.l_inodebits.cancel_bits; /* if there is no valid ld and lock is cbpending already From patchwork Thu Feb 27 21:11:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410055 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DE1A617E0 for ; Thu, 27 Feb 2020 21:29:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C6E86246A1 for ; Thu, 27 Feb 2020 21:29:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C6E86246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 276A93493A9; Thu, 27 Feb 2020 13:25:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A1F1321FAE5 for ; Thu, 27 Feb 2020 13:19:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 00F0C2C56; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F3CAC468; Thu, 
27 Feb 2020 16:18:15 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:36 -0500 Message-Id: <1582838290-17243-229-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 228/622] lustre: llite: limit statfs ffree if less than OST ffree X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger If the OSTs report fewer total free objects than the MDTs, then use the free files count reported by the OSTs, since it represents the minimum number of files that can be created in the filesystem (creating more may be possible, but this depends on other factors). This has always been what ll_statfs_internal() reports, but the statfs aggregation via the MDT missed this step in lod_statfs(). Fix a minor defect in sanity test_418() that would let it loop forever until the test was killed due to timeout if the "df -i" and "lfs df -i" output did not converge. 
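The arithmetic behind the free-files limit can be shown with a small standalone sketch. struct toy_statfs and toy_limit_ffree() are hypothetical names chosen for illustration; only the clamping rule itself follows the description above, including the assumption that a filesystem reporting zero OST objects has no OSTs and is left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced view of the statfs counters involved. */
struct toy_statfs {
	uint64_t files;		/* total inodes/objects */
	uint64_t ffree;		/* free inodes/objects */
};

/*
 * If the OSTs have fewer free objects than the MDTs have free inodes,
 * report the OST value as the number of free files, and shrink the
 * total so that "files in use" (files - ffree) stays unchanged.
 */
static void toy_limit_ffree(struct toy_statfs *mdt,
			    const struct toy_statfs *ost)
{
	assert(mdt->ffree <= mdt->files);

	if (ost->files && ost->ffree < mdt->ffree) {
		mdt->files = (mdt->files - mdt->ffree) + ost->ffree;
		mdt->ffree = ost->ffree;
	}
}
```

For example, with 1000 MDT inodes of which 900 are free (100 in use) and only 50 free OST objects, the reported totals become 150 files with 50 free, so the in-use count is still 100.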
Fixes: 41a201a04c0f ("lustre: protocol: MDT as a statfs proxy") WC-bug-id: https://jira.whamcloud.com/browse/LU-11721 Lustre-commit: a829595add80 ("LU-11721 lod: limit statfs ffree if less than OST ffree") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/34167 Reviewed-by: Jian Yu Reviewed-by: Nikitas Angelinas Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 5 +++-- fs/lustre/llite/llite_lib.c | 22 +++++++++++----------- fs/lustre/lmv/lmv_obd.c | 4 ++-- 3 files changed, 16 insertions(+), 15 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 434bb79..6a4b6a5 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -898,8 +898,9 @@ static inline int obd_statfs_async(struct obd_export *exp, obd = exp->exp_obd; if (!obd->obd_type || !obd->obd_type->typ_dt_ops->statfs) { - CERROR("%s: no %s operation\n", obd->obd_name, __func__); - return -EOPNOTSUPP; + rc = -EOPNOTSUPP; + CERROR("%s: no statfs operation: rc = %d\n", obd->obd_name, rc); + return rc; } CDEBUG(D_SUPER, "%s: age %lld, max_age %lld\n", diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 84fc54d..4d41981a 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1723,17 +1723,15 @@ int ll_setattr(struct dentry *de, struct iattr *attr) int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, u32 flags) { - struct obd_statfs obd_osfs; + struct obd_statfs obd_osfs = { 0 }; time64_t max_age; int rc; max_age = ktime_get_seconds() - OBD_STATFS_CACHE_SECONDS; rc = obd_statfs(NULL, sbi->ll_md_exp, osfs, max_age, flags); - if (rc) { - CERROR("md_statfs fails: rc = %d\n", rc); + if (rc) return rc; - } osfs->os_type = LL_SUPER_MAGIC; @@ -1749,8 +1747,9 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, rc = obd_statfs(NULL, sbi->ll_dt_exp, &obd_osfs, max_age, flags); if (rc) { - CERROR("obd_statfs fails: 
rc = %d\n", rc); - return rc; + /* Possibly a filesystem with no OSTs. Report MDT totals. */ + rc = 0; + goto out; } CDEBUG(D_SUPER, "OSC blocks %llu/%llu objects %llu/%llu\n", @@ -1762,13 +1761,14 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, osfs->os_bfree = obd_osfs.os_bfree; osfs->os_bavail = obd_osfs.os_bavail; - /* If we don't have as many objects free on the OST as inodes - * on the MDS, we reduce the total number of inodes to - * compensate, so that the "inodes in use" number is correct. + /* If we have _some_ OSTs, but don't have as many free objects on the + * OSTs as inodes on the MDTs, reduce the reported number of inodes + * to compensate, so that the "inodes in use" number is correct. + * This should be kept in sync with lod_statfs() behaviour. */ - if (obd_osfs.os_ffree < osfs->os_ffree) { + if (obd_osfs.os_files && obd_osfs.os_ffree < osfs->os_ffree) { osfs->os_files = (osfs->os_files - osfs->os_ffree) + - obd_osfs.os_ffree; + obd_osfs.os_ffree; osfs->os_ffree = obd_osfs.os_ffree; } diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 0685925..6ad100c 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1402,8 +1402,8 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, rc = obd_statfs(env, lmv->tgts[idx]->ltd_exp, temp, max_age, flags); if (rc) { - CERROR("can't stat MDS #%d (%s), error %d\n", i, - lmv->tgts[idx]->ltd_exp->exp_obd->obd_name, + CERROR("%s: can't stat MDS #%d: rc = %d\n", + lmv->tgts[idx]->ltd_exp->exp_obd->obd_name, i, rc); goto out_free_temp; } From patchwork Thu Feb 27 21:11:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410059 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B1DE71580 for ; Thu, 27 Feb 2020 21:29:11 +0000 
(UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9A83C246A1 for ; Thu, 27 Feb 2020 21:29:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9A83C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B16B6348DCE; Thu, 27 Feb 2020 13:25:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 042F021FB96 for ; Thu, 27 Feb 2020 13:19:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 043942C57; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0295146D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:37 -0500 Message-Id: <1582838290-17243-230-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 229/622] lustre: mdc: prevent glimpse lock count grow X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin DOM locks matching tries to ignore locks with LDLM_FL_KMS_IGNORE flag during ldlm_lock_match() but checks that after ldlm_lock_match() call. Therefore if there is any lock with such flag in queue then all other locks after it are ignored and new lock is created causing big amount of locks on single resource in some access patterns. Patch extends lock_matches() function to check flags to exclude and adds ldlm_lock_match_with_skip() to use that when needed. WC-bug-id: https://jira.whamcloud.com/browse/LU-11964 Lustre-commit: b915221b6d0f ("LU-11964 mdc: prevent glimpse lock count grow") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/34261 Reviewed-by: Patrick Farrell Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 27 ++++++++++--- fs/lustre/include/obd_support.h | 1 + fs/lustre/ldlm/ldlm_lock.c | 90 ++++++++++++++++++----------------------- fs/lustre/mdc/mdc_dev.c | 28 +++++++++---- 4 files changed, 82 insertions(+), 64 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 1133e20..a95555e 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -1136,12 +1136,27 @@ void ldlm_lock_decref_and_cancel(const struct lustre_handle *lockh, void ldlm_lock_fail_match_locked(struct ldlm_lock *lock); void ldlm_lock_allow_match(struct ldlm_lock *lock); void ldlm_lock_allow_match_locked(struct ldlm_lock *lock); -enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, u64 flags, - const struct ldlm_res_id *res_id, - enum ldlm_type type, - union ldlm_policy_data *policy, - enum ldlm_mode mode, struct lustre_handle *lh, - int unref); +enum ldlm_mode ldlm_lock_match_with_skip(struct 
ldlm_namespace *ns, + u64 flags, u64 skip_flags, + const struct ldlm_res_id *res_id, + enum ldlm_type type, + union ldlm_policy_data *policy, + enum ldlm_mode mode, + struct lustre_handle *lh, + int unref); +static inline enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, + u64 flags, + const struct ldlm_res_id *res_id, + enum ldlm_type type, + union ldlm_policy_data *policy, + enum ldlm_mode mode, + struct lustre_handle *lh, + int unref) +{ + return ldlm_lock_match_with_skip(ns, flags, 0, res_id, type, policy, + mode, lh, unref); +} + enum ldlm_mode ldlm_revalidate_lock_handle(const struct lustre_handle *lockh, u64 *bits); void ldlm_lock_cancel(struct ldlm_lock *lock); diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 5e5cf3a..39547a0 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -391,6 +391,7 @@ #define OBD_FAIL_MDC_LIGHTWEIGHT 0x805 #define OBD_FAIL_MDC_CLOSE 0x806 #define OBD_FAIL_MDC_MERGE 0x807 +#define OBD_FAIL_MDC_GLIMPSE_DDOS 0x808 #define OBD_FAIL_MGS 0x900 #define OBD_FAIL_MGS_ALL_REQUEST_NET 0x901 diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 06690a6..cc96fbd 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -1053,6 +1053,7 @@ struct lock_match_data { enum ldlm_mode *lmd_mode; union ldlm_policy_data *lmd_policy; u64 lmd_flags; + u64 lmd_skip_flags; int lmd_unref; }; @@ -1133,6 +1134,10 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata) if (!equi(data->lmd_flags & LDLM_FL_LOCAL_ONLY, ldlm_is_local(lock))) return false; + /* Filter locks by skipping flags */ + if (data->lmd_skip_flags & lock->l_flags) + return false; + if (data->lmd_flags & LDLM_FL_TEST_LOCK) { LDLM_LOCK_GET(lock); ldlm_lock_touch_in_lru(lock); @@ -1267,12 +1272,13 @@ void ldlm_lock_allow_match(struct ldlm_lock *lock) * keep caller code unchanged), the context failure will be discovered by * caller sometime later. 
*/ -enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, u64 flags, - const struct ldlm_res_id *res_id, - enum ldlm_type type, - union ldlm_policy_data *policy, - enum ldlm_mode mode, - struct lustre_handle *lockh, int unref) +enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns, + u64 flags, u64 skip_flags, + const struct ldlm_res_id *res_id, + enum ldlm_type type, + union ldlm_policy_data *policy, + enum ldlm_mode mode, + struct lustre_handle *lockh, int unref) { struct lock_match_data data = { .lmd_old = NULL, @@ -1280,11 +1286,12 @@ enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, u64 flags, .lmd_mode = &mode, .lmd_policy = policy, .lmd_flags = flags, + .lmd_skip_flags = skip_flags, .lmd_unref = unref, }; struct ldlm_resource *res; struct ldlm_lock *lock; - int rc = 0; + int matched; if (!ns) { data.lmd_old = ldlm_handle2lock(lockh); @@ -1304,25 +1311,13 @@ enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, u64 flags, LDLM_RESOURCE_ADDREF(res); lock_res(res); - if (res->lr_type == LDLM_EXTENT) lock = search_itree(res, &data); else lock = search_queue(&res->lr_granted, &data); - if (lock) { - rc = 1; - goto out; - } - if (flags & LDLM_FL_BLOCK_GRANTED) { - rc = 0; - goto out; - } - lock = search_queue(&res->lr_waiting, &data); - if (lock) { - rc = 1; - goto out; - } -out: + if (!lock && !(flags & LDLM_FL_BLOCK_GRANTED)) + lock = search_queue(&res->lr_waiting, &data); + matched = lock ? 
mode : 0; unlock_res(res); LDLM_RESOURCE_DELREF(res); ldlm_resource_putref(res); @@ -1338,13 +1333,8 @@ enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, u64 flags, LDLM_FL_WAIT_NOREPROC, NULL); if (err) { - if (flags & LDLM_FL_TEST_LOCK) - LDLM_LOCK_RELEASE(lock); - else - ldlm_lock_decref_internal(lock, - mode); - rc = 0; - goto out2; + matched = 0; + goto out_fail_match; } } @@ -1352,49 +1342,49 @@ enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, u64 flags, wait_event_idle_timeout(lock->l_waitq, lock->l_flags & wait_flags, obd_timeout * HZ); + if (!ldlm_is_lvb_ready(lock)) { - if (flags & LDLM_FL_TEST_LOCK) - LDLM_LOCK_RELEASE(lock); - else - ldlm_lock_decref_internal(lock, mode); - rc = 0; + matched = 0; + goto out_fail_match; } } - } -out2: - if (rc) { - LDLM_DEBUG(lock, "matched (%llu %llu)", - (type == LDLM_PLAIN || type == LDLM_IBITS) ? - res_id->name[2] : policy->l_extent.start, - (type == LDLM_PLAIN || type == LDLM_IBITS) ? - res_id->name[3] : policy->l_extent.end); /* check user's security context */ if (lock->l_conn_export && sptlrpc_import_check_ctx(class_exp2cliimp(lock->l_conn_export))) { - if (!(flags & LDLM_FL_TEST_LOCK)) - ldlm_lock_decref_internal(lock, mode); - rc = 0; + matched = 0; + goto out_fail_match; } + LDLM_DEBUG(lock, "matched (%llu %llu)", + (type == LDLM_PLAIN || type == LDLM_IBITS) ? + res_id->name[2] : policy->l_extent.start, + (type == LDLM_PLAIN || type == LDLM_IBITS) ? + res_id->name[3] : policy->l_extent.end); + +out_fail_match: if (flags & LDLM_FL_TEST_LOCK) LDLM_LOCK_RELEASE(lock); + else if (!matched) + ldlm_lock_decref_internal(lock, mode); + } - } else if (!(flags & LDLM_FL_TEST_LOCK)) {/*less verbose for test-only*/ + /* less verbose for test-only */ + if (!matched && !(flags & LDLM_FL_TEST_LOCK)) { LDLM_DEBUG_NOLOCK("not matched ns %p type %u mode %u res %llu/%llu (%llu %llu)", ns, type, mode, res_id->name[0], res_id->name[1], (type == LDLM_PLAIN || type == LDLM_IBITS) ? 
- res_id->name[2] : policy->l_extent.start, + res_id->name[2] : policy->l_extent.start, (type == LDLM_PLAIN || type == LDLM_IBITS) ? - res_id->name[3] : policy->l_extent.end); + res_id->name[3] : policy->l_extent.end); } if (data.lmd_old) LDLM_LOCK_PUT(data.lmd_old); - return rc ? mode : 0; + return matched; } -EXPORT_SYMBOL(ldlm_lock_match); +EXPORT_SYMBOL(ldlm_lock_match_with_skip); enum ldlm_mode ldlm_revalidate_lock_handle(const struct lustre_handle *lockh, u64 *bits) diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index f23f6cf..cb173f4 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -676,10 +676,16 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, if (einfo->ei_mode == LCK_PR) mode |= LCK_PW; - if (!glimpse) + if (glimpse) match_flags |= LDLM_FL_BLOCK_GRANTED; - mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id, - einfo->ei_type, policy, mode, &lockh, 0); + /* DOM locking uses LDLM_FL_KMS_IGNORE to mark locks which have no valid + * LVB information, e.g. canceled locks or locks of just pruned object, + * such locks should be skipped. 
+ */ + mode = ldlm_lock_match_with_skip(obd->obd_namespace, match_flags, + LDLM_FL_KMS_IGNORE, res_id, + einfo->ei_type, policy, mode, + &lockh, 0); if (mode) { struct ldlm_lock *matched; @@ -687,8 +693,16 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, return ELDLM_OK; matched = ldlm_handle2lock(&lockh); - if (!matched || ldlm_is_kms_ignore(matched)) + /* this shouldn't happen but this check is kept to make + * related test fail if problem occurs + */ + if (unlikely(ldlm_is_kms_ignore(matched))) { + LDLM_ERROR(matched, "matched lock has KMS ignore flag"); goto no_match; + } + + if (OBD_FAIL_CHECK(OBD_FAIL_MDC_GLIMPSE_DDOS)) + ldlm_set_kms_ignore(matched); if (mdc_set_dom_lock_data(env, matched, einfo->ei_cbdata)) { *flags |= LDLM_FL_LVB_READY; @@ -1337,11 +1351,9 @@ static int mdc_attr_get(const struct lu_env *env, struct cl_object *obj, static int mdc_object_ast_clear(struct ldlm_lock *lock, void *data) { - if ((!lock->l_ast_data && !ldlm_is_kms_ignore(lock)) || - (lock->l_ast_data == data)) { + if (lock->l_ast_data == data) lock->l_ast_data = NULL; - ldlm_set_kms_ignore(lock); - } + ldlm_set_kms_ignore(lock); return LDLM_ITER_CONTINUE; } From patchwork Thu Feb 27 21:11:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410137 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6643F138D for ; Thu, 27 Feb 2020 21:31:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4EFD120801 for ; Thu, 27 Feb 2020 21:31:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4EFD120801 Authentication-Results: mail.kernel.org; 
dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 544C434902C; Thu, 27 Feb 2020 13:26:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5AD2121FB96 for ; Thu, 27 Feb 2020 13:19:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 07DAA2C58; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0636646F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:38 -0500 Message-Id: <1582838290-17243-231-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 230/622] lustre: dne: performance improvement for file creation X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jinshan Xiong Remove obsolete code that causes drastic performance degradation. This code was written before the PERM lock was introduced; it requests an UPDATE lock during path walk for a remote directory, and that lock is then cancelled by the later file creation. 
Test results before and after this patch is applied: Test case: rm -rf /mnt/lustre_purple/testdir lfs mkdir -i 0 /mnt/lustre_purple/testdir lfs mkdir -i 2 /mnt/lustre_purple/testdir/dir2 ./lustre-release/lustre/tests/createmany -o \ /mnt/lustre_purple/testdir/dir2/f 10000 Before the patch is applied: total: 10000 open/close in 12.82 seconds: 780.22 ops/second After the patch is applied: total: 10000 open/close in 4.89 seconds: 2044.75 ops/second WC-bug-id: https://jira.whamcloud.com/browse/LU-11999 Lustre-commit: bfbd062e6b17 ("LU-11999 dne: performance improvement for file creation") Signed-off-by: Jinshan Xiong Reviewed-on: https://review.whamcloud.com/34291 Reviewed-by: Lai Siyao Reviewed-by: Andrew Perepechko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_intent.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 3f51032..6933f7d 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -71,13 +71,6 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it, LASSERT((body->mbo_valid & OBD_MD_MDS)); /* - * Unfortunately, we have to lie to MDC/MDS to retrieve - * attributes llite needs and provideproper locking. - */ - if (it->it_op & IT_LOOKUP) - it->it_op = IT_GETATTR; - - /* * We got LOOKUP lock, but we really need attrs. 
*/ pmode = it->it_lock_mode; From patchwork Thu Feb 27 21:11:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410141 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58CE592A for ; Thu, 27 Feb 2020 21:31:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4188E24677 for ; Thu, 27 Feb 2020 21:31:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4188E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 394C63497BE; Thu, 27 Feb 2020 13:26:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9D47421FAB4 for ; Thu, 27 Feb 2020 13:19:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0AF5F2C59; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 094C846A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:39 -0500 Message-Id: <1582838290-17243-232-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 231/622] lustre: mdc: return DOM size on open resend X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin The DOM size should always be returned along with the DOM lock, but that was not the case on open resend. The fix is on the server side; on the client side only an mdc debug message is updated. WC-bug-id: https://jira.whamcloud.com/browse/LU-11835 Lustre-commit: bc3ef43d36b5 ("LU-11835 mdt: return DOM size on open resend") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/34044 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_locks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 9898b6a..55de559 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -742,7 +742,7 @@ static int mdc_finish_enqueue(struct obd_export *exp, body = req_capsule_server_get(pill, &RMF_MDT_BODY); if (!(body->mbo_valid & OBD_MD_DOM_SIZE)) { - LDLM_ERROR(lock, "%s: DoM lock without size.\n", + LDLM_ERROR(lock, "%s: DoM lock without size.", exp->exp_obd->obd_name); rc = -EPROTO; goto out_lock; From patchwork Thu Feb 27 21:11:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410145 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 541FF138D for ; Thu, 27 
Feb 2020 21:31:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3CF6F24677 for ; Thu, 27 Feb 2020 21:31:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3CF6F24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2B083349047; Thu, 27 Feb 2020 13:26:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DFCC321FAB4 for ; Thu, 27 Feb 2020 13:19:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0EE3A2C5A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0C3CE468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:40 -0500 Message-Id: <1582838290-17243-233-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 232/622] lustre: llite: optimizations for not granted lock processing X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Perepechko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andrew Perepechko This patch removes ll_md_blocking_ast() processing for not granted locks. The reason is ll_invalidate_negative_children() can slow down I/O significantly without a reason if there are thousands or millions of files in the directory cache. Seagate-bug-id: MRP-3409 WC-bug-id: https://jira.whamcloud.com/browse/LU-8047 Lustre-commit: 2c126c5a73ed ("LU-8047 llite: optimizations for not granted lock processing") Signed-off-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/19665 Reviewed-by: Mike Pershin Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 5 +++++ fs/lustre/ldlm/ldlm_extent.c | 2 +- fs/lustre/ldlm/ldlm_internal.h | 3 +-- fs/lustre/ldlm/ldlm_lock.c | 6 +++--- fs/lustre/ldlm/ldlm_lockd.c | 4 ++-- fs/lustre/ldlm/ldlm_request.c | 7 +++---- fs/lustre/llite/namei.c | 4 ++++ fs/lustre/osc/osc_lock.c | 8 ++++---- fs/lustre/osc/osc_request.c | 2 +- 9 files changed, 24 insertions(+), 17 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index a95555e..355049f 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -876,6 +876,11 @@ struct ldlm_resource { struct lu_ref lr_reference; }; +static inline int ldlm_is_granted(struct ldlm_lock *lock) +{ + return lock->l_req_mode == lock->l_granted_mode; +} + static inline bool ldlm_has_layout(struct ldlm_lock *lock) { return lock->l_resource->lr_type == LDLM_IBITS && diff --git a/fs/lustre/ldlm/ldlm_extent.c b/fs/lustre/ldlm/ldlm_extent.c index 7c72d04..98e2a75 100644 --- a/fs/lustre/ldlm/ldlm_extent.c +++ b/fs/lustre/ldlm/ldlm_extent.c @@ -151,7 +151,7 @@ void ldlm_extent_add_lock(struct ldlm_resource *res, struct ldlm_interval_tree *tree; int idx; - 
LASSERT(lock->l_granted_mode == lock->l_req_mode); + LASSERT(ldlm_is_granted(lock)); LASSERT(RB_EMPTY_NODE(&lock->l_rb)); diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index df57c02..ede48b2 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -310,8 +310,7 @@ static inline int is_granted_or_cancelled(struct ldlm_lock *lock) int ret = 0; lock_res_and_lock(lock); - if ((lock->l_req_mode == lock->l_granted_mode) && - !ldlm_is_cp_reqd(lock)) + if (ldlm_is_granted(lock) && !ldlm_is_cp_reqd(lock)) ret = 1; else if (ldlm_is_failed(lock) || ldlm_is_cancel(lock)) ret = 1; diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index cc96fbd..b6c49c5 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -992,7 +992,7 @@ void ldlm_grant_lock_with_skiplist(struct ldlm_lock *lock) { struct sl_insert_point prev; - LASSERT(lock->l_req_mode == lock->l_granted_mode); + LASSERT(ldlm_is_granted(lock)); search_granted_lock(&lock->l_resource->lr_granted, lock, &prev); ldlm_granted_list_add_lock(lock, &prev); @@ -1591,7 +1591,7 @@ enum ldlm_error ldlm_lock_enqueue(const struct lu_env *env, struct ldlm_resource *res = lock->l_resource; lock_res_and_lock(lock); - if (lock->l_req_mode == lock->l_granted_mode) { + if (ldlm_is_granted(lock)) { /* The server returned a blocked lock, but it was granted * before we got a chance to actually enqueue it. We don't * need to do anything else. 
@@ -1799,7 +1799,7 @@ void ldlm_lock_cancel(struct ldlm_lock *lock) ldlm_resource_unlink_lock(lock); ldlm_lock_destroy_nolock(lock); - if (lock->l_granted_mode == lock->l_req_mode) + if (ldlm_is_granted(lock)) ldlm_pool_del(&ns->ns_pool, lock); /* Make sure we will not be called again for same lock what is possible diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 2985e37..db0da99 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -193,7 +193,7 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, while (to > 0) { schedule_timeout_interruptible(to); - if (lock->l_granted_mode == lock->l_req_mode || + if (ldlm_is_granted(lock) || ldlm_is_destroyed(lock)) break; } @@ -236,7 +236,7 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, } if (ldlm_is_destroyed(lock) || - lock->l_granted_mode == lock->l_req_mode) { + ldlm_is_granted(lock)) { /* bug 11300: the lock has already been granted */ unlock_res_and_lock(lock); LDLM_DEBUG(lock, "Double grant race happened"); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index b9e9ae9..7c3935f 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -292,8 +292,7 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns, /* Set a flag to prevent us from sending a CANCEL (bug 407) */ lock_res_and_lock(lock); /* Check that lock is not granted or failed, we might race. */ - if ((lock->l_req_mode != lock->l_granted_mode) && - !ldlm_is_failed(lock)) { + if (!ldlm_is_granted(lock) && !ldlm_is_failed(lock)) { /* Make sure that this lock will not be found by raced * bl_ast and -EINVAL reply is sent to server anyways. 
* bug 17645 @@ -477,7 +476,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, * a tiny window for completion to get in */ lock_res_and_lock(lock); - if (lock->l_req_mode != lock->l_granted_mode) + if (!ldlm_is_granted(lock)) rc = ldlm_fill_lvb(lock, &req->rq_pill, RCL_SERVER, lock->l_lvb_data, lvb_len); unlock_res_and_lock(lock); @@ -2196,7 +2195,7 @@ static int replay_one_lock(struct obd_import *imp, struct ldlm_lock *lock) * This happens whenever a lock enqueue is the request that triggers * recovery. */ - if (lock->l_granted_mode == lock->l_req_mode) + if (ldlm_is_granted(lock)) flags = LDLM_FL_REPLAY | LDLM_FL_BLOCK_GRANTED; else if (lock->l_granted_mode) flags = LDLM_FL_REPLAY | LDLM_FL_BLOCK_CONV; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 3e3fbd9..e410ff0 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -464,6 +464,10 @@ int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, break; } case LDLM_CB_CANCELING: + /* Nothing to do for non-granted locks */ + if (!ldlm_is_granted(lock)) + break; + if (ldlm_is_converting(lock)) { /* this is called on already converted lock, so * ibits has remained bits only and cancel_bits diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index eccea37..29d8373 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -105,7 +105,7 @@ static int osc_lock_invariant(struct osc_lock *ols) return 0; if (!ergo(ols->ols_state == OLS_GRANTED, - olock && olock->l_req_mode == olock->l_granted_mode && + olock && ldlm_is_granted(olock) && ols->ols_hold)) return 0; return 1; @@ -227,7 +227,7 @@ static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, /* Lock must have been granted. 
*/ lock_res_and_lock(dlmlock); - if (dlmlock->l_granted_mode == dlmlock->l_req_mode) { + if (ldlm_is_granted(dlmlock)) { struct ldlm_extent *ext = &dlmlock->l_policy_data.l_extent; struct cl_lock_descr *descr = &oscl->ols_cl.cls_lock->cll_descr; @@ -336,7 +336,7 @@ static int osc_lock_upcall_speculative(void *cookie, LASSERT(dlmlock); lock_res_and_lock(dlmlock); - LASSERT(dlmlock->l_granted_mode == dlmlock->l_req_mode); + LASSERT(ldlm_is_granted(dlmlock)); /* there is no osc_lock associated with speculative lock */ osc_lock_lvb_update(env, osc, dlmlock, NULL); @@ -401,7 +401,7 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, LASSERT(flag == LDLM_CB_CANCELING); lock_res_and_lock(dlmlock); - if (dlmlock->l_granted_mode != dlmlock->l_req_mode) { + if (!ldlm_is_granted(dlmlock)) { dlmlock->l_ast_data = NULL; unlock_res_and_lock(dlmlock); return 0; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index c55d5a9..7190da9 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -3163,7 +3163,7 @@ static int osc_cancel_weight(struct ldlm_lock *lock) * Cancel all unused and granted extent lock. 
*/ if (lock->l_resource->lr_type == LDLM_EXTENT && - lock->l_granted_mode == lock->l_req_mode && + ldlm_is_granted(lock) && osc_ldlm_weigh_ast(lock) == 0) return 1; From patchwork Thu Feb 27 21:11:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410063 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 23E18138D for ; Thu, 27 Feb 2020 21:29:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0C939246A1 for ; Thu, 27 Feb 2020 21:29:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0C939246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A4C9D348DD9; Thu, 27 Feb 2020 13:25:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 444AD21FAB4 for ; Thu, 27 Feb 2020 13:19:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1044C2C5B; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0F1B246C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:41 -0500 Message-Id: 
<1582838290-17243-234-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 233/622] lustre: osc: propagate grant shrink interval immediately X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Currently the new interval (updated with lctl) takes effect only when the next shrink happens. With the default interval that takes at least 20 minutes. Instead, we should refresh it immediately. WC-bug-id: https://jira.whamcloud.com/browse/LU-11408 Lustre-commit: 0b09a19bdf2d ("LU-11408 osc: propagate grant shrink interval immediately") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/33204 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 2 ++ fs/lustre/osc/osc_internal.h | 1 + fs/lustre/osc/osc_request.c | 6 ++++++ 3 files changed, 9 insertions(+) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index ea67d20..5faf518 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -349,6 +349,8 @@ static ssize_t grant_shrink_interval_store(struct kobject *kobj, return -ERANGE; obd->u.cli.cl_grant_shrink_interval = val; + osc_update_next_shrink(&obd->u.cli); + osc_schedule_grant_work(); return count; } diff --git a/fs/lustre/osc/osc_internal.h b/fs/lustre/osc/osc_internal.h index 2cb737b..0f0f4d4 100644 --- a/fs/lustre/osc/osc_internal.h +++ b/fs/lustre/osc/osc_internal.h @@ -43,6 +43,7 @@ extern struct ptlrpc_request_pool *osc_rq_pool; int 
osc_shrink_grant_to_target(struct client_obd *cli, u64 target_bytes); +void osc_schedule_grant_work(void); void osc_update_next_shrink(struct client_obd *cli); int lru_queue_work(const struct lu_env *env, void *data); int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext, diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 7190da9..7b120da 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -905,6 +905,12 @@ static void osc_grant_work_handler(struct work_struct *data) schedule_work(&work.work); } +void osc_schedule_grant_work(void) +{ + cancel_delayed_work_sync(&work); + schedule_work(&work.work); +} + /** * Start grant thread for returing grant to server for idle clients. */ From patchwork Thu Feb 27 21:11:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410303 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC8D392A for ; Thu, 27 Feb 2020 21:34:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D270E24677 for ; Thu, 27 Feb 2020 21:34:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D270E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A0C98349EFE; Thu, 27 Feb 2020 13:29:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 84C3121FCEC for ; Thu, 27 Feb 2020 13:19:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 13BB72C5C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 11FFB46D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:42 -0500 Message-Id: <1582838290-17243-235-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 234/622] lustre: osc: grant shrink shouldn't account skipped OSC X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Count only the OSCs that are actually shrunk against the RPC batch limit; otherwise only the first 100 OSCs are subject to the grant shrink procedure. 
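For illustration only (this is not part of the patch), the counting difference can be sketched in standalone C. GRANT_SHRINK_RPC_BATCH is the name used in the diff; its value of 100 is assumed from the "first 100 OSCs" wording above. The boolean array standing in for osc_should_shrink_grant() and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, taken from the "first 100 OSCs" wording in the message. */
#define GRANT_SHRINK_RPC_BATCH 100

/*
 * Old counting: the counter advances for every client visited, even
 * those that are skipped, so clients late in the list are never shrunk.
 */
static size_t shrink_count_old(const bool *should_shrink, size_t n)
{
	size_t rpc_sent = 0, shrunk = 0;

	assert(should_shrink || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (++rpc_sent < GRANT_SHRINK_RPC_BATCH && should_shrink[i])
			shrunk++;
	}
	return shrunk;
}

/* New counting: the counter advances only when a client is shrunk. */
static size_t shrink_count_new(const bool *should_shrink, size_t n)
{
	size_t rpc_sent = 0, shrunk = 0;

	assert(should_shrink || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (rpc_sent < GRANT_SHRINK_RPC_BATCH && should_shrink[i]) {
			shrunk++;
			rpc_sent++;
		}
	}
	return shrunk;
}
```

With 300 modelled clients of which only the last 150 are eligible, the old counting shrinks none of them, while the new counting shrinks a full batch.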
WC-bug-id: https://jira.whamcloud.com/browse/LU-11409 Lustre-commit: 2b215d3763a8 ("LU-11409 osc: grant shrink shouldn't account skipped OSC") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/33206 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 7b120da..14180a4 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -879,9 +879,11 @@ static void osc_grant_work_handler(struct work_struct *data) mutex_lock(&client_gtd.gtd_mutex); list_for_each_entry(cli, &client_gtd.gtd_clients, cl_grant_chain) { - if (++rpc_sent < GRANT_SHRINK_RPC_BATCH && - osc_should_shrink_grant(cli)) + if (rpc_sent < GRANT_SHRINK_RPC_BATCH && + osc_should_shrink_grant(cli)) { osc_shrink_grant(cli); + rpc_sent++; + } if (!init_next_shrink) { if (cli->cl_next_shrink_grant < next_shrink && From patchwork Thu Feb 27 21:11:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410057 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 020D81580 for ; Thu, 27 Feb 2020 21:29:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DEE32246A1 for ; Thu, 27 Feb 2020 21:29:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DEE32246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: 
from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0BA2D3493D7; Thu, 27 Feb 2020 13:25:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C5FC821FCEC for ; Thu, 27 Feb 2020 13:19:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 164C52C5D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 14BCE46F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:43 -0500 Message-Id: <1582838290-17243-236-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 235/622] lustre: quota: protect quota flags at OSC X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang The OSC quota hash, which tracks the quota flags of each qid, has no protection against out-of-order replies: the reply to an earlier request can overwrite quota flags that were set by a later request. Record the xid of the last request that updated the hash and ignore replies with an older xid. This patch also adds a lock to protect the operations on the quota hash from different requests. 
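A minimal userspace sketch of the ordering check, for illustration only: a pthread mutex stands in for the kernel mutex, and struct quota_state and its functions are hypothetical names, not Lustre APIs. As in the diff, an update is dropped only when the recorded xid is strictly newer than the xid of the reply.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the quota fields added to struct client_obd. */
struct quota_state {
	pthread_mutex_t lock;   /* plays the role of cl_quota_mutex */
	uint64_t last_xid;      /* plays the role of cl_quota_last_xid */
	uint32_t flags;         /* simplified: one flag word, not a hash */
};

static void quota_state_init(struct quota_state *qs)
{
	assert(qs);
	pthread_mutex_init(&qs->lock, NULL);
	qs->last_xid = 0;
	qs->flags = 0;
}

/*
 * Apply the flags carried by a reply unless a reply to a newer request
 * has already been applied. Returns true when the update was applied.
 */
static bool quota_state_update(struct quota_state *qs, uint64_t xid,
			       uint32_t flags)
{
	bool applied = false;

	assert(qs);
	pthread_mutex_lock(&qs->lock);
	if (qs->last_xid <= xid) {
		qs->last_xid = xid;
		qs->flags = flags;
		applied = true;
	}
	pthread_mutex_unlock(&qs->lock);
	return applied;
}
```

A reply with xid 7 arriving after a reply with xid 10 leaves the stored flags unchanged, which is the behaviour the check in the patch is meant to give.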
WC-bug-id: https://jira.whamcloud.com/browse/LU-11678 Lustre-commit: 77d9f4e05a5c ("LU-11678 quota: protect quota flags at OSC") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/33747 Reviewed-by: Andreas Dilger Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 3 +++ fs/lustre/osc/osc_internal.h | 2 +- fs/lustre/osc/osc_quota.c | 11 ++++++++++- fs/lustre/osc/osc_request.c | 3 ++- 4 files changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index bf0bf97..ff94092 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -344,8 +344,11 @@ struct client_obd { /* ptlrpc work for writeback in ptlrpcd context */ void *cl_writeback_work; void *cl_lru_work; + struct mutex cl_quota_mutex; /* hash tables for osc_quota_info */ struct rhashtable cl_quota_hash[MAXQUOTAS]; + /* the xid of the request updating the hash tables */ + u64 cl_quota_last_xid; /* Links to the global list of registered changelog devices */ struct list_head cl_chg_dev_linkage; }; diff --git a/fs/lustre/osc/osc_internal.h b/fs/lustre/osc/osc_internal.h index 0f0f4d4..6f71d8d 100644 --- a/fs/lustre/osc/osc_internal.h +++ b/fs/lustre/osc/osc_internal.h @@ -136,7 +136,7 @@ static inline char *cli_name(struct client_obd *cli) int osc_quota_setup(struct obd_device *obd); int osc_quota_cleanup(struct obd_device *obd); -int osc_quota_setdq(struct client_obd *cli, const unsigned int qid[], +int osc_quota_setdq(struct client_obd *cli, u64 xid, const unsigned int qid[], u32 valid, u32 flags); int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[]); int osc_quotactl(struct obd_device *unused, struct obd_export *exp, diff --git a/fs/lustre/osc/osc_quota.c b/fs/lustre/osc/osc_quota.c index cb5ddef..316e087 100644 --- a/fs/lustre/osc/osc_quota.c +++ b/fs/lustre/osc/osc_quota.c @@ -109,7 +109,7 @@ static inline u32 fl_quota_flag(int qtype) } } -int 
osc_quota_setdq(struct client_obd *cli, const unsigned int qid[], +int osc_quota_setdq(struct client_obd *cli, u64 xid, const unsigned int qid[], u32 valid, u32 flags) { int type; @@ -118,6 +118,11 @@ int osc_quota_setdq(struct client_obd *cli, const unsigned int qid[], if ((valid & (OBD_MD_FLALLQUOTA)) == 0) return 0; + mutex_lock(&cli->cl_quota_mutex); + if (cli->cl_quota_last_xid > xid) + goto out_unlock; + + cli->cl_quota_last_xid = xid; for (type = 0; type < MAXQUOTAS; type++) { struct osc_quota_info *oqi; @@ -175,6 +180,8 @@ int osc_quota_setdq(struct client_obd *cli, const unsigned int qid[], } } +out_unlock: + mutex_unlock(&cli->cl_quota_mutex); return rc; } @@ -191,6 +198,8 @@ int osc_quota_setup(struct obd_device *obd) struct client_obd *cli = &obd->u.cli; int i, type; + mutex_init(&cli->cl_quota_mutex); + for (type = 0; type < MAXQUOTAS; type++) { if (rhashtable_init(&cli->cl_quota_hash[type], &quota_hash_params) != 0) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 14180a4..dca141f 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1753,7 +1753,8 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) "setdq for [%u %u %u] with valid %#llx, flags %x\n", body->oa.o_uid, body->oa.o_gid, body->oa.o_projid, body->oa.o_valid, body->oa.o_flags); - osc_quota_setdq(cli, qid, body->oa.o_valid, body->oa.o_flags); + osc_quota_setdq(cli, req->rq_xid, qid, body->oa.o_valid, + body->oa.o_flags); } osc_update_grant(cli, body); From patchwork Thu Feb 27 21:11:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410067 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E19201580 for ; Thu, 27 Feb 2020 21:29:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C9239246A1 for ; Thu, 27 Feb 2020 21:29:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C9239246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A491A34943B; Thu, 27 Feb 2020 13:25:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28E6721FCF5 for ; Thu, 27 Feb 2020 13:19:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1A7402C5E; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1794546A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:44 -0500 Message-Id: <1582838290-17243-237-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 236/622] lustre: osc: pass client page size during reconnect too X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin The client page size is reported to the server in ocd_grant_blkbits, and the server returns the device blocksize in the same field. During reconnect, ocd_grant_blkbits therefore still contains the server device blocksize, which the server wrongly uses as the client page size. This patch sets ocd_grant_blkbits to the client page size again during reconnect so that the server gets the expected information. WC-bug-id: https://jira.whamcloud.com/browse/LU-11752 Lustre-commit: 5bec8f95cc10 ("LU-11752 osc: pass client page size during reconnect too") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33847 Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index dca141f..a7e4f7a 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -3003,10 +3003,13 @@ int osc_reconnect(const struct lu_env *env, struct obd_export *exp, spin_lock(&cli->cl_loi_list_lock); grant = cli->cl_avail_grant + cli->cl_reserved_grant; - if (data->ocd_connect_flags & OBD_CONNECT_GRANT_PARAM) + if (data->ocd_connect_flags & OBD_CONNECT_GRANT_PARAM) { + /* restore ocd_grant_blkbits as client page bits */ + data->ocd_grant_blkbits = PAGE_SHIFT; grant += cli->cl_dirty_grant; - else + } else { grant += cli->cl_dirty_pages << PAGE_SHIFT; + } data->ocd_grant = grant ?
: 2 * cli_brw_size(obd); lost_grant = cli->cl_lost_grant; cli->cl_lost_grant = 0; From patchwork Thu Feb 27 21:11:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410307 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CFAB6138D for ; Thu, 27 Feb 2020 21:34:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B862524677 for ; Thu, 27 Feb 2020 21:34:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B862524677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C6648349F57; Thu, 27 Feb 2020 13:29:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 694E221FCF5 for ; Thu, 27 Feb 2020 13:19:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1C1AA2C5F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1A67C468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:45 -0500 Message-Id: <1582838290-17243-238-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 
1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 237/622] lustre: ptlrpc: Change static defines to use macro for sec_gc.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch replaces all mutexes, locks, and wait queues which are defined statically in the file fs/lustre/ptlrpc/sec_gc.c with the kernel-provided macros. WC-bug-id: https://jira.whamcloud.com/browse/LU-9010 Lustre-commit: 50c01e02506f ("LU-9010 ptlrpc: Change static defines to use macro for sec_gc.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/33937 Reviewed-by: Ben Evans Reviewed-by: Sebastien Buisson Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/sec_gc.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/fs/lustre/ptlrpc/sec_gc.c b/fs/lustre/ptlrpc/sec_gc.c index d5edcec..3baed8c 100644 --- a/fs/lustre/ptlrpc/sec_gc.c +++ b/fs/lustre/ptlrpc/sec_gc.c @@ -48,12 +48,12 @@ #define SEC_GC_INTERVAL (30 * 60) -static struct mutex sec_gc_mutex; +static DEFINE_MUTEX(sec_gc_mutex); static LIST_HEAD(sec_gc_list); -static spinlock_t sec_gc_list_lock; +static DEFINE_SPINLOCK(sec_gc_list_lock); static LIST_HEAD(sec_gc_ctx_list); -static spinlock_t sec_gc_ctx_list_lock; +static DEFINE_SPINLOCK(sec_gc_ctx_list_lock); static atomic_t sec_gc_wait_del = ATOMIC_INIT(0); @@ -176,10 +176,6 @@ static void sec_gc_main(struct work_struct *ws) int sptlrpc_gc_init(void) { - mutex_init(&sec_gc_mutex); - spin_lock_init(&sec_gc_list_lock); -
spin_lock_init(&sec_gc_ctx_list_lock); - schedule_delayed_work(&sec_gc_work, 0); return 0; } From patchwork Thu Feb 27 21:11:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410061 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 54119138D for ; Thu, 27 Feb 2020 21:29:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3D0B2246A1 for ; Thu, 27 Feb 2020 21:29:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3D0B2246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0139E349402; Thu, 27 Feb 2020 13:25:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AA63F21FA7D for ; Thu, 27 Feb 2020 13:19:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1E4C22C60; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1D54846C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:46 -0500 Message-Id: <1582838290-17243-239-git-send-email-jsimmons@infradead.org> X-Mailer: 
git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 238/622] lnet: libcfs: do not calculate debug_mb if it is set X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vladimir Saveliev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vladimir Saveliev debug_mb is libcfs module parameter. It should be possible to set it via modprobe libcfs libcfs_debug_mb=800 or via adding options libcfs libcfs_debug_mb=800 to modules configuration. Fixes: 0871d551af ("staging/lustre/libcfs: move /proc/sys/lnet to debugfs") WC-bug-id: https://jira.whamcloud.com/browse/LU-11898 Lustre-commit: adeb29400a4a ("LU-11898 libcfs: do not calculate debug_mb if it is set") Signed-off-by: Vladimir Saveliev Cray-bug-id: LUS-6936 Reviewed-on: https://review.whamcloud.com/34128 Reviewed-by: Andreas Dilger Reviewed-by: Alexander Zarochentsev Signed-off-by: James Simmons --- net/lnet/libcfs/debug.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/lnet/libcfs/debug.c b/net/lnet/libcfs/debug.c index 88c4c36..c6b92df 100644 --- a/net/lnet/libcfs/debug.c +++ b/net/lnet/libcfs/debug.c @@ -553,7 +553,8 @@ int libcfs_debug_init(unsigned long bufsize) libcfs_register_panic_notifier(); kernel_param_lock(THIS_MODULE); - libcfs_debug_mb = cfs_trace_get_debug_mb(); + if (libcfs_debug_mb == 0) + libcfs_debug_mb = cfs_trace_get_debug_mb(); kernel_param_unlock(THIS_MODULE); return rc; } From patchwork Thu Feb 27 21:11:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410875 Return-Path: Received: from 
mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A430E924 for ; Thu, 27 Feb 2020 21:49:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8C88E24690 for ; Thu, 27 Feb 2020 21:49:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8C88E24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E311D34A6CD; Thu, 27 Feb 2020 13:41:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ECE9D21FCE0 for ; Thu, 27 Feb 2020 13:19:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 22CDD2C61; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 203D846D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:47 -0500 Message-Id: <1582838290-17243-240-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 239/622] lustre: ldlm: Lost lease lock on migrate error X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 
Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh All file operations take locks in the same order: parent, then child. If a lock for a child is returned to the client, the following operations on this file are done by the child fid. However, migrate is an exception: it takes the lease lock first and takes the PW parent lock next during the MDS_REINT. At the same time, if there is a parallel racing operation (open) which has taken a lock on the parent (conflicting with the next MDS_REINT) and is trying to take a lock on the child, it is blocked until the lease cancel comes. The lease cancel is piggy-backed on the MDS_REINT RPC and is handled at the end of the operation, which tries to take the conflicting parent lock first, and thus a deadlock occurs. The lease lock is not supposed to block anything; it is just an indicator on the server that no other conflicting operation has occurred during the migration. Therefore set LDLM_FL_CANCEL_ON_BLOCK on it so that the conflicting operation is not blocked. In this case, the MDS_REINT will return -EAGAIN as the lease is cancelled and the client will retry its migration.
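The client-side behaviour described above (retry the migration when the lease was cancelled) can be sketched in plain C. This is only an illustration under stated assumptions, not the ll_migrate() code: migrate_with_retry(), migrate_attempt_fn and attempt_after_cancels() are hypothetical names, one call of the callback stands in for one MDS_REINT round trip, and the retry bound is an assumption added for the sketch (the patch itself loops back with "goto again").

```c
#include <errno.h>

/* Hypothetical callback: one migrate attempt, i.e. one MDS_REINT round trip. */
typedef int (*migrate_attempt_fn)(void *arg);

/*
 * Retry the attempt while it reports -EAGAIN (lease cancelled by a
 * conflicting operation), up to max_tries attempts. The bound is an
 * assumption of this sketch.
 */
static int migrate_with_retry(migrate_attempt_fn attempt, void *arg,
			      int max_tries)
{
	int rc = -EAGAIN;
	int i;

	for (i = 0; i < max_tries && rc == -EAGAIN; i++)
		rc = attempt(arg);

	return rc;
}

/*
 * Example attempt: the lease is cancelled (-EAGAIN) as long as the
 * counter pointed to by arg is positive, then the migrate succeeds.
 */
static int attempt_after_cancels(void *arg)
{
	int *cancels_left = arg;

	if (*cancels_left > 0) {
		(*cancels_left)--;
		return -EAGAIN;
	}
	return 0;
}
```

For example, with two pending lease cancels and a limit of five attempts, the third attempt succeeds and the helper returns 0.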
Cray-bug-id: LUS-6811 WC-bug-id: https://jira.whamcloud.com/browse/LU-11926 Lustre-commit: ae7ca90713b4 ("LU-11926 ldlm: Lost lease lock on migrate error") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/34182 Reviewed-by: Vitaly Fertman Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/ldlm/ldlm_lockd.c | 3 --- fs/lustre/ldlm/ldlm_request.c | 4 ++++ fs/lustre/llite/file.c | 4 +++- 4 files changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 39547a0..a60fa07 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -302,6 +302,7 @@ #define OBD_FAIL_LDLM_CP_CB_WAIT5 0x323 #define OBD_FAIL_LDLM_GRANT_CHECK 0x32a +#define OBD_FAIL_LDLM_LOCAL_CANCEL_PAUSE 0x32c /* LOCKLESS IO */ #define OBD_FAIL_LDLM_SET_CONTENTION 0x385 diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index db0da99..ea146aa 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -149,9 +149,6 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, } ldlm_set_cbpending(lock); - if (ldlm_is_cancel_on_block(lock)) - ldlm_set_cancel(lock); - do_ast = !lock->l_readers && !lock->l_writers; unlock_res_and_lock(lock); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 7c3935f..fb564f4 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1293,6 +1293,10 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, ldlm_set_canceling(lock); unlock_res_and_lock(lock); + if (cancel_flags & LCF_LOCAL) + OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_LOCAL_CANCEL_PAUSE, + cfs_fail_val); + rc = ldlm_cli_cancel_local(lock); if (rc == LDLM_FL_LOCAL_ONLY || cancel_flags & LCF_LOCAL) { LDLM_LOCK_RELEASE(lock); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 4560ae0..7ec1099 100644 --- a/fs/lustre/llite/file.c 
+++ b/fs/lustre/llite/file.c @@ -3934,7 +3934,9 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, if (!rc) { LASSERT(request); ll_update_times(request, parent); + } + if (rc == 0 || rc == -EAGAIN) { body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY); LASSERT(body); @@ -3957,7 +3959,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, request = NULL; } - /* Try again if the file layout has changed. */ + /* Try again if the lease has cancelled. */ if (rc == -EAGAIN && S_ISREG(child_inode->i_mode)) goto again; From patchwork Thu Feb 27 21:11:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410311 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B15E92A for ; Thu, 27 Feb 2020 21:34:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2386924677 for ; Thu, 27 Feb 2020 21:34:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2386924677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 54151348DB0; Thu, 27 Feb 2020 13:29:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4EC7E21FC22 for ; Thu, 
27 Feb 2020 13:19:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 250242C62; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2303246F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:48 -0500 Message-Id: <1582838290-17243-241-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 240/622] lnet: lnd: increase CQ entries X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Several sites have reported RDMA timeouts. Most of the timeouts occur for transmits on the active_tx queue. Transmits stay on the active_tx queue until a completion is received. If there aren't enough CQ entries available, completion events can be delayed, causing these timeouts.
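The sizing change in this patch replaces "receive WRs + 2 * queue depth" with "receive WRs + send WRs". The arithmetic can be sketched as follows; this is only an illustration, with struct conn_sizes and the three helpers being hypothetical names, and the send WR count passed in as a plain number because the definition of kiblnd_send_wrs() is not part of this patch.

```c
/* Hypothetical per-connection numbers; not the o2iblnd structures. */
struct conn_sizes {
	unsigned int recv_wrs;		/* receive work requests */
	unsigned int queue_depth;	/* connection queue depth */
	unsigned int send_wrs;		/* send work requests (assumed given) */
};

/* Old sizing: receive WRs plus twice the queue depth. */
static unsigned int cq_entries_old(const struct conn_sizes *c)
{
	return c->recv_wrs + 2 * c->queue_depth;
}

/* New sizing: receive WRs plus send WRs. */
static unsigned int cq_entries_new(const struct conn_sizes *c)
{
	return c->recv_wrs + c->send_wrs;
}

/* How many CQ entries the old sizing is short of the new one (0 if none). */
static unsigned int cq_shortfall(const struct conn_sizes *c)
{
	unsigned int old_sz = cq_entries_old(c);
	unsigned int new_sz = cq_entries_new(c);

	return new_sz > old_sz ? new_sz - old_sz : 0;
}
```

With made-up values of 8 receive WRs, a queue depth of 8 and 40 send WRs, the old formula gives 24 entries and the new one 48, so the old completion queue would be 24 entries short whenever all send WRs are in flight.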
WC-bug-id: https://jira.whamcloud.com/browse/LU-12065 Lustre-commit: bf3fc7f1a7bf ("LU-12065 lnd: increase CQ entries") Signed-off-by: Amir Shehata Reviewed-by: Sonia Sharma Reviewed-by: James Simmons Reviewed-on: https://review.whamcloud.com/34473 Reviewed-by: Chris Horn Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 999b58d..44f1d84 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -136,8 +136,7 @@ struct kib_tunables { /* WRs and CQEs (per connection) */ #define IBLND_RECV_WRS(c) IBLND_RX_MSGS(c) -#define IBLND_CQ_ENTRIES(c) \ - (IBLND_RECV_WRS(c) + 2 * c->ibc_queue_depth) +#define IBLND_CQ_ENTRIES(c) (IBLND_RECV_WRS(c) + kiblnd_send_wrs(c)) struct kib_hca_dev; From patchwork Thu Feb 27 21:11:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410071 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F13B4138D for ; Thu, 27 Feb 2020 21:29:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D9D5A246A0 for ; Thu, 27 Feb 2020 21:29:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D9D5A246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
0A3C734946D; Thu, 27 Feb 2020 13:25:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F8D721FCFD for ; Thu, 27 Feb 2020 13:19:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 28A7F2C63; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2606246A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:49 -0500 Message-Id: <1582838290-17243-242-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 241/622] lustre: security: return security context for metadata ops X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini Security layer needs to fetch security context of files/dirs upon metadata ops like lookup, getattr, open, truncate, and layout, for its own purpose and control checks. Retrieving the security context consists in a getxattr operation at the file system level. The fact that the requested metadata operation and the getxattr are not atomic can create a window for a dead-lock situation where, based on some access patterns, all MDT service threads can become stuck waiting for lookup lock to be released and thus unable to serve getxattr for security context. 
Another problem is that sending an additional getxattr request for every metadata op hurts performance. This patch introduces a way to get atomicity by having the MDT return security context upon granted lock reply, sparing the client an additional getxattr request. WC-bug-id: https://jira.whamcloud.com/browse/LU-9193 Lustre-commit: fca35f74f9ec ("LU-9193 security: return security context for metadata ops") Signed-off-by: Bruno Faccini Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/26831 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 3 +- fs/lustre/llite/llite_internal.h | 3 ++ fs/lustre/llite/namei.c | 60 +++++++++++++++++++++++++++++++-- fs/lustre/llite/xattr_security.c | 19 +++++++++++ fs/lustre/lmv/lmv_intent.c | 21 ++++++++++-- fs/lustre/mdc/mdc_locks.c | 61 +++++++++++++++++++++++++++++++++- fs/lustre/mdc/mdc_request.c | 2 ++ fs/lustre/ptlrpc/layout.c | 9 +++-- include/uapi/linux/lustre/lustre_idl.h | 1 + 9 files changed, 169 insertions(+), 10 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index ff94092..758efc1 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -778,8 +778,9 @@ struct md_op_data { u64 op_data_version; struct lustre_handle op_lease_handle; - /* File security context, for creates. 
*/ + /* File security context, for creates/metadata ops */ const char *op_file_secctx_name; + u32 op_file_secctx_name_size; void *op_file_secctx; u32 op_file_secctx_size; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index d41531b..3c81c3b 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -279,6 +279,9 @@ int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name, int ll_inode_init_security(struct dentry *dentry, struct inode *inode, struct inode *dir); +int ll_listsecurity(struct inode *inode, char *secctx_name, + size_t secctx_name_size); + /* * Locking to guarantee consistency of non-atomic updates to long long i_size, * consistency between file size and KMS. diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index e410ff0..ee3ce70 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -592,7 +592,8 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de) static int ll_lookup_it_finish(struct ptlrpc_request *request, struct lookup_intent *it, - struct inode *parent, struct dentry **de) + struct inode *parent, struct dentry **de, + void *secctx, u32 secctxlen) { struct inode *inode = NULL; u64 bits = 0; @@ -605,6 +606,10 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, CDEBUG(D_DENTRY, "it %p it_disposition %x\n", it, it->it_disposition); if (!it_disposition(it, DISP_LOOKUP_NEG)) { + struct req_capsule *pill = &request->rq_pill; + struct mdt_body *body = req_capsule_server_get(pill, + &RMF_MDT_BODY); + rc = ll_prep_inode(&inode, request, (*de)->d_sb, it); if (rc) return rc; @@ -623,6 +628,32 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, * ll_glimpse_size or some equivalent themselves anyway. * Also see bug 7198. */ + + /* If security context was returned by MDT, put it in + * inode now to save an extra getxattr from security hooks, + * and avoid deadlock. 
+ */ + if (body->mbo_valid & OBD_MD_SECCTX) { + secctx = req_capsule_server_get(pill, &RMF_FILE_SECCTX); + secctxlen = req_capsule_get_size(pill, + &RMF_FILE_SECCTX, + RCL_SERVER); + + if (secctxlen) + CDEBUG(D_SEC, + "server returned security context for " DFID "\n", + PFID(ll_inode2fid(inode))); + } + + if (secctx && secctxlen != 0) { + inode_lock(inode); + rc = security_inode_notifysecctx(inode, secctx, + secctxlen); + inode_unlock(inode); + if (rc) + CWARN("cannot set security context for " DFID ": rc = %d\n", + PFID(ll_inode2fid(inode)), rc); + } } alias = ll_splice_alias(inode, *de); @@ -680,6 +711,7 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, struct dentry *save = dentry, *retval; struct ptlrpc_request *req = NULL; struct md_op_data *op_data = NULL; + char secctx_name[XATTR_NAME_MAX + 1]; struct inode *inode; u32 opc; int rc; @@ -742,6 +774,28 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, *secctx = op_data->op_file_secctx; if (secctxlen) *secctxlen = op_data->op_file_secctx_size; + } else { + if (secctx) + *secctx = NULL; + if (secctxlen) + *secctxlen = 0; + } + + /* ask for security context upon intent */ + if (it->it_op & (IT_LOOKUP | IT_GETATTR | IT_OPEN)) { + /* get name of security xattr to request to server */ + rc = ll_listsecurity(parent, secctx_name, + sizeof(secctx_name)); + if (rc < 0) { + CDEBUG(D_SEC, + "cannot get security xattr name for " DFID ": rc = %d\n", + PFID(ll_inode2fid(parent)), rc); + } else if (rc > 0) { + op_data->op_file_secctx_name = secctx_name; + op_data->op_file_secctx_name_size = rc; + CDEBUG(D_SEC, "'%.*s' is security xattr for " DFID "\n", + rc, secctx_name, PFID(ll_inode2fid(parent))); + } } rc = md_intent_lock(ll_i2mdexp(parent), op_data, it, &req, @@ -783,7 +837,9 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, /* dir layout may change */ ll_unlock_md_op_lsm(op_data); - rc = ll_lookup_it_finish(req, it, parent, 
&dentry); + rc = ll_lookup_it_finish(req, it, parent, &dentry, + secctx ? *secctx : NULL, + secctxlen ? *secctxlen : 0); if (rc != 0) { ll_intent_release(it); retval = ERR_PTR(rc); diff --git a/fs/lustre/llite/xattr_security.c b/fs/lustre/llite/xattr_security.c index e5a52d9..e4fb64a 100644 --- a/fs/lustre/llite/xattr_security.c +++ b/fs/lustre/llite/xattr_security.c @@ -132,3 +132,22 @@ int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name, return 0; return err; } + +/** + * Get security context xattr name used by policy. + * + * \retval >= 0 length of xattr name + * \retval < 0 failure to get security context xattr name + */ +int +ll_listsecurity(struct inode *inode, char *secctx_name, size_t secctx_name_size) +{ + int rc; + + rc = security_inode_listsecurity(inode, secctx_name, secctx_name_size); + if (rc >= secctx_name_size) + rc = -ERANGE; + else if (rc >= 0) + secctx_name[rc] = '\0'; + return rc; +} diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 6933f7d..45f1ac5 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -52,7 +52,8 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it, const struct lu_fid *parent_fid, struct ptlrpc_request **reqp, ldlm_blocking_callback cb_blocking, - u64 extra_lock_flags) + u64 extra_lock_flags, + const char *secctx_name, u32 secctx_name_size) { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; @@ -109,6 +110,16 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it, CDEBUG(D_INODE, "REMOTE_INTENT with fid=" DFID " -> mds #%u\n", PFID(&body->mbo_fid1), tgt->ltd_idx); + /* ask for security context upon intent */ + if (it->it_op & (IT_LOOKUP | IT_GETATTR | IT_OPEN) && + secctx_name_size != 0 && secctx_name) { + op_data->op_file_secctx_name = secctx_name; + op_data->op_file_secctx_name_size = secctx_name_size; + CDEBUG(D_SEC, + "'%.*s' is security xattr to fetch for " DFID "\n", + 
secctx_name_size, secctx_name, PFID(&body->mbo_fid1)); + } + rc = md_intent_lock(tgt->ltd_exp, op_data, it, &req, cb_blocking, extra_lock_flags); if (rc) @@ -385,7 +396,9 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, /* Not cross-ref case, just get out of here. */ if (unlikely((body->mbo_valid & OBD_MD_MDS))) { rc = lmv_intent_remote(exp, it, &op_data->op_fid1, reqp, - cb_blocking, extra_lock_flags); + cb_blocking, extra_lock_flags, + op_data->op_file_secctx_name, + op_data->op_file_secctx_name_size); if (rc != 0) return rc; @@ -471,7 +484,9 @@ static int lmv_intent_lookup(struct obd_export *exp, /* Not cross-ref case, just get out of here. */ if (unlikely((body->mbo_valid & OBD_MD_MDS))) { rc = lmv_intent_remote(exp, it, NULL, reqp, cb_blocking, - extra_lock_flags); + extra_lock_flags, + op_data->op_file_secctx_name, + op_data->op_file_secctx_name_size); if (rc != 0) return rc; body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY); diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 55de559..6f4baa6 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -310,7 +310,7 @@ static int mdc_save_lovea(struct ptlrpc_request *req, req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX_NAME, RCL_CLIENT, op_data->op_file_secctx_name ? 
- strlen(op_data->op_file_secctx_name) + 1 : 0); + op_data->op_file_secctx_name_size : 0); req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, RCL_CLIENT, op_data->op_file_secctx_size); @@ -337,6 +337,30 @@ static int mdc_save_lovea(struct ptlrpc_request *req, obddev->u.cli.cl_max_mds_easize); req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize); + if (!(it->it_op & IT_CREAT) && it->it_op & IT_OPEN && + req_capsule_has_field(&req->rq_pill, &RMF_FILE_SECCTX_NAME, + RCL_CLIENT) && + op_data->op_file_secctx_name_size > 0 && + op_data->op_file_secctx_name) { + char *secctx_name; + + secctx_name = req_capsule_client_get(&req->rq_pill, + &RMF_FILE_SECCTX_NAME); + memcpy(secctx_name, op_data->op_file_secctx_name, + op_data->op_file_secctx_name_size); + req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, + RCL_SERVER, + obddev->u.cli.cl_max_mds_easize); + + CDEBUG(D_SEC, "packed '%.*s' as security xattr name\n", + op_data->op_file_secctx_name_size, + op_data->op_file_secctx_name); + + } else { + req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, + RCL_SERVER, 0); + } + /** * Inline buffer for possible data from Data-on-MDT files. 
*/ @@ -407,6 +431,8 @@ static int mdc_save_lovea(struct ptlrpc_request *req, /* pack the intent */ lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT); lit->opc = IT_GETXATTR; + CDEBUG(D_INFO, "%s: get xattrs for " DFID "\n", + exp->exp_obd->obd_name, PFID(&op_data->op_fid1)); /* If the supplied buffer is too small then the server will * return -ERANGE and llite will fallback to using non cached @@ -454,12 +480,25 @@ static int mdc_save_lovea(struct ptlrpc_request *req, struct ldlm_intent *lit; int rc; u32 easize; + bool have_secctx = false; req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_LDLM_INTENT_GETATTR); if (!req) return ERR_PTR(-ENOMEM); + /* send name of security xattr to get upon intent */ + if (it->it_op & (IT_LOOKUP | IT_GETATTR) && + req_capsule_has_field(&req->rq_pill, &RMF_FILE_SECCTX_NAME, + RCL_CLIENT) && + op_data->op_file_secctx_name_size > 0 && + op_data->op_file_secctx_name) { + have_secctx = true; + req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX_NAME, + RCL_CLIENT, + op_data->op_file_secctx_name_size); + } + req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, op_data->op_namelen + 1); @@ -483,6 +522,26 @@ static int mdc_save_lovea(struct ptlrpc_request *req, req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, easize); req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize); + + if (have_secctx) { + char *secctx_name; + + secctx_name = req_capsule_client_get(&req->rq_pill, + &RMF_FILE_SECCTX_NAME); + memcpy(secctx_name, op_data->op_file_secctx_name, + op_data->op_file_secctx_name_size); + + req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, + RCL_SERVER, easize); + + CDEBUG(D_SEC, "packed '%.*s' as security xattr name\n", + op_data->op_file_secctx_name_size, + op_data->op_file_secctx_name); + } else { + req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, + RCL_SERVER, 0); + } + ptlrpc_request_set_replen(req); return req; } diff --git a/fs/lustre/mdc/mdc_request.c 
b/fs/lustre/mdc/mdc_request.c index c08a6ee..88e790f0 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -439,6 +439,8 @@ static int mdc_getxattr(struct obd_export *exp, const struct lu_fid *fid, LASSERT(obd_md_valid == OBD_MD_FLXATTR || obd_md_valid == OBD_MD_FLXATTRLS); + CDEBUG(D_INFO, "%s: get xattr '%s' for " DFID "\n", + exp->exp_obd->obd_name, name, PFID(fid)); rc = mdc_xattr_common(exp, &RQF_MDS_GETXATTR, fid, MDS_GETXATTR, obd_md_valid, name, NULL, 0, buf_size, 0, -1, req); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 2e74ae1b..1dd18b9 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -417,6 +417,7 @@ &RMF_CAPA1, &RMF_CAPA2, &RMF_NIOBUF_INLINE, + &RMF_FILE_SECCTX }; static const struct req_msg_field *ldlm_intent_getattr_client[] = { @@ -425,7 +426,8 @@ &RMF_LDLM_INTENT, &RMF_MDT_BODY, /* coincides with mds_getattr_name_client[] */ &RMF_CAPA1, - &RMF_NAME + &RMF_NAME, + &RMF_FILE_SECCTX_NAME }; static const struct req_msg_field *ldlm_intent_getattr_server[] = { @@ -434,7 +436,8 @@ &RMF_MDT_BODY, &RMF_MDT_MD, &RMF_ACL, - &RMF_CAPA1 + &RMF_CAPA1, + &RMF_FILE_SECCTX }; static const struct req_msg_field *ldlm_intent_create_client[] = { @@ -935,7 +938,7 @@ struct req_msg_field RMF_FILE_SECCTX_NAME = EXPORT_SYMBOL(RMF_FILE_SECCTX_NAME); struct req_msg_field RMF_FILE_SECCTX = - DEFINE_MSGF("file_secctx", 0, -1, NULL, NULL); + DEFINE_MSGF("file_secctx", RMF_F_NO_SIZE_CHECK, -1, NULL, NULL); EXPORT_SYMBOL(RMF_FILE_SECCTX); struct req_msg_field RMF_LLOGD_BODY = diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 76068ee..1a1b6c6 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1198,6 +1198,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) #define OBD_MD_DEFAULT_MEA (0x0040000000000000ULL) /* default MEA */ #define OBD_MD_FLOSTLAYOUT (0x0080000000000000ULL) /* contain 
ost_layout */ #define OBD_MD_FLPROJID (0x0100000000000000ULL) /* project ID */ +#define OBD_MD_SECCTX (0x0200000000000000ULL) /* embed security xattr */ #define OBD_MD_FLALLQUOTA (OBD_MD_FLUSRQUOTA | \ OBD_MD_FLGRPQUOTA | \ From patchwork Thu Feb 27 21:11:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410879 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6CB5A924 for ; Thu, 27 Feb 2020 21:49:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 554C824690 for ; Thu, 27 Feb 2020 21:49:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 554C824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 04EB634A75E; Thu, 27 Feb 2020 13:41:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E5D9121FC35 for ; Thu, 27 Feb 2020 13:19:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2B9212C64; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2943C468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , 
NeilBrown Date: Thu, 27 Feb 2020 16:11:50 -0500 Message-Id: <1582838290-17243-243-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 242/622] lustre: grant: prevent overflow of o_undirty X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Zhuravlev On the server side, tgt_grant_inflate() returns a u64 and, if tgd_blockbits and val are large enough, can return a value >= 2^32. tgt_grant_incoming() assigns the returned value to oa->o_undirty. Since o_undirty is u32, it can overflow. This occurs with Lustre clients < 2.10 and a ZFS backend when the zfs "recordsize" > 128k (the default). In tgt_grant_inflate(), check the returned value and prevent o_undirty from being assigned a value greater than 2^30. On the osc client side, use PTLRPC_MAX_BRW_SIZE to prevent o_undirty overflow. 
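The narrowing problem described above can be sketched in isolation. This is only an illustration, not the Lustre code: the function names and SKETCH_UNDIRTY_MAX are hypothetical, and the 2^30 cap is taken from the description above rather than from any Lustre header.

```c
#include <stdint.h>

/* Hypothetical cap for illustration: 2^30, the limit named in the text above. */
#define SKETCH_UNDIRTY_MAX ((uint64_t)1 << 30)

/*
 * Unsafe: converting a 64-bit value to 32 bits keeps only the low
 * 32 bits, so a value >= 2^32 wraps around to a small number.
 */
static uint32_t undirty_truncating(uint64_t inflated)
{
	return (uint32_t)inflated;
}

/* Safer: clamp while still in 64-bit arithmetic, then narrow. */
static uint32_t undirty_clamped(uint64_t inflated)
{
	if (inflated > SKETCH_UNDIRTY_MAX)
		inflated = SKETCH_UNDIRTY_MAX;
	return (uint32_t)inflated;
}
```

With an input of 2^32 + 5, the truncating version yields 5 while the clamped version yields 2^30, so the peer never sees a wrapped-around grant request.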
WC-bug-id: https://jira.whamcloud.com/browse/LU-11798 Lustre-commit: d6f521916211 ("LU-11798 grant: prevent overflow of o_undirty") Signed-off-by: Alexey Zhuravlev Signed-off-by: Olaf Faaland Reviewed-on: https://review.whamcloud.com/33948 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index a7e4f7a..1fc50cc 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -686,8 +686,8 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa, /* Do not ask for more than OBD_MAX_GRANT - a margin for server * to add extent tax, etc. */ - oa->o_undirty = min(undirty, OBD_MAX_GRANT - - (PTLRPC_MAX_BRW_PAGES << PAGE_SHIFT)*4UL); + oa->o_undirty = min(undirty, OBD_MAX_GRANT & + ~(PTLRPC_MAX_BRW_SIZE * 4UL)); } oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant; oa->o_dropped = cli->cl_lost_grant; From patchwork Thu Feb 27 21:11:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410149 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7936D138D for ; Thu, 27 Feb 2020 21:31:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 61AF3246A2 for ; Thu, 27 Feb 2020 21:31:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 61AF3246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 09342349816; Thu, 27 Feb 2020 13:26:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3297621FD01 for ; Thu, 27 Feb 2020 13:19:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2D74E2C65; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2C0FB46C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:51 -0500 Message-Id: <1582838290-17243-244-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 243/622] lustre: ptlrpc: manage SELinux policy info at connect time X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson At connect time, compute SELinux policy info on client side, and send it over the wire. On server side, get SELinux policy info from nodemap and compare it with the one received from client. 
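As a rough illustration of how the client sizes the optional policy field in the diff that follows (zero bytes when the policy string is empty, otherwise the string length plus the terminating NUL), here is a standalone sketch. The helper names sepol_field_size() and sepol_pack() are hypothetical and are not part of ptlrpc; the real code uses req_capsule_set_size() on RMF_SELINUX_POL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Size of an optional string field: 0 when the string is empty,
 * otherwise its length plus one byte for the terminating NUL.
 */
static size_t sepol_field_size(const char *sepol)
{
	size_t len = strlen(sepol);

	return len ? len + 1 : 0;
}

/*
 * Copy the policy string, including its NUL, into buf when it fits.
 * Returns the number of bytes written, or 0 if nothing was packed.
 */
static size_t sepol_pack(char *buf, size_t buf_size, const char *sepol)
{
	size_t need = sepol_field_size(sepol);

	if (need == 0 || need > buf_size)
		return 0;
	memcpy(buf, sepol, need);
	return need;
}
```

An empty policy string therefore costs nothing on the wire, which is presumably why a zero-size field is used instead of sending a lone NUL byte.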
WC-bug-id: https://jira.whamcloud.com/browse/LU-8955 Lustre-commit: dd200e5530fd ("LU-8955 ptlrpc: manage SELinux policy info at connect time") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/24422 Reviewed-by: Patrick Farrell Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 1 + fs/lustre/llite/llite_lib.c | 4 ++++ fs/lustre/ptlrpc/import.c | 16 +++++++++++++++- fs/lustre/ptlrpc/layout.c | 7 ++++++- 4 files changed, 26 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 36656c6..9b618fe 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -269,6 +269,7 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_msg_field RMF_HSM_STATE_SET; extern struct req_msg_field RMF_MDS_HSM_CURRENT_ACTION; extern struct req_msg_field RMF_MDS_HSM_REQUEST; +extern struct req_msg_field RMF_SELINUX_POL; /* seq-mgr fields */ extern struct req_msg_field RMF_SEQ_OPC; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 4d41981a..10d9180 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -256,6 +256,10 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) obd_connect_set_secctx(data); +#if defined(CONFIG_SECURITY) + data->ocd_connect_flags2 |= OBD_CONNECT2_SELINUX_POLICY; +#endif + data->ocd_brw_size = MD_MAX_BRW_SIZE; err = obd_connect(NULL, &sbi->ll_md_exp, sbi->ll_md_obd, diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 34a2cb0..39d9e3e 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -606,7 +606,8 @@ int ptlrpc_connect_import(struct obd_import *imp) obd2cli_tgt(imp->imp_obd), obd->obd_uuid.uuid, (char *)&imp->imp_dlm_handle, - (char *)&imp->imp_connect_data }; + (char *)&imp->imp_connect_data, + NULL }; struct 
ptlrpc_connect_async_args *aa; int rc; @@ -670,6 +671,19 @@ int ptlrpc_connect_import(struct obd_import *imp) goto out; } + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(request); + if (rc < 0) { + ptlrpc_request_free(request); + goto out; + } + + bufs[5] = request->rq_sepol; + + req_capsule_set_size(&request->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(request->rq_sepol) ? + strlen(request->rq_sepol) + 1 : 0); + rc = ptlrpc_request_bufs_pack(request, LUSTRE_OBD_VERSION, imp->imp_connect_op, bufs, NULL); if (rc) { diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 1dd18b9..f80c627 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -315,7 +315,8 @@ &RMF_TGTUUID, &RMF_CLUUID, &RMF_CONN, - &RMF_CONNECT_DATA + &RMF_CONNECT_DATA, + &RMF_SELINUX_POL, }; static const struct req_msg_field *obd_connect_server[] = { @@ -1039,6 +1040,10 @@ struct req_msg_field RMF_LAYOUT_INTENT = NULL); EXPORT_SYMBOL(RMF_LAYOUT_INTENT); +struct req_msg_field RMF_SELINUX_POL = + DEFINE_MSGF("selinux_pol", RMF_F_STRING, -1, NULL, NULL); +EXPORT_SYMBOL(RMF_SELINUX_POL); + /* * OST request field. 
*/ From patchwork Thu Feb 27 21:11:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410065 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40026138D for ; Thu, 27 Feb 2020 21:29:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 28B89246A1 for ; Thu, 27 Feb 2020 21:29:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 28B89246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D5E54349431; Thu, 27 Feb 2020 13:25:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8AD2621FD01 for ; Thu, 27 Feb 2020 13:19:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 30B362C66; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2EF6A46D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:52 -0500 Message-Id: <1582838290-17243-245-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 244/622] lustre: ptlrpc: manage SELinux policy info for metadata ops X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson Add SELinux policy info for the following metadata operations: - create - open - unlink - rename - getxattr - setxattr - setattr - getattr - symlink - hardlink On the server side, get SELinux policy info from nodemap and compare it with the one received from the client. WC-bug-id: https://jira.whamcloud.com/browse/LU-8955 Lustre-commit: 0a773f04b288 ("LU-8955 ptlrpc: manage SELinux policy info for metadata ops") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/24424 Reviewed-by: Patrick Farrell Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 2 +- fs/lustre/mdc/mdc_internal.h | 1 + fs/lustre/mdc/mdc_lib.c | 31 +++++++++++++++++++++++++++ fs/lustre/mdc/mdc_locks.c | 23 ++++++++++++++++++++ fs/lustre/mdc/mdc_reint.c | 40 +++++++++++++++++++++++++++++++++++ fs/lustre/mdc/mdc_request.c | 17 ++++++++++++--- fs/lustre/ptlrpc/layout.c | 32 +++++++++++++++++++--------- 7 files changed, 132 insertions(+), 14 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 9b618fe..378f0b6 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -60,7 +60,7 @@ enum req_location { }; /* Maximal number of fields (buffers) in a request message. 
*/ -#define REQ_MAX_FIELD_NR 10 +#define REQ_MAX_FIELD_NR 11 struct req_capsule { struct ptlrpc_request *rc_req; diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index a5fe164..f75498a 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -57,6 +57,7 @@ void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data, void mdc_file_secctx_pack(struct ptlrpc_request *req, const char *secctx_name, const void *secctx, size_t secctx_size); +void mdc_file_sepol_pack(struct ptlrpc_request *req); void mdc_unlink_pack(struct ptlrpc_request *req, struct md_op_data *op_data); void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data); diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index 00a6be4..980676a 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -138,6 +138,22 @@ void mdc_file_secctx_pack(struct ptlrpc_request *req, const char *secctx_name, memcpy(buf, secctx, buf_size); } +void mdc_file_sepol_pack(struct ptlrpc_request *req) +{ + void *buf; + size_t buf_size; + + if (strlen(req->rq_sepol) == 0) + return; + + buf = req_capsule_client_get(&req->rq_pill, &RMF_SELINUX_POL); + buf_size = req_capsule_get_size(&req->rq_pill, &RMF_SELINUX_POL, + RCL_CLIENT); + + LASSERT(buf_size == strlen(req->rq_sepol) + 1); + snprintf(buf, strlen(req->rq_sepol) + 1, "%s", req->rq_sepol); +} + void mdc_readdir_pack(struct ptlrpc_request *req, u64 pgoff, size_t size, const struct lu_fid *fid) { @@ -192,6 +208,9 @@ void mdc_create_pack(struct ptlrpc_request *req, struct md_op_data *op_data, mdc_file_secctx_pack(req, op_data->op_file_secctx_name, op_data->op_file_secctx, op_data->op_file_secctx_size); + + /* pack SELinux policy info if any */ + mdc_file_sepol_pack(req); } static inline u64 mds_pack_open_flags(u64 flags) @@ -266,6 +285,9 @@ void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data, mdc_file_secctx_pack(req, op_data->op_file_secctx_name, 
op_data->op_file_secctx, op_data->op_file_secctx_size); + + /* pack SELinux policy info if any */ + mdc_file_sepol_pack(req); } if (lmm) { @@ -412,6 +434,9 @@ void mdc_unlink_pack(struct ptlrpc_request *req, struct md_op_data *op_data) rec->ul_bias = op_data->op_bias; mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen); + + /* pack SELinux policy info if any */ + mdc_file_sepol_pack(req); } void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data) @@ -434,6 +459,9 @@ void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data) rec->lk_bias = op_data->op_bias; mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen); + + /* pack SELinux policy info if any */ + mdc_file_sepol_pack(req); } static void mdc_close_intent_pack(struct ptlrpc_request *req, @@ -505,6 +533,9 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data, if (new) mdc_pack_name(req, &RMF_SYMTGT, new, newlen); + + /* pack SELinux policy info if any */ + mdc_file_sepol_pack(req); } void mdc_migrate_pack(struct ptlrpc_request *req, struct md_op_data *op_data, diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 6f4baa6..05447ea 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -315,6 +315,16 @@ static int mdc_save_lovea(struct ptlrpc_request *req, req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, RCL_CLIENT, op_data->op_file_secctx_size); + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(req); + if (rc < 0) { + ptlrpc_request_free(req); + return ERR_PTR(rc); + } + req_capsule_set_size(&req->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(req->rq_sepol) ? 
+ strlen(req->rq_sepol) + 1 : 0); + rc = ldlm_prep_enqueue_req(exp, req, &cancels, count); if (rc < 0) { ptlrpc_request_free(req); @@ -422,6 +432,16 @@ static int mdc_save_lovea(struct ptlrpc_request *req, if (!req) return ERR_PTR(-ENOMEM); + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(req); + if (rc < 0) { + ptlrpc_request_free(req); + return ERR_PTR(rc); + } + req_capsule_set_size(&req->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(req->rq_sepol) ? + strlen(req->rq_sepol) + 1 : 0); + rc = ldlm_prep_enqueue_req(exp, req, &cancels, count); if (rc) { ptlrpc_request_free(req); @@ -452,6 +472,9 @@ static int mdc_save_lovea(struct ptlrpc_request *req, mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid, ea_vals_buf_size, -1, 0); + /* get SELinux policy info if any */ + mdc_file_sepol_pack(req); + req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER, GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM); diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 0e5f012..86acb4e 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -197,6 +197,16 @@ int mdc_create(struct obd_export *exp, struct md_op_data *op_data, req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, RCL_CLIENT, op_data->op_file_secctx_size); + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(req); + if (rc < 0) { + ptlrpc_request_free(req); + return rc; + } + req_capsule_set_size(&req->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(req->rq_sepol) ? 
+ strlen(req->rq_sepol) + 1 : 0); + rc = mdc_prep_elc_req(exp, req, MDS_REINT, &cancels, count); if (rc) { ptlrpc_request_free(req); @@ -286,6 +296,16 @@ int mdc_unlink(struct obd_export *exp, struct md_op_data *op_data, req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, op_data->op_namelen + 1); + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(req); + if (rc < 0) { + ptlrpc_request_free(req); + return rc; + } + req_capsule_set_size(&req->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(req->rq_sepol) ? + strlen(req->rq_sepol) + 1 : 0); + rc = mdc_prep_elc_req(exp, req, MDS_REINT, &cancels, count); if (rc) { ptlrpc_request_free(req); @@ -332,6 +352,16 @@ int mdc_link(struct obd_export *exp, struct md_op_data *op_data, req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, op_data->op_namelen + 1); + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(req); + if (rc < 0) { + ptlrpc_request_free(req); + return rc; + } + req_capsule_set_size(&req->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(req->rq_sepol) ? + strlen(req->rq_sepol) + 1 : 0); + rc = mdc_prep_elc_req(exp, req, MDS_REINT, &cancels, count); if (rc) { ptlrpc_request_free(req); @@ -394,6 +424,16 @@ int mdc_rename(struct obd_export *exp, struct md_op_data *op_data, req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT, op_data->op_data_size); + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(req); + if (rc < 0) { + ptlrpc_request_free(req); + return rc; + } + req_capsule_set_size(&req->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(req->rq_sepol) ? 
+ strlen(req->rq_sepol) + 1 : 0); + rc = mdc_prep_elc_req(exp, req, MDS_REINT, &cancels, count); if (rc) { ptlrpc_request_free(req); diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 88e790f0..80e58c8 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -328,11 +328,20 @@ static int mdc_xattr_common(struct obd_export *exp, req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT, xattr_namelen); } - if (input_size) { + if (input_size) LASSERT(input); - req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT, - input_size); + req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT, + input_size); + + /* get SELinux policy info if any */ + rc = sptlrpc_get_sepol(req); + if (rc < 0) { + ptlrpc_request_free(req); + return rc; } + req_capsule_set_size(&req->rq_pill, &RMF_SELINUX_POL, RCL_CLIENT, + strlen(req->rq_sepol) ? + strlen(req->rq_sepol) + 1 : 0); /* Flush local XATTR locks to get rid of a possible cancel RPC */ if (opcode == MDS_REINT && fid_is_sane(fid) && @@ -393,6 +402,8 @@ static int mdc_xattr_common(struct obd_export *exp, memcpy(tmp, input, input_size); } + mdc_file_sepol_pack(req); + if (req_capsule_has_field(&req->rq_pill, &RMF_EADATA, RCL_SERVER)) req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER, output_size); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index f80c627..9a676ae 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -193,7 +193,8 @@ &RMF_EADATA, &RMF_DLM_REQ, &RMF_FILE_SECCTX_NAME, - &RMF_FILE_SECCTX + &RMF_FILE_SECCTX, + &RMF_SELINUX_POL }; static const struct req_msg_field *mds_reint_create_sym_client[] = { @@ -204,7 +205,8 @@ &RMF_SYMTGT, &RMF_DLM_REQ, &RMF_FILE_SECCTX_NAME, - &RMF_FILE_SECCTX + &RMF_FILE_SECCTX, + &RMF_SELINUX_POL }; static const struct req_msg_field *mds_reint_open_client[] = { @@ -215,7 +217,8 @@ &RMF_NAME, &RMF_EADATA, &RMF_FILE_SECCTX_NAME, - &RMF_FILE_SECCTX + &RMF_FILE_SECCTX, + &RMF_SELINUX_POL }; 
static const struct req_msg_field *mds_reint_open_server[] = { @@ -232,7 +235,8 @@ &RMF_REC_REINT, &RMF_CAPA1, &RMF_NAME, - &RMF_DLM_REQ + &RMF_DLM_REQ, + &RMF_SELINUX_POL }; static const struct req_msg_field *mds_reint_link_client[] = { @@ -241,7 +245,8 @@ &RMF_CAPA1, &RMF_CAPA2, &RMF_NAME, - &RMF_DLM_REQ + &RMF_DLM_REQ, + &RMF_SELINUX_POL }; static const struct req_msg_field *mds_reint_rename_client[] = { @@ -251,7 +256,8 @@ &RMF_CAPA2, &RMF_NAME, &RMF_SYMTGT, - &RMF_DLM_REQ + &RMF_DLM_REQ, + &RMF_SELINUX_POL }; static const struct req_msg_field *mds_reint_migrate_client[] = { @@ -262,6 +268,7 @@ &RMF_NAME, &RMF_SYMTGT, &RMF_DLM_REQ, + &RMF_SELINUX_POL, &RMF_MDT_EPOCH, &RMF_CLOSE_DATA, &RMF_EADATA @@ -292,7 +299,8 @@ &RMF_CAPA1, &RMF_NAME, &RMF_EADATA, - &RMF_DLM_REQ + &RMF_DLM_REQ, + &RMF_SELINUX_POL }; static const struct req_msg_field *mds_reint_resync[] = { @@ -450,7 +458,8 @@ &RMF_NAME, &RMF_EADATA, &RMF_FILE_SECCTX_NAME, - &RMF_FILE_SECCTX + &RMF_FILE_SECCTX, + &RMF_SELINUX_POL }; static const struct req_msg_field *ldlm_intent_open_client[] = { @@ -463,7 +472,8 @@ &RMF_NAME, &RMF_EADATA, &RMF_FILE_SECCTX_NAME, - &RMF_FILE_SECCTX + &RMF_FILE_SECCTX, + &RMF_SELINUX_POL }; static const struct req_msg_field *ldlm_intent_getxattr_client[] = { @@ -472,6 +482,7 @@ &RMF_LDLM_INTENT, &RMF_MDT_BODY, &RMF_CAPA1, + &RMF_SELINUX_POL }; static const struct req_msg_field *ldlm_intent_getxattr_server[] = { @@ -496,7 +507,8 @@ &RMF_MDT_BODY, &RMF_CAPA1, &RMF_NAME, - &RMF_EADATA + &RMF_EADATA, + &RMF_SELINUX_POL }; static const struct req_msg_field *mds_getxattr_server[] = { From patchwork Thu Feb 27 21:11:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410075 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DF494138D for ; Thu, 27 Feb 2020 21:29:35 +0000 
(UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C8480246A0 for ; Thu, 27 Feb 2020 21:29:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C8480246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69976349494; Thu, 27 Feb 2020 13:25:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E0C1021FD01 for ; Thu, 27 Feb 2020 13:19:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 334B72C67; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 320F346F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:53 -0500 Message-Id: <1582838290-17243-246-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 245/622] lustre: obd: make health_check sysfs compliant X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The patch http://review.whamcloud.com/16721 was ported to the upstream client but was rejected because it violated the sysfs one-item-per-file rule. Change the reporting of LBUG plus unhealthy to just reporting LBUG. Move the reporting of which device is unhealthy to a new debugfs file that mirrors the sysfs file. WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: 5d368bd0b203 ("LU-8066 obd: make health_check sysfs compliant") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/25631 Reviewed-by: Andreas Dilger Reviewed-by: Emoly Liu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obd_sysfs.c | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c index 73e44e7..ca15936 100644 --- a/fs/lustre/obdclass/obd_sysfs.c +++ b/fs/lustre/obdclass/obd_sysfs.c @@ -194,8 +194,12 @@ static ssize_t pinger_show(struct kobject *kobj, struct attribute *attr, if (obd_health_check(NULL, obd)) healthy = false; + class_decref(obd, __func__, current); read_lock(&obd_dev_lock); + + if (!healthy) + break; } read_unlock(&obd_dev_lock); @@ -363,6 +367,40 @@ static int obd_device_list_open(struct inode *inode, struct file *file) .release = seq_release, }; +static int +health_check_seq_show(struct seq_file *m, void *unused) +{ + int i; + + read_lock(&obd_dev_lock); + for (i = 0; i < class_devno_max(); i++) { + struct obd_device *obd; + + obd = class_num2obd(i); + if (!obd || !obd->obd_attached || !obd->obd_set_up) + continue; + + LASSERT(obd->obd_magic == OBD_DEVICE_MAGIC); + if (obd->obd_stopping) + continue; + + class_incref(obd, __func__, current); + read_unlock(&obd_dev_lock); + + if (obd_health_check(NULL, obd)) { + seq_printf(m,
"device %s reported unhealthy\n", + obd->obd_name); + } + class_decref(obd, __func__, current); + read_lock(&obd_dev_lock); + } + read_unlock(&obd_dev_lock); + + return 0; +} + +LPROC_SEQ_FOPS_RO(health_check); + struct kset *lustre_kset; EXPORT_SYMBOL_GPL(lustre_kset); @@ -407,6 +445,9 @@ int class_procfs_init(void) debugfs_create_file("devices", 0444, debugfs_lustre_root, NULL, &obd_device_list_fops); + + debugfs_create_file("health_check", 0444, debugfs_lustre_root, + NULL, &health_check_fops); out: return rc; } From patchwork Thu Feb 27 21:11:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410069 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FDC31580 for ; Thu, 27 Feb 2020 21:29:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EC5EB246A0 for ; Thu, 27 Feb 2020 21:29:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EC5EB246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 61EA9348D48; Thu, 27 Feb 2020 13:25:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2FD4221FD0F for ; Thu, 27 Feb 2020 13:19:33 -0800 (PST) Received: from star.ccs.ornl.gov 
(star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 36A7C2C68; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3518546A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:54 -0500 Message-Id: <1582838290-17243-247-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 246/622] lustre: misc: delete OBD_IOC_PING_TARGET ioctl X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The OBD_IOC_PING_TARGET ioctl was removed from tool usage in Lustre v2_5_60_0-27-g122aadd and replaced with a sysfs interface. It is no longer needed and can be removed. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6202 Lustre-commit: d17d6ef74e52 ("LU-6202 misc: delete OBD_IOC_PING_TARGET ioctl") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33691 Reviewed-by: James Simmons Reviewed-by: Emoly Liu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_request.c | 4 +--- fs/lustre/obdclass/class_obd.c | 4 ++-- fs/lustre/osc/osc_request.c | 25 +++++++++++-------------- include/uapi/linux/lustre/lustre_ioctl.h | 2 +- 4 files changed, 15 insertions(+), 20 deletions(-) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 80e58c8..f197abc 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -2114,9 +2114,7 @@ static int mdc_iocontrol(unsigned int cmd, struct obd_export *exp, int len, case IOC_OSC_SET_ACTIVE: rc = ptlrpc_set_import_active(imp, data->ioc_offset); goto out; - case OBD_IOC_PING_TARGET: - rc = ptlrpc_obd_ping(obd); - goto out; + /* * Normally IOC_OBD_STATFS, OBD_IOC_QUOTACTL iocontrol are handled by * LMV instead of MDC. 
But when the cluster is upgraded from 1.8, diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c index 0435f62..373a8d2 100644 --- a/fs/lustre/obdclass/class_obd.c +++ b/fs/lustre/obdclass/class_obd.c @@ -510,8 +510,8 @@ int class_handle_ioctl(unsigned int cmd, unsigned long arg) static long obd_class_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { - /* Allow non-root access for OBD_IOC_PING_TARGET - used by lfs check */ - if (!capable(CAP_SYS_ADMIN) && (cmd != OBD_IOC_PING_TARGET)) + /* Allow non-root access for some limited ioctls */ + if (!capable(CAP_SYS_ADMIN)) return -EACCES; if ((cmd & 0xffffff00) == ((int)'T') << 8) /* ignore all tty ioctls */ diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 1fc50cc..7a99ef2 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -2840,7 +2840,7 @@ static int osc_iocontrol(unsigned int cmd, struct obd_export *exp, int len, { struct obd_device *obd = exp->exp_obd; struct obd_ioctl_data *data = karg; - int err = 0; + int rc = 0; if (!try_module_get(THIS_MODULE)) { CERROR("%s: cannot get module '%s'\n", obd->obd_name, @@ -2849,27 +2849,24 @@ static int osc_iocontrol(unsigned int cmd, struct obd_export *exp, int len, } switch (cmd) { case OBD_IOC_CLIENT_RECOVER: - err = ptlrpc_recover_import(obd->u.cli.cl_import, - data->ioc_inlbuf1, 0); - if (err > 0) - err = 0; + rc = ptlrpc_recover_import(obd->u.cli.cl_import, + data->ioc_inlbuf1, 0); + if (rc > 0) + rc = 0; goto out; case IOC_OSC_SET_ACTIVE: - err = ptlrpc_set_import_active(obd->u.cli.cl_import, - data->ioc_offset); - goto out; - case OBD_IOC_PING_TARGET: - err = ptlrpc_obd_ping(obd); + rc = ptlrpc_set_import_active(obd->u.cli.cl_import, + data->ioc_offset); goto out; default: - CDEBUG(D_INODE, "unrecognised ioctl %#x by %s\n", - cmd, current->comm); - err = -ENOTTY; + CDEBUG(D_INODE, "%s: unrecognised ioctl %#x by %s\n", + obd->obd_name, cmd, current->comm); + rc = -ENOTTY; goto out; } 
out: module_put(THIS_MODULE); - return err; + return rc; } int osc_set_info_async(const struct lu_env *env, struct obd_export *exp, diff --git a/include/uapi/linux/lustre/lustre_ioctl.h b/include/uapi/linux/lustre/lustre_ioctl.h index 8289d43..30eb120 100644 --- a/include/uapi/linux/lustre/lustre_ioctl.h +++ b/include/uapi/linux/lustre/lustre_ioctl.h @@ -162,7 +162,7 @@ static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data) #define OBD_IOC_GETDTNAME OBD_IOC_GETNAME #define OBD_IOC_LOV_GET_CONFIG _IOWR('f', 132, OBD_IOC_DATA_TYPE) #define OBD_IOC_CLIENT_RECOVER _IOW('f', 133, OBD_IOC_DATA_TYPE) -#define OBD_IOC_PING_TARGET _IOW('f', 136, OBD_IOC_DATA_TYPE) +/* was OBD_IOC_PING_TARGET _IOW('f', 136, OBD_IOC_DATA_TYPE) until 2.11 */ /* OBD_IOC_DEC_FS_USE_COUNT _IO('f', 139) */ #define OBD_IOC_NO_TRANSNO _IOW('f', 140, OBD_IOC_DATA_TYPE) From patchwork Thu Feb 27 21:11:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410315 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5B13138D for ; Thu, 27 Feb 2020 21:34:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BE66B24677 for ; Thu, 27 Feb 2020 21:34:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BE66B24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 55E23349FCB; Thu, 27 Feb 2020 13:29:19 -0800 
(PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 89FA121FD0F for ; Thu, 27 Feb 2020 13:19:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 392722C69; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 38117468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:55 -0500 Message-Id: <1582838290-17243-248-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 247/622] lustre: misc: remove LIBCFS_IOC_DEBUG_MASK ioctl X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove the LIBCFS_IOC_DEBUG_MASK ioctl, since the debug and subsystem masks have been modifiable via sysfs for a long time, and tools have not used this ioctl since 2.6.
WC-bug-id: https://jira.whamcloud.com/browse/LU-6202 Lustre-commit: 70f932c7bfc5 ("LU-6202 misc: remove LIBCFS_IOC_DEBUG_MASK ioctl") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/33692 Reviewed-by: Patrick Farrell Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/class_obd.c | 9 --------- include/uapi/linux/lnet/libcfs_ioctl.h | 8 -------- include/uapi/linux/lustre/lustre_ioctl.h | 2 +- 3 files changed, 1 insertion(+), 18 deletions(-) diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c index 373a8d2..609b4cc 100644 --- a/fs/lustre/obdclass/class_obd.c +++ b/fs/lustre/obdclass/class_obd.c @@ -274,18 +274,9 @@ int obd_ioctl_getdata(struct obd_ioctl_data **datap, int *len, void __user *arg) int class_handle_ioctl(unsigned int cmd, unsigned long arg) { struct obd_ioctl_data *data; - struct libcfs_debug_ioctl_data *debug_data; struct obd_device *obd = NULL; int err = 0, len = 0; - /* only for debugging */ - if (cmd == LIBCFS_IOC_DEBUG_MASK) { - debug_data = (struct libcfs_debug_ioctl_data *)arg; - libcfs_subsystem_debug = debug_data->subs; - libcfs_debug = debug_data->debug; - return 0; - } - CDEBUG(D_IOCTL, "cmd = %x\n", cmd); if (obd_ioctl_getdata(&data, &len, (void __user *)arg)) { CERROR("OBD ioctl: data error\n"); diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index dfb73f7..455ed78 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -77,14 +77,6 @@ struct libcfs_ioctl_data { char ioc_bulk[0]; }; -struct libcfs_debug_ioctl_data { - struct libcfs_ioctl_hdr hdr; - unsigned int subs; - unsigned int debug; -}; - -/* 'f' ioctls are defined in lustre_ioctl.h and lustre_user.h except for: */ -#define LIBCFS_IOC_DEBUG_MASK _IOWR('f', 250, long) #define IOCTL_LIBCFS_TYPE long #define IOC_LIBCFS_TYPE ('e') diff --git a/include/uapi/linux/lustre/lustre_ioctl.h 
b/include/uapi/linux/lustre/lustre_ioctl.h index 30eb120..b067cc6 100644 --- a/include/uapi/linux/lustre/lustre_ioctl.h +++ b/include/uapi/linux/lustre/lustre_ioctl.h @@ -222,7 +222,7 @@ static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data) #define OBD_IOC_STOP_LFSCK _IOW('f', 231, OBD_IOC_DATA_TYPE) #define OBD_IOC_QUERY_LFSCK _IOR('f', 232, struct obd_ioctl_data) /* lustre/lustre_user.h 240-249 */ -/* LIBCFS_IOC_DEBUG_MASK 250 */ +/* was LIBCFS_IOC_DEBUG_MASK _IOWR('f', 250, long) until 2.11 */ #define IOC_OSC_SET_ACTIVE _IOWR('h', 21, void *) From patchwork Thu Feb 27 21:11:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410079 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 95E2317E0 for ; Thu, 27 Feb 2020 21:29:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7E7EA246A1 for ; Thu, 27 Feb 2020 21:29:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7E7EA246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8344D3494C1; Thu, 27 Feb 2020 13:25:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CE0B521FD16 for ; Thu, 27 Feb 2020 13:19:33 -0800 
(PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3C8212C6A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3AFE746C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:56 -0500 Message-Id: <1582838290-17243-249-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 248/622] lustre: llite: add file heat support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Xi File heat is a special attribute of files/objects which reflects the access frequency of the files/objects. File heat is mainly designed for cache management. Caches like PCC can use file heat to determine which files to remove from the cache or which files to fetch into the cache. This patch adds file heat support at the llite level.
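The heat calculation this patch introduces can be illustrated with a minimal sketch. It assumes the recursion given in the patch's obd_heat_decay() comment, H(i+1) = (1-P)*H(i) + P*C(i), with the weight P stored in units of 1/256 as in SBI_DEFAULT_HEAT_DECAY_WEIGHT. The helper names heat_step() and heat_percent_to_weight() are hypothetical and are not part of the patch; this is plain userspace C, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the patch hard-codes 256 as the fixed-point scale. */
#define HEAT_WEIGHT_SCALE 256u

/*
 * Apply one period of the recursion H_next = (1 - P) * H + P * C,
 * where P = weight / 256, heat is H and count is C.
 * Integer division truncates, as in the patch's arithmetic.
 */
uint64_t heat_step(uint64_t heat, uint64_t count, unsigned int weight)
{
	assert(weight <= HEAT_WEIGHT_SCALE);
	return heat * (HEAT_WEIGHT_SCALE - weight) / HEAT_WEIGHT_SCALE +
	       count * weight / HEAT_WEIGHT_SCALE;
}

/*
 * Convert a percentage in the range 0..100 to a weight in 1/256 units,
 * rounding to nearest (same expression as the patch's default weight).
 */
unsigned int heat_percent_to_weight(unsigned int percent)
{
	assert(percent <= 100);
	return (percent * HEAT_WEIGHT_SCALE + 50) / 100;
}
```

With the default setting of 80 percent the weight comes out as 205. Under that weight, a period with 256 accesses on a file whose heat is 0 yields a heat of 205, and a following idle period decays a heat of 256 to 51.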
WC-bug-id: https://jira.whamcloud.com/browse/LU-10602 Lustre-commit: ae723cf8161f ("LU-10602 llite: add file heat support") Signed-off-by: Li Xi Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/34399 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 11 ++++ fs/lustre/include/obd_support.h | 6 ++ fs/lustre/llite/file.c | 104 ++++++++++++++++++++++++++++++- fs/lustre/llite/llite_internal.h | 20 +++++- fs/lustre/llite/llite_lib.c | 6 ++ fs/lustre/llite/lproc_llite.c | 106 ++++++++++++++++++++++++++++++++ fs/lustre/obdclass/class_obd.c | 73 ++++++++++++++++++++++ include/uapi/linux/lustre/lustre_user.h | 32 ++++++++++ 8 files changed, 356 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 6a4b6a5..6cddc4f 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1710,4 +1710,15 @@ struct root_squash_info { struct obd_ioctl_data; int obd_ioctl_getdata(struct obd_ioctl_data **data, int *len, void __user *arg); +extern void obd_heat_add(struct obd_heat_instance *instance, + unsigned int time_second, u64 count, + unsigned int weight, unsigned int period_second); +extern void obd_heat_decay(struct obd_heat_instance *instance, + u64 time_second, unsigned int weight, + unsigned int period_second); +extern u64 obd_heat_get(struct obd_heat_instance *instance, + unsigned int time_second, unsigned int weight, + unsigned int period_second); +extern void obd_heat_clear(struct obd_heat_instance *instance, int count); + #endif /* __LINUX_OBD_CLASS_H */ diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index a60fa07..36955e8 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -536,4 +536,10 @@ (keylen >= (sizeof(str) - 1) && \ memcmp(key, str, (sizeof(str) - 1)) == 0) +struct obd_heat_instance { + 
u64 ohi_heat; + u64 ohi_time_second; + u64 ohi_count; +}; + #endif diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 7ec1099..f5b5eec 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1399,6 +1399,37 @@ static void ll_io_init(struct cl_io *io, const struct file *file, int write) ll_io_set_mirror(io, file); } +static void ll_heat_add(struct inode *inode, enum cl_io_type iot, + u64 count) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct ll_sb_info *sbi = ll_i2sbi(inode); + enum obd_heat_type sample_type; + enum obd_heat_type iobyte_type; + u64 now = ktime_get_real_seconds(); + + if (!ll_sbi_has_file_heat(sbi) || + lli->lli_heat_flags & LU_HEAT_FLAG_OFF) + return; + + if (iot == CIT_READ) { + sample_type = OBD_HEAT_READSAMPLE; + iobyte_type = OBD_HEAT_READBYTE; + } else if (iot == CIT_WRITE) { + sample_type = OBD_HEAT_WRITESAMPLE; + iobyte_type = OBD_HEAT_WRITEBYTE; + } else { + return; + } + + spin_lock(&lli->lli_heat_lock); + obd_heat_add(&lli->lli_heat_instances[sample_type], now, 1, + sbi->ll_heat_decay_weight, sbi->ll_heat_period_second); + obd_heat_add(&lli->lli_heat_instances[iobyte_type], now, count, + sbi->ll_heat_decay_weight, sbi->ll_heat_period_second); + spin_unlock(&lli->lli_heat_lock); +} + static ssize_t ll_file_io_generic(const struct lu_env *env, struct vvp_io_args *args, struct file *file, enum cl_io_type iot, @@ -1512,6 +1543,8 @@ static void ll_io_init(struct cl_io *io, const struct file *file, int write) } } CDEBUG(D_VFSTRACE, "iot: %d, result: %zd\n", iot, result); + if (result > 0) + ll_heat_add(file_inode(file), iot, result); return result > 0 ? 
result : rc; } @@ -1575,9 +1608,11 @@ static void ll_io_init(struct cl_io *io, const struct file *file, int write) if (result == -ENODATA) result = 0; - if (result > 0) + if (result > 0) { + ll_heat_add(file_inode(iocb->ki_filp), CIT_READ, result); ll_stats_ops_tally(ll_i2sbi(file_inode(iocb->ki_filp)), LPROC_LL_READ_BYTES, result); + } return result; } @@ -1660,6 +1695,7 @@ static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter) result = 0; if (result > 0) { + ll_heat_add(inode, CIT_WRITE, result); ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_WRITE_BYTES, result); set_bit(LLIF_DATA_MODIFIED, &ll_i2info(inode)->lli_flags); @@ -3128,6 +3164,41 @@ static long ll_file_set_lease(struct file *file, struct ll_ioc_lease *ioc, return rc; } +static void ll_heat_get(struct inode *inode, struct lu_heat *heat) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct ll_sb_info *sbi = ll_i2sbi(inode); + u64 now = ktime_get_real_seconds(); + int i; + + spin_lock(&lli->lli_heat_lock); + heat->lh_flags = lli->lli_heat_flags; + for (i = 0; i < heat->lh_count; i++) + heat->lh_heat[i] = obd_heat_get(&lli->lli_heat_instances[i], + now, sbi->ll_heat_decay_weight, + sbi->ll_heat_period_second); + spin_unlock(&lli->lli_heat_lock); +} + +static int ll_heat_set(struct inode *inode, u64 flags) +{ + struct ll_inode_info *lli = ll_i2info(inode); + int rc = 0; + + spin_lock(&lli->lli_heat_lock); + if (flags & LU_HEAT_FLAG_CLEAR) + obd_heat_clear(lli->lli_heat_instances, OBD_HEAT_COUNT); + + if (flags & LU_HEAT_FLAG_OFF) + lli->lli_heat_flags |= LU_HEAT_FLAG_OFF; + else + lli->lli_heat_flags &= ~LU_HEAT_FLAG_OFF; + + spin_unlock(&lli->lli_heat_lock); + + return rc; +} + static long ll_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -3510,6 +3581,37 @@ static long ll_file_set_lease(struct file *file, struct ll_ioc_lease *ioc, return ll_ioctl_fssetxattr(inode, cmd, arg); case BLKSSZGET: return put_user(PAGE_SIZE, (int __user *)arg); + case 
LL_IOC_HEAT_GET: { + struct lu_heat uheat; + struct lu_heat *heat; + int size; + + if (copy_from_user(&uheat, (void __user *)arg, sizeof(uheat))) + return -EFAULT; + + if (uheat.lh_count > OBD_HEAT_COUNT) + uheat.lh_count = OBD_HEAT_COUNT; + + size = offsetof(typeof(uheat), lh_heat[uheat.lh_count]); + heat = kzalloc(size, GFP_KERNEL); + if (!heat) + return -ENOMEM; + + heat->lh_count = uheat.lh_count; + ll_heat_get(inode, heat); + rc = copy_to_user((char __user *)arg, heat, size); + kfree(heat); + return rc ? -EFAULT : 0; + } + case LL_IOC_HEAT_SET: { + u64 flags; + + if (copy_from_user(&flags, (void __user *)arg, sizeof(flags))) + return -EFAULT; + + rc = ll_heat_set(inode, flags); + return rc; + } default: return obd_iocontrol(cmd, ll_i2dtexp(inode), 0, NULL, (void __user *)arg); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 3c81c3b..5a0a5ed 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -196,6 +196,11 @@ struct ll_inode_info { /* for writepage() only to communicate to fsync */ int lli_async_rc; + /* protect the file heat fields */ + spinlock_t lli_heat_lock; + u32 lli_heat_flags; + struct obd_heat_instance lli_heat_instances[OBD_HEAT_COUNT]; + /* * Whenever a process try to read/write the file, the * jobid of the process will be saved here, and it'll @@ -418,7 +423,7 @@ enum stats_track_type { * create */ #define LL_SBI_TINY_WRITE 0x2000000 /* tiny write support */ - +#define LL_SBI_FILE_HEAT 0x4000000 /* file heat support */ #define LL_SBI_FLAGS { \ "nolck", \ "checksum", \ @@ -446,6 +451,7 @@ enum stats_track_type { "file_secctx", \ "pio", \ "tiny_write", \ + "file_heat", \ } /* @@ -546,8 +552,15 @@ struct ll_sb_info { struct kset ll_kset; /* sysfs object */ struct completion ll_kobj_unregister; + + /* File heat */ + unsigned int ll_heat_decay_weight; + unsigned int ll_heat_period_second; }; +#define SBI_DEFAULT_HEAT_DECAY_WEIGHT ((80 * 256 + 50) / 100) +#define 
SBI_DEFAULT_HEAT_PERIOD_SECOND (60) + /* * per file-descriptor read-ahead data. */ @@ -710,6 +723,11 @@ static inline bool ll_sbi_has_tiny_write(struct ll_sb_info *sbi) return !!(sbi->ll_flags & LL_SBI_TINY_WRITE); } +static inline bool ll_sbi_has_file_heat(struct ll_sb_info *sbi) +{ + return !!(sbi->ll_flags & LL_SBI_FILE_HEAT); +} + void ll_ras_enter(struct file *f); /* llite/lcommon_misc.c */ diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 10d9180..795a1f1 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -133,6 +133,9 @@ static struct ll_sb_info *ll_init_sbi(void) INIT_LIST_HEAD(&sbi->ll_squash.rsi_nosquash_nids); spin_lock_init(&sbi->ll_squash.rsi_lock); + /* Per-filesystem file heat */ + sbi->ll_heat_decay_weight = SBI_DEFAULT_HEAT_DECAY_WEIGHT; + sbi->ll_heat_period_second = SBI_DEFAULT_HEAT_PERIOD_SECOND; return sbi; } @@ -949,6 +952,9 @@ void ll_lli_init(struct ll_inode_info *lli) INIT_LIST_HEAD(&lli->lli_agl_list); lli->lli_agl_index = 0; lli->lli_async_rc = 0; + spin_lock_init(&lli->lli_heat_lock); + obd_heat_clear(lli->lli_heat_instances, OBD_HEAT_COUNT); + lli->lli_heat_flags = 0; } mutex_init(&lli->lli_layout_mutex); memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 4060271..596aad8 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1096,6 +1096,109 @@ static ssize_t fast_read_store(struct kobject *kobj, } LUSTRE_RW_ATTR(fast_read); +static ssize_t file_heat_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + return snprintf(buf, PAGE_SIZE, "%u\n", + !!(sbi->ll_flags & LL_SBI_FILE_HEAT)); +} + +static ssize_t file_heat_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + 
ll_kset.kobj); + bool val; + int rc; + + rc = kstrtobool(buffer, &val); + if (rc) + return rc; + + spin_lock(&sbi->ll_lock); + if (val) + sbi->ll_flags |= LL_SBI_FILE_HEAT; + else + sbi->ll_flags &= ~LL_SBI_FILE_HEAT; + spin_unlock(&sbi->ll_lock); + + return count; +} +LUSTRE_RW_ATTR(file_heat); + +static ssize_t heat_decay_percentage_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + return snprintf(buf, PAGE_SIZE, "%u\n", + (sbi->ll_heat_decay_weight * 100 + 128) / 256); +} + +static ssize_t heat_decay_percentage_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + unsigned long val; + int rc; + + rc = kstrtoul(buffer, 10, &val); + if (rc) + return rc; + + if (val < 0 || val > 100) + return -ERANGE; + + sbi->ll_heat_decay_weight = (val * 256 + 50) / 100; + + return count; +} +LUSTRE_RW_ATTR(heat_decay_percentage); + +static ssize_t heat_period_second_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + return snprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_heat_period_second); +} + +static ssize_t heat_period_second_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + unsigned long val; + int rc; + + rc = kstrtoul(buffer, 10, &val); + if (rc) + return rc; + + if (val <= 0) + return -ERANGE; + + sbi->ll_heat_period_second = val; + + return count; +} +LUSTRE_RW_ATTR(heat_period_second); + static int ll_unstable_stats_seq_show(struct seq_file *m, void *v) { struct super_block *sb = m->private; @@ -1264,6 +1367,9 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file, &lustre_attr_xattr_cache.attr, 
&lustre_attr_fast_read.attr, &lustre_attr_tiny_write.attr, + &lustre_attr_file_heat.attr, + &lustre_attr_heat_decay_percentage.attr, + &lustre_attr_heat_period_second.attr, NULL, }; diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c index 609b4cc..0718fdb 100644 --- a/fs/lustre/obdclass/class_obd.c +++ b/fs/lustre/obdclass/class_obd.c @@ -706,6 +706,79 @@ static void obdclass_exit(void) obd_zombie_impexp_stop(); } +void obd_heat_clear(struct obd_heat_instance *instance, int count) +{ + memset(instance, 0, sizeof(*instance) * count); +} +EXPORT_SYMBOL(obd_heat_clear); + +/* + * The file heat is calculated for every time interval period I. The access + * frequency during each period is counted. The file heat is only recalculated + * at the end of a time period. And a percentage of the former file heat is + * lost when recalculated. The recursion formula to calculate the heat of the + * file f is as follow: + * + * Hi+1(f) = (1-P)*Hi(f)+ P*Ci + * + * Where Hi is the heat value in the period between time points i*I and + * (i+1)*I; Ci is the access count in the period; the symbol P refers to the + * weight of Ci. The larger the value the value of P is, the more influence Ci + * has on the file heat. 
+ */ +void obd_heat_decay(struct obd_heat_instance *instance, u64 time_second, + unsigned int weight, unsigned int period_second) +{ + u64 second; + + if (instance->ohi_time_second > time_second) { + obd_heat_clear(instance, 1); + return; + } + + if (instance->ohi_time_second == 0) + return; + + for (second = instance->ohi_time_second + period_second; + second < time_second; + second += period_second) { + instance->ohi_heat = instance->ohi_heat * + (256 - weight) / 256 + + instance->ohi_count * weight / 256; + instance->ohi_count = 0; + instance->ohi_time_second = second; + } +} +EXPORT_SYMBOL(obd_heat_decay); + +u64 obd_heat_get(struct obd_heat_instance *instance, unsigned int time_second, + unsigned int weight, unsigned int period_second) +{ + obd_heat_decay(instance, time_second, weight, period_second); + + if (instance->ohi_count == 0) + return instance->ohi_heat; + + return instance->ohi_heat * (256 - weight) / 256 + + instance->ohi_count * weight / 256; +} +EXPORT_SYMBOL(obd_heat_get); + +void obd_heat_add(struct obd_heat_instance *instance, + unsigned int time_second, u64 count, + unsigned int weight, unsigned int period_second) +{ + obd_heat_decay(instance, time_second, weight, period_second); + if (instance->ohi_time_second == 0) { + instance->ohi_time_second = time_second; + instance->ohi_heat = 0; + instance->ohi_count = count; + } else { + instance->ohi_count += count; + } +} +EXPORT_SYMBOL(obd_heat_add); + MODULE_AUTHOR("OpenSFS, Inc. 
"); MODULE_DESCRIPTION("Lustre Class Driver"); MODULE_VERSION(LUSTRE_VERSION_STRING); diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index c1e9dca..1d402f1 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -352,6 +352,8 @@ struct ll_ioc_lease_id { #define LL_IOC_FID2MDTIDX _IOWR('f', 248, struct lu_fid) #define LL_IOC_GETPARENT _IOWR('f', 249, struct getparent) #define LL_IOC_LADVISE _IOR('f', 250, struct llapi_lu_ladvise) +#define LL_IOC_HEAT_GET _IOWR('f', 251, struct lu_heat) +#define LL_IOC_HEAT_SET _IOW('f', 252, long) #define LL_STATFS_LMV 1 #define LL_STATFS_LOV 2 @@ -1957,6 +1959,36 @@ enum lockahead_results { LLA_RESULT_SAME, }; +enum lu_heat_flag_bit { + LU_HEAT_FLAG_BIT_INVALID = 0, + LU_HEAT_FLAG_BIT_OFF, + LU_HEAT_FLAG_BIT_CLEAR, +}; + +#define LU_HEAT_FLAG_CLEAR (1 << LU_HEAT_FLAG_BIT_CLEAR) +#define LU_HEAT_FLAG_OFF (1 << LU_HEAT_FLAG_BIT_OFF) + +enum obd_heat_type { + OBD_HEAT_READSAMPLE = 0, + OBD_HEAT_WRITESAMPLE = 1, + OBD_HEAT_READBYTE = 2, + OBD_HEAT_WRITEBYTE = 3, + OBD_HEAT_COUNT +}; + +#define LU_HEAT_NAMES { \ + [OBD_HEAT_READSAMPLE] = "readsample", \ + [OBD_HEAT_WRITESAMPLE] = "writesample", \ + [OBD_HEAT_READBYTE] = "readbyte", \ + [OBD_HEAT_WRITEBYTE] = "writebyte", \ +} + +struct lu_heat { + __u32 lh_count; + __u32 lh_flags; + __u64 lh_heat[0]; +}; + /** @} lustreuser */ #endif /* _LUSTRE_USER_H */ From patchwork Thu Feb 27 21:11:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410285 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1B76592A for ; Thu, 27 Feb 2020 21:34:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0449A24677 for ; Thu, 27 Feb 2020 21:34:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0449A24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CE835349598; Thu, 27 Feb 2020 13:28:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 313F521FD16 for ; Thu, 27 Feb 2020 13:19:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3F0112C6B; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3E0A546D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:57 -0500 Message-Id: <1582838290-17243-250-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 249/622] lustre: obdclass: improve llog config record message X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Improve the config record message in class_config_parse_rec() by removing the newline and formatting to match the other entries for the output dump buffer. WC-bug-id: https://jira.whamcloud.com/browse/LU-11566 Lustre-commit: 2ec11b04dd76 ("LU-11566 utils: improve usage/docs for lctl llog commands") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/34004 Reviewed-by: Joseph Gmitter Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obd_config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c index 398f888..4b1848f 100644 --- a/fs/lustre/obdclass/obd_config.c +++ b/fs/lustre/obdclass/obd_config.c @@ -1561,7 +1561,7 @@ static int class_config_parse_rec(struct llog_rec_hdr *rec, char *buf, char nidstr[LNET_NIDSTR_SIZE]; libcfs_nid2str_r(lcfg->lcfg_nid, nidstr, sizeof(nidstr)); - ptr += snprintf(ptr, end - ptr, "nid=%s(%#llx)\n ", + ptr += snprintf(ptr, end - ptr, "nid=%s(%#llx) ", nidstr, lcfg->lcfg_nid); } From patchwork Thu Feb 27 21:11:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410289 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9BA8138D for ; Thu, 27 Feb 2020 21:34:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A297824677 for ; Thu, 27 Feb 2020 21:34:07 +0000 (UTC) 
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A297824677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3086B349E56; Thu, 27 Feb 2020 13:28:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 71EF321FD16 for ; Thu, 27 Feb 2020 13:19:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 435712C6C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 40F0446F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:58 -0500 Message-Id: <1582838290-17243-251-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 250/622] lustre: lov: remove KEY_CACHE_SET to simplify the code X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng We must invoke obd_set_info_async with KEY_CACHE_SET after obd_connect for an OSC device. In fact, it can be combined in obd_connect to simplify the code. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12072 Lustre-commit: 6d21fbbf018b ("LU-12072 lov: remove KEY_CACHE_SET to simplify the code") Signed-off-by: Yang Sheng Reviewed-on: https://review.whamcloud.com/34419 Reviewed-by: Andreas Dilger Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 2 +- fs/lustre/ldlm/ldlm_lib.c | 14 +++++++++++++ fs/lustre/llite/llite_lib.c | 13 ++---------- fs/lustre/lmv/lmv_obd.c | 3 ++- fs/lustre/lov/lov_obd.c | 49 ++++++++++++++------------------------------- fs/lustre/osc/osc_request.c | 17 ---------------- 6 files changed, 34 insertions(+), 64 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 758efc1..2195f85 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -446,6 +446,7 @@ struct lmv_obd { struct lmv_tgt_desc **tgts; struct obd_connect_data conn_data; struct kobject *lmv_tgts_kobj; + void *lmv_cache; }; struct niobuf_local { @@ -672,7 +673,6 @@ struct obd_device { /* KEY_SET_INFO in lustre_idl.h */ #define KEY_SPTLRPC_CONF "sptlrpc_conf" -#define KEY_CACHE_SET "cache_set" #define KEY_CACHE_LRU_SHRINK "cache_lru_shrink" /* Flags for op_xvalid */ diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 11955b1..4a982ab 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -40,6 +40,7 @@ #define DEBUG_SUBSYSTEM S_LDLM +#include #include #include #include @@ -579,6 +580,19 @@ int client_connect_import(const struct lu_env *env, out_sem: up_write(&cli->cl_sem); + if (!rc && localdata) { + LASSERT(!cli->cl_cache); /* only once */ + cli->cl_cache = (struct cl_client_cache *)localdata; + cl_cache_incref(cli->cl_cache); + cli->cl_lru_left = &cli->cl_cache->ccc_lru_left; + + /* add this osc into entity list */ + LASSERT(list_empty(&cli->cl_lru_osc)); + spin_lock(&cli->cl_cache->ccc_lru_lock); + list_add(&cli->cl_lru_osc, &cli->cl_cache->ccc_lru); + spin_unlock(&cli->cl_cache->ccc_lru_lock); + } 
+ return rc; } EXPORT_SYMBOL(client_connect_import); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 795a1f1..57486b4 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -266,7 +266,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_brw_size = MD_MAX_BRW_SIZE; err = obd_connect(NULL, &sbi->ll_md_exp, sbi->ll_md_obd, - &sbi->ll_sb_uuid, data, NULL); + &sbi->ll_sb_uuid, data, sbi->ll_cache); if (err == -EBUSY) { LCONSOLE_ERROR_MSG(0x14f, "An MDT (md %s) is performing recovery, of which this client is not a part. Please wait for recovery to complete, abort, or time out.\n", @@ -462,7 +462,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_brw_size = DT_MAX_BRW_SIZE; err = obd_connect(NULL, &sbi->ll_dt_exp, sbi->ll_dt_obd, - &sbi->ll_sb_uuid, data, NULL); + &sbi->ll_sb_uuid, data, sbi->ll_cache); if (err == -EBUSY) { LCONSOLE_ERROR_MSG(0x150, "An OST (dt %s) is performing recovery, of which this client is not a part. 
Please wait for recovery to complete, abort, or time out.\n", @@ -583,15 +583,6 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) } cl_sb_init(sb); - err = obd_set_info_async(NULL, sbi->ll_dt_exp, sizeof(KEY_CACHE_SET), - KEY_CACHE_SET, sizeof(*sbi->ll_cache), - sbi->ll_cache, NULL); - if (err) { - CERROR("%s: Set cache_set failed: rc = %d\n", - sbi->ll_dt_exp->exp_obd->obd_name, err); - goto out_root; - } - sb->s_root = d_make_root(root); if (!sb->s_root) { CERROR("%s: can't make root dentry\n", diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 6ad100c..9f3d6de 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -207,6 +207,7 @@ static int lmv_connect(const struct lu_env *env, lmv->connected = 0; lmv->conn_data = *data; + lmv->lmv_cache = localdata; lmv->lmv_tgts_kobj = kobject_create_and_add("target_obds", &obd->obd_kset.kobj); @@ -299,7 +300,7 @@ static int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt) } rc = obd_connect(NULL, &mdc_exp, mdc_obd, &obd->obd_uuid, - &lmv->conn_data, NULL); + &lmv->conn_data, lmv->lmv_cache); if (rc) { CERROR("target %s connect error %d\n", tgt->ltd_uuid.uuid, rc); return rc; diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index cc0ca1c..240cc6f9 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -120,7 +120,7 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid, static int lov_notify(struct obd_device *obd, struct obd_device *watched, enum obd_notify_event ev); -int lov_connect_obd(struct obd_device *obd, u32 index, int activate, +int lov_connect_osc(struct obd_device *obd, u32 index, int activate, struct obd_connect_data *data) { struct lov_obd *lov = &obd->u.lov; @@ -169,13 +169,13 @@ int lov_connect_obd(struct obd_device *obd, u32 index, int activate, if (imp->imp_invalid) { CDEBUG(D_CONFIG, - "not connecting OSC %s; administratively disabled\n", + "%s: not connecting - 
administratively disabled\n", obd_uuid2str(tgt_uuid)); return 0; } rc = obd_connect(NULL, &lov->lov_tgts[index]->ltd_exp, tgt_obd, - &lov_osc_uuid, data, NULL); + &lov_osc_uuid, data, lov->lov_cache); if (rc || !lov->lov_tgts[index]->ltd_exp) { CERROR("Target %s connect error %d\n", obd_uuid2str(tgt_uuid), rc); @@ -231,12 +231,17 @@ static int lov_connect(const struct lu_env *env, lov_tgts_getref(obd); + if (localdata) { + lov->lov_cache = localdata; + cl_cache_incref(lov->lov_cache); + } + for (i = 0; i < lov->desc.ld_tgt_count; i++) { tgt = lov->lov_tgts[i]; if (!tgt || obd_uuid_empty(&tgt->ltd_uuid)) continue; /* Flags will be lowest common denominator */ - rc = lov_connect_obd(obd, i, tgt->ltd_activate, &lov->lov_ocd); + rc = lov_connect_osc(obd, i, tgt->ltd_activate, &lov->lov_ocd); if (rc) { CERROR("%s: lov connect tgt %d failed: %d\n", obd->obd_name, i, rc); @@ -381,20 +386,12 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid, struct obd_uuid lov_osc_uuid = {"LOV_OSC_UUID"}; rc = obd_connect(NULL, &tgt->ltd_exp, tgt->ltd_obd, - &lov_osc_uuid, &lov->lov_ocd, NULL); + &lov_osc_uuid, &lov->lov_ocd, + lov->lov_cache); if (rc || !tgt->ltd_exp) { index = rc; goto out; } - rc = obd_set_info_async(NULL, tgt->ltd_exp, - sizeof(KEY_CACHE_SET), - KEY_CACHE_SET, - sizeof(struct cl_client_cache), - lov->lov_cache, NULL); - if (rc < 0) { - index = rc; - goto out; - } } if (lov->lov_tgts[index]->ltd_activate == activate) { @@ -574,17 +571,16 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, CDEBUG(D_CONFIG, "idx=%d ltd_gen=%d ld_tgt_count=%d\n", index, tgt->ltd_gen, lov->desc.ld_tgt_count); - if (lov->lov_connects == 0) { + if (lov->lov_connects == 0) /* lov_connect hasn't been called yet. We'll do the - * lov_connect_obd on this target when that fn first runs, + * lov_connect_osc on this target when that fn first runs, * because we don't know the connect flags yet. 
*/ return 0; - } lov_tgts_getref(obd); - rc = lov_connect_obd(obd, index, active, &lov->lov_ocd); + rc = lov_connect_osc(obd, index, active, &lov->lov_ocd); if (rc) goto out; @@ -594,15 +590,6 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp, goto out; } - if (lov->lov_cache) { - rc = obd_set_info_async(NULL, tgt->ltd_exp, - sizeof(KEY_CACHE_SET), KEY_CACHE_SET, - sizeof(struct cl_client_cache), - lov->lov_cache, NULL); - if (rc < 0) - goto out; - } - rc = lov_notify(obd, tgt->ltd_exp->exp_obd, active ? OBD_NOTIFY_CONNECT : OBD_NOTIFY_INACTIVE); @@ -1216,14 +1203,8 @@ static int lov_set_info_async(const struct lu_env *env, struct obd_export *exp, lov_tgts_getref(obddev); - if (KEY_IS(KEY_CHECKSUM)) { + if (KEY_IS(KEY_CHECKSUM)) do_inactive = true; - } else if (KEY_IS(KEY_CACHE_SET)) { - LASSERT(!lov->lov_cache); - lov->lov_cache = val; - do_inactive = true; - cl_cache_incref(lov->lov_cache); - } for (i = 0; i < lov->desc.ld_tgt_count; i++) { tgt = lov->lov_tgts[i]; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 7a99ef2..a988cbf 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -2899,23 +2899,6 @@ int osc_set_info_async(const struct lu_env *env, struct obd_export *exp, return 0; } - if (KEY_IS(KEY_CACHE_SET)) { - struct client_obd *cli = &obd->u.cli; - - LASSERT(!cli->cl_cache); /* only once */ - cli->cl_cache = val; - cl_cache_incref(cli->cl_cache); - cli->cl_lru_left = &cli->cl_cache->ccc_lru_left; - - /* add this osc into entity list */ - LASSERT(list_empty(&cli->cl_lru_osc)); - spin_lock(&cli->cl_cache->ccc_lru_lock); - list_add(&cli->cl_lru_osc, &cli->cl_cache->ccc_lru); - spin_unlock(&cli->cl_cache->ccc_lru_lock); - - return 0; - } - if (KEY_IS(KEY_CACHE_LRU_SHRINK)) { struct client_obd *cli = &obd->u.cli; long nr = atomic_long_read(&cli->cl_lru_in_list) >> 1; From patchwork Thu Feb 27 21:11:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410153 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9ADDA92A for ; Thu, 27 Feb 2020 21:31:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8346E24677 for ; Thu, 27 Feb 2020 21:31:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8346E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 163A334989E; Thu, 27 Feb 2020 13:26:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CCDB021FD22 for ; Thu, 27 Feb 2020 13:19:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 453E92C6D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4400E468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:11:59 -0500 Message-Id: <1582838290-17243-252-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 
251/622] lustre: ldlm: Fix style issues for ldlm_lockd.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ldlm/ldlm_lockd.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 5275c82c67d9 ("LU-6142 ldlm: Fix style issues for ldlm_lockd.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34544 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lockd.c | 64 +++++++++++++++++++++++++++------------------ 1 file changed, 39 insertions(+), 25 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index ea146aa..f37d8ef 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -80,7 +80,7 @@ struct ldlm_bl_pool { /* * blp_prio_list is used for callbacks that should be handled * as a priority. It is used for LDLM_FL_DISCARD_DATA requests. - * see bug 13843 + * see b=13843 */ struct list_head blp_prio_list; @@ -126,22 +126,24 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, /* set bits to cancel for this lock for possible lock convert */ if (lock->l_resource->lr_type == LDLM_IBITS) { - /* Lock description contains policy of blocking lock, - * and its cancel_bits is used to pass conflicting bits. - * NOTE: ld can be NULL or can be not NULL but zeroed if - * passed from ldlm_bl_thread_blwi(), check below used bits - * in ld to make sure it is valid description. + /* + * Lock description contains policy of blocking lock, and its + * cancel_bits is used to pass conflicting bits. 
NOTE: ld can + * be NULL or can be not NULL but zeroed if passed from + * ldlm_bl_thread_blwi(), check below used bits in ld to make + * sure it is valid description. * - * If server may replace lock resource keeping the same cookie, - * never use cancel bits from different resource, full cancel - * is to be used. + * If server may replace lock resource keeping the same + * cookie, never use cancel bits from different resource, full + * cancel is to be used. */ if (ld && ld->l_policy_data.l_inodebits.bits && ldlm_res_eq(&ld->l_resource.lr_name, &lock->l_resource->lr_name)) lock->l_policy_data.l_inodebits.cancel_bits = ld->l_policy_data.l_inodebits.cancel_bits; - /* if there is no valid ld and lock is cbpending already + /* + * If there is no valid ld and lock is cbpending already * then cancel_bits should be kept, otherwise it is zeroed. */ else if (!ldlm_is_cbpending(lock)) @@ -169,7 +171,7 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, LDLM_LOCK_RELEASE(lock); } -/** +/* * Callback handler for receiving incoming completion ASTs. * * This only can happen on client side. @@ -241,8 +243,10 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, goto out; } - /* If we receive the completion AST before the actual enqueue returned, - * then we might need to switch lock modes, resources, or extents. + /* + * If we receive the completion AST before the actual enqueue + * returned, then we might need to switch lock modes, resources, or + * extents. */ if (dlm_req->lock_desc.l_granted_mode != lock->l_req_mode) { lock->l_req_mode = dlm_req->lock_desc.l_granted_mode; @@ -260,7 +264,8 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, ldlm_resource_unlink_lock(lock); if (dlm_req->lock_flags & LDLM_FL_AST_SENT) { - /* BL_AST locks are not needed in LRU. + /* + * BL_AST locks are not needed in LRU. * Let ldlm_cancel_lru() be fast. 
*/ ldlm_lock_remove_from_lru(lock); @@ -374,7 +379,8 @@ static int __ldlm_bl_to_thread(struct ldlm_bl_work_item *blwi, wake_up(&blp->blp_waitq); - /* can not check blwi->blwi_flags as blwi could be already freed in + /* + * Can not check blwi->blwi_flags as blwi could be already freed in * LCF_ASYNC mode */ if (!(cancel_flags & LCF_ASYNC)) @@ -439,7 +445,8 @@ static int ldlm_bl_to_thread(struct ldlm_namespace *ns, rc = __ldlm_bl_to_thread(blwi, cancel_flags); } else { - /* if it is synchronous call do minimum mem alloc, as it could + /* + * If it is synchronous call do minimum mem alloc, as it could * be triggered from kernel shrinker */ struct ldlm_bl_work_item blwi; @@ -535,7 +542,8 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) struct ldlm_lock *lock; int rc; - /* Requests arrive in sender's byte order. The ptlrpc service + /* + * Requests arrive in sender's byte order. The ptlrpc service * handler has already checked and, if necessary, byte-swapped the * incoming request message body, but I am responsible for the * message buffers. @@ -596,7 +604,8 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) return 0; } - /* Force a known safe race, send a cancel to the server for a lock + /* + * Force a known safe race, send a cancel to the server for a lock * which the server has already started a blocking callback on. */ if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_CANCEL_BL_CB_RACE) && @@ -626,7 +635,8 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) lock->l_flags |= ldlm_flags_from_wire(dlm_req->lock_flags & LDLM_FL_AST_MASK); if (lustre_msg_get_opc(req->rq_reqmsg) == LDLM_BL_CALLBACK) { - /* If somebody cancels lock and cache is already dropped, + /* + * If somebody cancels lock and cache is already dropped, * or lock is failed before cp_ast received on client, * we can tell the server we have no lock. Otherwise, we * should send cancel after dropping the cache. 
@@ -643,7 +653,8 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) &dlm_req->lock_handle[0]); return 0; } - /* BL_AST locks are not needed in LRU. + /* + * BL_AST locks are not needed in LRU. * Let ldlm_cancel_lru() be fast. */ ldlm_lock_remove_from_lru(lock); @@ -651,14 +662,15 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) } unlock_res_and_lock(lock); - /* We want the ost thread to get this reply so that it can respond + /* + * We want the ost thread to get this reply so that it can respond * to ost requests (write cache writeback) that might be triggered * in the callback. * * But we'd also like to be able to indicate in the reply that we're * cancelling right now, because it's unused, or have an intent result - * in the reply, so we might have to push the responsibility for sending - * the reply down into the AST handlers, alas. + * in the reply, so we might have to push the responsibility for + * sending the reply down into the AST handlers, alas. */ switch (lustre_msg_get_opc(req->rq_reqmsg)) { @@ -866,7 +878,8 @@ static int ldlm_bl_thread_main(void *arg) if (rc == LDLM_ITER_STOP) break; - /* If there are many namespaces, we will not sleep waiting for + /* + * If there are many namespaces, we will not sleep waiting for * work, and must do a cond_resched to avoid holding the CPU * for too long */ @@ -1171,7 +1184,8 @@ void ldlm_exit(void) if (ldlm_refcount) CERROR("ldlm_refcount is %d in %s!\n", ldlm_refcount, __func__); kmem_cache_destroy(ldlm_resource_slab); - /* ldlm_lock_put() use RCU to call ldlm_lock_free, so need call + /* + * ldlm_lock_put() use RCU to call ldlm_lock_free, so need call * synchronize_rcu() to wait a grace period elapsed, so that * ldlm_lock_free() get a chance to be called. 
*/ From patchwork Thu Feb 27 21:12:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410003 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ABDAA14E3 for ; Thu, 27 Feb 2020 21:27:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 94AA2246A0 for ; Thu, 27 Feb 2020 21:27:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 94AA2246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DC0A13491B5; Thu, 27 Feb 2020 13:24:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3425321FD25 for ; Thu, 27 Feb 2020 13:19:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 489092C6E; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 46DD946A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:00 -0500 Message-Id: <1582838290-17243-253-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 252/622] lustre: ldlm: Fix style issues for ldlm_request.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ldlm/ldlm_request.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 3a56c0e5f42f ("LU-6142 ldlm: Fix style issues for ldlm_request.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34547 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 144 +++++++++++++++++++++++++++--------------- 1 file changed, 94 insertions(+), 50 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index fb564f4..45d70d4 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -147,7 +147,8 @@ static void ldlm_expired_completion_wait(struct ldlm_lock *lock, u32 conn_cnt) * * Return: timeout in seconds to wait for the server reply */ -/* We use the same basis for both server side and client side functions +/* + * We use the same basis for both server side and client side functions * from a single node. */ static time64_t ldlm_cp_timeout(struct ldlm_lock *lock) @@ -289,13 +290,14 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns, { int need_cancel = 0; - /* Set a flag to prevent us from sending a CANCEL (bug 407) */ + /* Set a flag to prevent us from sending a CANCEL (b=407) */ lock_res_and_lock(lock); /* Check that lock is not granted or failed, we might race. 
*/ if (!ldlm_is_granted(lock) && !ldlm_is_failed(lock)) { - /* Make sure that this lock will not be found by raced + /* + * Make sure that this lock will not be found by raced * bl_ast and -EINVAL reply is sent to server anyways. - * bug 17645 + * b=17645 */ lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_FAILED | LDLM_FL_ATOMIC_CB | LDLM_FL_CBPENDING; @@ -309,10 +311,12 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns, else LDLM_DEBUG(lock, "lock was granted or failed in race"); - /* XXX - HACK because we shouldn't call ldlm_lock_destroy() + /* + * XXX - HACK because we shouldn't call ldlm_lock_destroy() * from llite/file.c/ll_file_flock(). */ - /* This code makes for the fact that we do not have blocking handler on + /* + * This code makes for the fact that we do not have blocking handler on * a client for flock locks. As such this is the place where we must * completely kill failed locks. (interrupted and those that * were waiting to be granted when server evicted us. @@ -416,7 +420,8 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, CDEBUG(D_INFO, "local: %p, remote cookie: %#llx, flags: 0x%llx\n", lock, reply->lock_handle.cookie, *flags); - /* If enqueue returned a blocked lock but the completion handler has + /* + * If enqueue returned a blocked lock but the completion handler has * already run, then it fixed up the resource and we don't need to do it * again. */ @@ -466,11 +471,13 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, LDLM_DEBUG(lock, "enqueue reply includes blocking AST"); } - /* If the lock has already been granted by a completion AST, don't + /* + * If the lock has already been granted by a completion AST, don't * clobber the LVB with an older one. */ if (lvb_len > 0) { - /* We must lock or a racing completion might update lvb without + /* + * We must lock or a racing completion might update lvb without * letting us know and we'll clobber the correct value. 
* Cannot unlock after the check either, as that still leaves * a tiny window for completion to get in @@ -499,7 +506,8 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, } if (lvb_len > 0 && lvb) { - /* Copy the LVB here, and not earlier, because the completion + /* + * Copy the LVB here, and not earlier, because the completion * AST (if any) can override what we got in the reply */ memcpy(lvb, lock->l_lvb_data, lvb_len); @@ -586,7 +594,8 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req, to_free = !ns_connect_lru_resize(ns) && opc == LDLM_ENQUEUE ? 1 : 0; - /* Cancel LRU locks here _only_ if the server supports + /* + * Cancel LRU locks here _only_ if the server supports * EARLY_CANCEL. Otherwise we have to send extra CANCEL * RPC, which will make us slower. */ @@ -611,7 +620,8 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req, if (canceloff) { dlm = req_capsule_client_get(pill, &RMF_DLM_REQ); LASSERT(dlm); - /* Skip first lock handler in ldlm_request_pack(), + /* + * Skip first lock handler in ldlm_request_pack(), * this method will increment @lock_count according * to the lock handle amount actually written to * the buffer. @@ -685,7 +695,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, ns = exp->exp_obd->obd_namespace; - /* If we're replaying this lock, just check some invariants. + /* + * If we're replaying this lock, just check some invariants. * If we're creating a new lock, get everything all setup nicely. 
*/ if (is_replay) { @@ -752,7 +763,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, if (*flags & LDLM_FL_NDELAY) { DEBUG_REQ(D_DLMTRACE, req, "enque lock with no delay\n"); req->rq_no_resend = req->rq_no_delay = 1; - /* probably set a shorter timeout value and handle ETIMEDOUT + /* + * probably set a shorter timeout value and handle ETIMEDOUT * in osc_lock_upcall() correctly */ /* lustre_msg_set_timeout(req, req->rq_timeout / 2); */ @@ -799,7 +811,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, einfo->ei_mode, flags, lvb, lvb_len, lockh, rc); - /* If ldlm_cli_enqueue_fini did not find the lock, we need to free + /* + * If ldlm_cli_enqueue_fini did not find the lock, we need to free * one reference that we took */ if (err == -ENOLCK) @@ -860,7 +873,8 @@ static int lock_convert_interpret(const struct lu_env *env, } lock_res_and_lock(lock); - /* Lock convert is sent for any new bits to drop, the converting flag + /* + * Lock convert is sent for any new bits to drop, the converting flag * is dropped when ibits on server are the same as on client. Meanwhile * that can be so that more later convert will be replied first with * and clear converting flag, so in case of such race just exit here. @@ -872,7 +886,8 @@ static int lock_convert_interpret(const struct lu_env *env, reply->lock_desc.l_policy_data.l_inodebits.bits); } else if (reply->lock_desc.l_policy_data.l_inodebits.bits != lock->l_policy_data.l_inodebits.bits) { - /* Compare server returned lock ibits and local lock ibits + /* + * Compare server returned lock ibits and local lock ibits * if they are the same we consider conversion is done, * otherwise we have more converts inflight and keep * converting flag. 
@@ -882,14 +897,16 @@ static int lock_convert_interpret(const struct lu_env *env, } else { ldlm_clear_converting(lock); - /* Concurrent BL AST may arrive and cause another convert + /* + * Concurrent BL AST may arrive and cause another convert * or cancel so just do nothing here if bl_ast is set, * finish with convert otherwise. */ if (!ldlm_is_bl_ast(lock)) { struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); - /* Drop cancel_bits since there are no more converts + /* + * Drop cancel_bits since there are no more converts * and put lock into LRU if it is still not used and * is not there yet. */ @@ -918,7 +935,8 @@ static int lock_convert_interpret(const struct lu_env *env, } unlock_res_and_lock(lock); - /* fallback to normal lock cancel. If rc means there is no + /* + * fallback to normal lock cancel. If rc means there is no * valid lock on server, do only local cancel */ if (rc == ELDLM_NO_LOCK_DATA) @@ -959,7 +977,8 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) return -EINVAL; } - /* this is better to check earlier and it is done so already, + /* + * this is better to check earlier and it is done so already, * but this check is kept too as final one to issue an error * if any new code will miss such check. */ @@ -1075,7 +1094,8 @@ static void ldlm_cancel_pack(struct ptlrpc_request *req, max += LDLM_LOCKREQ_HANDLES; LASSERT(max >= dlm->lock_count + count); - /* XXX: it would be better to pack lock handles grouped by resource. + /* + * XXX: it would be better to pack lock handles grouped by resource. * so that the server cancel would call filter_lvbo_update() less * frequently. */ @@ -1202,7 +1222,8 @@ int ldlm_cli_update_pool(struct ptlrpc_request *req) return 0; } - /* In some cases RPC may contain SLV and limit zeroed out. This + /* + * In some cases RPC may contain SLV and limit zeroed out. This * is the case when server does not support LRU resize feature. 
* This is also possible in some recovery cases when server-side * reqs have no reference to the OBD export and thus access to @@ -1221,7 +1242,8 @@ int ldlm_cli_update_pool(struct ptlrpc_request *req) new_slv = lustre_msg_get_slv(req->rq_repmsg); obd = req->rq_import->imp_obd; - /* Set new SLV and limit in OBD fields to make them accessible + /* + * Set new SLV and limit in OBD fields to make them accessible * to the pool thread. We do not access obd_namespace and pool * directly here as there is no reliable way to make sure that * they are still alive at cleanup time. Evil races are possible @@ -1281,7 +1303,8 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, return 0; } - /* Lock is being converted, cancel it immediately. + /* + * Lock is being converted, cancel it immediately. * When convert will end, it releases lock and it will be gone. */ if (ldlm_is_converting(lock)) { @@ -1302,7 +1325,8 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, LDLM_LOCK_RELEASE(lock); return 0; } - /* Even if the lock is marked as LDLM_FL_BL_AST, this is a LDLM_CANCEL + /* + * Even if the lock is marked as LDLM_FL_BL_AST, this is a LDLM_CANCEL * RPC which goes to canceld portal, so we can cancel other LRU locks * here and send them all as one LDLM_CANCEL RPC. */ @@ -1350,7 +1374,8 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, } else { rc = ldlm_cli_cancel_local(lock); } - /* Until we have compound requests and can send LDLM_CANCEL + /* + * Until we have compound requests and can send LDLM_CANCEL * requests batched with generic RPCs, we need to send cancels * with the LDLM_FL_BL_AST flag in a separate RPC from * the one being generated now. @@ -1387,7 +1412,8 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, { enum ldlm_policy_res result = LDLM_POLICY_CANCEL_LOCK; - /* don't check added & count since we want to process all locks + /* + * don't check added & count since we want to process all locks * from unused list. 
* It's fine to not take lock to access lock->l_resource since * the lock has already been granted so it won't change. @@ -1424,7 +1450,8 @@ static enum ldlm_policy_res ldlm_cancel_lrur_policy(struct ldlm_namespace *ns, u64 slv, lvf, lv; s64 la; - /* Stop LRU processing when we reach past @count or have checked all + /* + * Stop LRU processing when we reach past @count or have checked all * locks in LRU. */ if (count && added >= count) @@ -1447,7 +1474,8 @@ static enum ldlm_policy_res ldlm_cancel_lrur_policy(struct ldlm_namespace *ns, /* Inform pool about current CLV to see it via debugfs. */ ldlm_pool_set_clv(pl, lv); - /* Stop when SLV is not yet come from server or lv is smaller than + /* + * Stop when SLV is not yet come from server or lv is smaller than * it is. */ if (slv == 0 || lv < slv) @@ -1469,7 +1497,8 @@ static enum ldlm_policy_res ldlm_cancel_passed_policy(struct ldlm_namespace *ns, int unused, int added, int count) { - /* Stop LRU processing when we reach past @count or have checked all + /* + * Stop LRU processing when we reach past @count or have checked all * locks in LRU. */ return (added >= count) ? @@ -1538,7 +1567,8 @@ static enum ldlm_policy_res ldlm_cancel_aged_policy(struct ldlm_namespace *ns, ldlm_cancel_default_policy(struct ldlm_namespace *ns, struct ldlm_lock *lock, int unused, int added, int count) { - /* Stop LRU processing when we reach past count or have checked all + /* + * Stop LRU processing when we reach past count or have checked all * locks in LRU. */ return (added >= count) ? @@ -1652,7 +1682,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, !ldlm_is_converting(lock)) break; - /* Somebody is already doing CANCEL. No need for this + /* + * Somebody is already doing CANCEL. No need for this * lock in LRU, do not traverse it again. 
*/ ldlm_lock_remove_from_lru_nolock(lock); @@ -1668,7 +1699,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, spin_unlock(&ns->ns_lock); lu_ref_add(&lock->l_reference, __func__, current); - /* Pass the lock through the policy filter and see if it + /* + * Pass the lock through the policy filter and see if it * should stay in LRU. * * Even for shrinker policy we stop scanning if @@ -1707,7 +1739,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, /* Check flags again under the lock. */ if (ldlm_is_canceling(lock) || ldlm_is_converting(lock) || (ldlm_lock_remove_from_lru_check(lock, last_use) == 0)) { - /* Another thread is removing lock from LRU, or + /* + * Another thread is removing lock from LRU, or * somebody is already doing CANCEL, or there * is a blocking request which will send cancel * by itself, or the lock is no longer unused or @@ -1722,7 +1755,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, } LASSERT(!lock->l_readers && !lock->l_writers); - /* If we have chosen to cancel this lock voluntarily, we + /* + * If we have chosen to cancel this lock voluntarily, we * better send cancel notification to server, so that it * frees appropriate state. This might lead to a race * where while we are doing cancel here, server is also @@ -1730,7 +1764,8 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, */ ldlm_clear_cancel_on_block(lock); - /* Setting the CBPENDING flag is a little misleading, + /* + * Setting the CBPENDING flag is a little misleading, * but prevents an important race; namely, once * CBPENDING is set, the lock can accumulate no more * readers/writers. 
Since readers and writers are @@ -1744,11 +1779,12 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, ldlm_has_dom(lock)) && lock->l_granted_mode == LCK_PR) ldlm_set_discard_data(lock); - /* We can't re-add to l_lru as it confuses the + /* + * We can't re-add to l_lru as it confuses the * refcounting in ldlm_lock_remove_from_lru() if an AST * arrives after we drop lr_lock below. We use l_bl_ast * and can't use l_pending_chain as it is used both on - * server and client nevertheless bug 5666 says it is + * server and client nevertheless b=5666 says it is * used only on server */ LASSERT(list_empty(&lock->l_bl_ast)); @@ -1787,7 +1823,8 @@ int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr, LIST_HEAD(cancels); int count, rc; - /* Just prepare the list of locks, do not actually cancel them yet. + /* + * Just prepare the list of locks, do not actually cancel them yet. * Locks are cancelled later in a separate thread. */ count = ldlm_prepare_lru_list(ns, &cancels, nr, 0, flags); @@ -1824,7 +1861,8 @@ int ldlm_cancel_resource_local(struct ldlm_resource *res, if (lock->l_readers || lock->l_writers) continue; - /* If somebody is already doing CANCEL, or blocking AST came, + /* + * If somebody is already doing CANCEL, or blocking AST came, * skip this lock. */ if (ldlm_is_bl_ast(lock) || ldlm_is_canceling(lock) || @@ -1834,7 +1872,8 @@ int ldlm_cancel_resource_local(struct ldlm_resource *res, if (lockmode_compat(lock->l_granted_mode, mode)) continue; - /* If policy is given and this is IBITS lock, add to list only + /* + * If policy is given and this is IBITS lock, add to list only * those locks that match by policy. * Skip locks with DoM bit always to don't flush data. */ @@ -1878,7 +1917,8 @@ int ldlm_cli_cancel_list(struct list_head *cancels, int count, if (list_empty(cancels) || count == 0) return 0; - /* XXX: requests (both batched and not) could be sent in parallel. + /* + * XXX: requests (both batched and not) could be sent in parallel. 
* Usually it is enough to have just 1 RPC, but it is possible that * there are too many locks to be cancelled in LRU or on a resource. * It would also speed up the case when the server does not support @@ -2071,7 +2111,8 @@ static void ldlm_namespace_foreach(struct ldlm_namespace *ns, ldlm_res_iter_helper, &helper, 0); } -/* non-blocking function to manipulate a lock whose cb_data is being put away. +/* + * non-blocking function to manipulate a lock whose cb_data is being put away. * return 0: find no resource * > 0: must be LDLM_ITER_STOP/LDLM_ITER_CONTINUE. * < 0: errors @@ -2108,8 +2149,9 @@ static int ldlm_chain_lock_for_replay(struct ldlm_lock *lock, void *closure) "lock %p next %p prev %p\n", lock, &lock->l_pending_chain.next, &lock->l_pending_chain.prev); - /* bug 9573: don't replay locks left after eviction, or - * bug 17614: locks being actively cancelled. Get a reference + /* + * b=9573: don't replay locks left after eviction, or + * b=17614: locks being actively cancelled. Get a reference * on a lock so that it does not disappear under us (e.g. due to cancel) */ if (!(lock->l_flags & (LDLM_FL_FAILED | LDLM_FL_BL_DONE))) { @@ -2169,7 +2211,7 @@ static int replay_one_lock(struct obd_import *imp, struct ldlm_lock *lock) struct ldlm_request *body; int flags; - /* Bug 11974: Do not replay a lock which is actively being canceled */ + /* B=11974: Do not replay a lock which is actively being canceled */ if (ldlm_is_bl_done(lock)) { LDLM_DEBUG(lock, "Not replaying canceled lock:"); return 0; @@ -2226,10 +2268,11 @@ static int replay_one_lock(struct obd_import *imp, struct ldlm_lock *lock) req_capsule_set_size(&req->rq_pill, &RMF_DLM_LVB, RCL_SERVER, lock->l_lvb_len); ptlrpc_request_set_replen(req); - /* notify the server we've replayed all requests. + /* + * notify the server we've replayed all requests. * also, we mark the request to be put on a dedicated * queue to be processed after all request replayes. 
- * bug 6063 + * b=6063 */ lustre_msg_set_flags(req->rq_reqmsg, MSG_REQ_REPLAY_DONE); @@ -2263,7 +2306,8 @@ static void ldlm_cancel_unused_locks_for_replay(struct ldlm_namespace *ns) "Dropping as many unused locks as possible before replay for namespace %s (%d)\n", ldlm_ns_name(ns), ns->ns_nr_unused); - /* We don't need to care whether or not LRU resize is enabled + /* + * We don't need to care whether or not LRU resize is enabled * because the LDLM_LRU_FLAG_NO_WAIT policy doesn't use the * count parameter */ From patchwork Thu Feb 27 21:12:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410083 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E75717E0 for ; Thu, 27 Feb 2020 21:29:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 174F7246A0 for ; Thu, 27 Feb 2020 21:29:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 174F7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D5B163494F3; Thu, 27 Feb 2020 13:25:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B33B21FD34 for ; Thu, 27 Feb 2020 13:19:35 -0800 (PST) Received: from star.ccs.ornl.gov 
(star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4B22A2C6F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 49C4F46C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:01 -0500 Message-Id: <1582838290-17243-254-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 253/622] lustre: ptlrpc: Fix style issues for sec_bulk.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ptlrpc/sec_bulk.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: a294ea9a0e04 ("LU-6142 ptlrpc: Fix style issues for sec_bulk.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34548 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/ptlrpc/sec_bulk.c | 71 ++++++++++++++++++++------------------------- 1 file changed, 32 insertions(+), 39 deletions(-) diff --git a/fs/lustre/ptlrpc/sec_bulk.c b/fs/lustre/ptlrpc/sec_bulk.c index e170da1..d36230b 100644 --- a/fs/lustre/ptlrpc/sec_bulk.c +++ b/fs/lustre/ptlrpc/sec_bulk.c @@ -50,9 +50,9 @@ #include "ptlrpc_internal.h" -/**************************************** - * bulk encryption page pools * - ****************************************/ +/* + * bulk encryption page pools + */ #define POINTERS_PER_PAGE (PAGE_SIZE / sizeof(void *)) #define 
PAGES_PER_POOL (POINTERS_PER_PAGE) @@ -63,19 +63,16 @@ #define CACHE_QUIESCENT_PERIOD (20) static struct ptlrpc_enc_page_pool { - /* - * constants - */ - unsigned long epp_max_pages; /* maximum pages can hold, const */ - unsigned int epp_max_pools; /* number of pools, const */ + unsigned long epp_max_pages; /* maximum pages can hold, const */ + unsigned int epp_max_pools; /* number of pools, const */ /* * wait queue in case of not enough free pages. */ - wait_queue_head_t epp_waitq; /* waiting threads */ - unsigned int epp_waitqlen; /* wait queue length */ - unsigned long epp_pages_short; /* # of pages wanted of in-q users */ - unsigned int epp_growing:1; /* during adding pages */ + wait_queue_head_t epp_waitq; /* waiting threads */ + unsigned int epp_waitqlen; /* wait queue length */ + unsigned long epp_pages_short; /* # of pages wanted of in-q users */ + unsigned int epp_growing:1; /* during adding pages */ /* * indicating how idle the pools are, from 0 to MAX_IDLE_IDX @@ -84,36 +81,32 @@ * is idled for a while but the idle_idx might still be low if no * activities happened in the pools. 
*/ - unsigned long epp_idle_idx; + unsigned long epp_idle_idx; /* last shrink time due to mem tight */ - time64_t epp_last_shrink; - time64_t epp_last_access; - - /* - * in-pool pages bookkeeping - */ - spinlock_t epp_lock; /* protect following fields */ - unsigned long epp_total_pages; /* total pages in pools */ - unsigned long epp_free_pages; /* current pages available */ - - /* - * statistics - */ - unsigned long epp_st_max_pages; /* # of pages ever reached */ - unsigned int epp_st_grows; /* # of grows */ - unsigned int epp_st_grow_fails; /* # of add pages failures */ - unsigned int epp_st_shrinks; /* # of shrinks */ - unsigned long epp_st_access; /* # of access */ - unsigned long epp_st_missings; /* # of cache missing */ - unsigned long epp_st_lowfree; /* lowest free pages reached */ - unsigned int epp_st_max_wqlen; /* highest waitqueue length */ - ktime_t epp_st_max_wait; /* in nanoseconds */ - unsigned long epp_st_outofmem; /* # of out of mem requests */ + time64_t epp_last_shrink; + time64_t epp_last_access; + + /* in-pool pages bookkeeping */ + spinlock_t epp_lock; /* protect following fields */ + unsigned long epp_total_pages; /* total pages in pools */ + unsigned long epp_free_pages; /* current pages available */ + + /* statistics */ + unsigned long epp_st_max_pages; /* # of pages ever reached */ + unsigned int epp_st_grows; /* # of grows */ + unsigned int epp_st_grow_fails; /* # of add pages failures */ + unsigned int epp_st_shrinks; /* # of shrinks */ + unsigned long epp_st_access; /* # of access */ + unsigned long epp_st_missings; /* # of cache missing */ + unsigned long epp_st_lowfree; /* lowest free pages reached */ + unsigned int epp_st_max_wqlen; /* highest waitqueue length */ + ktime_t epp_st_max_wait; /* in nanoseconds */ + unsigned long epp_st_outofmem; /* # of out of mem requests */ /* - * pointers to pools + * pointers to pools, may be vmalloc'd */ - struct page ***epp_pools; + struct page ***epp_pools; } page_pools; /* @@ -185,7 +178,7 @@ 
static void enc_pools_release_free_pages(long npages) /* max pool index after the release */ p_idx_max1 = page_pools.epp_total_pages == 0 ? -1 : - ((page_pools.epp_total_pages - 1) / PAGES_PER_POOL); + ((page_pools.epp_total_pages - 1) / PAGES_PER_POOL); p_idx = page_pools.epp_free_pages / PAGES_PER_POOL; g_idx = page_pools.epp_free_pages % PAGES_PER_POOL; From patchwork Thu Feb 27 21:12:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410157 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 17BFD17E0 for ; Thu, 27 Feb 2020 21:31:37 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0044624677 for ; Thu, 27 Feb 2020 21:31:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0044624677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73C673498EF; Thu, 27 Feb 2020 13:26:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E149521FD3F for ; Thu, 27 Feb 2020 13:19:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4E4992C70; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id 4C93C46D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:02 -0500 Message-Id: <1582838290-17243-255-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 254/622] lustre: ldlm: Fix style issues for ptlrpcd.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ptlrpc/ptlrpcd.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: f64aeebfceb3 ("LU-6142 ldlm: Fix style issues for ptlrpcd.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34604 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/ptlrpc/ptlrpcd.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/lustre/ptlrpc/ptlrpcd.c b/fs/lustre/ptlrpc/ptlrpcd.c index e9c03ba..bcf1e46 100644 --- a/fs/lustre/ptlrpc/ptlrpcd.c +++ b/fs/lustre/ptlrpc/ptlrpcd.c @@ -238,7 +238,8 @@ void ptlrpcd_add_req(struct ptlrpc_request *req) wait_event_idle(req->rq_set_waitq, !req->rq_set); } else if (req->rq_set) { - /* If we have a valid "rq_set", just reuse it to avoid double + /* + * If we have a valid "rq_set", just reuse it to avoid double * linked. 
*/ LASSERT(req->rq_phase == RQ_PHASE_NEW); @@ -294,7 +295,8 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc) spin_unlock(&set->set_new_req_lock); } - /* We should call lu_env_refill() before handling new requests to make + /* + * We should call lu_env_refill() before handling new requests to make * sure that env key the requests depending on really exists. */ rc2 = lu_env_refill(env); @@ -316,7 +318,8 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc) if (atomic_read(&set->set_remaining)) rc |= ptlrpc_check_set(env, set); - /* NB: ptlrpc_check_set has already moved completed request at the + /* + * NB: ptlrpc_check_set has already moved completed request at the * head of seq::set_requests */ list_for_each_entry_safe(req, tmp, &set->set_requests, rq_set_chain) { @@ -334,7 +337,8 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc) */ rc = atomic_read(&set->set_new_count); - /* If we have nothing to do, check whether we can take some + /* + * If we have nothing to do, check whether we can take some * work from our partner threads. */ if (rc == 0 && pc->pc_npartners > 0) { @@ -379,7 +383,6 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc) * Main ptlrpcd thread. * ptlrpc's code paths like to execute in process context, so we have this * thread which spins on a set which contains the rpcs and sends them. 
- * */ static int ptlrpcd(void *arg) { From patchwork Thu Feb 27 21:12:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410073 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4801B138D for ; Thu, 27 Feb 2020 21:29:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 30837246A0 for ; Thu, 27 Feb 2020 21:29:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 30837246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C8D84349490; Thu, 27 Feb 2020 13:25:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2EE4121FD46 for ; Thu, 27 Feb 2020 13:19:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 510482C71; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4F5FF46F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:03 -0500 Message-Id: <1582838290-17243-256-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 255/622] lustre: ptlrpc: IR doesn't reconnect after EAGAIN X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sergey Cheremencev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sergey Cheremencev A client may try to connect to an OST before recovery starts, while the OST is not yet configured. In that case the OST returns EAGAIN (target->obd_no_conn == 1). This is harmless when pinger_recov is enabled, because ptlrpc_pinger_main will reconnect later, but no reconnect happens when pinger_recov is 0. Move the setting of imp_connect_error to ptlrpc_connect_interpret so that it stores only connection errors, and let the pinger initiate recovery when that error is -EAGAIN.
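The reconnect decision described above can be sketched in isolation. This is a simplified userspace model, not the kernel code: struct import_model and both model_* helpers are hypothetical names invented for illustration, and only the boolean condition is taken from the pinger.c hunk in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct obd_import. */
struct import_model {
	bool no_pinger_recover;	/* models imp_no_pinger_recover */
	int connect_error;	/* models imp_connect_error */
};

/*
 * Models the connect-reply handler: only the result of a connect
 * request is recorded, so replies to other RPCs cannot overwrite it.
 */
static void model_connect_interpret(struct import_model *imp, int rc)
{
	assert(imp);
	imp->connect_error = rc;
}

/*
 * Models the pinger's choice for a disconnected import: recover when
 * pinger recovery is enabled, or when the last connect attempt was
 * answered with -EAGAIN.
 */
static bool model_should_initiate_recovery(const struct import_model *imp)
{
	assert(imp);
	return !imp->no_pinger_recover || imp->connect_error == -EAGAIN;
}
```

With pinger recovery disabled, the model asks for recovery only after a connect attempt has recorded -EAGAIN. Any other recorded error leaves the import alone, which appears to be the intent of keeping non-connect errors out of imp_connect_error.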
Cray-bug-id: LUS-2034 WC-bug-id: https://jira.whamcloud.com/browse/LU-11601 Lustre-commit: 3341c8c31871 ("LU-11601 ptlrpc: IR doesn't reconnect after EAGAIN") Signed-off-by: Sergey Cheremencev Reviewed-on: https://es-gerrit.dev.cray.com/153542 Reviewed-by: Alexey Lyashkov Reviewed-by: Vitaly Fertman Reviewed-on: https://review.whamcloud.com/33557 Reviewed-by: Andreas Dilger Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/ptlrpc/client.c | 1 - fs/lustre/ptlrpc/import.c | 1 + fs/lustre/ptlrpc/pinger.c | 3 ++- 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 36955e8..9ebdcb6 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -264,6 +264,7 @@ #define OBD_FAIL_OST_STATFS_EINPROGRESS 0x231 #define OBD_FAIL_OST_SET_INFO_NET 0x232 #define OBD_FAIL_OST_DISCONNECT_DELAY 0x245 +#define OBD_FAIL_OST_PREPARE_DELAY 0x247 #define OBD_FAIL_LDLM 0x300 #define OBD_FAIL_LDLM_NAMESPACE_NEW 0x301 diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index f57ec1883..0f5aa92 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1457,7 +1457,6 @@ static int after_reply(struct ptlrpc_request *req) lustre_msg_get_service_time(req->rq_repmsg)); rc = ptlrpc_check_status(req); - imp->imp_connect_error = rc; if (rc) { /* diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 39d9e3e..a75856a 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -944,6 +944,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, return 0; } + imp->imp_connect_error = rc; if (rc) { struct ptlrpc_request *free_req; struct ptlrpc_request *tmp; diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c index c565e2d..c3fbddc 100644 --- a/fs/lustre/ptlrpc/pinger.c +++ b/fs/lustre/ptlrpc/pinger.c @@ -228,7 +228,8 @@ static 
void ptlrpc_pinger_process_import(struct obd_import *imp, if (level == LUSTRE_IMP_DISCON && !imp_is_deactive(imp)) { /* wait for a while before trying recovery again */ imp->imp_next_ping = ptlrpc_next_reconnect(imp); - if (!imp->imp_no_pinger_recover) + if (!imp->imp_no_pinger_recover || + imp->imp_connect_error == -EAGAIN) ptlrpc_initiate_recovery(imp); } else if (level != LUSTRE_IMP_FULL || imp->imp_obd->obd_no_recov || From patchwork Thu Feb 27 21:12:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410877 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E2211580 for ; Thu, 27 Feb 2020 21:49:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 16A4024690 for ; Thu, 27 Feb 2020 21:49:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 16A4024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0B95D34A6C1; Thu, 27 Feb 2020 13:41:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 85D1721FC1A for ; Thu, 27 Feb 2020 13:19:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 541062C72; Thu, 
27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 52BF5468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:04 -0500 Message-Id: <1582838290-17243-257-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 256/622] lustre: llite: ll_fault fixes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Various error conditions in the fault path can cause us to not return a page in vm_fault. Check if it's present before accessing it. 
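As a rough illustration of the guard, here is a userspace sketch. The MODEL_FAULT_* values, struct model_page, struct model_fault and model_may_inspect_page() are all hypothetical stand-ins; the kernel's VM_FAULT_* constants and struct vm_fault are different and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's VM_FAULT_* constants differ. */
enum {
	MODEL_FAULT_ERROR  = 1 << 0,
	MODEL_FAULT_RETRY  = 1 << 1,
	MODEL_FAULT_LOCKED = 1 << 2,
};

/* Hypothetical stand-ins for struct page and struct vm_fault. */
struct model_page {
	bool truncated;
};

struct model_fault {
	struct model_page *page;	/* may be NULL after an error path */
};

/*
 * Returns true only when a page was actually returned and the result
 * carries none of the retry/error/locked bits, i.e. when it is safe
 * to go on and inspect the page.
 */
static bool model_may_inspect_page(const struct model_fault *vmf,
				   unsigned int result)
{
	assert(vmf);
	return vmf->page &&
	       !(result & (MODEL_FAULT_RETRY | MODEL_FAULT_ERROR |
			   MODEL_FAULT_LOCKED));
}
```

Testing the pointer first means the flag test is never the only thing standing between a missing page and a dereference.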
WC-bug-id: https://jira.whamcloud.com/browse/LU-11403 Lustre-commit: a8f4d1e5fd79 ("LU-11403 llite: ll_fault fixes") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/34247 Reviewed-by: Alex Zhuravlev Reviewed-by: Alexander Zarochentsev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_mmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 236d1d2..37ce508 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -378,7 +378,8 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) return VM_FAULT_SIGBUS; restart: result = __ll_fault(vmf->vma, vmf); - if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) { + if (vmf->page && + !(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) { struct page *vmpage = vmf->page; /* check if this page has been truncated */ From patchwork Thu Feb 27 21:12:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410161 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31A82138D for ; Thu, 27 Feb 2020 21:31:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1967724677 for ; Thu, 27 Feb 2020 21:31:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1967724677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5E433349924; Thu, 27 Feb 2020 13:26:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CA4DF21FC1A for ; Thu, 27 Feb 2020 13:19:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 56F532C73; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 55DDC46A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:05 -0500 Message-Id: <1582838290-17243-258-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 257/622] lustre: lsom: Add an OBD_CONNECT2_LSOM connect flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin Add an OBD_CONNECT2_LSOM connect flag so that clients do not send MDS_ATTR_LSIZE and MDS_ATTR_LBLOCKS flags to the old servers that do not support them. 
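As a rough sketch of the gating described above, assuming simplified types: the OBD_CONNECT2_LSOM value is the one this patch defines, while struct op_data, the XVALID_* bit values, and mask_lsom_attrs() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as defined by this patch. */
#define OBD_CONNECT2_LSOM 0x800ULL

/* Hypothetical bit values and structure, for this sketch only. */
#define XVALID_LAZYSIZE   0x1u
#define XVALID_LAZYBLOCKS 0x2u
#define XVALID_OTHER      0x4u

struct op_data { uint32_t op_xvalid; };

/*
 * Clear the lazy size/blocks bits when the peer did not advertise LSOM
 * support, so an old server is never sent attributes it cannot parse.
 * All other bits are left untouched.
 */
static void mask_lsom_attrs(uint64_t connect_flags2, struct op_data *op)
{
	assert(op != NULL);
	if (!(connect_flags2 & OBD_CONNECT2_LSOM))
		op->op_xvalid &= ~(XVALID_LAZYSIZE | XVALID_LAZYBLOCKS);
}
```

The mask is applied on the sending side just before the request is packed, so callers that set the lazy attributes need no knowledge of the server version.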
WC-bug-id: https://jira.whamcloud.com/browse/LU-12021 Lustre-commit: fdd2c5d3a6e5 ("LU-12021 lsom: Add an OBD_CONNECT2_LSOM connect flag") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/34343 Reviewed-by: Andreas Dilger Reviewed-by: Wang Shilong Reviewed-by: Aurelien Degremont Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 3 ++- fs/lustre/mdc/mdc_request.c | 4 ++++ fs/lustre/obdclass/lprocfs_status.c | 4 +++- fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 5 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 57486b4..347bdd6 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -216,7 +216,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT2_LOCK_CONVERT | OBD_CONNECT2_DIR_MIGRATE | OBD_CONNECT2_SUM_STATFS | - OBD_CONNECT2_ARCHIVE_ID_ARRAY; + OBD_CONNECT2_ARCHIVE_ID_ARRAY | + OBD_CONNECT2_LSOM; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index f197abc..5931bc1 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -945,6 +945,10 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data, req->rq_request_portal = MDS_READPAGE_PORTAL; ptlrpc_at_set_req_timeout(req); + if (!(exp_connect_flags2(exp) & OBD_CONNECT2_LSOM)) + op_data->op_xvalid &= ~(OP_XVALID_LAZYSIZE | + OP_XVALID_LAZYBLOCKS); + mdc_close_pack(req, op_data); req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 7701bc3..cdf25ed 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -120,7 +120,9 @@ "wbc", /* 0x40 */ "lock_convert", /* 0x80 */ "archive_id_array", /* 0x100 */ - "selinux_policy", /* 0x200 */ 
+ "unknown", /* 0x200 */ + "selinux_policy", /* 0x400 */ + "lsom", /* 0x800 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index bf79b8b..7cb6d74 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1146,6 +1146,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_ARCHIVE_ID_ARRAY); LASSERTF(OBD_CONNECT2_SELINUX_POLICY == 0x400ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_SELINUX_POLICY); + LASSERTF(OBD_CONNECT2_LSOM == 0x800ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_LSOM); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 1a1b6c6..6b9a623 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -806,6 +806,7 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_LOCK_CONVERT 0x80ULL /* IBITS lock convert support */ #define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* store HSM archive_id in array */ #define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */ +#define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:12:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410087 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 59359138D for ; Thu, 27 Feb 2020 21:29:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 
41F41246A0 for ; Thu, 27 Feb 2020 21:29:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 41F41246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 42B3234952A; Thu, 27 Feb 2020 13:25:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2E34E21FD66 for ; Thu, 27 Feb 2020 13:19:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5A3732C74; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 58A1146C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:06 -0500 Message-Id: <1582838290-17243-259-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 258/622] lustre: pcc: Reserve a new connection flag for PCC X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin Reserve OBD_CONNECT2_PCC connection flag that will be set (in ocd_connect_flags2) if a Lustre server or a client supports Persistent Client Cache (PCC). WC-bug-id: https://jira.whamcloud.com/browse/LU-10092 Lustre-commit: 93aa68404669 ("LU-10092 pcc: Reserve a new connection flag for PCC") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/34356 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 3 files changed, 4 insertions(+) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index cdf25ed..254a600 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -123,6 +123,7 @@ "unknown", /* 0x200 */ "selinux_policy", /* 0x400 */ "lsom", /* 0x800 */ + "pcc", /* 0x1000 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 7cb6d74..22447e2 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1148,6 +1148,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_SELINUX_POLICY); LASSERTF(OBD_CONNECT2_LSOM == 0x800ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_LSOM); + LASSERTF(OBD_CONNECT2_PCC == 0x1000ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_PCC); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 6b9a623..46c3369 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -807,6 +807,7 @@ struct 
ptlrpc_body_v2 { #define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* store HSM archive_id in array */ #define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */ #define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */ +#define OBD_CONNECT2_PCC 0x1000ULL /* Persistent Client Cache */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:12:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410091 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B93C7138D for ; Thu, 27 Feb 2020 21:29:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A0F43246A0 for ; Thu, 27 Feb 2020 21:29:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A0F43246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C3B3934955E; Thu, 27 Feb 2020 13:25:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7007721FD66 for ; Thu, 27 Feb 2020 13:19:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5E0CD2C75; Thu, 27 Feb 2020 16:18:16 -0500 (EST) 
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5BA5646D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:07 -0500 Message-Id: <1582838290-17243-260-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 259/622] lustre: uapi: reserve connect flag for plain layout X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Reserve the OBD_CONNECT2_PLAIN_LAYOUT flag so that a client supporting plain layout won't enable it if the MDT doesn't support it and, conversely, an MDT supporting plain layout won't send such a layout to a client that doesn't support it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: 14ee65e77bdc ("LU-11213 uapi: reserve connect flag for plain layout") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/34656 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 3 files changed, 4 insertions(+) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 254a600..a7c274a 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -124,6 +124,7 @@ "selinux_policy", /* 0x400 */ "lsom", /* 0x800 */ "pcc", /* 0x1000 */ + "plain_layout", /* 0x2000 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 22447e2..4a268f6 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1150,6 +1150,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_LSOM); LASSERTF(OBD_CONNECT2_PCC == 0x1000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_PCC); + LASSERTF(OBD_CONNECT2_PLAIN_LAYOUT == 0x2000ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_PLAIN_LAYOUT); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 46c3369..1b4b018 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -808,6 +808,7 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */ #define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */ #define OBD_CONNECT2_PCC 0x1000ULL /* Persistent Client Cache */ +#define OBD_CONNECT2_PLAIN_LAYOUT 0x2000ULL /* Plain Directory Layout */ /* XXX README XXX: * Please DO NOT add flag values here before 
first ensuring that this same From patchwork Thu Feb 27 21:12:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410765 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8886417E0 for ; Thu, 27 Feb 2020 21:46:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 71432246A2 for ; Thu, 27 Feb 2020 21:46:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 71432246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 101E434B1AD; Thu, 27 Feb 2020 13:36:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B30B921FD7B for ; Thu, 27 Feb 2020 13:19:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 603012C77; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5F03C468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:08 -0500 Message-Id: <1582838290-17243-261-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 260/622] lustre: ptlrpc: allow stopping threads above threads_max X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger If a service "threads_max" parameter is set below the number of running threads, stop each highest-numbered running thread until the running thread count is below threads_max. Stopping only the last thread ensures the thread t_id numbers are always contiguous rather than having gaps. If the threads are started again they will again be assigned contiguous t_id values. Each thread is stopped only after it has finished processing an incoming request, so running threads may not immediately stop when the tunable is changed. Also fix function declarations in file to match proper coding style. 
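The reverse-order stop rule can be sketched in isolation as below. This is a simplified model under stated assumptions: struct svc_part and both helpers are hypothetical names, there is no locking (the patch takes scp_lock around the equivalent check), and "stopping" is reduced to releasing the thread id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a service partition. */
struct svc_part {
	unsigned int nthrs_limit; /* current per-partition thread limit */
	unsigned int thr_nextid;  /* id the next started thread would get */
};

/*
 * A thread stops only if its id is at or above the limit AND it holds
 * the highest id currently handed out, so ids never develop gaps.
 */
static bool thread_should_stop(const struct svc_part *sp, unsigned int t_id)
{
	assert(sp != NULL);
	return t_id >= sp->nthrs_limit && t_id == sp->thr_nextid - 1;
}

/* Release the id if the thread qualifies; returns true if it stopped. */
static bool thread_try_stop(struct svc_part *sp, unsigned int t_id)
{
	if (!thread_should_stop(sp, t_id))
		return false;
	sp->thr_nextid--;
	return true;
}
```

With four threads (ids 0 to 3) and the limit lowered to 2, thread 2 cannot stop until thread 3 has, which is why the shrink may take a few request cycles to complete.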
WC-bug-id: https://jira.whamcloud.com/browse/LU-947 Lustre-commit: 183cb1e3cdd2 ("LU-947 ptlrpc: allow stopping threads above threads_max") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/34400 Reviewed-by: Wang Shilong Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 124 +++++++++++++++++++++++++-------------------- 1 file changed, 69 insertions(+), 55 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 7bc578c..362102b 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -106,8 +106,7 @@ return rqbd; } -static void -ptlrpc_free_rqbd(struct ptlrpc_request_buffer_desc *rqbd) +static void ptlrpc_free_rqbd(struct ptlrpc_request_buffer_desc *rqbd) { struct ptlrpc_service_part *svcpt = rqbd->rqbd_svcpt; @@ -123,8 +122,7 @@ kfree(rqbd); } -static int -ptlrpc_grow_req_bufs(struct ptlrpc_service_part *svcpt, int post) +static int ptlrpc_grow_req_bufs(struct ptlrpc_service_part *svcpt, int post) { struct ptlrpc_service *svc = svcpt->scp_service; struct ptlrpc_request_buffer_desc *rqbd; @@ -230,8 +228,8 @@ struct ptlrpc_hr_service { /** * Choose an hr thread to dispatch requests to. 
*/ -static struct ptlrpc_hr_thread * -ptlrpc_hr_select(struct ptlrpc_service_part *svcpt) +static +struct ptlrpc_hr_thread *ptlrpc_hr_select(struct ptlrpc_service_part *svcpt) { struct ptlrpc_hr_partition *hrp; unsigned int rotor; @@ -270,8 +268,7 @@ void ptlrpc_dispatch_difficult_reply(struct ptlrpc_reply_state *rs) wake_up(&hrt->hrt_waitq); } -void -ptlrpc_schedule_difficult_reply(struct ptlrpc_reply_state *rs) +void ptlrpc_schedule_difficult_reply(struct ptlrpc_reply_state *rs) { assert_spin_locked(&rs->rs_svcpt->scp_rep_lock); assert_spin_locked(&rs->rs_lock); @@ -288,8 +285,7 @@ void ptlrpc_dispatch_difficult_reply(struct ptlrpc_reply_state *rs) } EXPORT_SYMBOL(ptlrpc_schedule_difficult_reply); -static int -ptlrpc_server_post_idle_rqbds(struct ptlrpc_service_part *svcpt) +static int ptlrpc_server_post_idle_rqbds(struct ptlrpc_service_part *svcpt) { struct ptlrpc_request_buffer_desc *rqbd; int rc; @@ -345,9 +341,8 @@ static void ptlrpc_at_timer(struct timer_list *t) wake_up(&svcpt->scp_waitq); } -static void -ptlrpc_server_nthreads_check(struct ptlrpc_service *svc, - struct ptlrpc_service_conf *conf) +static void ptlrpc_server_nthreads_check(struct ptlrpc_service *svc, + struct ptlrpc_service_conf *conf) { struct ptlrpc_service_thr_conf *tc = &conf->psc_thr; unsigned int init; @@ -457,9 +452,8 @@ static void ptlrpc_at_timer(struct timer_list *t) /** * Initialize percpt data for a service */ -static int -ptlrpc_service_part_init(struct ptlrpc_service *svc, - struct ptlrpc_service_part *svcpt, int cpt) +static int ptlrpc_service_part_init(struct ptlrpc_service *svc, + struct ptlrpc_service_part *svcpt, int cpt) { struct ptlrpc_at_array *array; int size; @@ -549,10 +543,9 @@ static void ptlrpc_at_timer(struct timer_list *t) * This includes starting serving threads , allocating and posting rqbds and * so on. 
*/ -struct ptlrpc_service * -ptlrpc_register_service(struct ptlrpc_service_conf *conf, - struct kset *parent, - struct dentry *debugfs_entry) +struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf, + struct kset *parent, + struct dentry *debugfs_entry) { struct ptlrpc_service_cpt_conf *cconf = &conf->psc_cpt; struct ptlrpc_service *service; @@ -1019,8 +1012,7 @@ static int ptlrpc_at_add_timed(struct ptlrpc_request *req) return 0; } -static void -ptlrpc_at_remove_timed(struct ptlrpc_request *req) +static void ptlrpc_at_remove_timed(struct ptlrpc_request *req) { struct ptlrpc_at_array *array; @@ -1351,7 +1343,7 @@ static void ptlrpc_server_hpreq_fini(struct ptlrpc_request *req) } } -static int ptlrpc_server_request_add(struct ptlrpc_service_part *svcpt, +static int ptlrpc_server_request_add(struct ptlrpc_service_part *svcpt, struct ptlrpc_request *req) { int rc; @@ -1453,8 +1445,9 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, * \see ptlrpc_server_allow_normal * \see ptlrpc_server_allow high */ -static inline bool -ptlrpc_server_request_pending(struct ptlrpc_service_part *svcpt, bool force) +static inline +bool ptlrpc_server_request_pending(struct ptlrpc_service_part *svcpt, + bool force) { return ptlrpc_server_high_pending(svcpt, force) || ptlrpc_server_normal_pending(svcpt, force); @@ -1510,9 +1503,8 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, * All incoming requests pass through here before getting into * ptlrpc_server_handle_req later on. */ -static int -ptlrpc_server_handle_req_in(struct ptlrpc_service_part *svcpt, - struct ptlrpc_thread *thread) +static int ptlrpc_server_handle_req_in(struct ptlrpc_service_part *svcpt, + struct ptlrpc_thread *thread) { struct ptlrpc_service *svc = svcpt->scp_service; struct ptlrpc_request *req; @@ -1668,9 +1660,8 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, * Main incoming request handling logic. 
* Calls handler function from service to do actual processing. */ -static int -ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, - struct ptlrpc_thread *thread) +static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, + struct ptlrpc_thread *thread) { struct ptlrpc_service *svc = svcpt->scp_service; struct ptlrpc_request *request; @@ -1817,8 +1808,7 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, /** * An internal function to process a single reply state object. */ -static int -ptlrpc_handle_rs(struct ptlrpc_reply_state *rs) +static int ptlrpc_handle_rs(struct ptlrpc_reply_state *rs) { struct ptlrpc_service_part *svcpt = rs->rs_svcpt; struct ptlrpc_service *svc = svcpt->scp_service; @@ -1918,8 +1908,7 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, return 1; } -static void -ptlrpc_check_rqbd_pool(struct ptlrpc_service_part *svcpt) +static void ptlrpc_check_rqbd_pool(struct ptlrpc_service_part *svcpt) { int avail = svcpt->scp_nrqbds_posted; int low_water = test_req_buffer_pressure ? 
0 : @@ -1942,8 +1931,7 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, } } -static inline int -ptlrpc_threads_enough(struct ptlrpc_service_part *svcpt) +static inline int ptlrpc_threads_enough(struct ptlrpc_service_part *svcpt) { return svcpt->scp_nreqs_active < svcpt->scp_nthrs_running - 1 - @@ -1955,8 +1943,7 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, * user can call it w/o any lock but need to hold * ptlrpc_service_part::scp_lock to get reliable result */ -static inline int -ptlrpc_threads_increasable(struct ptlrpc_service_part *svcpt) +static inline int ptlrpc_threads_increasable(struct ptlrpc_service_part *svcpt) { return svcpt->scp_nthrs_running + svcpt->scp_nthrs_starting < @@ -1966,22 +1953,47 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt, /** * too many requests and allowed to create more threads */ -static inline int -ptlrpc_threads_need_create(struct ptlrpc_service_part *svcpt) +static inline int ptlrpc_threads_need_create(struct ptlrpc_service_part *svcpt) { return !ptlrpc_threads_enough(svcpt) && ptlrpc_threads_increasable(svcpt); } -static inline int -ptlrpc_thread_stopping(struct ptlrpc_thread *thread) +static inline int ptlrpc_thread_stopping(struct ptlrpc_thread *thread) { return thread_is_stopping(thread) || thread->t_svcpt->scp_service->srv_is_stopping; } -static inline int -ptlrpc_rqbd_pending(struct ptlrpc_service_part *svcpt) +/* stop the highest numbered thread if there are too many threads running */ +static inline bool ptlrpc_thread_should_stop(struct ptlrpc_thread *thread) +{ + struct ptlrpc_service_part *svcpt = thread->t_svcpt; + + return thread->t_id >= svcpt->scp_service->srv_nthrs_cpt_limit && + thread->t_id == svcpt->scp_thr_nextid - 1; +} + +static void ptlrpc_stop_thread(struct ptlrpc_thread *thread) +{ + CDEBUG(D_INFO, "Stopping thread %s #%u\n", + thread->t_svcpt->scp_service->srv_thread_name, thread->t_id); + 
thread_add_flags(thread, SVC_STOPPING); +} + +static inline void ptlrpc_thread_stop(struct ptlrpc_thread *thread) +{ + struct ptlrpc_service_part *svcpt = thread->t_svcpt; + + spin_lock(&svcpt->scp_lock); + if (ptlrpc_thread_should_stop(thread)) { + ptlrpc_stop_thread(thread); + svcpt->scp_thr_nextid--; + } + spin_unlock(&svcpt->scp_lock); +} + +static inline int ptlrpc_rqbd_pending(struct ptlrpc_service_part *svcpt) { return !list_empty(&svcpt->scp_rqbd_idle) && svcpt->scp_rqbd_timeout == 0; @@ -2250,14 +2262,19 @@ static int ptlrpc_main(void *arg) CDEBUG(D_RPCTRACE, "Posted buffers: %d\n", svcpt->scp_nrqbds_posted); } + + /* If the number of threads has been tuned downward and this + * thread should be stopped, then stop in reverse order so the + * the threads always have contiguous thread index values. + */ + if (unlikely(ptlrpc_thread_should_stop(thread))) + ptlrpc_thread_stop(thread); } ptlrpc_watchdog_disable(&thread->t_watchdog); out_srv_fini: - /* - * deconstruct service specific state created by ptlrpc_start_thread() - */ + /* deconstruct service thread state created by ptlrpc_start_thread() */ if (svc->srv_ops.so_thr_done) svc->srv_ops.so_thr_done(thread); @@ -2266,8 +2283,8 @@ static int ptlrpc_main(void *arg) kfree(env); } out: - CDEBUG(D_RPCTRACE, "service thread [ %p : %u ] %d exiting: rc %d\n", - thread, thread->t_pid, thread->t_id, rc); + CDEBUG(D_RPCTRACE, "%s: service thread [%p:%u] %d exiting: rc = %d\n", + thread->t_name, thread, thread->t_pid, thread->t_id, rc); spin_lock(&svcpt->scp_lock); if (thread_test_and_clear_flags(thread, SVC_STARTING)) @@ -2416,11 +2433,8 @@ static void ptlrpc_svcpt_stop_threads(struct ptlrpc_service_part *svcpt) spin_lock(&svcpt->scp_lock); /* let the thread know that we would like it to stop asap */ - list_for_each_entry(thread, &svcpt->scp_threads, t_link) { - CDEBUG(D_INFO, "Stopping thread %s #%u\n", - svcpt->scp_service->srv_thread_name, thread->t_id); - thread_add_flags(thread, SVC_STOPPING); - } + 
list_for_each_entry(thread, &svcpt->scp_threads, t_link) + ptlrpc_stop_thread(thread); wake_up_all(&svcpt->scp_waitq); From patchwork Thu Feb 27 21:12:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410077 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 35B95138D for ; Thu, 27 Feb 2020 21:29:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1E94E246A1 for ; Thu, 27 Feb 2020 21:29:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1E94E246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 109823494BF; Thu, 27 Feb 2020 13:25:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1627621FD84 for ; Thu, 27 Feb 2020 13:19:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 62E842C78; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 61C1B46A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:09 -0500 Message-Id: 
<1582838290-17243-262-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 261/622] lnet: Avoid lnet debugfs read/write if ctl_table does not exist X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sonia Sharma Running the command "lctl get_param -n stats" after lnet is taken down leads to a kernel panic because it tries to read from a file that no longer exists. In lnet_debugfs_read() and lnet_debugfs_write(), check if struct ctl_table is valid before trying to read/write to it. WC-bug-id: https://jira.whamcloud.com/browse/LU-11986 Lustre-commit: 54ca5e471d9f ("LU-11986 lnet: Avoid lnet debugfs read/write if ctl_table does not exist") Signed-off-by: Sonia Sharma Reviewed-on: https://review.whamcloud.com/34634 Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/module.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c index bee2581..37a3fee 100644 --- a/net/lnet/libcfs/module.c +++ b/net/lnet/libcfs/module.c @@ -597,9 +597,11 @@ static ssize_t lnet_debugfs_read(struct file *filp, char __user *buf, { struct ctl_table *table = filp->private_data; loff_t old_pos = *ppos; - ssize_t rc; + ssize_t rc = -EINVAL; - rc = table->proc_handler(table, 0, (void __user *)buf, &count, ppos); + if (table) + rc = table->proc_handler(table, 0, (void __user *)buf, + &count, ppos); /* * On success, the length read is either in error or in count. 
* If ppos changed, then use count, else use error @@ -617,9 +619,11 @@ static ssize_t lnet_debugfs_write(struct file *filp, const char __user *buf, { struct ctl_table *table = filp->private_data; loff_t old_pos = *ppos; - ssize_t rc; + ssize_t rc = -EINVAL; - rc = table->proc_handler(table, 1, (void __user *)buf, &count, ppos); + if (table) + rc = table->proc_handler(table, 1, (void __user *)buf, &count, + ppos); if (rc) return rc; From patchwork Thu Feb 27 21:12:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410081 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E2C41138D for ; Thu, 27 Feb 2020 21:29:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB308246A0 for ; Thu, 27 Feb 2020 21:29:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB308246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 60B573494DC; Thu, 27 Feb 2020 13:25:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5856B21FD8C for ; Thu, 27 Feb 2020 13:19:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 
662A72C79; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 64A8F46F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:10 -0500 Message-Id: <1582838290-17243-263-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 262/622] lnet: lnd: bring back concurrent_sends X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Revert "lustre: lnd: remove concurrent_sends tunable" This reverts commit 0d4b38f73774f8363d6c419b16d3b34d23ad1ca9. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11931 Lustre-commit: 83e45ead69ba ("LU-11931 lnd: bring back concurrent_sends") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34646 Reviewed-by: Alexey Lyashkov Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.h | 24 +++++++++++++++++++++++- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 5 +++-- net/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 30 +++++++++++++++++++++++++++--- 3 files changed, 53 insertions(+), 6 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 44f1d84..baf1006 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -136,7 +136,9 @@ struct kib_tunables { /* WRs and CQEs (per connection) */ #define IBLND_RECV_WRS(c) IBLND_RX_MSGS(c) -#define IBLND_CQ_ENTRIES(c) (IBLND_RECV_WRS(c) + kiblnd_send_wrs(c)) +#define IBLND_CQ_ENTRIES(c) \ + (IBLND_RECV_WRS(c) + 2 * kiblnd_concurrent_sends(c->ibc_version, \ + c->ibc_peer->ibp_ni)) struct kib_hca_dev; @@ -635,6 +637,26 @@ struct kib_peer_ni { int kiblnd_msg_queue_size(int version, struct lnet_ni *ni); +static inline int +kiblnd_concurrent_sends(int version, struct lnet_ni *ni) +{ + struct lnet_ioctl_config_o2iblnd_tunables *tunables; + int concurrent_sends; + + tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib; + concurrent_sends = tunables->lnd_concurrent_sends; + + if (version == IBLND_MSG_VERSION_1) { + if (concurrent_sends > IBLND_MSG_QUEUE_SIZE_V1 * 2) + return IBLND_MSG_QUEUE_SIZE_V1 * 2; + + if (concurrent_sends < IBLND_MSG_QUEUE_SIZE_V1 / 2) + return IBLND_MSG_QUEUE_SIZE_V1 / 2; + } + + return concurrent_sends; +} + static inline void kiblnd_hdev_addref_locked(struct kib_hca_dev *hdev) { diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 68ab7d5..fa5c93a 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -806,6 
+806,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, { struct kib_msg *msg = tx->tx_msg; struct kib_peer_ni *peer_ni = conn->ibc_peer; + struct lnet_ni *ni = peer_ni->ibp_ni; int ver = conn->ibc_version; int rc; int done; @@ -821,7 +822,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(conn->ibc_credits >= 0); LASSERT(conn->ibc_credits <= conn->ibc_queue_depth); - if (conn->ibc_nsends_posted == conn->ibc_queue_depth) { + if (conn->ibc_nsends_posted == kiblnd_concurrent_sends(ver, ni)) { /* tx completions outstanding... */ CDEBUG(D_NET, "%s: posted enough\n", libcfs_nid2str(peer_ni->ibp_nid)); @@ -976,7 +977,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, return; } - LASSERT(conn->ibc_nsends_posted <= conn->ibc_queue_depth); + LASSERT(conn->ibc_nsends_posted <= kiblnd_concurrent_sends(ver, ni)); LASSERT(!IBLND_OOB_CAPABLE(ver) || conn->ibc_noops_posted <= IBLND_OOB_MSGS(ver)); LASSERT(conn->ibc_reserved_credits >= 0); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c index b5df7fe..c9e14ec 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c @@ -109,7 +109,7 @@ static int concurrent_sends; module_param(concurrent_sends, int, 0444); -MODULE_PARM_DESC(concurrent_sends, "send work-queue sizing (obsolete)"); +MODULE_PARM_DESC(concurrent_sends, "send work-queue sizing"); static bool use_fastreg_gaps; module_param(use_fastreg_gaps, bool, 0444); @@ -272,10 +272,33 @@ int kiblnd_tunables_setup(struct lnet_ni *ni) tunables->lnd_peercredits_hiw = peer_credits_hiw; if (tunables->lnd_peercredits_hiw < net_tunables->lct_peer_tx_credits / 2) - tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits / 2; + tunables->lnd_peercredits_hiw = + net_tunables->lct_peer_tx_credits / 2; if (tunables->lnd_peercredits_hiw >= net_tunables->lct_peer_tx_credits) - tunables->lnd_peercredits_hiw = 
net_tunables->lct_peer_tx_credits - 1; + tunables->lnd_peercredits_hiw = + net_tunables->lct_peer_tx_credits - 1; + + if (tunables->lnd_concurrent_sends == 0) + tunables->lnd_concurrent_sends = + net_tunables->lct_peer_tx_credits; + + if (tunables->lnd_concurrent_sends > + net_tunables->lct_peer_tx_credits * 2) + tunables->lnd_concurrent_sends = + net_tunables->lct_peer_tx_credits * 2; + + if (tunables->lnd_concurrent_sends < + net_tunables->lct_peer_tx_credits / 2) + tunables->lnd_concurrent_sends = + net_tunables->lct_peer_tx_credits / 2; + + if (tunables->lnd_concurrent_sends < + net_tunables->lct_peer_tx_credits) { + CWARN("Concurrent sends %d is lower than message queue size: %d, performance may drop slightly.\n", + tunables->lnd_concurrent_sends, + net_tunables->lct_peer_tx_credits); + } if (!tunables->lnd_fmr_pool_size) tunables->lnd_fmr_pool_size = fmr_pool_size; @@ -298,6 +321,7 @@ void kiblnd_tunables_init(void) default_tunables.lnd_version = 0; default_tunables.lnd_peercredits_hiw = peer_credits_hiw; default_tunables.lnd_map_on_demand = map_on_demand; + default_tunables.lnd_concurrent_sends = concurrent_sends; default_tunables.lnd_fmr_pool_size = fmr_pool_size; default_tunables.lnd_fmr_flush_trigger = fmr_flush_trigger; default_tunables.lnd_fmr_cache = fmr_cache; From patchwork Thu Feb 27 21:12:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410085 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 767F6138D for ; Thu, 27 Feb 2020 21:29:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5BC91246A0 for ; Thu, 27 Feb 2020 21:29:52 +0000 
(UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5BC91246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B74ED348E8E; Thu, 27 Feb 2020 13:25:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B42BD21FD9A for ; Thu, 27 Feb 2020 13:19:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6960C2C7A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 67B1F46C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:11 -0500 Message-Id: <1582838290-17243-264-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 263/622] lnet: properly cleanup lnet debugfs files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The function lnet_router_debugfs_remove() is supposed to clean up the lnet-specific debugfs files, but that is not happening at all. 
Change lnet_remove_debugfs() from doing the final lnet and libcfs debugfs cleanup to removing specific debugfs files. Unloading the libcfs module can instead directly remove the entire libcfs and debugfs tree. With this change, lnet_router_debugfs_fini() can call lnet_remove_debugfs(). WC-bug-id: https://jira.whamcloud.com/browse/LU-11986 Lustre-commit: 8cb7ccf54e2d ("LU-11986 lnet: properly cleanup lnet debugfs files") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/34669 Reviewed-by: Sonia Sharma Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/libcfs/libcfs.h | 1 + net/lnet/libcfs/module.c | 16 ++++++++++++---- net/lnet/lnet/router_proc.c | 1 + 3 files changed, 14 insertions(+), 4 deletions(-) diff --git a/include/linux/libcfs/libcfs.h b/include/linux/libcfs/libcfs.h index 33f7477..d3a9754 100644 --- a/include/linux/libcfs/libcfs.h +++ b/include/linux/libcfs/libcfs.h @@ -57,6 +57,7 @@ static inline int notifier_from_ioctl_errno(int err) extern struct workqueue_struct *cfs_rehash_wq; void lnet_insert_debugfs(struct ctl_table *table); +void lnet_remove_debugfs(struct ctl_table *table); /* * Memory diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c index 37a3fee..2e803d6 100644 --- a/net/lnet/libcfs/module.c +++ b/net/lnet/libcfs/module.c @@ -691,12 +691,18 @@ static void lnet_insert_debugfs_links( symlinks->target); } -static void lnet_remove_debugfs(void) +void lnet_remove_debugfs(struct ctl_table *table) { - debugfs_remove_recursive(lnet_debugfs_root); + for (; table && table->procname; table++) { + struct qstr dname = QSTR_INIT(table->procname, + strlen(table->procname)); + struct dentry *dentry; - lnet_debugfs_root = NULL; + dentry = d_hash_and_lookup(lnet_debugfs_root, &dname); + debugfs_remove(dentry); + } } +EXPORT_SYMBOL_GPL(lnet_remove_debugfs); static DEFINE_MUTEX(libcfs_startup); static int libcfs_active; @@ -771,7 +777,9 @@ static void 
libcfs_exit(void) { int rc; - lnet_remove_debugfs(); + /* Remove everthing */ + debugfs_remove_recursive(lnet_debugfs_root); + lnet_debugfs_root = NULL; if (cfs_rehash_wq) destroy_workqueue(cfs_rehash_wq); diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index 45abcfb..8517411 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -936,4 +936,5 @@ void lnet_router_debugfs_init(void) void lnet_router_debugfs_fini(void) { + lnet_remove_debugfs(lnet_table); } From patchwork Thu Feb 27 21:12:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410165 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1152A92A for ; Thu, 27 Feb 2020 21:31:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EDEDF24677 for ; Thu, 27 Feb 2020 21:31:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EDEDF24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 38734349941; Thu, 27 Feb 2020 13:26:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 029CD21FDA3 for ; Thu, 27 Feb 2020 13:19:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6CC7F2C7B; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6AA8146D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:12 -0500 Message-Id: <1582838290-17243-265-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 264/622] lustre: mdc: reset lmm->lmm_stripe_offset in mdc_save_lovea X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Vladimir Saveliev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov To prepare for replay, lmm->lmm_stripe_offset (which contains the layout generation) has to be set to -1 (LOV_OFFSET_DEFAULT) so that it does not confuse lod_verify_v1v3. This fixes the patch ("LU-169 lov: add generation number to LOV EA"), which was part of the original Lustre merge into the Linux kernel. 
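As a rough illustration of the reset described above, the following standalone sketch copies a layout blob and then overwrites the offset field with the all-ones default. The struct, save_layout(), and OFFSET_DEFAULT are hypothetical stand-ins and not the Lustre definitions; only the idea of forcing a 16-bit field to (u16)-1 after the copy is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for LOV_OFFSET_DEFAULT: all ones in a 16-bit field. */
#define OFFSET_DEFAULT ((uint16_t)-1)

/* Hypothetical, simplified layout header; not the real struct lov_user_md. */
struct layout_md {
	uint32_t magic;
	uint16_t stripe_count;
	uint16_t stripe_offset;	/* reused by the server to carry a generation */
};

/* Copy the layout for later replay, then discard the server-provided value. */
static void save_layout(struct layout_md *dst, const struct layout_md *src)
{
	assert(dst && src);
	memcpy(dst, src, sizeof(*dst));
	/* overwrite whatever came back from the server with the default */
	dst->stripe_offset = OFFSET_DEFAULT;
}
```

The overwrite happens after the copy on purpose: every other field is preserved as received, and only the field that doubles as a generation number is normalized.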
WC-bug-id: https://jira.whamcloud.com/browse/LU-12040 Lustre-commit: c872afa36ff5 ("LU-12040 mdc: reset lmm->lmm_stripe_offset in mdc_save_lovea") Signed-off-by: Vladimir Saveliev Signed-off-by: Alexey Lyashkov Cray-bug-id: LUS-7008 Reviewed-on: https://review.whamcloud.com/34371 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_lib.c | 3 ++- fs/lustre/mdc/mdc_locks.c | 8 ++++++-- include/uapi/linux/lustre/lustre_user.h | 1 + 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index 980676a..f0e5a84 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -406,7 +406,8 @@ void mdc_setattr_pack(struct ptlrpc_request *req, struct md_op_data *op_data, lum->lmm_magic = cpu_to_le32(LOV_USER_MAGIC_V1); lum->lmm_stripe_size = 0; lum->lmm_stripe_count = 0; - lum->lmm_stripe_offset = (typeof(lum->lmm_stripe_offset))(-1); + lum->lmm_stripe_offset = + (typeof(lum->lmm_stripe_offset))LOV_OFFSET_DEFAULT; } else { memcpy(lum, ea, ealen); } diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 05447ea..019eb35 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -220,8 +220,8 @@ static int mdc_save_lovea(struct ptlrpc_request *req, void *data, u32 size) { struct req_capsule *pill = &req->rq_pill; + struct lov_user_md *lmm; int rc = 0; - void *lmm; if (req_capsule_get_size(pill, field, RCL_CLIENT) < size) { rc = sptlrpc_cli_enlarge_reqbuf(req, field, size); @@ -237,8 +237,12 @@ static int mdc_save_lovea(struct ptlrpc_request *req, req_capsule_set_size(pill, field, RCL_CLIENT, size); lmm = req_capsule_client_get(pill, field); - if (lmm) + if (lmm) { memcpy(lmm, data, size); + /* overwrite layout generation returned from the MDS */ + lmm->lmm_stripe_offset = + (typeof(lmm->lmm_stripe_offset))LOV_OFFSET_DEFAULT; + } return rc; } diff --git a/include/uapi/linux/lustre/lustre_user.h 
b/include/uapi/linux/lustre/lustre_user.h index 1d402f1..3901eb2 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -404,6 +404,7 @@ struct ll_ioc_lease_id { #define LOV_MAXPOOLNAME 15 #define LOV_POOLNAMEF "%.15s" +#define LOV_OFFSET_DEFAULT ((__u16)-1) #define LOV_MIN_STRIPE_BITS 16 /* maximum PAGE_SIZE (ia64), power of 2 */ #define LOV_MIN_STRIPE_SIZE (1 << LOV_MIN_STRIPE_BITS) From patchwork Thu Feb 27 21:12:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410089 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 317581580 for ; Thu, 27 Feb 2020 21:29:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1A216246A0 for ; Thu, 27 Feb 2020 21:29:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1A216246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1E05E349555; Thu, 27 Feb 2020 13:25:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 46B1721FB9D for ; Thu, 27 Feb 2020 13:19:39 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 
6FD6F2C7C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6DDA8468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:13 -0500 Message-Id: <1582838290-17243-266-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 265/622] lnet: Cleanup lnet_get_rtr_pool_cfg X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The cfs_percpt_for_each loop contains an off-by-one error that causes memory corruption. In addition, the way these loops are nested results in unnecessary iterations. We only need to iterate through the cpts until we match the cpt number passed as an argument. At that point we want to copy the router buffer pools for that cpt. 
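The intended loop shape can be sketched in plain C as follows. This is a simplified model under stated assumptions: the per-CPT pools are represented as a two-dimensional array, locking is omitted, and NCPTS, NRBPOOLS, struct pool, and get_pool_cfg() are hypothetical names rather than the LNet ones.

```c
#include <assert.h>
#include <errno.h>

#define NCPTS    4	/* hypothetical number of CPU partitions */
#define NRBPOOLS 3	/* stand-in for LNET_NRBPOOLS */

/* Hypothetical, reduced pool descriptor. */
struct pool {
	int npages;
	int nbuffers;
};

/*
 * Walk the partitions until the requested one is found, then copy all of
 * its buffer pools. The outer index selects the partition and the inner
 * index selects the pool, so neither index runs past its own bound.
 */
static int get_pool_cfg(struct pool pools[NCPTS][NRBPOOLS], int cpt,
			struct pool out[NRBPOOLS])
{
	int i, j;

	for (i = 0; i < NCPTS; i++) {
		if (i != cpt)
			continue;

		for (j = 0; j < NRBPOOLS; j++)
			out[j] = pools[i][j];
		return 0;
	}

	return -ENOENT;
}
```

Keeping the partition index and the pool index in separate variables is what avoids the out-of-bounds write: the copy never uses the partition counter to index the output array.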
Cray-bug-id: LUS-7240 WC-bug-id: https://jira.whamcloud.com/browse/LU-12152 Lustre-commit: 187117fd94e4 ("LU-12152 lnet: Cleanup lnet_get_rtr_pool_cfg") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/34591 Reviewed-by: James Simmons Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/router.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 66a116c..78a8659 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -549,29 +549,30 @@ static void lnet_shuffle_seed(void) lnet_del_route(LNET_NIDNET(LNET_NID_ANY), LNET_NID_ANY); } -int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) +int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) { + struct lnet_rtrbufpool *rbp; int i, rc = -ENOENT, j; if (!the_lnet.ln_rtrpools) return rc; - for (i = 0; i < LNET_NRBPOOLS; i++) { - struct lnet_rtrbufpool *rbp; - lnet_net_lock(LNET_LOCK_EX); - cfs_percpt_for_each(rbp, j, the_lnet.ln_rtrpools) { - if (i++ != idx) - continue; + cfs_percpt_for_each(rbp, i, the_lnet.ln_rtrpools) { + if (i != cpt) + continue; - pool_cfg->pl_pools[i].pl_npages = rbp[i].rbp_npages; - pool_cfg->pl_pools[i].pl_nbuffers = rbp[i].rbp_nbuffers; - pool_cfg->pl_pools[i].pl_credits = rbp[i].rbp_credits; - pool_cfg->pl_pools[i].pl_mincredits = rbp[i].rbp_mincredits; - rc = 0; - break; + lnet_net_lock(i); + for (j = 0; j < LNET_NRBPOOLS; j++) { + pool_cfg->pl_pools[j].pl_npages = rbp[j].rbp_npages; + pool_cfg->pl_pools[j].pl_nbuffers = rbp[j].rbp_nbuffers; + pool_cfg->pl_pools[j].pl_credits = rbp[j].rbp_credits; + pool_cfg->pl_pools[j].pl_mincredits = + rbp[j].rbp_mincredits; } - lnet_net_unlock(LNET_LOCK_EX); + lnet_net_unlock(i); + rc = 0; + break; } lnet_net_lock(LNET_LOCK_EX); From patchwork Thu Feb 27 21:12:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410169 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BF542138D for ; Thu, 27 Feb 2020 21:31:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A6B3F24677 for ; Thu, 27 Feb 2020 21:31:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A6B3F24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7E7ED348B0E; Thu, 27 Feb 2020 13:27:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 87F5C21FDB4 for ; Thu, 27 Feb 2020 13:19:39 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 735B72C7D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7123346A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:14 -0500 Message-Id: <1582838290-17243-267-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 
266/622] lustre: quota: make overquota flag for old req X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang For an old request that carries the over-quota flag, the flag should still be set at the OSC, because the old request could be processed after the new request at the OST. Setting it does not break quota enforcement at the OST. WC-bug-id: https://jira.whamcloud.com/browse/LU-11678 Lustre-commit: c59cf862c3c0 ("LU-11678 quota: make overquota flag for old req") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/34645 Reviewed-by: Andreas Dilger Reviewed-by: Shilong Wang Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_quota.c | 11 +++++++++-- include/uapi/linux/lustre/lustre_idl.h | 3 +++ 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/lustre/osc/osc_quota.c b/fs/lustre/osc/osc_quota.c index 316e087..8ff803c 100644 --- a/fs/lustre/osc/osc_quota.c +++ b/fs/lustre/osc/osc_quota.c @@ -119,10 +119,17 @@ int osc_quota_setdq(struct client_obd *cli, u64 xid, const unsigned int qid[], return 0; mutex_lock(&cli->cl_quota_mutex); - if (cli->cl_quota_last_xid > xid) + /* still mark the quota is running out for the old request, because it + * could be processed after the new request at OST, the side effect is + * the following request will be processed synchronously, but it will + * not break the quota enforcement. 
+ */ + if (cli->cl_quota_last_xid > xid && !(flags & OBD_FL_NO_QUOTA_ALL)) goto out_unlock; - cli->cl_quota_last_xid = xid; + if (cli->cl_quota_last_xid < xid) + cli->cl_quota_last_xid = xid; + for (type = 0; type < MAXQUOTAS; type++) { struct osc_quota_info *oqi; diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 1b4b018..3a2a093 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -998,6 +998,9 @@ enum obdo_flags { OBD_FL_CKSUM_T10IP4K | OBD_FL_CKSUM_T10CRC512 | OBD_FL_CKSUM_T10CRC4K), + + OBD_FL_NO_QUOTA_ALL = OBD_FL_NO_USRQUOTA | OBD_FL_NO_GRPQUOTA | + OBD_FL_NO_PRJQUOTA, }; /* From patchwork Thu Feb 27 21:12:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410093 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 76C8092A for ; Thu, 27 Feb 2020 21:30:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5F2AE246A0 for ; Thu, 27 Feb 2020 21:30:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5F2AE246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C171B349587; Thu, 27 Feb 2020 13:25:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C91ED21FC7D for ; Thu, 27 Feb 2020 13:19:39 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 759D12C7E; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 73FF346F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:15 -0500 Message-Id: <1582838290-17243-268-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 267/622] lustre: osd: Set max ea size to XATTR_SIZE_MAX X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Lustre currently limits EA size to either ~1 MiB (ldiskfs) or 32K (ZFS). VFS has its own limit, XATTR_SIZE_MAX, which we must respect to interoperate correctly with userspace tools like tar, getattr, and the getxattr() syscall. Set this as the new max EA size for both ldiskfs and ZFS. (The current 32K on ZFS is too small for LOV_MAX_STRIPE_COUNT [2000] files, so needs to be raised regardless.) In order to use this correctly, we have to use the real ea size on the client. The previous code for maximum ea size on the client (KEY_MAX_EASIZE, llite.max_easize) used a calculated value based on number of targets. With one exception, the mdc code already uses the default ea size rather than the max. Default ea size adjusts automatically to the largest size sent by the server. 
The exception is the open code, which uses the max so it never has to resend a layout request. This patch changes it to use default, which means that the first time a very widely striped file is opened, the open will be resent. Add limit checks on client & server so the xattr size limit is honored. WC-bug-id: https://jira.whamcloud.com/browse/LU-11868 Lustre-commit: 3ec712bd183a ("LU-11868 osd: Set max ea size to XATTR_SIZE_MAX") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/34058 Reviewed-by: Andreas Dilger Reviewed-by: Alexandr Boyko Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 7 +++++++ fs/lustre/llite/llite_lib.c | 4 ++++ fs/lustre/lov/lov_obd.c | 5 +---- fs/lustre/mdc/mdc_locks.c | 12 ++++++------ 4 files changed, 18 insertions(+), 10 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 2195f85..687b54b 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -154,6 +154,13 @@ enum obd_cl_sem_lock_class { */ #define OBD_MAX_DEFAULT_EA_SIZE 4096 +/* + * Lustre can handle larger xattrs internally, but we must respect the Linux + * VFS limitation or tools like tar cannot interact with Lustre volumes + * correctly. 
+ */ +#define OBD_MAX_EA_SIZE XATTR_SIZE_MAX + struct mdc_rpc_lock; struct obd_import; struct client_obd { diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 347bdd6..aadde3f 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -663,12 +663,16 @@ int ll_get_max_mdsize(struct ll_sb_info *sbi, int *lmmsize) return rc; } + CDEBUG(D_INFO, "max LOV ea size: %d\n", *lmmsize); + size = sizeof(int); rc = obd_get_info(NULL, sbi->ll_md_exp, sizeof(KEY_MAX_EASIZE), KEY_MAX_EASIZE, &size, lmmsize); if (rc) CERROR("Get max mdsize error rc %d\n", rc); + CDEBUG(D_INFO, "max LMV ea size: %d\n", *lmmsize); + return rc; } diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 240cc6f9..3a90e7e 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -1162,10 +1162,7 @@ static int lov_get_info(const struct lu_env *env, struct obd_export *exp, lov_tgts_getref(obddev); if (KEY_IS(KEY_MAX_EASIZE)) { - u32 max_stripe_count = min_t(u32, ld->ld_active_tgt_count, - LOV_MAX_STRIPE_COUNT); - - *((u32 *)val) = lov_mds_md_size(max_stripe_count, LOV_MAGIC_V3); + *((u32 *)val) = exp->exp_connect_data.ocd_max_easize; } else if (KEY_IS(KEY_DEFAULT_EASIZE)) { u32 def_stripe_count = min_t(u32, ld->ld_default_stripe_count, LOV_MAX_STRIPE_COUNT); diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 019eb35..f6273ef 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -256,12 +256,15 @@ static int mdc_save_lovea(struct ptlrpc_request *req, struct ldlm_intent *lit; const void *lmm = op_data->op_data; u32 lmmsize = op_data->op_data_size; + u32 mdt_md_capsule_size; LIST_HEAD(cancels); int count = 0; enum ldlm_mode mode; int rc; int repsize, repsize_estimate; + mdt_md_capsule_size = obddev->u.cli.cl_default_mds_easize; + it->it_create_mode = (it->it_create_mode & ~S_IFMT) | S_IFREG; /* XXX: openlock is not cancelled for cross-refs. 
*/ @@ -348,7 +351,7 @@ static int mdc_save_lovea(struct ptlrpc_request *req, lmmsize); req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, - obddev->u.cli.cl_max_mds_easize); + mdt_md_capsule_size); req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize); if (!(it->it_op & IT_CREAT) && it->it_op & IT_OPEN && @@ -387,7 +390,7 @@ static int mdc_save_lovea(struct ptlrpc_request *req, lustre_msg_early_size()); /* Estimate free space for DoM files in repbuf */ repsize_estimate = repsize - (req->rq_replen - - obddev->u.cli.cl_max_mds_easize + + mdt_md_capsule_size + sizeof(struct lov_comp_md_v1) + sizeof(struct lov_comp_md_entry_v1) + lov_mds_md_size(0, LOV_MAGIC_V3)); @@ -539,10 +542,7 @@ static int mdc_save_lovea(struct ptlrpc_request *req, lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT); lit->opc = (u64)it->it_op; - if (obddev->u.cli.cl_default_mds_easize > 0) - easize = obddev->u.cli.cl_default_mds_easize; - else - easize = obddev->u.cli.cl_max_mds_easize; + easize = obddev->u.cli.cl_default_mds_easize; /* pack the intended request */ mdc_getattr_pack(req, valid, it->it_flags, op_data, easize); From patchwork Thu Feb 27 21:12:16 2020 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410097 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none)
header.from=infradead.org From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:16 -0500 Message-Id: <1582838290-17243-269-git-send-email-jsimmons@infradead.org> In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 268/622] lustre: lov: Remove unnecessary assert Cc: Lustre Development List From: Patrick Farrell This is asserting on network data from the server, and additionally, the LU-9846 (overstriping) work shows this condition is not a problem if it does somehow occur.
WC-bug-id: https://jira.whamcloud.com/browse/LU-11796 Lustre-commit: 1d7104485119 ("LU-11796 lov: Remove unnecessary assert") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/33882 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/lov/lov_object.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index c6324f4..c04b2ae 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -210,7 +210,6 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev, spin_lock_init(&r0->lo_sub_lock); r0->lo_nr = lse->lsme_stripe_count; - LASSERT(r0->lo_nr <= lov_targets_nr(dev)); r0->lo_sub = kcalloc(r0->lo_nr, sizeof(r0->lo_sub[0]), GFP_KERNEL); From patchwork Thu Feb 27 21:12:17 2020 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410319
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:17 -0500 Message-Id: <1582838290-17243-270-git-send-email-jsimmons@infradead.org> In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 269/622] lnet: o2iblnd: kib_conn leak Cc: Andriy Skulysh , Lustre Development List From: Andriy Skulysh A new tx can be queued while kiblnd_finalise_conn() is aborting txs. The reference held by that new tx then prevents the connection from moving into kib_connd_zombies. Insert any tx queued after IBLND_CONN_DISCONNECTED into the ibc_zombie_txs list and abort it during kiblnd_destroy_conn().
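The idea can be illustrated with a small userspace sketch. This is not the o2iblnd code: every `demo_` name is hypothetical, the lists are plain singly linked lists, and there is no locking. It only assumes that a normally queued tx pins the connection with a reference, while a tx arriving after disconnect is parked on a zombie list without taking one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical states; only their ordering matters for the check below. */
enum demo_conn_state { DEMO_CONN_ESTABLISHED, DEMO_CONN_DISCONNECTED };

struct demo_tx {
	int status;
	struct demo_tx *next;
};

/* Hypothetical stand-in for struct kib_conn. */
struct demo_conn {
	enum demo_conn_state state;
	int refs;                   /* references pinning the connection */
	struct demo_tx *tx_queue;   /* txs waiting to be sent */
	struct demo_tx *zombie_txs; /* txs queued after disconnect */
};

static void demo_queue_tx(struct demo_conn *conn, struct demo_tx *tx)
{
	assert(conn && tx);

	if (conn->state >= DEMO_CONN_DISCONNECTED) {
		/* Too late to send: park the tx and take no reference. */
		tx->status = -ECONNABORTED;
		tx->next = conn->zombie_txs;
		conn->zombie_txs = tx;
		return;
	}

	/* Normal path: the queued tx pins the connection. */
	conn->refs++;
	tx->next = conn->tx_queue;
	conn->tx_queue = tx;
}

/* Drain the zombie list at destroy time; returns how many txs were done. */
static int demo_destroy_conn(struct demo_conn *conn)
{
	int done = 0;

	assert(conn);
	while (conn->zombie_txs) {
		struct demo_tx *tx = conn->zombie_txs;

		conn->zombie_txs = tx->next;
		tx->next = NULL;
		done++;
	}
	return done;
}
```

In this sketch a tx queued after the state change leaves `refs` untouched, so nothing it holds can delay teardown, and it is completed only when `demo_destroy_conn()` drains the list.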
Cray-bug-id: LUS-6412 WC-bug-id: https://jira.whamcloud.com/browse/LU-11756 Lustre-commit: a155c3fca38d ("LU-11756 o2iblnd: kib_conn leak") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/33828 Reviewed-by: Alexey Lyashkov Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 4 ++++ net/lnet/klnds/o2iblnd/o2iblnd.h | 5 ++++- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 21 ++++++++++++++++++--- 3 files changed, 26 insertions(+), 4 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 0e207ef..bb7590f 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -744,6 +744,7 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, INIT_LIST_HEAD(&conn->ibc_tx_queue_rsrvd); INIT_LIST_HEAD(&conn->ibc_tx_queue_nocred); INIT_LIST_HEAD(&conn->ibc_active_txs); + INIT_LIST_HEAD(&conn->ibc_zombie_txs); spin_lock_init(&conn->ibc_lock); conn->ibc_connvars = kzalloc_cpt(sizeof(*conn->ibc_connvars), GFP_NOFS, cpt); @@ -951,6 +952,9 @@ void kiblnd_destroy_conn(struct kib_conn *conn) if (conn->ibc_cq) ib_destroy_cq(conn->ibc_cq); + kiblnd_txlist_done(&conn->ibc_zombie_txs, -ECONNABORTED, + LNET_MSG_STATUS_OK); + if (conn->ibc_rx_pages) kiblnd_unmap_rx_descs(conn); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index baf1006..eb80d5e 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -581,7 +581,9 @@ struct kib_conn { struct list_head ibc_tx_queue_rsrvd; /* sends that need to */ /* reserve an ACK/DONE msg */ struct list_head ibc_active_txs; /* active tx awaiting completion */ - spinlock_t ibc_lock; /* serialise */ + spinlock_t ibc_lock; /* zombie tx awaiting done */ + struct list_head ibc_zombie_txs; + /* serialise */ struct kib_rx *ibc_rxs; /* the rx descs */ struct kib_pages *ibc_rx_pages; /* premapped rx msg pages */ @@ -1005,6 +1007,7 @@ static 
inline unsigned int kiblnd_sg_dma_len(struct ib_device *dev, #define KIBLND_CONN_PARAM(e) ((e)->param.conn.private_data) #define KIBLND_CONN_PARAM_LEN(e) ((e)->param.conn.private_data_len) +void kiblnd_abort_txs(struct kib_conn *conn, struct list_head *txs); void kiblnd_map_rx_descs(struct kib_conn *conn); void kiblnd_unmap_rx_descs(struct kib_conn *conn); void kiblnd_pool_free_node(struct kib_pool *pool, struct list_head *node); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index fa5c93a..a3abbb6 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1211,6 +1211,21 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(!tx->tx_queued); /* not queued for sending already */ LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED); + if (conn->ibc_state >= IBLND_CONN_DISCONNECTED) { + tx->tx_status = -ECONNABORTED; + tx->tx_waiting = 0; + if (tx->tx_conn) { + /* PUT_DONE first attached to conn as a PUT_REQ */ + LASSERT(tx->tx_conn == conn); + LASSERT(tx->tx_msg->ibm_type == IBLND_MSG_PUT_DONE); + tx->tx_conn = NULL; + kiblnd_conn_decref(conn); + } + list_add(&tx->tx_list, &conn->ibc_zombie_txs); + + return; + } + timeout_ns = lnet_get_lnd_timeout() * NSEC_PER_SEC; tx->tx_queued = 1; tx->tx_deadline = ktime_add_ns(ktime_get(), timeout_ns); @@ -2056,7 +2071,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); } -static void +void kiblnd_abort_txs(struct kib_conn *conn, struct list_head *txs) { LIST_HEAD(zombies); @@ -2123,8 +2138,6 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, LASSERT(!in_interrupt()); LASSERT(conn->ibc_state > IBLND_CONN_INIT); - kiblnd_set_conn_state(conn, IBLND_CONN_DISCONNECTED); - /* * abort_receives moves QP state to IB_QPS_ERR. 
This is only required * for connections that didn't get as far as being connected, because @@ -2132,6 +2145,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, */ kiblnd_abort_receives(conn); + kiblnd_set_conn_state(conn, IBLND_CONN_DISCONNECTED); + /* * Complete all tx descs not waiting for sends to complete. * NB we should be safe from RDMA now that the QP has changed state From patchwork Thu Feb 27 21:12:18 2020 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410715
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:18 -0500 Message-Id: <1582838290-17243-271-git-send-email-jsimmons@infradead.org> In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 270/622] lustre: llite: switch to use ll_fsname directly Cc: Wang Shilong , Lustre Development List From: Wang Shilong Many places need the filesystem fsname. Instead of parsing it every time, store it in @sbi once so that @ll_fsname can be used directly wherever it is needed.
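As a rough illustration of the approach, the standalone sketch below derives the fsname once from a profile string by dropping a trailing "-client" suffix and caches it in a fixed-size buffer, in the spirit of the ll_fill_super() hunk in this patch. `demo_sb_info`, `demo_set_fsname()` and `DEMO_MAXFSNAME` are hypothetical names, and the limit of 8 is a placeholder rather than a statement about LUSTRE_MAXFSNAME.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Placeholder limit for this sketch; the patch uses LUSTRE_MAXFSNAME. */
#define DEMO_MAXFSNAME 8

/* Hypothetical stand-in for struct ll_sb_info. */
struct demo_sb_info {
	char fsname[DEMO_MAXFSNAME + 1];
};

/*
 * Derive the fsname from a profile such as "lustre-client" and cache it.
 * Returns 0 on success, or -ENAMETOOLONG if the name does not fit.
 */
static int demo_set_fsname(struct demo_sb_info *sbi, const char *profile)
{
	size_t len;
	const char *ptr;

	assert(sbi && profile);

	len = strlen(profile);
	ptr = strrchr(profile, '-');

	/* Drop a trailing "-client" suffix (7 characters) if present. */
	if (ptr && strcmp(ptr, "-client") == 0)
		len -= 7;

	if (len > DEMO_MAXFSNAME)
		return -ENAMETOOLONG;

	memcpy(sbi->fsname, profile, len);
	sbi->fsname[len] = '\0';
	return 0;
}
```

After one successful call at mount time, later messages can print `sbi->fsname` directly instead of re-parsing the profile on every use.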
WC-bug-id: https://jira.whamcloud.com/browse/LU-12043 Lustre-commit: 506b68a35904 ("LU-12043 llite: switch to use ll_fsname directly") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/34602 Reviewed-by: Andreas Dilger Reviewed-by: Jian Yu Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 7 ++-- fs/lustre/llite/file.c | 42 ++++++++++------------ fs/lustre/llite/lcommon_cl.c | 5 ++- fs/lustre/llite/llite_internal.h | 4 ++- fs/lustre/llite/llite_lib.c | 76 ++++++++++++++-------------------------- fs/lustre/llite/llite_nfs.c | 8 ++--- fs/lustre/llite/lproc_llite.c | 10 +++--- fs/lustre/llite/statahead.c | 10 +++--- fs/lustre/llite/symlink.c | 9 +++-- fs/lustre/llite/vvp_io.c | 4 +-- fs/lustre/llite/xattr.c | 2 +- 11 files changed, 72 insertions(+), 105 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index ef4fa36..8293a01 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -602,8 +602,8 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump, buf = param; /* Get fsname and assume devname to be -MDT0000. 
*/ - ll_get_fsname(inode->i_sb, buf, MTI_NAME_MAXLEN); - strcat(buf, "-MDT0000.lov"); + snprintf(buf, MGS_PARAM_MAXLEN, "%s-MDT0000.lov", + sbi->ll_fsname); buf += strlen(buf); /* Set root stripesize */ @@ -1276,8 +1276,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) rc = ll_get_fid_by_name(inode, filename, namelen, NULL, NULL); if (rc < 0) { CERROR("%s: lookup %.*s failed: rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), namelen, - filename, rc); + sbi->ll_fsname, namelen, filename, rc); goto out_free; } out_free: diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index f5b5eec..0f15ea8 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -135,8 +135,7 @@ static int ll_close_inode_openhandle(struct inode *inode, if (!class_exp2obd(md_exp)) { CERROR("%s: invalid MDC connection handle closing " DFID "\n", - ll_get_fsname(inode->i_sb, NULL, 0), - PFID(&lli->lli_fid)); + ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid)); rc = 0; goto out; } @@ -460,7 +459,7 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, */ if (rnb->rnb_offset + rnb->rnb_len < i_size_read(inode)) { CERROR("%s: server returns off/len %llu/%u < i_size %llu\n", - ll_get_fsname(inode->i_sb, NULL, 0), rnb->rnb_offset, + ll_i2sbi(inode)->ll_fsname, rnb->rnb_offset, rnb->rnb_len, i_size_read(inode)); return; } @@ -486,8 +485,8 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, if (IS_ERR(vmpage)) { CWARN("%s: cannot fill page %lu for "DFID " with data: rc = %li\n", - ll_get_fsname(inode->i_sb, NULL, 0), - index + start, PFID(lu_object_fid(&obj->co_lu)), + ll_i2sbi(inode)->ll_fsname, index + start, + PFID(lu_object_fid(&obj->co_lu)), PTR_ERR(vmpage)); break; } @@ -1080,8 +1079,7 @@ static int ll_lease_och_release(struct inode *inode, struct file *file) rc2 = ll_close_inode_openhandle(inode, och, 0, NULL); if (rc2 < 0) CERROR("%s: error closing file " DFID ": %d\n", - ll_get_fsname(inode->i_sb, 
NULL, 0), - PFID(&ll_i2info(inode)->lli_fid), rc2); + sbi->ll_fsname, PFID(&ll_i2info(inode)->lli_fid), rc2); och = NULL; /* och has been freed in ll_close_inode_openhandle() */ out_release_it: ll_intent_release(&it); @@ -1124,7 +1122,7 @@ static int ll_swap_layouts_close(struct obd_client_handle *och, int rc; CDEBUG(D_INODE, "%s: biased close of file " DFID "\n", - ll_get_fsname(inode->i_sb, NULL, 0), PFID(fid1)); + ll_i2sbi(inode)->ll_fsname, PFID(fid1)); rc = ll_check_swap_layouts_validity(inode, inode2); if (rc < 0) @@ -2293,7 +2291,7 @@ int ll_hsm_release(struct inode *inode) u16 refcheck; CDEBUG(D_INODE, "%s: Releasing file " DFID ".\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(&ll_i2info(inode)->lli_fid)); och = ll_lease_open(inode, NULL, FMODE_WRITE, MDS_OPEN_RELEASE); @@ -2716,6 +2714,7 @@ int ll_file_lock_ahead(struct file *file, struct llapi_lu_ladvise *ladvise) static int ll_ladvise_sanity(struct inode *inode, struct llapi_lu_ladvise *ladvise) { + struct ll_sb_info *sbi = ll_i2sbi(inode); enum lu_ladvise_type advice = ladvise->lla_advice; /* Note the peradvice flags is a 32 bit field, so per advice flags must * be in the first 32 bits of enum ladvise_flags @@ -2728,7 +2727,7 @@ static int ll_ladvise_sanity(struct inode *inode, rc = -EINVAL; CDEBUG(D_VFSTRACE, "%s: advice with value '%d' not recognized, last supported advice is %s (value '%d'): rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), advice, + sbi->ll_fsname, advice, ladvise_names[LU_LADVISE_MAX - 1], LU_LADVISE_MAX - 1, rc); goto out; @@ -2741,7 +2740,7 @@ static int ll_ladvise_sanity(struct inode *inode, rc = -EINVAL; CDEBUG(D_VFSTRACE, "%s: Invalid flags (%x) for %s: rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), flags, + sbi->ll_fsname, flags, ladvise_names[advice], rc); goto out; } @@ -2753,7 +2752,7 @@ static int ll_ladvise_sanity(struct inode *inode, rc = -EINVAL; CDEBUG(D_VFSTRACE, "%s: Invalid mode (%d) for %s: rc = %d\n", - 
ll_get_fsname(inode->i_sb, NULL, 0), + sbi->ll_fsname, ladvise->lla_lockahead_mode, ladvise_names[advice], rc); goto out; @@ -2769,7 +2768,7 @@ static int ll_ladvise_sanity(struct inode *inode, rc = -EINVAL; CDEBUG(D_VFSTRACE, "%s: Invalid flags (%x) for %s: rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), flags, + sbi->ll_fsname, flags, ladvise_names[advice], rc); goto out; } @@ -2777,7 +2776,7 @@ static int ll_ladvise_sanity(struct inode *inode, rc = -EINVAL; CDEBUG(D_VFSTRACE, "%s: Invalid range (%llu to %llu) for %s: rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), + sbi->ll_fsname, ladvise->lla_start, ladvise->lla_end, ladvise_names[advice], rc); goto out; @@ -3970,7 +3969,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, if (le32_to_cpu(lum->lum_stripe_count) > 1 || ll_i2info(child_inode)->lli_lsm_md) { CERROR("%s: MDT doesn't support stripe directory migration!\n", - ll_get_fsname(parent->i_sb, NULL, 0)); + ll_i2sbi(parent)->ll_fsname); rc = -EOPNOTSUPP; goto out_iput; } @@ -3997,7 +3996,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, op_data->op_fid3 = *ll_inode2fid(child_inode); if (!fid_is_sane(&op_data->op_fid3)) { CERROR("%s: migrate %s, but fid " DFID " is insane\n", - ll_get_fsname(parent->i_sb, NULL, 0), name, + ll_i2sbi(parent)->ll_fsname, name, PFID(&op_data->op_fid3)); rc = -EINVAL; goto out_unlock; @@ -4171,7 +4170,7 @@ static int ll_inode_revalidate_fini(struct inode *inode, int rc) } else if (rc != 0) { CDEBUG_LIMIT((rc == -EACCES || rc == -EIDRM) ? D_INFO : D_ERROR, "%s: revalidate FID " DFID " error: rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(ll_inode2fid(inode)), rc); } @@ -4677,8 +4676,7 @@ static int ll_layout_lock_set(struct lustre_handle *lockh, enum ldlm_mode mode, /* wait for IO to complete if it's still being used. 
*/ if (wait_layout) { CDEBUG(D_INODE, "%s: " DFID "(%p) wait for layout reconf\n", - ll_get_fsname(inode->i_sb, NULL, 0), - PFID(&lli->lli_fid), inode); + sbi->ll_fsname, PFID(&lli->lli_fid), inode); memset(&conf, 0, sizeof(conf)); conf.coc_opc = OBJECT_CONF_WAIT; @@ -4689,8 +4687,7 @@ static int ll_layout_lock_set(struct lustre_handle *lockh, enum ldlm_mode mode, CDEBUG(D_INODE, "%s: file=" DFID " waiting layout return: %d.\n", - ll_get_fsname(inode->i_sb, NULL, 0), - PFID(&lli->lli_fid), rc); + sbi->ll_fsname, PFID(&lli->lli_fid), rc); } return rc; } @@ -4727,8 +4724,7 @@ static int ll_layout_intent(struct inode *inode, struct layout_intent *intent) it.it_flags = FMODE_WRITE; LDLM_DEBUG_NOLOCK("%s: requeue layout lock for file " DFID "(%p)", - ll_get_fsname(inode->i_sb, NULL, 0), - PFID(&lli->lli_fid), inode); + sbi->ll_fsname, PFID(&lli->lli_fid), inode); rc = md_intent_lock(sbi->ll_md_exp, op_data, &it, &req, &ll_md_blocking_ast, 0); diff --git a/fs/lustre/llite/lcommon_cl.c b/fs/lustre/llite/lcommon_cl.c index 9ac80e0..3129316 100644 --- a/fs/lustre/llite/lcommon_cl.c +++ b/fs/lustre/llite/lcommon_cl.c @@ -174,8 +174,7 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md) if (!(inode->i_state & I_NEW)) { result = -EIO; CERROR("%s: unexpected not-NEW inode "DFID": rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), PFID(fid), - result); + ll_i2sbi(inode)->ll_fsname, PFID(fid), result); goto out; } @@ -202,7 +201,7 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md) if (result) CERROR("%s: failed to initialize cl_object "DFID": rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), PFID(fid), result); + ll_i2sbi(inode)->ll_fsname, PFID(fid), result); out: cl_env_put(env, &refcheck); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 5a0a5ed..b9478f4d 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -556,6 +556,9 @@ struct ll_sb_info { /* File heat */ unsigned 
int ll_heat_decay_weight; unsigned int ll_heat_period_second; + + /* filesystem fsname */ + char ll_fsname[LUSTRE_MAXFSNAME + 1]; }; #define SBI_DEFAULT_HEAT_DECAY_WEIGHT ((80 * 256 + 50) / 100) @@ -935,7 +938,6 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, u32 mode, u32 opc, void *data); void ll_finish_md_op_data(struct md_op_data *op_data); int ll_get_obd_name(struct inode *inode, unsigned int cmd, unsigned long arg); -char *ll_get_fsname(struct super_block *sb, char *buf, int buflen); void ll_compute_rootsquash_state(struct ll_sb_info *sbi); void ll_open_cleanup(struct super_block *sb, struct ptlrpc_request *open_req); ssize_t ll_copy_user_md(const struct lov_user_md __user *md, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index aadde3f..8e5cf0a 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -586,9 +586,9 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) sb->s_root = d_make_root(root); if (!sb->s_root) { - CERROR("%s: can't make root dentry\n", - ll_get_fsname(sb, NULL, 0)); err = -ENOMEM; + CERROR("%s: can't make root dentry, rc = %d\n", + sbi->ll_fsname, err); goto out_lock_cn_cb; } @@ -614,7 +614,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) sbi->ll_dt_obd->obd_type->typ_name); if (err < 0) { CERROR("%s: could not register %s in llite: rc = %d\n", - dt, ll_get_fsname(sb, NULL, 0), err); + dt, sbi->ll_fsname, err); err = 0; } } @@ -625,7 +625,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) sbi->ll_md_obd->obd_type->typ_name); if (err < 0) { CERROR("%s: could not register %s in llite: rc = %d\n", - md, ll_get_fsname(sb, NULL, 0), err); + md, sbi->ll_fsname, err); err = 0; } } @@ -1004,6 +1004,19 @@ int ll_fill_super(struct super_block *sb) if (ptr && (strcmp(ptr, "-client") == 0)) len -= 7; + if (len > LUSTRE_MAXFSNAME) { + if (unlikely(len >= MAX_OBD_NAME)) + len = 
MAX_OBD_NAME - 1; + strncpy(name, profilenm, len); + name[len] = '\0'; + err = -ENAMETOOLONG; + CERROR("%s: fsname longer than %u characters: rc = %d\n", + name, LUSTRE_MAXFSNAME, err); + goto out_free; + } + strncpy(sbi->ll_fsname, profilenm, len); + sbi->ll_fsname[len] = '\0'; + /* Mount info */ snprintf(name, sizeof(name), "%.*s-%px", len, lsi->lsi_lmd->lmd_profile, sb); @@ -1014,7 +1027,7 @@ int ll_fill_super(struct super_block *sb) err = ll_debugfs_register_super(sb, name); if (err < 0) { CERROR("%s: could not register mountpoint in llite: rc = %d\n", - ll_get_fsname(sb, NULL, 0), err); + sbi->ll_fsname, err); err = 0; } @@ -1208,7 +1221,7 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb, inode = iget_locked(sb, ino); if (!inode) { CERROR("%s: failed get simple inode " DFID ": rc = -ENOENT\n", - ll_get_fsname(sb, NULL, 0), PFID(fid)); + sbi->ll_fsname, PFID(fid)); return ERR_PTR(-ENOENT); } @@ -1252,8 +1265,7 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) LASSERT(lsm); CDEBUG(D_INODE, "%s: "DFID" set dir layout:\n", - ll_get_fsname(inode->i_sb, NULL, 0), - PFID(&lli->lli_fid)); + ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid)); lsm_md_dump(D_INODE, lsm); /* @@ -1322,7 +1334,7 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) if (lsm->lsm_md_layout_version <= lli->lli_lsm_md->lsm_md_layout_version) { CERROR("%s: " DFID " dir layout mismatch:\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid)); lsm_md_dump(D_ERROR, lli->lli_lsm_md); lsm_md_dump(D_ERROR, lsm); @@ -1529,7 +1541,7 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, int rc = 0; CDEBUG(D_VFSTRACE, "%s: setattr inode " DFID "(%p) from %llu to %llu, valid %x, hsm_import %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), PFID(&lli->lli_fid), inode, + ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid), inode, i_size_read(inode), attr->ia_size, attr->ia_valid, hsm_import); if (attr->ia_valid & 
ATTR_SIZE) { @@ -2046,7 +2058,7 @@ void ll_delete_inode(struct inode *inode) LASSERTF(nrpages == 0, "%s: inode="DFID"(%p) nrpages=%lu, see https://jira.whamcloud.com/browse/LU-118\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(ll_inode2fid(inode)), inode, nrpages); ll_clear_inode(inode); @@ -2300,7 +2312,7 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, */ if (!fid_is_sane(&md.body->mbo_fid1)) { CERROR("%s: Fid is insane " DFID "\n", - ll_get_fsname(sb, NULL, 0), + sbi->ll_fsname, PFID(&md.body->mbo_fid1)); rc = -EINVAL; goto out; @@ -2570,40 +2582,6 @@ int ll_get_obd_name(struct inode *inode, unsigned int cmd, unsigned long arg) return 0; } -/** - * Get lustre file system name by @sbi. If @buf is provided(non-NULL), the - * fsname will be returned in this buffer; otherwise, a static buffer will be - * used to store the fsname and returned to caller. - */ -char *ll_get_fsname(struct super_block *sb, char *buf, int buflen) -{ - static char fsname_static[MTI_NAME_MAXLEN]; - struct lustre_sb_info *lsi = s2lsi(sb); - char *ptr; - int len; - - if (!buf) { - /* this means the caller wants to use static buffer - * and it doesn't care about race. Usually this is - * in error reporting path - */ - buf = fsname_static; - buflen = sizeof(fsname_static); - } - - len = strlen(lsi->lsi_lmd->lmd_profile); - ptr = strrchr(lsi->lsi_lmd->lmd_profile, '-'); - if (ptr && (strcmp(ptr, "-client") == 0)) - len -= 7; - - if (unlikely(len >= buflen)) - len = buflen - 1; - strncpy(buf, lsi->lsi_lmd->lmd_profile, len); - buf[len] = '\0'; - - return buf; -} - void ll_dirty_page_discard_warn(struct page *page, int ioret) { char *buf, *path = NULL; @@ -2613,15 +2591,15 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret) /* this can be called inside spin lock so use GFP_ATOMIC. 
*/ buf = (char *)__get_free_page(GFP_ATOMIC); if (buf) { - dentry = d_find_alias(page->mapping->host); + dentry = d_find_alias(inode); if (dentry) path = dentry_path_raw(dentry, buf, PAGE_SIZE); } CDEBUG(D_WARNING, "%s: dirty page discard: %s/fid: " DFID "/%s may get corrupted (rc %d)\n", - ll_get_fsname(page->mapping->host->i_sb, NULL, 0), - s2lsi(page->mapping->host->i_sb)->lsi_lmd->lmd_dev, + ll_i2sbi(inode)->ll_fsname, + s2lsi(inode->i_sb)->lsi_lmd->lmd_dev, PFID(ll_inode2fid(inode)), (path && !IS_ERR(path)) ? path : "", ioret); diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c index de8f707..2ac5ad9 100644 --- a/fs/lustre/llite/llite_nfs.c +++ b/fs/lustre/llite/llite_nfs.c @@ -181,7 +181,7 @@ static int ll_encode_fh(struct inode *inode, u32 *fh, int *plen, struct lustre_nfs_fid *nfs_fid = (void *)fh; CDEBUG(D_INFO, "%s: encoding for (" DFID ") maxlen=%d minlen=%d\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(ll_inode2fid(inode)), *plen, fileid_len); if (*plen < fileid_len) { @@ -298,8 +298,7 @@ int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid) sbi = ll_s2sbi(dir->i_sb); CDEBUG(D_INFO, "%s: getting parent for (" DFID ")\n", - ll_get_fsname(dir->i_sb, NULL, 0), - PFID(ll_inode2fid(dir))); + sbi->ll_fsname, PFID(ll_inode2fid(dir))); rc = ll_get_default_mdsize(sbi, &lmmsize); if (rc != 0) @@ -315,8 +314,7 @@ int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid) ll_finish_md_op_data(op_data); if (rc) { CERROR("%s: failure inode " DFID " get parent: rc = %d\n", - ll_get_fsname(dir->i_sb, NULL, 0), - PFID(ll_inode2fid(dir)), rc); + sbi->ll_fsname, PFID(ll_inode2fid(dir)), rc); return rc; } body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY); diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 596aad8..197c09c 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -523,7 +523,7 @@ static ssize_t 
ll_max_cached_mb_seq_write(struct file *file, if (pages_number < 0 || pages_number > totalram_pages()) { CERROR("%s: can't set max cache more than %lu MB\n", - ll_get_fsname(sb, NULL, 0), + sbi->ll_fsname, totalram_pages() >> (20 - PAGE_SHIFT)); return -ERANGE; } @@ -977,7 +977,7 @@ static int ll_sbi_flags_seq_show(struct seq_file *m, void *v) while (flags != 0) { if (ARRAY_SIZE(str) <= i) { CERROR("%s: Revise array LL_SBI_FLAGS to match sbi flags please.\n", - ll_get_fsname(sb, NULL, 0)); + ll_s2sbi(sb)->ll_fsname); return -EINVAL; } @@ -1273,8 +1273,7 @@ static ssize_t ll_root_squash_seq_write(struct file *file, struct ll_sb_info *sbi = ll_s2sbi(sb); struct root_squash_info *squash = &sbi->ll_squash; - return lprocfs_wr_root_squash(buffer, count, squash, - ll_get_fsname(sb, NULL, 0)); + return lprocfs_wr_root_squash(buffer, count, squash, sbi->ll_fsname); } LPROC_SEQ_FOPS(ll_root_squash); @@ -1309,8 +1308,7 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file, struct root_squash_info *squash = &sbi->ll_squash; int rc; - rc = lprocfs_wr_nosquash_nids(buffer, count, squash, - ll_get_fsname(sb, NULL, 0)); + rc = lprocfs_wr_nosquash_nids(buffer, count, squash, sbi->ll_fsname); if (rc < 0) return rc; diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 1de62b5..7dfb045 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -663,9 +663,8 @@ static void sa_instantiate(struct ll_statahead_info *sai, goto out; CDEBUG(D_READA, "%s: setting %.*s" DFID " l_data to inode %p\n", - ll_get_fsname(child->i_sb, NULL, 0), - entry->se_qstr.len, entry->se_qstr.name, - PFID(ll_inode2fid(child)), child); + ll_i2sbi(dir)->ll_fsname, entry->se_qstr.len, + entry->se_qstr.name, PFID(ll_inode2fid(child)), child); ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, child, it, NULL); entry->se_inode = child; @@ -1270,7 +1269,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry) rc = PTR_ERR(page); CERROR("%s: error reading 
dir " DFID " at %llu: opendir_pid = %u : rc = %d\n", - ll_get_fsname(dir->i_sb, NULL, 0), + ll_i2sbi(dir)->ll_fsname, PFID(ll_inode2fid(dir)), pos, lli->lli_opendir_pid, rc); break; @@ -1472,8 +1471,7 @@ static int revalidate_statahead_dentry(struct inode *dir, /* revalidate, but inode is recreated */ CDEBUG(D_READA, "%s: stale dentry %pd inode " DFID ", statahead inode " DFID "\n", - ll_get_fsname((*dentryp)->d_inode->i_sb, - NULL, 0), + ll_i2sbi(inode)->ll_fsname, *dentryp, PFID(ll_inode2fid((*dentryp)->d_inode)), PFID(ll_inode2fid(inode))); diff --git a/fs/lustre/llite/symlink.c b/fs/lustre/llite/symlink.c index d2922d1..aae449c 100644 --- a/fs/lustre/llite/symlink.c +++ b/fs/lustre/llite/symlink.c @@ -75,7 +75,7 @@ static int ll_readlink_internal(struct inode *inode, if (rc) { if (rc != -ENOENT) CERROR("%s: inode " DFID ": rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(ll_inode2fid(inode)), rc); goto failed; } @@ -90,9 +90,8 @@ static int ll_readlink_internal(struct inode *inode, LASSERT(symlen != 0); if (body->mbo_eadatasize != symlen) { CERROR("%s: inode " DFID ": symlink length %d not expected %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), - PFID(ll_inode2fid(inode)), body->mbo_eadatasize - 1, - symlen - 1); + sbi->ll_fsname, PFID(ll_inode2fid(inode)), + body->mbo_eadatasize - 1, symlen - 1); rc = -EPROTO; goto failed; } @@ -101,7 +100,7 @@ static int ll_readlink_internal(struct inode *inode, if (!*symname || strnlen(*symname, symlen) != symlen - 1) { /* not full/NULL terminated */ CERROR("%s: inode " DFID ": symlink not NULL terminated string of length %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(ll_inode2fid(inode)), symlen - 1); rc = -EPROTO; goto failed; diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index ad4b39e..43f4088 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -1012,7 +1012,7 @@ static int vvp_io_write_start(const struct lu_env 
*env, if (pos + cnt > ll_file_maxbytes(inode)) { CDEBUG(D_INODE, "%s: file " DFID " offset %llu > maxbytes %llu\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(ll_inode2fid(inode)), pos + cnt, ll_file_maxbytes(inode)); return -EFBIG; @@ -1440,7 +1440,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj, result = 0; if (result < 0) CERROR("%s: refresh file layout " DFID " error %d.\n", - ll_get_fsname(inode->i_sb, NULL, 0), + ll_i2sbi(inode)->ll_fsname, PFID(lu_object_fid(&obj->co_lu)), result); } diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index 948aaf6..aa61a5a 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -381,7 +381,7 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, if (rc == -EOPNOTSUPP && type == XATTR_USER_T) { LCONSOLE_INFO( "%s: disabling user_xattr feature because it is not supported on the server: rc = %d\n", - ll_get_fsname(inode->i_sb, NULL, 0), rc); + sbi->ll_fsname, rc); spin_lock(&sbi->ll_lock); sbi->ll_flags &= ~LL_SBI_USER_XATTR; spin_unlock(&sbi->ll_lock); From patchwork Thu Feb 27 21:12:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410323 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BE73292A for ; Thu, 27 Feb 2020 21:35:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A644924677 for ; Thu, 27 Feb 2020 21:35:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A644924677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 779C434A031; Thu, 27 Feb 2020 13:29:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3826721FDF1 for ; Thu, 27 Feb 2020 13:19:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 830ED8A22; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 800A846A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:19 -0500 Message-Id: <1582838290-17243-272-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 271/622] lustre: llite: improve max_readahead console messages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Improve the max_readahead_mb, max_readahead_per_file_mb, and max_read_ahead_whole_mb console error messages to print the parameters properly in MB instead of PAGE_SIZE units, and include the filesystem name and bad parameters in the output. 
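As a rough illustration of the unit handling described above, the standalone sketch below converts a value given in MB to pages and back with a shift of 20 - PAGE_SHIFT, and formats a message in MB rather than in pages. It is not the kernel code: EXAMPLE_PAGE_SHIFT (4 KiB pages is assumed) and every example_* helper are hypothetical names introduced only for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed for illustration only: 4 KiB pages, i.e. a page shift of 12. */
#define EXAMPLE_PAGE_SHIFT 12

/* Bit distance between a count of MB (2^20 bytes) and a count of pages. */
#define EXAMPLE_PAGES_SHIFT (20 - EXAMPLE_PAGE_SHIFT)

/* Hypothetical helper: convert a size in MB to a number of pages. */
static unsigned long example_mb_to_pages(unsigned long mb)
{
	return mb << EXAMPLE_PAGES_SHIFT;
}

/* Hypothetical helper: convert a number of pages back to MB. */
static unsigned long example_pages_to_mb(unsigned long pages)
{
	return pages >> EXAMPLE_PAGES_SHIFT;
}

/*
 * Hypothetical helper: build an error string that names the filesystem
 * and reports both the rejected value and the limit in MB, not in pages.
 */
static int example_format_error(char *buf, size_t len, const char *fsname,
				unsigned long req_pages,
				unsigned long limit_pages)
{
	return snprintf(buf, len,
			"%s: can't set max_readahead_mb=%lu > %luMB",
			fsname, example_pages_to_mb(req_pages),
			example_pages_to_mb(limit_pages));
}
```

Keeping the shift in one place means the comparison can stay in pages while every value shown to the administrator is converted back to MB.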
WC-bug-id: https://jira.whamcloud.com/browse/LU-1095 Lustre-commit: 48a0697d7910 ("LU-1095 llite: improve max_readahead console messages") Signed-off-by: Andreas Dilger Reviewed-on: http://review.whamcloud.com/12399 Reviewed-by: Dmitry Eremin Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin <green@whamcloud.com> Signed-off-by: James Simmons --- fs/lustre/llite/lproc_llite.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 197c09c..cc9f80e 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -346,16 +346,19 @@ static ssize_t max_read_ahead_mb_store(struct kobject *kobj, ll_kset.kobj); int rc; unsigned long pages_number; + int pages_shift; + pages_shift = 20 - PAGE_SHIFT; rc = kstrtoul(buffer, 10, &pages_number); if (rc) return rc; - pages_number *= 1 << (20 - PAGE_SHIFT); /* MB -> pages */ + pages_number <<= pages_shift; /* MB -> pages */ if (pages_number > totalram_pages() / 2) { - CERROR("can't set file readahead more than %lu MB\n", - totalram_pages() >> (20 - PAGE_SHIFT + 1)); /*1/2 of RAM*/ + CERROR("%s: can't set max_readahead_mb=%lu > %luMB\n", + sbi->ll_fsname, pages_number >> pages_shift, + totalram_pages() >> (pages_shift + 1)); /*1/2 of RAM*/ return -ERANGE; } @@ -393,14 +396,20 @@ static ssize_t max_read_ahead_per_file_mb_store(struct kobject *kobj, ll_kset.kobj); int rc; unsigned long pages_number; + int pages_shift; + pages_shift = 20 - PAGE_SHIFT; rc = kstrtoul(buffer, 10, &pages_number); if (rc) return rc; + pages_number <<= pages_shift; /* MB -> pages */ + if (pages_number > sbi->ll_ra_info.ra_max_pages) { - CERROR("can't set file readahead more than max_read_ahead_mb %lu MB\n", - sbi->ll_ra_info.ra_max_pages); + CERROR("%s: can't set max_readahead_per_file_mb=%lu > max_read_ahead_mb=%lu\n", + sbi->ll_fsname, + pages_number >> pages_shift, + sbi->ll_ra_info.ra_max_pages >> pages_shift); return -ERANGE; } @@
-438,17 +447,22 @@ static ssize_t max_read_ahead_whole_mb_store(struct kobject *kobj, ll_kset.kobj); int rc; unsigned long pages_number; + int pages_shift; + pages_shift = 20 - PAGE_SHIFT; rc = kstrtoul(buffer, 10, &pages_number); if (rc) return rc; + pages_number <<= pages_shift; /* MB -> pages */ /* Cap this at the current max readahead window size, the readahead * algorithm does this anyway so it's pointless to set it larger. */ if (pages_number > sbi->ll_ra_info.ra_max_pages_per_file) { - CERROR("can't set max_read_ahead_whole_mb more than max_read_ahead_per_file_mb: %lu\n", - sbi->ll_ra_info.ra_max_pages_per_file >> (20 - PAGE_SHIFT)); + CERROR("%s: can't set max_read_ahead_whole_mb=%lu > max_read_ahead_per_file_mb=%lu\n", + sbi->ll_fsname, + pages_number >> pages_shift, + sbi->ll_ra_info.ra_max_pages_per_file >> pages_shift); return -ERANGE; } From patchwork Thu Feb 27 21:12:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410103 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5379717E0 for ; Thu, 27 Feb 2020 21:30:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3994C246A1 for ; Thu, 27 Feb 2020 21:30:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3994C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4870F348F29; Thu, 27 Feb 2020 13:25:54 -0800 
(PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 957DE21FC75 for ; Thu, 27 Feb 2020 13:19:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 84B388A23; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 836CE46C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:20 -0500 Message-Id: <1582838290-17243-273-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 272/622] lustre: llite: fill copied dentry name's ending char properly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong A dentry name is expected to end with an extra '\0', but dentry_len does not count that extra '\0', so when we copy a dentry name ourselves we must allocate memory for the terminator and fill it in. Otherwise, lu_name_is_valid_2() will access @name[len] to check whether it is '\0', which is an invalid memory access. The check may pass if that byte happens to be '\0' when it is first read; if the byte is later overwritten by someone else, we finally fail the sanity check in mdc_pack_name() and can crash.
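The allocation issue described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: example_name_is_valid() is a hypothetical stand-in for a check that reads name[len], not the real lu_name_is_valid_2(), and malloc()/free() take the place of the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a validity check that reads name[len].
 * The buffer passed in must therefore hold at least len + 1 bytes.
 */
static int example_name_is_valid(const char *name, size_t len)
{
	return name != NULL && len > 0 && name[len] == '\0' &&
	       strlen(name) == len && memchr(name, '/', len) == NULL;
}

/*
 * Copy the first len bytes of src into a freshly allocated buffer that
 * has room for the terminator. Returns NULL on allocation failure; the
 * caller frees the result.
 */
static char *example_copy_name(const char *src, size_t len)
{
	char *name = malloc(len + 1);	/* + 1 for the trailing '\0' */

	if (!name)
		return NULL;
	memcpy(name, src, len);
	name[len] = '\0';		/* name[len] is now safe to read */
	return name;
}
```

With only len bytes allocated, the read of name[len] in the check would fall outside the buffer, which is the situation the commit message describes.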
LustreError: 157839:0:(mdc_lib.c:137:mdc_pack_name()) LBUG Fixes: 2eae6a4 ("lustre: llite: make sure name pack atomic") WC-bug-id: https://jira.whamcloud.com/browse/LU-12169 Lustre-commit: bc9cc327983c ("LU-12169 llite: fill copied dentry name's ending char properly") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/34611 Reviewed-by: Andreas Dilger Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/llite/file.c | 10 ++++++---- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 9ebdcb6..4e956da 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -456,6 +456,7 @@ #define OBD_FAIL_LLITE_CREATE_NODE_PAUSE 0x140c #define OBD_FAIL_LLITE_IMUTEX_SEC 0x140e #define OBD_FAIL_LLITE_IMUTEX_NOSEC 0x140f +#define OBD_FAIL_LLITE_OPEN_BY_NAME 0x1410 #define OBD_FAIL_FID_INDIR 0x1501 #define OBD_FAIL_FID_INLMA 0x1502 diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 0f15ea8..61d53c4 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -513,12 +513,14 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, * if server supports open-by-fid, or file name is invalid, don't pack * name in open request */ - if (!(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_OPEN_BY_FID)) { + if (OBD_FAIL_CHECK(OBD_FAIL_LLITE_OPEN_BY_NAME) || + !(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_OPEN_BY_FID)) { retry: len = de->d_name.len; - name = kmalloc(len, GFP_NOFS); + name = kmalloc(len + 1, GFP_NOFS); if (!name) return -ENOMEM; + /* race here */ spin_lock(&de->d_lock); if (len != de->d_name.len) { @@ -527,12 +529,12 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, goto retry; } memcpy(name, de->d_name.name, len); + name[len] = '\0'; spin_unlock(&de->d_lock); if (!lu_name_is_valid_2(name, len)) { 
kfree(name); - name = NULL; - len = 0; + return -ESTALE; } } From patchwork Thu Feb 27 21:12:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410173 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F060892A for ; Thu, 27 Feb 2020 21:31:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D94E224677 for ; Thu, 27 Feb 2020 21:31:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D94E224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ABAEA348813; Thu, 27 Feb 2020 13:27:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DAC1F21FE0C for ; Thu, 27 Feb 2020 13:19:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 88CFA8A24; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8655D46D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:21 -0500 Message-Id: <1582838290-17243-274-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 273/622] lustre: obd: update udev event handling X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Add a timestamp, which users have requested, so that the time at which a lustre sysfs file changed can be recorded. Second, the PARAM field was created only with the kobject source and parent name, but the sysfs file could be deeper in the lustre sysfs tree. Add handling for deeper sysfs tree paths. WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: b0d162390ad6 ("LU-8066 obd: update udev event handling") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/34624 Reviewed-by: Emoly Liu Reviewed-by: Alex Zhuravlev Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obd_config.c | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c index 4b1848f..97cb8c1 100644 --- a/fs/lustre/obdclass/obd_config.c +++ b/fs/lustre/obdclass/obd_config.c @@ -773,7 +773,7 @@ static int process_param2_config(struct lustre_cfg *lcfg) char *param = lustre_cfg_string(lcfg, 1); struct kobject *kobj = NULL; const char *subsys = param; - char *envp[3]; + char *envp[4]; char *value; size_t len; int rc; @@ -802,7 +802,9 @@ static int process_param2_config(struct lustre_cfg *lcfg) param = strsep(&value, "="); envp[0] = kasprintf(GFP_KERNEL, "PARAM=%s", param); envp[1] = kasprintf(GFP_KERNEL, "SETTING=%s", value); - envp[2] = NULL; + envp[2] = kasprintf(GFP_KERNEL, "TIME=%lld",
ktime_get_real_seconds()); + envp[3] = NULL; rc = kobject_uevent_env(kobj, KOBJ_CHANGE, envp); for (i = 0; i < ARRAY_SIZE(envp); i++) @@ -1128,14 +1130,25 @@ ssize_t class_modify_config(struct lustre_cfg *lcfg, const char *prefix, } if (!attr) { - char *envp[3]; + char *envp[4], *param, *path; - envp[0] = kasprintf(GFP_KERNEL, "PARAM=%s.%s.%.*s", - kobject_name(kobj->parent), - kobject_name(kobj), - (int)keylen, key); + path = kobject_get_path(kobj, GFP_KERNEL); + if (!path) + return -EINVAL; + + /* convert sysfs path to uevent format */ + param = path; + while ((param = strchr(param, '/')) != NULL) + *param = '.'; + + param = strstr(path, "fs.lustre.") + 10; + + envp[0] = kasprintf(GFP_KERNEL, "PARAM=%s.%.*s", + param, (int)keylen, key); envp[1] = kasprintf(GFP_KERNEL, "SETTING=%s", value); - envp[2] = NULL; + envp[2] = kasprintf(GFP_KERNEL, "TIME=%lld", + ktime_get_real_seconds()); + envp[3] = NULL; if (kobject_uevent_env(kobj, KOBJ_CHANGE, envp)) { CERROR("%s: failed to send uevent %s\n", @@ -1144,6 +1157,7 @@ ssize_t class_modify_config(struct lustre_cfg *lcfg, const char *prefix, for (i = 0; i < ARRAY_SIZE(envp); i++) kfree(envp[i]); + kfree(path); } else { count += lustre_attr_store(kobj, attr, value, strlen(value)); From patchwork Thu Feb 27 21:12:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410105 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A459A92A for ; Thu, 27 Feb 2020 21:30:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8D6F8246A0 for ; Thu, 27 Feb 2020 21:30:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 8D6F8246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5ECFC3495FD; Thu, 27 Feb 2020 13:25:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2ABB621FE15 for ; Thu, 27 Feb 2020 13:19:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8A7868A26; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8949146F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:22 -0500 Message-Id: <1582838290-17243-275-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 274/622] lustre: ptlrpc: Bulk assertion fails on -ENOMEM X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh Recalculate rq_mbits on ENOMEM resend if OBD_CONNECT_BULK_MBITS isn't used. 
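A minimal userspace sketch of the recalculation, mirroring the arithmetic in the diff that follows, may make the intent easier to follow. The function name, its parameters, and EXAMPLE_MAX_IOV (an assumed stand-in value for LNET_MAX_IOV) are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumed stand-in for LNET_MAX_IOV; the real value may differ. */
#define EXAMPLE_MAX_IOV 256

/*
 * Hypothetical helper: choose the match bits for a request that is not a
 * replay. If the peer supports BULK_MBITS, or no match bits were assigned
 * yet, the xid is used directly. Otherwise the current match bits are
 * stepped back by (total_md - 1), where total_md is the number of memory
 * descriptors needed to cover iov_count fragments.
 */
static uint64_t example_first_send_mbits(uint64_t xid, uint64_t cur_mbits,
					 int iov_count, int has_bulk_mbits)
{
	int total_md;

	if (has_bulk_mbits || cur_mbits == 0)
		return xid;

	/* Round up: fragments per descriptor is capped at EXAMPLE_MAX_IOV. */
	total_md = (iov_count + EXAMPLE_MAX_IOV - 1) / EXAMPLE_MAX_IOV;
	return cur_mbits - (uint64_t)(total_md - 1);
}
```

For example, with 600 fragments and the assumed limit of 256 per descriptor, three descriptors are needed, so match bits of 102 are stepped back to 100.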
Cray-bug-id: LUS-7159 WC-bug-id: https://jira.whamcloud.com/browse/LU-12218 Lustre-commit: e63a49fa6920 ("LU-12218 ptlrpc: Bulk assertion fails on -ENOMEM") Signed-off-by: Andriy Skulysh Reviewed-by: Alexander Boyko Reviewed-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/34753 Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 0f5aa92..7c243af 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -3182,7 +3182,14 @@ void ptlrpc_set_bulk_mbits(struct ptlrpc_request *req) old_mbits, req->rq_mbits); } else if (!(lustre_msg_get_flags(req->rq_reqmsg) & MSG_REPLAY)) { /* Request being sent first time, use xid as matchbits. */ - req->rq_mbits = req->rq_xid; + if (OCD_HAS_FLAG(&bd->bd_import->imp_connect_data, BULK_MBITS) + || req->rq_mbits == 0) { + req->rq_mbits = req->rq_xid; + } else { + int total_md = (bd->bd_iov_count + LNET_MAX_IOV - 1) / + LNET_MAX_IOV; + req->rq_mbits -= total_md - 1; + } } else { /* * Replay request, xid and matchbits have already been From patchwork Thu Feb 27 21:12:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410109 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 75408138D for ; Thu, 27 Feb 2020 21:30:28 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5C678246A0 for ; Thu, 27 Feb 2020 21:30:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5C678246A0 Authentication-Results: 
mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A5CF034962C; Thu, 27 Feb 2020 13:26:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 712FF21FE20 for ; Thu, 27 Feb 2020 13:19:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8F39B8A27; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8C86B468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:23 -0500 Message-Id: <1582838290-17243-276-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 275/622] lustre: obd: Add overstriping CONNECT flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell This patch reserves the OBD_CONNECT flag for overstriping, and also does some cleanup of OBD_CONNECT flags, putting them in the correct order and adding some missing ones in proc and the wire{test,check} checks. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-9846 Lustre-commit: 5d085745af43 ("LU-9846 obd: Add overstriping CONNECT flag") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/34743 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Signed-off-by: James Simmons --- fs/lustre/include/lustre_export.h | 5 +++++ fs/lustre/llite/llite_lib.c | 6 +++--- fs/lustre/obdclass/lprocfs_status.c | 4 ++-- fs/lustre/ptlrpc/wiretest.c | 4 ++++ include/uapi/linux/lustre/lustre_idl.h | 1 + 5 files changed, 15 insertions(+), 5 deletions(-) diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h index c94efb0..967ce37 100644 --- a/fs/lustre/include/lustre_export.h +++ b/fs/lustre/include/lustre_export.h @@ -264,6 +264,11 @@ static inline int exp_connect_lockahead(struct obd_export *exp) return !!(exp_connect_flags2(exp) & OBD_CONNECT2_LOCKAHEAD); } +static inline int exp_connect_overstriping(struct obd_export *exp) +{ + return !!(exp_connect_flags2(exp) & OBD_CONNECT2_OVERSTRIPING); +} + static inline int exp_connect_flr(struct obd_export *exp) { return !!(exp_connect_flags2(exp) & OBD_CONNECT2_FLR); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 8e5cf0a..fd19035 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -212,10 +212,10 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_GRANT_PARAM | OBD_CONNECT_SHORTIO | OBD_CONNECT_FLAGS2; - data->ocd_connect_flags2 = OBD_CONNECT2_FLR | - OBD_CONNECT2_LOCK_CONVERT | - OBD_CONNECT2_DIR_MIGRATE | + data->ocd_connect_flags2 = OBD_CONNECT2_DIR_MIGRATE | OBD_CONNECT2_SUM_STATFS | + OBD_CONNECT2_FLR | + OBD_CONNECT2_LOCK_CONVERT | OBD_CONNECT2_ARCHIVE_ID_ARRAY | OBD_CONNECT2_LSOM; diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index a7c274a..55057cf 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -114,8 +114,8 
@@ "file_secctx", /* 0x01 */ "lockaheadv2", /* 0x02 */ "dir_migrate", /* 0x04 */ - "unknown", /* 0x08 */ - "unknown", /* 0x10 */ + "sum_statfs", /* 0x08 */ + "overstriping", /* 0x10 */ "flr", /* 0x20 */ "wbc", /* 0x40 */ "lock_convert", /* 0x80 */ diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 4a268f6..fb57def 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1136,6 +1136,10 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_LOCKAHEAD); LASSERTF(OBD_CONNECT2_DIR_MIGRATE == 0x4ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_DIR_MIGRATE); + LASSERTF(OBD_CONNECT2_SUM_STATFS == 0x8ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_SUM_STATFS); + LASSERTF(OBD_CONNECT2_OVERSTRIPING == 0x10ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_OVERSTRIPING); LASSERTF(OBD_CONNECT2_FLR == 0x20ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_FLR); LASSERTF(OBD_CONNECT2_WBC_INTENTS == 0x40ULL, "found 0x%.16llxULL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 3a2a093..bba3a77 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -797,6 +797,7 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir */ #define OBD_CONNECT2_SUM_STATFS 0x8ULL /* MDT return aggregated stats */ +#define OBD_CONNECT2_OVERSTRIPING 0x10ULL /* OST overstriping support */ #define OBD_CONNECT2_FLR 0x20ULL /* FLR support */ #define OBD_CONNECT2_WBC_INTENTS 0x40ULL /* create/unlink/... 
intents * for wbc, also operations From patchwork Thu Feb 27 21:12:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410113 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8815B92A for ; Thu, 27 Feb 2020 21:30:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6FB51246A0 for ; Thu, 27 Feb 2020 21:30:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6FB51246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 236E0349658; Thu, 27 Feb 2020 13:26:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C6B9321FE2C for ; Thu, 27 Feb 2020 13:19:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 90AA68A28; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8F9C446A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:24 -0500 Message-Id: <1582838290-17243-277-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 276/622] lustre: llite, readahead: fix to call ll_ras_enter() properly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong ll_ras_enter() is expected to be called once per syscall. However, with fast read enabled, vvp_io_read_start() is no longer called for every syscall. To fix this problem, move the ll_ras_enter() call to the file read handler. WC-bug-id: https://jira.whamcloud.com/browse/LU-12043 Lustre-commit: 500edcada7e4 ("LU-12043 llite, readahead: fix to call ll_ras_enter() properly") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/34755 Reviewed-by: Patrick Farrell Reviewed-by: Jinshan Xiong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 2 ++ fs/lustre/llite/vvp_io.c | 1 - 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 61d53c4..d059ac7 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1625,6 +1625,8 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) u16 refcheck; ssize_t rc2; + ll_ras_enter(iocb->ki_filp); + result = ll_do_fast_read(iocb, to); if (result < 0 || iov_iter_count(to) == 0) goto out; diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 43f4088..1f82fe6 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -773,7 +773,6 @@ static int vvp_io_read_start(const struct lu_env *env, vio->vui_ra_valid = true; vio->vui_ra_start = cl_index(obj, pos); vio->vui_ra_count =
cl_index(obj, tot + PAGE_SIZE - 1); - ll_ras_enter(file); } /* BUG: 5972 */ From patchwork Thu Feb 27 21:12:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410117 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E495138D for ; Thu, 27 Feb 2020 21:30:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 26F79246A0 for ; Thu, 27 Feb 2020 21:30:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 26F79246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6CBF5348F90; Thu, 27 Feb 2020 13:26:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 15F3021FE30 for ; Thu, 27 Feb 2020 13:19:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 95A428A29; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9272746C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:25 -0500 Message-Id: <1582838290-17243-278-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 277/622] lustre: ptlrpc: ASSERTION (req_transno < next_transno) failed X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh An update request is checked for duplicates by xid in is_req_replayed_by_update(). However xid is unique per client only. It may happen that there are 2 requests with the same xid from different clients. Perform lookup by transno, it is unique per MDT. Cray-bug-id: LUS-6015 WC-bug-id: https://jira.whamcloud.com/browse/LU-11251 Lustre-commit: 53764826b95f ("LU-11251 mdt: ASSERTION (req_transno < next_transno) failed") Signed-off-by: Andriy Skulysh Reviewed-by: Vitaly Fertman Reviewed-by: Alexander Boyko Reviewed-on: https://review.whamcloud.com/33001 Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 3 ++- fs/lustre/ptlrpc/client.c | 11 ++++++++--- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 4e956da..837b68d 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -355,7 +355,8 @@ #define OBD_FAIL_PTLRPC_DROP_BULK 0x51a #define OBD_FAIL_PTLRPC_LONG_REQ_UNLINK 0x51b #define OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK 0x51c -#define OBD_FAIL_PTLRPC_BULK_ATTACH 0x521 +#define OBD_FAIL_PTLRPC_BULK_ATTACH 0x521 +#define OBD_FAIL_PTLRPC_ROUND_XID 0x530 #define OBD_FAIL_PTLRPC_CONNECT_RACE 0x531 #define OBD_FAIL_OBD_PING_NET 0x600 diff --git a/fs/lustre/ptlrpc/client.c 
b/fs/lustre/ptlrpc/client.c index 7c243af..ac16878 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -712,6 +712,8 @@ static inline void ptlrpc_assign_next_xid(struct ptlrpc_request *req) spin_unlock(&req->rq_import->imp_lock); } +static atomic64_t ptlrpc_last_xid; + int ptlrpc_request_bufs_pack(struct ptlrpc_request *request, u32 version, int opcode, char **bufs, struct ptlrpc_cli_ctx *ctx) @@ -761,7 +763,6 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request, ptlrpc_at_set_req_timeout(request); lustre_msg_set_opc(request->rq_reqmsg, opcode); - ptlrpc_assign_next_xid(request); /* Let's setup deadline for req/reply/bulk unlink for opcode. */ if (cfs_fail_val == opcode) { @@ -776,6 +777,11 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request, } else if (CFS_FAIL_CHECK(OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK)) { fail_t = &request->rq_reply_deadline; fail2_t = &request->rq_bulk_deadline; + } else if (CFS_FAIL_CHECK(OBD_FAIL_PTLRPC_ROUND_XID)) { + time64_t now = ktime_get_real_seconds(); + + atomic64_set(&ptlrpc_last_xid, + ((u64)now >> 4) << 24); } if (fail_t) { @@ -791,6 +797,7 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request, msleep(4 * MSEC_PER_SEC); } } + ptlrpc_assign_next_xid(request); return 0; @@ -3085,8 +3092,6 @@ void ptlrpc_abort_set(struct ptlrpc_request_set *set) } } -static atomic64_t ptlrpc_last_xid; - /** * Initialize the XID for the node. 
This is common among all requests on * this node, and only requires the property that it is monotonically From patchwork Thu Feb 27 21:12:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410095 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 070FC17E0 for ; Thu, 27 Feb 2020 21:30:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E3C36246A0 for ; Thu, 27 Feb 2020 21:30:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E3C36246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 42FC534958F; Thu, 27 Feb 2020 13:25:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7127121FE36 for ; Thu, 27 Feb 2020 13:19:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 96F0E8A2A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 95A3246D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:26 -0500 Message-Id: <1582838290-17243-279-git-send-email-jsimmons@infradead.org> 
X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 278/622] lustre: lov: new foreign LOV format X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini This patch introduces a new layout/LOV format that allows an arbitrary external reference to be specified for a file in the Lustre namespace. The new LOV format consists of {newmagic, length, type, flags, string[length]} to be as flexible as possible. A foreign file can be created with the open(O_LOV_DELAY_CREATE) + ioctl(LL_IOC_LOV_SETSTRIPE) operations, and it must remain an empty file until it is removed. A new API method, llapi_file_create_foreign(), has been introduced, and "lfs [[get,set]stripe,find" has been modified to understand the new layout. The idea behind this is to provide Lustre namespace support and layout prefetch/caching under layout protection, for user/external usage. Code has been added for lfsck to handle foreign files, and a new sub-test has been added in sanity-lfsck to verify that lfsck does not break foreign files and that foreign files do not break lfsck. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11376 Lustre-commit: 6a20bdcc608b ("LU-11376 lov: new foreign LOV format") Signed-off-by: Bruno Faccini Reviewed-on: https://review.whamcloud.com/33755 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 12 ++++++- fs/lustre/llite/llite_internal.h | 2 ++ fs/lustre/llite/vvp_io.c | 2 +- fs/lustre/llite/xattr.c | 4 ++- fs/lustre/lov/lov_cl_internal.h | 6 ++++ fs/lustre/lov/lov_ea.c | 63 ++++++++++++++++++++++++++++++--- fs/lustre/lov/lov_internal.h | 19 +++++++--- fs/lustre/lov/lov_object.c | 49 ++++++++++++++++++++++++- fs/lustre/lov/lov_pack.c | 44 ++++++++++++++++++++--- fs/lustre/lov/lov_page.c | 7 ++++ include/uapi/linux/lustre/lustre_idl.h | 1 + include/uapi/linux/lustre/lustre_user.h | 31 ++++++++++++++++ 12 files changed, 222 insertions(+), 18 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index d059ac7..0d7d566 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1827,7 +1827,8 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename, if (lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V1) && lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V3) && - lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_COMP_V1)) { + lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_COMP_V1) && + lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_FOREIGN)) { rc = -EPROTO; goto out; } @@ -1863,6 +1864,15 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename, stripe_count); } else if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_COMP_V1)) { lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmm); + } else if (lmm->lmm_magic == + cpu_to_le32(LOV_MAGIC_FOREIGN)) { + struct lov_foreign_md *lfm; + + lfm = (struct lov_foreign_md *)lmm; + __swab32s(&lfm->lfm_magic); + __swab32s(&lfm->lfm_length); + __swab32s(&lfm->lfm_type); + __swab32s(&lfm->lfm_flags); } } diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 
b9478f4d..9d7345a 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -962,6 +962,8 @@ static inline ssize_t ll_lov_user_md_size(const struct lov_user_md *lum) LOV_USER_MAGIC_SPECIFIC); case LOV_USER_MAGIC_COMP_V1: return ((struct lov_comp_md_v1 *)lum)->lcm_size; + case LOV_USER_MAGIC_FOREIGN: + return foreign_size(lum); } return -EINVAL; } diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 1f82fe6..ee44a18 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -165,7 +165,7 @@ static int vvp_prep_size(const struct lu_env *env, struct cl_object *obj, * --bug 17336 */ loff_t size = i_size_read(inode); - loff_t cur_index = start >> PAGE_SHIFT; + unsigned long cur_index = start >> PAGE_SHIFT; loff_t size_index = (size - 1) >> PAGE_SHIFT; if ((size == 0 && cur_index != 0) || diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index aa61a5a..9707e78 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -453,6 +453,7 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) }; struct lu_env *env; u16 refcheck; + u32 magic; if (!obj) return -ENODATA; @@ -483,7 +484,8 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) * recognizing layout gen as stripe offset when the * file is restored. See LU-2809. 
*/ - if (((struct lov_mds_md *)buf)->lmm_magic == LOV_MAGIC_COMP_V1) + magic = ((struct lov_mds_md *)buf)->lmm_magic; + if (magic == LOV_MAGIC_COMP_V1 || magic == LOV_MAGIC_FOREIGN) goto out_env; ((struct lov_mds_md *)buf)->lmm_layout_gen = 0; diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index e14567d..7b95a00 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -122,6 +122,7 @@ enum lov_layout_type { LLT_EMPTY, /** empty file without body (mknod + truncate) */ LLT_RELEASED, /** file with no objects (data in HSM) */ LLT_COMP, /** support composite layout */ + LLT_FOREIGN, /** foreign layout */ LLT_NR }; @@ -134,6 +135,8 @@ static inline char *llt2str(enum lov_layout_type llt) return "RELEASED"; case LLT_COMP: return "COMPOSITE"; + case LLT_FOREIGN: + return "FOREIGN"; case LLT_NR: LBUG(); } @@ -626,9 +629,12 @@ int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj, struct cl_page *page, pgoff_t index); int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj, struct cl_page *page, pgoff_t index); +int lov_page_init_foreign(const struct lu_env *env, struct cl_object *obj, + struct cl_page *page, pgoff_t index); struct lu_object *lov_object_alloc(const struct lu_env *env, const struct lu_object_header *hdr, struct lu_device *dev); + struct lu_object *lovsub_object_alloc(const struct lu_env *env, const struct lu_object_header *hdr, struct lu_device *dev); diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c index 31a18d0..b7a6d91 100644 --- a/fs/lustre/lov/lov_ea.c +++ b/fs/lustre/lov/lov_ea.c @@ -134,8 +134,12 @@ void lsm_free(struct lov_stripe_md *lsm) unsigned int entry_count = lsm->lsm_entry_count; unsigned int i; - for (i = 0; i < entry_count; i++) - lsme_free(lsm->lsm_entries[i]); + if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) { + kvfree(lsm_foreign(lsm)); + } else { + for (i = 0; i < entry_count; i++) + lsme_free(lsm->lsm_entries[i]); + } kfree(lsm); } @@ 
-513,6 +517,44 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm, .lsm_unpackmd = lsm_unpackmd_comp_md_v1, }; +static struct +lov_stripe_md *lsm_unpackmd_foreign(struct lov_obd *lov, void *buf, + size_t buf_size) +{ + struct lov_foreign_md *lfm = buf; + struct lov_stripe_md *lsm; + size_t lsm_size; + struct lov_stripe_md_entry *lsme; + + lsm_size = offsetof(typeof(*lsm), lsm_entries[1]); + lsm = kzalloc(lsm_size, GFP_NOFS); + if (!lsm) + return ERR_PTR(-ENOMEM); + + atomic_set(&lsm->lsm_refc, 1); + spin_lock_init(&lsm->lsm_lock); + lsm->lsm_magic = le32_to_cpu(lfm->lfm_magic); + lsm->lsm_foreign_size = foreign_size_le(lfm); + + /* alloc for full foreign EA including format fields */ + lsme = kvzalloc(lsm->lsm_foreign_size, GFP_NOFS); + if (!lsme) { + kfree(lsm); + return ERR_PTR(-ENOMEM); + } + + /* copy full foreign EA including format fields */ + memcpy(lsme, buf, lsm->lsm_foreign_size); + + lsm_foreign(lsm) = lsme; + + return lsm; +} + +const struct lsm_operations lsm_foreign_ops = { + .lsm_unpackmd = lsm_unpackmd_foreign, +}; + const struct lsm_operations *lsm_op_find(int magic) { const struct lsm_operations *lsm = NULL; @@ -527,6 +569,9 @@ const struct lsm_operations *lsm_op_find(int magic) case LOV_MAGIC_COMP_V1: lsm = &lsm_comp_md_v1_ops; break; + case LOV_MAGIC_FOREIGN: + lsm = &lsm_foreign_ops; + break; default: CERROR("unrecognized lsm_magic %08x\n", magic); break; @@ -539,12 +584,22 @@ void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm) { int i, j; - CDEBUG(level, - "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, refc: %d, entry: %u, layout_gen %u\n", + CDEBUG_LIMIT(level, + "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, refc: %d, entry: %u, layout_gen %u\n", lsm, POSTID(&lsm->lsm_oi), lsm->lsm_maxbytes, lsm->lsm_magic, atomic_read(&lsm->lsm_refc), lsm->lsm_entry_count, lsm->lsm_layout_gen); + if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) { + struct lov_foreign_md *lfm = (void *)lsm_foreign(lsm); + + 
CDEBUG_LIMIT(level, + "foreign LOV EA, magic %x, length %u, type %x, flags %x, value '%.*s'\n", + lfm->lfm_magic, lfm->lfm_length, lfm->lfm_type, + lfm->lfm_flags, lfm->lfm_length, lfm->lfm_value); + return; + } + for (i = 0; i < lsm->lsm_entry_count; i++) { struct lov_stripe_md_entry *lse = lsm->lsm_entries[i]; diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h index 36586b3..d235abe 100644 --- a/fs/lustre/lov/lov_internal.h +++ b/fs/lustre/lov/lov_internal.h @@ -79,11 +79,15 @@ struct lov_stripe_md { spinlock_t lsm_lock; pid_t lsm_lock_owner; /* debugging */ - /* - * maximum possible file size, might change as OSTs status changes, - * e.g. disconnected, deactivated - */ - loff_t lsm_maxbytes; + union { + /* + * maximum possible file size, might change as OSTs status + * changes, e.g. disconnected, deactivated + */ + loff_t lsm_maxbytes; + /* size of full foreign LOV */ + size_t lsm_foreign_size; + }; struct ost_id lsm_oi; u32 lsm_magic; u32 lsm_layout_gen; @@ -94,6 +98,8 @@ struct lov_stripe_md { struct lov_stripe_md_entry *lsm_entries[]; }; +#define lsm_foreign(lsm) (lsm->lsm_entries[0]) + static inline bool lsme_inited(const struct lov_stripe_md_entry *lsme) { return lsme->lsme_flags & LCME_FL_INIT; @@ -119,6 +125,9 @@ static inline size_t lov_comp_md_size(const struct lov_stripe_md *lsm) return lov_mds_md_size(lsm->lsm_entries[0]->lsme_stripe_count, lsm->lsm_entries[0]->lsme_magic); + if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) + return lsm->lsm_foreign_size; + LASSERT(lsm->lsm_magic == LOV_MAGIC_COMP_V1); size = sizeof(struct lov_comp_md_v1); diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index c04b2ae..7543ef2 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -810,10 +810,25 @@ static int lov_init_released(const struct lu_env *env, return 0; } +static int lov_init_foreign(const struct lu_env *env, + struct lov_device *dev, struct lov_object *lov, + struct lov_stripe_md *lsm, + const struct 
cl_object_conf *conf, + union lov_layout_state *state) +{ + LASSERT(lsm); + LASSERT(lov->lo_type == LLT_FOREIGN); + LASSERT(!lov->lo_lsm); + + lov->lo_lsm = lsm_addref(lsm); + return 0; +} + static int lov_delete_empty(const struct lu_env *env, struct lov_object *lov, union lov_layout_state *state) { - LASSERT(lov->lo_type == LLT_EMPTY || lov->lo_type == LLT_RELEASED); + LASSERT(lov->lo_type == LLT_EMPTY || lov->lo_type == LLT_RELEASED || + lov->lo_type == LLT_FOREIGN); lov_layout_wait(env, lov); return 0; @@ -923,6 +938,23 @@ static int lov_print_released(const struct lu_env *env, void *cookie, return 0; } +static int lov_print_foreign(const struct lu_env *env, void *cookie, + lu_printer_t p, const struct lu_object *o) +{ + struct lov_object *lov = lu2lov(o); + struct lov_stripe_md *lsm = lov->lo_lsm; + + (*p)(env, cookie, + "foreign: %s, lsm{%p 0x%08X %d %u}:\n", + lov->lo_layout_invalid ? "invalid" : "valid", lsm, + lsm->lsm_magic, atomic_read(&lsm->lsm_refc), + lsm->lsm_layout_gen); + (*p)(env, cookie, + "raw_ea_content '%.*s'\n", + (int)lsm->lsm_foreign_size, (char *)lsm_foreign(lsm)); + return 0; +} + /** * Implements cl_object_operations::coo_attr_get() method for an object * without stripes (LLT_EMPTY layout type). 
@@ -1020,6 +1052,16 @@ static int lov_attr_get_composite(const struct lu_env *env, .llo_io_init = lov_io_init_composite, .llo_getattr = lov_attr_get_composite, }, + [LLT_FOREIGN] = { + .llo_init = lov_init_foreign, + .llo_delete = lov_delete_empty, + .llo_fini = lov_fini_released, + .llo_print = lov_print_foreign, + .llo_page_init = lov_page_init_foreign, + .llo_lock_init = lov_lock_init_empty, + .llo_io_init = lov_io_init_empty, + .llo_getattr = lov_attr_get_empty, + }, }; /** @@ -1051,6 +1093,9 @@ static enum lov_layout_type lov_type(struct lov_stripe_md *lsm) lsm->lsm_magic == LOV_MAGIC_COMP_V1) return LLT_COMP; + if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) + return LLT_FOREIGN; + return LLT_EMPTY; } @@ -2141,6 +2186,8 @@ int lov_read_and_clear_async_rc(struct cl_object *clob) } case LLT_RELEASED: case LLT_EMPTY: + /* fall through */ + case LLT_FOREIGN: break; default: LBUG(); diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c index c6dec2d..2b348d3 100644 --- a/fs/lustre/lov/lov_pack.c +++ b/fs/lustre/lov/lov_pack.c @@ -162,6 +162,28 @@ ssize_t lov_lsm_pack_v1v3(const struct lov_stripe_md *lsm, void *buf, return lmm_size; } +ssize_t lov_lsm_pack_foreign(const struct lov_stripe_md *lsm, void *buf, + size_t buf_size) +{ + struct lov_foreign_md *lfm = buf; + size_t lfm_size; + + lfm_size = lsm->lsm_foreign_size; + + if (buf_size == 0) + return lfm_size; + + if (buf_size < lfm_size) + return -ERANGE; + + /* full foreign LOV is already avail in its cache + * no need to translate format fields to little-endian + */ + memcpy(lfm, lsm_foreign(lsm), lsm->lsm_foreign_size); + + return lfm_size; +} + ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf, size_t buf_size) { @@ -177,6 +199,9 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf, if (lsm->lsm_magic == LOV_MAGIC_V1 || lsm->lsm_magic == LOV_MAGIC_V3) return lov_lsm_pack_v1v3(lsm, buf, buf_size); + if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) + return lov_lsm_pack_foreign(lsm, buf, 
buf_size); + lmm_size = lov_comp_md_size(lsm); if (buf_size == 0) return lmm_size; @@ -331,6 +356,7 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj, { /* we use lov_user_md_v3 because it is larger than lov_user_md_v1 */ struct lov_mds_md *lmmk, *lmm; + struct lov_foreign_md *lfm; struct lov_user_md_v1 lum; ssize_t lmm_size, lum_size = 0; static bool printed; @@ -338,7 +364,8 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj, int rc = 0; if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3 && - lsm->lsm_magic != LOV_MAGIC_COMP_V1) { + lsm->lsm_magic != LOV_MAGIC_COMP_V1 && + lsm->lsm_magic != LOV_MAGIC_FOREIGN) { CERROR("bad LSM MAGIC: 0x%08X != 0x%08X nor 0x%08X\n", lsm->lsm_magic, LOV_MAGIC_V1, LOV_MAGIC_V3); rc = -EIO; @@ -374,16 +401,23 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj, lmmk->lmm_stripe_count); } else if (lmmk->lmm_magic == cpu_to_le32(LOV_MAGIC_COMP_V1)) { lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmmk); + } else if (lmmk->lmm_magic == cpu_to_le32(LOV_MAGIC_FOREIGN)) { + lfm = (struct lov_foreign_md *)lmmk; + __swab32s(&lfm->lfm_magic); + __swab32s(&lfm->lfm_length); + __swab32s(&lfm->lfm_type); + __swab32s(&lfm->lfm_flags); } } /* Legacy appication passes limited buffer, we need to figure out * the user buffer size by the passed in lmm_stripe_count. 
*/ - if (copy_from_user(&lum, lump, sizeof(struct lov_user_md_v1))) { - rc = -EFAULT; - goto out_free; - } + if (lsm->lsm_magic != LOV_MAGIC_FOREIGN) + if (copy_from_user(&lum, lump, sizeof(struct lov_user_md_v1))) { + rc = -EFAULT; + goto out_free; + } if (lum.lmm_magic == LOV_USER_MAGIC_V1 || lum.lmm_magic == LOV_USER_MAGIC_V3) diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c index 3f08da7..c3337706 100644 --- a/fs/lustre/lov/lov_page.c +++ b/fs/lustre/lov/lov_page.c @@ -145,6 +145,13 @@ int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj, return 0; } +int lov_page_init_foreign(const struct lu_env *env, struct cl_object *obj, + struct cl_page *page, pgoff_t index) +{ + CDEBUG(D_PAGE, DFID" has no data\n", PFID(lu_object_fid(&obj->co_lu))); + return -ENODATA; +} + bool lov_page_is_empty(const struct cl_page *page) { const struct cl_page_slice *slice = cl_page_at(page, &lov_device_type); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index bba3a77..fd35023 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1022,6 +1022,7 @@ enum obdo_flags { #define LOV_MAGIC_SPECIFIC (0x0BD50000 | LOV_MAGIC_MAGIC) #define LOV_MAGIC LOV_MAGIC_V1 #define LOV_MAGIC_COMP_V1 (0x0BD60000 | LOV_MAGIC_MAGIC) +#define LOV_MAGIC_FOREIGN (0x0BD70000 | LOV_MAGIC_MAGIC) /* * magic for fully defined striping diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 3901eb2..ad5d446 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -56,6 +56,7 @@ # include # include # include /* snprintf() */ +# include # include # include #endif /* __KERNEL__ */ @@ -388,6 +389,7 @@ struct ll_ioc_lease_id { /* 0x0BD40BD0 is occupied by LOV_MAGIC_MIGRATE */ #define LOV_USER_MAGIC_SPECIFIC 0x0BD50BD0 /* for specific OSTs */ #define LOV_USER_MAGIC_COMP_V1 0x0BD60BD0 +#define 
LOV_USER_MAGIC_FOREIGN 0x0BD70BD0 #define LMV_USER_MAGIC 0x0CD30CD0 /*default lmv magic*/ #define LMV_USER_MAGIC_SPECIFIC 0x0CD40CD0 @@ -469,6 +471,21 @@ struct lov_user_md_v3 { /* LOV EA user data (host-endian) */ struct lov_user_ost_data_v1 lmm_objects[0]; /* per-stripe data */ } __packed; +struct lov_foreign_md { + __u32 lfm_magic; /* magic number = LOV_MAGIC_FOREIGN */ + __u32 lfm_length; /* length of lfm_value */ + __u32 lfm_type; /* type, see LOV_FOREIGN_TYPE_ */ + __u32 lfm_flags; /* flags, type specific */ + char lfm_value[]; +}; + +#define foreign_size(lfm) (((struct lov_foreign_md *)lfm)->lfm_length + \ + offsetof(struct lov_foreign_md, lfm_value)) + +#define foreign_size_le(lfm) \ + (le32_to_cpu(((struct lov_foreign_md *)lfm)->lfm_length) + \ + offsetof(struct lov_foreign_md, lfm_value)) + struct lu_extent { __u64 e_start; __u64 e_end; @@ -628,6 +645,20 @@ enum lmv_hash_type { #define LMV_HASH_NAME_ALL_CHARS "all_char" #define LMV_HASH_NAME_FNV_1A_64 "fnv_1a_64" +/** + * LOV foreign types + **/ +#define LOV_FOREIGN_TYPE_NONE 0 +#define LOV_FOREIGN_TYPE_DAOS 0xda05 +#define LOV_FOREIGN_TYPE_UNKNOWN UINT32_MAX + +struct lustre_foreign_type { + uint32_t lft_type; + const char *lft_name; +}; + +extern struct lustre_foreign_type lov_foreign_type[]; + /* * Got this according to how get LOV_MAX_STRIPE_COUNT, see above, * (max buffer size - lmv+rpc header) / sizeof(struct lmv_user_mds_data) From patchwork Thu Feb 27 21:12:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410119 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 38EB892A for ; Thu, 27 Feb 2020 21:30:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No 
client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 21B6720801 for ; Thu, 27 Feb 2020 21:30:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 21B6720801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 35F763496C4; Thu, 27 Feb 2020 13:26:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C871821FE3D for ; Thu, 27 Feb 2020 13:19:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9AFCA8A2B; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 98D5A46F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:27 -0500 Message-Id: <1582838290-17243-280-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 279/622] lustre: lmv: new foreign LMV format X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bruno Faccini This patch introduces a new striping/LMV format that allows an arbitrary external reference to be specified for a dir in the Lustre namespace. The new LMV format consists of {newmagic, length, type, flags, string[length]} to be as flexible as possible. A foreign dir can be created with the ioctl(LL_IOC_LMV_SETDIRSTRIPE) operation, and it must remain an empty dir until it is removed. The idea behind this is to provide Lustre namespace support and striping prefetch/caching under lock protection, for user/external usage. This patch is the LMV/dirs complement of the previous LOV/files change (lustre: lov: new foreign LOV format) and has been rebased on top of it, along with some obvious sharing of common code and simplifications. WC-bug-id: https://jira.whamcloud.com/browse/LU-11376 Lustre-commit: fdad38781ccc ("LU-11376 lmv: new foreign LMV format") Signed-off-by: Bruno Faccini Reviewed-on: https://review.whamcloud.com/34087 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_lmv.h | 7 +++ fs/lustre/include/obd.h | 5 +- fs/lustre/llite/dir.c | 94 +++++++++++++++++++++++++++++---- fs/lustre/llite/file.c | 5 ++ fs/lustre/llite/llite_lib.c | 20 +++++-- fs/lustre/lmv/lmv_intent.c | 14 +++++ fs/lustre/lmv/lmv_obd.c | 50 +++++++++++++++++- fs/lustre/mdc/mdc_request.c | 17 ++++-- fs/lustre/ptlrpc/pack_generic.c | 9 +++- include/uapi/linux/lustre/lustre_idl.h | 11 ++++ include/uapi/linux/lustre/lustre_user.h | 31 +++++++---- 11 files changed, 232 insertions(+), 31 deletions(-) diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index 1246c25..cef315d 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -189,4 +189,11 @@ static 
inline bool lmv_is_known_hash_type(u32 type) (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_ALL_CHARS; } +static inline bool lmv_magic_supported(u32 lum_magic) +{ + return lum_magic == LMV_USER_MAGIC || + lum_magic == LMV_USER_MAGIC_SPECIFIC || + lum_magic == LMV_MAGIC_FOREIGN; +} + #endif diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 687b54b..996211a 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -929,7 +929,10 @@ struct obd_ops { struct lustre_md { struct mdt_body *body; struct lu_buf layout; - struct lmv_stripe_md *lmv; + union { + struct lmv_stripe_md *lmv; + struct lmv_foreign_md *lfm; + }; #ifdef CONFIG_LUSTRE_FS_POSIX_ACL struct posix_acl *posix_acl; #endif diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 8293a01..fd7cd2d 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -346,6 +346,14 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) rc = PTR_ERR(op_data); goto out; } + + /* foreign dirs are browsed out of Lustre */ + if (unlikely(op_data->op_mea1 && + op_data->op_mea1->lsm_md_magic == LMV_MAGIC_FOREIGN)) { + ll_finish_md_op_data(op_data); + return -ENODATA; + } + op_data->op_fid3 = pfid; ctx->pos = pos; @@ -421,14 +429,22 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, }; int err; - if (unlikely(lump->lum_magic != LMV_USER_MAGIC && - lump->lum_magic != LMV_USER_MAGIC_SPECIFIC)) + if (unlikely(!lmv_magic_supported(lump->lum_magic))) return -EINVAL; - CDEBUG(D_VFSTRACE, - "VFS Op:inode=" DFID "(%p) name %s stripe_offset %d, stripe_count: %u\n", - PFID(ll_inode2fid(parent)), parent, dirname, - (int)lump->lum_stripe_offset, lump->lum_stripe_count); + if (lump->lum_magic != LMV_MAGIC_FOREIGN) { + CDEBUG(D_VFSTRACE, + "VFS Op:inode=" DFID "(%p) name %s stripe_offset %d, stripe_count: %u\n", + PFID(ll_inode2fid(parent)), parent, dirname, + (int)lump->lum_stripe_offset, lump->lum_stripe_count); + } else { + struct lmv_foreign_md *lfm = 
(struct lmv_foreign_md *)lump; + + CDEBUG(D_VFSTRACE, + "VFS Op:inode=" DFID "(%p) name %s foreign, length %u, value '%.*s'\n", + PFID(ll_inode2fid(parent)), parent, dirname, + lfm->lfm_length, lfm->lfm_length, lfm->lfm_value); + } if (lump->lum_stripe_count > 1 && !(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_DIR_STRIPE)) @@ -438,8 +454,7 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, !OBD_FAIL_CHECK(OBD_FAIL_LLITE_NO_CHECK_DEAD)) return -ENOENT; - if (lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC) && - lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC_SPECIFIC)) + if (unlikely(!lmv_magic_supported(cpu_to_le32(lump->lum_magic)))) lustre_swab_lmv_user_md(lump); if (!IS_POSIXACL(parent) || !exp_connect_umask(ll_i2mdexp(parent))) @@ -721,6 +736,17 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, } } break; + case LMV_MAGIC_FOREIGN: { + struct lmv_foreign_md *lfm = (struct lmv_foreign_md *)lmm; + + if (cpu_to_le32(LMV_MAGIC_FOREIGN) != LMV_MAGIC_FOREIGN) { + __swab32s(&lfm->lfm_magic); + __swab32s(&lfm->lfm_length); + __swab32s(&lfm->lfm_type); + __swab32s(&lfm->lfm_flags); + } + break; + } default: CERROR("unknown magic: %lX\n", (unsigned long)lmm->lmm_magic); rc = -EPROTO; @@ -1313,9 +1339,24 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) lum = (struct lmv_user_md *)data->ioc_inlbuf2; lumlen = data->ioc_inllen2; - if ((lum->lum_magic != LMV_USER_MAGIC && - lum->lum_magic != LMV_USER_MAGIC_SPECIFIC) || + if (!lmv_magic_supported(lum->lum_magic)) { + CERROR("%s: wrong lum magic %x : rc = %d\n", filename, + lum->lum_magic, -EINVAL); + rc = -EINVAL; + goto lmv_out_free; + } + + if ((lum->lum_magic == LMV_USER_MAGIC || + lum->lum_magic == LMV_USER_MAGIC_SPECIFIC) && lumlen < sizeof(*lum)) { + CERROR("%s: wrong lum size %d for magic %x : rc = %d\n", + filename, lumlen, lum->lum_magic, -EINVAL); + rc = -EINVAL; + goto lmv_out_free; + } + + if (lum->lum_magic == 
LMV_MAGIC_FOREIGN && + lumlen < sizeof(struct lmv_foreign_md)) { CERROR("%s: wrong lum magic %x or size %d: rc = %d\n", filename, lum->lum_magic, lumlen, -EFAULT); rc = -EINVAL; @@ -1447,7 +1488,25 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) goto finish_req; } - stripe_count = lmv_mds_md_stripe_count_get(lmm); + /* if foreign LMV case, fake stripes number */ + if (lmm->lmv_magic == LMV_MAGIC_FOREIGN) { + struct lmv_foreign_md *lfm; + + lfm = (struct lmv_foreign_md *)lmm; + if (lfm->lfm_length < XATTR_SIZE_MAX - + offsetof(typeof(*lfm), lfm_value)) { + u32 size = lfm->lfm_length + + offsetof(typeof(*lfm), lfm_value); + + stripe_count = lmv_foreign_to_md_stripes(size); + } else { + CERROR("invalid %d foreign size returned\n", + lfm->lfm_length); + return -EINVAL; + } + } else { + stripe_count = lmv_mds_md_stripe_count_get(lmm); + } if (max_stripe_count < stripe_count) { lum.lum_stripe_count = stripe_count; if (copy_to_user(ulmv, &lum, sizeof(lum))) { @@ -1458,6 +1517,19 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) goto finish_req; } + /* enough room on user side and foreign case */ + if (lmm->lmv_magic == LMV_MAGIC_FOREIGN) { + struct lmv_foreign_md *lfm; + u32 size; + + lfm = (struct lmv_foreign_md *)lmm; + size = lfm->lfm_length + + offsetof(struct lmv_foreign_md, lfm_value); + if (copy_to_user(ulmv, lfm, size)) + rc = -EFAULT; + goto finish_req; + } + lum_size = lmv_user_md_size(stripe_count, LMV_USER_MAGIC_SPECIFIC); tmp = kzalloc(lum_size, GFP_NOFS); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 0d7d566..76d3b4c 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4249,6 +4249,11 @@ static int ll_merge_md_attr(struct inode *inode) int rc; LASSERT(lli->lli_lsm_md); + + /* foreign dir is not striped dir */ + if (lli->lli_lsm_md->lsm_md_magic == LMV_MAGIC_FOREIGN) + return 0; + down_read(&lli->lli_lsm_sem); rc = md_merge_attr(ll_i2mdexp(inode), 
ll_i2info(inode)->lli_lsm_md, &attr, ll_md_blocking_ast); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index fd19035..21825251 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1329,8 +1329,12 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) /* * if dir layout mismatch, check whether version is increased, which * means layout is changed, this happens in dir migration and lfsck. + * + * foreign LMV should not change. */ - if (lli->lli_lsm_md && !lsm_md_eq(lli->lli_lsm_md, lsm)) { + if (lli->lli_lsm_md && + lli->lli_lsm_md->lsm_md_magic != LMV_MAGIC_FOREIGN && + !lsm_md_eq(lli->lli_lsm_md, lsm)) { if (lsm->lsm_md_layout_version <= lli->lli_lsm_md->lsm_md_layout_version) { CERROR("%s: " DFID " dir layout mismatch:\n", @@ -1352,6 +1356,16 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) if (!lli->lli_lsm_md) { struct cl_attr *attr; + if (lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) { + /* set md->lmv to NULL, so the following free lustre_md + * will not free this lsm + */ + md->lmv = NULL; + lli->lli_lsm_md = lsm; + up_write(&lli->lli_lsm_sem); + return 0; + } + rc = ll_init_lsm_md(inode, md); up_write(&lli->lli_lsm_sem); if (rc) @@ -2297,7 +2311,7 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, rc = md_get_lustre_md(sbi->ll_md_exp, req, sbi->ll_dt_exp, sbi->ll_md_exp, &md); if (rc) - goto cleanup; + goto out; if (*inode) { rc = ll_update_inode(*inode, &md); @@ -2365,8 +2379,8 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, } out: + /* cleanup will be done if necessary */ md_free_lustre_md(sbi->ll_md_exp, &md); -cleanup: if (rc != 0 && it && it->it_op & IT_OPEN) ll_open_cleanup(sb ? 
sb : (*inode)->i_sb, req); diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 45f1ac5..84a21a0 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -276,6 +276,11 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, u64 flags = it->it_flags; int rc; + /* do not allow file creation in foreign dir */ + if ((it->it_op & IT_CREAT) && op_data->op_mea1 && + op_data->op_mea1->lsm_md_magic == LMV_MAGIC_FOREIGN) + return -ENODATA; + if ((it->it_op & IT_CREAT) && !(flags & MDS_OPEN_BY_FID)) { /* don't allow create under dir with bad hash */ if (lmv_is_dir_bad_hash(op_data->op_mea1)) @@ -426,6 +431,15 @@ static int lmv_intent_lookup(struct obd_export *exp, struct mdt_body *body; int rc; + /* foreign dir is not striped */ + if (op_data->op_mea1 && + op_data->op_mea1->lsm_md_magic == LMV_MAGIC_FOREIGN) { + /* only allow getattr/lookup for itself */ + if (op_data->op_name) + return -ENODATA; + return 0; + } + retry: tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); if (IS_ERR(tgt)) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 9f3d6de..dc4bd1e 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1166,15 +1166,22 @@ static int lmv_placement_policy(struct obd_device *obd, * 2. Then check if there is default stripe offset. * 3. Finally choose MDS by name hash if the parent * is striped directory. (see lmv_locate_tgt()). + * + * presently explicit MDT location is not supported + * for foreign dirs (as it can't be embedded into free + * format LMV, like with lum_stripe_offset), so we only + * rely on default stripe offset or then name hashing. 
*/ if (op_data->op_cli_flags & CLI_SET_MEA && lum && + le32_to_cpu(lum->lum_magic != LMV_MAGIC_FOREIGN) && le32_to_cpu(lum->lum_stripe_offset) != (u32)-1) { *mds = le32_to_cpu(lum->lum_stripe_offset); } else if (op_data->op_default_stripe_offset != (u32)-1) { *mds = op_data->op_default_stripe_offset; op_data->op_mds = *mds; /* Correct the stripe offset in lum */ - if (lum) + if (lum && + le32_to_cpu(lum->lum_magic != LMV_MAGIC_FOREIGN)) lum->lum_stripe_offset = cpu_to_le32(*mds); } else { *mds = op_data->op_mds; @@ -1606,6 +1613,10 @@ struct lmv_tgt_desc* struct lmv_oinfo *oinfo; struct lmv_tgt_desc *tgt; + /* foreign dir is not striped dir */ + if (lsm && lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) + return ERR_PTR(-ENODATA); + /* * During creating VOLATILE file, it should honor the mdt * index if the file under striped dir is being restored, see @@ -2657,6 +2668,10 @@ static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data, struct lmv_tgt_desc *tgt; if (unlikely(lsm)) { + /* foreign dir is not striped dir */ + if (lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) + return -ENODATA; + return lmv_striped_read_page(exp, op_data, cb_op, offset, ppage); } @@ -2962,6 +2977,16 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, /* Free memmd */ if (lsm && !lmm) { int i; + struct lmv_foreign_md *lfm = (struct lmv_foreign_md *)lsm; + + if (lfm->lfm_magic == LMV_MAGIC_FOREIGN) { + size_t lfm_size; + + lfm_size = lfm->lfm_length + offsetof(typeof(*lfm), + lfm_value[0]); + kvfree(lfm); + return 0; + } for (i = 0; i < lsm->lsm_md_stripe_count; i++) iput(lsm->lsm_md_oinfo[i].lmo_root); @@ -2971,6 +2996,25 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, return 0; } + /* foreign lmv case */ + if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_FOREIGN) { + struct lmv_foreign_md *lfm = (struct lmv_foreign_md *)lsm; + + if (!lfm) { + lfm = kvzalloc(lmm_size, GFP_NOFS); + if (!lfm) + return -ENOMEM; + *lsmp = (struct 
lmv_stripe_md *)lfm; + } + lfm->lfm_magic = le32_to_cpu(lmm->lmv_foreign_md.lfm_magic); + lfm->lfm_length = le32_to_cpu(lmm->lmv_foreign_md.lfm_length); + lfm->lfm_type = le32_to_cpu(lmm->lmv_foreign_md.lfm_type); + lfm->lfm_flags = le32_to_cpu(lmm->lmv_foreign_md.lfm_flags); + memcpy(&lfm->lfm_value, &lmm->lmv_foreign_md.lfm_value, + lfm->lfm_length); + return lmm_size; + } + if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_STRIPE) return -EPERM; @@ -3279,6 +3323,10 @@ static int lmv_merge_attr(struct obd_export *exp, { int rc, i; + /* foreign dir is not striped dir */ + if (lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) + return 0; + rc = lmv_revalidate_slaves(exp, lsm, cb_blocking, 0); if (rc < 0) return rc; diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 5931bc1..57da3c3 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -613,11 +613,18 @@ static int mdc_get_lustre_md(struct obd_export *exp, goto out; if (rc < (typeof(rc))sizeof(*md->lmv)) { - CDEBUG(D_INFO, - "size too small: rc < sizeof(*md->lmv) (%d < %d)\n", - rc, (int)sizeof(*md->lmv)); - rc = -EPROTO; - goto out; + struct lmv_foreign_md *lfm = md->lfm; + + /* short (< sizeof(struct lmv_stripe_md)) + * foreign LMV case + */ + if (lfm->lfm_magic != LMV_MAGIC_FOREIGN) { + CDEBUG(D_INFO, + "size too small: rc < sizeof(*md->lmv) (%d < %d)\n", + rc, (int)sizeof(*md->lmv)); + rc = -EPROTO; + goto out; + } } } } diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 231cb26..a4f28f3 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1974,8 +1974,15 @@ void lustre_swab_lmv_user_md_objects(struct lmv_user_mds_data *lmd, void lustre_swab_lmv_user_md(struct lmv_user_md *lum) { - u32 count = lum->lum_stripe_count; + u32 count; + if (lum->lum_magic == LMV_MAGIC_FOREIGN) { + __swab32s(&lum->lum_magic); + __swab32s(&((struct lmv_foreign_md *)lum)->lfm_length); + return; + } + + count = lum->lum_stripe_count; 
__swab32s(&lum->lum_magic); __swab32s(&lum->lum_stripe_count); __swab32s(&lum->lum_stripe_offset); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index fd35023..f7ea744 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1976,11 +1976,21 @@ struct lmv_mds_md_v1 { struct lu_fid lmv_stripe_fids[0]; /* FIDs for each stripe */ }; +/* foreign LMV EA */ +struct lmv_foreign_md { + __u32 lfm_magic; /* magic number = LMV_MAGIC_FOREIGN */ + __u32 lfm_length; /* length of lfm_value */ + __u32 lfm_type; /* type, see LU_FOREIGN_TYPE_ */ + __u32 lfm_flags; /* flags, type specific */ + char lfm_value[]; /* free format value */ +}; + #define LMV_MAGIC_V1 0x0CD20CD0 /* normal stripe lmv magic */ #define LMV_MAGIC LMV_MAGIC_V1 /* #define LMV_USER_MAGIC 0x0CD30CD0 */ #define LMV_MAGIC_STRIPE 0x0CD40CD0 /* magic for dir sub_stripe */ +#define LMV_MAGIC_FOREIGN 0x0CD50CD0 /* magic for lmv foreign */ /* *Right now only the lower part(0-16bits) of lmv_hash_type is being used, @@ -2025,6 +2035,7 @@ static inline __u64 lustre_hash_fnv_1a_64(const void *buf, size_t size) __u32 lmv_magic; struct lmv_mds_md_v1 lmv_md_v1; struct lmv_user_md lmv_user_md; + struct lmv_foreign_md lmv_foreign_md; }; static inline ssize_t lmv_mds_md_size(int stripe_count, unsigned int lmm_magic) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index ad5d446..03ec680 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -474,7 +474,7 @@ struct lov_user_md_v3 { /* LOV EA user data (host-endian) */ struct lov_foreign_md { __u32 lfm_magic; /* magic number = LOV_MAGIC_FOREIGN */ __u32 lfm_length; /* length of lfm_value */ - __u32 lfm_type; /* type, see LOV_FOREIGN_TYPE_ */ + __u32 lfm_type; /* type, see LU_FOREIGN_TYPE_ */ __u32 lfm_flags; /* flags, type specific */ char lfm_value[]; }; @@ -645,19 +645,22 @@ enum lmv_hash_type { 
#define LMV_HASH_NAME_ALL_CHARS "all_char" #define LMV_HASH_NAME_FNV_1A_64 "fnv_1a_64" -/** - * LOV foreign types - **/ -#define LOV_FOREIGN_TYPE_NONE 0 -#define LOV_FOREIGN_TYPE_DAOS 0xda05 -#define LOV_FOREIGN_TYPE_UNKNOWN UINT32_MAX - struct lustre_foreign_type { uint32_t lft_type; const char *lft_name; }; -extern struct lustre_foreign_type lov_foreign_type[]; +/** + * LOV/LMV foreign types + **/ +enum lustre_foreign_types { + LU_FOREIGN_TYPE_NONE = 0, + LU_FOREIGN_TYPE_DAOS = 0xda05, + /* must be the max/last one */ + LU_FOREIGN_TYPE_UNKNOWN = 0xffffffff, +}; + +extern struct lustre_foreign_type lu_foreign_types[]; /* * Got this according to how get LOV_MAX_STRIPE_COUNT, see above, @@ -678,6 +681,16 @@ struct lmv_user_md_v1 { struct lmv_user_mds_data lum_objects[0]; } __packed; +static inline __u32 lmv_foreign_to_md_stripes(__u32 size) +{ + if (size <= sizeof(struct lmv_user_md)) + return 0; + + size -= sizeof(struct lmv_user_md); + return (size + sizeof(struct lmv_user_mds_data) - 1) / + sizeof(struct lmv_user_mds_data); +} + static inline int lmv_user_md_size(int stripes, int lmm_magic) { int size = sizeof(struct lmv_user_md); From patchwork Thu Feb 27 21:12:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410293 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3DE6138D for ; Thu, 27 Feb 2020 21:34:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CC96D24677 for ; Thu, 27 Feb 2020 21:34:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CC96D24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 66EE321FF11; Thu, 27 Feb 2020 13:28:57 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2A22521FE4C for ; Thu, 27 Feb 2020 13:19:44 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9DF1A8A2C; Thu, 27 Feb 2020 16:18:16 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9BD41468; Thu, 27 Feb 2020 16:18:16 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:12:28 -0500
Message-Id: <1582838290-17243-281-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 280/622] lustre: obd: replace class_uuid with linux kernel version.
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
Cc: James Simmons, Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

We can replace the custom Lustre class_uuid_t with the Linux kernel's UUID handling.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11803 Lustre-commit: 604c266a175b ("LU-11803 obd: replace class_uuid with linux kernel version.") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/33916 Reviewed-by: Petros Koutoupis Reviewed-by: Ben Evans Reviewed-by: Yang Sheng Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 10 ---------- fs/lustre/llite/llite_lib.c | 23 +++++++++++++---------- fs/lustre/obdclass/obd_mount.c | 8 +++++--- 3 files changed, 18 insertions(+), 23 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 6cddc4f..a142d6e 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1672,13 +1672,6 @@ struct lwp_register_item { /* obd_mount.c */ int lustre_check_exclusion(struct super_block *sb, char *svname); -typedef u8 class_uuid_t[16]; - -static inline void class_uuid_unparse(class_uuid_t uu, struct obd_uuid *out) -{ - sprintf(out->uuid, "%pU", uu); -} - /* lustre_peer.c */ int lustre_uuid_to_peer(const char *uuid, lnet_nid_t *peer_nid, int index); int class_add_uuid(const char *uuid, u64 nid); @@ -1689,9 +1682,6 @@ static inline void class_uuid_unparse(class_uuid_t uu, struct obd_uuid *out) extern char obd_jobid_name[]; int class_procfs_init(void); int class_procfs_clean(void); -/* prng.c */ -#define ll_generate_random_uuid(uuid_out) \ - get_random_bytes(uuid_out, sizeof(class_uuid_t)) /* statfs_pack.c */ struct kstatfs; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 21825251..99cedcf 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -38,9 +38,11 @@ #define DEBUG_SUBSYSTEM S_LLITE #include +#include #include #include #include +#include #include #include #include @@ -69,7 +71,6 @@ static struct ll_sb_info *ll_init_sbi(void) unsigned long pages; unsigned long lru_page_max; struct sysinfo si; - class_uuid_t uuid; int i; sbi = 
kzalloc(sizeof(*sbi), GFP_NOFS); @@ -97,11 +98,6 @@ static struct ll_sb_info *ll_init_sbi(void) sbi->ll_ra_info.ra_max_pages = sbi->ll_ra_info.ra_max_pages_per_file; sbi->ll_ra_info.ra_max_read_ahead_whole_pages = -1; - ll_generate_random_uuid(uuid); - sprintf(sbi->ll_sb_uuid.uuid, "%pU", uuid); - - CDEBUG(D_CONFIG, "generated uuid: %s\n", sbi->ll_sb_uuid.uuid); - sbi->ll_flags |= LL_SBI_VERBOSE; sbi->ll_flags |= LL_SBI_CHECKSUM; sbi->ll_flags |= LL_SBI_FLOCK; @@ -965,6 +961,7 @@ int ll_fill_super(struct super_block *sb) char *profilenm = get_profile_name(sb); struct config_llog_instance *cfg; char name[MAX_OBD_NAME]; + uuid_t uuid; char *ptr; int len; int err; @@ -991,13 +988,15 @@ int ll_fill_super(struct super_block *sb) if (err) goto out_free; - err = super_setup_bdi_name(sb, "lustre-%p", sb); - if (err) - goto out_free; - /* kernel >= 2.6.38 store dentry operations in sb->s_d_op. */ sb->s_d_op = &ll_d_ops; + /* UUID handling */ + generate_random_uuid(uuid.b); + snprintf(sbi->ll_sb_uuid.uuid, UUID_SIZE, "%pU", uuid.b); + + CDEBUG(D_CONFIG, "llite sb uuid: %s\n", sbi->ll_sb_uuid.uuid); + /* Get fsname */ len = strlen(lsi->lsi_lmd->lmd_profile); ptr = strrchr(lsi->lsi_lmd->lmd_profile, '-'); @@ -1021,6 +1020,10 @@ int ll_fill_super(struct super_block *sb) snprintf(name, sizeof(name), "%.*s-%px", len, lsi->lsi_lmd->lmd_profile, sb); + err = super_setup_bdi_name(sb, "%s", name); + if (err) + goto out_free; + /* Call ll_debugsfs_register_super() before lustre_process_log() * so that "llite.*.*" params can be processed correctly. 
*/ diff --git a/fs/lustre/obdclass/obd_mount.c b/fs/lustre/obdclass/obd_mount.c index 6c68bc7..31f2f5b 100644 --- a/fs/lustre/obdclass/obd_mount.c +++ b/fs/lustre/obdclass/obd_mount.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include #include #include @@ -216,7 +218,7 @@ int lustre_start_mgc(struct super_block *sb) struct obd_device *obd; struct obd_export *exp; struct obd_uuid *uuid = NULL; - class_uuid_t uuidc; + uuid_t uuidc; lnet_nid_t nid; char nidstr[LNET_NIDSTR_SIZE]; char *mgcname = NULL, *niduuid = NULL, *mgssec = NULL; @@ -336,8 +338,8 @@ int lustre_start_mgc(struct super_block *sb) goto out_free; } - ll_generate_random_uuid(uuidc); - sprintf(uuid->uuid, "%pU", uuidc); + generate_random_uuid(uuidc.b); + snprintf(uuid->uuid, UUID_SIZE, "%pU", uuidc.b); /* Start the MGC */ rc = lustre_start_simple(mgcname, LUSTRE_MGC_NAME, From patchwork Thu Feb 27 21:12:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410769 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49F4417E0 for ; Thu, 27 Feb 2020 21:46:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 305D2246A1 for ; Thu, 27 Feb 2020 21:46:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 305D2246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6F05034A141; Thu, 27 Feb 2020 13:36:47 
-0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A23DD21FA75 for ; Thu, 27 Feb 2020 13:19:44 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9FD948A2D; Thu, 27 Feb 2020 16:18:16 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9EA3A46A; Thu, 27 Feb 2020 16:18:16 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:12:29 -0500
Message-Id: <1582838290-17243-282-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 281/622] lustre: ptlrpc: Fix style issues for sec_null.c
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ptlrpc/sec_null.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 7d00fbae100b ("LU-6142 ptlrpc: Fix style issues for sec_null.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34549 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/ptlrpc/sec_null.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/lustre/ptlrpc/sec_null.c b/fs/lustre/ptlrpc/sec_null.c index 3c7fb68..2eaa788 100644 --- a/fs/lustre/ptlrpc/sec_null.c +++ b/fs/lustre/ptlrpc/sec_null.c @@ -101,6 +101,7 @@ int null_ctx_verify(struct ptlrpc_cli_ctx *ctx, struct ptlrpc_request *req) if (req->rq_early) { cksums = lustre_msg_get_cksum(req->rq_repdata); cksumc = lustre_msg_calc_cksum(req->rq_repmsg); + if (cksumc != cksums) { CDEBUG(D_SEC, "early reply checksum mismatch: %08x != %08x\n", @@ -119,7 +120,8 @@ struct ptlrpc_sec *null_create_sec(struct obd_import *imp, { LASSERT(SPTLRPC_FLVR_POLICY(sf->sf_rpc) == SPTLRPC_POLICY_NULL); - /* general layer has take a module reference for us, because we never + /* + * general layer has take a module reference for us, because we never * really destroy the sec, simply release the reference here. 
*/ sptlrpc_policy_put(&null_policy); @@ -142,9 +144,8 @@ struct ptlrpc_cli_ctx *null_lookup_ctx(struct ptlrpc_sec *sec, } static -int null_flush_ctx_cache(struct ptlrpc_sec *sec, - uid_t uid, - int grace, int force) +int null_flush_ctx_cache(struct ptlrpc_sec *sec, uid_t uid, int grace, + int force) { return 0; } @@ -250,7 +251,8 @@ int null_enlarge_reqbuf(struct ptlrpc_sec *sec, if (!newbuf) return -ENOMEM; - /* Must lock this, so that otherwise unprotected change of + /* + * Must lock this, so that otherwise unprotected change of * rq_reqmsg is not racing with parallel processing of * imp_replay_list traversing threads. See LU-3333 * This is a bandaid at best, we really need to deal with this @@ -454,6 +456,6 @@ void sptlrpc_null_fini(void) rc = sptlrpc_unregister_policy(&null_policy); if (rc) - CERROR("failed to unregister %s: %d\n", - null_policy.sp_name, rc); + CERROR("failed to unregister %s: %d\n", null_policy.sp_name, + rc); } From patchwork Thu Feb 27 21:12:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410177 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7F13192A for ; Thu, 27 Feb 2020 21:32:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 67D0724677 for ; Thu, 27 Feb 2020 21:32:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 67D0724677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 490B63499E5; Thu, 27 Feb 2020 13:27:12 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E2A3421FE60 for ; Thu, 27 Feb 2020 13:19:44 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A42CD8A2E; Thu, 27 Feb 2020 16:18:16 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A1DCE46C; Thu, 27 Feb 2020 16:18:16 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:12:30 -0500
Message-Id: <1582838290-17243-283-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 282/622] lustre: ptlrpc: Fix style issues for service.c
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ptlrpc/service.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: cb82520d2474 ("LU-6142 ptlrpc: Fix style issues for service.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34605 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 159 +++++++++++++++++++++++++++++---------------- 1 file changed, 102 insertions(+), 57 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 362102b..1513f51 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -145,7 +145,8 @@ static int ptlrpc_grow_req_bufs(struct ptlrpc_service_part *svcpt, int post) spin_unlock(&svcpt->scp_lock); for (i = 0; i < svc->srv_nbuf_per_group; i++) { - /* NB: another thread might have recycled enough rqbds, we + /* + * NB: another thread might have recycled enough rqbds, we * need to make sure it wouldn't over-allocate, see LU-1212. */ if (svcpt->scp_nrqbds_posted >= svc->srv_nbuf_per_group || @@ -321,7 +322,8 @@ static int ptlrpc_server_post_idle_rqbds(struct ptlrpc_service_part *svcpt) svcpt->scp_nrqbds_posted--; list_move_tail(&rqbd->rqbd_list, &svcpt->scp_rqbd_idle); - /* Don't complain if no request buffers are posted right now; LNET + /* + * Don't complain if no request buffers are posted right now; LNET * won't drop requests because we set the portal lazy! 
*/ @@ -362,13 +364,15 @@ static void ptlrpc_server_nthreads_check(struct ptlrpc_service *svc, init = PTLRPC_NTHRS_INIT + (svc->srv_ops.so_hpreq_handler != NULL); init = max_t(int, init, tc->tc_nthrs_init); - /* NB: please see comments in lustre_lnet.h for definition + /* + * NB: please see comments in lustre_lnet.h for definition * details of these members */ LASSERT(tc->tc_nthrs_max != 0); if (tc->tc_nthrs_user != 0) { - /* In case there is a reason to test a service with many + /* + * In case there is a reason to test a service with many * threads, we give a less strict check here, it can * be up to 8 * nthrs_max */ @@ -380,7 +384,8 @@ static void ptlrpc_server_nthreads_check(struct ptlrpc_service *svc, total = tc->tc_nthrs_max; if (tc->tc_nthrs_base == 0) { - /* don't care about base threads number per partition, + /* + * don't care about base threads number per partition, * this is most for non-affinity service */ nthrs = total / svc->srv_ncpts; @@ -391,7 +396,8 @@ static void ptlrpc_server_nthreads_check(struct ptlrpc_service *svc, if (svc->srv_ncpts == 1) { int i; - /* NB: Increase the base number if it's single partition + /* + * NB: Increase the base number if it's single partition * and total number of cores/HTs is larger or equal to 4. * result will always < 2 * nthrs_base */ @@ -419,7 +425,8 @@ static void ptlrpc_server_nthreads_check(struct ptlrpc_service *svc, */ /* weight is # of HTs */ preempt_disable(); - if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1) { + if (cpumask_weight + (topology_sibling_cpumask(smp_processor_id())) > 1) { /* depress thread factor for hyper-thread */ factor = factor - (factor >> 1) + (factor >> 3); } @@ -511,7 +518,8 @@ static int ptlrpc_service_part_init(struct ptlrpc_service *svc, timer_setup(&svcpt->scp_at_timer, ptlrpc_at_timer, 0); - /* At SOW, service time should be quick; 10s seems generous. If client + /* + * At SOW, service time should be quick; 10s seems generous. 
If client * timeout is less than this, we'll be sending an early reply. */ at_init(&svcpt->scp_at_estimate, 10, 0); @@ -520,7 +528,8 @@ static int ptlrpc_service_part_init(struct ptlrpc_service *svc, svcpt->scp_service = svc; /* Now allocate the request buffers, but don't post them now */ rc = ptlrpc_grow_req_bufs(svcpt, 0); - /* We shouldn't be under memory pressure at startup, so + /* + * We shouldn't be under memory pressure at startup, so * fail if we can't allocate all our buffers at this time. */ if (rc != 0) @@ -719,7 +728,8 @@ static void ptlrpc_server_free_request(struct ptlrpc_request *req) LASSERT(atomic_read(&req->rq_refcount) == 0); LASSERT(list_empty(&req->rq_timed_list)); - /* DEBUG_REQ() assumes the reply state of a request with a valid + /* + * DEBUG_REQ() assumes the reply state of a request with a valid * ref will not be destroyed until that reference is dropped. */ ptlrpc_req_drop_rs(req); @@ -727,7 +737,8 @@ static void ptlrpc_server_free_request(struct ptlrpc_request *req) sptlrpc_svc_ctx_decref(req); if (req != &req->rq_rqbd->rqbd_req) { - /* NB request buffers use an embedded + /* + * NB request buffers use an embedded * req if the incoming req unlinked the * MD; this isn't one of them! */ @@ -751,7 +762,8 @@ static void ptlrpc_server_drop_request(struct ptlrpc_request *req) if (req->rq_at_linked) { spin_lock(&svcpt->scp_at_lock); - /* recheck with lock, in case it's unlinked by + /* + * recheck with lock, in case it's unlinked by * ptlrpc_at_check_timed() */ if (likely(req->rq_at_linked)) @@ -777,7 +789,8 @@ static void ptlrpc_server_drop_request(struct ptlrpc_request *req) list_move_tail(&rqbd->rqbd_list, &svcpt->scp_hist_rqbds); svcpt->scp_hist_nrqbds++; - /* cull some history? + /* + * cull some history? 
* I expect only about 1 or 2 rqbds need to be recycled here */ while (svcpt->scp_hist_nrqbds > svc->srv_hist_nrqbds_cpt_max) { @@ -788,11 +801,12 @@ static void ptlrpc_server_drop_request(struct ptlrpc_request *req) list_del(&rqbd->rqbd_list); svcpt->scp_hist_nrqbds--; - /* remove rqbd's reqs from svc's req history while + /* + * remove rqbd's reqs from svc's req history while * I've got the service lock */ list_for_each_entry(req, &rqbd->rqbd_reqs, rq_list) { - /* Track the highest culled req seq */ + /* Track the highest culled */ if (req->rq_history_seq > svcpt->scp_hist_seq_culled) { svcpt->scp_hist_seq_culled = @@ -980,7 +994,8 @@ static int ptlrpc_at_add_timed(struct ptlrpc_request *req) div_u64_rem(req->rq_deadline, array->paa_size, &index); if (array->paa_reqs_count[index] > 0) { - /* latest rpcs will have the latest deadlines in the list, + /* + * latest rpcs will have the latest deadlines in the list, * so search backward. */ list_for_each_entry_reverse(rq, &array->paa_reqs_array[index], @@ -1043,7 +1058,8 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) time64_t newdl; int rc; - /* deadline is when the client expects us to reply, margin is the + /* + * deadline is when the client expects us to reply, margin is the * difference between clients' and servers' expectations */ DEBUG_REQ(D_ADAPTTO, req, @@ -1057,14 +1073,15 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) if (olddl < 0) { DEBUG_REQ(D_WARNING, req, - "Already past deadline (%+lds), not sending early reply. Consider increasing at_early_margin (%d)?", - olddl, at_early_margin); + "Already past deadline (%+llds), not sending early reply. Consider increasing at_early_margin (%d)?", + (s64)olddl, at_early_margin); /* Return an error so we're not re-added to the timed list. 
*/ return -ETIMEDOUT; } - if (!(lustre_msghdr_get_flags(req->rq_reqmsg) & MSGHDR_AT_SUPPORT)) { + if (!(lustre_msghdr_get_flags(req->rq_reqmsg) & + MSGHDR_AT_SUPPORT)) { DEBUG_REQ(D_INFO, req, "Wanted to ask client for more time, but no AT support"); return -ENOSYS; @@ -1082,7 +1099,8 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) ktime_get_real_seconds() - req->rq_arrival_time.tv_sec); newdl = req->rq_arrival_time.tv_sec + at_get(&svcpt->scp_at_estimate); - /* Check to see if we've actually increased the deadline - + /* + * Check to see if we've actually increased the deadline - * we may be past adaptive_max */ if (req->rq_deadline >= newdl) { @@ -1159,7 +1177,8 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) DEBUG_REQ(D_ERROR, req, "Early reply send failed %d", rc); } - /* Free the (early) reply state from lustre_pack_reply. + /* + * Free the (early) reply state from lustre_pack_reply. * (ptlrpc_send_reply takes it's own rs ref, so this is safe here) */ ptlrpc_req_drop_rs(reqcopy); @@ -1175,7 +1194,8 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) return rc; } -/* Send early replies to everybody expiring within at_early_margin +/* + * Send early replies to everybody expiring within at_early_margin * asking for at_extra time */ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) @@ -1211,7 +1231,8 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) return; } - /* We're close to a timeout, and we don't know how much longer the + /* + * We're close to a timeout, and we don't know how much longer the * server will take. Send early replies to everyone expiring soon. 
*/ INIT_LIST_HEAD(&work_list); @@ -1258,7 +1279,8 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) "timeout in %+ds, asking for %d secs on %d early replies\n", first, at_extra, counter); if (first < 0) { - /* We're already past request deadlines before we even get a + /* + * We're already past request deadlines before we even get a * chance to send early replies */ LCONSOLE_WARN("%s: This server is not able to keep up with request traffic (cpu-bound).\n", @@ -1269,7 +1291,8 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) at_get(&svcpt->scp_at_estimate), delay); } - /* we took additional refcount so entries can't be deleted from list, no + /* + * we took additional refcount so entries can't be deleted from list, no * locking is needed */ while ((rq = list_first_entry_or_null(&work_list, @@ -1285,8 +1308,10 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) } /** + * * Put the request to the export list if the request may become * a high priority one. + */ static int ptlrpc_server_hpreq_init(struct ptlrpc_service_part *svcpt, struct ptlrpc_request *req) @@ -1300,7 +1325,8 @@ static int ptlrpc_server_hpreq_init(struct ptlrpc_service_part *svcpt, LASSERT(rc == 0); } if (req->rq_export && req->rq_ops) { - /* Perform request specific check. We should do this check + /* + * Perform request specific check. We should do this check * before the request is added into exp_hp_rpcs list otherwise * it may hit swab race at LU-1044. 
*/ @@ -1310,9 +1336,10 @@ static int ptlrpc_server_hpreq_init(struct ptlrpc_service_part *svcpt, req->rq_status = rc; ptlrpc_error(req); } - /** can only return error, + /* + * can only return error, * 0 for normal request, - * or 1 for high priority request + * or 1 for high priority request */ LASSERT(rc <= 1); } @@ -1331,7 +1358,8 @@ static int ptlrpc_server_hpreq_init(struct ptlrpc_service_part *svcpt, static void ptlrpc_server_hpreq_fini(struct ptlrpc_request *req) { if (req->rq_export && req->rq_ops) { - /* refresh lock timeout again so that client has more + /* + * refresh lock timeout again so that client has more * room to send lock cancel RPC. */ if (req->rq_ops->hpreq_fini) @@ -1357,7 +1385,7 @@ static int ptlrpc_server_request_add(struct ptlrpc_service_part *svcpt, return 0; } -/** +/* * Allow to handle high priority request * User can call it w/o any lock but need to hold * ptlrpc_service_part::scp_req_lock to get reliable result @@ -1521,7 +1549,8 @@ static int ptlrpc_server_handle_req_in(struct ptlrpc_service_part *svcpt, struct ptlrpc_request, rq_list); list_del_init(&req->rq_list); svcpt->scp_nreqs_incoming--; - /* Consider this still a "queued" request as far as stats are + /* + * Consider this still a "queued" request as far as stats are * concerned */ spin_unlock(&svcpt->scp_lock); @@ -1556,7 +1585,7 @@ static int ptlrpc_server_handle_req_in(struct ptlrpc_service_part *svcpt, rc = lustre_unpack_req_ptlrpc_body(req, MSG_PTLRPC_BODY_OFF); if (rc) { - CERROR("error unpacking ptlrpc body: ptl %d from %s x%llu\n", + CERROR("error unpacking ptlrpc body: ptl %d from %s x %llu\n", svc->srv_req_portal, libcfs_id2str(req->rq_peer), req->rq_xid); goto err_req; @@ -1615,8 +1644,9 @@ static int ptlrpc_server_handle_req_in(struct ptlrpc_service_part *svcpt, /* Set rpc server deadline and add it to the timed list */ deadline = (lustre_msghdr_get_flags(req->rq_reqmsg) & MSGHDR_AT_SUPPORT) ? 
- /* The max time the client expects us to take */ - lustre_msg_get_timeout(req->rq_reqmsg) : obd_timeout; + /* The max time the client expects us to take */ + lustre_msg_get_timeout(req->rq_reqmsg) : obd_timeout; + req->rq_deadline = req->rq_arrival_time.tv_sec + deadline; if (unlikely(deadline == 0)) { DEBUG_REQ(D_ERROR, req, "Dropping request with 0 timeout"); @@ -1625,11 +1655,12 @@ static int ptlrpc_server_handle_req_in(struct ptlrpc_service_part *svcpt, req->rq_svc_thread = thread; if (thread) { - /* initialize request session, it is needed for request + /* + * initialize request session, it is needed for request * processing by target */ - rc = lu_context_init(&req->rq_session, - LCT_SERVER_SESSION | LCT_NOREF); + rc = lu_context_init(&req->rq_session, LCT_SERVER_SESSION | + LCT_NOREF); if (rc) { CERROR("%s: failure to initialize session: rc = %d\n", thread->t_name, rc); @@ -1710,7 +1741,8 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, goto put_conn; } - /* Discard requests queued for longer than the deadline. + /* + * Discard requests queued for longer than the deadline. * The deadline is increased if we send an early reply. */ if (ktime_get_real_seconds() > request->rq_deadline) { @@ -1827,7 +1859,8 @@ static int ptlrpc_handle_rs(struct ptlrpc_reply_state *rs) list_del_init(&rs->rs_exp_list); spin_unlock(&exp->exp_lock); - /* The disk commit callback holds exp_uncommitted_replies_lock while it + /* + * The disk commit callback holds exp_uncommitted_replies_lock while it * iterates over newly committed replies, removing them from * exp_uncommitted_replies. It then drops this lock and schedules the * replies it found for handling here. @@ -1864,7 +1897,8 @@ static int ptlrpc_handle_rs(struct ptlrpc_reply_state *rs) rs->rs_nlocks = 0; /* locks still on rs_locks! 
*/ if (nlocks == 0 && !been_handled) { - /* If we see this, we should already have seen the warning + /* + * If we see this, we should already have seen the warning * in mds_steal_ack_locks() */ CDEBUG(D_HA, @@ -1916,7 +1950,8 @@ static void ptlrpc_check_rqbd_pool(struct ptlrpc_service_part *svcpt) /* NB I'm not locking; just looking. */ - /* CAVEAT EMPTOR: We might be allocating buffers here because we've + /* + * CAVEAT EMPTOR: We might be allocating buffers here because we've * allowed the request history to grow out of control. We could put a * sanity check on that here and cull some history if we need the * space. @@ -2194,7 +2229,8 @@ static int ptlrpc_main(void *arg) LASSERT(svcpt->scp_nthrs_starting == 1); svcpt->scp_nthrs_starting--; - /* SVC_STOPPING may already be set here if someone else is trying + /* + * SVC_STOPPING may already be set here if someone else is trying * to stop the service while this new thread has been dynamically * forked. We still set SVC_RUNNING to let our creator know that * we are now running, however we will exit as soon as possible @@ -2254,7 +2290,8 @@ static int ptlrpc_main(void *arg) if (ptlrpc_rqbd_pending(svcpt) && ptlrpc_server_post_idle_rqbds(svcpt) < 0) { - /* I just failed to repost request buffers. + /* + * I just failed to repost request buffers. * Wait for a timeout (unless something else * happens) before I try again */ @@ -2262,8 +2299,8 @@ static int ptlrpc_main(void *arg) CDEBUG(D_RPCTRACE, "Posted buffers: %d\n", svcpt->scp_nrqbds_posted); } - - /* If the number of threads has been tuned downward and this + /* + * If the number of threads has been tuned downward and this * thread should be stopped, then stop in reverse order so the * the threads always have contiguous thread index values. 
*/ @@ -2285,7 +2322,6 @@ static int ptlrpc_main(void *arg) out: CDEBUG(D_RPCTRACE, "%s: service thread [%p:%u] %d exiting: rc = %d\n", thread->t_name, thread, thread->t_pid, thread->t_id, rc); - spin_lock(&svcpt->scp_lock); if (thread_test_and_clear_flags(thread, SVC_STARTING)) svcpt->scp_nthrs_starting--; @@ -2546,7 +2582,8 @@ int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait) } if (svcpt->scp_nthrs_starting != 0) { - /* serialize starting because some modules (obdfilter) + /* + * serialize starting because some modules (obdfilter) * might require unique and contiguous t_id */ LASSERT(svcpt->scp_nthrs_starting == 1); @@ -2589,7 +2626,8 @@ int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait) spin_lock(&svcpt->scp_lock); --svcpt->scp_nthrs_starting; if (thread_is_stopping(thread)) { - /* this ptlrpc_thread is being handled + /* + * this ptlrpc_thread is being handled * by ptlrpc_svcpt_stop_threads now */ thread_add_flags(thread, SVC_STOPPED); @@ -2616,7 +2654,7 @@ int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait) int ptlrpc_hr_init(void) { struct ptlrpc_hr_partition *hrp; - struct ptlrpc_hr_thread *hrt; + struct ptlrpc_hr_thread *hrt; int rc; int i; int j; @@ -2736,7 +2774,8 @@ static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt) int rc; int i; - /* All history will be culled when the next request buffer is + /* + * All history will be culled when the next request buffer is * freed in ptlrpc_service_purge_all() */ svc->srv_hist_nrqbds_cpt_max = 0; @@ -2748,7 +2787,8 @@ static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt) if (!svcpt->scp_service) break; - /* Unlink all the request buffers. This forces a 'final' + /* + * Unlink all the request buffers. 
This forces a 'final' * event with its 'unlink' flag set for each posted rqbd */ list_for_each_entry(rqbd, &svcpt->scp_rqbd_posted, @@ -2762,13 +2802,15 @@ static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt) if (!svcpt->scp_service) break; - /* Wait for the network to release any buffers + /* + * Wait for the network to release any buffers * it's currently filling */ spin_lock(&svcpt->scp_lock); while (svcpt->scp_nrqbds_posted != 0) { spin_unlock(&svcpt->scp_lock); - /* Network access will complete in finite time but + /* + * Network access will complete in finite time but * the HUGE timeout lets us CWARN for visibility * of sluggish LNDs */ @@ -2811,7 +2853,8 @@ static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt) } spin_unlock(&svcpt->scp_rep_lock); - /* purge the request queue. NB No new replies (rqbds + /* + * purge the request queue. NB No new replies (rqbds * all unlinked) and no service threads, so I'm the only * thread noodling the request queue now */ @@ -2831,12 +2874,14 @@ static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt) LASSERT(list_empty(&svcpt->scp_rqbd_posted)); LASSERT(svcpt->scp_nreqs_incoming == 0); LASSERT(svcpt->scp_nreqs_active == 0); - /* history should have been culled by + /* + * history should have been culled by * ptlrpc_server_finish_request */ LASSERT(svcpt->scp_hist_nrqbds == 0); - /* Now free all the request buffers since nothing + /* + * Now free all the request buffers since nothing * references them any more... 
*/ From patchwork Thu Feb 27 21:12:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410123 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF86617E0 for ; Thu, 27 Feb 2020 21:30:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D846F20801 for ; Thu, 27 Feb 2020 21:30:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D846F20801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 89153348F9E; Thu, 27 Feb 2020 13:26:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4466821FE60 for ; Thu, 27 Feb 2020 13:19:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A69EC8A2F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A4AC646D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:31 -0500 Message-Id: <1582838290-17243-284-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 283/622] lustre: uapi: fix file heat support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Change the LL_IOC_HEAT_SET ioctl number assignment to reduce the number of different values used, since we are running out. Use a __u64 as the IOC struct argument instead of a "long" since that is what is actually passed, and it avoids being CPU-dependent. Move the LU_HEAT_FLAG_* values into an enum to avoid a generic "flags" argument in the code. This makes it clear what is passed. Clean up code style for lfs_heat_get() and lfs_heat_set(). Fixes: 868c66dca13f ("lustre: llite: add file heat support") WC-bug-id: https://jira.whamcloud.com/browse/LU-10602 Lustre-commit: ac1f97a88101 ("LU-10602 utils: fix file heat support") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/34757 Reviewed-by: Wang Shilong Reviewed-by: Alex Zhuravlev Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 2 +- include/uapi/linux/lustre/lustre_user.h | 8 +++++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 76d3b4c..e9d0ff9 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3193,7 +3193,7 @@ static void ll_heat_get(struct inode *inode, struct lu_heat *heat) spin_unlock(&lli->lli_heat_lock); } -static int ll_heat_set(struct inode *inode, u64 flags) +static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags) { struct ll_inode_info *lli = ll_i2info(inode); int rc = 0; diff --git a/include/uapi/linux/lustre/lustre_user.h 
b/include/uapi/linux/lustre/lustre_user.h index 03ec680..d52879e 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -354,7 +354,7 @@ struct ll_ioc_lease_id { #define LL_IOC_GETPARENT _IOWR('f', 249, struct getparent) #define LL_IOC_LADVISE _IOR('f', 250, struct llapi_lu_ladvise) #define LL_IOC_HEAT_GET _IOWR('f', 251, struct lu_heat) -#define LL_IOC_HEAT_SET _IOW('f', 252, long) +#define LL_IOC_HEAT_SET _IOW('f', 251, __u64) #define LL_STATFS_LMV 1 #define LL_STATFS_LOV 2 @@ -2010,8 +2010,10 @@ enum lu_heat_flag_bit { LU_HEAT_FLAG_BIT_CLEAR, }; -#define LU_HEAT_FLAG_CLEAR (1 << LU_HEAT_FLAG_BIT_CLEAR) -#define LU_HEAT_FLAG_OFF (1 << LU_HEAT_FLAG_BIT_OFF) +enum lu_heat_flag { + LU_HEAT_FLAG_OFF = 1ULL << LU_HEAT_FLAG_BIT_OFF, + LU_HEAT_FLAG_CLEAR = 1ULL << LU_HEAT_FLAG_BIT_CLEAR, +}; enum obd_heat_type { OBD_HEAT_READSAMPLE = 0, From patchwork Thu Feb 27 21:12:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410181 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 93F60138D for ; Thu, 27 Feb 2020 21:32:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7C57024677 for ; Thu, 27 Feb 2020 21:32:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7C57024677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A7DA0349A08; 
Thu, 27 Feb 2020 13:27:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8889621FE77 for ; Thu, 27 Feb 2020 13:19:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A91F18A30; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A761E46F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:32 -0500 Message-Id: <1582838290-17243-285-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 284/622] lnet: libcfs: poll fail_loc in cfs_fail_timeout_set() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Some internal tests usually take 800-900s, which is almost half of the whole sanityn test suite run time. 99.(9)% of that time the tests just wait to ensure that the operations execute in a specific order. This patch changes cfs_fail_timeout_set() so that the wait can be interrupted when fail_loc is set to 0; fail_loc is polled every 1/10s. The tests themselves are modified to reset fail_loc. To be able to do so, both operations (referenced as OP1 and OP2 in the tests) are run in the background.
Once they are started and the pdo_sched() helper has ensured that the MDS threads have reached the blocking points, we can interrupt OP1 and do the usual checks. ONLY=40-47 sh sanityn.sh takes 1017s before and 78s after. WC-bug-id: https://jira.whamcloud.com/browse/LU-2233 Lustre-commit: 743b85a32e24 ("LU-2233 tests: improve tests sanityn/40-47") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/4392 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Signed-off-by: James Simmons --- net/lnet/libcfs/fail.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/lnet/libcfs/fail.c b/net/lnet/libcfs/fail.c index 6ee4de2..40e93b00 100644 --- a/net/lnet/libcfs/fail.c +++ b/net/lnet/libcfs/fail.c @@ -131,14 +131,21 @@ int __cfs_fail_check_set(u32 id, u32 value, int set) int __cfs_fail_timeout_set(u32 id, u32 value, int ms, int set) { + ktime_t till = ktime_add_ms(ktime_get(), ms); int ret; ret = __cfs_fail_check_set(id, value, set); if (ret && likely(ms > 0)) { - CERROR("cfs_fail_timeout id %x sleeping for %dms\n", - id, ms); - schedule_timeout_uninterruptible(ms * HZ / 1000); - CERROR("cfs_fail_timeout id %x awake\n", id); + CERROR("cfs_fail_timeout id %x sleeping for %dms\n", id, ms); + while (ktime_before(ktime_get(), till)) { + schedule_timeout_uninterruptible(HZ / 10); + if (!cfs_fail_loc) { + CERROR("cfs_fail_timeout interrupted\n"); + break; + } + } + if (cfs_fail_loc) + CERROR("cfs_fail_timeout id %x awake\n", id); } return ret; } From patchwork Thu Feb 27 21:12:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410185 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7F0B2138D for ; Thu, 27 Feb 2020 21:32:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6795E24677 for ; Thu, 27 Feb 2020 21:32:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6795E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EF5A7349A35; Thu, 27 Feb 2020 13:27:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CC92921FC82 for ; Thu, 27 Feb 2020 13:19:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AC6BF8A31; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AA515468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:33 -0500 Message-Id: <1582838290-17243-286-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 285/622] lustre: obd: round values to nearest MiB for *_mb syfs files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Several sysfs files report their settings with the function lprocfs_read_frac_helper(), which is intended to show fractional values, e.g. 1.5 MiB. This approach has caused problems with shells that don't handle fractional representation, and the values reported don't faithfully represent the original value the configurator passed into the sysfs file. To resolve this, let's instead always round up the value the configurator passed into the sysfs file to the nearest MiB value. This way the values reported are guaranteed to be exactly some whole number of MiB. WC-bug-id: https://jira.whamcloud.com/browse/LU-11157 Lustre-commit: ba2817fe3ead ("LU-11157 obd: round values to nearest MiB for *_mb syfs files") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/34317 Reviewed-by: Ben Evans Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 5 ++- fs/lustre/llite/llite_internal.h | 6 +-- fs/lustre/llite/lproc_llite.c | 78 +++++++++++++++----------------------- fs/lustre/mdc/lproc_mdc.c | 8 ++-- fs/lustre/osc/lproc_osc.c | 21 +++++----- 5 files changed, 50 insertions(+), 68 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 8d74822..9f62d4e 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -63,6 +63,9 @@ static inline unsigned int pct(unsigned long a, unsigned long b) return b ?
a * 100 / b : 0; } +#define PAGES_TO_MiB(pages) ((pages) >> (20 - PAGE_SHIFT)) +#define MiB_TO_PAGES(mb) ((mb) << (20 - PAGE_SHIFT)) + struct lprocfs_static_vars { struct lprocfs_vars *obd_vars; const struct attribute_group *sysfs_vars; @@ -363,8 +366,6 @@ enum { int lprocfs_write_frac_helper(const char __user *buffer, unsigned long count, int *val, int mult); -int lprocfs_read_frac_helper(char *buffer, unsigned long count, - long val, int mult); int lprocfs_stats_alloc_one(struct lprocfs_stats *stats, unsigned int cpuid); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 9d7345a..eb7e0dc 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -297,18 +297,16 @@ int ll_listsecurity(struct inode *inode, char *secctx_name, void ll_inode_size_lock(struct inode *inode); void ll_inode_size_unlock(struct inode *inode); -/* FIXME: replace the name of this with LL_I to conform to kernel stuff */ -/* static inline struct ll_inode_info *LL_I(struct inode *inode) */ static inline struct ll_inode_info *ll_i2info(struct inode *inode) { return container_of(inode, struct ll_inode_info, lli_vfs_inode); } /* default to about 64M of readahead on a given system. 
*/ -#define SBI_DEFAULT_READAHEAD_MAX (64UL << (20 - PAGE_SHIFT)) +#define SBI_DEFAULT_READAHEAD_MAX MiB_TO_PAGES(64UL) /* default to read-ahead full files smaller than 2MB on the second read */ -#define SBI_DEFAULT_READAHEAD_WHOLE_MAX (2UL << (20 - PAGE_SHIFT)) +#define SBI_DEFAULT_READAHEAD_WHOLE_MAX MiB_TO_PAGES(2UL) enum ra_stat { RA_STAT_HIT = 0, diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index cc9f80e..165d37f 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -326,15 +326,13 @@ static ssize_t max_read_ahead_mb_show(struct kobject *kobj, { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); - long pages_number; - int mult; + unsigned long ra_max_mb; spin_lock(&sbi->ll_lock); - pages_number = sbi->ll_ra_info.ra_max_pages; + ra_max_mb = PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages); spin_unlock(&sbi->ll_lock); - mult = 1 << (20 - PAGE_SHIFT); - return lprocfs_read_frac_helper(buf, PAGE_SIZE, pages_number, mult); + return scnprintf(buf, PAGE_SIZE, "%lu\n", ra_max_mb); } static ssize_t max_read_ahead_mb_store(struct kobject *kobj, @@ -344,21 +342,19 @@ static ssize_t max_read_ahead_mb_store(struct kobject *kobj, { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); + u64 ra_max_mb, pages_number; int rc; - unsigned long pages_number; - int pages_shift; - pages_shift = 20 - PAGE_SHIFT; - rc = kstrtoul(buffer, 10, &pages_number); + rc = kstrtoull(buffer, 10, &ra_max_mb); if (rc) return rc; - pages_number <<= pages_shift; /* MB -> pages */ - + pages_number = round_up(ra_max_mb, 1024 * 1024) >> PAGE_SHIFT; if (pages_number > totalram_pages() / 2) { - CERROR("%s: can't set max_readahead_mb=%lu > %luMB\n", - sbi->ll_fsname, pages_number >> pages_shift, - totalram_pages() >> (pages_shift + 1)); /*1/2 of RAM*/ + /* 1/2 of RAM */ + CERROR("%s: can't set max_readahead_mb=%llu > %luMB\n", + sbi->ll_fsname, PAGES_TO_MiB(pages_number), + 
PAGES_TO_MiB(totalram_pages())); return -ERANGE; } @@ -376,15 +372,13 @@ static ssize_t max_read_ahead_per_file_mb_show(struct kobject *kobj, { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); - long pages_number; - int mult; + unsigned long ra_max_file_mb; spin_lock(&sbi->ll_lock); - pages_number = sbi->ll_ra_info.ra_max_pages_per_file; + ra_max_file_mb = PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages_per_file); spin_unlock(&sbi->ll_lock); - mult = 1 << (20 - PAGE_SHIFT); - return lprocfs_read_frac_helper(buf, PAGE_SIZE, pages_number, mult); + return scnprintf(buf, PAGE_SIZE, "%lu\n", ra_max_file_mb); } static ssize_t max_read_ahead_per_file_mb_store(struct kobject *kobj, @@ -394,22 +388,18 @@ static ssize_t max_read_ahead_per_file_mb_store(struct kobject *kobj, { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); + u64 ra_max_file_mb, pages_number; int rc; - unsigned long pages_number; - int pages_shift; - pages_shift = 20 - PAGE_SHIFT; - rc = kstrtoul(buffer, 10, &pages_number); + rc = kstrtoull(buffer, 10, &ra_max_file_mb); if (rc) return rc; - pages_number <<= pages_shift; /* MB -> pages */ - + pages_number = round_up(ra_max_file_mb, 1024 * 1024) >> PAGE_SHIFT; if (pages_number > sbi->ll_ra_info.ra_max_pages) { - CERROR("%s: can't set max_readahead_per_file_mb=%lu > max_read_ahead_mb=%lu\n", - sbi->ll_fsname, - pages_number >> pages_shift, - sbi->ll_ra_info.ra_max_pages >> pages_shift); + CERROR("%s: can't set max_readahead_per_file_mb=%llu > max_read_ahead_mb=%lu\n", + sbi->ll_fsname, PAGES_TO_MiB(pages_number), + PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages)); return -ERANGE; } @@ -427,15 +417,13 @@ static ssize_t max_read_ahead_whole_mb_show(struct kobject *kobj, { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); - long pages_number; - int mult; + unsigned long ra_max_whole_mb; spin_lock(&sbi->ll_lock); - pages_number = sbi->ll_ra_info.ra_max_read_ahead_whole_pages; + 
ra_max_whole_mb = PAGES_TO_MiB(sbi->ll_ra_info.ra_max_read_ahead_whole_pages); spin_unlock(&sbi->ll_lock); - mult = 1 << (20 - PAGE_SHIFT); - return lprocfs_read_frac_helper(buf, PAGE_SIZE, pages_number, mult); + return scnprintf(buf, PAGE_SIZE, "%lu\n", ra_max_whole_mb); } static ssize_t max_read_ahead_whole_mb_store(struct kobject *kobj, @@ -445,24 +433,21 @@ static ssize_t max_read_ahead_whole_mb_store(struct kobject *kobj, { struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, ll_kset.kobj); + u64 ra_max_whole_mb, pages_number; int rc; - unsigned long pages_number; - int pages_shift; - pages_shift = 20 - PAGE_SHIFT; - rc = kstrtoul(buffer, 10, &pages_number); + rc = kstrtoull(buffer, 10, &ra_max_whole_mb); if (rc) return rc; - pages_number <<= pages_shift; /* MB -> pages */ + pages_number = round_up(ra_max_whole_mb, 1024 * 1024) >> PAGE_SHIFT; /* Cap this at the current max readahead window size, the readahead * algorithm does this anyway so it's pointless to set it larger. 
*/ if (pages_number > sbi->ll_ra_info.ra_max_pages_per_file) { - CERROR("%s: can't set max_read_ahead_whole_mb=%lu > max_read_ahead_per_file_mb=%lu\n", - sbi->ll_fsname, - pages_number >> pages_shift, - sbi->ll_ra_info.ra_max_pages_per_file >> pages_shift); + CERROR("%s: can't set max_read_ahead_whole_mb=%llu > max_read_ahead_per_file_mb=%lu\n", + sbi->ll_fsname, PAGES_TO_MiB(pages_number), + PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages_per_file)); return -ERANGE; } @@ -479,12 +464,11 @@ static int ll_max_cached_mb_seq_show(struct seq_file *m, void *v) struct super_block *sb = m->private; struct ll_sb_info *sbi = ll_s2sbi(sb); struct cl_client_cache *cache = sbi->ll_cache; - int shift = 20 - PAGE_SHIFT; long max_cached_mb; long unused_mb; - max_cached_mb = cache->ccc_lru_max >> shift; - unused_mb = atomic_long_read(&cache->ccc_lru_left) >> shift; + max_cached_mb = PAGES_TO_MiB(cache->ccc_lru_max); + unused_mb = PAGES_TO_MiB(atomic_long_read(&cache->ccc_lru_left)); seq_printf(m, "users: %d\n" "max_cached_mb: %ld\n" @@ -538,7 +522,7 @@ static ssize_t ll_max_cached_mb_seq_write(struct file *file, if (pages_number < 0 || pages_number > totalram_pages()) { CERROR("%s: can't set max cache more than %lu MB\n", sbi->ll_fsname, - totalram_pages() >> (20 - PAGE_SHIFT)); + PAGES_TO_MiB(totalram_pages())); return -ERANGE; } /* Allow enough cache so clients can make well-formed RPCs */ diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 81167bbd..454b69d 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -47,7 +47,7 @@ static int mdc_max_dirty_mb_seq_show(struct seq_file *m, void *v) unsigned long val; spin_lock(&cli->cl_loi_list_lock); - val = cli->cl_dirty_max_pages >> (20 - PAGE_SHIFT); + val = PAGES_TO_MiB(cli->cl_dirty_max_pages); spin_unlock(&cli->cl_loi_list_lock); seq_printf(m, "%lu\n", val); @@ -69,10 +69,10 @@ static ssize_t mdc_max_dirty_mb_seq_write(struct file *file, if (rc) return rc; - pages_number >>= PAGE_SHIFT; - + /* 
MB -> pages */ + pages_number = round_up(pages_number, 1024 * 1024) >> PAGE_SHIFT; if (pages_number <= 0 || - pages_number >= OSC_MAX_DIRTY_MB_MAX << (20 - PAGE_SHIFT) || + pages_number >= MiB_TO_PAGES(OSC_MAX_DIRTY_MB_MAX) || pages_number > totalram_pages() / 4) /* 1/4 of RAM */ return -ERANGE; diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 5faf518..775bf74 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -134,12 +134,13 @@ static ssize_t max_dirty_mb_show(struct kobject *kobj, struct obd_device *dev = container_of(kobj, struct obd_device, obd_kset.kobj); struct client_obd *cli = &dev->u.cli; - long val; - int mult; + unsigned long val; - val = cli->cl_dirty_max_pages; - mult = 1 << (20 - PAGE_SHIFT); - return lprocfs_read_frac_helper(buf, PAGE_SIZE, val, mult); + spin_lock(&cli->cl_loi_list_lock); + val = PAGES_TO_MiB(cli->cl_dirty_max_pages); + spin_unlock(&cli->cl_loi_list_lock); + + return scnprintf(buf, PAGE_SIZE, "%lu\n", val); } static ssize_t max_dirty_mb_store(struct kobject *kobj, @@ -150,17 +151,15 @@ static ssize_t max_dirty_mb_store(struct kobject *kobj, struct obd_device *dev = container_of(kobj, struct obd_device, obd_kset.kobj); struct client_obd *cli = &dev->u.cli; - unsigned long pages_number; + unsigned long pages_number, max_dirty_mb; int rc; - rc = kstrtoul(buffer, 10, &pages_number); + rc = kstrtoul(buffer, 10, &max_dirty_mb); if (rc) return rc; - pages_number *= 1 << (20 - PAGE_SHIFT); /* MB -> pages */ - - if (pages_number <= 0 || - pages_number >= OSC_MAX_DIRTY_MB_MAX << (20 - PAGE_SHIFT) || + pages_number = MiB_TO_PAGES(max_dirty_mb); + if (pages_number >= MiB_TO_PAGES(OSC_MAX_DIRTY_MB_MAX) || pages_number > totalram_pages() / 4) /* 1/4 of RAM */ return -ERANGE; From patchwork Thu Feb 27 21:12:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410127 Return-Path: Received: from 
mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8CD792A for ; Thu, 27 Feb 2020 21:30:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 917D420801 for ; Thu, 27 Feb 2020 21:30:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 917D420801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 05793349726; Thu, 27 Feb 2020 13:26:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2DD7D21FC82 for ; Thu, 27 Feb 2020 13:19:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AE2EF8A32; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AD1A946A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:34 -0500 Message-Id: <1582838290-17243-287-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 286/622] lustre: osc: don't check capability for every page X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang We check CFS_CAP_SYS_RESOURCE for every page during the io. This is expensive on AppArmor-enabled systems; instead, do the check once for the entire io and use the result when submitting the pages. Don't init the oap_brw_flags during osc_page_init(); the flag will be set in either osc_queue_async_io() or osc_page_submit(). WC-bug-id: https://jira.whamcloud.com/browse/LU-12093 Lustre-commit: c1cab789aaa2 ("LU-12093 osc: don't check capability for every page") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/34478 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 4 +++- fs/lustre/osc/osc_cache.c | 5 +---- fs/lustre/osc/osc_io.c | 6 ++++-- fs/lustre/osc/osc_page.c | 5 +++-- 4 files changed, 11 insertions(+), 9 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index aa3d4c3..1c5af80 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -139,7 +139,9 @@ struct osc_io { /* true if this io is lockless.
*/ unsigned int oi_lockless:1, /* true if this io is counted as active IO */ - oi_is_active:1; + oi_is_active:1, + /** true if this io has CAP_SYS_RESOURCE */ + oi_cap_sys_resource:1; /* how many LRU pages are reserved for this IO */ unsigned long oi_lru_reserved; diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index bdaf65f..a02adac 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2283,9 +2283,6 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops, oap->oap_obj_off = offset; LASSERT(!(offset & ~PAGE_MASK)); - if (capable(CAP_SYS_RESOURCE)) - oap->oap_brw_flags = OBD_BRW_NOQUOTA; - INIT_LIST_HEAD(&oap->oap_pending_item); INIT_LIST_HEAD(&oap->oap_rpc_item); @@ -2324,7 +2321,7 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, /* Set the OBD_BRW_SRVLOCK before the page is queued. */ brw_flags |= ops->ops_srvlock ? OBD_BRW_SRVLOCK : 0; - if (capable(CAP_SYS_RESOURCE)) { + if (oio->oi_cap_sys_resource) { brw_flags |= OBD_BRW_NOQUOTA; cmd |= OBD_BRW_NOQUOTA; } diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 76657f3..dfdf064 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -357,18 +357,20 @@ int osc_io_iter_init(const struct lu_env *env, const struct cl_io_slice *ios) { struct osc_object *osc = cl2osc(ios->cis_obj); struct obd_import *imp = osc_cli(osc)->cl_import; + struct osc_io *oio = osc_env_io(env); int rc = -EIO; spin_lock(&imp->imp_lock); if (likely(!imp->imp_invalid)) { - struct osc_io *oio = osc_env_io(env); - atomic_inc(&osc->oo_nr_ios); oio->oi_is_active = 1; rc = 0; } spin_unlock(&imp->imp_lock); + if (capable(CAP_SYS_RESOURCE)) + oio->oi_cap_sys_resource = 1; + return rc; } EXPORT_SYMBOL(osc_io_iter_init); diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c index 7382e0d..0910f3a 100644 --- a/fs/lustre/osc/osc_page.c +++ b/fs/lustre/osc/osc_page.c @@ -302,6 +302,7 @@ int osc_page_init(const struct lu_env *env, struct cl_object 
*obj, void osc_page_submit(const struct lu_env *env, struct osc_page *opg, enum cl_req_type crt, int brw_flags) { + struct osc_io *oio = osc_env_io(env); struct osc_async_page *oap = &opg->ops_oap; LASSERTF(oap->oap_magic == OAP_MAGIC, @@ -313,9 +314,9 @@ void osc_page_submit(const struct lu_env *env, struct osc_page *opg, oap->oap_cmd = crt == CRT_WRITE ? OBD_BRW_WRITE : OBD_BRW_READ; oap->oap_page_off = opg->ops_from; oap->oap_count = opg->ops_to - opg->ops_from; - oap->oap_brw_flags = brw_flags | OBD_BRW_SYNC; + oap->oap_brw_flags = OBD_BRW_SYNC | brw_flags; - if (capable(CAP_SYS_RESOURCE)) { + if (oio->oi_cap_sys_resource) { oap->oap_brw_flags |= OBD_BRW_NOQUOTA; oap->oap_cmd |= OBD_BRW_NOQUOTA; } From patchwork Thu Feb 27 21:12:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410131 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 34ED3138D for ; Thu, 27 Feb 2020 21:31:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1D97620801 for ; Thu, 27 Feb 2020 21:31:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1D97620801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4091B349752; Thu, 27 Feb 2020 13:26:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 957F221FC82 for ; Thu, 27 Feb 2020 13:19:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B138A8A33; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B00B346C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:35 -0500 Message-Id: <1582838290-17243-288-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 287/622] lustre: statahead: sa_handle_callback get lli_sa_lock earlier X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler sa_handle_callback() must acquire the lli_sa_lock before calling sa_has_callback(), which checks whether the sai_interim_entries list is empty. Acquiring the lock avoids a race between an rpc handler executing ll_statahead_interpret and the separate ll_statahead_thread. When a client receives a stat request response, ll_statahead_interpret increments sai_replied and if needed adds the request to the sai_interim_entries list for instantiating by the ll_statahead_thread. ll_statahead_interpret() holds the lli_sa_lock while doing this work. On process termination, ll_statahead_thread() waits for sai_sent to equal sai_replied and then removes any entries in the sai_interim_entries list. 
It does not get the lli_sa_lock until it determines that there are sai_interim_entries to process. A bug occurs on weak memory model processors that do not guarantee that both ll_statahead_interpret updates done under the lock are visible to other processors at the same time. For example, on ARM nodes, an ll_statahead_thread can read the updated value of sai_replied and a non-updated value of sai_interim_entries. ll_statahead_thread then thinks all replies have been received (true) and all sai_interim_entries have been processed (false). Later, the update to sai_interim_entries becomes visible, leaving the ll_statahead_info struct in an unexpected state. The bad state eventually triggers the LBUG: statahead.c:477:ll_sai_put()) ASSERTION( !sa_has_callback(sai) ) Cray-bug-id: LUS-6243 WC-bug-id: https://jira.whamcloud.com/browse/LU-12221 Lustre-commit: 31ef093c2197 ("LU-12221 statahead: sa_handle_callback get lli_sa_lock earlier") Signed-off-by: Ann Koehler Reviewed-on: https://review.whamcloud.com/34760 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/statahead.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 7dfb045..497aba3 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -688,21 +688,19 @@ static void sa_handle_callback(struct ll_statahead_info *sai) lli = ll_i2info(sai->sai_dentry->d_inode); + spin_lock(&lli->lli_sa_lock); while (sa_has_callback(sai)) { struct sa_entry *entry; - spin_lock(&lli->lli_sa_lock); - if (unlikely(!sa_has_callback(sai))) { - spin_unlock(&lli->lli_sa_lock); - break; - } entry = list_first_entry(&sai->sai_interim_entries, struct sa_entry, se_list); list_del_init(&entry->se_list); spin_unlock(&lli->lli_sa_lock); sa_instantiate(sai, entry); + spin_lock(&lli->lli_sa_lock); } + spin_unlock(&lli->lli_sa_lock); } /* From patchwork Thu Feb 27 21:12:36
2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410189 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4609C92A for ; Thu, 27 Feb 2020 21:32:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2ED8E24677 for ; Thu, 27 Feb 2020 21:32:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2ED8E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5D873349A5B; Thu, 27 Feb 2020 13:27:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D6B0021FEA4 for ; Thu, 27 Feb 2020 13:19:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B40AB8A34; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B2EAD46D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:36 -0500 Message-Id: <1582838290-17243-289-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 288/622] lnet: use number of wrs to calculate CQEs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Using concurrent sends to calculate the number of CQEs results in a small number of CQEs. This exposes an issue where, under failure scenarios (for example, when a node reboots), there are not enough CQEs available, leading to IB_EVENT_QP_FATAL. Fixes: b61010ddf672 ("lnet: lnd: bring back concurrent_sends") WC-bug-id: https://jira.whamcloud.com/browse/LU-12279 Lustre-commit: 24294b843f79 ("LU-12279 lnet: use number of wrs to calculate CQEs") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34945 Reviewed-by: Sonia Sharma Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index eb80d5e..2f7ca52 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -136,9 +136,7 @@ struct kib_tunables { /* WRs and CQEs (per connection) */ #define IBLND_RECV_WRS(c) IBLND_RX_MSGS(c) -#define IBLND_CQ_ENTRIES(c) \ - (IBLND_RECV_WRS(c) + 2 * kiblnd_concurrent_sends(c->ibc_version, \ - c->ibc_peer->ibp_ni)) +#define IBLND_CQ_ENTRIES(c) (IBLND_RECV_WRS(c) + kiblnd_send_wrs(c)) struct kib_hca_dev; From patchwork Thu Feb 27 21:12:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410135 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5D314138D for ; Thu, 27 Feb 2020 21:31:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4624720801 for ; Thu, 27 Feb 2020 21:31:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4624720801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B25C134977F; Thu, 27 Feb 2020 13:26:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2871621FCA1 for ; Thu, 27 Feb 2020 13:19:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B6C888A35; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B5A3946F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:37 -0500 Message-Id: <1582838290-17243-290-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 289/622] lustre: ldlm: Fix style issues for ldlm_resource.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing 
Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ldlm/ldlm_resource.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: d7627feb4594 ("LU-6142 ldlm: Fix style issues for ldlm_resource.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34492 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Ben Evans Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_resource.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 59b17b5..14e03bc 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -443,7 +443,7 @@ struct ldlm_resource *ldlm_resource_getref(struct ldlm_resource *res) static unsigned int ldlm_res_hop_hash(struct cfs_hash *hs, const void *key, unsigned int mask) { - const struct ldlm_res_id *id = key; + const struct ldlm_res_id *id = key; unsigned int val = 0; unsigned int i; @@ -627,7 +627,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, return NULL; } - for (idx = 0;; idx++) { + for (idx = 0; ; idx++) { nsd = &ldlm_ns_hash_defs[idx]; if (nsd->nsd_type == LDLM_NS_TYPE_UNKNOWN) { CERROR("Unknown type %d for ns %s\n", ns_type, name); @@ -770,7 +770,8 @@ static void cleanup_resource(struct ldlm_resource *res, struct list_head *q, ldlm_set_local_only(lock); if (local_only && (lock->l_readers || lock->l_writers)) { - /* This is a little bit gross, but much better than the + /* + * This is a little bit gross, but much better than the * alternative: pretend that we got a blocking AST from * the server, so that when the lock is decref'd, it * will go away ... 
From patchwork Thu Feb 27 21:12:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410099 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A92AD17E0 for ; Thu, 27 Feb 2020 21:30:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 91BFE246A0 for ; Thu, 27 Feb 2020 21:30:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 91BFE246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 53494348F0A; Thu, 27 Feb 2020 13:25:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6A44221FEB2 for ; Thu, 27 Feb 2020 13:19:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B96FE8A36; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B8612468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:38 -0500 Message-Id: <1582838290-17243-291-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 290/622] lustre: ptlrpc: Fix style issues for sec_gc.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ptlrpc/sec_gc.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 930d88e71d16 ("LU-6142 ptlrpc: Fix style issues for sec_gc.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34551 Reviewed-by: Andreas Dilger Reviewed-by: Sebastien Buisson Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/ptlrpc/sec_gc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ptlrpc/sec_gc.c b/fs/lustre/ptlrpc/sec_gc.c index 3baed8c..36ac319 100644 --- a/fs/lustre/ptlrpc/sec_gc.c +++ b/fs/lustre/ptlrpc/sec_gc.c @@ -147,7 +147,8 @@ static void sec_gc_main(struct work_struct *ws) sec_process_ctx_list(); again: - /* go through sec list do gc. + /* + * go through sec list do gc. * FIXME here we iterate through the whole list each time which * is not optimal. we perhaps want to use balanced binary tree * to trace each sec as order of expiry time. @@ -156,7 +157,8 @@ static void sec_gc_main(struct work_struct *ws) */ mutex_lock(&sec_gc_mutex); list_for_each_entry(sec, &sec_gc_list, ps_gc_list) { - /* if someone is waiting to be deleted, let it + /* + * if someone is waiting to be deleted, let it * proceed as soon as possible. 
*/ if (atomic_read(&sec_gc_wait_del)) { From patchwork Thu Feb 27 21:12:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410139 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D93892A for ; Thu, 27 Feb 2020 21:31:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 057A020801 for ; Thu, 27 Feb 2020 21:31:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 057A020801 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BF79D349778; Thu, 27 Feb 2020 13:26:30 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AB60121FEB2 for ; Thu, 27 Feb 2020 13:19:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BC26E8A37; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BB17446A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:39 -0500 Message-Id: <1582838290-17243-292-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 291/622] lustre: ptlrpc: Fix style issues for llog_client.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ptlrpc/llog_client.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: b0372d346200 ("LU-6142 ptlrpc: Fix style issues for llog_client.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34900 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/ptlrpc/llog_client.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/lustre/ptlrpc/llog_client.c b/fs/lustre/ptlrpc/llog_client.c index e5ff080..ff1ca36 100644 --- a/fs/lustre/ptlrpc/llog_client.c +++ b/fs/lustre/ptlrpc/llog_client.c @@ -55,7 +55,7 @@ ctxt->loc_idx); \ imp = NULL; \ mutex_unlock(&ctxt->loc_mutex); \ - return (-EINVAL); \ + return -EINVAL; \ } \ mutex_unlock(&ctxt->loc_mutex); \ } while (0) @@ -64,12 +64,13 @@ mutex_lock(&ctxt->loc_mutex); \ if (ctxt->loc_imp != imp) \ CWARN("loc_imp has changed from %p to %p\n", \ - ctxt->loc_imp, imp); \ + ctxt->loc_imp, imp); \ class_import_put(imp); \ mutex_unlock(&ctxt->loc_mutex); \ } while (0) -/* This is a callback from the llog_* functions. +/* + * This is a callback from the llog_* functions. * Assumes caller has already pushed us into the kernel context. 
*/ static int llog_client_open(const struct lu_env *env, @@ -171,7 +172,8 @@ static int llog_client_next_block(const struct lu_env *env, req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER, len); ptlrpc_request_set_replen(req); rc = ptlrpc_queue_wait(req); - /* -EIO has a special meaning here. If llog_osd_next_block() + /* + * -EIO has a special meaning here. If llog_osd_next_block() * reaches the end of the log without finding the desired * record then it updates *cur_offset and *cur_idx and returns * -EIO. In llog_process_thread() we use this to detect @@ -338,8 +340,9 @@ static int llog_client_read_header(const struct lu_env *env, static int llog_client_close(const struct lu_env *env, struct llog_handle *handle) { - /* this doesn't call LLOG_ORIGIN_HANDLE_CLOSE because - * the servers all close the file at the end of every + /* + * this doesn't call LLOG_ORIGIN_HANDLE_CLOSE because + * the servers all close the file at the end of every * other LLOG_ RPC. */ return 0; From patchwork Thu Feb 27 21:12:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410143 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7A3D4138D for ; Thu, 27 Feb 2020 21:31:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6319324677 for ; Thu, 27 Feb 2020 21:31:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6319324677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8A64021FCFA; Thu, 27 Feb 2020 13:26:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EE6FA21FEB2 for ; Thu, 27 Feb 2020 13:19:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BEF938A38; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BDDAF46C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:40 -0500 Message-Id: <1582838290-17243-293-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 292/622] lustre: dne: allow access to striped dir with broken layout X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Sometimes the layout of a striped directory may become broken: * creation/unlink is only partially executed on some MDT. * a disk failure or a stopped MDS makes some stripes inaccessible. * software bugs. In this situation the directory should still be accessible, and in particular it should be possible to migrate it to other active MDTs. This patch adds this support on both server and client: don't assume a stripe FID is sane, and when a stripe doesn't exist, skip it.
WC-bug-id: https://jira.whamcloud.com/browse/LU-11907 Lustre-commit: d2725563e7af ("LU-11907 dne: allow access to striped dir with broken layout") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/34750 Reviewed-by: Hongchao Zhang Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 17 ++++++++++------- fs/lustre/llite/llite_lib.c | 4 ++++ fs/lustre/lmv/lmv_intent.c | 16 ++++++++++++++++ fs/lustre/lmv/lmv_obd.c | 27 ++++++++++++++++++++++++--- 4 files changed, 54 insertions(+), 10 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index fd7cd2d..f75183b 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -321,7 +321,7 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) */ if (file_dentry(filp)->d_parent && file_dentry(filp)->d_parent->d_inode) { - u64 ibits = MDS_INODELOCK_UPDATE; + u64 ibits = MDS_INODELOCK_LOOKUP; struct inode *parent; parent = file_dentry(filp)->d_parent->d_inode; @@ -1551,13 +1551,16 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct lu_fid fid; fid_le_to_cpu(&fid, &lmm->lmv_md_v1.lmv_stripe_fids[i]); - mdt_index = ll_get_mdt_idx_by_fid(sbi, &fid); - if (mdt_index < 0) { - rc = mdt_index; - goto out_tmp; + if (fid_is_sane(&fid)) { + mdt_index = ll_get_mdt_idx_by_fid(sbi, &fid); + if (mdt_index < 0) { + rc = mdt_index; + goto out_tmp; + } + tmp->lum_objects[i].lum_mds = mdt_index; + tmp->lum_objects[i].lum_fid = fid; } - tmp->lum_objects[i].lum_mds = mdt_index; - tmp->lum_objects[i].lum_fid = fid; + tmp->lum_stripe_count++; } diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 99cedcf..ba477ad 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1279,6 +1279,10 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) for (i = 0; i < lsm->lsm_md_stripe_count; i++) { fid = &lsm->lsm_md_oinfo[i].lmo_fid; 
LASSERT(!lsm->lsm_md_oinfo[i].lmo_root); + + if (!fid_is_sane(fid)) + continue; + /* Unfortunately ll_iget will call ll_update_inode, * where the initialization of slave inode is slightly * different, so it reset lsm_md to NULL to avoid diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 84a21a0..ba14e7c 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -162,6 +162,7 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct ptlrpc_request *req = NULL; struct mdt_body *body; struct md_op_data *op_data; + int valid_stripe_count = 0; int rc = 0, i; /** @@ -186,6 +187,9 @@ int lmv_revalidate_slaves(struct obd_export *exp, fid = lsm->lsm_md_oinfo[i].lmo_fid; inode = lsm->lsm_md_oinfo[i].lmo_root; + if (!inode) + continue; + /* * Prepare op_data for revalidating. Note that @fid2 shluld be * defined otherwise it will go to server and take new lock @@ -211,6 +215,12 @@ int lmv_revalidate_slaves(struct obd_export *exp, rc = md_intent_lock(tgt->ltd_exp, op_data, &it, &req, cb_blocking, extra_lock_flags); + if (rc == -ENOENT) { + /* skip stripe is not exists */ + rc = 0; + continue; + } + if (rc < 0) goto cleanup; @@ -249,12 +259,18 @@ int lmv_revalidate_slaves(struct obd_export *exp, ldlm_lock_decref(lockh, it.it_lock_mode); it.it_lock_mode = 0; } + + valid_stripe_count++; } cleanup: if (req) ptlrpc_req_finished(req); + /* if all stripes are invalid, return -ENOENT to notify user */ + if (!rc && !valid_stripe_count) + rc = -ENOENT; + kfree(op_data); return rc; } diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index dc4bd1e..4b5bd36 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -2398,6 +2398,11 @@ static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt, } oinfo = &op_data->op_mea1->lsm_md_oinfo[stripe_index]; + if (!oinfo->lmo_root) { + rc = -ENOENT; + break; + } + tgt = lmv_get_target(ctxt->ldc_lmv, oinfo->lmo_mds, NULL); if (IS_ERR(tgt)) { rc = PTR_ERR(tgt); @@ -2953,10 
+2958,22 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm, for (i = 0; i < stripe_count; i++) { fid_le_to_cpu(&lsm->lsm_md_oinfo[i].lmo_fid, &lmm1->lmv_stripe_fids[i]); + /* + * set default value -1, so lmv_locate_tgt() knows this stripe + * target is not initialized. + */ + lsm->lsm_md_oinfo[i].lmo_mds = (u32)-1; + if (!fid_is_sane(&lsm->lsm_md_oinfo[i].lmo_fid)) + continue; + rc = lmv_fld_lookup(lmv, &lsm->lsm_md_oinfo[i].lmo_fid, &lsm->lsm_md_oinfo[i].lmo_mds); + if (rc == -ENOENT) + continue; + if (rc) return rc; + CDEBUG(D_INFO, "unpack fid #%d " DFID "\n", i, PFID(&lsm->lsm_md_oinfo[i].lmo_fid)); } @@ -2988,9 +3005,10 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, return 0; } - for (i = 0; i < lsm->lsm_md_stripe_count; i++) - iput(lsm->lsm_md_oinfo[i].lmo_root); - + for (i = 0; i < lsm->lsm_md_stripe_count; i++) { + if (lsm->lsm_md_oinfo[i].lmo_root) + iput(lsm->lsm_md_oinfo[i].lmo_root); + } kvfree(lsm); *lsmp = NULL; return 0; @@ -3334,6 +3352,9 @@ static int lmv_merge_attr(struct obd_export *exp, for (i = 0; i < lsm->lsm_md_stripe_count; i++) { struct inode *inode = lsm->lsm_md_oinfo[i].lmo_root; + if (!inode) + continue; + CDEBUG(D_INFO, "" DFID " size %llu, blocks %llu nlink %u, atime %lld ctime %lld, mtime %lld.\n", PFID(&lsm->lsm_md_oinfo[i].lmo_fid), From patchwork Thu Feb 27 21:12:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410147 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 307AF92A for ; Thu, 27 Feb 2020 21:31:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS 
id 18B03246A2 for ; Thu, 27 Feb 2020 21:31:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 18B03246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 03A6234980A; Thu, 27 Feb 2020 13:26:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5040921FEC2 for ; Thu, 27 Feb 2020 13:19:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C23298A39; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C09FF46D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:41 -0500 Message-Id: <1582838290-17243-294-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 293/622] lustre: ptlrpc: ocd_connect_flags are wrong during reconnect X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh Import connect flags are reset to original ones during reconnect, so a request can be created with unsupported features. Use separate obd_connect_data to send connect request. Cray-bug-id: LUS-6397 WC-bug-id: https://jira.whamcloud.com/browse/LU-12095 Lustre-commit: 1224084c6300 ("LU-12095 ptlrpc: ocd_connect_flags are wrong during reconnect") Signed-off-by: Andriy Skulysh Reviewed-by: Alexander Boyko Reviewed-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/34480 Reviewed-by: Andreas Dilger Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/import.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index a75856a..6f13ec1 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -602,11 +602,12 @@ int ptlrpc_connect_import(struct obd_import *imp) int set_transno = 0; u64 committed_before_reconnect = 0; struct ptlrpc_request *request; + struct obd_connect_data ocd; char *bufs[] = { NULL, obd2cli_tgt(imp->imp_obd), obd->obd_uuid.uuid, (char *)&imp->imp_dlm_handle, - (char *)&imp->imp_connect_data, + (char *)&ocd, NULL }; struct ptlrpc_connect_async_args *aa; int rc; @@ -653,15 +654,16 @@ int ptlrpc_connect_import(struct obd_import *imp) /* Reset connect flags to the originally requested flags, in case * the server is updated on-the-fly we will get the new features. 
*/ - imp->imp_connect_data.ocd_connect_flags = imp->imp_connect_flags_orig; - imp->imp_connect_data.ocd_connect_flags2 = imp->imp_connect_flags2_orig; + ocd = imp->imp_connect_data; + ocd.ocd_connect_flags = imp->imp_connect_flags_orig; + ocd.ocd_connect_flags2 = imp->imp_connect_flags2_orig; /* Reset ocd_version each time so the server knows the exact versions */ - imp->imp_connect_data.ocd_version = LUSTRE_VERSION_CODE; + ocd.ocd_version = LUSTRE_VERSION_CODE; imp->imp_msghdr_flags &= ~MSGHDR_AT_SUPPORT; imp->imp_msghdr_flags &= ~MSGHDR_CKSUM_INCOMPAT18; rc = obd_reconnect(NULL, imp->imp_obd->obd_self_export, obd, - &obd->obd_uuid, &imp->imp_connect_data, NULL); + &obd->obd_uuid, &ocd, NULL); if (rc) goto out; From patchwork Thu Feb 27 21:12:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410297 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84EBB92A for ; Thu, 27 Feb 2020 21:34:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6CA5524677 for ; Thu, 27 Feb 2020 21:34:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6CA5524677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 59449349EBE; Thu, 27 Feb 2020 13:29:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 91A0221FEC8 for ; Thu, 27 Feb 2020 13:19:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C46D38A3A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C34F546F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:42 -0500 Message-Id: <1582838290-17243-295-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 294/622] lnet: libcfs: fix panic for too large cpu partitions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong If the number of CPU partitions is larger than the number of online CPUs, the following calculation will be 0: num = num_online_cpus() / ncpt; And it will trigger the following panic in cfs_cpt_choose_ncpus(): LASSERT(number > 0); We do not actually support this configuration, so returning failure is better than panicking. Also fix an invalid pointer access if we fail to init @cfs_cpt_table, as it will be converted to ERR_PTR() if an error happens.
WC-bug-id: https://jira.whamcloud.com/browse/LU-12299 Lustre-commit: 77771ff24c03 ("LU-12299 libcfs: fix panic for too large cpu partions") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/34864 Reviewed-by: Andreas Dilger Reviewed-by: Gu Zheng Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/libcfs_cpu.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/net/lnet/libcfs/libcfs_cpu.c b/net/lnet/libcfs/libcfs_cpu.c index 3e566ac..80533c2 100644 --- a/net/lnet/libcfs/libcfs_cpu.c +++ b/net/lnet/libcfs/libcfs_cpu.c @@ -878,7 +878,14 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt) if (ncpt <= 0) ncpt = num; - if (ncpt > num_online_cpus() || ncpt > 4 * num) { + if (ncpt > num_online_cpus()) { + rc = -EINVAL; + CERROR("libcfs: CPU partition count %d > cores %d: rc = %d\n", + ncpt, num_online_cpus(), rc); + goto failed; + } + + if (ncpt > 4 * num) { CWARN("CPU partition number %d is larger than suggested value (%d), your system may have performance issue or run out of memory while under pressure\n", ncpt, num); } From patchwork Thu Feb 27 21:12:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410151 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0982C138D for ; Thu, 27 Feb 2020 21:31:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E5E8C24677 for ; Thu, 27 Feb 2020 21:31:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E5E8C24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2EFC3349865; Thu, 27 Feb 2020 13:26:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D565E21FEC8 for ; Thu, 27 Feb 2020 13:19:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C75578A3B; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C60C4468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:43 -0500 Message-Id: <1582838290-17243-296-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 295/622] lustre: obdclass: put all service's env on the list X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev This makes it possible to look up the env by the current thread where it's too complicated to pass env by argument. This version has stats to see slow/fast lookups: in sanity-benchmark there were 172850 fast lookups (from the per-cpu cache) and 27228 slow lookups (from the rhashtable). Going to watch the ratio in autotest's reports.
Fixes: 8a9e013dad74 ("lustre: ldlm: pass env to lvbo methods") WC-bug-id: https://jira.whamcloud.com/browse/LU-12034 Lustre-commit: aa82cc83612d ("LU-12034 obdclass: put all service's env on the list") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/34566 Reviewed-by: Andrew Perepechko Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 4 ++ fs/lustre/include/obd_class.h | 19 +++++--- fs/lustre/ldlm/ldlm_lockd.c | 20 +++++++- fs/lustre/obdclass/lu_object.c | 104 +++++++++++++++++++++++++++++++++++++++- fs/lustre/obdecho/echo_client.c | 2 + fs/lustre/ptlrpc/service.c | 20 ++++---- 6 files changed, 153 insertions(+), 16 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index a709ad7..c34605c 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1213,6 +1213,10 @@ struct lu_env { void lu_env_fini(struct lu_env *env); int lu_env_refill(struct lu_env *env); +struct lu_env *lu_env_find(void); +int lu_env_add(struct lu_env *env); +void lu_env_remove(struct lu_env *env); + /** @} lu_context */ /** diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index a142d6e..a890d00 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -477,12 +477,19 @@ static inline int obd_precleanup(struct obd_device *obd) int rc; if (ldt && d) { - struct lu_env env; - - rc = lu_env_init(&env, ldt->ldt_ctx_tags); - if (!rc) { - ldt->ldt_ops->ldto_device_fini(&env, d); - lu_env_fini(&env); + struct lu_env *env = lu_env_find(); + struct lu_env _env; + + if (!env) { + env = &_env; + rc = lu_env_init(env, ldt->ldt_ctx_tags); + LASSERT(!rc); + lu_env_add(env); + } + ldt->ldt_ops->ldto_device_fini(env, d); + if (env == &_env) { + lu_env_remove(env); + lu_env_fini(env); } } if (!obd->obd_type->typ_dt_ops->precleanup) diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index f37d8ef..3b405be 100644 --- 
a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -846,8 +846,20 @@ static int ldlm_bl_thread_blwi(struct ldlm_bl_pool *blp, */ static int ldlm_bl_thread_main(void *arg) { + struct lu_env *env; struct ldlm_bl_pool *blp; struct ldlm_bl_thread_data *bltd = arg; + int rc; + + env = kzalloc(sizeof(*env), GFP_NOFS); + if (!env) + return -ENOMEM; + rc = lu_env_init(env, LCT_DT_THREAD); + if (rc) + goto out_env; + rc = lu_env_add(env); + if (rc) + goto out_env_fini; blp = bltd->bltd_blp; @@ -888,7 +900,13 @@ static int ldlm_bl_thread_main(void *arg) atomic_dec(&blp->blp_num_threads); complete(&blp->blp_comp); - return 0; + + lu_env_remove(env); +out_env_fini: + lu_env_fini(env); +out_env: + kfree(env); + return rc; } static int ldlm_setup(void); diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index 2ab4977..2f709b0 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -1859,6 +1859,101 @@ static unsigned long lu_cache_shrink_scan(struct shrinker *sk, /** * Debugging printer function using printk(). 
*/ + +struct lu_env_item { + struct task_struct *lei_task; /* rhashtable key */ + struct rhash_head lei_linkage; + struct lu_env *lei_env; +}; + +static const struct rhashtable_params lu_env_rhash_params = { + .key_len = sizeof(struct task_struct *), + .key_offset = offsetof(struct lu_env_item, lei_task), + .head_offset = offsetof(struct lu_env_item, lei_linkage), +}; + +struct rhashtable lu_env_rhash; + +struct lu_env_percpu { + struct task_struct *lep_task; + struct lu_env *lep_env ____cacheline_aligned_in_smp; +}; + +static struct lu_env_percpu lu_env_percpu[NR_CPUS]; + +int lu_env_add(struct lu_env *env) +{ + struct lu_env_item *lei, *old; + + LASSERT(env); + + lei = kzalloc(sizeof(*lei), GFP_NOFS); + if (!lei) + return -ENOMEM; + + lei->lei_task = current; + lei->lei_env = env; + + old = rhashtable_lookup_get_insert_fast(&lu_env_rhash, + &lei->lei_linkage, + lu_env_rhash_params); + LASSERT(!old); + + return 0; +} +EXPORT_SYMBOL(lu_env_add); + +void lu_env_remove(struct lu_env *env) +{ + struct lu_env_item *lei; + const void *task = current; + int i; + + for_each_possible_cpu(i) { + if (lu_env_percpu[i].lep_env == env) { + LASSERT(lu_env_percpu[i].lep_task == task); + lu_env_percpu[i].lep_task = NULL; + lu_env_percpu[i].lep_env = NULL; + } + } + + rcu_read_lock(); + lei = rhashtable_lookup_fast(&lu_env_rhash, &task, + lu_env_rhash_params); + if (lei && rhashtable_remove_fast(&lu_env_rhash, &lei->lei_linkage, + lu_env_rhash_params) == 0) + kfree(lei); + rcu_read_unlock(); +} +EXPORT_SYMBOL(lu_env_remove); + +struct lu_env *lu_env_find(void) +{ + struct lu_env *env = NULL; + struct lu_env_item *lei; + const void *task = current; + int i = get_cpu(); + + if (lu_env_percpu[i].lep_task == current) { + env = lu_env_percpu[i].lep_env; + put_cpu(); + LASSERT(env); + return env; + } + + lei = rhashtable_lookup_fast(&lu_env_rhash, &task, + lu_env_rhash_params); + if (lei) { + env = lei->lei_env; + lu_env_percpu[i].lep_task = current; + lu_env_percpu[i].lep_env = env; + } 
+ put_cpu(); + + return env; +} +EXPORT_SYMBOL(lu_env_find); + static struct shrinker lu_site_shrinker = { .count_objects = lu_cache_shrink_count, .scan_objects = lu_cache_shrink_scan, @@ -1905,6 +2000,11 @@ int lu_global_init(void) * lu_object/inode cache consuming all the memory. */ result = register_shrinker(&lu_site_shrinker); + if (result == 0) { + result = rhashtable_init(&lu_env_rhash, &lu_env_rhash_params); + if (result != 0) + unregister_shrinker(&lu_site_shrinker); + } if (result != 0) { /* Order explained in lu_global_fini(). */ lu_context_key_degister(&lu_global_key); @@ -1917,7 +2017,7 @@ int lu_global_init(void) return result; } - return 0; + return result; } /** @@ -1936,6 +2036,8 @@ void lu_global_fini(void) lu_env_fini(&lu_shrink_env); up_write(&lu_sites_guard); + rhashtable_destroy(&lu_env_rhash); + lu_ref_global_fini(); } diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 5ac4519..01d8c04 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1506,6 +1506,7 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, rc = -ENOMEM; goto out; } + lu_env_add(env); switch (cmd) { case OBD_IOC_CREATE: /* may create echo object */ @@ -1572,6 +1573,7 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, } out: + lu_env_remove(env); lu_env_fini(env); kfree(env); diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 1513f51..d93cf14 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -2194,11 +2194,14 @@ static int ptlrpc_main(void *arg) rc = -ENOMEM; goto out_srv_fini; } + rc = lu_env_add(env); + if (rc) + goto out_env; rc = lu_context_init(&env->le_ctx, svc->srv_ctx_tags | LCT_REMEMBER | LCT_NOREF); if (rc) - goto out_srv_fini; + goto out_env_remove; thread->t_env = env; env->le_ctx.lc_thread = thread; @@ -2211,14 +2214,14 @@ static int ptlrpc_main(void *arg) CERROR("Failed to post rqbd for %s on CPT %d: %d\n", 
svc->srv_name, svcpt->scp_cpt, rc); - goto out_srv_fini; + goto out_ctx_fini; } /* Alloc reply state structure for this one */ rs = kvzalloc(svc->srv_max_reply_size, GFP_KERNEL); if (!rs) { rc = -ENOMEM; - goto out_srv_fini; + goto out_ctx_fini; } spin_lock(&svcpt->scp_lock); @@ -2310,15 +2313,16 @@ static int ptlrpc_main(void *arg) ptlrpc_watchdog_disable(&thread->t_watchdog); +out_ctx_fini: + lu_context_fini(&env->le_ctx); +out_env_remove: + lu_env_remove(env); +out_env: + kfree(env); out_srv_fini: /* deconstruct service thread state created by ptlrpc_start_thread() */ if (svc->srv_ops.so_thr_done) svc->srv_ops.so_thr_done(thread); - - if (env) { - lu_context_fini(&env->le_ctx); - kfree(env); - } out: CDEBUG(D_RPCTRACE, "%s: service thread [%p:%u] %d exiting: rc = %d\n", thread->t_name, thread, thread->t_pid, thread->t_id, rc); From patchwork Thu Feb 27 21:12:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410155 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 18AFA92A for ; Thu, 27 Feb 2020 21:31:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 013DB24677 for ; Thu, 27 Feb 2020 21:31:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 013DB24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CFAA23498CF; Thu, 27 Feb 2020 13:26:48 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36B2A21FEC8 for ; Thu, 27 Feb 2020 13:19:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C9FB98A3C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C8D3346A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:44 -0500 Message-Id: <1582838290-17243-297-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 296/622] lustre: mdt: fix mdt_dom_discard_data() timeouts X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin mdt_dom_discard_data() issues a new lock to cause data discard for all conflicting client locks. This was done in the context of unlink RPC processing and may cause it to be stuck waiting for clients to cancel their locks, leading to cascading timeouts for any other locks waiting on the same resource and parent directory. The patch skips waiting for the discard lock in the current context by using its own CP callback, which doesn't wait for blocking locks. They will be finished later by LDLM and cleaned up in that completion callback.
So the current thread just makes sure that discard locks are taken and BL ASTs are sent, but doesn't wait for the locks to be granted, which fixes the original problem. At the same time, this opens a window for a race with data being flushed on the client: new IO from the client may arrive for a just-unlinked object, causing an error message, and that case cannot be distinguished from other, possibly critical, situations. To solve this, the unlinked object is pinned in memory until the discard lock is granted. Such objects can therefore be easily identified as stale, and any IO against them can be silently ignored. Older clients are not fully compatible with async DoM discard, so the patch also adds a new connection flag, ASYNC_DISCARD, to identify old clients and use the old blocking discard for them. WC-bug-id: https://jira.whamcloud.com/browse/LU-11359 Lustre-commit: 9c028e74c220 ("LU-11359 mdt: fix mdt_dom_discard_data() timeouts") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/34071 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 2 ++ fs/lustre/ldlm/ldlm_internal.h | 5 +---- fs/lustre/ldlm/ldlm_request.c | 13 +++++++++++++ fs/lustre/llite/llite_lib.c | 19 ++++++++++++------- fs/lustre/llite/namei.c | 12 +++++++++++- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/osc/osc_cache.c | 2 +- fs/lustre/ptlrpc/service.c | 23 +++++++++++++++++++++++ fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 3 +++ 10 files changed, 69 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 355049f..4060bb4 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -1082,6 +1082,8 @@ static inline struct ldlm_lock *ldlm_handle2lock(const struct lustre_handle *h) return lock; } +int is_granted_or_cancelled_nolock(struct ldlm_lock *lock); + int
ldlm_error2errno(enum ldlm_error error); #if LUSTRE_TRACKS_LOCK_EXP_REFS diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index ede48b2..3789496 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -310,10 +310,7 @@ static inline int is_granted_or_cancelled(struct ldlm_lock *lock) int ret = 0; lock_res_and_lock(lock); - if (ldlm_is_granted(lock) && !ldlm_is_cp_reqd(lock)) - ret = 1; - else if (ldlm_is_failed(lock) || ldlm_is_cancel(lock)) - ret = 1; + ret = is_granted_or_cancelled_nolock(lock); unlock_res_and_lock(lock); return ret; diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 45d70d4..71892a5 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -138,6 +138,19 @@ static void ldlm_expired_completion_wait(struct ldlm_lock *lock, u32 conn_cnt) obd2cli_tgt(obd), imp->imp_connection->c_remote_uuid.uuid); } +int is_granted_or_cancelled_nolock(struct ldlm_lock *lock) +{ + int ret = 0; + + check_res_locked(lock->l_resource); + if (ldlm_is_granted(lock) && !ldlm_is_cp_reqd(lock)) + ret = 1; + else if (ldlm_is_failed(lock) || ldlm_is_cancel(lock)) + ret = 1; + return ret; +} +EXPORT_SYMBOL(is_granted_or_cancelled_nolock); + /** * Calculate the Completion timeout (covering enqueue, BL AST, data flush, * lock cancel, and their replies). 
Used for lock completion timeout on the diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index ba477ad..a89189c 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -213,7 +213,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT | OBD_CONNECT2_ARCHIVE_ID_ARRAY | - OBD_CONNECT2_LSOM; + OBD_CONNECT2_LSOM | + OBD_CONNECT2_ASYNC_DISCARD; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; @@ -2054,13 +2055,17 @@ void ll_delete_inode(struct inode *inode) struct address_space *mapping = &inode->i_data; unsigned long nrpages; - if (S_ISREG(inode->i_mode) && lli->lli_clob) - /* discard all dirty pages before truncating them, required by - * osc_extent implementation at LU-1030. + if (S_ISREG(inode->i_mode) && lli->lli_clob) { + /* It is last chance to write out dirty pages, + * otherwise we may lose data while umount. + * + * If i_nlink is 0 then just discard data. This is safe because + * local inode gets i_nlink 0 from server only for the last + * unlink, so that file is not opened somewhere else */ - cl_sync_file_range(inode, 0, OBD_OBJECT_EOF, - CL_FSYNC_LOCAL, 1); - + cl_sync_file_range(inode, 0, OBD_OBJECT_EOF, inode->i_nlink ? + CL_FSYNC_LOCAL : CL_FSYNC_DISCARD, 1); + } truncate_inode_pages_final(mapping); /* Workaround for LU-118: Note nrpages may not be totally updated when diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index ee3ce70..c3e8de4 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -224,8 +224,18 @@ void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) u64 bits = to_cancel; int rc; - if (!inode) + if (!inode) { + /* That means the inode is evicted most likely and may cause + * the skipping of lock cleanups below, so print the message + * about that in log. 
+ */ + if (lock->l_resource->lr_lvb_inode) + LDLM_DEBUG(lock, + "can't take inode for the lock (%sevicted)\n", + lock->l_resource->lr_lvb_inode->i_state & + I_FREEING ? "" : "not "); return; + } if (!fid_res_name_eq(ll_inode2fid(inode), &lock->l_resource->lr_name)) { diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 55057cf..c244adb 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -125,6 +125,7 @@ "lsom", /* 0x800 */ "pcc", /* 0x1000 */ "plain_layout", /* 0x2000 */ + "async_discard", /* 0x4000 */ NULL }; diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index a02adac..8ffd8f9 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2926,7 +2926,7 @@ int osc_cache_writeback_range(const struct lu_env *env, struct osc_object *obj, * [start, end] must contain this extent */ EASSERT(ext->oe_start >= start && - ext->oe_max_end <= end, ext); + ext->oe_end <= end, ext); osc_extent_state_set(ext, OES_LOCKING); ext->oe_owner = current; list_move_tail(&ext->oe_link, &discard_list); diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index d93cf14..8e6013a 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -2367,8 +2367,13 @@ static int ptlrpc_hr_main(void *arg) struct ptlrpc_hr_thread *hrt = arg; struct ptlrpc_hr_partition *hrp = hrt->hrt_partition; LIST_HEAD(replies); + struct lu_env *env; int rc; + env = kzalloc(sizeof(*env), GFP_NOFS); + if (!env) + return -ENOMEM; + unshare_fs_struct(); rc = cfs_cpt_bind(ptlrpc_hr.hr_cpt_table, hrp->hrp_cpt); @@ -2381,6 +2386,15 @@ static int ptlrpc_hr_main(void *arg) threadname, hrp->hrp_cpt, ptlrpc_hr.hr_cpt_table, rc); } + rc = lu_context_init(&env->le_ctx, LCT_MD_THREAD | LCT_DT_THREAD | + LCT_REMEMBER | LCT_NOREF); + if (rc) + goto out_env; + + rc = lu_env_add(env); + if (rc) + goto out_ctx_fini; + atomic_inc(&hrp->hrp_nstarted); wake_up(&ptlrpc_hr.hr_waitq); @@ -2394,13 
+2408,22 @@ static int ptlrpc_hr_main(void *arg) struct ptlrpc_reply_state, rs_list); list_del_init(&rs->rs_list); + /* refill keys if needed */ + lu_env_refill(env); + lu_context_enter(&env->le_ctx); ptlrpc_handle_rs(rs); + lu_context_exit(&env->le_ctx); } } atomic_inc(&hrp->hrp_nstopped); wake_up(&ptlrpc_hr.hr_waitq); + lu_env_remove(env); +out_ctx_fini: + lu_context_fini(&env->le_ctx); +out_env: + kfree(env); return 0; } diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index fb57def..34c1d13 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1156,6 +1156,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_PCC); LASSERTF(OBD_CONNECT2_PLAIN_LAYOUT == 0x2000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_PLAIN_LAYOUT); + LASSERTF(OBD_CONNECT2_ASYNC_DISCARD == 0x4000ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_ASYNC_DISCARD); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index f7ea744..86395b7 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -810,6 +810,9 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */ #define OBD_CONNECT2_PCC 0x1000ULL /* Persistent Client Cache */ #define OBD_CONNECT2_PLAIN_LAYOUT 0x2000ULL /* Plain Directory Layout */ +#define OBD_CONNECT2_ASYNC_DISCARD 0x4000ULL /* support async DoM data + * discard + */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same From patchwork Thu Feb 27 21:12:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410159 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 427A0138D for ; Thu, 27 Feb 2020 21:31:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2AD0224677 for ; Thu, 27 Feb 2020 21:31:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2AD0224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C675534990C; Thu, 27 Feb 2020 13:26:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8C66B21FC99 for ; Thu, 27 Feb 2020 13:19:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CC9C28A3D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CB91146C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:45 -0500 Message-Id: <1582838290-17243-298-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 297/622] lustre: lov: Add overstriping support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Each stripe in a shared file in Lustre corresponds to a single LDLM extent locking domain and also to a single object on disk (and in the OSS page cache). LDLM locks are extent locks, but there are still significant issues with false sharing with multiple writers. On-disk file systems also have per-object performance limitations for both read and write. The LDLM limitation means it is best to have a single writer per stripe, but modern OSTs can be faster than a single client, so this restricts maximum performance unless special methods are used (e.g., Lustre lockahead). The on-disk file system limitations mean that even if LDLM locking is not an issue (read and not write, or lockahead), OST performance in a shared file is still limited by having only one object per OST. These limitations make it impossible to get the full performance of a modern Lustre FS with a single shared file. This patch makes it possible to have more than one stripe on a given OST in each layout component. This is known as overstriping. It works exactly like a normally striped file, and is largely transparent to users. By raising the object count per OST, this avoids the single object limits, and by creating more stripes, also avoids the "single effective writer per stripe" LDLM limitation. However, it is only desirable in some situations, so users must request it with a special setstripe command: lfs setstripe -C [count] [file] Users can also access overstriping using the standard '-o' option to manually select OSTs: lfs setstripe -o [ost_indices] [file] Overstriping also makes it easy to test layout size limits, so we add a test for that.
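As an illustration only (not part of the patch), the pattern handling described above can be sketched in userspace C. The constant values are copied from the lustre_user.h hunk of this patch; pattern_supported_normal_comp() mirrors lov_pattern_supported_normal_comp() but uses uint32_t instead of __u32, and pattern_base_type() is a hypothetical helper that approximates the masking done in lov_entry_type().

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the lustre_user.h hunk in this patch. */
#define LOV_PATTERN_RAID0        0x001
#define LOV_PATTERN_MDT          0x100
#define LOV_PATTERN_OVERSTRIPING 0x200

/* Same test as lov_pattern_supported_normal_comp() in the patch,
 * written with uint32_t so that it builds outside the kernel tree.
 * Overstriping is only valid as a modifier on top of RAID0.
 */
static inline bool pattern_supported_normal_comp(uint32_t pattern)
{
	return pattern == LOV_PATTERN_RAID0 ||
	       pattern == (LOV_PATTERN_RAID0 | LOV_PATTERN_OVERSTRIPING);
}

/* Hypothetical helper (name is an assumption): clear the overstriping
 * bit to recover the base layout type, similar to what
 * lov_entry_type() does for RAID0 components.
 */
static inline uint32_t pattern_base_type(uint32_t pattern)
{
	return pattern & ~(uint32_t)LOV_PATTERN_OVERSTRIPING;
}
```

The point of the sketch is that an overstriped component is still a RAID0 component: the 0x200 bit alone is rejected, and clearing it from 0x201 yields plain RAID0.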
WC-bug-id: https://jira.whamcloud.com/browse/LU-9846 Lustre-commit: 591a9b4cebc5 ("LU-9846 lod: Add overstriping support") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/28425 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 1 + fs/lustre/lov/lov_cl_internal.h | 5 +++-- fs/lustre/lov/lov_ea.c | 33 ++++++++++++++++++++++----------- fs/lustre/lov/lov_obd.c | 4 ++-- fs/lustre/ptlrpc/wiretest.c | 4 ++-- include/uapi/linux/lustre/lustre_user.h | 22 +++++++++++++++++----- 6 files changed, 47 insertions(+), 22 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index a89189c..d6293d1 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -210,6 +210,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_connect_flags2 = OBD_CONNECT2_DIR_MIGRATE | OBD_CONNECT2_SUM_STATFS | + OBD_CONNECT2_OVERSTRIPING | OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT | OBD_CONNECT2_ARCHIVE_ID_ARRAY | diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index 7b95a00..6fea0f5 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -150,9 +150,10 @@ static inline char *llt2str(enum lov_layout_type llt) */ static inline u32 lov_entry_type(struct lov_stripe_md_entry *lsme) { - if ((lov_pattern(lsme->lsme_pattern) == LOV_PATTERN_RAID0) || + if ((lov_pattern(lsme->lsme_pattern) & LOV_PATTERN_RAID0) || (lov_pattern(lsme->lsme_pattern) == LOV_PATTERN_MDT)) - return lov_pattern(lsme->lsme_pattern); + return lov_pattern(lsme->lsme_pattern & + ~LOV_PATTERN_OVERSTRIPING); return 0; } diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c index b7a6d91..07bfe0f 100644 --- a/fs/lustre/lov/lov_ea.c +++ b/fs/lustre/lov/lov_ea.c @@ -84,34 +84,45 @@ static loff_t lov_tgt_maxbytes(struct lov_tgt_desc *tgt) static int lsm_lmm_verify_v1v3(struct 
lov_mds_md *lmm, size_t lmm_size, u16 stripe_count) { + int rc = 0; + if (stripe_count > LOV_V1_INSANE_STRIPE_COUNT) { - CERROR("bad stripe count %d\n", stripe_count); + rc = -EINVAL; + CERROR("lov: bad stripe count %d: rc = %d\n", + stripe_count, rc); lov_dump_lmm_common(D_WARNING, lmm); - return -EINVAL; + goto out; } if (lmm_oi_id(&lmm->lmm_oi) == 0) { - CERROR("zero object id\n"); + rc = -EINVAL; + CERROR("lov: zero object id: rc = %d\n", rc); lov_dump_lmm_common(D_WARNING, lmm); - return -EINVAL; + goto out; } if (lov_pattern(le32_to_cpu(lmm->lmm_pattern)) != LOV_PATTERN_MDT && - lov_pattern(le32_to_cpu(lmm->lmm_pattern)) != LOV_PATTERN_RAID0) { - CERROR("bad striping pattern\n"); + lov_pattern(le32_to_cpu(lmm->lmm_pattern)) != LOV_PATTERN_RAID0 && + lov_pattern(le32_to_cpu(lmm->lmm_pattern)) != + (LOV_PATTERN_RAID0 | LOV_PATTERN_OVERSTRIPING)) { + rc = -EINVAL; + CERROR("lov: unrecognized striping pattern: rc = %d\n", rc); lov_dump_lmm_common(D_WARNING, lmm); - return -EINVAL; + goto out; } if (lmm->lmm_stripe_size == 0 || (le32_to_cpu(lmm->lmm_stripe_size) & (LOV_MIN_STRIPE_SIZE - 1)) != 0) { - CERROR("bad stripe size %u\n", - le32_to_cpu(lmm->lmm_stripe_size)); + rc = -EINVAL; + CERROR("lov: bad stripe size %u: rc = %d\n", + le32_to_cpu(lmm->lmm_stripe_size), rc); lov_dump_lmm_common(D_WARNING, lmm); - return -EINVAL; + goto out; } - return 0; + +out: + return rc; } static void lsme_free(struct lov_stripe_md_entry *lsme) diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 3a90e7e..234b556 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -699,8 +699,8 @@ void lov_fix_desc_stripe_count(u32 *val) void lov_fix_desc_pattern(u32 *val) { /* from lov_setstripe */ - if ((*val != 0) && (*val != LOV_PATTERN_RAID0)) { - LCONSOLE_WARN("Unknown stripe pattern: %#x\n", *val); + if ((*val != 0) && !lov_pattern_supported_normal_comp(*val)) { + LCONSOLE_WARN("lov: Unknown stripe pattern: %#x\n", *val); *val = 0; } } diff --git 
a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 34c1d13..b8b561c 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1517,8 +1517,8 @@ void lustre_assert_wire_constants(void) (unsigned int)LOV_PATTERN_RAID1); LASSERTF(LOV_PATTERN_MDT == 0x00000100UL, "found 0x%.8xUL\n", (unsigned int)LOV_PATTERN_MDT); - LASSERTF(LOV_PATTERN_CMOBD == 0x00000200UL, "found 0x%.8xUL\n", - (unsigned int)LOV_PATTERN_CMOBD); + LASSERTF(LOV_PATTERN_OVERSTRIPING == 0x00000200UL, "found 0x%.8xUL\n", + (unsigned int)LOV_PATTERN_OVERSTRIPING); /* Checks for struct lov_comp_md_entry_v1 */ LASSERTF((int)sizeof(struct lov_comp_md_entry_v1) == 48, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index d52879e..dc39265 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -394,16 +394,28 @@ struct ll_ioc_lease_id { #define LMV_USER_MAGIC 0x0CD30CD0 /*default lmv magic*/ #define LMV_USER_MAGIC_SPECIFIC 0x0CD40CD0 -#define LOV_PATTERN_RAID0 0x001 +#define LOV_PATTERN_NONE 0x000 +#define LOV_PATTERN_RAID0 0x001 -#define LOV_PATTERN_RAID1 0x002 -#define LOV_PATTERN_MDT 0x100 -#define LOV_PATTERN_CMOBD 0x200 +#define LOV_PATTERN_RAID1 0x002 +#define LOV_PATTERN_MDT 0x100 +#define LOV_PATTERN_OVERSTRIPING 0x200 #define LOV_PATTERN_F_MASK 0xffff0000 #define LOV_PATTERN_F_HOLE 0x40000000 /* there is hole in LOV EA */ #define LOV_PATTERN_F_RELEASED 0x80000000 /* HSM released file */ +/* RELEASED and MDT patterns are not valid in many places, so rather than + * having many extra checks on lov_pattern_supported, we have this separate + * check for non-released, non-DOM components + */ +static inline bool lov_pattern_supported_normal_comp(__u32 pattern) +{ + return pattern == LOV_PATTERN_RAID0 || + pattern == (LOV_PATTERN_RAID0 | LOV_PATTERN_OVERSTRIPING); + +} + #define LOV_MAXPOOLNAME 15 #define LOV_POOLNAMEF "%.15s" #define LOV_OFFSET_DEFAULT ((__u16)-1) 
@@ -421,7 +433,7 @@ struct ll_ioc_lease_id { * * (max buffer size - lov+rpc header) / sizeof(struct lov_ost_data_v1) */ -#define LOV_MAX_STRIPE_COUNT 2000 /* ((12 * 4096 - 256) / 24) */ +#define LOV_MAX_STRIPE_COUNT 2000 /* ~((12 * 4096 - 256) / 24) */ #define LOV_ALL_STRIPES 0xffff /* only valid for directories */ #define LOV_V1_INSANE_STRIPE_COUNT 65532 /* maximum stripe count bz13933 */ From patchwork Thu Feb 27 21:12:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410101 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5137C138D for ; Thu, 27 Feb 2020 21:30:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 39916246A0 for ; Thu, 27 Feb 2020 21:30:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 39916246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AAD793495D5; Thu, 27 Feb 2020 13:25:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E41F021FAAD for ; Thu, 27 Feb 2020 13:19:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CF6698A3E; Thu, 27 Feb 2020 16:18:16 -0500 (EST) 
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CE4F846D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:46 -0500 Message-Id: <1582838290-17243-299-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 298/622] lustre: rpc: support maximum 64MB I/O RPC X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin On newer systems, some block drivers allow max_hw_sector_kb to be up to 65536KB (64MB) to the underlying storage. To maximize driver efficiency, Lustre should also bump its maximum I/O RPC size up to 64MB. Also clamp max_read_ahead_whole_mb so that it does not exceed max_read_ahead_per_file_mb. WC-bug-id: https://jira.whamcloud.com/browse/LU-11526 Lustre-commit: 1a9be0046b1f ("LU-11526 rpc: support maximum 64MB I/O RPC") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/34042 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 2 +- fs/lustre/llite/llite_lib.c | 7 ++++++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 8d71559..f96265b 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -82,7 +82,7 @@ * transfer via cl_max_pages_per_rpc to some non-power-of-two value. * NOTE: This is limited to 16 (=64GB RPCs) by IOOBJ_MAX_BRW_BITS.
*/ -#define PTLRPC_BULK_OPS_BITS 4 +#define PTLRPC_BULK_OPS_BITS 6 #if PTLRPC_BULK_OPS_BITS > 16 #error "More than 65536 BRW RPCs not allowed by IOOBJ_MAX_BRW_BITS." #endif diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index d6293d1..e6ac16f 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -281,10 +281,15 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) sbi->ll_md_exp->exp_connect_data = *data; /* Don't change value if it was specified in the config log */ - if (sbi->ll_ra_info.ra_max_read_ahead_whole_pages == -1) + if (sbi->ll_ra_info.ra_max_read_ahead_whole_pages == -1) { sbi->ll_ra_info.ra_max_read_ahead_whole_pages = max_t(unsigned long, SBI_DEFAULT_READAHEAD_WHOLE_MAX, (data->ocd_brw_size >> PAGE_SHIFT)); + if (sbi->ll_ra_info.ra_max_read_ahead_whole_pages > + sbi->ll_ra_info.ra_max_pages_per_file) + sbi->ll_ra_info.ra_max_read_ahead_whole_pages = + sbi->ll_ra_info.ra_max_pages_per_file; + } err = obd_fid_init(sbi->ll_md_exp->exp_obd, sbi->ll_md_exp, LUSTRE_SEQ_METADATA); From patchwork Thu Feb 27 21:12:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410163 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BAF9C138D for ; Thu, 27 Feb 2020 21:31:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A358524677 for ; Thu, 27 Feb 2020 21:31:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A358524677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A308B349939; Thu, 27 Feb 2020 13:26:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3918521FEDF for ; Thu, 27 Feb 2020 13:19:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D26588A3F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D10D346F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:47 -0500 Message-Id: <1582838290-17243-300-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 299/622] lustre: dom: per-resource ELC for WRITE lock enqueue X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Improve client write lock enqueue by doing ELC for any read lock on the same resource. This helps with read/write access, e.g. compilebench shows ~10% better results with about 45% fewer ldlm cancel RPCs. In mdc_enqueue_send(), collect the resource's unused read locks and pack them into the enqueue request.
ldlm_cancel_resource_local() is also changed so that it doesn't skip a DOM lock if the DOM bit is set explicitly in the policy. WC-bug-id: https://jira.whamcloud.com/browse/LU-10894 Lustre-commit: 16c156c3218b ("LU-10894 dom: per-resource ELC for WRITE lock enqueue") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/34736 Reviewed-by: Patrick Farrell Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 17 ++++++++++++----- fs/lustre/mdc/mdc_dev.c | 13 +++++++++++-- fs/lustre/mdc/mdc_internal.h | 5 ++++- fs/lustre/mdc/mdc_reint.c | 26 +++++++++++++++++--------- 4 files changed, 44 insertions(+), 17 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 71892a5..5a7026d 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1888,12 +1888,19 @@ int ldlm_cancel_resource_local(struct ldlm_resource *res, /* * If policy is given and this is IBITS lock, add to list only * those locks that match by policy. - * Skip locks with DoM bit always to don't flush data. */ - if (policy && (lock->l_resource->lr_type == LDLM_IBITS) && - (!(lock->l_policy_data.l_inodebits.bits & - policy->l_inodebits.bits) || ldlm_has_dom(lock))) - continue; + if (policy && (lock->l_resource->lr_type == LDLM_IBITS)) { + if (!(lock->l_policy_data.l_inodebits.bits & + policy->l_inodebits.bits)) + continue; + /* Skip locks with DoM bit if it is not set in policy + * to don't flush data by side-bits. Lock convert will + * drop those bits separately.
+ */ + if (ldlm_has_dom(lock) && + !(policy->l_inodebits.bits & MDS_INODELOCK_DOM)) + continue; + } /* See CBPENDING comment in ldlm_cancel_lru */ lock->l_flags |= LDLM_FL_CBPENDING | LDLM_FL_CANCELING | diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index cb173f4..8f0e283 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -670,7 +670,8 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, enum ldlm_mode mode; bool glimpse = *flags & LDLM_FL_HAS_INTENT; u64 match_flags = *flags; - int rc; + LIST_HEAD(cancels); + int rc, count; mode = einfo->ei_mode; if (einfo->ei_mode == LCK_PR) @@ -726,7 +727,15 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, if (!req) return -ENOMEM; - rc = ldlm_prep_enqueue_req(exp, req, NULL, 0); + /* For WRITE lock cancel other locks on resource early if any */ + if (einfo->ei_mode & LCK_PW) + count = mdc_resource_get_unused_res(exp, res_id, &cancels, + einfo->ei_mode, + MDS_INODELOCK_DOM); + else + count = 0; + + rc = ldlm_prep_enqueue_req(exp, req, &cancels, count); if (rc < 0) { ptlrpc_request_free(req); return rc; diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index f75498a..2b540f8 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -86,7 +86,10 @@ int mdc_enqueue(struct obd_export *exp, struct ldlm_enqueue_info *einfo, const union ldlm_policy_data *policy, struct md_op_data *op_data, struct lustre_handle *lockh, u64 extra_lock_flags); - +int mdc_resource_get_unused_res(struct obd_export *exp, + struct ldlm_res_id *res_id, + struct list_head *cancels, + enum ldlm_mode mode, u64 bits); int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid, struct list_head *cancels, enum ldlm_mode mode, u64 bits); diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 86acb4e..d26e27d 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -62,13 +62,13 @@ static 
int mdc_reint(struct ptlrpc_request *request, int level) * found by @fid. Found locks are added into @cancel list. Returns the amount of * locks added to @cancels list. */ -int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid, - struct list_head *cancels, enum ldlm_mode mode, - u64 bits) +int mdc_resource_get_unused_res(struct obd_export *exp, + struct ldlm_res_id *res_id, + struct list_head *cancels, + enum ldlm_mode mode, u64 bits) { struct ldlm_namespace *ns = exp->exp_obd->obd_namespace; union ldlm_policy_data policy = {}; - struct ldlm_res_id res_id; struct ldlm_resource *res; int count; @@ -82,21 +82,29 @@ int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid, if (exp_connect_cancelset(exp) && !ns_connect_cancelset(ns)) return 0; - fid_build_reg_res_name(fid, &res_id); - res = ldlm_resource_get(exp->exp_obd->obd_namespace, - NULL, &res_id, 0, 0); + res = ldlm_resource_get(ns, NULL, res_id, 0, 0); if (IS_ERR(res)) return 0; LDLM_RESOURCE_ADDREF(res); /* Initialize ibits lock policy. 
*/ policy.l_inodebits.bits = bits; - count = ldlm_cancel_resource_local(res, cancels, &policy, - mode, 0, 0, NULL); + count = ldlm_cancel_resource_local(res, cancels, &policy, mode, 0, 0, + NULL); LDLM_RESOURCE_DELREF(res); ldlm_resource_putref(res); return count; } +int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid, + struct list_head *cancels, enum ldlm_mode mode, + u64 bits) +{ + struct ldlm_res_id res_id; + + fid_build_reg_res_name(fid, &res_id); + return mdc_resource_get_unused_res(exp, &res_id, cancels, mode, bits); +} + int mdc_setattr(struct obd_export *exp, struct md_op_data *op_data, void *ea, size_t ealen, struct ptlrpc_request **request) { From patchwork Thu Feb 27 21:12:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410193 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F3B1138D for ; Thu, 27 Feb 2020 21:32:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 81D6524677 for ; Thu, 27 Feb 2020 21:32:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 81D6524677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C3D55349A88; Thu, 27 Feb 2020 13:27:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D357E200880 for ; Thu, 27 Feb 2020 13:19:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D8AA98A41; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D6813468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:48 -0500 Message-Id: <1582838290-17243-301-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 300/622] lustre: dom: mdc_lock_flush() improvement X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin osc_lock_flush() already contains a small improvement: it does not try to match other locks when flushing a write lock, because there can be none. Do the same in mdc_lock_flush().
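The decision described above can be sketched in isolation. This is a minimal illustration, not the Lustre code: the enum and the helper name flush_should_discard() are hypothetical stand-ins, and only the boolean expression is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the client lock modes; not the real cl_lock_mode. */
enum sketch_lock_mode {
	SKETCH_CLM_READ,
	SKETCH_CLM_WRITE,
};

/*
 * Decide whether pages may be discarded without first trying to match
 * another lock that still covers them. With a write lock there can be no
 * other locks, so the match is skipped and the pages are simply discarded.
 */
static bool flush_should_discard(enum sketch_lock_mode mode, bool discard)
{
	return mode == SKETCH_CLM_WRITE || discard;
}
```

With this helper a read lock still honours the caller's discard argument, while a write lock always takes the discard path.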
WC-bug-id: https://jira.whamcloud.com/browse/LU-10894 Lustre-commit: 276221c2a1d2 ("LU-10894 dom: mdc_lock_flush() improvement") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/34738 Reviewed-by: Patrick Farrell Reviewed-by: Andrew Perepechko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 8f0e283..14cece1 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -253,7 +253,9 @@ static int mdc_lock_flush(const struct lu_env *env, struct osc_object *obj, result = 0; } - rc = mdc_lock_discard_pages(env, obj, start, end, discard); + /* Avoid lock matching with CLM_WRITE, there can be no other locks */ + rc = mdc_lock_discard_pages(env, obj, start, end, + mode == CLM_WRITE || discard); if (result == 0 && rc < 0) result = rc; From patchwork Thu Feb 27 21:12:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410771 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 85674924 for ; Thu, 27 Feb 2020 21:46:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6DF2E24690 for ; Thu, 27 Feb 2020 21:46:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6DF2E24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id DB5C034B229; Thu, 27 Feb 2020 13:36:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8EFAD200A4B for ; Thu, 27 Feb 2020 13:19:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D833D8A40; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D73FC46A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:49 -0500 Message-Id: <1582838290-17243-302-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 301/622] lnet: Fix NI status in debugfs for loopback ni X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The loopback NI is never really "down", but since its associated ns_status is used for other purposes that's how it is reported in proc_lnet_nis(). There's an existing check for lolnd so just hardcode the status as "up" there. 
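As a rough sketch of the reporting rule described above (assumptions: struct sketch_ni and sketch_ni_status() are hypothetical and much simpler than the real lnet_ni handling in proc_lnet_nis()), the loopback override is applied after the stored status has been read:

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a network interface (NI). */
struct sketch_ni {
	bool is_loopback;	/* true for the loopback LND */
	bool status_up;		/* what the stored ns_status would report */
	long idle_seconds;	/* time since the NI was last seen alive */
};

/*
 * Return the status string for an NI and store its last-alive value.
 * The loopback NI is always reported as "up" with last_alive == 0,
 * whatever the stored status says.
 */
static const char *sketch_ni_status(const struct sketch_ni *ni,
				    long *last_alive)
{
	const char *stat = ni->status_up ? "up" : "down";

	*last_alive = ni->idle_seconds;
	if (ni->is_loopback) {
		*last_alive = 0;
		stat = "up";
	}
	return stat;
}
```

Applying the override last means the stored status can keep serving its other purposes without leaking into the debugfs output.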
WC-bug-id: https://jira.whamcloud.com/browse/LU-12302 Lustre-commit: 0c27e760c357 ("LU-12302 lnet: Fix NI status in proc for loopback ni") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/34871 Reviewed-by: Sonia Sharma Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/router_proc.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index 8517411..5341599 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -723,16 +723,18 @@ static int proc_lnet_nis(struct ctl_table *table, int write, if (the_lnet.ln_routing) last_alive = now - ni->ni_last_alive; - /* @lo forever alive */ - if (ni->ni_net->net_lnd->lnd_type == LOLND) - last_alive = 0; - lnet_ni_lock(ni); LASSERT(ni->ni_status); stat = (ni->ni_status->ns_status == LNET_NI_STATUS_UP) ? "up" : "down"; lnet_ni_unlock(ni); + /* @lo forever alive */ + if (ni->ni_net->net_lnd->lnd_type == LOLND) { + last_alive = 0; + stat = "up"; + } + /* * we actually output credits information for * TX queue of each partition From patchwork Thu Feb 27 21:12:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410167 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 845D8138D for ; Thu, 27 Feb 2020 21:31:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6CE0724677 for ; Thu, 27 Feb 2020 21:31:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6CE0724677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D01C721FE60; Thu, 27 Feb 2020 13:27:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28A3F205763 for ; Thu, 27 Feb 2020 13:19:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DB0978A42; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D9A4746C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:50 -0500 Message-Id: <1582838290-17243-303-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 302/622] lustre: ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The rq_req_unlinked, rq_reply_unlinked and rq_receiving_reply flags determine whether a PtlRPC request can transition out of RQ_PHASE_UNREG_RPC. Add these flags to the DEBUG_REQ_FLAGS macro to aid in debugging issues where requests are stuck in this unregistering state. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12333 Lustre-commit: 5bcc3a330e21 ("LU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/34949 Reviewed-by: Ann Koehler Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index f96265b..383d59e 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1069,9 +1069,12 @@ static inline void lustre_set_rep_swabbed(struct ptlrpc_request *req, FLAG(req->rq_no_resend, "N"), \ FLAG(req->rq_waiting, "W"), \ FLAG(req->rq_wait_ctx, "C"), FLAG(req->rq_hp, "H"), \ - FLAG(req->rq_committed, "M") + FLAG(req->rq_committed, "M"), \ + FLAG(req->rq_req_unlinked, "Q"), \ + FLAG(req->rq_reply_unlinked, "U"), \ + FLAG(req->rq_receiving_reply, "r") -#define REQ_FLAGS_FMT "%s:%s%s%s%s%s%s%s%s%s%s%s%s%s" +#define REQ_FLAGS_FMT "%s:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s" void _debug_req(struct ptlrpc_request *req, struct libcfs_debug_msg_data *data, const char *fmt, ...) 
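The pairing between the FLAG() entries and the format string above can be illustrated with a reduced sketch. Everything prefixed SKETCH_ or sketch_ is hypothetical, and the macro body is an assumption about how such a flag macro is typically written; the point is only that every flag entry needs its own "%s" conversion, which is why the patch lengthens REQ_FLAGS_FMT when it adds three flags.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of the request flags; not the real ptlrpc_request. */
struct sketch_req {
	bool committed;
	bool req_unlinked;
	bool reply_unlinked;
	bool receiving_reply;
};

/* Each flag expands to its letter when set, or to an empty string. */
#define SKETCH_FLAG(field, str) ((field) ? (str) : "")

/* One "%s" per SKETCH_FLAG() entry below: four flags, four conversions. */
#define SKETCH_FLAGS_FMT "%s%s%s%s"
#define SKETCH_FLAGS(req) \
	SKETCH_FLAG((req)->committed, "M"), \
	SKETCH_FLAG((req)->req_unlinked, "Q"), \
	SKETCH_FLAG((req)->reply_unlinked, "U"), \
	SKETCH_FLAG((req)->receiving_reply, "r")

/* Format the set flags into buf; returns the snprintf() result. */
static int sketch_format_flags(const struct sketch_req *req, char *buf,
			       size_t len)
{
	return snprintf(buf, len, SKETCH_FLAGS_FMT, SKETCH_FLAGS(req));
}
```

If a flag were added to SKETCH_FLAGS() without a matching "%s", the extra argument would be silently ignored and the new flag would never show up in the log, so the two macros have to be updated together.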
From patchwork Thu Feb 27 21:12:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410881 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A1148924 for ; Thu, 27 Feb 2020 21:49:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 89A4F24690 for ; Thu, 27 Feb 2020 21:49:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 89A4F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DA40F34A77B; Thu, 27 Feb 2020 13:41:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6F80521C9CA for ; Thu, 27 Feb 2020 13:19:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DEC508A43; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DC65746D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:51 -0500 Message-Id: <1582838290-17243-304-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 303/622] lustre: llite: Revalidate dentries in ll_intent_file_open X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin We might get a lookup lock in response to our open request and we definitely want to ensure that our dentry is valid, so it could actually be matched by dcache code in future operations. Benchmark results: This patch can significantly improve open-create + stat on the same client. This patch in combination with two others: https://review.whamcloud.com/#/c/33584 https://review.whamcloud.com/#/c/33585 Improves the 'stat' side of open-create + stat by >10x. Without patches (master branch commit 26a7abe): mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k Operation Max Min Mean Std Dev --------- --- --- ---- ------- File creation : 3838.205 3838.204 3838.204 0.000 File stat : 33459.289 33459.249 33459.271 0.011 File read : 0.000 0.000 0.000 0.000 File removal : 0.000 0.000 0.000 0.000 Tree creation : 3146.841 3146.841 3146.841 0.000 Tree removal : 0.000 0.000 0.000 0.000 With the three patches: mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k SUMMARY rate: (of 1 iterations) Operation Max Min Mean Std Dev --------- --- --- ---- ------- File creation : 3822.440 3822.439 3822.440 0.000 File stat : 350620.140 350615.980 350617.193 1.051 File read : 0.000 0.000 0.000 0.000 File removal : 0.000 0.000 0.000 0.000 Tree creation : 2076.727 2076.727 2076.727 0.000 Tree removal : 0.000 0.000 0.000 0.000 Note 33K stats/second vs 350K stats/second. 
ls -l time of the mdtest directory is also reduced from 23.5 seconds to 5.8 seconds. WC-bug-id: https://jira.whamcloud.com/browse/LU-10948 Lustre-commit: 14ca3157b21d ("LU-10948 llite: Revalidate dentries in ll_intent_file_open") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/32157 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index e9d0ff9..191b0f9 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -419,25 +419,12 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, struct page *vmpage; struct niobuf_remote *rnb; char *data; - struct lustre_handle lockh; - struct ldlm_lock *lock; unsigned long index, start; struct niobuf_local lnb; - bool dom_lock = false; if (!obj) return; - if (it->it_lock_mode != 0) { - lockh.cookie = it->it_lock_handle; - lock = ldlm_handle2lock(&lockh); - if (lock) - dom_lock = ldlm_has_dom(lock); - LDLM_LOCK_PUT(lock); - } - if (!dom_lock) - return; - if (!req_capsule_has_field(&req->rq_pill, &RMF_NIOBUF_INLINE, RCL_SERVER)) return; @@ -576,8 +563,27 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, rc = ll_prep_inode(&inode, req, NULL, itp); if (!rc && itp->it_lock_mode) { - ll_dom_finish_open(d_inode(de), req, itp); + struct lustre_handle handle = {.cookie = itp->it_lock_handle}; + struct ldlm_lock *lock; + bool has_dom_bit = false; + + /* If we got a lock back and it has a LOOKUP bit set, + * make sure the dentry is marked as valid so we can find it. + * We don't need to care about actual hashing since other bits + * of kernel will deal with that later. 
+ */ + lock = ldlm_handle2lock(&handle); + if (lock) { + has_dom_bit = ldlm_has_dom(lock); + if (lock->l_policy_data.l_inodebits.bits & + MDS_INODELOCK_LOOKUP) + d_lustre_revalidate(de); + + LDLM_LOCK_PUT(lock); + } ll_set_lock_data(sbi->ll_md_exp, inode, itp, NULL); + if (has_dom_bit) + ll_dom_finish_open(inode, req, itp); } out: From patchwork Thu Feb 27 21:12:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410197 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B52CA92A for ; Thu, 27 Feb 2020 21:32:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9D76E24677 for ; Thu, 27 Feb 2020 21:32:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D76E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CDE6C349AAE; Thu, 27 Feb 2020 13:27:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C661A205750 for ; Thu, 27 Feb 2020 13:19:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E0F6F8A44; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 
DF23A46F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:52 -0500 Message-Id: <1582838290-17243-305-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 304/622] lustre: llite: hash just created files if lock allows X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin If open|creat (and, later, other intent operations) returned a lookup bit as part of the lock, hash the resultant dentry under this lock so that subsequent lookups do not trigger further RPCs. Benchmark results: This patch can significantly improve open-create + stat on the same client. This patch, in combination with two others: https://review.whamcloud.com/32157 https://review.whamcloud.com/33585 improves the 'stat' side of open-create + stat by >10x.
Without patches (master branch commit 26a7abe): mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k SUMMARY rate: (of 1 iterations) Operation Max Min Mean Std Dev --------- --- --- ---- ------- File creation : 3838.205 3838.204 3838.204 0.000 File stat : 33459.289 33459.249 33459.271 0.011 File read : 0.000 0.000 0.000 0.000 File removal : 0.000 0.000 0.000 0.000 Tree creation : 3146.841 3146.841 3146.841 0.000 Tree removal : 0.000 0.000 0.000 0.000 With the three patches: mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k SUMMARY rate: (of 1 iterations) Operation Max Min Mean Std Dev --------- --- --- ---- ------- File creation : 3822.440 3822.439 3822.440 0.000 File stat : 350620.140 350615.980 350617.193 1.051 File read : 0.000 0.000 0.000 0.000 File removal : 0.000 0.000 0.000 0.000 Tree creation : 2076.727 2076.727 2076.727 0.000 Tree removal : 0.000 0.000 0.000 0.000 Note 33K stats/second vs 350K stats/second. ls -l time of the mdtest directory is also reduced from 23.5 seconds to 5.8 seconds. WC-bug-id: https://jira.whamcloud.com/browse/LU-11623 Lustre-commit: fc42cbe0e2e5 ("LU-11623 llite: hash just created files if lock allows") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/33584 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/llite/namei.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index c3e8de4..3c796bd 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -678,9 +678,9 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, if (bits & MDS_INODELOCK_LOOKUP) d_lustre_revalidate(*de); } else if (!it_disposition(it, DISP_OPEN_CREATE)) { - /* If file created on server, don't depend on parent UPDATE - * lock to unhide it. 
It is left hidden and next lookup can - * find it in ll_splice_alias. + /* + * If file was created on the server, the dentry is revalidated + * in ll_create_it if the lock allows for it. */ /* Check that parent has UPDATE lock. */ struct lookup_intent parent_it = { @@ -1063,6 +1063,7 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry, struct lookup_intent *it, void *secctx, u32 secctxlen) { struct inode *inode; + u64 bits = 0; int rc = 0; CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir=" DFID "(%p), intent=%s\n", @@ -1088,6 +1089,10 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry, return rc; } + ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits); + if (bits & MDS_INODELOCK_LOOKUP) + d_lustre_revalidate(dentry); + d_instantiate(dentry, inode); if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX)) From patchwork Thu Feb 27 21:12:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410201 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39954138D for ; Thu, 27 Feb 2020 21:32:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2180424677 for ; Thu, 27 Feb 2020 21:32:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2180424677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 98137349AD8; Thu, 27 Feb 2020 13:27:37 -0800 (PST) 
X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2B36D21FC95 for ; Thu, 27 Feb 2020 13:19:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E30658A45; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E1D8B47C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:53 -0500 Message-Id: <1582838290-17243-306-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 305/622] lnet: adds checking msg len X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko LNet can't handle a message whose length is larger than LNET_MTU. The following error occurred for DOM 1MB: LNetError: 3137:0:(lib-move.c:4143:lnet_parse()) 192.168.8.1@tcp, src 192.168.8.1@tcp: bad PUT payload 1051832 (1048576 max expected) The patch adds a fragment size check.
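A minimal sketch of the length check described above, under stated assumptions: SKETCH_LNET_MTU is set to the 1048576 limit quoted in the error message rather than taken from the LNet headers, and sketch_md_validate_length() is a hypothetical helper, not the real validation function in lib-md.c.

```c
#include <errno.h>

/* Assumed limit, taken from "1048576 max expected" in the error above. */
#define SKETCH_LNET_MTU (1U << 20)

/*
 * Reject a memory descriptor whose length exceeds what a single LNet
 * message can carry, so the failure is reported when the descriptor is
 * set up instead of when the peer parses the oversized PUT.
 */
static int sketch_md_validate_length(unsigned int length)
{
	if (length > SKETCH_LNET_MTU)
		return -EINVAL;
	return 0;
}
```

With these numbers the 1051832-byte payload from the log is refused, while a payload of exactly 1048576 bytes still passes.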
WC-bug-id: https://jira.whamcloud.com/browse/LU-12140 Lustre-commit: 4d43a6c3b182 ("LU-12140 lnet: adds checking msg len") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-7174 Reviewed-on: https://review.whamcloud.com/34975 Reviewed-by: Alexey Lyashkov Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-md.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c index 7ea0f5e..4a70c76 100644 --- a/net/lnet/lnet/lib-md.c +++ b/net/lnet/lnet/lib-md.c @@ -325,6 +325,10 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset) CERROR("Invalid option: too many fragments %u, %d max\n", umd->length, LNET_MAX_IOV); return -EINVAL; + } else if (umd->length > LNET_MTU) { + CERROR("Invalid length: too big fragment size %u, %d max\n", + umd->length, LNET_MTU); + return -EINVAL; } return 0; From patchwork Thu Feb 27 21:12:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410171 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DEC7192A for ; Thu, 27 Feb 2020 21:31:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C783224677 for ; Thu, 27 Feb 2020 21:31:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C783224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 616A03490D5; 
Thu, 27 Feb 2020 13:27:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6BD6121FCC4 for ; Thu, 27 Feb 2020 13:19:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E5DB08A46; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E49CB468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:54 -0500 Message-Id: <1582838290-17243-307-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 306/622] lustre: dne: add new dir hash type "space" X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Add a new hash type "space". If this is set on the default LMV of a directory, its subdirs will be created across all MDTs with balanced space usage. * new hash type LMV_HASH_TYPE_SPACE.
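To illustrate how a new hash type slots into the enum and its name strings, here is a hedged sketch. The sketch_ and SKETCH_ names and the lookup helper are hypothetical; only the numeric values and the "all_char", "fnv_1a_64" and "space" strings come from the diff in this patch.

```c
#include <stddef.h>

/* Same values as the patched enum, under hypothetical names. */
enum sketch_hash_type {
	SKETCH_HASH_TYPE_UNKNOWN = 0,	/* reserved, has no name */
	SKETCH_HASH_TYPE_ALL_CHARS = 1,
	SKETCH_HASH_TYPE_FNV_1A_64 = 2,
	SKETCH_HASH_TYPE_SPACE = 3,
	SKETCH_HASH_TYPE_MAX,		/* always one past the last type */
};

/* Name table indexed by hash type; unnamed slots stay NULL. */
static const char *const sketch_hash_names[SKETCH_HASH_TYPE_MAX] = {
	[SKETCH_HASH_TYPE_ALL_CHARS] = "all_char",
	[SKETCH_HASH_TYPE_FNV_1A_64] = "fnv_1a_64",
	[SKETCH_HASH_TYPE_SPACE] = "space",
};

/* Map a hash type to its name, or "unknown" if it has none. */
static const char *sketch_hash_name(unsigned int type)
{
	if (type >= SKETCH_HASH_TYPE_MAX || sketch_hash_names[type] == NULL)
		return "unknown";
	return sketch_hash_names[type];
}
```

Making the MAX value the last enumerator, as the patch does, keeps the bound in step with the list automatically; the previous macro, LMV_HASH_TYPE_FNV_1A_64 + 1, had to be edited by hand whenever a type was added and was not parenthesized.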
WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: a24f61532927 ("LU-11213 dne: add new dir hash type "space"") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/34358 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index dc39265..22a0144 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -650,12 +650,16 @@ enum lmv_hash_type { LMV_HASH_TYPE_UNKNOWN = 0, /* 0 is reserved for testing purpose */ LMV_HASH_TYPE_ALL_CHARS = 1, LMV_HASH_TYPE_FNV_1A_64 = 2, + LMV_HASH_TYPE_SPACE = 3, /* + * distribute subdirs among all MDTs + * with balanced space usage. + */ + LMV_HASH_TYPE_MAX, }; -#define LMV_HASH_TYPE_MAX LMV_HASH_TYPE_FNV_1A_64 + 1 - #define LMV_HASH_NAME_ALL_CHARS "all_char" #define LMV_HASH_NAME_FNV_1A_64 "fnv_1a_64" +#define LMV_HASH_NAME_SPACE "space" struct lustre_foreign_type { uint32_t lft_type; @@ -685,7 +689,7 @@ struct lmv_user_md_v1 { __u32 lum_stripe_count; /* dirstripe count */ __u32 lum_stripe_offset; /* MDT idx for default dirstripe */ __u32 lum_hash_type; /* Dir stripe policy */ - __u32 lum_type; /* LMV type: default or normal */ + __u32 lum_type; /* LMV type: default */ __u32 lum_padding1; __u32 lum_padding2; __u32 lum_padding3; @@ -703,6 +707,15 @@ static inline __u32 lmv_foreign_to_md_stripes(__u32 size) sizeof(struct lmv_user_mds_data); } +/* + * NB, historically default layout didn't set type, but use XATTR name to differ + * from normal layout, for backward compatibility, define LMV_TYPE_DEFAULT 0x0, + * and still use the same method. 
+ */ +enum lmv_type { + LMV_TYPE_DEFAULT = 0x0000, +}; + static inline int lmv_user_md_size(int stripes, int lmm_magic) { int size = sizeof(struct lmv_user_md); From patchwork Thu Feb 27 21:12:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410107 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4D58F17E0 for ; Thu, 27 Feb 2020 21:30:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 35C77246A0 for ; Thu, 27 Feb 2020 21:30:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 35C77246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DA7E7349606; Thu, 27 Feb 2020 13:25:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AE4CC21FEE5 for ; Thu, 27 Feb 2020 13:19:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E8F948A47; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E756046A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:55 -0500 Message-Id: 
<1582838290-17243-308-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 307/622] lustre: uapi: Add nonrotational flag to statfs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell It is potentially useful for the MDS and userspace to know whether or not an OST is using non-rotational media. Add a flag to obd_statfs that reflects this. Users can override this parameter in sysfs. ZFS does not currently make this information available to Lustre, so default to rotational and allow users to override. WC-bug-id: https://jira.whamcloud.com/browse/LU-11963 Lustre-commit: 68635c3d9b31 ("LU-11963 osd: Add nonrotational flag to statfs") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/34235 Reviewed-by: Andreas Dilger Reviewed-by: Li Dongyang Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 14 ++++++++++++++ include/uapi/linux/lustre/lustre_user.h | 1 + 2 files changed, 15 insertions(+) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index b8b561c..64ccc6e 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1745,6 +1745,20 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct obd_statfs, os_spare9)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_spare9) == 4, "found %lld\n", (long long)(int)sizeof(((struct obd_statfs *)0)->os_spare9)); + LASSERTF(OS_STATE_DEGRADED == 0x1, "found %lld\n", + (long long)OS_STATE_DEGRADED); + LASSERTF(OS_STATE_READONLY == 0x2, "found %lld\n", + 
(long long)OS_STATE_READONLY); + LASSERTF(OS_STATE_NOPRECREATE == 0x4, "found %lld\n", + (long long)OS_STATE_NOPRECREATE); + LASSERTF(OS_STATE_ENOSPC == 0x20, "found %lld\n", + (long long)OS_STATE_ENOSPC); + LASSERTF(OS_STATE_ENOINO == 0x40, "found %lld\n", + (long long)OS_STATE_ENOINO); + LASSERTF(OS_STATE_SUM == 0x100, "found %lld\n", + (long long)OS_STATE_SUM); + LASSERTF(OS_STATE_NONROT == 0x200, "found %lld\n", + (long long)OS_STATE_NONROT); /* Checks for struct obd_ioobj */ LASSERTF((int)sizeof(struct obd_ioobj) == 24, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 22a0144..d66c883 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -105,6 +105,7 @@ enum obd_statfs_state { OS_STATE_ENOSPC = 0x00000020, /**< not enough free space */ OS_STATE_ENOINO = 0x00000040, /**< not enough inodes */ OS_STATE_SUM = 0x00000100, /**< aggregated for all tagrets */ + OS_STATE_NONROT = 0x00000200, /**< non-rotational device */ }; struct obd_statfs { From patchwork Thu Feb 27 21:12:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410111 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 23A6717E0 for ; Thu, 27 Feb 2020 21:30:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0C2F0246A0 for ; Thu, 27 Feb 2020 21:30:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0C2F0246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 546D2349632; Thu, 27 Feb 2020 13:26:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F052E21CA15 for ; Thu, 27 Feb 2020 13:19:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EBA358A48; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EA06746C; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:56 -0500 Message-Id: <1582838290-17243-309-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 308/622] lnet: libcfs: crashes with certain cpu part numbers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Perepechko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andrew Perepechko Due to a bug in the code, libcfs will crash if the number of online cpus is not evenly divisible by the number of cpu partitions. Based on the checks in cfs_cpt_table_create(), it appears that the original intent was to push the remaining cpus into the initial partitions. So let's do that properly.
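As a rough illustration of that intent, here is a standalone sketch. It is not the libcfs code: cpus_for_partition() and its parameters are hypothetical names, and it only shows how the remainder of an uneven split can be handed out one cpu at a time to the first partitions.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of libcfs: number of cpus that
 * partition 'cpt' receives when 'total' cpus are split across
 * 'ncpt' partitions and the leftover cpus go to the first ones.
 */
static int cpus_for_partition(int total, int ncpt, int cpt)
{
	int num = total / ncpt;	/* even share of every partition */
	int rem = total % ncpt;	/* cpus left over after the even split */

	/* the first 'rem' partitions each take one extra cpu */
	return num + (cpt < rem ? 1 : 0);
}
```

With 10 online cpus and 4 partitions this gives 3, 3, 2, 2, so every cpu lands in some partition.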
WC-bug-id: https://jira.whamcloud.com/browse/LU-12352 Lustre-commit: e33e3da58972 ("LU-12352 libcfs: crashes with certain cpu part numbers") Signed-off-by: Andrew Perepechko Reviewed-by: Alexander Boyko Reviewed-by: Alexander Zarochentsev Cray-bug-id: LUS-6455 Reviewed-on: https://review.whamcloud.com/34991 Reviewed-by: Gu Zheng Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/libcfs_cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/lnet/libcfs/libcfs_cpu.c b/net/lnet/libcfs/libcfs_cpu.c index 80533c2..20ca15a 100644 --- a/net/lnet/libcfs/libcfs_cpu.c +++ b/net/lnet/libcfs/libcfs_cpu.c @@ -913,7 +913,7 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt) int ncpu = cpumask_weight(part->cpt_cpumask); rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask, - num - ncpu); + (rem > 0) + num - ncpu); if (rc < 0) { rc = -EINVAL; goto failed_mask; From patchwork Thu Feb 27 21:12:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410205 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 793F592A for ; Thu, 27 Feb 2020 21:32:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6040E24677 for ; Thu, 27 Feb 2020 21:32:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6040E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D0822349AFE; Thu, 27 Feb 2020 13:27:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3E1EB21FEED for ; Thu, 27 Feb 2020 13:19:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EDBC18A49; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ECC8746D; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:57 -0500 Message-Id: <1582838290-17243-310-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 309/622] lustre: lov: fix wrong calculated length for fiemap X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong lov_stripe_intersects() returns a closed interval [@obd_start, @obd_end], so to calculate the length of the interval we need @obd_end - @obd_start + 1 rather than @obd_end - @obd_start. A wrong extent length makes us return wrong fiemap information.
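To make the off-by-one concrete, a minimal standalone sketch follows. closed_interval_len() is a hypothetical helper for illustration only, not a Lustre function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a Lustre function: length of the closed
 * interval [start, end], where both endpoints belong to the interval.
 * The caller must pass start <= end.
 */
static uint64_t closed_interval_len(uint64_t start, uint64_t end)
{
	return end - start + 1;
}
```

For an extent covering bytes 0 through 4095 this returns 4096, where end - start alone would report 4095. A single-byte interval such as [7, 7] has length 1, not 0.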
WC-bug-id: https://jira.whamcloud.com/browse/LU-12361 Lustre-commit: 225e7b8c70fb ("LU-12361 lov: fix wrong calculated length for fiemap") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/34998 Reviewed-by: Andreas Dilger Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_object.c | 4 ++-- fs/lustre/lov/lov_offset.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 7543ef2..27e0ca5 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -1677,7 +1677,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj, if (lun_start == lun_end) return 0; - req_fm_len = obd_object_end - lun_start; + req_fm_len = obd_object_end - lun_start + 1; fs->fs_fm->fm_length = 0; len_mapped_single_call = 0; @@ -1723,7 +1723,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj, fs->fs_fm->fm_mapped_extents = 1; fm_ext[0].fe_logical = lun_start; - fm_ext[0].fe_length = obd_object_end - lun_start; + fm_ext[0].fe_length = obd_object_end - lun_start + 1; fm_ext[0].fe_flags |= FIEMAP_EXTENT_UNKNOWN; goto inactive_tgt; diff --git a/fs/lustre/lov/lov_offset.c b/fs/lustre/lov/lov_offset.c index bb67d82..b53ce43 100644 --- a/fs/lustre/lov/lov_offset.c +++ b/fs/lustre/lov/lov_offset.c @@ -226,6 +226,8 @@ u64 lov_size_to_stripe(struct lov_stripe_md *lsm, int index, u64 file_size, /* given an extent in an lov and a stripe, calculate the extent of the stripe * that is contained within the lov extent. this returns true if the given * stripe does intersect with the lov extent. + * + * Closed interval [@obd_start, @obd_end] will be returned. 
*/ int lov_stripe_intersects(struct lov_stripe_md *lsm, int index, int stripeno, struct lu_extent *ext, u64 *obd_start, u64 *obd_end) From patchwork Thu Feb 27 21:12:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410115 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E0E7F17E0 for ; Thu, 27 Feb 2020 21:30:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C9EF7246A0 for ; Thu, 27 Feb 2020 21:30:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C9EF7246A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B44C0349661; Thu, 27 Feb 2020 13:26:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 85B2121C973 for ; Thu, 27 Feb 2020 13:19:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F087F8A4A; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EF86146F; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:58 -0500 Message-Id: 
<1582838290-17243-311-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 310/622] lustre: obdclass: remove unprotected access to lu_object X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin The check of lu_object_is_dying() is done after the reference drop and without a lock, so it can access a freed object if a concurrent thread did the final put. The patch saves the object state right before atomic_dec_and_lock() and checks the saved value after the decrement, so the object itself is not accessed. WC-bug-id: https://jira.whamcloud.com/browse/LU-11204 Lustre-commit: 336cf0f2f3a9 ("LU-11204 obdclass: remove unprotected access to lu_object") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/34960 Reviewed-by: Alex Zhuravlev Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lu_object.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index 2f709b0..bafd817 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -128,22 +128,18 @@ enum { void lu_object_put(const struct lu_env *env, struct lu_object *o) { struct lu_site_bkt_data *bkt; - struct lu_object_header *top; - struct lu_site *site; - struct lu_object *orig; + struct lu_object_header *top = o->lo_header; + struct lu_site *site = o->lo_dev->ld_site; + struct lu_object *orig = o; struct cfs_hash_bd bd; - const struct lu_fid *fid; - - top =
o->lo_header; - site = o->lo_dev->ld_site; - orig = o; + const struct lu_fid *fid = lu_object_fid(o); + bool is_dying; /* * till we have full fids-on-OST implemented anonymous objects * are possible in OSP. such an object isn't listed in the site * so we should not remove it from the site. */ - fid = lu_object_fid(o); if (fid_is_zero(fid)) { LASSERT(!top->loh_hash.next && !top->loh_hash.pprev); LASSERT(list_empty(&top->loh_lru)); @@ -160,8 +156,14 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) cfs_hash_bd_get(site->ls_obj_hash, &top->loh_fid, &bd); bkt = cfs_hash_bd_extra_get(site->ls_obj_hash, &bd); + is_dying = lu_object_is_dying(top); if (!cfs_hash_bd_dec_and_lock(site->ls_obj_hash, &bd, &top->loh_ref)) { - if (lu_object_is_dying(top)) { + /* at this point the object reference is dropped and lock is + * not taken, so lu_object should not be touched because it + * can be freed by concurrent thread. Use local variable for + * check. + */ + if (is_dying) { /* * somebody may be waiting for this, currently only * used for cl_object, see cl_object_put_last(). @@ -180,6 +182,10 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) o->lo_ops->loo_object_release(env, o); } + /* don't use local 'is_dying' here because if was taken without lock + * but here we need the latest actual value of it so check lu_object + * directly here. 
+ */ if (!lu_object_is_dying(top)) { LASSERT(list_empty(&top->loh_lru)); list_add_tail(&top->loh_lru, &bkt->lsb_lru); From patchwork Thu Feb 27 21:12:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410301 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 22DEB138D for ; Thu, 27 Feb 2020 21:34:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0B51E24677 for ; Thu, 27 Feb 2020 21:34:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0B51E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 22167348D6F; Thu, 27 Feb 2020 13:29:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DC00021C9EE for ; Thu, 27 Feb 2020 13:19:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F358F8A4B; Thu, 27 Feb 2020 16:18:16 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F2365468; Thu, 27 Feb 2020 16:18:16 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:12:59 -0500 Message-Id: 
<1582838290-17243-312-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 311/622] lustre: push rcu_barrier() before destroying slab X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong From rcubarrier.txt: " We could try placing a synchronize_rcu() in the module-exit code path, but this is not sufficient. Although synchronize_rcu() does wait for a grace period to elapse, it does not wait for the callbacks to complete. One might be tempted to try several back-to-back synchronize_rcu() calls, but this is still not guaranteed to work. If there is a very heavy RCU-callback load, then some of the callbacks might be deferred in order to allow other processing to proceed. Such deferral is required in realtime kernels in order to avoid excessive scheduling latencies. We instead need the rcu_barrier() primitive. This primitive is similar to synchronize_rcu(), but instead of waiting solely for a grace period to elapse, it also waits for all outstanding RCU callbacks to complete. Pseudo-code using rcu_barrier() is as follows: 1. Prevent any new RCU callbacks from being posted. 2. Execute rcu_barrier(). 3. Allow the module to be unloaded. " So using synchronize_rcu() in ldlm_exit() is not safe enough, and we might still hit a use-after-free problem. We also missed rcu_barrier() when destroying the inode cache; this is similar to what current local filesystems do.
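The difference can be modelled in plain userspace C. The sketch below is only a toy model under loud assumptions: every toy_* name is hypothetical, nothing here is the kernel RCU implementation, and the model simply assumes the worst case in which no queued callback has run by the time the grace period ends.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Toy stand-in for a slab cache: it only counts live objects. */
struct toy_cache {
	int live_objects;
};

/* Toy stand-in for the callbacks queued with call_rcu(). */
struct toy_rcu {
	struct toy_cache *pending[TOY_MAX_PENDING];
	size_t npending;
};

/* Model of call_rcu(): the free is queued, not performed right away. */
static int toy_call_rcu(struct toy_rcu *rcu, struct toy_cache *cache)
{
	if (rcu->npending == TOY_MAX_PENDING)
		return -1;
	rcu->pending[rcu->npending++] = cache;
	return 0;
}

/*
 * Model of synchronize_rcu(): a grace period elapses, but queued
 * callbacks are not guaranteed to have run. The toy assumes the
 * worst case and runs none of them.
 */
static void toy_synchronize_rcu(struct toy_rcu *rcu)
{
	(void)rcu;
}

/* Model of rcu_barrier(): returns only after every queued callback ran. */
static void toy_rcu_barrier(struct toy_rcu *rcu)
{
	while (rcu->npending > 0)
		rcu->pending[--rcu->npending]->live_objects--;
}

/* Destroying the cache is only safe once no object is still live. */
static int toy_cache_destroy_is_safe(const struct toy_cache *cache)
{
	return cache->live_objects == 0;
}
```

In this model, two objects queued with toy_call_rcu() are still live after toy_synchronize_rcu(), so destroying the cache would not be safe; after toy_rcu_barrier() both have been freed and the destroy can proceed.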
WC-bug-id: https://jira.whamcloud.com/browse/LU-12374 Lustre-commit: 1f7613968c80 ("LU-12374 lustre: push rcu_barrier() before destroying slab") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35030 Reviewed-by: Andreas Dilger Reviewed-by: Gu Zheng Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lockd.c | 6 +++--- fs/lustre/llite/super25.c | 5 +++++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 3b405be..79dab6e 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -1204,10 +1204,10 @@ void ldlm_exit(void) kmem_cache_destroy(ldlm_resource_slab); /* * ldlm_lock_put() use RCU to call ldlm_lock_free, so need call - * synchronize_rcu() to wait a grace period elapsed, so that - * ldlm_lock_free() get a chance to be called. + * rcu_barrier() to wait all outstanding RCU callbacks to complete, + * so that ldlm_lock_free() get a chance to be called. */ - synchronize_rcu(); + rcu_barrier(); kmem_cache_destroy(ldlm_lock_slab); kmem_cache_destroy(ldlm_interval_tree_slab); } diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c index 133fe2a..6cae48c 100644 --- a/fs/lustre/llite/super25.c +++ b/fs/lustre/llite/super25.c @@ -271,6 +271,11 @@ static void __exit lustre_exit(void) cl_env_put(cl_inode_fini_env, &cl_inode_fini_refcheck); vvp_global_fini(); + /* + * Make sure all delayed rcu free inodes are flushed before we + * destroy cache. 
+ */ + rcu_barrier(); kmem_cache_destroy(ll_inode_cachep); kmem_cache_destroy(ll_file_data_slab); } From patchwork Thu Feb 27 21:13:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410175 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4C30C92A for ; Thu, 27 Feb 2020 21:32:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3478424677 for ; Thu, 27 Feb 2020 21:32:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3478424677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 007373499D0; Thu, 27 Feb 2020 13:27:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28DB421C937 for ; Thu, 27 Feb 2020 13:19:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 03EF28A4C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 00F7F46A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:00 -0500 Message-Id: <1582838290-17243-313-git-send-email-jsimmons@infradead.org> X-Mailer: 
git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 312/622] lustre: ptlrpc: intent_getattr fetches default LMV X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Intent_getattr fetches default LMV, and caches it on client, which will be used in subdir creation. * Add RMF_DEFAULT_MDT_MD in intent_getattr reply. * Save default LMV in ll_inode_info->lli_default_lsm_md, and replace lli_def_stripe_offset with it. * take LOOKUP lock on default LMV setting to let client update cached default LMV. * improve mdt_object_striped() to read from bottom device to avoid reading stripe FIDs. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: 55ca00c3d1cd ("LU-11213 ptlrpc: intent_getattr fetches default LMV") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/34802 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_lmv.h | 13 +++++-- fs/lustre/include/lustre_req_layout.h | 1 + fs/lustre/include/obd.h | 14 +++++-- fs/lustre/llite/llite_internal.h | 20 ++-------- fs/lustre/llite/llite_lib.c | 72 ++++++++++++++++++++++++++++++++--- fs/lustre/llite/namei.c | 56 +++++++++++++++++++-------- fs/lustre/lmv/lmv_obd.c | 41 +++++++++++++++----- fs/lustre/mdc/mdc_locks.c | 10 +++-- fs/lustre/mdc/mdc_request.c | 41 ++++++++++++++++---- fs/lustre/ptlrpc/layout.c | 8 +++- 10 files changed, 210 insertions(+), 66 deletions(-) diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index cef315d..c88e4b5 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -72,10 +72,12 @@ struct lmv_stripe_md { strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name) != 0) return false; - for (idx = 0; idx < lsm1->lsm_md_stripe_count; idx++) { - if (!lu_fid_eq(&lsm1->lsm_md_oinfo[idx].lmo_fid, - &lsm2->lsm_md_oinfo[idx].lmo_fid)) - return false; + if (lsm1->lsm_md_magic == LMV_MAGIC_V1) { + for (idx = 0; idx < lsm1->lsm_md_stripe_count; idx++) { + if (!lu_fid_eq(&lsm1->lsm_md_oinfo[idx].lmo_fid, + &lsm2->lsm_md_oinfo[idx].lmo_fid)) + return false; + } } return true; @@ -92,6 +94,9 @@ static inline void lsm_md_dump(int mask, const struct lmv_stripe_md *lsm) lsm->lsm_md_layout_version, lsm->lsm_md_migrate_offset, lsm->lsm_md_migrate_hash, lsm->lsm_md_pool_name); + if (lsm->lsm_md_magic != LMV_MAGIC_V1) + return; + for (i = 0; i < lsm->lsm_md_stripe_count; i++) CDEBUG(mask, "stripe[%d] "DFID"\n", i, PFID(&lsm->lsm_md_oinfo[i].lmo_fid)); diff --git a/fs/lustre/include/lustre_req_layout.h 
b/fs/lustre/include/lustre_req_layout.h index 378f0b6..dca4ef4 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -249,6 +249,7 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_msg_field RMF_LDLM_INTENT; extern struct req_msg_field RMF_LAYOUT_INTENT; extern struct req_msg_field RMF_MDT_MD; +extern struct req_msg_field RMF_DEFAULT_MDT_MD; extern struct req_msg_field RMF_REC_REINT; extern struct req_msg_field RMF_EADATA; extern struct req_msg_field RMF_EAVALS; diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 996211a..fb77df7 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -729,6 +729,14 @@ enum md_cli_flags { CLI_MIGRATE = BIT(4), }; +enum md_op_code { + LUSTRE_OPC_MKDIR = 0, + LUSTRE_OPC_SYMLINK = 1, + LUSTRE_OPC_MKNOD = 2, + LUSTRE_OPC_CREATE = 3, + LUSTRE_OPC_ANY = 5, +}; + /** * GETXATTR is not included as only a couple of fields in the reply body * is filled, but not FID which is needed for common intent handling in @@ -746,6 +754,7 @@ struct md_op_data { struct lu_fid op_fid4; /* to the operation locks. 
*/ u32 op_mds; /* what mds server open will go to */ u32 op_mode; + enum md_op_code op_code; struct lustre_handle op_open_handle; s64 op_mod_time; const char *op_name; @@ -754,6 +763,7 @@ struct md_op_data { struct rw_semaphore *op_mea2_sem; struct lmv_stripe_md *op_mea1; struct lmv_stripe_md *op_mea2; + struct lmv_stripe_md *op_default_mea1; /* default LMV */ u32 op_suppgids[2]; u32 op_fsuid; u32 op_fsgid; @@ -791,9 +801,6 @@ struct md_op_data { void *op_file_secctx; u32 op_file_secctx_size; - /* default stripe offset */ - u32 op_default_stripe_offset; - u32 op_projid; u16 op_mirror_id; @@ -933,6 +940,7 @@ struct lustre_md { struct lmv_stripe_md *lmv; struct lmv_foreign_md *lfm; }; + struct lmv_stripe_md *default_lmv; #ifdef CONFIG_LUSTRE_FS_POSIX_ACL struct posix_acl *posix_acl; #endif diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index eb7e0dc..687d504 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -172,13 +172,8 @@ struct ll_inode_info { struct rw_semaphore lli_lsm_sem; /* directory stripe information */ struct lmv_stripe_md *lli_lsm_md; - /* default directory stripe offset. This is extracted - * from the "dmv" xattr in order to decide which MDT to - * create a subdirectory on. The MDS itself fetches - * "dmv" and gets the rest of the default layout itself - * (count, hash, etc). 
- */ - u32 lli_def_stripe_offset; + /* directory default LMV */ + struct lmv_stripe_md *lli_default_lsm_md; }; /* for non-directory */ @@ -921,19 +916,12 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, int ll_get_default_mdsize(struct ll_sb_info *sbi, int *default_mdsize); int ll_set_default_mdsize(struct ll_sb_info *sbi, int default_mdsize); -enum { - LUSTRE_OPC_MKDIR = 0, - LUSTRE_OPC_SYMLINK = 1, - LUSTRE_OPC_MKNOD = 2, - LUSTRE_OPC_CREATE = 3, - LUSTRE_OPC_ANY = 5, -}; - void ll_unlock_md_op_lsm(struct md_op_data *op_data); struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, struct inode *i1, struct inode *i2, const char *name, size_t namelen, - u32 mode, u32 opc, void *data); + u32 mode, enum md_op_code opc, + void *data); void ll_finish_md_op_data(struct md_op_data *op_data); int ll_get_obd_name(struct inode *inode, unsigned int cmd, unsigned long arg); void ll_compute_rootsquash_state(struct ll_sb_info *sbi); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index e6ac16f..bd17ba1 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -939,7 +939,6 @@ void ll_lli_init(struct ll_inode_info *lli) spin_lock_init(&lli->lli_sa_lock); lli->lli_opendir_pid = 0; lli->lli_sa_enabled = 0; - lli->lli_def_stripe_offset = -1; init_rwsem(&lli->lli_lsm_sem); } else { mutex_init(&lli->lli_size_mutex); @@ -1216,6 +1215,11 @@ void ll_dir_clear_lsm_md(struct inode *inode) lmv_free_memmd(lli->lli_lsm_md); lli->lli_lsm_md = NULL; } + + if (lli->lli_default_lsm_md) { + lmv_free_memmd(lli->lli_default_lsm_md); + lli->lli_default_lsm_md = NULL; + } } static struct inode *ll_iget_anon_dir(struct super_block *sb, @@ -1314,6 +1318,46 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) return 0; } +static void ll_update_default_lsm_md(struct inode *inode, struct lustre_md *md) +{ + struct ll_inode_info *lli = ll_i2info(inode); + + if (!md->default_lmv) { + /* clear default lsm */ + if 
(lli->lli_default_lsm_md) { + down_write(&lli->lli_lsm_sem); + if (lli->lli_default_lsm_md) { + lmv_free_memmd(lli->lli_default_lsm_md); + lli->lli_default_lsm_md = NULL; + } + up_write(&lli->lli_lsm_sem); + } + } else if (lli->lli_default_lsm_md) { + /* update default lsm if it changes */ + down_read(&lli->lli_lsm_sem); + if (lli->lli_default_lsm_md && + !lsm_md_eq(lli->lli_default_lsm_md, md->default_lmv)) { + up_read(&lli->lli_lsm_sem); + down_write(&lli->lli_lsm_sem); + if (lli->lli_default_lsm_md) + lmv_free_memmd(lli->lli_default_lsm_md); + lli->lli_default_lsm_md = md->default_lmv; + lsm_md_dump(D_INODE, md->default_lmv); + md->default_lmv = NULL; + up_write(&lli->lli_lsm_sem); + } else { + up_read(&lli->lli_lsm_sem); + } + } else { + /* init default lsm */ + down_write(&lli->lli_lsm_sem); + lli->lli_default_lsm_md = md->default_lmv; + lsm_md_dump(D_INODE, md->default_lmv); + md->default_lmv = NULL; + up_write(&lli->lli_lsm_sem); + } +} + static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) { struct ll_inode_info *lli = ll_i2info(inode); @@ -1324,6 +1368,10 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) CDEBUG(D_INODE, "update lsm %p of " DFID "\n", lli->lli_lsm_md, PFID(ll_inode2fid(inode))); + /* update default LMV */ + if (md->default_lmv) + ll_update_default_lsm_md(inode, md); + /* * no striped information from request, lustre_md from req does not * include stripeEA, see ll_md_setattr() @@ -2322,6 +2370,7 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, { struct ll_sb_info *sbi = NULL; struct lustre_md md = { NULL }; + bool default_lmv_deleted = false; int rc; LASSERT(*inode || sb); @@ -2331,6 +2380,15 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, if (rc) goto out; + /* + * clear default_lmv only if intent_getattr reply doesn't contain it. + * but it needs to be done after iget, check this early because + * ll_update_lsm_md() may change md. 
+ */ + if (it && (it->it_op & (IT_LOOKUP | IT_GETATTR)) && + S_ISDIR(md.body->mbo_mode) && !md.default_lmv) + default_lmv_deleted = true; + if (*inode) { rc = ll_update_inode(*inode, &md); if (rc) @@ -2396,9 +2454,12 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, LDLM_LOCK_PUT(lock); } + if (default_lmv_deleted) + ll_update_default_lsm_md(*inode, &md); out: /* cleanup will be done if necessary */ md_free_lustre_md(sbi->ll_md_exp, &md); + if (rc != 0 && it && it->it_op & IT_OPEN) ll_open_cleanup(sb ? sb : (*inode)->i_sb, req); @@ -2481,7 +2542,8 @@ void ll_unlock_md_op_lsm(struct md_op_data *op_data) struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, struct inode *i1, struct inode *i2, const char *name, size_t namelen, - u32 mode, u32 opc, void *data) + u32 mode, enum md_op_code opc, + void *data) { if (!name) { /* Do not reuse namelen for something else. */ @@ -2503,15 +2565,13 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data, ll_i2gids(op_data->op_suppgids, i1, i2); op_data->op_fid1 = *ll_inode2fid(i1); - op_data->op_default_stripe_offset = -1; + op_data->op_code = opc; if (S_ISDIR(i1->i_mode)) { down_read(&ll_i2info(i1)->lli_lsm_sem); op_data->op_mea1_sem = &ll_i2info(i1)->lli_lsm_sem; op_data->op_mea1 = ll_i2info(i1)->lli_lsm_md; - if (opc == LUSTRE_OPC_MKDIR) - op_data->op_default_stripe_offset = - ll_i2info(i1)->lli_def_stripe_offset; + op_data->op_default_mea1 = ll_i2info(i1)->lli_default_lsm_md; } if (i2) { diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 3c796bd..1aaf184 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -246,8 +246,6 @@ void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) } if (bits & MDS_INODELOCK_XATTR) { - if (S_ISDIR(inode->i_mode)) - ll_i2info(inode)->lli_def_stripe_offset = -1; ll_xattr_cache_destroy(inode); bits &= ~MDS_INODELOCK_XATTR; } @@ -1155,14 +1153,10 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry, 
from_kuid(&init_user_ns, current_fsuid()), from_kgid(&init_user_ns, current_fsgid()), current_cap(), rdev, &request); - if (err < 0 && err != -EREMOTE) - goto err_exit; - +#if OBD_OCD_VERSION(2, 14, 58, 0) > LUSTRE_VERSION_CODE /* - * If the client doesn't know where to create a subdirectory (or - * in case of a race that sends the RPC to the wrong MDS), the - * MDS will return -EREMOTE and the client will fetch the layout - * of the directory, then create the directory on the right MDT. + * server < 2.12.58 doesn't pack default LMV in intent_getattr reply, + * fetch default LMV here. */ if (unlikely(err == -EREMOTE)) { struct ll_inode_info *lli = ll_i2info(dir); @@ -1174,26 +1168,58 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry, err2 = ll_dir_getstripe(dir, (void **)&lum, &lumsize, &request, OBD_MD_DEFAULT_MEA); + ll_finish_md_op_data(op_data); + op_data = NULL; if (!err2) { - /* Update stripe_offset and retry */ - lli->lli_def_stripe_offset = lum->lum_stripe_offset; - } else if (err2 == -ENODATA && - lli->lli_def_stripe_offset != -1) { + struct lustre_md md = { NULL }; + + md.body = req_capsule_server_get(&request->rq_pill, + &RMF_MDT_BODY); + if (!md.body) { + err = -EPROTO; + goto err_exit; + } + + md.default_lmv = kzalloc(sizeof(*md.default_lmv), + GFP_NOFS); + if (!md.default_lmv) { + err = -ENOMEM; + goto err_exit; + } + + md.default_lmv->lsm_md_magic = lum->lum_magic; + md.default_lmv->lsm_md_stripe_count = + lum->lum_stripe_count; + md.default_lmv->lsm_md_master_mdt_index = + lum->lum_stripe_offset; + md.default_lmv->lsm_md_hash_type = lum->lum_hash_type; + + err = ll_update_inode(dir, &md); + md_free_lustre_md(sbi->ll_md_exp, &md); + if (err) + goto err_exit; + } else if (err2 == -ENODATA && lli->lli_default_lsm_md) { /* * If there are no default stripe EA on the MDT, but the * client has default stripe, then it probably means * default stripe EA has just been deleted. 
*/ - lli->lli_def_stripe_offset = -1; + down_write(&lli->lli_lsm_sem); + kfree(lli->lli_default_lsm_md); + lli->lli_default_lsm_md = NULL; + up_write(&lli->lli_lsm_sem); } else { goto err_exit; } ptlrpc_req_finished(request); request = NULL; - ll_finish_md_op_data(op_data); goto again; } +#endif + + if (err < 0) + goto err_exit; ll_update_times(request, dir); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 4b5bd36..48cd41a 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1176,13 +1176,12 @@ static int lmv_placement_policy(struct obd_device *obd, le32_to_cpu(lum->lum_magic != LMV_MAGIC_FOREIGN) && le32_to_cpu(lum->lum_stripe_offset) != (u32)-1) { *mds = le32_to_cpu(lum->lum_stripe_offset); - } else if (op_data->op_default_stripe_offset != (u32)-1) { - *mds = op_data->op_default_stripe_offset; + } else if (op_data->op_code == LUSTRE_OPC_MKDIR && + op_data->op_default_mea1 && + op_data->op_default_mea1->lsm_md_master_mdt_index != + (u32)-1) { + *mds = op_data->op_default_mea1->lsm_md_master_mdt_index; op_data->op_mds = *mds; - /* Correct the stripe offset in lum */ - if (lum && - le32_to_cpu(lum->lum_magic != LMV_MAGIC_FOREIGN)) - lum->lum_stripe_offset = cpu_to_le32(*mds); } else { *mds = op_data->op_mds; } @@ -2981,6 +2980,18 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm, return rc; } +static inline int lmv_unpack_user_md(struct obd_export *exp, + struct lmv_stripe_md *lsm, + const struct lmv_user_md *lmu) +{ + lsm->lsm_md_magic = le32_to_cpu(lmu->lum_magic); + lsm->lsm_md_stripe_count = le32_to_cpu(lmu->lum_stripe_count); + lsm->lsm_md_master_mdt_index = le32_to_cpu(lmu->lum_stripe_offset); + lsm->lsm_md_hash_type = le32_to_cpu(lmu->lum_hash_type); + + return 0; +} + static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, const union lmv_mds_md *lmm, size_t lmm_size) { @@ -3005,9 +3016,14 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, 
return 0; } - for (i = 0; i < lsm->lsm_md_stripe_count; i++) { - if (lsm->lsm_md_oinfo[i].lmo_root) - iput(lsm->lsm_md_oinfo[i].lmo_root); + if (lsm->lsm_md_magic == LMV_MAGIC) { + for (i = 0; i < lsm->lsm_md_stripe_count; i++) { + if (lsm->lsm_md_oinfo[i].lmo_root) + iput(lsm->lsm_md_oinfo[i].lmo_root); + } + lsm_size = lmv_stripe_md_size(lsm->lsm_md_stripe_count); + } else { + lsm_size = lmv_stripe_md_size(0); } kvfree(lsm); *lsmp = NULL; @@ -3066,6 +3082,9 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, case LMV_MAGIC_V1: rc = lmv_unpack_md_v1(exp, lsm, &lmm->lmv_md_v1); break; + case LMV_USER_MAGIC: + rc = lmv_unpack_user_md(exp, lsm, &lmm->lmv_user_md); + break; default: CERROR("%s: unrecognized magic %x\n", exp->exp_obd->obd_name, le32_to_cpu(lmm->lmv_magic)); @@ -3190,6 +3209,10 @@ static int lmv_free_lustre_md(struct obd_export *exp, struct lustre_md *md) struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt = lmv->tgts[0]; + if (md->default_lmv) { + lmv_free_memmd(md->default_lmv); + md->default_lmv = NULL; + } if (md->lmv) { lmv_free_memmd(md->lmv); md->lmv = NULL; diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index f6273ef..cf6bc9d 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -504,13 +504,13 @@ static int mdc_save_lovea(struct ptlrpc_request *req, { struct ptlrpc_request *req; struct obd_device *obddev = class_exp2obd(exp); - u64 valid = OBD_MD_FLGETATTR | OBD_MD_FLEASIZE | - OBD_MD_FLMODEASIZE | OBD_MD_FLDIREA | - OBD_MD_MEA | OBD_MD_FLACL; + u64 valid = OBD_MD_FLGETATTR | OBD_MD_FLEASIZE | OBD_MD_FLMODEASIZE | + OBD_MD_FLDIREA | OBD_MD_MEA | OBD_MD_FLACL | + OBD_MD_DEFAULT_MEA; struct ldlm_intent *lit; - int rc; u32 easize; bool have_secctx = false; + int rc; req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_LDLM_INTENT_GETATTR); @@ -549,6 +549,8 @@ static int mdc_save_lovea(struct ptlrpc_request *req, req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, 
RCL_SERVER, easize); req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize); + req_capsule_set_size(&req->rq_pill, &RMF_DEFAULT_MDT_MD, RCL_SERVER, + sizeof(struct lmv_user_md)); if (have_secctx) { char *secctx_name; diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 57da3c3..c834891 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -594,13 +594,13 @@ static int mdc_get_lustre_md(struct obd_export *exp, goto out; } - lmv_size = md->body->mbo_eadatasize; - if (!lmv_size) { - CDEBUG(D_INFO, - "OBD_MD_FLDIREA is set, but eadatasize 0\n"); - return -EPROTO; - } if (md->body->mbo_valid & OBD_MD_MEA) { + lmv_size = md->body->mbo_eadatasize; + if (!lmv_size) { + CDEBUG(D_INFO, + "OBD_MD_FLDIREA is set, but eadatasize 0\n"); + return -EPROTO; + } lmv = req_capsule_server_sized_get(pill, &RMF_MDT_MD, lmv_size); if (!lmv) { @@ -612,7 +612,7 @@ static int mdc_get_lustre_md(struct obd_export *exp, if (rc < 0) goto out; - if (rc < (typeof(rc))sizeof(*md->lmv)) { + if (rc < (int)sizeof(*md->lmv)) { struct lmv_foreign_md *lfm = md->lfm; /* short (< sizeof(struct lmv_stripe_md)) @@ -620,13 +620,38 @@ static int mdc_get_lustre_md(struct obd_export *exp, */ if (lfm->lfm_magic != LMV_MAGIC_FOREIGN) { CDEBUG(D_INFO, - "size too small: rc < sizeof(*md->lmv) (%d < %d)\n", + "lmv size too small: %d < %d\n", rc, (int)sizeof(*md->lmv)); rc = -EPROTO; goto out; } } } + + /* since 2.12.58 intent_getattr fetches default LMV */ + if (md->body->mbo_valid & OBD_MD_DEFAULT_MEA) { + lmv_size = sizeof(struct lmv_user_md); + lmv = req_capsule_server_sized_get(pill, + &RMF_DEFAULT_MDT_MD, + lmv_size); + if (!lmv) { + rc = -EPROTO; + goto out; + } + + rc = md_unpackmd(md_exp, &md->default_lmv, lmv, + lmv_size); + if (rc < 0) + goto out; + + if (rc < (int)sizeof(*md->default_lmv)) { + CDEBUG(D_INFO, + "default lmv size too small: %d < %d\n", + rc, (int)sizeof(*md->lmv)); + rc = -EPROTO; + goto out; + } + } } rc = 0; diff --git 
a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 9a676ae..c10b593 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -446,7 +446,8 @@ &RMF_MDT_MD, &RMF_ACL, &RMF_CAPA1, - &RMF_FILE_SECCTX + &RMF_FILE_SECCTX, + &RMF_DEFAULT_MDT_MD }; static const struct req_msg_field *ldlm_intent_create_client[] = { @@ -1016,6 +1017,11 @@ struct req_msg_field RMF_MDT_MD = DEFINE_MSGF("mdt_md", RMF_F_NO_SIZE_CHECK, MIN_MD_SIZE, NULL, NULL); EXPORT_SYMBOL(RMF_MDT_MD); +struct req_msg_field RMF_DEFAULT_MDT_MD = + DEFINE_MSGF("default_mdt_md", RMF_F_NO_SIZE_CHECK, MIN_MD_SIZE, NULL, + NULL); +EXPORT_SYMBOL(RMF_DEFAULT_MDT_MD); + struct req_msg_field RMF_REC_REINT = DEFINE_MSGF("rec_reint", 0, sizeof(struct mdt_rec_reint), lustre_swab_mdt_rec_reint, NULL); From patchwork Thu Feb 27 21:13:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410179 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 75142138D for ; Thu, 27 Feb 2020 21:32:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5E11224677 for ; Thu, 27 Feb 2020 21:32:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5E11224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5940E3499F5; Thu, 27 Feb 2020 13:27:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8004421C937 for ; Thu, 27 Feb 2020 13:19:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0684E8A4E; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 03DAA46C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:01 -0500 Message-Id: <1582838290-17243-314-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 313/622] lustre: mdc: add async statfs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Add an obd_statfs_async() interface for MDC; the statfs request is sent by ptlrpcd. The statfs result is kept separately for each MDT, unlike the current cached statfs, which aggregates the statfs of all MDTs. The max age of the statfs result is set by lmv_desc.ld_qos_maxage. It will deactivate MDC on failure, and activate MDC on success.
WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: 7f412954ad38 ("LU-11213 mdc: add async statfs") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/34359 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Alex Zhuravlev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 4 ++++ fs/lustre/include/obd_class.h | 18 +++------------- fs/lustre/lmv/lmv_internal.h | 2 ++ fs/lustre/lmv/lmv_obd.c | 44 +++++++++++++++++++++++++++++++++++++++ fs/lustre/mdc/mdc_request.c | 48 +++++++++++++++++++++++++++++++++++++++++++ fs/lustre/osc/osc_request.c | 16 +++++++++++++++ 6 files changed, 117 insertions(+), 15 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index fb77df7..e815584 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -86,6 +86,8 @@ static inline void loi_kms_set(struct lov_oinfo *oinfo, u64 kms) struct obd_info { /* OBD_STATFS_* flags */ u64 oi_flags; + struct obd_device *oi_obd; + struct lmv_tgt_desc *oi_tgt; /* lsm data specific for every OSC. */ struct lov_stripe_md *oi_md; /* statfs data specific for every OSC, if needed at all. 
*/ @@ -435,6 +437,8 @@ struct lmv_tgt_desc { struct obd_export *ltd_exp; u32 ltd_idx; struct mutex ltd_fid_mutex; + struct obd_statfs ltd_statfs; + time64_t ltd_statfs_age; unsigned long ltd_active:1; /* target up for requests */ }; diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index a890d00..58c743c 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -912,21 +912,9 @@ static inline int obd_statfs_async(struct obd_export *exp, CDEBUG(D_SUPER, "%s: age %lld, max_age %lld\n", obd->obd_name, obd->obd_osfs_age, max_age); - if (obd->obd_osfs_age < max_age) { - rc = OBP(obd, statfs_async)(exp, oinfo, max_age, rqset); - } else { - CDEBUG(D_SUPER, - "%s: use %p cache blocks %llu/%llu objects %llu/%llu\n", - obd->obd_name, &obd->obd_osfs, - obd->obd_osfs.os_bavail, obd->obd_osfs.os_blocks, - obd->obd_osfs.os_ffree, obd->obd_osfs.os_files); - spin_lock(&obd->obd_osfs_lock); - memcpy(oinfo->oi_osfs, &obd->obd_osfs, sizeof(*oinfo->oi_osfs)); - spin_unlock(&obd->obd_osfs_lock); - oinfo->oi_flags |= OBD_STATFS_FROM_CACHE; - if (oinfo->oi_cb_up) - oinfo->oi_cb_up(oinfo, 0); - } + + rc = OBP(obd, statfs_async)(exp, oinfo, max_age, rqset); + return rc; } diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index e434919..b4c5297 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -61,6 +61,8 @@ int lmv_revalidate_slaves(struct obd_export *exp, int lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data, struct ptlrpc_request **preq); +int lmv_statfs_check_update(struct obd_device *obd, struct lmv_tgt_desc *tgt); + static inline struct obd_device *lmv2obd_dev(struct lmv_obd *lmv) { return container_of_safe(lmv, struct obd_device, u.lmv); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 48cd41a..4365533 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -349,6 +349,8 @@ static int lmv_connect_mdc(struct obd_device *obd, struct 
lmv_tgt_desc *tgt) mdc_obd->obd_name, mdc_obd->obd_uuid.uuid, atomic_read(&obd->obd_refcount)); + lmv_statfs_check_update(obd, tgt); + if (lmv->lmv_tgts_kobj) /* Even if we failed to create the link, that's fine */ rc = sysfs_create_link(lmv->lmv_tgts_kobj, @@ -1276,6 +1278,7 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) obd_str2uuid(&lmv->desc.ld_uuid, desc->ld_uuid.uuid); lmv->desc.ld_tgt_count = 0; lmv->desc.ld_active_tgt_count = 0; + lmv->desc.ld_qos_maxage = 60; lmv->max_def_easize = 0; lmv->max_easize = 0; @@ -1445,6 +1448,47 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, return rc; } +static int lmv_statfs_update(void *cookie, int rc) +{ + struct obd_info *oinfo = cookie; + struct obd_device *obd = oinfo->oi_obd; + struct lmv_obd *lmv = &obd->u.lmv; + struct lmv_tgt_desc *tgt = oinfo->oi_tgt; + struct obd_statfs *osfs = oinfo->oi_osfs; + + /* + * NB: don't deactivate TGT upon error, because we may not trigger async + * statfs any longer, then there is no chance to activate TGT. 
+ */ + if (!rc) { + spin_lock(&lmv->lmv_lock); + tgt->ltd_statfs = *osfs; + tgt->ltd_statfs_age = ktime_get_seconds(); + spin_unlock(&lmv->lmv_lock); + } + + return rc; +} + +/* update tgt statfs async if it's ld_qos_maxage old */ +int lmv_statfs_check_update(struct obd_device *obd, struct lmv_tgt_desc *tgt) +{ + struct obd_info oinfo = { + .oi_obd = obd, + .oi_tgt = tgt, + .oi_cb_up = lmv_statfs_update, + }; + int rc; + + if (ktime_get_seconds() - tgt->ltd_statfs_age < + obd->u.lmv.desc.ld_qos_maxage) + return 0; + + rc = obd_statfs_async(tgt->ltd_exp, &oinfo, 0, NULL); + + return rc; +} + static int lmv_get_root(struct obd_export *exp, const char *fileset, struct lu_fid *fid) { diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index c834891..a26efa1 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1570,6 +1570,53 @@ static int mdc_read_page(struct obd_export *exp, struct md_op_data *op_data, goto out_unlock; } +static int mdc_statfs_interpret(const struct lu_env *env, + struct ptlrpc_request *req, void *args, int rc) +{ + struct obd_info *oinfo = args; + struct obd_statfs *osfs; + + if (!rc) { + osfs = req_capsule_server_get(&req->rq_pill, &RMF_OBD_STATFS); + if (!osfs) + return -EPROTO; + + oinfo->oi_osfs = osfs; + + CDEBUG(D_CACHE, + "blocks=%llu free=%llu avail=%llu objects=%llu free=%llu state=%x\n", + osfs->os_blocks, osfs->os_bfree, osfs->os_bavail, + osfs->os_files, osfs->os_ffree, osfs->os_state); + } + + oinfo->oi_cb_up(oinfo, rc); + + return rc; +} + +static int mdc_statfs_async(struct obd_export *exp, + struct obd_info *oinfo, time64_t max_age, + struct ptlrpc_request_set *unused) +{ + struct ptlrpc_request *req; + struct obd_info *aa; + + req = ptlrpc_request_alloc_pack(class_exp2cliimp(exp), &RQF_MDS_STATFS, + LUSTRE_MDS_VERSION, MDS_STATFS); + if (!req) + return -ENOMEM; + + ptlrpc_request_set_replen(req); + req->rq_interpret_reply = mdc_statfs_interpret; + + aa = ptlrpc_req_async_args(aa, req); + 
*aa = *oinfo; + + ptlrpcd_add_req(req); + + return 0; +} + static int mdc_statfs(const struct lu_env *env, struct obd_export *exp, struct obd_statfs *osfs, time64_t max_age, u32 flags) @@ -2802,6 +2849,7 @@ static int mdc_cleanup(struct obd_device *obd) .iocontrol = mdc_iocontrol, .set_info_async = mdc_set_info_async, .statfs = mdc_statfs, + .statfs_async = mdc_statfs_async, .fid_init = client_fid_init, .fid_fini = client_fid_fini, .fid_alloc = mdc_fid_alloc, diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index a988cbf..f929908 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -2736,6 +2736,22 @@ static int osc_statfs_async(struct obd_export *exp, struct osc_async_args *aa; int rc; + if (obd->obd_osfs_age >= max_age) { + CDEBUG(D_SUPER, + "%s: use %p cache blocks %llu/%llu objects %llu/%llu\n", + obd->obd_name, &obd->obd_osfs, + obd->obd_osfs.os_bavail, obd->obd_osfs.os_blocks, + obd->obd_osfs.os_ffree, obd->obd_osfs.os_files); + spin_lock(&obd->obd_osfs_lock); + memcpy(oinfo->oi_osfs, &obd->obd_osfs, sizeof(*oinfo->oi_osfs)); + spin_unlock(&obd->obd_osfs_lock); + oinfo->oi_flags |= OBD_STATFS_FROM_CACHE; + if (oinfo->oi_cb_up) + oinfo->oi_cb_up(oinfo, 0); + + return 0; + } + /* We could possibly pass max_age in the request (as an absolute * timestamp or a "seconds.usec ago") so the target can avoid doing * extra calls into the filesystem if that isn't necessary (e.g. 
From patchwork Thu Feb 27 21:13:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410183 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8D3A992A for ; Thu, 27 Feb 2020 21:32:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 759B424677 for ; Thu, 27 Feb 2020 21:32:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 759B424677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A74DD349A1D; Thu, 27 Feb 2020 13:27:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D93DA21FEF8 for ; Thu, 27 Feb 2020 13:19:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 07FC08A4F; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 06D2046D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:02 -0500 Message-Id: <1582838290-17243-315-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 314/622] lustre: lmv: mkdir with balanced space usage X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao If the default LMV hash type of a plain directory is "space", create subdirs on all MDTs with balanced space usage: * client mkdir allocates the FID on an MDT chosen for balanced space usage (the space QoS code is in the next patch). * the MDT allows mkdir on a different MDT from its parent if the parent has the "space" hash type in its default LMV; this is normally rejected because mkdir shouldn't create a remote directory. WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: 6d296587441d ("LU-11213 lmv: mkdir with balanced space usage") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/34360 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_lmv.h | 51 +++++-- fs/lustre/llite/dir.c | 5 +- fs/lustre/llite/file.c | 10 +- fs/lustre/llite/llite_internal.h | 7 + fs/lustre/llite/llite_lib.c | 25 ++-- fs/lustre/llite/namei.c | 8 +- fs/lustre/lmv/lmv_intent.c | 21 ++- fs/lustre/lmv/lmv_internal.h | 30 +--- fs/lustre/lmv/lmv_obd.c | 299 +++++++++++++++++++-------------------- 9 files changed, 229 insertions(+), 227 deletions(-) diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index c88e4b5..bb1efb4 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -55,6 +55,47 @@ struct lmv_stripe_md { struct lmv_oinfo lsm_md_oinfo[0]; }; +/* NB: LMV_HASH_TYPE_SPACE is set in default LMV only */ +static inline bool lmv_is_known_hash_type(u32 type) +{ +
return (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_FNV_1A_64 || + (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_ALL_CHARS; +} + +static inline bool lmv_dir_striped(const struct lmv_stripe_md *lsm) +{ + return lsm && lsm->lsm_md_magic == LMV_MAGIC; +} + +static inline bool lmv_dir_foreign(const struct lmv_stripe_md *lsm) +{ + return lsm && lsm->lsm_md_magic == LMV_MAGIC_FOREIGN; +} + +static inline bool lmv_dir_migrating(const struct lmv_stripe_md *lsm) +{ + return lmv_dir_striped(lsm) && + lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION; +} + +static inline bool lmv_dir_bad_hash(const struct lmv_stripe_md *lsm) +{ + if (!lmv_dir_striped(lsm)) + return false; + + if (lmv_dir_migrating(lsm) && + lsm->lsm_md_stripe_count - lsm->lsm_md_migrate_offset <= 1) + return false; + + return !lmv_is_known_hash_type(lsm->lsm_md_hash_type); +} + +/* NB, this is checking directory default LMV */ +static inline bool lmv_dir_space_hashed(const struct lmv_stripe_md *lsm) +{ + return lsm && lsm->lsm_md_hash_type == LMV_HASH_TYPE_SPACE; +} + static inline bool lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2) { @@ -72,7 +113,7 @@ struct lmv_stripe_md { strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name) != 0) return false; - if (lsm1->lsm_md_magic == LMV_MAGIC_V1) { + if (lmv_dir_striped(lsm1)) { for (idx = 0; idx < lsm1->lsm_md_stripe_count; idx++) { if (!lu_fid_eq(&lsm1->lsm_md_oinfo[idx].lmo_fid, &lsm2->lsm_md_oinfo[idx].lmo_fid)) @@ -94,7 +135,7 @@ static inline void lsm_md_dump(int mask, const struct lmv_stripe_md *lsm) lsm->lsm_md_layout_version, lsm->lsm_md_migrate_offset, lsm->lsm_md_migrate_hash, lsm->lsm_md_pool_name); - if (lsm->lsm_md_magic != LMV_MAGIC_V1) + if (!lmv_dir_striped(lsm)) return; for (i = 0; i < lsm->lsm_md_stripe_count; i++) @@ -188,12 +229,6 @@ static inline int lmv_name_to_stripe_index(u32 lmv_hash_type, return idx; } -static inline bool lmv_is_known_hash_type(u32 type) -{ - return (type & LMV_HASH_TYPE_MASK) == 
LMV_HASH_TYPE_FNV_1A_64 || - (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_ALL_CHARS; -} - static inline bool lmv_magic_supported(u32 lum_magic) { return lum_magic == LMV_USER_MAGIC || diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index f75183b..a1dce52 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -160,8 +160,7 @@ void ll_release_page(struct inode *inode, struct page *page, bool remove) * Always remove the page for striped dir, because the page is * built from temporarily in LMV layer */ - if (inode && S_ISDIR(inode->i_mode) && - ll_i2info(inode)->lli_lsm_md) { + if (inode && ll_dir_striped(inode)) { __free_page(page); return; } @@ -314,7 +313,7 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) goto out; } - if (unlikely(ll_i2info(inode)->lli_lsm_md)) { + if (unlikely(ll_dir_striped(inode))) { /* * This is only needed for striped dir to fill .., * see lmv_read_page diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 191b0f9..50220eb 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3987,7 +3987,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, if (!(exp_connect_flags2(ll_i2sbi(parent)->ll_md_exp) & OBD_CONNECT2_DIR_MIGRATE)) { if (le32_to_cpu(lum->lum_stripe_count) > 1 || - ll_i2info(child_inode)->lli_lsm_md) { + ll_dir_striped(child_inode)) { CERROR("%s: MDT doesn't support stripe directory migration!\n", ll_i2sbi(parent)->ll_fsname); rc = -EOPNOTSUPP; @@ -4179,7 +4179,7 @@ static int ll_inode_revalidate_fini(struct inode *inode, int rc) * Let's revalidate the dentry again, instead of returning * error */ - if (S_ISDIR(inode->i_mode) && ll_i2info(inode)->lli_lsm_md) + if (ll_dir_striped(inode)) return 0; /* This path cannot be hit for regular files unless in @@ -4256,8 +4256,7 @@ static int ll_merge_md_attr(struct inode *inode) LASSERT(lli->lli_lsm_md); - /* foreign dir is not striped dir */ - if (lli->lli_lsm_md->lsm_md_magic == 
LMV_MAGIC_FOREIGN) + if (!lmv_dir_striped(lli->lli_lsm_md)) return 0; down_read(&lli->lli_lsm_sem); @@ -4307,8 +4306,7 @@ int ll_getattr(const struct path *path, struct kstat *stat, } } else { /* If object isn't regular a file then don't validate size. */ - if (S_ISDIR(inode->i_mode) && - lli->lli_lsm_md != NULL) { + if (ll_dir_striped(inode)) { rc = ll_merge_md_attr(inode); if (rc < 0) return rc; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 687d504..9e413c2 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -1071,6 +1071,13 @@ static inline struct lu_fid *ll_inode2fid(struct inode *inode) return fid; } +static inline bool ll_dir_striped(struct inode *inode) +{ + LASSERT(inode); + return S_ISDIR(inode->i_mode) && + lmv_dir_striped(ll_i2info(inode)->lli_lsm_md); +} + static inline loff_t ll_file_maxbytes(struct inode *inode) { struct cl_object *obj = ll_i2info(inode)->lli_clob; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index bd17ba1..0633cc5 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1282,6 +1282,9 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid)); lsm_md_dump(D_INODE, lsm); + if (!lmv_dir_striped(lsm)) + goto out; + /* * XXX sigh, this lsm_root initialization should be in * LMV layer, but it needs ll_iget right now, so we @@ -1312,7 +1315,7 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md) return rc; } } - +out: lli->lli_lsm_md = lsm; return 0; @@ -1394,10 +1397,9 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) * * foreign LMV should not change. 
*/ - if (lli->lli_lsm_md && - lli->lli_lsm_md->lsm_md_magic != LMV_MAGIC_FOREIGN && - !lsm_md_eq(lli->lli_lsm_md, lsm)) { - if (lsm->lsm_md_layout_version <= + if (lli->lli_lsm_md && !lsm_md_eq(lli->lli_lsm_md, lsm)) { + if (lmv_dir_striped(lli->lli_lsm_md) && + lsm->lsm_md_layout_version <= lli->lli_lsm_md->lsm_md_layout_version) { CERROR("%s: " DFID " dir layout mismatch:\n", ll_i2sbi(inode)->ll_fsname, @@ -1418,16 +1420,6 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) if (!lli->lli_lsm_md) { struct cl_attr *attr; - if (lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) { - /* set md->lmv to NULL, so the following free lustre_md - * will not free this lsm - */ - md->lmv = NULL; - lli->lli_lsm_md = lsm; - up_write(&lli->lli_lsm_sem); - return 0; - } - rc = ll_init_lsm_md(inode, md); up_write(&lli->lli_lsm_sem); if (rc) @@ -1445,6 +1437,9 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) */ down_read(&lli->lli_lsm_sem); + if (!lmv_dir_striped(lli->lli_lsm_md)) + goto unlock; + attr = kzalloc(sizeof(*attr), GFP_NOFS); if (!attr) { rc = -ENOMEM; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 1aaf184..fb5caaf 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -221,6 +221,7 @@ int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) { struct inode *inode = ll_inode_from_resource_lock(lock); + struct ll_inode_info *lli; u64 bits = to_cancel; int rc; @@ -308,13 +309,12 @@ void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) PFID(ll_inode2fid(inode)), rc); } + lli = ll_i2info(inode); if (bits & MDS_INODELOCK_UPDATE) set_bit(LLIF_UPDATE_ATIME, - &ll_i2info(inode)->lli_flags); + &lli->lli_flags); if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) { - struct ll_inode_info *lli = ll_i2info(inode); - CDEBUG(D_INODE, "invalidating inode "DFID" lli = %p, pfid = "DFID"\n", PFID(ll_inode2fid(inode)), @@ 
-688,7 +688,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, struct lu_fid fid = ll_i2info(parent)->lli_fid; /* If it is striped directory, get the real stripe parent */ - if (unlikely(ll_i2info(parent)->lli_lsm_md)) { + if (unlikely(ll_dir_striped(parent))) { rc = md_get_fid_from_lsm(ll_i2mdexp(parent), ll_i2info(parent)->lli_lsm_md, (*de)->d_name.name, diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index ba14e7c..6017375 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -293,16 +293,15 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, int rc; /* do not allow file creation in foreign dir */ - if ((it->it_op & IT_CREAT) && op_data->op_mea1 && - op_data->op_mea1->lsm_md_magic == LMV_MAGIC_FOREIGN) + if ((it->it_op & IT_CREAT) && lmv_dir_foreign(op_data->op_mea1)) return -ENODATA; if ((it->it_op & IT_CREAT) && !(flags & MDS_OPEN_BY_FID)) { /* don't allow create under dir with bad hash */ - if (lmv_is_dir_bad_hash(op_data->op_mea1)) + if (lmv_dir_bad_hash(op_data->op_mea1)) return -EBADF; - if (lmv_is_dir_migrating(op_data->op_mea1)) { + if (lmv_dir_migrating(op_data->op_mea1)) { if (flags & O_EXCL) { /* * open(O_CREAT | O_EXCL) needs to check @@ -311,8 +310,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, * file under old layout, check old layout on * client side. */ - tgt = lmv_locate_tgt(lmv, op_data, - &op_data->op_fid1); + tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -348,7 +346,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, * without name, but we can set it to child fid, and MDT * will obtain it from linkea in open in such case. 
*/ - if (op_data->op_mea1) + if (lmv_dir_striped(op_data->op_mea1)) op_data->op_fid1 = op_data->op_fid2; tgt = lmv_find_target(lmv, &op_data->op_fid2); @@ -361,7 +359,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, LASSERT(fid_is_zero(&op_data->op_fid2)); LASSERT(op_data->op_name); - tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); } @@ -448,8 +446,7 @@ static int lmv_intent_lookup(struct obd_export *exp, int rc; /* foreign dir is not striped */ - if (op_data->op_mea1 && - op_data->op_mea1->lsm_md_magic == LMV_MAGIC_FOREIGN) { + if (lmv_dir_foreign(op_data->op_mea1)) { /* only allow getattr/lookup for itself */ if (op_data->op_name) return -ENODATA; @@ -457,7 +454,7 @@ static int lmv_intent_lookup(struct obd_export *exp, } retry: - tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -482,7 +479,7 @@ static int lmv_intent_lookup(struct obd_export *exp, * If RPC happens, lsm information will be revalidated * during update_inode process (see ll_update_lsm_md) */ - if (op_data->op_mea2) { + if (lmv_dir_striped(op_data->op_mea2)) { rc = lmv_revalidate_slaves(exp, op_data->op_mea2, cb_blocking, extra_lock_flags); diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index b4c5297..9974ec5 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -137,6 +137,8 @@ static inline int lmv_stripe_md_size(int stripe_count) u32 stripe_count = lsm->lsm_md_stripe_count; int stripe_index; + LASSERT(lmv_dir_striped(lsm)); + if (hash_type & LMV_HASH_FLAG_MIGRATION) { if (post_migrate) { hash_type &= ~LMV_HASH_FLAG_MIGRATION; @@ -166,26 +168,6 @@ static inline int lmv_stripe_md_size(int stripe_count) return &lsm->lsm_md_oinfo[stripe_index]; } -static inline bool lmv_is_dir_migrating(const struct lmv_stripe_md *lsm) -{ - return lsm ? 
lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION : false; -} - -static inline bool lmv_is_dir_bad_hash(const struct lmv_stripe_md *lsm) -{ - if (!lsm) - return false; - - if (lmv_is_dir_migrating(lsm)) { - if (lsm->lsm_md_stripe_count - lsm->lsm_md_migrate_offset > 1) - return !lmv_is_known_hash_type( - lsm->lsm_md_migrate_hash); - return false; - } - - return !lmv_is_known_hash_type(lsm->lsm_md_hash_type); -} - static inline bool lmv_dir_retry_check_update(struct md_op_data *op_data) { const struct lmv_stripe_md *lsm = op_data->op_mea1; @@ -193,12 +175,12 @@ static inline bool lmv_dir_retry_check_update(struct md_op_data *op_data) if (!lsm) return false; - if (lmv_is_dir_migrating(lsm) && !op_data->op_post_migrate) { + if (lmv_dir_migrating(lsm) && !op_data->op_post_migrate) { op_data->op_post_migrate = true; return true; } - if (lmv_is_dir_bad_hash(lsm) && + if (lmv_dir_bad_hash(lsm) && op_data->op_stripe_index < lsm->lsm_md_stripe_count - 1) { op_data->op_stripe_index++; return true; @@ -208,8 +190,8 @@ static inline bool lmv_dir_retry_check_update(struct md_op_data *op_data) } struct lmv_tgt_desc *lmv_locate_tgt(struct lmv_obd *lmv, - struct md_op_data *op_data, - struct lu_fid *fid); + struct md_op_data *op_data); + /* lproc_lmv.c */ int lmv_tunables_init(struct obd_device *obd); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 4365533..02dfd35 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1149,24 +1149,24 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, /** * This is _inode_ placement policy function (not name). 
*/ -static int lmv_placement_policy(struct obd_device *obd, - struct md_op_data *op_data, u32 *mds) +static u32 lmv_placement_policy(struct obd_device *obd, + struct md_op_data *op_data) { struct lmv_obd *lmv = &obd->u.lmv; struct lmv_user_md *lum; + u32 mdt; - LASSERT(mds); - - if (lmv->desc.ld_tgt_count == 1) { - *mds = 0; + if (lmv->desc.ld_tgt_count == 1) return 0; - } lum = op_data->op_data; - /* Choose MDS by + /* + * Choose MDT by * 1. See if the stripe offset is specified by lum. - * 2. Then check if there is default stripe offset. - * 3. Finally choose MDS by name hash if the parent + * 2. If parent has default LMV, and its hash type is "space", choose + * MDT with QoS. (see lmv_locate_tgt_qos()). + * 3. Then check if default LMV stripe offset is not -1. + * 4. Finally choose MDS by name hash if the parent * is striped directory. (see lmv_locate_tgt()). * * presently explicit MDT location is not supported @@ -1177,18 +1177,22 @@ static int lmv_placement_policy(struct obd_device *obd, if (op_data->op_cli_flags & CLI_SET_MEA && lum && le32_to_cpu(lum->lum_magic != LMV_MAGIC_FOREIGN) && le32_to_cpu(lum->lum_stripe_offset) != (u32)-1) { - *mds = le32_to_cpu(lum->lum_stripe_offset); + mdt = le32_to_cpu(lum->lum_stripe_offset); + } else if (op_data->op_code == LUSTRE_OPC_MKDIR && + !lmv_dir_striped(op_data->op_mea1) && + lmv_dir_space_hashed(op_data->op_default_mea1)) { + mdt = op_data->op_mds; } else if (op_data->op_code == LUSTRE_OPC_MKDIR && op_data->op_default_mea1 && op_data->op_default_mea1->lsm_md_master_mdt_index != - (u32)-1) { - *mds = op_data->op_default_mea1->lsm_md_master_mdt_index; - op_data->op_mds = *mds; + (u32)-1) { + mdt = op_data->op_default_mea1->lsm_md_master_mdt_index; + op_data->op_mds = mdt; } else { - *mds = op_data->op_mds; + mdt = op_data->op_mds; } - return 0; + return mdt; } int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds) @@ -1230,24 +1234,17 @@ int lmv_fid_alloc(const struct lu_env *env, struct obd_export 
*exp, { struct obd_device *obd = class_exp2obd(exp); struct lmv_obd *lmv = &obd->u.lmv; - u32 mds = 0; + u32 mds; int rc; LASSERT(op_data); LASSERT(fid); - rc = lmv_placement_policy(obd, op_data, &mds); - if (rc) { - CERROR("Can't get target for allocating fid, rc %d\n", - rc); - return rc; - } + mds = lmv_placement_policy(obd, op_data); rc = __lmv_fid_alloc(lmv, fid, mds); - if (rc) { + if (rc) CERROR("Can't alloc new fid, rc %d\n", rc); - return rc; - } return rc; } @@ -1588,20 +1585,30 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, return md_close(tgt->ltd_exp, op_data, mod, request); } -struct lmv_tgt_desc* -__lmv_locate_tgt(struct lmv_obd *lmv, struct lmv_stripe_md *lsm, - const char *name, int namelen, struct lu_fid *fid, u32 *mds, - bool post_migrate) +static struct lmv_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) +{ + static unsigned int rr_index; + + /* locate MDT round-robin is the first step */ + *mdt = rr_index % lmv->tgts_size; + rr_index++; + + return lmv->tgts[*mdt]; +} + +static struct lmv_tgt_desc * +lmv_locate_tgt_by_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm, + const char *name, int namelen, struct lu_fid *fid, + u32 *mds, bool post_migrate) { const struct lmv_oinfo *oinfo; struct lmv_tgt_desc *tgt; - if (!lsm || namelen == 0) { + if (!lmv_dir_striped(lsm) || !namelen) { tgt = lmv_find_target(lmv, fid); if (IS_ERR(tgt)) return tgt; - LASSERT(mds); *mds = tgt->ltd_idx; return tgt; } @@ -1617,47 +1624,41 @@ struct lmv_tgt_desc* return ERR_CAST(oinfo); } - if (fid) - *fid = oinfo->lmo_fid; - if (mds) - *mds = oinfo->lmo_mds; - + *fid = oinfo->lmo_fid; + *mds = oinfo->lmo_mds; tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL); - CDEBUG(D_INFO, "locate on mds %u " DFID "\n", oinfo->lmo_mds, - PFID(&oinfo->lmo_fid)); + CDEBUG(D_INODE, "locate MDT %u parent " DFID "\n", *mds, PFID(fid)); return tgt; } /** - * Locate mdt by fid or name + * Locate MDT of op_data->op_fid1 * * For striped directory, it will 
locate the stripe by name hash, if hash_type * is unknown, it will return the stripe specified by 'op_data->op_stripe_index' * which is set outside, and if dir is migrating, 'op_data->op_post_migrate' * indicates whether old or new layout is used to locate. * - * For normal direcotry, it will locate MDS by FID directly. + * For plain direcotry, normally it will locate MDT by FID, but if this + * directory has default LMV, and its hash type is "space", locate MDT with QoS. * * @lmv: LMV device * @op_data: client MD stack parameters, name, namelen * mds_num etc. - * @fid: object FID used to locate MDS. * * Returns: pointer to the lmv_tgt_desc if succeed. * ERR_PTR(errno) if failed. */ -struct lmv_tgt_desc* -lmv_locate_tgt(struct lmv_obd *lmv, struct md_op_data *op_data, - struct lu_fid *fid) +struct lmv_tgt_desc * +lmv_locate_tgt(struct lmv_obd *lmv, struct md_op_data *op_data) { struct lmv_stripe_md *lsm = op_data->op_mea1; struct lmv_oinfo *oinfo; struct lmv_tgt_desc *tgt; - /* foreign dir is not striped dir */ - if (lsm && lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) + if (lmv_dir_foreign(lsm)) return ERR_PTR(-ENODATA); /* @@ -1671,43 +1672,101 @@ struct lmv_tgt_desc* if (IS_ERR(tgt)) return tgt; - if (lsm) { + if (lmv_dir_striped(lsm)) { int i; /* refill the right parent fid */ for (i = 0; i < lsm->lsm_md_stripe_count; i++) { oinfo = &lsm->lsm_md_oinfo[i]; if (oinfo->lmo_mds == op_data->op_mds) { - *fid = oinfo->lmo_fid; + op_data->op_fid1 = oinfo->lmo_fid; break; } } if (i == lsm->lsm_md_stripe_count) - *fid = lsm->lsm_md_oinfo[0].lmo_fid; + op_data->op_fid1 = lsm->lsm_md_oinfo[0].lmo_fid; } - } else if (lmv_is_dir_bad_hash(lsm)) { + } else if (lmv_dir_bad_hash(lsm)) { LASSERT(op_data->op_stripe_index < lsm->lsm_md_stripe_count); oinfo = &lsm->lsm_md_oinfo[op_data->op_stripe_index]; - *fid = oinfo->lmo_fid; + op_data->op_fid1 = oinfo->lmo_fid; op_data->op_mds = oinfo->lmo_mds; - tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL); + } else if (op_data->op_code == 
LUSTRE_OPC_MKDIR && + lmv_dir_space_hashed(op_data->op_default_mea1) && + !lmv_dir_striped(lsm)) { + tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds); + /* + * only update statfs when mkdir under dir with "space" hash, + * this means the cached statfs may be stale, and current mkdir + * may not follow QoS accurately, but it's not serious, and it + * avoids periodic statfs when client doesn't mkdir under + * "space" hashed directories. + */ + if (!IS_ERR(tgt)) { + struct obd_device *obd; + + obd = container_of(lmv, struct obd_device, u.lmv); + lmv_statfs_check_update(obd, tgt); + } } else { - tgt = __lmv_locate_tgt(lmv, lsm, op_data->op_name, - op_data->op_namelen, fid, - &op_data->op_mds, - op_data->op_post_migrate); + tgt = lmv_locate_tgt_by_name(lmv, op_data->op_mea1, + op_data->op_name, op_data->op_namelen, + &op_data->op_fid1, &op_data->op_mds, + op_data->op_post_migrate); } return tgt; } -static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, - const void *data, size_t datalen, umode_t mode, - uid_t uid, gid_t gid, kernel_cap_t cap_effective, - u64 rdev, struct ptlrpc_request **request) +/* Locate MDT of op_data->op_fid2 for link/rename */ +static struct lmv_tgt_desc * +lmv_locate_tgt2(struct lmv_obd *lmv, struct md_op_data *op_data) +{ + struct lmv_tgt_desc *tgt; + int rc; + + LASSERT(op_data->op_name); + if (lmv_dir_migrating(op_data->op_mea2)) { + struct lu_fid fid1 = op_data->op_fid1; + struct lmv_stripe_md *lsm1 = op_data->op_mea1; + struct ptlrpc_request *request = NULL; + + /* + * avoid creating new file under old layout of migrating + * directory, check it here. 
+ */ + tgt = lmv_locate_tgt_by_name(lmv, op_data->op_mea2, + op_data->op_name, op_data->op_namelen, + &op_data->op_fid2, &op_data->op_mds, false); + if (IS_ERR(tgt)) + return tgt; + + op_data->op_fid1 = op_data->op_fid2; + op_data->op_mea1 = op_data->op_mea2; + rc = md_getattr_name(tgt->ltd_exp, op_data, &request); + op_data->op_fid1 = fid1; + op_data->op_mea1 = lsm1; + if (!rc) { + ptlrpc_req_finished(request); + return ERR_PTR(-EEXIST); + } + + if (rc != -ENOENT) + return ERR_PTR(rc); + } + + return lmv_locate_tgt_by_name(lmv, op_data->op_mea2, op_data->op_name, + op_data->op_namelen, &op_data->op_fid2, + &op_data->op_mds, true); +} + +int lmv_create(struct obd_export *exp, struct md_op_data *op_data, + const void *data, size_t datalen, umode_t mode, uid_t uid, + gid_t gid, kernel_cap_t cap_effective, u64 rdev, + struct ptlrpc_request **request) { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; @@ -1717,16 +1776,16 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, if (!lmv->desc.ld_active_tgt_count) return -EIO; - if (lmv_is_dir_bad_hash(op_data->op_mea1)) + if (lmv_dir_bad_hash(op_data->op_mea1)) return -EBADF; - if (lmv_is_dir_migrating(op_data->op_mea1)) { + if (lmv_dir_migrating(op_data->op_mea1)) { /* * if parent is migrating, create() needs to lookup existing * name, to avoid creating new file under old layout of * migrating directory, check old layout here. 
*/ - tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1743,7 +1802,7 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, op_data->op_post_migrate = true; } - tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1765,8 +1824,6 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, return PTR_ERR(tgt); op_data->op_mds = tgt->ltd_idx; - } else { - CDEBUG(D_CONFIG, "Server doesn't support striped dirs\n"); } CDEBUG(D_INODE, "CREATE obj " DFID " -> mds #%x\n", @@ -1818,7 +1875,7 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data, int rc; retry: - tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1916,39 +1973,7 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data, op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); - if (lmv_is_dir_migrating(op_data->op_mea2)) { - struct lu_fid fid1 = op_data->op_fid1; - struct lmv_stripe_md *lsm1 = op_data->op_mea1; - - /* - * avoid creating new file under old layout of migrating - * directory, check it here. 
- */ - tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, op_data->op_name, - op_data->op_namelen, &op_data->op_fid2, - &op_data->op_mds, false); - tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); - - op_data->op_fid1 = op_data->op_fid2; - op_data->op_mea1 = op_data->op_mea2; - rc = md_getattr_name(tgt->ltd_exp, op_data, request); - op_data->op_fid1 = fid1; - op_data->op_mea1 = lsm1; - if (!rc) { - ptlrpc_req_finished(*request); - *request = NULL; - return -EEXIST; - } - - if (rc != -ENOENT) - return rc; - } - - tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, op_data->op_name, - op_data->op_namelen, &op_data->op_fid2, - &op_data->op_mds, true); + tgt = lmv_locate_tgt2(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1992,7 +2017,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, if (IS_ERR(parent_tgt)) return PTR_ERR(parent_tgt); - if (lsm) { + if (lmv_dir_striped(lsm)) { u32 hash_type = lsm->lsm_md_hash_type; u32 stripe_count = lsm->lsm_md_stripe_count; @@ -2000,7 +2025,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, * old stripes are appended after new stripes for migrating * directory. */ - if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) { + if (lmv_dir_migrating(lsm)) { hash_type = lsm->lsm_md_migrate_hash; stripe_count -= lsm->lsm_md_migrate_offset; } @@ -2010,7 +2035,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, if (rc < 0) return rc; - if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) + if (lmv_dir_migrating(lsm)) rc += lsm->lsm_md_migrate_offset; /* save it in fid4 temporarily for early cancel */ @@ -2024,7 +2049,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, * if parent is being migrated too, fill op_fid2 with target * stripe fid, otherwise the target stripe is not created yet. 
*/ - if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) { + if (lmv_dir_migrating(lsm)) { hash_type = lsm->lsm_md_hash_type & ~LMV_HASH_FLAG_MIGRATION; stripe_count = lsm->lsm_md_migrate_offset; @@ -2151,44 +2176,10 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); - if (lmv_is_dir_migrating(op_data->op_mea2)) { - struct lu_fid fid1 = op_data->op_fid1; - struct lmv_stripe_md *lsm1 = op_data->op_mea1; - - /* - * we avoid creating new file under old layout of migrating - * directory, if there is an existing file with new name under - * old layout, we can't unlink file in old layout and rename to - * new layout in one transaction, so return -EBUSY here.` - */ - tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, new, newlen, - &op_data->op_fid2, &op_data->op_mds, - false); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); - - op_data->op_fid1 = op_data->op_fid2; - op_data->op_mea1 = op_data->op_mea2; - op_data->op_name = new; - op_data->op_namelen = newlen; - rc = md_getattr_name(tgt->ltd_exp, op_data, request); - op_data->op_fid1 = fid1; - op_data->op_mea1 = lsm1; - op_data->op_name = NULL; - op_data->op_namelen = 0; - if (!rc) { - ptlrpc_req_finished(*request); - *request = NULL; - return -EBUSY; - } + op_data->op_name = new; + op_data->op_namelen = newlen; - if (rc != -ENOENT) - return rc; - } - - /* rename to new layout for migrating directory */ - tp_tgt = __lmv_locate_tgt(lmv, op_data->op_mea2, new, newlen, - &op_data->op_fid2, &op_data->op_mds, true); + tp_tgt = lmv_locate_tgt2(lmv, op_data); if (IS_ERR(tp_tgt)) return PTR_ERR(tp_tgt); @@ -2240,10 +2231,10 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, return rc; } + op_data->op_name = old; + op_data->op_namelen = oldlen; retry: - sp_tgt = __lmv_locate_tgt(lmv, op_data->op_mea1, old, oldlen, - &op_data->op_fid1, &op_data->op_mds, - op_data->op_post_migrate); + sp_tgt = 
lmv_locate_tgt(lmv, op_data); if (IS_ERR(sp_tgt)) return PTR_ERR(sp_tgt); @@ -2710,16 +2701,14 @@ static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data, struct md_callback *cb_op, u64 offset, struct page **ppage) { - struct lmv_stripe_md *lsm = op_data->op_mea1; struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - if (unlikely(lsm)) { - /* foreign dir is not striped dir */ - if (lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) - return -ENODATA; + if (unlikely(lmv_dir_foreign(op_data->op_mea1))) + return -ENODATA; + if (unlikely(lmv_dir_striped(op_data->op_mea1))) { return lmv_striped_read_page(exp, op_data, cb_op, offset, ppage); } @@ -2770,7 +2759,7 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data, op_data->op_cap = current_cap(); retry: - parent_tgt = lmv_locate_tgt(lmv, op_data, &op_data->op_fid1); + parent_tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(parent_tgt)) return PTR_ERR(parent_tgt); @@ -3060,7 +3049,7 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp, return 0; } - if (lsm->lsm_md_magic == LMV_MAGIC) { + if (lmv_dir_striped(lsm)) { for (i = 0; i < lsm->lsm_md_stripe_count; i++) { if (lsm->lsm_md_oinfo[i].lmo_root) iput(lsm->lsm_md_oinfo[i].lmo_root); @@ -3343,7 +3332,8 @@ static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, { const struct lmv_oinfo *oinfo; - LASSERT(lsm); + LASSERT(lmv_dir_striped(lsm)); + oinfo = lsm_name_to_stripe_info(lsm, name, namelen, false); if (IS_ERR(oinfo)) return PTR_ERR(oinfo); @@ -3408,8 +3398,7 @@ static int lmv_merge_attr(struct obd_export *exp, { int rc, i; - /* foreign dir is not striped dir */ - if (lsm->lsm_md_magic == LMV_MAGIC_FOREIGN) + if (!lmv_dir_striped(lsm)) return 0; rc = lmv_revalidate_slaves(exp, lsm, cb_blocking, 0); From patchwork Thu Feb 27 21:13:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410187 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D12D0138D for ; Thu, 27 Feb 2020 21:32:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B984524677 for ; Thu, 27 Feb 2020 21:32:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B984524677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4A832349100; Thu, 27 Feb 2020 13:27:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3BEC521FEFD for ; Thu, 27 Feb 2020 13:19:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0B84C8A50; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 09AB346F; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:03 -0500 Message-Id: <1582838290-17243-316-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 315/622] lustre: llite: check 
correct size in ll_dom_finish_open() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin The data-end check in ll_dom_finish_open() shouldn't compare against i_size, because i_size may not yet have been updated with the data just returned from the server. Use the size value in mdt_body from the reply for that check instead. WC-bug-id: https://jira.whamcloud.com/browse/LU-12014 Lustre-commit: 7b9fd576f7de ("LU-12014 llite: check correct size in ll_dom_finish_open()") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/33895 Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 50220eb..88d5c2d 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -418,6 +418,7 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, struct address_space *mapping = inode->i_mapping; struct page *vmpage; struct niobuf_remote *rnb; + struct mdt_body *body; char *data; unsigned long index, start; struct niobuf_local lnb; @@ -441,18 +442,19 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, if (rnb->rnb_offset % PAGE_SIZE) return; - /* Server returns whole file or just file tail if it fills in - * reply buffer, in both cases total size should be inode size. + /* Server returns whole file or just file tail if it fills in reply + * buffer, in both cases total size should be equal to the file size. 
*/ - if (rnb->rnb_offset + rnb->rnb_len < i_size_read(inode)) { - CERROR("%s: server returns off/len %llu/%u < i_size %llu\n", + body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY); + if (rnb->rnb_offset + rnb->rnb_len != body->mbo_dom_size) { + CERROR("%s: server returns off/len %llu/%u but size %llu\n", ll_i2sbi(inode)->ll_fsname, rnb->rnb_offset, - rnb->rnb_len, i_size_read(inode)); + rnb->rnb_len, body->mbo_dom_size); return; } - CDEBUG(D_INFO, "Get data along with open at %llu len %i, i_size %llu\n", - rnb->rnb_offset, rnb->rnb_len, i_size_read(inode)); + CDEBUG(D_INFO, "Get data along with open at %llu len %i, size %llu\n", + rnb->rnb_offset, rnb->rnb_len, body->mbo_dom_size); data = (char *)rnb + sizeof(*rnb); From patchwork Thu Feb 27 21:13:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410209 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3387B138D for ; Thu, 27 Feb 2020 21:32:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1C2DB24677 for ; Thu, 27 Feb 2020 21:32:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1C2DB24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 40E46349B28; Thu, 27 Feb 2020 13:27:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com 
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D33621FF02 for ; Thu, 27 Feb 2020 13:19:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0DD9F8A51; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0CA9E468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:04 -0500 Message-Id: <1582838290-17243-317-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 316/622] lnet: recovery event handling broken X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Don't increment health on unlink event. If a SEND fails an unlink will follow so no need to do any special processing on SEND event. If SEND succeeds then we wait for the reply. When queuing a message on the NI recovery queue only do so if the MT thread is still running. 
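The event handling described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: ni_model, ev_type, mt_state, mt_event_handler and queue_for_recovery are hypothetical names invented for the sketch, not the LNet structures or functions, and the real handler also deals with peer NIs and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the LNet event types named in the patch. */
enum ev_type { EV_SEND, EV_REPLY, EV_UNLINK };

/* Hypothetical monitor-thread states. */
enum mt_state { MT_SHUTDOWN, MT_RUNNING };

/* Hypothetical, simplified local NI: only a health counter. */
struct ni_model {
	int healthv;
};

/*
 * Models the reply handler for a local NI: a successful recovery
 * ping bumps health, but an unlink event never does.
 */
static void handle_recovery_reply(struct ni_model *ni, int status,
				  bool unlink_event)
{
	assert(ni != NULL);

	if (status != 0)
		return;		/* failed ping: no health credit */
	if (!unlink_event)
		ni->healthv++;
}

/* Models the monitor-thread event dispatch after the patch. */
static void mt_event_handler(struct ni_model *ni, enum ev_type type,
			     int status)
{
	switch (type) {
	case EV_UNLINK:
	case EV_REPLY:
		handle_recovery_reply(ni, status, type == EV_UNLINK);
		break;
	case EV_SEND:
		/*
		 * Nothing to do: a failed SEND is followed by an
		 * UNLINK, a successful one by a REPLY.
		 */
		break;
	}
}

/* Only queue for recovery while the monitor thread is running. */
static bool queue_for_recovery(enum mt_state state, int *queued)
{
	if (state != MT_RUNNING)
		return false;
	(*queued)++;
	return true;
}
```

In this model only a successful REPLY raises the health value; SEND and UNLINK leave it unchanged, and nothing is queued once the monitor thread has stopped.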
WC-bug-id: https://jira.whamcloud.com/browse/LU-12080 Lustre-commit: 5409e620e025 ("LU-12080 lnet: recovery event handling broken") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34445 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 9 +++++---- net/lnet/lnet/lib-msg.c | 5 +++++ 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 809d2b6..a6df9ba 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3197,7 +3197,7 @@ struct lnet_mt_event_info { static void lnet_handle_recovery_reply(struct lnet_mt_event_info *ev_info, - int status) + int status, bool unlink_event) { lnet_nid_t nid = ev_info->mt_nid; @@ -3228,7 +3228,8 @@ struct lnet_mt_event_info { * carry forward too much information. * In the peer case, it'll naturally be incremented */ - lnet_inc_healthv(&ni->ni_healthv); + if (!unlink_event) + lnet_inc_healthv(&ni->ni_healthv); } else { struct lnet_peer_ni *lpni; int cpt; @@ -3273,14 +3274,14 @@ struct lnet_mt_event_info { libcfs_nid2str(ev_info->mt_nid)); /* fall-through */ case LNET_EVENT_REPLY: - lnet_handle_recovery_reply(ev_info, event->status); + lnet_handle_recovery_reply(ev_info, event->status, + event->type == LNET_EVENT_UNLINK); break; case LNET_EVENT_SEND: CDEBUG(D_NET, "%s recovery message sent %s:%d\n", libcfs_nid2str(ev_info->mt_nid), (event->status) ? 
"unsuccessfully" : "successfully", event->status); - lnet_handle_recovery_reply(ev_info, event->status); break; default: CERROR("Unexpected event: %d\n", event->type); diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 0738bf7..146e23c 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -521,6 +521,11 @@ return; lnet_net_lock(0); + /* the mt could've shutdown and cleaned up the queues */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { + lnet_net_unlock(0); + return; + } lnet_handle_remote_failure_locked(lpni); lnet_net_unlock(0); } From patchwork Thu Feb 27 21:13:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410191 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D234F92A for ; Thu, 27 Feb 2020 21:32:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BA8E224677 for ; Thu, 27 Feb 2020 21:32:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BA8E224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A4902349A73; Thu, 27 Feb 2020 13:27:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C124D21FB3F for ; Thu, 27 Feb 2020 
13:19:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 10C848A52; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0F66746A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:05 -0500 Message-Id: <1582838290-17243-318-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 317/622] lnet: clean mt_eqh properly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata There is a scenario where you have a peer on your recovery queue that's down. So you keep pinging it, but every ping times out after 10 seconds. In the middle of these 10 seconds you perform a shutdown. First you try to do the rsp_tracker_clean. It goes through and calls MDUnlink on the MD related to that ping. But because the message has a ref count on the MD, it doesn't go away. The MD gets zombied. And just waits for lnet_md_unlink to be called in lnet_finalize(). Then you hit clean_peer_ni_recovery. We see the peer on the queue, we try to call Unlink on it, but when we lookup the MD using lnet_handle2md() we can't find it. Afterwards we try to clean up the EQ and it asserts. Even if we remove the assert we end up with a resource leak since the EQ is not actually freed since we won't call LNetEQFree() again. 
The solution is to pull the EQ create in the LNetNIInit() and deletion happens in lnet_unprepare. By this point all the remaining messages would've been finalized and all references on the EQ are gone, allowing us to clean it up properly WC-bug-id: https://jira.whamcloud.com/browse/LU-12080 Lustre-commit: 1065c8888e96 ("LU-12080 lnet: clean mt_eqh properly") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34477 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ net/lnet/lnet/api-ni.c | 15 +++++++++++++++ net/lnet/lnet/lib-eq.c | 2 -- net/lnet/lnet/lib-move.c | 13 +------------ 4 files changed, 18 insertions(+), 14 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index a6e64f6..10922ae 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -513,6 +513,8 @@ struct lnet_ni * int lnet_lib_init(void); void lnet_lib_exit(void); +void lnet_mt_event_handler(struct lnet_event *event); + int lnet_notify(struct lnet_ni *ni, lnet_nid_t peer, int alive, time64_t when); void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive, diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index e5f5c6c..1388bd4 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1059,6 +1059,7 @@ struct lnet_libhandle * INIT_LIST_HEAD(&the_lnet.ln_mt_localNIRecovq); INIT_LIST_HEAD(&the_lnet.ln_mt_peerNIRecovq); init_waitqueue_head(&the_lnet.ln_dc_waitq); + LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); rc = lnet_descriptor_setup(); if (rc != 0) @@ -1126,6 +1127,8 @@ struct lnet_libhandle * static int lnet_unprepare(void) { + int rc; + /* * NB no LNET_LOCK since this is the last reference. 
All LND instances * have shut down already, so it is safe to unlink and free all @@ -1138,6 +1141,12 @@ struct lnet_libhandle * LASSERT(list_empty(&the_lnet.ln_test_peers)); LASSERT(list_empty(&the_lnet.ln_nets)); + if (!LNetEQHandleIsInvalid(the_lnet.ln_mt_eqh)) { + rc = LNetEQFree(the_lnet.ln_mt_eqh); + LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); + LASSERT(rc == 0); + } + lnet_portals_destroy(); if (the_lnet.ln_md_containers) { @@ -2503,6 +2512,12 @@ void lnet_lib_exit(void) lnet_ping_target_update(pbuf, ping_mdh); + rc = LNetEQAlloc(0, lnet_mt_event_handler, &the_lnet.ln_mt_eqh); + if (rc != 0) { + CERROR("Can't allocate monitor thread EQ: %d\n", rc); + goto err_stop_ping; + } + rc = lnet_monitor_thr_start(); if (rc) goto err_stop_ping; diff --git a/net/lnet/lnet/lib-eq.c b/net/lnet/lnet/lib-eq.c index 3d99f0a..01b8ee3 100644 --- a/net/lnet/lnet/lib-eq.c +++ b/net/lnet/lnet/lib-eq.c @@ -164,8 +164,6 @@ int size = 0; int i; - LASSERT(the_lnet.ln_refcount > 0); - lnet_res_lock(LNET_LOCK_EX); /* * NB: hold lnet_eq_wait_lock for EQ link/unlink, so we can do diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index a6df9ba..7c135c4 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3254,7 +3254,7 @@ struct lnet_mt_event_info { } } -static void +void lnet_mt_event_handler(struct lnet_event *event) { struct lnet_mt_event_info *ev_info = event->md.user_ptr; @@ -3333,12 +3333,6 @@ int lnet_monitor_thr_start(void) if (rc) goto clean_queues; - rc = LNetEQAlloc(0, lnet_mt_event_handler, &the_lnet.ln_mt_eqh); - if (rc != 0) { - CERROR("Can't allocate monitor thread EQ: %d\n", rc); - goto clean_queues; - } - /* Pre monitor thread start processing */ rc = lnet_router_pre_mt_start(); if (rc) @@ -3371,7 +3365,6 @@ int lnet_monitor_thr_start(void) lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); - LNetEQFree(the_lnet.ln_mt_eqh); LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); return rc; clean_queues: @@ 
-3384,8 +3377,6 @@ int lnet_monitor_thr_start(void) void lnet_monitor_thr_stop(void) { - int rc; - if (the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN) return; @@ -3405,8 +3396,6 @@ void lnet_monitor_thr_stop(void) lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); lnet_clean_resendqs(); - rc = LNetEQFree(the_lnet.ln_mt_eqh); - LASSERT(rc == 0); } void From patchwork Thu Feb 27 21:13:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410195 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8217B138D for ; Thu, 27 Feb 2020 21:32:28 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 687DA24677 for ; Thu, 27 Feb 2020 21:32:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 687DA24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B9450348A72; Thu, 27 Feb 2020 13:27:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 24A1821FC02 for ; Thu, 27 Feb 2020 13:19:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 13E1F8A53; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id 123B846C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:06 -0500 Message-Id: <1582838290-17243-319-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 318/622] lnet: handle remote health error X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When a peer is dead set the health status to REMOTE_DROPPED in order to handle health properly for the peer. When dropping a routed message set REMOTE_ERROR. Routed messages are dropped when the routing feature is turned off which could be considered a configuration error if it happens in the middle of traffic. Therefore, it's better to flag this issue at this point without resending the message. 
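The mapping described above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the LNet code: every toy_* name is hypothetical, the enum is a cut-down stand-in for the real health status values, and the rc field stands in for the status passed to lnet_finalize().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of health statuses; names echo the patch, values are arbitrary. */
enum toy_health_status {
	TOY_STATUS_OK,
	TOY_STATUS_LOCAL_DROPPED,
	TOY_STATUS_REMOTE_DROPPED,
	TOY_STATUS_REMOTE_ERROR,
};

/* Hypothetical reasons for dropping a message. */
enum toy_drop_reason {
	TOY_DROP_PEER_DEAD,	/* destination peer is not alive */
	TOY_DROP_ROUTING_OFF,	/* routed message, routing feature turned off */
};

struct toy_msg {
	enum toy_health_status health_status;
	bool no_resend;
	int rc;		/* stand-in for the status given to the finalize step */
};

/* Record why a message was dropped so the health code can react to it. */
void toy_drop_msg(struct toy_msg *msg, enum toy_drop_reason why)
{
	assert(msg != NULL);

	switch (why) {
	case TOY_DROP_PEER_DEAD:
		/* the failure is on the remote side, so blame the peer */
		msg->health_status = TOY_STATUS_REMOTE_DROPPED;
		msg->rc = -EHOSTUNREACH;
		break;
	case TOY_DROP_ROUTING_OFF:
		/* likely a configuration error: flag it and do not resend */
		msg->health_status = TOY_STATUS_REMOTE_ERROR;
		msg->no_resend = true;
		msg->rc = -ECANCELED;
		break;
	}
}
```

Calling toy_drop_msg() with TOY_DROP_PEER_DEAD leaves no_resend untouched, so a resend remains possible, while TOY_DROP_ROUTING_OFF marks the message as final.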
WC-bug-id: https://jira.whamcloud.com/browse/LU-12344 Lustre-commit: b45e3d96fc4d ("LU-12344 lnet: handle remote health error") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34967 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 7c135c4..8eeb5ec 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -770,7 +770,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, CNETERR("Dropping message for %s: peer not alive\n", libcfs_id2str(msg->msg_target)); - msg->msg_health_status = LNET_MSG_STATUS_LOCAL_DROPPED; + msg->msg_health_status = LNET_MSG_STATUS_REMOTE_DROPPED; if (do_send) lnet_finalize(msg, -EHOSTUNREACH); @@ -786,6 +786,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, libcfs_id2str(msg->msg_target)); if (do_send) { msg->msg_no_resend = true; + CDEBUG(D_NET, + "msg %p to %s canceled and will not be resent\n", + msg, libcfs_id2str(msg->msg_target)); lnet_finalize(msg, -ECANCELED); } @@ -1065,6 +1068,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, 0, 0, 0, msg->msg_hdr.payload_length); list_del_init(&msg->msg_list); msg->msg_no_resend = true; + msg->msg_health_status = LNET_MSG_STATUS_REMOTE_ERROR; lnet_finalize(msg, -ECANCELED); } From patchwork Thu Feb 27 21:13:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410199 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 858E2138D for ; Thu, 27 Feb 2020 21:32:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with 
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6DBAF24677 for ; Thu, 27 Feb 2020 21:32:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6DBAF24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6AA1D348B99; Thu, 27 Feb 2020 13:27:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 675FD21FCC3 for ; Thu, 27 Feb 2020 13:19:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 165308A54; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1517846D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:07 -0500 Message-Id: <1582838290-17243-320-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 319/622] lnet: setup health timeout defaults X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Enable health feature by default. Setup transaction timeout to a default 10 seconds and retry count to 3 when health is enabled. When health is disabled set default transaction timeout to 50. When toggling between health enabled/disabled the defaults will always kick in. WC-bug-id: https://jira.whamcloud.com/browse/LU-11816 Lustre-commit: 8632e94aeb7e ("LU-11816 lnet: setup health timeout defaults") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34252 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 55 +++++++++++++++++++++++++------------------------- 1 file changed, 28 insertions(+), 27 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 1388bd4..aeb9d92 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -79,10 +79,10 @@ struct lnet the_lnet = { "NUMA range to consider during Multi-Rail selection"); /* lnet_health_sensitivity determines by how much we decrement the health - * value on sending error. The value defaults to 0, which means health - * checking is turned off by default. + * value on sending error. The value defaults to 100, which means health + * interface health is decremented by 100 points every failure. 
*/ -unsigned int lnet_health_sensitivity; +unsigned int lnet_health_sensitivity = 100; static int sensitivity_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_health_sensitivity = { .set = sensitivity_set, @@ -140,7 +140,10 @@ static int recovery_interval_set(const char *val, MODULE_PARM_DESC(lnet_drop_asym_route, "Set to 1 to drop asymmetrical route messages."); -unsigned int lnet_transaction_timeout = 50; +#define LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT 50 +#define LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT 10 + +unsigned int lnet_transaction_timeout = LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT; static int transaction_to_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_transaction_timeout = { .set = transaction_to_set, @@ -153,7 +156,8 @@ static int recovery_interval_set(const char *val, MODULE_PARM_DESC(lnet_transaction_timeout, "Maximum number of seconds to wait for a peer response."); -unsigned int lnet_retry_count; +#define LNET_RETRY_COUNT_HEALTH_DEFAULT 3 +unsigned int lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT; static int retry_count_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_retry_count = { .set = retry_count_set, @@ -201,11 +205,6 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (the_lnet.ln_state != LNET_STATE_RUNNING) { - mutex_unlock(&the_lnet.ln_api_mutex); - return 0; - } - if (value > LNET_MAX_HEALTH_VALUE) { mutex_unlock(&the_lnet.ln_api_mutex); CERROR("Invalid health value. Maximum: %d value = %lu\n", @@ -213,6 +212,22 @@ static int lnet_discover(struct lnet_process_id id, u32 force, return -EINVAL; } + /* if we're turning on health then use the health timeout + * defaults. 
+ */ + if (*sensitivity == 0 && value != 0) { + lnet_transaction_timeout = + LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT; + lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT; + /* if we're turning off health then use the no health timeout + * default. + */ + } else if (*sensitivity != 0 && value == 0) { + lnet_transaction_timeout = + LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT; + lnet_retry_count = 0; + } + *sensitivity = value; mutex_unlock(&the_lnet.ln_api_mutex); @@ -243,11 +258,6 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (the_lnet.ln_state != LNET_STATE_RUNNING) { - mutex_unlock(&the_lnet.ln_api_mutex); - return 0; - } - *interval = value; mutex_unlock(&the_lnet.ln_api_mutex); @@ -353,11 +363,6 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (the_lnet.ln_state != LNET_STATE_RUNNING) { - mutex_unlock(&the_lnet.ln_api_mutex); - return 0; - } - if (value < lnet_retry_count || value == 0) { mutex_unlock(&the_lnet.ln_api_mutex); CERROR("Invalid value for lnet_transaction_timeout (%lu). 
Has to be greater than lnet_retry_count (%u)\n", @@ -399,9 +404,10 @@ static int lnet_discover(struct lnet_process_id id, u32 force, */ mutex_lock(&the_lnet.ln_api_mutex); - if (the_lnet.ln_state != LNET_STATE_RUNNING) { + if (lnet_health_sensitivity == 0) { mutex_unlock(&the_lnet.ln_api_mutex); - return 0; + CERROR("Can not set retry_count when health feature is turned off\n"); + return -EINVAL; } if (value > lnet_transaction_timeout) { @@ -411,11 +417,6 @@ static int lnet_discover(struct lnet_process_id id, u32 force, return -EINVAL; } - if (value == *retry_count) { - mutex_unlock(&the_lnet.ln_api_mutex); - return 0; - } - *retry_count = value; if (value == 0) From patchwork Thu Feb 27 21:13:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410203 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 66AFF92A for ; Thu, 27 Feb 2020 21:32:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4F59B24677 for ; Thu, 27 Feb 2020 21:32:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4F59B24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B8900349193; Thu, 27 Feb 2020 13:27:39 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C10D0200D36 for ; Thu, 27 Feb 2020 13:19:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1973B8A55; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 17DCC46F; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:08 -0500 Message-Id: <1582838290-17243-321-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 320/622] lnet: fix cpt locking X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata In lnet_select_pathway() the call to lnet_handle_send_case_locked() can result in sd_cpt being changed. If this function returns REPEAT_SEND, we'll go back to the again label. It is possible at this time to initiate discovery, which will unlock the cpt. If the local cpt isn't updated we could potentially be manipulating the wrong cpt, resulting in some form of corruption or deadlock. 
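The hazard described above can be modelled in a few lines of standalone C. This is a toy sketch under stated assumptions, not the LNet implementation: toy_lock(), toy_unlock(), toy_send_case() and struct toy_send_data are hypothetical stand-ins for lnet_net_lock(), lnet_net_unlock(), lnet_handle_send_case_locked() and the send data, and the unlock/lock pair after the again label merely imitates discovery dropping the lock.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NCPTS	4
#define TOY_REPEAT_SEND	1

/* One flag per partition: true while that partition's lock is held. */
static bool toy_held[TOY_NCPTS];

static void toy_lock(int cpt)
{
	assert(!toy_held[cpt]);	/* locking twice would deadlock */
	toy_held[cpt] = true;
}

static void toy_unlock(int cpt)
{
	assert(toy_held[cpt]);	/* unlocking a partition we do not hold is the bug */
	toy_held[cpt] = false;
}

struct toy_send_data {
	int sd_cpt;	/* partition the callee leaves locked */
	int calls;
};

/* Hypothetical callee: its first call moves the lock to another
 * partition and asks the caller to retry.
 */
static int toy_send_case(struct toy_send_data *sd)
{
	if (sd->calls++ == 0) {
		toy_unlock(sd->sd_cpt);
		sd->sd_cpt = (sd->sd_cpt + 1) % TOY_NCPTS;
		toy_lock(sd->sd_cpt);
		return TOY_REPEAT_SEND;
	}
	return 0;
}

/* Returns the number of partition locks still held on exit (0 when balanced). */
int toy_select_pathway(int start_cpt)
{
	struct toy_send_data sd = { .sd_cpt = start_cpt, .calls = 0 };
	int cpt = start_cpt;
	int held = 0;
	int rc, i;

	toy_lock(cpt);
again:
	/* stand-in for discovery: drops and retakes the lock via the local copy */
	toy_unlock(cpt);
	toy_lock(cpt);

	rc = toy_send_case(&sd);
	cpt = sd.sd_cpt;	/* refresh the local copy before any retry */
	if (rc == TOY_REPEAT_SEND)
		goto again;

	toy_unlock(cpt);
	for (i = 0; i < TOY_NCPTS; i++)
		held += toy_held[i];
	return held;
}
```

Removing the line that refreshes cpt makes the second pass call toy_unlock() on a partition that is no longer held, which trips the assertion; in this model that is the counterpart of manipulating the wrong cpt.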
WC-bug-id: https://jira.whamcloud.com/browse/LU-12163 Lustre-commit: f6d63067e1ec ("LU-12163 lnet: fix cpt locking") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34607 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 8eeb5ec..0ee3a55 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2390,10 +2390,15 @@ struct lnet_ni * rc = lnet_handle_send_case_locked(&send_data); + /* Update the local cpt since send_data.sd_cpt might've been + * updated as a result of calling lnet_handle_send_case_locked(). + */ + cpt = send_data.sd_cpt; + if (rc == REPEAT_SEND) goto again; - lnet_net_unlock(send_data.sd_cpt); + lnet_net_unlock(cpt); return rc; } From patchwork Thu Feb 27 21:13:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410327 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA0B9138D for ; Thu, 27 Feb 2020 21:35:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AF68224677 for ; Thu, 27 Feb 2020 21:35:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AF68224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
ED10334A050; Thu, 27 Feb 2020 13:29:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E10F200D6A for ; Thu, 27 Feb 2020 13:19:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1C0108A57; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1AAF3468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:09 -0500 Message-Id: <1582838290-17243-322-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 321/622] lnet: detach response tracker X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata We need to unlink the response tracker from MDs even if the corresponding message failed to send. 
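A minimal sketch of that rule, assuming a much simplified memory descriptor: the toy_* types and functions below are hypothetical and only model the idea that the detach step on the unlink path must not depend on which event ended the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types; only SEND, REPLY and ACK matter for this sketch. */
enum toy_event { TOY_EVENT_SEND, TOY_EVENT_REPLY, TOY_EVENT_ACK };

struct toy_rsp_tracker {
	int deadline;	/* placeholder payload */
};

struct toy_md {
	struct toy_rsp_tracker *rspt;	/* non-NULL while a response is awaited */
	bool linked;
};

static void toy_detach_rsp_tracker(struct toy_md *md)
{
	/* a real tracker would also be freed or recycled here */
	md->rspt = NULL;
}

/* On unlink, detach the tracker no matter how the message ended. */
void toy_finalize(struct toy_md *md, enum toy_event ev, bool unlink)
{
	assert(md != NULL);
	(void)ev;	/* deliberately not consulted: a failed send must detach too */

	if (unlink) {
		toy_detach_rsp_tracker(md);
		md->linked = false;
	}
}
```

With a check on the event type in front of the detach call, a message that only ever produced a SEND event would leave the tracker attached to an MD that is about to go away.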
WC-bug-id: https://jira.whamcloud.com/browse/LU-12201 Lustre-commit: 1bb91b966d15 ("LU-12201 lnet: detach response tracker") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34770 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 146e23c..a245942 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -771,12 +771,7 @@ } if (unlink) { - /* if this is an ACK or a REPLY then make sure to remove the - * response tracker. - */ - if (msg->msg_ev.type == LNET_EVENT_REPLY || - msg->msg_ev.type == LNET_EVENT_ACK) - lnet_detach_rsp_tracker(msg->msg_md, cpt); + lnet_detach_rsp_tracker(md, cpt); lnet_md_unlink(md); } From patchwork Thu Feb 27 21:13:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410215 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C92E117E0 for ; Thu, 27 Feb 2020 21:32:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B1B32246A5 for ; Thu, 27 Feb 2020 21:32:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B1B32246A5 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2FA4F349B56; Thu, 27 Feb 2020 13:27:50 
-0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 52A0621CA5A for ; Thu, 27 Feb 2020 13:19:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1EA7F8A58; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1D8F646A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:10 -0500 Message-Id: <1582838290-17243-323-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 322/622] lnet: invalidate recovery ping mdh X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata For cleanliness, ensure that the recovery ping mdh is invalidated when a peer ni or a local ni is allocated. WC-bug-id: https://jira.whamcloud.com/browse/LU-11297 Lustre-commit: d7b5f3114d51 ("LU-11297 lnet: invalidate recovery ping mdh") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34771 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/config.c | 1 + net/lnet/lnet/peer.c | 1 + 2 files changed, 2 insertions(+) diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 5e0831a..760452c 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -443,6 +443,7 @@ struct lnet_net * spin_lock_init(&ni->ni_lock); INIT_LIST_HEAD(&ni->ni_netlist); INIT_LIST_HEAD(&ni->ni_recovery); + LNetInvalidateMDHandle(&ni->ni_ping_mdh); ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(), sizeof(*ni->ni_refs[0])); if (!ni->ni_refs) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 24a5cd3..7b11f28 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -126,6 +126,7 @@ INIT_LIST_HEAD(&lpni->lpni_peer_nis); INIT_LIST_HEAD(&lpni->lpni_recovery); INIT_LIST_HEAD(&lpni->lpni_on_remote_peer_ni_list); + LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh); spin_lock_init(&lpni->lpni_lock); From patchwork Thu Feb 27 21:13:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410223 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 247D5138D for ; Thu, 27 Feb 2020 21:32:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0CD49246A1 for ; Thu, 27 Feb 2020 21:32:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0CD49246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 089CE349BCC; Thu, 27 Feb 2020 13:27:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9D09F21FAE5 for ; Thu, 27 Feb 2020 13:19:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 217EA8A59; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 203FE46C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:11 -0500 Message-Id: <1582838290-17243-324-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 323/622] lnet: fix list corruption X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata In shutdown the resend queues are cleared and freed. The monitor thread state is set to shutdown. It is possible to get lnet_finalize() called after the queues are freed. The code checks for ln_state to see if we're shutting down. But in this case we should really be checking ln_mt_state. The monitor thread is the one that matters in this case, because it's the one which allocates and frees the resend queues. WC-bug-id: https://jira.whamcloud.com/browse/LU-12249 Lustre-commit: d799ac910cd6 ("LU-12249 lnet: fix list corruption") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34778 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 10 ++++++++++ net/lnet/lnet/lib-msg.c | 8 +++++++- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 0ee3a55..8bce3a9 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3135,7 +3135,9 @@ struct lnet_mt_event_info { lnet_prune_rc_data(1); /* Shutting down */ + lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + lnet_net_unlock(LNET_LOCK_EX); /* signal that the monitor thread is exiting */ complete(&the_lnet.ln_mt_signal); @@ -3349,7 +3351,9 @@ int lnet_monitor_thr_start(void) init_completion(&the_lnet.ln_mt_signal); + lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_mt_state = LNET_MT_STATE_RUNNING; + lnet_net_unlock(LNET_LOCK_EX); task = kthread_run(lnet_monitor_thread, NULL, "monitor_thread"); if (IS_ERR(task)) { rc = PTR_ERR(task); @@ -3363,13 +3367,17 @@ int lnet_monitor_thr_start(void) return 0; clean_thread: + lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_mt_state = LNET_MT_STATE_STOPPING; + 
lnet_net_unlock(LNET_LOCK_EX); /* block until event callback signals exit */ wait_for_completion(&the_lnet.ln_mt_signal); /* clean up */ lnet_router_cleanup(); free_mem: + lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; + lnet_net_unlock(LNET_LOCK_EX); lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); @@ -3390,7 +3398,9 @@ void lnet_monitor_thr_stop(void) return; LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING); + lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_mt_state = LNET_MT_STATE_STOPPING; + lnet_net_unlock(LNET_LOCK_EX); /* tell the monitor thread that we're shutting down */ wake_up(&the_lnet.ln_mt_waitq); diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index a245942..ad35c3d 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -604,7 +604,7 @@ bool lo = false; /* if we're shutting down no point in handling health. */ - if (the_lnet.ln_state != LNET_STATE_RUNNING) + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) return -1; LASSERT(msg->msg_txni); @@ -712,6 +712,12 @@ lnet_net_lock(msg->msg_tx_cpt); + /* check again under lock */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { + lnet_net_unlock(msg->msg_tx_cpt); + return -1; + } + /* remove message from the active list and reset it in preparation * for a resend. 
Two exception to this * From patchwork Thu Feb 27 21:13:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410331 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2D01092A for ; Thu, 27 Feb 2020 21:35:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15C2424677 for ; Thu, 27 Feb 2020 21:35:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15C2424677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2FFF534A06A; Thu, 27 Feb 2020 13:29:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0310B21CB07 for ; Thu, 27 Feb 2020 13:19:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 249358A5A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 22FEE46D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:12 -0500 Message-Id: <1582838290-17243-325-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 324/622] lnet: correct discovery LNetEQFree() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata The EQ needs to be freed after all the queues are cleaned to avoid having non-processed events on the event queue on free. This will prevent the memory from being freed. WC-bug-id: https://jira.whamcloud.com/browse/LU-12254 Lustre-commit: a0879b5985b4 ("LU-12254 lnet: correct discovery LNetEQFree()") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34796 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 7b11f28..8af9db2 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3142,8 +3142,6 @@ static int lnet_peer_discovery(void *arg) * size of the thundering herd if there are multiple threads * waiting on discovery of a single peer. */ - LNetEQFree(the_lnet.ln_dc_eqh); - LNetInvalidateEQHandle(&the_lnet.ln_dc_eqh); /* Queue cleanup 1: stop all pending pings and pushes. 
*/ lnet_net_lock(LNET_LOCK_EX); @@ -3171,6 +3169,9 @@ static int lnet_peer_discovery(void *arg) } lnet_net_unlock(LNET_LOCK_EX); + LNetEQFree(the_lnet.ln_dc_eqh); + LNetInvalidateEQHandle(&the_lnet.ln_dc_eqh); + the_lnet.ln_dc_state = LNET_DC_STATE_SHUTDOWN; wake_up(&the_lnet.ln_dc_waitq); From patchwork Thu Feb 27 21:13:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410207 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CB747138D for ; Thu, 27 Feb 2020 21:32:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B43CA24677 for ; Thu, 27 Feb 2020 21:32:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B43CA24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 08A47349B19; Thu, 27 Feb 2020 13:27:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4C87121CB73 for ; Thu, 27 Feb 2020 13:19:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 277978A5B; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 25C2146F; Thu, 27 Feb 2020 16:18:17 -0500 
(EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:13 -0500 Message-Id: <1582838290-17243-326-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 325/622] lnet: Protect lp_dc_pendq manipulation with lp_lock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Protect the peer discovery queue from concurrent manipulation by acquiring the lp_lock. WC-bug-id: https://jira.whamcloud.com/browse/LU-12264 Lustre-commit: dd16a31bf4ae ("LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/34798 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 2 ++ net/lnet/lnet/peer.c | 4 ++++ 2 files changed, 6 insertions(+) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 8bce3a9..de5951a 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2336,7 +2336,9 @@ struct lnet_ni * /* queue message and return */ msg->msg_rtr_nid_param = rtr_nid; msg->msg_sending = 0; + spin_lock(&peer->lp_lock); list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); + spin_unlock(&peer->lp_lock); lnet_peer_ni_decref_locked(lpni); primary_nid = peer->lp_primary_nid; lnet_net_unlock(cpt); diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 8af9db2..0d2d356 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -254,7 +254,9 @@ * Releasing the lock can cause an inconsistent state */ 
spin_lock(&the_lnet.ln_msg_resend_lock); + spin_lock(&lp->lp_lock); list_splice(&lp->lp_dc_pendq, &the_lnet.ln_msg_resend); + spin_unlock(&lp->lp_lock); spin_unlock(&the_lnet.ln_msg_resend_lock); wake_up(&the_lnet.ln_dc_waitq); @@ -1778,7 +1780,9 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp) libcfs_nid2str(lp->lp_primary_nid)); list_del_init(&lp->lp_dc_list); + spin_lock(&lp->lp_lock); list_splice_init(&lp->lp_dc_pendq, &pending_msgs); + spin_unlock(&lp->lp_lock); wake_up_all(&lp->lp_dc_waitq); lnet_net_unlock(LNET_LOCK_EX); From patchwork Thu Feb 27 21:13:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410211 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7855692A for ; Thu, 27 Feb 2020 21:32:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 608AB24677 for ; Thu, 27 Feb 2020 21:32:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 608AB24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 29C2D349B3F; Thu, 27 Feb 2020 13:27:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F71921CB97 for ; Thu, 27 Feb 2020 13:19:58 -0800 (PST) Received: 
from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 29F298A5C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 28802468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:14 -0500 Message-Id: <1582838290-17243-327-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 326/622] lnet: Ensure md is detached when msg is not committed X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn It's possible for lnet_is_health_check() to return "true" when the message has not hit the network. In this situation the message is freed without detaching the MD. As a result, requests do not receive their unlink events and these requests are stuck forever. A little cleanup is included here: - The value of lnet_is_health_check() is only used in one place, so we don't need to save the result of it in a variable. - We don't need separate logic to detach the md when the send was successful. 
We'll fall through to the finalizing code after incrementing the health counters Cray-bug-id: LUS-7239 WC-bug-id: https://jira.whamcloud.com/browse/LU-12199 Lustre-commit: b65f3a1767ae ("LU-12199 lnet: Ensure md is detached when msg is not committed") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/34885 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 66 +++++++++++++++---------------------------------- 1 file changed, 20 insertions(+), 46 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index ad35c3d..dbd8de4 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -784,16 +784,6 @@ msg->msg_md = NULL; } -static void -lnet_detach_md(struct lnet_msg *msg, int status) -{ - int cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); - - lnet_res_lock(cpt); - lnet_msg_detach_md(msg, cpt, status); - lnet_res_unlock(cpt); -} - static bool lnet_is_health_check(struct lnet_msg *msg) { @@ -881,7 +871,6 @@ int cpt; int rc; int i; - bool hc; LASSERT(!in_interrupt()); @@ -890,36 +879,7 @@ msg->msg_ev.status = status; - /* if the message is successfully sent, no need to keep the MD around */ - if (msg->msg_md && !status) - lnet_detach_md(msg, status); - -again: - hc = lnet_is_health_check(msg); - - /* the MD would've been detached from the message if it was - * successfully sent. However, if it wasn't successfully sent the - * MD would be around. And since we recalculate whether to - * health check or not, it's possible that we change our minds and - * we don't want to health check this message. In this case also - * free the MD. - * - * If the message is successful we're going to - * go through the lnet_health_check() function, but that'll just - * increment the appropriate health value and return. 
- */ - if (msg->msg_md && !hc) - lnet_detach_md(msg, status); - - rc = 0; - if (!msg->msg_tx_committed && !msg->msg_rx_committed) { - /* not committed to network yet */ - LASSERT(!msg->msg_onactivelist); - kfree(msg); - return; - } - - if (hc) { + if (lnet_is_health_check(msg)) { /* Check the health status of the message. If it has one * of the errors that we're supposed to handle, and it has * not timed out, then @@ -932,13 +892,26 @@ * put on the resend queue. */ if (!lnet_health_check(msg)) + /* Message is queued for resend */ return; + } - /* if we get here then we need to clean up the md because we're - * finalizing the message. - */ - if (msg->msg_md) - lnet_detach_md(msg, status); + /* We're not going to resend this message so detach its MD and invoke + * the appropriate callbacks + */ + if (msg->msg_md) { + cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); + lnet_res_lock(cpt); + lnet_msg_detach_md(msg, cpt, status); + lnet_res_unlock(cpt); + } + +again: + if (!msg->msg_tx_committed && !msg->msg_rx_committed) { + /* not committed to network yet */ + LASSERT(!msg->msg_onactivelist); + kfree(msg); + return; } /* @@ -972,6 +945,7 @@ container->msc_finalizers[my_slot] = current; + rc = 0; while ((msg = list_first_entry_or_null(&container->msc_finalizing, struct lnet_msg, msg_list)) != NULL) { From patchwork Thu Feb 27 21:13:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410719 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1E138924 for ; Thu, 27 Feb 2020 21:45:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 063C724690 for ; Thu, 27 
Feb 2020 21:45:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 063C724690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 16E4B34AF99; Thu, 27 Feb 2020 13:36:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E654821FAE5 for ; Thu, 27 Feb 2020 13:19:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2C9F98A5D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2B47246A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:15 -0500 Message-Id: <1582838290-17243-328-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 327/622] lnet: verify msg is commited for send/recv X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Before performing a health check make sure the message is committed for either send or receive. Otherwise we can just finalize it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12199 Lustre-commit: fc6b321036f3 ("LU-12199 lnet: verify msg is commited for send/recv") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34797 Reviewed-by: Chris Horn Reviewed-by: Sebastien Buisson Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index dbd8de4..e4253de 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -790,6 +790,20 @@ bool hc; int status = msg->msg_ev.status; + if ((!msg->msg_tx_committed && !msg->msg_rx_committed) || + !msg->msg_onactivelist) { + CDEBUG(D_NET, "msg %p not committed for send or receive\n", + msg); + return false; + } + + if ((msg->msg_tx_committed && !msg->msg_txpeer) || + (msg->msg_rx_committed && !msg->msg_rxpeer)) { + CDEBUG(D_NET, "msg %p failed too early to retry and send\n", + msg); + return false; + } + /* perform a health check for any message committed for transmit */ hc = msg->msg_tx_committed; From patchwork Thu Feb 27 21:13:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410723 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58835924 for ; Thu, 27 Feb 2020 21:45:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 410AF24690 for ; Thu, 27 Feb 2020 21:45:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 410AF24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E1D8534AFC0; Thu, 27 Feb 2020 13:36:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 346B921FAE5 for ; Thu, 27 Feb 2020 13:19:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2F2998A5E; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2E00946C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:16 -0500 Message-Id: <1582838290-17243-329-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 328/622] lnet: select LO interface for sending X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata In the following scenario Lustre->LNetPrimaryNID with 0@lo Discover is initiated on 0@lo The peer is created with 0@lo and @ The interface health of the peer's @ is decremented LNetPut() to self selection algorithm selects 0@lo to send to This exposes an issue where we try and go through the peer credit management algorithm, but because there are no credits associated with 0@lo we end up indefinitely queuing the message. ptlrpc will then get stuck waiting for send completion on the message. This was exposed via conf-sanity 32a WC-bug-id: https://jira.whamcloud.com/browse/LU-12339 Lustre-commit: 69d1535ebdac ("LU-12339 lnet: select LO interface for sending") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34957 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 53 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 15 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index de5951a..75049ec 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -751,6 +751,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, LASSERT(!do_send || msg->msg_tx_delayed); LASSERT(!msg->msg_receiving); LASSERT(msg->msg_tx_committed); + /* can't get here if we're sending to the loopback interface */ + LASSERT(lp->lpni_nid != the_lnet.ln_loni->ni_nid); /* NB 'lp' is always the next hop */ if (!(msg->msg_target.pid & LNET_PID_USERFLAG) && @@ -1426,6 +1428,25 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, #define SRC_ANY_ROUTER_NMR_DST (SRC_ANY | REMOTE_DST | NMR_DST) static int +lnet_handle_lo_send(struct lnet_send_data *sd) +{ + struct lnet_msg *msg = 
sd->sd_msg; + int cpt = sd->sd_cpt; + + /* No send credit hassles with LOLND */ + lnet_ni_addref_locked(the_lnet.ln_loni, cpt); + msg->msg_hdr.dest_nid = cpu_to_le64(the_lnet.ln_loni->ni_nid); + if (!msg->msg_routing) + msg->msg_hdr.src_nid = + cpu_to_le64(the_lnet.ln_loni->ni_nid); + msg->msg_target.nid = the_lnet.ln_loni->ni_nid; + lnet_msg_commit(msg, cpt); + msg->msg_txni = the_lnet.ln_loni; + + return LNET_CREDIT_OK; +} + +static int lnet_handle_send(struct lnet_send_data *sd) { struct lnet_ni *best_ni = sd->sd_best_ni; @@ -1733,7 +1754,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, sd->sd_best_ni->ni_net->net_id); } - if (sd->sd_best_lpni) + if (sd->sd_best_lpni && + sd->sd_best_lpni->lpni_nid == the_lnet.ln_loni->ni_nid) + return lnet_handle_lo_send(sd); + else if (sd->sd_best_lpni) return lnet_handle_send(sd); CERROR("can't send to %s. no NI on %s\n", @@ -2074,7 +2098,15 @@ struct lnet_ni * * try and see if we can reach it over another routed * network */ - if (sd->sd_best_lpni) { + if (sd->sd_best_lpni && + sd->sd_best_lpni->lpni_nid == the_lnet.ln_loni->ni_nid) { + /* in case we initially started with a routed + * destination, let's reset to local + */ + sd->sd_send_case &= ~REMOTE_DST; + sd->sd_send_case |= LOCAL_DST; + return lnet_handle_lo_send(sd); + } else if (sd->sd_best_lpni) { /* in case we initially started with a routed * destination, let's reset to local */ @@ -2284,19 +2316,12 @@ struct lnet_ni * * is no need to go through any selection. 
We can just shortcut * the entire process and send over lolnd */ + send_data.sd_msg = msg; + send_data.sd_cpt = cpt; if (LNET_NETTYP(LNET_NIDNET(dst_nid)) == LOLND) { - /* No send credit hassles with LOLND */ - lnet_ni_addref_locked(the_lnet.ln_loni, cpt); - msg->msg_hdr.dest_nid = cpu_to_le64(the_lnet.ln_loni->ni_nid); - if (!msg->msg_routing) - msg->msg_hdr.src_nid = - cpu_to_le64(the_lnet.ln_loni->ni_nid); - msg->msg_target.nid = the_lnet.ln_loni->ni_nid; - lnet_msg_commit(msg, cpt); - msg->msg_txni = the_lnet.ln_loni; + rc = lnet_handle_lo_send(&send_data); lnet_net_unlock(cpt); - - return LNET_CREDIT_OK; + return rc; } /* find an existing peer_ni, or create one and mark it as having been @@ -2376,7 +2401,6 @@ struct lnet_ni * send_case |= SND_RESP; /* assign parameters to the send_data */ - send_data.sd_msg = msg; send_data.sd_rtr_nid = rtr_nid; send_data.sd_src_nid = src_nid; send_data.sd_dst_nid = dst_nid; @@ -2387,7 +2411,6 @@ struct lnet_ni * send_data.sd_final_dst_lpni = lpni; send_data.sd_peer = peer; send_data.sd_md_cpt = md_cpt; - send_data.sd_cpt = cpt; send_data.sd_send_case = send_case; rc = lnet_handle_send_case_locked(&send_data); From patchwork Thu Feb 27 21:13:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410227 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CE850138D for ; Thu, 27 Feb 2020 21:33:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B7136246A1 for ; Thu, 27 Feb 2020 21:33:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B7136246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none 
dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B2A98349C0D; Thu, 27 Feb 2020 13:27:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8A89F21FF0B for ; Thu, 27 Feb 2020 13:19:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3257A8A80; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 30B6646D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:17 -0500 Message-Id: <1582838290-17243-330-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 329/622] lnet: remove route add restriction X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Remove the restriction on adding routes to the same remote network via two different gateways. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10153 Lustre-commit: 79ea6af86f57 ("LU-10153 lnet: remove route add restriction") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33447 Reviewed-by: Sonia Sharma Reviewed-by: Chris Horn Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 - net/lnet/lnet/api-ni.c | 10 --------- net/lnet/lnet/router.c | 49 ------------------------------------------- 3 files changed, 60 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 10922ae..534be2a 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -521,7 +521,6 @@ void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive, time64_t when); int lnet_add_route(u32 net, u32 hops, lnet_nid_t gateway_nid, unsigned int priority); -int lnet_check_routes(void); int lnet_del_route(u32 net, lnet_nid_t gw_nid); void lnet_destroy_routes(void); int lnet_get_route(int idx, u32 *net, u32 *hops, diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index aeb9d92..d27e9a4 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -2491,10 +2491,6 @@ void lnet_lib_exit(void) if (rc) goto err_shutdown_lndnis; - rc = lnet_check_routes(); - if (rc) - goto err_destroy_routes; - rc = lnet_rtrpools_alloc(im_a_router); if (rc) goto err_destroy_routes; @@ -3449,12 +3445,6 @@ u32 lnet_get_dlc_seq_locked(void) config->cfg_config_u.cfg_route.rtr_hop, config->cfg_nid, config->cfg_config_u.cfg_route.rtr_priority); - if (!rc) { - rc = lnet_check_routes(); - if (rc) - lnet_del_route(config->cfg_net, - config->cfg_nid); - } mutex_unlock(&the_lnet.ln_api_mutex); return rc; diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 78a8659..c00b9251 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -427,55 +427,6 @@ static void lnet_shuffle_seed(void) } int -lnet_check_routes(void) -{ - struct lnet_remotenet *rnet; - struct 
lnet_route *route; - struct lnet_route *route2; - int cpt; - struct list_head *rn_list; - int i; - - cpt = lnet_net_lock_current(); - - for (i = 0; i < LNET_REMOTE_NETS_HASH_SIZE; i++) { - rn_list = &the_lnet.ln_remote_nets_hash[i]; - list_for_each_entry(rnet, rn_list, lrn_list) { - route2 = NULL; - list_for_each_entry(route, &rnet->lrn_routes, lr_list) { - lnet_nid_t nid1; - lnet_nid_t nid2; - int net; - - if (!route2) { - route2 = route; - continue; - } - - if (route->lr_gateway->lpni_net == - route2->lr_gateway->lpni_net) - continue; - - nid1 = route->lr_gateway->lpni_nid; - nid2 = route2->lr_gateway->lpni_nid; - net = rnet->lrn_net; - - lnet_net_unlock(cpt); - - CERROR("Routes to %s via %s and %s not supported\n", - libcfs_net2str(net), - libcfs_nid2str(nid1), - libcfs_nid2str(nid2)); - return -EINVAL; - } - } - } - - lnet_net_unlock(cpt); - return 0; -} - -int lnet_del_route(u32 net, lnet_nid_t gw_nid) { struct lnet_peer_ni *gateway; From patchwork Thu Feb 27 21:13:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410855 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D68B6924 for ; Thu, 27 Feb 2020 21:48:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BF43624690 for ; Thu, 27 Feb 2020 21:48:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BF43624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E886434B7DA; Thu, 27 Feb 2020 13:38:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7A74B21FAC8 for ; Thu, 27 Feb 2020 13:21:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3669DA16A; Thu, 27 Feb 2020 16:18:22 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3378046F; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:18 -0500 Message-Id: <1582838290-17243-331-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 330/622] lnet: Discover routers on first use X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Discover routers on first use. This brings the behavior when interacting with routers in line with when dealing with normal peers. 
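For illustration only, the "discover on first use" flow can be sketched as below. This is a simplified stand-alone model and not the LNet code: struct peer, struct msg, DC_WAIT, park_for_discovery() and send_via_gateway() are hypothetical stand-ins for the peer, message, LNET_DC_WAIT and pending-discovery queue handled in the diff. The sketch assumes a message sent through a gateway that has not been discovered yet is parked on that gateway's pending queue, that a positive return code reports the delay, and that the route sequence number advances only once the route is really used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not the LNet types. */
#define DC_WAIT 1	/* positive: message parked, not an error */

struct msg {
	struct msg *next;
};

struct peer {
	bool uptodate;		/* has discovery completed? */
	struct msg *pendq_head;	/* messages waiting for discovery */
	struct msg *pendq_tail;
	int route_seq;		/* advanced only when the route is used */
};

/* Append msg to the peer's pending queue and report the delay. */
static int park_for_discovery(struct peer *p, struct msg *m)
{
	assert(p && m);

	m->next = NULL;
	if (p->pendq_tail)
		p->pendq_tail->next = m;
	else
		p->pendq_head = m;
	p->pendq_tail = m;

	return DC_WAIT;
}

/* Return 0 if the message can go out now, DC_WAIT if it was parked. */
static int send_via_gateway(struct peer *gw, struct msg *m)
{
	if (!gw->uptodate)
		return park_for_discovery(gw, m);

	/* only now is it certain that this route will be used */
	gw->route_seq++;
	return 0;
}
```

Because the delayed case returns a positive value, a caller of such a helper has to stop on any non-zero return code rather than only on negative ones, which matches the "if (rc < 0)" to "if (rc)" changes in the diff.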
WC-bug-id: https://jira.whamcloud.com/browse/LU-11292 Lustre-commit: c7f8215d74a2 ("LU-11292 lnet: Discover routers on first use") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33182 Reviewed-by: Chris Horn Reviewed-by: Sebastien Buisson Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 101 +++++++++++++++++++++++++++++++---------------- 1 file changed, 67 insertions(+), 34 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 75049ec..e080580 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1224,7 +1224,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, static struct lnet_peer_ni * lnet_find_route_locked(struct lnet_net *net, u32 remote_net, - lnet_nid_t rtr_nid) + lnet_nid_t rtr_nid, struct lnet_route **use_route, + struct lnet_route **prev_route) { struct lnet_remotenet *rnet; struct lnet_route *route; @@ -1276,13 +1277,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, lpni_best = lp; } - /* - * set sequence number on the best router to the latest sequence + 1 - * so we can round-robin all routers, it's race and inaccurate but - * harmless and functional - */ - if (best_route) - best_route->lr_seq = last_route->lr_seq + 1; + if (best_route) { + *use_route = best_route; + *prev_route = last_route; + } return lpni_best; } @@ -1798,16 +1796,52 @@ struct lnet_ni * } static int +lnet_initiate_peer_discovery(struct lnet_peer_ni *lpni, + struct lnet_msg *msg, lnet_nid_t rtr_nid, + int cpt) +{ + struct lnet_peer *peer; + lnet_nid_t primary_nid; + int rc; + + lnet_peer_ni_addref_locked(lpni); + + rc = lnet_discover_peer_locked(lpni, cpt, false); + if (rc) { + lnet_peer_ni_decref_locked(lpni); + return rc; + } + /* The peer may have changed. 
*/ + peer = lpni->lpni_peer_net->lpn_peer; + /* queue message and return */ + msg->msg_rtr_nid_param = rtr_nid; + msg->msg_sending = 0; + msg->msg_txpeer = NULL; + spin_lock(&peer->lp_lock); + list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); + spin_unlock(&peer->lp_lock); + lnet_peer_ni_decref_locked(lpni); + primary_nid = peer->lp_primary_nid; + + CDEBUG(D_NET, "msg %p delayed. %s pending discovery\n", + msg, libcfs_nid2str(primary_nid)); + + return LNET_DC_WAIT; +} + +static int lnet_handle_find_routed_path(struct lnet_send_data *sd, lnet_nid_t dst_nid, struct lnet_peer_ni **gw_lpni, struct lnet_peer **gw_peer) { + struct lnet_route *best_route = NULL; + struct lnet_route *last_route = NULL; struct lnet_peer_ni *gw; lnet_nid_t src_nid = sd->sd_src_nid; gw = lnet_find_route_locked(NULL, LNET_NIDNET(dst_nid), - sd->sd_rtr_nid); + sd->sd_rtr_nid, &best_route, &last_route); if (!gw) { CERROR("no route to %s from %s\n", libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); @@ -1820,6 +1854,17 @@ struct lnet_ni * *gw_peer = gw->lpni_peer_net->lpn_peer; + /* Discover this gateway if it hasn't already been discovered. 
+ * This means we might delay the message until discovery has + * completed + */ + if (lnet_msg_discovery(sd->sd_msg) && + !lnet_peer_is_uptodate(*gw_peer)) { + sd->sd_msg->msg_src_nid_param = sd->sd_src_nid; + return lnet_initiate_peer_discovery(gw, sd->sd_msg, + sd->sd_rtr_nid, sd->sd_cpt); + } + if (!sd->sd_best_ni) sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, *gw_peer, @@ -1853,6 +1898,12 @@ struct lnet_ni * *gw_lpni = gw; + /* increment the route sequence number since now we're sure we're + * going to use it + */ + LASSERT(best_route && last_route); + best_route->lr_seq = last_route->lr_seq + 1; + return 0; } @@ -1889,7 +1940,7 @@ struct lnet_ni * rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, &gw_peer); - if (rc < 0) + if (rc) return rc; if (sd->sd_send_case & NMR_DST) @@ -2165,6 +2216,8 @@ struct lnet_ni * CERROR("Can't send response to %s. No route available\n", libcfs_nid2str(sd->sd_dst_nid)); return -EHOSTUNREACH; + } else if (rc > 0) { + return rc; } sd->sd_best_lpni = gw; @@ -2192,7 +2245,7 @@ struct lnet_ni * */ rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, &gw_peer); - if (rc < 0) + if (rc) return rc; sd->sd_send_case &= ~LOCAL_DST; @@ -2228,7 +2281,7 @@ struct lnet_ni * */ rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, &gw_peer); - if (rc < 0) + if (rc) return rc; /* set the best_ni we've chosen as the preferred one for @@ -2348,30 +2401,10 @@ struct lnet_ni * */ peer = lpni->lpni_peer_net->lpn_peer; if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { - lnet_nid_t primary_nid; - - rc = lnet_discover_peer_locked(lpni, cpt, false); - if (rc) { - lnet_peer_ni_decref_locked(lpni); - lnet_net_unlock(cpt); - return rc; - } - /* The peer may have changed. 
*/ - peer = lpni->lpni_peer_net->lpn_peer; - /* queue message and return */ - msg->msg_rtr_nid_param = rtr_nid; - msg->msg_sending = 0; - spin_lock(&peer->lp_lock); - list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); - spin_unlock(&peer->lp_lock); + rc = lnet_initiate_peer_discovery(lpni, msg, rtr_nid, cpt); lnet_peer_ni_decref_locked(lpni); - primary_nid = peer->lp_primary_nid; lnet_net_unlock(cpt); - - CDEBUG(D_NET, "%s pending discovery\n", - libcfs_nid2str(primary_nid)); - - return LNET_DC_WAIT; + return rc; } lnet_peer_ni_decref_locked(lpni); From patchwork Thu Feb 27 21:13:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410231 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2CFB892A for ; Thu, 27 Feb 2020 21:33:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15343246A1 for ; Thu, 27 Feb 2020 21:33:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15343246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E6012348B81; Thu, 27 Feb 2020 13:28:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E3F8821FCD3 for ; Thu, 27 Feb 2020 13:19:59 -0800 (PST) 
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 379C88A81; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 365C2468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:19 -0500 Message-Id: <1582838290-17243-332-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 331/622] lnet: use peer for gateway X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata The routing code uses peer_ni for a gateway. However, with Multi-Rail, a gateway could have multiple interfaces on several different networks. Instead of using a single peer_ni as the gateway we should be using the peer and let the MR selection code select the best peer_ni to send to. This patch moves the gateway from peer_ni to peer. Much of the code needs to be rewritten in the following patches to account for that change. This patch disables the routing features by disabling the code to add/delete routes.
The asymmetric routing detection feature is also modified to use the MR routing WC-bug-id: https://jira.whamcloud.com/browse/LU-11298 Lustre-commit: 53f7b8b7a228 ("LU-11298 lnet: use peer for gateway") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33183 Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 19 +- include/linux/lnet/lib-types.h | 46 +-- net/lnet/lnet/lib-move.c | 215 +++++++----- net/lnet/lnet/peer.c | 17 +- net/lnet/lnet/router.c | 720 ++--------------------------------------- net/lnet/lnet/router_proc.c | 31 +- 6 files changed, 230 insertions(+), 818 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 534be2a..80f6f8c 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -92,15 +92,12 @@ static inline int lnet_is_route_alive(struct lnet_route *route) { - /* gateway is down */ - if (!route->lr_gateway->lpni_alive) - return 0; - /* no NI status, assume it's alive */ - if ((route->lr_gateway->lpni_ping_feats & - LNET_PING_FEAT_NI_STATUS) == 0) - return 1; - /* has NI status, check # down NIs */ - return route->lr_downis == 0; + /* TODO re-implement gateway alive indication */ + CDEBUG(D_NET, "TODO: reimplement routing. gateway = %s\n", + route->lr_gateway ? + libcfs_nid2str(route->lr_gateway->lp_primary_nid) : + "undefined"); + return 1; } static inline int lnet_is_wire_handle_none(struct lnet_handle_wire *wh) @@ -402,9 +399,9 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, } static inline int -lnet_isrouter(struct lnet_peer_ni *lp) +lnet_isrouter(struct lnet_peer_ni *lpni) { - return lp->lpni_rtr_refcount ? 1 : 0; + return lpni->lpni_peer_net->lpn_peer->lp_rtr_refcount ? 
1 : 0; } static inline void diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index b1a6f6a..31fe22a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -534,20 +534,21 @@ struct lnet_peer_ni { struct list_head lpni_hashlist; /* messages blocking for tx credits */ struct list_head lpni_txq; - /* messages blocking for router credits */ - struct list_head lpni_rtrq; - /* chain on router list */ - struct list_head lpni_rtr_list; + /* pointer to peer net I'm part of */ + struct lnet_peer_net *lpni_peer_net; /* statistics kept on each peer NI */ struct lnet_element_stats lpni_stats; struct lnet_health_remote_stats lpni_hstats; - /* spin lock protecting credits and lpni_txq / lpni_rtrq */ + /* spin lock protecting credits and lpni_txq */ spinlock_t lpni_lock; /* # tx credits available */ int lpni_txcredits; - struct lnet_peer_net *lpni_peer_net; /* low water mark */ int lpni_mintxcredits; + /* + * Each peer_ni in a gateway maintains its own credits. This + * allows more traffic to gateways that have multiple interfaces. 
+ */ /* # router credits */ int lpni_rtrcredits; /* low water mark */ @@ -560,18 +561,12 @@ struct lnet_peer_ni { bool lpni_notifylnd; /* some thread is handling notification */ bool lpni_notifying; - /* SEND event outstanding from ping */ - unsigned int lpni_ping_notsent; /* # times router went dead<->alive */ int lpni_alive_count; /* ytes queued for sending */ long lpni_txqnob; /* time of last aliveness news */ time64_t lpni_timestamp; - /* time of last ping attempt */ - time64_t lpni_ping_timestamp; - /* != 0 if ping reply expected */ - time64_t lpni_ping_deadline; /* when I was last alive */ time64_t lpni_last_alive; /* when lpni_ni was queried last time */ @@ -590,18 +585,12 @@ struct lnet_peer_ni { int lpni_cpt; /* state flags -- protected by lpni_lock */ unsigned int lpni_state; - /* # refs from lnet_route::lr_gateway */ - int lpni_rtr_refcount; /* sequence number used to round robin over peer nis within a net */ u32 lpni_seq; /* sequence number used to round robin over gateways */ u32 lpni_gw_seq; - /* health flag */ - bool lpni_healthy; /* returned RC ping features. 
Protected with lpni_lock */ unsigned int lpni_ping_feats; - /* routers on this peer */ - struct list_head lpni_routes; /* preferred local nids: if only one, use lpni_pref.nid */ union lpni_pref { lnet_nid_t nid; @@ -632,6 +621,9 @@ struct lnet_peer { /* list of messages pending discovery*/ struct list_head lp_dc_pendq; + /* chain on router list */ + struct list_head lp_rtr_list; + /* primary NID of the peer */ lnet_nid_t lp_primary_nid; @@ -641,10 +633,22 @@ struct lnet_peer { /* number of NIDs on this peer */ int lp_nnis; + /* # refs from lnet_route_t::lr_gateway */ + int lp_rtr_refcount; + + /* messages blocking for router credits */ + struct list_head lp_rtrq; + + /* routes on this peer */ + struct list_head lp_routes; + + /* time of last router check attempt */ + time64_t lp_rtrcheck_timestamp; + /* reference count */ atomic_t lp_refcount; - /* lock protecting peer state flags */ + /* lock protecting peer state flags and lpni_rtrq */ spinlock_t lp_lock; /* peer state flags */ @@ -808,9 +812,11 @@ struct lnet_route { /* chain on gateway */ struct list_head lr_gwlist; /* router node */ - struct lnet_peer_ni *lr_gateway; + struct lnet_peer *lr_gateway; /* remote network number */ u32 lr_net; + /* local network number */ + u32 lr_lnet; /* sequence for round-robin */ int lr_seq; /* number of down NIs */ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index e080580..99ff882 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -877,7 +877,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * I return LNET_CREDIT_WAIT if msg blocked and LNET_CREDIT_OK if * received or OK to receive */ - struct lnet_peer_ni *lp = msg->msg_rxpeer; + struct lnet_peer_ni *lpni = msg->msg_rxpeer; + struct lnet_peer *lp; struct lnet_rtrbufpool *rbp; struct lnet_rtrbuf *rb; @@ -887,29 +888,36 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, LASSERT(msg->msg_routing); LASSERT(msg->msg_receiving); 
LASSERT(!msg->msg_sending); + LASSERT(lpni->lpni_peer_net); + LASSERT(lpni->lpni_peer_net->lpn_peer); + + lp = lpni->lpni_peer_net->lpn_peer; /* non-lnet_parse callers only receive delayed messages */ LASSERT(!do_recv || msg->msg_rx_delayed); if (!msg->msg_peerrtrcredit) { - spin_lock(&lp->lpni_lock); - LASSERT((lp->lpni_rtrcredits < 0) == - !list_empty(&lp->lpni_rtrq)); + /* lpni_lock protects the credit manipulation */ + spin_lock(&lpni->lpni_lock); + /* lp_lock protects the lp_rtrq */ + spin_lock(&lp->lp_lock); msg->msg_peerrtrcredit = 1; - lp->lpni_rtrcredits--; - if (lp->lpni_rtrcredits < lp->lpni_minrtrcredits) - lp->lpni_minrtrcredits = lp->lpni_rtrcredits; + lpni->lpni_rtrcredits--; + if (lpni->lpni_rtrcredits < lpni->lpni_minrtrcredits) + lpni->lpni_minrtrcredits = lpni->lpni_rtrcredits; - if (lp->lpni_rtrcredits < 0) { + if (lpni->lpni_rtrcredits < 0) { /* must have checked eager_recv before here */ LASSERT(msg->msg_rx_ready_delay); msg->msg_rx_delayed = 1; - list_add_tail(&msg->msg_list, &lp->lpni_rtrq); - spin_unlock(&lp->lpni_lock); + list_add_tail(&msg->msg_list, &lp->lp_rtrq); + spin_unlock(&lp->lp_lock); + spin_unlock(&lpni->lpni_lock); return LNET_CREDIT_WAIT; } - spin_unlock(&lp->lpni_lock); + spin_unlock(&lp->lp_lock); + spin_unlock(&lpni->lpni_lock); } rbp = lnet_msg2bufpool(msg); @@ -1080,7 +1088,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, void lnet_return_rx_credits_locked(struct lnet_msg *msg) { - struct lnet_peer_ni *rxpeer = msg->msg_rxpeer; + struct lnet_peer_ni *rxpeerni = msg->msg_rxpeer; + struct lnet_peer *lp; struct lnet_ni *rxni = msg->msg_rxni; struct lnet_msg *msg2; @@ -1135,44 +1144,69 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, routing_off: if (msg->msg_peerrtrcredit) { + LASSERT(rxpeerni); + LASSERT(rxpeerni->lpni_peer_net); + LASSERT(rxpeerni->lpni_peer_net->lpn_peer); + + lp = rxpeerni->lpni_peer_net->lpn_peer; + /* give back peer router credits */ 
msg->msg_peerrtrcredit = 0; - spin_lock(&rxpeer->lpni_lock); - LASSERT((rxpeer->lpni_rtrcredits < 0) == - !list_empty(&rxpeer->lpni_rtrq)); + spin_lock(&rxpeerni->lpni_lock); + spin_lock(&lp->lp_lock); - rxpeer->lpni_rtrcredits++; - /* - * drop all messages which are queued to be routed on that + rxpeerni->lpni_rtrcredits++; + + /* drop all messages which are queued to be routed on that * peer. */ if (!the_lnet.ln_routing) { LIST_HEAD(drop); - list_splice_init(&rxpeer->lpni_rtrq, &drop); - spin_unlock(&rxpeer->lpni_lock); + list_splice_init(&lp->lp_rtrq, &drop); + spin_unlock(&lp->lp_lock); + spin_unlock(&rxpeerni->lpni_lock); lnet_drop_routed_msgs_locked(&drop, msg->msg_rx_cpt); - } else if (rxpeer->lpni_rtrcredits <= 0) { - msg2 = list_first_entry(&rxpeer->lpni_rtrq, + } else if (!list_empty(&lp->lp_rtrq)) { + int msg2_cpt; + + msg2 = list_first_entry(&lp->lp_rtrq, struct lnet_msg, msg_list); list_del(&msg2->msg_list); - spin_unlock(&rxpeer->lpni_lock); + msg2_cpt = msg2->msg_rx_cpt; + spin_unlock(&lp->lp_lock); + spin_unlock(&rxpeerni->lpni_lock); + /* messages on the lp_rtrq can be from any NID in + * the peer, which means they might have different + * cpts. We need to make sure we lock the right + * one. 
+ */ + if (msg2_cpt != msg->msg_rx_cpt) { + lnet_net_unlock(msg->msg_rx_cpt); + lnet_net_lock(msg2_cpt); + } (void)lnet_post_routed_recv_locked(msg2, 1); + if (msg2_cpt != msg->msg_rx_cpt) { + lnet_net_unlock(msg2_cpt); + lnet_net_lock(msg->msg_rx_cpt); + } } else { - spin_unlock(&rxpeer->lpni_lock); + spin_unlock(&lp->lp_lock); + spin_unlock(&rxpeerni->lpni_lock); } } if (rxni) { msg->msg_rxni = NULL; lnet_ni_decref_locked(rxni, msg->msg_rx_cpt); } - if (rxpeer) { + if (rxpeerni) { msg->msg_rxpeer = NULL; - lnet_peer_ni_decref_locked(rxpeer); + lnet_peer_ni_decref_locked(rxpeerni); } } +#if 0 static int lnet_compare_peers(struct lnet_peer_ni *p1, struct lnet_peer_ni *p2) { @@ -1190,15 +1224,18 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return 0; } +#endif static int lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2) { + /* TODO re-implement gateway comparison struct lnet_peer_ni *p1 = r1->lr_gateway; struct lnet_peer_ni *p2 = r2->lr_gateway; + */ int r1_hops = (r1->lr_hops == LNET_UNDEFINED_HOPS) ? 1 : r1->lr_hops; int r2_hops = (r2->lr_hops == LNET_UNDEFINED_HOPS) ? 
1 : r2->lr_hops; - int rc; + /*int rc;*/ if (r1->lr_priority < r2->lr_priority) return 1; @@ -1212,9 +1249,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (r1_hops > r2_hops) return -1; + /* rc = lnet_compare_peers(p1, p2); if (rc) return rc; + */ if (r1->lr_seq - r2->lr_seq <= 0) return 1; @@ -1222,17 +1261,17 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return -1; } -static struct lnet_peer_ni * +/* TODO: lnet_find_route_locked() needs to be reimplemented */ +static struct lnet_route * lnet_find_route_locked(struct lnet_net *net, u32 remote_net, - lnet_nid_t rtr_nid, struct lnet_route **use_route, - struct lnet_route **prev_route) + lnet_nid_t rtr_nid, struct lnet_route **prev_route) { struct lnet_remotenet *rnet; struct lnet_route *route; struct lnet_route *best_route; struct lnet_route *last_route; - struct lnet_peer_ni *lpni_best; - struct lnet_peer_ni *lp; + struct lnet_peer *lp_best; + struct lnet_peer *lp; int rc; /* @@ -1243,7 +1282,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (!rnet) return NULL; - lpni_best = NULL; + lp_best = NULL; best_route = NULL; last_route = NULL; list_for_each_entry(route, &rnet->lrn_routes, lr_list) { @@ -1252,16 +1291,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (!lnet_is_route_alive(route)) continue; - if (net && lp->lpni_net != net) - continue; - - if (lp->lpni_nid == rtr_nid) /* it's pre-determined router */ - return lp; - - if (!lpni_best) { + if (!lp_best) { best_route = route; last_route = route; - lpni_best = lp; + lp_best = lp; continue; } @@ -1274,14 +1307,12 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, continue; best_route = route; - lpni_best = lp; + lp_best = lp; } - if (best_route) { - *use_route = best_route; - *prev_route = last_route; - } - return lpni_best; + *prev_route = last_route; + + return best_route; } static 
struct lnet_ni * @@ -1835,60 +1866,80 @@ struct lnet_ni * struct lnet_peer_ni **gw_lpni, struct lnet_peer **gw_peer) { - struct lnet_route *best_route = NULL; - struct lnet_route *last_route = NULL; - struct lnet_peer_ni *gw; + struct lnet_peer *gw; + struct lnet_route *best_route; + struct lnet_route *last_route; + struct lnet_peer_ni *lpni = NULL; lnet_nid_t src_nid = sd->sd_src_nid; - gw = lnet_find_route_locked(NULL, LNET_NIDNET(dst_nid), - sd->sd_rtr_nid, &best_route, &last_route); - if (!gw) { + best_route = lnet_find_route_locked(NULL, LNET_NIDNET(dst_nid), + sd->sd_rtr_nid, &last_route); + if (!best_route) { CERROR("no route to %s from %s\n", libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); return -EHOSTUNREACH; } - /* get the peer of the gw_ni */ - LASSERT(gw->lpni_peer_net); - LASSERT(gw->lpni_peer_net->lpn_peer); - - *gw_peer = gw->lpni_peer_net->lpn_peer; + gw = best_route->lr_gateway; + *gw_peer = gw; /* Discover this gateway if it hasn't already been discovered. * This means we might delay the message until discovery has * completed */ +#if 0 + /* TODO: disable discovey for now */ if (lnet_msg_discovery(sd->sd_msg) && !lnet_peer_is_uptodate(*gw_peer)) { sd->sd_msg->msg_src_nid_param = sd->sd_src_nid; return lnet_initiate_peer_discovery(gw, sd->sd_msg, sd->sd_rtr_nid, sd->sd_cpt); } +#endif - if (!sd->sd_best_ni) - sd->sd_best_ni = - lnet_find_best_ni_on_spec_net(NULL, *gw_peer, - gw->lpni_peer_net, - sd->sd_md_cpt, - true); + if (!sd->sd_best_ni) { + struct lnet_peer_net *lpeer; + lpeer = lnet_peer_get_net_locked(gw, best_route->lr_lnet); + sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, gw, lpeer, + sd->sd_md_cpt, + true); + } if (!sd->sd_best_ni) { CERROR("Internal Error. 
Expected local ni on %s but non found :%s\n", - libcfs_net2str(gw->lpni_peer_net->lpn_net_id), + libcfs_net2str(best_route->lr_lnet), libcfs_nid2str(sd->sd_src_nid)); return -EFAULT; } /* if gw is MR let's find its best peer_ni */ - if (lnet_peer_is_multi_rail(*gw_peer)) { - gw = lnet_find_best_lpni_on_net(sd, *gw_peer, - sd->sd_best_ni->ni_net->net_id); + if (lnet_peer_is_multi_rail(gw)) { + lpni = lnet_find_best_lpni_on_net(sd, gw, + sd->sd_best_ni->ni_net->net_id); /* We've already verified that the gw has an NI on that * desired net, but we're not finding it. Something is * wrong. */ - if (!gw) { + if (!lpni) { + CERROR("Internal Error. Route expected to %s from %s\n", + libcfs_nid2str(dst_nid), + libcfs_nid2str(src_nid)); + return -EFAULT; + } + } else { + struct lnet_peer_net *lpn; + + lpn = lnet_peer_get_net_locked(gw, best_route->lr_lnet); + if (!lpn) { + CERROR("Internal Error. Route expected to %s from %s\n", + libcfs_nid2str(dst_nid), + libcfs_nid2str(src_nid)); + return -EFAULT; + } + lpni = list_entry(lpn->lpn_peer_nis.next, struct lnet_peer_ni, + lpni_peer_nis); + if (!lpni) { CERROR("Internal Error. 
Route expected to %s from %s\n", libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); @@ -1896,7 +1947,7 @@ struct lnet_ni * } } - *gw_lpni = gw; + *gw_lpni = lpni; /* increment the route sequence number since now we're sure we're * going to use it @@ -4046,17 +4097,23 @@ void lnet_monitor_thr_stop(void) rnet = lnet_find_rnet_locked(LNET_NIDNET(src_nid)); if (rnet) { - struct lnet_peer_ni *gw = NULL; + struct lnet_peer *gw = NULL; + struct lnet_peer_ni *lpni = NULL; struct lnet_route *route; list_for_each_entry(route, &rnet->lrn_routes, lr_list) { found = false; gw = route->lr_gateway; - if (gw->lpni_net != net) + if (route->lr_lnet != net->net_id) continue; - if (gw->lpni_nid == from_nid) { - found = true; - break; + /* if the nid is one of the gateway's NIDs + * then this is a valid gateway + */ + while ((lpni = lnet_get_next_peer_ni_locked(gw, NULL, lpni)) != NULL) { + if (lpni->lpni_nid == from_nid) { + found = true; + break; + } } } } @@ -4773,9 +4830,11 @@ struct lnet_msg * LASSERT(shortest); hops = shortest_hops; if (srcnidp) { - ni = lnet_get_next_ni_locked( - shortest->lr_gateway->lpni_net, - NULL); + struct lnet_net *net; + + net = lnet_get_net_locked(shortest->lr_lnet); + LASSERT(net); + ni = lnet_get_next_ni_locked(net, NULL); *srcnidp = ni->ni_nid; } if (orderp) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 0d2d356..faaf94a 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -120,8 +120,6 @@ return NULL; INIT_LIST_HEAD(&lpni->lpni_txq); - INIT_LIST_HEAD(&lpni->lpni_rtrq); - INIT_LIST_HEAD(&lpni->lpni_routes); INIT_LIST_HEAD(&lpni->lpni_hashlist); INIT_LIST_HEAD(&lpni->lpni_peer_nis); INIT_LIST_HEAD(&lpni->lpni_recovery); @@ -206,10 +204,13 @@ if (!lp) return NULL; + INIT_LIST_HEAD(&lp->lp_rtrq); + INIT_LIST_HEAD(&lp->lp_routes); INIT_LIST_HEAD(&lp->lp_peer_list); INIT_LIST_HEAD(&lp->lp_peer_nets); INIT_LIST_HEAD(&lp->lp_dc_list); INIT_LIST_HEAD(&lp->lp_dc_pendq); + INIT_LIST_HEAD(&lp->lp_rtr_list); 
init_waitqueue_head(&lp->lp_dc_waitq); spin_lock_init(&lp->lp_lock); lp->lp_primary_nid = nid; @@ -235,6 +236,7 @@ CDEBUG(D_NET, "%p nid %s\n", lp, libcfs_nid2str(lp->lp_primary_nid)); LASSERT(atomic_read(&lp->lp_refcount) == 0); + LASSERT(lp->lp_rtr_refcount == 0); LASSERT(list_empty(&lp->lp_peer_nets)); LASSERT(list_empty(&lp->lp_peer_list)); LASSERT(list_empty(&lp->lp_dc_list)); @@ -324,7 +326,7 @@ struct lnet_peer_table *ptable = NULL; /* don't remove a peer_ni if it's also a gateway */ - if (lpni->lpni_rtr_refcount > 0) { + if (lnet_isrouter(lpni)) { CERROR("Peer NI %s is a gateway. Can not delete it\n", libcfs_nid2str(lpni->lpni_nid)); return -EBUSY; @@ -570,7 +572,7 @@ void lnet_peer_uninit(void) { struct lnet_peer_ni *lp; struct lnet_peer_ni *tmp; - lnet_nid_t lpni_nid; + lnet_nid_t gw_nid; int i; for (i = 0; i < LNET_PEER_HASH_SIZE; i++) { @@ -579,13 +581,13 @@ void lnet_peer_uninit(void) if (net != lp->lpni_net) continue; - if (!lp->lpni_rtr_refcount) + if (!lnet_isrouter(lp)) continue; - lpni_nid = lp->lpni_nid; + gw_nid = lp->lpni_peer_net->lpn_peer->lp_primary_nid; lnet_net_unlock(LNET_LOCK_EX); - lnet_del_route(LNET_NIDNET(LNET_NID_ANY), lpni_nid); + lnet_del_route(LNET_NIDNET(LNET_NID_ANY), gw_nid); lnet_net_lock(LNET_LOCK_EX); } } @@ -1567,7 +1569,6 @@ struct lnet_peer_net * CDEBUG(D_NET, "%p nid %s\n", lpni, libcfs_nid2str(lpni->lpni_nid)); LASSERT(atomic_read(&lpni->lpni_refcount) == 0); - LASSERT(lpni->lpni_rtr_refcount == 0); LASSERT(list_empty(&lpni->lpni_txq)); LASSERT(lpni->lpni_txqnob == 0); LASSERT(list_empty(&lpni->lpni_peer_nis)); diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index c00b9251..4e79c21 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -114,7 +114,6 @@ spin_lock(&lp->lpni_lock); lp->lpni_timestamp = when; /* update timestamp */ - lp->lpni_ping_deadline = 0; /* disable ping timeout */ if (lp->lpni_alive_count && /* got old news */ (!lp->lpni_alive) == (!alive)) { /* new date for old news */ @@ 
-191,58 +190,6 @@ spin_unlock(&lp->lpni_lock); } -static void -lnet_rtr_addref_locked(struct lnet_peer_ni *lp) -{ - LASSERT(atomic_read(&lp->lpni_refcount) > 0); - LASSERT(lp->lpni_rtr_refcount >= 0); - - /* lnet_net_lock must be exclusively locked */ - lp->lpni_rtr_refcount++; - if (lp->lpni_rtr_refcount == 1) { - struct list_head *pos; - - /* a simple insertion sort */ - list_for_each_prev(pos, &the_lnet.ln_routers) { - struct lnet_peer_ni *rtr; - - rtr = list_entry(pos, struct lnet_peer_ni, - lpni_rtr_list); - if (rtr->lpni_nid < lp->lpni_nid) - break; - } - - list_add(&lp->lpni_rtr_list, pos); - /* addref for the_lnet.ln_routers */ - lnet_peer_ni_addref_locked(lp); - the_lnet.ln_routers_version++; - } -} - -static void -lnet_rtr_decref_locked(struct lnet_peer_ni *lp) -{ - LASSERT(atomic_read(&lp->lpni_refcount) > 0); - LASSERT(lp->lpni_rtr_refcount > 0); - - /* lnet_net_lock must be exclusively locked */ - lp->lpni_rtr_refcount--; - if (!lp->lpni_rtr_refcount) { - LASSERT(list_empty(&lp->lpni_routes)); - - if (lp->lpni_rcd) { - list_add(&lp->lpni_rcd->rcd_list, - &the_lnet.ln_rcd_deathrow); - lp->lpni_rcd = NULL; - } - - list_del(&lp->lpni_rtr_list); - /* decref for the_lnet.ln_routers */ - lnet_peer_ni_decref_locked(lp); - the_lnet.ln_routers_version++; - } -} - struct lnet_remotenet * lnet_find_rnet_locked(u32 net) { @@ -259,239 +206,24 @@ struct lnet_remotenet * return NULL; } -static void lnet_shuffle_seed(void) -{ - static int seeded; - struct lnet_ni *ni = NULL; - - if (seeded) - return; - - /* Nodes with small feet have little entropy - * the NID for this node gives the most entropy in the low bits */ - while ((ni = lnet_get_next_ni_locked(NULL, ni))) { - u32 lnd_type, seed; - - lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid)); - if (lnd_type != LOLND) { - seed = (LNET_NIDADDR(ni->ni_nid) | lnd_type); - add_device_randomness(&seed, sizeof(seed)); - } - } - - seeded = 1; -} - -/* NB expects LNET_LOCK held */ -static void -lnet_add_route_to_rnet(struct 
lnet_remotenet *rnet, struct lnet_route *route) -{ - unsigned int len = 0; - unsigned int offset = 0; - struct list_head *e; - - lnet_shuffle_seed(); - - list_for_each(e, &rnet->lrn_routes) { - len++; - } - - /* len+1 positions to add a new entry */ - offset = prandom_u32_max(len + 1); - list_for_each(e, &rnet->lrn_routes) { - if (!offset) - break; - offset--; - } - list_add(&route->lr_list, e); - list_add(&route->lr_gwlist, &route->lr_gateway->lpni_routes); - - the_lnet.ln_remote_nets_version++; - lnet_rtr_addref_locked(route->lr_gateway); -} - int lnet_add_route(u32 net, u32 hops, lnet_nid_t gateway, unsigned int priority) { - struct lnet_remotenet *rnet; - struct lnet_remotenet *rnet2; - struct lnet_route *route; - struct lnet_route *route2; - struct lnet_ni *ni; - struct lnet_peer_ni *lpni; - int add_route; - int rc; - - CDEBUG(D_NET, "Add route: net %s hops %d priority %u gw %s\n", - libcfs_net2str(net), hops, priority, libcfs_nid2str(gateway)); - - if (gateway == LNET_NID_ANY || - LNET_NETTYP(LNET_NIDNET(gateway)) == LOLND || - net == LNET_NIDNET(LNET_NID_ANY) || - LNET_NETTYP(net) == LOLND || - LNET_NIDNET(gateway) == net || - (hops != LNET_UNDEFINED_HOPS && (hops < 1 || hops > 255))) - return -EINVAL; - - if (lnet_islocalnet(net)) /* it's a local network */ - return -EEXIST; - - /* Assume net, route, all new */ - route = kzalloc(sizeof(*route), GFP_NOFS); - rnet = kzalloc(sizeof(*rnet), GFP_NOFS); - if (!route || !rnet) { - CERROR("Out of memory creating route %s %d %s\n", - libcfs_net2str(net), hops, libcfs_nid2str(gateway)); - kfree(route); - kfree(rnet); - return -ENOMEM; - } - - INIT_LIST_HEAD(&rnet->lrn_routes); - rnet->lrn_net = net; - route->lr_hops = hops; - route->lr_net = net; - route->lr_priority = priority; - - lnet_net_lock(LNET_LOCK_EX); - - lpni = lnet_nid2peerni_ex(gateway, LNET_LOCK_EX); - if (IS_ERR(lpni)) { - lnet_net_unlock(LNET_LOCK_EX); - - kfree(route); - kfree(rnet); - - rc = PTR_ERR(lpni); - if (rc == -EHOSTUNREACH) /* gateway is 
not on a local net */ - return rc; /* ignore the route entry */ - CERROR("Error %d creating route %s %d %s\n", rc, - libcfs_net2str(net), hops, - libcfs_nid2str(gateway)); - return rc; - } - route->lr_gateway = lpni; - LASSERT(the_lnet.ln_state == LNET_STATE_RUNNING); - - rnet2 = lnet_find_rnet_locked(net); - if (!rnet2) { - /* new network */ - list_add_tail(&rnet->lrn_list, lnet_net2rnethash(net)); - rnet2 = rnet; - } - - /* Search for a duplicate route (it's a NOOP if it is) */ - add_route = 1; - list_for_each_entry(route2, &rnet2->lrn_routes, lr_list) { - if (route2->lr_gateway == route->lr_gateway) { - add_route = 0; - break; - } - - /* our lookups must be true */ - LASSERT(route2->lr_gateway->lpni_nid != gateway); - } - - if (add_route) { - lnet_peer_ni_addref_locked(route->lr_gateway); /* +1 for notify */ - lnet_add_route_to_rnet(rnet2, route); - - ni = lnet_get_next_ni_locked(route->lr_gateway->lpni_net, NULL); - lnet_net_unlock(LNET_LOCK_EX); - - /* XXX Assume alive */ - if (ni->ni_net->net_lnd->lnd_notify) - ni->ni_net->net_lnd->lnd_notify(ni, gateway, 1); - - lnet_net_lock(LNET_LOCK_EX); - } - - /* -1 for notify or !add_route */ - lnet_peer_ni_decref_locked(route->lr_gateway); - lnet_net_unlock(LNET_LOCK_EX); - rc = 0; - - if (!add_route) { - rc = -EEXIST; - kfree(route); - } - - if (rnet != rnet2) - kfree(rnet); - - /* kick start the monitor thread to handle the added route */ - wake_up(&the_lnet.ln_mt_waitq); - - return rc; + net = net; + hops = hops; + gateway = gateway; + priority = priority; + return -EINVAL; } +/* TODO: reimplement lnet_check_routes() */ int lnet_del_route(u32 net, lnet_nid_t gw_nid) { - struct lnet_peer_ni *gateway; - struct lnet_remotenet *rnet; - struct lnet_route *route; - int rc = -ENOENT; - struct list_head *rn_list; - int idx = 0; - - CDEBUG(D_NET, "Del route: net %s : gw %s\n", - libcfs_net2str(net), libcfs_nid2str(gw_nid)); - - /* - * NB Caller may specify either all routes via the given gateway - * or a specific route 
entry actual NIDs) - */ - lnet_net_lock(LNET_LOCK_EX); - if (net == LNET_NIDNET(LNET_NID_ANY)) - rn_list = &the_lnet.ln_remote_nets_hash[0]; - else - rn_list = lnet_net2rnethash(net); - -again: - list_for_each_entry(rnet, rn_list, lrn_list) { - if (!(net == LNET_NIDNET(LNET_NID_ANY) || - net == rnet->lrn_net)) - continue; - - list_for_each_entry(route, &rnet->lrn_routes, lr_list) { - gateway = route->lr_gateway; - if (!(gw_nid == LNET_NID_ANY || - gw_nid == gateway->lpni_nid)) - continue; - - list_del(&route->lr_list); - list_del(&route->lr_gwlist); - the_lnet.ln_remote_nets_version++; - - if (list_empty(&rnet->lrn_routes)) - list_del(&rnet->lrn_list); - else - rnet = NULL; - - lnet_rtr_decref_locked(gateway); - lnet_peer_ni_decref_locked(gateway); - - lnet_net_unlock(LNET_LOCK_EX); - - kfree(route); - kfree(rnet); - - rc = 0; - lnet_net_lock(LNET_LOCK_EX); - goto again; - } - } - - if (net == LNET_NIDNET(LNET_NID_ANY) && - ++idx < LNET_REMOTE_NETS_HASH_SIZE) { - rn_list = &the_lnet.ln_remote_nets_hash[idx]; - goto again; - } - lnet_net_unlock(LNET_LOCK_EX); - - return rc; + net = net; + gw_nid = gw_nid; + return -EINVAL; } void @@ -553,7 +285,8 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) *net = rnet->lrn_net; *hops = route->lr_hops; *priority = route->lr_priority; - *gateway = route->lr_gateway->lpni_nid; + *gateway = + route->lr_gateway->lp_primary_nid; *alive = lnet_is_route_alive(route); lnet_net_unlock(cpt); return 0; @@ -588,110 +321,12 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } /** - * parse router-checker pinginfo, record number of down NIs for remote - * networks on that router. + * TODO: re-implement */ static void lnet_parse_rc_info(struct lnet_rc_data *rcd) { - struct lnet_ping_buffer *pbuf = rcd->rcd_pingbuffer; - struct lnet_peer_ni *gw = rcd->rcd_gateway; - struct lnet_route *rte; - int nnis; - - if (!gw->lpni_alive || !pbuf) - return; - - /* - * Protect gw->lpni_ping_feats. 
This can be set from - * lnet_notify_locked with different locks being held - */ - spin_lock(&gw->lpni_lock); - - if (pbuf->pb_info.pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) - lnet_swap_pinginfo(pbuf); - - /* NB always racing with network! */ - if (pbuf->pb_info.pi_magic != LNET_PROTO_PING_MAGIC) { - CDEBUG(D_NET, "%s: Unexpected magic %08x\n", - libcfs_nid2str(gw->lpni_nid), pbuf->pb_info.pi_magic); - gw->lpni_ping_feats = LNET_PING_FEAT_INVAL; - goto out; - } - - gw->lpni_ping_feats = pbuf->pb_info.pi_features; - - /* Without NI status info there's nothing more to do. */ - if (!(gw->lpni_ping_feats & LNET_PING_FEAT_NI_STATUS)) - goto out; - - /* Determine the number of NIs for which there is data. */ - nnis = pbuf->pb_info.pi_nnis; - if (pbuf->pb_nnis < nnis) { - if (rcd->rcd_nnis < nnis) - rcd->rcd_nnis = nnis; - nnis = pbuf->pb_nnis; - } - - list_for_each_entry(rte, &gw->lpni_routes, lr_gwlist) { - int down = 0; - int up = 0; - int i; - - /* If routing disabled then the route is down. 
*/ - if (gw->lpni_ping_feats & LNET_PING_FEAT_RTE_DISABLED) { - rte->lr_downis = 1; - continue; - } - - for (i = 0; i < nnis; i++) { - struct lnet_ni_status *stat = &pbuf->pb_info.pi_ni[i]; - lnet_nid_t nid = stat->ns_nid; - - if (nid == LNET_NID_ANY) { - CDEBUG(D_NET, "%s: unexpected LNET_NID_ANY\n", - libcfs_nid2str(gw->lpni_nid)); - gw->lpni_ping_feats = LNET_PING_FEAT_INVAL; - goto out; - } - - if (LNET_NETTYP(LNET_NIDNET(nid)) == LOLND) - continue; - - if (stat->ns_status == LNET_NI_STATUS_DOWN) { - down++; - continue; - } - - if (stat->ns_status == LNET_NI_STATUS_UP) { - if (LNET_NIDNET(nid) == rte->lr_net) { - up = 1; - break; - } - continue; - } - - CDEBUG(D_NET, "%s: Unexpected status 0x%x\n", - libcfs_nid2str(gw->lpni_nid), stat->ns_status); - gw->lpni_ping_feats = LNET_PING_FEAT_INVAL; - goto out; - } - - if (up) { /* ignore downed NIs if NI for dest network is up */ - rte->lr_downis = 0; - continue; - } - /** - * if @down is zero and this route is single-hop, it means - * we can't find NI for target network - */ - if (!down && rte->lr_hops == 1) - down = 1; - - rte->lr_downis = down; - } -out: - spin_unlock(&gw->lpni_lock); + rcd = rcd; } static void @@ -725,7 +360,6 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } if (event->type == LNET_EVENT_SEND) { - lp->lpni_ping_notsent = 0; if (!event->status) goto out; } @@ -755,7 +389,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) static void lnet_wait_known_routerstate(void) { - struct lnet_peer_ni *rtr; + struct lnet_peer *rtr; int all_known; LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING); @@ -764,15 +398,15 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) int cpt = lnet_net_lock_current(); all_known = 1; - list_for_each_entry(rtr, &the_lnet.ln_routers, lpni_rtr_list) { - spin_lock(&rtr->lpni_lock); + list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) { + spin_lock(&rtr->lp_lock); - if 
(!rtr->lpni_alive_count) { + if (!(rtr->lp_state & LNET_PEER_DISCOVERED)) { all_known = 0; - spin_unlock(&rtr->lpni_lock); + spin_unlock(&rtr->lp_lock); break; } - spin_unlock(&rtr->lpni_lock); + spin_unlock(&rtr->lp_lock); } lnet_net_unlock(cpt); @@ -784,17 +418,22 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } } +/* TODO: reimplement */ void lnet_router_ni_update_locked(struct lnet_peer_ni *gw, u32 net) { struct lnet_route *rte; + struct lnet_peer *lp; - if ((gw->lpni_ping_feats & LNET_PING_FEAT_NI_STATUS)) { - list_for_each_entry(rte, &gw->lpni_routes, lr_gwlist) { - if (rte->lr_net == net) { - rte->lr_downis = 0; - break; - } + if ((gw->lpni_ping_feats & LNET_PING_FEAT_NI_STATUS)) + lp = gw->lpni_peer_net->lpn_peer; + else + return; + + list_for_each_entry(rte, &lp->lp_routes, lr_gwlist) { + if (rte->lr_net == net) { + rte->lr_downis = 0; + break; } } } @@ -841,212 +480,6 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } } -static void -lnet_destroy_rc_data(struct lnet_rc_data *rcd) -{ - LASSERT(list_empty(&rcd->rcd_list)); - /* detached from network */ - LASSERT(LNetMDHandleIsInvalid(rcd->rcd_mdh)); - - if (rcd->rcd_gateway) { - int cpt = rcd->rcd_gateway->lpni_cpt; - - lnet_net_lock(cpt); - lnet_peer_ni_decref_locked(rcd->rcd_gateway); - lnet_net_unlock(cpt); - } - - if (rcd->rcd_pingbuffer) - lnet_ping_buffer_decref(rcd->rcd_pingbuffer); - - kfree(rcd); -} - -static struct lnet_rc_data * -lnet_update_rc_data_locked(struct lnet_peer_ni *gateway) -{ - struct lnet_handle_md mdh; - struct lnet_rc_data *rcd; - struct lnet_ping_buffer *pbuf = NULL; - struct lnet_md md; - int nnis = LNET_INTERFACES_MIN; - int rc; - int i; - - rcd = gateway->lpni_rcd; - if (rcd) { - nnis = rcd->rcd_nnis; - mdh = rcd->rcd_mdh; - LNetInvalidateMDHandle(&rcd->rcd_mdh); - pbuf = rcd->rcd_pingbuffer; - rcd->rcd_pingbuffer = NULL; - } else { - LNetInvalidateMDHandle(&mdh); - } - - lnet_net_unlock(gateway->lpni_cpt); - - if (rcd) { 
- LNetMDUnlink(mdh); - lnet_ping_buffer_decref(pbuf); - } else { - rcd = kzalloc(sizeof(*rcd), GFP_NOFS); - if (!rcd) - goto out; - - LNetInvalidateMDHandle(&rcd->rcd_mdh); - INIT_LIST_HEAD(&rcd->rcd_list); - rcd->rcd_nnis = nnis; - } - - pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); - if (!pbuf) - goto out; - - for (i = 0; i < nnis; i++) { - pbuf->pb_info.pi_ni[i].ns_nid = LNET_NID_ANY; - pbuf->pb_info.pi_ni[i].ns_status = LNET_NI_STATUS_INVALID; - } - rcd->rcd_pingbuffer = pbuf; - - md.start = &pbuf->pb_info; - md.user_ptr = rcd; - md.length = LNET_PING_INFO_SIZE(nnis); - md.threshold = LNET_MD_THRESH_INF; - md.options = LNET_MD_TRUNCATE; - md.eq_handle = the_lnet.ln_rc_eqh; - - LASSERT(!LNetEQHandleIsInvalid(the_lnet.ln_rc_eqh)); - rc = LNetMDBind(md, LNET_UNLINK, &rcd->rcd_mdh); - if (rc < 0) { - CERROR("Can't bind MD: %d\n", rc); - goto out_ping_buffer_decref; - } - LASSERT(!rc); - - lnet_net_lock(gateway->lpni_cpt); - /* Check if this is still a router. */ - if (!lnet_isrouter(gateway)) - goto out_unlock; - /* Check if someone else installed router data. */ - if (gateway->lpni_rcd && gateway->lpni_rcd != rcd) - goto out_unlock; - - /* Install and/or update the router data. */ - if (!gateway->lpni_rcd) { - lnet_peer_ni_addref_locked(gateway); - rcd->rcd_gateway = gateway; - gateway->lpni_rcd = rcd; - } - gateway->lpni_ping_notsent = 0; - - return rcd; - -out_unlock: - lnet_net_unlock(gateway->lpni_cpt); - rc = LNetMDUnlink(mdh); - LASSERT(!rc); -out_ping_buffer_decref: - lnet_ping_buffer_decref(pbuf); -out: - if (rcd && rcd != gateway->lpni_rcd) - lnet_destroy_rc_data(rcd); - lnet_net_lock(gateway->lpni_cpt); - return gateway->lpni_rcd; -} - -static int -lnet_router_check_interval(struct lnet_peer_ni *rtr) -{ - int secs; - - secs = rtr->lpni_alive ? 
live_router_check_interval : - dead_router_check_interval; - if (secs < 0) - secs = 0; - - return secs; -} - -static void -lnet_ping_router_locked(struct lnet_peer_ni *rtr) -{ - struct lnet_rc_data *rcd = NULL; - time64_t now = ktime_get_seconds(); - time64_t secs; - struct lnet_ni *ni; - - lnet_peer_ni_addref_locked(rtr); - - if (rtr->lpni_ping_deadline && /* ping timed out? */ - now > rtr->lpni_ping_deadline) - lnet_notify_locked(rtr, 1, 0, now); - - /* Run any outstanding notifications */ - ni = lnet_get_next_ni_locked(rtr->lpni_net, NULL); - lnet_ni_notify_locked(ni, rtr); - - if (!lnet_isrouter(rtr) || - the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { - /* router table changed or router checker is shutting down */ - lnet_peer_ni_decref_locked(rtr); - return; - } - - rcd = rtr->lpni_rcd; - - /* The response to the router checker ping could've timed out and - * the mdh might've been invalidated, so we need to update it - * again. - */ - if (!rcd || rcd->rcd_nnis > rcd->rcd_pingbuffer->pb_nnis || - LNetMDHandleIsInvalid(rcd->rcd_mdh)) - rcd = lnet_update_rc_data_locked(rtr); - if (!rcd) - return; - - secs = lnet_router_check_interval(rtr); - - CDEBUG(D_NET, - "rtr %s %lldd: deadline %lld ping_notsent %d alive %d alive_count %d lpni_ping_timestamp %lld\n", - libcfs_nid2str(rtr->lpni_nid), secs, - rtr->lpni_ping_deadline, rtr->lpni_ping_notsent, - rtr->lpni_alive, rtr->lpni_alive_count, - rtr->lpni_ping_timestamp); - - if (secs && !rtr->lpni_ping_notsent && - now > rtr->lpni_ping_timestamp + secs) { - int rc; - struct lnet_process_id id; - struct lnet_handle_md mdh; - - id.nid = rtr->lpni_nid; - id.pid = LNET_PID_LUSTRE; - CDEBUG(D_NET, "Check: %s\n", libcfs_id2str(id)); - - rtr->lpni_ping_notsent = 1; - rtr->lpni_ping_timestamp = now; - - mdh = rcd->rcd_mdh; - - if (!rtr->lpni_ping_deadline) { - rtr->lpni_ping_deadline = ktime_get_seconds() + - router_ping_timeout; - } - - lnet_net_unlock(rtr->lpni_cpt); - - rc = LNetGet(LNET_NID_ANY, mdh, id, 
LNET_RESERVED_PORTAL, - LNET_PROTO_PING_MATCHBITS, 0, false); - - lnet_net_lock(rtr->lpni_cpt); - if (rc) - rtr->lpni_ping_notsent = 0; /* no event pending */ - } - - lnet_peer_ni_decref_locked(rtr); -} - int lnet_router_pre_mt_start(void) { int rc; @@ -1088,81 +521,7 @@ void lnet_router_cleanup(void) void lnet_prune_rc_data(int wait_unlink) { - struct lnet_rc_data *rcd; - struct lnet_rc_data *tmp; - struct lnet_peer_ni *lp; - struct list_head head; - int i = 2; - - if (likely(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING && - list_empty(&the_lnet.ln_rcd_deathrow) && - list_empty(&the_lnet.ln_rcd_zombie))) - return; - - INIT_LIST_HEAD(&head); - - lnet_net_lock(LNET_LOCK_EX); - - if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { - /* router checker is stopping, prune all */ - list_for_each_entry(lp, &the_lnet.ln_routers, - lpni_rtr_list) { - if (!lp->lpni_rcd) - continue; - - LASSERT(list_empty(&lp->lpni_rcd->rcd_list)); - list_add(&lp->lpni_rcd->rcd_list, - &the_lnet.ln_rcd_deathrow); - lp->lpni_rcd = NULL; - } - } - - /* unlink all RCDs on deathrow list */ - list_splice_init(&the_lnet.ln_rcd_deathrow, &head); - - if (!list_empty(&head)) { - lnet_net_unlock(LNET_LOCK_EX); - - list_for_each_entry(rcd, &head, rcd_list) - LNetMDUnlink(rcd->rcd_mdh); - - lnet_net_lock(LNET_LOCK_EX); - } - - list_splice_init(&head, &the_lnet.ln_rcd_zombie); - - /* release all zombie RCDs */ - while (!list_empty(&the_lnet.ln_rcd_zombie)) { - list_for_each_entry_safe(rcd, tmp, &the_lnet.ln_rcd_zombie, - rcd_list) { - if (LNetMDHandleIsInvalid(rcd->rcd_mdh)) - list_move(&rcd->rcd_list, &head); - } - - wait_unlink = wait_unlink && - !list_empty(&the_lnet.ln_rcd_zombie); - - lnet_net_unlock(LNET_LOCK_EX); - - while ((rcd = list_first_entry_or_null(&head, - struct lnet_rc_data, - rcd_list)) != NULL) { - list_del_init(&rcd->rcd_list); - lnet_destroy_rc_data(rcd); - } - - if (!wait_unlink) - return; - - i++; - CDEBUG(((i & (-i)) == i) ? 
D_WARNING : D_NET, - "Waiting for rc buffers to unlink\n"); - schedule_timeout_uninterruptible(HZ / 4); - - lnet_net_lock(LNET_LOCK_EX); - } - - lnet_net_unlock(LNET_LOCK_EX); + wait_unlink = wait_unlink; } /* @@ -1194,27 +553,16 @@ bool lnet_router_checker_active(void) void lnet_check_routers(void) { - struct lnet_peer_ni *rtr; + struct lnet_peer *rtr; u64 version; int cpt; - int cpt2; cpt = lnet_net_lock_current(); rescan: version = the_lnet.ln_routers_version; - list_for_each_entry(rtr, &the_lnet.ln_routers, lpni_rtr_list) { - cpt2 = rtr->lpni_cpt; - if (cpt != cpt2) { - lnet_net_unlock(cpt); - cpt = cpt2; - lnet_net_lock(cpt); - /* the routers list has changed */ - if (version != the_lnet.ln_routers_version) - goto rescan; - } - - lnet_ping_router_locked(rtr); + list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) { + /* TODO use discovery to determine if router is alive */ /* NB dropped lock */ if (version != the_lnet.ln_routers_version) { diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index 5341599..d41ff00 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -215,7 +215,7 @@ static int proc_lnet_routes(struct ctl_table *table, int write, u32 net = rnet->lrn_net; u32 hops = route->lr_hops; unsigned int priority = route->lr_priority; - lnet_nid_t nid = route->lr_gateway->lpni_nid; + lnet_nid_t nid = route->lr_gateway->lp_primary_nid; int alive = lnet_is_route_alive(route); s += snprintf(s, tmpstr + tmpsiz - s, @@ -290,7 +290,7 @@ static int proc_lnet_routers(struct ctl_table *table, int write, *ppos = LNET_PROC_POS_MAKE(0, ver, 0, off); } else { struct list_head *r; - struct lnet_peer_ni *peer = NULL; + struct lnet_peer *peer = NULL; int skip = off - 1; lnet_net_lock(0); @@ -305,9 +305,9 @@ static int proc_lnet_routers(struct ctl_table *table, int write, r = the_lnet.ln_routers.next; while (r != &the_lnet.ln_routers) { - struct lnet_peer_ni *lp; + struct lnet_peer *lp; - lp = list_entry(r, struct 
lnet_peer_ni, lpni_rtr_list); + lp = list_entry(r, struct lnet_peer, lp_rtr_list); if (!skip) { peer = lp; break; @@ -318,21 +318,22 @@ static int proc_lnet_routers(struct ctl_table *table, int write, } if (peer) { - lnet_nid_t nid = peer->lpni_nid; + lnet_nid_t nid = peer->lp_primary_nid; time64_t now = ktime_get_seconds(); - time64_t deadline = peer->lpni_ping_deadline; - int nrefs = atomic_read(&peer->lpni_refcount); - int nrtrrefs = peer->lpni_rtr_refcount; - int alive_cnt = peer->lpni_alive_count; - int alive = peer->lpni_alive; - int pingsent = !peer->lpni_ping_notsent; - time64_t last_ping = now - peer->lpni_ping_timestamp; + /* TODO: readjust what's being printed */ + time64_t deadline = 0; + int nrefs = atomic_read(&peer->lp_refcount); + int nrtrrefs = peer->lp_rtr_refcount; + int alive_cnt = 0; + int alive = 0; + int pingsent = ((peer->lp_state & LNET_PEER_PING_SENT) + != 0); + time64_t last_ping = now - peer->lp_rtrcheck_timestamp; int down_ni = 0; struct lnet_route *rtr; - if ((peer->lpni_ping_feats & - LNET_PING_FEAT_NI_STATUS)) { - list_for_each_entry(rtr, &peer->lpni_routes, + if (nrtrrefs > 0) { + list_for_each_entry(rtr, &peer->lp_routes, lr_gwlist) { /* * downis on any route should be the From patchwork Thu Feb 27 21:13:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410235 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE7E5138D for ; Thu, 27 Feb 2020 21:33:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D6CEF24677 for ; Thu, 27 Feb 2020 21:33:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6CEF24677 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 339D6349C62; Thu, 27 Feb 2020 13:28:08 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 48B1521FC34 for ; Thu, 27 Feb 2020 13:20:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3AC488A82; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 392A546A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:20 -0500 Message-Id: <1582838290-17243-333-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 332/622] lnet: lnet_add/del_route() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Reimplemented lnet_add_route() and lnet_del_route() to use the peer instead of the peer_ni. 
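The router reference counting that this rework moves from the peer_ni to the peer can be sketched in plain C. This is a minimal userspace model under stated assumptions, not the kernel code: `struct peer`, `rtr_addref()`, `rtr_decref()` and the `on_router_list` flag are hypothetical stand-ins for `struct lnet_peer`, `lnet_rtr_addref_locked()`, `lnet_rtr_decref_locked()` and membership in `the_lnet.ln_routers`. Locking and the peer's general refcount are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct lnet_peer; only the fields needed here. */
struct peer {
	int rtr_refcount;     /* number of routes using this peer as a gateway */
	bool on_router_list;  /* models membership in the global router list */
};

/* Take a router reference; the first one puts the peer on the router list. */
static void rtr_addref(struct peer *lp)
{
	assert(lp->rtr_refcount >= 0);
	if (++lp->rtr_refcount == 1)
		lp->on_router_list = true;
}

/* Drop a router reference; the last one takes the peer off the router list. */
static void rtr_decref(struct peer *lp)
{
	assert(lp->rtr_refcount > 0);
	if (--lp->rtr_refcount == 0)
		lp->on_router_list = false;
}
```

In this model each added route calls `rtr_addref()` on its gateway and each deleted route calls `rtr_decref()`, so a peer stays on the router list for as long as at least one route goes through it.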
WC-bug-id: https://jira.whamcloud.com/browse/LU-11299 Lustre-commit: 680da7444a06 ("LU-11299 lnet: lnet_add/del_route()") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33184 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/router.c | 317 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 307 insertions(+), 10 deletions(-) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 4e79c21..8374ce1 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -190,6 +190,39 @@ spin_unlock(&lp->lpni_lock); } +static void +lnet_rtr_addref_locked(struct lnet_peer *lp) +{ + LASSERT(lp->lp_rtr_refcount >= 0); + + /* lnet_net_lock must be exclusively locked */ + lp->lp_rtr_refcount++; + if (lp->lp_rtr_refcount == 1) { + list_add_tail(&lp->lp_rtr_list, &the_lnet.ln_routers); + /* addref for the_lnet.ln_routers */ + lnet_peer_addref_locked(lp); + the_lnet.ln_routers_version++; + } +} + +static void +lnet_rtr_decref_locked(struct lnet_peer *lp) +{ + LASSERT(atomic_read(&lp->lp_refcount) > 0); + LASSERT(lp->lp_rtr_refcount > 0); + + /* lnet_net_lock must be exclusively locked */ + lp->lp_rtr_refcount--; + if (lp->lp_rtr_refcount == 0) { + LASSERT(list_empty(&lp->lp_routes)); + + list_del(&lp->lp_rtr_list); + /* decref for the_lnet.ln_routers */ + lnet_peer_decref_locked(lp); + the_lnet.ln_routers_version++; + } +} + struct lnet_remotenet * lnet_find_rnet_locked(u32 net) { @@ -206,24 +239,288 @@ struct lnet_remotenet * return NULL; } +static void lnet_shuffle_seed(void) +{ + static int seeded; + struct lnet_ni *ni = NULL; + + if (seeded) + return; + + /* Nodes with small feet have little entropy + * the NID for this node gives the most entropy in the low bits + */ + while ((ni = lnet_get_next_ni_locked(NULL, ni))) + add_device_randomness(&ni->ni_nid, sizeof(ni->ni_nid)); + + seeded = 1; +} + +/* NB expects LNET_LOCK held */ +static void +lnet_add_route_to_rnet(struct 
lnet_remotenet *rnet, struct lnet_route *route) +{ + unsigned int len = 0; + unsigned int offset = 0; + struct list_head *e; + + lnet_shuffle_seed(); + + list_for_each(e, &rnet->lrn_routes) + len++; + + /* Randomly adding routes to the list is done to ensure that when + * different nodes are using the same list of routers, they end up + * preferring different routers. + */ + offset = prandom_u32_max(len + 1); + list_for_each(e, &rnet->lrn_routes) { + if (offset == 0) + break; + offset--; + } + list_add(&route->lr_list, e); + /* force a router check on the gateway to make sure the route is + * alive + */ + route->lr_gateway->lp_rtrcheck_timestamp = 0; + + the_lnet.ln_remote_nets_version++; + + /* add the route on the gateway list */ + list_add(&route->lr_gwlist, &route->lr_gateway->lp_routes); + + /* take a router reference count on the gateway */ + lnet_rtr_addref_locked(route->lr_gateway); +} + int lnet_add_route(u32 net, u32 hops, lnet_nid_t gateway, unsigned int priority) { - net = net; - hops = hops; - gateway = gateway; - priority = priority; - return -EINVAL; + struct list_head *route_entry; + struct lnet_remotenet *rnet; + struct lnet_remotenet *rnet2; + struct lnet_route *route; + struct lnet_peer_ni *lpni; + struct lnet_peer *gw; + int add_route; + int rc; + + CDEBUG(D_NET, "Add route: remote net %s hops %d priority %u gw %s\n", + libcfs_net2str(net), hops, priority, libcfs_nid2str(gateway)); + + if (gateway == LNET_NID_ANY || + LNET_NETTYP(LNET_NIDNET(gateway)) == LOLND || + net == LNET_NIDNET(LNET_NID_ANY) || + LNET_NETTYP(net) == LOLND || + LNET_NIDNET(gateway) == net || + (hops != LNET_UNDEFINED_HOPS && (hops < 1 || hops > 255))) + return -EINVAL; + + /* it's a local network */ + if (lnet_islocalnet(net)) + return -EEXIST; + + /* Assume net, route, all new */ + route = kzalloc(sizeof(*route), GFP_NOFS); + rnet = kzalloc(sizeof(*rnet), GFP_NOFS); + if (!route || !rnet) { + CERROR("Out of memory creating route %s %d %s\n", + libcfs_net2str(net), hops, 
libcfs_nid2str(gateway)); + kfree(route); + kfree(rnet); + return -ENOMEM; + } + + INIT_LIST_HEAD(&rnet->lrn_routes); + rnet->lrn_net = net; + /* store the local and remote net that the route represents */ + route->lr_lnet = LNET_NIDNET(gateway); + route->lr_net = net; + route->lr_priority = priority; + route->lr_hops = hops; + + lnet_net_lock(LNET_LOCK_EX); + + /* lnet_nid2peerni_ex() grabs a ref on the lpni. We will need to + * lose that once we're done + */ + lpni = lnet_nid2peerni_ex(gateway, LNET_LOCK_EX); + if (IS_ERR(lpni)) { + lnet_net_unlock(LNET_LOCK_EX); + + kfree(route); + kfree(rnet); + + rc = PTR_ERR(lpni); + CERROR("Error %d creating route %s %d %s\n", rc, + libcfs_net2str(net), hops, + libcfs_nid2str(gateway)); + return rc; + } + + LASSERT(lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer); + gw = lpni->lpni_peer_net->lpn_peer; + + route->lr_gateway = gw; + + rnet2 = lnet_find_rnet_locked(net); + if (!rnet2) { + /* new network */ + list_add_tail(&rnet->lrn_list, lnet_net2rnethash(net)); + rnet2 = rnet; + } + + /* Search for a duplicate route (it's a NOOP if it is) */ + add_route = 1; + list_for_each(route_entry, &rnet2->lrn_routes) { + struct lnet_route *route2; + + route2 = list_entry(route_entry, struct lnet_route, lr_list); + if (route2->lr_gateway == route->lr_gateway) { + add_route = 0; + break; + } + + /* our lookups must be true */ + LASSERT(route2->lr_gateway->lp_primary_nid != gateway); + } + + /* It is possible to add multiple routes through the same peer, + * but it'll be using a different NID of that peer. When the + * gateway is discovered, discovery will consolidate the different + * peers into one peer. In this case the discovery code will have + * to move the routes from the peer that's being deleted to the + * consolidated peer lp_routes list + */ + if (add_route) + lnet_add_route_to_rnet(rnet2, route); + + /* get rid of the reference on the lpni. 
+ */ + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(LNET_LOCK_EX); + + rc = 0; + + if (!add_route) { + rc = -EEXIST; + kfree(route); + } + + if (rnet != rnet2) + kfree(rnet); + + /* kick start the monitor thread to handle the added route */ + wake_up(&the_lnet.ln_mt_waitq); + + return rc; +} + +static void +lnet_del_route_from_rnet(lnet_nid_t gw_nid, struct list_head *route_list, + struct list_head *zombies) +{ + struct lnet_peer *gateway; + struct lnet_route *route; + struct lnet_route *tmp; + + list_for_each_entry_safe(route, tmp, route_list, lr_list) { + gateway = route->lr_gateway; + if (gw_nid != LNET_NID_ANY && + gw_nid != gateway->lp_primary_nid) + continue; + + /* move to zombie to delete outside the lock + * Note that this function is called with the + * ln_api_mutex held as well as the exclusive net + * lock. Adding to the remote net list happens + * under the same conditions. Same goes for the + * gateway router list + */ + list_move(&route->lr_list, zombies); + the_lnet.ln_remote_nets_version++; + + list_del(&route->lr_gwlist); + lnet_rtr_decref_locked(gateway); + } } -/* TODO: reimplement lnet_check_routes() */ int lnet_del_route(u32 net, lnet_nid_t gw_nid) { - net = net; - gw_nid = gw_nid; - return -EINVAL; + struct list_head rnet_zombies; + struct lnet_remotenet *rnet; + struct lnet_remotenet *tmp; + struct list_head *rn_list; + struct lnet_peer_ni *lpni; + struct lnet_route *route; + struct list_head zombies; + struct lnet_peer *lp; + int i = 0; + + INIT_LIST_HEAD(&rnet_zombies); + INIT_LIST_HEAD(&zombies); + + CDEBUG(D_NET, "Del route: net %s : gw %s\n", + libcfs_net2str(net), libcfs_nid2str(gw_nid)); + + /* NB Caller may specify either all routes via the given gateway + * or a specific route entry actual NIDs) + */ + + lnet_net_lock(LNET_LOCK_EX); + + lpni = lnet_find_peer_ni_locked(gw_nid); + if (lpni) { + lp = lpni->lpni_peer_net->lpn_peer; + LASSERT(lp); + gw_nid = lp->lp_primary_nid; + lnet_peer_ni_decref_locked(lpni); + } + + if (net 
!= LNET_NIDNET(LNET_NID_ANY)) { + rnet = lnet_find_rnet_locked(net); + if (!rnet) { + lnet_net_unlock(LNET_LOCK_EX); + return -ENOENT; + } + lnet_del_route_from_rnet(gw_nid, &rnet->lrn_routes, + &zombies); + if (list_empty(&rnet->lrn_routes)) + list_move(&rnet->lrn_list, &rnet_zombies); + goto delete_zombies; + } + + for (i = 0; i < LNET_REMOTE_NETS_HASH_SIZE; i++) { + rn_list = &the_lnet.ln_remote_nets_hash[i]; + + list_for_each_entry_safe(rnet, tmp, rn_list, lrn_list) { + lnet_del_route_from_rnet(gw_nid, &rnet->lrn_routes, + &zombies); + if (list_empty(&rnet->lrn_routes)) + list_move(&rnet->lrn_list, &rnet_zombies); + } + } + +delete_zombies: + lnet_net_unlock(LNET_LOCK_EX); + + while (!list_empty(&zombies)) { + route = list_first_entry(&zombies, struct lnet_route, lr_list); + list_del(&route->lr_list); + kfree(route); + } + + while (!list_empty(&rnet_zombies)) { + rnet = list_first_entry(&rnet_zombies, struct lnet_remotenet, + lrn_list); + list_del(&rnet->lrn_list); + kfree(rnet); + } + + return 0; } void @@ -900,7 +1197,7 @@ bool lnet_router_checker_active(void) lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_routing = 1; lnet_net_unlock(LNET_LOCK_EX); - + wake_up(&the_lnet.ln_mt_waitq); return 0; failed: From patchwork Thu Feb 27 21:13:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410239 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6812392A for ; Thu, 27 Feb 2020 21:33:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 50D5224677 for ; Thu, 27 Feb 2020 21:33:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 50D5224677 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A7381349C88; Thu, 27 Feb 2020 13:28:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A0F0F21FC34 for ; Thu, 27 Feb 2020 13:20:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3D2558A83; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3BD8246C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:21 -0500 Message-Id: <1582838290-17243-334-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 333/622] lnet: Do not allow deleting of router nis X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Check the peer before deleting a peer_ni. If it's a router then do not allow deletion of the peer-ni. 
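The check described above can be sketched as a small userspace model. It is only an illustration under stated assumptions: `struct peer`, its `nnis` field and `peer_del_ni()` are hypothetical names, and the net lock taken around the real check is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct lnet_peer. */
struct peer {
	int rtr_refcount;  /* > 0 while any route uses this peer as a gateway */
	int nnis;          /* number of interfaces still attached to the peer */
};

/* Refuse to delete an interface of a peer that is acting as a router. */
static int peer_del_ni(struct peer *lp)
{
	if (lp->rtr_refcount > 0)
		return -EBUSY;  /* still a gateway for at least one route */

	lp->nnis--;
	return 0;
}
```

Under this model the caller has to delete every route through the peer first, which drops the router reference count to zero, before an interface can be removed.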
WC-bug-id: https://jira.whamcloud.com/browse/LU-11551 Lustre-commit: 7832a9f52d90 ("LU-11551 lnet: Do not allow deleting of router nis") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33448 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index faaf94a..cb70bc7 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1550,6 +1550,15 @@ struct lnet_peer_net * return -ENODEV; } + lnet_net_lock(LNET_LOCK_EX); + if (lp->lp_rtr_refcount > 0) { + lnet_net_unlock(LNET_LOCK_EX); + CERROR("%s is a router. Can not be deleted\n", + libcfs_nid2str(prim_nid)); + return -EBUSY; + } + lnet_net_unlock(LNET_LOCK_EX); + if (nid == LNET_NID_ANY || nid == lp->lp_primary_nid) return lnet_peer_del(lp); From patchwork Thu Feb 27 21:13:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410335 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1C533138D for ; Thu, 27 Feb 2020 21:35:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0511D24677 for ; Thu, 27 Feb 2020 21:35:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0511D24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP 
id A45F534A0AF; Thu, 27 Feb 2020 13:29:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E3DBB21FF19 for ; Thu, 27 Feb 2020 13:20:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4012E8A84; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3E9A746D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:22 -0500 Message-Id: <1582838290-17243-335-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 334/622] lnet: router sensitivity X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Introduce the router_sensitivity_percentage module parameter to control the sensitivity of routers to failures. It defaults to 100% which means a router interface needs to be fully healthy in order to be used. 
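As a rough illustration of the validation such a percentage parameter needs, here is a userspace sketch. The variable and function names are hypothetical, and strtoul() stands in for the kernel's kstrtoul(); the in-kernel setter is the one in the diff that follows.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the module parameter. */
static unsigned int sensitivity_pct = 100;

/* Parse a string and store it only if it is a whole number in 0..100. */
static int set_sensitivity(const char *val)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(val, &end, 0);
	if (errno || end == val)
		return -EINVAL;
	if (*end == '\n')	/* tolerate one trailing newline, as sysfs writes add one */
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (v > 100)		/* also rejects "-1", which strtoul wraps to a huge value */
		return -EINVAL;

	sensitivity_pct = (unsigned int)v;
	return 0;
}
```

A rejected value leaves the previous setting untouched, which matches the intent of returning an error before the store.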
WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: 2b59dae54efc ("LU-11300 lnet: router sensitivity") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33449 Reviewed-by: Sebastien Buisson Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + net/lnet/lnet/router.c | 50 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 80f6f8c..eae55d5 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -505,6 +505,7 @@ struct lnet_ni * extern unsigned int lnet_recovery_interval; extern unsigned int lnet_peer_discovery_disabled; extern unsigned int lnet_drop_asym_route; +extern unsigned int router_sensitivity_percentage; extern int portal_rotor; int lnet_lib_init(void); diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 8374ce1..40725d2 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -90,6 +90,56 @@ module_param(router_ping_timeout, int, 0644); MODULE_PARM_DESC(router_ping_timeout, "Seconds to wait for the reply to a router health query"); +/* A value between 0 and 100. 0 meaning that even if router's interfaces + * have the worse health still consider the gateway usable. + * 100 means that at least one interface on the route's remote net is 100% + * healthy to consider the route alive. + * The default is set to 100 to ensure we maintain the original behavior. 
+ */ +unsigned int router_sensitivity_percentage = 100; +static int rtr_sensitivity_set(const char *val, + const struct kernel_param *kp); +static struct kernel_param_ops param_ops_rtr_sensitivity = { + .set = rtr_sensitivity_set, + .get = param_get_int, +}; + +#define param_check_rtr_sensitivity(name, p) \ + __param_check(name, p, int) +module_param(router_sensitivity_percentage, rtr_sensitivity, 0644); +MODULE_PARM_DESC(router_sensitivity_percentage, + "How healthy a gateway should be to be used in percent"); + +static int +rtr_sensitivity_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned int *sen = (unsigned int *)kp->arg; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'router_sensitivity_percentage'\n"); + return rc; + } + + if (value < 0 || value > 100) { + CERROR("Invalid value: %lu for 'router_sensitivity_percentage'\n", value); + return -EINVAL; + } + + /* The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. 
+ */ + mutex_lock(&the_lnet.ln_api_mutex); + + *sen = value; + + mutex_unlock(&the_lnet.ln_api_mutex); + + return 0; +} + int lnet_peers_start_down(void) { From patchwork Thu Feb 27 21:13:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410217 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED95917E0 for ; Thu, 27 Feb 2020 21:32:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D66FC246A2 for ; Thu, 27 Feb 2020 21:32:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D66FC246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9F007349BAC; Thu, 27 Feb 2020 13:27:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4944D21FC19 for ; Thu, 27 Feb 2020 13:20:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 42E1C8A85; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 414FB47C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:23 -0500 Message-Id: 
<1582838290-17243-336-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 335/622] lnet: cache ni status X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When processing the data in the PUSH or the REPLY make sure to cache the ns_status. This is the status of the peer_ni as reported by the peer itself. WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: 398f4071dc17 ("LU-11300 lnet: cache ni status") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33450 Reviewed-by: Chris Horn Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 2 ++ net/lnet/lnet/peer.c | 42 +++++++++++++++++++++++++++++++----------- 2 files changed, 33 insertions(+), 11 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 31fe22a..a551005 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -585,6 +585,8 @@ struct lnet_peer_ni { int lpni_cpt; /* state flags -- protected by lpni_lock */ unsigned int lpni_state; + /* status of the peer NI as reported by the peer */ + u32 lpni_ns_status; /* sequence number used to round robin over peer nis within a net */ u32 lpni_seq; /* sequence number used to round robin over gateways */ diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index cb70bc7..cba3da2 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -128,8 +128,10 @@ spin_lock_init(&lpni->lpni_lock); - lpni->lpni_alive = 
!lnet_peers_start_down(); /* 1 bit!! */ - lpni->lpni_last_alive = ktime_get_seconds(); /* assumes alive */ + if (lnet_peers_start_down()) + lpni->lpni_ns_status = LNET_NI_STATUS_DOWN; + else + lpni->lpni_ns_status = LNET_NI_STATUS_UP; lpni->lpni_ping_feats = LNET_PING_FEAT_INVAL; lpni->lpni_nid = nid; lpni->lpni_cpt = cpt; @@ -2410,7 +2412,7 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, { struct lnet_peer_ni *lpni; lnet_nid_t *curnis = NULL; - lnet_nid_t *addnis = NULL; + struct lnet_ni_status *addnis = NULL; lnet_nid_t *delnis = NULL; unsigned int flags; int ncurnis; @@ -2426,9 +2428,9 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, flags |= LNET_PEER_MULTI_RAIL; nnis = max_t(int, lp->lp_nnis, pbuf->pb_info.pi_nnis); - curnis = kmalloc_array(nnis, sizeof(lnet_nid_t), GFP_NOFS); - addnis = kmalloc_array(nnis, sizeof(lnet_nid_t), GFP_NOFS); - delnis = kmalloc_array(nnis, sizeof(lnet_nid_t), GFP_NOFS); + curnis = kmalloc_array(nnis, sizeof(*curnis), GFP_NOFS); + addnis = kmalloc_array(nnis, sizeof(*addnis), GFP_NOFS); + delnis = kmalloc_array(nnis, sizeof(*delnis), GFP_NOFS); if (!curnis || !addnis || !delnis) { rc = -ENOMEM; goto out; @@ -2451,7 +2453,7 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, if (pbuf->pb_info.pi_ni[i].ns_nid == curnis[j]) break; if (j == ncurnis) - addnis[naddnis++] = pbuf->pb_info.pi_ni[i].ns_nid; + addnis[naddnis++] = pbuf->pb_info.pi_ni[i]; } /* * Check for NIDs in curnis[] not present in pbuf. 
@@ -2463,23 +2465,41 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, for (i = 0; i < ncurnis; i++) { if (LNET_NETTYP(LNET_NIDNET(curnis[i])) == LOLND) continue; - for (j = 1; j < pbuf->pb_info.pi_nnis; j++) - if (curnis[i] == pbuf->pb_info.pi_ni[j].ns_nid) + for (j = 1; j < pbuf->pb_info.pi_nnis; j++) { + if (curnis[i] == pbuf->pb_info.pi_ni[j].ns_nid) { + /* update the information we cache for the + * peer with the latest information we + * received + */ + lpni = lnet_find_peer_ni_locked(curnis[i]); + if (lpni) { + lpni->lpni_ns_status = + pbuf->pb_info.pi_ni[j].ns_status; + lnet_peer_ni_decref_locked(lpni); + } break; + } + } if (j == pbuf->pb_info.pi_nnis) delnis[ndelnis++] = curnis[i]; } for (i = 0; i < naddnis; i++) { - rc = lnet_peer_add_nid(lp, addnis[i], flags); + rc = lnet_peer_add_nid(lp, addnis[i].ns_nid, flags); if (rc) { CERROR("Error adding NID %s to peer %s: %d\n", - libcfs_nid2str(addnis[i]), + libcfs_nid2str(addnis[i].ns_nid), libcfs_nid2str(lp->lp_primary_nid), rc); if (rc == -ENOMEM) goto out; } + lpni = lnet_find_peer_ni_locked(addnis[i].ns_nid); + if (lpni) { + lpni->lpni_ns_status = addnis[i].ns_status; + lnet_peer_ni_decref_locked(lpni); + } } + for (i = 0; i < ndelnis; i++) { rc = lnet_peer_del_nid(lp, delnis[i], flags); if (rc) { From patchwork Thu Feb 27 21:13:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410243 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 71F8A138D for ; Thu, 27 Feb 2020 21:33:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5AA5624677 for ; Thu, 27 Feb 2020 21:33:25 +0000 (UTC) 
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5AA5624677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3E3783492C8; Thu, 27 Feb 2020 13:28:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A45F221FF1E for ; Thu, 27 Feb 2020 13:20:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 459E58A86; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 44068468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:24 -0500 Message-Id: <1582838290-17243-337-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 336/622] lnet: Cache the routing feature X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When processing a REPLY or a PUSH for discovery, cache whether the routing feature is enabled or disabled as reported by the peer.
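The caching step amounts to mirroring one bit of the received feature mask into the peer's state word. A small sketch of that logic follows; the bit values and names are illustrative placeholders, not the constants from the LNet headers, and the spinlock the patch takes around the update is left out.

```c
#include <stdint.h>

/* Illustrative bit values only; the real constants live in the LNet headers. */
#define PING_FEAT_RTE_DISABLED	(1u << 2)
#define PEER_ROUTER_ENABLED	(1u << 2)

/*
 * Return the peer state with the router-enabled flag set or cleared
 * according to what the remote peer advertised. Other state bits are
 * preserved.
 */
static uint32_t cache_routing_feature(uint32_t state, uint32_t features)
{
	if (!(features & PING_FEAT_RTE_DISABLED))
		state |= PEER_ROUTER_ENABLED;
	else
		state &= ~PEER_ROUTER_ENABLED;

	return state;
}
```

Note the inverted sense: the wire feature bit means "routing disabled", while the cached flag means "router enabled".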
WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: d65a7b8727ee ("LU-11300 lnet: Cache the routing feature") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33451 Reviewed-by: Chris Horn Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 28 ++++++++++++++++------------ net/lnet/lnet/peer.c | 10 ++++++++++ 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index a551005..ecc6dee 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -705,9 +705,13 @@ struct lnet_peer { * * A peer is marked NO_DISCOVERY if the LNET_PING_FEAT_DISCOVERY bit was * NOT set when the peer was pinged by discovery. + * + * A peer is marked ROUTER if it indicates so in the feature bit. */ #define LNET_PEER_MULTI_RAIL BIT(0) /* Multi-rail aware */ #define LNET_PEER_NO_DISCOVERY BIT(1) /* Peer disabled discovery */ +#define LNET_PEER_ROUTER_ENABLED BIT(2) /* router feature enabled */ + /* * A peer is marked CONFIGURED if it was configured by DLC. * @@ -721,28 +725,28 @@ struct lnet_peer { * A peer that was created as the result of inbound traffic will not * be marked at all. */ -#define LNET_PEER_CONFIGURED BIT(2) /* Configured via DLC */ -#define LNET_PEER_DISCOVERED BIT(3) /* Peer was discovered */ -#define LNET_PEER_REDISCOVER BIT(4) /* Discovery was disabled */ +#define LNET_PEER_CONFIGURED BIT(3) /* Configured via DLC */ +#define LNET_PEER_DISCOVERED BIT(4) /* Peer was discovered */ +#define LNET_PEER_REDISCOVER BIT(5) /* Discovery was disabled */ /* * A peer is marked DISCOVERING when discovery is in progress. * The other flags below correspond to stages of discovery. 
*/ -#define LNET_PEER_DISCOVERING BIT(5) /* Discovering */ -#define LNET_PEER_DATA_PRESENT BIT(6) /* Remote peer data present */ -#define LNET_PEER_NIDS_UPTODATE BIT(7) /* Remote peer info uptodate */ -#define LNET_PEER_PING_SENT BIT(8) /* Waiting for REPLY to Ping */ -#define LNET_PEER_PUSH_SENT BIT(9) /* Waiting for ACK of Push */ -#define LNET_PEER_PING_FAILED BIT(10) /* Ping send failure */ -#define LNET_PEER_PUSH_FAILED BIT(11) /* Push send failure */ +#define LNET_PEER_DISCOVERING BIT(6) /* Discovering */ +#define LNET_PEER_DATA_PRESENT BIT(7) /* Remote peer data present */ +#define LNET_PEER_NIDS_UPTODATE BIT(8) /* Remote peer info uptodate */ +#define LNET_PEER_PING_SENT BIT(9) /* Waiting for REPLY to Ping */ +#define LNET_PEER_PUSH_SENT BIT(10) /* Waiting for ACK of Push */ +#define LNET_PEER_PING_FAILED BIT(11) /* Ping send failure */ +#define LNET_PEER_PUSH_FAILED BIT(12) /* Push send failure */ /* * A ping can be forced as a way to fix up state, or as a manual * intervention by an admin. * A push can be forced in circumstances that would normally not * allow for one to happen. */ -#define LNET_PEER_FORCE_PING BIT(12) /* Forced Ping */ -#define LNET_PEER_FORCE_PUSH BIT(13) /* Forced Push */ +#define LNET_PEER_FORCE_PING BIT(13) /* Forced Ping */ +#define LNET_PEER_FORCE_PUSH BIT(14) /* Forced Push */ struct lnet_peer_net { /* chain on lp_peer_nets */ diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index cba3da2..91ad6b4 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2427,6 +2427,16 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) flags |= LNET_PEER_MULTI_RAIL; + /* Cache the routing feature for the peer; whether it is enabled + * for disabled as reported by the remote peer. 
+ */ + spin_lock(&lp->lp_lock); + if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_RTE_DISABLED)) + lp->lp_state |= LNET_PEER_ROUTER_ENABLED; + else + lp->lp_state &= ~LNET_PEER_ROUTER_ENABLED; + spin_unlock(&lp->lp_lock); + nnis = max_t(int, lp->lp_nnis, pbuf->pb_info.pi_nnis); curnis = kmalloc_array(nnis, sizeof(*curnis), GFP_NOFS); addnis = kmalloc_array(nnis, sizeof(*addnis), GFP_NOFS); From patchwork Thu Feb 27 21:13:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410225 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1A7E992A for ; Thu, 27 Feb 2020 21:32:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 02C8A246A1 for ; Thu, 27 Feb 2020 21:32:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 02C8A246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 53FC83491FA; Thu, 27 Feb 2020 13:27:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 08E8F21FF1E for ; Thu, 27 Feb 2020 13:20:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 47E478A87; Thu, 27 Feb 2020 16:18:17 -0500 (EST) 
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 46C9546A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:25 -0500 Message-Id: <1582838290-17243-338-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 337/622] lnet: peer aliveness X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Peer NI aliveness is now solely dependent on the health infrastructure. With the addition of router_sensitivity_percentage, a peer NI is considered dead if its health drops below the specified percentage of the total health. Setting the percentage to 100% means that a peer_ni is considered dead if its interface is less than fully healthy. Removed obsolete code that queries the peer NI every second since the health infrastructure introduces the recovery mechanism which is designed to recover the health of peer NIs.
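The aliveness rule described above can be sketched as a threshold comparison. In this sketch MAX_HEALTH is assumed to be 1000 as a stand-in for LNET_MAX_HEALTH_VALUE, and the enum and function names are hypothetical; the patch's own helper is lnet_is_peer_ni_alive() in the diff below.

```c
#include <stdbool.h>

#define MAX_HEALTH 1000	/* assumed stand-in for LNET_MAX_HEALTH_VALUE */

/* Hypothetical status values; the real ones are LNET_NI_STATUS_UP/DOWN. */
enum ni_status { NI_STATUS_UP, NI_STATUS_DOWN };

/*
 * A peer NI counts as alive when its health is at or above the
 * configured percentage of the maximum and its cached status is UP.
 */
static bool peer_ni_alive(int healthv, unsigned int pct, enum ni_status st)
{
	int threshold = MAX_HEALTH * (int)pct / 100;

	return healthv >= threshold && st == NI_STATUS_UP;
}
```

With pct set to 100 the threshold equals the maximum, so any loss of health marks the NI dead; with pct set to 0 only the cached status matters.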
WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: 8e498d3f23ea ("LU-11300 lnet: peer aliveness") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33186 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 25 ++++++--- include/linux/lnet/lib-types.h | 2 - net/lnet/lnet/lib-move.c | 124 +++++------------------------------------ net/lnet/lnet/peer.c | 7 ++- net/lnet/lnet/router.c | 11 ++-- net/lnet/lnet/router_proc.c | 3 +- 6 files changed, 42 insertions(+), 130 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index eae55d5..d5704b7 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -846,15 +846,6 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return NULL; } -static inline void -lnet_peer_set_alive(struct lnet_peer_ni *lp) -{ - lp->lpni_last_query = ktime_get_seconds(); - lp->lpni_last_alive = lp->lpni_last_query; - if (!lp->lpni_alive) - lnet_notify_locked(lp, 0, 1, lp->lpni_last_alive); -} - static inline bool lnet_peer_is_multi_rail(struct lnet_peer *lp) { @@ -889,6 +880,22 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return false; } +/* + * A peer is alive if it satisfies the following two conditions: + * 1. peer health >= LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage + * 2. 
the cached NI status received when we discover the peer is UP + */ +static inline bool +lnet_is_peer_ni_alive(struct lnet_peer_ni *lpni) +{ + bool halive = false; + + halive = (atomic_read(&lpni->lpni_healthv) >= + (LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage / 100)); + + return halive && lpni->lpni_ns_status == LNET_NI_STATUS_UP; +} + static inline void lnet_inc_healthv(atomic_t *healthv) { diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index ecc6dee..9a09fad 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -553,8 +553,6 @@ struct lnet_peer_ni { int lpni_rtrcredits; /* low water mark */ int lpni_minrtrcredits; - /* alive/dead? */ - bool lpni_alive; /* notification outstanding? */ bool lpni_notify; /* outstanding notification for LND? */ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 99ff882..af3cd1e 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -609,86 +609,16 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } /* - * This function can be called from two paths: - * 1. when sending a message - * 2. 
when decommiting a message (lnet_msg_decommit_tx()) - * In both these cases the peer_ni should have it's reference count - * acquired by the caller and therefore it is safe to drop the spin - * lock before calling lnd_query() - */ -static void -lnet_ni_query_locked(struct lnet_ni *ni, struct lnet_peer_ni *lp) -{ - time64_t last_alive = 0; - int cpt = lnet_cpt_of_nid_locked(lp->lpni_nid, ni); - - LASSERT(lnet_peer_aliveness_enabled(lp)); - LASSERT(ni->ni_net->net_lnd->lnd_query); - - lnet_net_unlock(cpt); - ni->ni_net->net_lnd->lnd_query(ni, lp->lpni_nid, &last_alive); - lnet_net_lock(cpt); - - lp->lpni_last_query = ktime_get_seconds(); - - if (last_alive) /* NI has updated timestamp */ - lp->lpni_last_alive = last_alive; -} - -/* NB: always called with lnet_net_lock held */ -static inline int -lnet_peer_is_alive(struct lnet_peer_ni *lp, unsigned long now) -{ - int alive; - time64_t deadline; - - LASSERT(lnet_peer_aliveness_enabled(lp)); - - /* Trust lnet_notify() if it has more recent aliveness news, but - * ignore the initial assumed death (see lnet_peers_start_down()). - */ - spin_lock(&lp->lpni_lock); - if (!lp->lpni_alive && lp->lpni_alive_count > 0 && - lp->lpni_timestamp >= lp->lpni_last_alive) { - spin_unlock(&lp->lpni_lock); - return 0; - } - - deadline = lp->lpni_last_alive + - lp->lpni_net->net_tunables.lct_peer_timeout; - alive = deadline > now; - - /* Update obsolete lpni_alive except for routers assumed to be dead - * initially, because router checker would update aliveness in this - * case, and moreover lpni_last_alive at peer creation is assumed. 
- */ - if (alive && !lp->lpni_alive && - !(lnet_isrouter(lp) && !lp->lpni_alive_count)) { - spin_unlock(&lp->lpni_lock); - lnet_notify_locked(lp, 0, 1, lp->lpni_last_alive); - } else { - spin_unlock(&lp->lpni_lock); - } - - return alive; -} - -/* * NB: returns 1 when alive, 0 when dead, negative when error; * may drop the lnet_net_lock */ static int -lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lp, +lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lpni, struct lnet_msg *msg) { - time64_t now = ktime_get_seconds(); - - if (!lnet_peer_aliveness_enabled(lp)) + if (!lnet_peer_aliveness_enabled(lpni)) return -ENODEV; - if (lnet_peer_is_alive(lp, now)) - return 1; - /* * If we're resending a message, let's attempt to send it even if * the peer is down to fulfill our resend quota on the message @@ -696,35 +626,16 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (msg->msg_retry_count > 0) return 1; - /* - * Peer appears dead, but we should avoid frequent NI queries (at - * most once per lnet_queryinterval seconds). 
- */ - if (lp->lpni_last_query) { - static const int lnet_queryinterval = 1; - time64_t next_query; - - next_query = lp->lpni_last_query + lnet_queryinterval; - - if (now < next_query) { - if (lp->lpni_alive) - CWARN("Unexpected aliveness of peer %s: %lld < %lld (%d/%d)\n", - libcfs_nid2str(lp->lpni_nid), - now, next_query, - lnet_queryinterval, - lp->lpni_net->net_tunables.lct_peer_timeout); - return 0; - } - } - - /* query NI for latest aliveness news */ - lnet_ni_query_locked(ni, lp); + /* try and send recovery messages irregardless */ + if (msg->msg_recovery) + return 1; - if (lnet_peer_is_alive(lp, now)) + /* always send any responses */ + if (msg->msg_type == LNET_MSG_ACK || + msg->msg_type == LNET_MSG_REPLY) return 1; - lnet_notify_locked(lp, 0, 0, lp->lpni_last_alive); - return 0; + return lnet_is_peer_ni_alive(lpni); } /** @@ -4184,18 +4095,11 @@ void lnet_monitor_thr_stop(void) /* Multi-Rail: Primary NID of source. */ msg->msg_initiator = lnet_peer_primary_nid_locked(src_nid); - if (lnet_isrouter(msg->msg_rxpeer)) { - lnet_peer_set_alive(msg->msg_rxpeer); - if (avoid_asym_router_failure && - LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid)) { - /* received a remote message from router, update - * remote NI status on this router. - * NB: multi-hop routed message will be ignored. - */ - lnet_router_ni_update_locked(msg->msg_rxpeer, - LNET_NIDNET(src_nid)); - } - } + /* mark the status of this lpni as UP since we received a message + * from it. The ping response reports back the ns_status which is + * marked on the remote as up or down and we cache it here. + */ + msg->msg_rxpeer->lpni_ns_status = LNET_NI_STATUS_UP; lnet_msg_commit(msg, cpt); diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 91ad6b4..8669fbb 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3296,7 +3296,7 @@ void lnet_peer_discovery_stop(void) } if (lnet_isrouter(lp) || lnet_peer_aliveness_enabled(lp)) - aliveness = lp->lpni_alive ? 
"up" : "down"; + aliveness = (lnet_is_peer_ni_alive(lp)) ? "up" : "down"; CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n", libcfs_nid2str(lp->lpni_nid), atomic_read(&lp->lpni_refcount), @@ -3353,7 +3353,8 @@ void lnet_peer_discovery_stop(void) if (lnet_isrouter(lp) || lnet_peer_aliveness_enabled(lp)) snprintf(aliveness, LNET_MAX_STR_LEN, - lp->lpni_alive ? "up" : "down"); + lnet_is_peer_ni_alive(lp) + ? "up" : "down"); *nid = lp->lpni_nid; *refcount = atomic_read(&lp->lpni_refcount); @@ -3439,7 +3440,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) if (lnet_isrouter(lpni) || lnet_peer_aliveness_enabled(lpni)) snprintf(lpni_info->cr_aliveness, LNET_MAX_STR_LEN, - lpni->lpni_alive ? "up" : "down"); + lnet_is_peer_ni_alive(lpni) ? "up" : "down"); lpni_info->cr_refcount = atomic_read(&lpni->lpni_refcount); lpni_info->cr_ni_peer_tx_credits = lpni->lpni_net ? diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 40725d2..d5b4914 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -165,8 +165,10 @@ static int rtr_sensitivity_set(const char *val, lp->lpni_timestamp = when; /* update timestamp */ - if (lp->lpni_alive_count && /* got old news */ - (!lp->lpni_alive) == (!alive)) { /* new date for old news */ + /* got old news */ + if (lp->lpni_alive_count != 0 && + /* new date for old news */ + (!lnet_is_peer_ni_alive(lp)) == !alive) { spin_unlock(&lp->lpni_lock); CDEBUG(D_NET, "Old news\n"); return; @@ -175,10 +177,9 @@ static int rtr_sensitivity_set(const char *val, /* Flag that notification is outstanding */ lp->lpni_alive_count++; - lp->lpni_alive = !!alive; /* 1 bit! */ lp->lpni_notify = 1; lp->lpni_notifylnd = notifylnd; - if (lp->lpni_alive) + if (lnet_is_peer_ni_alive(lp)) lp->lpni_ping_feats = LNET_PING_FEAT_INVAL; /* reset */ spin_unlock(&lp->lpni_lock); @@ -214,7 +215,7 @@ static int rtr_sensitivity_set(const char *val, * lnet_notify_locked(). 
*/ while (lp->lpni_notify) { - alive = lp->lpni_alive; + alive = lnet_is_peer_ni_alive(lp); notifylnd = lp->lpni_notifylnd; lp->lpni_notifylnd = 0; diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index d41ff00..e9aef1e 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -529,7 +529,8 @@ static int proc_lnet_peers(struct ctl_table *table, int write, if (lnet_isrouter(peer) || lnet_peer_aliveness_enabled(peer)) - aliveness = peer->lpni_alive ? "up" : "down"; + aliveness = lnet_is_peer_ni_alive(peer) ? + "up" : "down"; if (lnet_peer_aliveness_enabled(peer)) { time64_t now = ktime_get_seconds(); From patchwork Thu Feb 27 21:13:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410305 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3367992A for ; Thu, 27 Feb 2020 21:34:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1C1B824677 for ; Thu, 27 Feb 2020 21:34:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1C1B824677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5094C348A9D; Thu, 27 Feb 2020 13:29:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 619DA21FCFF for ; Thu, 27 Feb 2020 13:20:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4ACA98A88; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 49BA346C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:26 -0500 Message-Id: <1582838290-17243-339-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 338/622] lnet: router aliveness X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata A route is considered alive if the gateway is able to route messages from the local to the remote net. That means that at least one of the network interfaces on the remote net of the gateway is viable. Introduced the concept of sensitivity percentage. This defaults to 100%. It holds a dual meaning: 1. A route is considered alive if the health of at least one of its interfaces is >= LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage. 100% means at least one interface has to be 100% healthy. 2. On a router, consider a peer_ni dead if its health is not at least LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage. 100% means the interface has to be 100% healthy. Re-implemented lnet_notify() to decrement the health of the peer interface if the LND reports a failure on that peer.
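For illustration only (this is not part of the patch), the sensitivity rule described in the commit message can be sketched in standalone C. The names ni_is_alive() and net_is_alive() are hypothetical, MAX_HEALTH_VALUE is assumed to be 1000 here (the real constant is LNET_MAX_HEALTH_VALUE in the LNet headers), and the sketch assumes the percentage is applied by dividing by 100.

```c
#include <stdbool.h>

/* Assumption for this sketch; the kernel defines LNET_MAX_HEALTH_VALUE. */
#define MAX_HEALTH_VALUE 1000

/*
 * Hypothetical helper: an interface counts as alive when its health
 * reaches the maximum health scaled by the sensitivity percentage.
 */
static bool ni_is_alive(int healthv, int sensitivity_pct)
{
	return healthv >= MAX_HEALTH_VALUE * sensitivity_pct / 100;
}

/*
 * Hypothetical helper: a net is alive if at least one of its
 * interfaces is alive, mirroring the "at least one NI" rule.
 */
static bool net_is_alive(const int *healthv, int n, int sensitivity_pct)
{
	for (int i = 0; i < n; i++) {
		if (ni_is_alive(healthv[i], sensitivity_pct))
			return true;
	}

	return false;
}
```

With the default sensitivity of 100, only a fully healthy interface keeps its net alive under this model; lowering the percentage tolerates partially degraded interfaces.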
WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: 21d2252648be ("LU-11300 lnet: router aliveness") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33185 Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 11 ++----- net/lnet/lnet/router.c | 74 +++++++++++++++++++++++++++++++++++++++++++ net/lnet/lnet/router_proc.c | 2 +- 3 files changed, 77 insertions(+), 10 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index d5704b7..0007adf 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -90,15 +90,8 @@ */ #define LNET_LND_DEFAULT_TIMEOUT 5 -static inline int lnet_is_route_alive(struct lnet_route *route) -{ - /* TODO re-implement gateway alive indication */ - CDEBUG(D_NET, "TODO: reimplement routing. gateway = %s\n", - route->lr_gateway ? - libcfs_nid2str(route->lr_gateway->lp_primary_nid) : - "undefined"); - return 1; -} +bool lnet_is_route_alive(struct lnet_route *route); +bool lnet_is_gateway_alive(struct lnet_peer *gw); static inline int lnet_is_wire_handle_none(struct lnet_handle_wire *wh) { diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index d5b4914..bb92759 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -146,6 +146,80 @@ static int rtr_sensitivity_set(const char *val, return check_routers_before_use; } +/* A net is alive if at least one gateway NI on the network is alive. 
*/ +static bool +lnet_is_gateway_net_alive(struct lnet_peer_net *lpn) +{ + struct lnet_peer_ni *lpni; + + list_for_each_entry(lpni, &lpn->lpn_peer_nis, lpni_peer_nis) { + if (lnet_is_peer_ni_alive(lpni)) + return true; + } + + return false; +} + +/* a gateway is alive only if all its nets are alive + * called with cpt lock held + */ +bool lnet_is_gateway_alive(struct lnet_peer *gw) +{ + struct lnet_peer_net *lpn; + + list_for_each_entry(lpn, &gw->lp_peer_nets, lpn_peer_nets) { + if (!lnet_is_gateway_net_alive(lpn)) + return false; + } + + return true; +} + +/* lnet_is_route_alive() needs to be called with cpt lock held + * A route is alive if the gateway can route between the local network and + * the remote network of the route. + * This means at least one NI is alive on each of the local and remote + * networks of the gateway. + */ +bool lnet_is_route_alive(struct lnet_route *route) +{ + struct lnet_peer *gw = route->lr_gateway; + struct lnet_peer_net *llpn; + struct lnet_peer_net *rlpn; + bool route_alive; + + /* check the gateway's interfaces on the route rnet to make sure + * that the gateway is viable. 
+ */ + llpn = lnet_peer_get_net_locked(gw, route->lr_lnet); + if (!llpn) + return false; + + route_alive = lnet_is_gateway_net_alive(llpn); + + if (avoid_asym_router_failure) { + rlpn = lnet_peer_get_net_locked(gw, route->lr_net); + if (!rlpn) + return false; + route_alive = route_alive && + lnet_is_gateway_net_alive(rlpn); + } + + if (!route_alive) + return route_alive; + + spin_lock(&gw->lp_lock); + if (!(gw->lp_state & LNET_PEER_ROUTER_ENABLED)) { + if (gw->lp_rtr_refcount > 0) + CERROR("peer %s is being used as a gateway but routing feature is not turned on\n", + libcfs_nid2str(gw->lp_primary_nid)); + route_alive = false; + } + spin_unlock(&gw->lp_lock); + + return route_alive; +} + void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive, time64_t when) diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index e9aef1e..3120533 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -325,7 +325,7 @@ static int proc_lnet_routers(struct ctl_table *table, int write, int nrefs = atomic_read(&peer->lp_refcount); int nrtrrefs = peer->lp_rtr_refcount; int alive_cnt = 0; - int alive = 0; + int alive = lnet_is_gateway_alive(peer); int pingsent = ((peer->lp_state & LNET_PEER_PING_SENT) != 0); time64_t last_ping = now - peer->lp_rtrcheck_timestamp; From patchwork Thu Feb 27 21:13:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410339 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECF5C138D for ; Thu, 27 Feb 2020 21:35:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D588E24677 for ; Thu, 27 
Feb 2020 21:35:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D588E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B12F434A0DA; Thu, 27 Feb 2020 13:29:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B9A4A21FCE2 for ; Thu, 27 Feb 2020 13:20:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4D7528A89; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4C6CA46D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:27 -0500 Message-Id: <1582838290-17243-340-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 339/622] lnet: simplify lnet_handle_local_failure() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Pass the struct lnet_ni to lnet_handle_local_failure() instead of the message structure, since nothing else from the message is being used. 
This also makes it symmetrical with lnet_handle_remote_failure(). WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: f8c7dd6f5374 ("LU-11300 lnet: simplify lnet_handle_local_failure()") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33452 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index e4253de..23c3bf4 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -461,12 +461,8 @@ } static void -lnet_handle_local_failure(struct lnet_msg *msg) +lnet_handle_local_failure(struct lnet_ni *local_ni) { - struct lnet_ni *local_ni; - - local_ni = msg->msg_txni; - /* the lnet_net_lock(0) is used to protect the addref on the ni * and the recovery queue. */ @@ -652,7 +648,7 @@ case LNET_MSG_STATUS_LOCAL_ABORTED: case LNET_MSG_STATUS_LOCAL_NO_ROUTE: case LNET_MSG_STATUS_LOCAL_TIMEOUT: - lnet_handle_local_failure(msg); + lnet_handle_local_failure(msg->msg_txni); /* add to the re-send queue */ goto resend; @@ -660,7 +656,7 @@ * finalize the message */ case LNET_MSG_STATUS_LOCAL_ERROR: - lnet_handle_local_failure(msg); + lnet_handle_local_failure(msg->msg_txni); return -1; /* TODO: since the remote dropped the message we can From patchwork Thu Feb 27 21:13:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410343 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B860192A for ; Thu, 27 Feb 2020 21:35:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org 
(Postfix) with ESMTPS id A122924677 for ; Thu, 27 Feb 2020 21:35:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A122924677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3CCDA348E97; Thu, 27 Feb 2020 13:29:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0801C21FCE2 for ; Thu, 27 Feb 2020 13:20:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5072C8A8A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4F49D47C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:28 -0500 Message-Id: <1582838290-17243-341-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 340/622] lnet: Cleanup rcd X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Cleanup all code pertaining to rcd, as routing code will use discovery going forward and there will be no need to keep its own pinging code. test_215 looks at the routers file which had its format changed. Update the test to reflect the change. WC-bug-id: https://jira.whamcloud.com/browse/LU-11299 Lustre-commit: 9ee453928ab8 ("LU-11299 lnet: Cleanup rcd") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33187 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 4 - include/linux/lnet/lib-types.h | 40 +------ net/lnet/lnet/api-ni.c | 24 +++- net/lnet/lnet/lib-move.c | 11 -- net/lnet/lnet/router.c | 255 ----------------------------------------- net/lnet/lnet/router_proc.c | 66 ++--------- 6 files changed, 31 insertions(+), 369 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 0007adf..8730670 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -748,11 +748,7 @@ int lnet_sock_connect(struct socket **sockp, int *fatal, bool lnet_router_checker_active(void); void lnet_check_routers(void); -int lnet_router_pre_mt_start(void); void lnet_router_post_mt_start(void); -void lnet_prune_rc_data(int wait_unlink); -void lnet_router_cleanup(void); -void lnet_router_ni_update_locked(struct lnet_peer_ni *gw, u32 net); void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf); int lnet_ping_info_validate(struct lnet_ping_info *pinfo); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 9a09fad..495e805 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -509,20 +509,6 @@ struct lnet_ping_buffer { #define LNET_PING_INFO_TO_BUFFER(PINFO) \ 
container_of((PINFO), struct lnet_ping_buffer, pb_info) -/* router checker data, per router */ -struct lnet_rc_data { - /* chain on the_lnet.ln_zombie_rcd or ln_deathrow_rcd */ - struct list_head rcd_list; - /* ping buffer MD */ - struct lnet_handle_md rcd_mdh; - /* reference to gateway */ - struct lnet_peer_ni *rcd_gateway; - /* ping buffer */ - struct lnet_ping_buffer *rcd_pingbuffer; - /* desired size of buffer */ - int rcd_nnis; -}; - struct lnet_peer_ni { /* chain on lpn_peer_nis */ struct list_head lpni_peer_nis; @@ -553,22 +539,8 @@ struct lnet_peer_ni { int lpni_rtrcredits; /* low water mark */ int lpni_minrtrcredits; - /* notification outstanding? */ - bool lpni_notify; - /* outstanding notification for LND? */ - bool lpni_notifylnd; - /* some thread is handling notification */ - bool lpni_notifying; - /* # times router went dead<->alive */ - int lpni_alive_count; - /* ytes queued for sending */ + /* bytes queued for sending */ long lpni_txqnob; - /* time of last aliveness news */ - time64_t lpni_timestamp; - /* when I was last alive */ - time64_t lpni_last_alive; - /* when lpni_ni was queried last time */ - time64_t lpni_last_query; /* network peer is on */ struct lnet_net *lpni_net; /* peer's NID */ @@ -598,8 +570,6 @@ struct lnet_peer_ni { } lpni_pref; /* number of preferred NIDs in lnpi_pref_nids */ u32 lpni_pref_nnids; - /* router checker state */ - struct lnet_rc_data *lpni_rcd; }; /* Preferred path added due to traffic on non-MR peer_ni */ @@ -823,8 +793,6 @@ struct lnet_route { u32 lr_lnet; /* sequence for round-robin */ int lr_seq; - /* number of down NIs */ - unsigned int lr_downis; /* how far I am */ u32 lr_hops; /* route priority */ @@ -1115,12 +1083,6 @@ struct lnet { /* monitor thread startup/shutdown state */ enum lnet_rc_state ln_mt_state; - /* router checker's event queue */ - struct lnet_handle_eq ln_rc_eqh; - /* rcd still pending on net */ - struct list_head ln_rcd_deathrow; - /* rcd ready for free */ - struct list_head ln_rcd_zombie; /* 
serialise startup/shutdown */ struct completion ln_mt_signal; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index d27e9a4..32b4b4f 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1457,6 +1457,27 @@ struct lnet_ping_buffer * return count; } +void +lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf) +{ + struct lnet_ni_status *stat; + int nnis; + int i; + + __swab32s(&pbuf->pb_info.pi_magic); + __swab32s(&pbuf->pb_info.pi_features); + __swab32s(&pbuf->pb_info.pi_pid); + __swab32s(&pbuf->pb_info.pi_nnis); + nnis = pbuf->pb_info.pi_nnis; + if (nnis > pbuf->pb_nnis) + nnis = pbuf->pb_nnis; + for (i = 0; i < nnis; i++) { + stat = &pbuf->pb_info.pi_ni[i]; + __swab64s(&stat->ns_nid); + __swab32s(&stat->ns_status); + } +} + int lnet_ping_info_validate(struct lnet_ping_info *pinfo) { @@ -2362,12 +2383,9 @@ int lnet_lib_init(void) } the_lnet.ln_refcount = 0; - LNetInvalidateEQHandle(&the_lnet.ln_rc_eqh); INIT_LIST_HEAD(&the_lnet.ln_lnds); INIT_LIST_HEAD(&the_lnet.ln_net_zombie); - INIT_LIST_HEAD(&the_lnet.ln_rcd_zombie); INIT_LIST_HEAD(&the_lnet.ln_msg_resend); - INIT_LIST_HEAD(&the_lnet.ln_rcd_deathrow); /* * The hash table size is the number of bits it takes to express the set diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index af3cd1e..2e2299d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3151,9 +3151,6 @@ struct lnet_mt_event_info { false, HZ * interval); } - /* clean up the router checker */ - lnet_prune_rc_data(1); - /* Shutting down */ lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; @@ -3364,11 +3361,6 @@ int lnet_monitor_thr_start(void) if (rc) goto clean_queues; - /* Pre monitor thread start processing */ - rc = lnet_router_pre_mt_start(); - if (rc) - goto free_mem; - init_completion(&the_lnet.ln_mt_signal); lnet_net_lock(LNET_LOCK_EX); @@ -3393,8 +3385,6 @@ int lnet_monitor_thr_start(void) /* block until event callback signals exit */ 
wait_for_completion(&the_lnet.ln_mt_signal); /* clean up */ - lnet_router_cleanup(); -free_mem: lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_mt_state = LNET_MT_STATE_SHUTDOWN; lnet_net_unlock(LNET_LOCK_EX); @@ -3430,7 +3420,6 @@ void lnet_monitor_thr_stop(void) LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN); /* perform cleanup tasks */ - lnet_router_cleanup(); lnet_rsp_tracker_clean(); lnet_clean_local_ni_recoveryq(); lnet_clean_peer_ni_recoveryq(); diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index bb92759..1399545 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -220,101 +220,6 @@ bool lnet_is_route_alive(struct lnet_route *route) return route_alive; } -void -lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive, - time64_t when) -{ - if (lp->lpni_timestamp > when) { /* out of date information */ - CDEBUG(D_NET, "Out of date\n"); - return; - } - - /* - * This function can be called with different cpt locks being - * held. lpni_alive_count modification needs to be properly protected. - * Significant reads to lpni_alive_count are also protected with - * the same lock - */ - spin_lock(&lp->lpni_lock); - - lp->lpni_timestamp = when; /* update timestamp */ - - /* got old news */ - if (lp->lpni_alive_count != 0 && - /* new date for old news */ - (!lnet_is_peer_ni_alive(lp)) == !alive) { - spin_unlock(&lp->lpni_lock); - CDEBUG(D_NET, "Old news\n"); - return; - } - - /* Flag that notification is outstanding */ - - lp->lpni_alive_count++; - lp->lpni_notify = 1; - lp->lpni_notifylnd = notifylnd; - if (lnet_is_peer_ni_alive(lp)) - lp->lpni_ping_feats = LNET_PING_FEAT_INVAL; /* reset */ - - spin_unlock(&lp->lpni_lock); - - CDEBUG(D_NET, "set %s %d\n", libcfs_nid2str(lp->lpni_nid), alive); -} - -/* - * This function will always be called with lp->lpni_cpt lock held. 
- */ -static void -lnet_ni_notify_locked(struct lnet_ni *ni, struct lnet_peer_ni *lp) -{ - int alive; - int notifylnd; - - /* - * Notify only in 1 thread at any time to ensure ordered notification. - * NB individual events can be missed; the only guarantee is that you - * always get the most recent news - */ - spin_lock(&lp->lpni_lock); - - if (lp->lpni_notifying || !ni) { - spin_unlock(&lp->lpni_lock); - return; - } - - lp->lpni_notifying = 1; - - /* - * lp->lpni_notify needs to be protected because it can be set in - * lnet_notify_locked(). - */ - while (lp->lpni_notify) { - alive = lnet_is_peer_ni_alive(lp); - notifylnd = lp->lpni_notifylnd; - - lp->lpni_notifylnd = 0; - lp->lpni_notify = 0; - - if (notifylnd && ni->ni_net->net_lnd->lnd_notify) { - spin_unlock(&lp->lpni_lock); - lnet_net_unlock(lp->lpni_cpt); - - /* - * A new notification could happen now; I'll handle it - * when control returns to me - */ - ni->ni_net->net_lnd->lnd_notify(ni, lp->lpni_nid, - alive); - - lnet_net_lock(lp->lpni_cpt); - spin_lock(&lp->lpni_lock); - } - } - - lp->lpni_notifying = 0; - spin_unlock(&lp->lpni_lock); -} - static void lnet_rtr_addref_locked(struct lnet_peer *lp) { @@ -721,93 +626,6 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) return -ENOENT; } -void -lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf) -{ - struct lnet_ni_status *stat; - int nnis; - int i; - - __swab32s(&pbuf->pb_info.pi_magic); - __swab32s(&pbuf->pb_info.pi_features); - __swab32s(&pbuf->pb_info.pi_pid); - __swab32s(&pbuf->pb_info.pi_nnis); - nnis = pbuf->pb_info.pi_nnis; - if (nnis > pbuf->pb_nnis) - nnis = pbuf->pb_nnis; - for (i = 0; i < nnis; i++) { - stat = &pbuf->pb_info.pi_ni[i]; - __swab64s(&stat->ns_nid); - __swab32s(&stat->ns_status); - } -} - -/** - * TODO: re-implement - */ -static void -lnet_parse_rc_info(struct lnet_rc_data *rcd) -{ - rcd = rcd; -} - -static void -lnet_router_checker_event(struct lnet_event *event) -{ - struct lnet_rc_data *rcd = 
event->md.user_ptr; - struct lnet_peer_ni *lp; - - LASSERT(rcd); - - if (event->unlinked) { - LNetInvalidateMDHandle(&rcd->rcd_mdh); - return; - } - - LASSERT(event->type == LNET_EVENT_SEND || - event->type == LNET_EVENT_REPLY); - - lp = rcd->rcd_gateway; - LASSERT(lp); - - /* - * NB: it's called with holding lnet_res_lock, we have a few - * places need to hold both locks at the same time, please take - * care of lock ordering - */ - lnet_net_lock(lp->lpni_cpt); - if (!lnet_isrouter(lp) || lp->lpni_rcd != rcd) { - /* ignore if no longer a router or rcd is replaced */ - goto out; - } - - if (event->type == LNET_EVENT_SEND) { - if (!event->status) - goto out; - } - - /* LNET_EVENT_REPLY */ - /* - * A successful REPLY means the router is up. If _any_ comms - * to the router fail I assume it's down (this will happen if - * we ping alive routers to try to detect router death before - * apps get burned). - */ - lnet_notify_locked(lp, 1, !event->status, ktime_get_seconds()); - - /* - * The router checker will wake up very shortly and do the - * actual notification. - * XXX If 'lp' stops being a router before then, it will still - * have the notification pending!!! 
- */ - if (avoid_asym_router_failure && !event->status) - lnet_parse_rc_info(rcd); - -out: - lnet_net_unlock(lp->lpni_cpt); -} - static void lnet_wait_known_routerstate(void) { @@ -840,26 +658,6 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } } -/* TODO: reimplement */ -void -lnet_router_ni_update_locked(struct lnet_peer_ni *gw, u32 net) -{ - struct lnet_route *rte; - struct lnet_peer *lp; - - if ((gw->lpni_ping_feats & LNET_PING_FEAT_NI_STATUS)) - lp = gw->lpni_peer_net->lpn_peer; - else - return; - - list_for_each_entry(rte, &lp->lp_routes, lr_gwlist) { - if (rte->lr_net == net) { - rte->lr_downis = 0; - break; - } - } -} - static void lnet_update_ni_status_locked(void) { @@ -902,25 +700,6 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } } -int lnet_router_pre_mt_start(void) -{ - int rc; - - if (check_routers_before_use && - dead_router_check_interval <= 0) { - LCONSOLE_ERROR_MSG(0x10a, "'dead_router_check_interval' must be set if 'check_routers_before_use' is set\n"); - return -EINVAL; - } - - rc = LNetEQAlloc(0, lnet_router_checker_event, &the_lnet.ln_rc_eqh); - if (rc) { - CERROR("Can't allocate EQ(0): %d\n", rc); - return -ENOMEM; - } - - return 0; -} - void lnet_router_post_mt_start(void) { if (check_routers_before_use) { @@ -933,19 +712,6 @@ void lnet_router_post_mt_start(void) } } -void lnet_router_cleanup(void) -{ - int rc; - - rc = LNetEQFree(the_lnet.ln_rc_eqh); - LASSERT(rc == 0); -} - -void lnet_prune_rc_data(int wait_unlink) -{ - wait_unlink = wait_unlink; -} - /* * This function is called from the monitor thread to check if there are * any active routers that need to be checked. 
@@ -962,11 +728,6 @@ bool lnet_router_checker_active(void) if (the_lnet.ln_routing) return true; - /* if there are routers that need to be cleaned up then do so */ - if (!list_empty(&the_lnet.ln_rcd_deathrow) || - !list_empty(&the_lnet.ln_rcd_zombie)) - return true; - return !list_empty(&the_lnet.ln_routers) && (live_router_check_interval > 0 || dead_router_check_interval > 0); @@ -997,8 +758,6 @@ bool lnet_router_checker_active(void) lnet_update_ni_status_locked(); lnet_net_unlock(cpt); - - lnet_prune_rc_data(0); /* don't wait for UNLINK */ } void @@ -1503,20 +1262,6 @@ bool lnet_router_checker_active(void) lnet_net_lock(cpt); } - /* - * We can't fully trust LND on reporting exact peer last_alive - * if he notifies us about dead peer. For example ksocklnd can - * call us with when == _time_when_the_node_was_booted_ if - * no connections were successfully established - */ - if (ni && !alive && when < lp->lpni_last_alive) - when = lp->lpni_last_alive; - - lnet_notify_locked(lp, !ni, alive, when); - - if (ni) - lnet_ni_notify_locked(ni, lp); - lnet_peer_ni_decref_locked(lp); lnet_net_unlock(cpt); diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index 3120533..e494d19 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -215,7 +215,6 @@ static int proc_lnet_routes(struct ctl_table *table, int write, u32 net = rnet->lrn_net; u32 hops = route->lr_hops; unsigned int priority = route->lr_priority; - lnet_nid_t nid = route->lr_gateway->lp_primary_nid; int alive = lnet_is_route_alive(route); s += snprintf(s, tmpstr + tmpsiz - s, @@ -223,7 +222,8 @@ static int proc_lnet_routes(struct ctl_table *table, int write, libcfs_net2str(net), hops, priority, alive ? 
"up" : "down", - libcfs_nid2str(nid)); + /* TODO: replace with actual nid */ + libcfs_nid2str(LNET_NID_ANY)); LASSERT(tmpstr + tmpsiz - s > 0); } @@ -278,10 +278,8 @@ static int proc_lnet_routers(struct ctl_table *table, int write, if (!*ppos) { s += snprintf(s, tmpstr + tmpsiz - s, - "%-4s %7s %9s %6s %12s %9s %8s %7s %s\n", - "ref", "rtr_ref", "alive_cnt", "state", - "last_ping", "ping_sent", "deadline", - "down_ni", "router"); + "%-4s %7s %5s %s\n", + "ref", "rtr_ref", "alive", "router"); LASSERT(tmpstr + tmpsiz - s > 0); lnet_net_lock(0); @@ -319,48 +317,15 @@ static int proc_lnet_routers(struct ctl_table *table, int write, if (peer) { lnet_nid_t nid = peer->lp_primary_nid; - time64_t now = ktime_get_seconds(); - /* TODO: readjust what's being printed */ - time64_t deadline = 0; int nrefs = atomic_read(&peer->lp_refcount); int nrtrrefs = peer->lp_rtr_refcount; - int alive_cnt = 0; int alive = lnet_is_gateway_alive(peer); - int pingsent = ((peer->lp_state & LNET_PEER_PING_SENT) - != 0); - time64_t last_ping = now - peer->lp_rtrcheck_timestamp; - int down_ni = 0; - struct lnet_route *rtr; - - if (nrtrrefs > 0) { - list_for_each_entry(rtr, &peer->lp_routes, - lr_gwlist) { - /* - * downis on any route should be the - * number of downis on the gateway - */ - if (rtr->lr_downis) { - down_ni = rtr->lr_downis; - break; - } - } - } - if (!deadline) - s += snprintf(s, tmpstr + tmpsiz - s, - "%-4d %7d %9d %6s %12llu %9d %8s %7d %s\n", - nrefs, nrtrrefs, alive_cnt, - alive ? "up" : "down", last_ping, - pingsent, "NA", down_ni, - libcfs_nid2str(nid)); - else - s += snprintf(s, tmpstr + tmpsiz - s, - "%-4d %7d %9d %6s %12llu %9d %8llu %7d %s\n", - nrefs, nrtrrefs, alive_cnt, - alive ? "up" : "down", last_ping, - pingsent, deadline - now, - down_ni, libcfs_nid2str(nid)); - LASSERT(tmpstr + tmpsiz - s > 0); + s += snprintf(s, tmpstr + tmpsiz - s, + "%-4d %7d %5s %s\n", + nrefs, nrtrrefs, + alive ? 
"up" : "down", + libcfs_nid2str(nid)); } lnet_net_unlock(0); @@ -532,19 +497,6 @@ static int proc_lnet_peers(struct ctl_table *table, int write, aliveness = lnet_is_peer_ni_alive(peer) ? "up" : "down"; - if (lnet_peer_aliveness_enabled(peer)) { - time64_t now = ktime_get_seconds(); - - lastalive = now - peer->lpni_last_alive; - - /* No need to mess up peers contents with - * arbitrarily long integers - it suffices to - * know that lastalive is more than 10000s old - */ - if (lastalive >= 10000) - lastalive = 9999; - } - lnet_net_unlock(cpt); s += snprintf(s, tmpstr + tmpsiz - s, From patchwork Thu Feb 27 21:13:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410347 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CB29F92A for ; Thu, 27 Feb 2020 21:35:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B3C9924677 for ; Thu, 27 Feb 2020 21:35:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B3C9924677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 90763348874; Thu, 27 Feb 2020 13:29:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5E74321FCE2 for ; Thu, 27 Feb 
2020 13:20:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 534A28A8B; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 52248468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:29 -0500 Message-Id: <1582838290-17243-342-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 341/622] lnet: modify lnd notification mechanism X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata The LND notifies LNet when a peer is up or down. If the LND notifies LNet that the peer is up and sets the "reset" flag to true, this indicates to LNet that the LND knows about the health of the peer and is telling LNet that the peer is fully healthy. In that case LNet sets the health value of the peer to maximum; otherwise it increments the health by one. If the LND notifies LNet that the peer is down, LNet decrements the health of the peer by the configured sensitivity value. LNet then rechecks the peer's aliveness and, if the peer is dead, notifies the LND. This code is only used by the socklnd because it needs to tear down connections. This is in keeping with the original functionality.
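As a rough model of the health update described in the commit message (not the code added by this patch), the three cases can be written as a pure function in standalone C. The name health_on_notify() is hypothetical, MAX_HEALTH_VALUE is assumed to be 1000 in place of LNET_MAX_HEALTH_VALUE, and clamping the result at zero is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>

/* Assumption for this sketch; the kernel defines LNET_MAX_HEALTH_VALUE. */
#define MAX_HEALTH_VALUE 1000

/*
 * Hypothetical model of the update applied when the LND reports on a peer:
 *   alive && reset  -> health jumps to the maximum
 *   alive && !reset -> health grows by one, capped at the maximum
 *   !alive          -> health drops by the sensitivity value
 */
static int health_on_notify(int healthv, bool alive, bool reset,
			    int sensitivity)
{
	if (alive) {
		if (reset)
			return MAX_HEALTH_VALUE;
		return healthv < MAX_HEALTH_VALUE ? healthv + 1 : healthv;
	}

	healthv -= sensitivity;
	/* Assumed lower bound: never report a negative health value. */
	return healthv < 0 ? 0 : healthv;
}
```

After a "down" update the caller would recheck aliveness against the sensitivity threshold and only then tell the LND to tear down connections, which matches the socklnd-only use mentioned above.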
WC-bug-id: https://jira.whamcloud.com/browse/LU-11299 Lustre-commit: b34e754c1a0b ("LU-11299 lnet: modify lnd notification mechanism") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33453 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 8 ++++- include/linux/lnet/lib-types.h | 4 +-- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 2 +- net/lnet/klnds/socklnd/socklnd.c | 21 ++++++------- net/lnet/klnds/socklnd/socklnd.h | 2 +- net/lnet/lnet/api-ni.c | 2 +- net/lnet/lnet/router.c | 60 +++++++++++++++++++++++++------------ 7 files changed, 62 insertions(+), 37 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 8730670..94918d3 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -506,7 +506,7 @@ struct lnet_ni * void lnet_mt_event_handler(struct lnet_event *event); -int lnet_notify(struct lnet_ni *ni, lnet_nid_t peer, int alive, +int lnet_notify(struct lnet_ni *ni, lnet_nid_t peer, bool alive, bool reset, time64_t when); void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive, time64_t when); @@ -886,6 +886,12 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, } static inline void +lnet_set_healthv(atomic_t *healthv, int value) +{ + atomic_set(healthv, value); +} + +static inline void lnet_inc_healthv(atomic_t *healthv) { atomic_add_unless(healthv, 1, LNET_MAX_HEALTH_VALUE); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 495e805..2d5ae21 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -298,8 +298,8 @@ struct lnet_lnd { int (*lnd_eager_recv)(struct lnet_ni *ni, void *private, struct lnet_msg *msg, void **new_privatep); - /* notification of peer health */ - void (*lnd_notify)(struct lnet_ni *ni, lnet_nid_t peer, int alive); + /* notification of peer down */ + void (*lnd_notify_peer_down)(lnet_nid_t peer); /* query of peer 
aliveness */ void (*lnd_query)(struct lnet_ni *ni, lnet_nid_t peer, time64_t *when); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index a3abbb6..69918cf 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1960,7 +1960,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (error) lnet_notify(peer_ni->ibp_ni, - peer_ni->ibp_nid, 0, last_alive); + peer_ni->ibp_nid, false, false, last_alive); } void diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 8b283ac..0f5c7fc 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1518,8 +1518,8 @@ struct ksock_peer * read_unlock(&ksocknal_data.ksnd_global_lock); if (notify) - lnet_notify(peer_ni->ksnp_ni, peer_ni->ksnp_id.nid, 0, - last_alive); + lnet_notify(peer_ni->ksnp_ni, peer_ni->ksnp_id.nid, + false, false, last_alive); } void @@ -1787,7 +1787,7 @@ struct ksock_peer * } void -ksocknal_notify(struct lnet_ni *ni, lnet_nid_t gw_nid, int alive) +ksocknal_notify_gw_down(lnet_nid_t gw_nid) { /* * The router is telling me she's been notified of a change in @@ -1798,17 +1798,14 @@ struct ksock_peer * id.nid = gw_nid; id.pid = LNET_PID_ANY; - CDEBUG(D_NET, "gw %s %s\n", libcfs_nid2str(gw_nid), - alive ? "up" : "down"); + CDEBUG(D_NET, "gw %s down\n", libcfs_nid2str(gw_nid)); - if (!alive) { - /* If the gateway crashed, close all open connections... */ - ksocknal_close_matching_conns(id, 0); - return; - } + /* If the gateway crashed, close all open connections... */ + ksocknal_close_matching_conns(id, 0); + return; /* - * ...otherwise do nothing. We can only establish new connections + * We can only establish new connections * if we have autroutes, and these connect on demand. 
*/ } @@ -2839,7 +2836,7 @@ static int __init ksocklnd_init(void) the_ksocklnd.lnd_ctl = ksocknal_ctl; the_ksocklnd.lnd_send = ksocknal_send; the_ksocklnd.lnd_recv = ksocknal_recv; - the_ksocklnd.lnd_notify = ksocknal_notify; + the_ksocklnd.lnd_notify_peer_down = ksocknal_notify_gw_down; the_ksocklnd.lnd_query = ksocknal_query; the_ksocklnd.lnd_accept = ksocknal_accept; diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 2e292f0..80c2e19 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -659,7 +659,7 @@ int ksocknal_launch_packet(struct lnet_ni *ni, struct ksock_tx *tx, void ksocknal_next_tx_carrier(struct ksock_conn *conn); void ksocknal_queue_tx_locked(struct ksock_tx *tx, struct ksock_conn *conn); void ksocknal_txlist_done(struct lnet_ni *ni, struct list_head *txlist, int error); -void ksocknal_notify(struct lnet_ni *ni, lnet_nid_t gw_nid, int alive); +void ksocknal_notify(lnet_nid_t gw_nid); void ksocknal_query(struct lnet_ni *ni, lnet_nid_t nid, time64_t *when); int ksocknal_thread_start(int (*fn)(void *arg), void *arg, char *name); void ksocknal_thread_fini(void); diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 32b4b4f..4dc9514 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3767,7 +3767,7 @@ u32 lnet_get_dlc_seq_locked(void) * that deadline to the wall clock. 
*/ deadline += ktime_get_seconds(); - return lnet_notify(NULL, data->ioc_nid, data->ioc_flags, + return lnet_notify(NULL, data->ioc_nid, data->ioc_flags, false, deadline); } diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 1399545..22a3018 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1199,12 +1199,26 @@ bool lnet_router_checker_active(void) lnet_rtrpools_free(1); } +static inline void +lnet_notify_peer_down(struct lnet_ni *ni, lnet_nid_t nid) +{ + if (ni->ni_net->net_lnd->lnd_notify_peer_down) + ni->ni_net->net_lnd->lnd_notify_peer_down(nid); +} + +/* ni: local NI used to communicate with the peer + * nid: peer NID + * alive: true if peer is alive, false otherwise + * reset: reset health value. This is requested by the LND. + * when: notificaiton time. + */ int -lnet_notify(struct lnet_ni *ni, lnet_nid_t nid, int alive, time64_t when) +lnet_notify(struct lnet_ni *ni, lnet_nid_t nid, bool alive, bool reset, + time64_t when) { - struct lnet_peer_ni *lp = NULL; + struct lnet_peer_ni *lpni = NULL; time64_t now = ktime_get_seconds(); - int cpt = lnet_cpt_of_nid(nid, ni); + int cpt; LASSERT(!in_interrupt()); @@ -1235,36 +1249,44 @@ bool lnet_router_checker_active(void) return 0; } - lnet_net_lock(cpt); + /* must lock 0 since this is used for synchronization */ + lnet_net_lock(0); if (the_lnet.ln_state != LNET_STATE_RUNNING) { - lnet_net_unlock(cpt); + lnet_net_unlock(0); return -ESHUTDOWN; } - lp = lnet_find_peer_ni_locked(nid); - if (!lp) { + lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { /* nid not found */ - lnet_net_unlock(cpt); + lnet_net_unlock(0); CDEBUG(D_NET, "%s not found\n", libcfs_nid2str(nid)); return 0; } - /* - * It is possible for this function to be called for the same peer - * but with different NIs. We want to synchronize the notification - * between the different calls. So we will use the lpni_cpt to - * grab the net lock. 
- */ - if (lp->lpni_cpt != cpt) { - lnet_net_unlock(cpt); - cpt = lp->lpni_cpt; - lnet_net_lock(cpt); + if (alive) { + if (reset) + lnet_set_healthv(&lpni->lpni_healthv, + LNET_MAX_HEALTH_VALUE); + else + lnet_inc_healthv(&lpni->lpni_healthv); + } else { + lnet_handle_remote_failure_locked(lpni); } - lnet_peer_ni_decref_locked(lp); + /* recalculate aliveness */ + alive = lnet_is_peer_ni_alive(lpni); + lnet_net_unlock(0); + if (ni && !alive) + lnet_notify_peer_down(ni, lpni->lpni_nid); + + cpt = lpni->lpni_cpt; + lnet_net_lock(cpt); + lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); + return 0; } EXPORT_SYMBOL(lnet_notify); From patchwork Thu Feb 27 21:13:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410247 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 53C44138D for ; Thu, 27 Feb 2020 21:33:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 393D224677 for ; Thu, 27 Feb 2020 21:33:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 393D224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E0BFC349D0B; Thu, 27 Feb 2020 13:28:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id B5C8421FCE2 for ; Thu, 27 Feb 2020 13:20:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 568848A8C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5533C46A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:30 -0500 Message-Id: <1582838290-17243-343-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 342/622] lnet: use discovery for routing X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Instead of re-inventing the wheel, routing now uses discovery. Each router is rediscovered once every alive_router_check_interval seconds. This updates the router information locally and also lets the router know that the peer is alive.
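The periodic rediscovery can be sketched as a small standalone model of the decision made for each router. This is only an illustration under stated assumptions: struct router_model, router_due_for_discovery() and router_discovery_done() are hypothetical names, RTR_DISCOVERY mirrors the LNET_PEER_RTR_DISCOVERY bit added by this patch, and the sketch folds the timestamp update into the completion step for brevity, whereas the patch updates lp_rtrcheck_timestamp when the discovery request is queued successfully.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors LNET_PEER_RTR_DISCOVERY (BIT(16)) from the patch. */
#define RTR_DISCOVERY (1u << 16)

/* Hypothetical, simplified view of a router peer. */
struct router_model {
	unsigned int state;	/* state flags */
	int64_t last_check;	/* time of the last successful check, seconds */
};

/*
 * Return true if a discovery should be started now. A router is skipped
 * if the check interval has not elapsed yet, or if a discovery is already
 * in flight. When true is returned, the in-flight flag has been set.
 */
static bool router_due_for_discovery(struct router_model *r, int64_t now,
				     int interval)
{
	assert(r != NULL);

	if (now - r->last_check < interval)
		return false;
	if (r->state & RTR_DISCOVERY)
		return false;

	r->state |= RTR_DISCOVERY;
	return true;
}

/* Clear the in-flight flag; record the check time only on success. */
static void router_discovery_done(struct router_model *r, int64_t now,
				  bool ok)
{
	assert(r != NULL);

	r->state &= ~RTR_DISCOVERY;
	if (ok)
		r->last_check = now;
}
```

A failed discovery leaves last_check untouched in this model, so the router is retried on the next pass rather than after a full interval.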
WC-bug-id: https://jira.whamcloud.com/browse/LU-11299 Lustre-commit: 146580754295 ("LU-11299 lnet: use discovery for routing") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33454 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 9 ++- include/linux/lnet/lib-types.h | 5 ++ net/lnet/lnet/api-ni.c | 19 +++--- net/lnet/lnet/lib-move.c | 10 ++- net/lnet/lnet/peer.c | 41 ++++++++++++- net/lnet/lnet/router.c | 134 +++++++++++++++++++++++++++++++++++------ net/lnet/lnet/router_proc.c | 3 +- 7 files changed, 186 insertions(+), 35 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 94918d3..1d06263 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -499,6 +499,7 @@ struct lnet_ni * extern unsigned int lnet_peer_discovery_disabled; extern unsigned int lnet_drop_asym_route; extern unsigned int router_sensitivity_percentage; +extern int alive_router_check_interval; extern int portal_rotor; int lnet_lib_init(void); @@ -742,13 +743,16 @@ int lnet_sock_connect(struct socket **sockp, int *fatal, int lnet_peers_start_down(void); int lnet_peer_buffer_credits(struct lnet_net *net); +void lnet_consolidate_routes_locked(struct lnet_peer *orig_lp, + struct lnet_peer *new_lp); +void lnet_router_discovery_complete(struct lnet_peer *lp); int lnet_monitor_thr_start(void); void lnet_monitor_thr_stop(void); bool lnet_router_checker_active(void); void lnet_check_routers(void); -void lnet_router_post_mt_start(void); +void lnet_wait_router_start(void); void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf); int lnet_ping_info_validate(struct lnet_ping_info *pinfo); @@ -795,6 +799,8 @@ struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, int cpt); struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt); +struct lnet_peer_ni *lnet_peer_get_ni_locked(struct 
lnet_peer *lp, + lnet_nid_t nid); struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid); struct lnet_peer *lnet_find_peer(lnet_nid_t nid); void lnet_peer_net_added(struct lnet_net *net); @@ -854,6 +860,7 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, } bool lnet_peer_is_uptodate(struct lnet_peer *lp); +bool lnet_peer_gw_discovery(struct lnet_peer *lp); static inline bool lnet_peer_needs_push(struct lnet_peer *lp) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 2d5ae21..9662c9e 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -716,6 +716,9 @@ struct lnet_peer { #define LNET_PEER_FORCE_PING BIT(13) /* Forced Ping */ #define LNET_PEER_FORCE_PUSH BIT(14) /* Forced Push */ +/* gw undergoing alive discovery */ +#define LNET_PEER_RTR_DISCOVERY BIT(16) + struct lnet_peer_net { /* chain on lp_peer_nets */ struct list_head lpn_peer_nets; @@ -787,6 +790,8 @@ struct lnet_route { struct list_head lr_gwlist; /* router node */ struct lnet_peer *lr_gateway; + /* NID used to add route */ + lnet_nid_t lr_nid; /* remote network number */ u32 lr_net; /* local network number */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 4dc9514..b1823cd 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -2533,29 +2533,32 @@ void lnet_lib_exit(void) goto err_stop_ping; } - rc = lnet_monitor_thr_start(); + rc = lnet_push_target_init(); if (rc) goto err_stop_ping; - rc = lnet_push_target_init(); - if (rc != 0) - goto err_stop_monitor_thr; - rc = lnet_peer_discovery_start(); if (rc != 0) goto err_destroy_push_target; + rc = lnet_monitor_thr_start(); + if (rc != 0) + goto err_stop_discovery_thr; + lnet_fault_init(); lnet_router_debugfs_init(); mutex_unlock(&the_lnet.ln_api_mutex); + /* wait for all routers to start */ + lnet_wait_router_start(); + return 0; +err_stop_discovery_thr: + lnet_peer_discovery_stop(); err_destroy_push_target: lnet_push_target_fini(); -err_stop_monitor_thr: 
- lnet_monitor_thr_stop(); err_stop_ping: lnet_ping_target_fini(); err_acceptor_stop: @@ -2603,9 +2606,9 @@ void lnet_lib_exit(void) lnet_fault_fini(); lnet_router_debugfs_fini(); + lnet_monitor_thr_stop(); lnet_peer_discovery_stop(); lnet_push_target_fini(); - lnet_monitor_thr_stop(); lnet_ping_target_fini(); /* Teardown fns that use my own API functions BEFORE here */ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 2e2299d..e214a95 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1748,6 +1748,13 @@ struct lnet_ni * lnet_peer_ni_addref_locked(lpni); + peer = lpni->lpni_peer_net->lpn_peer; + + if (lnet_peer_gw_discovery(peer)) { + lnet_peer_ni_decref_locked(lpni); + return 0; + } + rc = lnet_discover_peer_locked(lpni, cpt, false); if (rc) { lnet_peer_ni_decref_locked(lpni); @@ -3373,9 +3380,6 @@ int lnet_monitor_thr_start(void) goto clean_thread; } - /* post monitor thread start processing */ - lnet_router_post_mt_start(); - return 0; clean_thread: diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 8669fbb..b804d78 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -659,6 +659,24 @@ struct lnet_peer_ni * return lpni; } +struct lnet_peer_ni * +lnet_peer_get_ni_locked(struct lnet_peer *lp, lnet_nid_t nid) +{ + struct lnet_peer_net *lpn; + struct lnet_peer_ni *lpni; + + lpn = lnet_peer_get_net_locked(lp, LNET_NIDNET(nid)); + if (!lpn) + return NULL; + + list_for_each_entry(lpni, &lpn->lpn_peer_nis, lpni_peer_nis) { + if (lpni->lpni_nid == nid) + return lpni; + } + + return NULL; +} + struct lnet_peer * lnet_find_peer(lnet_nid_t nid) { @@ -1708,6 +1726,19 @@ struct lnet_peer_ni * * Peer Discovery */ +bool +lnet_peer_gw_discovery(struct lnet_peer *lp) +{ + bool rc = false; + + spin_lock(&lp->lp_lock); + if (lp->lp_state & LNET_PEER_RTR_DISCOVERY) + rc = true; + spin_unlock(&lp->lp_lock); + + return rc; +} + /* * Is a peer uptodate from the point of view of discovery? 
* @@ -1797,6 +1828,9 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp) spin_unlock(&lp->lp_lock); wake_up_all(&lp->lp_dc_waitq); + if (lp->lp_rtr_refcount > 0) + lnet_router_discovery_complete(lp); + lnet_net_unlock(LNET_LOCK_EX); /* iterate through all pending messages and send them again */ @@ -2685,8 +2719,11 @@ static int lnet_peer_data_present(struct lnet_peer *lp) rc = lnet_peer_merge_data(lp, pbuf); } } else { - rc = lnet_peer_set_primary_data( - lpni->lpni_peer_net->lpn_peer, pbuf); + struct lnet_peer *new_lp; + + new_lp = lpni->lpni_peer_net->lpn_peer; + rc = lnet_peer_set_primary_data(new_lp, pbuf); + lnet_consolidate_routes_locked(lp, new_lp); lnet_peer_ni_decref_locked(lpni); } } diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 22a3018..4a061f3 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -78,13 +78,9 @@ module_param(avoid_asym_router_failure, int, 0644); MODULE_PARM_DESC(avoid_asym_router_failure, "Avoid asymmetrical router failures (0 to disable)"); -static int dead_router_check_interval = 60; -module_param(dead_router_check_interval, int, 0644); -MODULE_PARM_DESC(dead_router_check_interval, "Seconds between dead router health checks (<= 0 to disable)"); - -static int live_router_check_interval = 60; -module_param(live_router_check_interval, int, 0644); -MODULE_PARM_DESC(live_router_check_interval, "Seconds between live router health checks (<= 0 to disable)"); +int alive_router_check_interval = 60; +module_param(alive_router_check_interval, int, 0644); +MODULE_PARM_DESC(alive_router_check_interval, "Seconds between live router health checks (<= 0 to disable)"); static int router_ping_timeout = 50; module_param(router_ping_timeout, int, 0644); @@ -220,6 +216,61 @@ bool lnet_is_route_alive(struct lnet_route *route) return route_alive; } +void +lnet_consolidate_routes_locked(struct lnet_peer *orig_lp, + struct lnet_peer *new_lp) +{ + struct lnet_peer_ni *lpni; + struct lnet_route *route; + + /* 
Although a route is correlated with a peer, but when it's added + * a specific NID is used. That NID refers to a peer_ni within + * a peer. There could be other peer_nis on the same net, which + * can be used to send to that gateway. However when we are + * consolidating gateways because of discovery, the nid used to + * add the route might've moved between gateway peers. In this + * case we want to move the route to the new gateway as well. The + * intent here is not to confuse the user who added the route. + */ + list_for_each_entry(route, &orig_lp->lp_routes, lr_gwlist) { + lpni = lnet_peer_get_ni_locked(orig_lp, route->lr_nid); + if (!lpni) { + lnet_net_lock(LNET_LOCK_EX); + list_move(&route->lr_gwlist, &new_lp->lp_routes); + lnet_net_unlock(LNET_LOCK_EX); + } + } +} + +void +lnet_router_discovery_complete(struct lnet_peer *lp) +{ + struct lnet_peer_ni *lpni = NULL; + + spin_lock(&lp->lp_lock); + lp->lp_state &= ~LNET_PEER_RTR_DISCOVERY; + spin_unlock(&lp->lp_lock); + + /* Router discovery successful? All peer information would've been + * updated already. No need to do any more processing + */ + if (!lp->lp_dc_error) + return; + /* discovery failed? then we need to set the status of each lpni + * to DOWN. It will be updated the next time we discover the + * router. For router peer NIs not on local networks, we never send + * messages directly to them, so their health will always remain + * at maximum. We can only tell if they are up or down from the + * status returned in the PING response. If we fail to get that + * status in our scheduled router discovery, then we'll assume + * it's down until we're told otherwise. 
+ */ + CDEBUG(D_NET, "%s: Router discovery failed %d\n", + libcfs_nid2str(lp->lp_primary_nid), lp->lp_dc_error); + while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) + lpni->lpni_ns_status = LNET_NI_STATUS_DOWN; +} + static void lnet_rtr_addref_locked(struct lnet_peer *lp) { @@ -368,6 +419,7 @@ static void lnet_shuffle_seed(void) /* store the local and remote net that the route represents */ route->lr_lnet = LNET_NIDNET(gateway); route->lr_net = net; + route->lr_nid = gateway; route->lr_priority = priority; route->lr_hops = hops; @@ -610,10 +662,10 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) list_for_each_entry(route, &rnet->lrn_routes, lr_list) { if (!idx--) { *net = rnet->lrn_net; + *gateway = route->lr_nid; *hops = route->lr_hops; - *priority = route->lr_priority; - *gateway = - route->lr_gateway->lp_primary_nid; + *priority = + route->lr_priority; *alive = lnet_is_route_alive(route); lnet_net_unlock(cpt); return 0; @@ -667,8 +719,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) LASSERT(the_lnet.ln_routing); - timeout = router_ping_timeout + - max(live_router_check_interval, dead_router_check_interval); + timeout = router_ping_timeout + alive_router_check_interval; now = ktime_get_real_seconds(); while ((ni = lnet_get_next_ni_locked(NULL, ni))) { @@ -700,7 +751,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } } -void lnet_router_post_mt_start(void) +void lnet_wait_router_start(void) { if (check_routers_before_use) { /* @@ -718,9 +769,6 @@ void lnet_router_post_mt_start(void) */ bool lnet_router_checker_active(void) { - if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) - return true; - /* * Router Checker thread needs to run when routing is enabled in * order to call lnet_update_ni_status_locked() @@ -729,23 +777,71 @@ bool lnet_router_checker_active(void) return true; return !list_empty(&the_lnet.ln_routers) && - (live_router_check_interval > 0 || - 
dead_router_check_interval > 0); + alive_router_check_interval > 0; } void lnet_check_routers(void) { + struct lnet_peer_ni *lpni; struct lnet_peer *rtr; u64 version; + time64_t now; int cpt; + int rc; cpt = lnet_net_lock_current(); rescan: version = the_lnet.ln_routers_version; list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) { - /* TODO use discovery to determine if router is alive */ + now = ktime_get_real_seconds(); + + /* only discover the router if we've passed + * alive_router_check_interval seconds. Some of the router + * interfaces could be down and in that case they would be + * undergoing recovery separately from this discovery. + */ + if (now - rtr->lp_rtrcheck_timestamp < + alive_router_check_interval) + continue; + + /* If we're currently discovering the peer then don't + * issue another discovery + */ + spin_lock(&rtr->lp_lock); + if (rtr->lp_state & LNET_PEER_RTR_DISCOVERY) { + spin_unlock(&rtr->lp_lock); + continue; + } + /* make sure we actively discover the router */ + rtr->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + rtr->lp_state |= LNET_PEER_RTR_DISCOVERY; + spin_unlock(&rtr->lp_lock); + + /* find the peer_ni associated with the primary NID */ + lpni = lnet_peer_get_ni_locked(rtr, rtr->lp_primary_nid); + if (!lpni) { + CDEBUG(D_NET, + "Expected to find an lpni for %s, but non found\n", + libcfs_nid2str(rtr->lp_primary_nid)); + continue; + } + lnet_peer_ni_addref_locked(lpni); + + /* discover the router */ + CDEBUG(D_NET, "discover %s, cpt = %d\n", + libcfs_nid2str(lpni->lpni_nid), cpt); + rc = lnet_discover_peer_locked(lpni, cpt, false); + + /* decrement ref count acquired by find_peer_ni_locked() */ + lnet_peer_ni_decref_locked(lpni); + + if (!rc) + rtr->lp_rtrcheck_timestamp = now; + else + CERROR("Failed to discover router %s\n", + libcfs_nid2str(rtr->lp_primary_nid)); /* NB dropped lock */ if (version != the_lnet.ln_routers_version) { diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index e494d19..9771ef0 100644 
--- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -222,8 +222,7 @@ static int proc_lnet_routes(struct ctl_table *table, int write, libcfs_net2str(net), hops, priority, alive ? "up" : "down", - /* TODO: replace with actual nid */ - libcfs_nid2str(LNET_NID_ANY)); + libcfs_nid2str(route->lr_nid)); LASSERT(tmpstr + tmpsiz - s > 0); } From patchwork Thu Feb 27 21:13:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410883 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7D9DA924 for ; Thu, 27 Feb 2020 21:49:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 65D6924690 for ; Thu, 27 Feb 2020 21:49:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 65D6924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BDE0C34A7F0; Thu, 27 Feb 2020 13:41:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1BE9F21FCE2 for ; Thu, 27 Feb 2020 13:20:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 594AB8A8D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, 
from userid 2004) id 5805646C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:31 -0500 Message-Id: <1582838290-17243-344-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 343/622] lnet: MR aware gateway selection X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When selecting a route use the Multi-Rail Selection algorithm to select the best available peer_ni of the best route. The selected peer_ni can then be used to send the message or to discover it if the gateway peer needs discovering. 
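The selection order used for a gateway's peer NIs (health first, then preference, then transmit credits, then round-robin by sequence number) can be sketched as a standalone model. The struct lpni_model type and select_best() are hypothetical names for illustration; the if/else chain is modelled on lnet_select_peer_ni() in the diff below, with the "preferred" flag supplied as plain data instead of being looked up against the best local NI.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one candidate peer NI. */
struct lpni_model {
	int healthv;	/* health value, higher is better */
	bool preferred;	/* preferred for the chosen local NI */
	int txcredits;	/* available transmit credits */
	int seq;	/* sequence number used for round-robin */
};

/*
 * Return the index of the selected candidate, or -1 if there is none.
 * Candidates are compared in this order: health, preference,
 * transmit credits, and finally the lower sequence number.
 */
static int select_best(const struct lpni_model *v, int n)
{
	int best = -1;
	int best_health = 0;
	int best_credits = INT_MIN;
	bool preferred = false;
	int i;

	assert(v != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (v[i].healthv < best_health) {
			continue;		/* less healthy: skip */
		} else if (v[i].healthv > best_health) {
			best_health = v[i].healthv;
		} else if (!preferred && v[i].preferred) {
			preferred = true;	/* first preferred one wins */
		} else if (preferred && !v[i].preferred) {
			continue;		/* keep the preferred one */
		} else if (v[i].txcredits < best_credits) {
			continue;		/* fewer credits: skip */
		} else if (v[i].txcredits == best_credits) {
			/* tie on credits: keep the lower sequence number */
			if (best >= 0 && v[best].seq <= v[i].seq)
				continue;
		}

		best = i;
		best_credits = v[i].txcredits;
	}

	return best;
}
```

With this ordering a healthier peer NI is chosen even if it has fewer credits, and sequence numbers only matter when health, preference and credits all tie.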
WC-bug-id: https://jira.whamcloud.com/browse/LU-11378 Lustre-commit: 11d8380d5ad0 ("LU-11378 lnet: MR aware gateway selection") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33188 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 353 +++++++++++++++++++++++------------------------ 1 file changed, 171 insertions(+), 182 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index e214a95..054ae48 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1117,7 +1117,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } } -#if 0 static int lnet_compare_peers(struct lnet_peer_ni *p1, struct lnet_peer_ni *p2) { @@ -1135,53 +1134,189 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return 0; } -#endif + +static struct lnet_peer_ni * +lnet_select_peer_ni(struct lnet_ni *best_ni, lnet_nid_t dst_nid, + struct lnet_peer *peer, + struct lnet_peer_net *peer_net) +{ + /* Look at the peer NIs for the destination peer that connect + * to the chosen net. If a peer_ni is preferred when using the + * best_ni to communicate, we use that one. If there is no + * preferred peer_ni, or there are multiple preferred peer_ni, + * the available transmit credits are used. If the transmit + * credits are equal, we round-robin over the peer_ni. 
+ */ + struct lnet_peer_ni *lpni = NULL; + struct lnet_peer_ni *best_lpni = NULL; + int best_lpni_credits = INT_MIN; + bool preferred = false; + bool ni_is_pref; + int best_lpni_healthv = 0; + int lpni_healthv; + + while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni))) { + /* if the best_ni we've chosen aleady has this lpni + * preferred, then let's use it + */ + if (best_ni) { + ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, + best_ni->ni_nid); + CDEBUG(D_NET, "%s ni_is_pref = %d\n", + libcfs_nid2str(best_ni->ni_nid), ni_is_pref); + } else { + ni_is_pref = false; + } + + lpni_healthv = atomic_read(&lpni->lpni_healthv); + + if (best_lpni) + CDEBUG(D_NET, "%s c:[%d, %d], s:[%d, %d]\n", + libcfs_nid2str(lpni->lpni_nid), + lpni->lpni_txcredits, best_lpni_credits, + lpni->lpni_seq, best_lpni->lpni_seq); + + /* pick the healthiest peer ni */ + if (lpni_healthv < best_lpni_healthv) { + continue; + } else if (lpni_healthv > best_lpni_healthv) { + best_lpni_healthv = lpni_healthv; + /* if this is a preferred peer use it */ + } else if (!preferred && ni_is_pref) { + preferred = true; + } else if (preferred && !ni_is_pref) { + /* this is not the preferred peer so let's ignore + * it. + */ + continue; + } else if (lpni->lpni_txcredits < best_lpni_credits) { + /* We already have a peer that has more credits + * available than this one. No need to consider + * this peer further. + */ + continue; + } else if (lpni->lpni_txcredits == best_lpni_credits) { + /* The best peer found so far and the current peer + * have the same number of available credits let's + * make sure to select between them using Round + * Robin + */ + if (best_lpni) { + if (best_lpni->lpni_seq <= lpni->lpni_seq) + continue; + } + } + + best_lpni = lpni; + best_lpni_credits = lpni->lpni_txcredits; + } + + /* if we still can't find a peer ni then we can't reach it */ + if (!best_lpni) { + u32 net_id = (peer_net) ? 
peer_net->lpn_net_id : + LNET_NIDNET(dst_nid); + CDEBUG(D_NET, "no peer_ni found on peer net %s\n", + libcfs_net2str(net_id)); + return NULL; + } + + CDEBUG(D_NET, "sd_best_lpni = %s\n", + libcfs_nid2str(best_lpni->lpni_nid)); + + return best_lpni; +} + +/* Prerequisite: the best_ni should already be set in the sd */ +static inline struct lnet_peer_ni * +lnet_find_best_lpni_on_net(struct lnet_send_data *sd, struct lnet_peer *peer, + u32 net_id) +{ + struct lnet_peer_net *peer_net; + + /* The gateway is Multi-Rail capable so now we must select the + * proper peer_ni + */ + peer_net = lnet_peer_get_net_locked(peer, net_id); + + if (!peer_net) { + CERROR("gateway peer %s has no NI on net %s\n", + libcfs_nid2str(peer->lp_primary_nid), + libcfs_net2str(net_id)); + return NULL; + } + + return lnet_select_peer_ni(sd->sd_best_ni, sd->sd_dst_nid, + peer, peer_net); +} static int -lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2) +lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2, + struct lnet_peer_ni **best_lpni) { - /* TODO re-implement gateway comparison - struct lnet_peer_ni *p1 = r1->lr_gateway; - struct lnet_peer_ni *p2 = r2->lr_gateway; - */ int r1_hops = (r1->lr_hops == LNET_UNDEFINED_HOPS) ? 1 : r1->lr_hops; int r2_hops = (r2->lr_hops == LNET_UNDEFINED_HOPS) ? 
1 : r2->lr_hops; - /*int rc;*/ + struct lnet_peer *lp1 = r1->lr_gateway; + struct lnet_peer *lp2 = r2->lr_gateway; + struct lnet_peer_ni *lpni1; + struct lnet_peer_ni *lpni2; + struct lnet_send_data sd; + int rc; + + sd.sd_best_ni = NULL; + sd.sd_dst_nid = LNET_NID_ANY; + lpni1 = lnet_find_best_lpni_on_net(&sd, lp1, r1->lr_lnet); + lpni2 = lnet_find_best_lpni_on_net(&sd, lp2, r2->lr_lnet); + LASSERT(lpni1 && lpni2); - if (r1->lr_priority < r2->lr_priority) + if (r1->lr_priority < r2->lr_priority) { + *best_lpni = lpni1; return 1; + } - if (r1->lr_priority > r2->lr_priority) + if (r1->lr_priority > r2->lr_priority) { + *best_lpni = lpni2; return -1; + } - if (r1_hops < r2_hops) + if (r1_hops < r2_hops) { + *best_lpni = lpni1; return 1; + } - if (r1_hops > r2_hops) + if (r1_hops > r2_hops) { + *best_lpni = lpni2; return -1; + } - /* - rc = lnet_compare_peers(p1, p2); - if (rc) + rc = lnet_compare_peers(lpni1, lpni2); + if (rc == 1) { + *best_lpni = lpni1; + return rc; + } else if (rc == -1) { + *best_lpni = lpni2; return rc; - */ + } - if (r1->lr_seq - r2->lr_seq <= 0) + if (r1->lr_seq - r2->lr_seq <= 0) { + *best_lpni = lpni1; return 1; + } + *best_lpni = lpni2; return -1; } -/* TODO: lnet_find_route_locked() needs to be reimplemented */ static struct lnet_route * lnet_find_route_locked(struct lnet_net *net, u32 remote_net, - lnet_nid_t rtr_nid, struct lnet_route **prev_route) + lnet_nid_t rtr_nid, struct lnet_route **prev_route, + struct lnet_peer_ni **gwni) { - struct lnet_remotenet *rnet; - struct lnet_route *route; + struct lnet_peer_ni *best_gw_ni = NULL; struct lnet_route *best_route; struct lnet_route *last_route; + struct lnet_remotenet *rnet; struct lnet_peer *lp_best; + struct lnet_route *route; struct lnet_peer *lp; int rc; @@ -1206,14 +1341,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, best_route = route; last_route = route; lp_best = lp; - continue; } /* no protection on below fields, but it's harmless */ if 
(last_route->lr_seq - route->lr_seq < 0) last_route = route; - rc = lnet_compare_routes(route, best_route); + rc = lnet_compare_routes(route, best_route, &best_gw_ni); if (rc < 0) continue; @@ -1222,6 +1356,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } *prev_route = last_route; + *gwni = best_gw_ni; return best_route; } @@ -1507,123 +1642,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return rc; } -static struct lnet_peer_ni * -lnet_select_peer_ni(struct lnet_send_data *sd, struct lnet_peer *peer, - struct lnet_peer_net *peer_net) -{ - /* - * Look at the peer NIs for the destination peer that connect - * to the chosen net. If a peer_ni is preferred when using the - * best_ni to communicate, we use that one. If there is no - * preferred peer_ni, or there are multiple preferred peer_ni, - * the available transmit credits are used. If the transmit - * credits are equal, we round-robin over the peer_ni. - */ - struct lnet_peer_ni *lpni = NULL; - struct lnet_peer_ni *best_lpni = NULL; - struct lnet_ni *best_ni = sd->sd_best_ni; - lnet_nid_t dst_nid = sd->sd_dst_nid; - int best_lpni_credits = INT_MIN; - bool preferred = false; - bool ni_is_pref; - int best_lpni_healthv = 0; - int lpni_healthv; - - while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni))) { - /* if the best_ni we've chosen aleady has this lpni - * preferred, then let's use it - */ - ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, - best_ni->ni_nid); - - lpni_healthv = atomic_read(&lpni->lpni_healthv); - - CDEBUG(D_NET, "%s ni_is_pref = %d\n", - libcfs_nid2str(best_ni->ni_nid), ni_is_pref); - - if (best_lpni) - CDEBUG(D_NET, "%s c:[%d, %d], s:[%d, %d]\n", - libcfs_nid2str(lpni->lpni_nid), - lpni->lpni_txcredits, best_lpni_credits, - lpni->lpni_seq, best_lpni->lpni_seq); - - /* pick the healthiest peer ni */ - if (lpni_healthv < best_lpni_healthv) { - continue; - } else if (lpni_healthv > best_lpni_healthv) { - 
best_lpni_healthv = lpni_healthv; - /* if this is a preferred peer use it */ - } else if (!preferred && ni_is_pref) { - preferred = true; - } else if (preferred && !ni_is_pref) { - /* - * this is not the preferred peer so let's ignore - * it. - */ - continue; - } else if (lpni->lpni_txcredits < best_lpni_credits) { - /* - * We already have a peer that has more credits - * available than this one. No need to consider - * this peer further. - */ - continue; - } else if (lpni->lpni_txcredits == best_lpni_credits) { - /* - * The best peer found so far and the current peer - * have the same number of available credits let's - * make sure to select between them using Round - * Robin - */ - if (best_lpni) { - if (best_lpni->lpni_seq <= lpni->lpni_seq) - continue; - } - } - - best_lpni = lpni; - best_lpni_credits = lpni->lpni_txcredits; - } - - /* if we still can't find a peer ni then we can't reach it */ - if (!best_lpni) { - u32 net_id = peer_net ? peer_net->lpn_net_id : - LNET_NIDNET(dst_nid); - - CDEBUG(D_NET, "no peer_ni found on peer net %s\n", - libcfs_net2str(net_id)); - return NULL; - } - - CDEBUG(D_NET, "sd_best_lpni = %s\n", - libcfs_nid2str(best_lpni->lpni_nid)); - - return best_lpni; -} - -/* Prerequisite: the best_ni should already be set in the sd - */ -static inline struct lnet_peer_ni * -lnet_find_best_lpni_on_net(struct lnet_send_data *sd, struct lnet_peer *peer, - u32 net_id) -{ - struct lnet_peer_net *peer_net; - - /* The gateway is Multi-Rail capable so now we must select the - * proper peer_ni - */ - peer_net = lnet_peer_get_net_locked(peer, net_id); - - if (!peer_net) { - CERROR("gateway peer %s has no NI on net %s\n", - libcfs_nid2str(peer->lp_primary_nid), - libcfs_net2str(net_id)); - return NULL; - } - - return lnet_select_peer_ni(sd, peer, peer_net); -} - static inline void lnet_set_non_mr_pref_nid(struct lnet_send_data *sd) { @@ -1791,29 +1809,34 @@ struct lnet_ni * lnet_nid_t src_nid = sd->sd_src_nid; best_route = lnet_find_route_locked(NULL, 
LNET_NIDNET(dst_nid), - sd->sd_rtr_nid, &last_route); + sd->sd_rtr_nid, &last_route, + &lpni); if (!best_route) { CERROR("no route to %s from %s\n", libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); return -EHOSTUNREACH; } + if (!lpni) { + CERROR("Internal Error. Route expected to %s from %s\n", + libcfs_nid2str(dst_nid), + libcfs_nid2str(src_nid)); + return -EFAULT; + } + gw = best_route->lr_gateway; - *gw_peer = gw; + LASSERT(gw == lpni->lpni_peer_net->lpn_peer); /* Discover this gateway if it hasn't already been discovered. * This means we might delay the message until discovery has * completed */ -#if 0 - /* TODO: disable discovey for now */ if (lnet_msg_discovery(sd->sd_msg) && - !lnet_peer_is_uptodate(*gw_peer)) { + !lnet_peer_is_uptodate(gw)) { sd->sd_msg->msg_src_nid_param = sd->sd_src_nid; - return lnet_initiate_peer_discovery(gw, sd->sd_msg, + return lnet_initiate_peer_discovery(lpni, sd->sd_msg, sd->sd_rtr_nid, sd->sd_cpt); } -#endif if (!sd->sd_best_ni) { struct lnet_peer_net *lpeer; @@ -1830,42 +1853,8 @@ struct lnet_ni * return -EFAULT; } - /* if gw is MR let's find its best peer_ni - */ - if (lnet_peer_is_multi_rail(gw)) { - lpni = lnet_find_best_lpni_on_net(sd, gw, - sd->sd_best_ni->ni_net->net_id); - /* We've already verified that the gw has an NI on that - * desired net, but we're not finding it. Something is - * wrong. - */ - if (!lpni) { - CERROR("Internal Error. Route expected to %s from %s\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(src_nid)); - return -EFAULT; - } - } else { - struct lnet_peer_net *lpn; - - lpn = lnet_peer_get_net_locked(gw, best_route->lr_lnet); - if (!lpn) { - CERROR("Internal Error. Route expected to %s from %s\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(src_nid)); - return -EFAULT; - } - lpni = list_entry(lpn->lpn_peer_nis.next, struct lnet_peer_ni, - lpni_peer_nis); - if (!lpni) { - CERROR("Internal Error. 
Route expected to %s from %s\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(src_nid)); - return -EFAULT; - } - } - *gw_lpni = lpni; + *gw_peer = gw; /* increment the route sequence number since now we're sure we're * going to use it From patchwork Thu Feb 27 21:13:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410637 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E4615138D for ; Thu, 27 Feb 2020 21:43:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CD8F0246A1 for ; Thu, 27 Feb 2020 21:43:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CD8F0246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B629A34ADC5; Thu, 27 Feb 2020 13:34:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7291521FF3C for ; Thu, 27 Feb 2020 13:20:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5D7C18A8E; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5ADC746D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg 
Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:32 -0500 Message-Id: <1582838290-17243-345-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 344/622] lnet: consider alive_router_check_interval X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Consider alive_router_check_interval when computing the monitor thread's wait interval, so that the monitor thread wakes up at the earliest time required. WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: 434456256f30 ("LU-11300 lnet: consider alive_router_check_interval") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33298 Reviewed-by: Olaf Weber Reviewed-by: Sebastien Buisson Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 054ae48..90b4e3f 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3142,7 +3142,8 @@ struct lnet_mt_event_info { * is waking up unnecessarily. 
*/ interval = min(lnet_recovery_interval, - lnet_transaction_timeout / 2); + min((unsigned int)alive_router_check_interval, + lnet_transaction_timeout / 2)); wait_event_interruptible_timeout(the_lnet.ln_mt_waitq, false, HZ * interval); } From patchwork Thu Feb 27 21:13:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410727 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D700D924 for ; Thu, 27 Feb 2020 21:45:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BB6AC246A5 for ; Thu, 27 Feb 2020 21:45:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB6AC246A5 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5DE6F349F21; Thu, 27 Feb 2020 13:36:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B42B221FD1F for ; Thu, 27 Feb 2020 13:20:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5F2588A8F; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5DB9847C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg 
Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:33 -0500 Message-Id: <1582838290-17243-346-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 345/622] lnet: allow deleting router primary_nid X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Discovery doesn't allow deleting a primary_nid of a peer. This is necessary because upper layers only know to reach the peer by using the primary_nid. For routers this is not the case. So if a router changes its interfaces and comes back up again, the peer_ni should be adjusted. 
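As an illustration of the rule described above, here is a minimal standalone sketch; it is not the LNet code. The toy_peer type and toy_peer_del_nid() are hypothetical names, and the peer is modelled as a plain array of NIDs. It assumes, following the comment in the patch ("assign the next peer_ni to be the primary"), that a forced deletion of the primary NID promotes the next NID on the peer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct lnet_peer. */
struct toy_peer {
	unsigned long nids[8];		/* NIDs of the peer's interfaces */
	size_t nnis;			/* number of valid entries in nids[] */
	unsigned long primary_nid;	/* NID upper layers use to reach the peer */
};

/*
 * Delete @nid from @lp. Without @force, the primary NID may only be
 * deleted when it is the peer's only NID. With @force (the router case),
 * the next NID is promoted to primary before the deletion.
 */
static int toy_peer_del_nid(struct toy_peer *lp, unsigned long nid, bool force)
{
	size_t i, idx = lp->nnis;

	for (i = 0; i < lp->nnis; i++) {
		if (lp->nids[i] == nid) {
			idx = i;
			break;
		}
	}
	if (idx == lp->nnis)
		return -ENOENT;

	if (nid == lp->primary_nid && lp->nnis != 1) {
		if (!force)
			return -EBUSY;
		/* more than one NID exists, so a successor is guaranteed */
		assert(lp->nnis > 1);
		lp->primary_nid = lp->nids[(idx + 1) % lp->nnis];
	}

	/* close the gap left by the deleted entry */
	for (i = idx; i + 1 < lp->nnis; i++)
		lp->nids[i] = lp->nids[i + 1];
	lp->nnis--;
	return 0;
}
```

In this model a caller would pass force = true only when the peer is known to be a gateway, which mirrors how the patch sets LNET_PEER_RTR_NI_FORCE_DEL when lp_rtr_refcount is greater than zero.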
WC-bug-id: https://jira.whamcloud.com/browse/LU-11475 Lustre-commit: 086962e37737 ("LU-11475 lnet: allow deleting router primary_nid") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33300 Reviewed-by: Sebastien Buisson Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 3 +++ net/lnet/lnet/peer.c | 29 ++++++++++++++++++++++------- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 9662c9e..97d35e0 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -716,6 +716,9 @@ struct lnet_peer { #define LNET_PEER_FORCE_PING BIT(13) /* Forced Ping */ #define LNET_PEER_FORCE_PUSH BIT(14) /* Forced Push */ +/* force delete even if router */ +#define LNET_PEER_RTR_NI_FORCE_DEL BIT(15) + /* gw undergoing alive discovery */ #define LNET_PEER_RTR_DISCOVERY BIT(16) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index b804d78..a81fee2 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -323,12 +323,12 @@ /* called with lnet_net_lock LNET_LOCK_EX held */ static int -lnet_peer_ni_del_locked(struct lnet_peer_ni *lpni) +lnet_peer_ni_del_locked(struct lnet_peer_ni *lpni, bool force) { struct lnet_peer_table *ptable = NULL; /* don't remove a peer_ni if it's also a gateway */ - if (lnet_isrouter(lpni)) { + if (lnet_isrouter(lpni) && !force) { CERROR("Peer NI %s is a gateway. 
Can not delete it\n", libcfs_nid2str(lpni->lpni_nid)); return -EBUSY; @@ -384,7 +384,7 @@ void lnet_peer_uninit(void) /* remove all peer_nis from the remote peer and the hash list */ list_for_each_entry_safe(lpni, tmp, &the_lnet.ln_remote_peer_ni_list, lpni_on_remote_peer_ni_list) - lnet_peer_ni_del_locked(lpni); + lnet_peer_ni_del_locked(lpni, false); lnet_peer_tables_destroy(); @@ -439,7 +439,7 @@ void lnet_peer_uninit(void) lpni = lnet_get_next_peer_ni_locked(peer, NULL, lpni); while (lpni) { lpni2 = lnet_get_next_peer_ni_locked(peer, NULL, lpni); - rc = lnet_peer_ni_del_locked(lpni); + rc = lnet_peer_ni_del_locked(lpni, false); if (rc != 0) rc2 = rc; lpni = lpni2; @@ -473,6 +473,7 @@ void lnet_peer_uninit(void) struct lnet_peer_ni *lpni; lnet_nid_t primary_nid = lp->lp_primary_nid; int rc = 0; + bool force = (flags & LNET_PEER_RTR_NI_FORCE_DEL) ? true : false; if (!(flags & LNET_PEER_CONFIGURED)) { if (lp->lp_state & LNET_PEER_CONFIGURED) { @@ -495,14 +496,21 @@ void lnet_peer_uninit(void) * This function only allows deletion of the primary NID if it * is the only NID. 
*/ - if (nid == lp->lp_primary_nid && lp->lp_nnis != 1) { + if (nid == lp->lp_primary_nid && lp->lp_nnis != 1 && !force) { rc = -EBUSY; goto out; } lnet_net_lock(LNET_LOCK_EX); - rc = lnet_peer_ni_del_locked(lpni); + if (nid == lp->lp_primary_nid && lp->lp_nnis != 1 && force) { + struct lnet_peer_ni *lpni2; + /* assign the next peer_ni to be the primary */ + lpni2 = lnet_get_next_peer_ni_locked(lp, NULL, lpni); + LASSERT(lpni2); + lp->lp_primary_nid = lpni->lpni_nid; + } + rc = lnet_peer_ni_del_locked(lpni, force); lnet_net_unlock(LNET_LOCK_EX); @@ -530,7 +538,7 @@ void lnet_peer_uninit(void) peer = lpni->lpni_peer_net->lpn_peer; if (peer->lp_primary_nid != lpni->lpni_nid) { - lnet_peer_ni_del_locked(lpni); + lnet_peer_ni_del_locked(lpni, false); continue; } /* @@ -2545,6 +2553,13 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, } for (i = 0; i < ndelnis; i++) { + /* for routers it's okay to delete the primary_nid because + * the upper layers don't really rely on it. So if we're + * being told that the router changed its primary_nid + * then it's okay to delete it. 
+ */ + if (lp->lp_rtr_refcount > 0) + flags |= LNET_PEER_RTR_NI_FORCE_DEL; rc = lnet_peer_del_nid(lp, delnis[i], flags); if (rc) { CERROR("Error deleting NID %s from peer %s: %d\n", From patchwork Thu Feb 27 21:13:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6EBF3924 for ; Thu, 27 Feb 2020 21:43:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 573BC24690 for ; Thu, 27 Feb 2020 21:43:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 573BC24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2393E34AE16; Thu, 27 Feb 2020 13:34:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C4A821FD1C for ; Thu, 27 Feb 2020 13:20:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 621338A90; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 60B42468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:34 -0500 
Message-Id: <1582838290-17243-347-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 346/622] lnet: transfer routers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When a primary NID of a peer is about to be deleted because it's being transferred to another peer, if that peer is a gateway then transfer all gateway properties to the new peer. WC-bug-id: https://jira.whamcloud.com/browse/LU-11475 Lustre-commit: cab57464e17b ("LU-11475 lnet: transfer routers") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34539 Reviewed-by: Sebastien Buisson Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ net/lnet/lnet/peer.c | 12 ++++++++++++ net/lnet/lnet/router.c | 29 +++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 1d06263..5a83e3a 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -534,6 +534,8 @@ int lnet_get_peer_list(u32 *countp, u32 *sizep, int lnet_rtrpools_enable(void); void lnet_rtrpools_disable(void); void lnet_rtrpools_free(int keep_pools); +void lnet_rtr_transfer_to_peer(struct lnet_peer *src, + struct lnet_peer *target); struct lnet_remotenet *lnet_find_rnet_locked(u32 net); int lnet_dyn_add_net(struct lnet_ioctl_config_data *conf); int lnet_dyn_del_net(u32 net); diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index a81fee2..5d13986 100644 --- 
a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1355,6 +1355,18 @@ struct lnet_peer_net * } /* If this is the primary NID, destroy the peer. */ if (lnet_peer_ni_is_primary(lpni)) { + struct lnet_peer *rtr_lp = + lpni->lpni_peer_net->lpn_peer; + int rtr_refcount = rtr_lp->lp_rtr_refcount; + + /* if we're trying to delete a router it means + * we're moving this peer NI to a new peer so must + * transfer router properties to the new peer + */ + if (rtr_refcount > 0) { + flags |= LNET_PEER_RTR_NI_FORCE_DEL; + lnet_rtr_transfer_to_peer(rtr_lp, lp); + } lnet_peer_del(lpni->lpni_peer_net->lpn_peer); lpni = lnet_peer_ni_alloc(nid); if (!lpni) { diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 4a061f3..aa8ec8c 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -136,6 +136,35 @@ static int rtr_sensitivity_set(const char *val, return 0; } +void +lnet_rtr_transfer_to_peer(struct lnet_peer *src, struct lnet_peer *target) +{ + struct lnet_route *route; + + lnet_net_lock(LNET_LOCK_EX); + target->lp_rtr_refcount += src->lp_rtr_refcount; + /* move the list of queued messages to the new peer */ + list_splice_init(&src->lp_rtrq, &target->lp_rtrq); + /* move all the routes that reference the peer */ + list_splice_init(&src->lp_routes, &target->lp_routes); + /* update all the routes to point to the new peer */ + list_for_each_entry(route, &target->lp_routes, lr_gwlist) + route->lr_gateway = target; + /* remove the old peer from the ln_routers list */ + list_del_init(&src->lp_rtr_list); + /* add the new peer to the ln_routers list */ + if (list_empty(&target->lp_rtr_list)) { + lnet_peer_addref_locked(target); + list_add_tail(&target->lp_rtr_list, &the_lnet.ln_routers); + } + /* reset the ref count on the old peer and decrement its ref count */ + src->lp_rtr_refcount = 0; + lnet_peer_decref_locked(src); + /* update the router version */ + the_lnet.ln_routers_version++; + lnet_net_unlock(LNET_LOCK_EX); +} + int lnet_peers_start_down(void) { From 
patchwork Thu Feb 27 21:13:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410887 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2231F1580 for ; Thu, 27 Feb 2020 21:49:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0ADA424690 for ; Thu, 27 Feb 2020 21:49:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0ADA424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 947E934A8B6; Thu, 27 Feb 2020 13:41:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7476321FC59 for ; Thu, 27 Feb 2020 13:20:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 653028A91; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 63A6146A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:35 -0500 Message-Id: <1582838290-17243-348-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 347/622] lnet: handle health for incoming messages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata In case of routers (as well as for the general case) it's important to update the health of the ni/lpni for incoming messages. For an lpni specifically, receiving a message from it is how we know that the lpni is up. A percentage router health is required in order to send a message to a gateway. That defaults to 100, meaning that a router interface has to be absolutely healthy in order to send to it. This matches the current behavior. So if a router interface goes down and its health drops significantly, but then it comes back up again (either we receive a message from it or we discover it and get a reply), then in order to start using that router interface again we have to boost its health all the way up to maximum. This behavior is special cased for routers. 
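The receive-side rule described above can be sketched in isolation as follows. This is an illustrative model only, not the LNet implementation: toy_lpni, toy_rx_ok() and toy_gw_usable() are hypothetical names, TOY_MAX_HEALTH is an assumed stand-in for LNET_MAX_HEALTH_VALUE, and the increment-by-one step for ordinary peers is likewise an assumption.

```c
#include <stdbool.h>

/* Assumed maximum; stands in for LNET_MAX_HEALTH_VALUE. */
#define TOY_MAX_HEALTH 1000

/* Hypothetical, simplified stand-in for struct lnet_peer_ni. */
struct toy_lpni {
	int healthv;	/* current health value, 0..TOY_MAX_HEALTH */
	bool is_router;	/* true if this peer NI belongs to a gateway */
};

/* Ordinary recovery: one step per successful event, capped at the maximum. */
static void toy_inc_health(int *healthv)
{
	if (*healthv < TOY_MAX_HEALTH)
		(*healthv)++;
}

/*
 * A message was successfully received from @lpni. If the sender is a
 * router, or the local node is itself routing, jump straight to maximum
 * health; otherwise recover gradually.
 */
static void toy_rx_ok(struct toy_lpni *lpni, bool local_is_routing)
{
	if (lpni->is_router || local_is_routing)
		lpni->healthv = TOY_MAX_HEALTH;
	else
		toy_inc_health(&lpni->healthv);
}

/* A gateway NI is usable only at or above the required health percentage. */
static bool toy_gw_usable(const struct toy_lpni *lpni, int required_pct)
{
	return lpni->healthv * 100 >= TOY_MAX_HEALTH * required_pct;
}
```

With the default requirement of 100 percent, gradual recovery would leave a gateway unusable for a long time after it comes back, which is why the router case is boosted to maximum in a single step.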
WC-bug-id: https://jira.whamcloud.com/browse/LU-11477 Lustre-commit: 18c850cb91a6 ("LU-11477 lnet: handle health for incoming messages") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33301 Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 90 +++++++++++++++++++++++++++++++++++-------------- 1 file changed, 65 insertions(+), 25 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 23c3bf4..2cbaff8a 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -598,19 +598,23 @@ { enum lnet_msg_hstatus hstatus = msg->msg_health_status; bool lo = false; + struct lnet_ni *ni; + struct lnet_peer_ni *lpni; /* if we're shutting down no point in handling health. */ if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) return -1; - LASSERT(msg->msg_txni); + LASSERT(msg->msg_tx_committed || msg->msg_rx_committed); /* if we're sending to the LOLND then the msg_txpeer will not be * set. So no need to sanity check it. */ - if (LNET_NETTYP(LNET_NIDNET(msg->msg_txni->ni_nid)) != LOLND) + if (msg->msg_tx_committed && + LNET_NETTYP(LNET_NIDNET(msg->msg_txni->ni_nid)) != LOLND) LASSERT(msg->msg_txpeer); - else + else if (msg->msg_tx_committed && + LNET_NETTYP(LNET_NIDNET(msg->msg_txni->ni_nid)) == LOLND) lo = true; if (hstatus != LNET_MSG_STATUS_OK && @@ -626,20 +630,52 @@ lnet_net_unlock(0); } + /* always prefer txni/txpeer if they message is committed for both + * directions. + */ + if (msg->msg_tx_committed) { + ni = msg->msg_txni; + lpni = msg->msg_txpeer; + } else { + ni = msg->msg_rxni; + lpni = msg->msg_rxpeer; + } + + if (!lo) + LASSERT(ni && lpni); + else + LASSERT(ni); + CDEBUG(D_NET, "health check: %s->%s: %s: %s\n", - libcfs_nid2str(msg->msg_txni->ni_nid), - (lo) ? "self" : libcfs_nid2str(msg->msg_txpeer->lpni_nid), + libcfs_nid2str(ni->ni_nid), + (lo) ? 
"self" : libcfs_nid2str(lpni->lpni_nid), lnet_msgtyp2str(msg->msg_type), lnet_health_error2str(hstatus)); switch (hstatus) { case LNET_MSG_STATUS_OK: - lnet_inc_healthv(&msg->msg_txni->ni_healthv); + /* increment the local ni health weather we successfully + * received or sent a message on it. + */ + lnet_inc_healthv(&ni->ni_healthv); /* It's possible msg_txpeer is NULL in the LOLND - * case. + * case. Only increment the peer's health if we're + * receiving a message from it. It's the only sure way to + * know that a remote interface is up. + * If this interface is part of a router, then take that + * as indication that the router is fully healthy. */ - if (msg->msg_txpeer) - lnet_inc_healthv(&msg->msg_txpeer->lpni_healthv); + if (lpni && msg->msg_rx_committed) { + /* If we're receiving a message from the router or + * I'm a router, then set that lpni's health to + * maximum so we can commence communication + */ + if (lnet_isrouter(lpni) || the_lnet.ln_routing) + lnet_set_healthv(&lpni->lpni_healthv, + LNET_MAX_HEALTH_VALUE); + else + lnet_inc_healthv(&lpni->lpni_healthv); + } /* we can finalize this message */ return -1; @@ -648,34 +684,41 @@ case LNET_MSG_STATUS_LOCAL_ABORTED: case LNET_MSG_STATUS_LOCAL_NO_ROUTE: case LNET_MSG_STATUS_LOCAL_TIMEOUT: - lnet_handle_local_failure(msg->msg_txni); - /* add to the re-send queue */ - goto resend; + lnet_handle_local_failure(ni); + if (msg->msg_tx_committed) + /* add to the re-send queue */ + goto resend; + break; /* These errors will not trigger a resend so simply * finalize the message */ case LNET_MSG_STATUS_LOCAL_ERROR: - lnet_handle_local_failure(msg->msg_txni); + lnet_handle_local_failure(ni); return -1; /* TODO: since the remote dropped the message we can * attempt a resend safely. 
*/ case LNET_MSG_STATUS_REMOTE_DROPPED: - lnet_handle_remote_failure(msg->msg_txpeer); - goto resend; + lnet_handle_remote_failure(lpni); + if (msg->msg_tx_committed) + goto resend; + break; case LNET_MSG_STATUS_REMOTE_ERROR: case LNET_MSG_STATUS_REMOTE_TIMEOUT: case LNET_MSG_STATUS_NETWORK_TIMEOUT: - lnet_handle_remote_failure(msg->msg_txpeer); + lnet_handle_remote_failure(lpni); return -1; default: LBUG(); } resend: + /* we can only resend tx_committed messages */ + LASSERT(msg->msg_tx_committed); + /* don't resend recovery messages */ if (msg->msg_recovery) { CDEBUG(D_NET, "msg %s->%s is a recovery ping. retry# %d\n", @@ -783,7 +826,7 @@ static bool lnet_is_health_check(struct lnet_msg *msg) { - bool hc; + bool hc = true; int status = msg->msg_ev.status; if ((!msg->msg_tx_committed && !msg->msg_rx_committed) || @@ -800,15 +843,12 @@ return false; } - /* perform a health check for any message committed for transmit */ - hc = msg->msg_tx_committed; - /* Check for status inconsistencies */ - if (hc && - ((!status && msg->msg_health_status != LNET_MSG_STATUS_OK) || - (status && msg->msg_health_status == LNET_MSG_STATUS_OK))) { - CERROR("Msg is in inconsistent state, don't perform health checking (%d, %d)\n", - status, msg->msg_health_status); + if ((!status && msg->msg_health_status != LNET_MSG_STATUS_OK) || + (status && msg->msg_health_status == LNET_MSG_STATUS_OK)) { + CDEBUG(D_NET, + "Msg %p is in inconsistent state, don't perform health checking (%d, %d)\n", + msg, status, msg->msg_health_status); hc = false; } From patchwork Thu Feb 27 21:13:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410309 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B82FC138D for ; Thu, 27 Feb 2020 21:34:37 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A0C1F24677 for ; Thu, 27 Feb 2020 21:34:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A0C1F24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6C5ED348DF3; Thu, 27 Feb 2020 13:29:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C9E5321FC59 for ; Thu, 27 Feb 2020 13:20:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 67F138A92; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6685146C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:36 -0500 Message-Id: <1582838290-17243-349-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 348/622] lnet: misleading discovery seqno. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata There is a sequence number used when sending discovery messages. This sequence number is intended to detect stale messages. However, it can be misleading if the peer reboots. In this case the peer's sequence number will reset. The node will think that all information being sent to it is stale, while in reality the peer might've changed configuration. There is no reliable way to know whether a peer rebooted, so we'll always assume that the messages we're receiving are valid. So we'll operate on a first-come, first-served basis. WC-bug-id: https://jira.whamcloud.com/browse/LU-11478 Lustre-commit: 42d999ed8f61 ("LU-11478 lnet: misleading discovery seqno.") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33304 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 45 +++++++-------------------------------------- 1 file changed, 7 insertions(+), 38 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 5d13986..2097a97 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1987,38 +1987,9 @@ void lnet_peer_push_event(struct lnet_event *ev) goto out; } - /* - * Check whether the Put data is stale. Stale data can just be - * dropped. - */ - if (pbuf->pb_info.pi_nnis > 1 && - lp->lp_primary_nid == pbuf->pb_info.pi_ni[1].ns_nid && - LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno) { - CDEBUG(D_NET, "Stale Push from %s: got %u have %u\n", - libcfs_nid2str(lp->lp_primary_nid), - LNET_PING_BUFFER_SEQNO(pbuf), - lp->lp_peer_seqno); - goto out; - } - - /* - * Check whether the Put data is new, in which case we clear - * the UPTODATE flag and prepare to process it.
- * - * If the Put data is current, and the peer is UPTODATE then - * we assome everything is all right and drop the data as - * stale. - */ - if (LNET_PING_BUFFER_SEQNO(pbuf) > lp->lp_peer_seqno) { - lp->lp_peer_seqno = LNET_PING_BUFFER_SEQNO(pbuf); - lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; - } else if (lp->lp_state & LNET_PEER_NIDS_UPTODATE) { - CDEBUG(D_NET, "Stale Push from %s: got %u have %u\n", - libcfs_nid2str(lp->lp_primary_nid), - LNET_PING_BUFFER_SEQNO(pbuf), - lp->lp_peer_seqno); - goto out; - } + /* always assume new data */ + lp->lp_peer_seqno = LNET_PING_BUFFER_SEQNO(pbuf); + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; /* * If there is data present that hasn't been processed yet, @@ -2302,16 +2273,14 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL && pbuf->pb_info.pi_nnis > 1 && lp->lp_primary_nid == pbuf->pb_info.pi_ni[1].ns_nid) { - if (LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno) { - CDEBUG(D_NET, "Stale Reply from %s: got %u have %u\n", + if (LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno) + CDEBUG(D_NET, + "peer %s: seq# got %u have %u. peer rebooted?\n", libcfs_nid2str(lp->lp_primary_nid), LNET_PING_BUFFER_SEQNO(pbuf), lp->lp_peer_seqno); - goto out; - } - if (LNET_PING_BUFFER_SEQNO(pbuf) > lp->lp_peer_seqno) - lp->lp_peer_seqno = LNET_PING_BUFFER_SEQNO(pbuf); + lp->lp_peer_seqno = LNET_PING_BUFFER_SEQNO(pbuf); } /* We're happy with the state of the data in the buffer. 
*/ From patchwork Thu Feb 27 21:13:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410229 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 05E31138D for ; Thu, 27 Feb 2020 21:33:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E2224246A1 for ; Thu, 27 Feb 2020 21:33:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E2224246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B09AC349B80; Thu, 27 Feb 2020 13:28:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2F5D021FC28 for ; Thu, 27 Feb 2020 13:20:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6BCB08A93; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 693A246D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:37 -0500 Message-Id: <1582838290-17243-350-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 349/622] lnet: drop all rule X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Add a rule to drop all messages arriving on a specific interface. This is useful for simulating failures on a specific router interface. WC-bug-id: https://jira.whamcloud.com/browse/LU-11470 Lustre-commit: deb31c2ffad5 ("LU-11470 lnet: drop all rule") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33305 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 3 ++- include/uapi/linux/lnet/lnetctl.h | 6 ++++++ net/lnet/lnet/lib-move.c | 2 +- net/lnet/lnet/lib-msg.c | 7 +++++-- net/lnet/lnet/net_fault.c | 28 +++++++++++++++++++++------- 5 files changed, 35 insertions(+), 11 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 5a83e3a..4dee7a9 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -663,7 +663,8 @@ void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, int lnet_fault_init(void); void lnet_fault_fini(void); -bool lnet_drop_rule_match(struct lnet_hdr *hdr, enum lnet_msg_hstatus *hstatus); +bool lnet_drop_rule_match(struct lnet_hdr *hdr, lnet_nid_t local_nid, + enum lnet_msg_hstatus *hstatus); int lnet_delay_rule_add(struct lnet_fault_attr *attr); int lnet_delay_rule_del(lnet_nid_t src, lnet_nid_t dst, bool shutdown); diff --git a/include/uapi/linux/lnet/lnetctl.h b/include/uapi/linux/lnet/lnetctl.h index 2eb9c82..bd08b4f 100644 --- a/include/uapi/linux/lnet/lnetctl.h +++ b/include/uapi/linux/lnet/lnetctl.h @@ -64,6 +64,10 @@ struct 
lnet_fault_attr { lnet_nid_t fa_src; /** destination NID of drop rule, see @dr_src for details */ lnet_nid_t fa_dst; + /** local NID. In case of router this is the NID we're receiving + * messages on + */ + lnet_nid_t fa_local_nid; /** * Portal mask to drop, -1 means all portals, for example: * fa_ptl_mask = (1 << _LDLM_CB_REQUEST_PORTAL ) | @@ -95,6 +99,8 @@ struct lnet_fault_attr { __u32 da_health_error_mask; /** randomize error generation */ bool da_random; + /** drop all messages if flag is set */ + bool da_drop_all; } drop; /** message latency simulation */ struct { diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 90b4e3f..fff9fea 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3964,7 +3964,7 @@ void lnet_monitor_thr_stop(void) } if (!list_empty(&the_lnet.ln_drop_rules) && - lnet_drop_rule_match(hdr, NULL)) { + lnet_drop_rule_match(hdr, ni->ni_nid, NULL)) { CDEBUG(D_NET, "%s, src %s, dst %s: Dropping %s to simulate silent message loss\n", libcfs_nid2str(from_nid), libcfs_nid2str(src_nid), libcfs_nid2str(dest_nid), lnet_msgtyp2str(type)); diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 2cbaff8a..8876866 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -900,11 +900,14 @@ return false; /* match only health rules */ - if (!lnet_drop_rule_match(&msg->msg_hdr, hstatus)) + if (!lnet_drop_rule_match(&msg->msg_hdr, LNET_NID_ANY, + hstatus)) return false; - CDEBUG(D_NET, "src %s, dst %s: %s simulate health error: %s\n", + CDEBUG(D_NET, + "src %s(%s)->dst %s: %s simulate health error: %s\n", libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_txni->ni_nid), libcfs_nid2str(msg->msg_hdr.dest_nid), lnet_msgtyp2str(msg->msg_type), lnet_health_error2str(*hstatus)); diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c index becb709..9f78e43 100644 --- a/net/lnet/lnet/net_fault.c +++ b/net/lnet/lnet/net_fault.c @@ -79,10 +79,12 @@ struct lnet_drop_rule { static bool
lnet_fault_attr_match(struct lnet_fault_attr *attr, lnet_nid_t src, - lnet_nid_t dst, unsigned int type, unsigned int portal) + lnet_nid_t local_nid, lnet_nid_t dst, + unsigned int type, unsigned int portal) { if (!lnet_fault_nid_match(attr->fa_src, src) || - !lnet_fault_nid_match(attr->fa_dst, dst)) + !lnet_fault_nid_match(attr->fa_dst, dst) || + !lnet_fault_nid_match(attr->fa_local_nid, local_nid)) return false; if (!(attr->fa_msg_mask & (1 << type))) @@ -340,15 +342,22 @@ struct lnet_drop_rule { */ static bool drop_rule_match(struct lnet_drop_rule *rule, lnet_nid_t src, - lnet_nid_t dst, unsigned int type, unsigned int portal, + lnet_nid_t local_nid, lnet_nid_t dst, + unsigned int type, unsigned int portal, enum lnet_msg_hstatus *hstatus) { struct lnet_fault_attr *attr = &rule->dr_attr; bool drop; - if (!lnet_fault_attr_match(attr, src, dst, type, portal)) + if (!lnet_fault_attr_match(attr, src, local_nid, dst, type, portal)) return false; + if (attr->u.drop.da_drop_all) { + CDEBUG(D_NET, "set to drop all messages\n"); + drop = true; + goto drop_matched; + } + /* if we're trying to match a health status error but it hasn't * been set in the rule, then don't match */ @@ -396,6 +405,8 @@ struct lnet_drop_rule { } } +drop_matched: + if (drop) { /* drop this message, update counters */ if (hstatus) lnet_fault_match_health(hstatus, @@ -412,7 +423,9 @@ struct lnet_drop_rule { * Check if message from @src to @dst can match any existed drop rule */ bool -lnet_drop_rule_match(struct lnet_hdr *hdr, enum lnet_msg_hstatus *hstatus) +lnet_drop_rule_match(struct lnet_hdr *hdr, + lnet_nid_t local_nid, + enum lnet_msg_hstatus *hstatus) { lnet_nid_t src = le64_to_cpu(hdr->src_nid); lnet_nid_t dst = le64_to_cpu(hdr->dest_nid); @@ -433,7 +446,7 @@ struct lnet_drop_rule { cpt = lnet_net_lock_current(); list_for_each_entry(rule, &the_lnet.ln_drop_rules, dr_link) { - drop = drop_rule_match(rule, src, dst, typ, ptl, + drop = drop_rule_match(rule, src, local_nid, dst, typ, ptl, 
hstatus); if (drop) break; @@ -524,7 +537,8 @@ struct delay_daemon_data { struct lnet_fault_attr *attr = &rule->dl_attr; bool delay; - if (!lnet_fault_attr_match(attr, src, dst, type, portal)) + if (!lnet_fault_attr_match(attr, src, LNET_NID_ANY, + dst, type, portal)) return false; /* match this rule, check delay rate now */ From patchwork Thu Feb 27 21:13:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410313 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E93B138D for ; Thu, 27 Feb 2020 21:34:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6730624677 for ; Thu, 27 Feb 2020 21:34:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6730624677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E628C349FA4; Thu, 27 Feb 2020 13:29:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88DB921FC09 for ; Thu, 27 Feb 2020 13:20:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6F1708A94; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 
6C26447C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:38 -0500 Message-Id: <1582838290-17243-351-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 350/622] lnet: handle discovery off X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When discovery is turned off locally, or when the peer either has discovery off or doesn't support MR at all, degrade discovery behavior to a standard ping. This allows routers to continue using the discovery mechanism even if it's turned off.
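The three conditions above can be sketched as a single predicate. This is a minimal, standalone illustration and not the kernel code: the SKETCH_* flag values, struct sketch_peer, and the local_discovery_off parameter are hypothetical stand-ins for the peer state bits, struct lnet_peer, and the lnet_peer_discovery_disabled setting used in the diff below.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the LNet peer state bits; the real
 * definitions are not reproduced here.
 */
#define SKETCH_PEER_MULTI_RAIL   (1u << 0)
#define SKETCH_PEER_NO_DISCOVERY (1u << 1)

/* Stand-in for struct lnet_peer, reduced to its state word. */
struct sketch_peer {
	unsigned int state;
};

/* Returns true when discovery should degrade to a standard ping:
 * discovery is off locally, the peer is not Multi-Rail capable, or
 * the peer reported discovery as disabled.
 */
static bool sketch_discovery_disabled(const struct sketch_peer *lp,
				      bool local_discovery_off)
{
	if (local_discovery_off)
		return true;
	if (!(lp->state & SKETCH_PEER_MULTI_RAIL))
		return true;
	return (lp->state & SKETCH_PEER_NO_DISCOVERY) != 0;
}
```

Only a Multi-Rail peer with discovery enabled at both ends goes through full discovery in this sketch; every other combination falls back to the ping path.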
WC-bug-id: https://jira.whamcloud.com/browse/LU-11641 Lustre-commit: f9ad0d13b092 ("LU-11641 lnet: handle discovery off") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33620 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 4 + net/lnet/lnet/lib-move.c | 21 +++-- net/lnet/lnet/peer.c | 176 ++++++++++++++++++++++++++++++------------ 3 files changed, 144 insertions(+), 57 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 4dee7a9..09adfc3 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -863,6 +863,7 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, } bool lnet_peer_is_uptodate(struct lnet_peer *lp); +bool lnet_is_discovery_disabled(struct lnet_peer *lp); bool lnet_peer_gw_discovery(struct lnet_peer *lp); static inline bool @@ -874,6 +875,9 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return true; if (lp->lp_state & LNET_PEER_NO_DISCOVERY) return false; + /* if discovery is not enabled then no need to push */ + if (lnet_peer_discovery_disabled) + return false; if (lp->lp_node_seqno < atomic_read(&the_lnet.ln_ping_target_seqno)) return true; return false; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index fff9fea..0ff1d38 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1773,6 +1773,11 @@ struct lnet_ni * return 0; } + if (!lnet_msg_discovery(msg) || lnet_peer_is_uptodate(peer)) { + lnet_peer_ni_decref_locked(lpni); + return 0; + } + rc = lnet_discover_peer_locked(lpni, cpt, false); if (rc) { lnet_peer_ni_decref_locked(lpni); @@ -1802,6 +1807,7 @@ struct lnet_ni * struct lnet_peer_ni **gw_lpni, struct lnet_peer **gw_peer) { + int rc; struct lnet_peer *gw; struct lnet_route *best_route; struct lnet_route *last_route; @@ -1831,12 +1837,11 @@ struct lnet_ni * * This means we might delay the message until discovery has * completed */ - if (lnet_msg_discovery(sd->sd_msg) && - 
!lnet_peer_is_uptodate(gw)) { - sd->sd_msg->msg_src_nid_param = sd->sd_src_nid; - return lnet_initiate_peer_discovery(lpni, sd->sd_msg, - sd->sd_rtr_nid, sd->sd_cpt); - } + sd->sd_msg->msg_src_nid_param = sd->sd_src_nid; + rc = lnet_initiate_peer_discovery(lpni, sd->sd_msg, sd->sd_rtr_nid, + sd->sd_cpt); + if (rc) + return rc; if (!sd->sd_best_ni) { struct lnet_peer_net *lpeer; @@ -2358,8 +2363,8 @@ struct lnet_ni * * trigger discovery. */ peer = lpni->lpni_peer_net->lpn_peer; - if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { - rc = lnet_initiate_peer_discovery(lpni, msg, rtr_nid, cpt); + rc = lnet_initiate_peer_discovery(lpni, msg, rtr_nid, cpt); + if (rc) { lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); return rc; diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 2097a97..41a6180 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1444,7 +1444,10 @@ struct lnet_peer_net * struct lnet_peer *lp; struct lnet_peer_net *lpn; struct lnet_peer_ni *lpni; - unsigned int flags = 0; + /* Assume peer is Multi-Rail capable and let discovery find out + * otherwise. 
+ */ + unsigned int flags = LNET_PEER_MULTI_RAIL; int rc = 0; if (nid == LNET_NID_ANY) { @@ -1742,9 +1745,34 @@ struct lnet_peer_ni * return lpni; } +bool +lnet_is_discovery_disabled_locked(struct lnet_peer *lp) +{ + if (lnet_peer_discovery_disabled) + return true; + + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL) || + (lp->lp_state & LNET_PEER_NO_DISCOVERY)) { + return true; + } + + return false; +} + /* * Peer Discovery */ +bool +lnet_is_discovery_disabled(struct lnet_peer *lp) +{ + bool rc = false; + + spin_lock(&lp->lp_lock); + rc = lnet_is_discovery_disabled_locked(lp); + spin_unlock(&lp->lp_lock); + + return rc; +} bool lnet_peer_gw_discovery(struct lnet_peer *lp) @@ -1777,13 +1805,8 @@ struct lnet_peer_ni * LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH)) { rc = false; - } else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) { - rc = true; } else if (lp->lp_state & LNET_PEER_REDISCOVER) { - if (lnet_peer_discovery_disabled) - rc = true; - else - rc = false; + rc = false; } else if (lnet_peer_needs_push(lp)) { rc = false; } else if (lp->lp_state & LNET_PEER_DISCOVERED) { @@ -2095,6 +2118,9 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) if (lnet_peer_is_uptodate(lp)) break; lnet_peer_queue_for_discovery(lp); + + if (lnet_is_discovery_disabled(lp)) + break; /* * if caller requested a non-blocking operation then * return immediately. Once discovery is complete then the @@ -2133,7 +2159,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) rc = lp->lp_dc_error; else if (!block) CDEBUG(D_NET, "non-blocking discovery\n"); - else if (!lnet_peer_is_uptodate(lp)) + else if (!lnet_peer_is_uptodate(lp) && !lnet_is_discovery_disabled(lp)) goto again; CDEBUG(D_NET, "peer %s NID %s: %d. 
%s\n", @@ -2205,6 +2231,34 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) } /* + * Only enable the multi-rail feature on the peer if both sides of + * the connection have discovery on + */ + if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) { + CDEBUG(D_NET, "Peer %s has Multi-Rail feature enabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state |= LNET_PEER_MULTI_RAIL; + } else { + CDEBUG(D_NET, "Peer %s has Multi-Rail feature disabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state &= ~LNET_PEER_MULTI_RAIL; + } + + /* The peer may have discovery disabled at its end. Set + * NO_DISCOVERY as appropriate. + */ + if ((pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) && + !lnet_peer_discovery_disabled) { + CDEBUG(D_NET, "Peer %s has discovery enabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state &= ~LNET_PEER_NO_DISCOVERY; + } else { + CDEBUG(D_NET, "Peer %s has discovery disabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state |= LNET_PEER_NO_DISCOVERY; + } + + /* * Update the MULTI_RAIL flag based on the reply. If the peer * was configured with DLC then the setting should match what * DLC put in. @@ -2216,8 +2270,16 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) CWARN("Reply says %s is Multi-Rail, DLC says not\n", libcfs_nid2str(lp->lp_primary_nid)); } else { - lp->lp_state |= LNET_PEER_MULTI_RAIL; - lnet_peer_clr_non_mr_pref_nids(lp); + /* if discovery is disabled then we don't want to + * update the state of the peer. 
All we'll do is + * update the peer_nis which were reported back in + * the initial ping + */ + + if (!lnet_is_discovery_disabled_locked(lp)) { + lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); + } } } else if (lp->lp_state & LNET_PEER_MULTI_RAIL) { if (lp->lp_state & LNET_PEER_CONFIGURED) { @@ -2238,20 +2300,6 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) lp->lp_data_nnis = pbuf->pb_info.pi_nnis; /* - * The peer may have discovery disabled at its end. Set - * NO_DISCOVERY as appropriate. - */ - if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY)) { - CDEBUG(D_NET, "Peer %s has discovery disabled\n", - libcfs_nid2str(lp->lp_primary_nid)); - lp->lp_state |= LNET_PEER_NO_DISCOVERY; - } else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) { - CDEBUG(D_NET, "Peer %s has discovery enabled\n", - libcfs_nid2str(lp->lp_primary_nid)); - lp->lp_state &= ~LNET_PEER_NO_DISCOVERY; - } - - /* * Check for truncation of the Reply. Clear PING_SENT and set * PING_FAILED to trigger a retry. */ @@ -2284,8 +2332,9 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) } /* We're happy with the state of the data in the buffer. */ - CDEBUG(D_NET, "peer %s data present %u\n", - libcfs_nid2str(lp->lp_primary_nid), lp->lp_peer_seqno); + CDEBUG(D_NET, "peer %s data present %u. state = 0x%x\n", + libcfs_nid2str(lp->lp_primary_nid), lp->lp_peer_seqno, + lp->lp_state); if (lp->lp_state & LNET_PEER_DATA_PRESENT) lnet_ping_buffer_decref(lp->lp_data); else @@ -2517,6 +2566,14 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, delnis[ndelnis++] = curnis[i]; } + /* If we get here and the discovery is disabled then we don't want + * to add or delete any NIs. 
We just updated the ones we have some + * information on, and call it a day + */ + rc = 0; + if (lnet_is_discovery_disabled(lp)) + goto out; + for (i = 0; i < naddnis; i++) { rc = lnet_peer_add_nid(lp, addnis[i].ns_nid, flags); if (rc) { @@ -2561,7 +2618,8 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, kfree(addnis); kfree(delnis); lnet_ping_buffer_decref(pbuf); - CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); + CDEBUG(D_NET, "peer %s (%p): %d\n", libcfs_nid2str(lp->lp_primary_nid), + lp, rc); if (rc) { spin_lock(&lp->lp_lock); @@ -2634,6 +2692,19 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, return 0; } +static bool lnet_is_nid_in_ping_info(lnet_nid_t nid, + struct lnet_ping_info *pinfo) +{ + int i; + + for (i = 0; i < pinfo->pi_nnis; i++) { + if (pinfo->pi_ni[i].ns_nid == nid) + return true; + } + + return false; +} + /* * Update a peer using the data received. */ @@ -2701,7 +2772,17 @@ static int lnet_peer_data_present(struct lnet_peer *lp) rc = lnet_peer_set_primary_nid(lp, nid, flags); if (!rc) rc = lnet_peer_merge_data(lp, pbuf); - } else if (lp->lp_primary_nid == nid) { + /* if the primary nid of the peer is present in the ping info returned + * from the peer, but it's not the local primary peer we have + * cached and discovery is disabled, then we don't want to update + * our local peer info, by adding or removing NIDs, we just want + * to update the status of the nids that we currently have + * recorded in that peer. 
+ */ + } else if (lp->lp_primary_nid == nid || + (lnet_is_nid_in_ping_info(lp->lp_primary_nid, + &pbuf->pb_info) && + lnet_is_discovery_disabled(lp))) { rc = lnet_peer_merge_data(lp, pbuf); } else { lpni = lnet_find_peer_ni_locked(nid); @@ -2718,13 +2799,24 @@ static int lnet_peer_data_present(struct lnet_peer *lp) struct lnet_peer *new_lp; new_lp = lpni->lpni_peer_net->lpn_peer; + /* if lp has discovery/MR enabled that means new_lp + * should have discovery/MR enabled as well, since + * it's the same peer, which we're about to merge + */ + if (!(lp->lp_state & LNET_PEER_NO_DISCOVERY)) + new_lp->lp_state &= ~LNET_PEER_NO_DISCOVERY; + if (lp->lp_state & LNET_PEER_MULTI_RAIL) + new_lp->lp_state |= LNET_PEER_MULTI_RAIL; + rc = lnet_peer_set_primary_data(new_lp, pbuf); lnet_consolidate_routes_locked(lp, new_lp); lnet_peer_ni_decref_locked(lpni); } } out: - CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); + CDEBUG(D_NET, "peer %s(%p): %d. state = 0x%x\n", + libcfs_nid2str(lp->lp_primary_nid), lp, rc, + lp->lp_state); mutex_unlock(&the_lnet.ln_api_mutex); spin_lock(&lp->lp_lock); @@ -2941,7 +3033,8 @@ static int lnet_peer_send_push(struct lnet_peer *lp) LNetMDUnlink(lp->lp_push_mdh); LNetInvalidateMDHandle(&lp->lp_push_mdh); fail_error: - CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); + CDEBUG(D_NET, "peer %s(%p): %d\n", + libcfs_nid2str(lp->lp_primary_nid), lp, rc); /* * The errors that get us here are considered hard errors and * cause Discovery to terminate. So we clear PUSH_SENT, but do @@ -2985,19 +3078,6 @@ static int lnet_peer_discovered(struct lnet_peer *lp) return 0; } -/* - * Mark the peer as to be rediscovered. - */ -static int lnet_peer_rediscover(struct lnet_peer *lp) -__must_hold(&lp->lp_lock) -{ - lp->lp_state |= LNET_PEER_REDISCOVER; - lp->lp_state &= ~LNET_PEER_DISCOVERING; - - CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); - - return 0; -} /* * Discovering this peer is taking too long. 
Cancel any Ping or Push @@ -3170,8 +3250,8 @@ static int lnet_peer_discovery(void *arg) * forcing a Ping or Push. */ spin_lock(&lp->lp_lock); - CDEBUG(D_NET, "peer %s state %#x\n", - libcfs_nid2str(lp->lp_primary_nid), + CDEBUG(D_NET, "peer %s(%p) state %#x\n", + libcfs_nid2str(lp->lp_primary_nid), lp, lp->lp_state); if (lp->lp_state & LNET_PEER_DATA_PRESENT) rc = lnet_peer_data_present(lp); @@ -3183,16 +3263,14 @@ static int lnet_peer_discovery(void *arg) rc = lnet_peer_send_ping(lp); else if (lp->lp_state & LNET_PEER_FORCE_PUSH) rc = lnet_peer_send_push(lp); - else if (lnet_peer_discovery_disabled) - rc = lnet_peer_rediscover(lp); else if (!(lp->lp_state & LNET_PEER_NIDS_UPTODATE)) rc = lnet_peer_send_ping(lp); else if (lnet_peer_needs_push(lp)) rc = lnet_peer_send_push(lp); else rc = lnet_peer_discovered(lp); - CDEBUG(D_NET, "peer %s state %#x rc %d\n", - libcfs_nid2str(lp->lp_primary_nid), + CDEBUG(D_NET, "peer %s(%p) state %#x rc %d\n", + libcfs_nid2str(lp->lp_primary_nid), lp, lp->lp_state, rc); spin_unlock(&lp->lp_lock); From patchwork Thu Feb 27 21:13:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410351 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9E4F792A for ; Thu, 27 Feb 2020 21:35:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 86711246A2 for ; Thu, 27 Feb 2020 21:35:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 86711246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5206234949B; Thu, 27 Feb 2020 13:30:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DF69B21FC09 for ; Thu, 27 Feb 2020 13:20:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 706B08A95; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6F1FA468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:39 -0500 Message-Id: <1582838290-17243-352-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 351/622] lnet: handle router health off X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Routing infrastructure depends on health infrastructure to manage route status. However, health can be turned off. Therefore, we need to enable health for gateways in order to monitor them properly. Each peer now has its own health sensitivity. 
When adding a route the gateway's health sensitivity can be explicitly set from lnetctl or if not specified then it'll default to 1, thereby turning health on for that gateway, allowing peer NI recovery if there is a failure. WC-bug-id: https://jira.whamcloud.com/browse/LU-11297 Lustre-commit: 00a2932b0aa7 ("LU-11297 lnet: handle router health off") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33634 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 5 +++-- include/linux/lnet/lib-types.h | 6 ++++++ include/uapi/linux/lnet/lnet-dlc.h | 1 + net/lnet/lnet/api-ni.c | 16 +++++++++++++--- net/lnet/lnet/config.c | 2 +- net/lnet/lnet/lib-msg.c | 20 +++++++++++++++----- net/lnet/lnet/peer.c | 6 ++++++ net/lnet/lnet/router.c | 11 +++++++---- 8 files changed, 52 insertions(+), 15 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 09adfc3..36aaaa5 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -512,11 +512,12 @@ int lnet_notify(struct lnet_ni *ni, lnet_nid_t peer, bool alive, bool reset, void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive, time64_t when); int lnet_add_route(u32 net, u32 hops, lnet_nid_t gateway_nid, - unsigned int priority); + u32 priority, u32 sensitivity); int lnet_del_route(u32 net, lnet_nid_t gw_nid); void lnet_destroy_routes(void); int lnet_get_route(int idx, u32 *net, u32 *hops, - lnet_nid_t *gateway, u32 *alive, u32 *priority); + lnet_nid_t *gateway, u32 *alive, u32 *priority, + u32 *sensitivity); int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg); struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *prev); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 97d35e0..56654f5 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -606,6 +606,12 @@ struct lnet_peer { /* # refs from 
lnet_route_t::lr_gateway */ int lp_rtr_refcount; + /* + * peer specific health sensitivity value to decrement peer nis in + * this peer with if set to something other than 0 + */ + u32 lp_health_sensitivity; + /* messages blocking for router credits */ struct list_head lp_rtrq; diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 87f7680..e0b9eae 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -129,6 +129,7 @@ struct lnet_ioctl_config_data { __u32 rtr_hop; __u32 rtr_priority; __u32 rtr_flags; + __u32 rtr_sensitivity; } cfg_route; struct { char net_intf[LNET_MAX_STR_LEN]; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index b1823cd..702e4b9 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3455,19 +3455,28 @@ u32 lnet_get_dlc_seq_locked(void) case IOC_LIBCFS_FAIL_NID: return lnet_fail_nid(data->ioc_nid, data->ioc_count); - case IOC_LIBCFS_ADD_ROUTE: + case IOC_LIBCFS_ADD_ROUTE: { + /* default router sensitivity to 1 */ + unsigned int sensitivity = 1; config = arg; if (config->cfg_hdr.ioc_len < sizeof(*config)) return -EINVAL; + if (config->cfg_config_u.cfg_route.rtr_sensitivity) { + sensitivity = + config->cfg_config_u.cfg_route.rtr_sensitivity; + } + mutex_lock(&the_lnet.ln_api_mutex); rc = lnet_add_route(config->cfg_net, config->cfg_config_u.cfg_route.rtr_hop, config->cfg_nid, - config->cfg_config_u.cfg_route.rtr_priority); + config->cfg_config_u.cfg_route.rtr_priority, + sensitivity); mutex_unlock(&the_lnet.ln_api_mutex); return rc; + } case IOC_LIBCFS_DEL_ROUTE: config = arg; @@ -3492,7 +3501,8 @@ u32 lnet_get_dlc_seq_locked(void) &config->cfg_config_u.cfg_route.rtr_hop, &config->cfg_nid, &config->cfg_config_u.cfg_route.rtr_flags, - &config->cfg_config_u.cfg_route.rtr_priority); + &config->cfg_config_u.cfg_route.rtr_priority, + &config->cfg_config_u.cfg_route.rtr_sensitivity); mutex_unlock(&the_lnet.ln_api_mutex); return rc; diff --git 
a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 760452c..949cdd3 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -1215,7 +1215,7 @@ struct lnet_ni * continue; } - rc = lnet_add_route(net, hops, nid, priority); + rc = lnet_add_route(net, hops, nid, priority, 1); if (rc && rc != -EEXIST && rc != -EHOSTUNREACH) { CERROR("Can't create route to %s via %s\n", libcfs_net2str(net), diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 8876866..9ffd874 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -448,14 +448,14 @@ } static void -lnet_dec_healthv_locked(atomic_t *healthv) +lnet_dec_healthv_locked(atomic_t *healthv, int sensitivity) { int h = atomic_read(healthv); - if (h < lnet_health_sensitivity) { + if (h < sensitivity) { atomic_set(healthv, 0); } else { - h -= lnet_health_sensitivity; + h -= sensitivity; atomic_set(healthv, h); } } @@ -473,7 +473,7 @@ return; } - lnet_dec_healthv_locked(&local_ni->ni_healthv); + lnet_dec_healthv_locked(&local_ni->ni_healthv, lnet_health_sensitivity); /* add the NI to the recovery queue if it's not already there * and it's health value is actually below the maximum. It's * possible that the sensitivity might be set to 0, and the health @@ -495,11 +495,21 @@ void lnet_handle_remote_failure_locked(struct lnet_peer_ni *lpni) { + u32 sensitivity = lnet_health_sensitivity; + u32 lp_sensitivity; + /* lpni could be NULL if we're in the LOLND case */ if (!lpni) return; - lnet_dec_healthv_locked(&lpni->lpni_healthv); + /* If there is a health sensitivity in the peer then use that + * instead of the globally set one. + */ + lp_sensitivity = lpni->lpni_peer_net->lpn_peer->lp_health_sensitivity; + if (lp_sensitivity) + sensitivity = lp_sensitivity; + + lnet_dec_healthv_locked(&lpni->lpni_healthv, sensitivity); /* add the peer NI to the recovery queue if it's not already there * and it's health value is actually below the maximum. 
It's * possible that the sensitivity might be set to 0, and the health diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 41a6180..294f968 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -217,6 +217,12 @@ spin_lock_init(&lp->lp_lock); lp->lp_primary_nid = nid; + /* all peers created on a router should have health on + * if it's not already on. + */ + if (the_lnet.ln_routing && !lnet_health_sensitivity) + lp->lp_health_sensitivity = 1; + /* Turn off discovery for loopback peer. If you're creating a peer * for the loopback interface then that was initiated when we * attempted to send a message over the loopback. There is no need diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index aa8ec8c..eb36df5 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -406,7 +406,7 @@ static void lnet_shuffle_seed(void) int lnet_add_route(u32 net, u32 hops, lnet_nid_t gateway, - unsigned int priority) + u32 priority, u32 sensitivity) { struct list_head *route_entry; struct lnet_remotenet *rnet; @@ -505,8 +505,10 @@ static void lnet_shuffle_seed(void) * to move the routes from the peer that's being deleted to the * consolidated peer lp_routes list */ - if (add_route) + if (add_route) { + gw->lp_health_sensitivity = sensitivity; lnet_add_route_to_rnet(rnet2, route); + } /* get rid of the reference on the lpni. 
*/ @@ -675,13 +677,13 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) int lnet_get_route(int idx, u32 *net, u32 *hops, - lnet_nid_t *gateway, u32 *alive, u32 *priority) + lnet_nid_t *gateway, u32 *alive, u32 *priority, u32 *sensitivity) { struct lnet_remotenet *rnet; + struct list_head *rn_list; struct lnet_route *route; int cpt; int i; - struct list_head *rn_list; cpt = lnet_net_lock_current(); @@ -695,6 +697,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) *hops = route->lr_hops; *priority = route->lr_priority; + *sensitivity = route->lr_gateway->lp_health_sensitivity; *alive = lnet_is_route_alive(route); lnet_net_unlock(cpt); return 0; From patchwork Thu Feb 27 21:13:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410317 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 63A0392A for ; Thu, 27 Feb 2020 21:34:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4C7CE24677 for ; Thu, 27 Feb 2020 21:34:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4C7CE24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D0C46349418; Thu, 27 Feb 2020 13:29:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 444D021FA61 for ; Thu, 27 Feb 2020 13:20:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 72E7D8A96; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 71D2046A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:40 -0500 Message-Id: <1582838290-17243-353-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 352/622] lnet: push router interface updates X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata A router can bring up/down its interfaces if it hasn't received any messages on that interface for a configurable period (alive_router_ping_timeout). When this event occurs, the router can now push its status change to the peers it's talking to in order to inform them of the change. This allows the router's users to handle asymmetric router failures more quickly.
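The flow described above can be sketched in isolation as follows. This is a simplified userspace model, not the kernel code: `struct iface`, `iface_check_timeout()` and `iface_on_message()` are hypothetical names, and locking is left out. Each helper returns true only when the status actually changed, which is the condition under which the patch pushes an update to peers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the NI status values used by the patch. */
enum iface_status { IFACE_UP, IFACE_DOWN };

struct iface {
	enum iface_status status;
	int64_t last_alive;	/* time of last received message, seconds */
};

/*
 * Periodic check: mark the interface down once it has been quiet for
 * 'timeout' seconds. Returns true if the status changed, meaning the
 * caller should push an update to its peers.
 */
bool iface_check_timeout(struct iface *ifc, int64_t now, int64_t timeout)
{
	assert(ifc);

	if (now < ifc->last_alive + timeout)
		return false;		/* still considered alive */
	if (ifc->status == IFACE_DOWN)
		return false;		/* already down, nothing new to push */

	ifc->status = IFACE_DOWN;
	return true;
}

/*
 * Receive path: any incoming message refreshes the timestamp and brings
 * a down interface back up. Returns true if the status changed.
 */
bool iface_on_message(struct iface *ifc, int64_t now)
{
	assert(ifc);

	ifc->last_alive = now;
	if (ifc->status != IFACE_DOWN)
		return false;

	ifc->status = IFACE_UP;
	return true;
}
```

Returning a flag and pushing after the lock is dropped mirrors the shape of the diff below, where the push is issued only after the NI lock or net lock has been released.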
WC-bug-id: https://jira.whamcloud.com/browse/LU-11664 Lustre-commit: 0fa02a7d81e7 ("LU-11664 lnet: push router interface updates") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/33651 Reviewed-by: Sebastien Buisson Reviewed-by: Alexey Lyashkov Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 18 ++++++++++++------ net/lnet/lnet/router.c | 13 +++++++++++-- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 0ff1d38..d6cbcd1 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3840,16 +3840,17 @@ void lnet_monitor_thr_stop(void) lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid, void *private, int rdma_req) { - int rc = 0; - int cpt; - int for_me; + struct lnet_peer_ni *lpni; struct lnet_msg *msg; + u32 payload_length; lnet_pid_t dest_pid; lnet_nid_t dest_nid; lnet_nid_t src_nid; - struct lnet_peer_ni *lpni; - u32 payload_length; + bool push = false; + int for_me; u32 type; + int rc = 0; + int cpt; LASSERT(!in_interrupt()); @@ -3907,11 +3908,16 @@ void lnet_monitor_thr_stop(void) lnet_ni_lock(ni); ni->ni_last_alive = ktime_get_real_seconds(); if (ni->ni_status && - ni->ni_status->ns_status == LNET_NI_STATUS_DOWN) + ni->ni_status->ns_status == LNET_NI_STATUS_DOWN) { ni->ni_status->ns_status = LNET_NI_STATUS_UP; + push = true; + } lnet_ni_unlock(ni); } + if (push) + lnet_push_update_to_peers(1); + /* * Regard a bad destination NID as a protocol error. 
Senders should * know what they're doing; if they don't they're misconfigured, buggy diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index eb36df5..0a396d9 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -742,10 +742,11 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } } -static void +static bool lnet_update_ni_status_locked(void) { struct lnet_ni *ni = NULL; + bool push = false; time64_t now; time64_t timeout; @@ -778,9 +779,12 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) * NI status to "down" */ ni->ni_status->ns_status = LNET_NI_STATUS_DOWN; + push = true; } lnet_ni_unlock(ni); } + + return push; } void lnet_wait_router_start(void) @@ -817,6 +821,7 @@ bool lnet_router_checker_active(void) { struct lnet_peer_ni *lpni; struct lnet_peer *rtr; + bool push = false; u64 version; time64_t now; int cpt; @@ -883,9 +888,13 @@ bool lnet_router_checker_active(void) } if (the_lnet.ln_routing) - lnet_update_ni_status_locked(); + push = lnet_update_ni_status_locked(); lnet_net_unlock(cpt); + + /* if the status of the ni changed update the peers */ + if (push) + lnet_push_update_to_peers(1); } void From patchwork Thu Feb 27 21:13:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410361 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99D4792A for ; Thu, 27 Feb 2020 21:36:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8257924677 for ; Thu, 27 Feb 2020 21:36:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8257924677 Authentication-Results: 
mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 83EA421C911; Thu, 27 Feb 2020 13:30:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9F02521FA61 for ; Thu, 27 Feb 2020 13:20:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7597E8A98; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7492846C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:41 -0500 Message-Id: <1582838290-17243-354-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 353/622] lnet: net aliveness X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata If a router is discovered on any interface on the network, then update the network last alive time and the NI's status to UP. If a router isn't discovered on any interface on a network, then change the status of all the interfaces on that network to down. 
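A minimal sketch of the per-network aliveness rule described above, written as a standalone userspace model. The names (`struct net_model`, `net_on_traffic()`, `net_check_timeout()`, `MAX_NIS`) are hypothetical and the spinlocks used by the real patch are omitted. The point it illustrates is that the last-alive timestamp lives on the network, so traffic on any one interface keeps every interface on that network from being marked down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NIS 4	/* hypothetical fixed bound, for illustration only */

enum ni_status { NI_UP, NI_DOWN };

/* One network with a shared last-alive time and per-interface status. */
struct net_model {
	int64_t last_alive;
	size_t ni_count;
	enum ni_status ni_status[MAX_NIS];
};

/* Set every interface to 'status'; report whether anything changed. */
bool net_set_status(struct net_model *net, enum ni_status status)
{
	bool changed = false;
	size_t i;

	assert(net && net->ni_count <= MAX_NIS);
	for (i = 0; i < net->ni_count; i++) {
		if (net->ni_status[i] != status) {
			net->ni_status[i] = status;
			changed = true;
		}
	}
	return changed;
}

/* Traffic on interface 'idx' refreshes the whole network's timestamp
 * and brings that interface up if it was down.
 */
bool net_on_traffic(struct net_model *net, size_t idx, int64_t now)
{
	assert(net && idx < net->ni_count);

	net->last_alive = now;
	if (net->ni_status[idx] != NI_DOWN)
		return false;
	net->ni_status[idx] = NI_UP;
	return true;
}

/* If the network as a whole has been quiet past the timeout, mark all
 * of its interfaces down.
 */
bool net_check_timeout(struct net_model *net, int64_t now, int64_t timeout)
{
	assert(net);

	if (now < net->last_alive + timeout)
		return false;
	return net_set_status(net, NI_DOWN);
}
```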
WC-bug-id: https://jira.whamcloud.com/browse/LU-11299 Lustre-commit: 1d80e9debf99 ("LU-11299 lnet: net aliveness") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34510 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 9 +++++--- net/lnet/lnet/config.c | 3 ++- net/lnet/lnet/lib-move.c | 7 +++--- net/lnet/lnet/router.c | 52 ++++++++++++++++++++++++++---------------- net/lnet/lnet/router_proc.c | 2 +- 5 files changed, 45 insertions(+), 28 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 56654f5..7b43236 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -397,6 +397,12 @@ struct lnet_net { /* dying LND instances */ struct list_head net_ni_zombie; + + /* when I was last alive */ + time64_t net_last_alive; + + /* protects access to net_last_alive */ + spinlock_t net_lock; }; struct lnet_ni { @@ -431,9 +437,6 @@ struct lnet_ni { /* percpt reference count */ int **ni_refs; - /* when I was last alive */ - time64_t ni_last_alive; - /* pointer to parent network */ struct lnet_net *ni_net; diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 949cdd3..a2a9c79 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -366,8 +366,10 @@ struct lnet_net * INIT_LIST_HEAD(&net->net_ni_list); INIT_LIST_HEAD(&net->net_ni_added); INIT_LIST_HEAD(&net->net_ni_zombie); + spin_lock_init(&net->net_lock); net->net_id = net_id; + net->net_last_alive = ktime_get_real_seconds(); /* initialize global paramters to undefiend */ net->net_tunables.lct_peer_timeout = -1; @@ -467,7 +469,6 @@ struct lnet_net * else ni->ni_net_ns = NULL; - ni->ni_last_alive = ktime_get_real_seconds(); ni->ni_state = LNET_NI_STATE_INIT; list_add_tail(&ni->ni_netlist, &net->net_ni_added); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index d6cbcd1..ec32d22 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3903,10 
+3903,11 @@ void lnet_monitor_thr_stop(void) } if (the_lnet.ln_routing && - ni->ni_last_alive != ktime_get_real_seconds()) { - /* NB: so far here is the only place to set NI status to "up */ + ni->ni_net->net_last_alive != ktime_get_real_seconds()) { lnet_ni_lock(ni); - ni->ni_last_alive = ktime_get_real_seconds(); + spin_lock(&ni->ni_net->net_lock); + ni->ni_net->net_last_alive = ktime_get_real_seconds(); + spin_unlock(&ni->ni_net->net_lock); if (ni->ni_status && ni->ni_status->ns_status == LNET_NI_STATUS_DOWN) { ni->ni_status->ns_status = LNET_NI_STATUS_UP; diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 0a396d9..4ca3c5c 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -742,10 +742,29 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) } } +static inline bool +lnet_net_set_status_locked(struct lnet_net *net, u32 status) +{ + struct lnet_ni *ni; + bool update = false; + + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { + lnet_ni_lock(ni); + if (ni->ni_status && + ni->ni_status->ns_status != status) { + ni->ni_status->ns_status = status; + update = true; + } + lnet_ni_unlock(ni); + } + + return update; +} + static bool lnet_update_ni_status_locked(void) { - struct lnet_ni *ni = NULL; + struct lnet_net *net; bool push = false; time64_t now; time64_t timeout; @@ -755,33 +774,26 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) timeout = router_ping_timeout + alive_router_check_interval; now = ktime_get_real_seconds(); - while ((ni = lnet_get_next_ni_locked(NULL, ni))) { - if (ni->ni_net->net_lnd->lnd_type == LOLND) + list_for_each_entry(net, &the_lnet.ln_nets, net_list) { + if (net->net_lnd->lnd_type == LOLND) continue; - if (now < ni->ni_last_alive + timeout) + if (now < net->net_last_alive + timeout) continue; - lnet_ni_lock(ni); + spin_lock(&net->net_lock); /* re-check with lock */ - if (now < ni->ni_last_alive + timeout) { - lnet_ni_unlock(ni); + if (now < 
net->net_last_alive + timeout) { + spin_unlock(&net->net_lock); continue; } + spin_unlock(&net->net_lock); - LASSERT(ni->ni_status); - - if (ni->ni_status->ns_status != LNET_NI_STATUS_DOWN) { - CDEBUG(D_NET, "NI(%s:%lld) status changed to down\n", - libcfs_nid2str(ni->ni_nid), timeout); - /* - * NB: so far, this is the only place to set - * NI status to "down" - */ - ni->ni_status->ns_status = LNET_NI_STATUS_DOWN; - push = true; - } - lnet_ni_unlock(ni); + /* if the net didn't receive any traffic for past the + * timeout on any of its constituent NIs, then mark all + * the NIs down. + */ + push = lnet_net_set_status_locked(net, LNET_NI_STATUS_DOWN); } return push; diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index 9771ef0..2e9342c 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -674,7 +674,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write, int j; if (the_lnet.ln_routing) - last_alive = now - ni->ni_last_alive; + last_alive = now - ni->ni_net->net_last_alive; lnet_ni_lock(ni); LASSERT(ni->ni_status); From patchwork Thu Feb 27 21:13:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410365 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8A495138D for ; Thu, 27 Feb 2020 21:36:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7341D24677 for ; Thu, 27 Feb 2020 21:36:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7341D24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; 
spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 68479348D0F; Thu, 27 Feb 2020 13:30:08 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 017C021FA61 for ; Thu, 27 Feb 2020 13:20:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 785F28A99; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7754A46D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:42 -0500 Message-Id: <1582838290-17243-355-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 354/622] lnet: discover each gateway Net X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Wake up once every (gateway aliveness interval / number of local networks). On each wakeup, discover the next local network of each gateway in round-robin order. This is done to make sure the gateway keeps its networks up.
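The scheduling idea can be sketched on its own, outside the kernel, roughly as follows. Both helpers are hypothetical (`per_net_interval()` and `next_local_net()` do not exist in LNet), and two behaviours are assumptions of this sketch rather than something the patch does: a zero network count falls back to the undivided interval, and the result is clamped to at least 1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split the router check interval across the configured networks so that
 * every network is visited about once per full interval.
 * Assumption of this sketch: guard against net_count == 0 and never
 * return 0 (the diff below divides by the count directly).
 */
unsigned int per_net_interval(unsigned int check_interval,
			      unsigned int net_count)
{
	unsigned int interval;

	if (net_count == 0)
		return check_interval;

	interval = check_interval / net_count;
	return interval ? interval : 1;
}

/*
 * Round-robin selection: starting just after 'prev' (or at index 0 when
 * prev is -1), return the index of the next network flagged as local,
 * wrapping around at the end. Returns -1 if no network is local.
 */
int next_local_net(const int *is_local, size_t count, int prev)
{
	size_t start;
	size_t step;

	assert(is_local || count == 0);
	assert(prev >= -1);

	start = (size_t)(prev + 1);
	for (step = 0; step < count; step++) {
		size_t idx = (start + step) % count;

		if (is_local[idx])
			return (int)idx;
	}
	return -1;
}
```

With a 60 second check interval and three networks, for example, the thread would wake roughly every 20 seconds and probe a different local network each time.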
WC-bug-id: https://jira.whamcloud.com/browse/LU-11299 Lustre-commit: 526679c681c3 ("LU-11299 lnet: discover each gateway Net") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34511 Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 5 ++++ include/linux/lnet/lib-types.h | 9 ++++--- net/lnet/lnet/api-ni.c | 39 +++++++++++++++++++++++++++--- net/lnet/lnet/lib-move.c | 19 ++++++++++++--- net/lnet/lnet/peer.c | 32 ++++++++++++++++++++++++ net/lnet/lnet/router.c | 55 ++++++++++++++++++++++++++++++++++++++---- 6 files changed, 145 insertions(+), 14 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 36aaaa5..3dd56a2 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -53,6 +53,7 @@ #define CFS_FAIL_PTLRPC_OST_BULK_CB2 0xe000 extern struct lnet the_lnet; /* THE network */ +extern unsigned int lnet_current_net_count; #if (BITS_PER_LONG == 32) /* 2 CPTs, allowing more CPTs might make us under memory pressure */ @@ -547,6 +548,7 @@ void lnet_rtr_transfer_to_peer(struct lnet_peer *src, int lnet_islocalnid(lnet_nid_t nid); int lnet_islocalnet(u32 net); +int lnet_islocalnet_locked(u32 net); void lnet_msg_attach_md(struct lnet_msg *msg, struct lnet_libmd *md, unsigned int offset, unsigned int mlen); @@ -796,7 +798,10 @@ bool lnet_net_unique(u32 net_id, struct list_head *nilist, bool lnet_ni_unique_net(struct list_head *nilist, char *iface); void lnet_incr_dlc_seq(void); u32 lnet_get_dlc_seq_locked(void); +int lnet_get_net_count(void); +struct lnet_peer_net *lnet_get_next_peer_net_locked(struct lnet_peer *lp, + u32 prev_lpn_id); struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_net *peer_net, struct lnet_peer_ni *prev); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 7b43236..8c9ae9e 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -600,6 
+600,9 @@ struct lnet_peer { /* primary NID of the peer */ lnet_nid_t lp_primary_nid; + /* net to perform discovery on */ + u32 lp_disc_net_id; + /* CPT of peer_table */ int lp_cpt; @@ -621,9 +624,6 @@ struct lnet_peer { /* routes on this peer */ struct list_head lp_routes; - /* time of last router check attempt */ - time64_t lp_rtrcheck_timestamp; - /* reference count */ atomic_t lp_refcount; @@ -744,6 +744,9 @@ struct lnet_peer_net { /* Net ID */ u32 lpn_net_id; + /* time of last router net check attempt */ + time64_t lpn_rtrcheck_timestamp; + /* reference count */ atomic_t lpn_refcount; }; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 702e4b9..65f1f17 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -171,6 +171,7 @@ static int recovery_interval_set(const char *val, "Maximum number of times to retry transmitting a message"); unsigned int lnet_lnd_timeout = LNET_LND_DEFAULT_TIMEOUT; +unsigned int lnet_current_net_count; /* * This sequence number keeps track of how many times DLC was used to @@ -1294,16 +1295,28 @@ struct lnet_net * EXPORT_SYMBOL(lnet_cpt_of_nid); int -lnet_islocalnet(u32 net_id) +lnet_islocalnet_locked(u32 net_id) { struct lnet_net *net; + + net = lnet_get_net_locked(net_id); + + return !!net; +} + +int +lnet_islocalnet(u32 net_id) +{ int cpt; + bool local; cpt = lnet_net_lock_current(); - net = lnet_get_net_locked(net_id); + + local = lnet_islocalnet_locked(net_id); + lnet_net_unlock(cpt); - return !!net; + return local; } struct lnet_ni * @@ -1457,6 +1470,23 @@ struct lnet_ping_buffer * return count; } +int +lnet_get_net_count(void) +{ + struct lnet_net *net; + int count = 0; + + lnet_net_lock(0); + + list_for_each_entry(net, &the_lnet.ln_nets, net_list) { + count++; + } + + lnet_net_unlock(0); + + return count; +} + void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf) { @@ -2292,6 +2322,9 @@ static void lnet_push_target_fini(void) lnet_net_unlock(LNET_LOCK_EX); } + /* update net count */ + 
lnet_current_net_count = lnet_get_net_count(); + return ni_count; failed1: diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index ec32d22..e93284b 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1922,7 +1922,8 @@ struct lnet_ni * } struct lnet_ni * -lnet_find_best_ni_on_local_net(struct lnet_peer *peer, int md_cpt) +lnet_find_best_ni_on_local_net(struct lnet_peer *peer, int md_cpt, + bool discovery) { struct lnet_peer_net *peer_net = NULL; struct lnet_ni *best_ni = NULL; @@ -1943,6 +1944,12 @@ struct lnet_ni * best_ni = lnet_find_best_ni_on_spec_net(best_ni, peer, peer_net, md_cpt, false); + /* if this is a discovery message and lp_disc_net_id is + * specified then use that net to send the discovery on. + */ + if (peer->lp_disc_net_id == peer_net->lpn_net_id && + discovery) + break; } if (best_ni) @@ -2101,7 +2108,8 @@ struct lnet_ni * * networks. */ sd->sd_best_ni = lnet_find_best_ni_on_local_net(sd->sd_peer, - sd->sd_md_cpt); + sd->sd_md_cpt, + lnet_msg_discovery(sd->sd_msg)); if (sd->sd_best_ni) { sd->sd_best_lpni = lnet_find_best_lpni_on_net(sd, sd->sd_peer, @@ -3145,9 +3153,14 @@ struct lnet_mt_event_info { * if we wake up every 1 second? Although, we've seen * cases where we get a complaint that an idle thread * is waking up unnecessarily. + * + * Take into account the current net_count when you wake + * up for alive router checking, since we need to check + * possibly as many networks as we have configured. 
*/ interval = min(lnet_recovery_interval, - min((unsigned int)alive_router_check_interval, + min((unsigned int)alive_router_check_interval / + lnet_current_net_count, lnet_transaction_timeout / 2)); wait_event_interruptible_timeout(the_lnet.ln_mt_waitq, false, HZ * interval); diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 294f968..55ff01d 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -710,6 +710,38 @@ struct lnet_peer * return lp; } +struct lnet_peer_net * +lnet_get_next_peer_net_locked(struct lnet_peer *lp, u32 prev_lpn_id) +{ + struct lnet_peer_net *net; + + if (!prev_lpn_id) { + /* no net id provided return the first net */ + net = list_first_entry_or_null(&lp->lp_peer_nets, + struct lnet_peer_net, + lpn_peer_nets); + + return net; + } + + /* find the net after the one provided */ + list_for_each_entry(net, &lp->lp_peer_nets, lpn_peer_nets) { + if (net->lpn_net_id == prev_lpn_id) { + /* if we reached the end of the list loop to the + * beginning. + */ + if (net->lpn_peer_nets.next == &lp->lp_peer_nets) + return list_first_entry_or_null(&lp->lp_peer_nets, + struct lnet_peer_net, + lpn_peer_nets); + else + return list_next_entry(net, lpn_peer_nets); + } + } + + return NULL; +} + struct lnet_peer_ni * lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_net *peer_net, diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 4ca3c5c..81f7a94 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -370,8 +370,9 @@ static void lnet_shuffle_seed(void) static void lnet_add_route_to_rnet(struct lnet_remotenet *rnet, struct lnet_route *route) { - unsigned int len = 0; + struct lnet_peer_net *lpn; unsigned int offset = 0; + unsigned int len = 0; struct list_head *e; lnet_shuffle_seed(); @@ -393,7 +394,10 @@ static void lnet_shuffle_seed(void) /* force a router check on the gateway to make sure the route is * alive */ - route->lr_gateway->lp_rtrcheck_timestamp = 0; + list_for_each_entry(lpn, 
&route->lr_gateway->lp_peer_nets, + lpn_peer_nets) { + lpn->lpn_rtrcheck_timestamp = 0; + } the_lnet.ln_remote_nets_version++; @@ -618,6 +622,17 @@ static void lnet_shuffle_seed(void) } delete_zombies: + /* check if there are any routes remaining on the gateway + * If there are no more routes make sure to set the peer's + * lp_disc_net_id to 0 (invalid), in case we add more routes in + * the future on that gateway, then we start our discovery process + * from scratch + */ + if (lpni) { + if (list_empty(&lp->lp_routes)) + lp->lp_disc_net_id = 0; + } + lnet_net_unlock(LNET_LOCK_EX); while (!list_empty(&zombies)) { @@ -831,10 +846,14 @@ bool lnet_router_checker_active(void) void lnet_check_routers(void) { + struct lnet_peer_net *first_lpn = NULL; + struct lnet_peer_net *lpn; struct lnet_peer_ni *lpni; struct lnet_peer *rtr; bool push = false; + bool found_lpn; u64 version; + u32 net_id; time64_t now; int cpt; int rc; @@ -851,8 +870,31 @@ bool lnet_router_checker_active(void) * interfaces could be down and in that case they would be * undergoing recovery separately from this discovery. 
*/ - if (now - rtr->lp_rtrcheck_timestamp < - alive_router_check_interval) + /* find next peer net which is also local */ + net_id = rtr->lp_disc_net_id; + do { + lpn = lnet_get_next_peer_net_locked(rtr, net_id); + if (!lpn) { + CERROR("gateway %s has no networks\n", + libcfs_nid2str(rtr->lp_primary_nid)); + break; + } + if (first_lpn == lpn) + break; + if (!first_lpn) + first_lpn = lpn; + found_lpn = lnet_islocalnet_locked(lpn->lpn_net_id); + net_id = lpn->lpn_net_id; + } while (!found_lpn); + + if (!found_lpn || !lpn) { + CERROR("no local network found for gateway %s\n", + libcfs_nid2str(rtr->lp_primary_nid)); + continue; + } + + if (now - lpn->lpn_rtrcheck_timestamp < + alive_router_check_interval / lnet_current_net_count) continue; /* If we're currently discovering the peer then don't @@ -878,6 +920,9 @@ bool lnet_router_checker_active(void) } lnet_peer_ni_addref_locked(lpni); + /* specify the net to use */ + rtr->lp_disc_net_id = lpn->lpn_net_id; + /* discover the router */ CDEBUG(D_NET, "discover %s, cpt = %d\n", libcfs_nid2str(lpni->lpni_nid), cpt); @@ -887,7 +932,7 @@ bool lnet_router_checker_active(void) lnet_peer_ni_decref_locked(lpni); if (!rc) - rtr->lp_rtrcheck_timestamp = now; + lpn->lpn_rtrcheck_timestamp = now; else CERROR("Failed to discover router %s\n", libcfs_nid2str(rtr->lp_primary_nid)); From patchwork Thu Feb 27 21:13:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410645 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 35DD2138D for ; Thu, 27 Feb 2020 21:43:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1E654246A3 for ; 
Thu, 27 Feb 2020 21:43:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1E654246A3 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1606E21FC93; Thu, 27 Feb 2020 13:34:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5966221FA61 for ; Thu, 27 Feb 2020 13:20:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7B7AA8A9A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7A28B47C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:43 -0500 Message-Id: <1582838290-17243-356-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 355/622] lnet: look up MR peers routes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata An MR peer can have multiple interfaces some of which we might have a route to. The primary NID of the peer might not necessarily specify a NID we have a route to. 
When looking up a route, we must iterate over all the nets the peer is on and select the one which we can route to. Taking into consideration the peer can exist on multiple routed networks we also have a simple round robin algorithm to iterate over all the networks we can reach the peer on. WC-bug-id: https://jira.whamcloud.com/browse/LU-12053 Lustre-commit: 52eef8179743 ("LU-12053 lnet: look up MR peers routes") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34625 Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 3 ++ net/lnet/lnet/lib-move.c | 73 ++++++++++++++++++++++++++++++++++-------- 2 files changed, 62 insertions(+), 14 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 8c9ae9e..da5b860 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -747,6 +747,9 @@ struct lnet_peer_net { /* time of last router net check attempt */ time64_t lpn_rtrcheck_timestamp; + /* selection sequence number */ + u32 lpn_seq; + /* reference count */ atomic_t lpn_refcount; }; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index e93284b..f0804e1 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1809,21 +1809,60 @@ struct lnet_ni * { int rc; struct lnet_peer *gw; + struct lnet_peer *lp; + struct lnet_peer_net *lpn; + struct lnet_peer_net *best_lpn = NULL; + struct lnet_remotenet *rnet; struct lnet_route *best_route; struct lnet_route *last_route; struct lnet_peer_ni *lpni = NULL; + struct lnet_peer_ni *gwni = NULL; lnet_nid_t src_nid = sd->sd_src_nid; - best_route = lnet_find_route_locked(NULL, LNET_NIDNET(dst_nid), + /* we've already looked up the initial lpni using dst_nid */ + lpni = sd->sd_best_lpni; + /* the peer tree must be in existence */ + LASSERT(lpni && lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer); + lp = lpni->lpni_peer_net->lpn_peer; + + list_for_each_entry(lpn, &lp->lp_peer_nets, lpn_peer_nets) { + /* is this 
remote network reachable? */ + rnet = lnet_find_rnet_locked(lpn->lpn_net_id); + if (!rnet) + continue; + + if (!best_lpn) + best_lpn = lpn; + + if (best_lpn->lpn_seq <= lpn->lpn_seq) + continue; + + best_lpn = lpn; + } + + if (!best_lpn) { + CERROR("peer %s has no available nets\n", + libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } + + sd->sd_best_lpni = lnet_find_best_lpni_on_net(sd, lp, + best_lpn->lpn_net_id); + if (!sd->sd_best_lpni) { + CERROR("peer %s down\n", libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } + + best_route = lnet_find_route_locked(NULL, best_lpn->lpn_net_id, sd->sd_rtr_nid, &last_route, - &lpni); + &gwni); if (!best_route) { CERROR("no route to %s from %s\n", libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); return -EHOSTUNREACH; } - if (!lpni) { + if (!gwni) { CERROR("Internal Error. Route expected to %s from %s\n", libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); @@ -1831,14 +1870,14 @@ struct lnet_ni * } gw = best_route->lr_gateway; - LASSERT(gw == lpni->lpni_peer_net->lpn_peer); + LASSERT(gw == gwni->lpni_peer_net->lpn_peer); /* Discover this gateway if it hasn't already been discovered. 
* This means we might delay the message until discovery has * completed */ sd->sd_msg->msg_src_nid_param = sd->sd_src_nid; - rc = lnet_initiate_peer_discovery(lpni, sd->sd_msg, sd->sd_rtr_nid, + rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_rtr_nid, sd->sd_cpt); if (rc) return rc; @@ -1858,14 +1897,15 @@ struct lnet_ni * return -EFAULT; } - *gw_lpni = lpni; + *gw_lpni = gwni; *gw_peer = gw; - /* increment the route sequence number since now we're sure we're - * going to use it + /* increment the sequence numbers since now we're sure we're + * going to use this path */ LASSERT(best_route && last_route); best_route->lr_seq = last_route->lr_seq + 1; + best_lpn->lpn_seq++; return 0; } @@ -2208,11 +2248,11 @@ struct lnet_ni * if (rc != PASS_THROUGH) return rc; - /* TODO; One possible enhancement is to run the selection - * algorithm on the peer. However for remote peers the credits are - * not decremented, so we'll be basically going over the peer NIs - * in round robin. An MR router will run the selection algorithm - * on the next-hop interfaces. + /* Now that we must route to the destination, we must consider the + * MR case, where the destination has multiple interfaces, some of + * which we can route to and others we do not. For this reason we + * need to select the destination which we can route to and if + * there are multiple, we need to round robin. 
*/ rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni, &gw_peer); @@ -2455,8 +2495,13 @@ struct lnet_ni * LASSERT(!msg->msg_tx_committed); rc = lnet_select_pathway(src_nid, dst_nid, msg, rtr_nid); - if (rc < 0) + if (rc < 0) { + if (rc == -EHOSTUNREACH) + msg->msg_health_status = LNET_MSG_STATUS_REMOTE_ERROR; + else + msg->msg_health_status = LNET_MSG_STATUS_LOCAL_ERROR; return rc; + } if (rc == LNET_CREDIT_OK) lnet_ni_send(msg->msg_txni, msg); From patchwork Thu Feb 27 21:13:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410369 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E73E992A for ; Thu, 27 Feb 2020 21:36:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D004724677 for ; Thu, 27 Feb 2020 21:36:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D004724677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AE34B3494BB; Thu, 27 Feb 2020 13:30:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B37BF21FD37 for ; Thu, 27 Feb 2020 13:20:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov 
(Postfix) with ESMTP id 7DEF18A9B; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7CDD8468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:44 -0500 Message-Id: <1582838290-17243-357-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 356/622] lnet: check peer timeout on a router X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata On a router assume that a peer is alive and attempt to send it messages as long as the peer_timeout hasn't expired. WC-bug-id: https://jira.whamcloud.com/browse/LU-12200 Lustre-commit: 41f3c27adf16 ("LU-12200 lnet: check peer timeout on a router") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/34772 Reviewed-by: Sebastien Buisson Reviewed-by: Olaf Weber Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 2 ++ net/lnet/lnet/lib-move.c | 26 ++++++++++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index da5b860..b240361 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -566,6 +566,8 @@ struct lnet_peer_ni { u32 lpni_gw_seq; /* returned RC ping features. 
Protected with lpni_lock */ unsigned int lpni_ping_feats; + /* time last message was received from the peer */ + time64_t lpni_last_alive; /* preferred local nids: if only one, use lpni_pref.nid */ union lpni_pref { lnet_nid_t nid; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index f0804e1..629856c 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -608,6 +608,23 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return rc; } +static bool +lnet_is_peer_deadline_passed(struct lnet_peer_ni *lpni, time64_t now) +{ + time64_t deadline; + + deadline = lpni->lpni_last_alive + + lpni->lpni_net->net_tunables.lct_peer_timeout; + + /* assume peer_ni is alive as long as we're within the configured + * peer timeout + */ + if (deadline > now) + return false; + + return true; +} + /* * NB: returns 1 when alive, 0 when dead, negative when error; * may drop the lnet_net_lock @@ -616,6 +633,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lpni, struct lnet_msg *msg) { + time64_t now = ktime_get_seconds(); + if (!lnet_peer_aliveness_enabled(lpni)) return -ENODEV; @@ -635,6 +654,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, msg->msg_type == LNET_MSG_REPLY) return 1; + if (!lnet_is_peer_deadline_passed(lpni, now)) + return true; + return lnet_is_peer_ni_alive(lpni); } @@ -4142,6 +4164,10 @@ void lnet_monitor_thr_stop(void) return 0; goto drop; } + + if (the_lnet.ln_routing) + lpni->lpni_last_alive = ktime_get_seconds(); + msg->msg_rxpeer = lpni; msg->msg_rxni = ni; lnet_ni_addref_locked(ni, cpt); From patchwork Thu Feb 27 21:13:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410321 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) 
by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8D29592A for ; Thu, 27 Feb 2020 21:34:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 75B3C24677 for ; Thu, 27 Feb 2020 21:34:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 75B3C24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9C5BD349434; Thu, 27 Feb 2020 13:29:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 04D1621FD37 for ; Thu, 27 Feb 2020 13:20:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 823C38A9C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7FD1C46A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:45 -0500 Message-Id: <1582838290-17243-358-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 357/622] lustre: lmv: reuse object alloc QoS code from LOD X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software 
development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Reuse the same object alloc QoS code as LOD, but the QoS code is not moved to lower layer module, instead it's copied to LMV, because it involves almost all LMV code, which is too big a change and should be done separately in the future. And for LMV round-robin object allocation, because we only need to allocate one object, use the MDT index saved and update it to next MDT. WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: b601eb35e97a ("LU-11213 lmv: reuse object alloc QoS code from LOD") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/34657 Reviewed-by: Hongchao Zhang Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 88 +++++++ fs/lustre/include/obd.h | 36 +-- fs/lustre/lmv/Makefile | 2 +- fs/lustre/lmv/lmv_intent.c | 10 +- fs/lustre/lmv/lmv_internal.h | 8 +- fs/lustre/lmv/lmv_obd.c | 106 +++++--- fs/lustre/lmv/lmv_qos.c | 446 +++++++++++++++++++++++++++++++++ fs/lustre/lmv/lproc_lmv.c | 108 +++++++- fs/lustre/obdclass/Makefile | 2 +- fs/lustre/obdclass/lu_qos.c | 166 ++++++++++++ include/uapi/linux/lustre/lustre_idl.h | 2 + 11 files changed, 896 insertions(+), 78 deletions(-) create mode 100644 fs/lustre/lmv/lmv_qos.c create mode 100644 fs/lustre/obdclass/lu_qos.c diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index c34605c..0f3e3be 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1303,5 +1303,93 @@ struct lu_kmem_descr { extern u32 lu_context_tags_default; extern u32 lu_session_tags_default; +/* Generic subset of OSTs */ +struct ost_pool { + u32 *op_array; /* array of index of + * lov_obd->lov_tgts + */ + unsigned int op_count; /* number of OSTs in the array */ + unsigned int op_size; /* 
allocated size of lp_array */ + struct rw_semaphore op_rw_sem; /* to protect ost_pool use */ +}; + +/* round-robin QoS data for LOD/LMV */ +struct lu_qos_rr { + spinlock_t lqr_alloc; /* protect allocation index */ + u32 lqr_start_idx; /* start index of new inode */ + u32 lqr_offset_idx;/* aliasing for start_idx */ + int lqr_start_count;/* reseed counter */ + struct ost_pool lqr_pool; /* round-robin optimized list */ + unsigned long lqr_dirty:1; /* recalc round-robin list */ +}; + +/* QoS data per MDS/OSS */ +struct lu_svr_qos { + struct obd_uuid lsq_uuid; /* ptlrpc's c_remote_uuid */ + struct list_head lsq_svr_list; /* link to lq_svr_list */ + u64 lsq_bavail; /* total bytes avail on svr */ + u64 lsq_iavail; /* tital inode avail on svr */ + u64 lsq_penalty; /* current penalty */ + u64 lsq_penalty_per_obj; /* penalty decrease + * every obj + */ + time64_t lsq_used; /* last used time, seconds */ + u32 lsq_tgt_count; /* number of tgts on this svr */ + u32 lsq_id; /* unique svr id */ +}; + +/* QoS data per MDT/OST */ +struct lu_tgt_qos { + struct lu_svr_qos *ltq_svr; /* svr info */ + u64 ltq_penalty; /* current penalty */ + u64 ltq_penalty_per_obj; /* penalty decrease + * every obj + */ + u64 ltq_weight; /* net weighting */ + time64_t ltq_used; /* last used time, seconds */ + bool ltq_usable:1; /* usable for striping */ +}; + +/* target descriptor */ +struct lu_tgt_desc { + union { + struct dt_device *ltd_tgt; + struct obd_device *ltd_obd; + }; + struct obd_export *ltd_exp; + struct obd_uuid ltd_uuid; + u32 ltd_index; + u32 ltd_gen; + struct list_head ltd_kill; + struct ptlrpc_thread *ltd_recovery_thread; + struct mutex ltd_fid_mutex; + struct lu_tgt_qos ltd_qos; /* qos info per target */ + struct obd_statfs ltd_statfs; + time64_t ltd_statfs_age; + unsigned long ltd_active:1, /* is this target up for requests */ + ltd_activate:1, /* should target be activated */ + ltd_reap:1, /* should this target be deleted */ + ltd_got_update_log:1, /* Already got update log */ + 
ltd_connecting:1; /* target is connecting */ +}; + +/* QoS data for LOD/LMV */ +struct lu_qos { + struct list_head lq_svr_list; /* lu_svr_qos list */ + struct rw_semaphore lq_rw_sem; + u32 lq_active_svr_count; + unsigned int lq_prio_free; /* priority for free space */ + unsigned int lq_threshold_rr;/* priority for rr */ + struct lu_qos_rr lq_rr; /* round robin qos data */ + unsigned long lq_dirty:1, /* recalc qos data */ + lq_same_space:1,/* the servers all have approx. + * the same space avail + */ + lq_reset:1; /* zero current penalties */ +}; + +int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); +int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); + /** @} lu */ #endif /* __LUSTRE_LU_OBJECT_H */ diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index e815584..2f878d6 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -87,7 +87,7 @@ struct obd_info { /* OBD_STATFS_* flags */ u64 oi_flags; struct obd_device *oi_obd; - struct lmv_tgt_desc *oi_tgt; + struct lu_tgt_desc *oi_tgt; /* lsm data specific for every OSC. */ struct lov_stripe_md *oi_md; /* statfs data specific for every OSC, if needed at all. 
*/ @@ -377,28 +377,10 @@ struct echo_client_obd { u64 ec_unique; }; -/* Generic subset of OSTs */ -struct ost_pool { - u32 *op_array; /* array of index of lov_obd->lov_tgts */ - unsigned int op_count; /* number of OSTs in the array */ - unsigned int op_size; /* allocated size of lp_array */ - struct rw_semaphore op_rw_sem; /* to protect ost_pool use */ -}; - /* allow statfs data caching for 1 second */ #define OBD_STATFS_CACHE_SECONDS 1 -struct lov_tgt_desc { - struct list_head ltd_kill; - struct obd_uuid ltd_uuid; - struct obd_device *ltd_obd; - struct obd_export *ltd_exp; - u32 ltd_gen; - u32 ltd_index; /* index in lov_obd->tgts */ - unsigned long ltd_active:1,/* is this target up for requests */ - ltd_activate:1,/* should target be activated */ - ltd_reap:1; /* should this target be deleted */ -}; +#define lov_tgt_desc lu_tgt_desc struct lov_md_tgt_desc { struct obd_device *lmtd_mdc; @@ -431,16 +413,7 @@ struct lov_obd { struct lov_md_tgt_desc *lov_mdc_tgts; }; -struct lmv_tgt_desc { - struct obd_uuid ltd_uuid; - struct obd_device *ltd_obd; - struct obd_export *ltd_exp; - u32 ltd_idx; - struct mutex ltd_fid_mutex; - struct obd_statfs ltd_statfs; - time64_t ltd_statfs_age; - unsigned long ltd_active:1; /* target up for requests */ -}; +#define lmv_tgt_desc lu_tgt_desc struct lmv_obd { struct lu_client_fld lmv_fld; @@ -458,6 +431,9 @@ struct lmv_obd { struct obd_connect_data conn_data; struct kobject *lmv_tgts_kobj; void *lmv_cache; + + struct lu_qos lmv_qos; + u32 lmv_qos_rr_index; }; struct niobuf_local { diff --git a/fs/lustre/lmv/Makefile b/fs/lustre/lmv/Makefile index ad470bf..6f9a19c 100644 --- a/fs/lustre/lmv/Makefile +++ b/fs/lustre/lmv/Makefile @@ -1,4 +1,4 @@ ccflags-y += -I$(srctree)/$(src)/../include obj-$(CONFIG_LUSTRE_FS) += lmv.o -lmv-y := lmv_obd.o lmv_intent.o lmv_fld.o lproc_lmv.o +lmv-y := lmv_obd.o lmv_intent.o lmv_fld.o lproc_lmv.o lmv_qos.o diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 6017375..3efd977 100644 --- 
a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -108,7 +108,7 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it, op_data->op_bias = MDS_CROSS_REF; CDEBUG(D_INODE, "REMOTE_INTENT with fid=" DFID " -> mds #%u\n", - PFID(&body->mbo_fid1), tgt->ltd_idx); + PFID(&body->mbo_fid1), tgt->ltd_index); /* ask for security context upon intent */ if (it->it_op & (IT_LOOKUP | IT_GETATTR | IT_OPEN) && @@ -206,7 +206,7 @@ int lmv_revalidate_slaves(struct obd_export *exp, } CDEBUG(D_INODE, "Revalidate slave " DFID " -> mds #%u\n", - PFID(&fid), tgt->ltd_idx); + PFID(&fid), tgt->ltd_index); if (req) { ptlrpc_req_finished(req); @@ -353,7 +353,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, if (IS_ERR(tgt)) return PTR_ERR(tgt); - op_data->op_mds = tgt->ltd_idx; + op_data->op_mds = tgt->ltd_index; } else { LASSERT(fid_is_sane(&op_data->op_fid1)); LASSERT(fid_is_zero(&op_data->op_fid2)); @@ -380,7 +380,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, CDEBUG(D_INODE, "OPEN_INTENT with fid1=" DFID ", fid2=" DFID ", name='%s' -> mds #%u\n", PFID(&op_data->op_fid1), - PFID(&op_data->op_fid2), op_data->op_name, tgt->ltd_idx); + PFID(&op_data->op_fid2), op_data->op_name, tgt->ltd_index); rc = md_intent_lock(tgt->ltd_exp, op_data, it, reqp, cb_blocking, extra_lock_flags); @@ -465,7 +465,7 @@ static int lmv_intent_lookup(struct obd_export *exp, "LOOKUP_INTENT with fid1=" DFID ", fid2=" DFID ", name='%s' -> mds #%u\n", PFID(&op_data->op_fid1), PFID(&op_data->op_fid2), op_data->op_name ? 
op_data->op_name : "", - tgt->ltd_idx); + tgt->ltd_index); op_data->op_bias &= ~MDS_CROSS_REF; diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index 9974ec5..c673656 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -60,6 +60,8 @@ int lmv_revalidate_slaves(struct obd_export *exp, int lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data, struct ptlrpc_request **preq); +void lmv_activate_target(struct lmv_obd *lmv, struct lmv_tgt_desc *tgt, + int activate); int lmv_statfs_check_update(struct obd_device *obd, struct lmv_tgt_desc *tgt); @@ -77,7 +79,7 @@ static inline struct obd_device *lmv2obd_dev(struct lmv_obd *lmv) if (!lmv->tgts[i]) continue; - if (lmv->tgts[i]->ltd_idx == mdt_idx) { + if (lmv->tgts[i]->ltd_index == mdt_idx) { if (index) *index = i; return lmv->tgts[i]; @@ -192,6 +194,10 @@ static inline bool lmv_dir_retry_check_update(struct md_op_data *op_data) struct lmv_tgt_desc *lmv_locate_tgt(struct lmv_obd *lmv, struct md_op_data *op_data); +/* lmv_qos.c */ +struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt); +struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt); + /* lproc_lmv.c */ int lmv_tunables_init(struct obd_device *obd); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 02dfd35..20ae322 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -57,9 +57,8 @@ static int lmv_check_connect(struct obd_device *obd); -static void lmv_activate_target(struct lmv_obd *lmv, - struct lmv_tgt_desc *tgt, - int activate) +void lmv_activate_target(struct lmv_obd *lmv, struct lmv_tgt_desc *tgt, + int activate) { if (tgt->ltd_active == activate) return; @@ -315,7 +314,7 @@ static int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt) target.ft_srv = NULL; target.ft_exp = mdc_exp; - target.ft_idx = tgt->ltd_idx; + target.ft_idx = tgt->ltd_index; fld_client_add_target(&lmv->lmv_fld, &target); @@ -345,6 +344,12 @@ static 
int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt) md_init_ea_size(tgt->ltd_exp, lmv->max_easize, lmv->max_def_easize); + rc = lqos_add_tgt(&lmv->lmv_qos, tgt); + if (rc) { + obd_disconnect(mdc_exp); + return rc; + } + CDEBUG(D_CONFIG, "Connected to %s(%s) successfully (%d)\n", mdc_obd->obd_name, mdc_obd->obd_uuid.uuid, atomic_read(&obd->obd_refcount)); @@ -364,6 +369,8 @@ static void lmv_del_target(struct lmv_obd *lmv, int index) if (!lmv->tgts[index]) return; + lqos_del_tgt(&lmv->lmv_qos, lmv->tgts[index]); + kfree(lmv->tgts[index]); lmv->tgts[index] = NULL; } @@ -435,7 +442,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, } mutex_init(&tgt->ltd_fid_mutex); - tgt->ltd_idx = index; + tgt->ltd_index = index; tgt->ltd_uuid = *uuidp; tgt->ltd_active = 0; lmv->tgts[index] = tgt; @@ -1099,7 +1106,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, return -EINVAL; /* only files on same MDT can have their layouts swapped */ - if (tgt1->ltd_idx != tgt2->ltd_idx) + if (tgt1->ltd_index != tgt2->ltd_index) return -EPERM; rc = obd_iocontrol(cmd, tgt1->ltd_exp, len, karg, uarg); @@ -1253,6 +1260,8 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) { struct lmv_obd *lmv = &obd->u.lmv; struct lmv_desc *desc; + struct lnet_process_id lnet_id; + int i = 0; int rc; if (LUSTRE_CFG_BUFLEN(lcfg, 1) < 1) { @@ -1275,13 +1284,35 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) obd_str2uuid(&lmv->desc.ld_uuid, desc->ld_uuid.uuid); lmv->desc.ld_tgt_count = 0; lmv->desc.ld_active_tgt_count = 0; - lmv->desc.ld_qos_maxage = 60; + lmv->desc.ld_qos_maxage = LMV_DESC_QOS_MAXAGE_DEFAULT; lmv->max_def_easize = 0; lmv->max_easize = 0; spin_lock_init(&lmv->lmv_lock); mutex_init(&lmv->lmv_init_mutex); + /* Set up allocation policy (QoS and RR) */ + INIT_LIST_HEAD(&lmv->lmv_qos.lq_svr_list); + init_rwsem(&lmv->lmv_qos.lq_rw_sem); + lmv->lmv_qos.lq_dirty = 1; + lmv->lmv_qos.lq_rr.lqr_dirty 
= 1; + lmv->lmv_qos.lq_reset = 1; + /* Default priority is toward free space balance */ + lmv->lmv_qos.lq_prio_free = 232; + /* Default threshold for rr (roughly 17%) */ + lmv->lmv_qos.lq_threshold_rr = 43; + + /* + * initialize rr_index to lower 32bit of netid, so that client + * can distribute subdirs evenly from the beginning. + */ + while (LNetGetId(i++, &lnet_id) != -ENOENT) { + if (LNET_NETTYP(LNET_NIDNET(lnet_id.nid)) != LOLND) { + lmv->lmv_qos_rr_index = (u32)lnet_id.nid; + break; + } + } + rc = lmv_tunables_init(obd); if (rc) CWARN("%s: error adding LMV sysfs/debugfs files: rc = %d\n", @@ -1462,6 +1493,7 @@ static int lmv_statfs_update(void *cookie, int rc) tgt->ltd_statfs = *osfs; tgt->ltd_statfs_age = ktime_get_seconds(); spin_unlock(&lmv->lmv_lock); + lmv->lmv_qos.lq_dirty = 1; } return rc; @@ -1541,7 +1573,7 @@ static int lmv_getattr(struct obd_export *exp, struct md_op_data *op_data, return PTR_ERR(tgt); if (op_data->op_flags & MF_GET_MDT_IDX) { - op_data->op_mds = tgt->ltd_idx; + op_data->op_mds = tgt->ltd_index; return 0; } @@ -1585,17 +1617,6 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, return md_close(tgt->ltd_exp, op_data, mod, request); } -static struct lmv_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) -{ - static unsigned int rr_index; - - /* locate MDT round-robin is the first step */ - *mdt = rr_index % lmv->tgts_size; - rr_index++; - - return lmv->tgts[*mdt]; -} - static struct lmv_tgt_desc * lmv_locate_tgt_by_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm, const char *name, int namelen, struct lu_fid *fid, @@ -1609,7 +1630,7 @@ static struct lmv_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) if (IS_ERR(tgt)) return tgt; - *mds = tgt->ltd_idx; + *mds = tgt->ltd_index; return tgt; } @@ -1698,12 +1719,18 @@ struct lmv_tgt_desc * lmv_dir_space_hashed(op_data->op_default_mea1) && !lmv_dir_striped(lsm)) { tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds); + if (tgt == ERR_PTR(-EAGAIN)) 
+ tgt = lmv_locate_tgt_rr(lmv, &op_data->op_mds); /* * only update statfs when mkdir under dir with "space" hash, * this means the cached statfs may be stale, and current mkdir * may not follow QoS accurately, but it's not serious, and it * avoids periodic statfs when client doesn't mkdir under * "space" hashed directories. + * + * TODO: after MDT support QoS object allocation, also update + * statfs for 'lfs mkdir -i -1 ...", currently it's done in user + * space. */ if (!IS_ERR(tgt)) { struct obd_device *obd; @@ -1823,7 +1850,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, if (IS_ERR(tgt)) return PTR_ERR(tgt); - op_data->op_mds = tgt->ltd_idx; + op_data->op_mds = tgt->ltd_index; } CDEBUG(D_INODE, "CREATE obj " DFID " -> mds #%x\n", @@ -1858,7 +1885,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, return PTR_ERR(tgt); CDEBUG(D_INODE, "ENQUEUE on " DFID " -> mds #%u\n", - PFID(&op_data->op_fid1), tgt->ltd_idx); + PFID(&op_data->op_fid1), tgt->ltd_index); return md_enqueue(tgt->ltd_exp, einfo, policy, op_data, lockh, extra_lock_flags); @@ -1881,7 +1908,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, CDEBUG(D_INODE, "GETATTR_NAME for %*s on " DFID " -> mds #%u\n", (int)op_data->op_namelen, op_data->op_name, - PFID(&op_data->op_fid1), tgt->ltd_idx); + PFID(&op_data->op_fid1), tgt->ltd_index); rc = md_getattr_name(tgt->ltd_exp, op_data, preq); if (rc == -ENOENT && lmv_dir_retry_check_update(op_data)) { @@ -1935,7 +1962,7 @@ static int lmv_early_cancel(struct obd_export *exp, struct lmv_tgt_desc *tgt, return PTR_ERR(tgt); } - if (tgt->ltd_idx != op_tgt) { + if (tgt->ltd_index != op_tgt) { CDEBUG(D_INODE, "EARLY_CANCEL on " DFID "\n", PFID(fid)); policy.l_inodebits.bits = bits; rc = md_cancel_unused(tgt->ltd_exp, fid, &policy, @@ -1981,7 +2008,7 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data, * Cancel UPDATE lock on child (fid1). 
*/ op_data->op_flags |= MF_MDC_CANCEL_FID2; - rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_idx, LCK_EX, + rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); if (rc != 0) return rc; @@ -2075,7 +2102,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, return PTR_ERR(child_tgt); if (!S_ISDIR(op_data->op_mode) && tp_tgt) - rc = __lmv_fid_alloc(lmv, &target_fid, tp_tgt->ltd_idx); + rc = __lmv_fid_alloc(lmv, &target_fid, tp_tgt->ltd_index); else rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data); if (rc) @@ -2101,7 +2128,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, } /* cancel UPDATE lock of parent master object */ - rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_idx, LCK_EX, + rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); if (rc) return rc; @@ -2126,14 +2153,14 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, op_data->op_fid4 = target_fid; /* cancel UPDATE locks of target parent */ - rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_idx, LCK_EX, + rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID2); if (rc) return rc; /* cancel LOOKUP lock of source if source is remote object */ if (child_tgt != sp_tgt) { - rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_idx, + rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_LOOKUP, MF_MDC_CANCEL_FID3); if (rc) @@ -2141,7 +2168,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, } /* cancel ELC locks of source */ - rc = lmv_early_cancel(exp, child_tgt, op_data, tgt->ltd_idx, LCK_EX, + rc = lmv_early_cancel(exp, child_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_ELC, MF_MDC_CANCEL_FID3); if (rc) return rc; @@ -2201,7 +2228,7 @@ static int lmv_rename(struct obd_export *exp, 
struct md_op_data *op_data, op_data->op_flags |= MF_MDC_CANCEL_FID4; /* cancel UPDATE locks of target parent */ - rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_idx, LCK_EX, + rc = lmv_early_cancel(exp, tp_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID2); if (rc != 0) return rc; @@ -2210,7 +2237,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, /* cancel LOOKUP lock of target on target parent */ if (tgt != tp_tgt) { rc = lmv_early_cancel(exp, tp_tgt, op_data, - tgt->ltd_idx, LCK_EX, + tgt->ltd_index, LCK_EX, MDS_INODELOCK_LOOKUP, MF_MDC_CANCEL_FID4); if (rc != 0) @@ -2224,7 +2251,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, return PTR_ERR(src_tgt); /* cancel ELC locks of source */ - rc = lmv_early_cancel(exp, src_tgt, op_data, tgt->ltd_idx, + rc = lmv_early_cancel(exp, src_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_ELC, MF_MDC_CANCEL_FID3); if (rc != 0) @@ -2239,7 +2266,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, return PTR_ERR(sp_tgt); /* cancel UPDATE locks of source parent */ - rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_idx, LCK_EX, + rc = lmv_early_cancel(exp, sp_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1); if (rc != 0) return rc; @@ -2248,7 +2275,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, /* cancel LOOKUP lock of source on source parent */ if (src_tgt != sp_tgt) { rc = lmv_early_cancel(exp, sp_tgt, op_data, - tgt->ltd_idx, LCK_EX, + tgt->ltd_index, LCK_EX, MDS_INODELOCK_LOOKUP, MF_MDC_CANCEL_FID3); if (rc != 0) @@ -2293,7 +2320,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, /* cancel LOOKUP lock of target on target parent */ if (tgt != tp_tgt) { rc = lmv_early_cancel(exp, tp_tgt, op_data, - tgt->ltd_idx, LCK_EX, + tgt->ltd_index, LCK_EX, MDS_INODELOCK_LOOKUP, MF_MDC_CANCEL_FID4); if (rc != 0) @@ -2781,17 
+2808,18 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data, op_data->op_flags |= MF_MDC_CANCEL_FID1 | MF_MDC_CANCEL_FID3; if (parent_tgt != tgt) - rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_idx, + rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_LOOKUP, MF_MDC_CANCEL_FID3); - rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_idx, LCK_EX, + rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_index, LCK_EX, MDS_INODELOCK_ELC, MF_MDC_CANCEL_FID3); if (rc) return rc; CDEBUG(D_INODE, "unlink with fid=" DFID "/" DFID " -> mds #%u\n", - PFID(&op_data->op_fid1), PFID(&op_data->op_fid2), tgt->ltd_idx); + PFID(&op_data->op_fid1), PFID(&op_data->op_fid2), + tgt->ltd_index); rc = md_unlink(tgt->ltd_exp, op_data, request); if (rc == -ENOENT && lmv_dir_retry_check_update(op_data)) { diff --git a/fs/lustre/lmv/lmv_qos.c b/fs/lustre/lmv/lmv_qos.c new file mode 100644 index 0000000..e323398 --- /dev/null +++ b/fs/lustre/lmv/lmv_qos.c @@ -0,0 +1,446 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * This file is part of Lustre, http://www.lustre.org/ + * + * lustre/lmv/lmv_qos.c + * + * LMV QoS. 
+ * These are the only exported functions, they provide some generic + * infrastructure for object allocation QoS + * + */ + +#define DEBUG_SUBSYSTEM S_LMV + +#include +#include +#include +#include + +#include "lmv_internal.h" + +static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt) +{ + struct obd_statfs *statfs = &tgt->ltd_statfs; + + return statfs->os_bavail * statfs->os_bsize; +} + +static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt) +{ + return tgt->ltd_statfs.os_ffree; +} + +/** + * Calculate penalties per-tgt and per-server + * + * Re-calculate penalties when the configuration changes, active targets + * change and after statfs refresh (all these are reflected by lq_dirty flag). + * On every MDT and MDS: decay the penalty by half for every 8x the update + * interval that the device has been idle. That gives lots of time for the + * statfs information to be updated (which the penalty is only a proxy for), + * and avoids penalizing MDS/MDTs under light load. + * See lmv_qos_calc_weight() for how penalties are factored into the weight. + * + * @lmv LMV device + * + * Return: 0 on success + * -EAGAIN if the number of MDTs isn't enough or all + * MDT spaces are almost the same + */ +static int lmv_qos_calc_ppts(struct lmv_obd *lmv) +{ + struct lu_qos *qos = &lmv->lmv_qos; + struct lu_tgt_desc *tgt; + struct lu_svr_qos *svr; + u64 ba_max, ba_min, ba; + u64 ia_max, ia_min, ia; + u32 num_active; + unsigned int i; + int prio_wide; + time64_t now, age; + u32 maxage = lmv->desc.ld_qos_maxage; + int rc = 0; + + + if (!qos->lq_dirty) + goto out; + + num_active = lmv->desc.ld_active_tgt_count; + if (num_active < 2) { + rc = -EAGAIN; + goto out; + } + + /* find bavail on each server */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + svr->lsq_bavail = 0; + svr->lsq_iavail = 0; + } + qos->lq_active_svr_count = 0; + + /* + * How badly user wants to select targets "widely" (not recently chosen + * and not on recent MDS's). 
As opposed to "freely" (free space avail.) + * 0-256 + */ + prio_wide = 256 - qos->lq_prio_free; + + ba_min = (u64)(-1); + ba_max = 0; + ia_min = (u64)(-1); + ia_max = 0; + now = ktime_get_real_seconds(); + + /* Calculate server penalty per object */ + for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + tgt = lmv->tgts[i]; + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + continue; + + /* bavail >> 16 to avoid overflow */ + ba = tgt_statfs_bavail(tgt) >> 16; + if (!ba) + continue; + + ba_min = min(ba, ba_min); + ba_max = max(ba, ba_max); + + /* iavail >> 8 to avoid overflow */ + ia = tgt_statfs_iavail(tgt) >> 8; + if (!ia) + continue; + + ia_min = min(ia, ia_min); + ia_max = max(ia, ia_max); + + /* Count the number of usable MDS's */ + if (tgt->ltd_qos.ltq_svr->lsq_bavail == 0) + qos->lq_active_svr_count++; + tgt->ltd_qos.ltq_svr->lsq_bavail += ba; + tgt->ltd_qos.ltq_svr->lsq_iavail += ia; + + /* + * per-MDT penalty is + * prio * bavail * iavail / (num_tgt - 1) / 2 + */ + tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia; + do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active - 1); + tgt->ltd_qos.ltq_penalty_per_obj >>= 1; + + age = (now - tgt->ltd_qos.ltq_used) >> 3; + if (qos->lq_reset || age > 32 * maxage) + tgt->ltd_qos.ltq_penalty = 0; + else if (age > maxage) + /* Decay tgt penalty. 
*/ + tgt->ltd_qos.ltq_penalty >>= (age / maxage); + } + + num_active = qos->lq_active_svr_count; + if (num_active < 2) { + /* + * If there's only 1 MDS, we can't penalize it, so instead + * we have to double the MDT penalty + */ + num_active = 2; + for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + tgt = lmv->tgts[i]; + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + continue; + + tgt->ltd_qos.ltq_penalty_per_obj <<= 1; + } + } + + /* + * Per-MDS penalty is + * prio * bavail * iavail / server_tgts / (num_svr - 1) / 2 + */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + ba = svr->lsq_bavail; + ia = svr->lsq_iavail; + svr->lsq_penalty_per_obj = prio_wide * ba * ia; + do_div(svr->lsq_penalty_per_obj, svr->lsq_tgt_count * (num_active - 1)); + svr->lsq_penalty_per_obj >>= 1; + + age = (now - svr->lsq_used) >> 3; + if (qos->lq_reset || age > 32 * maxage) + svr->lsq_penalty = 0; + else if (age > maxage) + /* Decay server penalty. */ + svr->lsq_penalty >>= age / maxage; + } + + qos->lq_dirty = 0; + qos->lq_reset = 0; + + /* + * If each MDT has almost same free space, do rr allocation for better + * creation performance + */ + qos->lq_same_space = 0; + if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min && + (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) { + qos->lq_same_space = 1; + /* Reset weights for the next time we enter qos mode */ + qos->lq_reset = 1; + } + rc = 0; + +out: + if (!rc && qos->lq_same_space) + return -EAGAIN; + + return rc; +} + +static inline bool lmv_qos_is_usable(struct lmv_obd *lmv) +{ + if (!lmv->lmv_qos.lq_dirty && lmv->lmv_qos.lq_same_space) + return false; + + if (lmv->desc.ld_active_tgt_count < 2) + return false; + + return true; +} + +/** + * Calculate weight for a given MDT. + * + * The final MDT weight is bavail >> 16 * iavail >> 8 minus the MDT and MDS + * penalties. See lmv_qos_calc_ppts() for how penalties are calculated. 
+ * + * \param[in] tgt MDT target descriptor + */ +static void lmv_qos_calc_weight(struct lu_tgt_desc *tgt) +{ + struct lu_tgt_qos *ltq = &tgt->ltd_qos; + u64 temp, temp2; + + temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8); + temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty; + if (temp < temp2) + ltq->ltq_weight = 0; + else + ltq->ltq_weight = temp - temp2; +} + +/** + * Re-calculate weights. + * + * The function is called when some target was used for a new object. In + * this case we should re-calculate all the weights to keep new allocations + * balanced well. + * + * \param[in] lmv LMV device + * \param[in] tgt target where a new object was placed + * \param[out] total_wt new total weight for the pool + * + * \retval 0 + */ +static int lmv_qos_used(struct lmv_obd *lmv, struct lu_tgt_desc *tgt, + u64 *total_wt) +{ + struct lu_tgt_qos *ltq; + struct lu_svr_qos *svr; + unsigned int i; + + ltq = &tgt->ltd_qos; + LASSERT(ltq); + + /* Don't allocate on this device anymore, until the next alloc_qos */ + ltq->ltq_usable = 0; + + svr = ltq->ltq_svr; + + /* + * Decay old penalty by half (we're adding max penalty, and don't + * want it to run away.) 
+ */ + ltq->ltq_penalty >>= 1; + svr->lsq_penalty >>= 1; + + /* mark the MDS and MDT as recently used */ + ltq->ltq_used = svr->lsq_used = ktime_get_real_seconds(); + + /* Set max penalties for this MDT and MDS */ + ltq->ltq_penalty += ltq->ltq_penalty_per_obj * + lmv->desc.ld_active_tgt_count; + svr->lsq_penalty += svr->lsq_penalty_per_obj * + lmv->lmv_qos.lq_active_svr_count; + + /* Decrease all MDS penalties */ + list_for_each_entry(svr, &lmv->lmv_qos.lq_svr_list, lsq_svr_list) { + if (svr->lsq_penalty < svr->lsq_penalty_per_obj) + svr->lsq_penalty = 0; + else + svr->lsq_penalty -= svr->lsq_penalty_per_obj; + } + + *total_wt = 0; + /* Decrease all MDT penalties */ + for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + ltq = &lmv->tgts[i]->ltd_qos; + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + continue; + + if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj) + ltq->ltq_penalty = 0; + else + ltq->ltq_penalty -= ltq->ltq_penalty_per_obj; + + lmv_qos_calc_weight(lmv->tgts[i]); + + /* Recalc the total weight of usable osts */ + if (ltq->ltq_usable) + *total_wt += ltq->ltq_weight; + + CDEBUG(D_OTHER, + "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", + i, ltq->ltq_usable, + tgt_statfs_bavail(tgt) >> 10, + ltq->ltq_penalty_per_obj >> 10, + ltq->ltq_penalty >> 10, + ltq->ltq_svr->lsq_penalty_per_obj >> 10, + ltq->ltq_svr->lsq_penalty >> 10, + ltq->ltq_weight >> 10); + } + + return 0; +} + +struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) +{ + struct lu_tgt_desc *tgt; + u64 total_weight = 0; + u64 cur_weight = 0; + u64 rand; + int i; + int rc; + + if (!lmv_qos_is_usable(lmv)) + return ERR_PTR(-EAGAIN); + + down_write(&lmv->lmv_qos.lq_rw_sem); + + if (!lmv_qos_is_usable(lmv)) { + tgt = ERR_PTR(-EAGAIN); + goto unlock; + } + + rc = lmv_qos_calc_ppts(lmv); + if (rc) { + tgt = ERR_PTR(rc); + goto unlock; + } + + for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + tgt = lmv->tgts[i]; + if (!tgt) + continue; + + 
tgt->ltd_qos.ltq_usable = 0; + if (!tgt->ltd_exp || !tgt->ltd_active) + continue; + + tgt->ltd_qos.ltq_usable = 1; + lmv_qos_calc_weight(tgt); + total_weight += tgt->ltd_qos.ltq_weight; + } + + if (total_weight) { +#if BITS_PER_LONG == 32 + /* + * If total_weight > 32-bit, first generate the high + * 32 bits of the random number, then add in the low + * 32 bits (truncated to the upper limit, if needed) + */ + if (total_weight > 0xffffffffULL) + rand = (u64)prandom_u32_max( + (unsigned int)(total_weight >> 32)) << 32; + else + rand = 0; + + if (rand == (total_weight & 0xffffffff00000000ULL)) + rand |= prandom_u32_max((unsigned int)total_weight); + else + rand |= prandom_u32(); + +#else + rand = ((u64)prandom_u32() << 32 | prandom_u32()) % + total_weight; +#endif + } else { + rand = 0; + } + + for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + tgt = lmv->tgts[i]; + + if (!tgt || !tgt->ltd_qos.ltq_usable) + continue; + + cur_weight += tgt->ltd_qos.ltq_weight; + if (cur_weight < rand) + continue; + + *mdt = tgt->ltd_index; + lmv_qos_used(lmv, tgt, &total_weight); + rc = 0; + goto unlock; + } + + /* no proper target found */ + tgt = ERR_PTR(-EAGAIN); + goto unlock; +unlock: + up_write(&lmv->lmv_qos.lq_rw_sem); + + return tgt; +} + +struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt) +{ + struct lu_tgt_desc *tgt; + int i; + + spin_lock(&lmv->lmv_qos.lq_rr.lqr_alloc); + for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + tgt = lmv->tgts[(i + lmv->lmv_qos_rr_index) % + lmv->desc.ld_tgt_count]; + if (tgt && tgt->ltd_exp && tgt->ltd_active) { + *mdt = tgt->ltd_index; + lmv->lmv_qos_rr_index = + (i + lmv->lmv_qos_rr_index + 1) % + lmv->desc.ld_tgt_count; + spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); + + return tgt; + } + } + spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); + + return ERR_PTR(-ENODEV); +} diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c index 170ed564..659ebeb 100644 --- a/fs/lustre/lmv/lproc_lmv.c +++ b/fs/lustre/lmv/lproc_lmv.c @@ 
-76,6 +76,109 @@ static ssize_t desc_uuid_show(struct kobject *kobj, struct attribute *attr, } LUSTRE_RO_ATTR(desc_uuid); +static ssize_t qos_maxage_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + + return sprintf(buf, "%u\n", dev->u.lmv.desc.ld_qos_maxage); +} + +static ssize_t qos_maxage_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + unsigned int val; + int rc; + + rc = kstrtouint(buffer, 0, &val); + if (rc) + return rc; + + dev->u.lmv.desc.ld_qos_maxage = val; + + return count; +} +LUSTRE_RW_ATTR(qos_maxage); + +static ssize_t qos_prio_free_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + + return sprintf(buf, "%u%%\n", + (dev->u.lmv.lmv_qos.lq_prio_free * 100 + 255) >> 8); +} + +static ssize_t qos_prio_free_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + struct lmv_obd *lmv = &dev->u.lmv; + unsigned int val; + int rc; + + rc = kstrtouint(buffer, 0, &val); + if (rc) + return rc; + + if (val > 100) + return -EINVAL; + + lmv->lmv_qos.lq_prio_free = (val << 8) / 100; + lmv->lmv_qos.lq_dirty = 1; + lmv->lmv_qos.lq_reset = 1; + + return count; +} +LUSTRE_RW_ATTR(qos_prio_free); + +static ssize_t qos_threshold_rr_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + + return sprintf(buf, "%u%%\n", + (dev->u.lmv.lmv_qos.lq_threshold_rr * 100 + 255) >> 8); +} + +static ssize_t qos_threshold_rr_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + struct 
obd_device *dev = container_of(kobj, struct obd_device, + obd_kset.kobj); + struct lmv_obd *lmv = &dev->u.lmv; + unsigned int val; + int rc; + + rc = kstrtouint(buffer, 0, &val); + if (rc) + return rc; + + if (val > 100) + return -EINVAL; + + lmv->lmv_qos.lq_threshold_rr = (val << 8) / 100; + lmv->lmv_qos.lq_dirty = 1; + + return count; +} +LUSTRE_RW_ATTR(qos_threshold_rr); + static void *lmv_tgt_seq_start(struct seq_file *p, loff_t *pos) { struct obd_device *dev = p->private; @@ -117,7 +220,7 @@ static int lmv_tgt_seq_show(struct seq_file *p, void *v) return 0; seq_printf(p, "%u: %s %sACTIVE\n", - tgt->ltd_idx, tgt->ltd_uuid.uuid, + tgt->ltd_index, tgt->ltd_uuid.uuid, tgt->ltd_active ? "" : "IN"); return 0; } @@ -156,6 +259,9 @@ static int lmv_target_seq_open(struct inode *inode, struct file *file) &lustre_attr_activeobd.attr, &lustre_attr_desc_uuid.attr, &lustre_attr_numobd.attr, + &lustre_attr_qos_maxage.attr, + &lustre_attr_qos_prio_free.attr, + &lustre_attr_qos_threshold_rr.attr, NULL, }; diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile index 25d2e1d..6d762ed 100644 --- a/fs/lustre/obdclass/Makefile +++ b/fs/lustre/obdclass/Makefile @@ -8,4 +8,4 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \ lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \ obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \ cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \ - jobid.o integrity.o obd_cksum.o + jobid.o integrity.o obd_cksum.o lu_qos.o diff --git a/fs/lustre/obdclass/lu_qos.c b/fs/lustre/obdclass/lu_qos.c new file mode 100644 index 0000000..4ee3f59 --- /dev/null +++ b/fs/lustre/obdclass/lu_qos.c @@ -0,0 +1,166 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * This file is part of Lustre, http://www.lustre.org/ + * + * lustre/obdclass/lu_qos.c + * + * Lustre QoS. + * These are the only exported functions, they provide some generic + * infrastructure for object allocation QoS + * + */ + +#define DEBUG_SUBSYSTEM S_CLASS + +#include +#include +#include +#include +#include +#include +#include + +/** + * Add a new target to Quality of Service (QoS) target table. + * + * Add a new MDT/OST target to the structure representing an OSS. Resort the + * list of known MDSs/OSSs by the number of MDTs/OSTs attached to each MDS/OSS. + * The MDS/OSS list is protected internally and no external locking is required. + * + * @qos lu_qos data + * @ltd target description + * + * Return: 0 on success + * -ENOMEM on error + */ +int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd) +{ + struct lu_svr_qos *svr = NULL; + struct lu_svr_qos *tempsvr; + struct obd_export *exp = ltd->ltd_exp; + int found = 0; + u32 id = 0; + int rc = 0; + + down_write(&qos->lq_rw_sem); + /* + * a bit hacky approach to learn NID of corresponding connection + * but there is no official API to access information like this + * with OSD API. 
+ */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + if (obd_uuid_equals(&svr->lsq_uuid, + &exp->exp_connection->c_remote_uuid)) { + found++; + break; + } + if (svr->lsq_id > id) + id = svr->lsq_id; + } + + if (!found) { + svr = kzalloc(sizeof(*svr), GFP_NOFS); + if (!svr) { + rc = -ENOMEM; + goto out; + } + memcpy(&svr->lsq_uuid, &exp->exp_connection->c_remote_uuid, + sizeof(svr->lsq_uuid)); + ++id; + svr->lsq_id = id; + } else { + /* Assume we have to move this one */ + list_del(&svr->lsq_svr_list); + } + + svr->lsq_tgt_count++; + ltd->ltd_qos.ltq_svr = svr; + + CDEBUG(D_OTHER, "add tgt %s to server %s (%d targets)\n", + obd_uuid2str(&ltd->ltd_uuid), obd_uuid2str(&svr->lsq_uuid), + svr->lsq_tgt_count); + + /* + * Add sorted by # of tgts. Find the first entry that we're + * bigger than... + */ + list_for_each_entry(tempsvr, &qos->lq_svr_list, lsq_svr_list) { + if (svr->lsq_tgt_count > tempsvr->lsq_tgt_count) + break; + } + /* + * ...and add before it. If we're the first or smallest, tempsvr + * points to the list head, and we add to the end. + */ + list_add_tail(&svr->lsq_svr_list, &tempsvr->lsq_svr_list); + + qos->lq_dirty = 1; + qos->lq_rr.lqr_dirty = 1; + +out: + up_write(&qos->lq_rw_sem); + return rc; +} +EXPORT_SYMBOL(lqos_add_tgt); + +/** + * Remove MDT/OST target from QoS table. + * + * Removes given MDT/OST target from QoS table and releases related + * MDS/OSS structure if no targets remain on the MDS/OSS. 
+ * + * @qos lu_qos data + * @ltd target description + * + * Return: 0 on success + * -ENOENT if no server was found + */ +int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd) +{ + struct lu_svr_qos *svr; + int rc = 0; + + down_write(&qos->lq_rw_sem); + svr = ltd->ltd_qos.ltq_svr; + if (!svr) { + rc = -ENOENT; + goto out; + } + + svr->lsq_tgt_count--; + if (svr->lsq_tgt_count == 0) { + CDEBUG(D_OTHER, "removing server %s\n", + obd_uuid2str(&svr->lsq_uuid)); + list_del(&svr->lsq_svr_list); + ltd->ltd_qos.ltq_svr = NULL; + kfree(svr); + } + + qos->lq_dirty = 1; + qos->lq_rr.lqr_dirty = 1; +out: + up_write(&qos->lq_rw_sem); + return rc; +} +EXPORT_SYMBOL(lqos_del_tgt); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 86395b7..a26f3ae 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1931,6 +1931,8 @@ struct mdt_rec_reint { __u16 rr_padding_4; /* also fix lustre_swab_mdt_rec_reint */ }; +#define LMV_DESC_QOS_MAXAGE_DEFAULT 60 /* Seconds */ + /* lmv structures */ struct lmv_desc { __u32 ld_tgt_count; /* how many MDS's */ From patchwork Thu Feb 27 21:13:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410325 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 623B117E0 for ; Thu, 27 Feb 2020 21:35:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4A39F24677 for ; Thu, 27 Feb 2020 21:35:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A39F24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3252D348E24; Thu, 27 Feb 2020 13:29:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5F22E21FD37 for ; Thu, 27 Feb 2020 13:20:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 855D88A9D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 831A646C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:46 -0500 Message-Id: <1582838290-17243-359-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 358/622] lustre: llite: Add persistent cache on client X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Li Xi , Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin PCC is a new framework which provides a group of local caches on the Lustre client side. No global namespace will be provided by PCC. Each client uses its own local storage as a cache for itself. A local file system is used to manage the data on local caches. 
Cached I/O is directed to the local file system while normal I/O is directed to OSTs. PCC uses HSM for data synchronization. It uses an HSM copytool to restore files from local caches to Lustre OSTs. Each PCC has a copytool instance running with a unique archive number. Any remote access from another Lustre client triggers the data synchronization. If a client with PCC goes offline, the cached data becomes temporarily inaccessible to other clients. After the PCC client reboots and the copytool restarts, the data is accessible again. ToDo: 1) Make PCC exclusive with HSM. 2) Strong size consistency for PCC cached files among clients. 3) Support caching partial content of a file. WC-bug-id: https://jira.whamcloud.com/browse/LU-10092 Lustre-commit: f172b1168857 ("LU-10092 llite: Add persistent cache on client") Signed-off-by: Li Xi Signed-off-by: Wang Shilong Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/32963 Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin WC-bug-id: https://jira.whamcloud.com/browse/LU-12438 Lustre-commit: b5a6ec93ce56 ("LU-12438 llite: vfs_read/write removed, use kernel_read/write") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/35223 Reviewed-by: Wang Shilong Reviewed-by: Li Xi Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Petros Koutoupis Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 2 + fs/lustre/llite/Makefile | 2 +- fs/lustre/llite/dir.c | 74 +++ fs/lustre/llite/file.c | 164 ++++- fs/lustre/llite/llite_internal.h | 25 + fs/lustre/llite/llite_lib.c | 45 +- fs/lustre/llite/llite_mmap.c | 8 + fs/lustre/llite/lproc_llite.c | 45 +- fs/lustre/llite/namei.c | 79 ++- fs/lustre/llite/pcc.c | 1042 +++++++++++++++++++++++++++++++ fs/lustre/llite/pcc.h | 129 ++++ fs/lustre/llite/super25.c | 10 + fs/lustre/lmv/lmv_intent.c | 6 +- fs/lustre/lmv/lmv_obd.c | 1 + fs/lustre/mdc/mdc_lib.c | 6 + include/uapi/linux/lustre/lustre_idl.h | 8 +- 
include/uapi/linux/lustre/lustre_user.h | 50 +- 17 files changed, 1654 insertions(+), 42 deletions(-) create mode 100644 fs/lustre/llite/pcc.c create mode 100644 fs/lustre/llite/pcc.h diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 2f878d6..f53c303 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -796,6 +796,8 @@ struct md_op_data { bool op_post_migrate; /* used to access dir with bash hash */ u32 op_stripe_index; + /* Archive ID for PCC attach */ + u32 op_archive_id; }; struct md_callback { diff --git a/fs/lustre/llite/Makefile b/fs/lustre/llite/Makefile index 811b9ab..c88a1b0 100644 --- a/fs/lustre/llite/Makefile +++ b/fs/lustre/llite/Makefile @@ -7,6 +7,6 @@ lustre-y := dcache.o dir.o file.o llite_lib.o llite_nfs.o \ xattr.o xattr_cache.o xattr_security.o \ super25.o statahead.o glimpse.o lcommon_cl.o lcommon_misc.o \ vvp_dev.o vvp_page.o vvp_io.o vvp_object.o \ - lproc_llite.o + lproc_llite.o pcc.o lustre-$(CONFIG_LUSTRE_FS_POSIX_ACL) += acl.o diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index a1dce52..337582b 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1917,6 +1917,80 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return ll_ioctl_fsgetxattr(inode, cmd, arg); case FS_IOC_FSSETXATTR: return ll_ioctl_fssetxattr(inode, cmd, arg); + case LL_IOC_PCC_DETACH: { + struct lu_pcc_detach *detach; + struct lu_fid *fid; + struct inode *inode2; + unsigned long ino; + + /* + * The reason why a dir IOCTL is used to detach a PCC-cached + * file rather than making it a file IOCTL is: + * When PCC caching a file, it will attach the file firstly, + * and increase the refcount of PCC inode (pcci->pcci_refcount) + * from 0 to 1. + * When detaching a PCC-cached file, it will check whether the + * refcount is 1. If so, the file can be detached successfully. + * Otherwise, it means there are some users opened and using + * the file currently, and it will return -EBUSY. 
+ * Each open on the PCC-cached file will increase the refcount + * of the PCC inode; + * Each close on the PCC-cached file will decrease the refcount + * of the PCC inode; + * When using a file IOCTL to detach a PCC-cached file, it needs + * to open it first, which will increase the refcount. So + * during the process of the detach IOCTL, it will return + * -EBUSY as the PCC inode refcount is larger than 1. Someone + * might argue that here it can just decrease the refcount + * of the PCC inode, return success and make the close of the + * IOCTL file handle perform the real detach. But this + * may result in an inconsistent state of a PCC file, e.g. Process + * A got a successful return from the detach IOCTL; Process B + * opens the file before Process A finally closes the IOCTL + * file handle. The following I/O of Process B will then be + * directed into PCC although the file was already detached from + * the view of Process A. + * A dir IOCTL does not have the problem above. + */ + detach = kzalloc(sizeof(*detach), GFP_KERNEL); + if (!detach) + return -ENOMEM; + + if (copy_from_user(detach, + (const struct lu_pcc_detach __user *)arg, + sizeof(*detach))) { + rc = -EFAULT; + goto out_detach; + } + + fid = &detach->pccd_fid; + ino = cl_fid_build_ino(fid, ll_need_32bit_api(sbi)); + inode2 = ilookup5(inode->i_sb, ino, ll_test_inode_by_fid, fid); + if (!inode2) { + /* Target inode is not in inode cache, and PCC file + * has already been released, return immediately. 
+ */ + rc = 0; + goto out_detach; + } + + if (!S_ISREG(inode2->i_mode)) { + rc = -EINVAL; + goto out_iput; + } + + if (!inode_owner_or_capable(inode2)) { + rc = -EPERM; + goto out_iput; + } + + rc = pcc_ioctl_detach(inode2); +out_iput: + iput(inode2); +out_detach: + kfree(detach); + return rc; + } default: return obd_iocontrol(cmd, sbi->ll_dt_exp, 0, NULL, (void __user *)arg); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 88d5c2d..95e7c73 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -56,6 +56,11 @@ struct split_param { u16 sp_mirror_id; }; +struct pcc_param { + u64 pa_data_version; + u32 pa_archive_id; +}; + static int ll_put_grouplock(struct inode *inode, struct file *file, unsigned long arg); @@ -70,6 +75,8 @@ static struct ll_file_data *ll_file_data_get(void) if (!fd) return NULL; fd->fd_write_failed = false; + pcc_file_init(&fd->fd_pcc_file); + return fd; } @@ -192,6 +199,17 @@ static int ll_close_inode_openhandle(struct inode *inode, break; } + case MDS_PCC_ATTACH: { + struct pcc_param *param = data; + + LASSERT(data); + op_data->op_bias |= MDS_HSM_RELEASE | MDS_PCC_ATTACH; + op_data->op_archive_id = param->pa_archive_id; + op_data->op_data_version = param->pa_data_version; + op_data->op_lease_handle = och->och_lease_handle; + break; + } + case MDS_HSM_RELEASE: LASSERT(data); op_data->op_bias |= MDS_HSM_RELEASE; @@ -378,6 +396,8 @@ int ll_file_release(struct inode *inode, struct file *file) return 0; } + pcc_file_release(inode, file); + if (!S_ISDIR(inode->i_mode)) { if (lli->lli_clob) lov_read_and_clear_async_rc(lli->lli_clob); @@ -833,6 +853,10 @@ int ll_file_open(struct inode *inode, struct file *file) if (rc) goto out_och_free; } + rc = pcc_file_open(inode, file); + if (rc) + goto out_och_free; + mutex_unlock(&lli->lli_och_mutex); fd = NULL; @@ -858,6 +882,7 @@ int ll_file_open(struct inode *inode, struct file *file) out_openerr: if (lli->lli_opendir_key == fd) ll_deauthorize_statahead(inode, fd); + if (fd) 
ll_file_data_put(fd); } else { @@ -1632,6 +1657,22 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ssize_t result; u16 refcheck; ssize_t rc2; + bool cached = false; + + /** + * Currently, when a PCC read fails, we do not fall back to the + * normal read path, just return the error. + * The reason is that: for RW-PCC, the file data may be modified + * in the PCC and inconsistent with the data on OSTs (or file + * data has been removed from the Lustre file system), at this + * time, fallback to the normal read path may read the wrong + * data. + * TODO: for RO-PCC (readonly PCC), fall back to normal read + * path: read data from data copy on OSTs. + */ + result = pcc_file_read_iter(iocb, to, &cached); + if (cached) + return result; ll_ras_enter(iocb->ki_filp); @@ -1725,6 +1766,21 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) struct vvp_io_args *args; ssize_t rc_tiny = 0, rc_normal; u16 refcheck; + bool cached = false; + int result; + + /** + * When a PCC write fails, we do not fall back to the normal + * write path, just return the error. The reason is that: + * PCC is actually an HSM device, and HSM does not handle the + * failure, especially -ENOSPC due to running out of space; Moreover, + * the fallback to normal I/O path for ENOSPC failure, needs + * to restore the file data to OSTs first and redo the write + * again, making the logic of PCC very complex. 
+ */ + result = pcc_file_write_iter(iocb, from, &cached); + if (cached) + return result; /* NB: we can't do direct IO for tiny writes because they use the page * cache, we can't do sync writes because tiny writes can't flush @@ -2979,13 +3035,15 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, struct ll_inode_info *lli = ll_i2info(inode); struct obd_client_handle *och = NULL; struct split_param sp; - bool lease_broken; + struct pcc_param param; + bool lease_broken = false; fmode_t fmode = 0; enum mds_op_bias bias = 0; struct file *layout_file = NULL; void *data = NULL; size_t data_size = 0; - long rc; + bool attached = false; + long rc, rc2 = 0; mutex_lock(&lli->lli_och_mutex); if (fd->fd_lease_och) { @@ -2994,10 +3052,8 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, } mutex_unlock(&lli->lli_och_mutex); - if (!och) { - rc = -ENOLCK; - goto out; - } + if (!och) + return -ENOLCK; fmode = och->och_flags; @@ -3005,19 +3061,19 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, case LL_LEASE_RESYNC_DONE: if (ioc->lil_count > IOC_IDS_MAX) { rc = -EINVAL; - goto out; + goto out_lease_close; } data_size = offsetof(typeof(*ioc), lil_ids[ioc->lil_count]); data = kzalloc(data_size, GFP_KERNEL); if (!data) { rc = -ENOMEM; - goto out; + goto out_lease_close; } if (copy_from_user(data, (void __user *)arg, data_size)) { rc = -EFAULT; - goto out; + goto out_lease_close; } bias = MDS_CLOSE_RESYNC_DONE; @@ -3027,25 +3083,25 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, if (ioc->lil_count != 1) { rc = -EINVAL; - goto out; + goto out_lease_close; } arg += sizeof(*ioc); if (copy_from_user(&fd, (void __user *)arg, sizeof(u32))) { rc = -EFAULT; - goto out; + goto out_lease_close; } layout_file = fget(fd); if (!layout_file) { rc = -EBADF; - goto out; + goto out_lease_close; } if ((file->f_flags & O_ACCMODE) == O_RDONLY || (layout_file->f_flags & O_ACCMODE) == 
O_RDONLY) { rc = -EPERM; - goto out; + goto out_lease_close; } data = file_inode(layout_file); @@ -3058,26 +3114,26 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, if (ioc->lil_count != 2) { rc = -EINVAL; - goto out; + goto out_lease_close; } arg += sizeof(*ioc); if (copy_from_user(&fdv, (void __user *)arg, sizeof(u32))) { rc = -EFAULT; - goto out; + goto out_lease_close; } arg += sizeof(u32); if (copy_from_user(&mirror_id, (void __user *)arg, sizeof(u32))) { rc = -EFAULT; - goto out; + goto out_lease_close; } layout_file = fget(fdv); if (!layout_file) { rc = -EBADF; - goto out; + goto out_lease_close; } sp.sp_inode = file_inode(layout_file); @@ -3086,11 +3142,37 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, bias = MDS_CLOSE_LAYOUT_SPLIT; break; } + case LL_LEASE_PCC_ATTACH: + if (ioc->lil_count != 1) + return -EINVAL; + + arg += sizeof(*ioc); + if (copy_from_user(&param.pa_archive_id, (void __user *)arg, + sizeof(u32))) { + rc2 = -EFAULT; + goto out_lease_close; + } + + rc2 = pcc_readwrite_attach(file, inode, param.pa_archive_id); + if (rc2) + goto out_lease_close; + + attached = true; + /* Grab latest data version */ + rc2 = ll_data_version(inode, &param.pa_data_version, + LL_DV_WR_FLUSH); + if (rc2) + goto out_lease_close; + + data = &param; + bias = MDS_PCC_ATTACH; + break; default: /* without close intent */ break; } +out_lease_close: rc = ll_lease_close_intent(och, inode, &lease_broken, bias, data); if (rc < 0) goto out; @@ -3112,6 +3194,12 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, if (layout_file) fput(layout_file); break; + case LL_LEASE_PCC_ATTACH: + if (!rc) + rc = rc2; + rc = pcc_readwrite_attach_fini(file, inode, lease_broken, + rc, attached); + break; } if (!rc) @@ -3633,6 +3721,33 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags) rc = ll_heat_set(inode, flags); return rc; } + case LL_IOC_PCC_STATE: { + struct lu_pcc_state __user *ustate 
= + (struct lu_pcc_state __user *)arg; + struct lu_pcc_state *state; + + state = kzalloc(sizeof(*state), GFP_KERNEL); + if (!state) + return -ENOMEM; + + if (copy_from_user(state, ustate, sizeof(*state))) { + rc = -EFAULT; + goto out_state; + } + + rc = pcc_ioctl_state(inode, state); + if (rc) + goto out_state; + + if (copy_to_user(ustate, state, sizeof(*state))) { + rc = -EFAULT; + goto out_state; + } + +out_state: + kfree(state); + return rc; + } default: return obd_iocontrol(cmd, ll_i2dtexp(inode), 0, NULL, (void __user *)arg); @@ -3740,13 +3855,20 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) { struct inode *inode = file_inode(file); struct ll_inode_info *lli = ll_i2info(inode); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct ptlrpc_request *req; + struct file *pcc_file = fd->fd_pcc_file.pccf_file; int rc, err; CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p)\n", PFID(ll_inode2fid(inode)), inode); ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1); + /* pcc cache path */ + if (pcc_file) + return file_inode(pcc_file)->i_fop->fsync(pcc_file, + start, end, datasync); + rc = file_write_and_wait_range(file, start, end); inode_lock(inode); @@ -4294,6 +4416,11 @@ int ll_getattr(const struct path *path, struct kstat *stat, return rc; if (S_ISREG(inode->i_mode)) { + bool cached = false; + + rc = pcc_inode_getattr(inode, &cached); + if (cached && rc < 0) + return rc; /* In case of restore, the MDT has the right size and has * already send it back without granting the layout lock, * inode is up-to-date so glimpse is useless. 
@@ -4301,7 +4428,8 @@ int ll_getattr(const struct path *path, struct kstat *stat, * restore the MDT holds the layout lock so the glimpse will * block up to the end of restore (getattr will block) */ - if (!test_bit(LLIF_FILE_RESTORING, &lli->lli_flags)) { + if (!cached && !test_bit(LLIF_FILE_RESTORING, + &lli->lli_flags)) { rc = ll_glimpse_size(inode); if (rc < 0) return rc; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 9e413c2..f2ea856 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -49,6 +49,7 @@ #include #include "vvp_internal.h" #include "range_lock.h" +#include "pcc.h" /** Only used on client-side for indicating the tail of dir hash/offset. */ #define LL_DIR_END_OFF 0x7fffffffffffffffULL @@ -205,6 +206,9 @@ struct ll_inode_info { * accurate if the file is shared by different jobs. */ char lli_jobid[LUSTRE_JOBID_SIZE]; + + struct mutex lli_pcc_lock; + struct pcc_inode *lli_pcc_inode; }; }; @@ -297,6 +301,11 @@ static inline struct ll_inode_info *ll_i2info(struct inode *inode) return container_of(inode, struct ll_inode_info, lli_vfs_inode); } +static inline struct pcc_inode *ll_i2pcci(struct inode *inode) +{ + return ll_i2info(inode)->lli_pcc_inode; +} + /* default to about 64M of readahead on a given system. 
*/ #define SBI_DEFAULT_READAHEAD_MAX MiB_TO_PAGES(64UL) @@ -552,6 +561,9 @@ struct ll_sb_info { /* filesystem fsname */ char ll_fsname[LUSTRE_MAXFSNAME + 1]; + + /* Persistent Client Cache */ + struct pcc_super ll_pcc_super; }; #define SBI_DEFAULT_HEAT_DECAY_WEIGHT ((80 * 256 + 50) / 100) @@ -672,6 +684,7 @@ struct ll_file_data { * layout version for verification to OST objects */ u32 fd_layout_version; + struct pcc_file fd_pcc_file; }; void llite_tunables_unregister(void); @@ -1355,6 +1368,18 @@ static inline void d_lustre_revalidate(struct dentry *dentry) spin_unlock(&dentry->d_lock); } +static inline dev_t ll_compat_encode_dev(dev_t dev) +{ + /* The compat_sys_*stat*() syscalls will fail unless the + * device majors and minors are both less than 256. Note that + * the value returned here will be passed through + * old_encode_dev() in cp_compat_stat(). And so we are not + * trying to return a valid compat (u16) device number, just + * one that will pass the old_valid_dev() check. + */ + return MKDEV(MAJOR(dev) & 0xff, MINOR(dev) & 0xff); +} + int ll_layout_conf(struct inode *inode, const struct cl_object_conf *conf); int ll_layout_refresh(struct inode *inode, u32 *gen); int ll_layout_restore(struct inode *inode, loff_t start, u64 length); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 0633cc5..d46bc99 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -128,6 +128,7 @@ static struct ll_sb_info *ll_init_sbi(void) sbi->ll_squash.rsi_gid = 0; INIT_LIST_HEAD(&sbi->ll_squash.rsi_nosquash_nids); spin_lock_init(&sbi->ll_squash.rsi_lock); + pcc_super_init(&sbi->ll_pcc_super); /* Per-filesystem file heat */ sbi->ll_heat_decay_weight = SBI_DEFAULT_HEAT_DECAY_WEIGHT; @@ -139,13 +140,13 @@ static void ll_free_sbi(struct super_block *sb) { struct ll_sb_info *sbi = ll_s2sbi(sb); + if (!list_empty(&sbi->ll_squash.rsi_nosquash_nids)) + cfs_free_nidlist(&sbi->ll_squash.rsi_nosquash_nids); if (sbi->ll_cache) { - if 
(!list_empty(&sbi->ll_squash.rsi_nosquash_nids)) - cfs_free_nidlist(&sbi->ll_squash.rsi_nosquash_nids); cl_cache_decref(sbi->ll_cache); sbi->ll_cache = NULL; } - + pcc_super_fini(&sbi->ll_pcc_super); kfree(sbi); } @@ -215,7 +216,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT2_LOCK_CONVERT | OBD_CONNECT2_ARCHIVE_ID_ARRAY | OBD_CONNECT2_LSOM | - OBD_CONNECT2_ASYNC_DISCARD; + OBD_CONNECT2_ASYNC_DISCARD | + OBD_CONNECT2_PCC; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; @@ -953,6 +955,8 @@ void ll_lli_init(struct ll_inode_info *lli) spin_lock_init(&lli->lli_heat_lock); obd_heat_clear(lli->lli_heat_instances, OBD_HEAT_COUNT); lli->lli_heat_flags = 0; + mutex_init(&lli->lli_pcc_lock); + lli->lli_pcc_inode = NULL; } mutex_init(&lli->lli_layout_mutex); memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); @@ -1486,6 +1490,8 @@ void ll_clear_inode(struct inode *inode) LASSERT(!lli->lli_opendir_key); LASSERT(!lli->lli_sai); LASSERT(lli->lli_opendir_pid == 0); + } else { + pcc_inode_free(inode); } md_null_inode(sbi->ll_md_exp, ll_inode2fid(inode)); @@ -1709,15 +1715,28 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, if (attr->ia_valid & (ATTR_SIZE | ATTR_ATIME | ATTR_ATIME_SET | ATTR_MTIME | ATTR_MTIME_SET | ATTR_CTIME) || xvalid & OP_XVALID_CTIME_SET) { - /* For truncate and utimes sending attributes to OSTs, setting - * mtime/atime to the past will be performed under PW [0:EOF] - * extent lock (new_size:EOF for truncate). 
It may seem - * excessive to send mtime/atime updates to OSTs when not - * setting times to past, but it is necessary due to possible - * time de-synchronization between MDT inode and OST objects - */ - rc = cl_setattr_ost(ll_i2info(inode)->lli_clob, - attr, xvalid, 0); + bool cached = false; + + rc = pcc_inode_setattr(inode, attr, &cached); + if (cached) { + if (rc) { + CERROR("%s: PCC inode "DFID" setattr failed: rc = %d\n", + ll_i2sbi(inode)->ll_fsname, + PFID(&lli->lli_fid), rc); + goto out; + } + } else { + /* For truncate and utimes sending attributes to OSTs, + * setting mtime/atime to the past will be performed + * under PW [0:EOF] extent lock (new_size:EOF for + * truncate). It may seem excessive to send mtime/atime + * updates to OSTs when not setting times to past, but + * it is necessary due to possible time + * de-synchronization between MDT inode and OST objects + */ + rc = cl_setattr_ost(ll_i2info(inode)->lli_clob, + attr, xvalid, 0); + } } /* diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 37ce508..fc2331b 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -505,6 +505,14 @@ int ll_file_mmap(struct file *file, struct vm_area_struct *vma) { struct inode *inode = file_inode(file); int rc; + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + + /* pcc cache path */ + if (pcc_file) { + vma->vm_file = pcc_file; + return file_inode(pcc_file)->i_fop->mmap(pcc_file, vma); + } if (ll_file_nolock(file)) return -EOPNOTSUPP; diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 165d37f..8cb4983 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1317,7 +1317,46 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file, LPROC_SEQ_FOPS(ll_nosquash_nids); -static struct lprocfs_vars lprocfs_llite_obd_vars[] = { +static int ll_pcc_seq_show(struct seq_file *m, void *v) +{ + struct super_block 
*sb = m->private; + struct ll_sb_info *sbi = ll_s2sbi(sb); + + return pcc_super_dump(&sbi->ll_pcc_super, m); +} + +static ssize_t ll_pcc_seq_write(struct file *file, const char __user *buffer, + size_t count, loff_t *off) +{ + struct seq_file *m = file->private_data; + struct super_block *sb = m->private; + struct ll_sb_info *sbi = ll_s2sbi(sb); + int rc; + char *kernbuf; + + if (count >= LPROCFS_WR_PCC_MAX_CMD) + return -EINVAL; + + if (!(exp_connect_flags2(sbi->ll_md_exp) & OBD_CONNECT2_PCC)) + return -EOPNOTSUPP; + + kernbuf = kzalloc(count + 1, GFP_KERNEL); + if (!kernbuf) + return -ENOMEM; + + if (copy_from_user(kernbuf, buffer, count)) { + rc = -EFAULT; + goto out_free_kernbuff; + } + + rc = pcc_cmd_handle(kernbuf, count, &sbi->ll_pcc_super); +out_free_kernbuff: + kfree(kernbuf); + return rc ? rc : count; +} +LPROC_SEQ_FOPS(ll_pcc); + +struct lprocfs_vars lprocfs_llite_obd_vars[] = { { .name = "site", .fops = &ll_site_stats_fops }, { .name = "max_cached_mb", @@ -1329,9 +1368,11 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file, { .name = "sbi_flags", .fops = &ll_sbi_flags_fops }, { .name = "root_squash", - .fops = &ll_root_squash_fops }, + .fops = &ll_root_squash_fops }, { .name = "nosquash_nids", .fops = &ll_nosquash_nids_fops }, + { .name = "pcc", + .fops = &ll_pcc_fops, }, { NULL } }; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index fb5caaf..4f39b2c 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -711,14 +711,21 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, return rc; } +struct pcc_create_attach { + struct pcc_dataset *pca_dataset; + struct dentry *pca_dentry; +}; + static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, struct lookup_intent *it, void **secctx, - u32 *secctxlen) + u32 *secctxlen, + struct pcc_create_attach *pca) { struct lookup_intent lookup_it = { .it_op = IT_LOOKUP }; struct dentry *save = dentry, *retval; struct ptlrpc_request *req = NULL; 
struct md_op_data *op_data = NULL; + struct lov_user_md *lum = NULL; char secctx_name[XATTR_NAME_MAX + 1]; struct inode *inode; u32 opc; @@ -806,6 +813,42 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, } } + if (pca && pca->pca_dataset) { + struct pcc_dataset *dataset = pca->pca_dataset; + + lum = kzalloc(sizeof(*lum), GFP_NOFS); + if (!lum) { + retval = ERR_PTR(-ENOMEM); + goto out; + } + + lum->lmm_magic = LOV_USER_MAGIC_V1; + lum->lmm_pattern = LOV_PATTERN_F_RELEASED | LOV_PATTERN_RAID0; + lum->lmm_stripe_size = 0; + lum->lmm_stripe_count = 0; + lum->lmm_stripe_offset = 0; + + op_data->op_data = lum; + op_data->op_data_size = sizeof(*lum); + op_data->op_archive_id = dataset->pccd_id; + + rc = obd_fid_alloc(NULL, ll_i2mdexp(parent), &op_data->op_fid2, + op_data); + if (rc) { + retval = ERR_PTR(rc); + goto out; + } + + rc = pcc_inode_create(dataset, &op_data->op_fid2, + &pca->pca_dentry); + if (rc) { + retval = ERR_PTR(rc); + goto out; + } + + it->it_flags |= MDS_OPEN_PCC; + } + rc = md_intent_lock(ll_i2mdexp(parent), op_data, it, &req, &ll_md_blocking_ast, 0); /* @@ -878,6 +921,8 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, ll_finish_md_op_data(op_data); } + kfree(lum); + ptlrpc_req_finished(req); return retval; } @@ -903,7 +948,7 @@ static struct dentry *ll_lookup_nd(struct inode *parent, struct dentry *dentry, itp = NULL; else itp = &it; - de = ll_lookup_it(parent, dentry, itp, NULL, NULL); + de = ll_lookup_it(parent, dentry, itp, NULL, NULL, NULL); if (itp) ll_intent_release(itp); @@ -923,6 +968,9 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, void *secctx = NULL; u32 secctxlen = 0; struct dentry *de; + struct ll_sb_info *sbi; + struct pcc_create_attach pca = {NULL, NULL}; + struct pcc_dataset *dataset = NULL; int rc = 0; CDEBUG(D_VFSTRACE, @@ -952,14 +1000,24 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, return -ENOMEM; it->it_op = IT_OPEN; 
- if (open_flags & O_CREAT) + if (open_flags & O_CREAT) { it->it_op |= IT_CREAT; + sbi = ll_i2sbi(dir); + /* Volatile file is used for HSM restore, so do not use PCC */ + if (!filename_is_volatile(dentry->d_name.name, + dentry->d_name.len, NULL)) { + dataset = pcc_dataset_get(&sbi->ll_pcc_super, + ll_i2info(dir)->lli_projid, + 0); + pca.pca_dataset = dataset; + } + } it->it_create_mode = (mode & S_IALLUGO) | S_IFREG; it->it_flags = (open_flags & ~O_ACCMODE) | OPEN_FMODE(open_flags); it->it_flags &= ~MDS_OPEN_FL_INTERNAL; /* Dentry added to dcache tree in ll_lookup_it */ - de = ll_lookup_it(dir, dentry, it, &secctx, &secctxlen); + de = ll_lookup_it(dir, dentry, it, &secctx, &secctxlen, &pca); if (IS_ERR(de)) rc = PTR_ERR(de); else if (de) @@ -976,9 +1034,20 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, dput(de); goto out_release; } + if (dataset && dentry->d_inode) { + rc = pcc_inode_create_fini(dataset, + dentry->d_inode, + pca.pca_dentry); + if (rc) { + if (de) + dput(de); + goto out_release; + } + } file->f_mode |= FMODE_CREATED; } + if (d_really_is_positive(dentry) && it_disposition(it, DISP_OPEN_OPEN)) { /* Open dentry. */ @@ -1003,6 +1072,8 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, } out_release: + if (dataset) + pcc_dataset_put(dataset); ll_intent_release(it); kfree(it); diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c new file mode 100644 index 0000000..53e5cda --- /dev/null +++ b/fs/lustre/llite/pcc.c @@ -0,0 +1,1042 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * Copyright (c) 2017, DDN Storage Corporation. + */ +/* + * Persistent Client Cache + * + * PCC is a new framework which provides a group of local caches on Lustre + * client side. It works in two modes: RW-PCC enables a read-write cache on the + * local SSDs of a single client; RO-PCC provides a read-only cache on the + * local SSDs of multiple clients. Less overhead is visible to the applications + * and network latencies and lock conflicts can be significantly reduced. + * + * For RW-PCC, no global namespace will be provided. Each client uses its own + * local storage as a cache for itself. Local file system is used to manage + * the data on local caches. Cached I/O is directed to local file system while + * normal I/O is directed to OSTs. RW-PCC uses HSM for data synchronization. + * It uses HSM copytool to restore file from local caches to Lustre OSTs. Each + * PCC has a copytool instance running with unique archive number. Any remote + * access from another Lustre client would trigger the data synchronization. If + * a client with RW-PCC goes offline, the cached data becomes inaccessible for + * other clients temporarily. And after the RW-PCC client reboots and the + * copytool restarts, the data will be accessible again. 
+ * + * Following is what will happen in different conditions for RW-PCC: + * + * > When file is being created on RW-PCC + * + * A normal HSM released file is created on MDT; + * An empty mirror file is created on local cache; + * The HSM status of the Lustre file will be set to archived and released; + * The archive number will be set to the proper value. + * + * > When file is being prefetched to RW-PCC + * + * A file is copied to the local cache; + * The HSM status of the Lustre file will be set to archived and released; + * The archive number will be set to the proper value. + * + * > When file is being accessed from PCC + * + * Data will be read directly from local cache; + * Metadata will be read from MDT, except file size; + * File size will be obtained from local cache. + * + * > When PCC cached file is being accessed on another client + * + * RW-PCC cached files are automatically restored when a process on another + * client tries to read or modify them. The corresponding I/O will block + * waiting for the released file to be restored. This is transparent to the + * process. + * + * For RW-PCC, when a file is being created, a rule-based policy is used to + * determine whether it will be cached. Rule-based caching of newly created + * files can determine which file can use a cache on PCC directly without any + * admission control. + * + * RW-PCC design can accelerate I/O intensive applications with one-to-one + * mappings between files and accessing clients. However, in several use cases, + * files will never be updated, but need to be read simultaneously from many + * clients. RO-PCC implements a read-only caching on Lustre clients using + * SSDs. RO-PCC is based on the same framework as RW-PCC, except + * that no HSM mechanism is used. 
+ * + * The main advantages to use this SSD cache on the Lustre clients via PCC + * is that: + * - The I/O stack becomes much simpler for the cached data, as there is no + * interference with I/Os from other clients, which enables easier + * performance optimizations; + * - The requirements on the HW inside the client nodes are small, any kind of + * SSDs or even HDDs can be used as cache devices; + * - Caching reduces the pressure on the object storage targets (OSTs), as + * small or random I/Os can be regularized to big sequential I/Os and + * temporary files do not even need to be flushed to OSTs. + * + * PCC can accelerate applications with certain I/O patterns: + * - small-sized random writes (< 1MB) from a single client + * - repeated read of data that is larger than RAM + * - clients with high network latency + * + * Author: Li Xi + * Author: Qian Yingjin + */ + +#define DEBUG_SUBSYSTEM S_LLITE + +#include "pcc.h" +#include +#include +#include +#include "llite_internal.h" + +struct kmem_cache *pcc_inode_slab; + +void pcc_super_init(struct pcc_super *super) +{ + spin_lock_init(&super->pccs_lock); + INIT_LIST_HEAD(&super->pccs_datasets); +} + +/** + * pcc_dataset_add - Add a Cache policy to control which files need be + * cached and where it will be cached. + * + * @super: superblock of pcc + * @pathname: root path of pcc + * @id: HSM archive ID + * @projid: files with specified project ID will be cached. 
+ */ +static int +pcc_dataset_add(struct pcc_super *super, const char *pathname, + u32 archive_id, u32 projid) +{ + int rc; + struct pcc_dataset *dataset; + struct pcc_dataset *tmp; + bool found = false; + + dataset = kzalloc(sizeof(*dataset), GFP_NOFS); + if (!dataset) + return -ENOMEM; + + rc = kern_path(pathname, LOOKUP_DIRECTORY, &dataset->pccd_path); + if (unlikely(rc)) { + kfree(dataset); + return rc; + } + strncpy(dataset->pccd_pathname, pathname, PATH_MAX); + dataset->pccd_id = archive_id; + dataset->pccd_projid = projid; + atomic_set(&dataset->pccd_refcount, 1); + + spin_lock(&super->pccs_lock); + list_for_each_entry(tmp, &super->pccs_datasets, pccd_linkage) { + if (tmp->pccd_id == archive_id) { + found = true; + break; + } + } + if (!found) + list_add(&dataset->pccd_linkage, &super->pccs_datasets); + spin_unlock(&super->pccs_lock); + + if (found) { + pcc_dataset_put(dataset); + rc = -EEXIST; + } + + return rc; +} + +struct pcc_dataset * +pcc_dataset_get(struct pcc_super *super, u32 projid, u32 archive_id) +{ + struct pcc_dataset *dataset; + struct pcc_dataset *selected = NULL; + + if (projid == 0 && archive_id == 0) + return NULL; + + /* + * archive ID is unique in the list, projid might be duplicate, + * we just return last added one as first priority. 
+ */ + spin_lock(&super->pccs_lock); + list_for_each_entry(dataset, &super->pccs_datasets, pccd_linkage) { + if (projid && dataset->pccd_projid != projid) + continue; + if (archive_id && dataset->pccd_id != archive_id) + continue; + atomic_inc(&dataset->pccd_refcount); + selected = dataset; + break; + } + spin_unlock(&super->pccs_lock); + if (selected) + CDEBUG(D_CACHE, "matched projid %u, PCC create\n", + selected->pccd_projid); + return selected; +} + +void +pcc_dataset_put(struct pcc_dataset *dataset) +{ + if (atomic_dec_and_test(&dataset->pccd_refcount)) { + path_put(&dataset->pccd_path); + kfree(dataset); + } +} + +static int +pcc_dataset_del(struct pcc_super *super, char *pathname) +{ + struct list_head *l, *tmp; + struct pcc_dataset *dataset; + int rc = -ENOENT; + + spin_lock(&super->pccs_lock); + list_for_each_safe(l, tmp, &super->pccs_datasets) { + dataset = list_entry(l, struct pcc_dataset, pccd_linkage); + if (strcmp(dataset->pccd_pathname, pathname) == 0) { + list_del(&dataset->pccd_linkage); + pcc_dataset_put(dataset); + rc = 0; + break; + } + } + spin_unlock(&super->pccs_lock); + return rc; +} + +static void +pcc_dataset_dump(struct pcc_dataset *dataset, struct seq_file *m) +{ + seq_printf(m, "%s:\n", dataset->pccd_pathname); + seq_printf(m, " rwid: %u\n", dataset->pccd_id); + seq_printf(m, " autocache: projid=%u\n", dataset->pccd_projid); +} + +int +pcc_super_dump(struct pcc_super *super, struct seq_file *m) +{ + struct pcc_dataset *dataset; + + spin_lock(&super->pccs_lock); + list_for_each_entry(dataset, &super->pccs_datasets, pccd_linkage) { + pcc_dataset_dump(dataset, m); + } + spin_unlock(&super->pccs_lock); + return 0; +} + +void pcc_super_fini(struct pcc_super *super) +{ + struct pcc_dataset *dataset, *tmp; + + list_for_each_entry_safe(dataset, tmp, + &super->pccs_datasets, pccd_linkage) { + list_del(&dataset->pccd_linkage); + pcc_dataset_put(dataset); + } +} + +static bool pathname_is_valid(const char *pathname) +{ + /* Needs to be absolute 
path */ + if (!pathname || strlen(pathname) == 0 || + strlen(pathname) >= PATH_MAX || pathname[0] != '/') + return false; + return true; +} + +static struct pcc_cmd * +pcc_cmd_parse(char *buffer, unsigned long count) +{ + static struct pcc_cmd *cmd; + char *token; + char *val; + unsigned long tmp; + int rc = 0; + + cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); + if (!cmd) { + rc = -ENOMEM; + goto out; + } + + /* clear all setting */ + if (strncmp(buffer, "clear", 5) == 0) { + cmd->pccc_cmd = PCC_CLEAR_ALL; + rc = 0; + goto out; + } + + val = buffer; + token = strsep(&val, " "); + if (!val || strlen(val) == 0) { + rc = -EINVAL; + goto out_free_cmd; + } + + /* Type of the command */ + if (strcmp(token, "add") == 0) { + cmd->pccc_cmd = PCC_ADD_DATASET; + } else if (strcmp(token, "del") == 0) { + cmd->pccc_cmd = PCC_DEL_DATASET; + } else { + rc = -EINVAL; + goto out_free_cmd; + } + + /* Pathname of the dataset */ + token = strsep(&val, " "); + if ((!val && cmd->pccc_cmd != PCC_DEL_DATASET) || + !pathname_is_valid(token)) { + rc = -EINVAL; + goto out_free_cmd; + } + cmd->pccc_pathname = token; + + if (cmd->pccc_cmd == PCC_ADD_DATASET) { + /* archive ID */ + token = strsep(&val, " "); + if (!val) { + rc = -EINVAL; + goto out_free_cmd; + } + + rc = kstrtoul(token, 10, &tmp); + if (rc != 0) { + rc = -EINVAL; + goto out_free_cmd; + } + if (tmp == 0) { + rc = -EINVAL; + goto out_free_cmd; + } + cmd->u.pccc_add.pccc_id = tmp; + + token = val; + rc = kstrtoul(token, 10, &tmp); + if (rc != 0) { + rc = -EINVAL; + goto out_free_cmd; + } + if (tmp == 0) { + rc = -EINVAL; + goto out_free_cmd; + } + cmd->u.pccc_add.pccc_projid = tmp; + } + + goto out; +out_free_cmd: + kfree(cmd); +out: + if (rc) + cmd = ERR_PTR(rc); + return cmd; +} + +int pcc_cmd_handle(char *buffer, unsigned long count, + struct pcc_super *super) +{ + int rc = 0; + struct pcc_cmd *cmd; + + cmd = pcc_cmd_parse(buffer, count); + if (IS_ERR(cmd)) + return PTR_ERR(cmd); + + switch (cmd->pccc_cmd) { + case PCC_ADD_DATASET: 
+ rc = pcc_dataset_add(super, cmd->pccc_pathname, + cmd->u.pccc_add.pccc_id, + cmd->u.pccc_add.pccc_projid); + break; + case PCC_DEL_DATASET: + rc = pcc_dataset_del(super, cmd->pccc_pathname); + break; + case PCC_CLEAR_ALL: + pcc_super_fini(super); + break; + default: + rc = -EINVAL; + break; + } + + kfree(cmd); + return rc; +} + +static inline void pcc_inode_lock(struct inode *inode) +{ + mutex_lock(&ll_i2info(inode)->lli_pcc_lock); +} + +static inline void pcc_inode_unlock(struct inode *inode) +{ + mutex_unlock(&ll_i2info(inode)->lli_pcc_lock); +} + +static void pcc_inode_init(struct pcc_inode *pcci) +{ + atomic_set(&pcci->pcci_refcount, 0); + pcci->pcci_type = LU_PCC_NONE; +} + +static void pcc_inode_fini(struct pcc_inode *pcci) +{ + path_put(&pcci->pcci_path); + pcci->pcci_type = LU_PCC_NONE; + kmem_cache_free(pcc_inode_slab, pcci); +} + +static void pcc_inode_get(struct pcc_inode *pcci) +{ + atomic_inc(&pcci->pcci_refcount); +} + +static void pcc_inode_put(struct pcc_inode *pcci) +{ + if (atomic_dec_and_test(&pcci->pcci_refcount)) + pcc_inode_fini(pcci); +} + +void pcc_inode_free(struct inode *inode) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_inode *pcci = lli->lli_pcc_inode; + + if (pcci) { + WARN_ON(atomic_read(&pcci->pcci_refcount) > 1); + pcc_inode_put(pcci); + lli->lli_pcc_inode = NULL; + } +} + +/* + * TODO: + * As Andreas suggested, we'd better use new layout to + * reduce overhead: + * (fid->f_oid >> 16 & 0xFFFF)/FID + */ +#define MAX_PCC_DATABASE_PATH (6 * 5 + FID_NOBRACE_LEN + 1) +static int pcc_fid2dataset_path(char *buf, int sz, struct lu_fid *fid) +{ + return snprintf(buf, sz, "%04x/%04x/%04x/%04x/%04x/%04x/" + DFID_NOBRACE, + (fid)->f_oid & 0xFFFF, + (fid)->f_oid >> 16 & 0xFFFF, + (unsigned int)((fid)->f_seq & 0xFFFF), + (unsigned int)((fid)->f_seq >> 16 & 0xFFFF), + (unsigned int)((fid)->f_seq >> 32 & 0xFFFF), + (unsigned int)((fid)->f_seq >> 48 & 0xFFFF), + PFID(fid)); +} + +void pcc_file_init(struct pcc_file *pccf) +{ + 
pccf->pccf_file = NULL; + pccf->pccf_type = LU_PCC_NONE; +} + +int pcc_file_open(struct inode *inode, struct file *file) +{ + struct pcc_inode *pcci; + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct pcc_file *pccf = &fd->fd_pcc_file; + struct file *pcc_file; + struct path *path; + struct qstr *dname; + int rc = 0; + + if (!S_ISREG(inode->i_mode)) + return 0; + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (!pcci) + goto out_unlock; + + if (atomic_read(&pcci->pcci_refcount) == 0) + goto out_unlock; + + pcc_inode_get(pcci); + WARN_ON(pccf->pccf_file); + + path = &pcci->pcci_path; + dname = &path->dentry->d_name; + CDEBUG(D_CACHE, "opening pcc file '%.*s'\n", dname->len, + dname->name); + pcc_file = dentry_open(path, file->f_flags, current_cred()); + if (IS_ERR_OR_NULL(pcc_file)) { + rc = pcc_file ? PTR_ERR(pcc_file) : -EINVAL; + pcc_inode_put(pcci); + } else { + pccf->pccf_file = pcc_file; + pccf->pccf_type = pcci->pcci_type; + } + +out_unlock: + pcc_inode_unlock(inode); + return rc; +} + +void pcc_file_release(struct inode *inode, struct file *file) +{ + struct pcc_inode *pcci; + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct pcc_file *pccf; + struct path *path; + struct qstr *dname; + + if (!S_ISREG(inode->i_mode) || !fd) + return; + + pccf = &fd->fd_pcc_file; + pcc_inode_lock(inode); + if (!pccf->pccf_file) + goto out; + + pcci = ll_i2pcci(inode); + LASSERT(pcci); + path = &pcci->pcci_path; + dname = &path->dentry->d_name; + CDEBUG(D_CACHE, "releasing pcc file \"%.*s\"\n", dname->len, + dname->name); + pcc_inode_put(pcci); + fput(pccf->pccf_file); + pccf->pccf_file = NULL; +out: + pcc_inode_unlock(inode); +} + +ssize_t pcc_file_read_iter(struct kiocb *iocb, + struct iov_iter *iter, bool *cached) +{ + struct file *file = iocb->ki_filp; + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct pcc_file *pccf = &fd->fd_pcc_file; + ssize_t result; + + if (!pccf->pccf_file) { + *cached = false; + return 0; + } + *cached = true; + 
iocb->ki_filp = pccf->pccf_file; + + result = generic_file_read_iter(iocb, iter); + iocb->ki_filp = file; + + return result; +} + +ssize_t pcc_file_write_iter(struct kiocb *iocb, + struct iov_iter *iter, bool *cached) +{ + struct file *file = iocb->ki_filp; + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct pcc_file *pccf = &fd->fd_pcc_file; + ssize_t result; + + if (!pccf->pccf_file) { + *cached = false; + return 0; + } + *cached = true; + + if (pccf->pccf_type != LU_PCC_READWRITE) + return -EWOULDBLOCK; + + iocb->ki_filp = pccf->pccf_file; + + /* Since file->fop->write_iter makes write calls via + * the normal vfs interface to the local PCC file system, + * the inode lock is not needed. + */ + result = file->f_op->write_iter(iocb, iter); + iocb->ki_filp = file; + return result; +} + +int pcc_inode_setattr(struct inode *inode, struct iattr *attr, + bool *cached) +{ + int rc = 0; + struct pcc_inode *pcci; + struct iattr attr2 = *attr; + struct dentry *pcc_dentry; + + if (!S_ISREG(inode->i_mode)) { + *cached = false; + return 0; + } + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (!pcci || atomic_read(&pcci->pcci_refcount) == 0) + goto out_unlock; + + *cached = true; + attr2.ia_valid = attr->ia_valid & (ATTR_SIZE | ATTR_ATIME | + ATTR_ATIME_SET | ATTR_MTIME | ATTR_MTIME_SET | + ATTR_CTIME); + pcc_dentry = pcci->pcci_path.dentry; + inode_lock(pcc_dentry->d_inode); + rc = pcc_dentry->d_inode->i_op->setattr(pcc_dentry, &attr2); + inode_unlock(pcc_dentry->d_inode); +out_unlock: + pcc_inode_unlock(inode); + return rc; +} + +int pcc_inode_getattr(struct inode *inode, bool *cached) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_inode *pcci; + struct kstat stat; + s64 atime; + s64 mtime; + s64 ctime; + int rc = 0; + + if (!S_ISREG(inode->i_mode)) { + *cached = false; + return 0; + } + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (!pcci || atomic_read(&pcci->pcci_refcount) == 0) + goto out_unlock; + + *cached = true; + rc = 
vfs_getattr(&pcci->pcci_path, &stat, + STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); + if (rc) + goto out_unlock; + + ll_inode_size_lock(inode); + if (test_and_clear_bit(LLIF_UPDATE_ATIME, &lli->lli_flags) || + inode->i_atime.tv_sec < lli->lli_atime) + inode->i_atime.tv_sec = lli->lli_atime; + + inode->i_mtime.tv_sec = lli->lli_mtime; + inode->i_ctime.tv_sec = lli->lli_ctime; + + atime = inode->i_atime.tv_sec; + mtime = inode->i_mtime.tv_sec; + ctime = inode->i_ctime.tv_sec; + + if (atime < stat.atime.tv_sec) + atime = stat.atime.tv_sec; + + if (ctime < stat.ctime.tv_sec) + ctime = stat.ctime.tv_sec; + + if (mtime < stat.mtime.tv_sec) + mtime = stat.mtime.tv_sec; + + i_size_write(inode, stat.size); + inode->i_blocks = stat.blocks; + + inode->i_atime.tv_sec = atime; + inode->i_mtime.tv_sec = mtime; + inode->i_ctime.tv_sec = ctime; + + ll_inode_size_unlock(inode); + +out_unlock: + pcc_inode_unlock(inode); + return rc; +} + +/* Create directory under base if directory does not exist */ +static struct dentry * +pcc_mkdir(struct dentry *base, const char *name, umode_t mode) +{ + int rc; + struct dentry *dentry; + struct inode *dir = base->d_inode; + + inode_lock(dir); + dentry = lookup_one_len(name, base, strlen(name)); + if (IS_ERR(dentry)) + goto out; + + if (d_is_positive(dentry)) + goto out; + + rc = vfs_mkdir(dir, dentry, mode); + if (rc) { + dput(dentry); + dentry = ERR_PTR(rc); + goto out; + } +out: + inode_unlock(dir); + return dentry; +} + +static struct dentry * +pcc_mkdir_p(struct dentry *root, char *path, umode_t mode) +{ + char *ptr, *entry_name; + struct dentry *parent; + struct dentry *child = ERR_PTR(-EINVAL); + + ptr = path; + while (*ptr == '/') + ptr++; + + entry_name = ptr; + parent = dget(root); + while ((ptr = strchr(ptr, '/')) != NULL) { + *ptr = '\0'; + child = pcc_mkdir(parent, entry_name, mode); + *ptr = '/'; + if (IS_ERR(child)) + break; + dput(parent); + parent = child; + ptr++; + entry_name = ptr; + } + + return child; +} + +/* Create file 
under base. If file already exist, return failure */ +static struct dentry * +pcc_create(struct dentry *base, const char *name, umode_t mode) +{ + int rc; + struct dentry *dentry; + struct inode *dir = base->d_inode; + + inode_lock(dir); + dentry = lookup_one_len(name, base, strlen(name)); + if (IS_ERR(dentry)) + goto out; + + if (d_is_positive(dentry)) + goto out; + + rc = vfs_create(dir, dentry, mode, false); + if (rc) { + dput(dentry); + dentry = ERR_PTR(rc); + goto out; + } +out: + inode_unlock(dir); + return dentry; +} + +/* Must be called with pcci->pcci_lock held */ +static void pcc_inode_attach_init(struct pcc_dataset *dataset, + struct pcc_inode *pcci, + struct dentry *dentry, + enum lu_pcc_type type) +{ + pcci->pcci_path.mnt = mntget(dataset->pccd_path.mnt); + pcci->pcci_path.dentry = dentry; + LASSERT(atomic_read(&pcci->pcci_refcount) == 0); + atomic_set(&pcci->pcci_refcount, 1); + pcci->pcci_type = type; + pcci->pcci_attr_valid = false; +} + +static int __pcc_inode_create(struct pcc_dataset *dataset, + struct lu_fid *fid, + struct dentry **dentry) +{ + char *path; + struct dentry *base; + struct dentry *child; + int rc = 0; + + path = kzalloc(MAX_PCC_DATABASE_PATH, GFP_NOFS); + if (!path) + return -ENOMEM; + + pcc_fid2dataset_path(path, MAX_PCC_DATABASE_PATH, fid); + + base = pcc_mkdir_p(dataset->pccd_path.dentry, path, 0700); + if (IS_ERR(base)) { + rc = PTR_ERR(base); + goto out; + } + + snprintf(path, MAX_PCC_DATABASE_PATH, DFID_NOBRACE, PFID(fid)); + child = pcc_create(base, path, 0600); + if (IS_ERR(child)) { + rc = PTR_ERR(child); + goto out_base; + } + *dentry = child; + +out_base: + dput(base); +out: + kfree(path); + return rc; +} + +int pcc_inode_create(struct pcc_dataset *dataset, struct lu_fid *fid, + struct dentry **pcc_dentry) +{ + return __pcc_inode_create(dataset, fid, pcc_dentry); +} + +int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, + struct dentry *pcc_dentry) +{ + struct ll_inode_info *lli = 
ll_i2info(inode); + struct pcc_inode *pcci; + + LASSERT(!ll_i2pcci(inode)); + pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); + if (!pcci) + return -ENOMEM; + + pcc_inode_init(pcci); + pcc_inode_lock(inode); + pcc_inode_attach_init(dataset, pcci, pcc_dentry, LU_PCC_READWRITE); + lli->lli_pcc_inode = pcci; + pcc_inode_unlock(inode); + + return 0; +} + +static int pcc_filp_write(struct file *filp, const void *buf, ssize_t count, + loff_t *offset) +{ + while (count > 0) { + ssize_t size; + + size = kernel_write(filp, buf, count, offset); + if (size < 0) + return size; + count -= size; + buf += size; + } + return 0; +} + +static int pcc_copy_data(struct file *src, struct file *dst) +{ + int rc = 0; + ssize_t rc2; + loff_t pos, offset = 0; + size_t buf_len = 1048576; + void *buf; + + buf = kvzalloc(buf_len, GFP_NOFS); + if (!buf) + return -ENOMEM; + + while (1) { + pos = offset; + rc2 = kernel_read(src, buf, buf_len, &pos); + if (rc2 < 0) { + rc = rc2; + goto out_free; + } else if (rc2 == 0) + break; + + pos = offset; + rc = pcc_filp_write(dst, buf, rc2, &pos); + if (rc < 0) + goto out_free; + offset += rc2; + } + +out_free: + kvfree(buf); + return rc; +} + +int pcc_readwrite_attach(struct file *file, struct inode *inode, + u32 archive_id) +{ + struct pcc_dataset *dataset; + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_inode *pcci; + struct dentry *dentry; + struct file *pcc_filp; + struct path path; + int rc; + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (!pcci) { + pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); + if (!pcci) { + pcc_inode_unlock(inode); + return -ENOMEM; + } + + pcc_inode_init(pcci); + } else if (atomic_read(&pcci->pcci_refcount) > 0) { + pcc_inode_unlock(inode); + return -EEXIST; + } + pcc_inode_unlock(inode); + + dataset = pcc_dataset_get(&ll_i2sbi(inode)->ll_pcc_super, 0, + archive_id); + if (!dataset) { + rc = -ENOENT; + goto out_free_pcci; + } + + rc = __pcc_inode_create(dataset, &lli->lli_fid, &dentry); + if 
(rc) + goto out_dataset_put; + + path.mnt = dataset->pccd_path.mnt; + path.dentry = dentry; + pcc_filp = dentry_open(&path, O_TRUNC | O_WRONLY | O_LARGEFILE, + current_cred()); + if (IS_ERR_OR_NULL(pcc_filp)) { + rc = pcc_filp ? PTR_ERR(pcc_filp) : -EINVAL; + goto out_dentry; + } + + rc = pcc_copy_data(file, pcc_filp); + if (rc) + goto out_fput; + + pcc_inode_lock(inode); + if (lli->lli_pcc_inode) { + rc = -EEXIST; + goto out_unlock; + } + pcc_inode_attach_init(dataset, pcci, dentry, LU_PCC_READWRITE); + lli->lli_pcc_inode = pcci; +out_unlock: + pcc_inode_unlock(inode); +out_fput: + fput(pcc_filp); +out_dentry: + if (rc) + dput(dentry); +out_dataset_put: + pcc_dataset_put(dataset); +out_free_pcci: + if (rc) + kmem_cache_free(pcc_inode_slab, pcci); + return rc; + +} + +int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, + bool lease_broken, int rc, bool attached) +{ + struct pcc_inode *pcci = ll_i2pcci(inode); + + if ((rc || lease_broken) && attached && pcci) + pcc_inode_put(pcci); + + return rc; +} + +int pcc_ioctl_detach(struct inode *inode) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_inode *pcci = lli->lli_pcc_inode; + int rc = 0; + int count; + + pcc_inode_lock(inode); + if (!pcci) + goto out_unlock; + + count = atomic_read(&pcci->pcci_refcount); + if (count > 1) { + rc = -EBUSY; + goto out_unlock; + } else if (count == 0) + goto out_unlock; + + pcc_inode_put(pcci); + lli->lli_pcc_inode = NULL; +out_unlock: + pcc_inode_unlock(inode); + + return rc; +} + +int pcc_ioctl_state(struct inode *inode, struct lu_pcc_state *state) +{ + int rc = 0; + int count; + char *buf; + char *path; + int buf_len = sizeof(state->pccs_path); + struct pcc_inode *pcci; + + if (buf_len <= 0) + return -EINVAL; + + buf = kzalloc(buf_len, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (!pcci) { + state->pccs_type = LU_PCC_NONE; + goto out_unlock; + } + + count = atomic_read(&pcci->pcci_refcount); + 
if (count == 0) { + state->pccs_type = LU_PCC_NONE; + goto out_unlock; + } + state->pccs_type = pcci->pcci_type; + state->pccs_open_count = count - 1; + state->pccs_flags = pcci->pcci_attr_valid ? + PCC_STATE_FLAG_ATTR_VALID : 0; + path = dentry_path_raw(pcci->pcci_path.dentry, buf, buf_len); + if (IS_ERR(path)) { + rc = PTR_ERR(path); + goto out_unlock; + } + + if (strlcpy(state->pccs_path, path, buf_len) >= buf_len) { + rc = -ENAMETOOLONG; + goto out_unlock; + } + +out_unlock: + pcc_inode_unlock(inode); + kfree(buf); + return rc; +} diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h new file mode 100644 index 0000000..0f960b9 --- /dev/null +++ b/fs/lustre/llite/pcc.h @@ -0,0 +1,129 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * Copyright (c) 2017, DDN Storage Corporation. 
+ */ +/* + * + * Persistent Client Cache + * + * Author: Li Xi + */ + +#ifndef LLITE_PCC_H +#define LLITE_PCC_H + +#include +#include +#include +#include + +extern struct kmem_cache *pcc_inode_slab; + +#define LPROCFS_WR_PCC_MAX_CMD 4096 + +struct pcc_dataset { + u32 pccd_id; /* Archive ID */ + u32 pccd_projid; /* Project ID */ + char pccd_pathname[PATH_MAX]; /* full path */ + struct path pccd_path; /* Root path */ + struct list_head pccd_linkage; /* Linked to pccs_datasets */ + atomic_t pccd_refcount; /* reference count */ +}; + +struct pcc_super { + spinlock_t pccs_lock; /* Protect pccs_datasets */ + struct list_head pccs_datasets; /* List of datasets */ +}; + +struct pcc_inode { + /* Cache path on local file system */ + struct path pcci_path; + /* + * If reference count is 0, then the cache is not inited, if 1, then + * no one is using it. + */ + atomic_t pcci_refcount; + /* Whether readonly or readwrite PCC */ + enum lu_pcc_type pcci_type; + /* Whether the inode is cached locally */ + bool pcci_attr_valid; +}; + +struct pcc_file { + /* Opened cache file */ + struct file *pccf_file; + /* Whether readonly or readwrite PCC */ + enum lu_pcc_type pccf_type; +}; + +enum pcc_cmd_type { + PCC_ADD_DATASET = 0, + PCC_DEL_DATASET, + PCC_CLEAR_ALL, +}; + +struct pcc_cmd { + enum pcc_cmd_type pccc_cmd; + char *pccc_pathname; + union { + struct pcc_cmd_add { + u32 pccc_id; + u32 pccc_projid; + } pccc_add; + struct pcc_cmd_del { + u32 pccc_pad; + } pccc_del; + } u; +}; + +void pcc_super_init(struct pcc_super *super); +void pcc_super_fini(struct pcc_super *super); +int pcc_cmd_handle(char *buffer, unsigned long count, + struct pcc_super *super); +int +pcc_super_dump(struct pcc_super *super, struct seq_file *m); +int pcc_readwrite_attach(struct file *file, + struct inode *inode, u32 arch_id); +int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, + bool lease_broken, int rc, bool attached); +int pcc_ioctl_detach(struct inode *inode); +int pcc_ioctl_state(struct 
inode *inode, struct lu_pcc_state *state); +void pcc_file_init(struct pcc_file *pccf); +int pcc_file_open(struct inode *inode, struct file *file); +void pcc_file_release(struct inode *inode, struct file *file); +ssize_t pcc_file_read_iter(struct kiocb *iocb, struct iov_iter *iter, + bool *cached); +ssize_t pcc_file_write_iter(struct kiocb *iocb, struct iov_iter *iter, + bool *cached); +int pcc_inode_getattr(struct inode *inode, bool *cached); +int pcc_inode_setattr(struct inode *inode, struct iattr *attr, bool *cached); +int pcc_inode_create(struct pcc_dataset *dataset, struct lu_fid *fid, + struct dentry **pcc_dentry); +int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, + struct dentry *pcc_dentry); +struct pcc_dataset * +pcc_dataset_get(struct pcc_super *super, u32 projid, u32 archive_id); +void pcc_dataset_put(struct pcc_dataset *dataset); +void pcc_inode_free(struct inode *inode); +#endif /* LLITE_PCC_H */ diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c index 6cae48c..afd51a6 100644 --- a/fs/lustre/llite/super25.c +++ b/fs/lustre/llite/super25.c @@ -222,6 +222,14 @@ static int __init lustre_init(void) if (!ll_file_data_slab) goto out_cache; + pcc_inode_slab = kmem_cache_create("ll_pcc_inode", + sizeof(struct pcc_inode), 0, + SLAB_HWCACHE_ALIGN, NULL); + if (!pcc_inode_slab) { + rc = -ENOMEM; + goto out_cache; + } + rc = llite_tunables_register(); if (rc) goto out_cache; @@ -258,6 +266,7 @@ static int __init lustre_init(void) out_cache: kmem_cache_destroy(ll_inode_cachep); kmem_cache_destroy(ll_file_data_slab); + kmem_cache_destroy(pcc_inode_slab); return rc; } @@ -278,6 +287,7 @@ static void __exit lustre_exit(void) rcu_barrier(); kmem_cache_destroy(ll_inode_cachep); kmem_cache_destroy(ll_file_data_slab); + kmem_cache_destroy(pcc_inode_slab); } MODULE_AUTHOR("OpenSFS, Inc. 
"); diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 3efd977..f62cd7c 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -356,7 +356,8 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, op_data->op_mds = tgt->ltd_index; } else { LASSERT(fid_is_sane(&op_data->op_fid1)); - LASSERT(fid_is_zero(&op_data->op_fid2)); + LASSERT(it->it_flags & MDS_OPEN_PCC || + fid_is_zero(&op_data->op_fid2)); LASSERT(op_data->op_name); tgt = lmv_locate_tgt(lmv, op_data); @@ -367,7 +368,8 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, /* If it is ready to open the file by FID, do not need * allocate FID at all, otherwise it will confuse MDT */ - if ((it->it_op & IT_CREAT) && !(it->it_flags & MDS_OPEN_BY_FID)) { + if ((it->it_op & IT_CREAT) && !(it->it_flags & MDS_OPEN_BY_FID || + it->it_flags & MDS_OPEN_PCC)) { /* * For lookup(IT_CREATE) cases allocate new fid and setup FLD * for it. diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 20ae322..bd64ebc 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -3480,6 +3480,7 @@ static int lmv_merge_attr(struct obd_export *exp, .set_info_async = lmv_set_info_async, .notify = lmv_notify, .get_uuid = lmv_get_uuid, + .fid_alloc = lmv_fid_alloc, .iocontrol = lmv_iocontrol, .quotactl = lmv_quotactl }; diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c index f0e5a84..be77944b 100644 --- a/fs/lustre/mdc/mdc_lib.c +++ b/fs/lustre/mdc/mdc_lib.c @@ -294,6 +294,10 @@ void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data, cr_flags |= MDS_OPEN_HAS_EA; tmp = req_capsule_client_get(&req->rq_pill, &RMF_EADATA); memcpy(tmp, lmm, lmmlen); + if (cr_flags & MDS_OPEN_PCC) { + LASSERT(op_data); + rec->cr_archive_id = op_data->op_archive_id; + } } set_mrc_cr_flags(rec, cr_flags); } @@ -504,6 +508,8 @@ static void mdc_close_intent_pack(struct ptlrpc_request *req, 
memcpy(req_capsule_client_get(&req->rq_pill, &RMF_U32), op_data->op_data, count * sizeof(u32)); } + } else if (bias & MDS_PCC_ATTACH) { + data->cd_archive_id = op_data->op_archive_id; } } diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index a26f3ae..2e54dd1 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1719,6 +1719,7 @@ enum mds_op_bias { MDS_CLOSE_RESYNC_DONE = 1 << 16, MDS_CLOSE_LAYOUT_SPLIT = 1 << 17, MDS_TRUNC_KEEP_LEASE = 1 << 18, + MDS_PCC_ATTACH = 1 << 19, }; #define MDS_CLOSE_INTENT (MDS_HSM_RELEASE | MDS_CLOSE_LAYOUT_SWAP | \ @@ -1741,7 +1742,10 @@ struct mdt_rec_create { struct lu_fid cr_fid2; struct lustre_handle cr_open_handle_old; /* in case of open replay */ __s64 cr_time; - __u64 cr_rdev; + union { + __u64 cr_rdev; + __u32 cr_archive_id; + }; __u64 cr_ioepoch; __u64 cr_padding_1; /* rr_blocks */ __u32 cr_mode; @@ -2963,6 +2967,8 @@ struct close_data { struct close_data_resync_done cd_resync; /* split close */ __u16 cd_mirror_id; + /* PCC release */ + __u32 cd_archive_id; }; }; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index d66c883..2b12612 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -268,6 +268,7 @@ enum ll_lease_flags { LL_LEASE_RESYNC_DONE = 0x2, LL_LEASE_LAYOUT_MERGE = 0x4, LL_LEASE_LAYOUT_SPLIT = 0x8, + LL_LEASE_PCC_ATTACH = 0x10, }; #define IOC_IDS_MAX 4096 @@ -356,6 +357,8 @@ struct ll_ioc_lease_id { #define LL_IOC_LADVISE _IOR('f', 250, struct llapi_lu_ladvise) #define LL_IOC_HEAT_GET _IOWR('f', 251, struct lu_heat) #define LL_IOC_HEAT_SET _IOW('f', 251, __u64) +#define LL_IOC_PCC_DETACH _IOW('f', 252, struct lu_pcc_detach) +#define LL_IOC_PCC_STATE _IOR('f', 252, struct lu_pcc_state) #define LL_STATFS_LMV 1 #define LL_STATFS_LOV 2 @@ -1048,11 +1051,15 @@ enum la_valid { */ #define MDS_OPEN_RELEASE 02000000000000ULL /* Open the file 
for HSM release */ #define MDS_OPEN_RESYNC 04000000000000ULL /* FLR: file resync */ +#define MDS_OPEN_PCC 010000000000000ULL /* PCC: auto RW-PCC cache attach + * for newly created file + */ #define MDS_OPEN_FL_INTERNAL (MDS_OPEN_HAS_EA | MDS_OPEN_HAS_OBJS | \ MDS_OPEN_OWNEROVERRIDE | MDS_OPEN_LOCK | \ MDS_OPEN_BY_FID | MDS_OPEN_LEASE | \ - MDS_OPEN_RELEASE | MDS_OPEN_RESYNC) + MDS_OPEN_RELEASE | MDS_OPEN_RESYNC | \ + MDS_OPEN_PCC) /********* Changelogs **********/ /** Changelog record types */ @@ -2062,6 +2069,47 @@ struct lu_heat { __u64 lh_heat[0]; }; +enum lu_pcc_type { + LU_PCC_NONE = 0, + LU_PCC_READWRITE, + LU_PCC_MAX +}; + +static inline const char *pcc_type2string(enum lu_pcc_type type) +{ + switch (type) { + case LU_PCC_NONE: + return "none"; + case LU_PCC_READWRITE: + return "readwrite"; + default: + return "fault"; + } +} + +struct lu_pcc_attach { + __u32 pcca_type; /* PCC type */ + __u32 pcca_id; /* archive ID for readwrite, group ID for readonly */ +}; + +struct lu_pcc_detach { + /* fid of the file to detach */ + struct lu_fid pccd_fid; +}; + +enum lu_pcc_state_flags { + /* Whether the inode attr is cached locally */ + PCC_STATE_FLAG_ATTR_VALID = 0x1, +}; + +struct lu_pcc_state { + __u32 pccs_type; /* enum lu_pcc_type */ + __u32 pccs_open_count; + __u32 pccs_flags; /* enum lu_pcc_state_flags */ + __u32 pccs_padding; + char pccs_path[PATH_MAX]; +}; + /** @} lustreuser */ #endif /* _LUSTRE_USER_H */ From patchwork Thu Feb 27 21:13:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410373 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F7A3138D for ; Thu, 27 Feb 2020 21:36:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 87DBF24677 for ; Thu, 27 Feb 2020 21:36:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 87DBF24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 15F8334A0DB; Thu, 27 Feb 2020 13:30:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B6B5A21FC68 for ; Thu, 27 Feb 2020 13:20:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 884818A9E; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8623746D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:47 -0500 Message-Id: <1582838290-17243-360-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 359/622] lustre: pcc: Non-blocking PCC caching X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin Currently, PCC uses the refcount of the PCC inode to determine whether a previously PCC-attached file can be detached. If the file is open (refcount > 1), the detach returns -EBUSY. When another client accesses the PCC-cached file, it triggers the restore process, because the file is HSM released. During restore, the Agent needs to detach the PCC-cached file. Thus, if a PCC-attached file is kept open for a long time, the restore request keeps failing until the file is closed. This patch implements a non-blocking PCC caching mechanism for Lustre. After attaching the file to PCC, the client acquires the layout lock for the file, and the layout generation is maintained in the PCC inode. While the layout lock is held, the PCC caching state is valid and all I/O is directed to PCC. When the layout lock is revoked, the blocking AST invalidates the PCC caching state and detaches the file automatically. This patch also helps handle the ENOSPC error for PCC writes by falling back to the normal I/O path, which restores the file data to the OSTs (the file is in the HSM released state) and then redoes the write.
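The ENOSPC fallback described above can be sketched as a small user-space model. This is only an illustration of the control flow, not the patch's code: struct model_file, model_pcc_write(), model_normal_write() and model_write() are hypothetical names, and the model assumes that the normal write path always succeeds and that the restore it triggers detaches the file from PCC.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the PCC state of one file. */
struct model_file {
	bool pcc_attached;	/* file is currently cached in PCC */
	long pcc_free;		/* free bytes left on the PCC backend */
};

/*
 * Models the role of pcc_file_write_iter(): *cached reports whether the
 * file was PCC-cached, the return value is the PCC write result.
 */
static ssize_t model_pcc_write(struct model_file *f, size_t count,
			       bool *cached)
{
	if (!f->pcc_attached) {
		*cached = false;
		return 0;
	}
	*cached = true;
	if ((long)count > f->pcc_free)
		return -ENOSPC;
	f->pcc_free -= (long)count;
	return (ssize_t)count;
}

/*
 * Models the normal Lustre write path. Assumption: writing to an HSM
 * released file triggers a restore, which revokes the layout lock and
 * detaches the file from PCC.
 */
static ssize_t model_normal_write(struct model_file *f, size_t count)
{
	f->pcc_attached = false;
	return (ssize_t)count;
}

/* Try PCC first; fall back only when the file is uncached or PCC is full. */
static ssize_t model_write(struct model_file *f, size_t count)
{
	bool cached;
	ssize_t result = model_pcc_write(f, count, &cached);

	if (cached && result != -ENOSPC)
		return result;
	return model_normal_write(f, count);
}
```

In this model, a write that fits in the PCC backend leaves the file attached, while the first write that hits -ENOSPC completes through the normal path and leaves the file detached, so later writes skip PCC entirely.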
WC-bug-id: https://jira.whamcloud.com/browse/LU-10092 Lustre-commit: 58d744e3eaab ("LU-10092 pcc: Non-blocking PCC caching") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/32966 Reviewed-by: Wang Shilong Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 4 + fs/lustre/llite/dir.c | 31 +- fs/lustre/llite/file.c | 63 ++-- fs/lustre/llite/llite_internal.h | 1 + fs/lustre/llite/llite_lib.c | 1 + fs/lustre/llite/llite_mmap.c | 36 +- fs/lustre/llite/namei.c | 4 - fs/lustre/llite/pcc.c | 569 +++++++++++++++++++++++++++----- fs/lustre/llite/pcc.h | 51 ++- fs/lustre/llite/vvp_object.c | 3 +- include/uapi/linux/lustre/lustre_user.h | 10 +- 11 files changed, 604 insertions(+), 169 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 837b68d..9609dd5 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -458,6 +458,10 @@ #define OBD_FAIL_LLITE_IMUTEX_SEC 0x140e #define OBD_FAIL_LLITE_IMUTEX_NOSEC 0x140f #define OBD_FAIL_LLITE_OPEN_BY_NAME 0x1410 +#define OBD_FAIL_LLITE_PCC_FAKE_ERROR 0x1411 +#define OBD_FAIL_LLITE_PCC_DETACH_MKWRITE 0x1412 +#define OBD_FAIL_LLITE_PCC_MKWRITE_PAUSE 0x1413 +#define OBD_FAIL_LLITE_PCC_ATTACH_PAUSE 0x1414 #define OBD_FAIL_FID_INDIR 0x1501 #define OBD_FAIL_FID_INLMA 0x1502 diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 337582b..1f7ed32 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1917,41 +1917,12 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return ll_ioctl_fsgetxattr(inode, cmd, arg); case FS_IOC_FSSETXATTR: return ll_ioctl_fssetxattr(inode, cmd, arg); - case LL_IOC_PCC_DETACH: { + case LL_IOC_PCC_DETACH_BY_FID: { struct lu_pcc_detach *detach; struct lu_fid *fid; struct inode *inode2; unsigned long ino; - /* - * The reason why a dir IOCTL is used to detach a PCC-cached - * file rather than making it a 
file IOCTL is: - * When PCC caching a file, it will attach the file firstly, - * and increase the refcount of PCC inode (pcci->pcci_refcount) - * from 0 to 1. - * When detaching a PCC-cached file, it will check whether the - * refcount is 1. If so, the file can be detached successfully. - * Otherwise, it means there are some users opened and using - * the file currently, and it will return -EBUSY. - * Each open on the PCC-cached file will increase the refcount - * of the PCC inode; - * Each close on the PCC-cached file will decrease the refcount - * of the PCC inode; - * When used a file IOCTL to detach a PCC-cached file, it needs - * to open it at first, which will increase the refcount. So - * during the process of the detach IOCTL, it will return - * -EBUSY as the PCC inode refcount is larger than 1. Someone - * might argue that here it can just decrease the refcount - * of the PCC inode, return succeed and make the close of - * IOCTL file handle to perform the real detach. But this - * may result in inconsistent state of a PCC file. i.e. Process - * A got a successful return form the detach IOCTL; Process B - * opens the file before Process A finally closed the IOCTL - * file handle. It makes the following I/O of Process B will - * direct into PCC although the file was already detached from - * the view of Process A. - * Using a dir IOCTL does not exist the problem above. 
- */ detach = kzalloc(sizeof(*detach), GFP_KERNEL); if (!detach) return -ENOMEM; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 95e7c73..5a52cad 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -59,6 +59,7 @@ struct split_param { struct pcc_param { u64 pa_data_version; u32 pa_archive_id; + u32 pa_layout_gen; }; static int @@ -241,6 +242,12 @@ static int ll_close_inode_openhandle(struct inode *inode, body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY); if (!(body->mbo_valid & OBD_MD_CLOSE_INTENT_EXECED)) rc = -EBUSY; + + if (bias & MDS_PCC_ATTACH) { + struct pcc_param *param = data; + + param->pa_layout_gen = body->mbo_layout_gen; + } } ll_finish_md_op_data(op_data); @@ -1657,7 +1664,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ssize_t result; u16 refcheck; ssize_t rc2; - bool cached = false; + bool cached; /** * Currently when PCC read failed, we do not fall back to the @@ -1766,20 +1773,21 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) struct vvp_io_args *args; ssize_t rc_tiny = 0, rc_normal; u16 refcheck; - bool cached = false; + bool cached; int result; /** - * When PCC write failed, we do not fall back to the normal - * write path, just return the error. The reason is that: - * PCC is actually a HSM device, and HSM does not handle the - * failure especially -ENOSPC due to space used out; Moreover, - * the fallback to normal I/O path for ENOSPC failure, needs - * to restore the file data to OSTs first and redo the write - * again, making the logic of PCC very complex. + * When PCC write failed, we usually do not fall back to the normal + * write path, just return the error. But there is a special case when + * returned error code is -ENOSPC due to running out of space on PCC HSM + * bakcend. At this time, it will fall back to normal I/O path and + * retry the I/O. 
As the file is in HSM released state, it will restore + * the file data to OSTs first and redo the write again. And the + * restore process will revoke the layout lock and detach the file + * from PCC cache automatically. */ result = pcc_file_write_iter(iocb, from, &cached); - if (cached) + if (cached && result != -ENOSPC) return result; /* NB: we can't do direct IO for tiny writes because they use the page @@ -3197,8 +3205,10 @@ static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc, case LL_LEASE_PCC_ATTACH: if (!rc) rc = rc2; - rc = pcc_readwrite_attach_fini(file, inode, lease_broken, - rc, attached); + rc = pcc_readwrite_attach_fini(file, inode, + param.pa_layout_gen, + lease_broken, rc, + attached); break; } @@ -3721,6 +3731,14 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags) rc = ll_heat_set(inode, flags); return rc; } + case LL_IOC_PCC_DETACH: + if (!S_ISREG(inode->i_mode)) + return -EINVAL; + + if (!inode_owner_or_capable(inode)) + return -EPERM; + + return pcc_ioctl_detach(inode); case LL_IOC_PCC_STATE: { struct lu_pcc_state __user *ustate = (struct lu_pcc_state __user *)arg; @@ -3735,7 +3753,7 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags) goto out_state; } - rc = pcc_ioctl_state(inode, state); + rc = pcc_ioctl_state(file, inode, state); if (rc) goto out_state; @@ -3855,19 +3873,13 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) { struct inode *inode = file_inode(file); struct ll_inode_info *lli = ll_i2info(inode); - struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct ptlrpc_request *req; - struct file *pcc_file = fd->fd_pcc_file.pccf_file; int rc, err; CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p)\n", PFID(ll_inode2fid(inode)), inode); ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1); - /* pcc cache path */ - if (pcc_file) - return file_inode(pcc_file)->i_fop->fsync(pcc_file, - start, end, datasync); rc = file_write_and_wait_range(file, start, end); 
inode_lock(inode); @@ -3877,6 +3889,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) */ if (!S_ISDIR(inode->i_mode)) { err = lli->lli_async_rc; + lli->lli_async_rc = 0; if (rc == 0) rc = err; @@ -3895,8 +3908,15 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) if (S_ISREG(inode->i_mode)) { struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + bool cached; - err = cl_sync_file_range(inode, start, end, CL_FSYNC_ALL, 0); + /* Sync metadata on MDT first, and then sync the cached data + * on PCC. + */ + err = pcc_fsync(file, start, end, datasync, &cached); + if (!cached) + err = cl_sync_file_range(inode, start, end, + CL_FSYNC_ALL, 0); if (rc == 0 && err < 0) rc = err; if (rc < 0) @@ -4416,11 +4436,12 @@ int ll_getattr(const struct path *path, struct kstat *stat, return rc; if (S_ISREG(inode->i_mode)) { - bool cached = false; + bool cached; rc = pcc_inode_getattr(inode, &cached); if (cached && rc < 0) return rc; + /* In case of restore, the MDT has the right size and has * already send it back without granting the layout lock, * inode is up-to-date so glimpse is useless. 
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index f2ea856..d36e01e 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -208,6 +208,7 @@ struct ll_inode_info { char lli_jobid[LUSTRE_JOBID_SIZE]; struct mutex lli_pcc_lock; + enum lu_pcc_state_flags lli_pcc_state; struct pcc_inode *lli_pcc_inode; }; }; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index d46bc99..1b22062 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -956,6 +956,7 @@ void ll_lli_init(struct ll_inode_info *lli) obd_heat_clear(lli->lli_heat_instances, OBD_HEAT_COUNT); lli->lli_heat_flags = 0; mutex_init(&lli->lli_pcc_lock); + lli->lli_pcc_state = PCC_STATE_FL_NONE; lli->lli_pcc_inode = NULL; } mutex_init(&lli->lli_layout_mutex); diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index fc2331b..71799cd 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -360,9 +360,17 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) struct vm_area_struct *vma = vmf->vma; int count = 0; bool printed = false; + bool cached; vm_fault_t result; sigset_t old, new; + ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), + LPROC_LL_FAULT, 1); + + result = pcc_fault(vma, vmf, &cached); + if (cached) + return result; + /* Only SIGKILL and SIGTERM are allowed for fault/nopage/mkwrite * so that it can be killed by admin but not cause segfault by * other signals. 
@@ -370,9 +378,6 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) siginitsetinv(&new, sigmask(SIGKILL) | sigmask(SIGTERM)); sigprocmask(SIG_BLOCK, &new, &old); - ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), - LPROC_LL_FAULT, 1); - /* make sure offset is not a negative number */ if (vmf->pgoff > (MAX_LFS_FILESIZE >> PAGE_SHIFT)) return VM_FAULT_SIGBUS; @@ -410,12 +415,17 @@ static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf) int count = 0; bool printed = false; bool retry; + bool cached; int err; vm_fault_t ret; ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), LPROC_LL_MKWRITE, 1); + err = pcc_page_mkwrite(vma, vmf, &cached); + if (cached) + return err; + file_update_time(vma->vm_file); do { retry = false; @@ -463,6 +473,7 @@ static void ll_vm_open(struct vm_area_struct *vma) LASSERT(atomic_read(&vob->vob_mmap_cnt) >= 0); atomic_inc(&vob->vob_mmap_cnt); + pcc_vm_open(vma); } /** @@ -475,6 +486,7 @@ static void ll_vm_close(struct vm_area_struct *vma) atomic_dec(&vob->vob_mmap_cnt); LASSERT(atomic_read(&vob->vob_mmap_cnt) >= 0); + pcc_vm_close(vma); } /* XXX put nice comment here. 
talk about __free_pte -> dirty pages and @@ -488,7 +500,7 @@ int ll_teardown_mmaps(struct address_space *mapping, u64 first, u64 last) if (mapping_mapped(mapping)) { rc = 0; unmap_mapping_range(mapping, first + PAGE_SIZE - 1, - last - first + 1, 0); + last - first + 1, 1); } return rc; @@ -504,26 +516,24 @@ int ll_teardown_mmaps(struct address_space *mapping, u64 first, u64 last) int ll_file_mmap(struct file *file, struct vm_area_struct *vma) { struct inode *inode = file_inode(file); + bool cached; int rc; - struct ll_file_data *fd = LUSTRE_FPRIVATE(file); - struct file *pcc_file = fd->fd_pcc_file.pccf_file; - - /* pcc cache path */ - if (pcc_file) { - vma->vm_file = pcc_file; - return file_inode(pcc_file)->i_fop->mmap(pcc_file, vma); - } if (ll_file_nolock(file)) return -EOPNOTSUPP; + rc = pcc_file_mmap(file, vma, &cached); + if (cached && rc != 0) + return rc; + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_MAP, 1); rc = generic_file_mmap(file, vma); if (rc == 0) { vma->vm_ops = &ll_file_vm_ops; vma->vm_ops->open(vma); /* update the inode's size and mtime */ - rc = ll_glimpse_size(inode); + if (!cached) + rc = ll_glimpse_size(inode); } return rc; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 4f39b2c..d10decb 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -824,10 +824,6 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, lum->lmm_magic = LOV_USER_MAGIC_V1; lum->lmm_pattern = LOV_PATTERN_F_RELEASED | LOV_PATTERN_RAID0; - lum->lmm_stripe_size = 0; - lum->lmm_stripe_count = 0; - lum->lmm_stripe_offset = 0; - op_data->op_data = lum; op_data->op_data_size = sizeof(*lum); op_data->op_archive_id = dataset->pccd_id; diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index 53e5cda..8440647 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -401,17 +401,25 @@ static inline void pcc_inode_unlock(struct inode *inode) mutex_unlock(&ll_i2info(inode)->lli_pcc_lock); } -static void 
pcc_inode_init(struct pcc_inode *pcci) +static void pcc_inode_init(struct pcc_inode *pcci, struct ll_inode_info *lli) { + pcci->pcci_lli = lli; + lli->lli_pcc_inode = pcci; atomic_set(&pcci->pcci_refcount, 0); pcci->pcci_type = LU_PCC_NONE; + pcci->pcci_layout_gen = CL_LAYOUT_GEN_NONE; + atomic_set(&pcci->pcci_active_ios, 0); + init_waitqueue_head(&pcci->pcci_waitq); } static void pcc_inode_fini(struct pcc_inode *pcci) { + struct ll_inode_info *lli = pcci->pcci_lli; + path_put(&pcci->pcci_path); pcci->pcci_type = LU_PCC_NONE; kmem_cache_free(pcc_inode_slab, pcci); + lli->lli_pcc_inode = NULL; } static void pcc_inode_get(struct pcc_inode *pcci) @@ -427,13 +435,11 @@ static void pcc_inode_put(struct pcc_inode *pcci) void pcc_inode_free(struct inode *inode) { - struct ll_inode_info *lli = ll_i2info(inode); - struct pcc_inode *pcci = lli->lli_pcc_inode; + struct pcc_inode *pcci = ll_i2pcci(inode); if (pcci) { WARN_ON(atomic_read(&pcci->pcci_refcount) > 1); pcc_inode_put(pcci); - lli->lli_pcc_inode = NULL; } } @@ -463,6 +469,11 @@ void pcc_file_init(struct pcc_file *pccf) pccf->pccf_type = LU_PCC_NONE; } +static inline bool pcc_inode_has_layout(struct pcc_inode *pcci) +{ + return pcci->pcci_layout_gen != CL_LAYOUT_GEN_NONE; +} + int pcc_file_open(struct inode *inode, struct file *file) { struct pcc_inode *pcci; @@ -481,7 +492,8 @@ int pcc_file_open(struct inode *inode, struct file *file) if (!pcci) goto out_unlock; - if (atomic_read(&pcci->pcci_refcount) == 0) + if (atomic_read(&pcci->pcci_refcount) == 0 || + !pcc_inode_has_layout(pcci)) goto out_unlock; pcc_inode_get(pcci); @@ -534,24 +546,64 @@ void pcc_file_release(struct inode *inode, struct file *file) pcc_inode_unlock(inode); } +static inline void pcc_layout_gen_set(struct pcc_inode *pcci, + u32 gen) +{ + pcci->pcci_layout_gen = gen; +} + +static void pcc_io_init(struct inode *inode, bool *cached) +{ + struct pcc_inode *pcci; + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (pcci && 
pcc_inode_has_layout(pcci)) { + LASSERT(atomic_read(&pcci->pcci_refcount) > 0); + atomic_inc(&pcci->pcci_active_ios); + *cached = true; + } else { + *cached = false; + } + pcc_inode_unlock(inode); +} + +static void pcc_io_fini(struct inode *inode) +{ + struct pcc_inode *pcci = ll_i2pcci(inode); + + LASSERT(pcci && atomic_read(&pcci->pcci_active_ios) > 0); + if (atomic_dec_and_test(&pcci->pcci_active_ios)) + wake_up_all(&pcci->pcci_waitq); +} + ssize_t pcc_file_read_iter(struct kiocb *iocb, struct iov_iter *iter, bool *cached) { struct file *file = iocb->ki_filp; struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct pcc_file *pccf = &fd->fd_pcc_file; + struct inode *inode = file_inode(file); ssize_t result; if (!pccf->pccf_file) { *cached = false; return 0; } - *cached = true; - iocb->ki_filp = pccf->pccf_file; - result = generic_file_read_iter(iocb, iter); + pcc_io_init(inode, cached); + if (!*cached) + return 0; + + iocb->ki_filp = pccf->pccf_file; + /* generic_file_aio_read does not support ext4-dax, + * filp->f_ops->read_iter uses ->aio_read hook directly + * to add support for ext4-dax. 
+ */ + result = file->f_op->read_iter(iocb, iter); iocb->ki_filp = file; + pcc_io_fini(inode); return result; } @@ -561,16 +613,27 @@ ssize_t pcc_file_write_iter(struct kiocb *iocb, struct file *file = iocb->ki_filp; struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct pcc_file *pccf = &fd->fd_pcc_file; + struct inode *inode = file_inode(file); ssize_t result; if (!pccf->pccf_file) { *cached = false; return 0; } - *cached = true; - if (pccf->pccf_type != LU_PCC_READWRITE) - return -EWOULDBLOCK; + if (pccf->pccf_type != LU_PCC_READWRITE) { + *cached = false; + return -EAGAIN; + } + + pcc_io_init(inode, cached); + if (!*cached) + return 0; + + if (OBD_FAIL_CHECK(OBD_FAIL_LLITE_PCC_FAKE_ERROR)) { + result = -ENOSPC; + goto out; + } iocb->ki_filp = pccf->pccf_file; @@ -580,6 +643,8 @@ ssize_t pcc_file_write_iter(struct kiocb *iocb, */ result = file->f_op->write_iter(iocb, iter); iocb->ki_filp = file; +out: + pcc_io_fini(inode); return result; } @@ -587,37 +652,35 @@ int pcc_inode_setattr(struct inode *inode, struct iattr *attr, bool *cached) { int rc = 0; - struct pcc_inode *pcci; struct iattr attr2 = *attr; struct dentry *pcc_dentry; + struct pcc_inode *pcci; if (!S_ISREG(inode->i_mode)) { *cached = false; return 0; } - pcc_inode_lock(inode); - pcci = ll_i2pcci(inode); - if (!pcci || atomic_read(&pcci->pcci_refcount) == 0) - goto out_unlock; + pcc_io_init(inode, cached); + if (!*cached) + return 0; - *cached = true; attr2.ia_valid = attr->ia_valid & (ATTR_SIZE | ATTR_ATIME | ATTR_ATIME_SET | ATTR_MTIME | ATTR_MTIME_SET | ATTR_CTIME); + pcci = ll_i2pcci(inode); pcc_dentry = pcci->pcci_path.dentry; inode_lock(pcc_dentry->d_inode); rc = pcc_dentry->d_inode->i_op->setattr(pcc_dentry, &attr2); inode_unlock(pcc_dentry->d_inode); -out_unlock: - pcc_inode_unlock(inode); + + pcc_io_fini(inode); return rc; } int pcc_inode_getattr(struct inode *inode, bool *cached) { struct ll_inode_info *lli = ll_i2info(inode); - struct pcc_inode *pcci; struct kstat stat; s64 atime; s64 
mtime; @@ -629,16 +692,14 @@ int pcc_inode_getattr(struct inode *inode, bool *cached) return 0; } - pcc_inode_lock(inode); - pcci = ll_i2pcci(inode); - if (!pcci || atomic_read(&pcci->pcci_refcount) == 0) - goto out_unlock; + pcc_io_init(inode, cached); + if (!*cached) + return 0; - *cached = true; - rc = vfs_getattr(&pcci->pcci_path, &stat, + rc = vfs_getattr(&ll_i2pcci(inode)->pcci_path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); if (rc) - goto out_unlock; + goto out; ll_inode_size_lock(inode); if (test_and_clear_bit(LLIF_UPDATE_ATIME, &lli->lli_flags) || @@ -669,9 +730,274 @@ int pcc_inode_getattr(struct inode *inode, bool *cached) inode->i_ctime.tv_sec = ctime; ll_inode_size_unlock(inode); +out: + pcc_io_fini(inode); + return rc; +} -out_unlock: +ssize_t pcc_file_splice_read(struct file *in_file, loff_t *ppos, + struct pipe_inode_info *pipe, + size_t count, unsigned int flags, + bool *cached) +{ + struct inode *inode = file_inode(in_file); + struct ll_file_data *fd = LUSTRE_FPRIVATE(in_file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + ssize_t result; + + *cached = false; + if (!pcc_file) + return 0; + + if (!file_inode(pcc_file)->i_fop->splice_read) + return -ENOTSUPP; + + pcc_io_init(inode, cached); + if (!*cached) + return 0; + + result = file_inode(pcc_file)->i_fop->splice_read(pcc_file, + ppos, pipe, count, + flags); + + pcc_io_fini(inode); + return result; +} + +int pcc_fsync(struct file *file, loff_t start, loff_t end, + int datasync, bool *cached) +{ + struct inode *inode = file_inode(file); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + int rc; + + if (!pcc_file) { + *cached = false; + return 0; + } + + pcc_io_init(inode, cached); + if (!*cached) + return 0; + + rc = file_inode(pcc_file)->i_fop->fsync(pcc_file, + start, end, datasync); + + pcc_io_fini(inode); + return rc; +} + +int pcc_file_mmap(struct file *file, struct vm_area_struct *vma, + bool *cached) +{ + struct inode 
*inode = file_inode(file); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + struct pcc_inode *pcci; + int rc = 0; + + if (!pcc_file || !file_inode(pcc_file)->i_fop->mmap) { + *cached = false; + return 0; + } + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (pcci && pcc_inode_has_layout(pcci)) { + LASSERT(atomic_read(&pcci->pcci_refcount) > 1); + *cached = true; + vma->vm_file = pcc_file; + rc = file_inode(pcc_file)->i_fop->mmap(pcc_file, vma); + vma->vm_file = file; + /* Save the vm ops of backend PCC */ + vma->vm_private_data = (void *)vma->vm_ops; + } else { + *cached = false; + } pcc_inode_unlock(inode); + + return rc; +} + +void pcc_vm_open(struct vm_area_struct *vma) +{ + struct pcc_inode *pcci; + struct file *file = vma->vm_file; + struct inode *inode = file_inode(file); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + const struct vm_operations_struct *pcc_vm_ops = vma->vm_private_data; + + if (!pcc_file || !pcc_vm_ops || !pcc_vm_ops->open) + return; + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (pcci && pcc_inode_has_layout(pcci)) { + vma->vm_file = pcc_file; + pcc_vm_ops->open(vma); + vma->vm_file = file; + } + pcc_inode_unlock(inode); +} + +void pcc_vm_close(struct vm_area_struct *vma) +{ + struct file *file = vma->vm_file; + struct inode *inode = file_inode(file); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + const struct vm_operations_struct *pcc_vm_ops = vma->vm_private_data; + + if (!pcc_file || !pcc_vm_ops || !pcc_vm_ops->close) + return; + + pcc_inode_lock(inode); + /* Layout lock maybe revoked here */ + vma->vm_file = pcc_file; + pcc_vm_ops->close(vma); + vma->vm_file = file; + pcc_inode_unlock(inode); +} + +int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, + bool *cached) +{ + struct page *page = vmf->page; + struct mm_struct *mm = 
vma->vm_mm; + struct file *file = vma->vm_file; + struct inode *inode = file_inode(file); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + const struct vm_operations_struct *pcc_vm_ops = vma->vm_private_data; + int rc; + + if (!pcc_file || !pcc_vm_ops || !pcc_vm_ops->page_mkwrite) { + *cached = false; + return 0; + } + + /* Pause to allow for a race with concurrent detach */ + OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_PCC_MKWRITE_PAUSE, cfs_fail_val); + + pcc_io_init(inode, cached); + if (!*cached) { + /* This happens when the file is detached from PCC after it got + * the fault page via ->fault() on the inode of the PCC copy. + * Here it can not simply fall back to normal Lustre I/O path. + * The reason is that the address space of fault page used by + * ->page_mkwrite() is still the one of PCC inode. In the + * normal Lustre ->page_mkwrite() I/O path, it will be wrongly + * handled as the address space of the fault page is not + * consistent with the one of the Lustre inode (though the + * fault page was truncated). + * As the file is detached from PCC, the fault page must + * be released first, and retry the mmap write (->fault() and + * ->page_mkwrite()). + * We use an ugly and tricky method by returning + * VM_FAULT_NOPAGE | VM_FAULT_RETRY to the caller + * __do_page_fault and retry the memory fault handling. + */ + if (page->mapping == file_inode(pcc_file)->i_mapping) { + *cached = true; + up_read(&mm->mmap_sem); + return VM_FAULT_RETRY | VM_FAULT_NOPAGE; + } + + return 0; + } + + /* + * This fault injection can also be used to simulate -ENOSPC and + * -EDQUOT failure of underlying PCC backend fs. 
+ */ + if (OBD_FAIL_CHECK(OBD_FAIL_LLITE_PCC_DETACH_MKWRITE)) { + pcc_io_fini(inode); + pcc_ioctl_detach(inode); + up_read(&mm->mmap_sem); + return VM_FAULT_RETRY | VM_FAULT_NOPAGE; + } + + vma->vm_file = pcc_file; + rc = pcc_vm_ops->page_mkwrite(vmf); + vma->vm_file = file; + + pcc_io_fini(inode); + return rc; +} + +int pcc_fault(struct vm_area_struct *vma, struct vm_fault *vmf, + bool *cached) +{ + struct file *file = vma->vm_file; + struct inode *inode = file_inode(file); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct file *pcc_file = fd->fd_pcc_file.pccf_file; + const struct vm_operations_struct *pcc_vm_ops = vma->vm_private_data; + int rc; + + if (!pcc_file || !pcc_vm_ops || !pcc_vm_ops->fault) { + *cached = false; + return 0; + } + + pcc_io_init(inode, cached); + if (!*cached) + return 0; + + vma->vm_file = pcc_file; + rc = pcc_vm_ops->fault(vmf); + vma->vm_file = file; + + pcc_io_fini(inode); + return rc; +} + +static void pcc_layout_wait(struct pcc_inode *pcci) +{ + if (atomic_read(&pcci->pcci_active_ios) > 0) + CDEBUG(D_CACHE, "Waiting for IO completion: %d\n", + atomic_read(&pcci->pcci_active_ios)); + wait_event_idle(pcci->pcci_waitq, + atomic_read(&pcci->pcci_active_ios) == 0); +} + +static void __pcc_layout_invalidate(struct pcc_inode *pcci) +{ + pcci->pcci_type = LU_PCC_NONE; + pcc_layout_gen_set(pcci, CL_LAYOUT_GEN_NONE); + pcc_layout_wait(pcci); +} + +void pcc_layout_invalidate(struct inode *inode) +{ + struct pcc_inode *pcci; + + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + if (pcci && pcc_inode_has_layout(pcci)) { + LASSERT(atomic_read(&pcci->pcci_refcount) > 0); + __pcc_layout_invalidate(pcci); + + CDEBUG(D_CACHE, "Invalidate "DFID" layout gen %d\n", + PFID(&ll_i2info(inode)->lli_fid), pcci->pcci_layout_gen); + + pcc_inode_put(pcci); + } + pcc_inode_unlock(inode); +} + +static int pcc_inode_remove(struct pcc_inode *pcci) +{ + struct dentry *dentry; + int rc; + + dentry = pcci->pcci_path.dentry; + rc = 
vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); + if (rc) + CWARN("failed to unlink cached file, rc = %d\n", rc); + return rc; } @@ -719,9 +1045,10 @@ int pcc_inode_getattr(struct inode *inode, bool *cached) *ptr = '\0'; child = pcc_mkdir(parent, entry_name, mode); *ptr = '/'; + dput(parent); if (IS_ERR(child)) break; - dput(parent); + parent = child; ptr++; entry_name = ptr; @@ -816,21 +1143,36 @@ int pcc_inode_create(struct pcc_dataset *dataset, struct lu_fid *fid, int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, struct dentry *pcc_dentry) { - struct ll_inode_info *lli = ll_i2info(inode); struct pcc_inode *pcci; + int rc = 0; + pcc_inode_lock(inode); LASSERT(!ll_i2pcci(inode)); pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); - if (!pcci) - return -ENOMEM; + if (!pcci) { + rc = -ENOMEM; + goto out_unlock; + } - pcc_inode_init(pcci); - pcc_inode_lock(inode); + pcc_inode_init(pcci, ll_i2info(inode)); pcc_inode_attach_init(dataset, pcci, pcc_dentry, LU_PCC_READWRITE); - lli->lli_pcc_inode = pcci; - pcc_inode_unlock(inode); + /* Set the layout generation of newly created file with 0 */ + pcc_layout_gen_set(pcci, 0); - return 0; +out_unlock: + if (rc) { + int rc2; + + rc2 = vfs_unlink(pcc_dentry->d_parent->d_inode, + pcc_dentry, NULL); + if (rc2) + CWARN("failed to unlink PCC file, rc = %d\n", rc2); + + dput(pcc_dentry); + } + + pcc_inode_unlock(inode); + return rc; } static int pcc_filp_write(struct file *filp, const void *buf, ssize_t count, @@ -881,6 +1223,30 @@ static int pcc_copy_data(struct file *src, struct file *dst) return rc; } +static int pcc_attach_allowed_check(struct inode *inode) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_inode *pcci; + int rc = 0; + + pcc_inode_lock(inode); + if (lli->lli_pcc_state & PCC_STATE_FL_ATTACHING) { + rc = -EBUSY; + goto out_unlock; + } + + pcci = ll_i2pcci(inode); + if (pcci && pcc_inode_has_layout(pcci)) { + rc = -EEXIST; + goto out_unlock; + } + + lli->lli_pcc_state 
|= PCC_STATE_FL_ATTACHING; +out_unlock: + pcc_inode_unlock(inode); + return rc; +} + int pcc_readwrite_attach(struct file *file, struct inode *inode, u32 archive_id) { @@ -892,28 +1258,14 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, struct path path; int rc; - pcc_inode_lock(inode); - pcci = ll_i2pcci(inode); - if (!pcci) { - pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); - if (!pcci) { - pcc_inode_unlock(inode); - return -ENOMEM; - } - - pcc_inode_init(pcci); - } else if (atomic_read(&pcci->pcci_refcount) > 0) { - pcc_inode_unlock(inode); - return -EEXIST; - } - pcc_inode_unlock(inode); + rc = pcc_attach_allowed_check(inode); + if (rc) + return rc; dataset = pcc_dataset_get(&ll_i2sbi(inode)->ll_pcc_super, 0, archive_id); - if (!dataset) { - rc = -ENOENT; - goto out_free_pcci; - } + if (!dataset) + return -ENOENT; rc = __pcc_inode_create(dataset, &lli->lli_fid, &dentry); if (rc) @@ -932,73 +1284,117 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, if (rc) goto out_fput; + /* Pause to allow for a race with concurrent HSM remove */ + OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_PCC_ATTACH_PAUSE, cfs_fail_val); + pcc_inode_lock(inode); - if (lli->lli_pcc_inode) { - rc = -EEXIST; + pcci = ll_i2pcci(inode); + LASSERT(!pcci); + pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); + if (!pcci) { + rc = -ENOMEM; goto out_unlock; } + + pcc_inode_init(pcci, lli); pcc_inode_attach_init(dataset, pcci, dentry, LU_PCC_READWRITE); - lli->lli_pcc_inode = pcci; out_unlock: pcc_inode_unlock(inode); out_fput: fput(pcc_filp); out_dentry: - if (rc) + if (rc) { + int rc2; + + rc2 = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); + if (rc2) + CWARN("failed to unlink PCC file, rc = %d\n", rc2); + dput(dentry); + } out_dataset_put: pcc_dataset_put(dataset); -out_free_pcci: - if (rc) - kmem_cache_free(pcc_inode_slab, pcci); return rc; - } int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, - bool lease_broken, int rc, bool attached) + 
u32 gen, bool lease_broken, int rc, + bool attached) { - struct pcc_inode *pcci = ll_i2pcci(inode); + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_inode *pcci; + u32 gen2; - if ((rc || lease_broken) && attached && pcci) - pcc_inode_put(pcci); + pcc_inode_lock(inode); + pcci = ll_i2pcci(inode); + lli->lli_pcc_state &= ~PCC_STATE_FL_ATTACHING; + if ((rc || lease_broken)) { + if (attached && pcci) + pcc_inode_put(pcci); + + goto out_unlock; + } + + /* PCC inode may be released due to layout lock revocation */ + if (!pcci) { + rc = -ESTALE; + goto out_unlock; + } + LASSERT(attached); + rc = ll_layout_refresh(inode, &gen2); + if (!rc) { + if (gen2 == gen) { + pcc_layout_gen_set(pcci, gen); + } else { + CDEBUG(D_CACHE, + DFID" layout changed from %d to %d.\n", + PFID(ll_inode2fid(inode)), gen, gen2); + rc = -ESTALE; + goto out_put; + } + } + +out_put: + if (rc) { + pcc_inode_remove(pcci); + pcc_inode_put(pcci); + } +out_unlock: + pcc_inode_unlock(inode); return rc; } int pcc_ioctl_detach(struct inode *inode) { struct ll_inode_info *lli = ll_i2info(inode); - struct pcc_inode *pcci = lli->lli_pcc_inode; + struct pcc_inode *pcci; int rc = 0; - int count; pcc_inode_lock(inode); - if (!pcci) - goto out_unlock; - - count = atomic_read(&pcci->pcci_refcount); - if (count > 1) { - rc = -EBUSY; - goto out_unlock; - } else if (count == 0) + pcci = lli->lli_pcc_inode; + if (!pcci || lli->lli_pcc_state & PCC_STATE_FL_ATTACHING || + !pcc_inode_has_layout(pcci)) goto out_unlock; + __pcc_layout_invalidate(pcci); pcc_inode_put(pcci); - lli->lli_pcc_inode = NULL; + out_unlock: pcc_inode_unlock(inode); - return rc; } -int pcc_ioctl_state(struct inode *inode, struct lu_pcc_state *state) +int pcc_ioctl_state(struct file *file, struct inode *inode, + struct lu_pcc_state *state) { int rc = 0; int count; char *buf; char *path; int buf_len = sizeof(state->pccs_path); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct pcc_file *pccf = &fd->fd_pcc_file; struct pcc_inode 
*pcci; if (buf_len <= 0) @@ -1018,12 +1414,17 @@ int pcc_ioctl_state(struct inode *inode, struct lu_pcc_state *state) count = atomic_read(&pcci->pcci_refcount); if (count == 0) { state->pccs_type = LU_PCC_NONE; + state->pccs_open_count = 0; goto out_unlock; } + + if (pcc_inode_has_layout(pcci)) + count--; + if (pccf->pccf_file) + count--; state->pccs_type = pcci->pcci_type; - state->pccs_open_count = count - 1; - state->pccs_flags = pcci->pcci_attr_valid ? - PCC_STATE_FLAG_ATTR_VALID : 0; + state->pccs_open_count = count; + state->pccs_flags = ll_i2info(inode)->lli_pcc_state; path = dentry_path_raw(pcci->pcci_path.dentry, buf, buf_len); if (IS_ERR(path)) { rc = PTR_ERR(path); diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index 0f960b9..1a73dbb 100644 --- a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -36,6 +36,7 @@ #include #include #include +#include #include extern struct kmem_cache *pcc_inode_slab; @@ -57,17 +58,27 @@ struct pcc_super { }; struct pcc_inode { + struct ll_inode_info *pcci_lli; /* Cache path on local file system */ - struct path pcci_path; + struct path pcci_path; /* * If reference count is 0, then the cache is not inited, if 1, then * no one is using it. */ - atomic_t pcci_refcount; + atomic_t pcci_refcount; /* Whether readonly or readwrite PCC */ - enum lu_pcc_type pcci_type; - /* Whether the inode is cached locally */ - bool pcci_attr_valid; + enum lu_pcc_type pcci_type; + /* Whether the inode attr is cached locally */ + bool pcci_attr_valid; + /* Layout generation */ + u32 pcci_layout_gen; + /* + * How many IOs are on going on this cached object. Layout can be + * changed only if there is no active IO. + */ + atomic_t pcci_active_ios; + /* Waitq - wait for PCC I/O completion. 
*/ + wait_queue_head_t pcci_waitq; }; struct pcc_file { @@ -101,14 +112,15 @@ struct pcc_cmd { void pcc_super_fini(struct pcc_super *super); int pcc_cmd_handle(char *buffer, unsigned long count, struct pcc_super *super); -int -pcc_super_dump(struct pcc_super *super, struct seq_file *m); -int pcc_readwrite_attach(struct file *file, - struct inode *inode, u32 arch_id); +int pcc_super_dump(struct pcc_super *super, struct seq_file *m); +int pcc_readwrite_attach(struct file *file, struct inode *inode, + u32 arch_id); int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, - bool lease_broken, int rc, bool attached); + u32 gen, bool lease_broken, int rc, + bool attached); int pcc_ioctl_detach(struct inode *inode); -int pcc_ioctl_state(struct inode *inode, struct lu_pcc_state *state); +int pcc_ioctl_state(struct file *file, struct inode *inode, + struct lu_pcc_state *state); void pcc_file_init(struct pcc_file *pccf); int pcc_file_open(struct inode *inode, struct file *file); void pcc_file_release(struct inode *inode, struct file *file); @@ -118,12 +130,25 @@ ssize_t pcc_file_write_iter(struct kiocb *iocb, struct iov_iter *iter, bool *cached); int pcc_inode_getattr(struct inode *inode, bool *cached); int pcc_inode_setattr(struct inode *inode, struct iattr *attr, bool *cached); +ssize_t pcc_file_splice_read(struct file *in_file, loff_t *ppos, + struct pipe_inode_info *pipe, size_t count, + unsigned int flags, bool *cached); +int pcc_fsync(struct file *file, loff_t start, loff_t end, + int datasync, bool *cached); +int pcc_file_mmap(struct file *file, struct vm_area_struct *vma, bool *cached); +void pcc_vm_open(struct vm_area_struct *vma); +void pcc_vm_close(struct vm_area_struct *vma); +int pcc_fault(struct vm_area_struct *mva, struct vm_fault *vmf, bool *cached); +int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, + bool *cached); int pcc_inode_create(struct pcc_dataset *dataset, struct lu_fid *fid, struct dentry **pcc_dentry); int 
pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, struct dentry *pcc_dentry); -struct pcc_dataset * -pcc_dataset_get(struct pcc_super *super, u32 projid, u32 archive_id); +struct pcc_dataset *pcc_dataset_get(struct pcc_super *super, u32 projid, + u32 archive_id); void pcc_dataset_put(struct pcc_dataset *dataset); void pcc_inode_free(struct inode *inode); +void pcc_layout_invalidate(struct inode *inode); + #endif /* LLITE_PCC_H */ diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c index eeb8823..b5ae7ad 100644 --- a/fs/lustre/llite/vvp_object.c +++ b/fs/lustre/llite/vvp_object.c @@ -146,7 +146,8 @@ static int vvp_conf_set(const struct lu_env *env, struct cl_object *obj, * a price themselves. */ unmap_mapping_range(conf->coc_inode->i_mapping, - 0, OBD_OBJECT_EOF, 0); + 0, OBD_OBJECT_EOF, 1); + pcc_layout_invalidate(conf->coc_inode); } return 0; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 2b12612..b024a44 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -357,7 +357,8 @@ struct ll_ioc_lease_id { #define LL_IOC_LADVISE _IOR('f', 250, struct llapi_lu_ladvise) #define LL_IOC_HEAT_GET _IOWR('f', 251, struct lu_heat) #define LL_IOC_HEAT_SET _IOW('f', 251, __u64) -#define LL_IOC_PCC_DETACH _IOW('f', 252, struct lu_pcc_detach) +#define LL_IOC_PCC_DETACH _IO('f', 252) +#define LL_IOC_PCC_DETACH_BY_FID _IOW('f', 252, struct lu_pcc_detach) #define LL_IOC_PCC_STATE _IOR('f', 252, struct lu_pcc_state) #define LL_STATFS_LMV 1 @@ -2098,8 +2099,11 @@ struct lu_pcc_detach { }; enum lu_pcc_state_flags { - /* Whether the inode attr is cached locally */ - PCC_STATE_FLAG_ATTR_VALID = 0x1, + PCC_STATE_FL_NONE = 0x0, + /* The inode attr is cached locally */ + PCC_STATE_FL_ATTR_VALID = 0x01, + /* The file is being attached into PCC */ + PCC_STATE_FL_ATTACHING = 0x02, }; struct lu_pcc_state { From patchwork Thu Feb 27 21:13:48 2020 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410731 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 541C7924 for ; Thu, 27 Feb 2020 21:45:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3D165246A2 for ; Thu, 27 Feb 2020 21:45:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3D165246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DE31034B01B; Thu, 27 Feb 2020 13:36:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C7EB21FB88 for ; Thu, 27 Feb 2020 13:20:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8A92B8A9F; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 89135468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:48 -0500 Message-Id: <1582838290-17243-361-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 360/622] lustre: pcc: security and permission for non-root user access X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin With the current PCC, a file left in the PCC cache may be accessible to other jobs/users who would not normally be able to access it. (That is, they can access it directly on the PCC mount via FID, as the local PCC mount is basically just a normal local file system.) This patch solves this by restricting access on the PCC side and depending only on the Lustre-side permissions for opening a file. PCC files on the local mount fs are therefore created with a minimal (zero) set of permissions. When a PCC-cached file is accessed, the permission check is done on the Lustre file and skipped on the PCC file. This should render the PCC files inaccessible except to root or via Lustre. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10092 Lustre-commit: 2102c86e0d0a ("LU-10092 pcc: security and permission for non-root user access") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/34637 Reviewed-by: Li Xi Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 3 +- fs/lustre/llite/llite_lib.c | 23 +++++++--- fs/lustre/llite/namei.c | 2 +- fs/lustre/llite/pcc.c | 103 ++++++++++++++++++++++++++++++++++++++------ fs/lustre/llite/pcc.h | 16 ++++--- 5 files changed, 120 insertions(+), 27 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 5a52cad..96311ad 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -860,6 +860,7 @@ int ll_file_open(struct inode *inode, struct file *file) if (rc) goto out_och_free; } + rc = pcc_file_open(inode, file); if (rc) goto out_och_free; @@ -1787,7 +1788,7 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) * from PCC cache automatically. 
*/ result = pcc_file_write_iter(iocb, from, &cached); - if (cached && result != -ENOSPC) + if (cached && result != -ENOSPC && result != -EDQUOT) return result; /* NB: we can't do direct IO for tiny writes because they use the page diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 1b22062..5ac083c 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -71,11 +71,16 @@ static struct ll_sb_info *ll_init_sbi(void) unsigned long pages; unsigned long lru_page_max; struct sysinfo si; + int rc; int i; sbi = kzalloc(sizeof(*sbi), GFP_NOFS); if (!sbi) - return NULL; + return ERR_PTR(-ENOMEM); + + rc = pcc_super_init(&sbi->ll_pcc_super); + if (rc < 0) + goto out_sbi; spin_lock_init(&sbi->ll_lock); mutex_init(&sbi->ll_lco.lco_lock); @@ -89,8 +94,8 @@ static struct ll_sb_info *ll_init_sbi(void) sbi->ll_cache = cl_cache_init(lru_page_max); if (!sbi->ll_cache) { - kfree(sbi); - return NULL; + rc = -ENOMEM; + goto out_pcc; } sbi->ll_ra_info.ra_max_pages_per_file = min(pages / 32, @@ -128,12 +133,16 @@ static struct ll_sb_info *ll_init_sbi(void) sbi->ll_squash.rsi_gid = 0; INIT_LIST_HEAD(&sbi->ll_squash.rsi_nosquash_nids); spin_lock_init(&sbi->ll_squash.rsi_lock); - pcc_super_init(&sbi->ll_pcc_super); /* Per-filesystem file heat */ sbi->ll_heat_decay_weight = SBI_DEFAULT_HEAT_DECAY_WEIGHT; sbi->ll_heat_period_second = SBI_DEFAULT_HEAT_PERIOD_SECOND; return sbi; +out_pcc: + pcc_super_fini(&sbi->ll_pcc_super); +out_sbi: + kfree(sbi); + return ERR_PTR(rc); } static void ll_free_sbi(struct super_block *sb) @@ -990,8 +999,8 @@ int ll_fill_super(struct super_block *sb) /* client additional sb info */ sbi = ll_init_sbi(); lsi->lsi_llsbi = sbi; - if (!sbi) { - err = -ENOMEM; + if (IS_ERR(sbi)) { + err = PTR_ERR(sbi); goto out_free; } @@ -1120,7 +1129,7 @@ void ll_put_super(struct super_block *sb) int next, force = 1, rc = 0; long ccc_count; - if (!sbi) + if (IS_ERR(sbi)) goto out_no_sbi; CDEBUG(D_VFSTRACE, "VFS Op: sb %p - %s\n", sb, 
profilenm); diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index d10decb..10c0cef 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -835,7 +835,7 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, goto out; } - rc = pcc_inode_create(dataset, &op_data->op_fid2, + rc = pcc_inode_create(parent->i_sb, dataset, &op_data->op_fid2, &pca->pca_dentry); if (rc) { retval = ERR_PTR(rc); diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index 8440647..fa81b55 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -113,10 +113,20 @@ struct kmem_cache *pcc_inode_slab; -void pcc_super_init(struct pcc_super *super) +int pcc_super_init(struct pcc_super *super) { + struct cred *cred; + + super->pccs_cred = cred = prepare_creds(); + if (!cred) + return -ENOMEM; + + /* Never override disk quota limits or use reserved space */ + cap_lower(cred->cap_effective, CAP_SYS_RESOURCE); spin_lock_init(&super->pccs_lock); INIT_LIST_HEAD(&super->pccs_datasets); + + return 0; } /** @@ -251,7 +261,7 @@ struct pcc_dataset * return 0; } -void pcc_super_fini(struct pcc_super *super) +static void pcc_remove_datasets(struct pcc_super *super) { struct pcc_dataset *dataset, *tmp; @@ -262,6 +272,12 @@ void pcc_super_fini(struct pcc_super *super) } } +void pcc_super_fini(struct pcc_super *super) +{ + pcc_remove_datasets(super); + put_cred(super->pccs_cred); +} + static bool pathname_is_valid(const char *pathname) { /* Needs to be absolute path */ @@ -380,7 +396,7 @@ int pcc_cmd_handle(char *buffer, unsigned long count, rc = pcc_dataset_del(super, cmd->pccc_pathname); break; case PCC_CLEAR_ALL: - pcc_super_fini(super); + pcc_remove_datasets(super); break; default: rc = -EINVAL; @@ -463,6 +479,11 @@ static int pcc_fid2dataset_path(char *buf, int sz, struct lu_fid *fid) PFID(fid)); } +static inline const struct cred *pcc_super_cred(struct super_block *sb) +{ + return ll_s2sbi(sb)->ll_pcc_super.pccs_cred; +} + void 
pcc_file_init(struct pcc_file *pccf) { pccf->pccf_file = NULL; @@ -503,7 +524,9 @@ int pcc_file_open(struct inode *inode, struct file *file) dname = &path->dentry->d_name; CDEBUG(D_CACHE, "opening pcc file '%.*s'\n", dname->len, dname->name); - pcc_file = dentry_open(path, file->f_flags, current_cred()); + + pcc_file = dentry_open(path, file->f_flags, + pcc_super_cred(inode->i_sb)); if (IS_ERR_OR_NULL(pcc_file)) { rc = pcc_file ? PTR_ERR(pcc_file) : -EINVAL; pcc_inode_put(pcci); @@ -652,6 +675,7 @@ int pcc_inode_setattr(struct inode *inode, struct iattr *attr, bool *cached) { int rc = 0; + const struct cred *old_cred; struct iattr attr2 = *attr; struct dentry *pcc_dentry; struct pcc_inode *pcci; @@ -667,11 +691,13 @@ int pcc_inode_setattr(struct inode *inode, struct iattr *attr, attr2.ia_valid = attr->ia_valid & (ATTR_SIZE | ATTR_ATIME | ATTR_ATIME_SET | ATTR_MTIME | ATTR_MTIME_SET | - ATTR_CTIME); + ATTR_CTIME | ATTR_UID | ATTR_GID); pcci = ll_i2pcci(inode); pcc_dentry = pcci->pcci_path.dentry; inode_lock(pcc_dentry->d_inode); + old_cred = override_creds(pcc_super_cred(inode->i_sb)); rc = pcc_dentry->d_inode->i_op->setattr(pcc_dentry, &attr2); + revert_creds(old_cred); inode_unlock(pcc_dentry->d_inode); pcc_io_fini(inode); @@ -681,6 +707,7 @@ int pcc_inode_setattr(struct inode *inode, struct iattr *attr, int pcc_inode_getattr(struct inode *inode, bool *cached) { struct ll_inode_info *lli = ll_i2info(inode); + const struct cred *old_cred; struct kstat stat; s64 atime; s64 mtime; @@ -696,8 +723,10 @@ int pcc_inode_getattr(struct inode *inode, bool *cached) if (!*cached) return 0; + old_cred = override_creds(pcc_super_cred(inode->i_sb)); rc = vfs_getattr(&ll_i2pcci(inode)->pcci_path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); + revert_creds(old_cred); if (rc) goto out; @@ -1113,14 +1142,14 @@ static int __pcc_inode_create(struct pcc_dataset *dataset, pcc_fid2dataset_path(path, MAX_PCC_DATABASE_PATH, fid); - base = pcc_mkdir_p(dataset->pccd_path.dentry, path, 
0700); + base = pcc_mkdir_p(dataset->pccd_path.dentry, path, 0); if (IS_ERR(base)) { rc = PTR_ERR(base); goto out; } snprintf(path, MAX_PCC_DATABASE_PATH, DFID_NOBRACE, PFID(fid)); - child = pcc_create(base, path, 0600); + child = pcc_create(base, path, 0); if (IS_ERR(child)) { rc = PTR_ERR(child); goto out_base; @@ -1134,18 +1163,44 @@ static int __pcc_inode_create(struct pcc_dataset *dataset, return rc; } -int pcc_inode_create(struct pcc_dataset *dataset, struct lu_fid *fid, - struct dentry **pcc_dentry) +/* TODO: Set the project ID for PCC copy */ +int pcc_inode_store_ugpid(struct dentry *dentry, kuid_t uid, kgid_t gid) +{ + struct inode *inode = dentry->d_inode; + struct iattr attr; + int rc; + + attr.ia_valid = ATTR_UID | ATTR_GID; + attr.ia_uid = uid; + attr.ia_gid = gid; + + inode_lock(inode); + rc = notify_change(dentry, &attr, NULL); + inode_unlock(inode); + + return rc; +} + +int pcc_inode_create(struct super_block *sb, struct pcc_dataset *dataset, + struct lu_fid *fid, struct dentry **pcc_dentry) { - return __pcc_inode_create(dataset, fid, pcc_dentry); + const struct cred *old_cred; + int rc; + + old_cred = override_creds(pcc_super_cred(sb)); + rc = __pcc_inode_create(dataset, fid, pcc_dentry); + revert_creds(old_cred); + return rc; } int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, struct dentry *pcc_dentry) { + const struct cred *old_cred; struct pcc_inode *pcci; int rc = 0; + old_cred = override_creds(pcc_super_cred(inode->i_sb)); pcc_inode_lock(inode); LASSERT(!ll_i2pcci(inode)); pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); @@ -1154,6 +1209,11 @@ int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, goto out_unlock; } + rc = pcc_inode_store_ugpid(pcc_dentry, old_cred->suid, + old_cred->sgid); + if (rc) + goto out_unlock; + pcc_inode_init(pcci, ll_i2info(inode)); pcc_inode_attach_init(dataset, pcci, pcc_dentry, LU_PCC_READWRITE); /* Set the layout generation of newly created file with 0 */ @@ 
-1172,6 +1232,10 @@ int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, } pcc_inode_unlock(inode); + revert_creds(old_cred); + if (rc) + kmem_cache_free(pcc_inode_slab, pcci); + return rc; } @@ -1253,6 +1317,7 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, struct pcc_dataset *dataset; struct ll_inode_info *lli = ll_i2info(inode); struct pcc_inode *pcci; + const struct cred *old_cred; struct dentry *dentry; struct file *pcc_filp; struct path path; @@ -1267,9 +1332,12 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, if (!dataset) return -ENOENT; + old_cred = override_creds(pcc_super_cred(inode->i_sb)); rc = __pcc_inode_create(dataset, &lli->lli_fid, &dentry); - if (rc) + if (rc) { + revert_creds(old_cred); goto out_dataset_put; + } path.mnt = dataset->pccd_path.mnt; path.dentry = dentry; @@ -1277,9 +1345,15 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, current_cred()); if (IS_ERR_OR_NULL(pcc_filp)) { rc = pcc_filp ? 
PTR_ERR(pcc_filp) : -EINVAL; + revert_creds(old_cred); goto out_dentry; } + rc = pcc_inode_store_ugpid(dentry, old_cred->uid, old_cred->gid); + revert_creds(old_cred); + if (rc) + goto out_fput; + rc = pcc_copy_data(file, pcc_filp); if (rc) goto out_fput; @@ -1306,7 +1380,9 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, if (rc) { int rc2; + old_cred = override_creds(pcc_super_cred(inode->i_sb)); rc2 = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); + revert_creds(old_cred); if (rc2) CWARN("failed to unlink PCC file, rc = %d\n", rc2); @@ -1322,13 +1398,14 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, bool attached) { struct ll_inode_info *lli = ll_i2info(inode); + const struct cred *old_cred; struct pcc_inode *pcci; u32 gen2; pcc_inode_lock(inode); pcci = ll_i2pcci(inode); lli->lli_pcc_state &= ~PCC_STATE_FL_ATTACHING; - if ((rc || lease_broken)) { + if (rc || lease_broken) { if (attached && pcci) pcc_inode_put(pcci); @@ -1357,7 +1434,9 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, out_put: if (rc) { + old_cred = override_creds(pcc_super_cred(inode->i_sb)); pcc_inode_remove(pcci); + revert_creds(old_cred); pcc_inode_put(pcci); } out_unlock: diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index 1a73dbb..54492c9 100644 --- a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -53,8 +53,12 @@ struct pcc_dataset { }; struct pcc_super { - spinlock_t pccs_lock; /* Protect pccs_datasets */ - struct list_head pccs_datasets; /* List of datasets */ + /* Protect pccs_datasets */ + spinlock_t pccs_lock; + /* List of datasets */ + struct list_head pccs_datasets; + /* creds of process who forced instantiation of super block */ + const struct cred *pccs_cred; }; struct pcc_inode { @@ -108,7 +112,7 @@ struct pcc_cmd { } u; }; -void pcc_super_init(struct pcc_super *super); +int pcc_super_init(struct pcc_super *super); void pcc_super_fini(struct pcc_super *super); int pcc_cmd_handle(char 
*buffer, unsigned long count, struct pcc_super *super); @@ -141,10 +145,10 @@ int pcc_fsync(struct file *file, loff_t start, loff_t end, int pcc_fault(struct vm_area_struct *mva, struct vm_fault *vmf, bool *cached); int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, bool *cached); -int pcc_inode_create(struct pcc_dataset *dataset, struct lu_fid *fid, - struct dentry **pcc_dentry); +int pcc_inode_create(struct super_block *sb, struct pcc_dataset *dataset, + struct lu_fid *fid, struct dentry **pcc_dentry); int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, - struct dentry *pcc_dentry); + struct dentry *pcc_dentry); struct pcc_dataset *pcc_dataset_get(struct pcc_super *super, u32 projid, u32 archive_id); void pcc_dataset_put(struct pcc_dataset *dataset); From patchwork Thu Feb 27 21:13:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410377 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D13A292A for ; Thu, 27 Feb 2020 21:36:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B9D7824690 for ; Thu, 27 Feb 2020 21:36:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B9D7824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DAFD534A138; Thu, 27 Feb 2020 13:30:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org 
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 752BF21FD4C for ; Thu, 27 Feb 2020 13:20:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8DC428AA0; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8BF6F47C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:49 -0500 Message-Id: <1582838290-17243-362-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 361/622] lustre: llite: Rule based auto PCC caching when create files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin Configurable rule-based auto PCC caching for newly created files can significantly benefit users of readwrite PCC. A rule determines which files can be cached on PCC directly, without any admission control, for a high priority user/group/project or for a filename, with wildcard support. Meanwhile, we can enforce a quota limit on capacity usage for each user/group/project to provide caching isolation. Similar to the NRS TBF command line, it supports logical conjunction and disjunction operations among different user/group/project or filename conditions, with wildcard support. 
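The conjunction/disjunction semantics can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the code added by this patch: the struct and function names (expr, conj, rule, subject, wildcard_match, rule_match, example_rule_match) are hypothetical, and the rule is built by hand rather than parsed. A rule is treated as an OR of conjunctions, each conjunction as an AND of expressions, and each expression as a membership test against an ID list or a list of '*' wildcard patterns. The hard-coded rule has the shape projid={500 1000}&fname={*.h5},uid={1001}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace types; all names are illustrative only. */
enum field { F_UID, F_GID, F_PROJID, F_FNAME };

struct expr {
	enum field field;
	const uint32_t *ids;		/* for uid/gid/projid */
	size_t nr_ids;
	const char *const *patterns;	/* for fname */
	size_t nr_patterns;
};

struct conj { const struct expr *exprs; size_t nr; };
struct rule { const struct conj *conjs; size_t nr; };
struct subject { uint32_t uid, gid, projid; const char *name; };

/* '*' matches any run of characters, including an empty one. */
static bool wildcard_match(const char *pat, const char *s)
{
	assert(pat && s);
	if (*pat == '\0')
		return *s == '\0';
	if (*pat == '*')
		return wildcard_match(pat + 1, s) ||
		       (*s != '\0' && wildcard_match(pat, s + 1));
	return *pat == *s && wildcard_match(pat + 1, s + 1);
}

/* An expression matches if any listed ID or pattern matches. */
static bool expr_match(const struct expr *e, const struct subject *sub)
{
	uint32_t want;
	size_t i;

	if (e->field == F_FNAME) {
		for (i = 0; i < e->nr_patterns; i++)
			if (wildcard_match(e->patterns[i], sub->name))
				return true;
		return false;
	}
	want = e->field == F_UID ? sub->uid :
	       e->field == F_GID ? sub->gid : sub->projid;
	for (i = 0; i < e->nr_ids; i++)
		if (e->ids[i] == want)
			return true;
	return false;
}

/* OR over conjunctions; each conjunction is an AND over expressions. */
static bool rule_match(const struct rule *r, const struct subject *sub)
{
	for (size_t c = 0; c < r->nr; c++) {
		const struct conj *cj = &r->conjs[c];
		size_t i;

		for (i = 0; i < cj->nr; i++)
			if (!expr_match(&cj->exprs[i], sub))
				break;
		if (i == cj->nr)
			return true;
	}
	return false;
}

/* Hand-built rule: projid={500 1000}&fname={*.h5},uid={1001} */
static const uint32_t projids[] = { 500, 1000 };
static const char *const fnames[] = { "*.h5" };
static const uint32_t uids[] = { 1001 };
static const struct expr conj0[] = {
	{ .field = F_PROJID, .ids = projids, .nr_ids = 2 },
	{ .field = F_FNAME, .patterns = fnames, .nr_patterns = 1 },
};
static const struct expr conj1[] = {
	{ .field = F_UID, .ids = uids, .nr_ids = 1 },
};
static const struct conj conjs[] = { { conj0, 2 }, { conj1, 1 } };
static const struct rule example = { conjs, 2 };

bool example_rule_match(uint32_t uid, uint32_t gid, uint32_t projid,
			const char *name)
{
	struct subject sub = { uid, gid, projid, name };

	return rule_match(&example, &sub);
}
```

With this layout, a file named "a.h5" created under project 500 matches through the first conjunction, and any file created by UID 1001 matches through the second, regardless of its name or project.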
The command line to add this kind of rule is as follows: lctl pcc add /mnt/lustre /mnt/pcc "projid={500 1000}&fname={*.h5},uid={1001} rwid=1 roid=1" This means that a newly created file on the client is automatically cached on PCC if its project ID is 500 or 1000 AND its file name has the suffix "h5", OR if the user ID is 1001. "rwid" is the RW-PCC attach ID (which is usually the archive ID); "roid" is the RO-PCC attach ID. By default, the RO-PCC attach ID is set to the same value as the RW-PCC attach ID for a shared PCC backend. WC-bug-id: https://jira.whamcloud.com/browse/LU-10918 Lustre-commit: 4fbae1352947 ("LU-10918 llite: Rule based auto PCC caching when create files") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/34751 Reviewed-by: Li Xi Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/namei.c | 13 +- fs/lustre/llite/pcc.c | 637 ++++++++++++++++++++++++++++++++++++++++++++---- fs/lustre/llite/pcc.h | 67 ++++- 3 files changed, 659 insertions(+), 58 deletions(-) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 10c0cef..49433c9 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -826,7 +826,7 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, lum->lmm_pattern = LOV_PATTERN_F_RELEASED | LOV_PATTERN_RAID0; op_data->op_data = lum; op_data->op_data_size = sizeof(*lum); - op_data->op_archive_id = dataset->pccd_id; + op_data->op_archive_id = dataset->pccd_rwid; rc = obd_fid_alloc(NULL, ll_i2mdexp(parent), &op_data->op_fid2, op_data); @@ -1002,9 +1002,14 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, /* Volatile file is used for HSM restore, so do not use PCC */ if (!filename_is_volatile(dentry->d_name.name, dentry->d_name.len, NULL)) { - dataset = pcc_dataset_get(&sbi->ll_pcc_super, - ll_i2info(dir)->lli_projid, - 0); + struct pcc_matcher item; + + item.pm_uid = from_kuid(&init_user_ns, current_uid()); + item.pm_gid = from_kgid(&init_user_ns, 
current_gid()); + item.pm_projid = ll_i2info(dir)->lli_projid; + item.pm_name = &dentry->d_name; + dataset = pcc_dataset_match_get(&sbi->ll_pcc_super, + &item); pca.pca_dataset = dataset; } } diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index fa81b55..469ff6c 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -109,6 +109,7 @@ #include #include #include +#include #include "llite_internal.h" struct kmem_cache *pcc_inode_slab; @@ -129,23 +130,550 @@ int pcc_super_init(struct pcc_super *super) return 0; } +/* Rule based auto caching */ +static void pcc_id_list_free(struct list_head *id_list) +{ + struct pcc_match_id *id, *n; + + list_for_each_entry_safe(id, n, id_list, pmi_linkage) { + list_del_init(&id->pmi_linkage); + kfree(id); + } +} + +static void pcc_fname_list_free(struct list_head *fname_list) +{ + struct pcc_match_fname *fname, *n; + + list_for_each_entry_safe(fname, n, fname_list, pmf_linkage) { + kfree(fname->pmf_name); + list_del_init(&fname->pmf_linkage); + kfree(fname); + } +} + +static void pcc_expression_free(struct pcc_expression *expr) +{ + LASSERT(expr->pe_field >= PCC_FIELD_UID && + expr->pe_field < PCC_FIELD_MAX); + switch (expr->pe_field) { + case PCC_FIELD_UID: + case PCC_FIELD_GID: + case PCC_FIELD_PROJID: + pcc_id_list_free(&expr->pe_cond); + break; + case PCC_FIELD_FNAME: + pcc_fname_list_free(&expr->pe_cond); + break; + default: + LBUG(); + } + kfree(expr); +} + +static void pcc_conjunction_free(struct pcc_conjunction *conjunction) +{ + struct pcc_expression *expression, *n; + + LASSERT(list_empty(&conjunction->pc_linkage)); + list_for_each_entry_safe(expression, n, + &conjunction->pc_expressions, + pe_linkage) { + list_del_init(&expression->pe_linkage); + pcc_expression_free(expression); + } + kfree(conjunction); +} + +static void pcc_rule_conds_free(struct list_head *cond_list) +{ + struct pcc_conjunction *conjunction, *n; + + list_for_each_entry_safe(conjunction, n, cond_list, pc_linkage) { + 
list_del_init(&conjunction->pc_linkage); + pcc_conjunction_free(conjunction); + } +} + +static void pcc_cmd_fini(struct pcc_cmd *cmd) +{ + if (cmd->pccc_cmd == PCC_ADD_DATASET) { + if (!list_empty(&cmd->u.pccc_add.pccc_conds)) + pcc_rule_conds_free(&cmd->u.pccc_add.pccc_conds); + kfree(cmd->u.pccc_add.pccc_conds_str); + } +} + +#define PCC_DISJUNCTION_DELIM (',') +#define PCC_CONJUNCTION_DELIM ('&') +#define PCC_EXPRESSION_DELIM ('=') + +static int +pcc_fname_list_add(struct cfs_lstr *id, struct list_head *fname_list) +{ + struct pcc_match_fname *fname; + + fname = kzalloc(sizeof(*fname), GFP_KERNEL); + if (!fname) + return -ENOMEM; + + fname->pmf_name = kzalloc(id->ls_len + 1, GFP_KERNEL); + if (!fname->pmf_name) { + kfree(fname); + return -ENOMEM; + } + + memcpy(fname->pmf_name, id->ls_str, id->ls_len); + list_add_tail(&fname->pmf_linkage, fname_list); + return 0; +} + +static int +pcc_fname_list_parse(char *str, int len, struct list_head *fname_list) +{ + struct cfs_lstr src; + struct cfs_lstr res; + int rc = 0; + + src.ls_str = str; + src.ls_len = len; + INIT_LIST_HEAD(fname_list); + while (src.ls_str) { + rc = cfs_gettok(&src, ' ', &res); + if (rc == 0) { + rc = -EINVAL; + break; + } + rc = pcc_fname_list_add(&res, fname_list); + if (rc) + break; + } + if (rc) + pcc_fname_list_free(fname_list); + return rc; +} + +static int +pcc_id_list_parse(char *str, int len, struct list_head *id_list, + enum pcc_field type) +{ + struct cfs_lstr src; + struct cfs_lstr res; + int rc = 0; + + if (type != PCC_FIELD_UID && type != PCC_FIELD_GID && + type != PCC_FIELD_PROJID) + return -EINVAL; + + src.ls_str = str; + src.ls_len = len; + INIT_LIST_HEAD(id_list); + while (src.ls_str) { + struct pcc_match_id *id; + u32 id_val; + + if (cfs_gettok(&src, ' ', &res) == 0) { + rc = -EINVAL; + goto out; + } + + if (!cfs_str2num_check(res.ls_str, res.ls_len, + &id_val, 0, (u32)~0U)) { + rc = -EINVAL; + goto out; + } + + id = kzalloc(sizeof(*id), GFP_KERNEL); + if (!id) { + rc = -ENOMEM; + 
goto out; + } + + id->pmi_id = id_val; + list_add_tail(&id->pmi_linkage, id_list); + } +out: + if (rc) + pcc_id_list_free(id_list); + return rc; +} + +static inline bool +pcc_check_field(struct cfs_lstr *field, char *str) +{ + int len = strlen(str); + + return (field->ls_len == len && + strncmp(field->ls_str, str, len) == 0); +} + +static int +pcc_expression_parse(struct cfs_lstr *src, struct list_head *cond_list) +{ + struct pcc_expression *expr; + struct cfs_lstr field; + int rc = 0; + + expr = kzalloc(sizeof(*expr), GFP_KERNEL); + if (!expr) + return -ENOMEM; + + rc = cfs_gettok(src, PCC_EXPRESSION_DELIM, &field); + if (rc == 0 || src->ls_len <= 2 || src->ls_str[0] != '{' || + src->ls_str[src->ls_len - 1] != '}') { + rc = -EINVAL; + goto out; + } + + /* Skip '{' and '}' */ + src->ls_str++; + src->ls_len -= 2; + + if (pcc_check_field(&field, "uid")) { + if (pcc_id_list_parse(src->ls_str, + src->ls_len, + &expr->pe_cond, + PCC_FIELD_UID) < 0) { + rc = -EINVAL; + goto out; + } + expr->pe_field = PCC_FIELD_UID; + } else if (pcc_check_field(&field, "gid")) { + if (pcc_id_list_parse(src->ls_str, + src->ls_len, + &expr->pe_cond, + PCC_FIELD_GID) < 0) { + rc = -EINVAL; + goto out; + } + expr->pe_field = PCC_FIELD_GID; + } else if (pcc_check_field(&field, "projid")) { + if (pcc_id_list_parse(src->ls_str, + src->ls_len, + &expr->pe_cond, + PCC_FIELD_PROJID) < 0) { + rc = -EINVAL; + goto out; + } + expr->pe_field = PCC_FIELD_PROJID; + } else if (pcc_check_field(&field, "fname")) { + if (pcc_fname_list_parse(src->ls_str, + src->ls_len, + &expr->pe_cond) < 0) { + rc = -EINVAL; + goto out; + } + expr->pe_field = PCC_FIELD_FNAME; + } else { + rc = -EINVAL; + goto out; + } + + list_add_tail(&expr->pe_linkage, cond_list); + return 0; +out: + kfree(expr); + return rc; +} + +static int +pcc_conjunction_parse(struct cfs_lstr *src, struct list_head *cond_list) +{ + struct pcc_conjunction *conjunction; + struct cfs_lstr expr; + int rc = 0; + + conjunction = 
kzalloc(sizeof(*conjunction), GFP_KERNEL); + if (!conjunction) + return -ENOMEM; + + INIT_LIST_HEAD(&conjunction->pc_expressions); + list_add_tail(&conjunction->pc_linkage, cond_list); + + while (src->ls_str) { + rc = cfs_gettok(src, PCC_CONJUNCTION_DELIM, &expr); + if (rc == 0) { + rc = -EINVAL; + break; + } + rc = pcc_expression_parse(&expr, + &conjunction->pc_expressions); + if (rc) + break; + } + return rc; +} + +static int pcc_conds_parse(char *str, int len, struct list_head *cond_list) +{ + struct cfs_lstr src; + struct cfs_lstr res; + int rc = 0; + + src.ls_str = str; + src.ls_len = len; + INIT_LIST_HEAD(cond_list); + while (src.ls_str) { + rc = cfs_gettok(&src, PCC_DISJUNCTION_DELIM, &res); + if (rc == 0) { + rc = -EINVAL; + break; + } + rc = pcc_conjunction_parse(&res, cond_list); + if (rc) + break; + } + return rc; +} + +static int pcc_id_parse(struct pcc_cmd *cmd, const char *id) +{ + int rc; + + cmd->u.pccc_add.pccc_conds_str = kzalloc(strlen(id) + 1, GFP_KERNEL); + if (!cmd->u.pccc_add.pccc_conds_str) + return -ENOMEM; + + memcpy(cmd->u.pccc_add.pccc_conds_str, id, strlen(id)); + + rc = pcc_conds_parse(cmd->u.pccc_add.pccc_conds_str, + strlen(cmd->u.pccc_add.pccc_conds_str), + &cmd->u.pccc_add.pccc_conds); + if (rc) + pcc_cmd_fini(cmd); + + return rc; +} + +static int +pcc_parse_value_pair(struct pcc_cmd *cmd, char *buffer) +{ + char *key, *val; + unsigned long id; + int rc; + + val = buffer; + key = strsep(&val, "="); + if (!val || strlen(val) == 0) + return -EINVAL; + + /* Key of the value pair */ + if (strcmp(key, "rwid") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id <= 0) + return -EINVAL; + cmd->u.pccc_add.pccc_rwid = id; + } else if (strcmp(key, "roid") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id <= 0) + return -EINVAL; + cmd->u.pccc_add.pccc_roid = id; + } else { + return -EINVAL; + } + + return 0; +} + +static int +pcc_parse_value_pairs(struct pcc_cmd *cmd, char *buffer) +{ + char *val; + char 
*token; + int rc; + + val = buffer; + while (val && strlen(val) != 0) { + token = strsep(&val, " "); + rc = pcc_parse_value_pair(cmd, token); + if (rc) + return rc; + } + + return 0; +} + +static void +pcc_dataset_rule_fini(struct pcc_match_rule *rule) +{ + if (!list_empty(&rule->pmr_conds)) + pcc_rule_conds_free(&rule->pmr_conds); + LASSERT(rule->pmr_conds_str); + kfree(rule->pmr_conds_str); +} + +static int +pcc_dataset_rule_init(struct pcc_match_rule *rule, struct pcc_cmd *cmd) +{ + int rc = 0; + + LASSERT(cmd->u.pccc_add.pccc_conds_str); + rule->pmr_conds_str = kzalloc( + strlen(cmd->u.pccc_add.pccc_conds_str) + 1, + GFP_KERNEL); + if (!rule->pmr_conds_str) + return -ENOMEM; + + memcpy(rule->pmr_conds_str, + cmd->u.pccc_add.pccc_conds_str, + strlen(cmd->u.pccc_add.pccc_conds_str)); + + INIT_LIST_HEAD(&rule->pmr_conds); + if (!list_empty(&cmd->u.pccc_add.pccc_conds)) + rc = pcc_conds_parse(rule->pmr_conds_str, + strlen(rule->pmr_conds_str), + &rule->pmr_conds); + + if (rc) + pcc_dataset_rule_fini(rule); + + return rc; +} + +/* Rule Matching */ +static int +pcc_id_list_match(struct list_head *id_list, u32 id_val) +{ + struct pcc_match_id *id; + + list_for_each_entry(id, id_list, pmi_linkage) { + if (id->pmi_id == id_val) + return 1; + } + return 0; +} + +static bool +cfs_match_wildcard(const char *pattern, const char *content) +{ + if (*pattern == '\0' && *content == '\0') + return true; + + if (*pattern == '*' && *(pattern + 1) != '\0' && *content == '\0') + return false; + + while (*pattern == *content) { + pattern++; + content++; + if (*pattern == '\0' && *content == '\0') + return true; + + if (*pattern == '*' && *(pattern + 1) != '\0' && + *content == '\0') + return false; + } + + if (*pattern == '*') + return (cfs_match_wildcard(pattern + 1, content) || + cfs_match_wildcard(pattern, content + 1)); + + return false; +} + +static int +pcc_fname_list_match(struct list_head *fname_list, const char *name) +{ + struct pcc_match_fname *fname; + + 
list_for_each_entry(fname, fname_list, pmf_linkage) { + if (cfs_match_wildcard(fname->pmf_name, name)) + return 1; + } + return 0; +} + +static int +pcc_expression_match(struct pcc_expression *expr, struct pcc_matcher *matcher) +{ + switch (expr->pe_field) { + case PCC_FIELD_UID: + return pcc_id_list_match(&expr->pe_cond, matcher->pm_uid); + case PCC_FIELD_GID: + return pcc_id_list_match(&expr->pe_cond, matcher->pm_gid); + case PCC_FIELD_PROJID: + return pcc_id_list_match(&expr->pe_cond, matcher->pm_projid); + case PCC_FIELD_FNAME: + return pcc_fname_list_match(&expr->pe_cond, + matcher->pm_name->name); + default: + return 0; + } +} + +static int +pcc_conjunction_match(struct pcc_conjunction *conjunction, + struct pcc_matcher *matcher) +{ + struct pcc_expression *expr; + int matched; + + list_for_each_entry(expr, &conjunction->pc_expressions, pe_linkage) { + matched = pcc_expression_match(expr, matcher); + if (!matched) + return 0; + } + + return 1; +} + +static int +pcc_cond_match(struct pcc_match_rule *rule, struct pcc_matcher *matcher) +{ + struct pcc_conjunction *conjunction; + int matched; + + list_for_each_entry(conjunction, &rule->pmr_conds, pc_linkage) { + matched = pcc_conjunction_match(conjunction, matcher); + if (matched) + return 1; + } + + return 0; +} + +struct pcc_dataset* +pcc_dataset_match_get(struct pcc_super *super, struct pcc_matcher *matcher) +{ + struct pcc_dataset *dataset; + struct pcc_dataset *selected = NULL; + + spin_lock(&super->pccs_lock); + list_for_each_entry(dataset, &super->pccs_datasets, pccd_linkage) { + if (pcc_cond_match(&dataset->pccd_rule, matcher)) { + atomic_inc(&dataset->pccd_refcount); + selected = dataset; + break; + } + } + spin_unlock(&super->pccs_lock); + if (selected) + CDEBUG(D_CACHE, "PCC create, matched %s - %d:%d:%d:%s\n", + dataset->pccd_rule.pmr_conds_str, + matcher->pm_uid, matcher->pm_gid, + matcher->pm_projid, matcher->pm_name->name); + + return selected; +} + /** * pcc_dataset_add - Add a Cache policy to 
control which files need be * cached and where it will be cached. * - * @super: superblock of pcc - * @pathname: root path of pcc - * @id: HSM archive ID - * @projid: files with specified project ID will be cached. + * @super: superblock of pcc + * @cmd: pcc command */ static int -pcc_dataset_add(struct pcc_super *super, const char *pathname, - u32 archive_id, u32 projid) +pcc_dataset_add(struct pcc_super *super, struct pcc_cmd *cmd) { - int rc; + char *pathname = cmd->pccc_pathname; struct pcc_dataset *dataset; struct pcc_dataset *tmp; bool found = false; + int rc; dataset = kzalloc(sizeof(*dataset), GFP_NOFS); if (!dataset) @@ -157,13 +685,23 @@ int pcc_super_init(struct pcc_super *super) return rc; } strncpy(dataset->pccd_pathname, pathname, PATH_MAX); - dataset->pccd_id = archive_id; - dataset->pccd_projid = projid; + dataset->pccd_rwid = cmd->u.pccc_add.pccc_rwid; + dataset->pccd_roid = cmd->u.pccc_add.pccc_roid; atomic_set(&dataset->pccd_refcount, 1); + rc = pcc_dataset_rule_init(&dataset->pccd_rule, cmd); + if (rc) { + pcc_dataset_put(dataset); + return rc; + } + spin_lock(&super->pccs_lock); list_for_each_entry(tmp, &super->pccs_datasets, pccd_linkage) { - if (tmp->pccd_id == archive_id) { + if (strcmp(tmp->pccd_pathname, pathname) == 0 || + (dataset->pccd_rwid != 0 && + dataset->pccd_rwid == tmp->pccd_rwid) || + (dataset->pccd_roid != 0 && + dataset->pccd_roid == tmp->pccd_roid)) { found = true; break; } @@ -181,23 +719,21 @@ int pcc_super_init(struct pcc_super *super) } struct pcc_dataset * -pcc_dataset_get(struct pcc_super *super, u32 projid, u32 archive_id) +pcc_dataset_get(struct pcc_super *super, enum lu_pcc_type type, u32 id) { struct pcc_dataset *dataset; struct pcc_dataset *selected = NULL; - if (projid == 0 && archive_id == 0) + if (id == 0) return NULL; /* - * archive ID is unique in the list, projid might be duplicate, + * archive ID (read-write ID) or read-only ID is unique in the list, * we just return last added one as first priority. 
*/ spin_lock(&super->pccs_lock); list_for_each_entry(dataset, &super->pccs_datasets, pccd_linkage) { - if (projid && dataset->pccd_projid != projid) - continue; - if (archive_id && dataset->pccd_id != archive_id) + if (type == LU_PCC_READWRITE && dataset->pccd_rwid != id) continue; atomic_inc(&dataset->pccd_refcount); selected = dataset; @@ -205,8 +741,8 @@ struct pcc_dataset * } spin_unlock(&super->pccs_lock); if (selected) - CDEBUG(D_CACHE, "matched projid %u, PCC create\n", - selected->pccd_projid); + CDEBUG(D_CACHE, "matched id %u, PCC mode %d\n", id, type); + return selected; } @@ -214,6 +750,7 @@ struct pcc_dataset * pcc_dataset_put(struct pcc_dataset *dataset) { if (atomic_dec_and_test(&dataset->pccd_refcount)) { + pcc_dataset_rule_fini(&dataset->pccd_rule); path_put(&dataset->pccd_path); kfree(dataset); } @@ -244,8 +781,8 @@ struct pcc_dataset * pcc_dataset_dump(struct pcc_dataset *dataset, struct seq_file *m) { seq_printf(m, "%s:\n", dataset->pccd_pathname); - seq_printf(m, " rwid: %u\n", dataset->pccd_id); - seq_printf(m, " autocache: projid=%u\n", dataset->pccd_projid); + seq_printf(m, " rwid: %u\n", dataset->pccd_rwid); + seq_printf(m, " autocache: %s\n", dataset->pccd_rule.pmr_conds_str); } int @@ -293,7 +830,6 @@ static bool pathname_is_valid(const char *pathname) static struct pcc_cmd *cmd; char *token; char *val; - unsigned long tmp; int rc = 0; cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); @@ -336,38 +872,40 @@ static bool pathname_is_valid(const char *pathname) cmd->pccc_pathname = token; if (cmd->pccc_cmd == PCC_ADD_DATASET) { - /* archive ID */ - token = strsep(&val, " "); + /* List of ID */ + LASSERT(val); + token = val; + val = strrchr(token, '}'); if (!val) { rc = -EINVAL; goto out_free_cmd; } - rc = kstrtoul(token, 10, &tmp); - if (rc != 0) { - rc = -EINVAL; - goto out_free_cmd; - } - if (tmp == 0) { + /* Skip '}' */ + val++; + if (*val == '\0') { + val = NULL; + } else if (*val == ' ') { + *val = '\0'; + val++; + } else { rc = -EINVAL; goto 
out_free_cmd; } - cmd->u.pccc_add.pccc_id = tmp; - token = val; - rc = kstrtoul(token, 10, &tmp); - if (rc != 0) { - rc = -EINVAL; + rc = pcc_id_parse(cmd, token); + if (rc) goto out_free_cmd; - } - if (tmp == 0) { + + rc = pcc_parse_value_pairs(cmd, val); + if (rc) { rc = -EINVAL; - goto out_free_cmd; + goto out_cmd_fini; } - cmd->u.pccc_add.pccc_projid = tmp; } - goto out; +out_cmd_fini: + pcc_cmd_fini(cmd); out_free_cmd: kfree(cmd); out: @@ -388,9 +926,7 @@ int pcc_cmd_handle(char *buffer, unsigned long count, switch (cmd->pccc_cmd) { case PCC_ADD_DATASET: - rc = pcc_dataset_add(super, cmd->pccc_pathname, - cmd->u.pccc_add.pccc_id, - cmd->u.pccc_add.pccc_projid); + rc = pcc_dataset_add(super, cmd); break; case PCC_DEL_DATASET: rc = pcc_dataset_del(super, cmd->pccc_pathname); @@ -403,6 +939,7 @@ int pcc_cmd_handle(char *buffer, unsigned long count, break; } + pcc_cmd_fini(cmd); kfree(cmd); return rc; } @@ -1025,7 +1562,8 @@ static int pcc_inode_remove(struct pcc_inode *pcci) dentry = pcci->pcci_path.dentry; rc = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); if (rc) - CWARN("failed to unlink cached file, rc = %d\n", rc); + CWARN("failed to unlink PCC file %.*s, rc = %d\n", + dentry->d_name.len, dentry->d_name.name, rc); return rc; } @@ -1226,7 +1764,10 @@ int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, rc2 = vfs_unlink(pcc_dentry->d_parent->d_inode, pcc_dentry, NULL); if (rc2) - CWARN("failed to unlink PCC file, rc = %d\n", rc2); + CWARN("%s: failed to unlink PCC file %.*s, rc = %d\n", + ll_i2sbi(inode)->ll_fsname, + pcc_dentry->d_name.len, pcc_dentry->d_name.name, + rc2); dput(pcc_dentry); } @@ -1327,8 +1868,8 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, if (rc) return rc; - dataset = pcc_dataset_get(&ll_i2sbi(inode)->ll_pcc_super, 0, - archive_id); + dataset = pcc_dataset_get(&ll_i2sbi(inode)->ll_pcc_super, + LU_PCC_READWRITE, archive_id); if (!dataset) return -ENOENT; @@ -1384,7 +1925,9 @@ int 
pcc_readwrite_attach(struct file *file, struct inode *inode, rc2 = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); revert_creds(old_cred); if (rc2) - CWARN("failed to unlink PCC file, rc = %d\n", rc2); + CWARN("%s: failed to unlink PCC file %.*s, rc = %d\n", + ll_i2sbi(inode)->ll_fsname, dentry->d_name.len, + dentry->d_name.name, rc2); dput(dentry); } diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index 54492c9..f2b57f9 100644 --- a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -43,13 +43,64 @@ #define LPROCFS_WR_PCC_MAX_CMD 4096 +/* User/Group/Project ID */ +struct pcc_match_id { + u32 pmi_id; + struct list_head pmi_linkage; +}; + +/* wildcard file name */ +struct pcc_match_fname { + char *pmf_name; + struct list_head pmf_linkage; +}; + +enum pcc_field { + PCC_FIELD_UID, + PCC_FIELD_GID, + PCC_FIELD_PROJID, + PCC_FIELD_FNAME, + PCC_FIELD_MAX +}; + +struct pcc_expression { + enum pcc_field pe_field; + struct list_head pe_cond; + struct list_head pe_linkage; +}; + +struct pcc_conjunction { + /* link to disjunction */ + struct list_head pc_linkage; + /* list of logical conjunction */ + struct list_head pc_expressions; +}; + +/** + * Match rule for auto PCC-cached files. 
+ */ +struct pcc_match_rule { + char *pmr_conds_str; + struct list_head pmr_conds; +}; + +struct pcc_matcher { + u32 pm_uid; + u32 pm_gid; + u32 pm_projid; + struct qstr *pm_name; +}; + struct pcc_dataset { - u32 pccd_id; /* Archive ID */ - u32 pccd_projid; /* Project ID */ + u32 pccd_rwid; /* Archive ID */ + u32 pccd_roid; /* Readonly ID */ + struct pcc_match_rule pccd_rule; /* Match rule */ + u32 pccd_rwonly:1, /* Only use as RW-PCC */ + pccd_roonly:1; /* Only use as RO-PCC */ char pccd_pathname[PATH_MAX]; /* full path */ struct path pccd_path; /* Root path */ struct list_head pccd_linkage; /* Linked to pccs_datasets */ - atomic_t pccd_refcount; /* reference count */ + atomic_t pccd_refcount; /* Reference count */ }; struct pcc_super { @@ -103,8 +154,10 @@ struct pcc_cmd { char *pccc_pathname; union { struct pcc_cmd_add { - u32 pccc_id; - u32 pccc_projid; + u32 pccc_rwid; + u32 pccc_roid; + struct list_head pccc_conds; + char *pccc_conds_str; } pccc_add; struct pcc_cmd_del { u32 pccc_pad; @@ -149,8 +202,8 @@ int pcc_inode_create(struct super_block *sb, struct pcc_dataset *dataset, struct lu_fid *fid, struct dentry **pcc_dentry); int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, struct dentry *pcc_dentry); -struct pcc_dataset *pcc_dataset_get(struct pcc_super *super, u32 projid, - u32 archive_id); +struct pcc_dataset *pcc_dataset_match_get(struct pcc_super *super, + struct pcc_matcher *matcher); void pcc_dataset_put(struct pcc_dataset *dataset); void pcc_inode_free(struct inode *inode); void pcc_layout_invalidate(struct inode *inode); From patchwork Thu Feb 27 21:13:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410649 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C00117E0 for ; Thu, 27 Feb 2020 21:43:23 +0000 
(UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 24A3324690 for ; Thu, 27 Feb 2020 21:43:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 24A3324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 34E4221C9E6; Thu, 27 Feb 2020 13:34:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CD6E621FBAA for ; Thu, 27 Feb 2020 13:20:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 917058AA1; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8EEEE46A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:50 -0500 Message-Id: <1582838290-17243-363-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 362/622] lustre: pcc: auto attach during open for valid cache X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin In the current PCC implementation, all PCC state information is stored in the in-memory data structure named pcc_inode (a member of data structure ll_inode_info). Once the file inode is reclaimed due to memory pressure or memory shrinking, the corresponding in-memory pcc_inode is released too, and the PCC-cached file is detached automatically. Revocation of the layout lock also triggers a detach of the PCC-cached file. As a result, a still-valid PCC-cached file cannot be used. To solve this problem, we introduce an auto-attach mechanism during open. During PCC attach, the L.Gen is stored as an extended attribute of the local copy file on the PCC device. When the in-memory inode is reclaimed or the layout lock is revoked, and the file is opened again, the client checks whether the L.Gen stored on the PCC copy is the same as the Lustre file's current L.Gen on the MDT. If they match, the cached copy on the PCC device is still valid and we can continue to use it after auto-attach. 
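The validity check described above can be sketched in user space as follows. This is only an illustrative model, assuming a hypothetical struct pcc_copy_model and helper can_auto_attach() that are not part of the patch; the kernel code in this series reads the generation from an xattr on the PCC copy and compares it in pcc_try_dataset_attach().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of a PCC copy (assumption, not a
 * structure from the patch).
 */
struct pcc_copy_model {
	bool     has_gen_xattr; /* was an L.Gen recorded at attach time? */
	uint32_t stored_gen;    /* L.Gen saved on the PCC copy */
};

/*
 * Return true when the cached copy may be auto-attached at open:
 * an L.Gen must have been recorded, and it must equal the file's
 * current L.Gen. A missing record is treated as "not cached".
 */
static bool can_auto_attach(const struct pcc_copy_model *copy,
			    uint32_t current_gen)
{
	if (!copy->has_gen_xattr)
		return false;
	return copy->stored_gen == current_gen;
}
```

A copy recorded with generation 7 would be reused while the file's L.Gen is still 7 and skipped once the layout generation changes. Treating a missing xattr as "not cached" rather than as an error mirrors the way the patch ignores a failed xattr read and simply does not attach.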
WC-bug-id: https://jira.whamcloud.com/browse/LU-10092 Lustre-commit: e29ecb659e51 ("LU-10092 pcc: auto attach during open for valid cache") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/33787 Reviewed-by: Li Xi Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 2 + fs/lustre/llite/pcc.c | 400 ++++++++++++++++++++++++++------ fs/lustre/llite/pcc.h | 18 +- fs/lustre/lov/lov_object.c | 1 + include/uapi/linux/lustre/lustre_user.h | 2 + 5 files changed, 348 insertions(+), 75 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 3337bbf..d1c1413 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -293,6 +293,8 @@ struct cl_layout { u32 cl_layout_gen; /** whether layout is a composite one */ bool cl_is_composite; + /** Whether layout is a HSM released one */ + bool cl_is_released; }; /** diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index 469ff6c..fc4a2a3 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -124,7 +124,7 @@ int pcc_super_init(struct pcc_super *super) /* Never override disk quota limits or use reserved space */ cap_lower(cred->cap_effective, CAP_SYS_RESOURCE); - spin_lock_init(&super->pccs_lock); + init_rwsem(&super->pccs_rw_sem); INIT_LIST_HEAD(&super->pccs_datasets); return 0; @@ -472,6 +472,24 @@ static int pcc_id_parse(struct pcc_cmd *cmd, const char *id) if (id <= 0) return -EINVAL; cmd->u.pccc_add.pccc_roid = id; + } else if (strcmp(key, "open_attach") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id > 0) + cmd->u.pccc_add.pccc_flags |= PCC_DATASET_OPEN_ATTACH; + } else if (strcmp(key, "rwpcc") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id > 0) + cmd->u.pccc_add.pccc_flags |= PCC_DATASET_RWPCC; + } else if (strcmp(key, "ropcc") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id > 0) + 
cmd->u.pccc_add.pccc_flags |= PCC_DATASET_ROPCC; } else { return -EINVAL; } @@ -494,6 +512,24 @@ static int pcc_id_parse(struct pcc_cmd *cmd, const char *id) return rc; } + switch (cmd->pccc_cmd) { + case PCC_ADD_DATASET: + if (cmd->u.pccc_add.pccc_flags & PCC_DATASET_RWPCC && + cmd->u.pccc_add.pccc_flags & PCC_DATASET_ROPCC) + return -EINVAL; + /* + * By default, a PCC backend can provide caching service for + * both RW-PCC and RO-PCC. + */ + if ((cmd->u.pccc_add.pccc_flags & PCC_DATASET_PCC_ALL) == 0) + cmd->u.pccc_add.pccc_flags |= PCC_DATASET_PCC_ALL; + break; + case PCC_DEL_DATASET: + case PCC_CLEAR_ALL: + break; + default: + return -EINVAL; + } return 0; } @@ -641,15 +677,18 @@ struct pcc_dataset* struct pcc_dataset *dataset; struct pcc_dataset *selected = NULL; - spin_lock(&super->pccs_lock); + down_read(&super->pccs_rw_sem); list_for_each_entry(dataset, &super->pccs_datasets, pccd_linkage) { + if (!(dataset->pccd_flags & PCC_DATASET_RWPCC)) + continue; + if (pcc_cond_match(&dataset->pccd_rule, matcher)) { atomic_inc(&dataset->pccd_refcount); selected = dataset; break; } } - spin_unlock(&super->pccs_lock); + up_read(&super->pccs_rw_sem); if (selected) CDEBUG(D_CACHE, "PCC create, matched %s - %d:%d:%d:%s\n", dataset->pccd_rule.pmr_conds_str, @@ -687,6 +726,7 @@ struct pcc_dataset* strncpy(dataset->pccd_pathname, pathname, PATH_MAX); dataset->pccd_rwid = cmd->u.pccc_add.pccc_rwid; dataset->pccd_roid = cmd->u.pccc_add.pccc_roid; + dataset->pccd_flags = cmd->u.pccc_add.pccc_flags; atomic_set(&dataset->pccd_refcount, 1); rc = pcc_dataset_rule_init(&dataset->pccd_rule, cmd); @@ -695,7 +735,7 @@ struct pcc_dataset* return rc; } - spin_lock(&super->pccs_lock); + down_write(&super->pccs_rw_sem); list_for_each_entry(tmp, &super->pccs_datasets, pccd_linkage) { if (strcmp(tmp->pccd_pathname, pathname) == 0 || (dataset->pccd_rwid != 0 && @@ -708,7 +748,7 @@ struct pcc_dataset* } if (!found) list_add(&dataset->pccd_linkage, &super->pccs_datasets); - 
spin_unlock(&super->pccs_lock); + up_write(&super->pccs_rw_sem); if (found) { pcc_dataset_put(dataset); @@ -731,15 +771,16 @@ struct pcc_dataset * * archive ID (read-write ID) or read-only ID is unique in the list, * we just return last added one as first priority. */ - spin_lock(&super->pccs_lock); + down_read(&super->pccs_rw_sem); list_for_each_entry(dataset, &super->pccs_datasets, pccd_linkage) { - if (type == LU_PCC_READWRITE && dataset->pccd_rwid != id) + if (type == LU_PCC_READWRITE && (dataset->pccd_rwid != id || + !(dataset->pccd_flags & PCC_DATASET_RWPCC))) continue; atomic_inc(&dataset->pccd_refcount); selected = dataset; break; } - spin_unlock(&super->pccs_lock); + up_read(&super->pccs_rw_sem); if (selected) CDEBUG(D_CACHE, "matched id %u, PCC mode %d\n", id, type); @@ -763,17 +804,17 @@ struct pcc_dataset * struct pcc_dataset *dataset; int rc = -ENOENT; - spin_lock(&super->pccs_lock); + down_write(&super->pccs_rw_sem); list_for_each_safe(l, tmp, &super->pccs_datasets) { dataset = list_entry(l, struct pcc_dataset, pccd_linkage); if (strcmp(dataset->pccd_pathname, pathname) == 0) { - list_del(&dataset->pccd_linkage); + list_del_init(&dataset->pccd_linkage); pcc_dataset_put(dataset); rc = 0; break; } } - spin_unlock(&super->pccs_lock); + up_write(&super->pccs_rw_sem); return rc; } @@ -782,6 +823,7 @@ struct pcc_dataset * { seq_printf(m, "%s:\n", dataset->pccd_pathname); seq_printf(m, " rwid: %u\n", dataset->pccd_rwid); + seq_printf(m, " flags: %x\n", dataset->pccd_flags); seq_printf(m, " autocache: %s\n", dataset->pccd_rule.pmr_conds_str); } @@ -790,11 +832,11 @@ struct pcc_dataset * { struct pcc_dataset *dataset; - spin_lock(&super->pccs_lock); + down_read(&super->pccs_rw_sem); list_for_each_entry(dataset, &super->pccs_datasets, pccd_linkage) { pcc_dataset_dump(dataset, m); } - spin_unlock(&super->pccs_lock); + up_read(&super->pccs_rw_sem); return 0; } @@ -802,11 +844,13 @@ static void pcc_remove_datasets(struct pcc_super *super) { struct pcc_dataset 
*dataset, *tmp; + down_write(&super->pccs_rw_sem); list_for_each_entry_safe(dataset, tmp, &super->pccs_datasets, pccd_linkage) { list_del(&dataset->pccd_linkage); pcc_dataset_put(dataset); } + up_write(&super->pccs_rw_sem); } void pcc_super_fini(struct pcc_super *super) @@ -1027,19 +1071,241 @@ void pcc_file_init(struct pcc_file *pccf) pccf->pccf_type = LU_PCC_NONE; } +static inline bool pcc_open_attach_enabled(struct pcc_dataset *dataset) +{ + return dataset->pccd_flags & PCC_DATASET_OPEN_ATTACH; +} + +static const char pcc_xattr_layout[] = XATTR_USER_PREFIX "PCC.layout"; + +static int pcc_layout_xattr_set(struct pcc_inode *pcci, u32 gen) +{ + struct dentry *pcc_dentry = pcci->pcci_path.dentry; + struct ll_inode_info *lli = pcci->pcci_lli; + int rc; + + if (!(lli->lli_pcc_state & PCC_STATE_FL_OPEN_ATTACH)) + return 0; + + rc = __vfs_setxattr(pcc_dentry, pcc_dentry->d_inode, pcc_xattr_layout, + &gen, sizeof(gen), 0); + return rc; +} + +static int pcc_get_layout_info(struct inode *inode, struct cl_layout *clt) +{ + struct lu_env *env; + struct ll_inode_info *lli = ll_i2info(inode); + u16 refcheck; + int rc; + + if (!lli->lli_clob) + return -EINVAL; + + env = cl_env_get(&refcheck); + if (IS_ERR(env)) + return PTR_ERR(env); + + rc = cl_object_layout_get(env, lli->lli_clob, clt); + if (rc) + CDEBUG(D_INODE, "Cannot get layout for "DFID"\n", + PFID(ll_inode2fid(inode))); + + cl_env_put(env, &refcheck); + return rc; +} + +static int pcc_fid2dataset_fullpath(char *buf, int sz, struct lu_fid *fid, + struct pcc_dataset *dataset) +{ + return snprintf(buf, sz, "%s/%04x/%04x/%04x/%04x/%04x/%04x/" + DFID_NOBRACE, + dataset->pccd_pathname, + (fid)->f_oid & 0xFFFF, + (fid)->f_oid >> 16 & 0xFFFF, + (unsigned int)((fid)->f_seq & 0xFFFF), + (unsigned int)((fid)->f_seq >> 16 & 0xFFFF), + (unsigned int)((fid)->f_seq >> 32 & 0xFFFF), + (unsigned int)((fid)->f_seq >> 48 & 0xFFFF), + PFID(fid)); +} + +/* Must be called with pcci->pcci_lock held */ +static void 
pcc_inode_attach_init(struct pcc_dataset *dataset, + struct pcc_inode *pcci, + struct dentry *dentry, + enum lu_pcc_type type) +{ + pcci->pcci_path.mnt = mntget(dataset->pccd_path.mnt); + pcci->pcci_path.dentry = dentry; + LASSERT(atomic_read(&pcci->pcci_refcount) == 0); + atomic_set(&pcci->pcci_refcount, 1); + pcci->pcci_type = type; + pcci->pcci_attr_valid = false; + + if (pcc_open_attach_enabled(dataset)) { + struct ll_inode_info *lli = pcci->pcci_lli; + + lli->lli_pcc_state |= PCC_STATE_FL_OPEN_ATTACH; + } +} + +static inline void pcc_layout_gen_set(struct pcc_inode *pcci, + u32 gen) +{ + pcci->pcci_layout_gen = gen; +} + static inline bool pcc_inode_has_layout(struct pcc_inode *pcci) { return pcci->pcci_layout_gen != CL_LAYOUT_GEN_NONE; } +static int pcc_try_dataset_attach(struct inode *inode, u32 gen, + enum lu_pcc_type type, + struct pcc_dataset *dataset, + bool *cached) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_inode *pcci = lli->lli_pcc_inode; + const struct cred *old_cred; + struct dentry *pcc_dentry; + struct path path; + char *pathname; + u32 pcc_gen; + int rc; + + if (type == LU_PCC_READWRITE && + !(dataset->pccd_flags & PCC_DATASET_RWPCC)) + return 0; + + pathname = kzalloc(PATH_MAX, GFP_KERNEL); + if (!pathname) + return -ENOMEM; + + pcc_fid2dataset_fullpath(pathname, PATH_MAX, &lli->lli_fid, dataset); + + old_cred = override_creds(pcc_super_cred(inode->i_sb)); + rc = kern_path(pathname, LOOKUP_FOLLOW, &path); + if (rc) { + /* ignore this error */ + rc = 0; + goto out; + } + + pcc_dentry = path.dentry; + rc = __vfs_getxattr(pcc_dentry, pcc_dentry->d_inode, pcc_xattr_layout, + &pcc_gen, sizeof(pcc_gen)); + if (rc < 0) { + /* ignore this error */ + rc = 0; + goto out_put_path; + } + + rc = 0; + /* The file is still valid cached in PCC, attach it immediately. 
*/ + if (pcc_gen == gen) { + CDEBUG(D_CACHE, DFID" L.Gen (%d) consistent, auto attached.\n", + PFID(&lli->lli_fid), gen); + if (!pcci) { + pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); + if (!pcci) { + rc = -ENOMEM; + goto out_put_path; + } + + pcc_inode_init(pcci, lli); + dget(pcc_dentry); + pcc_inode_attach_init(dataset, pcci, pcc_dentry, type); + } else { + /* + * This happened when a file was once attached into + * PCC, and some processes keep this file opened + * (pcci->refcount > 1) and corresponding PCC file + * without any I/O activity, and then this file was + * detached by the manual detach command or the + * revocation of the layout lock (i.e. cached LRU lock + * shrinking). + */ + pcc_inode_get(pcci); + pcci->pcci_type = type; + } + pcc_layout_gen_set(pcci, gen); + *cached = true; + } +out_put_path: + path_put(&path); +out: + revert_creds(old_cred); + kfree(pathname); + return rc; +} + +static int pcc_try_datasets_attach(struct inode *inode, u32 gen, + enum lu_pcc_type type, bool *cached) +{ + struct pcc_dataset *dataset, *tmp; + struct pcc_super *super = &ll_i2sbi(inode)->ll_pcc_super; + int rc = 0; + + down_read(&super->pccs_rw_sem); + list_for_each_entry_safe(dataset, tmp, + &super->pccs_datasets, pccd_linkage) { + if (!pcc_open_attach_enabled(dataset)) + continue; + rc = pcc_try_dataset_attach(inode, gen, type, dataset, cached); + if (rc < 0 || (!rc && *cached)) + break; + } + up_read(&super->pccs_rw_sem); + + return rc; +} + +static int pcc_try_open_attach(struct inode *inode, bool *cached) +{ + struct pcc_super *super = &ll_i2sbi(inode)->ll_pcc_super; + struct cl_layout clt = { + .cl_layout_gen = 0, + .cl_is_released = false, + }; + int rc; + + /* + * Quick check whether there is PCC device. + */ + if (list_empty(&super->pccs_datasets)) + return 0; + + /* + * The file layout lock was cancelled. And this open does not + * obtain valid layout lock from MDT (i.e. the file is being + * HSM restoring). 
+ */ + if (ll_layout_version_get(ll_i2info(inode)) == CL_LAYOUT_GEN_NONE) + return 0; + + rc = pcc_get_layout_info(inode, &clt); + if (rc) + return rc; + + if (clt.cl_is_released) + rc = pcc_try_datasets_attach(inode, clt.cl_layout_gen, + LU_PCC_READWRITE, cached); + + return rc; +} + int pcc_file_open(struct inode *inode, struct file *file) { struct pcc_inode *pcci; + struct ll_inode_info *lli = ll_i2info(inode); struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct pcc_file *pccf = &fd->fd_pcc_file; struct file *pcc_file; struct path *path; struct qstr *dname; + bool cached = false; int rc = 0; if (!S_ISREG(inode->i_mode)) @@ -1047,13 +1313,19 @@ int pcc_file_open(struct inode *inode, struct file *file) pcc_inode_lock(inode); pcci = ll_i2pcci(inode); - if (!pcci) - goto out_unlock; - if (atomic_read(&pcci->pcci_refcount) == 0 || - !pcc_inode_has_layout(pcci)) + if (lli->lli_pcc_state & PCC_STATE_FL_ATTACHING) goto out_unlock; + if (!pcci || !pcc_inode_has_layout(pcci)) { + rc = pcc_try_open_attach(inode, &cached); + if (rc < 0 || !cached) + goto out_unlock; + + if (!pcci) + pcci = ll_i2pcci(inode); + } + pcc_inode_get(pcci); WARN_ON(pccf->pccf_file); @@ -1106,12 +1378,6 @@ void pcc_file_release(struct inode *inode, struct file *file) pcc_inode_unlock(inode); } -static inline void pcc_layout_gen_set(struct pcc_inode *pcci, - u32 gen) -{ - pcci->pcci_layout_gen = gen; -} - static void pcc_io_init(struct inode *inode, bool *cached) { struct pcc_inode *pcci; @@ -1439,11 +1705,20 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, const struct vm_operations_struct *pcc_vm_ops = vma->vm_private_data; int rc; - if (!pcc_file || !pcc_vm_ops || !pcc_vm_ops->page_mkwrite) { + if (!pcc_file || !pcc_vm_ops) { *cached = false; return 0; } + if (!pcc_vm_ops->page_mkwrite && + page->mapping == pcc_file->f_mapping) { + CDEBUG(D_MMAP, + "%s: PCC backend fs not support ->page_mkwrite()\n", + ll_i2sbi(inode)->ll_fsname); + pcc_ioctl_detach(inode); + 
up_read(&mm->mmap_sem); + return VM_FAULT_RETRY | VM_FAULT_NOPAGE; + } /* Pause to allow for a race with concurrent detach */ OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_PCC_MKWRITE_PAUSE, cfs_fail_val); @@ -1465,7 +1740,7 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, * VM_FAULT_NOPAGE | VM_FAULT_RETRY to the caller * __do_page_fault and retry the memory fault handling. */ - if (page->mapping == file_inode(pcc_file)->i_mapping) { + if (page->mapping == pcc_file->f_mapping) { *cached = true; up_read(&mm->mmap_sem); return VM_FAULT_RETRY | VM_FAULT_NOPAGE; @@ -1554,16 +1829,15 @@ void pcc_layout_invalidate(struct inode *inode) pcc_inode_unlock(inode); } -static int pcc_inode_remove(struct pcc_inode *pcci) +static int pcc_inode_remove(struct inode *inode, struct dentry *pcc_dentry) { - struct dentry *dentry; int rc; - dentry = pcci->pcci_path.dentry; - rc = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); + rc = vfs_unlink(pcc_dentry->d_parent->d_inode, pcc_dentry, NULL); if (rc) - CWARN("failed to unlink PCC file %.*s, rc = %d\n", - dentry->d_name.len, dentry->d_name.name, rc); + CWARN("%s: failed to unlink PCC file %.*s, rc = %d\n", + ll_i2sbi(inode)->ll_fsname, pcc_dentry->d_name.len, + pcc_dentry->d_name.name, rc); return rc; } @@ -1651,20 +1925,6 @@ static int pcc_inode_remove(struct pcc_inode *pcci) return dentry; } -/* Must be called with pcci->pcci_lock held */ -static void pcc_inode_attach_init(struct pcc_dataset *dataset, - struct pcc_inode *pcci, - struct dentry *dentry, - enum lu_pcc_type type) -{ - pcci->pcci_path.mnt = mntget(dataset->pccd_path.mnt); - pcci->pcci_path.dentry = dentry; - LASSERT(atomic_read(&pcci->pcci_refcount) == 0); - atomic_set(&pcci->pcci_refcount, 1); - pcci->pcci_type = type; - pcci->pcci_attr_valid = false; -} - static int __pcc_inode_create(struct pcc_dataset *dataset, struct lu_fid *fid, struct dentry **dentry) @@ -1744,38 +2004,37 @@ int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, 
pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); if (!pcci) { rc = -ENOMEM; - goto out_unlock; + goto out_put; } rc = pcc_inode_store_ugpid(pcc_dentry, old_cred->suid, old_cred->sgid); if (rc) - goto out_unlock; + goto out_put; pcc_inode_init(pcci, ll_i2info(inode)); pcc_inode_attach_init(dataset, pcci, pcc_dentry, LU_PCC_READWRITE); - /* Set the layout generation of newly created file with 0 */ - pcc_layout_gen_set(pcci, 0); -out_unlock: + rc = pcc_layout_xattr_set(pcci, 0); if (rc) { - int rc2; + (void) pcc_inode_remove(inode, pcci->pcci_path.dentry); + pcc_inode_put(pcci); + goto out_unlock; + } - rc2 = vfs_unlink(pcc_dentry->d_parent->d_inode, - pcc_dentry, NULL); - if (rc2) - CWARN("%s: failed to unlink PCC file %.*s, rc = %d\n", - ll_i2sbi(inode)->ll_fsname, - pcc_dentry->d_name.len, pcc_dentry->d_name.name, - rc2); + /* Set the layout generation of newly created file with 0 */ + pcc_layout_gen_set(pcci, 0); +out_put: + if (rc) { + (void) pcc_inode_remove(inode, pcc_dentry); dput(pcc_dentry); - } + kmem_cache_free(pcc_inode_slab, pcci); + } +out_unlock: pcc_inode_unlock(inode); revert_creds(old_cred); - if (rc) - kmem_cache_free(pcc_inode_slab, pcci); return rc; } @@ -1919,16 +2178,9 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, fput(pcc_filp); out_dentry: if (rc) { - int rc2; - old_cred = override_creds(pcc_super_cred(inode->i_sb)); - rc2 = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); + (void) pcc_inode_remove(inode, dentry); revert_creds(old_cred); - if (rc2) - CWARN("%s: failed to unlink PCC file %.*s, rc = %d\n", - ll_i2sbi(inode)->ll_fsname, dentry->d_name.len, - dentry->d_name.name, rc2); - dput(dentry); } out_dataset_put: @@ -1945,6 +2197,7 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, struct pcc_inode *pcci; u32 gen2; + old_cred = override_creds(pcc_super_cred(inode->i_sb)); pcc_inode_lock(inode); pcci = ll_i2pcci(inode); lli->lli_pcc_state &= ~PCC_STATE_FL_ATTACHING; @@ -1962,6 +2215,10 @@ 
int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, } LASSERT(attached); + rc = pcc_layout_xattr_set(pcci, gen); + if (rc) + goto out_put; + rc = ll_layout_refresh(inode, &gen2); if (!rc) { if (gen2 == gen) { @@ -1977,13 +2234,12 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, out_put: if (rc) { - old_cred = override_creds(pcc_super_cred(inode->i_sb)); - pcc_inode_remove(pcci); - revert_creds(old_cred); + (void) pcc_inode_remove(inode, pcci->pcci_path.dentry); pcc_inode_put(pcci); } out_unlock: pcc_inode_unlock(inode); + revert_creds(old_cred); return rc; } diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index f2b57f9..4947911 100644 --- a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -91,12 +91,23 @@ struct pcc_matcher { struct qstr *pm_name; }; +enum pcc_dataset_flags { + PCC_DATASET_NONE = 0x0, + /* Try auto attach at open, disabled by default */ + PCC_DATASET_OPEN_ATTACH = 0x1, + /* PCC backend is only used for RW-PCC */ + PCC_DATASET_RWPCC = 0x2, + /* PCC backend is only used for RO-PCC */ + PCC_DATASET_ROPCC = 0x4, + /* PCC backend provides caching services for both RW-PCC and RO-PCC */ + PCC_DATASET_PCC_ALL = PCC_DATASET_RWPCC | PCC_DATASET_ROPCC, +}; + struct pcc_dataset { u32 pccd_rwid; /* Archive ID */ u32 pccd_roid; /* Readonly ID */ struct pcc_match_rule pccd_rule; /* Match rule */ - u32 pccd_rwonly:1, /* Only use as RW-PCC */ - pccd_roonly:1; /* Only use as RO-PCC */ + enum pcc_dataset_flags pccd_flags; /* flags of PCC backend */ char pccd_pathname[PATH_MAX]; /* full path */ struct path pccd_path; /* Root path */ struct list_head pccd_linkage; /* Linked to pccs_datasets */ @@ -105,7 +116,7 @@ struct pcc_dataset { struct pcc_super { /* Protect pccs_datasets */ - spinlock_t pccs_lock; + struct rw_semaphore pccs_rw_sem; /* List of datasets */ struct list_head pccs_datasets; /* creds of process who forced instantiation of super block */ @@ -158,6 +169,7 @@ struct pcc_cmd { u32 pccc_roid; struct 
list_head pccc_conds; char *pccc_conds_str; + enum pcc_dataset_flags pccc_flags; } pccc_add; struct pcc_cmd_del { u32 pccc_pad; diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 27e0ca5..792d946 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -2049,6 +2049,7 @@ static int lov_object_layout_get(const struct lu_env *env, cl->cl_size = lov_comp_md_size(lsm); cl->cl_layout_gen = lsm->lsm_layout_gen; cl->cl_dom_comp_size = 0; + cl->cl_is_released = lsm->lsm_is_released; if (lsm_is_composite(lsm->lsm_magic)) { struct lov_stripe_md_entry *lsme = lsm->lsm_entries[0]; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index b024a44..2f9687e 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -2104,6 +2104,8 @@ enum lu_pcc_state_flags { PCC_STATE_FL_ATTR_VALID = 0x01, /* The file is being attached into PCC */ PCC_STATE_FL_ATTACHING = 0x02, + /* Allow to auto attach at open */ + PCC_STATE_FL_OPEN_ATTACH = 0x04, }; struct lu_pcc_state { From patchwork Thu Feb 27 21:13:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410383 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C15492A for ; Thu, 27 Feb 2020 21:36:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 24A3324690 for ; Thu, 27 Feb 2020 21:36:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 24A3324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; 
spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 211863495FD; Thu, 27 Feb 2020 13:30:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 317F521FD51 for ; Thu, 27 Feb 2020 13:20:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 939268AA2; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 91BF146C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:51 -0500 Message-Id: <1582838290-17243-364-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 363/622] lustre: pcc: change detach behavior and add keep option X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin After the introduction of auto-attach at open, when a PCC-cached file is detached by the "pcc detach" command, it will be attached automatically at the next open. This may not be what the user wants. To solve this problem, we change the default detach behavior and add an option "--keep|-k" for the detach of RW-PCC. 
The manual "lfs pcc detach" command will detach the file from PCC permanently. It will also remove the PCC copy by default. When the file is detached with the "keep" option, it only unmaps the relationship between the file inode and the PCC copy, but keeps the PCC copy. The file is allowed to be attached automatically at the next open when the file is still valid in cache. Note that currently an auto detach caused by inode reclaim or revocation of the layout lock does not delete the PCC copy either. WC-bug-id: https://jira.whamcloud.com/browse/LU-10092 Lustre-commit: 2dadefb4148f ("LU-10092 pcc: change detach behavior and add keep option") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/33844 Reviewed-by: Patrick Farrell Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 6 +-- fs/lustre/llite/file.c | 33 +++++++++++++--- fs/lustre/llite/pcc.c | 68 ++++++++++++++++++++++++++++++--- fs/lustre/llite/pcc.h | 2 +- include/uapi/linux/lustre/lustre_user.h | 16 ++++++-- 5 files changed, 107 insertions(+), 18 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 1f7ed32..2c39579 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1918,7 +1918,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) case FS_IOC_FSSETXATTR: return ll_ioctl_fssetxattr(inode, cmd, arg); case LL_IOC_PCC_DETACH_BY_FID: { - struct lu_pcc_detach *detach; + struct lu_pcc_detach_fid *detach; struct lu_fid *fid; struct inode *inode2; unsigned long ino; @@ -1928,7 +1928,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return -ENOMEM; if (copy_from_user(detach, - (const struct lu_pcc_detach __user *)arg, + (const struct lu_pcc_detach_fid __user *)arg, sizeof(*detach))) { rc = -EFAULT; goto out_detach; @@ -1955,7 +1955,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) goto out_iput; } - rc = 
pcc_ioctl_detach(inode2); + rc = pcc_ioctl_detach(inode2, detach->pccd_opt); out_iput: iput(inode2); out_detach: diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 96311ad..a27c06c 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3732,14 +3732,35 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags) rc = ll_heat_set(inode, flags); return rc; } - case LL_IOC_PCC_DETACH: - if (!S_ISREG(inode->i_mode)) - return -EINVAL; + case LL_IOC_PCC_DETACH: { + struct lu_pcc_detach *detach; - if (!inode_owner_or_capable(inode)) - return -EPERM; + detach = kzalloc(sizeof(*detach), GFP_KERNEL); + if (!detach) + return -ENOMEM; + + if (copy_from_user(detach, + (const struct lu_pcc_detach __user *)arg, + sizeof(*detach))) { + rc = -EFAULT; + goto out_detach_free; + } + + if (!S_ISREG(inode->i_mode)) { + rc = -EINVAL; + goto out_detach_free; + } - return pcc_ioctl_detach(inode); + if (!inode_owner_or_capable(inode)) { + rc = -EPERM; + goto out_detach_free; + } + + rc = pcc_ioctl_detach(inode, detach->pccd_opt); +out_detach_free: + kfree(detach); + return rc; + } case LL_IOC_PCC_STATE: { struct lu_pcc_state __user *ustate = (struct lu_pcc_state __user *)arg; diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index fc4a2a3..c8c2442 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -1002,6 +1002,7 @@ static void pcc_inode_init(struct pcc_inode *pcci, struct ll_inode_info *lli) { pcci->pcci_lli = lli; lli->lli_pcc_inode = pcci; + lli->lli_pcc_state = PCC_STATE_FL_NONE; atomic_set(&pcci->pcci_refcount, 0); pcci->pcci_type = LU_PCC_NONE; pcci->pcci_layout_gen = CL_LAYOUT_GEN_NONE; @@ -1715,8 +1716,9 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, CDEBUG(D_MMAP, "%s: PCC backend fs not support ->page_mkwrite()\n", ll_i2sbi(inode)->ll_fsname); - pcc_ioctl_detach(inode); + pcc_ioctl_detach(inode, PCC_DETACH_OPT_NONE); up_read(&mm->mmap_sem); + *cached = true; return VM_FAULT_RETRY | 
VM_FAULT_NOPAGE; } /* Pause to allow for a race with concurrent detach */ @@ -1755,7 +1757,7 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, */ if (OBD_FAIL_CHECK(OBD_FAIL_LLITE_PCC_DETACH_MKWRITE)) { pcc_io_fini(inode); - pcc_ioctl_detach(inode); + pcc_ioctl_detach(inode, PCC_DETACH_OPT_NONE); up_read(&mm->mmap_sem); return VM_FAULT_RETRY | VM_FAULT_NOPAGE; } @@ -2243,10 +2245,51 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, return rc; } -int pcc_ioctl_detach(struct inode *inode) +static int pcc_hsm_remove(struct inode *inode) +{ + struct hsm_user_request *hur; + u32 gen; + int len; + int rc; + + rc = ll_layout_restore(inode, 0, OBD_OBJECT_EOF); + if (rc) { + CDEBUG(D_CACHE, DFID" RESTORE failure: %d\n", + PFID(&ll_i2info(inode)->lli_fid), rc); + return rc; + } + + ll_layout_refresh(inode, &gen); + + len = sizeof(struct hsm_user_request) + + sizeof(struct hsm_user_item); + hur = kzalloc(len, GFP_NOFS); + if (!hur) + return -ENOMEM; + + hur->hur_request.hr_action = HUA_REMOVE; + hur->hur_request.hr_archive_id = 0; + hur->hur_request.hr_flags = 0; + memcpy(&hur->hur_user_item[0].hui_fid, &ll_i2info(inode)->lli_fid, + sizeof(hur->hur_user_item[0].hui_fid)); + hur->hur_user_item[0].hui_extent.offset = 0; + hur->hur_user_item[0].hui_extent.length = OBD_OBJECT_EOF; + hur->hur_request.hr_itemcount = 1; + rc = obd_iocontrol(LL_IOC_HSM_REQUEST, ll_i2sbi(inode)->ll_md_exp, + len, hur, NULL); + if (rc) + CDEBUG(D_CACHE, DFID" HSM REMOVE failure: %d\n", + PFID(&ll_i2info(inode)->lli_fid), rc); + + kfree(hur); + return rc; +} + +int pcc_ioctl_detach(struct inode *inode, u32 opt) { struct ll_inode_info *lli = ll_i2info(inode); struct pcc_inode *pcci; + bool hsm_remove = false; int rc = 0; pcc_inode_lock(inode); @@ -2255,11 +2298,26 @@ int pcc_ioctl_detach(struct inode *inode) !pcc_inode_has_layout(pcci)) goto out_unlock; - __pcc_layout_invalidate(pcci); - pcc_inode_put(pcci); + LASSERT(atomic_read(&pcci->pcci_refcount) > 0); 
+ + if (pcci->pcci_type == LU_PCC_READWRITE) { + if (opt == PCC_DETACH_OPT_UNCACHE) + hsm_remove = true; + + __pcc_layout_invalidate(pcci); + pcc_inode_put(pcci); + } out_unlock: pcc_inode_unlock(inode); + if (hsm_remove) { + const struct cred *old_cred; + + old_cred = override_creds(pcc_super_cred(inode->i_sb)); + rc = pcc_hsm_remove(inode); + revert_creds(old_cred); + } + return rc; } diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index 4947911..c00cb0b 100644 --- a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -187,7 +187,7 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, u32 gen, bool lease_broken, int rc, bool attached); -int pcc_ioctl_detach(struct inode *inode); +int pcc_ioctl_detach(struct inode *inode, u32 opt); int pcc_ioctl_state(struct file *file, struct inode *inode, struct lu_pcc_state *state); void pcc_file_init(struct pcc_file *pccf); diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 2f9687e..317b236 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -357,8 +357,8 @@ struct ll_ioc_lease_id { #define LL_IOC_LADVISE _IOR('f', 250, struct llapi_lu_ladvise) #define LL_IOC_HEAT_GET _IOWR('f', 251, struct lu_heat) #define LL_IOC_HEAT_SET _IOW('f', 251, __u64) -#define LL_IOC_PCC_DETACH _IO('f', 252) -#define LL_IOC_PCC_DETACH_BY_FID _IOW('f', 252, struct lu_pcc_detach) +#define LL_IOC_PCC_DETACH _IOW('f', 252, struct lu_pcc_detach) +#define LL_IOC_PCC_DETACH_BY_FID _IOW('f', 252, struct lu_pcc_detach_fid) #define LL_IOC_PCC_STATE _IOR('f', 252, struct lu_pcc_state) #define LL_STATFS_LMV 1 @@ -2093,9 +2093,19 @@ struct lu_pcc_attach { __u32 pcca_id; /* archive ID for readwrite, group ID for readonly */ }; -struct lu_pcc_detach { +enum lu_pcc_detach_opts { + PCC_DETACH_OPT_NONE = 0, /* Detach only, keep the PCC copy */ + PCC_DETACH_OPT_UNCACHE, /* 
Remove the cached file after detach */ +}; + +struct lu_pcc_detach_fid { /* fid of the file to detach */ struct lu_fid pccd_fid; + __u32 pccd_opt; +}; + +struct lu_pcc_detach { + __u32 pccd_opt; }; enum lu_pcc_state_flags { From patchwork Thu Feb 27 21:13:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410385 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD696138D for ; Thu, 27 Feb 2020 21:36:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C50DC24690 for ; Thu, 27 Feb 2020 21:36:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C50DC24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A583334A18A; Thu, 27 Feb 2020 13:30:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 86ECD21FCFB for ; Thu, 27 Feb 2020 13:20:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 961D18AA3; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9499D46D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , 
NeilBrown Date: Thu, 27 Feb 2020 16:13:52 -0500 Message-Id: <1582838290-17243-365-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 364/622] lustre: lov: return error if cl_env_get fails X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff When cl_env_get() fails with an error, return the error. Cray-bug-id: LUS-7310 WC-bug-id: https://jira.whamcloud.com/browse/LU-12436 Lustre-commit: a7997c836bbf ("LU-12436 lov: return error if cl_env_get fails") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/35229 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_io.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 5b28793..9cdfca1 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -120,8 +120,10 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio, /* obtain new environment */ sub->sub_env = cl_env_get(&sub->sub_refcheck); - if (IS_ERR(sub->sub_env)) + if (IS_ERR(sub->sub_env)) { rc = PTR_ERR(sub->sub_env); + return rc; + } sub_obj = lovsub2cl(lov_r0(lov, index)->lo_sub[stripe]); sub_io = &sub->sub_io; From patchwork Thu Feb 27 21:13:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410329 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org 
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A5F0B92A for ; Thu, 27 Feb 2020 21:35:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8E76B24677 for ; Thu, 27 Feb 2020 21:35:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8E76B24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9413134A061; Thu, 27 Feb 2020 13:29:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CA63B21FB9D for ; Thu, 27 Feb 2020 13:20:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 98FC98AA4; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 975E0468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:53 -0500 Message-Id: <1582838290-17243-366-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 365/622] lustre: ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing 
Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman Add the rq_no_reply flag to the DEBUG_REQ_FLAGS macro for debugging purposes. Also, add another debug message to check_write_rcs. WC-bug-id: https://jira.whamcloud.com/browse/LU-12333 Lustre-commit: 3e43d06810e6 ("LU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro") Signed-off-by: Vitaly Fertman Reviewed-on: https://review.whamcloud.com/35090 Reviewed-by: Andreas Dilger Reviewed-by: Chris Horn Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 4 ++-- fs/lustre/osc/osc_request.c | 5 ++++- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 383d59e..7ed2d99 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1066,7 +1066,7 @@ static inline void lustre_set_rep_swabbed(struct ptlrpc_request *req, FLAG(req->rq_err, "E"), FLAG(req->rq_net_err, "e"), \ FLAG(req->rq_timedout, "X") /* eXpired */, FLAG(req->rq_resend, "S"), \ FLAG(req->rq_restart, "T"), FLAG(req->rq_replay, "P"), \ - FLAG(req->rq_no_resend, "N"), \ + FLAG(req->rq_no_resend, "N"), FLAG(req->rq_no_reply, "n"), \ FLAG(req->rq_waiting, "W"), \ FLAG(req->rq_wait_ctx, "C"), FLAG(req->rq_hp, "H"), \ FLAG(req->rq_committed, "M"), \ @@ -1074,7 +1074,7 @@ static inline void lustre_set_rep_swabbed(struct ptlrpc_request *req, FLAG(req->rq_reply_unlinked, "U"), \ FLAG(req->rq_receiving_reply, "r") -#define REQ_FLAGS_FMT "%s:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s" +#define REQ_FLAGS_FMT "%s:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s" void _debug_req(struct ptlrpc_request *req, struct libcfs_debug_msg_data *data, const char *fmt, ...) 
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index f929908..6b066e5 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1064,8 +1064,11 @@ static int check_write_rcs(struct ptlrpc_request *req, /* return error if any niobuf was in error */ for (i = 0; i < niocount; i++) { - if ((int)remote_rcs[i] < 0) + if ((int)remote_rcs[i] < 0) { + CDEBUG(D_INFO, "rc[%d]: %d req %p\n", + i, remote_rcs[i], req); return remote_rcs[i]; + } if (remote_rcs[i] != 0) { CDEBUG(D_INFO, "rc[%d] invalid (%d) req %p\n", From patchwork Thu Feb 27 21:13:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DA19217E0 for ; Thu, 27 Feb 2020 21:43:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C320E24690 for ; Thu, 27 Feb 2020 21:43:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C320E24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1313E349314; Thu, 27 Feb 2020 13:35:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 184F921FB16 for ; Thu, 27 Feb 2020 13:20:12 -0800 (PST) 
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9B8D18AA5; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9A19D47C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:54 -0500 Message-Id: <1582838290-17243-367-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 366/622] lustre: ldlm: layout lock fixes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman As the intent_layout operation becomes more frequent with SEL, cancel existing layout locks in advance and reuse ELC to deliver the cancels to the MDS. As clients are given LCK_EX layout locks, take this mode into account as well in ldlm_lock_match. Cray-bug-id: LUS-2528 WC-bug-id: https://jira.whamcloud.com/browse/LU-10070 Lustre-commit: 51f23ffa4dae ("LU-10070 ldlm: layout lock fixes") Signed-off-by: Vitaly Fertman Reviewed-on: https://review.whamcloud.com/35232 Reviewed-by: Patrick Farrell Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 3 ++- fs/lustre/mdc/mdc_locks.c | 12 ++++++++++-- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a27c06c..9321b84 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4978,7 +4978,8 @@ int ll_layout_refresh(struct inode *inode, u32 *gen) * match 
it before grabbing layout lock mutex. */ mode = ll_take_md_lock(inode, MDS_INODELOCK_LAYOUT, &lockh, 0, - LCK_CR | LCK_CW | LCK_PR | LCK_PW); + LCK_CR | LCK_CW | LCK_PR | + LCK_PW | LCK_EX); if (mode != 0) { /* hit cached lock */ rc = ll_layout_lock_set(&lockh, mode, inode); if (rc == -EAGAIN) diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index cf6bc9d..5885bbd 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -580,18 +580,26 @@ static struct ptlrpc_request *mdc_intent_layout_pack(struct obd_export *exp, struct md_op_data *op_data) { struct obd_device *obd = class_exp2obd(exp); + struct list_head cancels = LIST_HEAD_INIT(cancels); struct ptlrpc_request *req; struct ldlm_intent *lit; struct layout_intent *layout; - int rc; + int count = 0, rc; req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_LDLM_INTENT_LAYOUT); if (!req) return ERR_PTR(-ENOMEM); + if (fid_is_sane(&op_data->op_fid2) && (it->it_op & IT_LAYOUT) && + (it->it_flags & FMODE_WRITE)) { + count = mdc_resource_get_unused(exp, &op_data->op_fid2, + &cancels, LCK_EX, + MDS_INODELOCK_LAYOUT); + } + req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT, 0); - rc = ldlm_prep_enqueue_req(exp, req, NULL, 0); + rc = ldlm_prep_enqueue_req(exp, req, &cancels, count); if (rc) { ptlrpc_request_free(req); return ERR_PTR(rc); From patchwork Thu Feb 27 21:13:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410389 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B001092A for ; Thu, 27 Feb 2020 21:37:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 
9883024690 for ; Thu, 27 Feb 2020 21:37:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9883024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8915034A1B8; Thu, 27 Feb 2020 13:30:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5AFD221FB16 for ; Thu, 27 Feb 2020 13:20:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9E1438AA6; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9CC6546A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:55 -0500 Message-Id: <1582838290-17243-368-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 367/622] lnet: Do not allow gateways on remote nets X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn A gateway needs to be reachable over some local interface. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12411 Lustre-commit: 43b35351e9ca ("LU-12411 lnet: Do not allow gateways on remote nets") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35198 Reviewed-by: Amir Shehata Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/router.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 81f7a94..f7b53e0 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -436,6 +436,13 @@ static void lnet_shuffle_seed(void) if (lnet_islocalnet(net)) return -EEXIST; + if (!lnet_islocalnet(LNET_NIDNET(gateway))) { + CERROR("Cannot add route with gateway %s. There is no local interface configured on LNet %s\n", + libcfs_nid2str(gateway), + libcfs_net2str(LNET_NIDNET(gateway))); + return -EINVAL; + } + /* Assume net, route, all new */ route = kzalloc(sizeof(*route), GFP_NOFS); rnet = kzalloc(sizeof(*rnet), GFP_NOFS); From patchwork Thu Feb 27 21:13:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410393 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0DEBB138D for ; Thu, 27 Feb 2020 21:37:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EA4C324690 for ; Thu, 27 Feb 2020 21:37:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA4C324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6D0A734A1CB; Thu, 27 Feb 2020 13:30:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9C64A21FB4A for ; Thu, 27 Feb 2020 13:20:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A0B8F8AA7; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9F80146C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:56 -0500 Message-Id: <1582838290-17243-369-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 368/622] lustre: osc: reduce lock contention in osc_unreserve_grant X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang In osc_queue_async_io(), cl_loi_list_lock is acquired to reserve and consume the grant and is then released. Right after the extent is expanded, the same lock is taken again to unreserve the grant. Keeping the spinlock held until we are done with the grant avoids the second acquisition and improves throughput.
mpirun -np 32 /root/ior-openmpi/src/ior -w -t 1m -b 8g -F -e -vv -o /scratch0/file -i 1 master: Max Write: 13799.70 MiB/sec (14470.04 MB/sec) master with 33858: Max Write: 14339.57 MiB/sec (15036.13 MB/sec) WC-bug-id: https://jira.whamcloud.com/browse/LU-11775 Lustre-commit: 8a1ae45a3e4f ("LU-11775 osc: reduce lock contention in osc_unreserve_grant") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/33858 Reviewed-by: Patrick Farrell Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 8ffd8f9..3b4c598 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -636,6 +636,7 @@ void osc_extent_release(const struct lu_env *env, struct osc_extent *ext) */ osc_extent_state_set(ext, OES_TRUNC); ext->oe_trunc_pending = 0; + osc_object_unlock(obj); } else { int grant = 0; @@ -648,8 +649,6 @@ void osc_extent_release(const struct lu_env *env, struct osc_extent *ext) grant += cli->cl_grant_extent_tax; if (!osc_extent_merge(env, ext, next_extent(ext))) grant += cli->cl_grant_extent_tax; - if (grant > 0) - osc_unreserve_grant(cli, 0, grant); if (ext->oe_urgent) list_move_tail(&ext->oe_link, @@ -658,8 +657,10 @@ void osc_extent_release(const struct lu_env *env, struct osc_extent *ext) list_move_tail(&ext->oe_link, &obj->oo_full_exts); } + osc_object_unlock(obj); + if (grant > 0) + osc_unreserve_grant(cli, 0, grant); } - osc_object_unlock(obj); osc_io_unplug_async(env, cli, obj); } @@ -1483,13 +1484,20 @@ static void __osc_unreserve_grant(struct client_obd *cli, } } -static void osc_unreserve_grant(struct client_obd *cli, - unsigned int reserved, unsigned int unused) +static void osc_unreserve_grant_nolock(struct client_obd *cli, + unsigned int reserved, + unsigned int unused) { - spin_lock(&cli->cl_loi_list_lock); 
__osc_unreserve_grant(cli, reserved, unused); if (unused > 0) osc_wake_cache_waiters(cli); +} + +static void osc_unreserve_grant(struct client_obd *cli, + unsigned int reserved, unsigned int unused) +{ + spin_lock(&cli->cl_loi_list_lock); + osc_unreserve_grant_nolock(cli, reserved, unused); spin_unlock(&cli->cl_loi_list_lock); } @@ -2385,7 +2393,6 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, grants = 0; need_release = true; } - spin_unlock(&cli->cl_loi_list_lock); if (!need_release && ext->oe_end < index) { tmp = grants; /* try to expand this extent */ @@ -2396,10 +2403,11 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, } else { OSC_EXTENT_DUMP(D_CACHE, ext, "expanded for %lu.\n", index); - osc_unreserve_grant(cli, grants, tmp); + osc_unreserve_grant_nolock(cli, grants, tmp); grants = 0; } } + spin_unlock(&cli->cl_loi_list_lock); rc = 0; } else if (ext) { /* index is located outside of active extent */ From patchwork Thu Feb 27 21:13:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410397 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9C3EB92A for ; Thu, 27 Feb 2020 21:37:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 81CF824690 for ; Thu, 27 Feb 2020 21:37:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 81CF824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) 
by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 60BFE34A21B; Thu, 27 Feb 2020 13:30:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 000FD21FD65 for ; Thu, 27 Feb 2020 13:20:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A37328AA8; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A236B46D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:57 -0500 Message-Id: <1582838290-17243-370-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 369/622] lnet: Change static defines to use macro for module.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch replaces the statically defined mutex in net/lnet/lnet/module.c, which was initialized at run time with mutex_init(), with the kernel-provided DEFINE_MUTEX() macro.
WC-bug-id: https://jira.whamcloud.com/browse/LU-9010 Lustre-commit: bb967468875f ("LU-9010 lnet: Change static defines to use macro for module.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/33932 Reviewed-by: Andreas Dilger Reviewed-by: Ben Evans Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/module.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/lnet/lnet/module.c b/net/lnet/lnet/module.c index 95e1bae..5905f38 100644 --- a/net/lnet/lnet/module.c +++ b/net/lnet/lnet/module.c @@ -40,7 +40,7 @@ module_param(config_on_load, int, 0444); MODULE_PARM_DESC(config_on_load, "configure network at module load"); -static struct mutex lnet_config_mutex; +static DEFINE_MUTEX(lnet_config_mutex); static int lnet_configure(void *arg) @@ -235,8 +235,6 @@ static int __init lnet_init(void) { int rc; - mutex_init(&lnet_config_mutex); - rc = libcfs_setup(); if (rc) return rc; From patchwork Thu Feb 27 21:13:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410657 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D73A317E0 for ; Thu, 27 Feb 2020 21:43:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C04C8246A1 for ; Thu, 27 Feb 2020 21:43:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C04C8246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com 
(localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 98E98349614; Thu, 27 Feb 2020 13:35:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4295F21FD65 for ; Thu, 27 Feb 2020 13:20:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A69CB8AA9; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A4EB9468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:58 -0500 Message-Id: <1582838290-17243-371-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 370/622] lustre: llite, readahead: don't always use max RPC size X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong Since 64M RPC support landed, @PTLRPC_MAX_BRW_PAGES is 64M, and we always use this maximum possible RPC size to decide whether to avoid fast IO and trigger real context IO. This is not good for the following reasons: (1) Since the current default RPC size is still 4M, most systems won't use 64M most of the time. (2) The current default readahead size per file is still 64M, which makes fast IO always run out of readahead pages before the next IO.
This defeats the purpose of readahead, which is to grab pages in advance. To fix this problem, we use 16M as a balanced minimum when the RPC size is smaller than 16M. The patch also fixes the problem that @ras_rpc_size could not grow, which is possible in the following case: 1) set RPC to 16M 2) set RPC to 64M. In the current logic ras->ras_rpc_size will be kept at 16M, which is wrong. WC-bug-id: https://jira.whamcloud.com/browse/LU-12043 Lustre-commit: 7864a6854c3d ("LU-12043 llite,readahead: don't always use max RPC size") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35033 Reviewed-by: Li Xi Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 3 +++ fs/lustre/llite/rw.c | 6 ++++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index d36e01e..36b620e 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -307,6 +307,9 @@ static inline struct pcc_inode *ll_i2pcci(struct inode *inode) return ll_i2info(inode)->lli_pcc_inode; } +/* default to use at least 16M for fast read if possible */ +#define RA_REMAIN_WINDOW_MIN MiB_TO_PAGES(16UL) + /* default to about 64M of readahead on a given system. */ #define SBI_DEFAULT_READAHEAD_MAX MiB_TO_PAGES(64UL) diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index c42bbab..ad55695 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -376,7 +376,7 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) * update read ahead RPC size.
* NB: it's racy but doesn't matter */ - if (ras->ras_rpc_size > ra.cra_rpc_size && + if (ras->ras_rpc_size != ra.cra_rpc_size && ra.cra_rpc_size > 0) ras->ras_rpc_size = ra.cra_rpc_size; /* trim it to align with optimal RPC size */ @@ -1203,6 +1203,8 @@ int ll_readpage(struct file *file, struct page *vmpage) struct ll_readahead_state *ras = &fd->fd_ras; struct lu_env *local_env = NULL; struct inode *inode = file_inode(file); + unsigned long fast_read_pages = + max(RA_REMAIN_WINDOW_MIN, ras->ras_rpc_size); struct vvp_page *vpg; result = -ENODATA; @@ -1245,7 +1247,7 @@ int ll_readpage(struct file *file, struct page *vmpage) * a cl_io to issue the RPC. */ if (ras->ras_window_start + ras->ras_window_len < - ras->ras_next_readahead + PTLRPC_MAX_BRW_PAGES) { + ras->ras_next_readahead + fast_read_pages) { /* export the page and skip io stack */ vpg->vpg_ra_used = 1; cl_page_export(env, page, 1); From patchwork Thu Feb 27 21:13:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410401 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F2B2317E0 for ; Thu, 27 Feb 2020 21:37:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D9C28246A1 for ; Thu, 27 Feb 2020 21:37:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D9C28246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
71BBA34A247; Thu, 27 Feb 2020 13:30:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 97DBD21FD7C for ; Thu, 27 Feb 2020 13:20:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A90DC8AAA; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A7BD347C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:59 -0500 Message-Id: <1582838290-17243-372-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 371/622] lustre: llite: improve single-thread read performance X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong Here is the whole history: Currently, for sequential read IO, we grow the window size very quickly and soon cache @max_readahead_per_file pages. For the following command: dd if=/mnt/lustre/file of=/dev/null bs=1M we will do something like the following: ... 64M bytes cached. fast io for 16M bytes readahead extra 16M to fill up window. fast io for 16M bytes readahead extra 16M to fill up window. .... In this way, we can only use fast IO for 16M bytes and then fall through to non-fast IO mode.
This is also why increasing @max_readahead_per_file does not improve performance: the value only changes how much data we keep cached in memory. In my testing, whatever value I set, I could only get 2GB/s for a single-thread read. We can do better: once more than 16M bytes of readahead pages have been used, submit another readahead request in the background. Ideally, we could then always use fast IO.
Test                                                      Patched    Unpatched
dd if=file of=/dev/null bs=1M.                            4.0G/s     1.9G/s
ior -np 192 r -t 1m -b 4g -F -e -vv -o /cache1/ior -k     11195.97   10817.02 MB/sec
Tested with OSS and client memory dropped before every run. max_readahead_per_mb=128M, RPC size is 16M. The dd file size is 400G, which is roughly double the memory size. WC-bug-id: https://jira.whamcloud.com/browse/LU-12043 Lustre-commit: c2791674260b ("LU-12043 llite: improve single-thread read performance") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/34095 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 6 +- fs/lustre/llite/file.c | 3 +- fs/lustre/llite/llite_internal.h | 27 ++++ fs/lustre/llite/llite_lib.c | 17 ++- fs/lustre/llite/lproc_llite.c | 87 +++++++++++- fs/lustre/llite/rw.c | 277 +++++++++++++++++++++++++++++++++++---- fs/lustre/llite/vvp_io.c | 5 + fs/lustre/lov/lov_io.c | 1 + 8 files changed, 391 insertions(+), 32 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index d1c1413..5096025 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -1891,7 +1891,11 @@ struct cl_io { * mirror is inaccessible, non-delay RPC would error out quickly so * that the upper layer can try to access the next mirror. */ - ci_ndelay:1; + ci_ndelay:1, + /** + * Set if IO is triggered by async workqueue readahead. + */ + ci_async_readahead:1; /** * How many times the read has retried before this one.
* Set by the top level and consumed by the LOV. diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 9321b84..5d1cfa4 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1407,7 +1407,7 @@ static bool file_is_noatime(const struct file *file) return false; } -static void ll_io_init(struct cl_io *io, const struct file *file, int write) +void ll_io_init(struct cl_io *io, const struct file *file, int write) { struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct inode *inode = file_inode(file); @@ -1431,6 +1431,7 @@ static void ll_io_init(struct cl_io *io, const struct file *file, int write) } io->ci_noatime = file_is_noatime(file); + io->ci_async_readahead = false; /* FLR: only use non-delay I/O for read as there is only one * available mirror for write. diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 36b620e..8d95694 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -330,6 +330,8 @@ enum ra_stat { RA_STAT_MAX_IN_FLIGHT, RA_STAT_WRONG_GRAB_PAGE, RA_STAT_FAILED_REACH_END, + RA_STAT_ASYNC, + RA_STAT_FAILED_FAST_READ, _NR_RA_STAT, }; @@ -338,6 +340,16 @@ struct ll_ra_info { unsigned long ra_max_pages; unsigned long ra_max_pages_per_file; unsigned long ra_max_read_ahead_whole_pages; + struct workqueue_struct *ll_readahead_wq; + /* + * Max number of active works for readahead workqueue, + * default is 0 which make workqueue init number itself, + * unless there is a specific need for throttling the + * number of active work items, specifying '0' is recommended. 
+ */ + unsigned int ra_async_max_active; + /* Threshold to control when to trigger async readahead */ + unsigned long ra_async_pages_per_file_threshold; }; /* ra_io_arg will be filled in the beginning of ll_readahead with @@ -656,6 +668,20 @@ struct ll_readahead_state { * stride read-ahead will be enable */ unsigned long ras_consecutive_stride_requests; + /* index of the last page that async readahead starts */ + unsigned long ras_async_last_readpage; +}; + +struct ll_readahead_work { + /** File to readahead */ + struct file *lrw_file; + /** Start bytes */ + unsigned long lrw_start; + /** End bytes */ + unsigned long lrw_end; + + /* async worker to handler read */ + struct work_struct lrw_readahead_work; }; extern struct kmem_cache *ll_file_data_slab; @@ -757,6 +783,7 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid, struct ll_file_data *file, loff_t pos, size_t count, int rw); +void ll_io_init(struct cl_io *io, const struct file *file, int write); enum { LPROC_LL_DIRTY_HITS, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 5ac083c..33f7fdb 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -92,14 +92,25 @@ static struct ll_sb_info *ll_init_sbi(void) pages = si.totalram - si.totalhigh; lru_page_max = pages / 2; + sbi->ll_ra_info.ra_async_max_active = 0; + sbi->ll_ra_info.ll_readahead_wq = + alloc_workqueue("ll-readahead-wq", WQ_UNBOUND, + sbi->ll_ra_info.ra_async_max_active); + if (!sbi->ll_ra_info.ll_readahead_wq) { + rc = -ENOMEM; + goto out_pcc; + } + sbi->ll_cache = cl_cache_init(lru_page_max); if (!sbi->ll_cache) { rc = -ENOMEM; - goto out_pcc; + goto out_destroy_ra; } sbi->ll_ra_info.ra_max_pages_per_file = min(pages / 32, SBI_DEFAULT_READAHEAD_MAX); + sbi->ll_ra_info.ra_async_pages_per_file_threshold = + sbi->ll_ra_info.ra_max_pages_per_file; sbi->ll_ra_info.ra_max_pages = sbi->ll_ra_info.ra_max_pages_per_file; 
sbi->ll_ra_info.ra_max_read_ahead_whole_pages = -1; @@ -138,6 +149,8 @@ static struct ll_sb_info *ll_init_sbi(void) sbi->ll_heat_decay_weight = SBI_DEFAULT_HEAT_DECAY_WEIGHT; sbi->ll_heat_period_second = SBI_DEFAULT_HEAT_PERIOD_SECOND; return sbi; +out_destroy_ra: + destroy_workqueue(sbi->ll_ra_info.ll_readahead_wq); out_pcc: pcc_super_fini(&sbi->ll_pcc_super); out_sbi: @@ -151,6 +164,8 @@ static void ll_free_sbi(struct super_block *sb) if (!list_empty(&sbi->ll_squash.rsi_nosquash_nids)) cfs_free_nidlist(&sbi->ll_squash.rsi_nosquash_nids); + if (sbi->ll_ra_info.ll_readahead_wq) + destroy_workqueue(sbi->ll_ra_info.ll_readahead_wq); if (sbi->ll_cache) { cl_cache_decref(sbi->ll_cache); sbi->ll_cache = NULL; diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 8cb4983..02403e4 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1059,6 +1059,87 @@ static ssize_t tiny_write_store(struct kobject *kobj, } LUSTRE_RW_ATTR(tiny_write); +static ssize_t max_read_ahead_async_active_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + return snprintf(buf, PAGE_SIZE, "%u\n", + sbi->ll_ra_info.ra_async_max_active); +} + +static ssize_t max_read_ahead_async_active_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + unsigned int val; + int rc; + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + rc = kstrtouint(buffer, 10, &val); + if (rc) + return rc; + + if (val < 1 || val > WQ_UNBOUND_MAX_ACTIVE) { + CERROR("%s: cannot set max_read_ahead_async_active=%u %s than %u\n", + sbi->ll_fsname, val, + val < 1 ? "smaller" : "larger", + val < 1 ? 
1 : WQ_UNBOUND_MAX_ACTIVE); + return -ERANGE; + } + + sbi->ll_ra_info.ra_async_max_active = val; + workqueue_set_max_active(sbi->ll_ra_info.ll_readahead_wq, val); + + return count; +} +LUSTRE_RW_ATTR(max_read_ahead_async_active); + +static ssize_t read_ahead_async_file_threshold_mb_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + return snprintf(buf, PAGE_SIZE, "%lu\n", + PAGES_TO_MiB(sbi->ll_ra_info.ra_async_pages_per_file_threshold)); +} + +static ssize_t +read_ahead_async_file_threshold_mb_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, size_t count) +{ + unsigned long pages_number; + unsigned long max_ra_per_file; + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + int rc; + + rc = kstrtoul(buffer, 10, &pages_number); + if (rc) + return rc; + + pages_number = MiB_TO_PAGES(pages_number); + max_ra_per_file = sbi->ll_ra_info.ra_max_pages_per_file; + if (pages_number < 0 || pages_number > max_ra_per_file) { + CERROR("%s: can't set read_ahead_async_file_threshold_mb=%lu > max_read_readahead_per_file_mb=%lu\n", + sbi->ll_fsname, + PAGES_TO_MiB(pages_number), + PAGES_TO_MiB(max_ra_per_file)); + return -ERANGE; + } + sbi->ll_ra_info.ra_async_pages_per_file_threshold = pages_number; + + return count; +} +LUSTRE_RW_ATTR(read_ahead_async_file_threshold_mb); + static ssize_t fast_read_show(struct kobject *kobj, struct attribute *attr, char *buf) @@ -1407,6 +1488,8 @@ struct lprocfs_vars lprocfs_llite_obd_vars[] = { &lustre_attr_file_heat.attr, &lustre_attr_heat_decay_percentage.attr, &lustre_attr_heat_period_second.attr, + &lustre_attr_max_read_ahead_async_active.attr, + &lustre_attr_read_ahead_async_file_threshold_mb.attr, NULL, }; @@ -1505,7 +1588,9 @@ void ll_stats_ops_tally(struct ll_sb_info *sbi, int op, int count) [RA_STAT_EOF] = "read-ahead to EOF", [RA_STAT_MAX_IN_FLIGHT] = "hit max r-a issue", 
[RA_STAT_WRONG_GRAB_PAGE] = "wrong page from grab_cache_page", - [RA_STAT_FAILED_REACH_END] = "failed to reach end" + [RA_STAT_FAILED_REACH_END] = "failed to reach end", + [RA_STAT_ASYNC] = "async readahead", + [RA_STAT_FAILED_FAST_READ] = "failed to fast read", }; int ll_debugfs_register_super(struct super_block *sb, const char *name) diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index ad55695..bec26c4 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -45,6 +45,7 @@ #include #include +#include #include /* current_is_kswapd() */ #include @@ -129,16 +130,17 @@ void ll_ra_stats_inc(struct inode *inode, enum ra_stat which) } #define RAS_CDEBUG(ras) \ - CDEBUG(D_READA, \ + CDEBUG(D_READA, \ "lrp %lu cr %lu cp %lu ws %lu wl %lu nra %lu rpc %lu " \ - "r %lu ri %lu csr %lu sf %lu sp %lu sl %lu\n", \ - ras->ras_last_readpage, ras->ras_consecutive_requests, \ - ras->ras_consecutive_pages, ras->ras_window_start, \ - ras->ras_window_len, ras->ras_next_readahead, \ + "r %lu ri %lu csr %lu sf %lu sp %lu sl %lu lr %lu\n", \ + ras->ras_last_readpage, ras->ras_consecutive_requests, \ + ras->ras_consecutive_pages, ras->ras_window_start, \ + ras->ras_window_len, ras->ras_next_readahead, \ ras->ras_rpc_size, \ - ras->ras_requests, ras->ras_request_index, \ + ras->ras_requests, ras->ras_request_index, \ ras->ras_consecutive_stride_requests, ras->ras_stride_offset, \ - ras->ras_stride_pages, ras->ras_stride_length) + ras->ras_stride_pages, ras->ras_stride_length, \ + ras->ras_async_last_readpage) static int index_in_window(unsigned long index, unsigned long point, unsigned long before, unsigned long after) @@ -432,13 +434,177 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) return count; } +static void ll_readahead_work_free(struct ll_readahead_work *work) +{ + fput(work->lrw_file); + kfree(work); +} + +static void ll_readahead_handle_work(struct work_struct *wq); +static void ll_readahead_work_add(struct inode *inode, + struct 
ll_readahead_work *work) +{ + INIT_WORK(&work->lrw_readahead_work, ll_readahead_handle_work); + queue_work(ll_i2sbi(inode)->ll_ra_info.ll_readahead_wq, + &work->lrw_readahead_work); +} + +static int ll_readahead_file_kms(const struct lu_env *env, + struct cl_io *io, u64 *kms) +{ + struct cl_object *clob; + struct inode *inode; + struct cl_attr *attr = vvp_env_thread_attr(env); + int ret; + + clob = io->ci_obj; + inode = vvp_object_inode(clob); + + cl_object_attr_lock(clob); + ret = cl_object_attr_get(env, clob, attr); + cl_object_attr_unlock(clob); + + if (ret != 0) + return ret; + + *kms = attr->cat_kms; + return 0; +} + +static void ll_readahead_handle_work(struct work_struct *wq) +{ + struct ll_readahead_work *work; + struct lu_env *env; + u16 refcheck; + struct ra_io_arg *ria; + struct inode *inode; + struct ll_file_data *fd; + struct ll_readahead_state *ras; + struct cl_io *io; + struct cl_2queue *queue; + pgoff_t ra_end = 0; + unsigned long len, mlen = 0; + struct file *file; + u64 kms; + int rc; + unsigned long end_index; + + work = container_of(wq, struct ll_readahead_work, + lrw_readahead_work); + fd = LUSTRE_FPRIVATE(work->lrw_file); + ras = &fd->fd_ras; + file = work->lrw_file; + inode = file_inode(file); + + env = cl_env_alloc(&refcheck, LCT_NOREF); + if (IS_ERR(env)) { + rc = PTR_ERR(env); + goto out_free_work; + } + + io = vvp_env_thread_io(env); + ll_io_init(io, file, 0); + + rc = ll_readahead_file_kms(env, io, &kms); + if (rc != 0) + goto out_put_env; + + if (kms == 0) { + ll_ra_stats_inc(inode, RA_STAT_ZERO_LEN); + rc = 0; + goto out_put_env; + } + + ria = &ll_env_info(env)->lti_ria; + memset(ria, 0, sizeof(*ria)); + + ria->ria_start = work->lrw_start; + /* Truncate RA window to end of file */ + end_index = (unsigned long)((kms - 1) >> PAGE_SHIFT); + if (end_index <= work->lrw_end) { + work->lrw_end = end_index; + ria->ria_eof = true; + } + if (work->lrw_end <= work->lrw_start) { + rc = 0; + goto out_put_env; + } + + ria->ria_end = work->lrw_end; + 
len = ria->ria_end - ria->ria_start + 1; + ria->ria_reserved = ll_ra_count_get(ll_i2sbi(inode), ria, + ria_page_count(ria), mlen); + + CDEBUG(D_READA, + "async reserved pages: %lu/%lu/%lu, ra_cur %d, ra_max %lu\n", + ria->ria_reserved, len, mlen, + atomic_read(&ll_i2sbi(inode)->ll_ra_info.ra_cur_pages), + ll_i2sbi(inode)->ll_ra_info.ra_max_pages); + + if (ria->ria_reserved < len) { + ll_ra_stats_inc(inode, RA_STAT_MAX_IN_FLIGHT); + if (PAGES_TO_MiB(ria->ria_reserved) < 1) { + ll_ra_count_put(ll_i2sbi(inode), ria->ria_reserved); + rc = 0; + goto out_put_env; + } + } + + rc = cl_io_rw_init(env, io, CIT_READ, ria->ria_start, len); + if (rc) + goto out_put_env; + + vvp_env_io(env)->vui_fd = fd; + io->ci_state = CIS_LOCKED; + io->ci_async_readahead = true; + rc = cl_io_start(env, io); + if (rc) + goto out_io_fini; + + queue = &io->ci_queue; + cl_2queue_init(queue); + + rc = ll_read_ahead_pages(env, io, &queue->c2_qin, ras, ria, &ra_end); + if (ria->ria_reserved != 0) + ll_ra_count_put(ll_i2sbi(inode), ria->ria_reserved); + if (queue->c2_qin.pl_nr > 0) { + int count = queue->c2_qin.pl_nr; + + rc = cl_io_submit_rw(env, io, CRT_READ, queue); + if (rc == 0) + task_io_account_read(PAGE_SIZE * count); + } + if (ria->ria_end == ra_end && ra_end == (kms >> PAGE_SHIFT)) + ll_ra_stats_inc(inode, RA_STAT_EOF); + + if (ra_end != ria->ria_end) + ll_ra_stats_inc(inode, RA_STAT_FAILED_REACH_END); + + /* TODO: discard all pages until page reinit route is implemented */ + cl_page_list_discard(env, io, &queue->c2_qin); + + /* Unlock unsent read pages in case of error. 
*/ + cl_page_list_disown(env, io, &queue->c2_qin); + + cl_2queue_fini(env, queue); +out_io_fini: + cl_io_end(env, io); + cl_io_fini(env, io); +out_put_env: + cl_env_put(env, &refcheck); +out_free_work: + if (ra_end > 0) + ll_ra_stats_inc_sbi(ll_i2sbi(inode), RA_STAT_ASYNC); + ll_readahead_work_free(work); +} + static int ll_readahead(const struct lu_env *env, struct cl_io *io, struct cl_page_list *queue, - struct ll_readahead_state *ras, bool hit) + struct ll_readahead_state *ras, bool hit, + struct file *file) { struct vvp_io *vio = vvp_env_io(env); struct ll_thread_info *lti = ll_env_info(env); - struct cl_attr *attr = vvp_env_thread_attr(env); unsigned long len, mlen = 0; pgoff_t ra_end = 0, start = 0, end = 0; struct inode *inode; @@ -451,14 +617,10 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, inode = vvp_object_inode(clob); memset(ria, 0, sizeof(*ria)); - - cl_object_attr_lock(clob); - ret = cl_object_attr_get(env, clob, attr); - cl_object_attr_unlock(clob); - + ret = ll_readahead_file_kms(env, io, &kms); if (ret != 0) return ret; - kms = attr->cat_kms; + if (kms == 0) { ll_ra_stats_inc(inode, RA_STAT_ZERO_LEN); return 0; @@ -1141,7 +1303,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io, int rc2; rc2 = ll_readahead(env, io, &queue->c2_qin, ras, - uptodate); + uptodate, file); CDEBUG(D_READA, DFID "%d pages read ahead at %lu\n", PFID(ll_inode2fid(inode)), rc2, vvp_index(vpg)); } @@ -1183,6 +1345,60 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io, return rc; } +/* + * Possible return value: + * 0 no async readahead triggered and fast read could not be used. + * 1 no async readahead, but fast read could be used. + * 2 async readahead triggered and fast read could be used too. + * < 0 on error. 
+ */ +static int kickoff_async_readahead(struct file *file, unsigned long pages) +{ + struct ll_readahead_work *lrw; + struct inode *inode = file_inode(file); + struct ll_sb_info *sbi = ll_i2sbi(inode); + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + struct ll_readahead_state *ras = &fd->fd_ras; + struct ll_ra_info *ra = &sbi->ll_ra_info; + unsigned long throttle; + unsigned long start = ras_align(ras, ras->ras_next_readahead, NULL); + unsigned long end = start + pages - 1; + + throttle = min(ra->ra_async_pages_per_file_threshold, + ra->ra_max_pages_per_file); + /* + * If this is strided i/o or the window is smaller than the + * throttle limit, we do not do async readahead. Otherwise, + * we do async readahead, allowing the user thread to do fast i/o. + */ + if (stride_io_mode(ras) || !throttle || + ras->ras_window_len < throttle) + return 0; + + if ((atomic_read(&ra->ra_cur_pages) + pages) > ra->ra_max_pages) + return 0; + + if (ras->ras_async_last_readpage == start) + return 1; + + /* ll_readahead_work_free() free it */ + lrw = kzalloc(sizeof(*lrw), GFP_NOFS); + if (lrw) { + lrw->lrw_file = get_file(file); + lrw->lrw_start = start; + lrw->lrw_end = end; + spin_lock(&ras->ras_lock); + ras->ras_next_readahead = end + 1; + ras->ras_async_last_readpage = start; + spin_unlock(&ras->ras_lock); + ll_readahead_work_add(inode, lrw); + } else { + return -ENOMEM; + } + + return 2; +} + int ll_readpage(struct file *file, struct page *vmpage) { struct cl_object *clob = ll_i2info(file_inode(file))->lli_clob; @@ -1190,6 +1406,7 @@ int ll_readpage(struct file *file, struct page *vmpage) const struct lu_env *env = NULL; struct cl_io *io = NULL; struct cl_page *page; + struct ll_sb_info *sbi = ll_i2sbi(file_inode(file)); int result; lcc = ll_cl_find(file); @@ -1216,14 +1433,10 @@ int ll_readpage(struct file *file, struct page *vmpage) page = cl_vmpage_page(vmpage, clob); if (!page) { unlock_page(vmpage); + ll_ra_stats_inc_sbi(sbi, RA_STAT_FAILED_FAST_READ); return result; } - 
if (!env) { - local_env = cl_env_percpu_get(); - env = local_env; - } - vpg = cl2vvp_page(cl_object_page_slice(page->cp_obj, page)); if (vpg->vpg_defer_uptodate) { enum ras_update_flags flags = LL_RAS_HIT; @@ -1236,8 +1449,7 @@ int ll_readpage(struct file *file, struct page *vmpage) * if the page is hit in cache because non cache page * case will be handled by slow read later. */ - ras_update(ll_i2sbi(inode), inode, ras, vvp_index(vpg), - flags); + ras_update(sbi, inode, ras, vvp_index(vpg), flags); /* avoid duplicate ras_update() call */ vpg->vpg_ra_updated = 1; @@ -1247,14 +1459,23 @@ int ll_readpage(struct file *file, struct page *vmpage) * a cl_io to issue the RPC. */ if (ras->ras_window_start + ras->ras_window_len < - ras->ras_next_readahead + fast_read_pages) { - /* export the page and skip io stack */ - vpg->vpg_ra_used = 1; - cl_page_export(env, page, 1); + ras->ras_next_readahead + fast_read_pages || + kickoff_async_readahead(file, fast_read_pages) > 0) result = 0; - } } + if (!env) { + local_env = cl_env_percpu_get(); + env = local_env; + } + + /* export the page and skip io stack */ + if (result == 0) { + vpg->vpg_ra_used = 1; + cl_page_export(env, page, 1); + } else { + ll_ra_stats_inc_sbi(sbi, RA_STAT_FAILED_FAST_READ); + } /* release page refcount before unlocking the page to ensure * the object won't be destroyed in the calling path of * cl_page_put(). Please see comment in ll_releasepage(). 
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index ee44a18..68455d5 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -749,6 +749,11 @@ static int vvp_io_read_start(const struct lu_env *env, down_read(&lli->lli_trunc_sem); + if (io->ci_async_readahead) { + file_accessed(file); + return 0; + } + if (!can_populate_pages(env, io, inode)) return 0; diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 9cdfca1..9328240 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -136,6 +136,7 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio, sub_io->ci_type = io->ci_type; sub_io->ci_no_srvlock = io->ci_no_srvlock; sub_io->ci_noatime = io->ci_noatime; + sub_io->ci_async_readahead = io->ci_async_readahead; sub_io->ci_lock_no_expand = io->ci_lock_no_expand; sub_io->ci_ndelay = io->ci_ndelay; sub_io->ci_layout_version = io->ci_layout_version; From patchwork Thu Feb 27 21:14:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410659 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1BD60924 for ; Thu, 27 Feb 2020 21:43:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0411524690 for ; Thu, 27 Feb 2020 21:43:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0411524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1FE4521FF98; Thu, 27 Feb 2020 13:35:08 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EEADB21FC6A for ; Thu, 27 Feb 2020 13:20:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id ABAAC8AAB; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AA87A46A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:00 -0500 Message-Id: <1582838290-17243-373-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 372/622] lustre: obdclass: allow per-session jobids. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Lustre includes a jobid in all RPC messages sent to the server. This is used to collect per-job statistics, where a "job" can involve multiple processes on multiple nodes in a cluster. Nodes in a cluster can be running processes for multiple jobs, so it is best if different processes can have different jobids, and if processes on different nodes can have the same jobid. The current mechanism for supporting this is to use an environment variable which the kernel extracts from the relevant process's address space.
Some kernel developers see this to be an unacceptable design choice, and the code is not likely to be accepted upstream. This patch provides an alternate method, leveraging the concept of a "session id", as set with setsid(). Each login session already gets a unique sid which is preserved for all processes in that session unless explicitly changed (with setsid(1)). When a process in a session writes to /sys/fs/lustre/jobid_this_session the string becomes the name for that session. If jobid_var is set to "session", then the per-session jobid is used for the jobid for all requests from processes in that session. When a session ends, the jobid information will be purged within 5 minutes. WC-bug-id: https://jira.whamcloud.com/browse/LU-12330 Lustre-commit: a32ce8f50eca ("LU-12330 obdclass: allow per-session jobids.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/34995 Reviewed-by: Ben Evans Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 1 + fs/lustre/include/obd_class.h | 4 + fs/lustre/obdclass/jobid.c | 199 +++++++++++++++++++++++++++++++++++-- fs/lustre/obdclass/obd_sysfs.c | 48 +++++++++ 4 files changed, 246 insertions(+), 6 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 9f62d4e..6269bd3 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -360,6 +360,7 @@ enum { #define JOBSTATS_DISABLE "disable" #define JOBSTATS_PROCNAME_UID "procname_uid" #define JOBSTATS_NODELOCAL "nodelocal" +#define JOBSTATS_SESSION "session" /* obd_config.c */ void lustre_register_client_process_config(int (*cpc)(struct lustre_cfg *lcfg)); diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 58c743c..76e8201 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -57,6 +57,10 @@ struct obd_device *class_exp2obd(struct obd_export *exp); int 
class_handle_ioctl(unsigned int cmd, unsigned long arg); int lustre_get_jobid(char *jobid, size_t len); +void jobid_cache_fini(void); +int jobid_cache_init(void); +char *jobid_current(void); +int jobid_set_current(char *jobid); struct lu_device_type; diff --git a/fs/lustre/obdclass/jobid.c b/fs/lustre/obdclass/jobid.c index 8bad859..98b3f39 100644 --- a/fs/lustre/obdclass/jobid.c +++ b/fs/lustre/obdclass/jobid.c @@ -46,6 +46,151 @@ char obd_jobid_var[JOBSTATS_JOBID_VAR_MAX_LEN + 1] = JOBSTATS_DISABLE; char obd_jobid_name[LUSTRE_JOBID_SIZE] = "%e.%u"; +/* + * Jobid can be set for a session (see setsid(2)) by writing to + * a sysfs file from any process in that session. + * The jobids are stored in a hash table indexed by the relevant + * struct pid. We periodically look for entries where the pid has + * no PIDTYPE_SID tasks any more, and prune them. This happens within + * 5 seconds of a jobid being added, and every 5 minutes when jobids exist, + * but none are added. + */ +#define JOBID_EXPEDITED_CLEAN (5) +#define JOBID_BACKGROUND_CLEAN (5 * 60) + +struct session_jobid { + struct pid *sj_session; + struct rhash_head sj_linkage; + struct rcu_head sj_rcu; + char sj_jobid[1]; +}; + +static const struct rhashtable_params jobid_params = { + .key_len = sizeof(struct pid *), + .key_offset = offsetof(struct session_jobid, sj_session), + .head_offset = offsetof(struct session_jobid, sj_linkage), +}; + +static struct rhashtable session_jobids; + +/* + * jobid_current must be called with rcu_read_lock held. + * if it returns non-NULL, the string can only be used + * until rcu_read_unlock is called. + */ +char *jobid_current(void) +{ + struct pid *sid = task_session(current); + struct session_jobid *sj; + + sj = rhashtable_lookup_fast(&session_jobids, &sid, jobid_params); + if (sj) + return sj->sj_jobid; + return NULL; +} + +static void jobid_prune_expedite(void); +/* + * jobid_set_current will try to add a new entry + * to the table. 
If one exists with the same key, the + * jobid will be replaced + */ +int jobid_set_current(char *jobid) +{ + struct pid *sid; + struct session_jobid *sj, *origsj; + int ret; + int len = strlen(jobid); + + sj = kmalloc(sizeof(*sj) + len, GFP_KERNEL); + if (!sj) + return -ENOMEM; + rcu_read_lock(); + sid = task_session(current); + sj->sj_session = get_pid(sid); + strncpy(sj->sj_jobid, jobid, len+1); + origsj = rhashtable_lookup_get_insert_fast(&session_jobids, + &sj->sj_linkage, + jobid_params); + if (!origsj) { + /* successful insert */ + rcu_read_unlock(); + jobid_prune_expedite(); + return 0; + } + + if (IS_ERR(origsj)) { + put_pid(sj->sj_session); + kfree(sj); + rcu_read_unlock(); + return PTR_ERR(origsj); + } + ret = rhashtable_replace_fast(&session_jobids, + &origsj->sj_linkage, + &sj->sj_linkage, + jobid_params); + if (ret) { + put_pid(sj->sj_session); + kfree(sj); + rcu_read_unlock(); + return ret; + } + put_pid(origsj->sj_session); + rcu_read_unlock(); + kfree_rcu(origsj, sj_rcu); + jobid_prune_expedite(); + + return 0; +} + +static void jobid_free(void *vsj, void *arg) +{ + struct session_jobid *sj = vsj; + + put_pid(sj->sj_session); + kfree(sj); +} + +static void jobid_prune(struct work_struct *work); +static DECLARE_DELAYED_WORK(jobid_prune_work, jobid_prune); +static int jobid_prune_expedited; +static void jobid_prune(struct work_struct *work) +{ + int remaining = 0; + struct rhashtable_iter iter; + struct session_jobid *sj; + + jobid_prune_expedited = 0; + rhashtable_walk_enter(&session_jobids, &iter); + rhashtable_walk_start(&iter); + while ((sj = rhashtable_walk_next(&iter)) != NULL) { + if (!hlist_empty(&sj->sj_session->tasks[PIDTYPE_SID])) { + remaining++; + continue; + } + if (rhashtable_remove_fast(&session_jobids, + &sj->sj_linkage, + jobid_params) == 0) { + put_pid(sj->sj_session); + kfree_rcu(sj, sj_rcu); + } + } + rhashtable_walk_stop(&iter); + rhashtable_walk_exit(&iter); + if (remaining) + schedule_delayed_work(&jobid_prune_work, + 
JOBID_BACKGROUND_CLEAN * HZ); +} + +static void jobid_prune_expedite(void) +{ + if (!jobid_prune_expedited) { + jobid_prune_expedited = 1; + mod_delayed_work(system_wq, &jobid_prune_work, + JOBID_EXPEDITED_CLEAN * HZ); + } +} + /* Get jobid of current process from stored variable or calculate * it from pid and user_id. * @@ -134,14 +279,40 @@ static int jobid_interpret_string(const char *jobfmt, char *jobid, return joblen < 0 ? -EOVERFLOW : 0; } +/** + * Generate the job identifier string for this process for tracking purposes. + * + * Fill in @jobid string based on the value of obd_jobid_var: + * JOBSTATS_DISABLE: none + * JOBSTATS_NODELOCAL: content of obd_jobid_name (jobid_interpret_string()) + * JOBSTATS_PROCNAME_UID: process name/UID + * JOBSTATS_SESSION: per-session value set by + * /sys/fs/lustre/jobid_this_session + * + * Return -ve error number, 0 on success. + */ int lustre_get_jobid(char *jobid, size_t joblen) { char tmp_jobid[LUSTRE_JOBID_SIZE] = ""; + if (unlikely(joblen < 2)) { + if (joblen == 1) + jobid[0] = '\0'; + return -EINVAL; + } + /* Jobstats isn't enabled */ if (strcmp(obd_jobid_var, JOBSTATS_DISABLE) == 0) goto out_cache_jobid; + /* Whole node dedicated to single job */ + if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0) { + int rc2 = jobid_interpret_string(obd_jobid_name, + tmp_jobid, joblen); + if (!rc2) + goto out_cache_jobid; + } + /* Use process name + fsuid as jobid */ if (strcmp(obd_jobid_var, JOBSTATS_PROCNAME_UID) == 0) { snprintf(tmp_jobid, LUSTRE_JOBID_SIZE, "%s.%u", @@ -150,13 +321,17 @@ int lustre_get_jobid(char *jobid, size_t joblen) goto out_cache_jobid; } - /* Whole node dedicated to single job */ - if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0) { - int rc2 = jobid_interpret_string(obd_jobid_name, - tmp_jobid, joblen); - if (!rc2) - goto out_cache_jobid; + if (strcmp(obd_jobid_var, JOBSTATS_SESSION) == 0) { + char *jid; + + rcu_read_lock(); + jid = jobid_current(); + if (jid) + strlcpy(jobid, jid, joblen); + 
rcu_read_unlock(); + goto out_cache_jobid; } + return -ENOENT; out_cache_jobid: @@ -167,3 +342,15 @@ int lustre_get_jobid(char *jobid, size_t joblen) return 0; } EXPORT_SYMBOL(lustre_get_jobid); + +int jobid_cache_init(void) +{ + return rhashtable_init(&session_jobids, &jobid_params); +} + +void jobid_cache_fini(void) +{ + cancel_delayed_work_sync(&jobid_prune_work); + + rhashtable_free_and_destroy(&session_jobids, jobid_free, NULL); +} diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c index ca15936..8803d05 100644 --- a/fs/lustre/obdclass/obd_sysfs.c +++ b/fs/lustre/obdclass/obd_sysfs.c @@ -259,6 +259,44 @@ static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, return count; } +static ssize_t jobid_this_session_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + char *jid; + int ret = -ENOENT; + + rcu_read_lock(); + jid = jobid_current(); + if (jid) + ret = snprintf(buf, PAGE_SIZE, "%s\n", jid); + rcu_read_unlock(); + return ret; +} + +static ssize_t jobid_this_session_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + char *jobid; + int len; + int ret; + + if (!count || count > LUSTRE_JOBID_SIZE) + return -EINVAL; + + jobid = kstrndup(buffer, count, GFP_KERNEL); + if (!jobid) + return -ENOMEM; + len = strcspn(jobid, "\n "); + jobid[len] = '\0'; + ret = jobid_set_current(jobid); + kfree(jobid); + + return ret ?: count; +} + /* Root for /sys/kernel/debug/lustre */ struct dentry *debugfs_lustre_root; EXPORT_SYMBOL_GPL(debugfs_lustre_root); @@ -268,6 +306,7 @@ static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, LUSTRE_RO_ATTR(health_check); LUSTRE_RW_ATTR(jobid_var); LUSTRE_RW_ATTR(jobid_name); +LUSTRE_RW_ATTR(jobid_this_session); static struct attribute *lustre_attrs[] = { &lustre_attr_version.attr, @@ -275,6 +314,7 @@ static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr, &lustre_attr_health_check.attr, 
&lustre_attr_jobid_name.attr, &lustre_attr_jobid_var.attr, + &lustre_attr_jobid_this_session.attr, &lustre_sattr_timeout.u.attr, &lustre_attr_max_dirty_mb.attr, &lustre_sattr_debug_peer_on_timeout.u.attr, @@ -441,6 +481,12 @@ int class_procfs_init(void) goto out; } + rc = jobid_cache_init(); + if (rc) { + kset_unregister(lustre_kset); + goto out; + } + debugfs_lustre_root = debugfs_create_dir("lustre", NULL); debugfs_create_file("devices", 0444, debugfs_lustre_root, NULL, @@ -458,6 +504,8 @@ int class_procfs_clean(void) debugfs_lustre_root = NULL; + jobid_cache_fini(); + sysfs_remove_group(&lustre_kset->kobj, &lustre_attr_group); kset_unregister(lustre_kset); From patchwork Thu Feb 27 21:14:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410333 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C313138D for ; Thu, 27 Feb 2020 21:35:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E8DB924677 for ; Thu, 27 Feb 2020 21:35:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E8DB924677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C1D034A095; Thu, 27 Feb 2020 13:29:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5029721FC6A for ; Thu, 27 Feb 2020 13:20:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AE6D38AAC; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AD37C46C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:01 -0500 Message-Id: <1582838290-17243-374-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 373/622] lustre: llite: fix deadloop with tiny write X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong For a small write (<4K), we use the tiny write path, and __generic_file_write_iter() is called to handle it. On newer kernels (4.14 etc.), the function is exported and does something like the following: |->__generic_file_write_iter |->generic_perform_write() If the iov_iter_count() passed in is 0, generic_perform_write() will loop forever, because the number of bytes copied is always calculated as 0. The problem is that the VFS doesn't always skip zero-length I/O before it reaches the lower-layer read/write hooks, so we have to do it ourselves. To fix this problem, always return 0 early if no real I/O is needed.
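The hang described above can be illustrated with a small userspace sketch. This is a toy model, not the kernel code: toy_iter, toy_perform_write(), toy_write_iter(), CHUNK_SIZE and MAX_SPINS are all hypothetical names, and the retry-on-zero-progress behaviour is an assumption taken from the description in this message. The spin cap exists only so the sketch terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096u /* hypothetical stand-in for the page size */
#define MAX_SPINS  1000u /* safety cap so this demo terminates */

/* Hypothetical, minimal stand-in for an I/O iterator. */
struct toy_iter {
	const char *buf;   /* next byte to copy */
	size_t count;      /* bytes remaining */
};

/*
 * Toy copy loop that retries whenever a pass makes no progress.
 * Returns bytes written, or -1 if it would have spun forever.
 */
static long toy_perform_write(char *dst, struct toy_iter *it)
{
	long written = 0;
	unsigned int spins = 0;
	size_t bytes;

	assert(dst != NULL && it != NULL);

	do {
again:
		bytes = it->count < CHUNK_SIZE ? it->count : CHUNK_SIZE;
		if (bytes == 0) {
			/* Zero progress: an uncapped loop would retry forever. */
			if (++spins >= MAX_SPINS)
				return -1;
			goto again;
		}
		memcpy(dst + written, it->buf, bytes);
		it->buf += bytes;
		it->count -= bytes;
		written += (long)bytes;
	} while (it->count);

	return written;
}

/* Guarded entry point: bail out early when there is nothing to write. */
static long toy_write_iter(char *dst, struct toy_iter *it)
{
	if (it->count == 0)
		return 0;
	return toy_perform_write(dst, it);
}
```

Calling toy_perform_write() directly with a zero count hits the spin cap, while toy_write_iter() returns 0 without entering the loop, which is the shape of the early-return check this patch adds.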
WC-bug-id: https://jira.whamcloud.com/browse/LU-12382 Lustre-commit: e9a543b0d303 ("LU-12382 llite: fix deadloop with tiny write") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35058 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 5d1cfa4..1ed4b14 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1668,6 +1668,9 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ssize_t rc2; bool cached; + if (!iov_iter_count(to)) + return 0; + /** * Currently when PCC read failed, we do not fall back to the * normal read path, just return the error. @@ -1778,6 +1781,11 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) bool cached; int result; + if (!iov_iter_count(from)) { + rc_normal = 0; + goto out; + } + /** * When PCC write failed, we usually do not fall back to the normal * write path, just return the error. 
But there is a special case when From patchwork Thu Feb 27 21:14:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410405 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6889492A for ; Thu, 27 Feb 2020 21:37:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 50D7224690 for ; Thu, 27 Feb 2020 21:37:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 50D7224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A74B334A272; Thu, 27 Feb 2020 13:30:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 918D321FD5C for ; Thu, 27 Feb 2020 13:20:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B0E488AAD; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AFE7246D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:02 -0500 Message-Id: <1582838290-17243-375-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 374/622] lnet: prevent loop in LNetPrimaryNID() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata If discovery is disabled locally or at the remote end, then attempt discovery only once. Do not update the internal database when discovery is disabled and do not repeat discovery. This change prevents LNet from getting hung waiting for discovery to complete. WC-bug-id: https://jira.whamcloud.com/browse/LU-12424 Lustre-commit: 439520f762b0 ("LU-12424 lnet: prevent loop in LNetPrimaryNID()") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/35191 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 73 ++++++++++++++++++++++++++++++---------------------- 1 file changed, 42 insertions(+), 31 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 55ff01d..e5cce2f 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1137,6 +1137,34 @@ struct lnet_peer_ni * return primary_nid; } +bool +lnet_is_discovery_disabled_locked(struct lnet_peer *lp) +{ + if (lnet_peer_discovery_disabled) + return true; + + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL) || + (lp->lp_state & LNET_PEER_NO_DISCOVERY)) { + return true; + } + + return false; +} + +/* Peer Discovery + */ +bool +lnet_is_discovery_disabled(struct lnet_peer *lp) +{ + bool rc = false; + + spin_lock(&lp->lp_lock); + rc = lnet_is_discovery_disabled_locked(lp); + spin_unlock(&lp->lp_lock); + + return rc; +} + lnet_nid_t 
LNetPrimaryNID(lnet_nid_t nid) { @@ -1153,11 +1181,16 @@ struct lnet_peer_ni * goto out_unlock; } lp = lpni->lpni_peer_net->lpn_peer; + while (!lnet_peer_is_uptodate(lp)) { rc = lnet_discover_peer_locked(lpni, cpt, true); if (rc) goto out_decref; lp = lpni->lpni_peer_net->lpn_peer; + + /* Only try once if discovery is disabled */ + if (lnet_is_discovery_disabled(lp)) + break; } primary_nid = lp->lp_primary_nid; out_decref: @@ -1784,35 +1817,6 @@ struct lnet_peer_ni * } bool -lnet_is_discovery_disabled_locked(struct lnet_peer *lp) -{ - if (lnet_peer_discovery_disabled) - return true; - - if (!(lp->lp_state & LNET_PEER_MULTI_RAIL) || - (lp->lp_state & LNET_PEER_NO_DISCOVERY)) { - return true; - } - - return false; -} - -/* - * Peer Discovery - */ -bool -lnet_is_discovery_disabled(struct lnet_peer *lp) -{ - bool rc = false; - - spin_lock(&lp->lp_lock); - rc = lnet_is_discovery_disabled_locked(lp); - spin_unlock(&lp->lp_lock); - - return rc; -} - -bool lnet_peer_gw_discovery(struct lnet_peer *lp) { bool rc = false; @@ -2157,8 +2161,6 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) break; lnet_peer_queue_for_discovery(lp); - if (lnet_is_discovery_disabled(lp)) - break; /* * if caller requested a non-blocking operation then * return immediately. Once discovery is complete then the @@ -2176,6 +2178,15 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) lnet_peer_decref_locked(lp); /* Peer may have changed */ lp = lpni->lpni_peer_net->lpn_peer; + + /* Wait for discovery to complete, but don't repeat if + * discovery is disabled. 
This is done to ensure we can + * use discovery as a standard ping as well for backwards + * compatibility with routers which do not have discovery + * or have discovery disabled + */ + if (lnet_is_discovery_disabled(lp)) + break; } finish_wait(&lp->lp_dc_waitq, &wait); From patchwork Thu Feb 27 21:14:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410411 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0DC8A138D for ; Thu, 27 Feb 2020 21:37:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EA39424690 for ; Thu, 27 Feb 2020 21:37:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA39424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B7B13496F9; Thu, 27 Feb 2020 13:30:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E795F21FD5C for ; Thu, 27 Feb 2020 13:20:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B3A6A8AAE; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B29F7468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James 
Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:03 -0500 Message-Id: <1582838290-17243-376-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 375/622] lustre: ldlm: Fix style issues for ldlm_lib.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arshad Hussain , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Arshad Hussain This patch fixes issues reported by checkpatch for file fs/lustre/ldlm/ldlm_lib.c WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 939cdd034e7b ("LU-6142 ldlm: Fix style issues for ldlm_lib.c") Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/34495 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_lib.c | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 4a982ab..af74f97 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -48,7 +48,8 @@ #include #include "ldlm_internal.h" -/* @priority: If non-zero, move the selected connection to the list head. +/* + * @priority: If non-zero, move the selected connection to the list head. * @create: If zero, only search in existing connections. 
*/ static int import_set_conn(struct obd_import *imp, struct obd_uuid *uuid, @@ -223,7 +224,8 @@ int client_import_find_conn(struct obd_import *imp, lnet_nid_t peer, void client_destroy_import(struct obd_import *imp) { - /* Drop security policy instance after all RPCs have finished/aborted + /* + * Drop security policy instance after all RPCs have finished/aborted * to let all busy contexts be released. */ class_import_get(imp); @@ -233,7 +235,8 @@ void client_destroy_import(struct obd_import *imp) } EXPORT_SYMBOL(client_destroy_import); -/* Configure an RPC client OBD device. +/* + * Configure an RPC client OBD device. * * lcfg parameters: * 1 - client UUID @@ -255,7 +258,8 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) }; int rc; - /* In a more perfect world, we would hang a ptlrpc_client off of + /* + * In a more perfect world, we would hang a ptlrpc_client off of * obd_type and just use the values from there. */ if (!strcmp(name, LUSTRE_OSC_NAME)) { @@ -630,7 +634,8 @@ int client_disconnect_export(struct obd_export *exp) goto out_disconnect; } - /* Mark import deactivated now, so we don't try to reconnect if any + /* + * Mark import deactivated now, so we don't try to reconnect if any * of the cleanup RPCs fails (e.g. LDLM cancel, etc). We don't * fully deactivate the import, or that would drop all requests. */ @@ -638,7 +643,8 @@ int client_disconnect_export(struct obd_export *exp) imp->imp_deactive = 1; spin_unlock(&imp->imp_lock); - /* Some non-replayable imports (MDS's OSCs) are pinged, so just + /* + * Some non-replayable imports (MDS's OSCs) are pinged, so just * delete it regardless. (It's safe to delete an import that was * never added.) */ @@ -652,7 +658,8 @@ int client_disconnect_export(struct obd_export *exp) obd->obd_force); } - /* There's no need to hold sem while disconnecting an import, + /* + * There's no need to hold sem while disconnecting an import, * and it may actually cause deadlock in GSS. 
*/ up_write(&cli->cl_sem); @@ -662,7 +669,8 @@ int client_disconnect_export(struct obd_export *exp) ptlrpc_invalidate_import(imp); out_disconnect: - /* Use server style - class_disconnect should be always called for + /* + * Use server style - class_disconnect should be always called for * o_disconnect. */ err = class_disconnect(exp); @@ -680,9 +688,10 @@ int client_disconnect_export(struct obd_export *exp) */ int target_pack_pool_reply(struct ptlrpc_request *req) { - struct obd_device *obd; +struct obd_device *obd; - /* Check that we still have all structures alive as this may + /* + * Check that we still have all structures alive as this may * be some late RPC at shutdown time. */ if (unlikely(!req->rq_export || !req->rq_export->exp_obd || @@ -711,7 +720,8 @@ int target_pack_pool_reply(struct ptlrpc_request *req) DEBUG_REQ(D_ERROR, req, "dropping reply"); return -ECOMM; } - /* We can have a null rq_reqmsg in the event of bad signature or + /* + * We can have a null rq_reqmsg in the event of bad signature or * no context when unwrapping */ if (req->rq_reqmsg && @@ -792,7 +802,8 @@ void target_send_reply(struct ptlrpc_request *req, int rc, int fail_id) atomic_inc(&svcpt->scp_nreps_difficult); if (netrc != 0) { - /* error sending: reply is off the net. Also we need +1 + /* + * error sending: reply is off the net. 
Also we need +1 * reply ref until ptlrpc_handle_rs() is done * with the reply state (if the send was successful, there * would have been +1 ref for the net, which From patchwork Thu Feb 27 21:14:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410665 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5157517E0 for ; Thu, 27 Feb 2020 21:43:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3A6F0246A1 for ; Thu, 27 Feb 2020 21:43:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3A6F0246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 06FF63492EF; Thu, 27 Feb 2020 13:35:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4993021FC2C for ; Thu, 27 Feb 2020 13:20:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B68BC8AAF; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B55A647C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:04 -0500 Message-Id: 
<1582838290-17243-377-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 376/622] lustre: obdclass: protect imp_sec using rwlock_t X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang We've seen spinlock contention on imp_lock in sptlrpc_import_sec_ref(), introduce a new rwlock imp_sec_lock to protect imp_sec instead of using imp_lock. This patch also removes imp_sec_mutex from obd_import, which is not needed, to avoid confusion between imp_sec_lock/mutex. WC-bug-id: https://jira.whamcloud.com/browse/LU-11775 Lustre-commit: 8ed361345154 ("LU-11775 obdclass: protect imp_sec using rwlock_t") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/33861 Reviewed-by: Alexey Lyashkov Reviewed-by: Alexandr Boyko Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 2 +- fs/lustre/obdclass/genops.c | 2 +- fs/lustre/ptlrpc/sec.c | 15 ++++++--------- fs/lustre/ptlrpc/sec_config.c | 4 ++-- 4 files changed, 10 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index f16d621..ff171d1 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -206,7 +206,7 @@ struct obd_import { * @{ */ struct ptlrpc_sec *imp_sec; - struct mutex imp_sec_mutex; + rwlock_t imp_sec_lock; time64_t imp_sec_expire; /** @} */ diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index fd9dd96..2b1175f 100644 --- 
a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -997,7 +997,7 @@ struct obd_import *class_new_import(struct obd_device *obd) imp->imp_last_success_conn = 0; imp->imp_state = LUSTRE_IMP_NEW; imp->imp_obd = class_incref(obd, "import", imp); - mutex_init(&imp->imp_sec_mutex); + rwlock_init(&imp->imp_sec_lock); init_waitqueue_head(&imp->imp_recovery_waitq); INIT_WORK(&imp->imp_zombie_work, obd_zombie_imp_cull); diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c index 789b5cb..d82809f 100644 --- a/fs/lustre/ptlrpc/sec.c +++ b/fs/lustre/ptlrpc/sec.c @@ -303,13 +303,13 @@ static int import_sec_check_expire(struct obd_import *imp) { int adapt = 0; - spin_lock(&imp->imp_lock); + write_lock(&imp->imp_sec_lock); if (imp->imp_sec_expire && imp->imp_sec_expire < ktime_get_real_seconds()) { adapt = 1; imp->imp_sec_expire = 0; } - spin_unlock(&imp->imp_lock); + write_unlock(&imp->imp_sec_lock); if (!adapt) return 0; @@ -1317,9 +1317,9 @@ struct ptlrpc_sec *sptlrpc_import_sec_ref(struct obd_import *imp) { struct ptlrpc_sec *sec; - spin_lock(&imp->imp_lock); + read_lock(&imp->imp_sec_lock); sec = sptlrpc_sec_get(imp->imp_sec); - spin_unlock(&imp->imp_lock); + read_unlock(&imp->imp_sec_lock); return sec; } @@ -1332,10 +1332,10 @@ static void sptlrpc_import_sec_install(struct obd_import *imp, LASSERT_ATOMIC_POS(&sec->ps_refcount); - spin_lock(&imp->imp_lock); + write_lock(&imp->imp_sec_lock); old_sec = imp->imp_sec; imp->imp_sec = sec; - spin_unlock(&imp->imp_lock); + write_unlock(&imp->imp_sec_lock); if (old_sec) { sptlrpc_sec_kill(old_sec); @@ -1455,8 +1455,6 @@ int sptlrpc_import_sec_adapt(struct obd_import *imp, sptlrpc_flavor2name(&sf, str, sizeof(str))); } - mutex_lock(&imp->imp_sec_mutex); - newsec = sptlrpc_sec_create(imp, svc_ctx, &sf, sp); if (newsec) { sptlrpc_import_sec_install(imp, newsec); @@ -1467,7 +1465,6 @@ int sptlrpc_import_sec_adapt(struct obd_import *imp, rc = -EPERM; } - mutex_unlock(&imp->imp_sec_mutex); out: sptlrpc_sec_put(sec); 
return rc; diff --git a/fs/lustre/ptlrpc/sec_config.c b/fs/lustre/ptlrpc/sec_config.c index e4b1a075..9ced6c7 100644 --- a/fs/lustre/ptlrpc/sec_config.c +++ b/fs/lustre/ptlrpc/sec_config.c @@ -846,11 +846,11 @@ void sptlrpc_conf_client_adapt(struct obd_device *obd) imp = obd->u.cli.cl_import; if (imp) { - spin_lock(&imp->imp_lock); + write_lock(&imp->imp_sec_lock); if (imp->imp_sec) imp->imp_sec_expire = ktime_get_real_seconds() + SEC_ADAPT_DELAY; - spin_unlock(&imp->imp_lock); + write_unlock(&imp->imp_sec_lock); } up_read(&obd->u.cli.cl_sem); From patchwork Thu Feb 27 21:14:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410415 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0BA8117E0 for ; Thu, 27 Feb 2020 21:37:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E8BA124690 for ; Thu, 27 Feb 2020 21:37:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E8BA124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3D42734A2D5; Thu, 27 Feb 2020 13:31:02 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9FCA821FC2C for ; Thu, 27 Feb 2020 13:20:15 -0800 (PST) Received: 
from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B92AA8AB0; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B81B246A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:05 -0500 Message-Id: <1582838290-17243-378-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 377/622] lustre: llite: console message for disabled flock call X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Xi When the flock option is disabled on a Lustre client, any call to flock() or lockf() fails. For applications that don't print a proper error message, it is hard to tell that the root cause is the missing flock option on the Lustre file system. This patch therefore prints the following error message to the tty that calls flock()/lockf(): "Lustre: flock disabled, mount with '-o [local]flock' to enable" The message is printed at most once per file descriptor to avoid a message flood. To do so, this patch adds support for CDEBUG_LIMIT(D_TTY), which prints the message to the tty. When using this macro, note that the format string must end with "\r\n". Otherwise, a warning like "format at $FILE:$LINO:$FUNC doesn't end in '\r\n'" is printed to the system messages.
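The once-per-file-descriptor behaviour described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: struct demo_file_data, DEMO_FILE_FLOCK_WARNING and demo_warn_flock_once() are hypothetical stand-ins for struct ll_file_data, LL_FILE_FLOCK_WARNING and the check in ll_file_noflock(), and fprintf() stands in for CDEBUG_LIMIT(D_TTY | D_CONSOLE, ...), so the global rate limiting is not modeled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for LL_FILE_FLOCK_WARNING from the patch. */
#define DEMO_FILE_FLOCK_WARNING 0x00000020u

/* Hypothetical stand-in for struct ll_file_data; only the flags are modeled. */
struct demo_file_data {
	unsigned int fd_flags;
};

/*
 * Returns 1 if this call emitted the warning, 0 if it was suppressed
 * because the same open file had already been warned.
 */
static int demo_warn_flock_once(struct demo_file_data *fd, FILE *tty)
{
	assert(fd != NULL && tty != NULL);

	if (fd->fd_flags & DEMO_FILE_FLOCK_WARNING)
		return 0;

	/* Remember that this open file has been warned. */
	fd->fd_flags |= DEMO_FILE_FLOCK_WARNING;
	/* "\r\n" mirrors the line ending the commit message asks for. */
	fprintf(tty, "flock disabled, mount with '-o [local]flock' to enable\r\n");
	return 1;
}
```

Calling demo_warn_flock_once() twice on the same demo_file_data prints the message only the first time. A second open file starts with a clear flag and gets its own single warning.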
Note that LL_FILE_RMTACL should have been removed by Commit 341f1f0affed ("staging: lustre: remove remote client support") WC-bug-id: https://jira.whamcloud.com/browse/LU-12349 Lustre-commit: f6497eb3503b ("LU-12349 llite: console message for disabled flock call") Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/34986 Reviewed-by: Andreas Dilger Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 12 ++++++++++++ include/uapi/linux/lnet/libcfs_debug.h | 4 ++-- include/uapi/linux/lustre/lustre_user.h | 2 +- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 1ed4b14..76a5074 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4275,6 +4275,18 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, static int ll_file_noflock(struct file *file, int cmd, struct file_lock *file_lock) { + struct ll_file_data *fd = LUSTRE_FPRIVATE(file); + + /* + * In order to avoid flood of warning messages, only print one message + * for one file. And the entire message rate on the client is limited + * by CDEBUG_LIMIT too. 
+ */ + if (!(fd->fd_flags & LL_FILE_FLOCK_WARNING)) { + fd->fd_flags |= LL_FILE_FLOCK_WARNING; + CDEBUG_LIMIT(D_TTY | D_CONSOLE, + "flock disabled, mount with '-o [local]flock' to enable\r\n"); + } return -EINVAL; } diff --git a/include/uapi/linux/lnet/libcfs_debug.h b/include/uapi/linux/lnet/libcfs_debug.h index 1a68667..6255331 100644 --- a/include/uapi/linux/lnet/libcfs_debug.h +++ b/include/uapi/linux/lnet/libcfs_debug.h @@ -106,7 +106,7 @@ struct ptldebug_header { #define D_TRACE 0x00000001 /* ENTRY/EXIT markers */ #define D_INODE 0x00000002 #define D_SUPER 0x00000004 -#define D_EXT2 0x00000008 /* anything from ext2_debug */ +#define D_TTY 0x00000008 /* notification printed to TTY */ #define D_MALLOC 0x00000010 /* print malloc, free information */ #define D_CACHE 0x00000020 /* cache-related items */ #define D_INFO 0x00000040 /* general information */ @@ -137,7 +137,7 @@ struct ptldebug_header { #define D_LAYOUT 0x80000000 #define LIBCFS_DEBUG_MASKS_NAMES { \ - "trace", "inode", "super", "ext2", "malloc", "cache", "info", \ + "trace", "inode", "super", "tty", "malloc", "cache", "info", \ "ioctl", "neterror", "net", "warning", "buffs", "other", \ "dentry", "nettrace", "page", "dlmtrace", "error", "emerg", \ "ha", "rpctrace", "vfstrace", "reada", "mmap", "config", \ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 317b236..d43170f 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -385,7 +385,7 @@ struct ll_ioc_lease_id { #define LL_FILE_READAHEA 0x00000004 #define LL_FILE_LOCKED_DIRECTIO 0x00000008 /* client-side locks with dio */ #define LL_FILE_LOCKLESS_IO 0x00000010 /* server-side locks with cio */ -#define LL_FILE_RMTACL 0x00000020 +#define LL_FILE_FLOCK_WARNING 0x00000020 /* warned about disabled flock */ #define LOV_USER_MAGIC_V1 0x0BD10BD0 #define LOV_USER_MAGIC LOV_USER_MAGIC_V1 From patchwork Thu Feb 27 21:14:06 2020 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410735 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F1D3C1580 for ; Thu, 27 Feb 2020 21:45:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DA7B3246A2 for ; Thu, 27 Feb 2020 21:45:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DA7B3246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B3C3C34B04C; Thu, 27 Feb 2020 13:36:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0372321FC2C for ; Thu, 27 Feb 2020 13:20:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BC1888AB1; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BAE1146C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:06 -0500 Message-Id: <1582838290-17243-379-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
Subject: [lustre-devel] [PATCH 378/622] lustre: ptlrpc: Add increasing XIDs CONNECT2 flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh This patch reserves the OBD_CONNECT2 flag for increasing XIDs. Cray-bug-id: LUS-6272 WC-bug-id: https://jira.whamcloud.com/browse/LU-11444 Lustre-commit: b4375f5fc66c ("LU-11444 ptlrpc: Add increasing XIDs CONNECT2 flag") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/35113 Reviewed-by: Andreas Dilger Reviewed-by: Alexandr Boyko Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 2 +- fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index c244adb..ca169ec 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -120,7 +120,7 @@ "wbc", /* 0x40 */ "lock_convert", /* 0x80 */ "archive_id_array", /* 0x100 */ - "unknown", /* 0x200 */ + "increasing_xid", /* 0x200 */ "selinux_policy", /* 0x400 */ "lsom", /* 0x800 */ "pcc", /* 0x1000 */ diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 64ccc6e..e801f2c 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1148,6 +1148,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_LOCK_CONVERT); LASSERTF(OBD_CONNECT2_ARCHIVE_ID_ARRAY == 0x100ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_ARCHIVE_ID_ARRAY); + LASSERTF(OBD_CONNECT2_INC_XID == 0x200ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_INC_XID); LASSERTF(OBD_CONNECT2_SELINUX_POLICY == 0x400ULL, "found 0x%.16llxULL\n", 
OBD_CONNECT2_SELINUX_POLICY); LASSERTF(OBD_CONNECT2_LSOM == 0x800ULL, "found 0x%.16llxULL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 2e54dd1..c86b188 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -806,6 +806,7 @@ struct ptlrpc_body_v2 { */ #define OBD_CONNECT2_LOCK_CONVERT 0x80ULL /* IBITS lock convert support */ #define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* store HSM archive_id in array */ +#define OBD_CONNECT2_INC_XID 0x200ULL /* Increasing xid */ #define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */ #define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */ #define OBD_CONNECT2_PCC 0x1000ULL /* Persistent Client Cache */ From patchwork Thu Feb 27 21:14:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410669 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A03517E0 for ; Thu, 27 Feb 2020 21:43:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 12C6F246A1 for ; Thu, 27 Feb 2020 21:43:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 12C6F246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D25B23493B4; Thu, 27 Feb 2020 13:35:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 46EC121FD69 for ; Thu, 27 Feb 2020 13:20:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BE9B48AB2; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BD8C246D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:07 -0500 Message-Id: <1582838290-17243-380-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 379/622] lustre: ptlrpc: don't reset lru_resize on idle reconnect X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh ptlrpc_disconnect_idle_interpret() clears imp_remote_handle, so reconnect has pcaa_initial_connect set to 1. Update only changed ns_connect_flags bits. 
Fixes: 4b102da53ad ("lustre: ptlrpc: idle connections can disconnect") Cray-bug-id: LUS-7471 WC-bug-id: https://jira.whamcloud.com/browse/LU-11518 Lustre-commit: acacc9d9b1d0 ("LU-11518 ptlrpc: don't reset lru_resize on idle reconnect") Signed-off-by: Andriy Skulysh Reviewed-by: Alexander Boyko Reviewed-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/35285 Reviewed-by: Alexandr Boyko Reviewed-by: Andreas Dilger Reviewed-by: Gu Zheng Signed-off-by: James Simmons --- fs/lustre/ptlrpc/import.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 6f13ec1..f8e15f2 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -858,13 +858,17 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, * disable lru_resize, etc. */ if (old_connect_flags != exp_connect_flags(exp) || init_connect) { + struct ldlm_namespace *ns = imp->imp_obd->obd_namespace; + u64 changed_flags; + + changed_flags = + ns->ns_connect_flags ^ ns->ns_orig_connect_flags; CDEBUG(D_HA, "%s: Resetting ns_connect_flags to server flags: %#llx\n", imp->imp_obd->obd_name, ocd->ocd_connect_flags); - imp->imp_obd->obd_namespace->ns_connect_flags = - ocd->ocd_connect_flags; - imp->imp_obd->obd_namespace->ns_orig_connect_flags = - ocd->ocd_connect_flags; + ns->ns_connect_flags = (ns->ns_connect_flags & changed_flags) | + (ocd->ocd_connect_flags & ~changed_flags); + ns->ns_orig_connect_flags = ocd->ocd_connect_flags; } if (ocd->ocd_connect_flags & OBD_CONNECT_AT) From patchwork Thu Feb 27 21:14:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410337 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 671D192A for ; Thu, 27 Feb 2020 21:35:22 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4FD5D24677 for ; Thu, 27 Feb 2020 21:35:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4FD5D24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2EFD23494AE; Thu, 27 Feb 2020 13:29:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8DBFD21FD70 for ; Thu, 27 Feb 2020 13:20:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C1E6A8AB3; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C043D468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:08 -0500 Message-Id: <1582838290-17243-381-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 380/622] lnet: use after free in lnet_discover_peer_locked() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Olaf Weber When the lnet_net_lock is unlocked, the peer attached to an lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer) can change, and the old peer deallocated. If we are really unlucky, then all the churn could give us a new, different, peer at the same address in memory. Change the reference counting on the lnet_peer lp so that it is guaranteed to be alive when we relock the lnet_net_lock for the cpt. When the reference count is dropped lp may go away if it was unlinked, but the new peer is guaranteed to have a different address, so we can still correctly determine whether the peer changed and discovery should be redone. WC-bug-id: https://jira.whamcloud.com/browse/LU-9971 Lustre-commit: 2b5b551b15d9 ("LU-9971 lnet: use after free in lnet_discover_peer_locked()") Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/28944 Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index e5cce2f..d167a37 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2150,6 +2150,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) * zombie if we race with DLC, so we must check for that. */ for (;;) { + /* Keep lp alive when the lnet_net_lock is unlocked */ + lnet_peer_addref_locked(lp); prepare_to_wait(&lp->lp_dc_waitq, &wait, TASK_INTERRUPTIBLE); if (signal_pending(current)) break; @@ -2161,16 +2163,14 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) break; lnet_peer_queue_for_discovery(lp); - /* - * if caller requested a non-blocking operation then - * return immediately. 
Once discovery is complete then the - * peer ref will be decremented and any pending messages - * that were stopped due to discovery will be transmitted. + /* If caller requested a non-blocking operation then + * return immediately. Once discovery is complete any + * pending messages that were stopped due to discovery + * will be transmitted. */ if (!block) break; - lnet_peer_addref_locked(lp); lnet_net_unlock(LNET_LOCK_EX); schedule(); finish_wait(&lp->lp_dc_waitq, &wait); @@ -2192,10 +2192,12 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) lnet_net_unlock(LNET_LOCK_EX); lnet_net_lock(cpt); - - /* If the peer has changed after we've discovered the older peer, - * then we need to discovery the new peer to make sure the - * interface information is up to date + lnet_peer_decref_locked(lp); + /* The peer may have changed, so re-check and rediscover if that turns + * out to have been the case. The reference count on lp ensured that + * even if it was unlinked from lpni the memory could not be recycled. + * Thus the check below is sufficient to determine whether the peer + * changed. If the peer changed, then lp must not be dereferenced. 
*/ if (lp != lpni->lpni_peer_net->lpn_peer) goto again; From patchwork Thu Feb 27 21:14:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410739 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F87E924 for ; Thu, 27 Feb 2020 21:45:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 88503246A1 for ; Thu, 27 Feb 2020 21:45:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 88503246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 209E334B075; Thu, 27 Feb 2020 13:36:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E63E521FDAA for ; Thu, 27 Feb 2020 13:20:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C414E8AB4; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C300247C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:09 -0500 Message-Id: <1582838290-17243-382-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 381/622] lustre: obdclass: generate random u64 max correctly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Generate random u64 max number correctly, and make it an obdclass function lu_prandom_u64_max(). Fixes: bcfa98a507 ("staging: lustre: replace cfs_rand() with prandom_u32_max()") WC-bug-id: https://jira.whamcloud.com/browse/LU-12495 Lustre-commit: 645b72c5c058 ("LU-12495 obdclass: generate random u64 max correctly") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35394 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 1 + fs/lustre/lmv/lmv_qos.c | 26 +------------------------- fs/lustre/obdclass/lu_qos.c | 36 ++++++++++++++++++++++++++++++++++++ 3 files changed, 38 insertions(+), 25 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index 0f3e3be..6b1064a 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1390,6 +1390,7 @@ struct lu_qos { int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); +u64 lu_prandom_u64_max(u64 ep_ro); /** @} lu */ #endif /* __LUSTRE_LU_OBJECT_H */ diff --git a/fs/lustre/lmv/lmv_qos.c b/fs/lustre/lmv/lmv_qos.c index e323398..85053d2e 100644 --- a/fs/lustre/lmv/lmv_qos.c +++ b/fs/lustre/lmv/lmv_qos.c @@ -370,31 +370,7 @@ struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) total_weight += 
tgt->ltd_qos.ltq_weight; } - if (total_weight) { -#if BITS_PER_LONG == 32 - /* - * If total_weight > 32-bit, first generate the high - * 32 bits of the random number, then add in the low - * 32 bits (truncated to the upper limit, if needed) - */ - if (total_weight > 0xffffffffULL) - rand = (u64)(prandom_u32_max( - (unsigned int)(total_weight >> 32)) << 32; - else - rand = 0; - - if (rand == (total_weight & 0xffffffff00000000ULL)) - rand |= prandom_u32_max((unsigned int)total_weight); - else - rand |= prandom_u32(); - -#else - rand = ((u64)prandom_u32() << 32 | prandom_u32()) % - total_weight; -#endif - } else { - rand = 0; - } + rand = lu_prandom_u64_max(total_weight); for (i = 0; i < lmv->desc.ld_tgt_count; i++) { tgt = lmv->tgts[i]; diff --git a/fs/lustre/obdclass/lu_qos.c b/fs/lustre/obdclass/lu_qos.c index 4ee3f59..9fdcbc2 100644 --- a/fs/lustre/obdclass/lu_qos.c +++ b/fs/lustre/obdclass/lu_qos.c @@ -35,6 +35,7 @@ #include #include +#include #include #include #include @@ -164,3 +165,38 @@ int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd) return rc; } EXPORT_SYMBOL(lqos_del_tgt); + +/** + * lu_prandom_u64_max - returns a pseudo-random u64 number in interval + * [0, ep_ro) + * + * #ep_ro right open interval endpoint + * + * Return: a pseudo-random 64-bit number that is in interval [0, ep_ro). 
+ */ +u64 lu_prandom_u64_max(u64 ep_ro) +{ + u64 rand = 0; + + if (ep_ro) { +#if BITS_PER_LONG == 32 + /* + * If ep_ro > 32-bit, first generate the high + * 32 bits of the random number, then add in the low + * 32 bits (truncated to the upper limit, if needed) + */ + if (ep_ro > 0xffffffffULL) + rand = (u64)prandom_u32_max((u32)(ep_ro >> 32)) << 32; + + if (rand == (ep_ro & 0xffffffff00000000ULL)) + rand |= prandom_u32_max((u32)ep_ro); + else + rand |= prandom_u32(); +#else + rand = ((u64)prandom_u32() << 32 | prandom_u32()) % ep_ro; +#endif + } + + return rand; +} +EXPORT_SYMBOL(lu_prandom_u64_max); From patchwork Thu Feb 27 21:14:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410673 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7EA0B17E0 for ; Thu, 27 Feb 2020 21:44:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 67903246A1 for ; Thu, 27 Feb 2020 21:44:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 67903246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A544D34987C; Thu, 27 Feb 2020 13:35:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4A59F21FBA2
for ; Thu, 27 Feb 2020 13:20:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C6CF38AB5; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C5C8246A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:10 -0500 Message-Id: <1582838290-17243-383-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 382/622] lnet: fix peer ref counting X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Exit from the loop after peer ref count has been incremented to avoid wrong ref count. The code makes sure that a peer is queued for discovery at most once if discovery is disabled. This is done to use discovery as a standard ping for gateways which do not have discovery feature or discovery is disabled. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-9971 Lustre-commit: dbcddb4824f0 ("LU-9971 lnet: fix peer ref counting") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/35446 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index d167a37..e33dc0e 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2138,6 +2138,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) DEFINE_WAIT(wait); struct lnet_peer *lp; int rc = 0; + int count = 0; again: lnet_net_unlock(cpt); @@ -2157,11 +2158,20 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) break; if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) break; + /* Don't repeat discovery if discovery is disabled. This is + * done to ensure we can use discovery as a standard ping as + * well for backwards compatibility with routers which do not + * have discovery or have discovery disabled + */ + if (lnet_is_discovery_disabled(lp) && count > 0) + break; if (lp->lp_dc_error) break; if (lnet_peer_is_uptodate(lp)) break; lnet_peer_queue_for_discovery(lp); + count++; + CDEBUG(D_NET, "Discovery attempt # %d\n", count); /* If caller requested a non-blocking operation then * return immediately. Once discovery is complete any @@ -2178,15 +2188,6 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) lnet_peer_decref_locked(lp); /* Peer may have changed */ lp = lpni->lpni_peer_net->lpn_peer; - - /* Wait for discovery to complete, but don't repeat if - * discovery is disabled. 
This is done to ensure we can - * use discovery as a standard ping as well for backwards - * compatibility with routers which do not have discovery - * or have discovery disabled - */ - if (lnet_is_discovery_disabled(lp)) - break; } finish_wait(&lp->lp_dc_waitq, &wait); From patchwork Thu Feb 27 21:14:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410419 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 746A417E0 for ; Thu, 27 Feb 2020 21:37:55 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5D5A524690 for ; Thu, 27 Feb 2020 21:37:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5D5A524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 34678348FCB; Thu, 27 Feb 2020 13:31:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8BB7121FBA2 for ; Thu, 27 Feb 2020 13:20:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C98A88AB6; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C87EB46C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James 
Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:11 -0500 Message-Id: <1582838290-17243-384-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 383/622] lustre: llite: collect debug info for ll_fsync X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Improve ll_fsync() debug message to capture all the arguments of the current fsync. WC-bug-id: https://jira.whamcloud.com/browse/LU-12462 Lustre-commit: 4cb6ce1863d0 ("LU-12462 llite: Remove old fsync versions") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35339 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 76a5074..a20896c 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3907,8 +3907,10 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) struct ptlrpc_request *req; int rc, err; - CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p)\n", - PFID(ll_inode2fid(inode)), inode); + CDEBUG(D_VFSTRACE, + "VFS Op:inode=" DFID "(%p), start %lld, end %lld, datasync %d\n", + PFID(ll_inode2fid(inode)), inode, start, end, datasync); + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1); From patchwork Thu Feb 27 21:14:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410341 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F223138D for ; Thu, 27 Feb 2020 21:35:28 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1802F24677 for ; Thu, 27 Feb 2020 21:35:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1802F24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0253134A0ED; Thu, 27 Feb 2020 13:29:47 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CC6AB21FBA2 for ; Thu, 27 Feb 2020 13:20:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CD3FC8AB7; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CB35746D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:12 -0500 Message-Id: <1582838290-17243-385-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 
384/622] lustre: obdclass: use RCU to release lu_env_item X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev as rhashtable_lookup_fast() is lockless and can find just released objects. Fixes: c678ad5a25 ("lustre: obdclass: put all service's env on the list") WC-bug-id: https://jira.whamcloud.com/browse/LU-12491 Lustre-commit: 87306c22e4b9 ("LU-12491 obdclass: use RCU to release lu_env_item") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/35038 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lu_object.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index bafd817..c94911d 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -1870,6 +1870,7 @@ struct lu_env_item { struct task_struct *lei_task; /* rhashtable key */ struct rhash_head lei_linkage; struct lu_env *lei_env; + struct rcu_head lei_rcu_head; }; static const struct rhashtable_params lu_env_rhash_params = { @@ -1909,6 +1910,14 @@ int lu_env_add(struct lu_env *env) } EXPORT_SYMBOL(lu_env_add); +static void lu_env_item_free(struct rcu_head *head) +{ + struct lu_env_item *lei; + + lei = container_of(head, struct lu_env_item, lei_rcu_head); + kfree(lei); +} + void lu_env_remove(struct lu_env *env) { struct lu_env_item *lei; @@ -1923,13 +1932,11 @@ void lu_env_remove(struct lu_env *env) } } - rcu_read_lock(); lei = rhashtable_lookup_fast(&lu_env_rhash, &task, lu_env_rhash_params); if (lei && rhashtable_remove_fast(&lu_env_rhash, &lei->lei_linkage, 
lu_env_rhash_params) == 0) - kfree(lei); - rcu_read_unlock(); + call_rcu(&lei->lei_rcu_head, lu_env_item_free); } EXPORT_SYMBOL(lu_env_remove); From patchwork Thu Feb 27 21:14:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410423 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E29DB138D for ; Thu, 27 Feb 2020 21:38:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB35224690 for ; Thu, 27 Feb 2020 21:38:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB35224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 986EE34A32D; Thu, 27 Feb 2020 13:31:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1CB2221FBA2 for ; Thu, 27 Feb 2020 13:20:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CF6188AB8; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CE071468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:13 -0500 Message-Id: 
<1582838290-17243-386-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 385/622] lustre: mdt: improve IBITS lock definitions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Move MDS_INODELOCK_* flags into a named enum, and add the definitions for the newer flags into wirecheck/wiretest to ensure consistency. Rename MDS_INODELOCK_MAXSHIFT to MDS_INODELOCK_NUMBITS to hold current number of lockbits, rather than one less than the number of lockbits, since the only two places that use it expect it to be one larger than it is. Fix uses of MDS_INODELOCK_NUMBITS to be number of locks. This does not change the value of MDS_INODELOCK_FULL, which is used in the protocol to exchange supported lock bits between client and server. 
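For illustration only (not part of the patch): a minimal sketch of why the rename leaves the FULL mask unchanged. The values 6 and 7 are copied from the diff below; the EX_ prefix and the helper names full_mask_old(), full_mask_new() and count_lock_bits() are hypothetical. The loop is only assumed to have the same shape as the one in ll_have_md_lock().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the values in the patch, not the uapi macros. */
#define EX_INODELOCK_MAXSHIFT 6 /* old: index of the highest lock bit */
#define EX_INODELOCK_NUMBITS  7 /* new: number of lock bits */

/* Old definition: callers had to add one to the highest shift. */
uint64_t full_mask_old(void)
{
	return (1ULL << (EX_INODELOCK_MAXSHIFT + 1)) - 1;
}

/* New definition: the bit count is used directly. */
uint64_t full_mask_new(void)
{
	return (1ULL << EX_INODELOCK_NUMBITS) - 1;
}

/* Walk the bits with "i < NUMBITS", which visits the same positions
 * as the old "i <= MAXSHIFT" loop, and count how many are set.
 */
int count_lock_bits(uint64_t bits)
{
	int i, n = 0;

	for (i = 0; i < EX_INODELOCK_NUMBITS && bits != 0; i++) {
		uint64_t bit = bits & (1ULL << i);

		if (bit == 0)
			continue;
		bits &= ~bit;
		n++;
	}
	return n;
}
```

Both helpers evaluate to 0x7f, which matches the statement that the value exchanged between client and server does not change.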
WC-bug-id: https://jira.whamcloud.com/browse/LU-11285 Lustre-commit: 3611352b699c ("LU-11285 mdt: improve IBITS lock definitions") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/35045 Reviewed-by: Patrick Farrell Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 2 +- fs/lustre/ptlrpc/wiretest.c | 6 ++++ include/uapi/linux/lustre/lustre_idl.h | 51 +++++++++++++++++----------------- 3 files changed, 32 insertions(+), 27 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a20896c..d313730 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4323,7 +4323,7 @@ int ll_have_md_lock(struct inode *inode, u64 *bits, ldlm_lockname[mode]); flags = LDLM_FL_BLOCK_GRANTED | LDLM_FL_CBPENDING | LDLM_FL_TEST_LOCK; - for (i = 0; i <= MDS_INODELOCK_MAXSHIFT && *bits != 0; i++) { + for (i = 0; i < MDS_INODELOCK_NUMBITS && *bits != 0; i++) { policy.l_inodebits.bits = *bits & (1 << i); if (policy.l_inodebits.bits == 0) continue; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index e801f2c..adc71ff 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -2185,6 +2185,12 @@ void lustre_assert_wire_constants(void) MDS_INODELOCK_OPEN); LASSERTF(MDS_INODELOCK_LAYOUT == 0x000008, "found 0x%.8x\n", MDS_INODELOCK_LAYOUT); + LASSERTF(MDS_INODELOCK_PERM == 0x000010, "found 0x%.8x\n", + MDS_INODELOCK_PERM); + LASSERTF(MDS_INODELOCK_XATTR == 0x000020, "found 0x%.8x\n", + MDS_INODELOCK_XATTR); + LASSERTF(MDS_INODELOCK_DOM == 0x000040, "found 0x%.8x\n", + MDS_INODELOCK_DOM); /* Checks for struct mdt_ioepoch */ LASSERTF((int)sizeof(struct mdt_ioepoch) == 24, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index c86b188..5acf781 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1482,33 +1482,32 @@ enum mdt_reint_cmd { 
#define DISP_OPEN_DENY 0x10000000 /* INODE LOCK PARTS */ -#define MDS_INODELOCK_LOOKUP 0x000001 /* - * For namespace, dentry etc, and - * also was used to protect - * permission (mode, owner, group - * etc) before 2.4. - */ -#define MDS_INODELOCK_UPDATE 0x000002 /* size, links, timestamps */ -#define MDS_INODELOCK_OPEN 0x000004 /* For opened files */ -#define MDS_INODELOCK_LAYOUT 0x000008 /* for layout */ - -/* The PERM bit is added int 2.4, and it is used to protect permission(mode, - * owner, group, acl etc), so to separate the permission from LOOKUP lock. - * Because for remote directories(in DNE), these locks will be granted by - * different MDTs(different ldlm namespace). - * - * For local directory, MDT will always grant UPDATE_LOCK|PERM_LOCK together. - * For Remote directory, the master MDT, where the remote directory is, will - * grant UPDATE_LOCK|PERM_LOCK, and the remote MDT, where the name entry is, - * will grant LOOKUP_LOCK. - */ -#define MDS_INODELOCK_PERM 0x000010 -#define MDS_INODELOCK_XATTR 0x000020 /* extended attributes */ -#define MDS_INODELOCK_DOM 0x000040 /* Data for data-on-mdt files */ - -#define MDS_INODELOCK_MAXSHIFT 6 +enum mds_ibits_locks { + MDS_INODELOCK_LOOKUP = 0x000001, /* For namespace, dentry etc. Was + * used to protect permission (mode, + * owner, group, etc) before 2.4. + */ + MDS_INODELOCK_UPDATE = 0x000002, /* size, links, timestamps */ + MDS_INODELOCK_OPEN = 0x000004, /* For opened files */ + MDS_INODELOCK_LAYOUT = 0x000008, /* for layout */ + + /* The PERM bit is added in 2.4, and is used to protect permission + * (mode, owner, group, ACL, etc.) separate from LOOKUP lock. + * For remote directories (in DNE) these locks will be granted by + * different MDTs (different LDLM namespace). + * + * For local directory, the MDT always grants UPDATE|PERM together. + * For remote directory, master MDT (where remote directory is) grants + * UPDATE|PERM, and remote MDT (where name entry is) grants LOOKUP_LOCK. 
+ */ + MDS_INODELOCK_PERM = 0x000010, + MDS_INODELOCK_XATTR = 0x000020, /* non-permission extended attrs */ + MDS_INODELOCK_DOM = 0x000040, /* Data for Data-on-MDT files */ + /* Do not forget to increase MDS_INODELOCK_NUMBITS when adding bits */ +}; +#define MDS_INODELOCK_NUMBITS 7 /* This FULL lock is useful to take on unlink sort of operations */ -#define MDS_INODELOCK_FULL ((1 << (MDS_INODELOCK_MAXSHIFT + 1)) - 1) +#define MDS_INODELOCK_FULL ((1 << MDS_INODELOCK_NUMBITS) - 1) /* DOM lock shouldn't be canceled early, use this macro for ELC */ #define MDS_INODELOCK_ELC (MDS_INODELOCK_FULL & ~MDS_INODELOCK_DOM) From patchwork Thu Feb 27 21:14:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410427 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CD6E892A for ; Thu, 27 Feb 2020 21:38:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B621B24690 for ; Thu, 27 Feb 2020 21:38:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B621B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 89C2C349834; Thu, 27 Feb 2020 13:31:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP 
id 72DAA21FDBB for ; Thu, 27 Feb 2020 13:20:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D21648AB9; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D103D47C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:14 -0500 Message-Id: <1582838290-17243-387-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 386/622] lustre: uapi: change "space" hash type to hash flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Change LMV_HASH_TYPE_SPACE to LMV_HASH_FLAG_SPACE to make it flexible in directory layout inheritance in the future. But it's still exposed to user as hash type "space" in "lfs setdirstripe" command to make it easy to understand. 
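For illustration only (not part of the patch): a minimal sketch of how a single 32-bit field can carry a hash type in its low 16 bits and independent flags in its high bits, which is what lets "space" become a flag rather than a type. The constant values are copied from the diff below; the EX_ prefix and the helpers ex_hash_type() and ex_space_flag_set() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the values in the patch. */
#define EX_HASH_TYPE_MASK      0x0000ffffU
#define EX_HASH_TYPE_FNV_1A_64 2U
#define EX_HASH_FLAG_SPACE     0x08000000U
#define EX_HASH_FLAG_MIGRATION 0x80000000U

/* The low 16 bits select the hash function. */
uint32_t ex_hash_type(uint32_t v)
{
	return v & EX_HASH_TYPE_MASK;
}

/* The "space" behaviour is a flag, so it is tested with a bitwise AND
 * rather than by comparing the whole field against one type value.
 */
bool ex_space_flag_set(uint32_t v)
{
	return (v & EX_HASH_FLAG_SPACE) != 0;
}
```

With this layout a default layout can name a real hash function and request space-balanced placement at the same time, which a dedicated type value of 3 could not express.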
WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: c605ef1dbeb4 ("LU-11213 uapi: change "space" hash type to hash flag") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35318 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_lmv.h | 5 ++--- fs/lustre/lmv/lmv_obd.c | 4 ++-- fs/lustre/ptlrpc/wiretest.c | 2 +- include/uapi/linux/lustre/lustre_idl.h | 10 ---------- include/uapi/linux/lustre/lustre_user.h | 35 ++++++++++++++++++++++++++------- 5 files changed, 33 insertions(+), 23 deletions(-) diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index bb1efb4..b33a6ed 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -55,7 +55,6 @@ struct lmv_stripe_md { struct lmv_oinfo lsm_md_oinfo[0]; }; -/* NB: LMV_HASH_TYPE_SPACE is set in default LMV only */ static inline bool lmv_is_known_hash_type(u32 type) { return (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_FNV_1A_64 || @@ -91,9 +90,9 @@ static inline bool lmv_dir_bad_hash(const struct lmv_stripe_md *lsm) } /* NB, this is checking directory default LMV */ -static inline bool lmv_dir_space_hashed(const struct lmv_stripe_md *lsm) +static inline bool lmv_dir_qos_mkdir(const struct lmv_stripe_md *lsm) { - return lsm && lsm->lsm_md_hash_type == LMV_HASH_TYPE_SPACE; + return lsm && (lsm->lsm_md_hash_type & LMV_HASH_FLAG_SPACE); } static inline bool diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index bd64ebc..ae799db 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1187,7 +1187,7 @@ static u32 lmv_placement_policy(struct obd_device *obd, mdt = le32_to_cpu(lum->lum_stripe_offset); } else if (op_data->op_code == LUSTRE_OPC_MKDIR && !lmv_dir_striped(op_data->op_mea1) && - lmv_dir_space_hashed(op_data->op_default_mea1)) { + lmv_dir_qos_mkdir(op_data->op_default_mea1)) { mdt = op_data->op_mds; } else if 
(op_data->op_code == LUSTRE_OPC_MKDIR && op_data->op_default_mea1 && @@ -1716,7 +1716,7 @@ struct lmv_tgt_desc * op_data->op_mds = oinfo->lmo_mds; tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL); } else if (op_data->op_code == LUSTRE_OPC_MKDIR && - lmv_dir_space_hashed(op_data->op_default_mea1) && + lmv_dir_qos_mkdir(op_data->op_default_mea1) && !lmv_dir_striped(lsm)) { tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds); if (tgt == ERR_PTR(-EAGAIN)) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index adc71ff..1d34b15 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1661,8 +1661,8 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(LMV_MAGIC_V1 != 0x0CD20CD0); BUILD_BUG_ON(LMV_MAGIC_STRIPE != 0x0CD40CD0); BUILD_BUG_ON(LMV_HASH_TYPE_MASK != 0x0000ffff); + BUILD_BUG_ON(LMV_HASH_FLAG_SPACE != 0x08000000); BUILD_BUG_ON(LMV_HASH_FLAG_MIGRATION != 0x80000000); - BUILD_BUG_ON(LMV_HASH_FLAG_DEAD != 0x40000000); /* Checks for struct obd_statfs */ LASSERTF((int)sizeof(struct obd_statfs) == 144, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 5acf781..5740d42 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2001,16 +2001,6 @@ struct lmv_foreign_md { #define LMV_MAGIC_STRIPE 0x0CD40CD0 /* magic for dir sub_stripe */ #define LMV_MAGIC_FOREIGN 0x0CD50CD0 /* magic for lmv foreign */ -/* - *Right now only the lower part(0-16bits) of lmv_hash_type is being used, - * and the higher part will be the flag to indicate the status of object, - * for example the object is being migrated. And the hash function - * might be interpreted differently with different flags. 
- */ -#define LMV_HASH_TYPE_MASK 0x0000ffff - -#define LMV_HASH_FLAG_MIGRATION 0x80000000 -#define LMV_HASH_FLAG_DEAD 0x40000000 /** * The FNV-1a hash algorithm is as follows: diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index d43170f..86f3111 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -655,16 +655,37 @@ enum lmv_hash_type { LMV_HASH_TYPE_UNKNOWN = 0, /* 0 is reserved for testing purpose */ LMV_HASH_TYPE_ALL_CHARS = 1, LMV_HASH_TYPE_FNV_1A_64 = 2, - LMV_HASH_TYPE_SPACE = 3, /* - * distribute subdirs among all MDTs - * with balanced space usage. - */ LMV_HASH_TYPE_MAX, }; -#define LMV_HASH_NAME_ALL_CHARS "all_char" -#define LMV_HASH_NAME_FNV_1A_64 "fnv_1a_64" -#define LMV_HASH_NAME_SPACE "space" +#define LMV_HASH_TYPE_DEFAULT LMV_HASH_TYPE_FNV_1A_64 + +#define LMV_HASH_NAME_ALL_CHARS "all_char" +#define LMV_HASH_NAME_FNV_1A_64 "fnv_1a_64" + +/* not real hash type, but exposed to user as "space" hash type */ +#define LMV_HASH_NAME_SPACE "space" + +/* Right now only the lower part(0-16bits) of lmv_hash_type is being used, + * and the higher part will be the flag to indicate the status of object, + * for example the object is being migrated. And the hash function + * might be interpreted differently with different flags. + */ +#define LMV_HASH_TYPE_MASK 0x0000ffff + +/* once this is set on a plain directory default layout, newly created + * subdirectories will be distributed on all MDTs by space usage. + */ +#define LMV_HASH_FLAG_SPACE 0x08000000 + +/* The striped directory has ever lost its master LMV EA, then LFSCK + * re-generated it. This flag is used to indicate such case. It is an + * on-disk flag. 
+ */ +#define LMV_HASH_FLAG_LOST_LMV 0x10000000 + +#define LMV_HASH_FLAG_BAD_TYPE 0x20000000 +#define LMV_HASH_FLAG_MIGRATION 0x80000000 struct lustre_foreign_type { uint32_t lft_type; From patchwork Thu Feb 27 21:14:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410677 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A3A0D17E0 for ; Thu, 27 Feb 2020 21:44:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8B66B24690 for ; Thu, 27 Feb 2020 21:44:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8B66B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 920783494A0; Thu, 27 Feb 2020 13:35:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CACE021FBE7 for ; Thu, 27 Feb 2020 13:20:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D64098ABA; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D3C2946A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:15 
-0500 Message-Id: <1582838290-17243-388-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 387/622] lustre: osc: cancel osc_lock list traversal once found the lock is being used X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Gu Zheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Gu Zheng Currently, osc_ldlm_weigh_ast() walks the osc_lock list (oo_ol_list) to check whether the target DLM lock is being used. Once a match is found it should skip the remaining entries and stop the traversal, but it does not. Fix that here. Fixes: 3f3a24dc5d7d ("LU-3259 clio: cl_lock simplification") WC-bug-id: https://jira.whamcloud.com/browse/LU-11518 Lustre-commit: eb9aa909343b ("LU-11518 osc: cancel osc_lock list traversal once found the lock is being used") Signed-off-by: Gu Zheng Reviewed-on: https://review.whamcloud.com/35396 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_lock.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index 29d8373..e01bf5f 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -687,9 +687,10 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) spin_lock(&obj->oo_ol_spin); list_for_each_entry(oscl, &obj->oo_ol_list, ols_nextlock_oscobj) { - if (oscl->ols_dlmlock && oscl->ols_dlmlock != dlmlock) - continue; - found = true; + if (oscl->ols_dlmlock == dlmlock) { + found = true; + break; + } } spin_unlock(&obj->oo_ol_spin); if
(found) { From patchwork Thu Feb 27 21:14:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410743 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BCACF924 for ; Thu, 27 Feb 2020 21:45:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A4566246A2 for ; Thu, 27 Feb 2020 21:45:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A4566246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F68034B0AC; Thu, 27 Feb 2020 13:36:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1937421FBE7 for ; Thu, 27 Feb 2020 13:20:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D82BA8ABB; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D6A0846C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:16 -0500 Message-Id: <1582838290-17243-389-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 388/622] lustre: obdclass: add comment for rcu handling in lu_env_remove X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" During review, the reason for dropping the RCU lock in lu_env_remove() was explained, but the code itself does not document it. Add a comment explaining why RCU locking is not needed. WC-bug-id: https://jira.whamcloud.com/browse/LU-12491 Lustre-commit: 709fbe6ee54a ("LU-12491 obdclass: add comment for rcu handling in lu_env_remove") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/35447 Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lu_object.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index c94911d..d8bff3f 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -1932,6 +1932,11 @@ void lu_env_remove(struct lu_env *env) } } + /* The rcu_lock is not taken in this case since the key + * used is the actual task_struct. This implies that each + * object is only removed by the owning thread, so there + * can never be a race on a particular object.
+ */ lei = rhashtable_lookup_fast(&lu_env_rhash, &task, lu_env_rhash_params); if (lei && rhashtable_remove_fast(&lu_env_rhash, &lei->lei_linkage, From patchwork Thu Feb 27 21:14:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410431 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F2107138D for ; Thu, 27 Feb 2020 21:38:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DAADA24690 for ; Thu, 27 Feb 2020 21:38:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DAADA24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4A42834A369; Thu, 27 Feb 2020 13:31:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5AFCA21FBE7 for ; Thu, 27 Feb 2020 13:20:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DAFBB8ABC; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D953846D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:17 -0500 Message-Id: 
<1582838290-17243-390-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 389/622] lnet: honor discovery setting X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata If discovery is off do not push out any updates. This could be triggered in case of a gateway's interface changing. WC-bug-id: https://jira.whamcloud.com/browse/LU-12423 Lustre-commit: a06b656639c4 ("LU-12423 lnet: honor discovery setting") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/35192 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index e33dc0e..b0ca1de 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -877,6 +877,8 @@ struct lnet_peer_ni * int cpt; lnet_net_lock(LNET_LOCK_EX); + if (lnet_peer_discovery_disabled) + force = 0; lncpt = cfs_percpt_number(the_lnet.ln_peer_tables); for (cpt = 0; cpt < lncpt; cpt++) { ptable = the_lnet.ln_peer_tables[cpt]; From patchwork Thu Feb 27 21:14:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410345 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 518E092A for ; Thu, 27 Feb 2020 21:35:35 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3A0A224677 for ; Thu, 27 Feb 2020 21:35:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3A0A224677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5206721FD9E; Thu, 27 Feb 2020 13:29:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9BE4121FBE7 for ; Thu, 27 Feb 2020 13:20:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DD8AC8ABD; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DC20D468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:18 -0500 Message-Id: <1582838290-17243-391-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 390/622] lustre: obdclass: don't send multiple statfs RPCs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger If multiple threads are racing to send a non-cached OST_STATFS or MDS_STATFS RPC, this can cause a significant RPC storm for systems with many-core clients and many OSTs due to amplification of the requests, and the fact that STATFS RPCs are sent asynchronously. Some logs have shown a few 96-core clients with 20k+ OST_STATFS RPCs in flight concurrently, which can overload the network if many OSTs are on the same OSS nodes (osc.*.max_rpcs_in_flight is per OST). This was not previously a significant issue when core counts were smaller on the clients, or with fewer OSTs per OSS. If a thread can't use the cached statfs values, limit statfs to one thread at a time, since the thread(s) would be blocked waiting for the RPC replies anyway, which can't finish faster if many are sent. Also add a llite.*.statfs_max_age parameter that can be tuned to control the maximum age (in seconds) of the statfs cache. This can avoid overhead for workloads that are statfs heavy, given that the filesystem is _probably_ not running out of space this second, and even so "statfs" does not guarantee space in parallel workloads. 
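A minimal user-space sketch of the serialisation described above, assuming pthread mutexes in place of the kernel's obd_dev_mutex and obd_osfs_lock. The names fs_stats, stats_cache, fetch_stats() and cached_statfs() are hypothetical and are not part of the patch; fetch_stats() only stands in for the statfs RPC. The point it illustrates is the re-check after taking the refresh lock, so that threads which lost the race reuse the result fetched by the winner instead of sending their own request.

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for struct obd_statfs. */
struct fs_stats {
	unsigned long long blocks;
	unsigned long long bavail;
};

/* Hypothetical stand-in for the per-device statfs cache. */
struct stats_cache {
	pthread_mutex_t data_lock;    /* short lock: protects cached and age */
	pthread_mutex_t refresh_lock; /* held across the slow fetch */
	struct fs_stats cached;
	time_t age;                   /* time of the last successful fetch */
	int fetch_count;              /* number of simulated RPCs sent */
};

#define STATS_CACHE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, { 0, 0 }, 0, 0 }

/* Simulated slow fetch; only called with refresh_lock held. */
static int fetch_stats(struct stats_cache *c, struct fs_stats *out)
{
	c->fetch_count++;
	out->blocks = 1000;
	out->bavail = 1000 - c->fetch_count;
	return 0;
}

/* Copy the cached values to out if they are new enough; return 1 on a hit. */
static int cache_try_read(struct stats_cache *c, struct fs_stats *out,
			  time_t oldest_ok)
{
	int fresh;

	pthread_mutex_lock(&c->data_lock);
	fresh = c->age >= oldest_ok;
	if (fresh)
		*out = c->cached;
	pthread_mutex_unlock(&c->data_lock);
	return fresh;
}

int cached_statfs(struct stats_cache *c, struct fs_stats *out,
		  time_t now, time_t max_age_secs)
{
	time_t oldest_ok = now - max_age_secs;
	int rc;

	/* fast path: cache is fresh, no serialisation needed */
	if (cache_try_read(c, out, oldest_ok))
		return 0;

	/* slow path: only one thread at a time may refresh */
	pthread_mutex_lock(&c->refresh_lock);

	/* another thread may have refreshed the cache while we waited */
	if (cache_try_read(c, out, oldest_ok)) {
		pthread_mutex_unlock(&c->refresh_lock);
		return 0;
	}

	rc = fetch_stats(c, out);
	if (rc == 0) {
		pthread_mutex_lock(&c->data_lock);
		c->cached = *out;
		c->age = now;
		pthread_mutex_unlock(&c->data_lock);
	}
	pthread_mutex_unlock(&c->refresh_lock);
	return rc;
}
```

The current time is passed in as a parameter only to keep the sketch easy to exercise; with max_age_secs set to 1, two calls at the same timestamp trigger a single fetch, and a call two seconds later triggers a second one.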
WC-bug-id: https://jira.whamcloud.com/browse/LU-12368 Lustre-commit: 1c41a6ac390b ("LU-12368 obdclass: don't send multiple statfs RPCs") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/35380 Reviewed-by: Patrick Farrell Reviewed-by: Alex Zhuravlev Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 2 ++ fs/lustre/include/obd_class.h | 22 ++++++++++++++++++++-- fs/lustre/llite/llite_internal.h | 3 +++ fs/lustre/llite/llite_lib.c | 5 +++-- fs/lustre/llite/lproc_llite.c | 31 +++++++++++++++++++++++++++++++ 5 files changed, 59 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index f53c303..53d078e 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -379,6 +379,8 @@ struct echo_client_obd { /* allow statfs data caching for 1 second */ #define OBD_STATFS_CACHE_SECONDS 1 +/* arbitrary maximum. larger would be useless, allows catching bogus input */ +#define OBD_STATFS_CACHE_MAX_AGE 3600 /* seconds */ #define lov_tgt_desc lu_tgt_desc diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 76e8201..b8afa5a 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -952,13 +952,31 @@ static inline int obd_statfs(const struct lu_env *env, struct obd_export *exp, if (obd->obd_osfs_age < max_age || ((obd->obd_osfs.os_state & OS_STATE_SUM) && !(flags & OBD_STATFS_SUM))) { - rc = OBP(obd, statfs)(env, exp, osfs, max_age, flags); + bool update_age = false; + /* the RPC will block anyway, so avoid sending many at once */ + rc = mutex_lock_interruptible(&obd->obd_dev_mutex); + if (rc) + return rc; + if (obd->obd_osfs_age < max_age || + ((obd->obd_osfs.os_state & OS_STATE_SUM) && + !(flags & OBD_STATFS_SUM))) { + rc = OBP(obd, statfs)(env, exp, osfs, max_age, flags); + update_age = true; + } else { + CDEBUG(D_SUPER, + "%s: new %p cache blocks %llu/%llu objects %llu/%llu\n", + obd->obd_name, 
&obd->obd_osfs, + obd->obd_osfs.os_bavail, obd->obd_osfs.os_blocks, + obd->obd_osfs.os_ffree, obd->obd_osfs.os_files); + } if (rc == 0) { spin_lock(&obd->obd_osfs_lock); memcpy(&obd->obd_osfs, osfs, sizeof(obd->obd_osfs)); - obd->obd_osfs_age = ktime_get_seconds(); + if (update_age) + obd->obd_osfs_age = ktime_get_seconds(); spin_unlock(&obd->obd_osfs_lock); } + mutex_unlock(&obd->obd_dev_mutex); } else { CDEBUG(D_SUPER, "%s: use %p cache blocks %llu/%llu objects %llu/%llu\n", diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 8d95694..9d60ae5 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -568,6 +568,9 @@ struct ll_sb_info { /* st_blksize returned by stat(2), when non-zero */ unsigned int ll_stat_blksize; + /* maximum relative age of cached statfs results */ + unsigned int ll_statfs_max_age; + struct kset ll_kset; /* sysfs object */ struct completion ll_kobj_unregister; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 33f7fdb..cc417d6 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -87,6 +87,7 @@ static struct ll_sb_info *ll_init_sbi(void) spin_lock_init(&sbi->ll_pp_extent_lock); spin_lock_init(&sbi->ll_process_lock); sbi->ll_rw_stats_on = 0; + sbi->ll_statfs_max_age = OBD_STATFS_CACHE_SECONDS; si_meminfo(&si); pages = si.totalram - si.totalhigh; @@ -330,7 +331,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) * available */ err = obd_statfs(NULL, sbi->ll_md_exp, osfs, - ktime_get_seconds() - OBD_STATFS_CACHE_SECONDS, + ktime_get_seconds() - sbi->ll_statfs_max_age, OBD_STATFS_FOR_MDT0); if (err) goto out_md_fid; @@ -1860,7 +1861,7 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, time64_t max_age; int rc; - max_age = ktime_get_seconds() - OBD_STATFS_CACHE_SECONDS; + max_age = ktime_get_seconds() - sbi->ll_statfs_max_age; rc = obd_statfs(NULL, sbi->ll_md_exp, osfs, max_age, 
flags); if (rc) diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 02403e4..4cffd36 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -882,6 +882,36 @@ static ssize_t lazystatfs_store(struct kobject *kobj, } LUSTRE_RW_ATTR(lazystatfs); +static ssize_t statfs_max_age_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + return snprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_statfs_max_age); +} + +static ssize_t statfs_max_age_store(struct kobject *kobj, + struct attribute *attr, const char *buffer, + size_t count) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + unsigned int val; + int rc; + + rc = kstrtouint(buffer, 10, &val); + if (rc) + return rc; + if (val > OBD_STATFS_CACHE_MAX_AGE) + return -EINVAL; + + sbi->ll_statfs_max_age = val; + + return count; +} +LUSTRE_RW_ATTR(statfs_max_age); + static ssize_t max_easize_show(struct kobject *kobj, struct attribute *attr, char *buf) @@ -1480,6 +1510,7 @@ struct lprocfs_vars lprocfs_llite_obd_vars[] = { &lustre_attr_statahead_max.attr, &lustre_attr_statahead_agl.attr, &lustre_attr_lazystatfs.attr, + &lustre_attr_statfs_max_age.attr, &lustre_attr_max_easize.attr, &lustre_attr_default_easize.attr, &lustre_attr_xattr_cache.attr, From patchwork Thu Feb 27 21:14:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410435 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3107E138D for ; Thu, 27 Feb 2020 21:38:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.kernel.org (Postfix) with ESMTPS id 1A21A24690 for ; Thu, 27 Feb 2020 21:38:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1A21A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5D37F34902A; Thu, 27 Feb 2020 13:31:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F18C321FBE7 for ; Thu, 27 Feb 2020 13:20:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E00C08ABE; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DEEBA47C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:19 -0500 Message-Id: <1582838290-17243-392-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 391/622] lustre: lov: Correct bounds checking X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nathaniel Clark , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Nathaniel Clark While Dan Carpenter ran his smatch tool against the lustre code base he encountered the following static checker warning: fs/lustre/lov/lov_ea.c:207 lsm_unpackmd_common() warn: signed overflow undefined. 'min_stripe_maxbytes * stripe_count < min_stripe_maxbytes' The current code doesn't properly handle the potential overflow with the min_stripe_maxbytes * stripe_count. This fixes the overflow detection for maxbytes in lsme_unpack(). Fixes: 476f575cf070 ("staging: lustre: lov: Ensure correct operation for large object sizes") Reported-by: Dan Carpenter WC-bug-id: https://jira.whamcloud.com/browse/LU-9862 Lustre-commit: 31ff883c7b0c ("LU-9862 lov: Correct bounds checking") Signed-off-by: Nathaniel Clark Reviewed-on: https://review.whamcloud.com/28484 Reviewed-by: Patrick Farrell Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_ea.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c index 07bfe0f..4be01bb8 100644 --- a/fs/lustre/lov/lov_ea.c +++ b/fs/lustre/lov/lov_ea.c @@ -274,15 +274,16 @@ void lsm_free(struct lov_stripe_md *lsm) if (min_stripe_maxbytes == 0) min_stripe_maxbytes = LUSTRE_EXT3_STRIPE_MAXBYTES; - lov_bytes = min_stripe_maxbytes * stripe_count; + if (stripe_count == 0) + lov_bytes = min_stripe_maxbytes; + else if (min_stripe_maxbytes <= LLONG_MAX / stripe_count) + lov_bytes = min_stripe_maxbytes * stripe_count; + else + lov_bytes = MAX_LFS_FILESIZE; out_dom: - if (maxbytes) { - if (lov_bytes < min_stripe_maxbytes) /* handle overflow */ - *maxbytes = MAX_LFS_FILESIZE; - else - *maxbytes = lov_bytes; - } + if (maxbytes) + *maxbytes = min_t(loff_t, lov_bytes, MAX_LFS_FILESIZE); return lsme; 
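The overflow handling in the lov patch above can be sketched in isolation as follows. This is only an illustration under stated assumptions: clamp_total_bytes() and EXAMPLE_MAX_FILESIZE are hypothetical names (the latter is an arbitrary cap standing in for MAX_LFS_FILESIZE), and per_stripe_max is assumed to be non-negative. The idea is to test against LLONG_MAX / stripe_count before multiplying, rather than inspecting the product afterwards, because signed overflow is undefined behaviour in C.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical cap standing in for MAX_LFS_FILESIZE. */
#define EXAMPLE_MAX_FILESIZE (1LL << 60)

/* Return per_stripe_max * stripe_count, clamped to the cap, without
 * ever evaluating a product that could overflow.
 * Assumes per_stripe_max >= 0.
 */
long long clamp_total_bytes(long long per_stripe_max,
			    unsigned int stripe_count)
{
	long long total;

	if (stripe_count == 0)
		total = per_stripe_max;
	else if (per_stripe_max <= LLONG_MAX / stripe_count)
		total = per_stripe_max * stripe_count; /* cannot overflow */
	else
		total = EXAMPLE_MAX_FILESIZE;          /* would overflow */

	return total < EXAMPLE_MAX_FILESIZE ? total : EXAMPLE_MAX_FILESIZE;
}
```

The final clamp plays the role of the min_t() call in the patch: a product that fits in a long long but still exceeds the cap is reduced to the cap as well.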
From patchwork Thu Feb 27 21:14:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410349 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5EDAE92A for ; Thu, 27 Feb 2020 21:35:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4621024677 for ; Thu, 27 Feb 2020 21:35:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4621024677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 44E64348BED; Thu, 27 Feb 2020 13:29:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 42B6321FBE7 for ; Thu, 27 Feb 2020 13:20:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E29378ABF; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E192246A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:20 -0500 Message-Id: <1582838290-17243-393-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 392/622] lustre: lu_object: Add missed qos_rr_init X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The new lmv space hash code uses the lu_qos_rr struct, but forgot to init it fully. Specifically, the spin lock isn't inited, causing failures. WC-bug-id: https://jira.whamcloud.com/browse/LU-12538 Lustre-commit: 5e6a30cc2f34 ("LU-12538 lod: Add missed qos_rr_init") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35490 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 1 + fs/lustre/lmv/lmv_obd.c | 3 ++- fs/lustre/obdclass/lu_qos.c | 7 +++++++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index 6b1064a..d2e84a3 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1388,6 +1388,7 @@ struct lu_qos { lq_reset:1; /* zero current penalties */ }; +void lu_qos_rr_init(struct lu_qos_rr *lqr); int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); u64 lu_prandom_u64_max(u64 ep_ro); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index ae799db..e9f9c36 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1295,13 +1295,14 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) INIT_LIST_HEAD(&lmv->lmv_qos.lq_svr_list); init_rwsem(&lmv->lmv_qos.lq_rw_sem); lmv->lmv_qos.lq_dirty = 1; - lmv->lmv_qos.lq_rr.lqr_dirty = 1; lmv->lmv_qos.lq_reset = 1; /* Default priority is 
toward free space balance */ lmv->lmv_qos.lq_prio_free = 232; /* Default threshold for rr (roughly 17%) */ lmv->lmv_qos.lq_threshold_rr = 43; + lu_qos_rr_init(&lmv->lmv_qos.lq_rr); + /* * initialize rr_index to lower 32bit of netid, so that client * can distribute subdirs evenly from the beginning. diff --git a/fs/lustre/obdclass/lu_qos.c b/fs/lustre/obdclass/lu_qos.c index 9fdcbc2..d4803e8 100644 --- a/fs/lustre/obdclass/lu_qos.c +++ b/fs/lustre/obdclass/lu_qos.c @@ -42,6 +42,13 @@ #include #include +void lu_qos_rr_init(struct lu_qos_rr *lqr) +{ + spin_lock_init(&lqr->lqr_alloc); + lqr->lqr_dirty = 1; +} +EXPORT_SYMBOL(lu_qos_rr_init); + /** * Add a new target to Quality of Service (QoS) target table. * From patchwork Thu Feb 27 21:14:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410439 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 62251138D for ; Thu, 27 Feb 2020 21:38:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4AE9024690 for ; Thu, 27 Feb 2020 21:38:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4AE9024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A989D34A397; Thu, 27 Feb 2020 13:31:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8476621FBE7 for ; Thu, 27 Feb 2020 13:20:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E5A5E8F00; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E44D346C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:21 -0500 Message-Id: <1582838290-17243-394-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 393/622] lustre: fld: let's caller to retry FLD_QUERY X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang In fld_client_rpc(), if the FLD_QUERY request between MDTs fails with -EWOULDBLOCK because the connection is lost, return -EAGAIN to notify the caller to retry. 
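For illustration only, a caller honouring the -EAGAIN contract described above might look like the following user-space sketch. Nothing here is taken from the Lustre tree: query_fn, query_with_retry() and flaky_query() are hypothetical, and a real caller would normally wait or back off between attempts instead of retrying immediately.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical query callback: returns 0, -EAGAIN, or another -errno. */
typedef int (*query_fn)(void *arg);

/* Call query() until it returns something other than -EAGAIN, or until
 * max_attempts calls have been made. Returns the last result.
 */
int query_with_retry(query_fn query, void *arg, int max_attempts)
{
	int rc = -EAGAIN;
	int attempt;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		rc = query(arg);
		if (rc != -EAGAIN)
			break;
		/* a real caller would sleep or back off here */
	}
	return rc;
}

/* Hypothetical query that fails with -EAGAIN until the counter hits 0. */
int flaky_query(void *arg)
{
	int *failures_left = arg;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -EAGAIN;
	}
	return 0;
}
```

Bounding the number of attempts is a design choice of this sketch; it keeps the retry decision with the caller, which is the behaviour the patch moves towards by removing the unconditional loop from fld_client_rpc().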
WC-bug-id: https://jira.whamcloud.com/browse/LU-11761 Lustre-commit: e3f6111dfd1c ("LU-11761 fld: let's caller to retry FLD_QUERY") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/34962 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/fld/fld_request.c | 23 ++++++++++++++--------- fs/lustre/include/obd_support.h | 1 + 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c index 75cba18..52c148a 100644 --- a/fs/lustre/fld/fld_request.c +++ b/fs/lustre/fld/fld_request.c @@ -314,7 +314,6 @@ int fld_client_rpc(struct obd_export *exp, LASSERT(exp); -again: imp = class_exp2cliimp(exp); switch (fld_op) { case FLD_QUERY: @@ -363,17 +362,23 @@ int fld_client_rpc(struct obd_export *exp, req->rq_reply_portal = MDC_REPLY_PORTAL; ptlrpc_at_set_req_timeout(req); - obd_get_request_slot(&exp->exp_obd->u.cli); - rc = ptlrpc_queue_wait(req); - obd_put_request_slot(&exp->exp_obd->u.cli); + if (OBD_FAIL_CHECK(OBD_FAIL_FLD_QUERY_REQ && req->rq_no_delay)) { + /* the same error returned by ptlrpc_import_delay_req */ + rc = -EWOULDBLOCK; + req->rq_status = rc; + } else { + obd_get_request_slot(&exp->exp_obd->u.cli); + rc = ptlrpc_queue_wait(req); + obd_put_request_slot(&exp->exp_obd->u.cli); + } + if (rc != 0) { if (imp->imp_state != LUSTRE_IMP_CLOSED && !imp->imp_deactive) { - /* Since LWP is not replayable, so it will keep - * trying unless umount happens, otherwise it would - * cause unnecessary failure of the application. + /* + * Since LWP is not replayable, so notify the caller + * to retry if needed after a while. 
*/ - ptlrpc_req_finished(req); - goto again; + rc = -EAGAIN; } goto out_req; } diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 9609dd5..23f6bae 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -424,6 +424,7 @@ #define OBD_FAIL_FLD 0x1100 #define OBD_FAIL_FLD_QUERY_NET 0x1101 #define OBD_FAIL_FLD_READ_NET 0x1102 +#define OBD_FAIL_FLD_QUERY_REQ 0x1103 #define OBD_FAIL_SEC_CTX 0x1200 #define OBD_FAIL_SEC_CTX_INIT_NET 0x1201 From patchwork Thu Feb 27 21:14:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410443 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A42C092A for ; Thu, 27 Feb 2020 21:38:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8CDCE24690 for ; Thu, 27 Feb 2020 21:38:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8CDCE24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B6B934A3E6; Thu, 27 Feb 2020 13:31:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C61D421FBE7 for ; Thu, 27 Feb 2020 13:20:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E85A48F01; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E713246D; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:22 -0500 Message-Id: <1582838290-17243-395-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 394/622] lustre: llite: make sure readahead cover current read X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong When doing readahead, @ria_end_min indicates how far we are expected to read in order to cover the current read. Update @ria_end_min unconditionally with the IO end. 
Also, @ria_end_min is a closed interval, so it should be calculated as start + count - 1. WC-bug-id: https://jira.whamcloud.com/browse/LU-12043 Lustre-commit: 8fbef5ee7619 ("LU-12043 llite: make sure readahead cover current read") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35215 Reviewed-by: Patrick Farrell Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/rw.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index bec26c4..fe9a2b0 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -689,16 +689,8 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, /* at least to extend the readahead window to cover current read */ if (!hit && vio->vui_ra_valid && - vio->vui_ra_start + vio->vui_ra_count > ria->ria_start) { - unsigned long remainder; - - /* to the end of current read window. */ - mlen = vio->vui_ra_start + vio->vui_ra_count - ria->ria_start; - /* trim to RPC boundary */ - ras_align(ras, ria->ria_start, &remainder); - mlen = min(mlen, ras->ras_rpc_size - remainder); - ria->ria_end_min = ria->ria_start + mlen; - } + vio->vui_ra_start + vio->vui_ra_count > ria->ria_start) + ria->ria_end_min = vio->vui_ra_start + vio->vui_ra_count - 1; ria->ria_reserved = ll_ra_count_get(ll_i2sbi(inode), ria, len, mlen); if (ria->ria_reserved < len) From patchwork Thu Feb 27 21:14:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410681 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56DEF138D for ; Thu, 27 Feb 2020 21:44:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No 
client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3FB1D24690 for ; Thu, 27 Feb 2020 21:44:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3FB1D24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 213DE34948A; Thu, 27 Feb 2020 13:35:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 17B4B21FDBF for ; Thu, 27 Feb 2020 13:20:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EB5FC8F02; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E9C3D468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:23 -0500 Message-Id: <1582838290-17243-396-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 395/622] lustre: ptlrpc: Add jobid to rpctrace debug messages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler This mod adds the jobid string found in the ptlrpc_body of an rpc to the output of rpctrace messages. If jobids are not in use the string will be empty. If jobids are in use, the string can be useful in analyzing Lustre activity. Cray-bug-id: LUS-7557 WC-bug-id: https://jira.whamcloud.com/browse/LU-12523 Lustre-commit: 9ae40e4c5ecb ("LU-12523 ptlrpc: Add jobid to rpctrace debug messages") Signed-off-by: Ann Koehler Reviewed-on: https://review.whamcloud.com/35445 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 1 + fs/lustre/ptlrpc/client.c | 15 +++++++++------ fs/lustre/ptlrpc/pack_generic.c | 30 ++++++++++++++++++++++++++++-- fs/lustre/ptlrpc/service.c | 12 +++++++----- 4 files changed, 45 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 7ed2d99..d03e8c6 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -2074,6 +2074,7 @@ int lustre_shrink_msg(struct lustre_msg *msg, int segment, u32 lustre_msg_get_magic(struct lustre_msg *msg); u32 lustre_msg_get_timeout(struct lustre_msg *msg); u32 lustre_msg_get_service_time(struct lustre_msg *msg); +char *lustre_msg_get_jobid(struct lustre_msg *msg); u32 lustre_msg_get_cksum(struct lustre_msg *msg); u32 lustre_msg_calc_cksum(struct lustre_msg *msg); void lustre_msg_set_handle(struct lustre_msg *msg, diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index ac16878..bd641cc 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1639,11 +1639,12 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req) } CDEBUG(D_RPCTRACE, - "Sending RPC pname:cluuid:pid:xid:nid:opc %s:%s:%d:%llu:%s:%d\n", - 
current->comm, + "Sending RPC req@%p pname:cluuid:pid:xid:nid:opc:job %s:%s:%d:%llu:%s:%d:%s\n", + req, current->comm, imp->imp_obd->obd_uuid.uuid, lustre_msg_get_status(req->rq_reqmsg), req->rq_xid, - obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg)); + obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg), + lustre_msg_get_jobid(req->rq_reqmsg)); rc = ptl_send_rpc(req, 0); if (rc == -ENOMEM) { @@ -2057,12 +2058,14 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) if (req->rq_reqmsg) CDEBUG(D_RPCTRACE, - "Completed RPC pname:cluuid:pid:xid:nid:opc %s:%s:%d:%llu:%s:%d\n", - current->comm, imp->imp_obd->obd_uuid.uuid, + "Completed RPC req@%p pname:cluuid:pid:xid:nid:opc:job %s:%s:%d:%llu:%s:%d:%s\n", + req, current->comm, + imp->imp_obd->obd_uuid.uuid, lustre_msg_get_status(req->rq_reqmsg), req->rq_xid, obd_import_nid2str(imp), - lustre_msg_get_opc(req->rq_reqmsg)); + lustre_msg_get_opc(req->rq_reqmsg), + lustre_msg_get_jobid(req->rq_reqmsg)); spin_lock(&imp->imp_lock); /* diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index a4f28f3..f687ecc 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1183,6 +1183,31 @@ u32 lustre_msg_get_service_time(struct lustre_msg *msg) } } +char *lustre_msg_get_jobid(struct lustre_msg *msg) +{ + switch (msg->lm_magic) { + case LUSTRE_MSG_MAGIC_V2: { + struct ptlrpc_body *pb; + + /* the old pltrpc_body_v2 is smaller; doesn't include jobid */ + if (msg->lm_buflens[MSG_PTLRPC_BODY_OFF] < + sizeof(struct ptlrpc_body)) + return NULL; + + pb = lustre_msg_buf_v2(msg, MSG_PTLRPC_BODY_OFF, + sizeof(struct ptlrpc_body)); + if (!pb) + return NULL; + + return pb->pb_jobid; + } + default: + CERROR("incorrect message magic: %08x\n", msg->lm_magic); + return NULL; + } +} +EXPORT_SYMBOL(lustre_msg_get_jobid); + u32 lustre_msg_get_cksum(struct lustre_msg *msg) { switch (msg->lm_magic) { @@ -2337,7 +2362,7 @@ void _debug_req(struct 
ptlrpc_request *req, vaf.fmt = fmt; vaf.va = &args; libcfs_debug_msg(msgdata, - "%pV req@%p x%llu/t%lld(%lld) o%d->%s@%s:%d/%d lens %d/%d e %d to %lld dl %lld ref %d fl " REQ_FLAGS_FMT "/%x/%x rc %d/%d\n", + "%pV req@%p x%llu/t%lld(%lld) o%d->%s@%s:%d/%d lens %d/%d e %d to %lld dl %lld ref %d fl " REQ_FLAGS_FMT "/%x/%x rc %d/%d job:'%s'\n", &vaf, req, req->rq_xid, req->rq_transno, req_ok ? lustre_msg_get_transno(req->rq_reqmsg) : 0, @@ -2355,7 +2380,8 @@ void _debug_req(struct ptlrpc_request *req, atomic_read(&req->rq_refcount), DEBUG_REQ_FLAGS(req), req_ok ? lustre_msg_get_flags(req->rq_reqmsg) : -1, - rep_flags, req->rq_status, rep_status); + rep_flags, req->rq_status, rep_status, + req_ok ? lustre_msg_get_jobid(req->rq_reqmsg) : ""); va_end(args); } EXPORT_SYMBOL(_debug_req); diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 8e6013a..3132a1e 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1756,15 +1756,16 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, } CDEBUG(D_RPCTRACE, - "Handling RPC pname:cluuid+ref:pid:xid:nid:opc %s:%s+%d:%d:x%llu:%s:%d\n", - current->comm, + "Handling RPC req@%p pname:cluuid+ref:pid:xid:nid:opc:job %s:%s+%d:%d:x%llu:%s:%d:%s\n", + request, current->comm, (request->rq_export ? (char *)request->rq_export->exp_client_uuid.uuid : "0"), (request->rq_export ? 
refcount_read(&request->rq_export->exp_refcount) : -99), lustre_msg_get_status(request->rq_reqmsg), request->rq_xid, libcfs_id2str(request->rq_peer), - lustre_msg_get_opc(request->rq_reqmsg)); + lustre_msg_get_opc(request->rq_reqmsg), + lustre_msg_get_jobid(request->rq_reqmsg)); if (lustre_msg_get_opc(request->rq_reqmsg) != OBD_PING) CFS_FAIL_TIMEOUT_MS(OBD_FAIL_PTLRPC_PAUSE_REQ, cfs_fail_val); @@ -1796,8 +1797,8 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, timediff_usecs = ktime_us_delta(work_end, work_start); arrived_usecs = ktime_us_delta(work_end, arrived); CDEBUG(D_RPCTRACE, - "Handled RPC pname:cluuid+ref:pid:xid:nid:opc %s:%s+%d:%d:x%llu:%s:%d Request processed in %lldus (%lldus total) trans %llu rc %d/%d\n", - current->comm, + "Handled RPC req@%p pname:cluuid+ref:pid:xid:nid:opc:job %s:%s+%d:%d:x%llu:%s:%d:%s Request processed in %lldus (%lldus total) trans %llu rc %d/%d\n", + request, current->comm, (request->rq_export ? (char *)request->rq_export->exp_client_uuid.uuid : "0"), (request->rq_export ? @@ -1806,6 +1807,7 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, request->rq_xid, libcfs_id2str(request->rq_peer), lustre_msg_get_opc(request->rq_reqmsg), + lustre_msg_get_jobid(request->rq_reqmsg), timediff_usecs, arrived_usecs, (request->rq_repmsg ? 
From patchwork Thu Feb 27 21:14:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410357 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9711D138D for ; Thu, 27 Feb 2020 21:35:55 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7F73124677 for ; Thu, 27 Feb 2020 21:35:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7F73124677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6C4BC200CF4; Thu, 27 Feb 2020 13:30:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6DE8B21FC27 for ; Thu, 27 Feb 2020 13:20:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EDECE8F03; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EC82047C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:24 -0500 Message-Id: <1582838290-17243-397-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 396/622] lnet: libcfs: Reduce memory frag due to HA debug msg X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler The dynamic allocation and freeing of Lustre trace pages have been shown to cause memory fragmentation that sometimes prevents applications from getting the contiguous memory they need to run. In one such occurrence, over 99% of the messages were the matched open trace messages issued by mdc_close(): DEBUG_REQ(D_HA, mod->mod_open_req, "matched open; tag %d", tag); D_HA is included in the default set of debug flags. This has proven to be quite useful in debugging connection issues, particularly at mount time. So removing all HA messages from the default tracing is not a good option. However, the matched open debug message has not proven itself to be as generally useful. So moving the message under a different debug flag, one that must be explicitly enabled, reduces the amount of default tracing and thereby helps reduce fragmentation without causing much loss of functionality. Use D_RPCTRACE so that it matches the corresponding open debug message in mdc_set_open_replay_data(). 
Cray-bug-id: LUS-7560 WC-bug-id: https://jira.whamcloud.com/browse/LU-12524 Lustre-commit: 076a5961f20b ("LU-12524 libcfs: Reduce memory frag due to HA debug msg") Signed-off-by: Ann Koehler Reviewed-on: https://review.whamcloud.com/35449 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_request.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index a26efa1..7bc6196 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -937,7 +937,7 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data, mod->mod_close_req = req; - DEBUG_REQ(D_HA, mod->mod_open_req, "matched open"); + DEBUG_REQ(D_RPCTRACE, mod->mod_open_req, "matched open"); /* We no longer want to preserve this open for replay even * though the open was committed. b=3632, b=3633 */ From patchwork Thu Feb 27 21:14:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410447 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 739D5138D for ; Thu, 27 Feb 2020 21:38:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5C33C24690 for ; Thu, 27 Feb 2020 21:38:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5C33C24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 7CDB234A41A; Thu, 27 Feb 2020 13:31:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B18FE21FCB4 for ; Thu, 27 Feb 2020 13:20:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F0A668F04; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EF43546A; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:25 -0500 Message-Id: <1582838290-17243-398-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 397/622] lustre: ptlrpc: change IMPORT_SET_* macros into real functions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Make the IMPORT_SET_STATE_NOLOCK and IMPORT_SET_STATE macros into normal functions. Since import_set_state_nolock() is basically a wrapper around __import_set_state() we can merge both functions. 
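The macro-to-function conversion described above can be sketched in userspace C. This is only an illustration under stated assumptions: the demo_* names, the reduced state enum, and the atomic_flag lock standing in for imp_lock are all hypothetical, and the guard is assumed to test the import's current state, as the old IMPORT_SET_STATE_NOLOCK macro did.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the kernel types; not the Lustre definitions. */
enum demo_state { DEMO_NEW, DEMO_CONNECTING, DEMO_FULL, DEMO_CLOSED };

struct demo_import {
	atomic_flag lock;	/* stands in for imp_lock (a spinlock in the kernel) */
	enum demo_state state;
};

/* Caller must hold imp->lock. A CLOSED import stays CLOSED. */
static void demo_set_state_nolock(struct demo_import *imp,
				  enum demo_state state)
{
	if (imp->state == DEMO_CLOSED)
		return;
	imp->state = state;
}

/* Locked wrapper: the function equivalent of a lock/set/unlock macro. */
static void demo_set_state(struct demo_import *imp, enum demo_state state)
{
	while (atomic_flag_test_and_set(&imp->lock))
		;	/* spin until the flag is released */
	demo_set_state_nolock(imp, state);
	atomic_flag_clear(&imp->lock);
}
```

Compared with the macro form, the functions evaluate their arguments exactly once and get normal type checking, which is presumably part of the motivation for the change.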
WC-bug-id: https://jira.whamcloud.com/browse/LU-10756 Lustre-commit: cf78502e48d6 ("LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/35463 Reviewed-by: Neil Brown Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/import.c | 96 +++++++++++++++++++++++------------------------ 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index f8e15f2..98c09f6 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -58,10 +58,10 @@ struct ptlrpc_connect_async_args { /** * Updates import @imp current state to provided @state value - * Helper function. Must be called under imp_lock. + * Helper function. */ -static void __import_set_state(struct obd_import *imp, - enum lustre_imp_state state) +static void import_set_state_nolock(struct obd_import *imp, + enum lustre_imp_state state) { switch (state) { case LUSTRE_IMP_CLOSED: @@ -74,6 +74,18 @@ static void __import_set_state(struct obd_import *imp, break; default: imp->imp_replay_state = LUSTRE_IMP_REPLAY; + break; + } + + /* A CLOSED import should remain so. */ + if (imp->imp_state == LUSTRE_IMP_CLOSED) + return; + + if (imp->imp_state != LUSTRE_IMP_NEW) { + CDEBUG(D_HA, "%p %s: changing import state from %s to %s\n", + imp, obd2cli_tgt(imp->imp_obd), + ptlrpc_import_state_name(imp->imp_state), + ptlrpc_import_state_name(state)); } imp->imp_state = state; @@ -84,24 +96,13 @@ static void __import_set_state(struct obd_import *imp, IMP_STATE_HIST_LEN; } -/* A CLOSED import should remain so. 
*/ -#define IMPORT_SET_STATE_NOLOCK(imp, state) \ -do { \ - if (imp->imp_state != LUSTRE_IMP_CLOSED) { \ - CDEBUG(D_HA, "%p %s: changing import state from %s to %s\n", \ - imp, obd2cli_tgt(imp->imp_obd), \ - ptlrpc_import_state_name(imp->imp_state), \ - ptlrpc_import_state_name(state)); \ - __import_set_state(imp, state); \ - } \ -} while (0) - -#define IMPORT_SET_STATE(imp, state) \ -do { \ - spin_lock(&imp->imp_lock); \ - IMPORT_SET_STATE_NOLOCK(imp, state); \ - spin_unlock(&imp->imp_lock); \ -} while (0) +static void import_set_state(struct obd_import *imp, + enum lustre_imp_state new_state) +{ + spin_lock(&imp->imp_lock); + import_set_state_nolock(imp, new_state); + spin_unlock(&imp->imp_lock); +} static int ptlrpc_connect_interpret(const struct lu_env *env, struct ptlrpc_request *request, @@ -180,7 +181,7 @@ int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt) target_len, target_start, obd_import_nid2str(imp)); } - IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_DISCON); + import_set_state_nolock(imp, LUSTRE_IMP_DISCON); spin_unlock(&imp->imp_lock); if (obd_dump_on_timeout) @@ -629,7 +630,7 @@ int ptlrpc_connect_import(struct obd_import *imp) return -EALREADY; } - IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_CONNECTING); + import_set_state_nolock(imp, LUSTRE_IMP_CONNECTING); imp->imp_conn_cnt++; imp->imp_resend_replay = 0; @@ -742,7 +743,7 @@ int ptlrpc_connect_import(struct obd_import *imp) rc = 0; out: if (rc != 0) - IMPORT_SET_STATE(imp, LUSTRE_IMP_DISCON); + import_set_state(imp, LUSTRE_IMP_DISCON); return rc; } @@ -1094,9 +1095,9 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, if (msg_flags & MSG_CONNECT_RECOVERING) { CDEBUG(D_HA, "connect to %s during recovery\n", obd2cli_tgt(imp->imp_obd)); - IMPORT_SET_STATE(imp, LUSTRE_IMP_REPLAY_LOCKS); + import_set_state(imp, LUSTRE_IMP_REPLAY_LOCKS); } else { - IMPORT_SET_STATE(imp, LUSTRE_IMP_FULL); + import_set_state(imp, LUSTRE_IMP_FULL); ptlrpc_activate_import(imp); } @@ -1149,8 +1150,8 @@ 
static int ptlrpc_connect_interpret(const struct lu_env *env, imp->imp_remote_handle = *lustre_msg_get_handle(request->rq_repmsg); - if (!(msg_flags & MSG_CONNECT_RECOVERING)) { - IMPORT_SET_STATE(imp, LUSTRE_IMP_EVICTED); + if (!(MSG_CONNECT_RECOVERING & msg_flags)) { + import_set_state(imp, LUSTRE_IMP_EVICTED); rc = 0; goto finish; } @@ -1162,11 +1163,10 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, } if (imp->imp_invalid) { - CDEBUG(D_HA, - "%s: reconnected but import is invalid; marking evicted\n", - imp->imp_obd->obd_name); - IMPORT_SET_STATE(imp, LUSTRE_IMP_EVICTED); - } else if (msg_flags & MSG_CONNECT_RECOVERING) { + CDEBUG(D_HA, "%s: reconnected but import is invalid; " + "marking evicted\n", imp->imp_obd->obd_name); + import_set_state(imp, LUSTRE_IMP_EVICTED); + } else if (MSG_CONNECT_RECOVERING & msg_flags) { CDEBUG(D_HA, "%s: reconnected to %s during replay\n", imp->imp_obd->obd_name, obd2cli_tgt(imp->imp_obd)); @@ -1175,9 +1175,9 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, imp->imp_resend_replay = 1; spin_unlock(&imp->imp_lock); - IMPORT_SET_STATE(imp, imp->imp_replay_state); + import_set_state(imp, imp->imp_replay_state); } else { - IMPORT_SET_STATE(imp, LUSTRE_IMP_RECOVER); + import_set_state(imp, LUSTRE_IMP_RECOVER); } } else if ((msg_flags & MSG_CONNECT_RECOVERING) && !imp->imp_invalid) { LASSERT(imp->imp_replayable); @@ -1185,14 +1185,14 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, *lustre_msg_get_handle(request->rq_repmsg); imp->imp_last_replay_transno = 0; imp->imp_replay_cursor = &imp->imp_committed_list; - IMPORT_SET_STATE(imp, LUSTRE_IMP_REPLAY); + import_set_state(imp, LUSTRE_IMP_REPLAY); } else { DEBUG_REQ(D_HA, request, "%s: evicting (reconnect/recover flags not set: %x)", imp->imp_obd->obd_name, msg_flags); imp->imp_remote_handle = *lustre_msg_get_handle(request->rq_repmsg); - IMPORT_SET_STATE(imp, LUSTRE_IMP_EVICTED); + import_set_state(imp, LUSTRE_IMP_EVICTED); } /* Sanity 
checks for a reconnected import. */ @@ -1232,7 +1232,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, class_export_put(exp); if (rc != 0) { - IMPORT_SET_STATE(imp, LUSTRE_IMP_DISCON); + import_set_state(imp, LUSTRE_IMP_DISCON); if (rc == -EACCES) { /* * Give up trying to reconnect @@ -1268,7 +1268,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, OBD_OCD_VERSION_FIX(ocd->ocd_version), LUSTRE_VERSION_STRING); ptlrpc_deactivate_import(imp); - IMPORT_SET_STATE(imp, LUSTRE_IMP_CLOSED); + import_set_state(imp, LUSTRE_IMP_CLOSED); } return -EPROTO; } @@ -1367,7 +1367,7 @@ static int ptlrpc_invalidate_import_thread(void *data) libcfs_debug_dumplog(); } - IMPORT_SET_STATE(imp, LUSTRE_IMP_RECOVER); + import_set_state(imp, LUSTRE_IMP_RECOVER); ptlrpc_import_recovery_state_machine(imp); class_import_put(imp); @@ -1448,7 +1448,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) rc = ptlrpc_replay_next(imp, &inflight); if (inflight == 0 && atomic_read(&imp->imp_replay_inflight) == 0) { - IMPORT_SET_STATE(imp, LUSTRE_IMP_REPLAY_LOCKS); + import_set_state(imp, LUSTRE_IMP_REPLAY_LOCKS); rc = ldlm_replay_locks(imp); if (rc) goto out; @@ -1458,7 +1458,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) if (imp->imp_state == LUSTRE_IMP_REPLAY_LOCKS) if (atomic_read(&imp->imp_replay_inflight) == 0) { - IMPORT_SET_STATE(imp, LUSTRE_IMP_REPLAY_WAIT); + import_set_state(imp, LUSTRE_IMP_REPLAY_WAIT); rc = signal_completed_replay(imp); if (rc) goto out; @@ -1466,7 +1466,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) if (imp->imp_state == LUSTRE_IMP_REPLAY_WAIT) if (atomic_read(&imp->imp_replay_inflight) == 0) - IMPORT_SET_STATE(imp, LUSTRE_IMP_RECOVER); + import_set_state(imp, LUSTRE_IMP_RECOVER); if (imp->imp_state == LUSTRE_IMP_RECOVER) { CDEBUG(D_HA, "reconnected to %s@%s\n", @@ -1476,7 +1476,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) rc = ptlrpc_resend(imp); if (rc) 
goto out; - IMPORT_SET_STATE(imp, LUSTRE_IMP_FULL); + import_set_state(imp, LUSTRE_IMP_FULL); ptlrpc_activate_import(imp); deuuidify(obd2cli_tgt(imp->imp_obd), NULL, @@ -1536,7 +1536,7 @@ static struct ptlrpc_request *ptlrpc_disconnect_prep_req(struct obd_import *imp) req->rq_timeout = min_t(int, req->rq_timeout, INITIAL_CONNECT_TIMEOUT); - IMPORT_SET_STATE(imp, LUSTRE_IMP_CONNECTING); + import_set_state(imp, LUSTRE_IMP_CONNECTING); req->rq_send_state = LUSTRE_IMP_CONNECTING; ptlrpc_request_set_replen(req); @@ -1601,9 +1601,9 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose) spin_lock(&imp->imp_lock); out: if (noclose) - IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_DISCON); + import_set_state_nolock(imp, LUSTRE_IMP_DISCON); else - IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_CLOSED); + import_set_state_nolock(imp, LUSTRE_IMP_CLOSED); memset(&imp->imp_remote_handle, 0, sizeof(imp->imp_remote_handle)); spin_unlock(&imp->imp_lock); @@ -1657,7 +1657,7 @@ static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, if (atomic_read(&imp->imp_inflight) > 1) { imp->imp_generation++; imp->imp_initiated_at = imp->imp_generation; - IMPORT_SET_STATE_NOLOCK(imp, LUSTRE_IMP_NEW); + import_set_state_nolock(imp, LUSTRE_IMP_NEW); ptlrpc_reset_reqs_generation(imp); connect = 1; } From patchwork Thu Feb 27 21:14:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410451 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EFC50138D for ; Thu, 27 Feb 2020 21:38:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D8DF624690 for ; Thu, 27 Feb 2020 21:38:45 
+0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D8DF624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B814734A444; Thu, 27 Feb 2020 13:31:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 18AF821FC7B for ; Thu, 27 Feb 2020 13:20:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F35648F05; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F207C46C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:26 -0500 Message-Id: <1582838290-17243-399-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 398/622] lustre: uapi: add unused enum obd_statfs_state X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The 3rd and 4th bit fields of enum obd_statfs_state are for values that have been obsolete since Lustre 1.6. Let's make this clear for end-user applications. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12501 Lustre-commit: e4d92a8a08ac ("LU-12501 utils: fix 'lfs df' printing loop") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/35456 Reviewed-by: James Simmons Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 86f3111..9c849ce 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -102,6 +102,8 @@ enum obd_statfs_state { OS_STATE_DEGRADED = 0x00000001, /**< RAID degraded/rebuilding */ OS_STATE_READONLY = 0x00000002, /**< filesystem is read-only */ OS_STATE_NOPRECREATE = 0x00000004, /**< no object precreation */ + OS_STATE_UNUSED1 = 0x00000008, /**< obsolete 1.6, was EROFS=30 */ + OS_STATE_UNUSED2 = 0x00000010, /**< obsolete 1.6, was EROFS=30 */ OS_STATE_ENOSPC = 0x00000020, /**< not enough free space */ OS_STATE_ENOINO = 0x00000040, /**< not enough inodes */ OS_STATE_SUM = 0x00000100, /**< aggregated for all tagrets */ From patchwork Thu Feb 27 21:14:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410455 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BBE317E0 for ; Thu, 27 Feb 2020 21:38:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 54AF224690 for ; Thu, 27 Feb 2020 21:38:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 54AF224690 Authentication-Results: mail.kernel.org; dmarc=none 
(p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BFB58349902; Thu, 27 Feb 2020 13:31:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5A06F21FDE7 for ; Thu, 27 Feb 2020 13:20:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 020F08F06; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 00A9246D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:27 -0500 Message-Id: <1582838290-17243-400-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 399/622] lustre: llite: create obd_device with usercopy whitelist X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang Since kernel 4.16 hardened usercopy has been added, whitelist the struct obd_device to silence the warning. Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'll_obd_dev_cache' (offset 1256, size 40)! 
WARNING: CPU: 1 PID: 17534 at mm/usercopy.c:83 usercopy_warn+0x7d/0xa0 Call Trace: __check_object_size+0xfa/0x181 lmv_iocontrol+0x1146/0x1880 [lmv] ll_obd_statfs+0x356/0x860 [lustre] ll_dir_ioctl+0x1e37/0x6760 [lustre] do_vfs_ioctl+0xa4/0x630 Linux-commit: 8eb8284b412906181357c2b0110d879d5af95e52 WC-bug-id: https://jira.whamcloud.com/browse/LU-12331 Lustre-commit: e34c59812abf ("LU-12331 llite: create obd_device with usercopy whitelist") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/34946 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/genops.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 2b1175f..49db077 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -648,9 +648,11 @@ void obd_cleanup_caches(void) int obd_init_caches(void) { LASSERT(!obd_device_cachep); - obd_device_cachep = kmem_cache_create("ll_obd_dev_cache", - sizeof(struct obd_device), - 0, 0, NULL); + obd_device_cachep = kmem_cache_create_usercopy("ll_obd_dev_cache", + sizeof(struct obd_device), + 0, 0, 0, + sizeof(struct obd_device), + NULL); if (!obd_device_cachep) goto out; From patchwork Thu Feb 27 21:14:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410459 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9D373138D for ; Thu, 27 Feb 2020 21:38:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 857AF246A1 for ; Thu, 27 Feb 2020 21:38:58 
+0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 857AF246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 67BBC34A49B; Thu, 27 Feb 2020 13:31:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B4BD21FCA4 for ; Thu, 27 Feb 2020 13:20:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 06F3D8F07; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 05E68468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:28 -0500 Message-Id: <1582838290-17243-401-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 400/622] lnet: warn if discovery is off X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Output a warning if discovery is off and the admin is trying either to add a route or to enable routing. WC-bug-id: https://jira.whamcloud.com/browse/LU-12427 Lustre-commit: c9718be06192 ("LU-12427 lnet: warn if discovery is off") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/35200 Reviewed-by: Olaf Weber Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/router.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index f7b53e0..eb76c72 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -519,6 +519,8 @@ static void lnet_shuffle_seed(void) if (add_route) { gw->lp_health_sensitivity = sensitivity; lnet_add_route_to_rnet(rnet2, route); + if (lnet_peer_discovery_disabled) + CWARN("Consider turning discovery on to enable full Multi-Rail routing functionality\n"); } /* get rid of the reference on the lpni. 
@@ -1379,6 +1381,9 @@ bool lnet_router_checker_active(void) ~LNET_PING_FEAT_RTE_DISABLED; lnet_net_unlock(LNET_LOCK_EX); + if (lnet_peer_discovery_disabled) + CWARN("Consider turning discovery on to enable full Multi-Rail routing functionality\n"); + return rc; } From patchwork Thu Feb 27 21:14:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410685 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 51062138D for ; Thu, 27 Feb 2020 21:44:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 39A6D24690 for ; Thu, 27 Feb 2020 21:44:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 39A6D24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 149E434A085; Thu, 27 Feb 2020 13:35:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E0ABF21FCA4 for ; Thu, 27 Feb 2020 13:20:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 07D878F08; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 06AA347C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons 
To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:29 -0500 Message-Id: <1582838290-17243-402-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 401/622] lustre: ldlm: always cancel aged locks regardless enabling or disabling lru resize X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Gu Zheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Gu Zheng Currently, cancelling aged locks is handled by the ldlm_pool_recalc routine, and it only works when lru resize is enabled. This means that if lru resize is disabled, aged locks stay cached even after they reach ns_max_age. In principle, lru_max_age should behave the same whether or not lru resize is enabled. lru_size acts as a hard limit on the number of locks, while ns_max_age/lru_max_age is an elimination mechanism that applies regardless of lru resize: once a lock reaches lru_max_age, it needs to be cancelled. Fix this by choosing the lru flags passed to ldlm_cancel_lru, which does the real cancel work: if lru resize is enabled, set the flag to LDLM_LRU_FLAG_LRUR, otherwise to LDLM_LRU_FLAG_AGED. 
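The flag selection described above can be sketched as a small standalone helper. This is a hedged illustration only: the demo_* names and the bit values are hypothetical and do not reproduce the ldlm_internal.h definitions, and the boolean parameter stands in for the ns_connect_lru_resize() check.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the ldlm_internal.h ones. */
enum demo_lru_flags {
	DEMO_LRU_FLAG_AGED = 1 << 0,	/* cancel locks older than the max age */
	DEMO_LRU_FLAG_LRUR = 1 << 1,	/* cancel according to lru resize policy */
};

/*
 * Pick the cancel policy: the lru resize policy when resize is enabled,
 * otherwise fall back to cancelling aged locks so they are still evicted.
 */
static enum demo_lru_flags demo_pick_lru_flags(bool lru_resize_enabled)
{
	return lru_resize_enabled ? DEMO_LRU_FLAG_LRUR : DEMO_LRU_FLAG_AGED;
}
```

Returning a named enum rather than a plain int lets the compiler flag callers that mix up the cancel flags and the lru flags.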
Change lru_flags into a proper enum WC-bug-id: https://jira.whamcloud.com/browse/LU-11672 Lustre-commit: e4c490bac770 ("LU-11672 ldlm: awalys cancel aged locks regardless enabling or disabling lru resize") Signed-off-by: Gu Zheng Reviewed-on: https://review.whamcloud.com/35467 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_internal.h | 8 +++++--- fs/lustre/ldlm/ldlm_pool.c | 14 +++++++------- fs/lustre/ldlm/ldlm_request.c | 40 ++++++++++++++++++++++------------------ 3 files changed, 34 insertions(+), 28 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index 3789496..4844a9b 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -87,7 +87,7 @@ void ldlm_namespace_move_to_inactive_locked(struct ldlm_namespace *ns, /* ldlm_request.c */ /* Cancel lru flag, it indicates we cancel aged locks. */ -enum { +enum ldlm_lru_flags { LDLM_LRU_FLAG_AGED = BIT(0), /* Cancel old non-LRU resize locks */ LDLM_LRU_FLAG_PASSED = BIT(1), /* Cancel passed number of locks. */ LDLM_LRU_FLAG_SHRINK = BIT(2), /* Cancel locks from shrinker. 
*/ @@ -104,10 +104,12 @@ enum { }; int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr, - enum ldlm_cancel_flags sync, int flags); + enum ldlm_cancel_flags cancel_flags, + enum ldlm_lru_flags lru_flags); int ldlm_cancel_lru_local(struct ldlm_namespace *ns, struct list_head *cancels, int count, int max, - enum ldlm_cancel_flags cancel_flags, int flags); + enum ldlm_cancel_flags cancel_flags, + enum ldlm_lru_flags lru_flags); extern unsigned int ldlm_enqueue_min; extern unsigned int ldlm_cancel_unused_locks_before_replay; diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c index b2b3ead..9185dc93 100644 --- a/fs/lustre/ldlm/ldlm_pool.c +++ b/fs/lustre/ldlm/ldlm_pool.c @@ -255,6 +255,7 @@ static void ldlm_cli_pool_pop_slv(struct ldlm_pool *pl) static int ldlm_cli_pool_recalc(struct ldlm_pool *pl) { time64_t recalc_interval_sec; + enum ldlm_lru_flags lru_flags; int ret; recalc_interval_sec = ktime_get_real_seconds() - pl->pl_recalc_time; @@ -279,13 +280,13 @@ static int ldlm_cli_pool_recalc(struct ldlm_pool *pl) spin_unlock(&pl->pl_lock); /* - * Do not cancel locks in case lru resize is disabled for this ns. + * Cancel aged locks if lru resize is disabled for this ns. */ if (!ns_connect_lru_resize(container_of(pl, struct ldlm_namespace, - ns_pool))) { - ret = 0; - goto out; - } + ns_pool))) + lru_flags = LDLM_LRU_FLAG_LRUR; + else + lru_flags = LDLM_LRU_FLAG_AGED; /* * In the time of canceling locks on client we do not need to maintain @@ -294,9 +295,8 @@ static int ldlm_cli_pool_recalc(struct ldlm_pool *pl) * take into account pl->pl_recalc_time here. 
*/ ret = ldlm_cancel_lru(container_of(pl, struct ldlm_namespace, ns_pool), - 0, LCF_ASYNC, LDLM_LRU_FLAG_LRUR); + 0, LCF_ASYNC, lru_flags); -out: spin_lock(&pl->pl_lock); /* * Time of LRU resizing might be longer than period, diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 5a7026d..75492f6 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -590,7 +590,8 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req, struct ldlm_namespace *ns = exp->exp_obd->obd_namespace; struct req_capsule *pill = &req->rq_pill; struct ldlm_request *dlm = NULL; - int flags, avail, to_free, pack = 0; + enum ldlm_lru_flags lru_flags; + int avail, to_free, pack = 0; LIST_HEAD(head); int rc; @@ -601,9 +602,9 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req, req_capsule_filled_sizes(pill, RCL_CLIENT); avail = ldlm_capsule_handles_avail(pill, RCL_CLIENT, canceloff); - flags = LDLM_LRU_FLAG_NO_WAIT | - (ns_connect_lru_resize(ns) ? - LDLM_LRU_FLAG_LRUR : LDLM_LRU_FLAG_AGED); + lru_flags = LDLM_LRU_FLAG_NO_WAIT | + (ns_connect_lru_resize(ns) ? + LDLM_LRU_FLAG_LRUR : LDLM_LRU_FLAG_AGED); to_free = !ns_connect_lru_resize(ns) && opc == LDLM_ENQUEUE ? 1 : 0; @@ -614,7 +615,8 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req, */ if (avail > count) count += ldlm_cancel_lru_local(ns, cancels, to_free, - avail - count, 0, flags); + avail - count, 0, + lru_flags); if (avail > count) pack = count; else @@ -1279,7 +1281,8 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, enum ldlm_cancel_flags cancel_flags) { struct obd_export *exp; - int avail, flags, count = 1; + enum ldlm_lru_flags lru_flags; + int avail, count = 1; u64 rc = 0; struct ldlm_namespace *ns; struct ldlm_lock *lock; @@ -1354,10 +1357,10 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, LASSERT(avail > 0); ns = ldlm_lock_to_ns(lock); - flags = ns_connect_lru_resize(ns) ? 
- LDLM_LRU_FLAG_LRUR : LDLM_LRU_FLAG_AGED; + lru_flags = ns_connect_lru_resize(ns) ? + LDLM_LRU_FLAG_LRUR : LDLM_LRU_FLAG_AGED; count += ldlm_cancel_lru_local(ns, &cancels, 0, avail - 1, - LCF_BL_AST, flags); + LCF_BL_AST, lru_flags); } ldlm_cli_cancel_list(&cancels, count, NULL, cancel_flags); return 0; @@ -1593,7 +1596,7 @@ typedef enum ldlm_policy_res (*ldlm_cancel_lru_policy_t)(struct ldlm_namespace * int, int, int); static ldlm_cancel_lru_policy_t -ldlm_cancel_lru_policy(struct ldlm_namespace *ns, int lru_flags) +ldlm_cancel_lru_policy(struct ldlm_namespace *ns, enum ldlm_lru_flags lru_flags) { if (ns_connect_lru_resize(ns)) { if (lru_flags & LDLM_LRU_FLAG_SHRINK) { @@ -1662,16 +1665,16 @@ typedef enum ldlm_policy_res (*ldlm_cancel_lru_policy_t)(struct ldlm_namespace * */ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, struct list_head *cancels, int count, int max, - int flags) + enum ldlm_lru_flags lru_flags) { ldlm_cancel_lru_policy_t pf; int added = 0; - int no_wait = flags & LDLM_LRU_FLAG_NO_WAIT; + int no_wait = lru_flags & LDLM_LRU_FLAG_NO_WAIT; if (!ns_connect_lru_resize(ns)) count += ns->ns_nr_unused - ns->ns_max_unused; - pf = ldlm_cancel_lru_policy(ns, flags); + pf = ldlm_cancel_lru_policy(ns, lru_flags); LASSERT(pf); /* For any flags, stop scanning if @max is reached. 
*/ @@ -1787,7 +1790,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, */ lock->l_flags |= LDLM_FL_CBPENDING | LDLM_FL_CANCELING; - if ((flags & LDLM_LRU_FLAG_CLEANUP) && + if ((lru_flags & LDLM_LRU_FLAG_CLEANUP) && (lock->l_resource->lr_type == LDLM_EXTENT || ldlm_has_dom(lock)) && lock->l_granted_mode == LCK_PR) ldlm_set_discard_data(lock); @@ -1811,11 +1814,12 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, int ldlm_cancel_lru_local(struct ldlm_namespace *ns, struct list_head *cancels, int count, int max, - enum ldlm_cancel_flags cancel_flags, int flags) + enum ldlm_cancel_flags cancel_flags, + enum ldlm_lru_flags lru_flags) { int added; - added = ldlm_prepare_lru_list(ns, cancels, count, max, flags); + added = ldlm_prepare_lru_list(ns, cancels, count, max, lru_flags); if (added <= 0) return added; return ldlm_cli_cancel_list_local(cancels, added, cancel_flags); @@ -1831,7 +1835,7 @@ int ldlm_cancel_lru_local(struct ldlm_namespace *ns, */ int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr, enum ldlm_cancel_flags cancel_flags, - int flags) + enum ldlm_lru_flags lru_flags) { LIST_HEAD(cancels); int count, rc; @@ -1840,7 +1844,7 @@ int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr, * Just prepare the list of locks, do not actually cancel them yet. * Locks are cancelled later in a separate thread. 
*/ - count = ldlm_prepare_lru_list(ns, &cancels, nr, 0, flags); + count = ldlm_prepare_lru_list(ns, &cancels, nr, 0, lru_flags); rc = ldlm_bl_to_thread_list(ns, NULL, &cancels, count, cancel_flags); if (rc == 0) return count; From patchwork Thu Feb 27 21:14:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410747 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A3EF924 for ; Thu, 27 Feb 2020 21:45:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E6F80246A2 for ; Thu, 27 Feb 2020 21:45:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E6F80246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EB34434B0DE; Thu, 27 Feb 2020 13:36:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 423AD21FCA4 for ; Thu, 27 Feb 2020 13:20:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0A1A18F09; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 090AC46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , 
NeilBrown Date: Thu, 27 Feb 2020 16:14:30 -0500 Message-Id: <1582838290-17243-403-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 402/622] lustre: llite: cleanup stats of LPROC_LL_* X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Xi Some LPROC_LL_* stats have not been used for a long time, so this patch removes them. It also renames LPROC_LL_STAFS to LPROC_LL_STATFS. WC-bug-id: https://jira.whamcloud.com/browse/LU-12545 Lustre-commit: 976c1f334fcb ("LU-12545 llite: cleanup stats of LPROC_LL_*") Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/35514 Reviewed-by: Gu Zheng Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 6 +----- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/llite/lproc_llite.c | 8 +------- 3 files changed, 3 insertions(+), 13 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 9d60ae5..a0d631d 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -789,12 +789,8 @@ void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid, void ll_io_init(struct cl_io *io, const struct file *file, int write); enum { - LPROC_LL_DIRTY_HITS, - LPROC_LL_DIRTY_MISSES, LPROC_LL_READ_BYTES, LPROC_LL_WRITE_BYTES, - LPROC_LL_BRW_READ, - LPROC_LL_BRW_WRITE, LPROC_LL_IOCTL, LPROC_LL_OPEN, LPROC_LL_RELEASE, @@ -816,7 +812,7 @@ enum { LPROC_LL_RMDIR, LPROC_LL_MKNOD, LPROC_LL_RENAME, - LPROC_LL_STAFS, + LPROC_LL_STATFS,
LPROC_LL_ALLOC_INODE, LPROC_LL_SETXATTR, LPROC_LL_GETXATTR, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index cc417d6..e0395e5 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1918,7 +1918,7 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) int rc; CDEBUG(D_VFSTRACE, "VFS Op: at %llu jiffies\n", get_jiffies_64()); - ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STAFS, 1); + ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STATFS, 1); /* Some amount of caching on the client is allowed */ rc = ll_statfs_internal(ll_s2sbi(sb), &osfs, OBD_STATFS_SUM); diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 4cffd36..6eb3d33 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1543,16 +1543,10 @@ static void sbi_kobj_release(struct kobject *kobj) const char *opname; } llite_opcode_table[LPROC_LL_FILE_OPCODES] = { /* file operation */ - { LPROC_LL_DIRTY_HITS, LPROCFS_TYPE_REGS, "dirty_pages_hits" }, - { LPROC_LL_DIRTY_MISSES, LPROCFS_TYPE_REGS, "dirty_pages_misses" }, { LPROC_LL_READ_BYTES, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_BYTES, "read_bytes" }, { LPROC_LL_WRITE_BYTES, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_BYTES, "write_bytes" }, - { LPROC_LL_BRW_READ, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_PAGES, - "brw_read" }, - { LPROC_LL_BRW_WRITE, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_PAGES, - "brw_write" }, { LPROC_LL_IOCTL, LPROCFS_TYPE_REGS, "ioctl" }, { LPROC_LL_OPEN, LPROCFS_TYPE_REGS, "open" }, { LPROC_LL_RELEASE, LPROCFS_TYPE_REGS, "close" }, @@ -1577,7 +1571,7 @@ static void sbi_kobj_release(struct kobject *kobj) { LPROC_LL_MKNOD, LPROCFS_TYPE_REGS, "mknod" }, { LPROC_LL_RENAME, LPROCFS_TYPE_REGS, "rename" }, /* special inode operation */ - { LPROC_LL_STAFS, LPROCFS_TYPE_REGS, "statfs" }, + { LPROC_LL_STATFS, LPROCFS_TYPE_REGS, "statfs" }, { LPROC_LL_ALLOC_INODE, LPROCFS_TYPE_REGS, "alloc_inode" }, { LPROC_LL_SETXATTR, LPROCFS_TYPE_REGS, "setxattr" }, { 
LPROC_LL_GETXATTR, LPROCFS_TYPE_REGS, "getxattr" }, From patchwork Thu Feb 27 21:14:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410689 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DCC9A17E0 for ; Thu, 27 Feb 2020 21:44:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C5AFF24690 for ; Thu, 27 Feb 2020 21:44:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C5AFF24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BB1DB3498D4; Thu, 27 Feb 2020 13:35:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 98A5921FCA4 for ; Thu, 27 Feb 2020 13:20:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0CCFC8F0A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0BBC946C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:31 -0500 Message-Id: <1582838290-17243-404-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 403/622] lustre: osc: Do not assert for first extent X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell In the discard case, the OSC fsync/writeback code asserts that each OSC extent is fully covered by the fsync request. This is not valid for the DOM case, because OSC extent alignment requirements can create OSC extents which start before the OST region of the layout (i.e., they cross into the DOM region). This is OK because the layout prevents them from ever being used for i/o, but this same behavior means that the OSC fsync start/end is aligned with the layout, and so does not necessarily cover that first extent. The simplest solution is just to not assert on the first extent. (There is no way at the OSC layer to recognize the DOM case.)
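As a rough illustration of "do not assert on the first extent", the relaxed check can be written as a logical implication: if the extent is not the first one, it must lie inside [start, end]. The sketch below assumes an implication helper defined as `!(a) || (b)`; `sketch_ergo`, `struct extent_sketch` and `extent_check_ok()` are hypothetical names used only for this example.

```c
#include <stdbool.h>

/* Logical implication: "a implies b" holds unless a is true and b is false. */
#define sketch_ergo(a, b) (!(a) || (b))

/* Hypothetical, simplified extent: an inclusive range of page indices. */
struct extent_sketch {
	unsigned long start;
	unsigned long end;
};

/*
 * Return true when the extent passes the relaxed check: every extent except
 * the first must be fully covered by the requested [start, end] range.
 */
static bool extent_check_ok(const struct extent_sketch *ext, bool is_first,
			    unsigned long start, unsigned long end)
{
	return sketch_ergo(!is_first,
			   ext->start >= start && ext->end <= end);
}
```

With this shape, a first extent that begins before `start` passes, while the same range on any later extent would still trip the assertion.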
WC-bug-id: https://jira.whamcloud.com/browse/LU-12462 Lustre-commit: 092ecd66127e ("LU-12462 osc: Do not assert for first extent") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35525 Reviewed-by: Mike Pershin Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 3b4c598..9e2f90d 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2931,10 +2931,17 @@ int osc_cache_writeback_range(const struct lu_env *env, struct osc_object *obj, unplug = true; } else { /* the only discarder is lock cancelling, so - * [start, end] must contain this extent + * [start, end] must contain this extent. + * However, with DOM, osc extent alignment may + * cause the first extent to start before the + * OST portion of the layout. This is never + * accessed for i/o, but the unused portion + * will not be covered by the sync request, + * so we cannot assert in that case. 
*/ - EASSERT(ext->oe_start >= start && - ext->oe_end <= end, ext); + EASSERT(ergo(!(ext == first_extent(obj)), + ext->oe_start >= start && + ext->oe_end <= end), ext); osc_extent_state_set(ext, OES_LOCKING); ext->oe_owner = current; list_move_tail(&ext->oe_link, &discard_list); From patchwork Thu Feb 27 21:14:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410363 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 234D792A for ; Thu, 27 Feb 2020 21:36:06 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0BD9824677 for ; Thu, 27 Feb 2020 21:36:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0BD9824677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 58FC521FFD3; Thu, 27 Feb 2020 13:30:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DA5E921FCA4 for ; Thu, 27 Feb 2020 13:20:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0F8228F0B; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0E82346D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: 
James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:32 -0500 Message-Id: <1582838290-17243-405-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 404/622] lustre: llite: MS_* flags and SB_* flags split X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff In kernel 4.20 the MS_* flags should only be used as mount-time flags, and the SB_* flags for checking super_block.s_flags. The MS_* flags have moved to a uapi header. Change the one use that was missed, MS_NOSEC, to SB_NOSEC.
WC-bug-id: https://jira.whamcloud.com/browse/LU-12355 Lustre-commit:72a84970e6d2a ("LU-12355 llite: MS_* flags and SB_* flags split") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/35019 Reviewed-by: Andreas Dilger Reviewed-by: Petros Koutoupis Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index e0395e5..3e058d2 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -270,7 +270,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) /* Setting this indicates we correctly support S_NOSEC (See kernel * commit 9e1f1de02c2275d7172e18dc4e7c2065777611bf) */ - sb->s_flags |= MS_NOSEC; + sb->s_flags |= SB_NOSEC; if (sbi->ll_flags & LL_SBI_FLOCK) sbi->ll_fop = &ll_file_operations_flock; From patchwork Thu Feb 27 21:14:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410461 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BDE7F92A for ; Thu, 27 Feb 2020 21:39:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A630C24690 for ; Thu, 27 Feb 2020 21:39:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A630C24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id 3E3EF34A4C3; Thu, 27 Feb 2020 13:31:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2D3D621FCA4 for ; Thu, 27 Feb 2020 13:20:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 127C38F0C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 114D4496; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:33 -0500 Message-Id: <1582838290-17243-406-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 405/622] lustre: llite: improve ll_dom_lock_cancel X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vladimir Saveliev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vladimir Saveliev ll_dom_lock_cancel() should zero the kms attribute, as mdc_ldlm_blocking_ast0() does. To avoid code duplication between mdc_ldlm_blocking_ast0() and ll_dom_lock_cancel(), add a cl_object_operations method so that mdc's blocking ast can be reached from the llite level. A test illustrating the issue is added.
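The idea of reaching a lower layer through an optional per-layer method can be sketched as follows. This is a simplified model, not the cl_object code: the layer list is a plain singly linked list, and `struct layer_sketch`, `layer_ops_sketch`, `bottom_flush()` and `object_flush()` are hypothetical names chosen for the example.

```c
#include <stddef.h>

struct layer_sketch;

/* Per-layer operations; a layer with nothing to flush leaves .flush NULL. */
struct layer_ops_sketch {
	int (*flush)(struct layer_sketch *layer);
};

/* One layer of a stacked object, linked from the top layer downwards. */
struct layer_sketch {
	const struct layer_ops_sketch *ops;
	int flushed;               /* set by a layer that actually flushed */
	struct layer_sketch *next; /* next lower layer, or NULL at the bottom */
};

/* Example bottom-layer method: records that it ran and reports success. */
static int bottom_flush(struct layer_sketch *layer)
{
	layer->flushed = 1;
	return 0;
}

static const struct layer_ops_sketch top_ops = { .flush = NULL };
static const struct layer_ops_sketch bottom_ops = { .flush = bottom_flush };

/*
 * Walk the layers from the top, call each layer's flush method if it has
 * one, and stop at the first non-zero return code.
 */
static int object_flush(struct layer_sketch *top)
{
	struct layer_sketch *layer;
	int rc = 0;

	for (layer = top; layer != NULL; layer = layer->next) {
		if (layer->ops->flush) {
			rc = layer->ops->flush(layer);
			if (rc)
				break;
		}
	}
	return rc;
}
```

The caller at the top only knows the generic entry point; which layer does the real work is decided by which layers fill in the method.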
Cray-bug-id: LUS-7118 WC-bug-id: https://jira.whamcloud.com/browse/LU-12296 Lustre-commit: 707bab62f5d6 ("LU-12296 llite: improve ll_dom_lock_cancel") Signed-off-by: Vladimir Saveliev Reviewed-on: https://review.whamcloud.com/34858 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 10 ++++++++++ fs/lustre/llite/namei.c | 25 +++++-------------------- fs/lustre/lov/lov_object.c | 26 +++++++++++++++++++++++++- fs/lustre/mdc/mdc_dev.c | 11 +++++++++-- fs/lustre/obdclass/cl_object.c | 17 +++++++++++++++++ 5 files changed, 66 insertions(+), 23 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 5096025..7ac0dd2 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -417,6 +417,13 @@ struct cl_object_operations { void (*coo_req_attr_set)(const struct lu_env *env, struct cl_object *obj, struct cl_req_attr *attr); + /** + * Flush \a obj data corresponding to \a lock. Used for DoM + * locks in llite's cancelling blocking ast callback. + */ + int (*coo_object_flush)(const struct lu_env *env, + struct cl_object *obj, + struct ldlm_lock *lock); }; /** @@ -2108,6 +2115,9 @@ int cl_object_fiemap(const struct lu_env *env, struct cl_object *obj, int cl_object_layout_get(const struct lu_env *env, struct cl_object *obj, struct cl_layout *cl); loff_t cl_object_maxbytes(struct cl_object *obj); +int cl_object_flush(const struct lu_env *env, struct cl_object *obj, + struct ldlm_lock *lock); + /** * Returns true, iff @o0 and @o1 are slices of the same object. 
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 49433c9..71e757a 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -177,13 +177,12 @@ int ll_test_inode_by_fid(struct inode *inode, void *opaque) return lu_fid_eq(&ll_i2info(inode)->lli_fid, opaque); } -int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) +static int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) { struct lu_env *env; struct ll_inode_info *lli = ll_i2info(inode); - struct cl_layout clt = { .cl_layout_gen = 0, }; - int rc; u16 refcheck; + int rc; if (!lli->lli_clob) { /* Due to DoM read on open, there may exist pages for Lustre @@ -197,28 +196,14 @@ int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) if (IS_ERR(env)) return PTR_ERR(env); - rc = cl_object_layout_get(env, lli->lli_clob, &clt); - if (rc) { - CDEBUG(D_INODE, "Cannot get layout for "DFID"\n", - PFID(ll_inode2fid(inode))); - rc = -ENODATA; - } else if (clt.cl_size == 0 || clt.cl_dom_comp_size == 0) { - CDEBUG(D_INODE, "DOM lock without DOM layout for "DFID"\n", - PFID(ll_inode2fid(inode))); - } else { - enum cl_fsync_mode mode; - loff_t end = clt.cl_dom_comp_size - 1; + /* reach MDC layer to flush data under the DoM ldlm lock */ + rc = cl_object_flush(env, lli->lli_clob, lock); - mode = ldlm_is_discard_data(lock) ? 
- CL_FSYNC_DISCARD : CL_FSYNC_LOCAL; - rc = cl_sync_file_range(inode, 0, end, mode, 1); - truncate_inode_pages_range(inode->i_mapping, 0, end); - } cl_env_put(env, &refcheck); return rc; } -void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) +static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) { struct inode *inode = ll_inode_from_resource_lock(lock); struct ll_inode_info *lli; diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 792d946..52d8c30 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -75,6 +75,8 @@ struct lov_layout_operations { struct cl_object *obj, struct cl_io *io); int (*llo_getattr)(const struct lu_env *env, struct cl_object *obj, struct cl_attr *attr); + int (*llo_flush)(const struct lu_env *env, struct cl_object *obj, + struct ldlm_lock *lock); }; static int lov_layout_wait(const struct lu_env *env, struct lov_object *lov); @@ -1021,7 +1023,21 @@ static int lov_attr_get_composite(const struct lu_env *env, return 0; } -static const struct lov_layout_operations lov_dispatch[] = { +static int lov_flush_composite(const struct lu_env *env, + struct cl_object *obj, + struct ldlm_lock *lock) +{ + struct lov_object *lov = cl2lov(obj); + struct lovsub_object *lovsub; + + if (!lsme_is_dom(lov->lo_lsm->lsm_entries[0])) + return -EINVAL; + + lovsub = lov->u.composite.lo_entries[0].lle_dom.lo_dom; + return cl_object_flush(env, lovsub2cl(lovsub), lock); +} + +const static struct lov_layout_operations lov_dispatch[] = { [LLT_EMPTY] = { .llo_init = lov_init_empty, .llo_delete = lov_delete_empty, @@ -1051,6 +1067,7 @@ static int lov_attr_get_composite(const struct lu_env *env, .llo_lock_init = lov_lock_init_composite, .llo_io_init = lov_io_init_composite, .llo_getattr = lov_attr_get_composite, + .llo_flush = lov_flush_composite, }, [LLT_FOREIGN] = { .llo_init = lov_init_foreign, @@ -2083,6 +2100,12 @@ static loff_t lov_object_maxbytes(struct cl_object *obj) return maxbytes; } +static 
int lov_object_flush(const struct lu_env *env, struct cl_object *obj, + struct ldlm_lock *lock) +{ + return LOV_2DISPATCH_NOLOCK(cl2lov(obj), llo_flush, env, obj, lock); +} + static const struct cl_object_operations lov_ops = { .coo_page_init = lov_page_init, .coo_lock_init = lov_lock_init, @@ -2094,6 +2117,7 @@ static loff_t lov_object_maxbytes(struct cl_object *obj) .coo_layout_get = lov_object_layout_get, .coo_maxbytes = lov_object_maxbytes, .coo_fiemap = lov_object_fiemap, + .coo_object_flush = lov_object_flush }; static const struct lu_object_operations lov_lu_obj_ops = { diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 14cece1..df8bb33 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -286,7 +286,7 @@ void mdc_lock_lockless_cancel(const struct lu_env *env, */ static int mdc_dlm_blocking_ast0(const struct lu_env *env, struct ldlm_lock *dlmlock, - void *data, int flag) + int flag) { struct cl_object *obj = NULL; int result = 0; @@ -375,7 +375,7 @@ int mdc_ldlm_blocking_ast(struct ldlm_lock *dlmlock, break; } - rc = mdc_dlm_blocking_ast0(env, dlmlock, data, flag); + rc = mdc_dlm_blocking_ast0(env, dlmlock, flag); cl_env_put(env, &refcheck); break; } @@ -1382,6 +1382,12 @@ int mdc_object_prune(const struct lu_env *env, struct cl_object *obj) return 0; } +static int mdc_object_flush(const struct lu_env *env, struct cl_object *obj, + struct ldlm_lock *lock) +{ + return mdc_dlm_blocking_ast0(env, lock, LDLM_CB_CANCELING); +} + static const struct cl_object_operations mdc_ops = { .coo_page_init = osc_page_init, .coo_lock_init = mdc_lock_init, @@ -1391,6 +1397,7 @@ int mdc_object_prune(const struct lu_env *env, struct cl_object *obj) .coo_glimpse = osc_object_glimpse, .coo_req_attr_set = mdc_req_attr_set, .coo_prune = mdc_object_prune, + .coo_object_flush = mdc_object_flush }; static const struct osc_object_operations mdc_object_ops = { diff --git a/fs/lustre/obdclass/cl_object.c b/fs/lustre/obdclass/cl_object.c index 
f0ae34f..b323eb4 100644 --- a/fs/lustre/obdclass/cl_object.c +++ b/fs/lustre/obdclass/cl_object.c @@ -389,6 +389,23 @@ loff_t cl_object_maxbytes(struct cl_object *obj) } EXPORT_SYMBOL(cl_object_maxbytes); +int cl_object_flush(const struct lu_env *env, struct cl_object *obj, + struct ldlm_lock *lock) +{ + struct lu_object_header *top = obj->co_lu.lo_header; + int rc = 0; + + list_for_each_entry(obj, &top->loh_layers, co_lu.lo_linkage) { + if (obj->co_ops->coo_object_flush) { + rc = obj->co_ops->coo_object_flush(env, obj, lock); + if (rc) + break; + } + } + return rc; +} +EXPORT_SYMBOL(cl_object_flush); + /** * Helper function removing all object locks, and marking object for * deletion. All object pages must have been deleted at this point. From patchwork Thu Feb 27 21:14:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410465 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 87D7C17E0 for ; Thu, 27 Feb 2020 21:39:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7098524690 for ; Thu, 27 Feb 2020 21:39:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7098524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 99D373490CE; Thu, 27 Feb 2020 13:31:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 837D021FCA4 for ; Thu, 27 Feb 2020 13:20:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1596D8F0D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 14282468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:34 -0500 Message-Id: <1582838290-17243-407-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 406/622] lustre: llite: swab LOV EA user data X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jian Yu Many sub-tests failed with "Invalid argument" failures on PPC client because of the endianness issue. This patch fixes the issue by adding a common function lustre_swab_lov_user_md() to swab the LOV EA user data. 
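As a rough illustration of the idea (a user-space sketch, not the kernel code in this patch), the snippet below shows how the magic field alone can reveal that a layout header arrived in the opposite byte order, so that it is swabbed exactly once. The struct `demo_lum`, the constant `DEMO_MAGIC_V1`, and all `demo_*` helpers are hypothetical stand-ins and not Lustre definitions.

```c
#include <stdint.h>

/* Hypothetical magic value and header layout, for illustration only. */
#define DEMO_MAGIC_V1 0x0A0B0C0Du

struct demo_lum {
	uint32_t magic;
	uint32_t stripe_size;
	uint16_t stripe_count;
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t demo_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Reverse the byte order of a 16-bit value. */
static uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/*
 * Swab the header in place if its magic is the byte-reversed form of the
 * expected one. Returns 1 if a swab happened, 0 if the header was already
 * in the expected order, and -1 if the magic is not recognized.
 */
static int demo_swab_lum_if_needed(struct demo_lum *lum)
{
	if (lum->magic == DEMO_MAGIC_V1)
		return 0;
	if (lum->magic != demo_swab32(DEMO_MAGIC_V1))
		return -1;

	lum->magic = demo_swab32(lum->magic);
	lum->stripe_size = demo_swab32(lum->stripe_size);
	lum->stripe_count = demo_swab16(lum->stripe_count);
	return 1;
}
```

Keying the decision on the magic rather than on the host's endianness means the same helper can be called on any path without swabbing a buffer twice.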
WC-bug-id: https://jira.whamcloud.com/browse/LU-10100 Lustre-commit: 9d17996766e0 ("LU-10100 llite: swab LOV EA user data") Signed-off-by: Jian Yu Reviewed-on: https://review.whamcloud.com/35291 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_swab.h | 1 + fs/lustre/llite/dir.c | 65 ++++++++++-------------------------- fs/lustre/llite/file.c | 46 ++++++++++++------------- fs/lustre/llite/llite_lib.c | 4 +-- fs/lustre/llite/xattr.c | 25 ++++++++++++-- fs/lustre/ptlrpc/pack_generic.c | 74 +++++++++++++++++++++++++++++++++-------- 6 files changed, 126 insertions(+), 89 deletions(-) diff --git a/fs/lustre/include/lustre_swab.h b/fs/lustre/include/lustre_swab.h index 7e96640..e99e16d 100644 --- a/fs/lustre/include/lustre_swab.h +++ b/fs/lustre/include/lustre_swab.h @@ -86,6 +86,7 @@ void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum); void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod, int stripe_count); +void lustre_swab_lov_user_md(struct lov_user_md *lum); void lustre_swab_lov_mds_md(struct lov_mds_md *lmm); void lustre_swab_lustre_capa(struct lustre_capa *c); void lustre_swab_lustre_capa_key(struct lustre_capa_key *k); diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 2c39579..f87ddd2 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -525,60 +525,46 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump, int lum_size; if (lump) { - /* - * This is coming from userspace, so should be in - * local endian. But the MDS would like it in little - * endian, so we swab it before we send it. 
- */ switch (lump->lmm_magic) { - case LOV_USER_MAGIC_V1: { - if (lump->lmm_magic != cpu_to_le32(LOV_USER_MAGIC_V1)) - lustre_swab_lov_user_md_v1(lump); + case LOV_USER_MAGIC_V1: lum_size = sizeof(struct lov_user_md_v1); break; - } - case LOV_USER_MAGIC_V3: { - if (lump->lmm_magic != cpu_to_le32(LOV_USER_MAGIC_V3)) - lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lump); + case LOV_USER_MAGIC_V3: lum_size = sizeof(struct lov_user_md_v3); break; - } - case LOV_USER_MAGIC_COMP_V1: { - if (lump->lmm_magic != - cpu_to_le32(LOV_USER_MAGIC_COMP_V1)) - lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lump); - lum_size = le32_to_cpu(((struct lov_comp_md_v1 *)lump)->lcm_size); + case LOV_USER_MAGIC_COMP_V1: + lum_size = ((struct lov_comp_md_v1 *)lump)->lcm_size; break; - } - case LMV_USER_MAGIC: { + case LMV_USER_MAGIC: if (lump->lmm_magic != cpu_to_le32(LMV_USER_MAGIC)) lustre_swab_lmv_user_md((struct lmv_user_md *)lump); lum_size = sizeof(struct lmv_user_md); break; - } case LOV_USER_MAGIC_SPECIFIC: { struct lov_user_md_v3 *v3 = - (struct lov_user_md_v3 *)lump; + (struct lov_user_md_v3 *)lump; if (v3->lmm_stripe_count > LOV_MAX_STRIPE_COUNT) return -EINVAL; - if (lump->lmm_magic != - cpu_to_le32(LOV_USER_MAGIC_SPECIFIC)) { - lustre_swab_lov_user_md_v3(v3); - lustre_swab_lov_user_md_objects(v3->lmm_objects, - v3->lmm_stripe_count); - } lum_size = lov_user_md_size(v3->lmm_stripe_count, LOV_USER_MAGIC_SPECIFIC); break; } - default: { + default: CDEBUG(D_IOCTL, "bad userland LOV MAGIC: %#08x != %#08x nor %#08x\n", lump->lmm_magic, LOV_USER_MAGIC_V1, LOV_USER_MAGIC_V3); return -EINVAL; } - } + + /* + * This is coming from userspace, so should be in + * local endian. But the MDS would like it in little + * endian, so we swab it before we send it. 
+ */ + if ((__swab32(lump->lmm_magic) & le32_to_cpu(LOV_MAGIC_MASK)) == + le32_to_cpu(LOV_MAGIC_MAGIC)) + lustre_swab_lov_user_md(lump); } else { lum_size = sizeof(struct lov_user_md_v1); } @@ -706,16 +692,11 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, /* We don't swab objects for directories */ switch (le32_to_cpu(lmm->lmm_magic)) { case LOV_MAGIC_V1: - if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) - lustre_swab_lov_user_md_v1((struct lov_user_md_v1 *)lmm); - break; case LOV_MAGIC_V3: - if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) - lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm); - break; case LOV_MAGIC_COMP_V1: + case LOV_USER_MAGIC_SPECIFIC: if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) - lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmm); + lustre_swab_lov_user_md((struct lov_user_md *)lmm); break; case LMV_MAGIC_V1: if (cpu_to_le32(LMV_MAGIC) != LMV_MAGIC) @@ -725,16 +706,6 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC) lustre_swab_lmv_user_md((struct lmv_user_md *)lmm); break; - case LOV_USER_MAGIC_SPECIFIC: { - struct lov_user_md_v3 *v3 = (struct lov_user_md_v3 *)lmm; - - if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) { - lustre_swab_lov_user_md_v3(v3); - lustre_swab_lov_user_md_objects(v3->lmm_objects, - v3->lmm_stripe_count); - } - } - break; case LMV_MAGIC_FOREIGN: { struct lmv_foreign_md *lfm = (struct lmv_foreign_md *)lmm; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index d313730..5a3e80e 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1852,6 +1852,12 @@ int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry, }; int rc = 0; + if ((__swab32(lum->lmm_magic) & le32_to_cpu(LOV_MAGIC_MASK)) == + le32_to_cpu(LOV_MAGIC_MAGIC)) { + /* this code will only exist for big-endian systems */ + lustre_swab_lov_user_md(lum); + } + ll_inode_size_lock(inode); rc = ll_intent_file_open(dentry, lum, lum_size, &oit); 
if (rc < 0) @@ -1920,8 +1926,9 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename, * little endian. We convert it to host endian before * passing it to userspace. */ - if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) { - int stripe_count; + if ((lmm->lmm_magic & __swab32(LOV_MAGIC_MAGIC)) == + __swab32(LOV_MAGIC_MAGIC)) { + int stripe_count = 0; if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V1) || lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V3)) { @@ -1931,31 +1938,20 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename, stripe_count = 0; } + lustre_swab_lov_user_md((struct lov_user_md *)lmm); + /* if function called for directory - we should * avoid swab not existent lsm objects */ - if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V1)) { - lustre_swab_lov_user_md_v1((struct lov_user_md_v1 *)lmm); - if (S_ISREG(body->mbo_mode)) - lustre_swab_lov_user_md_objects(((struct lov_user_md_v1 *)lmm)->lmm_objects, - stripe_count); - } else if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V3)) { - lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm); - if (S_ISREG(body->mbo_mode)) - lustre_swab_lov_user_md_objects(((struct lov_user_md_v3 *)lmm)->lmm_objects, - stripe_count); - } else if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_COMP_V1)) { - lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmm); - } else if (lmm->lmm_magic == - cpu_to_le32(LOV_MAGIC_FOREIGN)) { - struct lov_foreign_md *lfm; - - lfm = (struct lov_foreign_md *)lmm; - __swab32s(&lfm->lfm_magic); - __swab32s(&lfm->lfm_length); - __swab32s(&lfm->lfm_type); - __swab32s(&lfm->lfm_flags); - } + if (lmm->lmm_magic == LOV_MAGIC_V1 && S_ISREG(body->mbo_mode)) + lustre_swab_lov_user_md_objects( + ((struct lov_user_md_v1 *)lmm)->lmm_objects, + stripe_count); + else if (lmm->lmm_magic == LOV_MAGIC_V3 && + S_ISREG(body->mbo_mode)) + lustre_swab_lov_user_md_objects( + ((struct lov_user_md_v3 *)lmm)->lmm_objects, + stripe_count); } out: @@ -2040,7 +2036,7 @@ static int ll_lov_setstripe(struct inode 
*inode, struct file *file, cl_lov_delay_create_clear(&file->f_flags); out: - kfree(klum); + kvfree(klum); return rc; } diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 3e058d2..86be562 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2757,14 +2757,14 @@ ssize_t ll_copy_user_md(const struct lov_user_md __user *md, if (lum_size < 0) goto no_kbuf; - *kbuf = kzalloc(lum_size, GFP_NOFS); + *kbuf = kvzalloc(lum_size, GFP_NOFS); if (!*kbuf) { lum_size = -ENOMEM; goto no_kbuf; } if (copy_from_user(*kbuf, md, lum_size) != 0) { - kfree(*kbuf); + kvfree(*kbuf); *kbuf = NULL; lum_size = -EFAULT; } diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index 9707e78..cf1cfd2 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -40,6 +40,7 @@ #include #include +#include #include "llite_internal.h" @@ -316,6 +317,11 @@ static int ll_xattr_set(const struct xattr_handler *handler, return 0; } + if (strncmp(name, "lov.", 4) == 0 && + (__swab32(((struct lov_user_md *)value)->lmm_magic) & + le32_to_cpu(LOV_MAGIC_MASK)) == le32_to_cpu(LOV_MAGIC_MAGIC)) + lustre_swab_lov_user_md((struct lov_user_md *)value); + return ll_xattr_set_common(handler, dentry, inode, name, value, size, flags); } @@ -485,10 +491,25 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) * file is restored. See LU-2809. 
*/ magic = ((struct lov_mds_md *)buf)->lmm_magic; - if (magic == LOV_MAGIC_COMP_V1 || magic == LOV_MAGIC_FOREIGN) + if ((magic & __swab32(LOV_MAGIC_MAGIC)) == + __swab32(LOV_MAGIC_MAGIC)) + magic = __swab32(magic); + + switch (magic) { + case LOV_MAGIC_V1: + case LOV_MAGIC_V3: + case LOV_MAGIC_SPECIFIC: + ((struct lov_mds_md *)buf)->lmm_layout_gen = 0; + break; + case LOV_MAGIC_COMP_V1: + case LOV_MAGIC_FOREIGN: + goto out_env; + default: + CERROR("Invalid LOV magic %08x\n", magic); + rc = -EINVAL; goto out_env; + } - ((struct lov_mds_md *)buf)->lmm_layout_gen = 0; out_env: cl_env_put(env, &refcheck); diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index f687ecc..7acb4a8 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -2004,6 +2004,8 @@ void lustre_swab_lmv_user_md(struct lmv_user_md *lum) if (lum->lum_magic == LMV_MAGIC_FOREIGN) { __swab32s(&lum->lum_magic); __swab32s(&((struct lmv_foreign_md *)lum)->lfm_length); + __swab32s(&((struct lmv_foreign_md *)lum)->lfm_type); + __swab32s(&((struct lmv_foreign_md *)lum)->lfm_flags); return; } @@ -2132,18 +2134,6 @@ void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum) } EXPORT_SYMBOL(lustre_swab_lov_comp_md_v1); -void lustre_swab_lov_mds_md(struct lov_mds_md *lmm) -{ - CDEBUG(D_IOCTL, "swabbing lov_mds_md\n"); - __swab32s(&lmm->lmm_magic); - __swab32s(&lmm->lmm_pattern); - lustre_swab_lmm_oi(&lmm->lmm_oi); - __swab32s(&lmm->lmm_stripe_size); - __swab16s(&lmm->lmm_stripe_count); - __swab16s(&lmm->lmm_layout_gen); -} -EXPORT_SYMBOL(lustre_swab_lov_mds_md); - void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod, int stripe_count) { @@ -2157,9 +2147,67 @@ void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod, } EXPORT_SYMBOL(lustre_swab_lov_user_md_objects); +void lustre_swab_lov_user_md(struct lov_user_md *lum) +{ + CDEBUG(D_IOCTL, "swabbing lov_user_md\n"); + switch (lum->lmm_magic) { + case __swab32(LOV_MAGIC_V1): + case 
LOV_USER_MAGIC_V1: + lustre_swab_lov_user_md_v1((struct lov_user_md_v1 *)lum); + break; + case __swab32(LOV_MAGIC_V3): + case LOV_USER_MAGIC_V3: + lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lum); + break; + case __swab32(LOV_USER_MAGIC_SPECIFIC): + case LOV_USER_MAGIC_SPECIFIC: + { + struct lov_user_md_v3 *v3 = (struct lov_user_md_v3 *)lum; + u16 stripe_count = v3->lmm_stripe_count; + + if (lum->lmm_magic != LOV_USER_MAGIC_SPECIFIC) + __swab16s(&stripe_count); + + lustre_swab_lov_user_md_v3(v3); + lustre_swab_lov_user_md_objects(v3->lmm_objects, stripe_count); + break; + } + case __swab32(LOV_MAGIC_COMP_V1): + case LOV_USER_MAGIC_COMP_V1: + lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lum); + break; + case __swab32(LOV_MAGIC_FOREIGN): + case LOV_USER_MAGIC_FOREIGN: + { + struct lov_foreign_md *lfm = (struct lov_foreign_md *)lum; + + __swab32s(&lfm->lfm_magic); + __swab32s(&lfm->lfm_length); + __swab32s(&lfm->lfm_type); + __swab32s(&lfm->lfm_flags); + break; + } + default: + CDEBUG(D_IOCTL, "Invalid LOV magic %08x\n", lum->lmm_magic); + } +} +EXPORT_SYMBOL(lustre_swab_lov_user_md); + +void lustre_swab_lov_mds_md(struct lov_mds_md *lmm) +{ + CDEBUG(D_IOCTL, "swabbing lov_mds_md\n"); + __swab32s(&lmm->lmm_magic); + __swab32s(&lmm->lmm_pattern); + lustre_swab_lmm_oi(&lmm->lmm_oi); + __swab32s(&lmm->lmm_stripe_size); + __swab16s(&lmm->lmm_stripe_count); + __swab16s(&lmm->lmm_layout_gen); +} +EXPORT_SYMBOL(lustre_swab_lov_mds_md); + static void lustre_swab_ldlm_res_id(struct ldlm_res_id *id) { - int i; + int i; for (i = 0; i < RES_NAME_SIZE; i++) __swab64s(&id->name[i]); From patchwork Thu Feb 27 21:14:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410367 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 64CEF138D for ; Thu, 27 Feb 2020 
21:36:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4DA9E24677 for ; Thu, 27 Feb 2020 21:36:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4DA9E24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7443C348FA5; Thu, 27 Feb 2020 13:30:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DDE6321FCA4 for ; Thu, 27 Feb 2020 13:20:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 180988F0E; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 16F5A46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:35 -0500 Message-Id: <1582838290-17243-408-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 407/622] lustre: clio: support custom csi_end_io handler X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff Provide an initializer that supports a custom end_io handler. Cray-bug-id: LUS-7330 WC-bug-id: https://jira.whamcloud.com/browse/LU-12431 Lustre-commit: 6ee742fd5c56 ("LU-12431 clio: remove default csi_end_io handler") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/35400 Reviewed-by: Neil Brown Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 24 ++++++++++++++++++------ fs/lustre/obdclass/cl_io.c | 19 ++++++++++++++++--- 2 files changed, 34 insertions(+), 9 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 7ac0dd2..71ca283 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -2457,6 +2457,22 @@ void cl_req_attr_set(const struct lu_env *env, struct cl_object *obj, * @{ */ +struct cl_sync_io; + +typedef void (cl_sync_io_end_t)(const struct lu_env *, struct cl_sync_io *); + +void cl_sync_io_init_notify(struct cl_sync_io *anchor, int nr, + cl_sync_io_end_t *end); + +int cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor, + long timeout); +void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, + int ioret); +static inline void cl_sync_io_init(struct cl_sync_io *anchor, int nr) +{ + cl_sync_io_init_notify(anchor, nr, NULL); +} + /** * Anchor for synchronous transfer. This is allocated on a stack by thread * doing synchronous transfer, and a pointer to this structure is set up in @@ -2470,14 +2486,10 @@ struct cl_sync_io { int csi_sync_rc; /** completion to be signaled when transfer is complete. 
*/ wait_queue_head_t csi_waitq; + /** callback to invoke when this IO is finished */ + cl_sync_io_end_t *csi_end_io; }; -void cl_sync_io_init(struct cl_sync_io *anchor, int nr); -int cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor, - long timeout); -void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, - int ioret); - /** @} cl_sync_io */ /** \defgroup cl_env cl_env diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index 4278bc0..14849ed 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -1024,16 +1024,26 @@ void cl_req_attr_set(const struct lu_env *env, struct cl_object *obj, EXPORT_SYMBOL(cl_req_attr_set); /** - * Initialize synchronous io wait anchor + * Initialize synchronous io wait @anchor for @nr pages with optional + * @end handler. + * + * @anchor owned by caller, initialized here. + * @nr number of pages initially pending in sync. + * @end optional callback sync_io completion, can be used to + * trigger erasure coding, integrity, dedupe, or similar + * operation. @end is called with a spinlock on + * anchor->csi_waitq.lock */ -void cl_sync_io_init(struct cl_sync_io *anchor, int nr) +void cl_sync_io_init_notify(struct cl_sync_io *anchor, int nr, + cl_sync_io_end_t *end) { memset(anchor, 0, sizeof(*anchor)); init_waitqueue_head(&anchor->csi_waitq); atomic_set(&anchor->csi_sync_nr, nr); anchor->csi_sync_rc = 0; + anchor->csi_end_io = end; } -EXPORT_SYMBOL(cl_sync_io_init); +EXPORT_SYMBOL(cl_sync_io_init_notify); /** * Wait until all IO completes. 
Transfer completion routine has to call @@ -1088,6 +1098,7 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, LASSERT(atomic_read(&anchor->csi_sync_nr) > 0); if (atomic_dec_and_lock(&anchor->csi_sync_nr, &anchor->csi_waitq.lock)) { + cl_sync_io_end_t *end_io = anchor->csi_end_io; /* * Holding the lock across both the decrement and @@ -1095,6 +1106,8 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, * before the wakeup completes. */ wake_up_all_locked(&anchor->csi_waitq); + if (end_io) + end_io(env, anchor); spin_unlock(&anchor->csi_waitq.lock); /* Can't access anchor any more */ From patchwork Thu Feb 27 21:14:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410469 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C1D1592A for ; Thu, 27 Feb 2020 21:39:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AA12A24690 for ; Thu, 27 Feb 2020 21:39:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AA12A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B8B2B34A53C; Thu, 27 Feb 2020 13:32:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id 3F54221FA3D for ; Thu, 27 Feb 2020 13:20:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1AF6F8F0F; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 19B4B46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:36 -0500 Message-Id: <1582838290-17243-409-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 408/622] lustre: llite: release active extent on sync write commit X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler Processes can wait forever in osc_extent_wait() for the extent state to change because the extent write is not started before the wait begins. A 4.7 kernel change to generic_write_sync() modified it to check IOCB_DSYNC instead of O_SYNC. Thus an active extent is not released (written) in osc_io_commit_async() in the synchronous case. 
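The intent of the fix can be pictured with a small user-space sketch: a write is treated as synchronous if either the file-level flags or the per-call iocb flags ask for it. The flag values and the names `demo_wr_sync`, `demo_iocb`, `DEMO_O_SYNC`, `DEMO_O_DIRECT`, and `DEMO_IOCB_DSYNC` are made up for illustration and do not match the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real kernel values differ. */
#define DEMO_O_SYNC     0x1u
#define DEMO_O_DIRECT   0x2u
#define DEMO_IOCB_DSYNC 0x4u

struct demo_iocb {
	unsigned int ki_flags;
};

/*
 * Decide whether a write must be committed synchronously.
 * iocb may be NULL for callers that have no iocb at all.
 */
static bool demo_wr_sync(unsigned int f_flags, bool inode_is_sync,
			 const struct demo_iocb *iocb)
{
	bool sync = (f_flags & DEMO_O_SYNC) || (f_flags & DEMO_O_DIRECT) ||
		    inode_is_sync;

	/* Also honor a sync request carried on the iocb itself. */
	if (iocb && (iocb->ki_flags & DEMO_IOCB_DSYNC))
		sync = true;
	return sync;
}
```

Checking the pointer before dereferencing it keeps the helper safe for callers that pass no iocb.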
Cray-bug-id: LUS-7435 WC-bug-id: https://jira.whamcloud.com/browse/LU-12536 Lustre-commit: a9af7100ce72 ("LU-12536 llite: release active extent on sync write commit") Signed-off-by: Ann Koehler Reviewed-on: https://review.whamcloud.com/35472 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 9 +++++++-- fs/lustre/llite/llite_internal.h | 4 +++- fs/lustre/llite/rw.c | 2 +- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 5a3e80e..6f418e0 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1407,7 +1407,8 @@ static bool file_is_noatime(const struct file *file) return false; } -void ll_io_init(struct cl_io *io, const struct file *file, int write) +void ll_io_init(struct cl_io *io, const struct file *file, int write, + struct vvp_io_args *args) { struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct inode *inode = file_inode(file); @@ -1420,7 +1421,11 @@ void ll_io_init(struct cl_io *io, const struct file *file, int write) io->u.ci_wr.wr_sync = file->f_flags & O_SYNC || file->f_flags & O_DIRECT || IS_SYNC(inode); + io->u.ci_wr.wr_sync |= !!(args && + (args->u.normal.via_iocb->ki_flags & + IOCB_DSYNC)); } + io->ci_obj = ll_i2info(inode)->lli_clob; io->ci_lockreq = CILR_MAYBE; if (ll_file_nolock(file)) { @@ -1491,7 +1496,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, restart: io = vvp_env_thread_io(env); - ll_io_init(io, file, iot == CIT_WRITE); + ll_io_init(io, file, iot == CIT_WRITE, args); io->ci_ndelay_tried = retried; if (cl_io_rw_init(env, io, iot, *ppos, count) == 0) { diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index a0d631d..49c0c78 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -786,7 +786,6 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, void 
ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid, struct ll_file_data *file, loff_t pos, size_t count, int rw); -void ll_io_init(struct cl_io *io, const struct file *file, int write); enum { LPROC_LL_READ_BYTES, @@ -1056,6 +1055,9 @@ static inline struct vvp_io_args *ll_env_args(const struct lu_env *env) return &ll_env_info(env)->lti_args; } +void ll_io_init(struct cl_io *io, const struct file *file, int write, + struct vvp_io_args *args); + /* llite/llite_mmap.c */ int ll_teardown_mmaps(struct address_space *mapping, u64 first, u64 last); diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index fe9a2b0..9c4b89f 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -503,7 +503,7 @@ static void ll_readahead_handle_work(struct work_struct *wq) } io = vvp_env_thread_io(env); - ll_io_init(io, file, 0); + ll_io_init(io, file, 0, NULL); rc = ll_readahead_file_kms(env, io, &kms); if (rc != 0) From patchwork Thu Feb 27 21:14:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410693 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 114CF17E0 for ; Thu, 27 Feb 2020 21:44:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EE43224690 for ; Thu, 27 Feb 2020 21:44:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE43224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id D5C6934AB9F; Thu, 27 Feb 2020 13:35:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 965FA21FA3D for ; Thu, 27 Feb 2020 13:20:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1DC148F10; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1C89B46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:37 -0500 Message-Id: <1582838290-17243-410-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 409/622] lustre: obd: harden debugfs handling X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The seq_file private data shouldn't disappear from under us, but just in case, always test whether the private field is set. If it is not, return -ENODEV for debugfs read and write operations. 
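A minimal user-space sketch of the guard described above, assuming a simplified stand-in for the seq_file structure: the handler refuses to run when the private pointer is missing instead of dereferencing it. `demo_seq_file`, `demo_read_value`, and `demo_seq_show` are hypothetical names used only for this illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a seq_file: only the private pointer matters. */
struct demo_seq_file {
	void *private;
};

/* Hypothetical backend that reads an int through the private data. */
static int demo_read_value(struct demo_seq_file *m, void *data)
{
	(void)m;
	return *(int *)data;
}

/* Show handler: bail out with -ENODEV if the private data is gone. */
static int demo_seq_show(struct demo_seq_file *m)
{
	if (!m->private)
		return -ENODEV;
	return demo_read_value(m, m->private);
}
```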
WC-bug-id: https://jira.whamcloud.com/browse/LU-8066 Lustre-commit: 44d450890f43 ("LU-8066 obd: harden debugfs handling") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/35575 Reviewed-by: Arshad Hussain Reviewed-by: Alex Zhuravlev Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 6269bd3..fdc1b19 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -519,6 +519,8 @@ void lprocfs_stats_collect(struct lprocfs_stats *stats, int idx, #define LPROC_SEQ_FOPS_RO_TYPE(name, type) \ static int name##_##type##_seq_show(struct seq_file *m, void *v)\ { \ + if (!m->private) \ + return -ENODEV; \ return lprocfs_rd_##type(m, m->private); \ } \ LPROC_SEQ_FOPS_RO(name##_##type) @@ -526,6 +528,8 @@ void lprocfs_stats_collect(struct lprocfs_stats *stats, int idx, #define LPROC_SEQ_FOPS_RW_TYPE(name, type) \ static int name##_##type##_seq_show(struct seq_file *m, void *v)\ { \ + if (!m->private) \ + return -ENODEV; \ return lprocfs_rd_##type(m, m->private); \ } \ static ssize_t name##_##type##_seq_write(struct file *file, \ @@ -533,6 +537,9 @@ void lprocfs_stats_collect(struct lprocfs_stats *stats, int idx, loff_t *off) \ { \ struct seq_file *seq = file->private_data; \ + \ + if (!seq->private) \ + return -ENODEV; \ return lprocfs_wr_##type(file, buffer, \ count, seq->private); \ } \ From patchwork Thu Feb 27 21:14:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410473 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BF1FB92A for ; Thu, 27 Feb 2020 21:39:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A7FA824690 for ; Thu, 27 Feb 2020 21:39:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A7FA824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9784734A55F; Thu, 27 Feb 2020 13:32:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D778E21FE29 for ; Thu, 27 Feb 2020 13:20:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 20AA18F11; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1F83747C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:38 -0500 Message-Id: <1582838290-17243-411-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 410/622] lustre: obd: add rmfid support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev A new RPC_REINT_RMFID has been introduced by this patch. It is intended to be used with the corresponding llapi_rmfid() to unlink a batch of MDS files by their FIDs. The caller must have permission to modify the parent dir(s) and the objects themselves. WC-bug-id: https://jira.whamcloud.com/browse/LU-12090 Lustre-commit: 1fd63fcb045c ("LU-12090 utils: lfs rmfid") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/34449 Reviewed-by: Li Xi Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 2 + fs/lustre/include/obd.h | 2 + fs/lustre/include/obd_class.h | 12 ++++ fs/lustre/include/obd_support.h | 1 + fs/lustre/llite/dir.c | 54 +++++++++++++++++- fs/lustre/lmv/lmv_obd.c | 98 +++++++++++++++++++++++++++++++++ fs/lustre/mdc/mdc_request.c | 76 ++++++++++++++++++++++++- fs/lustre/ptlrpc/layout.c | 25 +++++++++ fs/lustre/ptlrpc/lproc_ptlrpc.c | 1 + fs/lustre/ptlrpc/wiretest.c | 4 +- include/uapi/linux/lustre/lustre_idl.h | 1 + include/uapi/linux/lustre/lustre_user.h | 10 ++++ 12 files changed, 283 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index dca4ef4..feb5e77 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -165,6 +165,7 @@ void req_capsule_shrink(struct req_capsule *pill, extern struct req_format RQF_MDS_SWAP_LAYOUTS; extern struct req_format RQF_MDS_REINT_MIGRATE; extern struct req_format RQF_MDS_REINT_RESYNC; +extern struct req_format RQF_MDS_RMFID; /* MDS hsm formats */ extern struct req_format RQF_MDS_HSM_STATE_GET; extern struct req_format RQF_MDS_HSM_STATE_SET; @@ -236,6 +237,7 @@ void req_capsule_shrink(struct
req_capsule *pill, extern struct req_msg_field RMF_CLOSE_DATA; extern struct req_msg_field RMF_FILE_SECCTX_NAME; extern struct req_msg_field RMF_FILE_SECCTX; +extern struct req_msg_field RMF_FID_ARRAY; /* * connection handle received in MDS_CONNECT request. diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 53d078e..886c697 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -1039,6 +1039,8 @@ struct md_ops { int (*unpackmd)(struct obd_export *exp, struct lmv_stripe_md **plsm, const union lmv_mds_md *lmv, size_t lmv_size); + int (*rmfid)(struct obd_export *exp, struct fid_array *fa, int *rcs, + struct ptlrpc_request_set *set); }; static inline struct md_open_data *obd_mod_alloc(void) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index b8afa5a..bc01eca 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1663,6 +1663,18 @@ static inline int md_unpackmd(struct obd_export *exp, return MDP(exp->exp_obd, unpackmd)(exp, plsm, lmm, lmm_size); } +static inline int md_rmfid(struct obd_export *exp, struct fid_array *fa, + int *rcs, struct ptlrpc_request_set *set) +{ + int rc; + + rc = exp_check_ops(exp); + if (rc) + return rc; + + return MDP(exp->exp_obd, rmfid)(exp, fa, rcs, set); +} + /* OBD Metadata Support */ int obd_init_caches(void); diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 23f6bae..c66b61a 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -194,6 +194,7 @@ #define OBD_FAIL_MDS_CHANGELOG_INIT 0x151 #define OBD_FAIL_MDS_REINT_MULTI_NET 0x159 #define OBD_FAIL_MDS_REINT_MULTI_NET_REP 0x15a +#define OBD_FAIL_MDS_RMFID_NET 0x166 /* layout lock */ #define OBD_FAIL_MDS_NO_LL_GETATTR 0x170 diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index f87ddd2..3540c18 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1180,6 +1180,57 @@ static int quotactl_ioctl(struct ll_sb_info *sbi, 
struct if_quotactl *qctl) return rc; } +int ll_rmfid(struct file *file, void __user *arg) +{ + const struct fid_array __user *ufa = arg; + struct fid_array *lfa = NULL; + size_t size; + unsigned int nr; + int i, rc, *rcs = NULL; + + if (!capable(CAP_DAC_READ_SEARCH) && + !(ll_i2sbi(file_inode(file))->ll_flags & LL_SBI_USER_FID2PATH)) + return -EPERM; + /* Only need to get the buflen */ + if (get_user(nr, &ufa->fa_nr)) + return -EFAULT; + /* DoS protection */ + if (nr > OBD_MAX_FIDS_IN_ARRAY) + return -E2BIG; + + size = offsetof(struct fid_array, fa_fids[nr]); + lfa = kzalloc(size, GFP_NOFS); + if (!lfa) + return -ENOMEM; + rcs = kcalloc(nr, sizeof(int), GFP_NOFS); + if (!rcs) { + rc = -ENOMEM; + goto free_lfa; + } + + if (copy_from_user(lfa, arg, size)) { + rc = -EFAULT; + goto free_rcs; + } + + /* Call mdc_iocontrol */ + rc = md_rmfid(ll_i2mdexp(file_inode(file)), lfa, rcs, NULL); + if (!rc) { + for (i = 0; i < nr; i++) + if (rcs[i]) + lfa->fa_fids[i].f_ver = rcs[i]; + if (copy_to_user(arg, lfa, size)) + rc = -EFAULT; + } + +free_rcs: + kfree(rcs); +free_lfa: + kfree(lfa); + + return rc; +} + /* This function tries to get a single name component, * to send to the server. 
No actual path traversal involved, * so we limit to NAME_MAX @@ -1544,7 +1595,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) ptlrpc_req_finished(request); return rc; } - + case LL_IOC_RMFID: + return ll_rmfid(file, (void __user *)arg); case LL_IOC_LOV_SWAP_LAYOUTS: return -EPERM; case IOC_OBD_STATFS: diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index e9f9c36..d323250 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -2930,6 +2930,103 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp, return -EINVAL; } +static int lmv_rmfid(struct obd_export *exp, struct fid_array *fa, + int *__rcs, struct ptlrpc_request_set *_set) +{ + struct obd_device *obddev = class_exp2obd(exp); + struct ptlrpc_request_set *set = _set; + struct lmv_obd *lmv = &obddev->u.lmv; + int tgt_count = lmv->desc.ld_tgt_count; + struct fid_array *fat, **fas = NULL; + int i, rc, **rcs = NULL; + + if (!set) { + set = ptlrpc_prep_set(); + if (!set) + return -ENOMEM; + } + + /* split FIDs by targets */ + fas = kcalloc(tgt_count, sizeof(fas), GFP_NOFS); + if (!fas) { + rc = -ENOMEM; + goto out; + } + rcs = kcalloc(tgt_count, sizeof(int *), GFP_NOFS); + if (!rcs) { + rc = -ENOMEM; + goto out_fas; + } + + for (i = 0; i < fa->fa_nr; i++) { + unsigned int idx; + + rc = lmv_fld_lookup(lmv, &fa->fa_fids[i], &idx); + if (rc) { + CDEBUG(D_OTHER, "can't lookup "DFID": rc = %d\n", + PFID(&fa->fa_fids[i]), rc); + continue; + } + LASSERT(idx < tgt_count); + if (!fas[idx]) + fas[idx] = kzalloc(offsetof(struct fid_array, + fa_fids[fa->fa_nr]), + GFP_NOFS); + if (!fas[idx]) { + rc = -ENOMEM; + goto out; + } + if (!rcs[idx]) + rcs[idx] = kcalloc(fa->fa_nr, sizeof(int), GFP_NOFS); + if (!rcs[idx]) { + rc = -ENOMEM; + goto out; + } + + fat = fas[idx]; + fat->fa_fids[fat->fa_nr++] = fa->fa_fids[i]; + } + + for (i = 0; i < tgt_count; i++) { + fat = fas[i]; + if (!fat || fat->fa_nr == 0) + continue; + rc = 
md_rmfid(lmv->tgts[i]->ltd_exp, fat, rcs[i], set); + } + + rc = ptlrpc_set_wait(NULL, set); + if (rc == 0) { + int j = 0; + + for (i = 0; i < tgt_count; i++) { + fat = fas[i]; + if (!fat || fat->fa_nr == 0) + continue; + /* copy FIDs back */ + memcpy(fa->fa_fids + j, fat->fa_fids, + fat->fa_nr * sizeof(struct lu_fid)); + /* copy rcs back */ + memcpy(__rcs + j, rcs[i], fat->fa_nr * sizeof(**rcs)); + j += fat->fa_nr; + } + } + if (set != _set) + ptlrpc_set_destroy(set); + +out: + for (i = 0; i < tgt_count; i++) { + if (fas) + kfree(fas[i]); + if (rcs) + kfree(rcs[i]); + } + kfree(rcs); +out_fas: + kfree(fas); + + return rc; +} + /** * Asynchronously set by key a value associated with a LMV device. * @@ -3517,6 +3614,7 @@ static int lmv_merge_attr(struct obd_export *exp, .revalidate_lock = lmv_revalidate_lock, .get_fid_from_lsm = lmv_get_fid_from_lsm, .unpackmd = lmv_unpackmd, + .rmfid = lmv_rmfid, }; static int __init lmv_init(void) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 7bc6196..693c455 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -2585,6 +2585,79 @@ static int mdc_fsync(struct obd_export *exp, const struct lu_fid *fid, return rc; } +struct mdc_rmfid_args { + int *mra_rcs; + int mra_nr; +}; + +int mdc_rmfid_interpret(const struct lu_env *env, struct ptlrpc_request *req, + void *args, int rc) +{ + struct mdc_rmfid_args *aa; + int *rcs, size; + + if (!rc) { + aa = ptlrpc_req_async_args(aa, req); + + size = req_capsule_get_size(&req->rq_pill, &RMF_RCS, + RCL_SERVER); + LASSERT(size == sizeof(int) * aa->mra_nr); + rcs = req_capsule_server_get(&req->rq_pill, &RMF_RCS); + LASSERT(rcs); + LASSERT(aa->mra_rcs); + LASSERT(aa->mra_nr); + memcpy(aa->mra_rcs, rcs, size); + } + + return rc; +} + +static int mdc_rmfid(struct obd_export *exp, struct fid_array *fa, + int *rcs, struct ptlrpc_request_set *set) +{ + struct ptlrpc_request *req; + struct mdc_rmfid_args *aa; + struct mdt_body *b; + struct lu_fid *tmp; 
+ int rc, flen; + + req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_MDS_RMFID); + if (!req) + return -ENOMEM; + + flen = fa->fa_nr * sizeof(struct lu_fid); + req_capsule_set_size(&req->rq_pill, &RMF_FID_ARRAY, + RCL_CLIENT, flen); + req_capsule_set_size(&req->rq_pill, &RMF_FID_ARRAY, + RCL_SERVER, flen); + req_capsule_set_size(&req->rq_pill, &RMF_RCS, + RCL_SERVER, fa->fa_nr * sizeof(u32)); + rc = ptlrpc_request_pack(req, LUSTRE_MDS_VERSION, MDS_RMFID); + if (rc) { + ptlrpc_request_free(req); + return rc; + } + tmp = req_capsule_client_get(&req->rq_pill, &RMF_FID_ARRAY); + memcpy(tmp, fa->fa_fids, flen); + + mdc_pack_body(req, NULL, 0, 0, -1, 0); + b = req_capsule_client_get(&req->rq_pill, &RMF_MDT_BODY); + b->mbo_ctime = ktime_get_real_seconds(); + + ptlrpc_request_set_replen(req); + + LASSERT(rcs); + aa = ptlrpc_req_async_args(aa, req); + aa->mra_rcs = rcs; + aa->mra_nr = fa->fa_nr; + req->rq_interpret_reply = mdc_rmfid_interpret; + + ptlrpc_set_add_req(set, req); + ptlrpc_check_set(NULL, set); + + return rc; +} + static int mdc_import_event(struct obd_device *obd, struct obd_import *imp, enum obd_import_event event) { @@ -2886,7 +2959,8 @@ static int mdc_cleanup(struct obd_device *obd) .set_open_replay_data = mdc_set_open_replay_data, .clear_open_replay_data = mdc_clear_open_replay_data, .intent_getattr_async = mdc_intent_getattr_async, - .revalidate_lock = mdc_revalidate_lock + .revalidate_lock = mdc_revalidate_lock, + .rmfid = mdc_rmfid, }; static int __init mdc_init(void) diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index c10b593..fb60558 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -318,6 +318,21 @@ &RMF_DLM_REQ }; +static const struct req_msg_field *mds_rmfid_client[] = { + &RMF_PTLRPC_BODY, + &RMF_MDT_BODY, + &RMF_FID_ARRAY, + &RMF_CAPA1, + &RMF_CAPA2, +}; + +static const struct req_msg_field *mds_rmfid_server[] = { + &RMF_PTLRPC_BODY, + &RMF_MDT_BODY, + &RMF_FID_ARRAY, + &RMF_RCS, +}; + static 
const struct req_msg_field *obd_connect_client[] = { &RMF_PTLRPC_BODY, &RMF_TGTUUID, @@ -731,6 +746,7 @@ &RQF_MDS_HSM_ACTION, &RQF_MDS_HSM_REQUEST, &RQF_MDS_SWAP_LAYOUTS, + &RQF_MDS_RMFID, &RQF_OST_CONNECT, &RQF_OST_DISCONNECT, &RQF_OST_QUOTACTL, @@ -929,6 +945,10 @@ struct req_msg_field RMF_NAME = DEFINE_MSGF("name", RMF_F_STRING, -1, NULL, NULL); EXPORT_SYMBOL(RMF_NAME); +struct req_msg_field RMF_FID_ARRAY = + DEFINE_MSGF("fid_array", 0, -1, NULL, NULL); +EXPORT_SYMBOL(RMF_FID_ARRAY); + struct req_msg_field RMF_SYMTGT = DEFINE_MSGF("symtgt", RMF_F_STRING, -1, NULL, NULL); EXPORT_SYMBOL(RMF_SYMTGT); @@ -1511,6 +1531,11 @@ struct req_format RQF_MDS_WRITEPAGE = mdt_body_capa, mdt_body_only); EXPORT_SYMBOL(RQF_MDS_WRITEPAGE); +struct req_format RQF_MDS_RMFID = + DEFINE_REQ_FMT0("MDS_RMFID", mds_rmfid_client, + mds_rmfid_server); +EXPORT_SYMBOL(RQF_MDS_RMFID); + struct req_format RQF_LLOG_ORIGIN_HANDLE_CREATE = DEFINE_REQ_FMT0("LLOG_ORIGIN_HANDLE_CREATE", llog_origin_handle_create_client, llogd_body_only); diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index 700e109..d52a08a 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -96,6 +96,7 @@ { MDS_HSM_CT_REGISTER, "mds_hsm_ct_register" }, { MDS_HSM_CT_UNREGISTER, "mds_hsm_ct_unregister" }, { MDS_SWAP_LAYOUTS, "mds_swap_layouts" }, + { MDS_RMFID, "mds_rmfid" }, { LDLM_ENQUEUE, "ldlm_enqueue" }, { LDLM_CONVERT, "ldlm_convert" }, { LDLM_CANCEL, "ldlm_cancel" }, diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 1d34b15..9298c97 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -178,7 +178,9 @@ void lustre_assert_wire_constants(void) (long long)MDS_HSM_CT_UNREGISTER); LASSERTF(MDS_SWAP_LAYOUTS == 61, "found %lld\n", (long long)MDS_SWAP_LAYOUTS); - LASSERTF(MDS_LAST_OPC == 62, "found %lld\n", + LASSERTF(MDS_RMFID == 62, "found %lld\n", + (long long)MDS_RMFID); + LASSERTF(MDS_LAST_OPC == 63, "found %lld\n", 
(long long)MDS_LAST_OPC); LASSERTF(REINT_SETATTR == 1, "found %lld\n", (long long)REINT_SETATTR); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 5740d42..87251ee 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1443,6 +1443,7 @@ enum mds_cmd { MDS_HSM_CT_REGISTER = 59, MDS_HSM_CT_UNREGISTER = 60, MDS_SWAP_LAYOUTS = 61, + MDS_RMFID = 62, MDS_LAST_OPC }; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 9c849ce..db36ce5 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -348,6 +348,7 @@ struct ll_ioc_lease_id { #define LL_IOC_LMV_SETSTRIPE _IOWR('f', 240, struct lmv_user_md) #define LL_IOC_LMV_GETSTRIPE _IOWR('f', 241, struct lmv_user_md) +#define LL_IOC_RMFID _IOR('f', 242, struct fid_array) #define LL_IOC_SET_LEASE _IOWR('f', 243, struct ll_ioc_lease) #define LL_IOC_SET_LEASE_OLD _IOWR('f', 243, long) #define LL_IOC_GET_LEASE _IO('f', 244) @@ -2149,6 +2150,15 @@ struct lu_pcc_state { char pccs_path[PATH_MAX]; }; +struct fid_array { + __u32 fa_nr; + /* make header's size equal lu_fid */ + __u32 fa_padding0; + __u64 fa_padding1; + struct lu_fid fa_fids[0]; +}; +#define OBD_MAX_FIDS_IN_ARRAY 4096 + /** @} lustreuser */ #endif /* _LUSTRE_USER_H */ From patchwork Thu Feb 27 21:14:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410477 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8779F138D for ; Thu, 27 Feb 2020 21:39:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org 
(Postfix) with ESMTPS id 6F93124690 for ; Thu, 27 Feb 2020 21:39:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6F93124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B350E34A595; Thu, 27 Feb 2020 13:32:08 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3B93D21FE2E for ; Thu, 27 Feb 2020 13:20:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2365A8F12; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 22470468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:39 -0500 Message-Id: <1582838290-17243-412-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 411/622] lnet: Convert noisy timeout error to cdebug X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn This error message in lnet_finalize_expired_responses is very noisy when nodes go down or are rebooted, and it does not provide much value to system administrators. Convert it to a CDEBUG instead. WC-bug-id: https://jira.whamcloud.com/browse/LU-12439 Lustre-commit: bd3ed8cb7165 ("LU-12439 lnet: Convert noisy timeout error to cdebug") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35233 Reviewed-by: Amir Shehata Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 629856c..9a4c426 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2636,8 +2636,9 @@ struct lnet_mt_event_info { nid = rspt->rspt_next_hop_nid; - CNETERR("Response timed out: md = %p: nid = %s\n", - md, libcfs_nid2str(nid)); + CDEBUG(D_NET, + "Response timeout: md = %p: nid = %s\n", + md, libcfs_nid2str(nid)); LNetMDUnlink(rspt->rspt_mdh); lnet_rspt_free(rspt, i); From patchwork Thu Feb 27 21:14:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410371 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 97D69138D for ; Thu, 27 Feb 2020 21:36:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 807CC24677 for ; Thu, 27 Feb 2020 21:36:24 +0000 (UTC) DMARC-Filter:
OpenDMARC Filter v1.3.2 mail.kernel.org 807CC24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 751B7349546; Thu, 27 Feb 2020 13:30:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7C46221FE2E for ; Thu, 27 Feb 2020 13:20:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 260C48F13; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2508A46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:40 -0500 Message-Id: <1582838290-17243-413-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 412/622] lnet: Misleading error from lnet_is_health_check X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn In the case of sending to 0@lo we never set msg_txpeer nor msg_rxpeer. This results in failing this lnet_is_health_check condition and a misleading error message. 
The condition is only an error if the msg status is non-zero. An additional case where we can have msg_rx_committed, but not msg_rxpeer is for optimized GETs. In this case we allocate a reply message but do not set msg_rxpeer. We cannot perform further health checking on this message, but it is not an error condition. WC-bug-id: https://jira.whamcloud.com/browse/LU-12440 Lustre-commit: 6caa6ed07df0 ("LU-12440 lnet: Misleading error from lnet_is_health_check") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35235 Reviewed-by: Amir Shehata Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 9ffd874..b70a6c9 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -848,8 +848,13 @@ if ((msg->msg_tx_committed && !msg->msg_txpeer) || (msg->msg_rx_committed && !msg->msg_rxpeer)) { - CDEBUG(D_NET, "msg %p failed too early to retry and send\n", - msg); + /* The optimized GET case does not set msg_rxpeer, but status + * could be zero. Only print the error message if we have a + * non-zero status.
+ */ + if (status) + CDEBUG(D_NET, "msg %p status %d cannot retry\n", msg, + status); return false; } From patchwork Thu Feb 27 21:14:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410375 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D484392A for ; Thu, 27 Feb 2020 21:36:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BD66024677 for ; Thu, 27 Feb 2020 21:36:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD66024677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DAEB434A115; Thu, 27 Feb 2020 13:30:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BD72F21FE2E for ; Thu, 27 Feb 2020 13:20:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 28F348F14; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 27CC646C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:41 -0500 Message-Id: <1582838290-17243-414-git-send-email-jsimmons@infradead.org> 
X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 413/622] lustre: llite: do not cache write open lock for exec file X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Gu Zheng , Jinshan Xiong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jinshan Xiong This is to avoid the problem that the MDT needs an extra lock revocation to make the file be able to execute. WC-bug-id: https://jira.whamcloud.com/browse/LU-4398 Lustre-commit: 6dd9d57bc006 ("LU-4398 llite: do not cache write open lock for exec file") Signed-off-by: Jinshan Xiong Signed-off-by: Gu Zheng Reviewed-on: https://review.whamcloud.com/32265 Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 6f418e0..35e31ad 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -360,7 +360,9 @@ static int ll_md_close(struct inode *inode, struct file *file) } mutex_unlock(&lli->lli_och_mutex); - if (!md_lock_match(ll_i2mdexp(inode), flags, ll_inode2fid(inode), + /* LU-4398: do not cache write open lock if the file has exec bit */ + if ((lockmode == LCK_CW && inode->i_mode & 0111) || + !md_lock_match(ll_i2mdexp(inode), flags, ll_inode2fid(inode), LDLM_IBITS, &policy, lockmode, &lockh)) rc = ll_md_real_close(inode, fd->fd_omode); From patchwork Thu Feb 27 21:14:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410379 Return-Path: Received: from 
mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 11D2017E0 for ; Thu, 27 Feb 2020 21:36:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EEC4F24690 for ; Thu, 27 Feb 2020 21:36:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EEC4F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3031234A141; Thu, 27 Feb 2020 13:30:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0A0E821FE2E for ; Thu, 27 Feb 2020 13:20:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2BB028F15; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2A7FD46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:42 -0500 Message-Id: <1582838290-17243-415-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 414/622] lustre: mdc: polling mode for changelog reader X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 
Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev This allows a user (such as lsom_sync and similar tools) to follow the changelog without rescanning it and getting duplicates. WC-bug-id: https://jira.whamcloud.com/browse/LU-12553 Lustre-commit: e215002883d5 ("LU-12553 mdc: polling mode for changelog reader") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/35262 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_changelog.c | 37 +++++++++++++++++++++++++++++++- include/uapi/linux/lustre/lustre_ioctl.h | 1 + 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/fs/lustre/mdc/mdc_changelog.c b/fs/lustre/mdc/mdc_changelog.c index fb0de68..ea74bab 100644 --- a/fs/lustre/mdc/mdc_changelog.c +++ b/fs/lustre/mdc/mdc_changelog.c @@ -37,6 +37,7 @@ #include #include +#include #include "mdc_internal.h" @@ -88,6 +89,9 @@ struct chlg_reader_state { u64 crs_rec_count; /* List of prefetched enqueued_record::enq_linkage_items */ struct list_head crs_rec_queue; + unsigned int crs_last_catidx; + unsigned int crs_last_idx; + bool crs_poll; }; struct chlg_rec_entry { @@ -132,6 +136,9 @@ static int chlg_read_cat_process_cb(const struct lu_env *env, rec = container_of(hdr, struct llog_changelog_rec, cr_hdr); + crs->crs_last_catidx = llh->lgh_hdr->llh_cat_idx; + crs->crs_last_idx = hdr->lrh_index; + if (rec->cr_hdr.lrh_type != CHANGELOG_REC) { rc = -EINVAL; CERROR("%s: not a changelog rec %x/%d in llog : rc = %d\n", @@ -225,6 +232,10 @@ static int chlg_load(void *args) goto err_out; } + crs->crs_last_catidx = -1; + crs->crs_last_idx = 0; + +again: rc = llog_open(NULL, ctx, &llh, NULL, CHANGELOG_CATALOG, LLOG_OPEN_EXISTS); if (rc) { @@ -248,12 +259,18 @@ static int chlg_load(void *args)
goto err_out; } - rc = llog_cat_process(NULL, llh, chlg_read_cat_process_cb, crs, 0, 0); + rc = llog_cat_process(NULL, llh, chlg_read_cat_process_cb, crs, + crs->crs_last_catidx, crs->crs_last_idx); if (rc < 0) { CERROR("%s: fail to process llog: rc = %d\n", obd->obd_name, rc); goto err_out; } + if (!kthread_should_stop() && crs->crs_poll) { + llog_cat_close(NULL, llh); + schedule_timeout_interruptible(HZ); + goto again; + } crs->crs_eof = true; @@ -602,6 +619,23 @@ static unsigned int chlg_poll(struct file *file, poll_table *wait) return mask; } +static long chlg_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct chlg_reader_state *crs = file->private_data; + int rc; + + switch (cmd) { + case OBD_IOC_CHLG_POLL: + crs->crs_poll = !!arg; + rc = 0; + break; + default: + rc = -EINVAL; + break; + } + return rc; +} + static const struct file_operations chlg_fops = { .owner = THIS_MODULE, .llseek = chlg_llseek, @@ -610,6 +644,7 @@ static unsigned int chlg_poll(struct file *file, poll_table *wait) .open = chlg_open, .release = chlg_release, .poll = chlg_poll, + .unlocked_ioctl = chlg_ioctl, }; /** diff --git a/include/uapi/linux/lustre/lustre_ioctl.h b/include/uapi/linux/lustre/lustre_ioctl.h index b067cc6..53dd34f 100644 --- a/include/uapi/linux/lustre/lustre_ioctl.h +++ b/include/uapi/linux/lustre/lustre_ioctl.h @@ -221,6 +221,7 @@ static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data) #define OBD_IOC_START_LFSCK _IOWR('f', 230, OBD_IOC_DATA_TYPE) #define OBD_IOC_STOP_LFSCK _IOW('f', 231, OBD_IOC_DATA_TYPE) #define OBD_IOC_QUERY_LFSCK _IOR('f', 232, struct obd_ioctl_data) +#define OBD_IOC_CHLG_POLL _IOR('f', 233, long) /* lustre/lustre_user.h 240-249 */ /* was LIBCFS_IOC_DEBUG_MASK _IOWR('f', 250, long) until 2.11 */ From patchwork Thu Feb 27 21:14:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410697 Return-Path: Received: from 
mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D909817E0 for ; Thu, 27 Feb 2020 21:44:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C1E2E24690 for ; Thu, 27 Feb 2020 21:44:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C1E2E24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6A88521FFD4; Thu, 27 Feb 2020 13:35:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5F90A21FE2E for ; Thu, 27 Feb 2020 13:20:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2EBD48F16; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2D3FA47C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:43 -0500 Message-Id: <1582838290-17243-416-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 415/622] lnet: Sync the start of discovery and monitor threads X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The discovery thread starts up before the monitor thread so it may issue PUTs or GETs before the monitor thread has a chance to initialize its data structures (namely the_lnet.ln_mt_rstq). This can result in an OOPs when we attempt to attach response trackers to MDs. Introduce a completion to synchronize the startup of these threads. WC-bug-id: https://jira.whamcloud.com/browse/LU-12537 Lustre-commit: 9283e2ed6655 ("LU-12537 lnet: Sync the start of discovery and monitor threads") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35478 Reviewed-by: Alexandr Boyko Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 5 +++++ net/lnet/lnet/api-ni.c | 3 +++ net/lnet/lnet/lib-move.c | 1 + net/lnet/lnet/peer.c | 11 ++++++++++- 4 files changed, 19 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index b240361..1009a69 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -1161,6 +1161,11 @@ struct lnet { /* recovery eq handler */ struct lnet_handle_eq ln_mt_eqh; + /* + * Completed when the discovery and monitor threads can enter their + * work loops + */ + struct completion ln_started; }; #endif diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 65f1f17..aa5ca52 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1062,6 +1062,7 @@ struct lnet_libhandle * INIT_LIST_HEAD(&the_lnet.ln_mt_peerNIRecovq); init_waitqueue_head(&the_lnet.ln_dc_waitq); LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); + init_completion(&the_lnet.ln_started); rc = lnet_descriptor_setup(); if (rc != 0) @@ -2583,6 +2584,8 @@ 
void lnet_lib_exit(void) mutex_unlock(&the_lnet.ln_api_mutex); + complete_all(&the_lnet.ln_started); + /* wait for all routers to start */ lnet_wait_router_start(); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 9a4c426..413397c 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3529,6 +3529,7 @@ void lnet_monitor_thr_stop(void) lnet_build_msg_event(msg, LNET_EVENT_PUT); + wait_for_completion(&the_lnet.ln_started); /* * Must I ACK? If so I'll grab the ack_wmd out of the header and put * it back into the ACK during lnet_finalize() diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index b0ca1de..49da7a1 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3258,6 +3258,8 @@ static int lnet_peer_discovery(void *arg) struct lnet_peer *lp; int rc; + wait_for_completion(&the_lnet.ln_started); + CDEBUG(D_NET, "started\n"); for (;;) { @@ -3429,7 +3431,14 @@ void lnet_peer_discovery_stop(void) LASSERT(the_lnet.ln_dc_state == LNET_DC_STATE_RUNNING); the_lnet.ln_dc_state = LNET_DC_STATE_STOPPING; - wake_up(&the_lnet.ln_dc_waitq); + + /* In the LNetNIInit() path we may be stopping discovery before it + * entered its work loop + */ + if (!completion_done(&the_lnet.ln_started)) + complete(&the_lnet.ln_started); + else + wake_up(&the_lnet.ln_dc_waitq); wait_event(the_lnet.ln_dc_waitq, the_lnet.ln_dc_state == LNET_DC_STATE_SHUTDOWN); From patchwork Thu Feb 27 21:14:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410381 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40F1B17E0 for ; Thu, 27 Feb 2020 21:36:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client 
certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 26D2A246A1 for ; Thu, 27 Feb 2020 21:36:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 26D2A246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 20EFE3495F3; Thu, 27 Feb 2020 13:30:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B760421FE52 for ; Thu, 27 Feb 2020 13:20:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 311838F17; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2FFC8468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:44 -0500 Message-Id: <1582838290-17243-417-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 416/622] lustre: llite: don't check vmpage refcount in ll_releasepage() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong We cannot use the vmpage refcount to check whether a page can be released, because doing so breaks invalidate_complete_page2(). See its comment: /* * This is like invalidate_complete_page(), except it ignores the page's * refcount. We do this because invalidate_inode_pages2() needs stronger * invalidation guarantees, and cannot afford to leave pages behind because * shrink_page_list() has a temp ref on them, or because they're transiently * sitting in the lru_cache_add() pagevecs. */ So checking refcount > 3 can be wrong here; one common case is that the page is transiently in lru_cache_add(). Since we check whether the vmpage is used by a cl_page later in the function, and the vmpage is locked before this function is called, it should be safe to remove the vmpage refcount check. One current problem is that the following DIO will mostly fall back to buffered IO: $ dd if=/dev/zero of=data bs=1M count=1 $ dd if=/dev/zero of=data bs=1M count=1 oflag=direct conv=notrunc This happens because DIO first tries to write back and invalidate clean pages, which fails because the vmpage refcount can be 4 here. The call chain is: |->generic_file_direct_write() |->filemap_write_and_wait_range() |->invalidate_inode_pages2_range() |->invalidate_complete_page2() If a page can not be invalidated, return 0 to fall back to buffered write. 
|->try_to_release_page() |->ll_releasepage() return 0 because of vmpage count is 4 > 3 |->generic_file_buffered_write WC-bug-id: https://jira.whamcloud.com/browse/LU-12587 Lustre-commit: e59f0c9a245f ("LU-12587 llite: don't check vmpage refcount in ll_releasepage()") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35610 Reviewed-by: Patrick Farrell Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/rw26.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index f5c1479..75348bf 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -119,10 +119,6 @@ static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask) if (!obj) return 1; - /* 1 for caller, 1 for cl_page and 1 for page cache */ - if (page_count(vmpage) > 3) - return 0; - page = cl_vmpage_page(vmpage, obj); if (!page) return 1; From patchwork Thu Feb 27 21:14:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410481 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9009592A for ; Thu, 27 Feb 2020 21:39:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 787BC24690 for ; Thu, 27 Feb 2020 21:39:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 787BC24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id 9A43C34A5BB; Thu, 27 Feb 2020 13:32:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0ABC721FE5C for ; Thu, 27 Feb 2020 13:20:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 33BD78F18; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 32AAB46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:45 -0500 Message-Id: <1582838290-17243-418-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 417/622] lnet: Deprecate live and dead router check params X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Rather than delete these params let's deprecate them for one release and print a warning to console if the user is setting them. 
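For illustration only, the detection scheme can be sketched in userspace C as below. The deprecated parameters default to a sentinel (INT_MIN) that no user is expected to pass, so any other value at init time means the user set them. The helpers set_param() and deprecated_param_set() are hypothetical names used for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Sentinel default: assumed to be a value no user would pass on purpose. */
static int live_router_check_interval = INT_MIN;
static int dead_router_check_interval = INT_MIN;

/* Hypothetical stand-in for the module loader assigning a user value. */
static void set_param(int *param, int value)
{
	/* The scheme cannot detect a user who passes the sentinel itself. */
	assert(value != INT_MIN);
	*param = value;
}

/* Hypothetical helper: true if either deprecated parameter was set. */
static bool deprecated_param_set(void)
{
	return live_router_check_interval != INT_MIN ||
	       dead_router_check_interval != INT_MIN;
}
```

In the real module the check would run once at init time and only print the warning; the values themselves are ignored.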
WC-bug-id: https://jira.whamcloud.com/browse/LU-12492 Lustre-commit: fca1a999899a ("LU-12492 lnet: Deprecate live and dead router check params") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35387 Reviewed-by: Alexandr Boyko Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ net/lnet/lnet/module.c | 4 ++++ net/lnet/lnet/router.c | 8 ++++++++ 3 files changed, 14 insertions(+) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 3dd56a2..dd0075b 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -501,6 +501,8 @@ struct lnet_ni * extern unsigned int lnet_drop_asym_route; extern unsigned int router_sensitivity_percentage; extern int alive_router_check_interval; +extern int live_router_check_interval; +extern int dead_router_check_interval; extern int portal_rotor; int lnet_lib_init(void); diff --git a/net/lnet/lnet/module.c b/net/lnet/lnet/module.c index 5905f38..939c255 100644 --- a/net/lnet/lnet/module.c +++ b/net/lnet/lnet/module.c @@ -245,6 +245,10 @@ static int __init lnet_init(void) return rc; } + if (live_router_check_interval != INT_MIN || + dead_router_check_interval != INT_MIN) + LCONSOLE_WARN("live_router_check_interval and dead_router_check_interval have been deprecated. Use alive_router_check_interval instead. 
Ignoring these deprecated parameters.\n"); + rc = blocking_notifier_chain_register(&libcfs_ioctl_list, &lnet_ioctl_handler); LASSERT(!rc); diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index eb76c72..892164b 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -78,6 +78,14 @@ module_param(avoid_asym_router_failure, int, 0644); MODULE_PARM_DESC(avoid_asym_router_failure, "Avoid asymmetrical router failures (0 to disable)"); +int dead_router_check_interval = INT_MIN; +module_param(dead_router_check_interval, int, 0444); +MODULE_PARM_DESC(dead_router_check_interval, "(DEPRECATED - Use alive_router_check_interval)"); + +int live_router_check_interval = INT_MIN; +module_param(live_router_check_interval, int, 0444); +MODULE_PARM_DESC(live_router_check_interval, "(DEPRECATED - Use alive_router_check_interval)"); + int alive_router_check_interval = 60; module_param(alive_router_check_interval, int, 0644); MODULE_PARM_DESC(alive_router_check_interval, "Seconds between live router health checks (<= 0 to disable)"); From patchwork Thu Feb 27 21:14:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410387 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F357717E0 for ; Thu, 27 Feb 2020 21:36:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DC286246A1 for ; Thu, 27 Feb 2020 21:36:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DC286246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0FD4134A190; Thu, 27 Feb 2020 13:30:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4E9E121FE5C for ; Thu, 27 Feb 2020 13:20:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3676C8F19; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 355BF46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:46 -0500 Message-Id: <1582838290-17243-419-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 418/622] lnet: Detach rspt when md_threshold is infinite X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn MDs for pings use the infinite threshold on MD operations. As such they aren't normally unlinkable as determined by lnet_md_unlinkable(). We can cover this case by checking whether the refcount is zero and threshold is LNET_MD_THRESH_INF. 
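The resulting detach condition can be sketched in standalone C as below. This is a simplified model and not the LNet code: struct md_model and should_detach_rspt() are hypothetical, and the value of MD_THRESH_INF is an assumption standing in for LNET_MD_THRESH_INF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for LNET_MD_THRESH_INF. */
#define MD_THRESH_INF (-1)

/* Hypothetical, reduced model of the MD fields the check looks at. */
struct md_model {
	int refcount;	/* operations still pending on the MD */
	int threshold;	/* remaining operations, or MD_THRESH_INF */
};

/*
 * Detach the response tracker when the MD is being unlinked, or when
 * an infinite-threshold MD (as used for pings) has no pending
 * operations left.
 */
static bool should_detach_rspt(const struct md_model *md, bool unlink)
{
	assert(md != NULL);

	return unlink ||
	       (md->refcount == 0 && md->threshold == MD_THRESH_INF);
}
```

Note that the unlink itself still depends on the unlink flag alone; only the tracker detach gains the extra case.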
Cray-bug-id: LUS-7366 WC-bug-id: https://jira.whamcloud.com/browse/LU-12441 Lustre-commit: ebbf909a1c2d ("LU-12441 lnet: Detach rspt when md_threshold is infinite") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35452 Reviewed-by: Alexandr Boyko Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index b70a6c9..805d5b9 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -825,10 +825,12 @@ lnet_eq_enqueue_event(md->md_eq, &msg->msg_ev); } - if (unlink) { + if (unlink || (md->md_refcount == 0 && + md->md_threshold == LNET_MD_THRESH_INF)) lnet_detach_rsp_tracker(md, cpt); + + if (unlink) lnet_md_unlink(md); - } msg->msg_md = NULL; } From patchwork Thu Feb 27 21:14:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410391 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3DA2717E0 for ; Thu, 27 Feb 2020 21:37:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2641924690 for ; Thu, 27 Feb 2020 21:37:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2641924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3401534A1BE; Thu, 27 Feb 2020 13:30:37 -0800 (PST) 
X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 929F921FE5C for ; Thu, 27 Feb 2020 13:20:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 390F58F1A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3819F46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:47 -0500 Message-Id: <1582838290-17243-420-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 419/622] lnet: Return EHOSTUNREACH for unreachable gateway X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Commit f1d0660a5bbe ("lnet: Do not allow gateways on remote nets") contains a flaw: it should not be a fatal error to encounter an unreachable gateway when parsing routes. Parsing should continue in case there are any valid, reachable routes that are being added. Returning EINVAL here will cause a failure to load the LNet module. lnet_parse_route() explicitly allows for lnet_add_route() to return EHOSTUNREACH for just this purpose. 
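As a rough userspace sketch of the behaviour described above (not the actual lnet_parse_route() code), a parser can skip routes whose gateway is unreachable and stop only on other errors. The names parse_routes(), add_route_fn and demo_add_route() are hypothetical, and gateways are modelled as plain ints for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for lnet_add_route(): 0 or a negative errno. */
typedef int (*add_route_fn)(int gateway);

/*
 * Try to add every route. An unreachable gateway is skipped so that
 * later, valid routes are still added; any other error is fatal.
 */
static int parse_routes(const int *gateways, size_t n,
			add_route_fn add, size_t *added)
{
	size_t i;

	assert(gateways && add && added);
	*added = 0;

	for (i = 0; i < n; i++) {
		int rc = add(gateways[i]);

		if (rc == -EHOSTUNREACH)
			continue;	/* skip this route, keep parsing */
		if (rc != 0)
			return rc;	/* fatal: abort the whole parse */
		(*added)++;
	}
	return 0;
}

/* Demo callback: negative ids are "unreachable", 0 is "invalid". */
static int demo_add_route(int gateway)
{
	if (gateway < 0)
		return -EHOSTUNREACH;
	if (gateway == 0)
		return -EINVAL;
	return 0;
}
```

With the old -EINVAL return, the unreachable gateway in the middle of a route list would have taken the fatal path and aborted the parse.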
Fixes: f1d0660a5bbe ("lnet: Do not allow gateways on remote nets") WC-bug-id: https://jira.whamcloud.com/browse/LU-12595 Lustre-commit: 7c12c24c8a10 ("LU-12595 lnet: Return EHOSTUNREACH for unreachable gateway") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35630 Reviewed-by: Alexey Lyashkov Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/router.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 892164b..4ab587d 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -448,7 +448,7 @@ static void lnet_shuffle_seed(void) CERROR("Cannot add route with gateway %s. There is no local interface configured on LNet %s\n", libcfs_nid2str(gateway), libcfs_net2str(LNET_NIDNET(gateway))); - return -EINVAL; + return -EHOSTUNREACH; } /* Assume net, route, all new */ From patchwork Thu Feb 27 21:14:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410485 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E9B38138D for ; Thu, 27 Feb 2020 21:39:37 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D28A024690 for ; Thu, 27 Feb 2020 21:39:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D28A024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 8135B349A20; Thu, 27 Feb 2020 13:32:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D31C921FE6E for ; Thu, 27 Feb 2020 13:20:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3C2B28F1B; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3ACB447C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:48 -0500 Message-Id: <1582838290-17243-421-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 420/622] lustre: ptlrpc: Don't get jobid in body_v2 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Some Lustre messages are still sent with ptlrpc_body_v2, which does not have space for the jobid. This results in errors like the following when getting the jobid from these messages, which we do now that the jobid is in all RPC debug messages: LustreError: 6817:0:(pack_generic.c:425:lustre_msg_buf_v2()) msg 000000005c83b7a2 buffer[0] size 152 too small (required 184, opc=-1) While we should stop sending ptlrpc_body_v2 messages, we still have to support these messages from older servers. 
So put a check in lustre_msg_get_jobid so it won't try to get the jobid if it's the old, smaller RPC body. Fixes: 9eabc4eaba47 ("lustre: ptlrpc: Add jobid to rpctrace debug messages") WC-bug-id: https://jira.whamcloud.com/browse/LU-12523 Lustre-commit: 544701a782fb ("LU-12523 ptlrpc: Don't get jobid in body_v2") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35584 Reviewed-by: Ann Koehler Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 4 ++-- fs/lustre/ptlrpc/pack_generic.c | 3 ++- fs/lustre/ptlrpc/service.c | 4 ++-- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index bd641cc..9920a95 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1644,7 +1644,7 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req) imp->imp_obd->obd_uuid.uuid, lustre_msg_get_status(req->rq_reqmsg), req->rq_xid, obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg), - lustre_msg_get_jobid(req->rq_reqmsg)); + lustre_msg_get_jobid(req->rq_reqmsg) ?: ""); rc = ptl_send_rpc(req, 0); if (rc == -ENOMEM) { @@ -2065,7 +2065,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) req->rq_xid, obd_import_nid2str(imp), lustre_msg_get_opc(req->rq_reqmsg), - lustre_msg_get_jobid(req->rq_reqmsg)); + lustre_msg_get_jobid(req->rq_reqmsg) ?: ""); spin_lock(&imp->imp_lock); /* diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 7acb4a8..b066113 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -2429,7 +2429,8 @@ void _debug_req(struct ptlrpc_request *req, DEBUG_REQ_FLAGS(req), req_ok ? lustre_msg_get_flags(req->rq_reqmsg) : -1, rep_flags, req->rq_status, rep_status, - req_ok ? lustre_msg_get_jobid(req->rq_reqmsg) : ""); + req_ok ? 
lustre_msg_get_jobid(req->rq_reqmsg) ?: "" + : ""); va_end(args); } EXPORT_SYMBOL(_debug_req); diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index 3132a1e..f40cb8d 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1765,7 +1765,7 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, lustre_msg_get_status(request->rq_reqmsg), request->rq_xid, libcfs_id2str(request->rq_peer), lustre_msg_get_opc(request->rq_reqmsg), - lustre_msg_get_jobid(request->rq_reqmsg)); + lustre_msg_get_jobid(request->rq_reqmsg) ?: ""); if (lustre_msg_get_opc(request->rq_reqmsg) != OBD_PING) CFS_FAIL_TIMEOUT_MS(OBD_FAIL_PTLRPC_PAUSE_REQ, cfs_fail_val); @@ -1807,7 +1807,7 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, request->rq_xid, libcfs_id2str(request->rq_peer), lustre_msg_get_opc(request->rq_reqmsg), - lustre_msg_get_jobid(request->rq_reqmsg), + lustre_msg_get_jobid(request->rq_reqmsg) ?: "", timediff_usecs, arrived_usecs, (request->rq_repmsg ? 
From patchwork Thu Feb 27 21:14:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410489 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F23A692A for ; Thu, 27 Feb 2020 21:39:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DAB1224690 for ; Thu, 27 Feb 2020 21:39:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DAB1224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69AFA34A615; Thu, 27 Feb 2020 13:32:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3723321FE75 for ; Thu, 27 Feb 2020 13:20:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3EE658F1C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3DB26468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:49 -0500 Message-Id: <1582838290-17243-422-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 421/622] lnet: Defer rspt cleanup when MD queued for unlink X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn When an MD is queued for unlink its lnet_libhandle is invalidated so that future lookups of the MD fail. As a result, the monitor thread cannot detach the response tracker from such an MD, and instead must wait for the remaining operations on the MD to complete before it can safely free the response tracker and remove it from the list. Freeing the memory while there are pending operations on the MD can result in a use-after-free when the final operation on the MD completes and we attempt to remove the response tracker from the MD via the lnet_msg_detach_md()->lnet_detach_rsp_tracker() call chain. Here we introduce zombie lists for such response trackers. This will allow us to also handle the case where there are response trackers on the monitor queue during LNet shutdown. In this instance the zombie response trackers will be freed either when all the operations on the MD have completed (this freeing is performed by lnet_detach_rsp_tracker()) or after the LND Nets have shut down, since we are then assured there will not be any more operations on the associated MDs (this freeing is performed by lnet_clean_zombie_rstqs()). Three other small changes are included in this patch: - When deleting the response tracker from the monitor's list we should use list_del() rather than list_del_init() since we'll be freeing the response tracker after removing it from the list. - Perform a single ktime_get() call for each local queue. 
- Move the check of whether the local queue is empty outside of the net lock. WC-bug-id: https://jira.whamcloud.com/browse/LU-12568 Lustre-commit: 4a4ac34de42c ("LU-12568 lnet: Defer rspt cleanup when MD queued for unlink") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35576 Reviewed-by: Amir Shehata Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 3 + include/linux/lnet/lib-types.h | 7 +++ net/lnet/lnet/api-ni.c | 31 ++++++++++ net/lnet/lnet/lib-move.c | 134 +++++++++++++++++++++++++++-------------- 4 files changed, 131 insertions(+), 44 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index dd0075b..b1407b3 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -571,6 +571,8 @@ int lnet_send_ping(lnet_nid_t dest_nid, struct lnet_handle_md *mdh, int nnis, void lnet_schedule_blocked_locked(struct lnet_rtrbufpool *rbp); void lnet_drop_routed_msgs_locked(struct list_head *list, int cpt); +struct list_head **lnet_create_array_of_queues(void); + /* portals functions */ /* portals attributes */ static inline int @@ -641,6 +643,7 @@ struct lnet_msg *lnet_create_reply_msg(struct lnet_ni *ni, void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg, unsigned int len); void lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt); +void lnet_clean_zombie_rstqs(void); void lnet_finalize(struct lnet_msg *msg, int rc); bool lnet_send_error_simulation(struct lnet_msg *msg, diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 1009a69..904ef7a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -1158,6 +1158,13 @@ struct lnet { * based on the mdh cookie. */ struct list_head **ln_mt_rstq; + /* + * A response tracker becomes a zombie when the associated MD is queued + * for unlink before the response tracker is detached from the MD. 
An + * entry on a zombie list can be freed when either the remaining + * operations on the MD complete or when LNet has shut down. + */ + struct list_head **ln_mt_zombie_rstqs; /* recovery eq handler */ struct lnet_handle_eq ln_mt_eqh; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index aa5ca52..e773839 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1028,6 +1028,26 @@ struct lnet_libhandle * list_add(&lh->lh_hash_chain, &rec->rec_lh_hash[hash]); } +struct list_head ** +lnet_create_array_of_queues(void) +{ + struct list_head **qs; + struct list_head *q; + int i; + + qs = cfs_percpt_alloc(lnet_cpt_table(), + sizeof(struct list_head)); + if (!qs) { + CERROR("Failed to allocate queues\n"); + return NULL; + } + + cfs_percpt_for_each(q, i, qs) + INIT_LIST_HEAD(q); + + return qs; +} + static int lnet_unprepare(void); static int @@ -1120,6 +1140,12 @@ struct lnet_libhandle * goto failed; } + the_lnet.ln_mt_zombie_rstqs = lnet_create_array_of_queues(); + if (!the_lnet.ln_mt_zombie_rstqs) { + rc = -ENOMEM; + goto failed; + } + return 0; failed: @@ -1144,6 +1170,11 @@ struct lnet_libhandle * LASSERT(list_empty(&the_lnet.ln_test_peers)); LASSERT(list_empty(&the_lnet.ln_nets)); + if (the_lnet.ln_mt_zombie_rstqs) { + lnet_clean_zombie_rstqs(); + the_lnet.ln_mt_zombie_rstqs = NULL; + } + if (!LNetEQHandleIsInvalid(the_lnet.ln_mt_eqh)) { rc = LNetEQFree(the_lnet.ln_mt_eqh); LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 413397c..322998a 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2556,24 +2556,55 @@ struct lnet_mt_event_info { return; rspt = md->md_rspt_ptr; - md->md_rspt_ptr = NULL; /* debug code */ LASSERT(rspt->rspt_cpt == cpt); - /* invalidate the handle to indicate that a response has been - * received, which will then lead the monitor thread to clean up - * the rspt block. 
- */ - LNetInvalidateMDHandle(&rspt->rspt_mdh); + md->md_rspt_ptr = NULL; + + if (LNetMDHandleIsInvalid(rspt->rspt_mdh)) { + /* The monitor thread has invalidated this handle because the + * response timed out, but it failed to lookup the MD. That + * means this response tracker is on the zombie list. We can + * safely remove it under the resource lock (held by caller) and + * free the response tracker block. + */ + list_del(&rspt->rspt_on_list); + lnet_rspt_free(rspt, cpt); + } else { + /* invalidate the handle to indicate that a response has been + * received, which will then lead the monitor thread to clean up + * the rspt block. + */ + LNetInvalidateMDHandle(&rspt->rspt_mdh); + } +} + +void +lnet_clean_zombie_rstqs(void) +{ + struct lnet_rsp_tracker *rspt, *tmp; + int i; + + cfs_cpt_for_each(i, lnet_cpt_table()) { + list_for_each_entry_safe(rspt, tmp, + the_lnet.ln_mt_zombie_rstqs[i], + rspt_on_list) { + list_del(&rspt->rspt_on_list); + lnet_rspt_free(rspt, i); + } + } + + cfs_percpt_free(the_lnet.ln_mt_zombie_rstqs); } static void -lnet_finalize_expired_responses(bool force) +lnet_finalize_expired_responses(void) { struct lnet_libmd *md; struct list_head local_queue; struct lnet_rsp_tracker *rspt, *tmp; + ktime_t now; int i; if (!the_lnet.ln_mt_rstq) @@ -2590,6 +2621,8 @@ struct lnet_mt_event_info { list_splice_init(the_lnet.ln_mt_rstq[i], &local_queue); lnet_net_unlock(i); + now = ktime_get(); + list_for_each_entry_safe(rspt, tmp, &local_queue, rspt_on_list) { /* The rspt mdh will be invalidated when a response @@ -2605,42 +2638,74 @@ struct lnet_mt_event_info { lnet_res_lock(i); if (LNetMDHandleIsInvalid(rspt->rspt_mdh)) { lnet_res_unlock(i); - list_del_init(&rspt->rspt_on_list); + list_del(&rspt->rspt_on_list); lnet_rspt_free(rspt, i); continue; } - if (ktime_compare(ktime_get(), - rspt->rspt_deadline) >= 0 || - force) { + if (ktime_compare(now, rspt->rspt_deadline) >= 0 || + the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN) { struct lnet_peer_ni *lpni; 
lnet_nid_t nid; md = lnet_handle2md(&rspt->rspt_mdh); if (!md) { + /* MD has been queued for unlink, but + * rspt hasn't been detached (Note we've + * checked above that the rspt_mdh is + * valid). Since we cannot lookup the MD + * we're unable to detach the rspt + * ourselves. Thus, move the rspt to the + * zombie list where we'll wait for + * either: + * 1. The remaining operations on the + * MD to complete. In this case the + * final operation will result in + * lnet_msg_detach_md()-> + * lnet_detach_rsp_tracker() where + * we will clean up this response + * tracker. + * 2. LNet to shutdown. In this case + * we'll wait until after all LND Nets + * have shutdown and then we can + * safely free any remaining response + * tracker blocks on the zombie list. + * Note: We need to hold the resource + * lock when adding to the zombie list + * because we may have concurrent access + * with lnet_detach_rsp_tracker(). + */ LNetInvalidateMDHandle(&rspt->rspt_mdh); + list_move(&rspt->rspt_on_list, + the_lnet.ln_mt_zombie_rstqs[i]); lnet_res_unlock(i); - list_del_init(&rspt->rspt_on_list); - lnet_rspt_free(rspt, i); continue; } LASSERT(md->md_rspt_ptr == rspt); md->md_rspt_ptr = NULL; lnet_res_unlock(i); + LNetMDUnlink(rspt->rspt_mdh); + + nid = rspt->rspt_next_hop_nid; + + list_del(&rspt->rspt_on_list); + lnet_rspt_free(rspt, i); + + /* If we're shutting down we just want to clean + * up the rspt blocks + */ + if (the_lnet.ln_mt_state == + LNET_MT_STATE_SHUTDOWN) + continue; + lnet_net_lock(i); the_lnet.ln_counters[i]->lct_health.lch_response_timeout_count++; lnet_net_unlock(i); - list_del_init(&rspt->rspt_on_list); - - nid = rspt->rspt_next_hop_nid; - CDEBUG(D_NET, "Response timeout: md = %p: nid = %s\n", md, libcfs_nid2str(nid)); - LNetMDUnlink(rspt->rspt_mdh); - lnet_rspt_free(rspt, i); /* If there is a timeout on the response * from the next hop decrement its health @@ -2659,10 +2724,11 @@ struct lnet_mt_event_info { } } - lnet_net_lock(i); - if 
(!list_empty(&local_queue)) + if (!list_empty(&local_queue)) { + lnet_net_lock(i); list_splice(&local_queue, the_lnet.ln_mt_rstq[i]); - lnet_net_unlock(i); + lnet_net_unlock(i); + } } } @@ -2927,26 +2993,6 @@ struct lnet_mt_event_info { lnet_net_unlock(0); } -static struct list_head ** -lnet_create_array_of_queues(void) -{ - struct list_head **qs; - struct list_head *q; - int i; - - qs = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(struct list_head)); - if (!qs) { - CERROR("Failed to allocate queues\n"); - return NULL; - } - - cfs_percpt_for_each(q, i, qs) - INIT_LIST_HEAD(q); - - return qs; -} - static int lnet_resendqs_create(void) { @@ -3204,7 +3250,7 @@ struct lnet_mt_event_info { lnet_resend_pending_msgs(); if (now >= rsp_timeout) { - lnet_finalize_expired_responses(false); + lnet_finalize_expired_responses(); rsp_timeout = now + (lnet_transaction_timeout / 2); } @@ -3422,7 +3468,7 @@ struct lnet_mt_event_info { static void lnet_rsp_tracker_clean(void) { - lnet_finalize_expired_responses(true); + lnet_finalize_expired_responses(); cfs_percpt_free(the_lnet.ln_mt_rstq); the_lnet.ln_mt_rstq = NULL; From patchwork Thu Feb 27 21:14:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410395 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A01517E0 for ; Thu, 27 Feb 2020 21:37:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 306DF24690 for ; Thu, 27 Feb 2020 21:37:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 306DF24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1DC9334A1F9; Thu, 27 Feb 2020 13:30:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8D22821FE7B for ; Thu, 27 Feb 2020 13:20:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 41E598F1D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4066646A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:50 -0500 Message-Id: <1582838290-17243-423-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 422/622] lustre: lov: Correct write_intent end for trunc X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When instantiating a layout, the server interprets the write intent from the client as the range [start, end), not including the last byte. This is correct for writes because the last byte given for a write is actually 'endpos', the resulting file pointer position, and so is not included. 
However, truncate specifies a size, not an endpos, so the truncate range is [start, size]. To make this work with the [start, end) processing for write_intents, we have to add 1 to the size when sending a write intent. Without this, a truncate operation to the first byte of a new layout component fails silently because the component is not instantiated. WC-bug-id: https://jira.whamcloud.com/browse/LU-12586 Lustre-commit: c32c7401426d ("LU-12586 lov: Correct write_intent end for trunc") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35607 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_io.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 9328240..6e86efa 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -555,7 +555,15 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj, */ if (cl_io_is_trunc(io)) { io->ci_write_intent.e_start = 0; - io->ci_write_intent.e_end = io->u.ci_setattr.sa_attr.lvb_size; + /* for writes, e_end is endpos, the location of the file + * pointer after the write is completed, so it is not accessed. + * For truncate, 'end' is the size, and *is* accessed. + * In other words, writes are [start, end), but truncate is + * [start, size], where both are included. So add 1 to the + * size when creating the write intent to account for this.
+ */ + io->ci_write_intent.e_end = + io->u.ci_setattr.sa_attr.lvb_size + 1; } else { io->ci_write_intent.e_start = lio->lis_pos; io->ci_write_intent.e_end = lio->lis_endpos; From patchwork Thu Feb 27 21:14:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410495 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B033B17E0 for ; Thu, 27 Feb 2020 21:39:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 98E86246A1 for ; Thu, 27 Feb 2020 21:39:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 98E86246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73059349A5F; Thu, 27 Feb 2020 13:32:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D035721FE7B for ; Thu, 27 Feb 2020 13:20:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 457128F1E; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 431DC46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:51 -0500 
Message-Id: <1582838290-17243-424-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 423/622] lustre: mdc: hold lock while walking changelog dev list X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger In mdc_changelog_cdev_finish() we need to hold chlg_registered_dev_lock while walking and changing entries on the chlg_registered_devices and ced_obds lists in chlg_registered_dev_find_by_obd(). Move the call to chlg_registered_dev_find_by_obd() under the mutex, and add assertions that the mutex is held to the places where the lists are walked and changed.
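The locking rule described above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the mdc code: a pthread mutex stands in for chlg_registered_dev_lock, a plain flag stands in for mutex_is_locked(), and every demo_* name is hypothetical. The point is that the lookup and the update share one critical section, and the list walker asserts that its caller holds the lock.

```c
/* Userspace sketch only; all demo_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct demo_dev {
	const char *name;
	int refs;
	struct demo_dev *next;
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_lock_held;	/* written only while demo_lock is held */
static struct demo_dev *demo_devs;

static void demo_list_lock(void)
{
	pthread_mutex_lock(&demo_lock);
	demo_lock_held = 1;
}

static void demo_list_unlock(void)
{
	demo_lock_held = 0;
	pthread_mutex_unlock(&demo_lock);
}

/* Walks the list, so the caller must already hold demo_lock. */
static struct demo_dev *demo_find(const char *name)
{
	struct demo_dev *d;

	assert(demo_lock_held);	/* stand-in for the LASSERT in the patch */
	for (d = demo_devs; d; d = d->next)
		if (strcmp(d->name, name) == 0)
			return d;
	return NULL;
}

/* Adds an entry to the head of the list under the lock. */
static void demo_register(struct demo_dev *d)
{
	demo_list_lock();
	d->next = demo_devs;
	demo_devs = d;
	demo_list_unlock();
}

/*
 * Lookup and update happen inside a single critical section.
 * Returns the remaining reference count, or -1 if no entry matched.
 */
static int demo_finish(const char *name)
{
	struct demo_dev *d;
	int left = -1;

	demo_list_lock();
	d = demo_find(name);
	if (d)
		left = --d->refs;
	demo_list_unlock();
	return left;
}
```

Calling demo_find() before taking the lock, as the pre-patch code did with its own lookup, would trip the assertion in this sketch.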
Fixes: dfecb064ac1f ("lustre: mdc: expose changelog through char devices") WC-bug-id: https://jira.whamcloud.com/browse/LU-12566 Lustre-commit: a260c530801d ("LU-12566 mdc: hold lock while walking changelog dev list") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/35668 Reviewed-by: Hongchao Zhang Reviewed-by: Quentin Bouget Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_changelog.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/lustre/mdc/mdc_changelog.c b/fs/lustre/mdc/mdc_changelog.c index ea74bab..9af0541 100644 --- a/fs/lustre/mdc/mdc_changelog.c +++ b/fs/lustre/mdc/mdc_changelog.c @@ -677,6 +677,7 @@ static void get_chlg_name(char *name, size_t name_len, struct obd_device *obd) { struct chlg_registered_dev *dit; + LASSERT(mutex_is_locked(&chlg_registered_dev_lock)); list_for_each_entry(dit, &chlg_registered_devices, ced_link) if (strcmp(name, dit->ced_name) == 0) return dit; @@ -695,6 +696,7 @@ static void get_chlg_name(char *name, size_t name_len, struct obd_device *obd) struct chlg_registered_dev *dit; struct obd_device *oit; + LASSERT(mutex_is_locked(&chlg_registered_dev_lock)); list_for_each_entry(dit, &chlg_registered_devices, ced_link) list_for_each_entry(oit, &dit->ced_obds, u.cli.cl_chg_dev_linkage) @@ -768,6 +770,7 @@ static void chlg_dev_clear(struct kref *kref) struct chlg_registered_dev *entry = container_of(kref, struct chlg_registered_dev, ced_refs); + LASSERT(mutex_is_locked(&chlg_registered_dev_lock)); list_del(&entry->ced_link); misc_deregister(&entry->ced_misc); kfree(entry); @@ -778,9 +781,10 @@ static void chlg_dev_clear(struct kref *kref) */ void mdc_changelog_cdev_finish(struct obd_device *obd) { - struct chlg_registered_dev *dev = chlg_registered_dev_find_by_obd(obd); + struct chlg_registered_dev *dev; mutex_lock(&chlg_registered_dev_lock); + dev = chlg_registered_dev_find_by_obd(obd); 
list_del_init(&obd->u.cli.cl_chg_dev_linkage); kref_put(&dev->ced_refs, chlg_dev_clear); mutex_unlock(&chlg_registered_dev_lock); From patchwork Thu Feb 27 21:14:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410399 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BDCF717E0 for ; Thu, 27 Feb 2020 21:37:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A6C95246A1 for ; Thu, 27 Feb 2020 21:37:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A6C95246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B4CCA348F90; Thu, 27 Feb 2020 13:30:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1CDE921FB08 for ; Thu, 27 Feb 2020 13:20:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 47A2F8F1F; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 460FA46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:52 -0500 Message-Id: 
<1582838290-17243-425-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 424/622] lustre: import: fix race between imp_state & imp_invalid X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Yang Sheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Yang Sheng We set the import to LUSTRE_IMP_DISCON and then deactivate it when it is not replayable. Another thread may set this import up between those two operations, which leaves an invalid import in the FULL state. Perform both steps under imp_lock to close that window. WC-bug-id: https://jira.whamcloud.com/browse/LU-11542 Lustre-commit: 29904135df67 ("LU-11542 import: fix race between imp_state & imp_invalid") Signed-off-by: Yang Sheng Reviewed-on: https://review.whamcloud.com/33395 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_ha.h | 2 +- fs/lustre/lov/lov_obd.c | 2 +- fs/lustre/ptlrpc/client.c | 3 +- fs/lustre/ptlrpc/import.c | 104 ++++++++++++++++++++++++------------- fs/lustre/ptlrpc/pinger.c | 13 ++--- fs/lustre/ptlrpc/ptlrpc_internal.h | 3 +- fs/lustre/ptlrpc/recover.c | 14 ++--- 7 files changed, 80 insertions(+), 61 deletions(-) diff --git a/fs/lustre/include/lustre_ha.h b/fs/lustre/include/lustre_ha.h index af92a56..c914ef6 100644 --- a/fs/lustre/include/lustre_ha.h +++ b/fs/lustre/include/lustre_ha.h @@ -50,7 +50,7 @@ void ptlrpc_wake_delayed(struct obd_import *imp); int ptlrpc_recover_import(struct obd_import *imp, char *new_uuid, int async); int ptlrpc_set_import_active(struct obd_import *imp, int active); -void ptlrpc_activate_import(struct obd_import *imp);
+void ptlrpc_activate_import(struct obd_import *imp, bool set_state_full); void ptlrpc_deactivate_import(struct obd_import *imp); void ptlrpc_invalidate_import(struct obd_import *imp); void ptlrpc_fail_import(struct obd_import *imp, u32 conn_cnt); diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c index 234b556..3348380 100644 --- a/fs/lustre/lov/lov_obd.c +++ b/fs/lustre/lov/lov_obd.c @@ -157,7 +157,7 @@ int lov_connect_osc(struct obd_device *obd, u32 index, int activate, /* FIXME this is probably supposed to be * ptlrpc_set_import_active. Horrible naming. */ - ptlrpc_activate_import(imp); + ptlrpc_activate_import(imp, false); } rc = obd_register_observer(tgt_obd, obd); diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 9920a95..dcc5e6b 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -3033,7 +3033,7 @@ void ptlrpc_abort_inflight(struct obd_import *imp) * ptlrpc_{queue,set}_wait must (and does) hold imp_lock while testing * this flag and then putting requests on sending_list or delayed_list. */ - spin_lock(&imp->imp_lock); + assert_spin_locked(&imp->imp_lock); /* * XXX locking? Maybe we should remove each request with the list @@ -3071,7 +3071,6 @@ void ptlrpc_abort_inflight(struct obd_import *imp) if (imp->imp_replayable) ptlrpc_free_committed(imp); - spin_unlock(&imp->imp_lock); } /** diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 98c09f6..0ade41e 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -144,6 +144,17 @@ static void deuuidify(char *uuid, const char *prefix, char **uuid_start, *uuid_len -= strlen(UUID_STR); } +/* Must be called with imp_lock held! 
*/ +static void ptlrpc_deactivate_import_nolock(struct obd_import *imp) +{ + assert_spin_locked(&imp->imp_lock); + CDEBUG(D_HA, "setting import %s INVALID\n", obd2cli_tgt(imp->imp_obd)); + imp->imp_invalid = 1; + imp->imp_generation++; + + ptlrpc_abort_inflight(imp); +} + /** * Returns true if import was FULL, false if import was already not * connected. @@ -154,8 +165,10 @@ static void deuuidify(char *uuid, const char *prefix, char **uuid_start, * bulk requests) and if one has already caused a reconnection * (increasing the import->conn_cnt) the older failure should * not also cause a reconnection. If zero it forces a reconnect. + * @invalid - set import invalid flag */ -int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt) +int ptlrpc_set_import_discon(struct obd_import *imp, + u32 conn_cnt, bool invalid) { int rc = 0; @@ -165,10 +178,12 @@ int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt) (conn_cnt == 0 || conn_cnt == imp->imp_conn_cnt)) { char *target_start; int target_len; + bool inact = false; deuuidify(obd2cli_tgt(imp->imp_obd), NULL, &target_start, &target_len); + import_set_state_nolock(imp, LUSTRE_IMP_DISCON); if (imp->imp_replayable) { LCONSOLE_WARN("%s: Connection to %.*s (at %s) was lost; in progress operations using this service will wait for recovery to complete\n", imp->imp_obd->obd_name, @@ -180,14 +195,25 @@ int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt) imp->imp_obd->obd_name, target_len, target_start, obd_import_nid2str(imp)); + if (invalid) { + CDEBUG(D_HA, + "import %s@%s for %s not replayable, auto-deactivating\n", + obd2cli_tgt(imp->imp_obd), + imp->imp_connection->c_remote_uuid.uuid, + imp->imp_obd->obd_name); + ptlrpc_deactivate_import_nolock(imp); + inact = true; + } } - import_set_state_nolock(imp, LUSTRE_IMP_DISCON); spin_unlock(&imp->imp_lock); if (obd_dump_on_timeout) libcfs_debug_dumplog(); obd_import_event(imp->imp_obd, imp, IMP_EVENT_DISCON); + + if (inact) + 
obd_import_event(imp->imp_obd, imp, IMP_EVENT_INACTIVE); rc = 1; } else { spin_unlock(&imp->imp_lock); @@ -211,11 +237,9 @@ void ptlrpc_deactivate_import(struct obd_import *imp) CDEBUG(D_HA, "setting import %s INVALID\n", obd2cli_tgt(imp->imp_obd)); spin_lock(&imp->imp_lock); - imp->imp_invalid = 1; - imp->imp_generation++; + ptlrpc_deactivate_import_nolock(imp); spin_unlock(&imp->imp_lock); - ptlrpc_abort_inflight(imp); obd_import_event(imp->imp_obd, imp, IMP_EVENT_INACTIVE); } EXPORT_SYMBOL(ptlrpc_deactivate_import); @@ -379,17 +403,23 @@ void ptlrpc_invalidate_import(struct obd_import *imp) EXPORT_SYMBOL(ptlrpc_invalidate_import); /* unset imp_invalid */ -void ptlrpc_activate_import(struct obd_import *imp) +void ptlrpc_activate_import(struct obd_import *imp, bool set_state_full) { struct obd_device *obd = imp->imp_obd; spin_lock(&imp->imp_lock); if (imp->imp_deactive != 0) { + LASSERT(imp->imp_state != LUSTRE_IMP_FULL); + if (imp->imp_state != LUSTRE_IMP_DISCON) + import_set_state_nolock(imp, LUSTRE_IMP_DISCON); spin_unlock(&imp->imp_lock); return; } + if (set_state_full) + import_set_state_nolock(imp, LUSTRE_IMP_FULL); imp->imp_invalid = 0; + spin_unlock(&imp->imp_lock); obd_import_event(obd, imp, IMP_EVENT_ACTIVE); } @@ -413,18 +443,8 @@ void ptlrpc_fail_import(struct obd_import *imp, u32 conn_cnt) { LASSERT(!imp->imp_dlm_fake); - if (ptlrpc_set_import_discon(imp, conn_cnt)) { - if (!imp->imp_replayable) { - CDEBUG(D_HA, - "import %s@%s for %s not replayable, auto-deactivating\n", - obd2cli_tgt(imp->imp_obd), - imp->imp_connection->c_remote_uuid.uuid, - imp->imp_obd->obd_name); - ptlrpc_deactivate_import(imp); - } - + if (ptlrpc_set_import_discon(imp, conn_cnt, true)) ptlrpc_pinger_force(imp); - } } int ptlrpc_reconnect_import(struct obd_import *imp) @@ -1073,12 +1093,10 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, spin_lock(&imp->imp_lock); if (msg_flags & MSG_CONNECT_REPLAYABLE) { imp->imp_replayable = 1; - spin_unlock(&imp->imp_lock); 
CDEBUG(D_HA, "connected to replayable target: %s\n", obd2cli_tgt(imp->imp_obd)); } else { imp->imp_replayable = 0; - spin_unlock(&imp->imp_lock); } /* if applies, adjust the imp->imp_msg_magic here @@ -1095,10 +1113,11 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, if (msg_flags & MSG_CONNECT_RECOVERING) { CDEBUG(D_HA, "connect to %s during recovery\n", obd2cli_tgt(imp->imp_obd)); - import_set_state(imp, LUSTRE_IMP_REPLAY_LOCKS); + import_set_state_nolock(imp, LUSTRE_IMP_REPLAY_LOCKS); + spin_unlock(&imp->imp_lock); } else { - import_set_state(imp, LUSTRE_IMP_FULL); - ptlrpc_activate_import(imp); + spin_unlock(&imp->imp_lock); + ptlrpc_activate_import(imp, true); } rc = 0; @@ -1223,31 +1242,33 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, } out: + if (exp) + class_export_put(exp); + spin_lock(&imp->imp_lock); imp->imp_connected = 0; imp->imp_connect_tried = 1; - spin_unlock(&imp->imp_lock); - if (exp) - class_export_put(exp); + if (rc) { + bool inact = false; - if (rc != 0) { - import_set_state(imp, LUSTRE_IMP_DISCON); + import_set_state_nolock(imp, LUSTRE_IMP_DISCON); if (rc == -EACCES) { /* * Give up trying to reconnect * EACCES means client has no permission for connection */ imp->imp_obd->obd_no_recov = 1; - ptlrpc_deactivate_import(imp); - } - - if (rc == -EPROTO) { + ptlrpc_deactivate_import_nolock(imp); + inact = true; + } else if (rc == -EPROTO) { struct obd_connect_data *ocd; /* reply message might not be ready */ - if (!request->rq_repmsg) + if (!request->rq_repmsg) { + spin_unlock(&imp->imp_lock); return -EPROTO; + } ocd = req_capsule_server_get(&request->rq_pill, &RMF_CONNECT_DATA); @@ -1267,17 +1288,26 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, OBD_OCD_VERSION_PATCH(ocd->ocd_version), OBD_OCD_VERSION_FIX(ocd->ocd_version), LUSTRE_VERSION_STRING); - ptlrpc_deactivate_import(imp); - import_set_state(imp, LUSTRE_IMP_CLOSED); + ptlrpc_deactivate_import_nolock(imp); + import_set_state_nolock(imp, 
LUSTRE_IMP_CLOSED); + inact = true; } - return -EPROTO; } + spin_unlock(&imp->imp_lock); + + if (inact) + obd_import_event(imp->imp_obd, imp, IMP_EVENT_INACTIVE); + + if (rc == -EPROTO) + return rc; ptlrpc_maybe_ping_import_soon(imp); CDEBUG(D_HA, "recovery of %s on %s failed (%d)\n", obd2cli_tgt(imp->imp_obd), (char *)imp->imp_connection->c_remote_uuid.uuid, rc); + } else { + spin_unlock(&imp->imp_lock); } wake_up_all(&imp->imp_recovery_waitq); @@ -1476,8 +1506,7 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) rc = ptlrpc_resend(imp); if (rc) goto out; - import_set_state(imp, LUSTRE_IMP_FULL); - ptlrpc_activate_import(imp); + ptlrpc_activate_import(imp, true); deuuidify(obd2cli_tgt(imp->imp_obd), NULL, &target_start, &target_len); @@ -1684,6 +1713,7 @@ int ptlrpc_disconnect_and_idle_import(struct obd_import *imp) return 0; spin_lock(&imp->imp_lock); + if (imp->imp_state != LUSTRE_IMP_FULL) { spin_unlock(&imp->imp_lock); return 0; diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c index c3fbddc..a812942 100644 --- a/fs/lustre/ptlrpc/pinger.c +++ b/fs/lustre/ptlrpc/pinger.c @@ -217,8 +217,6 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp, imp->imp_force_next_verify = 0; - spin_unlock(&imp->imp_lock); - CDEBUG(level == LUSTRE_IMP_FULL ? 
D_INFO : D_HA, "%s->%s: level %s/%u force %u force_next %u deactive %u pingable %u suppress %u\n", imp->imp_obd->obd_uuid.uuid, obd2cli_tgt(imp->imp_obd), @@ -228,22 +226,21 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp, if (level == LUSTRE_IMP_DISCON && !imp_is_deactive(imp)) { /* wait for a while before trying recovery again */ imp->imp_next_ping = ptlrpc_next_reconnect(imp); + spin_unlock(&imp->imp_lock); if (!imp->imp_no_pinger_recover || imp->imp_connect_error == -EAGAIN) ptlrpc_initiate_recovery(imp); - } else if (level != LUSTRE_IMP_FULL || - imp->imp_obd->obd_no_recov || + } else if (level != LUSTRE_IMP_FULL || imp->imp_obd->obd_no_recov || imp_is_deactive(imp)) { CDEBUG(D_HA, "%s->%s: not pinging (in recovery or recovery disabled: %s)\n", imp->imp_obd->obd_uuid.uuid, obd2cli_tgt(imp->imp_obd), ptlrpc_import_state_name(level)); - if (force) { - spin_lock(&imp->imp_lock); + if (force) imp->imp_force_verify = 1; - spin_unlock(&imp->imp_lock); - } + spin_unlock(&imp->imp_lock); } else if ((imp->imp_pingable && !suppress) || force_next || force) { + spin_unlock(&imp->imp_lock); ptlrpc_ping(imp); } } diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h index f84d278..9e74d71 100644 --- a/fs/lustre/ptlrpc/ptlrpc_internal.h +++ b/fs/lustre/ptlrpc/ptlrpc_internal.h @@ -83,7 +83,8 @@ void ptlrpc_set_add_new_req(struct ptlrpcd_ctl *pc, void ptlrpc_request_handle_notconn(struct ptlrpc_request *req); void lustre_assert_wire_constants(void); int ptlrpc_import_in_recovery(struct obd_import *imp); -int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt); +int ptlrpc_set_import_discon(struct obd_import *imp, u32 conn_cnt, + bool invalid); int ptlrpc_replay_next(struct obd_import *imp, int *inflight); void ptlrpc_initiate_recovery(struct obd_import *imp); diff --git a/fs/lustre/ptlrpc/recover.c b/fs/lustre/ptlrpc/recover.c index e26612d..e6e6661 100644 --- a/fs/lustre/ptlrpc/recover.c +++ 
b/fs/lustre/ptlrpc/recover.c @@ -224,21 +224,13 @@ void ptlrpc_wake_delayed(struct obd_import *imp) void ptlrpc_request_handle_notconn(struct ptlrpc_request *failed_req) { struct obd_import *imp = failed_req->rq_import; + int conn = lustre_msg_get_conn_cnt(failed_req->rq_reqmsg); CDEBUG(D_HA, "import %s of %s@%s abruptly disconnected: reconnecting\n", imp->imp_obd->obd_name, obd2cli_tgt(imp->imp_obd), imp->imp_connection->c_remote_uuid.uuid); - if (ptlrpc_set_import_discon(imp, - lustre_msg_get_conn_cnt(failed_req->rq_reqmsg))) { - if (!imp->imp_replayable) { - CDEBUG(D_HA, - "import %s@%s for %s not replayable, auto-deactivating\n", - obd2cli_tgt(imp->imp_obd), - imp->imp_connection->c_remote_uuid.uuid, - imp->imp_obd->obd_name); - ptlrpc_deactivate_import(imp); - } + if (ptlrpc_set_import_discon(imp, conn, true)) { /* to control recovery via lctl {disable|enable}_recovery */ if (imp->imp_deactive == 0) ptlrpc_connect_import(imp); @@ -317,7 +309,7 @@ int ptlrpc_recover_import(struct obd_import *imp, char *new_uuid, int async) goto out; /* force import to be disconnected. 
*/ - ptlrpc_set_import_discon(imp, 0); + ptlrpc_set_import_discon(imp, 0, false); if (new_uuid) { struct obd_uuid uuid; From patchwork Thu Feb 27 21:14:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58FAA92A for ; Thu, 27 Feb 2020 21:39:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4190624690 for ; Thu, 27 Feb 2020 21:39:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4190624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8DB69349A65; Thu, 27 Feb 2020 13:32:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7313821FE8C for ; Thu, 27 Feb 2020 13:20:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4A5B68F20; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 48EF447C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:53 -0500 Message-Id: 
<1582838290-17243-426-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 425/622] lnet: support non-default network namespace X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Aurelien Degremont Replace hard coded references to default root network namespace (&init_net) in LNET code (LNET, socklnd and o2iblnd). When a network interface is created, Lustre records the current network namespace. This patch improves the LNET code to use this reference namespace most of the time instead of the root network namespace. When using lctl, lnetctl or insmod, we use the current process network namespace. When starting the listening acceptor, we use the namespace of the process that triggers this start. An additional patch is needed for RPCSEC GSS support. 
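The idea described above can be illustrated with a small userspace sketch: the namespace is recorded once when the interface is created, and every helper below it takes the namespace as an explicit argument rather than reaching for a global default. All types and function names here (struct ni_mock, struct sock_mock, ni_mock_init(), sock_mock_create(), ni_mock_connect(), init_net_mock) are hypothetical stand-ins for illustration only; they are not the kernel or LNet definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct net { int id; };

/* Plays the role of the root namespace (&init_net) in this sketch. */
static struct net init_net_mock = { .id = 0 };

struct ni_mock {
	struct net *ni_net_ns;	/* namespace recorded at interface creation */
};

struct sock_mock {
	struct net *sk_ns;	/* namespace the socket was created in */
};

/* Record the caller's namespace once, when the interface is set up. */
static void ni_mock_init(struct ni_mock *ni, struct net *current_ns)
{
	assert(ni != NULL && current_ns != NULL);
	ni->ni_net_ns = current_ns;
}

/* Low-level helper: takes the namespace explicitly, no global fallback. */
static int sock_mock_create(struct sock_mock *sock, struct net *ns)
{
	if (ns == NULL)
		return -1;
	sock->sk_ns = ns;
	return 0;
}

/* Mid-level helper: forwards the namespace stored in the interface. */
static int ni_mock_connect(const struct ni_mock *ni, struct sock_mock *sock)
{
	return sock_mock_create(sock, ni->ni_net_ns);
}
```

With this shape, a socket opened on behalf of an interface configured inside a container ends up in that container's namespace, which is the behaviour the patch aims for by adding a struct net * parameter to the LNet socket helpers.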
WC-bug-id: https://jira.whamcloud.com/browse/LU-12236 Lustre-commit: 93b08edfb1c6 ("LU-12236 lnet: support non-default network namespace") Signed-off-by: Aurelien Degremont Reviewed-on: https://review.whamcloud.com/34768 Reviewed-by: Chris Horn Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 9 +++++---- net/lnet/klnds/o2iblnd/o2iblnd.c | 22 +++++++++++----------- net/lnet/klnds/o2iblnd/o2iblnd.h | 9 ++++----- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 8 +++++--- net/lnet/klnds/socklnd/socklnd.c | 2 +- net/lnet/klnds/socklnd/socklnd_cb.c | 3 ++- net/lnet/lnet/acceptor.c | 11 +++++++---- net/lnet/lnet/config.c | 6 +++--- net/lnet/lnet/lib-socket.c | 13 +++++++------ 9 files changed, 45 insertions(+), 38 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index b1407b3..b889af2 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -717,7 +717,7 @@ void lnet_copy_kiov2iter(struct iov_iter *to, void lnet_unregister_lnd(struct lnet_lnd *lnd); int lnet_connect(struct socket **sockp, lnet_nid_t peer_nid, - u32 local_ip, u32 peer_ip, int peer_port); + u32 local_ip, u32 peer_ip, int peer_port, struct net *ns); void lnet_connect_console_error(int rc, lnet_nid_t peer_nid, u32 peer_ip, int port); int lnet_count_acceptor_nets(void); @@ -738,18 +738,19 @@ struct lnet_inetdev { char li_name[IFNAMSIZ]; }; -int lnet_inet_enumerate(struct lnet_inetdev **dev_list); +int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns); int lnet_sock_setbuf(struct socket *socket, int txbufsize, int rxbufsize); int lnet_sock_getbuf(struct socket *socket, int *txbufsize, int *rxbufsize); int lnet_sock_getaddr(struct socket *socket, bool remote, u32 *ip, int *port); int lnet_sock_write(struct socket *sock, void *buffer, int nob, int timeout); int lnet_sock_read(struct socket *sock, void *buffer, int nob, int timeout); -int 
lnet_sock_listen(struct socket **sockp, u32 ip, int port, int backlog); +int lnet_sock_listen(struct socket **sockp, u32 ip, int port, int backlog, + struct net *ns); int lnet_sock_accept(struct socket **newsockp, struct socket *sock); int lnet_sock_connect(struct socket **sockp, int *fatal, u32 local_ip, int local_port, - u32 peer_ip, int peer_port); + u32 peer_ip, int peer_port, struct net *ns); void libcfs_sock_release(struct socket *sock); int lnet_peers_start_down(void); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index bb7590f..f3176e1 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2358,7 +2358,7 @@ static int kiblnd_dummy_callback(struct rdma_cm_id *cmid, return 0; } -static int kiblnd_dev_need_failover(struct kib_dev *dev) +static int kiblnd_dev_need_failover(struct kib_dev *dev, struct net *ns) { struct rdma_cm_id *cmid; struct sockaddr_in srcaddr; @@ -2382,8 +2382,8 @@ static int kiblnd_dev_need_failover(struct kib_dev *dev) * a. rdma_bind_addr(), it will conflict with listener cmid * b. 
rdma_resolve_addr() to zero addr */ - cmid = kiblnd_rdma_create_id(kiblnd_dummy_callback, dev, RDMA_PS_TCP, - IB_QPT_RC); + cmid = kiblnd_rdma_create_id(ns, kiblnd_dummy_callback, dev, + RDMA_PS_TCP, IB_QPT_RC); if (IS_ERR(cmid)) { rc = PTR_ERR(cmid); CERROR("Failed to create cmid for failover: %d\n", rc); @@ -2412,7 +2412,7 @@ static int kiblnd_dev_need_failover(struct kib_dev *dev) return rc; } -int kiblnd_dev_failover(struct kib_dev *dev) +int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns) { LIST_HEAD(zombie_tpo); LIST_HEAD(zombie_ppo); @@ -2429,7 +2429,7 @@ int kiblnd_dev_failover(struct kib_dev *dev) LASSERT(*kiblnd_tunables.kib_dev_failover > 1 || dev->ibd_can_failover || !dev->ibd_hdev); - rc = kiblnd_dev_need_failover(dev); + rc = kiblnd_dev_need_failover(dev, ns); if (rc <= 0) goto out; @@ -2454,7 +2454,7 @@ int kiblnd_dev_failover(struct kib_dev *dev) rdma_destroy_id(cmid); } - cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev, RDMA_PS_TCP, + cmid = kiblnd_rdma_create_id(ns, kiblnd_cm_callback, dev, RDMA_PS_TCP, IB_QPT_RC); if (IS_ERR(cmid)) { rc = PTR_ERR(cmid); @@ -2683,7 +2683,7 @@ static void kiblnd_shutdown(struct lnet_ni *ni) kiblnd_base_shutdown(); } -static int kiblnd_base_startup(void) +static int kiblnd_base_startup(struct net *ns) { struct kib_sched_info *sched; int rc; @@ -2758,7 +2758,7 @@ static int kiblnd_base_startup(void) } if (*kiblnd_tunables.kib_dev_failover) - rc = kiblnd_thread_start(kiblnd_failover_thread, NULL, + rc = kiblnd_thread_start(kiblnd_failover_thread, ns, "kiblnd_failover"); if (rc) { @@ -2856,7 +2856,7 @@ static int kiblnd_startup(struct lnet_ni *ni) LASSERT(ni->ni_net->net_lnd == &the_o2iblnd); if (kiblnd_data.kib_init == IBLND_INIT_NOTHING) { - rc = kiblnd_base_startup(); + rc = kiblnd_base_startup(ni->ni_net_ns); if (rc) return rc; } @@ -2894,7 +2894,7 @@ static int kiblnd_startup(struct lnet_ni *ni) goto failed; } - rc = lnet_inet_enumerate(&ifaces); + rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns); if 
(rc < 0) goto failed; @@ -2925,7 +2925,7 @@ static int kiblnd_startup(struct lnet_ni *ni) INIT_LIST_HEAD(&ibdev->ibd_fail_list); /* initialize the device */ - rc = kiblnd_dev_failover(ibdev); + rc = kiblnd_dev_failover(ibdev, ni->ni_net_ns); if (rc) { CERROR("ko2iblnd: Can't initialize device: rc = %d\n", rc); goto failed; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 2f7ca52..1285ab1 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -109,10 +109,9 @@ struct kib_tunables { IBLND_CREDIT_HIGHWATER_V1 : \ t->lnd_peercredits_hiw) -#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(current->nsproxy->net_ns, \ - cb, dev, \ - ps, qpt) - +# define kiblnd_rdma_create_id(ns, cb, dev, ps, qpt) rdma_create_id(ns, cb, \ + dev, ps, \ + qpt) /* 2 OOB shall suffice for 1 keepalive and 1 returning credits */ #define IBLND_OOB_CAPABLE(v) ((v) != IBLND_MSG_VERSION_1) #define IBLND_OOB_MSGS(v) (IBLND_OOB_CAPABLE(v) ? 
2 : 0) @@ -1030,7 +1029,7 @@ int kiblnd_cm_callback(struct rdma_cm_id *cmid, struct rdma_cm_event *event); int kiblnd_translate_mtu(int value); -int kiblnd_dev_failover(struct kib_dev *dev); +int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns); int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp, lnet_nid_t nid); void kiblnd_destroy_peer(struct kib_peer_ni *peer_ni); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 69918cf..1110553 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1330,8 +1330,9 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, LASSERT(net); LASSERT(peer_ni->ibp_connecting > 0); - cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, peer_ni, RDMA_PS_TCP, - IB_QPT_RC); + cmid = kiblnd_rdma_create_id(peer_ni->ibp_ni->ni_net_ns, + kiblnd_cm_callback, peer_ni, + RDMA_PS_TCP, IB_QPT_RC); if (IS_ERR(cmid)) { CERROR("Can't create CMID for %s: %ld\n", @@ -3830,6 +3831,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, { rwlock_t *glock = &kiblnd_data.kib_global_lock; struct kib_dev *dev; + struct net *ns = arg; wait_queue_entry_t wait; unsigned long flags; int rc; @@ -3856,7 +3858,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, dev->ibd_failover = 1; write_unlock_irqrestore(glock, flags); - rc = kiblnd_dev_failover(dev); + rc = kiblnd_dev_failover(dev, ns); write_lock_irqsave(glock, flags); diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 0f5c7fc..78f6c7e 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -2718,7 +2718,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) net_tunables->lct_peer_rtr_credits = *ksocknal_tunables.ksnd_peerrtrcredits; - rc = lnet_inet_enumerate(&ifaces); + rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns); if (rc < 0) goto fail_1; diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c 
b/net/lnet/klnds/socklnd/socklnd_cb.c index 581f734..0132727 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -1871,7 +1871,8 @@ void ksocknal_write_callback(struct ksock_conn *conn) rc = lnet_connect(&sock, peer_ni->ksnp_id.nid, route->ksnr_myipaddr, - route->ksnr_ipaddr, route->ksnr_port); + route->ksnr_ipaddr, route->ksnr_port, + peer_ni->ksnp_ni->ni_net_ns); if (rc) goto failed; diff --git a/net/lnet/lnet/acceptor.c b/net/lnet/lnet/acceptor.c index 1854347..23b5bf0 100644 --- a/net/lnet/lnet/acceptor.c +++ b/net/lnet/lnet/acceptor.c @@ -44,6 +44,7 @@ int pta_shutdown; struct socket *pta_sock; struct completion pta_signal; + struct net *pta_ns; } lnet_acceptor_state = { .pta_shutdown = 1 }; @@ -142,7 +143,7 @@ int lnet_connect(struct socket **sockp, lnet_nid_t peer_nid, - u32 local_ip, u32 peer_ip, int peer_port) + u32 local_ip, u32 peer_ip, int peer_port, struct net *ns) { struct lnet_acceptor_connreq cr; struct socket *sock; @@ -158,7 +159,7 @@ /* Iterate through reserved ports. 
*/ rc = lnet_sock_connect(&sock, &fatal, local_ip, port, peer_ip, - peer_port); + peer_port, ns); if (rc) { if (fatal) goto failed; @@ -335,8 +336,9 @@ LASSERT(!lnet_acceptor_state.pta_sock); - rc = lnet_sock_listen(&lnet_acceptor_state.pta_sock, 0, accept_port, - accept_backlog); + rc = lnet_sock_listen(&lnet_acceptor_state.pta_sock, + 0, accept_port, accept_backlog, + lnet_acceptor_state.pta_ns); if (rc) { if (rc == -EADDRINUSE) LCONSOLE_ERROR_MSG(0x122, "Can't start acceptor on port %d: port already in use\n", @@ -457,6 +459,7 @@ if (!lnet_count_acceptor_nets()) /* not required */ return 0; + lnet_acceptor_state.pta_ns = current->nsproxy->net_ns; task = kthread_run(lnet_acceptor, (void *)(uintptr_t)secure, "acceptor_%03ld", secure); if (IS_ERR(task)) { diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index a2a9c79..2c8edcd 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -1563,7 +1563,7 @@ struct lnet_ni * return count; } -int lnet_inet_enumerate(struct lnet_inetdev **dev_list) +int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns) { struct lnet_inetdev *ifaces = NULL; struct net_device *dev; @@ -1571,7 +1571,7 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list) int nip = 0; rtnl_lock(); - for_each_netdev(&init_net, dev) { + for_each_netdev(ns, dev) { int flags = dev_get_flags(dev); const struct in_ifaddr *ifa; struct in_device *in_dev; @@ -1642,7 +1642,7 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list) int rc; int i; - nip = lnet_inet_enumerate(&ifaces); + nip = lnet_inet_enumerate(&ifaces, current->nsproxy->net_ns); if (nip < 0) { if (nip != -ENOENT) { LCONSOLE_ERROR_MSG(0x117, diff --git a/net/lnet/lnet/lib-socket.c b/net/lnet/lnet/lib-socket.c index d430d6f..046bd2d 100644 --- a/net/lnet/lnet/lib-socket.c +++ b/net/lnet/lnet/lib-socket.c @@ -156,7 +156,7 @@ static int lnet_sock_create(struct socket **sockp, int *fatal, u32 local_ip, - int local_port) + int local_port, struct net *ns) { struct 
sockaddr_in locaddr; struct socket *sock; @@ -166,7 +166,7 @@ /* All errors are fatal except bind failure if the port is in use */ *fatal = 1; - rc = sock_create_kern(&init_net, PF_INET, SOCK_STREAM, 0, &sock); + rc = sock_create_kern(ns, PF_INET, SOCK_STREAM, 0, &sock); *sockp = sock; if (rc) { CERROR("Can't create socket: %d\n", rc); @@ -282,12 +282,12 @@ int lnet_sock_listen(struct socket **sockp, u32 local_ip, int local_port, - int backlog) + int backlog, struct net *ns) { int fatal; int rc; - rc = lnet_sock_create(sockp, &fatal, local_ip, local_port); + rc = lnet_sock_create(sockp, &fatal, local_ip, local_port, ns); if (rc) { if (!fatal) CERROR("Can't create socket: port %d already in use\n", @@ -347,12 +347,13 @@ int lnet_sock_connect(struct socket **sockp, int *fatal, u32 local_ip, - int local_port, u32 peer_ip, int peer_port) + int local_port, u32 peer_ip, int peer_port, + struct net *ns) { struct sockaddr_in srvaddr; int rc; - rc = lnet_sock_create(sockp, fatal, local_ip, local_port); + rc = lnet_sock_create(sockp, fatal, local_ip, local_port, ns); if (rc) return rc; From patchwork Thu Feb 27 21:14:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410403 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF465138D for ; Thu, 27 Feb 2020 21:37:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D747324690 for ; Thu, 27 Feb 2020 21:37:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D747324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: 
mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A031734A24A; Thu, 27 Feb 2020 13:30:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CA98121FE8C for ; Thu, 27 Feb 2020 13:20:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4D1408F21; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4BCFF468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:54 -0500 Message-Id: <1582838290-17243-427-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 426/622] lustre: obdclass: 0-nlink race in lu_object_find_at() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao There is a race in lu_object_find_at: in the gap between lu_object_alloc() and hash insertion, another thread may have allocated another object for the same file and unlinked it, so we may get an object with 0-nlink, which will trigger assertion in osd_object_release(). To avoid such race, initialize object after hash insertion. 
However, an uninitialized object may then be found in the cache; in that case, wait for the allocator to initialize it. To reproduce the race, introduce cfs_race_wait() and cfs_race_wakeup(): cfs_race_wait() makes the calling thread wait on the race, while cfs_race_wakeup() wakes up the waiting thread. As with cfs_race(), CFS_FAIL_ONCE should be set together with fail_loc. WC-bug-id: https://jira.whamcloud.com/browse/LU-12485 Lustre-commit: 2ff420913b97 ("LU-12485 obdclass: 0-nlink race in lu_object_find_at()") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35360 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 15 ++++- fs/lustre/include/obd_support.h | 1 + fs/lustre/obdclass/lu_object.c | 127 ++++++++++++++++++++++++++++--------- include/linux/libcfs/libcfs_fail.h | 40 +++++++++++- 4 files changed, 151 insertions(+), 32 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index d2e84a3..1c1a60f 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -461,7 +461,12 @@ enum lu_object_header_flags { /** * Mark this object has already been taken out of cache. */ - LU_OBJECT_UNHASHED = 1, + LU_OBJECT_UNHASHED = 1, + /** + * Object is initialized, when object is found in cache, it may not be + * initialized yet, the object allocator will initialize it. + */ + LU_OBJECT_INITED = 2 }; enum lu_object_header_attr { @@ -656,6 +661,14 @@ static inline int lu_object_is_dying(const struct lu_object_header *h) return test_bit(LU_OBJECT_HEARD_BANSHEE, &h->loh_flags); } +/** + * Return true if object is initialized. 
+ */ +static inline int lu_object_is_inited(const struct lu_object_header *h) +{ + return test_bit(LU_OBJECT_INITED, &h->loh_flags); +} + void lu_object_put(const struct lu_env *env, struct lu_object *o); void lu_object_unhash(const struct lu_env *env, struct lu_object *o); int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, int nr, diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index c66b61a..506535b 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -371,6 +371,7 @@ #define OBD_FAIL_OBD_IDX_READ_BREAK 0x608 #define OBD_FAIL_OBD_NO_LRU 0x609 #define OBD_FAIL_OBDCLASS_MODULE_LOAD 0x60a +#define OBD_FAIL_OBD_ZERO_NLINK_RACE 0x60b #define OBD_FAIL_TGT_REPLY_NET 0x700 #define OBD_FAIL_TGT_CONN_RACE 0x701 diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index d8bff3f..6fea1f3 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -67,13 +67,14 @@ struct lu_site_bkt_data { struct list_head lsb_lru; /** * Wait-queue signaled when an object in this site is ultimately - * destroyed (lu_object_free()). It is used by lu_object_find() to - * wait before re-trying when object in the process of destruction is - * found in the hash table. + * destroyed (lu_object_free()) or initialized (lu_object_start()). + * It is used by lu_object_find() to wait before re-trying when + * object in the process of destruction is found in the hash table; + * or wait object to be initialized by the allocator. * * \see htable_lookup(). 
*/ - wait_queue_head_t lsb_marche_funebre; + wait_queue_head_t lsb_waitq; }; enum { @@ -116,7 +117,7 @@ enum { cfs_hash_bd_get(site->ls_obj_hash, fid, &bd); bkt = cfs_hash_bd_extra_get(site->ls_obj_hash, &bd); - return &bkt->lsb_marche_funebre; + return &bkt->lsb_waitq; } EXPORT_SYMBOL(lu_site_wq_from_fid); @@ -168,7 +169,7 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) * somebody may be waiting for this, currently only * used for cl_object, see cl_object_put_last(). */ - wake_up_all(&bkt->lsb_marche_funebre); + wake_up_all(&bkt->lsb_waitq); } return; } @@ -255,16 +256,9 @@ void lu_object_unhash(const struct lu_env *env, struct lu_object *o) */ static struct lu_object *lu_object_alloc(const struct lu_env *env, struct lu_device *dev, - const struct lu_fid *f, - const struct lu_object_conf *conf) + const struct lu_fid *f) { - struct lu_object *scan; struct lu_object *top; - struct list_head *layers; - unsigned int init_mask = 0; - unsigned int init_flag; - int clean; - int result; /* * Create top-level object slice. This will also create @@ -280,6 +274,27 @@ static struct lu_object *lu_object_alloc(const struct lu_env *env, * after this point. */ top->lo_header->loh_fid = *f; + + return top; +} + +/** + * Initialize object. + * + * This is called after object hash insertion to avoid returning an object with + * stale attributes. 
+ */ +static int lu_object_start(const struct lu_env *env, struct lu_device *dev, + struct lu_object *top, + const struct lu_object_conf *conf) +{ + struct lu_object *scan; + struct list_head *layers; + unsigned int init_mask = 0; + unsigned int init_flag; + int clean; + int result; + layers = &top->lo_header->loh_layers; do { @@ -295,10 +310,9 @@ static struct lu_object *lu_object_alloc(const struct lu_env *env, clean = 0; scan->lo_header = top->lo_header; result = scan->lo_ops->loo_object_init(env, scan, conf); - if (result != 0) { - lu_object_free(env, top); - return ERR_PTR(result); - } + if (result) + return result; + init_mask |= init_flag; next: init_flag <<= 1; @@ -308,15 +322,16 @@ static struct lu_object *lu_object_alloc(const struct lu_env *env, list_for_each_entry_reverse(scan, layers, lo_linkage) { if (scan->lo_ops->loo_object_start) { result = scan->lo_ops->loo_object_start(env, scan); - if (result != 0) { - lu_object_free(env, top); - return ERR_PTR(result); - } + if (result) + return result; } } lprocfs_counter_incr(dev->ld_site->ls_stats, LU_SS_CREATED); - return top; + + set_bit(LU_OBJECT_INITED, &top->lo_header->loh_flags); + + return 0; } /** @@ -598,7 +613,6 @@ static struct lu_object *htable_lookup(struct lu_site *s, const struct lu_fid *f, u64 *version) { - struct lu_site_bkt_data *bkt; struct lu_object_header *h; struct hlist_node *hnode; u64 ver = cfs_hash_bd_version_get(bd); @@ -607,7 +621,6 @@ static struct lu_object *htable_lookup(struct lu_site *s, return ERR_PTR(-ENOENT); *version = ver; - bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd); /* cfs_hash_bd_peek_locked is a somehow "internal" function * of cfs_hash, it doesn't add refcount on object. 
*/ @@ -681,7 +694,9 @@ struct lu_object *lu_object_find_at(const struct lu_env *env, struct lu_site *s; struct cfs_hash *hs; struct cfs_hash_bd bd; + struct lu_site_bkt_data *bkt; u64 version = 0; + int rc; /* * This uses standard index maintenance protocol: @@ -703,26 +718,50 @@ struct lu_object *lu_object_find_at(const struct lu_env *env, */ s = dev->ld_site; hs = s->ls_obj_hash; + if (unlikely(OBD_FAIL_PRECHECK(OBD_FAIL_OBD_ZERO_NLINK_RACE))) + lu_site_purge(env, s, -1); cfs_hash_bd_get(hs, f, &bd); + bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, &bd); if (!(conf && conf->loc_flags & LOC_F_NEW)) { cfs_hash_bd_lock(hs, &bd, 1); o = htable_lookup(s, &bd, f, &version); cfs_hash_bd_unlock(hs, &bd, 1); - if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT) + if (!IS_ERR(o)) { + if (likely(lu_object_is_inited(o->lo_header))) + return o; + + wait_event_idle(bkt->lsb_waitq, + lu_object_is_inited(o->lo_header) || + lu_object_is_dying(o->lo_header)); + + if (lu_object_is_dying(o->lo_header)) { + lu_object_put(env, o); + + return ERR_PTR(-ENOENT); + } + + return o; + } + + if (PTR_ERR(o) != -ENOENT) return o; } + /* - * Allocate new object. This may result in rather complicated - * operations, including fld queries, inode loading, etc. + * Allocate new object, NB, object is uninitialized in case object + * is changed between allocation and hash insertion, thus the object + * with stale attributes is returned. */ - o = lu_object_alloc(env, dev, f, conf); + o = lu_object_alloc(env, dev, f); if (IS_ERR(o)) return o; LASSERT(lu_fid_eq(lu_object_fid(o), f)); + CFS_RACE_WAIT(OBD_FAIL_OBD_ZERO_NLINK_RACE); + cfs_hash_bd_lock(hs, &bd, 1); if (conf && conf->loc_flags & LOC_F_NEW) @@ -733,6 +772,20 @@ struct lu_object *lu_object_find_at(const struct lu_env *env, cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash); cfs_hash_bd_unlock(hs, &bd, 1); + /* + * This may result in rather complicated operations, including + * fld queries, inode loading, etc. 
+ */ + rc = lu_object_start(env, dev, o, conf); + if (rc) { + set_bit(LU_OBJECT_HEARD_BANSHEE, + &o->lo_header->loh_flags); + lu_object_put(env, o); + return ERR_PTR(rc); + } + + wake_up_all(&bkt->lsb_waitq); + lu_object_limit(env, dev); return o; @@ -741,6 +794,20 @@ struct lu_object *lu_object_find_at(const struct lu_env *env, lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_RACE); cfs_hash_bd_unlock(hs, &bd, 1); lu_object_free(env, o); + + if (!(conf && conf->loc_flags & LOC_F_NEW) && + !lu_object_is_inited(shadow->lo_header)) { + wait_event_idle(bkt->lsb_waitq, + lu_object_is_inited(shadow->lo_header) || + lu_object_is_dying(shadow->lo_header)); + + if (lu_object_is_dying(shadow->lo_header)) { + lu_object_put(env, shadow); + + return ERR_PTR(-ENOENT); + } + } + return shadow; } EXPORT_SYMBOL(lu_object_find_at); @@ -998,7 +1065,7 @@ int lu_site_init(struct lu_site *s, struct lu_device *top) cfs_hash_for_each_bucket(s->ls_obj_hash, &bd, i) { bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, &bd); INIT_LIST_HEAD(&bkt->lsb_lru); - init_waitqueue_head(&bkt->lsb_marche_funebre); + init_waitqueue_head(&bkt->lsb_waitq); } s->ls_stats = lprocfs_alloc_stats(LU_SS_LAST_STAT, 0); diff --git a/include/linux/libcfs/libcfs_fail.h b/include/linux/libcfs/libcfs_fail.h index c341567..45166c5 100644 --- a/include/linux/libcfs/libcfs_fail.h +++ b/include/linux/libcfs/libcfs_fail.h @@ -187,7 +187,7 @@ static inline void cfs_race(u32 id) CERROR("cfs_race id %x sleeping\n", id); rc = wait_event_interruptible(cfs_race_waitq, !!cfs_race_state); - CERROR("cfs_fail_race id %x awake, rc=%d\n", id, rc); + CERROR("cfs_fail_race id %x awake: rc=%d\n", id, rc); } else { CERROR("cfs_fail_race id %x waking\n", id); cfs_race_state = 1; @@ -198,4 +198,42 @@ static inline void cfs_race(u32 id) #define CFS_RACE(id) cfs_race(id) +/** + * Wait on race. + * + * The first thread that calls this with a matching fail_loc is put to sleep, + * but subseqent callers of this won't sleep. 
Until another thread that calls + * cfs_race_wakeup(), the first thread will be woken up and continue. + */ +static inline void cfs_race_wait(u32 id) +{ + if (CFS_FAIL_PRECHECK(id)) { + if (unlikely(__cfs_fail_check_set(id, 0, CFS_FAIL_LOC_NOSET))) { + int rc; + + cfs_race_state = 0; + CERROR("cfs_race id %x sleeping\n", id); + rc = wait_event_interruptible(cfs_race_waitq, + cfs_race_state != 0); + CERROR("cfs_fail_race id %x awake: rc=%d\n", id, rc); + } + } +} +#define CFS_RACE_WAIT(id) cfs_race_wait(id) + +/** + * Wake up the thread that is waiting on the matching fail_loc. + */ +static inline void cfs_race_wakeup(u32 id) +{ + if (CFS_FAIL_PRECHECK(id)) { + if (likely(!__cfs_fail_check_set(id, 0, CFS_FAIL_LOC_NOSET))) { + CERROR("cfs_fail_race id %x waking\n", id); + cfs_race_state = 1; + wake_up(&cfs_race_waitq); + } + } +} +#define CFS_RACE_WAKEUP(id) cfs_race_wakeup(id) + #endif /* _LIBCFS_FAIL_H */ From patchwork Thu Feb 27 21:14:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410501 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A57192A for ; Thu, 27 Feb 2020 21:40:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E6DA424690 for ; Thu, 27 Feb 2020 21:39:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E6DA424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 8E21534A690; Thu, 27 Feb 2020 13:32:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2BDA121FE8C for ; Thu, 27 Feb 2020 13:20:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4FDB08F22; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4E9FE46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:55 -0500 Message-Id: <1582838290-17243-428-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 427/622] lustre: osc: reserve lru pages for read in batch X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong The benefit of doing this is to reduce contention against atomic counter cl_lru_left by changing it from per-page access to per-IO access. We have done this optimization for write, do it for read too. 
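The per-page versus per-IO distinction can be sketched in userspace C11 as follows. This is only an illustration of the batching idea under stated assumptions: lru_left is a hypothetical stand-in for the cl_lru_left counter, and reserve_one_page(), reserve_batch() and release_unused() are made-up names, not the osc functions touched by this patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared pool counter, standing in for cl_lru_left. */
static atomic_long lru_left;

/* Per-page scheme: one atomic update of the shared counter per page. */
static int reserve_one_page(void)
{
	long old = atomic_load(&lru_left);

	while (old > 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&lru_left, &old, old - 1))
			return 1;
	}
	return 0;
}

/* Per-IO scheme: reserve up to 'want' pages with one successful update. */
static long reserve_batch(long want)
{
	long old = atomic_load(&lru_left);

	if (want <= 0)
		return 0;

	while (old > 0) {
		long take = want < old ? want : old;

		if (atomic_compare_exchange_weak(&lru_left, &old, old - take))
			return take;
	}
	return 0;
}

/* Hand back whatever part of the reservation the IO did not consume. */
static void release_unused(long unused)
{
	assert(unused >= 0);
	atomic_fetch_add(&lru_left, unused);
}
```

An IO covering N pages touches the shared counter once at setup and at most once at teardown in the batched scheme, instead of N times, which is where the reduced contention comes from.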
WC-bug-id: https://jira.whamcloud.com/browse/LU-12520 Lustre-commit: 0692dadfba87 ("LU-12520 osc: reserve lru pages for read in batch") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35440 Reviewed-by: Patrick Farrell Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 6 +++--- fs/lustre/mdc/mdc_dev.c | 8 ++++---- fs/lustre/osc/osc_io.c | 20 ++++++++++---------- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 1c5af80..37e56ef 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -685,9 +685,9 @@ int osc_io_commit_async(const struct lu_env *env, int osc_io_iter_init(const struct lu_env *env, const struct cl_io_slice *ios); void osc_io_iter_fini(const struct lu_env *env, const struct cl_io_slice *ios); -int osc_io_write_iter_init(const struct lu_env *env, - const struct cl_io_slice *ios); -void osc_io_write_iter_fini(const struct lu_env *env, +int osc_io_rw_iter_init(const struct lu_env *env, + const struct cl_io_slice *ios); +void osc_io_rw_iter_fini(const struct lu_env *env, const struct cl_io_slice *ios); int osc_io_fault_start(const struct lu_env *env, const struct cl_io_slice *ios); void osc_io_setattr_end(const struct lu_env *env, diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index df8bb33..b49509c 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -1257,13 +1257,13 @@ static void mdc_io_data_version_end(const struct lu_env *env, static struct cl_io_operations mdc_io_ops = { .op = { [CIT_READ] = { - .cio_iter_init = osc_io_iter_init, - .cio_iter_fini = osc_io_iter_fini, + .cio_iter_init = osc_io_rw_iter_init, + .cio_iter_fini = osc_io_rw_iter_fini, .cio_start = osc_io_read_start, }, [CIT_WRITE] = { - .cio_iter_init = osc_io_write_iter_init, - .cio_iter_fini = osc_io_write_iter_fini, + .cio_iter_init = osc_io_rw_iter_init, + 
.cio_iter_fini = osc_io_rw_iter_fini, .cio_start = osc_io_write_start, .cio_end = osc_io_end, }, diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index dfdf064..4f46b95 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -375,8 +375,8 @@ int osc_io_iter_init(const struct lu_env *env, const struct cl_io_slice *ios) } EXPORT_SYMBOL(osc_io_iter_init); -int osc_io_write_iter_init(const struct lu_env *env, - const struct cl_io_slice *ios) +int osc_io_rw_iter_init(const struct lu_env *env, + const struct cl_io_slice *ios) { struct cl_io *io = ios->cis_io; struct osc_io *oio = osc_env_io(env); @@ -394,7 +394,7 @@ int osc_io_write_iter_init(const struct lu_env *env, return osc_io_iter_init(env, ios); } -EXPORT_SYMBOL(osc_io_write_iter_init); +EXPORT_SYMBOL(osc_io_rw_iter_init); void osc_io_iter_fini(const struct lu_env *env, const struct cl_io_slice *ios) @@ -412,8 +412,8 @@ void osc_io_iter_fini(const struct lu_env *env, } EXPORT_SYMBOL(osc_io_iter_fini); -void osc_io_write_iter_fini(const struct lu_env *env, - const struct cl_io_slice *ios) +void osc_io_rw_iter_fini(const struct lu_env *env, + const struct cl_io_slice *ios) { struct osc_io *oio = osc_env_io(env); struct osc_object *osc = cl2osc(ios->cis_obj); @@ -426,7 +426,7 @@ void osc_io_write_iter_fini(const struct lu_env *env, osc_io_iter_fini(env, ios); } -EXPORT_SYMBOL(osc_io_write_iter_fini); +EXPORT_SYMBOL(osc_io_rw_iter_fini); int osc_io_fault_start(const struct lu_env *env, const struct cl_io_slice *ios) { @@ -970,14 +970,14 @@ void osc_io_end(const struct lu_env *env, const struct cl_io_slice *slice) static const struct cl_io_operations osc_io_ops = { .op = { [CIT_READ] = { - .cio_iter_init = osc_io_iter_init, - .cio_iter_fini = osc_io_iter_fini, + .cio_iter_init = osc_io_rw_iter_init, + .cio_iter_fini = osc_io_rw_iter_fini, .cio_start = osc_io_read_start, .cio_fini = osc_io_fini }, [CIT_WRITE] = { - .cio_iter_init = osc_io_write_iter_init, - .cio_iter_fini = 
osc_io_write_iter_fini, + .cio_iter_init = osc_io_rw_iter_init, + .cio_iter_fini = osc_io_rw_iter_fini, .cio_start = osc_io_write_start, .cio_end = osc_io_end, .cio_fini = osc_io_fini From patchwork Thu Feb 27 21:14:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410505 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DACF7138D for ; Thu, 27 Feb 2020 21:40:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C2C6524690 for ; Thu, 27 Feb 2020 21:40:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C2C6524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 096FC34A6D0; Thu, 27 Feb 2020 13:32:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8444221FE9E for ; Thu, 27 Feb 2020 13:20:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 52A008F23; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 515B946C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:56 
-0500 Message-Id: <1582838290-17243-429-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 428/622] lustre: uapi: Make lustre_user.h c++-legal X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Rob Latham Recent C++ compilers do not like some of the C idioms used in this header: - C++ checks the types of enums more strictly than C does - signed vs unsigned comparisons generate a warning under g++ - "invalid suffix on literal" warning: Lustre is not trying to generate a new literal identifier. WC-bug-id: https://jira.whamcloud.com/browse/LU-12527 Lustre-commit: 14b11dc3526a ("LU-12527 utils: Make lustre_user.h c++-legal") Signed-off-by: Rob Latham Reviewed-on: https://review.whamcloud.com/35471 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 139 +++++++++++++++++++------------- 1 file changed, 84 insertions(+), 55 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index db36ce5..3016b73 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -885,7 +885,7 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen) #define ALLQUOTA 255 /* set all quota */ -static inline char *qtype_name(int qtype) +static inline const char *qtype_name(int qtype) { switch (qtype) { case USRQUOTA: @@ -1206,7 +1206,8 @@ static inline enum hsm_event hsm_get_cl_event(__u16 flags)
static inline void hsm_set_cl_event(enum changelog_rec_flags *clf_flags, enum hsm_event he) { - *clf_flags |= (he << CLF_HSM_EVENT_L); + *clf_flags = (enum changelog_rec_flags) + (*clf_flags | (he << CLF_HSM_EVENT_L)); } static inline __u16 hsm_get_cl_flags(enum changelog_rec_flags clf_flags) @@ -1217,7 +1218,8 @@ static inline __u16 hsm_get_cl_flags(enum changelog_rec_flags clf_flags) static inline void hsm_set_cl_flags(enum changelog_rec_flags *clf_flags, unsigned int bits) { - *clf_flags |= (bits << CLF_HSM_FLAG_L); + *clf_flags = (enum changelog_rec_flags) + (*clf_flags | (bits << CLF_HSM_FLAG_L)); } static inline int hsm_get_cl_error(enum changelog_rec_flags clf_flags) @@ -1228,7 +1230,8 @@ static inline int hsm_get_cl_error(enum changelog_rec_flags clf_flags) static inline void hsm_set_cl_error(enum changelog_rec_flags *clf_flags, unsigned int error) { - *clf_flags |= (error << CLF_HSM_ERR_L); + *clf_flags = (enum changelog_rec_flags) + (*clf_flags | (error << CLF_HSM_ERR_L)); } enum changelog_rec_extra_flags { @@ -1370,9 +1373,11 @@ static inline size_t changelog_rec_size(struct changelog_rec *rec) enum changelog_rec_extra_flags cref = CLFE_INVALID; if (rec->cr_flags & CLF_EXTRA_FLAGS) - cref = changelog_rec_extra_flags(rec)->cr_extra_flags; + cref = (enum changelog_rec_extra_flags) + changelog_rec_extra_flags(rec)->cr_extra_flags; - return changelog_rec_offset(rec->cr_flags, cref); + return changelog_rec_offset((enum changelog_rec_flags)rec->cr_flags, + cref); } static inline size_t changelog_rec_varsize(struct changelog_rec *rec) @@ -1383,7 +1388,8 @@ static inline size_t changelog_rec_varsize(struct changelog_rec *rec) static inline struct changelog_ext_rename *changelog_rec_rename(struct changelog_rec *rec) { - enum changelog_rec_flags crf = rec->cr_flags & CLF_VERSION; + enum changelog_rec_flags crf = (enum changelog_rec_flags) + (rec->cr_flags & CLF_VERSION); return (struct changelog_ext_rename *)((char *)rec + changelog_rec_offset(crf, @@ -1394,8 
+1400,8 @@ struct changelog_ext_rename *changelog_rec_rename(struct changelog_rec *rec) static inline struct changelog_ext_jobid *changelog_rec_jobid(struct changelog_rec *rec) { - enum changelog_rec_flags crf = rec->cr_flags & - (CLF_VERSION | CLF_RENAME); + enum changelog_rec_flags crf = (enum changelog_rec_flags) + (rec->cr_flags & (CLF_VERSION | CLF_RENAME)); return (struct changelog_ext_jobid *)((char *)rec + changelog_rec_offset(crf, @@ -1407,8 +1413,8 @@ struct changelog_ext_jobid *changelog_rec_jobid(struct changelog_rec *rec) struct changelog_ext_extra_flags *changelog_rec_extra_flags( const struct changelog_rec *rec) { - enum changelog_rec_flags crf = rec->cr_flags & - (CLF_VERSION | CLF_RENAME | CLF_JOBID); + enum changelog_rec_flags crf = (enum changelog_rec_flags) + (rec->cr_flags & (CLF_VERSION | CLF_RENAME | CLF_JOBID)); return (struct changelog_ext_extra_flags *)((char *)rec + changelog_rec_offset(crf, @@ -1420,8 +1426,9 @@ struct changelog_ext_extra_flags *changelog_rec_extra_flags( struct changelog_ext_uidgid *changelog_rec_uidgid( const struct changelog_rec *rec) { - enum changelog_rec_flags crf = rec->cr_flags & - (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS); + enum changelog_rec_flags crf = (enum changelog_rec_flags) + (rec->cr_flags & + (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS)); return (struct changelog_ext_uidgid *)((char *)rec + changelog_rec_offset(crf, @@ -1432,13 +1439,15 @@ struct changelog_ext_uidgid *changelog_rec_uidgid( static inline struct changelog_ext_nid *changelog_rec_nid(const struct changelog_rec *rec) { - enum changelog_rec_flags crf = rec->cr_flags & - (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS); + enum changelog_rec_flags crf = (enum changelog_rec_flags) + (rec->cr_flags & + (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS)); enum changelog_rec_extra_flags cref = CLFE_INVALID; if (rec->cr_flags & CLF_EXTRA_FLAGS) - cref = changelog_rec_extra_flags(rec)->cr_extra_flags & - 
CLFE_UIDGID; + cref = (enum changelog_rec_extra_flags) + (changelog_rec_extra_flags(rec)->cr_extra_flags & + CLFE_UIDGID); return (struct changelog_ext_nid *)((char *)rec + changelog_rec_offset(crf, cref)); @@ -1449,13 +1458,16 @@ struct changelog_ext_nid *changelog_rec_nid(const struct changelog_rec *rec) struct changelog_ext_openmode *changelog_rec_openmode( const struct changelog_rec *rec) { - enum changelog_rec_flags crf = rec->cr_flags & - (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS); + enum changelog_rec_flags crf = (enum changelog_rec_flags) + (rec->cr_flags & + (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS)); enum changelog_rec_extra_flags cref = CLFE_INVALID; - if (rec->cr_flags & CLF_EXTRA_FLAGS) - cref = changelog_rec_extra_flags(rec)->cr_extra_flags & - (CLFE_UIDGID | CLFE_NID); + if (rec->cr_flags & CLF_EXTRA_FLAGS) { + cref = (enum changelog_rec_extra_flags) + (changelog_rec_extra_flags(rec)->cr_extra_flags & + (CLFE_UIDGID | CLFE_NID)); + } return (struct changelog_ext_openmode *)((char *)rec + changelog_rec_offset(crf, cref)); @@ -1466,13 +1478,15 @@ struct changelog_ext_openmode *changelog_rec_openmode( struct changelog_ext_xattr *changelog_rec_xattr( const struct changelog_rec *rec) { - enum changelog_rec_flags crf = rec->cr_flags & - (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS); + enum changelog_rec_flags crf = (enum changelog_rec_flags) + (rec->cr_flags & + (CLF_VERSION | CLF_RENAME | CLF_JOBID | CLF_EXTRA_FLAGS)); enum changelog_rec_extra_flags cref = CLFE_INVALID; if (rec->cr_flags & CLF_EXTRA_FLAGS) - cref = changelog_rec_extra_flags(rec)->cr_extra_flags & - (CLFE_UIDGID | CLFE_NID | CLFE_OPEN); + cref = (enum changelog_rec_extra_flags) + (changelog_rec_extra_flags(rec)->cr_extra_flags & + (CLFE_UIDGID | CLFE_NID | CLFE_OPEN)); return (struct changelog_ext_xattr *)((char *)rec + changelog_rec_offset(crf, cref)); @@ -1484,10 +1498,12 @@ static inline char *changelog_rec_name(struct changelog_rec *rec) enum 
changelog_rec_extra_flags cref = CLFE_INVALID; if (rec->cr_flags & CLF_EXTRA_FLAGS) - cref = changelog_rec_extra_flags(rec)->cr_extra_flags; + cref = (enum changelog_rec_extra_flags) + changelog_rec_extra_flags(rec)->cr_extra_flags; - return (char *)rec + changelog_rec_offset(rec->cr_flags & CLF_SUPPORTED, - cref & CLFE_SUPPORTED); + return (char *)rec + changelog_rec_offset( + (enum changelog_rec_flags)(rec->cr_flags & CLF_SUPPORTED), + (enum changelog_rec_extra_flags)(cref & CLFE_SUPPORTED)); } static inline size_t changelog_rec_snamelen(struct changelog_rec *rec) @@ -1535,8 +1551,10 @@ static inline void changelog_remap_rec(struct changelog_rec *rec, char *jid_mov, *rnm_mov; enum changelog_rec_extra_flags cref = CLFE_INVALID; - crf_wanted &= CLF_SUPPORTED; - cref_want &= CLFE_SUPPORTED; + crf_wanted = (enum changelog_rec_flags) + (crf_wanted & CLF_SUPPORTED); + cref_want = (enum changelog_rec_extra_flags) + (cref_want & CLFE_SUPPORTED); if ((rec->cr_flags & CLF_SUPPORTED) == crf_wanted) { if (!(rec->cr_flags & CLF_EXTRA_FLAGS) || @@ -1554,38 +1572,49 @@ static inline void changelog_remap_rec(struct changelog_rec *rec, /* Locations of extensions in the remapped record */ if (rec->cr_flags & CLF_EXTRA_FLAGS) { xattr_mov = (char *)rec + - changelog_rec_offset(crf_wanted & CLF_SUPPORTED, - cref_want & ~CLFE_XATTR); + changelog_rec_offset((enum changelog_rec_flags) + (crf_wanted & CLF_SUPPORTED), + (enum changelog_rec_extra_flags) + (cref_want & ~CLFE_XATTR)); omd_mov = (char *)rec + - changelog_rec_offset(crf_wanted & CLF_SUPPORTED, - cref_want & ~(CLFE_OPEN | - CLFE_XATTR)); + changelog_rec_offset((enum changelog_rec_flags) + (crf_wanted & CLF_SUPPORTED), + (enum changelog_rec_extra_flags) + (cref_want & ~(CLFE_OPEN | + CLFE_XATTR))); nid_mov = (char *)rec + - changelog_rec_offset(crf_wanted & CLF_SUPPORTED, - cref_want & ~(CLFE_NID | - CLFE_OPEN | - CLFE_XATTR)); + changelog_rec_offset((enum changelog_rec_flags) + (crf_wanted & CLF_SUPPORTED), + (enum 
changelog_rec_extra_flags) + (cref_want & ~(CLFE_NID | + CLFE_OPEN | + CLFE_XATTR))); uidgid_mov = (char *)rec + - changelog_rec_offset(crf_wanted & CLF_SUPPORTED, - cref_want & ~(CLFE_UIDGID | - CLFE_NID | - CLFE_OPEN | - CLFE_XATTR)); - cref = changelog_rec_extra_flags(rec)->cr_extra_flags; + changelog_rec_offset((enum changelog_rec_flags) + (crf_wanted & CLF_SUPPORTED), + (enum changelog_rec_extra_flags) + (cref_want & ~(CLFE_UIDGID | + CLFE_NID | + CLFE_OPEN | + CLFE_XATTR))); + cref = (enum changelog_rec_extra_flags) + changelog_rec_extra_flags(rec)->cr_extra_flags; } ef_mov = (char *)rec + - changelog_rec_offset(crf_wanted & ~CLF_EXTRA_FLAGS, - CLFE_INVALID); + changelog_rec_offset((enum changelog_rec_flags) + (crf_wanted & ~CLF_EXTRA_FLAGS), + CLFE_INVALID); jid_mov = (char *)rec + - changelog_rec_offset(crf_wanted & - ~(CLF_EXTRA_FLAGS | CLF_JOBID), + changelog_rec_offset((enum changelog_rec_flags) + (crf_wanted & ~(CLF_EXTRA_FLAGS | + CLF_JOBID)), CLFE_INVALID); rnm_mov = (char *)rec + - changelog_rec_offset(crf_wanted & - ~(CLF_EXTRA_FLAGS | - CLF_JOBID | - CLF_RENAME), + changelog_rec_offset((enum changelog_rec_flags) + (crf_wanted & ~(CLF_EXTRA_FLAGS | + CLF_JOBID | + CLF_RENAME)), CLFE_INVALID); /* Move the extension fields to the desired positions */ @@ -1824,7 +1853,7 @@ static inline ssize_t hur_len(struct hsm_user_request *hur) (__u64)hur->hur_request.hr_itemcount * sizeof(hur->hur_user_item[0]) + hur->hur_request.hr_data_len; - if (size != (ssize_t)size) + if ((ssize_t)size < 0) return -1; return size; From patchwork Thu Feb 27 21:14:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410509 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0609692A for ; Thu, 27 Feb 2020 21:40:12 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E27DD24690 for ; Thu, 27 Feb 2020 21:40:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E27DD24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 13CE634A6FB; Thu, 27 Feb 2020 13:32:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DE8D721F799 for ; Thu, 27 Feb 2020 13:20:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 554368F24; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5424F46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:57 -0500 Message-Id: <1582838290-17243-430-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 429/622] lnet: create existing net returns EEXIST X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Olaf Faaland When "lnetctl net add" is called for an interface/net pair that already exists, the error returned should be EEXIST, so the user knows that the net is already configured. WC-bug-id: https://jira.whamcloud.com/browse/LU-12626 Lustre-commit: 4aa71267cc03 ("LU-12626 lnet: create existing net returns EEXIST") Signed-off-by: Olaf Faaland Reviewed-on: https://review.whamcloud.com/35681 Reviewed-by: Chris Horn Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index e773839..79deaac 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -2301,7 +2301,7 @@ static void lnet_push_target_fini(void) * up is actually unique. if it's not fail. 
*/ if (!lnet_ni_unique_net(&net_l->net_ni_list, ni->ni_interfaces[0])) { - rc = -EINVAL; + rc = -EEXIST; goto failed1; } From patchwork Thu Feb 27 21:14:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410513 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CCC73138D for ; Thu, 27 Feb 2020 21:40:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B4C8424690 for ; Thu, 27 Feb 2020 21:40:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B4C8424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E348D34A725; Thu, 27 Feb 2020 13:32:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2B0D621F799 for ; Thu, 27 Feb 2020 13:20:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 581058F25; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 56E4347C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:58 -0500 Message-Id: 
<1582838290-17243-431-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 430/622] lustre: obdecho: reuse an cl env cache for obdecho survey X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov The obdecho environment is already of the CL_thread type, so it is easy to reuse the cl_env cache instead of allocating an env on each ioctl call. This reduces CPU usage dramatically. Cray-bug-id: LUS-7552 WC-bug-id: https://jira.whamcloud.com/browse/LU-12578 Lustre-commit: 55c33b70c46f ("LU-12578 obdecho: reuse an cl env cache for obdecho survey") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/35700 Reviewed-by: Alex Zhuravlev Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 9 ++++++ fs/lustre/obdclass/cl_object.c | 6 ++-- fs/lustre/obdclass/lu_object.c | 68 +++++++++++++++++++++++++++++++++++++++-- fs/lustre/obdecho/echo_client.c | 28 +++++++++++------ 4 files changed, 97 insertions(+), 14 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index 1c1a60f..b00fad8 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1208,6 +1208,14 @@ void *lu_context_key_get(const struct lu_context *ctx, void lu_context_key_revive_many(struct lu_context_key *k, ...); void lu_context_key_quiesce_many(struct lu_context_key *k, ...); +/* + * update/clear ctx/ses tags.
+ */ +void lu_context_tags_update(u32 tags); +void lu_context_tags_clear(u32 tags); +void lu_session_tags_update(u32 tags); +void lu_session_tags_clear(u32 tags); + /** * Environment. */ @@ -1225,6 +1233,7 @@ struct lu_env { int lu_env_init(struct lu_env *env, u32 tags); void lu_env_fini(struct lu_env *env); int lu_env_refill(struct lu_env *env); +int lu_env_refill_by_tags(struct lu_env *env, u32 ctags, u32 stags); struct lu_env *lu_env_find(void); int lu_env_add(struct lu_env *env); diff --git a/fs/lustre/obdclass/cl_object.c b/fs/lustre/obdclass/cl_object.c index b323eb4..57b3a9a 100644 --- a/fs/lustre/obdclass/cl_object.c +++ b/fs/lustre/obdclass/cl_object.c @@ -788,8 +788,10 @@ void cl_env_put(struct lu_env *env, u16 *refcheck) * with the standard tags. */ if (cl_envs[cpu].cec_count < cl_envs_cached_max && - (env->le_ctx.lc_tags & ~LCT_HAS_EXIT) == LCT_CL_THREAD && - (env->le_ses->lc_tags & ~LCT_HAS_EXIT) == LCT_SESSION) { + (env->le_ctx.lc_tags & ~LCT_HAS_EXIT) == + lu_context_tags_default && + (env->le_ses->lc_tags & ~LCT_HAS_EXIT) == + lu_session_tags_default) { read_lock(&cl_envs[cpu].cec_guard); list_add(&cle->ce_linkage, &cl_envs[cpu].cec_envs); cl_envs[cpu].cec_count++; diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index 6fea1f3..dccff91 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -1778,8 +1778,44 @@ int lu_context_refill(struct lu_context *ctx) * predefined when the lu_device type are registered, during the module probe * phase. 
*/ -u32 lu_context_tags_default; -u32 lu_session_tags_default; +u32 lu_context_tags_default = LCT_CL_THREAD; +u32 lu_session_tags_default = LCT_SESSION; + +void lu_context_tags_update(__u32 tags) +{ + spin_lock(&lu_context_remembered_guard); + lu_context_tags_default |= tags; + atomic_inc(&key_set_version); + spin_unlock(&lu_context_remembered_guard); +} +EXPORT_SYMBOL(lu_context_tags_update); + +void lu_context_tags_clear(__u32 tags) +{ + spin_lock(&lu_context_remembered_guard); + lu_context_tags_default &= ~tags; + atomic_inc(&key_set_version); + spin_unlock(&lu_context_remembered_guard); +} +EXPORT_SYMBOL(lu_context_tags_clear); + +void lu_session_tags_update(__u32 tags) +{ + spin_lock(&lu_context_remembered_guard); + lu_session_tags_default |= tags; + atomic_inc(&key_set_version); + spin_unlock(&lu_context_remembered_guard); +} +EXPORT_SYMBOL(lu_session_tags_update); + +void lu_session_tags_clear(__u32 tags) +{ + spin_lock(&lu_context_remembered_guard); + lu_session_tags_default &= ~tags; + atomic_inc(&key_set_version); + spin_unlock(&lu_context_remembered_guard); +} +EXPORT_SYMBOL(lu_session_tags_clear); int lu_env_init(struct lu_env *env, u32 tags) { @@ -1801,6 +1837,34 @@ void lu_env_fini(struct lu_env *env) } EXPORT_SYMBOL(lu_env_fini); +/** + * Currently, this API will only be used by echo client. + * Because echo client and normal lustre client will share + * same cl_env cache. So echo client needs to refresh + * the env context after it get one from the cache, especially + * when normal client and echo client co-exist in the same client. 
+ */ +int lu_env_refill_by_tags(struct lu_env *env, u32 ctags, + u32 stags) +{ + int result; + + if ((env->le_ctx.lc_tags & ctags) != ctags) { + env->le_ctx.lc_version = 0; + env->le_ctx.lc_tags |= ctags; + } + + if (env->le_ses && (env->le_ses->lc_tags & stags) != stags) { + env->le_ses->lc_version = 0; + env->le_ses->lc_tags |= stags; + } + + result = lu_env_refill(env); + + return result; +} +EXPORT_SYMBOL(lu_env_refill_by_tags); + int lu_env_refill(struct lu_env *env) { int result; diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 01d8c04..84823ec 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -50,6 +50,10 @@ * @{ */ +/* echo thread key have a CL_THREAD flag, which set cl_env function directly */ +#define ECHO_DT_CTX_TAG (LCT_REMEMBER | LCT_DT_THREAD) +#define ECHO_SES_TAG (LCT_REMEMBER | LCT_SESSION | LCT_SERVER_SESSION) + struct echo_device { struct cl_device ed_cl; struct echo_client_obd *ed_ec; @@ -1481,6 +1485,7 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, struct echo_object *eco; struct obd_ioctl_data *data = karg; struct lu_env *env; + u16 refcheck; struct obdo *oa; struct lu_fid fid; int rw = OBD_BRW_READ; @@ -1497,16 +1502,14 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, if (rc < 0) return rc; - env = kzalloc(sizeof(*env), GFP_NOFS); - if (!env) - return -ENOMEM; + env = cl_env_get(&refcheck); + if (IS_ERR(env)) + return PTR_ERR(env); - rc = lu_env_init(env, LCT_DT_THREAD); - if (rc) { - rc = -ENOMEM; - goto out; - } lu_env_add(env); + rc = lu_env_refill_by_tags(env, ECHO_DT_CTX_TAG, ECHO_SES_TAG); + if (rc != 0) + goto out; switch (cmd) { case OBD_IOC_CREATE: /* may create echo object */ @@ -1574,8 +1577,7 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, out: lu_env_remove(env); - lu_env_fini(env); - kfree(env); + cl_env_put(env, &refcheck); return rc; } @@ -1606,6 +1608,9 @@ static int echo_client_setup(const 
struct lu_env *env, INIT_LIST_HEAD(&ec->ec_locks); ec->ec_unique = 0; + lu_context_tags_update(ECHO_DT_CTX_TAG); + lu_session_tags_update(ECHO_SES_TAG); + ocd = kzalloc(sizeof(*ocd), GFP_NOFS); if (!ocd) return -ENOMEM; @@ -1642,6 +1647,9 @@ static int echo_client_cleanup(struct obd_device *obddev) return -EBUSY; } + lu_session_tags_clear(ECHO_SES_TAG & ~LCT_SESSION); + lu_context_tags_clear(ECHO_DT_CTX_TAG); + LASSERT(refcount_read(&ec->ec_exp->exp_refcount) > 0); rc = obd_disconnect(ec->ec_exp); if (rc != 0) From patchwork Thu Feb 27 21:14:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410517 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DBE5A92A for ; Thu, 27 Feb 2020 21:40:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C486524690 for ; Thu, 27 Feb 2020 21:40:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C486524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 729B134A752; Thu, 27 Feb 2020 13:32:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 83ACD21FEB2 for ; Thu, 27 Feb 2020 13:20:32 -0800 (PST) Received: from star.ccs.ornl.gov 
(star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5B5168F26; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 59AFA468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:14:59 -0500 Message-Id: <1582838290-17243-432-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 431/622] lustre: mdc: dir page ldp_hash_end mistakenly adjusted X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao On systems with PAGE_SIZE > 4k, mdc_adjust_dirpages() adjusts the dir page end hash using a value that has been through le64_to_cpu(), but the stored field must remain little endian.
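A minimal userspace sketch of the rule described above, under the assumption that the page field is stored as little-endian bytes: a value moved from one page header to another should be copied in its stored form, and converted only when the CPU needs to interpret it. The struct dirpage_hdr and the helpers put_le64(), get_le64() and propagate_hash_end() are hypothetical and are not part of the Lustre source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page header whose field is kept in little-endian byte order. */
struct dirpage_hdr {
	uint8_t hash_end_le[8];
};

/* Encode a CPU-order value as little-endian bytes (portable, no <endian.h>). */
void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Decode little-endian bytes into a CPU-order value. */
uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}

/*
 * Propagate the end hash from the last sub-page header to the first.
 * The field is copied in its stored (little-endian) form, so no
 * conversion takes place and the result is the same on any host.
 */
void propagate_hash_end(struct dirpage_hdr *first,
			const struct dirpage_hdr *last)
{
	assert(first && last);
	memcpy(first->hash_end_le, last->hash_end_le,
	       sizeof(first->hash_end_le));
}
```

Writing a CPU-order value straight into such a field happens to work on little-endian hosts, where the conversion is a no-op, which is one reason this kind of bug can go unnoticed until the code runs on a big-endian machine.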
Fixes: 4f76f0ec093 ("staging: lustre: llite: move dir cache to MDC layer") WC-bug-id: https://jira.whamcloud.com/browse/LU-10094 Lustre-commit: d8b19ae66177 ("LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35517 Reviewed-by: Andreas Dilger Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_request.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 693c455..162ace7 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1259,8 +1259,8 @@ static void mdc_adjust_dirpages(struct page **pages, int cfs_pgs, int lu_pgs) for (i = 0; i < cfs_pgs; i++) { struct lu_dirpage *dp = kmap(pages[i]); - u64 hash_end = le64_to_cpu(dp->ldp_hash_end); - u32 flags = le32_to_cpu(dp->ldp_flags); + u64 hash_end = dp->ldp_hash_end; + u32 flags = dp->ldp_flags; struct lu_dirpage *first = dp; while (--lu_pgs > 0) { @@ -1279,8 +1279,8 @@ static void mdc_adjust_dirpages(struct page **pages, int cfs_pgs, int lu_pgs) break; /* Save the hash and flags of this lu_dirpage. */ - hash_end = le64_to_cpu(dp->ldp_hash_end); - flags = le32_to_cpu(dp->ldp_flags); + hash_end = dp->ldp_hash_end; + flags = dp->ldp_flags; /* Check if lu_dirpage contains no entries. 
*/ if (!end_dirent) From patchwork Thu Feb 27 21:15:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410407 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0907717E0 for ; Thu, 27 Feb 2020 21:37:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E5FA624690 for ; Thu, 27 Feb 2020 21:37:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E5FA624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 32874349600; Thu, 27 Feb 2020 13:30:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C6C6021FEB2 for ; Thu, 27 Feb 2020 13:20:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5E5EB8F28; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5C7D446A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:00 -0500 Message-Id: <1582838290-17243-433-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 432/622] lnet: handle unlink before send completes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata If LNetMDUnlink() is called on an md with md->md_refcount > 0 then the eq callback isn't called. There is a scenario where the response times out before the send completes. So we have a refcount on the MD. The Unlink callback gets dropped on the floor. Send completes, but because we've already timed out, the REPLY for the GET is dropped. Now we're left with a peer that is in the following state: LNET_PEER_MULTI_RAIL LNET_PEER_DISCOVERING LNET_PEER_PING_SENT But no more events are coming to it, and the discovery never completes. This scenario can get RPCs stuck as well if the response times out before the send completes. 
The solution is to set the event status to -ETIMEDOUT to inform the send event handler that it should not expect a reply WC-bug-id: https://jira.whamcloud.com/browse/LU-10931 Lustre-commit: d8fc5c23fe54 ("LU-10931 lnet: handle unlink before send completes") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/35444 Reviewed-by: Chris Horn Reviewed-by: Alexandr Boyko Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 805d5b9..0d6c363 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -820,7 +820,12 @@ unlink = lnet_md_unlinkable(md); if (md->md_eq) { - msg->msg_ev.status = status; + if ((md->md_flags & LNET_MD_FLAG_ABORTED) && !status) { + msg->msg_ev.status = -ETIMEDOUT; + CDEBUG(D_NET, "md 0x%p already unlinked\n", md); + } else { + msg->msg_ev.status = status; + } msg->msg_ev.unlinked = unlink; lnet_eq_enqueue_event(md->md_eq, &msg->msg_ev); } From patchwork Thu Feb 27 21:15:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410409 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 10AE517E0 for ; Thu, 27 Feb 2020 21:37:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EAC7C246A1 for ; Thu, 27 Feb 2020 21:37:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EAC7C246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C839034A2A0; Thu, 27 Feb 2020 13:30:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1444F21FEB2 for ; Thu, 27 Feb 2020 13:20:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 61D298F29; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5F47346C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:01 -0500 Message-Id: <1582838290-17243-434-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 433/622] lustre: osc: layout and chunkbits alignment mismatch X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman In the discard case, the OSC fsync/writeback code asserts that each OSC extent is fully covered by the fsync request. It may happen that a start(or an end) of a component does not match the first (the last) osc object extent start (end), which is aligned by the cl_chunkbits which depends on the OST block size. 
The requirement for component alignment is LOV_MIN_STRIPE_SIZE, which is 64K, whereas the ZFS block size can be several MBs. Use the fsync region aligned by chunk size in the assertion. Fixes: 58c252e47d ("lustre: osc: Do not assert for first extent") WC-bug-id: https://jira.whamcloud.com/browse/LU-12462 Lustre-commit: 7a9f7dec700c ("LU-12462 osc: layout and chunkbits alignment mismatch") Signed-off-by: Vitaly Fertman Cray-bug-id: LUS-7498 Reviewed-on: https://review.whamcloud.com/35733 Reviewed-by: Mike Pershin Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 9e2f90d..3d47c02 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2930,18 +2930,25 @@ int osc_cache_writeback_range(const struct lu_env *env, struct osc_object *obj, list_move_tail(&ext->oe_link, list); unplug = true; } else { + struct client_obd *cli = osc_cli(obj); + int pcc_bits = cli->cl_chunkbits - PAGE_SHIFT; + pgoff_t align_by = (1 << pcc_bits); + pgoff_t a_start = round_down(start, align_by); + pgoff_t a_end = round_up(end, align_by); + + /* overflow case */ + if (end && !a_end) + a_end = CL_PAGE_EOF; /* the only discarder is lock cancelling, so - * [start, end] must contain this extent. - * However, with DOM, osc extent alignment may - * cause the first extent to start before the - * OST portion of the layout. This is never - * accessed for i/o, but the unused portion - * will not be covered by the sync request, - * so we cannot assert in that case. 
+ * [start, end], aligned by chunk size, must + * contain this extent */ - EASSERT(ergo(!(ext == first_extent(obj)), - ext->oe_start >= start && - ext->oe_end <= end), ext); + LASSERTF(ext->oe_start >= a_start && + ext->oe_end <= a_end, + "ext [%lu, %lu] reg [%lu, %lu] orig [%lu %lu] align %lu bits %d\n", + ext->oe_start, ext->oe_end, + a_start, a_end, start, end, + align_by, pcc_bits); osc_extent_state_set(ext, OES_LOCKING); ext->oe_owner = current; list_move_tail(&ext->oe_link, &discard_list); From patchwork Thu Feb 27 21:15:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410413 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8183D92A for ; Thu, 27 Feb 2020 21:37:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 698FA24690 for ; Thu, 27 Feb 2020 21:37:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 698FA24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ADF98349748; Thu, 27 Feb 2020 13:31:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 553C621FECF for ; Thu, 27 Feb 2020 13:20:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 644668F2A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6210446D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:02 -0500 Message-Id: <1582838290-17243-435-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 434/622] lnet: handle recursion in resend X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When we're resending a message we have to decommit it first. This could potentially result in another message being picked up from the queue and sent, which could fail immediately and be finalized, causing recursion. This problem was observed when a router was being shut down. This patch uses the same mechanism used in lnet_finalize() to limit recursion. If a thread is already finalizing a message and gets into a path where it starts finalizing a second one, then the second message is queued and handled later.
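The recursion limit described above can be sketched in user-space C. This is a simplified model, not the LNet implementation: claim_worker_slot() and release_worker_slot() are hypothetical names, locking is omitted, and an opaque pointer stands in for the kernel's current task. The idea is that a caller queues its message first and then tries to claim a slot. If it already owns a slot (it is recursing) or no slot is free, it returns and leaves the queued message to a slot owner.

```c
#include <assert.h>
#include <stddef.h>

/* Try to register "self" in the worker table.
 * Returns the claimed slot index, or -1 when self already owns a slot
 * (recursion) or every slot is taken (enough workers are busy).
 */
static int claim_worker_slot(void **workers, int nworkers, void *self)
{
	int my_slot = -1;
	int i;

	for (i = 0; i < nworkers; i++) {
		if (workers[i] == self)
			break;			/* already working: recursion */

		if (my_slot < 0 && !workers[i])
			my_slot = i;		/* remember first free slot */
	}

	if (i < nworkers || my_slot < 0)
		return -1;

	workers[my_slot] = self;
	return my_slot;
}

/* Give the slot back once the queue has been drained. */
static void release_worker_slot(void **workers, int slot)
{
	workers[slot] = NULL;
}

/* Self-check of the three outcomes: claim, recursion, table full. */
void worker_slot_selfcheck(void)
{
	void *workers[2] = { NULL, NULL };
	int a, b, c;	/* addresses stand in for distinct threads */
	int slot;

	slot = claim_worker_slot(workers, 2, &a);
	assert(slot == 0);
	assert(claim_worker_slot(workers, 2, &a) == -1);	/* recursion */
	assert(claim_worker_slot(workers, 2, &b) == 1);
	assert(claim_worker_slot(workers, 2, &c) == -1);	/* table full */

	release_worker_slot(workers, slot);
	assert(claim_worker_slot(workers, 2, &c) == 0);
}
```

In this model a caller that gets -1 has nothing further to do; the thread holding a slot is expected to drain the shared queue before releasing it, which is what breaks the recursion.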
WC-bug-id: https://jira.whamcloud.com/browse/LU-12402 Lustre-commit: ad9243693c9a ("LU-12402 lnet: handle recursion in resend") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/35431 Reviewed-by: Chris Horn Reviewed-by: Alexandr Boyko Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 4 + net/lnet/lnet/lib-msg.c | 292 +++++++++++++++++++++++++++-------------- 2 files changed, 194 insertions(+), 102 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 904ef7a..3f81928 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -985,9 +985,13 @@ struct lnet_msg_container { int msc_nfinalizers; /* msgs waiting to complete finalizing */ struct list_head msc_finalizing; + /* msgs waiting to be resent */ + struct list_head msc_resending; struct list_head msc_active; /* active message list */ /* threads doing finalization */ void **msc_finalizers; + /* threads doing resends */ + void **msc_resenders; }; /* Peer Discovery states */ diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 0d6c363..5c39ce3 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -597,6 +597,168 @@ } } +static void +lnet_resend_msg_locked(struct lnet_msg *msg) +{ + msg->msg_retry_count++; + + /* remove message from the active list and reset it to prepare + * for a resend. Two exceptions to this + * + * 1. the router case. When a message is being routed it is + * committed for rx when received and committed for tx when + * forwarded. We don't want to remove it from the active list, since + * code which handles receiving expects it to remain on the active + * list. + * + * 2. The REPLY case. Reply messages use the same message + * structure for the GET that was received. 
+ */ + if (!msg->msg_routing && msg->msg_type != LNET_MSG_REPLY) { + list_del_init(&msg->msg_activelist); + msg->msg_onactivelist = 0; + } + + /* The msg_target.nid which was originally set + * when calling LNetGet() or LNetPut() might've + * been overwritten if we're routing this message. + * Call lnet_msg_decommit_tx() to return the credit + * this message consumed. The message will + * consume another credit when it gets resent. + */ + msg->msg_target.nid = msg->msg_hdr.dest_nid; + lnet_msg_decommit_tx(msg, -EAGAIN); + msg->msg_sending = 0; + msg->msg_receiving = 0; + msg->msg_target_is_router = 0; + + CDEBUG(D_NET, "%s->%s:%s:%s - queuing msg (%p) for resend\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + lnet_msgtyp2str(msg->msg_type), + lnet_health_error2str(msg->msg_health_status), msg); + + list_add_tail(&msg->msg_list, the_lnet.ln_mt_resendqs[msg->msg_tx_cpt]); + + wake_up(&the_lnet.ln_mt_waitq); +} + +int +lnet_check_finalize_recursion_locked(struct lnet_msg *msg, + struct list_head *containerq, + int nworkers, void **workers) +{ + int my_slot = -1; + int i; + + list_add_tail(&msg->msg_list, containerq); + + for (i = 0; i < nworkers; i++) { + if (workers[i] == current) + break; + + if (my_slot < 0 && !workers[i]) + my_slot = i; + } + + if (i < nworkers || my_slot < 0) + return -1; + + workers[my_slot] = current; + + return my_slot; +} + +int +lnet_attempt_msg_resend(struct lnet_msg *msg) +{ + struct lnet_msg_container *container; + int my_slot; + int cpt; + + /* we can only resend tx_committed messages */ + LASSERT(msg->msg_tx_committed); + + /* don't resend recovery messages */ + if (msg->msg_recovery) { + CDEBUG(D_NET, "msg %s->%s is a recovery ping. 
retry# %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); + return -ENOTRECOVERABLE; + } + + /* if we explicitly indicated we don't want to resend then just + * return + */ + if (msg->msg_no_resend) { + CDEBUG(D_NET, "msg %s->%s requested no resend. retry# %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); + return -ENOTRECOVERABLE; + } + + /* check if the message has exceeded the number of retries */ + if (msg->msg_retry_count >= lnet_retry_count) { + CNETERR("msg %s->%s exceeded retry count %d\n", + libcfs_nid2str(msg->msg_from), + libcfs_nid2str(msg->msg_target.nid), + msg->msg_retry_count); + return -ENOTRECOVERABLE; + } + + cpt = msg->msg_tx_cpt; + lnet_net_lock(cpt); + + /* check again under lock */ + if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { + lnet_net_unlock(cpt); + return -ESHUTDOWN; + } + + container = the_lnet.ln_msg_containers[cpt]; + my_slot = lnet_check_finalize_recursion_locked(msg, + &container->msc_resending, + container->msc_nfinalizers, + container->msc_resenders); + /* enough threads are resending */ + if (my_slot == -1) { + lnet_net_unlock(cpt); + return 0; + } + + while (!list_empty(&container->msc_resending)) { + msg = list_entry(container->msc_resending.next, + struct lnet_msg, msg_list); + list_del(&msg->msg_list); + + /* resending the message will require us to call + * lnet_msg_decommit_tx() which will return the credit + * which this message holds. This could trigger another + * queued message to be sent. If that message fails and + * requires a resend we will recurse. + * But since at this point the slot is taken, the message + * will be queued in the container and dealt with + * later. This breaks the recursion. + */ + lnet_resend_msg_locked(msg); + } + + /* msc_resenders is an array of process pointers. Each entry holds + * a pointer to the current process operating on the message. An + * array entry is created per CPT. 
If the array slot is already + * set, then it means that there is a thread on the CPT currently + * resending a message. + * Once the thread finishes clear the slot to enable the thread to + * take on more resend work. + */ + container->msc_resenders[my_slot] = NULL; + lnet_net_unlock(cpt); + + return 0; +} + /* Do a health check on the message: * return -1 if we're not going to handle the error or * if we've reached the maximum number of retries. @@ -607,9 +769,9 @@ lnet_health_check(struct lnet_msg *msg) { enum lnet_msg_hstatus hstatus = msg->msg_health_status; - bool lo = false; - struct lnet_ni *ni; struct lnet_peer_ni *lpni; + struct lnet_ni *ni; + bool lo = false; /* if we're shutting down no point in handling health. */ if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) @@ -697,7 +859,7 @@ lnet_handle_local_failure(ni); if (msg->msg_tx_committed) /* add to the re-send queue */ - goto resend; + return lnet_attempt_msg_resend(msg); break; /* These errors will not trigger a resend so simply @@ -713,7 +875,7 @@ case LNET_MSG_STATUS_REMOTE_DROPPED: lnet_handle_remote_failure(lpni); if (msg->msg_tx_committed) - goto resend; + return lnet_attempt_msg_resend(msg); break; case LNET_MSG_STATUS_REMOTE_ERROR: @@ -725,87 +887,8 @@ LBUG(); } -resend: - /* we can only resend tx_committed messages */ - LASSERT(msg->msg_tx_committed); - - /* don't resend recovery messages */ - if (msg->msg_recovery) { - CDEBUG(D_NET, "msg %s->%s is a recovery ping. retry# %d\n", - libcfs_nid2str(msg->msg_from), - libcfs_nid2str(msg->msg_target.nid), - msg->msg_retry_count); - return -1; - } - - /* if we explicitly indicated we don't want to resend then just - * return - */ - if (msg->msg_no_resend) { - CDEBUG(D_NET, "msg %s->%s requested no resend. 
retry# %d\n", - libcfs_nid2str(msg->msg_from), - libcfs_nid2str(msg->msg_target.nid), - msg->msg_retry_count); - return -1; - } - - /* check if the message has exceeded the number of retries */ - if (msg->msg_retry_count >= lnet_retry_count) { - CNETERR("msg %s->%s exceeded retry count %d\n", - libcfs_nid2str(msg->msg_from), - libcfs_nid2str(msg->msg_target.nid), - msg->msg_retry_count); - return -1; - } - msg->msg_retry_count++; - - lnet_net_lock(msg->msg_tx_cpt); - - /* check again under lock */ - if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING) { - lnet_net_unlock(msg->msg_tx_cpt); - return -1; - } - - /* remove message from the active list and reset it in preparation - * for a resend. Two exception to this - * - * 1. the router case, when a message is committed for rx when - * received, then tx when it is sent. When committed to both tx and - * rx we don't want to remove it from the active list. - * - * 2. The REPLY case since it uses the same msg block for the GET - * that was received. - */ - if (!msg->msg_routing && msg->msg_type != LNET_MSG_REPLY) { - list_del_init(&msg->msg_activelist); - msg->msg_onactivelist = 0; - } - - /* The msg_target.nid which was originally set - * when calling LNetGet() or LNetPut() might've - * been overwritten if we're routing this message. - * Call lnet_return_tx_credits_locked() to return - * the credit this message consumed. The message will - * consume another credit when it gets resent. 
- */ - msg->msg_target.nid = msg->msg_hdr.dest_nid; - lnet_msg_decommit_tx(msg, -EAGAIN); - msg->msg_sending = 0; - msg->msg_receiving = 0; - msg->msg_target_is_router = 0; - - CDEBUG(D_NET, "%s->%s:%s:%s - queuing for resend\n", - libcfs_nid2str(msg->msg_hdr.src_nid), - libcfs_nid2str(msg->msg_hdr.dest_nid), - lnet_msgtyp2str(msg->msg_type), - lnet_health_error2str(hstatus)); - - list_add_tail(&msg->msg_list, the_lnet.ln_mt_resendqs[msg->msg_tx_cpt]); - lnet_net_unlock(msg->msg_tx_cpt); - - wake_up(&the_lnet.ln_mt_waitq); - return 0; + /* no resend is needed */ + return -1; } static void @@ -945,7 +1028,6 @@ int my_slot; int cpt; int rc; - int i; LASSERT(!in_interrupt()); @@ -967,7 +1049,6 @@ * put on the resend queue. */ if (!lnet_health_check(msg)) - /* Message is queued for resend */ return; } @@ -998,28 +1079,20 @@ lnet_net_lock(cpt); container = the_lnet.ln_msg_containers[cpt]; - list_add_tail(&msg->msg_list, &container->msc_finalizing); - /* - * Recursion breaker. Don't complete the message here if I am (or + /* Recursion breaker. 
Don't complete the message here if I am (or * enough other threads are) already completing messages */ - my_slot = -1; - for (i = 0; i < container->msc_nfinalizers; i++) { - if (container->msc_finalizers[i] == current) - break; - - if (my_slot < 0 && !container->msc_finalizers[i]) - my_slot = i; - } - - if (i < container->msc_nfinalizers || my_slot < 0) { + my_slot = lnet_check_finalize_recursion_locked(msg, + &container->msc_finalizing, + container->msc_nfinalizers, + container->msc_finalizers); + /* enough threads are resending */ + if (my_slot == -1) { lnet_net_unlock(cpt); return; } - container->msc_finalizers[my_slot] = current; - rc = 0; while ((msg = list_first_entry_or_null(&container->msc_finalizing, struct lnet_msg, @@ -1073,6 +1146,10 @@ kvfree(container->msc_finalizers); container->msc_finalizers = NULL; + + kfree(container->msc_resenders); + container->msc_resenders = NULL; + container->msc_init = 0; } @@ -1083,6 +1160,7 @@ INIT_LIST_HEAD(&container->msc_active); INIT_LIST_HEAD(&container->msc_finalizing); + INIT_LIST_HEAD(&container->msc_resending); /* number of CPUs */ container->msc_nfinalizers = cfs_cpt_weight(lnet_cpt_table(), cpt); @@ -1099,6 +1177,16 @@ return -ENOMEM; } + container->msc_resenders = kzalloc_cpt(container->msc_nfinalizers * + sizeof(*container->msc_resenders), + GFP_KERNEL, cpt); + + if (!container->msc_resenders) { + CERROR("Failed to allocate message resenders\n"); + lnet_msg_container_cleanup(container); + return -ENOMEM; + } + return 0; } From patchwork Thu Feb 27 21:15:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410521 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7B29A138D for ; Thu, 27 Feb 2020 21:40:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com 
[64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6391924690 for ; Thu, 27 Feb 2020 21:40:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6391924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C19B234A783; Thu, 27 Feb 2020 13:32:53 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ABB6321FEDD for ; Thu, 27 Feb 2020 13:20:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 662B08F2B; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 64F3747C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:03 -0500 Message-Id: <1582838290-17243-436-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 435/622] lustre: llite: forget cached ACLs properly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Lustre with linux-4.* fails ACL tests (e.g. sanity/103 and sanityn/25) because ll_lock_cancel_bits() does not reset i_acl and i_default_acl into initial state. use kernel's forget_all_cached_acls() to do so. WC-bug-id: https://jira.whamcloud.com/browse/LU-12657 Lustre-commit: 3df034f8f46b ("LU-12657 llite: forget cached ACLs properly") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/35756 Reviewed-by: Neil Brown Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/namei.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 71e757a..de01a73 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -361,6 +361,9 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) !is_root_inode(inode)) ll_invalidate_aliases(inode); + if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) + forget_all_cached_acls(inode); + iput(inode); } From patchwork Thu Feb 27 21:15:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410751 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF5DE924 for ; Thu, 27 Feb 2020 21:45:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D799D24690 for ; Thu, 27 Feb 2020 21:45:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D799D24690 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1495E34B10F; Thu, 27 Feb 2020 13:36:32 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F19D821FEDD for ; Thu, 27 Feb 2020 13:20:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 692A18F2C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 67DF2468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:04 -0500 Message-Id: <1582838290-17243-437-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 436/622] lustre: osc: Fix dom handling in weight_ast X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The DOM bit can be cancelled at any time during calls to weigh_ast, so: 1. We cannot assert that it is present 2. 
We cannot use it to identify the !LDLM_EXTENT case when calling osc_lock_weight WC-bug-id: https://jira.whamcloud.com/browse/LU-12343 Lustre-commit: 92c4ad14d4b1 ("LU-12343 osc: Fix dom handling in weight_ast") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/34966 Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_lock.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index e01bf5f..33fdc7e7 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -673,7 +673,8 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) return 1; LASSERT(dlmlock->l_resource->lr_type == LDLM_EXTENT || - ldlm_has_dom(dlmlock)); + dlmlock->l_resource->lr_type == LDLM_IBITS); + lock_res_and_lock(dlmlock); obj = dlmlock->l_ast_data; if (obj) @@ -701,12 +702,17 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock) goto out; } - if (ldlm_has_dom(dlmlock)) - weight = osc_lock_weight(env, obj, 0, OBD_OBJECT_EOF); - else + if (dlmlock->l_resource->lr_type == LDLM_EXTENT) weight = osc_lock_weight(env, obj, dlmlock->l_policy_data.l_extent.start, dlmlock->l_policy_data.l_extent.end); + else if (ldlm_has_dom(dlmlock)) + weight = osc_lock_weight(env, obj, 0, OBD_OBJECT_EOF); + /* The DOM bit can be cancelled at any time; in that case, we know + * there are no pages, so just return weight of 0 + */ + else + weight = 0; out: if (obj) From patchwork Thu Feb 27 21:15:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410417 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24CF0138D for ; Thu, 27 Feb 2020 21:37:55 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D61624690 for ; Thu, 27 Feb 2020 21:37:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D61624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9821234A2FC; Thu, 27 Feb 2020 13:31:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4381721FEDD for ; Thu, 27 Feb 2020 13:20:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6C0548F2D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6AB1346A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:05 -0500 Message-Id: <1582838290-17243-438-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 437/622] lustre: llite: Fix extents_stats X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Patch 32517 from LU-8066 that landed in OpenSFS branch changed: (1 << LL_HIST_START << i) to BIT(LL_HIST_START << i) But these are not equivalent because this changes the order of operations. The earlier one does the operations in this order: (1 << LL_HIST_START) << i The new one is this order: 1 << (LL_HIST_START << i) Which is quite different, as it's left shifting LL_HIST_START directly, and LL_HIST_START is a number of bits. The goal is really just to start with BIT(LL_HIST_START) and left shift by one (going from 4K, to 8K, etc) each time, so just use: BIT(LL_HIST_START + i) The result of this was that all i/os over 8K were placed in the 4K-8K stat bucket, because the loop exited early. Also add mmap'ed reads & writes to extents_stats. Add test for extents_stats. This was only broken in the OpenSFS branch but we want the improvements. 
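The shift-order difference described above can be checked outside the kernel with a small C sketch. The values of LL_HIST_START (12, so the first bucket starts at 4K) and LL_HIST_MAX (28) are assumptions made for illustration, BIT() is redefined locally with a 64-bit type, and the helper names bucket_fixed() and bucket_broken() are hypothetical; only the two loop conditions are taken from the text.

```c
/* Assumed values for illustration only; the real definitions live in llite. */
#define LL_HIST_START 12
#define LL_HIST_MAX   28
#define BIT(n) (1ULL << (n))

/* Fixed form: bucket i starts at BIT(LL_HIST_START + i), i.e. 4K, 8K, 16K... */
static int bucket_fixed(unsigned long long count)
{
	int i;

	for (i = 0; count >= BIT(LL_HIST_START + i) && i < LL_HIST_MAX - 1; i++)
		;
	return i;
}

/* Broken form: BIT(LL_HIST_START << i) shifts the bit number itself,
 * so the thresholds jump from 4K (bit 12) straight to 16M (bit 24).
 */
static int bucket_broken(unsigned long long count)
{
	int i;

	/* The extra "< 64" bound only keeps the shift defined in this sketch. */
	for (i = 0; i < LL_HIST_MAX - 1 && (LL_HIST_START << i) < 64 &&
		    count >= BIT(LL_HIST_START << i); i++)
		;
	return i;
}
```

Under these assumptions, a 1 MiB I/O lands in bucket 9 with the fixed loop but in bucket 1 (the 4K-8K bucket) with the broken one, which matches the symptom described in the commit message.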
WC-bug-id: https://jira.whamcloud.com/browse/LU-12394 Lustre-commit: d31a4dad4e69 ("LU-12394 llite: Fix extents_stats") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35075 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 23 +++++++++++++++++------ fs/lustre/llite/llite_mmap.c | 11 +++++++++++ fs/lustre/llite/lproc_llite.c | 6 +++--- fs/lustre/llite/vvp_io.c | 4 ---- 4 files changed, 31 insertions(+), 13 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 35e31ad..fa61b09 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1670,6 +1670,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct lu_env *env; struct vvp_io_args *args; + struct file *file = iocb->ki_filp; ssize_t result; u16 refcheck; ssize_t rc2; @@ -1693,7 +1694,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (cached) return result; - ll_ras_enter(iocb->ki_filp); + ll_ras_enter(file); result = ll_do_fast_read(iocb, to); if (result < 0 || iov_iter_count(to) == 0) @@ -1707,7 +1708,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) args->u.normal.via_iter = to; args->u.normal.via_iocb = iocb; - rc2 = ll_file_io_generic(env, args, iocb->ki_filp, CIT_READ, + rc2 = ll_file_io_generic(env, args, file, CIT_READ, &iocb->ki_pos, iov_iter_count(to)); if (rc2 > 0) result += rc2; @@ -1716,6 +1717,11 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) cl_env_put(env, &refcheck); out: + if (result > 0) + ll_rw_stats_tally(ll_i2sbi(file_inode(file)), current->pid, + LUSTRE_FPRIVATE(file), iocb->ki_pos, result, + READ); + return result; } @@ -1784,6 +1790,7 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) struct lu_env *env; struct vvp_io_args *args; ssize_t rc_tiny = 0, rc_normal; + struct file *file = 
iocb->ki_filp; u16 refcheck; bool cached; int result; @@ -1812,8 +1819,8 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) * pages, and we can't do append writes because we can't guarantee the * required DLM locks are held to protect file size. */ - if (ll_sbi_has_tiny_write(ll_i2sbi(file_inode(iocb->ki_filp))) && - !(iocb->ki_filp->f_flags & (O_DIRECT | O_SYNC | O_APPEND))) + if (ll_sbi_has_tiny_write(ll_i2sbi(file_inode(file))) && + !(file->f_flags & (O_DIRECT | O_SYNC | O_APPEND))) rc_tiny = ll_do_tiny_write(iocb, from); /* In case of error, go on and try normal write - Only stop if tiny @@ -1832,8 +1839,8 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) args->u.normal.via_iter = from; args->u.normal.via_iocb = iocb; - rc_normal = ll_file_io_generic(env, args, iocb->ki_filp, CIT_WRITE, - &iocb->ki_pos, iov_iter_count(from)); + rc_normal = ll_file_io_generic(env, args, file, CIT_WRITE, + &iocb->ki_pos, iov_iter_count(from)); /* On success, combine bytes written. 
*/ if (rc_tiny >= 0 && rc_normal > 0) @@ -1846,6 +1853,10 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) cl_env_put(env, &refcheck); out: + if (rc_normal > 0) + ll_rw_stats_tally(ll_i2sbi(file_inode(file)), current->pid, + LUSTRE_FPRIVATE(file), iocb->ki_pos, + rc_normal, WRITE); return rc_normal; } diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 71799cd..5c13164 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -406,6 +406,12 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) result = VM_FAULT_LOCKED; } sigprocmask(SIG_SETMASK, &old, NULL); + + if (vmf->page && result == VM_FAULT_LOCKED) + ll_rw_stats_tally(ll_i2sbi(file_inode(vma->vm_file)), + current->pid, LUSTRE_FPRIVATE(vma->vm_file), + cl_offset(NULL, vmf->page->index), PAGE_SIZE, + READ); return result; } @@ -459,6 +465,11 @@ static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf) break; } + if (ret == VM_FAULT_LOCKED) + ll_rw_stats_tally(ll_i2sbi(file_inode(vma->vm_file)), + current->pid, LUSTRE_FPRIVATE(vma->vm_file), + cl_offset(NULL, vmf->page->index), PAGE_SIZE, + WRITE); return ret; } diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 6eb3d33..c2ec3fb 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1937,7 +1937,7 @@ void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid, lprocfs_oh_clear(&io_extents->pp_extents[cur].pp_w_hist); } - for (i = 0; (count >= (1 << LL_HIST_START << i)) && + for (i = 0; (count >= BIT(LL_HIST_START + i)) && (i < (LL_HIST_MAX - 1)); i++) ; if (rw == 0) { @@ -2032,7 +2032,7 @@ static int ll_rw_offset_stats_seq_show(struct seq_file *seq, void *v) for (i = 0; i < LL_OFFSET_HIST_MAX; i++) { if (offset[i].rw_pid != 0) seq_printf(seq, - "%3c %10d %14llu %14llu %17lu %17lu %14llu\n", + "%3c %10d %14llu %14llu %17lu %17lu %14lld\n", offset[i].rw_op == READ ? 
'R' : 'W', offset[i].rw_pid, offset[i].rw_range_start, @@ -2045,7 +2045,7 @@ static int ll_rw_offset_stats_seq_show(struct seq_file *seq, void *v) for (i = 0; i < LL_PROCESS_HIST_MAX; i++) { if (process[i].rw_pid != 0) seq_printf(seq, - "%3c %10d %14llu %14llu %17lu %17lu %14llu\n", + "%3c %10d %14llu %14llu %17lu %17lu %14lld\n", process[i].rw_op == READ ? 'R' : 'W', process[i].rw_pid, process[i].rw_range_start, diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 68455d5..847fb5e 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -791,8 +791,6 @@ static int vvp_io_read_start(const struct lu_env *env, if (result < cnt) io->ci_continue = 0; io->ci_nob += result; - ll_rw_stats_tally(ll_i2sbi(inode), current->pid, - vio->vui_fd, pos, result, READ); result = 0; } return result; @@ -1069,8 +1067,6 @@ static int vvp_io_write_start(const struct lu_env *env, if (result < cnt) io->ci_continue = 0; - ll_rw_stats_tally(ll_i2sbi(inode), current->pid, - vio->vui_fd, pos, result, WRITE); result = 0; } return result; From patchwork Thu Feb 27 21:15:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410525 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B557092A for ; Thu, 27 Feb 2020 21:40:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9D96224690 for ; Thu, 27 Feb 2020 21:40:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D96224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CA4A734A7AB; Thu, 27 Feb 2020 13:32:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9A27521FE3B for ; Thu, 27 Feb 2020 13:20:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6EFD48F2E; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6D70C46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:06 -0500 Message-Id: <1582838290-17243-439-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 438/622] lustre: llite: don't miss every first stride page X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong

Whenever we need to skip some pages for a stride I/O read, we calculate the next start page index. However, this page index is skipped every time, because the loop starts from index + 1.

Testing command: iozone -w -c -i 5 -t1 -j 2 -s 100m -r 1m -F data

Without patch: 587384.69 kB/sec

                        read          |        write
pages per rpc    rpcs    %   cum %    |  rpcs    %   cum %
1:                 16   19     19     |     0    0      0
2:                  0    0     19     |     0    0      0
4:                  0    0     19     |     0    0      0
8:                  0    0     19     |     0    0      0
16:                 0    0     19     |     0    0      0
32:                 0    0     19     |     0    0      0
64:                 0    0     19     |     0    0      0
128:                0    0     19     |     0    0      0
256:                0    0     19     |     0    0      0
512:               22   26     46     |     0    0      0
1024:              44   53    100     |     0    0      0

With patch: 744635.56 kB/sec

                        read          |        write
pages per rpc    rpcs    %   cum %    |  rpcs    %   cum %
1:                  0    0      0     |     0    0      0
2:                  0    0      0     |     0    0      0
4:                  0    0      0     |     0    0      0
8:                  0    0      0     |     0    0      0
16:                 0    0      0     |     0    0      0
32:                 0    0      0     |     0    0      0
64:                 0    0      0     |     0    0      0
128:                0    0      0     |     0    0      0
256:                0    0      0     |     0    0      0
512:                8   13     13     |     0    0      0
1024:              50   86    100     |     0    0      0

Performance improves by about 27% here, and all 1-page RPCs disappear.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12043 Lustre-commit: 29d8eb5ee7df ("LU-12043 llite: don't miss every first stride page") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35216 Reviewed-by: Li Xi Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/rw.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 9c4b89f..4fec9a6 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -407,12 +407,12 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) } else if (stride_ria) { /* If it is not in the read-ahead window, and it is * read-ahead mode, then check whether it should skip - * the stride gap + * the stride gap.
*/ pgoff_t offset; - /* FIXME: This assertion only is valid when it is for - * forward read-ahead, it will be fixed when backward - * read-ahead is implemented + /* NOTE: This assertion only is valid when it is for + * forward read-ahead, must adjust if backward + * readahead is implemented. */ LASSERTF(page_idx >= ria->ria_stoff, "Invalid page_idx %lu rs %lu re %lu ro %lu rl %lu rp %lu\n", @@ -421,10 +421,11 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) ria->ria_length, ria->ria_pages); offset = page_idx - ria->ria_stoff; offset = offset % (ria->ria_length); - if (offset > ria->ria_pages) { - page_idx += ria->ria_length - offset; - CDEBUG(D_READA, "i %lu skip %lu\n", page_idx, - ria->ria_length - offset); + if (offset >= ria->ria_pages) { + page_idx += ria->ria_length - offset - 1; + CDEBUG(D_READA, + "Stride: jump %lu pages to %lu\n", + ria->ria_length - offset, page_idx); continue; } } From patchwork Thu Feb 27 21:15:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410529 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 77B70138D for ; Thu, 27 Feb 2020 21:40:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 60B3624690 for ; Thu, 27 Feb 2020 21:40:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 60B3624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5F60434A7D6; Thu, 27 Feb 2020 13:33:02 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 07ECB21FF50 for ; Thu, 27 Feb 2020 13:20:34 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 71A188F2F; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 704B146D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:07 -0500 Message-Id: <1582838290-17243-440-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 439/622] lustre: llite: swab LOV EA data in ll_getxattr_lov() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jian Yu On a PPC client, the LOV EA data returned by getfattr from an x86_64 server was not swabbed to host endianness. While running setfattr, the data was swabbed in ll_lov_setstripe_ea_info(), which caused a magic mismatch in ll_lov_user_md_size(), and ll_setstripe_ea() then returned -ERANGE. This patch fixes the issue by swabbing the LOV EA data in ll_getxattr_lov().
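The detect-then-swab idea can be sketched in portable C roughly as follows. The magic values are assumed to mirror Lustre's LOV_MAGIC_MAGIC and LOV_MAGIC_V1 (they are not shown in this patch), and swab32(), swab16(), magic_is_swabbed(), demo_md_to_host() and struct demo_md are hypothetical stand-ins for the kernel helpers and the real layout structures. The check uses the same mask-and-compare form as ll_getxattr_lov().

```c
#include <stdint.h>

/* Assumed values for illustration; check the Lustre headers for the real ones. */
#define DEMO_MAGIC_MAGIC 0x0BD0u
#define DEMO_MAGIC_V1    (0x0BD10000u | DEMO_MAGIC_MAGIC)

/* Hypothetical, cut-down stand-in for a LOV EA header. */
struct demo_md {
	uint32_t magic;
	uint16_t stripe_count;
};

/* Portable 32-bit byte swap, standing in for the kernel's __swab32(). */
static uint32_t swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Portable 16-bit byte swap, standing in for __swab16(). */
static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/* A foreign-endian magic carries the swabbed 0x0BD0 pattern in its top half. */
static int magic_is_swabbed(uint32_t magic)
{
	return (magic & swab32(DEMO_MAGIC_MAGIC)) == swab32(DEMO_MAGIC_MAGIC);
}

/* Swab the whole header once, so later code only sees host-endian fields. */
static void demo_md_to_host(struct demo_md *md)
{
	if (!magic_is_swabbed(md->magic))
		return;
	md->magic = swab32(md->magic);
	md->stripe_count = swab16(md->stripe_count);
}
```

Swabbing only a local copy of the magic, as the old code did, lets the switch statement pass but hands the caller a buffer that is still in the foreign byte order; converting the buffer in place avoids that mismatch.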
WC-bug-id: https://jira.whamcloud.com/browse/LU-12589 Lustre-commit: 5590f5aa94a5 ("LU-12589 llite: swab LOV EA data in ll_getxattr_lov()") Signed-off-by: Jian Yu Reviewed-on: https://review.whamcloud.com/35626 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/include/lustre_swab.h | 2 +- fs/lustre/llite/dir.c | 4 ++-- fs/lustre/llite/file.c | 4 ++-- fs/lustre/llite/xattr.c | 16 ++++++++-------- fs/lustre/ptlrpc/pack_generic.c | 40 ++++++++++++++++++++++++++++++++++------ 5 files changed, 47 insertions(+), 19 deletions(-) diff --git a/fs/lustre/include/lustre_swab.h b/fs/lustre/include/lustre_swab.h index e99e16d..dd3c50c 100644 --- a/fs/lustre/include/lustre_swab.h +++ b/fs/lustre/include/lustre_swab.h @@ -86,7 +86,7 @@ void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum); void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod, int stripe_count); -void lustre_swab_lov_user_md(struct lov_user_md *lum); +void lustre_swab_lov_user_md(struct lov_user_md *lum, size_t size); void lustre_swab_lov_mds_md(struct lov_mds_md *lmm); void lustre_swab_lustre_capa(struct lustre_capa *c); void lustre_swab_lustre_capa_key(struct lustre_capa_key *k); diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 3540c18..812f535 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -564,7 +564,7 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump, */ if ((__swab32(lump->lmm_magic) & le32_to_cpu(LOV_MAGIC_MASK)) == le32_to_cpu(LOV_MAGIC_MAGIC)) - lustre_swab_lov_user_md(lump); + lustre_swab_lov_user_md(lump, 0); } else { lum_size = sizeof(struct lov_user_md_v1); } @@ -696,7 +696,7 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, case LOV_MAGIC_COMP_V1: case LOV_USER_MAGIC_SPECIFIC: if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) - lustre_swab_lov_user_md((struct lov_user_md *)lmm); + lustre_swab_lov_user_md((struct lov_user_md *)lmm, 0); break; case LMV_MAGIC_V1: if 
(cpu_to_le32(LMV_MAGIC) != LMV_MAGIC) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index fa61b09..6c5b9eb 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1873,7 +1873,7 @@ int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry, if ((__swab32(lum->lmm_magic) & le32_to_cpu(LOV_MAGIC_MASK)) == le32_to_cpu(LOV_MAGIC_MAGIC)) { /* this code will only exist for big-endian systems */ - lustre_swab_lov_user_md(lum); + lustre_swab_lov_user_md(lum, 0); } ll_inode_size_lock(inode); @@ -1956,7 +1956,7 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename, stripe_count = 0; } - lustre_swab_lov_user_md((struct lov_user_md *)lmm); + lustre_swab_lov_user_md((struct lov_user_md *)lmm, 0); /* if function called for directory - we should * avoid swab not existent lsm objects diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index cf1cfd2..4e1ce34 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -320,7 +320,7 @@ static int ll_xattr_set(const struct xattr_handler *handler, if (strncmp(name, "lov.", 4) == 0 && (__swab32(((struct lov_user_md *)value)->lmm_magic) & le32_to_cpu(LOV_MAGIC_MASK)) == le32_to_cpu(LOV_MAGIC_MAGIC)) - lustre_swab_lov_user_md((struct lov_user_md *)value); + lustre_swab_lov_user_md((struct lov_user_md *)value, 0); return ll_xattr_set_common(handler, dentry, inode, name, value, size, flags); @@ -459,7 +459,6 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) }; struct lu_env *env; u16 refcheck; - u32 magic; if (!obj) return -ENODATA; @@ -490,12 +489,12 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) * recognizing layout gen as stripe offset when the * file is restored. See LU-2809. 
*/ - magic = ((struct lov_mds_md *)buf)->lmm_magic; - if ((magic & __swab32(LOV_MAGIC_MAGIC)) == - __swab32(LOV_MAGIC_MAGIC)) - magic = __swab32(magic); + if ((((struct lov_mds_md *)buf)->lmm_magic & + __swab32(LOV_MAGIC_MAGIC)) == __swab32(LOV_MAGIC_MAGIC)) + lustre_swab_lov_user_md((struct lov_user_md *)buf, + cl.cl_size); - switch (magic) { + switch (((struct lov_mds_md *)buf)->lmm_magic) { case LOV_MAGIC_V1: case LOV_MAGIC_V3: case LOV_MAGIC_SPECIFIC: @@ -505,7 +504,8 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) case LOV_MAGIC_FOREIGN: goto out_env; default: - CERROR("Invalid LOV magic %08x\n", magic); + CERROR("Invalid LOV magic %08x\n", + ((struct lov_mds_md *)buf)->lmm_magic); rc = -EINVAL; goto out_env; } diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index b066113..6a4ea7a 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -2147,23 +2147,51 @@ void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod, } EXPORT_SYMBOL(lustre_swab_lov_user_md_objects); -void lustre_swab_lov_user_md(struct lov_user_md *lum) +void lustre_swab_lov_user_md(struct lov_user_md *lum, size_t size) { + struct lov_user_md_v1 *v1; + struct lov_user_md_v3 *v3; + struct lov_foreign_md *lfm; + u16 stripe_count; + CDEBUG(D_IOCTL, "swabbing lov_user_md\n"); switch (lum->lmm_magic) { case __swab32(LOV_MAGIC_V1): case LOV_USER_MAGIC_V1: - lustre_swab_lov_user_md_v1((struct lov_user_md_v1 *)lum); + { + v1 = (struct lov_user_md_v1 *)lum; + stripe_count = v1->lmm_stripe_count; + + if (lum->lmm_magic != LOV_USER_MAGIC_V1) + __swab16s(&stripe_count); + + lustre_swab_lov_user_md_v1(v1); + if (size > sizeof(*v1)) + lustre_swab_lov_user_md_objects(v1->lmm_objects, + stripe_count); + break; + } case __swab32(LOV_MAGIC_V3): case LOV_USER_MAGIC_V3: - lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lum); + { + v3 = (struct lov_user_md_v3 *)lum; + stripe_count = v3->lmm_stripe_count; + + 
if (lum->lmm_magic != LOV_USER_MAGIC_V3) + __swab16s(&stripe_count); + + lustre_swab_lov_user_md_v3(v3); + if (size > sizeof(*v3)) + lustre_swab_lov_user_md_objects(v3->lmm_objects, + stripe_count); break; + } case __swab32(LOV_USER_MAGIC_SPECIFIC): case LOV_USER_MAGIC_SPECIFIC: { - struct lov_user_md_v3 *v3 = (struct lov_user_md_v3 *)lum; - u16 stripe_count = v3->lmm_stripe_count; + v3 = (struct lov_user_md_v3 *)lum; + stripe_count = v3->lmm_stripe_count; if (lum->lmm_magic != LOV_USER_MAGIC_SPECIFIC) __swab16s(&stripe_count); @@ -2179,7 +2207,7 @@ void lustre_swab_lov_user_md(struct lov_user_md *lum) case __swab32(LOV_MAGIC_FOREIGN): case LOV_USER_MAGIC_FOREIGN: { - struct lov_foreign_md *lfm = (struct lov_foreign_md *)lum; + lfm = (struct lov_foreign_md *)lum; __swab32s(&lfm->lfm_magic); __swab32s(&lfm->lfm_length); From patchwork Thu Feb 27 21:15:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410533 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C89FE138D for ; Thu, 27 Feb 2020 21:40:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B14DE24690 for ; Thu, 27 Feb 2020 21:40:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B14DE24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5918434A805; Thu, 27 Feb 2020 13:33:06 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6534C21FE3B for ; Thu, 27 Feb 2020 13:20:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7491E8F30; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7313D47C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:08 -0500 Message-Id: <1582838290-17243-441-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 440/622] lustre: llite: Mark lustre_inode_cache as reclaimable X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jacek Tomaka , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jacek Tomaka This is required for proper accounting of available kernel memory. Without it, memory allocated to lustre_inode_cache appears as SUnreclaim when in reality it should appear as SReclaimable. This affects MemAvailable as well (it is lower than it should be).
WC-bug-id: https://jira.whamcloud.com/browse/LU-12313 Lustre-commit: b09e63db24e5 ("LU-12313 llite: Mark lustre_inode_cache as reclaimable") Signed-off-by: Jacek Tomaka Reviewed-on: https://review.whamcloud.com/35790 Reviewed-by: Wang Shilong Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/super25.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c index afd51a6..38d60b0 100644 --- a/fs/lustre/llite/super25.c +++ b/fs/lustre/llite/super25.c @@ -211,7 +211,11 @@ static int __init lustre_init(void) rc = -ENOMEM; ll_inode_cachep = kmem_cache_create("lustre_inode_cache", sizeof(struct ll_inode_info), 0, - SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, + SLAB_HWCACHE_ALIGN | + SLAB_RECLAIM_ACCOUNT | + SLAB_ACCOUNT | + SLAB_MEM_SPREAD | + SLAB_ACCOUNT, NULL); if (!ll_inode_cachep) goto out_cache; From patchwork Thu Feb 27 21:15:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410537 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C717C92A for ; Thu, 27 Feb 2020 21:40:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AF90A24690 for ; Thu, 27 Feb 2020 21:40:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AF90A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with 
ESMTP id 403B034A837; Thu, 27 Feb 2020 13:33:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AABE621FEB4 for ; Thu, 27 Feb 2020 13:20:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7813C8F31; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 75D47468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:09 -0500 Message-Id: <1582838290-17243-442-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 441/622] lustre: osc: add preferred checksum type support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Xi Some checksum types might not work correctly even though they are available options and have the best speeds during test. In these circumstances, users might want to use a certain checksum type which is known to be functional. However, "lctl conf_param XXX-YYY.osc. checksum_type=ZZZ" won't help to enforce a certain checksum type, because the selected checksum type is determined during OSC connection, which will overwrite the LLOG parameter. 
To solve this problem, whenever a valid checksum type is set by "lctl conf_param" or "lctl set_param", it is remembered as the preferred checksum type for the OSC. During the connection process, if that checksum type is available, it will be selected as the RPC checksum type regardless of its speed. The semantics of the interface /proc/fs/lustre/osc/*/checksum_type change slightly. If an invalid checksum name is written into this entry, -EINVAL will be returned as before. If the written string is a valid checksum name, even though the checksum type is not supported by this OSC/OST pair, the checksum type will still be remembered as the preferred checksum type, and the return value will be -ENOTSUPP. Whenever connecting/reconnecting happens, if the preferred checksum type is available, it will be used for the RPC checksum. WC-bug-id: https://jira.whamcloud.com/browse/LU-11011 Lustre-commit: 9b6b5e479828 ("LU-11011 osc: add preferred checksum type support") Signed-off-by: Li Xi Reviewed-on: https://review.whamcloud.com/32349 Reviewed-by: Li Dongyang Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 2 ++ fs/lustre/include/obd_cksum.h | 13 ++++++++++--- fs/lustre/ldlm/ldlm_lib.c | 1 + fs/lustre/osc/lproc_osc.c | 19 ++++++++++++------- fs/lustre/ptlrpc/import.c | 3 ++- 5 files changed, 27 insertions(+), 11 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 886c697..70dbaaf 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -339,6 +339,8 @@ struct client_obd { u32 cl_supp_cksum_types; /* checksum algorithm to be used */ enum cksum_type cl_cksum_type; + /* preferred checksum algorithm to be used */ + enum cksum_type cl_preferred_cksum_type; /* also protected by the poorly named _loi_list_lock lock above */ struct osc_async_rc cl_ar; diff --git a/fs/lustre/include/obd_cksum.h b/fs/lustre/include/obd_cksum.h index 
cc47c44..c03d0e6 100644 --- a/fs/lustre/include/obd_cksum.h +++ b/fs/lustre/include/obd_cksum.h @@ -109,10 +109,17 @@ static inline enum cksum_type obd_cksum_types_supported_client(void) * Caution is advised, however, since what is fastest on a single client may * not be the fastest or most efficient algorithm on the server. */ -static inline enum cksum_type -obd_cksum_type_select(const char *obd_name, enum cksum_type cksum_types) +static inline +enum cksum_type obd_cksum_type_select(const char *obd_name, + enum cksum_type cksum_types, + enum cksum_type preferred) { - u32 flag = obd_cksum_type_pack(obd_name, cksum_types); + u32 flag; + + if (preferred & cksum_types) + return preferred; + + flag = obd_cksum_type_pack(obd_name, cksum_types); return obd_cksum_type_unpack(flag); } diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index af74f97..127ed32 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -364,6 +364,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) atomic_set(&cli->cl_destroy_in_flight, 0); cli->cl_supp_cksum_types = OBD_CKSUM_CRC32; + cli->cl_preferred_cksum_type = 0; /* Turn on checksumming by default. 
*/ cli->cl_checksum = 1; /* diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 775bf74..8e0088b 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -415,6 +415,7 @@ static ssize_t osc_checksum_type_seq_write(struct file *file, DECLARE_CKSUM_NAME; char kernbuf[10]; int i; + int rc = -EINVAL; if (!obd) return 0; @@ -423,22 +424,26 @@ static ssize_t osc_checksum_type_seq_write(struct file *file, return -EINVAL; if (copy_from_user(kernbuf, buffer, count)) return -EFAULT; + if (count > 0 && kernbuf[count - 1] == '\n') kernbuf[count - 1] = '\0'; else kernbuf[count] = '\0'; for (i = 0; i < ARRAY_SIZE(cksum_name); i++) { - if (((1 << i) & obd->u.cli.cl_supp_cksum_types) == 0) - continue; - if (!strcmp(kernbuf, cksum_name[i])) { - obd->u.cli.cl_cksum_type = 1 << i; - return count; + if (strcmp(kernbuf, cksum_name[i]) == 0) { + obd->u.cli.cl_preferred_cksum_type = BIT(i); + if (obd->u.cli.cl_supp_cksum_types & BIT(i)) { + obd->u.cli.cl_cksum_type = BIT(i); + rc = count; + } else { + rc = -ENOTSUPP; + } + break; } } - return -EINVAL; + return rc; } - LPROC_SEQ_FOPS(osc_checksum_type); static ssize_t resend_count_show(struct kobject *kobj, diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 0ade41e..a6d0b32 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -846,7 +846,8 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp, cli->cl_supp_cksum_types = OBD_CKSUM_ADLER; } cli->cl_cksum_type = obd_cksum_type_select(imp->imp_obd->obd_name, - cli->cl_supp_cksum_types); + cli->cl_supp_cksum_types, + cli->cl_preferred_cksum_type); if (ocd->ocd_connect_flags & OBD_CONNECT_BRW_SIZE) cli->cl_max_pages_per_rpc = From patchwork Thu Feb 27 21:15:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410421 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org 
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48DB092A for ; Thu, 27 Feb 2020 21:38:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3155024690 for ; Thu, 27 Feb 2020 21:38:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3155024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 90F94349775; Thu, 27 Feb 2020 13:31:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0D06D21FF56 for ; Thu, 27 Feb 2020 13:20:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7A2E78F32; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 78AEE46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:10 -0500 Message-Id: <1582838290-17243-443-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 442/622] lustre: ptlrpc: Stop sending ptlrpc_body_v2 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre 
software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell ptlrpc_body_v2 does not include space for jobids, so when we added jobid to the RPC debug messages, we started getting errors like this: LustreError: 6817:0:(pack_generic.c:425:lustre_msg_buf_v2()) msg 000000005c83b7a2 buffer[0] size 152 too small (required 184, opc=-1) This happened every time we tried to print a ptlrpc_body_v2 message. body_v2 is still sent on some RPCs for compatibility with very old versions of Lustre, but we no longer support interop with those versions (latest reported is 2.3). So, stop sending ptlrpc_body_v2 on any RPCs. Note that we need to retain the ptlrpc_body_v2 definitions and parsing capability for interop with servers which still use them for some messages, which is all servers prior to this patch. One further note: This does *not* fix the case of newer clients collecting rpctrace with older servers. They will still see the error message for some RPCs. That could be fixed with tweaks to the debug printing code.
WC-bug-id: https://jira.whamcloud.com/browse/LU-12523 Lustre-commit: fb18c05c0f5e ("LU-12523 ptlrpc: Stop sending ptlrpc_body_v2") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35583 Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 27 +-------------------------- fs/lustre/ptlrpc/niobuf.c | 11 ----------- fs/lustre/ptlrpc/pack_generic.c | 16 ++-------------- 3 files changed, 3 insertions(+), 51 deletions(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index dcc5e6b..c750a4e 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -817,32 +817,7 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request, int ptlrpc_request_pack(struct ptlrpc_request *request, u32 version, int opcode) { - int rc; - - rc = ptlrpc_request_bufs_pack(request, version, opcode, NULL, NULL); - if (rc) - return rc; - - /* - * For some old 1.8 clients (< 1.8.7), they will LASSERT the size of - * ptlrpc_body sent from server equal to local ptlrpc_body size, so we - * have to send old ptlrpc_body to keep interoperability with these - * clients. - * - * Only three kinds of server->client RPCs so far: - * - LDLM_BL_CALLBACK - * - LDLM_CP_CALLBACK - * - LDLM_GL_CALLBACK - * - * XXX This should be removed whenever we drop the interoperability with - * the these old clients. 
- */ - if (opcode == LDLM_BL_CALLBACK || opcode == LDLM_CP_CALLBACK || - opcode == LDLM_GL_CALLBACK) - req_capsule_shrink(&request->rq_pill, &RMF_PTLRPC_BODY, - sizeof(struct ptlrpc_body_v2), RCL_CLIENT); - - return rc; + return ptlrpc_request_bufs_pack(request, version, opcode, NULL, NULL); } EXPORT_SYMBOL(ptlrpc_request_pack); diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index 2e866fe..9d9e94c 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -388,17 +388,6 @@ int ptlrpc_send_reply(struct ptlrpc_request *req, int flags) req->rq_export->exp_obd->obd_minor); } - /* In order to keep interoperability with the client (< 2.3) which - * doesn't have pb_jobid in ptlrpc_body, We have to shrink the - * ptlrpc_body in reply buffer to ptlrpc_body_v2, otherwise, the - * reply buffer on client will be overflow. - * - * XXX Remove this whenever we drop the interoperability with - * such client. - */ - req->rq_replen = lustre_shrink_msg(req->rq_repmsg, 0, - sizeof(struct ptlrpc_body_v2), 1); - if (req->rq_type != PTL_RPC_MSG_ERR) req->rq_type = PTL_RPC_MSG_REPLY; diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 6a4ea7a..e63720b 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -91,21 +91,9 @@ bool ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, /* early reply size */ u32 lustre_msg_early_size(void) { - static u32 size; - - if (!size) { - /* Always reply old ptlrpc_body_v2 to keep interoperability - * with the old client (< 2.3) which doesn't have pb_jobid - * in the ptlrpc_body. - * - * XXX Remove this whenever we drop interoperability with such - * client. 
- */ - u32 pblen = sizeof(struct ptlrpc_body_v2); + u32 pblen = sizeof(struct ptlrpc_body); - size = lustre_msg_size(LUSTRE_MSG_MAGIC_V2, 1, &pblen); - } - return size; + return lustre_msg_size(LUSTRE_MSG_MAGIC_V2, 1, &pblen); } EXPORT_SYMBOL(lustre_msg_early_size); From patchwork Thu Feb 27 21:15:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410425 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB77B138D for ; Thu, 27 Feb 2020 21:38:06 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C3BAD24690 for ; Thu, 27 Feb 2020 21:38:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C3BAD24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9144A34A34C; Thu, 27 Feb 2020 13:31:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6340021F7B2 for ; Thu, 27 Feb 2020 13:20:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7CB718F33; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7B9A246C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James 
Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:11 -0500 Message-Id: <1582838290-17243-444-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 443/622] lnet: Fix style issues for selftest/rpc.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff This patch fixes issues reported by checkpatch for the file selftest/rpc.c. Linux 5.3 enforces the use of 'fallthrough' which is also suggested by checkpatch Cray-bug-id: LUS-7690 WC-bug-id: https://jira.whamcloud.com/browse/LU-12635 Lustre-commit: 4bfe21d09c39 ("LU-12635 lnet: Fix style issues for selftest/rpc.c") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/35800 Reviewed-by: Petros Koutoupis Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/selftest/rpc.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index a5941e4..4645f04 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -141,7 +141,8 @@ struct srpc_bulk * struct page *pg; int nob; - pg = alloc_pages_node(cfs_cpt_spread_node(lnet_cpt_table(), cpt), + pg = alloc_pages_node(cfs_cpt_spread_node(lnet_cpt_table(), + cpt), GFP_KERNEL, 0); if (!pg) { CERROR("Can't allocate page %d of %d\n", i, bulk_npg); @@ -386,7 +387,8 @@ struct srpc_bulk * return -ENOMEM; } - CDEBUG(D_NET, "Posted passive RDMA: peer %s, portal %d, matchbits %#llx\n", + CDEBUG(D_NET, + "Posted passive 
RDMA: peer %s, portal %d, matchbits %#llx\n", libcfs_id2str(peer), portal, matchbits); return 0; } @@ -440,7 +442,8 @@ struct srpc_bulk * rc = LNetMDUnlink(*mdh); LASSERT(!rc); } else { - CDEBUG(D_NET, "Posted active RDMA: peer %s, portal %u, matchbits %#llx\n", + CDEBUG(D_NET, + "Posted active RDMA: peer %s, portal %u, matchbits %#llx\n", libcfs_id2str(peer), portal, matchbits); } return 0; @@ -515,7 +518,8 @@ struct srpc_bulk * void srpc_add_buffer(struct swi_workitem *wi) { - struct srpc_service_cd *scd = container_of(wi, struct srpc_service_cd, scd_buf_wi); + struct srpc_service_cd *scd = container_of(wi, struct srpc_service_cd, + scd_buf_wi); struct srpc_buffer *buf; int rc = 0; @@ -662,7 +666,8 @@ struct srpc_bulk * spin_lock(&scd->scd_lock); if (scd->scd_buf_nposted > 0) { - CDEBUG(D_NET, "waiting for %d posted buffers to unlink\n", + CDEBUG(D_NET, + "waiting for %d posted buffers to unlink\n", scd->scd_buf_nposted); spin_unlock(&scd->scd_lock); return 0; @@ -960,7 +965,8 @@ struct srpc_bulk * void srpc_handle_rpc(struct swi_workitem *wi) { - struct srpc_server_rpc *rpc = container_of(wi, struct srpc_server_rpc, srpc_wi); + struct srpc_server_rpc *rpc = container_of(wi, struct srpc_server_rpc, + srpc_wi); struct srpc_service_cd *scd = rpc->srpc_scd; struct srpc_service *sv = scd->scd_svc; struct srpc_event *ev = &rpc->srpc_ev; @@ -1398,7 +1404,9 @@ struct srpc_client_rpc * return rc; } -/* when in kernel always called with lnet_net_lock() held, and in thread context */ +/* when in kernel always called with lnet_net_lock() held, + * and in thread context + */ static void srpc_lnet_ev_handler(struct lnet_event *ev) { @@ -1451,7 +1459,8 @@ struct srpc_client_rpc * rpcev, crpc, &crpc->crpc_reqstev, &crpc->crpc_replyev, &crpc->crpc_bulkev); CERROR("Bad event: status %d, type %d, lnet %d\n", - rpcev->ev_status, rpcev->ev_type, rpcev->ev_lnet); + rpcev->ev_status, rpcev->ev_type, + rpcev->ev_lnet); LBUG(); } From patchwork Thu Feb 27 21:15:12 2020 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410597 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4F95E17E0 for ; Thu, 27 Feb 2020 21:42:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3747824690 for ; Thu, 27 Feb 2020 21:42:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3747824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6DFD5349295; Thu, 27 Feb 2020 13:34:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B94A621F190 for ; Thu, 27 Feb 2020 13:20:36 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7FD748F34; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7E54F46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:12 -0500 Message-Id: <1582838290-17243-445-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 444/622] lnet: Fix style issues for module.c conctl.c X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff This patch fixes issues reported by checkpatch for the file selftest/module.c and selftest/conctl.c. Linux 5.3 enforces the use of 'fallthrough' which is also suggested by checkpatch Cray-bug-id: LUS-7690 WC-bug-id: https://jira.whamcloud.com/browse/LU-12635 Lustre-commit: ebff8aba3392 ("LU-12635 lnet: Fix style issues for module.c conctl.c") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/35802 Reviewed-by: Petros Koutoupis Reviewed-by: Neil Brown Reviewed-by: Arshad Hussain Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/selftest/conctl.c | 4 ++-- net/lnet/selftest/module.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/lnet/selftest/conctl.c b/net/lnet/selftest/conctl.c index 906d82d..ed9eab9 100644 --- a/net/lnet/selftest/conctl.c +++ b/net/lnet/selftest/conctl.c @@ -121,7 +121,6 @@ return -EINVAL; if (args->lstio_dbg_namep) { - if (copy_from_user(name, args->lstio_dbg_namep, args->lstio_dbg_nmlen)) return -EFAULT; @@ -727,7 +726,8 @@ static int lst_test_add_ioctl(struct lstio_test_args *args) goto out; } - memset(&console_session.ses_trans_stat, 0, sizeof(struct lstcon_trans_stat)); + memset(&console_session.ses_trans_stat, + 0, sizeof(struct lstcon_trans_stat)); switch (opc) { case LSTIO_SESSION_NEW: diff --git a/net/lnet/selftest/module.c b/net/lnet/selftest/module.c index 9ba6532..2de2b59 100644 --- a/net/lnet/selftest/module.c +++ b/net/lnet/selftest/module.c @@ -105,7 +105,7 @@ enum { nscheds = 
cfs_cpt_number(lnet_cpt_table()); lst_test_wq = kvmalloc_array(nscheds, sizeof(lst_test_wq[0]), - GFP_KERNEL | __GFP_ZERO); + GFP_KERNEL | __GFP_ZERO); if (!lst_test_wq) { rc = -ENOMEM; goto error; From patchwork Thu Feb 27 21:15:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410429 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 370B9138D for ; Thu, 27 Feb 2020 21:38:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1FAF724690 for ; Thu, 27 Feb 2020 21:38:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1FAF724690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8698434A367; Thu, 27 Feb 2020 13:31:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0A06821FF5D for ; Thu, 27 Feb 2020 13:20:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 829C98F35; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8116C47C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 
16:15:13 -0500 Message-Id: <1582838290-17243-446-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 445/622] lustre: ptlrpc: check lm_bufcount and lm_buflen X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Emoly Liu Check lm_bufcount to be used by lustre_msg_hdr_size_v2() and validate individual and total buffer lengths in lustre_unpack_msg_v2() in case of any out-of-bound read. Reported-by: Alibaba Cloud WC-bug-id: https://jira.whamcloud.com/browse/LU-12590 Lustre-commit: 268edb13d769 ("LU-12590 ptlrpc: check lm_bufcount and lm_buflen") Signed-off-by: Emoly Liu Reviewed-on: https://review.whamcloud.com/35783 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Yunye Ry Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 40 ++++++++++++++++++++++++++++++++++++++++ fs/lustre/ptlrpc/pack_generic.c | 29 +++++++++++++++++++++++------ 2 files changed, 63 insertions(+), 6 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index d03e8c6..caf766d 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -238,6 +238,34 @@ * */ +/** + * This is the size of a maximum REINT_SETXATTR request: + * + * lustre_msg 56 (32 + 4 x 5 + 4) + * ptlrpc_body 184 + * mdt_rec_setxattr 136 + * lustre_capa 120 + * name 256 (XATTR_NAME_MAX) + * value 65536 (XATTR_SIZE_MAX) + */ +#define MDS_EA_MAXREQSIZE 66288 + +/** + * These are the maximum request and reply sizes (rounded up to 1 KB + * boundaries) 
for the "regular" MDS_REQUEST_PORTAL and MDS_REPLY_PORTAL. + */ +#define MDS_REG_MAXREQSIZE (((max(MDS_EA_MAXREQSIZE, \ + MDS_LOV_MAXREQSIZE) + 1023) >> 10) << 10) +#define MDS_REG_MAXREPSIZE MDS_REG_MAXREQSIZE + +/** + * The update request includes all of updates from the create, which might + * include linkea (4K maxim), together with other updates, we set it to 1000K: + * lustre_msg + ptlrpc_body + OUT_UPDATE_BUFFER_SIZE_MAX + */ +#define OUT_MAXREQSIZE (1000 * 1024) +#define OUT_MAXREPSIZE MDS_MAXREPSIZE + /* * LDLM threads constants: * @@ -291,6 +319,12 @@ (DT_MAX_BRW_PAGES - 1))) /** + * MDS incoming request with LOV EA + * 24 = sizeof(struct lov_ost_data), i.e: replay of opencreate + */ +#define MDS_LOV_MAXREQSIZE max(MDS_MAXREQSIZE, \ + 362 + LOV_MAX_STRIPE_COUNT * 24) +/** * FIEMAP request can be 4K+ for now */ #define OST_MAXREQSIZE (16UL * 1024UL) @@ -2017,6 +2051,12 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf, * * @{ */ +#define PTLRPC_MAX_BUFCOUNT \ + (sizeof(((struct ptlrpc_request *)0)->rq_req_swab_mask) * 8) +#define MD_MAX_BUFLEN (MDS_REG_MAXREQSIZE > OUT_MAXREQSIZE ? \ + MDS_REG_MAXREQSIZE : OUT_MAXREQSIZE) +#define PTLRPC_MAX_BUFLEN (OST_IO_MAXREQSIZE > MD_MAX_BUFLEN ? 
\ + OST_IO_MAXREQSIZE : MD_MAX_BUFLEN) bool ptlrpc_buf_need_swab(struct ptlrpc_request *req, const int inout, u32 index); void ptlrpc_buf_set_swabbed(struct ptlrpc_request *req, const int inout, diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index e63720b..4a0856a 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -60,6 +60,8 @@ static inline u32 lustre_msg_hdr_size_v2(u32 count) u32 lustre_msg_hdr_size(u32 magic, u32 count) { + LASSERT(count > 0); + switch (magic) { case LUSTRE_MSG_MAGIC_V2: return lustre_msg_hdr_size_v2(count); @@ -102,6 +104,7 @@ u32 lustre_msg_size_v2(int count, u32 *lengths) u32 size; int i; + LASSERT(count > 0); size = lustre_msg_hdr_size_v2(count); for (i = 0; i < count; i++) size += cfs_size_round(lengths[i]); @@ -159,6 +162,8 @@ void lustre_init_msg_v2(struct lustre_msg_v2 *msg, int count, u32 *lens, char *ptr; int i; + LASSERT(count > 0); + msg->lm_bufcount = count; /* XXX: lm_secflvr uninitialized here */ msg->lm_magic = LUSTRE_MSG_MAGIC_V2; @@ -291,6 +296,7 @@ int lustre_pack_reply_v2(struct ptlrpc_request *req, int count, int msg_len, rc; LASSERT(!req->rq_reply_state); + LASSERT(count > 0); if ((flags & LPRFL_EARLY_REPLY) == 0) { spin_lock(&req->rq_lock); @@ -366,6 +372,9 @@ void *lustre_msg_buf_v2(struct lustre_msg_v2 *m, u32 n, u32 min_size) { u32 i, offset, buflen, bufcount; + LASSERT(m); + LASSERT(m->lm_bufcount > 0); + bufcount = m->lm_bufcount; if (unlikely(n >= bufcount)) { CDEBUG(D_INFO, "msg %p buffer[%d] not present (count %d)\n", @@ -479,7 +488,7 @@ void lustre_free_reply_state(struct ptlrpc_reply_state *rs) static int lustre_unpack_msg_v2(struct lustre_msg_v2 *m, int len) { - int swabbed, required_len, i; + int swabbed, required_len, i, buflen; /* Now we know the sender speaks my language. 
*/ required_len = lustre_msg_hdr_size_v2(0); @@ -502,6 +511,10 @@ static int lustre_unpack_msg_v2(struct lustre_msg_v2 *m, int len) BUILD_BUG_ON(offsetof(typeof(*m), lm_padding_3) == 0); } + if (m->lm_bufcount == 0 || m->lm_bufcount > PTLRPC_MAX_BUFCOUNT) { + CERROR("message bufcount %d is not valid\n", m->lm_bufcount); + return -EINVAL; + } required_len = lustre_msg_hdr_size_v2(m->lm_bufcount); if (len < required_len) { /* didn't receive all the buffer lengths */ @@ -513,12 +526,16 @@ static int lustre_unpack_msg_v2(struct lustre_msg_v2 *m, int len) for (i = 0; i < m->lm_bufcount; i++) { if (swabbed) __swab32s(&m->lm_buflens[i]); - required_len += cfs_size_round(m->lm_buflens[i]); + buflen = cfs_size_round(m->lm_buflens[i]); + if (buflen < 0 || buflen > PTLRPC_MAX_BUFLEN) { + CERROR("buffer %d length %d is not valid\n", i, buflen); + return -EINVAL; + } + required_len += buflen; } - - if (len < required_len) { - CERROR("len: %d, required_len %d\n", len, required_len); - CERROR("bufcount: %d\n", m->lm_bufcount); + if (len < required_len || required_len > PTLRPC_MAX_BUFLEN) { + CERROR("len: %d, required_len %d, bufcount: %d\n", + len, required_len, m->lm_bufcount); for (i = 0; i < m->lm_bufcount; i++) CERROR("buffer %d length %d\n", i, m->lm_buflens[i]); return -EINVAL; From patchwork Thu Feb 27 21:15:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410541 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CDD0E138D for ; Thu, 27 Feb 2020 21:40:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B566224690 for ; Thu, 27 Feb 2020 21:40:50 +0000 (UTC) 
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B566224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 80C8D34921B; Thu, 27 Feb 2020 13:33:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5FDCB21FEDE for ; Thu, 27 Feb 2020 13:20:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 855768F36; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 83C8E468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:14 -0500 Message-Id: <1582838290-17243-447-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 446/622] lustre: uapi: Remove unused CONNECT flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The plain layout connect flag was added as part of an earlier implementation of LU-11213, but the design was improved before landing and the flag was not needed. Let's remove it. 
Since it was never actually marked as supported in any client/server version, we can just remove it entirely, leaving the flag bit open for future use. WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: 11eba11fe045 ("LU-11213 uapi: Remove unused CONNECT flag") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/36008 Reviewed-by: Shilong Wang Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 2 -- include/uapi/linux/lustre/lustre_idl.h | 1 - 2 files changed, 3 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 9298c97..c0b4ad9 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1158,8 +1158,6 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_LSOM); LASSERTF(OBD_CONNECT2_PCC == 0x1000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_PCC); - LASSERTF(OBD_CONNECT2_PLAIN_LAYOUT == 0x2000ULL, "found 0x%.16llxULL\n", - OBD_CONNECT2_PLAIN_LAYOUT); LASSERTF(OBD_CONNECT2_ASYNC_DISCARD == 0x4000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_ASYNC_DISCARD); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 87251ee..47321ae 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -810,7 +810,6 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */ #define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */ #define OBD_CONNECT2_PCC 0x1000ULL /* Persistent Client Cache */ -#define OBD_CONNECT2_PLAIN_LAYOUT 0x2000ULL /* Plain Directory Layout */ #define OBD_CONNECT2_ASYNC_DISCARD 0x4000ULL /* support async DoM data * discard */ From patchwork Thu Feb 27 21:15:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410433 Return-Path: 
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 586DA92A for ; Thu, 27 Feb 2020 21:38:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 40F8624690 for ; Thu, 27 Feb 2020 21:38:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 40F8624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F9523497C9; Thu, 27 Feb 2020 13:31:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A34FC21FEDE for ; Thu, 27 Feb 2020 13:20:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 87E9E8F37; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 867C346A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:15 -0500 Message-Id: <1582838290-17243-448-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 447/622] lustre: lmv: disable remote file statahead X-BeenThere: lustre-devel@lists.lustre.org 
X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Remote file statahead is not supported, because such a file needs two RPCs to fetch both the LOOKUP and GETATTR locks; on LOOKUP success we only know the file FID, and thus can't prepare an inode correctly. Disable it to avoid noisy messages and confusion. Update sanity.sh test_60g. WC-bug-id: https://jira.whamcloud.com/browse/LU-11681 Lustre-commit: 02b5a407081c ("LU-11681 lmv: disable remote file statahead") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/33930 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index d323250..26021bb 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -3416,25 +3416,28 @@ static int lmv_intent_getattr_async(struct obd_export *exp, struct md_op_data *op_data = &minfo->mi_data; struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - struct lmv_tgt_desc *tgt = NULL; + struct lmv_tgt_desc *ptgt = NULL; + struct lmv_tgt_desc *ctgt; if (!fid_is_sane(&op_data->op_fid2)) return -EINVAL; - tgt = lmv_find_target(lmv, &op_data->op_fid1); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); + ptgt = lmv_locate_tgt(lmv, op_data); + if (IS_ERR(ptgt)) + return PTR_ERR(ptgt); + + ctgt = lmv_find_target(lmv, &op_data->op_fid2); + if (IS_ERR(ctgt)) + return PTR_ERR(ctgt); /* - * no special handle for remote dir, which needs to fetch both LOOKUP - * lock on parent, and then UPDATE lock on child MDT, which makes all - * complicated because this is done async.
So only LOOKUP lock is - * fetched for remote dir, but considering remote dir is rare case, - * and not supporting it in statahead won't cause any issue, just leave - * it as is. + * remote object needs two RPCs to lookup and getattr, considering the + * complexity don't support statahead for now. */ + if (ctgt != ptgt) + return -EREMOTE; - return md_intent_getattr_async(tgt->ltd_exp, minfo); + return md_intent_getattr_async(ptgt->ltd_exp, minfo); } static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, From patchwork Thu Feb 27 21:15:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410545 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6FD4A924 for ; Thu, 27 Feb 2020 21:40:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 58B7924690 for ; Thu, 27 Feb 2020 21:40:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 58B7924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6014934A882; Thu, 27 Feb 2020 13:33:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E465E21E093 for ; Thu, 27 Feb 2020 13:20:37 -0800 (PST) Received: from 
star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8B5628F38; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 892DD46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:16 -0500 Message-Id: <1582838290-17243-449-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 448/622] lustre: llite: Fix page count for unaligned reads X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When a read is unaligned on both the first and last page, the calculation used to determine count of pages for readahead misses that we access both of those pages. Increase the calculated count by 1 in this case. This case is covered by the generic readahead tests added in LU-12645. 
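As a rough illustration of the arithmetic described above (a sketch only, not the kernel code): assume a 4096-byte page and that cl_index() amounts to dividing a byte offset by the page size. The names EX_PAGE_SIZE, ra_count_old(), ra_count_patched() and pages_touched() are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Assumed page size for illustration; the kernel uses PAGE_SIZE. */
#define EX_PAGE_SIZE 4096UL

/* Page count from the index math alone: round tot up to whole pages. */
static unsigned long ra_count_old(unsigned long pos, unsigned long tot)
{
	(void)pos;	/* the unadjusted calculation ignores the start offset */
	return (tot + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
}

/* Same count with the adjustment: add one page when both the start
 * and the end of the read are unaligned.
 */
static unsigned long ra_count_patched(unsigned long pos, unsigned long tot)
{
	unsigned long count = ra_count_old(pos, tot);

	if (pos % EX_PAGE_SIZE != 0 && (pos + tot) % EX_PAGE_SIZE != 0)
		count++;
	return count;
}

/* Exact number of pages covered by the byte range [pos, pos + tot). */
static unsigned long pages_touched(unsigned long pos, unsigned long tot)
{
	assert(tot > 0);
	return (pos + tot - 1) / EX_PAGE_SIZE - pos / EX_PAGE_SIZE + 1;
}
```

For pos = 4000 and tot = 200 the unadjusted math gives 1 page although the read touches pages 0 and 1; the adjusted count is 2. For a short read inside a single page (pos = 100, tot = 100) the adjusted count is also 2, so in this sketch the adjustment acts as an upper bound rather than an exact count.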
WC-bug-id: https://jira.whamcloud.com/browse/LU-12367 Lustre-commit: d4a54de84c05 ("LU-12367 llite: Fix page count for unaligned reads") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35015 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/vvp_io.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 847fb5e..e676e62 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -778,6 +778,14 @@ static int vvp_io_read_start(const struct lu_env *env, vio->vui_ra_valid = true; vio->vui_ra_start = cl_index(obj, pos); vio->vui_ra_count = cl_index(obj, tot + PAGE_SIZE - 1); + /* If both start and end are unaligned, we read one more page + * than the index math suggests. + */ + if (pos % PAGE_SIZE != 0 && (pos + tot) % PAGE_SIZE != 0) + vio->vui_ra_count++; + + CDEBUG(D_READA, "tot %ld, ra_start %lu, ra_count %lu\n", tot, + vio->vui_ra_start, vio->vui_ra_count); } /* BUG: 5972 */ From patchwork Thu Feb 27 21:15:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410437 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8BB8492A for ; Thu, 27 Feb 2020 21:38:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 746CD24690 for ; Thu, 27 Feb 2020 21:38:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 746CD24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C2E6D3497FC; Thu, 27 Feb 2020 13:31:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 30DAA21FF6A for ; Thu, 27 Feb 2020 13:20:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8DBB98F39; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8C03A46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:17 -0500 Message-Id: <1582838290-17243-450-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 449/622] lnet: discovery off route state update X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When discovery is off rely on the discovery ping response only, rather than the internal peer database to determine route state. With discovery off the internal peer database is not updated with all the gateway's interfaces. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12422 Lustre-commit: e35be987da57 ("LU-12422 lnet: discovery off route state update") Signed-off-by: Amir Shehata Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35199 Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + include/linux/lnet/lib-types.h | 4 +- net/lnet/lnet/peer.c | 8 +++ net/lnet/lnet/router.c | 134 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 146 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index b889af2..f2f5455 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -758,6 +758,7 @@ int lnet_sock_connect(struct socket **sockp, int *fatal, void lnet_consolidate_routes_locked(struct lnet_peer *orig_lp, struct lnet_peer *new_lp); void lnet_router_discovery_complete(struct lnet_peer *lp); +void lnet_router_discovery_ping_reply(struct lnet_peer *lp); int lnet_monitor_thr_start(void); void lnet_monitor_thr_stop(void); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 3f81928..22c2bc6 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -611,7 +611,7 @@ struct lnet_peer { /* number of NIDs on this peer */ int lp_nnis; - /* # refs from lnet_route_t::lr_gateway */ + /* # refs from lnet_route::lr_gateway */ int lp_rtr_refcount; /* @@ -822,6 +822,8 @@ struct lnet_route { u32 lr_hops; /* route priority */ unsigned int lr_priority; + /* cached route aliveness */ + bool lr_alive; }; #define LNET_REMOTE_NETS_HASH_DEFAULT (1U << 7) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 49da7a1..088bb62 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2398,6 +2398,14 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) out: lp->lp_state &= ~LNET_PEER_PING_SENT; spin_unlock(&lp->lp_lock); + + lnet_net_lock(LNET_LOCK_EX); + /* If this 
peer is a gateway, call the routing callback to + * handle the ping reply + */ + if (lp->lp_rtr_refcount > 0) + lnet_router_discovery_ping_reply(lp); + lnet_net_unlock(LNET_LOCK_EX); } /* diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 4ab587d..bc9494d 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -221,6 +221,15 @@ bool lnet_is_route_alive(struct lnet_route *route) struct lnet_peer_net *rlpn; bool route_alive; + /* if discovery is disabled then rely on the cached aliveness + * information. This is handicapped information which we log when + * we receive the discovery ping response. The most uptodate + * aliveness information can only be obtained when discovery is + * enabled. + */ + if (lnet_peer_discovery_disabled) + return route->lr_alive; + /* check the gateway's interfaces on the route rnet to make sure * that the gateway is viable. */ @@ -279,10 +288,125 @@ bool lnet_is_route_alive(struct lnet_route *route) } } +static inline void +lnet_set_route_aliveness(struct lnet_route *route, bool alive) +{ + /* Log when there's a state change */ + if (route->lr_alive != alive) { + CERROR("route to %s through %s has gone from %s to %s\n", + libcfs_net2str(route->lr_net), + libcfs_nid2str(route->lr_gateway->lp_primary_nid), + (route->lr_alive) ? "up" : "down", + alive ? "up" : "down"); + route->lr_alive = alive; + } +} + +void +lnet_router_discovery_ping_reply(struct lnet_peer *lp) +{ + struct lnet_ping_buffer *pbuf = lp->lp_data; + struct lnet_remotenet *rnet; + struct lnet_peer_net *llpn; + struct lnet_route *route; + bool net_up = false; + unsigned int lp_state; + u32 net, net2; + int i, j; + + spin_lock(&lp->lp_lock); + lp_state = lp->lp_state; + spin_unlock(&lp->lp_lock); + + /* only handle replies if discovery is disabled. */ + if (!lnet_peer_discovery_disabled) + return; + + if (lp_state & LNET_PEER_PING_FAILED) { + CDEBUG(D_NET, + "Ping failed with %d. 
Set routes down for gw %s\n", + lp->lp_ping_error, libcfs_nid2str(lp->lp_primary_nid)); + /* If the ping failed then mark the routes served by this + * peer down + */ + list_for_each_entry(route, &lp->lp_routes, lr_gwlist) + lnet_set_route_aliveness(route, false); + return; + } + + CDEBUG(D_NET, "Discovery is disabled. Processing reply for gw: %s\n", + libcfs_nid2str(lp->lp_primary_nid)); + + /* examine the ping response: + * For each NID in the ping response, extract the net + * if the net exists on our remote net list then + * iterate over the routes on the rnet and if: + * The route's local net is healthy and + * The remote net status is UP, then mark the route up + * otherwise mark the route down + */ + for (i = 1; i < pbuf->pb_info.pi_nnis; i++) { + net = LNET_NIDNET(pbuf->pb_info.pi_ni[i].ns_nid); + rnet = lnet_find_rnet_locked(net); + if (!rnet) + continue; + list_for_each_entry(route, &rnet->lrn_routes, lr_list) { + /* check if this is the route's gateway */ + if (lp->lp_primary_nid != + route->lr_gateway->lp_primary_nid) + continue; + + /* gateway has the routing feature disabled */ + if (pbuf->pb_info.pi_features & + LNET_PING_FEAT_RTE_DISABLED) { + lnet_set_route_aliveness(route, false); + continue; + } + + llpn = lnet_peer_get_net_locked(lp, route->lr_lnet); + if (!llpn) { + lnet_set_route_aliveness(route, false); + continue; + } + + if (!lnet_is_gateway_net_alive(llpn)) { + lnet_set_route_aliveness(route, false); + continue; + } + + if (avoid_asym_router_failure && + pbuf->pb_info.pi_ni[i].ns_status != + LNET_NI_STATUS_UP) { + net_up = false; + + /* revisit all previous NIDs and check if + * any on the network we're examining is + * up. If at least one is up then we consider + * the route to be alive. 
+ */ + for (j = 1; j < i; j++) { + net2 = LNET_NIDNET(pbuf->pb_info.pi_ni[j].ns_nid); + if (net2 == net && + pbuf->pb_info.pi_ni[j].ns_status == + LNET_NI_STATUS_UP) + net_up = true; + } + if (!net_up) { + lnet_set_route_aliveness(route, false); + continue; + } + } + + lnet_set_route_aliveness(route, true); + } + } +} + void lnet_router_discovery_complete(struct lnet_peer *lp) { struct lnet_peer_ni *lpni = NULL; + struct lnet_route *route; spin_lock(&lp->lp_lock); lp->lp_state &= ~LNET_PEER_RTR_DISCOVERY; @@ -306,6 +430,9 @@ bool lnet_is_route_alive(struct lnet_route *route) libcfs_nid2str(lp->lp_primary_nid), lp->lp_dc_error); while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) lpni->lpni_ns_status = LNET_NI_STATUS_DOWN; + + list_for_each_entry(route, &lp->lp_routes, lr_gwlist) + lnet_set_route_aliveness(route, false); } static void @@ -1431,6 +1558,8 @@ bool lnet_router_checker_active(void) time64_t when) { struct lnet_peer_ni *lpni = NULL; + struct lnet_route *route; + struct lnet_peer *lp; time64_t now = ktime_get_seconds(); int cpt; @@ -1499,6 +1628,11 @@ bool lnet_router_checker_active(void) cpt = lpni->lpni_cpt; lnet_net_lock(cpt); lnet_peer_ni_decref_locked(lpni); + if (lpni && lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer) { + lp = lpni->lpni_peer_net->lpn_peer; + list_for_each_entry(route, &lp->lp_routes, lr_gwlist) + lnet_set_route_aliveness(route, alive); + } lnet_net_unlock(cpt); return 0; From patchwork Thu Feb 27 21:15:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410549 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B617817E0 for ; Thu, 27 Feb 2020 21:41:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9EDE424690 for ; Thu, 27 Feb 2020 21:41:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9EDE424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D496634A8AF; Thu, 27 Feb 2020 13:33:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8CCAA21FF6A for ; Thu, 27 Feb 2020 13:20:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 907F58F3A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8F26747C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:18 -0500 Message-Id: <1582838290-17243-451-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 450/622] lustre: llite: prevent mulitple group locks X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko The patch adds a mutex for group lock enqueue. It also adds client-side waiting on group lock users within the same node. This prevents multiple locks on the same resource and fixes a bug where two locks cover the same dirty pages. The patch adds sanity test 244b. It creates threads that each open a file, take a group lock, write data, release the group lock, and close the file. It reproduces the problem where a client holds two or more group locks for a single file, which leads to wrong behaviour on flush, for example: osc_cache_writeback_range()) ASSERTION( hp == 0 && discard == 0 ) failed One more test covers group lock with an open file and fork. It checks that the child doesn't unlock the file until the last close. Cray-bug-id: LUS-7232 WC-bug-id: https://jira.whamcloud.com/browse/LU-9964 Lustre-commit: aba68250a67a ("LU-9964 llite: prevent mulitple group locks") Signed-off-by: Alexander Boyko Reviewed-on: https://review.whamcloud.com/35791 Reviewed-by: Patrick Farrell Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 3 +- fs/lustre/llite/file.c | 75 ++++++++++++++++++++++++++-------------- fs/lustre/llite/llite_internal.h | 3 ++ fs/lustre/llite/llite_lib.c | 3 ++ fs/lustre/osc/osc_lock.c | 2 ++ 5 files changed, 60 insertions(+), 26 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 75492f6..0dd9fea 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -750,7 +750,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, lock->l_conn_export = exp; lock->l_export = NULL; lock->l_blocking_ast = einfo->ei_cb_bl; - lock->l_flags |= (*flags & (LDLM_FL_NO_LRU | LDLM_FL_EXCL)); + lock->l_flags |= (*flags & (LDLM_FL_NO_LRU | 
LDLM_FL_EXCL | + LDLM_FL_ATOMIC_CB)); lock->l_activity = ktime_get_real_seconds(); /* lock not sent to server yet */ diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 6c5b9eb..856aa64 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -2075,15 +2075,30 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, if (ll_file_nolock(file)) return -EOPNOTSUPP; - spin_lock(&lli->lli_lock); +retry: + if (file->f_flags & O_NONBLOCK) { + if (!mutex_trylock(&lli->lli_group_mutex)) + return -EAGAIN; + } else + mutex_lock(&lli->lli_group_mutex); + if (fd->fd_flags & LL_FILE_GROUP_LOCKED) { CWARN("group lock already existed with gid %lu\n", fd->fd_grouplock.lg_gid); - spin_unlock(&lli->lli_lock); - return -EINVAL; + rc = -EINVAL; + goto out; + } + if (arg != lli->lli_group_gid && lli->lli_group_users != 0) { + if (file->f_flags & O_NONBLOCK) { + rc = -EAGAIN; + goto out; + } + mutex_unlock(&lli->lli_group_mutex); + wait_var_event(&lli->lli_group_users, !lli->lli_group_users); + rc = 0; + goto retry; } LASSERT(!fd->fd_grouplock.lg_lock); - spin_unlock(&lli->lli_lock); /** * XXX: group lock needs to protect all OST objects while PFL @@ -2102,8 +2117,10 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, u16 refcheck; env = cl_env_get(&refcheck); - if (IS_ERR(env)) - return PTR_ERR(env); + if (IS_ERR(env)) { + rc = PTR_ERR(env); + goto out; + } rc = cl_object_layout_get(env, obj, &cl); if (!rc && cl.cl_is_composite) @@ -2112,28 +2129,26 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, cl_env_put(env, &refcheck); if (rc) - return rc; + goto out; } rc = cl_get_grouplock(ll_i2info(inode)->lli_clob, arg, (file->f_flags & O_NONBLOCK), &grouplock); - if (rc) - return rc; - spin_lock(&lli->lli_lock); - if (fd->fd_flags & LL_FILE_GROUP_LOCKED) { - spin_unlock(&lli->lli_lock); - CERROR("another thread just won the race\n"); - cl_put_grouplock(&grouplock); - return -EINVAL; - } + if (rc) + goto out; 
fd->fd_flags |= LL_FILE_GROUP_LOCKED; fd->fd_grouplock = grouplock; - spin_unlock(&lli->lli_lock); + if (lli->lli_group_users == 0) + lli->lli_group_gid = grouplock.lg_gid; + lli->lli_group_users++; CDEBUG(D_INFO, "group lock %lu obtained\n", arg); - return 0; +out: + mutex_unlock(&lli->lli_group_mutex); + + return rc; } static int ll_put_grouplock(struct inode *inode, struct file *file, @@ -2142,30 +2157,40 @@ static int ll_put_grouplock(struct inode *inode, struct file *file, struct ll_inode_info *lli = ll_i2info(inode); struct ll_file_data *fd = LUSTRE_FPRIVATE(file); struct ll_grouplock grouplock; + int rc; - spin_lock(&lli->lli_lock); + mutex_lock(&lli->lli_group_mutex); if (!(fd->fd_flags & LL_FILE_GROUP_LOCKED)) { - spin_unlock(&lli->lli_lock); CWARN("no group lock held\n"); - return -EINVAL; + rc = -EINVAL; + goto out; } LASSERT(fd->fd_grouplock.lg_lock); if (fd->fd_grouplock.lg_gid != arg) { CWARN("group lock %lu doesn't match current id %lu\n", arg, fd->fd_grouplock.lg_gid); - spin_unlock(&lli->lli_lock); - return -EINVAL; + rc = -EINVAL; + goto out; } grouplock = fd->fd_grouplock; memset(&fd->fd_grouplock, 0, sizeof(fd->fd_grouplock)); fd->fd_flags &= ~LL_FILE_GROUP_LOCKED; - spin_unlock(&lli->lli_lock); cl_put_grouplock(&grouplock); + + lli->lli_group_users--; + if (lli->lli_group_users == 0) { + lli->lli_group_gid = 0; + wake_up_var(&lli->lli_group_users); + } CDEBUG(D_INFO, "group lock %lu released\n", arg); - return 0; + rc = 0; +out: + mutex_unlock(&lli->lli_group_mutex); + + return rc; } /** diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 49c0c78..232fb0a 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -210,6 +210,9 @@ struct ll_inode_info { struct mutex lli_pcc_lock; enum lu_pcc_state_flags lli_pcc_state; struct pcc_inode *lli_pcc_inode; + struct mutex lli_group_mutex; + u64 lli_group_users; + unsigned long lli_group_gid; }; }; diff --git a/fs/lustre/llite/llite_lib.c 
b/fs/lustre/llite/llite_lib.c index 86be562..8946dc6 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -983,6 +983,9 @@ void ll_lli_init(struct ll_inode_info *lli) mutex_init(&lli->lli_pcc_lock); lli->lli_pcc_state = PCC_STATE_FL_NONE; lli->lli_pcc_inode = NULL; + mutex_init(&lli->lli_group_mutex); + lli->lli_group_users = 0; + lli->lli_group_gid = 0; } mutex_init(&lli->lli_layout_mutex); memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index 33fdc7e7..c748e58 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -1182,6 +1182,8 @@ int osc_lock_init(const struct lu_env *env, oscl->ols_flags = osc_enq2ldlm_flags(enqflags); oscl->ols_speculative = !!(enqflags & CEF_SPECULATIVE); + if (lock->cll_descr.cld_mode == CLM_GROUP) + oscl->ols_flags |= LDLM_FL_ATOMIC_CB; if (oscl->ols_flags & LDLM_FL_HAS_INTENT) { oscl->ols_flags |= LDLM_FL_BLOCK_GRANTED; From patchwork Thu Feb 27 21:15:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410441 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6836D138D for ; Thu, 27 Feb 2020 21:38:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 50781246A1 for ; Thu, 27 Feb 2020 21:38:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 50781246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost 
[IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EDEC23496ED; Thu, 27 Feb 2020 13:31:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E391921FF6A for ; Thu, 27 Feb 2020 13:20:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9352D8F3C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 91F83468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:19 -0500 Message-Id: <1582838290-17243-452-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 451/622] lustre: ptlrpc: make DEBUG_REQ messages consistent X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove linefeed from DEBUG_REQ() messages, since this results in debug logs that are split across multiple lines and do not start with the proper timestamp or other standard fields. This makes post-processing difficult. Some error and debug messages are checked for explicitly in tests. Add a comment by those lines in the code to alert the reader that changes to those messages may cause test failures, and make the tests more forgiving in case of minor changes to the formatting. 
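To illustrate the linefeed problem described above, here is a minimal sketch. It assumes, as the description implies, that a DEBUG_REQ()-style helper appends request details and the final newline after the caller's message. The helper debug_req_sketch(), its "[ts] " prefix, its " req@x..." suffix and count_lines() are hypothetical stand-ins, not the real macro or its output format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a DEBUG_REQ()-style helper: prefix, then the
 * caller's message, then request details and the terminating newline.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int debug_req_sketch(char *buf, size_t len, unsigned long long xid,
			    const char *fmt, ...)
{
	va_list args;
	int used, n;

	assert(buf && len > 0);

	/* Standard prefix that every log line is expected to start with. */
	used = snprintf(buf, len, "[ts] ");
	if (used < 0 || (size_t)used >= len)
		return -1;

	/* Caller's message. */
	va_start(args, fmt);
	n = vsnprintf(buf + used, len - used, fmt, args);
	va_end(args);
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	used += n;

	/* Request details and the newline are appended last. */
	n = snprintf(buf + used, len - used, " req@x%llu\n", xid);
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	return used + n;
}

/* Count how many lines one formatted record occupies. */
static int count_lines(const char *buf)
{
	int lines = 0;

	for (; *buf; buf++)
		if (*buf == '\n')
			lines++;
	return lines;
}
```

With a message such as "enqueue lock with no delay" the record stays on one line. With a trailing "\n" in the message the appended request details land on a second line that lacks the prefix, which is the kind of split record that makes post-processing difficult.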
Fix several tests to check for actual error message. Some tests have been broken for so long (1.5/1.8) that there is no point to also check for the old messages, so use only the new messages. The EINPROGRESS messages should not use D_ERROR, since they can be hit under normal usage (e.g. LFSCK running), so D_WARNING at most. Don't print every one to the console, that would be too verbose. Fix code style of affected lines. WC-bug-id: https://jira.whamcloud.com/browse/LU-12368 Lustre-commit: c0fa0ba4a8ef ("LU-12368 ptlrpc: make DEBUG_REQ messages consistent") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/35311 Reviewed-by: Ben Evans Reviewed-by: Arshad Hussain Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 2 +- fs/lustre/ldlm/ldlm_request.c | 4 ++-- fs/lustre/llite/llite_lib.c | 1 + fs/lustre/mdc/mdc_locks.c | 5 ++-- fs/lustre/mdc/mdc_request.c | 11 +++++---- fs/lustre/osc/osc_request.c | 21 ++++++++++------- fs/lustre/ptlrpc/client.c | 53 ++++++++++++++++++++++-------------------- fs/lustre/ptlrpc/events.c | 2 +- fs/lustre/ptlrpc/import.c | 15 +++++++++--- fs/lustre/ptlrpc/layout.c | 4 ++-- fs/lustre/ptlrpc/niobuf.c | 6 ++--- fs/lustre/ptlrpc/ptlrpcd.c | 2 +- fs/lustre/ptlrpc/recover.c | 2 +- fs/lustre/ptlrpc/sec.c | 6 ++++- fs/lustre/ptlrpc/service.c | 18 ++++++++------ net/lnet/libcfs/module.c | 1 + 16 files changed, 90 insertions(+), 63 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index caf766d..f16c6d3 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -2202,7 +2202,7 @@ static inline int ptlrpc_status_ntoh(int n) atomic_dec(&req->rq_import->imp_unregistering); } - DEBUG_REQ(D_INFO, req, "move req \"%s\" -> \"%s\"", + DEBUG_REQ(D_INFO, req, "move request phase from %s to %s", ptlrpc_rqphase2str(req), ptlrpc_phase2str(new_phase)); req->rq_phase = new_phase; diff --git a/fs/lustre/ldlm/ldlm_request.c 
b/fs/lustre/ldlm/ldlm_request.c index 0dd9fea..20bdba4 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -777,7 +777,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, } if (*flags & LDLM_FL_NDELAY) { - DEBUG_REQ(D_DLMTRACE, req, "enque lock with no delay\n"); + DEBUG_REQ(D_DLMTRACE, req, "enqueue lock with no delay"); req->rq_no_resend = req->rq_no_delay = 1; /* * probably set a shorter timeout value and handle ETIMEDOUT @@ -1248,7 +1248,7 @@ int ldlm_cli_update_pool(struct ptlrpc_request *req) if (lustre_msg_get_slv(req->rq_repmsg) == 0 || lustre_msg_get_limit(req->rq_repmsg) == 0) { DEBUG_REQ(D_HA, req, - "Zero SLV or Limit found (SLV: %llu, Limit: %u)", + "Zero SLV or limit found (SLV=%llu, limit=%u)", lustre_msg_get_slv(req->rq_repmsg), lustre_msg_get_limit(req->rq_repmsg)); return 0; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 8946dc6..217268e 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2731,6 +2731,7 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret) path = dentry_path_raw(dentry, buf, PAGE_SIZE); } + /* The below message is checked in recovery-small.sh test_24b */ CDEBUG(D_WARNING, "%s: dirty page discard: %s/fid: " DFID "/%s may get corrupted (rc %d)\n", ll_i2sbi(inode)->ll_fsname, diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 5885bbd..b91c162 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -198,7 +198,8 @@ static inline void mdc_clear_replay_flag(struct ptlrpc_request *req, int rc) spin_unlock(&req->rq_lock); } if (rc && req->rq_transno != 0) { - DEBUG_REQ(D_ERROR, req, "transno returned on error rc %d", rc); + DEBUG_REQ(D_ERROR, req, "transno returned on error: rc = %d", + rc); LBUG(); } } @@ -710,7 +711,7 @@ static int mdc_finish_enqueue(struct obd_export *exp, (!it_disposition(it, DISP_OPEN_OPEN) || it->it_status != 0)) mdc_clear_replay_flag(req, 
it->it_status); - DEBUG_REQ(D_RPCTRACE, req, "op: %x disposition: %x, status: %d", + DEBUG_REQ(D_RPCTRACE, req, "op=%x disposition=%x, status=%d", it->it_op, it->it_disposition, it->it_status); /* We know what to expect, so we do any byte flipping required here */ diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 162ace7..34cf177 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -450,6 +450,7 @@ static int mdc_getxattr(struct obd_export *exp, const struct lu_fid *fid, LASSERT(obd_md_valid == OBD_MD_FLXATTR || obd_md_valid == OBD_MD_FLXATTRLS); + /* The below message is checked in sanity-selinux.sh test_20d */ CDEBUG(D_INFO, "%s: get xattr '%s' for " DFID "\n", exp->exp_obd->obd_name, name, PFID(fid)); rc = mdc_xattr_common(exp, &RQF_MDS_GETXATTR, fid, MDS_GETXATTR, @@ -695,7 +696,7 @@ void mdc_replay_open(struct ptlrpc_request *req) if (!mod) { DEBUG_REQ(D_ERROR, req, - "Can't properly replay without open data."); + "cannot properly replay without open data"); return; } @@ -794,7 +795,7 @@ int mdc_set_open_replay_data(struct obd_export *exp, mod = obd_mod_alloc(); if (!mod) { DEBUG_REQ(D_ERROR, open_req, - "Can't allocate md_open_data"); + "cannot allocate md_open_data"); return 0; } @@ -848,7 +849,7 @@ static void mdc_free_open(struct md_open_data *mod) * The worst thing is eviction if the client gets open lock */ DEBUG_REQ(D_RPCTRACE, mod->mod_open_req, - "free open request rq_replay = %d\n", + "free open request, rq_replay=%d\n", mod->mod_open_req->rq_replay); ptlrpc_request_committed(mod->mod_open_req, committed); @@ -993,7 +994,7 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data, mdc_put_mod_rpc_slot(req, NULL); if (!req->rq_repmsg) { - CDEBUG(D_RPCTRACE, "request failed to send: %p, %d\n", req, + CDEBUG(D_RPCTRACE, "request %p failed to send: rc = %d\n", req, req->rq_status); if (rc == 0) rc = req->rq_status ?: -EIO; @@ -1003,7 +1004,7 @@ static int mdc_close(struct obd_export 
*exp, struct md_op_data *op_data, rc = lustre_msg_get_status(req->rq_repmsg); if (lustre_msg_get_type(req->rq_repmsg) == PTL_RPC_MSG_ERR) { DEBUG_REQ(D_ERROR, req, - "type == PTL_RPC_MSG_ERR, err = %d", rc); + "type = PTL_RPC_MSG_ERR: rc = %d", rc); if (rc > 0) rc = -rc; } diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 6b066e5..75e0823 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1735,14 +1735,14 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) u32 client_cksum = 0; if (rc < 0 && rc != -EDQUOT) { - DEBUG_REQ(D_INFO, req, "Failed request with rc = %d\n", rc); + DEBUG_REQ(D_INFO, req, "Failed request: rc = %d", rc); return rc; } LASSERTF(req->rq_repmsg, "rc = %d\n", rc); body = req_capsule_server_get(&req->rq_pill, &RMF_OST_BODY); if (!body) { - DEBUG_REQ(D_INFO, req, "Can't unpack body\n"); + DEBUG_REQ(D_INFO, req, "cannot unpack body"); return -EPROTO; } @@ -1770,7 +1770,8 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE) { if (rc > 0) { - CERROR("Unexpected +ve rc %d\n", rc); + CERROR("%s: unexpected positive size %d\n", + obd_name, rc); return -EPROTO; } @@ -1805,13 +1806,13 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) } if (rc > aa->aa_requested_nob) { - CERROR("Unexpected rc %d (%d requested)\n", rc, - aa->aa_requested_nob); + CERROR("%s: unexpected size %d, requested %d\n", obd_name, + rc, aa->aa_requested_nob); return -EPROTO; } if (req->rq_bulk && rc != req->rq_bulk->bd_nob_transferred) { - CERROR("Unexpected rc %d (%d transferred)\n", + CERROR("%s: unexpected size %d, transferred %d\n", obd_name, rc, req->rq_bulk->bd_nob_transferred); return -EPROTO; } @@ -1916,8 +1917,9 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc) cksum_missed++; if ((cksum_missed & (-cksum_missed)) == cksum_missed) - CERROR("Checksum %u requested from %s but not sent\n", - 
cksum_missed, libcfs_nid2str(peer->nid)); + CERROR("%s: checksum %u requested from %s but not sent\n", + obd_name, cksum_missed, + libcfs_nid2str(peer->nid)); } else { rc = 0; } @@ -1936,6 +1938,7 @@ static int osc_brw_redo_request(struct ptlrpc_request *request, struct osc_brw_async_args *new_aa; struct osc_async_page *oap; + /* The below message is checked in replay-ost-single.sh test_8ae*/ DEBUG_REQ(rc == -EINPROGRESS ? D_RPCTRACE : D_ERROR, request, "redo for recoverable error %d", rc); @@ -2346,7 +2349,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli, } spin_unlock(&cli->cl_loi_list_lock); - DEBUG_REQ(D_INODE, req, "%d pages, aa %p. now %ur/%dw in flight", + DEBUG_REQ(D_INODE, req, "%d pages, aa %p, now %ur/%dw in flight", page_count, aa, cli->cl_r_in_flight, cli->cl_w_in_flight); OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_DELAY_IO, cfs_fail_val); diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index c750a4e..d2e5e04 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -424,14 +424,16 @@ static int unpack_reply(struct ptlrpc_request *req) if (SPTLRPC_FLVR_POLICY(req->rq_flvr.sf_rpc) != SPTLRPC_POLICY_NULL) { rc = ptlrpc_unpack_rep_msg(req, req->rq_replen); if (rc) { - DEBUG_REQ(D_ERROR, req, "unpack_rep failed: %d", rc); + DEBUG_REQ(D_ERROR, req, "unpack_rep failed: rc = %d", + rc); return -EPROTO; } } rc = lustre_unpack_rep_ptlrpc_body(req, MSG_PTLRPC_BODY_OFF); if (rc) { - DEBUG_REQ(D_ERROR, req, "unpack ptlrpc body failed: %d", rc); + DEBUG_REQ(D_ERROR, req, "unpack ptlrpc body failed: rc = %d", + rc); return -EPROTO; } return 0; @@ -491,6 +493,8 @@ static int ptlrpc_at_recv_early_reply(struct ptlrpc_request *req) req->rq_deadline = req->rq_sent + req->rq_timeout + ptlrpc_at_get_net_latency(req); + /* The below message is checked in replay-single.sh test_65{a,b} */ + /* The below message is checked in sanity-{gss,krb5} test_8 */ DEBUG_REQ(D_ADAPTTO, req, "Early reply #%d, new deadline in %lds (%lds)", 
req->rq_early_count, @@ -1163,18 +1167,18 @@ static int ptlrpc_import_delay_req(struct obd_import *imp, if (req->rq_ctx_init || req->rq_ctx_fini) { /* always allow ctx init/fini rpc go through */ } else if (imp->imp_state == LUSTRE_IMP_NEW) { - DEBUG_REQ(D_ERROR, req, "Uninitialized import."); + DEBUG_REQ(D_ERROR, req, "Uninitialized import"); *status = -EIO; } else if (imp->imp_state == LUSTRE_IMP_CLOSED) { /* pings may safely race with umount */ DEBUG_REQ(lustre_msg_get_opc(req->rq_reqmsg) == OBD_PING ? - D_HA : D_ERROR, req, "IMP_CLOSED "); + D_HA : D_ERROR, req, "IMP_CLOSED"); *status = -EIO; } else if (ptlrpc_send_limit_expired(req)) { /* probably doesn't need to be a D_ERROR after initial * testing */ - DEBUG_REQ(D_HA, req, "send limit expired "); + DEBUG_REQ(D_HA, req, "send limit expired"); *status = -ETIMEDOUT; } else if (req->rq_send_state == LUSTRE_IMP_CONNECTING && imp->imp_state == LUSTRE_IMP_CONNECTING) { @@ -1204,7 +1208,7 @@ static int ptlrpc_import_delay_req(struct obd_import *imp, imp->imp_state == LUSTRE_IMP_REPLAY_LOCKS || imp->imp_state == LUSTRE_IMP_REPLAY_WAIT || imp->imp_state == LUSTRE_IMP_RECOVER)) { - DEBUG_REQ(D_HA, req, "allow during recovery.\n"); + DEBUG_REQ(D_HA, req, "allow during recovery"); } else { delay = 1; } @@ -1258,9 +1262,9 @@ static bool ptlrpc_console_allow(struct ptlrpc_request *req) */ static int ptlrpc_check_status(struct ptlrpc_request *req) { - int err; + int rc; - err = lustre_msg_get_status(req->rq_repmsg); + rc = lustre_msg_get_status(req->rq_repmsg); if (lustre_msg_get_type(req->rq_repmsg) == PTL_RPC_MSG_ERR) { struct obd_import *imp = req->rq_import; lnet_nid_t nid = imp->imp_connection->c_peer.nid; @@ -1268,22 +1272,19 @@ static int ptlrpc_check_status(struct ptlrpc_request *req) /* -EAGAIN is normal when using POSIX flocks */ if (ptlrpc_console_allow(req) && - !(opc == LDLM_ENQUEUE && err == -EAGAIN)) + !(opc == LDLM_ENQUEUE && rc == -EAGAIN)) LCONSOLE_ERROR_MSG(0x011, "%s: operation %s to node %s failed: rc = 
%d\n", imp->imp_obd->obd_name, ll_opcode2str(opc), - libcfs_nid2str(nid), err); - return err < 0 ? err : -EINVAL; + libcfs_nid2str(nid), rc); + return rc < 0 ? rc : -EINVAL; } - if (err < 0) - DEBUG_REQ(D_INFO, req, "status is %d", err); - else if (err > 0) - /* XXX: translate this error from net to host */ - DEBUG_REQ(D_INFO, req, "status is %d", err); + if (rc) + DEBUG_REQ(D_INFO, req, "check status: rc = %d", rc); - return err; + return rc; } /** @@ -1347,7 +1348,7 @@ static int after_reply(struct ptlrpc_request *req) if (req->rq_reply_truncated) { if (ptlrpc_no_resend(req)) { DEBUG_REQ(D_ERROR, req, - "reply buffer overflow, expected: %d, actual size: %d", + "reply buffer overflow, expected=%d, actual size=%d", req->rq_nob_received, req->rq_repbuf_len); return -EOVERFLOW; } @@ -1375,7 +1376,7 @@ static int after_reply(struct ptlrpc_request *req) */ rc = sptlrpc_cli_unwrap_reply(req); if (rc) { - DEBUG_REQ(D_ERROR, req, "unwrap reply failed (%d):", rc); + DEBUG_REQ(D_ERROR, req, "unwrap reply failed: rc = %d", rc); return rc; } @@ -1392,7 +1393,8 @@ static int after_reply(struct ptlrpc_request *req) ptlrpc_no_resend(req) == 0 && !req->rq_no_retry_einprogress) { time64_t now = ktime_get_real_seconds(); - DEBUG_REQ(D_RPCTRACE, req, "Resending request on EINPROGRESS"); + DEBUG_REQ((req->rq_nr_resend % 8 == 1 ? 
D_WARNING : 0) | + D_RPCTRACE, req, "resending request on EINPROGRESS"); spin_lock(&req->rq_lock); req->rq_resend = 1; spin_unlock(&req->rq_lock); @@ -1634,7 +1636,8 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req) return rc; } if (rc) { - DEBUG_REQ(D_HA, req, "send failed (%d); expect timeout", rc); + DEBUG_REQ(D_HA, req, "send failed, expect timeout: rc = %d", + rc); spin_lock(&req->rq_lock); req->rq_net_err = 1; spin_unlock(&req->rq_lock); @@ -2875,7 +2878,7 @@ static int ptlrpc_replay_interpret(const struct lu_env *env, if (!ptlrpc_client_replied(req) || (req->rq_bulk && lustre_msg_get_status(req->rq_repmsg) == -ETIMEDOUT)) { - DEBUG_REQ(D_ERROR, req, "request replay timed out.\n"); + DEBUG_REQ(D_ERROR, req, "request replay timed out"); rc = -ETIMEDOUT; goto out; } @@ -2890,7 +2893,7 @@ static int ptlrpc_replay_interpret(const struct lu_env *env, /** VBR: check version failure */ if (lustre_msg_get_status(req->rq_repmsg) == -EOVERFLOW) { /** replay was failed due to version mismatch */ - DEBUG_REQ(D_WARNING, req, "Version mismatch during replay\n"); + DEBUG_REQ(D_WARNING, req, "Version mismatch during replay"); spin_lock(&imp->imp_lock); imp->imp_vbr_failed = 1; spin_unlock(&imp->imp_lock); @@ -2913,14 +2916,14 @@ static int ptlrpc_replay_interpret(const struct lu_env *env, /* transaction number shouldn't be bigger than the latest replayed */ if (req->rq_transno > lustre_msg_get_transno(req->rq_reqmsg)) { DEBUG_REQ(D_ERROR, req, - "Reported transno %llu is bigger than the replayed one: %llu", + "Reported transno=%llu is bigger than replayed=%llu", req->rq_transno, lustre_msg_get_transno(req->rq_reqmsg)); rc = -EINVAL; goto out; } - DEBUG_REQ(D_HA, req, "got rep"); + DEBUG_REQ(D_HA, req, "got reply"); /* let the callback do fixups, possibly including in the request */ if (req->rq_replay_cb) diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c index 87c0ab7..e6a49db 100644 --- a/fs/lustre/ptlrpc/events.c +++ b/fs/lustre/ptlrpc/events.c 
@@ -132,7 +132,7 @@ void reply_in_callback(struct lnet_event *ev) ((lustre_msghdr_get_flags(req->rq_reqmsg) & MSGHDR_AT_SUPPORT))) { /* Early reply */ DEBUG_REQ(D_ADAPTTO, req, - "Early reply received: mlen=%u offset=%d replen=%d replied=%d unlinked=%d", + "Early reply received, mlen=%u offset=%d replen=%d replied=%d unlinked=%d", ev->mlength, ev->offset, req->rq_replen, req->rq_replied, ev->unlinked); diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index a6d0b32..ff1b810 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -567,6 +567,7 @@ static int import_select_connection(struct obd_import *imp) imp->imp_conn_current = imp_conn; } + /* The below message is checked in conf-sanity.sh test_35[ab] */ CDEBUG(D_HA, "%s: import %p using connection %s/%s\n", imp->imp_obd->obd_name, imp, imp_conn->oic_uuid.uuid, libcfs_nid2str(imp_conn->oic_conn->c_peer.nid)); @@ -1221,10 +1222,18 @@ static int ptlrpc_connect_interpret(const struct lu_env *env, if (lustre_msg_get_last_committed(request->rq_repmsg) > 0 && lustre_msg_get_last_committed(request->rq_repmsg) < - aa->pcaa_peer_committed) - CERROR("%s went back in time (transno %lld was previously committed, server now claims %lld)! 
See https://bugzilla.lustre.org/show_bug.cgi?id=9646\n", + aa->pcaa_peer_committed) { + static bool printed; + + /* The below message is checked in recovery-small.sh test_54 */ + CERROR("%s: went back in time (transno %lld was previously committed, server now claims %lld)!\n", obd2cli_tgt(imp->imp_obd), aa->pcaa_peer_committed, lustre_msg_get_last_committed(request->rq_repmsg)); + if (!printed) { + CERROR("For further information, see http://doc.lustre.org/lustre_manual.xhtml#went_back_in_time\n"); + printed = true; + } + } finish: ptlrpc_prepare_replay(imp); @@ -1668,7 +1677,7 @@ static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, struct obd_import *imp = req->rq_import; int connect = 0; - DEBUG_REQ(D_HA, req, "inflight=%d, refcount=%d: rc = %d ", + DEBUG_REQ(D_HA, req, "inflight=%d, refcount=%d: rc = %d", atomic_read(&imp->imp_inflight), atomic_read(&imp->imp_refcount), rc); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index fb60558..67a7cd5 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -1825,7 +1825,7 @@ int req_capsule_server_pack(struct req_capsule *pill) pill->rc_area[RCL_SERVER], NULL); if (rc != 0) { DEBUG_REQ(D_ERROR, pill->rc_req, - "Cannot pack %d fields in format `%s': ", + "Cannot pack %d fields in format '%s'", count, fmt->rf_name); } return rc; @@ -1988,7 +1988,7 @@ static void *__req_capsule_get(struct req_capsule *pill, if (!value) { DEBUG_REQ(D_ERROR, pill->rc_req, - "Wrong buffer for field `%s' (%u of %u) in format `%s': %u vs. %u (%s)\n", + "Wrong buffer for field '%s' (%u of %u) in format '%s', %u vs. 
%u (%s)", field->rmf_name, offset, lustre_msg_bufcount(msg), fmt->rf_name, lustre_msg_buflen(msg, offset), len, rcl_names[loc]); diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index 9d9e94c..12a9a5e 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -540,7 +540,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) lustre_msg_set_last_xid(request->rq_reqmsg, min_xid); DEBUG_REQ(D_RPCTRACE, request, - "Allocating new xid for resend on EINPROGRESS"); + "Allocating new XID for resend on EINPROGRESS"); } if (request->rq_bulk) { @@ -551,7 +551,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) if (list_empty(&request->rq_unreplied_list) || request->rq_xid <= imp->imp_known_replied_xid) { DEBUG_REQ(D_ERROR, request, - "xid: %llu, replied: %llu, list_empty:%d\n", + "xid=%llu, replied=%llu, list_empty=%d", request->rq_xid, imp->imp_known_replied_xid, list_empty(&request->rq_unreplied_list)); LBUG(); @@ -689,7 +689,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) ptlrpc_pinger_sending_on_import(imp); - DEBUG_REQ(D_INFO, request, "send flg=%x", + DEBUG_REQ(D_INFO, request, "send flags=%x", lustre_msg_get_flags(request->rq_reqmsg)); rc = ptl_send_buf(&request->rq_req_md_h, request->rq_reqbuf, request->rq_reqdata_len, diff --git a/fs/lustre/ptlrpc/ptlrpcd.c b/fs/lustre/ptlrpc/ptlrpcd.c index bcf1e46..1a1fa05 100644 --- a/fs/lustre/ptlrpc/ptlrpcd.c +++ b/fs/lustre/ptlrpc/ptlrpcd.c @@ -256,7 +256,7 @@ void ptlrpcd_add_req(struct ptlrpc_request *req) pc = ptlrpcd_select_pc(req); - DEBUG_REQ(D_INFO, req, "add req [%p] to pc [%s:%d]", + DEBUG_REQ(D_INFO, req, "add req [%p] to pc [%s+%d]", req, pc->pc_name, pc->pc_index); ptlrpc_set_add_new_req(pc, req); diff --git a/fs/lustre/ptlrpc/recover.c b/fs/lustre/ptlrpc/recover.c index e6e6661..09ea3b3 100644 --- a/fs/lustre/ptlrpc/recover.c +++ b/fs/lustre/ptlrpc/recover.c @@ -143,7 +143,7 @@ int ptlrpc_replay_next(struct obd_import *imp, int 
*inflight) * unreplied list. */ if (req && list_empty(&req->rq_unreplied_list)) { - DEBUG_REQ(D_HA, req, "resend_replay: %d, last_transno: %llu\n", + DEBUG_REQ(D_HA, req, "resend_replay=%d, last_transno=%llu", imp->imp_resend_replay, last_transno); ptlrpc_add_unreplied(req); imp->imp_known_replied_xid = ptlrpc_known_replied_xid(imp); diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c index d82809f..15667454 100644 --- a/fs/lustre/ptlrpc/sec.c +++ b/fs/lustre/ptlrpc/sec.c @@ -1151,7 +1151,7 @@ int sptlrpc_cli_unwrap_early_reply(struct ptlrpc_request *req, rc = do_cli_unwrap_reply(early_req); if (rc) { DEBUG_REQ(D_ADAPTTO, early_req, - "error %d unwrap early reply", rc); + "unwrap early reply: rc = %d", rc); goto err_ctx; } @@ -2037,18 +2037,21 @@ static int sptlrpc_svc_check_from(struct ptlrpc_request *req, int svc_rc) switch (req->rq_sp_from) { case LUSTRE_SP_CLI: if (req->rq_auth_usr_mdt || req->rq_auth_usr_ost) { + /* The below message is checked in sanity-sec test_33 */ DEBUG_REQ(D_ERROR, req, "faked source CLI"); svc_rc = SECSVC_DROP; } break; case LUSTRE_SP_MDT: if (!req->rq_auth_usr_mdt) { + /* The below message is checked in sanity-sec test_33 */ DEBUG_REQ(D_ERROR, req, "faked source MDT"); svc_rc = SECSVC_DROP; } break; case LUSTRE_SP_OST: if (!req->rq_auth_usr_ost) { + /* The below message is checked in sanity-sec test_33 */ DEBUG_REQ(D_ERROR, req, "faked source OST"); svc_rc = SECSVC_DROP; } @@ -2057,6 +2060,7 @@ static int sptlrpc_svc_check_from(struct ptlrpc_request *req, int svc_rc) case LUSTRE_SP_MGC: if (!req->rq_auth_usr_root && !req->rq_auth_usr_mdt && !req->rq_auth_usr_ost) { + /* The below message is checked in sanity-sec test_33 */ DEBUG_REQ(D_ERROR, req, "faked source MGC/MGS"); svc_rc = SECSVC_DROP; } diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index f40cb8d..c66c690 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1072,6 +1072,7 @@ static int ptlrpc_at_send_early_reply(struct 
ptlrpc_request *req) return 0; if (olddl < 0) { + /* below message is checked in replay-ost-single.sh test_9 */ DEBUG_REQ(D_WARNING, req, "Already past deadline (%+llds), not sending early reply. Consider increasing at_early_margin (%d)?", (s64)olddl, at_early_margin); @@ -1104,7 +1105,8 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) * we may be past adaptive_max */ if (req->rq_deadline >= newdl) { - DEBUG_REQ(D_WARNING, req, "Couldn't add any time (%ld/%lld), not sending early reply\n", + DEBUG_REQ(D_WARNING, req, + "Could not add any time (%ld/%lld), not sending early reply", olddl, newdl - ktime_get_real_seconds()); return -ETIMEDOUT; } @@ -1140,10 +1142,10 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) } LASSERT(atomic_read(&req->rq_refcount)); - /** if it is last refcount then early reply isn't needed */ + /* if it is last refcount then early reply isn't needed */ if (atomic_read(&req->rq_refcount) == 1) { DEBUG_REQ(D_ADAPTTO, reqcopy, - "Normal reply already sent out, abort sending early reply\n"); + "Normal reply already sent, abort early reply"); rc = -EINVAL; goto out; } @@ -1174,7 +1176,7 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req) req->rq_deadline = newdl; req->rq_early_count++; /* number sent, server side */ } else { - DEBUG_REQ(D_ERROR, req, "Early reply send failed %d", rc); + DEBUG_REQ(D_ERROR, req, "Early reply send failed: rc = %d", rc); } /* @@ -1628,7 +1630,7 @@ static int ptlrpc_server_handle_req_in(struct ptlrpc_service_part *svcpt, rc = sptlrpc_target_export_check(req->rq_export, req); if (rc) DEBUG_REQ(D_ERROR, req, - "DROPPING req with illegal security flavor,"); + "DROPPING req with illegal security flavor"); } if (rc) @@ -1747,7 +1749,7 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, */ if (ktime_get_real_seconds() > request->rq_deadline) { DEBUG_REQ(D_ERROR, request, - "Dropping timed-out request from %s: deadline %lld:%llds ago\n", + 
"Dropping timed-out request from %s: deadline %lld/%llds ago", libcfs_id2str(request->rq_peer), request->rq_deadline - request->rq_arrival_time.tv_sec, @@ -1787,7 +1789,7 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, put_conn: if (unlikely(ktime_get_real_seconds() > request->rq_deadline)) { DEBUG_REQ(D_WARNING, request, - "Request took longer than estimated (%lld:%llds); client may timeout.", + "Request took longer than estimated (%lld/%llds); client may timeout", (s64)request->rq_deadline - request->rq_arrival_time.tv_sec, (s64)ktime_get_real_seconds() - request->rq_deadline); @@ -2061,12 +2063,14 @@ static void ptlrpc_watchdog_fire(struct work_struct *w) u32 ms_frac = do_div(ms_lapse, MSEC_PER_SEC); if (!__ratelimit(&watchdog_limit)) { + /* below message is checked in sanity-quota.sh test_6,18 */ LCONSOLE_WARN("%s: service thread pid %u was inactive for %llu.%.03u seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:\n", thread->t_task->comm, thread->t_task->pid, ms_lapse, ms_frac); libcfs_debug_dumpstack(thread->t_task); } else { + /* below message is checked in sanity-quota.sh test_6,18 */ LCONSOLE_WARN("%s: service thread pid %u was inactive for %llu.%.03u seconds. 
Watchdog stack traces are limited to 3 per %u seconds, skipping this one.\n", thread->t_task->comm, thread->t_task->pid, ms_lapse, ms_frac, libcfs_watchdog_ratelimit); diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c index 2e803d6..20d4302 100644 --- a/net/lnet/libcfs/module.c +++ b/net/lnet/libcfs/module.c @@ -791,6 +791,7 @@ static void libcfs_exit(void) cfs_cpu_fini(); + /* the below message is checked in test-framework.sh check_mem_leak() */ rc = libcfs_debug_cleanup(); if (rc) pr_err("LustreError: libcfs_debug_cleanup: %d\n", rc); From patchwork Thu Feb 27 21:15:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410553 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 08A3217E0 for ; Thu, 27 Feb 2020 21:41:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E590124690 for ; Thu, 27 Feb 2020 21:41:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E590124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D1B5134A8D9; Thu, 27 Feb 2020 13:33:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 45FDC21CBAC for ; Thu, 27 Feb 2020 13:20:39 -0800 (PST) 
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 960408F3D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 94BE746A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:20 -0500 Message-Id: <1582838290-17243-453-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 452/622] lustre: ptlrpc: check buffer length in lustre_msg_string() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Emoly Liu Check the buffer length in lustre_msg_string() to guard against invalid access.
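The kind of check described above can be sketched in user space as follows. This is only an illustration under stated assumptions: demo_msg_string() and DEMO_MAX_BUFLEN are hypothetical names, the cap value is arbitrary, and the handling of a non-zero max_len is assumed rather than taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound, standing in for PTLRPC_MAX_BUFLEN. */
#define DEMO_MAX_BUFLEN 4096u

/*
 * Return buf if it holds a NUL-terminated string that fits the stated
 * buffer length, or NULL if the length is implausible or the string is
 * not terminated inside the buffer.
 */
static const char *demo_msg_string(const char *buf, unsigned int blen,
				   unsigned int max_len)
{
	const char *nul;
	size_t slen;

	/* Reject empty or oversized lengths before touching the data. */
	if (!buf || blen == 0 || blen > DEMO_MAX_BUFLEN)
		return NULL;

	/* The terminator must lie within the first blen bytes. */
	nul = memchr(buf, '\0', blen);
	if (!nul)
		return NULL;
	slen = (size_t)(nul - buf);

	if (max_len == 0) {
		/* Exact fit: the string must fill the buffer. */
		if (slen != blen - 1)
			return NULL;
	} else if (slen > max_len) {
		/* Assumed behaviour for a caller-supplied limit. */
		return NULL;
	}
	return buf;
}
```

The length is validated first so that a corrupt or hostile length field is rejected before any byte of the buffer is read.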
Reported-by: Alibaba Cloud WC-bug-id: https://jira.whamcloud.com/browse/LU-12613 Lustre-commit: 728c58d60fae ("LU-12613 ptlrpc: check buffer length in lustre_msg_string()") Signed-off-by: Emoly Liu Reviewed-on: https://review.whamcloud.com/35932 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Yunye Ry Signed-off-by: James Simmons --- fs/lustre/ptlrpc/pack_generic.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 4a0856a..9b28624 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -712,6 +712,11 @@ char *lustre_msg_string(struct lustre_msg *m, u32 index, u32 max_len) m, index, blen); return NULL; } + if (blen > PTLRPC_MAX_BUFLEN) { + CERROR("buffer length of msg %p buffer[%d] is invalid(%d)\n", + m, index, blen); + return NULL; + } if (max_len == 0) { if (slen != blen - 1) { From patchwork Thu Feb 27 21:15:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410557 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 19D6017E0 for ; Thu, 27 Feb 2020 21:41:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0295024690 for ; Thu, 27 Feb 2020 21:41:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0295024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 6AB8234A908; Thu, 27 Feb 2020 13:33:30 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88E6421CBAC for ; Thu, 27 Feb 2020 13:20:39 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 98F8F8F3E; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9799746C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:21 -0500 Message-Id: <1582838290-17243-454-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 453/622] lustre: uapi: fix building fail against Power9 little endian X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Gu Zheng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Gu Zheng We use "%ll[dux]" as the input/output conversion for __u64 variables. This can cause build errors on architectures that use "long" for 64-bit types, for example Power9 little endian. Add the necessary casts (long long/unsigned long long) so that the build succeeds.
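The effect of the cast can be shown with a small user-space sketch. It is not code from the patch: struct demo_extent, DEMO_DEXT, DEMO_PEXT, and format_extent() are hypothetical names that mirror lu_extent, DEXT, and PEXT, and uint64_t stands in for __u64.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct lu_extent; uint64_t plays __u64. */
struct demo_extent {
	uint64_t e_start;
	uint64_t e_end;
};

#define DEMO_DEXT "[%#llx, %#llx)"

/*
 * Cast each field explicitly so the arguments always match %llx,
 * whether the 64-bit type is "long" or "long long" on this platform.
 */
#define DEMO_PEXT(ext) \
	(unsigned long long)(ext)->e_start, (unsigned long long)(ext)->e_end

/* Format an extent into buf; returns what snprintf() returns. */
static int format_extent(char *buf, size_t len, const struct demo_extent *ext)
{
	return snprintf(buf, len, DEMO_DEXT, DEMO_PEXT(ext));
}
```

For an extent with e_start = 0x1000 and e_end = 0x2000, format_extent() writes "[0x1000, 0x2000)". Without the casts, a compiler on a platform where the 64-bit type is "unsigned long" would typically warn about a format mismatch, which becomes a build failure when warnings are treated as errors.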
WC-bug-id: https://jira.whamcloud.com/browse/LU-12705 Lustre-commit: 4eddf36ac360 ("LU-12705 build: fix building fail against Power9 little endian") Signed-off-by: Gu Zheng Reviewed-on: https://review.whamcloud.com/36007 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 3016b73..695ceb2 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -512,7 +512,7 @@ struct lu_extent { }; #define DEXT "[%#llx, %#llx)" -#define PEXT(ext) (ext)->e_start, (ext)->e_end +#define PEXT(ext) (unsigned long long)(ext)->e_start, (unsigned long long)(ext)->e_end static inline bool lu_extent_is_overlapped(struct lu_extent *e1, struct lu_extent *e2) From patchwork Thu Feb 27 21:15:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410445 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C7C9517E0 for ; Thu, 27 Feb 2020 21:38:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B08B224690 for ; Thu, 27 Feb 2020 21:38:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B08B224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
E49C334A410; Thu, 27 Feb 2020 13:31:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C99C821CBAC for ; Thu, 27 Feb 2020 13:20:39 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9BB018F3F; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9A60F46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:22 -0500 Message-Id: <1582838290-17243-455-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 454/622] lustre: ptlrpc: fix reply buffers shrinking and growing X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin req_capsule_shrink() does not update the capsule itself with the new buffer lengths after shrinking. Usually that is not needed because the reply is already packed. But if the reply buffers are re-allocated by req_capsule_server_grow(), the stale lengths from the capsule are used, producing a bigger reply message. That may force the client to re-allocate its buffer and resend. 
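The failure mode described above can be sketched outside the kernel. The following is only an illustrative toy model, not the ptlrpc code: struct toy_capsule, toy_shrink() and toy_grow() are hypothetical names, and the assumption is simply that a grow re-packs every field from the sizes recorded in the capsule.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FIELDS 3

/* Hypothetical stand-in for a capsule and its packed reply message. */
struct toy_capsule {
	unsigned int recorded[TOY_FIELDS]; /* sizes the capsule remembers */
	unsigned int packed[TOY_FIELDS];   /* sizes actually in the message */
};

/* Total message length for a given set of field sizes. */
static unsigned int toy_sum(const unsigned int *lens)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < TOY_FIELDS; i++)
		total += lens[i];
	return total;
}

/*
 * Shrink one packed field. The recorded size is synced only when
 * update_capsule is set, which models the behaviour the patch adds.
 */
static unsigned int toy_shrink(struct toy_capsule *c, size_t field,
			       unsigned int newlen, int update_capsule)
{
	assert(field < TOY_FIELDS);
	assert(newlen <= c->packed[field]);
	c->packed[field] = newlen;
	if (update_capsule)
		c->recorded[field] = newlen;
	return toy_sum(c->packed);
}

/* Grow one field, re-packing every field from the recorded sizes. */
static unsigned int toy_grow(struct toy_capsule *c, size_t field,
			     unsigned int newlen)
{
	size_t i;

	assert(field < TOY_FIELDS);
	assert(newlen >= c->recorded[field]);
	c->recorded[field] = newlen;
	for (i = 0; i < TOY_FIELDS; i++)
		c->packed[i] = c->recorded[i];
	return toy_sum(c->packed);
}
```

With fields of 64, 4096 and 128 bytes, shrinking the middle field to 512 and then growing the last one to 256 yields a 4416-byte message when the capsule is not updated, and an 832-byte message when it is.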
The patch does the following: - update the capsule length after shrinking - introduce lustre_grow_msg() to grow a msg field in place - update req_capsule_server_grow() to use the generic lustre_grow_msg(), so that it can grow the reply without re-allocation if the reply buffer is already big enough - update sanity test 271f to use a bigger file size, exceeding the current maximum reply buffer size allocated on the client. WC-bug-id: https://jira.whamcloud.com/browse/LU-12443 Lustre-commit: cedbb25e984c ("LU-12443 ptlrpc: fix reply buffers shrinking and growing") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/35243 Reviewed-by: Sebastien Buisson Reviewed-by: Alex Zhuravlev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/layout.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 67a7cd5..dd04eee 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -2309,11 +2309,16 @@ void req_capsule_shrink(struct req_capsule *pill, LASSERTF(newlen <= len, "%s:%s, oldlen=%u, newlen=%u\n", fmt->rf_name, field->rmf_name, len, newlen); - if (loc == RCL_CLIENT) + if (loc == RCL_CLIENT) { pill->rc_req->rq_reqlen = lustre_shrink_msg(msg, offset, newlen, 1); - else + } else { pill->rc_req->rq_replen = lustre_shrink_msg(msg, offset, newlen, 1); + /* update also field size in reply lenghts arrays for possible + * reply re-pack due to req_capsule_server_grow() call. 
+ */ + req_capsule_set_size(pill, field, loc, newlen); + } } EXPORT_SYMBOL(req_capsule_shrink); From patchwork Thu Feb 27 21:15:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410449 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B267692A for ; Thu, 27 Feb 2020 21:38:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 98C0F24690 for ; Thu, 27 Feb 2020 21:38:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 98C0F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 45C1D34A43D; Thu, 27 Feb 2020 13:31:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 188AB21CBAC for ; Thu, 27 Feb 2020 13:20:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9E79A9160; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9D2A147C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:23 -0500 Message-Id: <1582838290-17243-456-git-send-email-jsimmons@infradead.org> X-Mailer: 
git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 455/622] lustre: dom: manual OST-to-DOM migration via mirroring X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Allow DOM mirroring: update the LOV/LOD code to check not just the first component for the DOM pattern but to cycle through all mirrors, if any. Sanity checks allow one DOM component in a mirror, and it must be the first one. Multiple DOM components are allowed only with the same size for now. Do OST file migration to MDT by using FLR. That can't be done by layout swapping, because the MDT data would be tied to a temporary volatile file, but we want to keep the data with the original file. Mirroring allows that with the following steps: - extend layout with new mirror on MDT, no data is copied but new mirror stays in 'stale' state. The reason is the same problem with volatile file. - resync mirrors, now new DOM layout is filled with data. 
- remove first mirror WC-bug-id: https://jira.whamcloud.com/browse/LU-11421 Lustre-commit: 44a721b8c106 ("LU-11421 dom: manual OST-to-DOM migration via mirroring") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/35359 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_object.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 52d8c30..5c4d8f9 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -543,7 +543,13 @@ static int lov_init_dom(const struct lu_env *env, struct lov_device *dev, u32 idx = 0; int rc; - LASSERT(index == 0); + /* DOM entry may be not zero index due to FLR but must start from 0 */ + if (unlikely(lle->lle_extent->e_start != 0)) { + CERROR("%s: DOM entry must be the first stripe in a mirror\n", + lov2obd(dev->ld_lov)->obd_name); + dump_lsm(D_ERROR, lov->lo_lsm); + return -EINVAL; + } /* find proper MDS device */ rc = lov_fld_lookup(dev, fid, &idx); @@ -636,6 +642,7 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev, int result = 0; unsigned int seq; int i, j; + bool dom_size = 0; LASSERT(lsm->lsm_entry_count > 0); LASSERT(!lov->lo_lsm); @@ -679,6 +686,18 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev, lle->lle_comp_ops = &raid0_ops; break; case LOV_PATTERN_MDT: + /* Allowed to have several DOM stripes in different + * mirrors with the same DoM size. 
+ */ + if (!dom_size) { + dom_size = lle->lle_lsme->lsme_extent.e_end; + } else if (dom_size != + lle->lle_lsme->lsme_extent.e_end) { + CERROR("%s: DOM entries with different sizes\n", + lov2obd(dev->ld_lov)->obd_name); + dump_lsm(D_ERROR, lsm); + return -EINVAL; + } lle->lle_comp_ops = &dom_ops; break; default: @@ -869,7 +888,8 @@ static void lov_fini_composite(const struct lu_env *env, struct lov_layout_entry *entry; lov_foreach_layout_entry(lov, entry) - entry->lle_comp_ops->lco_fini(env, entry); + if (entry->lle_comp_ops) + entry->lle_comp_ops->lco_fini(env, entry); kvfree(comp->lo_entries); comp->lo_entries = NULL; From patchwork Thu Feb 27 21:15:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410561 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 626A917E0 for ; Thu, 27 Feb 2020 21:41:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4B18F24690 for ; Thu, 27 Feb 2020 21:41:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4B18F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0F72B34A930; Thu, 27 Feb 2020 13:33:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id 6EF0621FF8C for ; Thu, 27 Feb 2020 13:20:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A0DBA9161; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9FE19468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:24 -0500 Message-Id: <1582838290-17243-457-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 456/622] lustre: fld: remove fci_no_shrink field. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is never set, so is always zero. Remove it, and the one place where it is tested. WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: e669586775c6 ("LU-6142 fld: remove fci_no_shrink field.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35875 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/fld/fld_cache.c | 3 +-- fs/lustre/fld/fld_internal.h | 1 - 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/lustre/fld/fld_cache.c b/fs/lustre/fld/fld_cache.c index 5267ba2..79b10bb 100644 --- a/fs/lustre/fld/fld_cache.c +++ b/fs/lustre/fld/fld_cache.c @@ -381,8 +381,7 @@ static int fld_cache_insert_nolock(struct fld_cache *cache, * insertion loop. 
*/ - if (!cache->fci_no_shrink) - fld_cache_shrink(cache); + fld_cache_shrink(cache); head = &cache->fci_entries_head; diff --git a/fs/lustre/fld/fld_internal.h b/fs/lustre/fld/fld_internal.h index 465c6ccf..53648d2 100644 --- a/fs/lustre/fld/fld_internal.h +++ b/fs/lustre/fld/fld_internal.h @@ -109,7 +109,6 @@ struct fld_cache { /** Cache name used for debug and messages. */ char fci_name[LUSTRE_MDT_MAXNAMELEN]; - unsigned int fci_no_shrink:1; }; enum { From patchwork Thu Feb 27 21:15:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410755 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9BD1C924 for ; Thu, 27 Feb 2020 21:46:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 848DD24690 for ; Thu, 27 Feb 2020 21:46:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 848DD24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C205C34B144; Thu, 27 Feb 2020 13:36:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B120F21FF93 for ; Thu, 27 Feb 2020 13:20:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) 
with ESMTP id A43E19162; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A2E1B46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:25 -0500 Message-Id: <1582838290-17243-458-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 457/622] lustre: lustre: remove ldt_obd_type field of lu_device_type X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is never set, so it is always NULL. Remove it, the one place where it is used, and a variable that will now never be set. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 5274e833f5e6 ("LU-6142 lustre: remove ldt_obd_type field of lu_device_type") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35876 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 5 +---- fs/lustre/obdclass/lu_object.c | 6 ------ 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index b00fad8..aed0d4b 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -43,6 +43,7 @@ struct seq_file; struct lustre_cfg; struct lprocfs_stats; +struct obd_type; /** \defgroup lu lu * lu_* data-types represent server-side entities shared by data and meta-data @@ -319,10 +320,6 @@ struct lu_device_type { */ const struct lu_device_type_operations *ldt_ops; /** - * \todo XXX: temporary pointer to associated obd_type. - */ - struct obd_type *ldt_obd_type; - /** * \todo XXX: temporary: context tags used by obd_*() calls. 
*/ u32 ldt_ctx_tags; diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index dccff91..38c04c7 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -1336,14 +1336,8 @@ void lu_stack_fini(const struct lu_env *env, struct lu_device *top) for (scan = top; scan; scan = next) { const struct lu_device_type *ldt = scan->ld_type; - struct obd_type *type; next = ldt->ldt_ops->ldto_device_free(env, scan); - type = ldt->ldt_obd_type; - if (type) { - atomic_dec(&type->typ_refcnt); - class_put_type(type); - } } } From patchwork Thu Feb 27 21:15:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410563 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A806D17E0 for ; Thu, 27 Feb 2020 21:41:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 90A68246A1 for ; Thu, 27 Feb 2020 21:41:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 90A68246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B2E8934A959; Thu, 27 Feb 2020 13:33:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F296221FF93 for ; Thu, 27 Feb 2020 13:20:40 -0800 
(PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A6DAA9163; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A5A8E46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:26 -0500 Message-Id: <1582838290-17243-459-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 458/622] lustre: lustre: remove imp_no_timeout field X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is never set and never used. Remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: b9dd17681bfa ("LU-6142 lustre: remove imp_no_timeout field") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35877 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index ff171d1..c2f98e6 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -273,8 +273,7 @@ struct obd_import { spinlock_t imp_lock; /* flags */ - unsigned long imp_no_timeout:1, /* timeouts are disabled */ - imp_invalid:1, /* evicted */ + unsigned long imp_invalid:1, /* evicted */ /* administratively disabled */ imp_deactive:1, /* try to recover the import */ From patchwork Thu Feb 27 21:15:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410569 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AC3E0138D for ; Thu, 27 Feb 2020 21:41:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 947B424690 for ; Thu, 27 Feb 2020 21:41:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 947B424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3C6AF34A98B; 
Thu, 27 Feb 2020 13:33:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 41B2A21FF9E for ; Thu, 27 Feb 2020 13:20:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A9C919164; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A856446D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:27 -0500 Message-Id: <1582838290-17243-460-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 459/622] lustre: llog: remove olg_cat_processing field. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This mutex is initialized but never used. Remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 2801ef81f1d0 ("LU-6142 llog: remove olg_cat_processing field.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35878 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Arshad Hussain Reviewed-by: Mike Pershin Signed-off-by: James Simmons --- fs/lustre/include/lustre_log.h | 1 - fs/lustre/include/obd.h | 1 - 2 files changed, 2 deletions(-) diff --git a/fs/lustre/include/lustre_log.h b/fs/lustre/include/lustre_log.h index 99c6305..9c784ac 100644 --- a/fs/lustre/include/lustre_log.h +++ b/fs/lustre/include/lustre_log.h @@ -288,7 +288,6 @@ static inline void llog_group_init(struct obd_llog_group *olg) { init_waitqueue_head(&olg->olg_waitq); spin_lock_init(&olg->olg_lock); - mutex_init(&olg->olg_cat_processing); } static inline int llog_group_set_ctxt(struct obd_llog_group *olg, diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 70dbaaf..ef37f78 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -527,7 +527,6 @@ struct obd_llog_group { struct llog_ctxt *olg_ctxts[LLOG_MAX_CTXTS]; wait_queue_head_t olg_waitq; spinlock_t olg_lock; - struct mutex olg_cat_processing; }; /* corresponds to one of the obd's */ From patchwork Thu Feb 27 21:15:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410571 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F225924 for ; Thu, 27 Feb 2020 21:41:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 86E85246A1 for ; Thu, 27 Feb 2020 21:41:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter 
v1.3.2 mail.kernel.org 86E85246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB54434A9B7; Thu, 27 Feb 2020 13:33:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 832E921FFA3 for ; Thu, 27 Feb 2020 13:20:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AD34A9165; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AB25647C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:28 -0500 Message-Id: <1582838290-17243-461-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 460/622] lustre: ptlrpc: remove struct ptlrpc_bulk_page X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This structure is never used, so remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 2b9bf4c00bce ("LU-6142 ptlrpc: remove struct ptlrpc_bulk_page") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35879 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index f16c6d3..faf15e9 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1146,22 +1146,6 @@ void _debug_req(struct ptlrpc_request *req, } while (0) /** @} */ -/** - * Structure that defines a single page of a bulk transfer - */ -struct ptlrpc_bulk_page { - /** Linkage to list of pages in a bulk */ - struct list_head bp_link; - /** - * Number of bytes in a page to transfer starting from @bp_pageoffset - */ - int bp_buflen; - /** offset within a page */ - int bp_pageoffset; - /** The page itself */ - struct page *bp_page; -}; - enum ptlrpc_bulk_op_type { PTLRPC_BULK_OP_ACTIVE = 0x00000001, PTLRPC_BULK_OP_PASSIVE = 0x00000002, From patchwork Thu Feb 27 21:15:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410575 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E2CF138D for ; Thu, 27 Feb 2020 21:41:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1589324690 for ; Thu, 27 Feb 2020 21:41:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1589324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 82F4F34A9E4; Thu, 27 Feb 2020 13:33:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C463221FFA3 for ; Thu, 27 Feb 2020 13:20:41 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AF66E9166; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ADF41468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:29 -0500 Message-Id: <1582838290-17243-462-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 461/622] lustre: ptlrpc: remove bd_import_generation field. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is set, but never accessed. So remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 531bbc669d66 ("LU-6142 ptlrpc: remove bd_import_generation field.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35880 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 2 -- fs/lustre/ptlrpc/client.c | 1 - 2 files changed, 3 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index faf15e9..bec92cf 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1251,8 +1251,6 @@ struct ptlrpc_bulk_desc { unsigned long bd_registered:1; /** For serialization with callback */ spinlock_t bd_lock; - /** Import generation when request for this bulk was sent */ - int bd_import_generation; /** {put,get}{source,sink}{kvec,kiov} */ enum ptlrpc_bulk_op_type bd_type; /** LNet portal for this bulk */ diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index d2e5e04..478ba85 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -210,7 +210,6 @@ struct ptlrpc_bulk_desc *ptlrpc_prep_bulk_imp(struct ptlrpc_request *req, if (!desc) return NULL; - desc->bd_import_generation = req->rq_import_generation; desc->bd_import = class_import_get(imp); desc->bd_req = req; From patchwork Thu Feb 27 21:15:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410453 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 41C3192A for ; Thu, 27 Feb 2020 21:38:52 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org 
(Postfix) with ESMTPS id 2A57124690 for ; Thu, 27 Feb 2020 21:38:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2A57124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 48BBA34A46C; Thu, 27 Feb 2020 13:31:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 14D5721FFA9 for ; Thu, 27 Feb 2020 13:20:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B22919167; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B0E9846A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:30 -0500 Message-Id: <1582838290-17243-463-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 462/622] lustre: ptlrpc: remove srv_threads from struct ptlrpc_service X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown The threads are not stored here - nothing is. Threads are stored in svcpt->scp_threads. 
So remove the field and update the comment. WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 6d1062cdffca ("LU-6142 ptlrpc: remove srv_threads from struct ptlrpc_service") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35881 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Arshad Hussain Reviewed-by: Shaun Tancheff Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index bec92cf..68db603 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1314,7 +1314,7 @@ enum { */ struct ptlrpc_thread { /** - * List of active threads in svc->srv_threads + * List of active threads in svcpt->scp_threads */ struct list_head t_link; /** @@ -1474,8 +1474,6 @@ struct ptlrpc_service { char *srv_name; /** only statically allocated strings here; we don't clean them */ char *srv_thread_name; - /** service thread list */ - struct list_head srv_threads; /** threads # should be created for each partition on initializing */ int srv_nthrs_cpt_init; /** limit of threads number for each partition */ From patchwork Thu Feb 27 21:15:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410579 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A8D4138D for ; Thu, 27 Feb 2020 21:41:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8222724690 for ; Thu, 27 Feb 2020 21:41:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8222724690 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1B13D349DC0; Thu, 27 Feb 2020 13:33:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5A6B121FCC1 for ; Thu, 27 Feb 2020 13:20:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B53DB9168; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B3A8546C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:31 -0500 Message-Id: <1582838290-17243-464-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 463/622] lustre: ptlrpc: remove scp_nthrs_stopping field. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is unused, so remove it. If "shrinking threads" is ever needed, any extra fields required can be added then. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 7233248e565f ("LU-6142 ptlrpc: remove scp_nthrs_stopping field.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35882 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 68db603..aaf5cb8 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1557,8 +1557,6 @@ struct ptlrpc_service_part { int scp_thr_nextid; /** # of starting threads */ int scp_nthrs_starting; - /** # of stopping threads, reserved for shrinking threads */ - int scp_nthrs_stopping; /** # running threads */ int scp_nthrs_running; /** service threads list */ From patchwork Thu Feb 27 21:15:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410583 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C6C95924 for ; Thu, 27 Feb 2020 21:41:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AF68424690 for ; Thu, 27 Feb 2020 21:41:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AF68424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A700434AA37; Thu, 27 Feb 2020 13:33:55 
-0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9DBD521FCC1 for ; Thu, 27 Feb 2020 13:20:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B82059169; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B67ED46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:32 -0500 Message-Id: <1582838290-17243-465-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 464/622] lustre: ldlm: remove unused ldlm_server_conn X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is never set or used, so remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 047a6185a1ed ("LU-6142 ldlm: remove unused ldlm_server_conn") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35883 Reviewed-by: Andreas Dilger Reviewed-by: Arshad Hussain Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_internal.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index 4844a9b..336d9b7 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -197,7 +197,6 @@ struct ldlm_state { struct ptlrpc_service *ldlm_cb_service; struct ptlrpc_service *ldlm_cancel_service; struct ptlrpc_client *ldlm_client; - struct ptlrpc_connection *ldlm_server_conn; struct ldlm_bl_pool *ldlm_bl_pool; }; From patchwork Thu Feb 27 21:15:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410457 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A204717E0 for ; Thu, 27 Feb 2020 21:38:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 83B9324690 for ; Thu, 27 Feb 2020 21:38:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 83B9324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2048A34A497; Thu, 27 Feb 2020 13:31:48 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E128F21FFB5 for ; Thu, 27 Feb 2020 13:20:42 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BA5A5916A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B92B647C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:33 -0500 Message-Id: <1582838290-17243-466-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 465/622] lustre: llite: remove lli_readdir_mutex X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This mutex is initialized but never used, so remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 26bf41c177a5 ("LU-6142 llite: remove lli_readdir_mutex") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35884 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 3 --- fs/lustre/llite/llite_lib.c | 1 - 2 files changed, 4 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 232fb0a..77854a5 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -143,9 +143,6 @@ struct ll_inode_info { union { /* for directory */ struct { - /* serialize normal readdir and statahead-readdir. */ - struct mutex lli_readdir_mutex; - /* metadata statahead */ /* since parent-child threads can share the same @file * struct, "opendir_key" is the token when dir close for diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 217268e..7d83ee3 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -960,7 +960,6 @@ void ll_lli_init(struct ll_inode_info *lli) LASSERT(lli->lli_vfs_inode.i_mode != 0); if (S_ISDIR(lli->lli_vfs_inode.i_mode)) { - mutex_init(&lli->lli_readdir_mutex); lli->lli_opendir_key = NULL; lli->lli_sai = NULL; spin_lock_init(&lli->lli_sa_lock); From patchwork Thu Feb 27 21:15:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410587 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 94B01138D for ; Thu, 27 Feb 2020 21:41:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.kernel.org (Postfix) with ESMTPS id 7D29D24690 for ; Thu, 27 Feb 2020 21:41:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7D29D24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 549CF21F637; Thu, 27 Feb 2020 13:33:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 30EB021FFBB for ; Thu, 27 Feb 2020 13:20:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BD73F916B; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BBDDD468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:34 -0500 Message-Id: <1582838290-17243-467-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 466/622] lustre: llite: remove ll_umounting field X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is set but never accessed, so remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 15b83c9b7b28 ("LU-6142 llite: remove ll_umounting field") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35885 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Patrick Farrell Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 3 +-- fs/lustre/llite/llite_lib.c | 1 - 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 77854a5..6186720 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -505,8 +505,7 @@ struct ll_sb_info { struct lu_fid ll_root_fid; /* root object fid */ int ll_flags; - unsigned int ll_umounting:1, - ll_xattr_cache_enabled:1, + unsigned int ll_xattr_cache_enabled:1, ll_xattr_cache_set:1, /* already set to 0/1 */ ll_client_common_fill_super_succeeded:1, ll_checksum_set:1; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 7d83ee3..ad7c2e2 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -785,7 +785,6 @@ void ll_kill_super(struct super_block *sb) */ if (sbi) { sb->s_dev = sbi->ll_sdev_orig; - sbi->ll_umounting = 1; /* wait running statahead threads to quit */ while (atomic_read(&sbi->ll_sa_running) > 0) From patchwork Thu Feb 27 21:15:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410593 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CDD42138D for ; Thu, 27 Feb 2020 21:41:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) 
with ESMTPS id B528824690 for ; Thu, 27 Feb 2020 21:41:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B528824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1889A349D59; Thu, 27 Feb 2020 13:34:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 72DFA21FFB5 for ; Thu, 27 Feb 2020 13:20:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C0381916C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BE9CC46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:35 -0500 Message-Id: <1582838290-17243-468-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 467/622] lustre: llite: align field names in ll_sb_info X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Align field names and most comments in struct ll_sb_info. 
Signed-off-by: NeilBrown Reviewed-by: James Simmons --- fs/lustre/llite/llite_internal.h | 74 ++++++++++++++++++++-------------------- 1 file changed, 37 insertions(+), 37 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 6186720..bb5f519 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -493,26 +493,26 @@ struct ll_sb_info { /* this protects pglist and ra_info. It isn't safe to * grab from interrupt contexts */ - spinlock_t ll_lock; - spinlock_t ll_pp_extent_lock; /* pp_extent entry*/ - spinlock_t ll_process_lock; /* ll_rw_process_info */ - struct obd_uuid ll_sb_uuid; + spinlock_t ll_lock; + spinlock_t ll_pp_extent_lock; /* pp_extent entry*/ + spinlock_t ll_process_lock; /* ll_rw_process_info */ + struct obd_uuid ll_sb_uuid; struct obd_export *ll_md_exp; struct obd_export *ll_dt_exp; struct obd_device *ll_md_obd; struct obd_device *ll_dt_obd; struct dentry *ll_debugfs_entry; - struct lu_fid ll_root_fid; /* root object fid */ + struct lu_fid ll_root_fid; /* root object fid */ - int ll_flags; - unsigned int ll_xattr_cache_enabled:1, + int ll_flags; + unsigned int ll_xattr_cache_enabled:1, ll_xattr_cache_set:1, /* already set to 0/1 */ - ll_client_common_fill_super_succeeded:1, - ll_checksum_set:1; + ll_client_common_fill_super_succeeded:1, + ll_checksum_set:1; - struct lustre_client_ocd ll_lco; + struct lustre_client_ocd ll_lco; - struct lprocfs_stats *ll_stats; /* lprocfs stats counter */ + struct lprocfs_stats *ll_stats; /* lprocfs stats counter */ /* * Used to track "unstable" pages on a client, and maintain a @@ -520,58 +520,58 @@ struct ll_sb_info { * any page which is sent to a server as part of a bulk request, * but is uncommitted to stable storage. 
*/ - struct cl_client_cache *ll_cache; + struct cl_client_cache *ll_cache; - struct lprocfs_stats *ll_ra_stats; + struct lprocfs_stats *ll_ra_stats; - struct ll_ra_info ll_ra_info; - unsigned int ll_namelen; + struct ll_ra_info ll_ra_info; + unsigned int ll_namelen; const struct file_operations *ll_fop; - struct lu_site *ll_site; - struct cl_device *ll_cl; + struct lu_site *ll_site; + struct cl_device *ll_cl; /* Statistics */ struct ll_rw_extents_info ll_rw_extents_info; - int ll_extent_process_count; + int ll_extent_process_count; struct ll_rw_process_info ll_rw_process_info[LL_PROCESS_HIST_MAX]; - unsigned int ll_offset_process_count; + unsigned int ll_offset_process_count; struct ll_rw_process_info ll_rw_offset_info[LL_OFFSET_HIST_MAX]; - unsigned int ll_rw_offset_entry_count; - int ll_stats_track_id; - enum stats_track_type ll_stats_track_type; - int ll_rw_stats_on; + unsigned int ll_rw_offset_entry_count; + int ll_stats_track_id; + enum stats_track_type ll_stats_track_type; + int ll_rw_stats_on; /* metadata stat-ahead */ unsigned int ll_sa_running_max; /* max concurrent * statahead instances */ - unsigned int ll_sa_max; /* max statahead RPCs */ - atomic_t ll_sa_total; /* statahead thread started - * count - */ - atomic_t ll_sa_wrong; /* statahead thread stopped for - * low hit ratio - */ + unsigned int ll_sa_max; /* max statahead RPCs */ + atomic_t ll_sa_total; /* statahead thread started + * count + */ + atomic_t ll_sa_wrong; /* statahead thread stopped for + * low hit ratio + */ atomic_t ll_sa_running; /* running statahead thread * count */ - atomic_t ll_agl_total; /* AGL thread started count */ + atomic_t ll_agl_total; /* AGL thread started count */ - dev_t ll_sdev_orig; /* save s_dev before assign for + dev_t ll_sdev_orig; /* save s_dev before assign for * clustered nfs */ /* root squash */ - struct root_squash_info ll_squash; - struct path ll_mnt; + struct root_squash_info ll_squash; + struct path ll_mnt; /* st_blksize returned by stat(2), when non-zero 
*/ - unsigned int ll_stat_blksize; + unsigned int ll_stat_blksize; /* maximum relative age of cached statfs results */ - unsigned int ll_statfs_max_age; + unsigned int ll_statfs_max_age; struct kset ll_kset; /* sysfs object */ - struct completion ll_kobj_unregister; + struct completion ll_kobj_unregister; /* File heat */ unsigned int ll_heat_decay_weight; From patchwork Thu Feb 27 21:15:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410595 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E8044138D for ; Thu, 27 Feb 2020 21:42:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CE5EC24690 for ; Thu, 27 Feb 2020 21:42:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE5EC24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D7F0534AAB7; Thu, 27 Feb 2020 13:34:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C8B8C21FFB5 for ; Thu, 27 Feb 2020 13:20:43 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C273D916D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id C169A46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:36 -0500 Message-Id: <1582838290-17243-469-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 468/622] lustre: llite: remove lti_iter field X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is never used, so remove it. WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 0140f50c1287 ("LU-6142 llite: remove lti_iter field") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35886 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Patrick Farrell Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index bb5f519..025d33e 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -1033,7 +1033,6 @@ struct ll_cl_context { }; struct ll_thread_info { - struct iov_iter lti_iter; struct vvp_io_args lti_args; struct ra_io_arg lti_ria; struct ll_cl_context lti_io_ctx; From patchwork Thu Feb 27 21:15:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410463 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECF1B17E0 for ; Thu, 27 Feb 2020 21:39:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D5EA524690 for ; Thu, 27 Feb 2020 21:39:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D5EA524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A016034A4CB; Thu, 27 Feb 2020 13:31:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1806A21FF03 for ; Thu, 27 Feb 2020 13:20:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C5AE1916E; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C42D446D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:37 -0500 Message-Id: <1582838290-17243-470-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 469/622] lustre: llite: remove ft_mtime field X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is set but never accessed, so remove it. WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: b674c418fa04 ("LU-6142 llite: remove ft_mtime field") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35887 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Patrick Farrell Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/llite/vvp_internal.h | 5 ----- fs/lustre/llite/vvp_io.c | 1 - 2 files changed, 6 deletions(-) diff --git a/fs/lustre/llite/vvp_internal.h b/fs/lustre/llite/vvp_internal.h index 7a463cb..1cc152f 100644 --- a/fs/lustre/llite/vvp_internal.h +++ b/fs/lustre/llite/vvp_internal.h @@ -66,11 +66,6 @@ struct vvp_io { union { struct vvp_fault_io { - /** - * Inode modification time that is checked across DLM - * lock request. 
- */ - time64_t ft_mtime; struct vm_area_struct *ft_vma; /** * locked page returned from vvp_io diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index e676e62..d0d8b1f 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -271,7 +271,6 @@ static int vvp_io_fault_iter_init(const struct lu_env *env, struct inode *inode = vvp_object_inode(ios->cis_obj); LASSERT(inode == file_inode(vio->vui_fd->fd_file)); - vio->u.fault.ft_mtime = inode->i_mtime.tv_sec; return 0; } From patchwork Thu Feb 27 21:15:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410599 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EADB2924 for ; Thu, 27 Feb 2020 21:42:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D26A124690 for ; Thu, 27 Feb 2020 21:42:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D26A124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6D4B934AAF5; Thu, 27 Feb 2020 13:34:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5E07221FF03 for ; Thu, 27 Feb 2020 13:20:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C84D8916F; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C6D8647C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:38 -0500 Message-Id: <1582838290-17243-471-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 470/622] lustre: llite: remove sub_reenter field. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is never set or accessed, so remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 3118333b9664 ("LU-6142 llite: remove sub_reenter field.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35888 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Patrick Farrell Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/lov/lov_cl_internal.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index 6fea0f5..40bb6f0 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -509,7 +509,6 @@ struct lov_io_sub { * \see cl_env_get() */ u16 sub_refcheck; - u16 sub_reenter; }; /** From patchwork Thu Feb 27 21:15:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410885 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED7421580 for ; Thu, 27 Feb 2020 21:49:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D623224690 for ; Thu, 27 Feb 2020 21:49:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D623224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A3B3834A81B; Thu, 27 Feb 2020 13:41:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9FDC021FFD3 for ; Thu, 27 Feb 2020 13:20:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CAA679170; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C98D8468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:39 -0500 Message-Id: <1582838290-17243-472-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 471/622] lustre: osc: remove oti_descr oti_handle oti_plist X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown These three fields in 'struct osc_thread_info' are unused, so remove them. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 3bc9a5e32542 ("LU-6142 osc: remove oti_descr oti_handle oti_plist") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35889 Reviewed-by: Andreas Dilger Reviewed-by: Arshad Hussain Reviewed-by: James Simmons Reviewed-by: Mike Pershin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 37e56ef..044185d 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -176,10 +176,7 @@ struct osc_session { struct osc_thread_info { struct ldlm_res_id oti_resname; union ldlm_policy_data oti_policy; - struct cl_lock_descr oti_descr; struct cl_attr oti_attr; - struct lustre_handle oti_handle; - struct cl_page_list oti_plist; struct cl_io oti_io; struct pagevec oti_pagevec; void *oti_pvec[OTI_PVEC_SIZE]; From patchwork Thu Feb 27 21:15:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410605 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C11CE924 for ; Thu, 27 Feb 2020 21:42:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A91F124690 for ; Thu, 27 Feb 2020 21:42:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A91F124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EF7E634AB22; Thu, 27 Feb 2020 13:34:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E55A021FFDC for ; Thu, 27 Feb 2020 13:20:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CD7D59171; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CC3BD46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:40 -0500 Message-Id: <1582838290-17243-473-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 472/622] lustre: osc: remove oe_next_page X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown As the comment says, this field is unused. So remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: d1b08c58b43e ("LU-6142 osc: remove oe_next_page") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35890 Reviewed-by: Andreas Dilger Reviewed-by: Arshad Hussain Reviewed-by: James Simmons Reviewed-by: Mike Pershin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 044185d..de7ccd6 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -946,11 +946,6 @@ struct osc_extent { unsigned int oe_nr_pages; /* list of pending oap pages. Pages in this list are NOT sorted. */ struct list_head oe_pages; - /* Since an extent has to be written out in atomic, this is used to - * remember the next page need to be locked to write this extent out. - * Not used right now. - */ - struct osc_page *oe_next_page; /* start and end index of this extent, include start and end * themselves. Page offset here is the page index of osc_pages. * oe_start is used as keyword for red-black tree. 
From patchwork Thu Feb 27 21:15:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410607 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2802C1871 for ; Thu, 27 Feb 2020 21:42:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 11EAB246A2 for ; Thu, 27 Feb 2020 21:42:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 11EAB246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D376034AB50; Thu, 27 Feb 2020 13:34:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 34F4921FFDC for ; Thu, 27 Feb 2020 13:20:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D05BD9172; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CEF4C46C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:41 -0500 Message-Id: <1582838290-17243-474-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 473/622] lnet: o2iblnd: remove some unused fields. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Fields kib_min_reconnect_interval kib_max_reconnect_interval kib_ntx are never used or set. ibh_mr_shift is set but never used; rx_status is used (in a debug message) but never set. Remove them all. We could possibly remove ibh_mr_size too. It is only used for an error message. WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 68c04b8fdd5d ("LU-6142 o2iblnd: remove some unused fields.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35891 Reviewed-by: Shaun Tancheff Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Arshad Hussain Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 7 +------ net/lnet/klnds/o2iblnd/o2iblnd.h | 5 ----- 2 files changed, 1 insertion(+), 11 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index f3176e1..278823f 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2303,7 +2303,6 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni, static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev) { struct ib_device_attr *dev_attr = &hdev->ibh_ibdev->attrs; - int rc = 0; /* * It's safe to assume a HCA can handle a page size @@ -2326,15 +2325,11 @@ static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev) hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT; } else { CERROR("IB device does not support FMRs nor FastRegs, can't register 
memory: %d\n", - rc); + -ENXIO); return -ENXIO; } hdev->ibh_mr_size = dev_attr->max_mr_size; - if (hdev->ibh_mr_size == ~0ULL) { - hdev->ibh_mr_shift = 64; - return 0; - } CERROR("Invalid mr size: %#llx\n", hdev->ibh_mr_size); return -EINVAL; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 1285ab1..2f2337a 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -76,12 +76,9 @@ struct kib_tunables { int *kib_dev_failover; /* HCA failover */ unsigned int *kib_service; /* IB service number */ - int *kib_min_reconnect_interval; /* first failed connection retry... */ - int *kib_max_reconnect_interval; /* exponentially increasing to this */ int *kib_cksum; /* checksum struct kib_msg? */ int *kib_timeout; /* comms timeout (seconds) */ int *kib_keepalive; /* keepalive timeout (seconds) */ - int *kib_ntx; /* # tx descs */ char **kib_default_ipif; /* default IPoIB interface */ int *kib_retry_count; int *kib_rnr_retry_count; @@ -178,7 +175,6 @@ struct kib_hca_dev { int ibh_page_shift; /* page shift of current HCA */ int ibh_page_size; /* page size of current HCA */ u64 ibh_page_mask; /* page mask of current HCA */ - int ibh_mr_shift; /* bits shift of max MR size */ u64 ibh_mr_size; /* size of MR */ struct ib_pd *ibh_pd; /* PD */ struct kib_dev *ibh_dev; /* owner */ @@ -492,7 +488,6 @@ struct kib_rx { /* receive message */ struct list_head rx_list; /* queue for attention */ struct kib_conn *rx_conn; /* owning conn */ int rx_nob; /* # bytes received (-1 while posted) */ - enum ib_wc_status rx_status; /* completion status */ struct kib_msg *rx_msg; /* message buffer (host vaddr) */ u64 rx_msgaddr; /* message buffer (I/O addr) */ DEFINE_DMA_UNMAP_ADDR(rx_msgunmap); /* for dma_unmap_single() */ From patchwork Thu Feb 27 21:15:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410611 Return-Path: Received: 
from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 084B5138D for ; Thu, 27 Feb 2020 21:42:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E44E624690 for ; Thu, 27 Feb 2020 21:42:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E44E624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6773734AB80; Thu, 27 Feb 2020 13:34:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8BBB621FFDC for ; Thu, 27 Feb 2020 13:20:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D342D9173; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D1A5D46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:42 -0500 Message-Id: <1582838290-17243-475-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 474/622] lnet: socklnd: remove ksnp_sharecount X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 
Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is never set, though its value is printed. Remove it. WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 408a5a527567 ("LU-6142 socklnd: remove ksnp_sharecount") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35892 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 4 ++-- net/lnet/klnds/socklnd/socklnd.h | 1 - 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 78f6c7e..e2a9819 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -2471,10 +2471,10 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) if (peer_ni->ksnp_ni != ni) continue; - CWARN("Active peer_ni on shutdown: %s, ref %d, scnt %d, closing %d, accepting %d, err %d, zcookie %llu, txq %d, zc_req %d\n", + CWARN("Active peer_ni on shutdown: %s, ref %d, closing %d, accepting %d, err %d, zcookie %llu, txq %d, zc_req %d\n", libcfs_id2str(peer_ni->ksnp_id), atomic_read(&peer_ni->ksnp_refcount), - peer_ni->ksnp_sharecount, peer_ni->ksnp_closing, + peer_ni->ksnp_closing, peer_ni->ksnp_accepting, peer_ni->ksnp_error, peer_ni->ksnp_zc_next_cookie, !list_empty(&peer_ni->ksnp_tx_queue), diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 80c2e19..efdd02e 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -415,7 +415,6 @@ struct ksock_peer { */ struct lnet_process_id ksnp_id; /* who's on the other end(s) */ atomic_t ksnp_refcount; /* # users */ - int 
ksnp_sharecount; /* lconf usage counter */ int ksnp_closing; /* being closed */ int ksnp_accepting; /* # passive connections pending */ From patchwork Thu Feb 27 21:15:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410617 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 55316138D for ; Thu, 27 Feb 2020 21:42:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3C33424690 for ; Thu, 27 Feb 2020 21:42:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3C33424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E7D6C34ABC3; Thu, 27 Feb 2020 13:34:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D09AC21F3A2 for ; Thu, 27 Feb 2020 13:20:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D5F919174; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D491247C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:43 -0500 Message-Id: 
<1582838290-17243-476-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 475/622] lustre: llite: extend readahead locks for striped file X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Wang Shilong

Currently cl_io_read_ahead() cannot return locks that cross a stripe boundary in a single call, so readahead stops whenever it reaches one. This is really bad: with the default stripe size of 1M, for example, readahead stops at every stripe boundary, which can hurt performance badly, especially now that async readahead has been introduced.

So try to use existing locks aggressively if there is no lock contention; otherwise the lock should cover no less than the requested extent.
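For illustration only, and not part of the patch: a minimal standalone sketch of the stop condition that the rw.c hunk below adds to the readahead loop. The struct and function names are hypothetical simplifications; only the field names in the comments (cra_end, cra_contention, ria_end, ria_end_min) come from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lock information in struct cl_read_ahead. */
struct ra_lock {
	unsigned long end;   /* last page index covered by the lock (cra_end) */
	bool contention;     /* lock is contended (cra_contention) */
};

/* Hypothetical stand-in for the window fields of struct ra_io_arg. */
struct ra_window {
	unsigned long end;      /* current end of the window (ria_end) */
	unsigned long end_min;  /* minimum end the current read needs (ria_end_min) */
};

/*
 * Return true if readahead should stop at page_idx. The window is only
 * shrunk to the lock end when the page lies beyond the lock, or when the
 * lock is contended and the minimum end of the read is already covered.
 */
static bool ra_should_stop(struct ra_window *win, const struct ra_lock *lock,
			   unsigned long page_idx)
{
	assert(win != NULL && lock != NULL);

	if (page_idx > lock->end ||
	    (lock->contention && page_idx > win->end_min)) {
		win->end = lock->end;
		return true;
	}
	return false;
}
```

In this sketch an uncontended lock never shrinks the window for pages it still covers, which is the case the commit message describes as using existing locks aggressively.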
WC-bug-id: https://jira.whamcloud.com/browse/LU-12043 Lustre-commit: cfbeae97d736 ("LU-12043 llite: extend readahead locks for striped file") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35438 Reviewed-by: Li Xi Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 2 ++ fs/lustre/llite/rw.c | 14 ++++++++++++-- fs/lustre/osc/osc_io.c | 2 ++ 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 71ca283..65fdab9 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -1474,6 +1474,8 @@ struct cl_read_ahead { void (*cra_release)(const struct lu_env *env, void *cbdata); /* Callback data for cra_release routine */ void *cra_cbdata; + /* whether lock is in contention */ + bool cra_contention; }; static inline void cl_read_ahead_release(const struct lu_env *env, diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 4fec9a6..7c2dbdc 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -369,6 +369,18 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) if (rc < 0) break; + /* Do not shrink the ria_end at any case until + * the minimum end of current read is covered. + * And only shrink the ria_end if the matched + * LDLM lock doesn't cover more. 
+ */ + if (page_idx > ra.cra_end || + (ra.cra_contention && + page_idx > ria->ria_end_min)) { + ria->ria_end = ra.cra_end; + break; + } + CDEBUG(D_READA, "idx: %lu, ra: %lu, rpc: %lu\n", page_idx, ra.cra_end, ra.cra_rpc_size); LASSERTF(ra.cra_end >= page_idx, @@ -387,8 +399,6 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) ria->ria_end = end - 1; if (ria->ria_end < ria->ria_end_min) ria->ria_end = ria->ria_end_min; - if (ria->ria_end > ra.cra_end) - ria->ria_end = ra.cra_end; } /* If the page is inside the read-ahead window */ diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 4f46b95..8e299d4 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -92,6 +92,8 @@ static int osc_io_read_ahead(const struct lu_env *env, dlmlock->l_policy_data.l_extent.end); ra->cra_release = osc_read_ahead_release; ra->cra_cbdata = dlmlock; + if (ra->cra_end != CL_PAGE_EOF) + ra->cra_contention = true; result = 0; } From patchwork Thu Feb 27 21:15:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410467 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B5F74138D for ; Thu, 27 Feb 2020 21:39:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9E12024690 for ; Thu, 27 Feb 2020 21:39:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9E12024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost 
[IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F010034A51A; Thu, 27 Feb 2020 13:31:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 387CC21F3A2 for ; Thu, 27 Feb 2020 13:20:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D8BA09175; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D756B468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:44 -0500 Message-Id: <1582838290-17243-477-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 476/622] lustre: llite: Improve readahead RPC issuance X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Patrick Farrell

lov_io_submit receives a range of pages, then adds pages to a batch until it hits a page which is not in the stripe associated with this lov object.

This means that if a readahead page range hits the same stripe more than once, we will issue multiple I/Os, even if the pages would fit in one RPC.

This is unnecessary; just submit all of these pages at once.

mpirun -n 2 $IOR -s 2000 -t 47K -b 47K -k -r -E -o $FILE

Without patch:

osc.lustre-OST0001-osc-ffff8fe82c952000.rpc_stats=
                     read                write
pages per rpc     rpcs   % cum % |     rpcs   % cum %
1:                 118  56    56 |        0   0     0
2:                   0   0    56 |        0   0     0
4:                   0   0    56 |        0   0     0
8:                   0   0    56 |        0   0     0
16:                  5   2    58 |        0   0     0
32:                  0   0    58 |        0   0     0
64:                  0   0    58 |        0   0     0
128:                21  10    68 |        0   0     0
256:                25  11    80 |        0   0     0
512:                10   4    85 |        0   0     0
1024:               31  14   100 |        0   0     0

osc.lustre-OST0002-osc-ffff8fe82c952000.rpc_stats=
                     read                write
pages per rpc     rpcs   % cum % |     rpcs   % cum %
1:                   5   6     6 |        0   0     0
2:                   0   0     6 |        0   0     0
4:                   0   0     6 |        0   0     0
8:                   0   0     6 |        0   0     0
16:                  0   0     6 |        0   0     0
32:                  0   0     6 |        0   0     0
64:                  0   0     6 |        0   0     0
128:                19  23    29 |        0   0     0
256:                19  23    52 |        0   0     0
512:                 5   6    58 |        0   0     0
1024:               34  41   100 |        0   0     0

With patch:

osc.lustre-OST0001-osc-ffff8fe7a7227800.rpc_stats=
                     read                write
pages per rpc     rpcs   % cum % |     rpcs   % cum %
1:                  12  17    17 |        0   0     0
2:                   0   0    17 |        0   0     0
4:                   0   0    17 |        0   0     0
8:                   0   0    17 |        0   0     0
16:                  5   7    24 |        0   0     0
32:                  0   0    24 |        0   0     0
64:                  5   7    31 |        0   0     0
128:                 6   8    40 |        0   0     0
256:                 1   1    42 |        0   0     0
512:                 2   2    44 |        0   0     0
1024:               38  55   100 |        0   0     0

osc.lustre-OST0002-osc-ffff8fe7a7227800.rpc_stats=
                     read                write
pages per rpc     rpcs   % cum % |     rpcs   % cum %
1:                   0   0     0 |        0   0     0
2:                   0   0     0 |        0   0     0
4:                   0   0     0 |        0   0     0
8:                   0   0     0 |        0   0     0
16:                  0   0     0 |        0   0     0
32:                  0   0     0 |        0   0     0
64:                  4   7     7 |        0   0     0
128:                 7  13    21 |        0   0     0
256:                 0   0    21 |        0   0     0
512:                 3   5    26 |        0   0     0
1024:               38  73   100 |        0   0     0

Note the much larger number of small RPCs issued without the patch.

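For illustration only, and not part of the patch: a small standalone model of the batching change in lov_io_submit. It assumes a queue of pages represented purely by their stripe index and counts how many batches are needed to drain it. The helper name count_batches, the MAX_PAGES bound and the stop_at_mismatch flag are hypothetical; the real code moves struct cl_page entries between page lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64	/* arbitrary bound for this sketch */

/*
 * Count the batches needed to drain a queue of n pages, where stripe[i]
 * is the stripe index of page i.
 *
 * stop_at_mismatch == true models the old loop: a batch ends at the first
 * queued page that belongs to a different stripe.
 * stop_at_mismatch == false models the patched loop: pages on other
 * stripes are skipped and the scan continues to the end of the queue.
 */
static size_t count_batches(const int *stripe, size_t n, bool stop_at_mismatch)
{
	bool taken[MAX_PAGES] = { false };
	size_t remaining = n;
	size_t batches = 0;

	assert(n <= MAX_PAGES);

	while (remaining > 0) {
		size_t i = 0;
		int index;

		/* The first page still queued decides the stripe of the batch. */
		while (taken[i])
			i++;
		index = stripe[i];
		taken[i] = true;
		remaining--;

		for (i = i + 1; i < n; i++) {
			if (taken[i])
				continue;
			if (stripe[i] != index) {
				if (stop_at_mismatch)
					break;
				continue;	/* this page is not on this stripe */
			}
			taken[i] = true;
			remaining--;
		}
		batches++;
	}
	return batches;
}
```

For a queue whose stripe indices are 0,0,1,1,0,0,1,1, this model needs four batches when stopping at the first mismatch and two when it keeps scanning.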
WC-bug-id: https://jira.whamcloud.com/browse/LU-12533 Lustre-commit: 05b9da4fd124 ("LU-12533 llite: Improve readahead RPC issuance") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35458 Reviewed-by: Li Xi Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_io.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 6e86efa..fbed3de 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -1081,6 +1081,7 @@ static int lov_io_submit(const struct lu_env *env, struct lov_io_sub *sub; struct cl_page_list *plist = &lov_env_info(env)->lti_plist; struct cl_page *page; + struct cl_page *tmp; int index; int rc = 0; @@ -1105,10 +1106,10 @@ static int lov_io_submit(const struct lu_env *env, cl_page_list_move(&cl2q->c2_qin, qin, page); index = lov_page_index(page); - while (qin->pl_nr > 0) { - page = cl_page_list_first(qin); + cl_page_list_for_each_safe(page, tmp, qin) { + /* this page is not on this stripe */ if (index != lov_page_index(page)) - break; + continue; cl_page_list_move(&cl2q->c2_qin, qin, page); } From patchwork Thu Feb 27 21:15:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410619 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C638924 for ; Thu, 27 Feb 2020 21:42:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E827924690 for ; Thu, 27 Feb 2020 21:42:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E827924690 Authentication-Results: mail.kernel.org; dmarc=none 
(p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 97B8234ABF0; Thu, 27 Feb 2020 13:34:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9686821FFF2 for ; Thu, 27 Feb 2020 13:20:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DB6539176; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DA0CA46A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:45 -0500 Message-Id: <1582838290-17243-478-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 477/622] lustre: lov: Move page index to top level X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When doing readahead, we see an amazing amount of time (~5-8%) just looking up the page index from the lov layer. 
In particular, this is more than half the time spent submitting pages: - 14.14% cl_io_submit_rw - 13.40% lov_io_submit - 8.24% lov_page_index This requires several indirections, all of which can be avoided by moving this up to the cl_page struct. WC-bug-id: https://jira.whamcloud.com/browse/LU-12535 Lustre-commit: 8d6d2914cf85 ("LU-12535 lov: Move page index to top level") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35470 Reviewed-by: Wang Shilong Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 2 ++ fs/lustre/lov/lov_cl_internal.h | 2 -- fs/lustre/lov/lov_io.c | 21 +++++---------------- fs/lustre/lov/lov_page.c | 10 +++++----- 4 files changed, 12 insertions(+), 23 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 65fdab9..4c68d7b 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -762,6 +762,8 @@ struct cl_page { struct lu_ref_link cp_queue_ref; /** Assigned if doing a sync_io */ struct cl_sync_io *cp_sync_io; + /** layout_entry + stripe index, composed using lov_comp_index() */ + unsigned int cp_lov_index; }; /** diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index 40bb6f0..8791e69 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -440,8 +440,6 @@ struct lov_lock { struct lov_page { struct cl_page_slice lps_cl; - /** layout_entry + stripe index, composed using lov_comp_index() */ - unsigned int lps_index; /* the layout gen when this page was created */ u32 lps_layout_gen; }; diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index fbed3de..56e4a982 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -189,17 +189,6 @@ struct lov_io_sub *lov_sub_get(const struct lu_env *env, * Lov io operations. 
* */ -static int lov_page_index(const struct cl_page *page) -{ - const struct cl_page_slice *slice; - - slice = cl_page_at(page, &lov_device_type); - LASSERT(slice); - LASSERT(slice->cpl_obj); - - return cl2lov_page(slice)->lps_index; -} - static int lov_io_subio_init(const struct lu_env *env, struct lov_io *lio, struct cl_io *io) { @@ -1105,10 +1094,10 @@ static int lov_io_submit(const struct lu_env *env, cl_2queue_init(cl2q); cl_page_list_move(&cl2q->c2_qin, qin, page); - index = lov_page_index(page); + index = page->cp_lov_index; cl_page_list_for_each_safe(page, tmp, qin) { /* this page is not on this stripe */ - if (index != lov_page_index(page)) + if (index != page->cp_lov_index) continue; cl_page_list_move(&cl2q->c2_qin, qin, page); @@ -1171,10 +1160,10 @@ static int lov_io_commit_async(const struct lu_env *env, cl_page_list_move(plist, queue, page); - index = lov_page_index(page); + index = page->cp_lov_index; while (queue->pl_nr > 0) { page = cl_page_list_first(queue); - if (index != lov_page_index(page)) + if (index != page->cp_lov_index) break; cl_page_list_move(plist, queue, page); @@ -1218,7 +1207,7 @@ static int lov_io_fault_start(const struct lu_env *env, fio = &ios->cis_io->u.ci_fault; lio = cl2lov_io(env, ios); - sub = lov_sub_get(env, lio, lov_page_index(fio->ft_page)); + sub = lov_sub_get(env, lio, fio->ft_page->cp_lov_index); if (IS_ERR(sub)) return PTR_ERR(sub); sub->sub_io.u.ci_fault.ft_nob = fio->ft_nob; diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c index c3337706..e73b5ff 100644 --- a/fs/lustre/lov/lov_page.c +++ b/fs/lustre/lov/lov_page.c @@ -57,8 +57,8 @@ static int lov_comp_page_print(const struct lu_env *env, struct lov_page *lp = cl2lov_page(slice); return (*printer)(env, cookie, - LUSTRE_LOV_NAME "-page@%p, comp index: %x, gen: %u\n", - lp, lp->lps_index, lp->lps_layout_gen); + LUSTRE_LOV_NAME "-page@%p, gen: %u\n", + lp, lp->lps_layout_gen); } static const struct cl_page_operations lov_comp_page_ops = { @@ -95,11 
+95,11 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj, rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe, &suboff); LASSERT(rc == 0); - lpg->lps_index = lov_comp_index(entry, stripe); + page->cp_lov_index = lov_comp_index(entry, stripe); lpg->lps_layout_gen = loo->lo_lsm->lsm_layout_gen; cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_comp_page_ops); - sub = lov_sub_get(env, lio, lpg->lps_index); + sub = lov_sub_get(env, lio, page->cp_lov_index); if (IS_ERR(sub)) return PTR_ERR(sub); @@ -136,7 +136,7 @@ int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj, struct lov_page *lpg = cl_object_page_slice(obj, page); void *addr; - lpg->lps_index = ~0; + page->cp_lov_index = ~0; cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_empty_page_ops); addr = kmap(page->cp_vmpage); memset(addr, 0, cl_page_size(obj)); From patchwork Thu Feb 27 21:15:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410701 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE64B138D for ; Thu, 27 Feb 2020 21:44:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D6A2924690 for ; Thu, 27 Feb 2020 21:44:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6A2924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 61D0421F943; Thu, 27 
Feb 2020 13:35:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EC67821FFFA for ; Thu, 27 Feb 2020 13:20:46 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DE7F89177; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DCED546C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:46 -0500 Message-Id: <1582838290-17243-479-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 478/622] lustre: readahead: convert stride page index to byte X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong This is a preparatory patch to support unaligned stride readahead. Some detection variables are converted to byte units so that a possibly unaligned stride read can be detected. Since we still need to read pages by page index, the variables used for that are kept in page units. To make this clearer, change them to use pgoff_t rather than unsigned long.
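As a rough illustration of the page-to-byte conversion described above, here is a minimal userspace sketch. It is not the kernel code: it adapts the stride_byte_count() logic from this patch, replaces do_div() with plain / and %, and assumes 4 KiB pages. The SKETCH_* macros and the page_to_byte() and bytes_to_pages() helpers are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Assumption for this sketch only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Byte offset of the first byte of page 'index' (hypothetical helper). */
static uint64_t page_to_byte(uint64_t index)
{
	return index << SKETCH_PAGE_SHIFT;
}

/* Number of pages needed to hold 'bytes' bytes, rounded up (hypothetical helper). */
static uint64_t bytes_to_pages(uint64_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/*
 * Count the data bytes that fall inside [off, off + length) for a stride
 * pattern that starts at st_off and repeats every st_len bytes, with the
 * first st_bytes of each period holding data and the rest being a gap.
 */
static uint64_t stride_byte_count(uint64_t st_off, uint64_t st_len,
				  uint64_t st_bytes, uint64_t off,
				  uint64_t length)
{
	/* Range start and end, measured relative to the stride origin. */
	uint64_t start = off > st_off ? off - st_off : 0;
	uint64_t end = off + length > st_off ? off + length - st_off : 0;
	uint64_t start_left, end_left;

	if (st_len == 0 || length == 0 || end == 0)
		return length;

	/* Data bytes left in the period that contains the range start. */
	start_left = start % st_len;
	start /= st_len;
	start_left = start_left < st_bytes ? st_bytes - start_left : 0;

	/* Data bytes used in the period that contains the range end. */
	end_left = end % st_len;
	end /= st_len;
	if (end_left > st_bytes)
		end_left = st_bytes;

	/* Both ends in the same period: subtract the data before the start. */
	if (start == end)
		return end_left - (st_bytes - start_left);

	/* Otherwise: head remainder, whole periods in between, tail part. */
	return start_left + st_bytes * (end - start - 1) + end_left;
}
```

For example, with 4096 data bytes every 8192 bytes starting at offset 0, a 16384-byte range starting at 0 contains 8192 data bytes, which bytes_to_pages() rounds to 2 pages. An unaligned pattern (1000 data bytes every 3000 bytes starting at byte 1000) over the range [0, 5000) yields 2000 data bytes, a case that page-unit counters cannot represent.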
WC-bug-id: https://jira.whamcloud.com/browse/LU-12518 Lustre-commit: 0923e4055116 ("LU-12518 readahead: convert stride page index to byte") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35829 Reviewed-by: Li Xi Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 60 +++++----- fs/lustre/llite/rw.c | 243 ++++++++++++++++++++------------------- 2 files changed, 153 insertions(+), 150 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 025d33e..d84f50c 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -358,22 +358,22 @@ struct ll_ra_info { * counted by page index. */ struct ra_io_arg { - unsigned long ria_start; /* start offset of read-ahead*/ - unsigned long ria_end; /* end offset of read-ahead*/ - unsigned long ria_reserved; /* reserved pages for read-ahead */ - unsigned long ria_end_min; /* minimum end to cover current read */ - bool ria_eof; /* reach end of file */ + pgoff_t ria_start; /* start offset of read-ahead*/ + pgoff_t ria_end; /* end offset of read-ahead*/ + unsigned long ria_reserved; /* reserved pages for read-ahead */ + pgoff_t ria_end_min; /* minimum end to cover current read */ + bool ria_eof; /* reach end of file */ /* If stride read pattern is detected, ria_stoff means where * stride read is started. Note: for normal read-ahead, the * value here is meaningless, and also it will not be accessed */ - pgoff_t ria_stoff; - /* ria_length and ria_pages are the length and pages length in the + unsigned long ria_stoff; + /* ria_length and ria_bytes are the length and pages length in the * stride I/O mode. 
And they will also be used to check whether * it is stride I/O read-ahead in the read-ahead pages */ - unsigned long ria_length; - unsigned long ria_pages; + unsigned long ria_length; + unsigned long ria_bytes; }; /* LL_HIST_MAX=32 causes an overflow */ @@ -592,16 +592,10 @@ struct ll_sb_info { */ struct ll_readahead_state { spinlock_t ras_lock; + /* End byte that read(2) try to read. */ + unsigned long ras_last_read_end; /* - * index of the last page that read(2) needed and that wasn't in the - * cache. Used by ras_update() to detect seeks. - * - * XXX nikita: if access seeks into cached region, Lustre doesn't see - * this. - */ - unsigned long ras_last_readpage; - /* - * number of pages read after last read-ahead window reset. As window + * number of bytes read after last read-ahead window reset. As window * is reset on each seek, this is effectively a number of consecutive * accesses. Maybe ->ras_accessed_in_window is better name. * @@ -610,13 +604,13 @@ struct ll_readahead_state { * case, it probably doesn't make sense to expand window to * PTLRPC_MAX_BRW_PAGES on the third access. */ - unsigned long ras_consecutive_pages; + unsigned long ras_consecutive_bytes; /* * number of read requests after the last read-ahead window reset * As window is reset on each seek, this is effectively the number * on consecutive read request and is used to trigger read-ahead. */ - unsigned long ras_consecutive_requests; + unsigned long ras_consecutive_requests; /* * Parameters of current read-ahead window. Handled by * ras_update(). On the initial access to the file or after a seek, @@ -624,7 +618,7 @@ struct ll_readahead_state { * expanded to PTLRPC_MAX_BRW_PAGES. Afterwards, window is enlarged by * PTLRPC_MAX_BRW_PAGES chunks up to ->ra_max_pages. */ - unsigned long ras_window_start, ras_window_len; + pgoff_t ras_window_start, ras_window_len; /* * Optimal RPC size. It decides how many pages will be sent * for each read-ahead. 
@@ -637,41 +631,41 @@ struct ll_readahead_state { * ->ra_max_pages (see ll_ra_count_get()), 2. client cannot read pages * not covered by DLM lock. */ - unsigned long ras_next_readahead; + pgoff_t ras_next_readahead; /* * Total number of ll_file_read requests issued, reads originating * due to mmap are not counted in this total. This value is used to * trigger full file read-ahead after multiple reads to a small file. */ - unsigned long ras_requests; + unsigned long ras_requests; /* * Page index with respect to the current request, these value * will not be accurate when dealing with reads issued via mmap. */ - unsigned long ras_request_index; + unsigned long ras_request_index; /* * The following 3 items are used for detecting the stride I/O * mode. * In stride I/O mode, * ...............|-----data-----|****gap*****|--------|******|.... - * offset |-stride_pages-|-stride_gap-| + * offset |-stride_bytes-|-stride_gap-| * ras_stride_offset = offset; - * ras_stride_length = stride_pages + stride_gap; - * ras_stride_pages = stride_pages; - * Note: all these three items are counted by pages. + * ras_stride_length = stride_bytes + stride_gap; + * ras_stride_bytes = stride_bytes; + * Note: all these three items are counted by bytes. */ - unsigned long ras_stride_length; - unsigned long ras_stride_pages; - pgoff_t ras_stride_offset; + unsigned long ras_stride_length; + unsigned long ras_stride_bytes; + unsigned long ras_stride_offset; /* * number of consecutive stride request count, and it is similar as * ras_consecutive_requests, but used for stride I/O mode. 
* Note: only more than 2 consecutive stride request are detected, * stride read-ahead will be enable */ - unsigned long ras_consecutive_stride_requests; + unsigned long ras_consecutive_stride_requests; /* index of the last page that async readahead starts */ - unsigned long ras_async_last_readpage; + pgoff_t ras_async_last_readpage; }; struct ll_readahead_work { diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 7c2dbdc..38f7aa2c 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -131,19 +131,18 @@ void ll_ra_stats_inc(struct inode *inode, enum ra_stat which) #define RAS_CDEBUG(ras) \ CDEBUG(D_READA, \ - "lrp %lu cr %lu cp %lu ws %lu wl %lu nra %lu rpc %lu " \ - "r %lu ri %lu csr %lu sf %lu sp %lu sl %lu lr %lu\n", \ - ras->ras_last_readpage, ras->ras_consecutive_requests, \ - ras->ras_consecutive_pages, ras->ras_window_start, \ + "lre %lu cr %lu cb %lu ws %lu wl %lu nra %lu rpc %lu r %lu ri %lu csr %lu sf %lu sb %lu sl %lu lr %lu\n", \ + ras->ras_last_read_end, ras->ras_consecutive_requests, \ + ras->ras_consecutive_bytes, ras->ras_window_start, \ ras->ras_window_len, ras->ras_next_readahead, \ ras->ras_rpc_size, \ ras->ras_requests, ras->ras_request_index, \ ras->ras_consecutive_stride_requests, ras->ras_stride_offset, \ - ras->ras_stride_pages, ras->ras_stride_length, \ + ras->ras_stride_bytes, ras->ras_stride_length, \ ras->ras_async_last_readpage) -static int index_in_window(unsigned long index, unsigned long point, - unsigned long before, unsigned long after) +static int pos_in_window(unsigned long pos, unsigned long point, + unsigned long before, unsigned long after) { unsigned long start = point - before, end = point + after; @@ -152,7 +151,7 @@ static int index_in_window(unsigned long index, unsigned long point, if (end < point) end = ~0; - return start <= index && index <= end; + return start <= pos && pos <= end; } void ll_ras_enter(struct file *f) @@ -242,10 +241,10 @@ static int ll_read_ahead_page(const struct lu_env *env, 
struct cl_io *io, return rc; } -#define RIA_DEBUG(ria) \ - CDEBUG(D_READA, "rs %lu re %lu ro %lu rl %lu rp %lu\n", \ - ria->ria_start, ria->ria_end, ria->ria_stoff, ria->ria_length,\ - ria->ria_pages) +#define RIA_DEBUG(ria) \ + CDEBUG(D_READA, "rs %lu re %lu ro %lu rl %lu rb %lu\n", \ + ria->ria_start, ria->ria_end, ria->ria_stoff, \ + ria->ria_length, ria->ria_bytes) static inline int stride_io_mode(struct ll_readahead_state *ras) { @@ -255,72 +254,76 @@ static inline int stride_io_mode(struct ll_readahead_state *ras) /* The function calculates how much pages will be read in * [off, off + length], in such stride IO area, * stride_offset = st_off, stride_length = st_len, - * stride_pages = st_pgs + * stride_bytes = st_bytes * * |------------------|*****|------------------|*****|------------|*****|.... * st_off - * |--- st_pgs ---| + * |--- st_bytes ---| * |----- st_len -----| * - * How many pages it should read in such pattern + * How many bytes it should read in such pattern * |-------------------------------------------------------------| * off * |<------ length ------->| * * = |<----->| + |-------------------------------------| + |---| - * start_left st_pgs * i end_left + * start_left st_bytes * i end_left */ static unsigned long -stride_pg_count(pgoff_t st_off, unsigned long st_len, unsigned long st_pgs, - unsigned long off, unsigned long length) +stride_byte_count(unsigned long st_off, unsigned long st_len, + unsigned long st_bytes, unsigned long off, + unsigned long length) { u64 start = off > st_off ? off - st_off : 0; u64 end = off + length > st_off ? 
off + length - st_off : 0; unsigned long start_left = 0; unsigned long end_left = 0; - unsigned long pg_count; + unsigned long bytes_count; if (st_len == 0 || length == 0 || end == 0) return length; start_left = do_div(start, st_len); - if (start_left < st_pgs) - start_left = st_pgs - start_left; + if (start_left < st_bytes) + start_left = st_bytes - start_left; else start_left = 0; end_left = do_div(end, st_len); - if (end_left > st_pgs) - end_left = st_pgs; + if (end_left > st_bytes) + end_left = st_bytes; CDEBUG(D_READA, "start %llu, end %llu start_left %lu end_left %lu\n", start, end, start_left, end_left); if (start == end) - pg_count = end_left - (st_pgs - start_left); + bytes_count = end_left - (st_bytes - start_left); else - pg_count = start_left + st_pgs * (end - start - 1) + end_left; + bytes_count = start_left + + st_bytes * (end - start - 1) + end_left; CDEBUG(D_READA, - "st_off %lu, st_len %lu st_pgs %lu off %lu length %lu pgcount %lu\n", - st_off, st_len, st_pgs, off, length, pg_count); + "st_off %lu, st_len %lu st_bytes %lu off %lu length %lu bytescount %lu\n", + st_off, st_len, st_bytes, off, length, bytes_count); - return pg_count; + return bytes_count; } static int ria_page_count(struct ra_io_arg *ria) { u64 length = ria->ria_end >= ria->ria_start ? 
ria->ria_end - ria->ria_start + 1 : 0; + unsigned int bytes_count; + + bytes_count = stride_byte_count(ria->ria_stoff, ria->ria_length, + ria->ria_bytes, ria->ria_start, + length << PAGE_SHIFT); + return (bytes_count + PAGE_SIZE - 1) >> PAGE_SHIFT; - return stride_pg_count(ria->ria_stoff, ria->ria_length, - ria->ria_pages, ria->ria_start, - length); } static unsigned long ras_align(struct ll_readahead_state *ras, - unsigned long index, - unsigned long *remainder) + pgoff_t index, unsigned long *remainder) { unsigned long rem = index % ras->ras_rpc_size; @@ -337,9 +340,9 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) * For stride I/O mode, just check whether the idx is inside * the ria_pages. */ - return ria->ria_length == 0 || ria->ria_length == ria->ria_pages || + return ria->ria_length == 0 || ria->ria_length == ria->ria_bytes || (idx >= ria->ria_stoff && (idx - ria->ria_stoff) % - ria->ria_length < ria->ria_pages); + ria->ria_length < ria->ria_bytes); } static unsigned long @@ -356,7 +359,7 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) LASSERT(ria); RIA_DEBUG(ria); - stride_ria = ria->ria_length > ria->ria_pages && ria->ria_pages > 0; + stride_ria = ria->ria_length > ria->ria_bytes && ria->ria_bytes > 0; for (page_idx = ria->ria_start; page_idx <= ria->ria_end && ria->ria_reserved > 0; page_idx++) { if (ras_inside_ra_window(page_idx, ria)) { @@ -419,20 +422,13 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) * read-ahead mode, then check whether it should skip * the stride gap. */ - pgoff_t offset; - /* NOTE: This assertion only is valid when it is for - * forward read-ahead, must adjust if backward - * readahead is implemented. 
- */ - LASSERTF(page_idx >= ria->ria_stoff, - "Invalid page_idx %lu rs %lu re %lu ro %lu rl %lu rp %lu\n", - page_idx, - ria->ria_start, ria->ria_end, ria->ria_stoff, - ria->ria_length, ria->ria_pages); - offset = page_idx - ria->ria_stoff; - offset = offset % (ria->ria_length); - if (offset >= ria->ria_pages) { - page_idx += ria->ria_length - offset - 1; + unsigned long offset; + unsigned long pos = page_idx << PAGE_SHIFT; + + offset = (pos - ria->ria_stoff) % ria->ria_length; + if (offset >= ria->ria_bytes) { + pos += (ria->ria_length - offset); + page_idx = (pos >> PAGE_SHIFT) - 1; CDEBUG(D_READA, "Stride: jump %lu pages to %lu\n", ria->ria_length - offset, page_idx); @@ -647,7 +643,8 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, * so that stride read ahead can work correctly. */ if (stride_io_mode(ras)) - start = max(ras->ras_next_readahead, ras->ras_stride_offset); + start = max(ras->ras_next_readahead, + ras->ras_stride_offset >> PAGE_SHIFT); else start = ras->ras_next_readahead; @@ -676,7 +673,7 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, if (stride_io_mode(ras)) { ria->ria_stoff = ras->ras_stride_offset; ria->ria_length = ras->ras_stride_length; - ria->ria_pages = ras->ras_stride_pages; + ria->ria_bytes = ras->ras_stride_bytes; } spin_unlock(&ras->ras_lock); @@ -739,21 +736,18 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, return ret; } -static void ras_set_start(struct inode *inode, struct ll_readahead_state *ras, - unsigned long index) +static void ras_set_start(struct ll_readahead_state *ras, pgoff_t index) { ras->ras_window_start = ras_align(ras, index, NULL); } /* called with the ras_lock held or from places where it doesn't matter */ -static void ras_reset(struct inode *inode, struct ll_readahead_state *ras, - unsigned long index) +static void ras_reset(struct ll_readahead_state *ras, pgoff_t index) { - ras->ras_last_readpage = index; ras->ras_consecutive_requests = 0; - 
ras->ras_consecutive_pages = 0; + ras->ras_consecutive_bytes = 0; ras->ras_window_len = 0; - ras_set_start(inode, ras, index); + ras_set_start(ras, index); ras->ras_next_readahead = max(ras->ras_window_start, index + 1); RAS_CDEBUG(ras); @@ -764,7 +758,7 @@ static void ras_stride_reset(struct ll_readahead_state *ras) { ras->ras_consecutive_stride_requests = 0; ras->ras_stride_length = 0; - ras->ras_stride_pages = 0; + ras->ras_stride_bytes = 0; RAS_CDEBUG(ras); } @@ -772,56 +766,59 @@ void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras) { spin_lock_init(&ras->ras_lock); ras->ras_rpc_size = PTLRPC_MAX_BRW_PAGES; - ras_reset(inode, ras, 0); + ras_reset(ras, 0); + ras->ras_last_read_end = 0; ras->ras_requests = 0; } /* * Check whether the read request is in the stride window. - * If it is in the stride window, return 1, otherwise return 0. + * If it is in the stride window, return true, otherwise return false. */ -static int index_in_stride_window(struct ll_readahead_state *ras, - unsigned long index) +static bool index_in_stride_window(struct ll_readahead_state *ras, + pgoff_t index) { unsigned long stride_gap; + unsigned long pos = index << PAGE_SHIFT; - if (ras->ras_stride_length == 0 || ras->ras_stride_pages == 0 || - ras->ras_stride_pages == ras->ras_stride_length) - return 0; + if (ras->ras_stride_length == 0 || ras->ras_stride_bytes == 0 || + ras->ras_stride_bytes == ras->ras_stride_length) + return false; - stride_gap = index - ras->ras_last_readpage - 1; + stride_gap = pos - ras->ras_last_read_end - 1; /* If it is contiguous read */ if (stride_gap == 0) - return ras->ras_consecutive_pages + 1 <= ras->ras_stride_pages; + return ras->ras_consecutive_bytes + PAGE_SIZE <= + ras->ras_stride_bytes; /* Otherwise check the stride by itself */ - return (ras->ras_stride_length - ras->ras_stride_pages) == stride_gap && - ras->ras_consecutive_pages == ras->ras_stride_pages; + return (ras->ras_stride_length - ras->ras_stride_bytes) == stride_gap && 
+ ras->ras_consecutive_bytes == ras->ras_stride_bytes; } -static void ras_update_stride_detector(struct ll_readahead_state *ras, - unsigned long index) +static void ras_init_stride_detector(struct ll_readahead_state *ras, + unsigned long pos, unsigned long count) { - unsigned long stride_gap = index - ras->ras_last_readpage - 1; + unsigned long stride_gap = pos - ras->ras_last_read_end - 1; if ((stride_gap != 0 || ras->ras_consecutive_stride_requests == 0) && !stride_io_mode(ras)) { - ras->ras_stride_pages = ras->ras_consecutive_pages; - ras->ras_stride_length = ras->ras_consecutive_pages + + ras->ras_stride_bytes = ras->ras_consecutive_bytes; + ras->ras_stride_length = ras->ras_consecutive_bytes + stride_gap; } LASSERT(ras->ras_request_index == 0); LASSERT(ras->ras_consecutive_stride_requests == 0); - if (index <= ras->ras_last_readpage) { + if (pos <= ras->ras_last_read_end) { /*Reset stride window for forward read*/ ras_stride_reset(ras); return; } - ras->ras_stride_pages = ras->ras_consecutive_pages; - ras->ras_stride_length = stride_gap + ras->ras_consecutive_pages; + ras->ras_stride_bytes = ras->ras_consecutive_bytes; + ras->ras_stride_length = stride_gap + ras->ras_consecutive_bytes; RAS_CDEBUG(ras); } @@ -835,36 +832,42 @@ static void ras_stride_increase_window(struct ll_readahead_state *ras, { unsigned long left, step, window_len; unsigned long stride_len; + unsigned long end = ras->ras_window_start + ras->ras_window_len; LASSERT(ras->ras_stride_length > 0); - LASSERTF(ras->ras_window_start + ras->ras_window_len >= - ras->ras_stride_offset, + LASSERTF(end >= (ras->ras_stride_offset >> PAGE_SHIFT), "window_start %lu, window_len %lu stride_offset %lu\n", - ras->ras_window_start, - ras->ras_window_len, ras->ras_stride_offset); + ras->ras_window_start, ras->ras_window_len, + ras->ras_stride_offset); - stride_len = ras->ras_window_start + ras->ras_window_len - - ras->ras_stride_offset; + end <<= PAGE_SHIFT; + if (end < ras->ras_stride_offset) + stride_len = 0; 
+ else + stride_len = end - ras->ras_stride_offset; left = stride_len % ras->ras_stride_length; - window_len = ras->ras_window_len - left; + window_len = (ras->ras_window_len << PAGE_SHIFT) - left; - if (left < ras->ras_stride_pages) + if (left < ras->ras_stride_bytes) left += inc_len; else - left = ras->ras_stride_pages + inc_len; + left = ras->ras_stride_bytes + inc_len; - LASSERT(ras->ras_stride_pages != 0); + LASSERT(ras->ras_stride_bytes != 0); - step = left / ras->ras_stride_pages; - left %= ras->ras_stride_pages; + step = left / ras->ras_stride_bytes; + left %= ras->ras_stride_bytes; window_len += step * ras->ras_stride_length + left; - if (stride_pg_count(ras->ras_stride_offset, ras->ras_stride_length, - ras->ras_stride_pages, ras->ras_stride_offset, - window_len) <= ra->ra_max_pages_per_file) - ras->ras_window_len = window_len; + if (DIV_ROUND_UP(stride_byte_count(ras->ras_stride_offset, + ras->ras_stride_length, + ras->ras_stride_bytes, + ras->ras_stride_offset, + window_len), PAGE_SIZE) + <= ra->ra_max_pages_per_file) + ras->ras_window_len = (window_len >> PAGE_SHIFT); RAS_CDEBUG(ras); } @@ -878,7 +881,8 @@ static void ras_increase_window(struct inode *inode, * information from lower layer. FIXME later */ if (stride_io_mode(ras)) { - ras_stride_increase_window(ras, ra, ras->ras_rpc_size); + ras_stride_increase_window(ras, ra, + ras->ras_rpc_size << PAGE_SHIFT); } else { unsigned long wlen; @@ -897,6 +901,7 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, { struct ll_ra_info *ra = &sbi->ll_ra_info; int zero = 0, stride_detect = 0, ra_miss = 0; + unsigned long pos = index << PAGE_SHIFT; bool hit = flags & LL_RAS_HIT; spin_lock(&ras->ras_lock); @@ -913,13 +918,14 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, * be a symptom of there being so many read-ahead pages that the VM is * reclaiming it before we get to it. 
*/ - if (!index_in_window(index, ras->ras_last_readpage, 8, 8)) { + if (!pos_in_window(pos, ras->ras_last_read_end, + 8 << PAGE_SHIFT, 8 << PAGE_SHIFT)) { zero = 1; ll_ra_stats_inc_sbi(sbi, RA_STAT_DISTANT_READPAGE); } else if (!hit && ras->ras_window_len && index < ras->ras_next_readahead && - index_in_window(index, ras->ras_window_start, 0, - ras->ras_window_len)) { + pos_in_window(index, ras->ras_window_start, 0, + ras->ras_window_len)) { ra_miss = 1; ll_ra_stats_inc_sbi(sbi, RA_STAT_MISS_IN_WINDOW); } @@ -955,16 +961,16 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, if (!index_in_stride_window(ras, index)) { if (ras->ras_consecutive_stride_requests == 0 && ras->ras_request_index == 0) { - ras_update_stride_detector(ras, index); + ras_init_stride_detector(ras, pos, PAGE_SIZE); ras->ras_consecutive_stride_requests++; } else { ras_stride_reset(ras); } - ras_reset(inode, ras, index); - ras->ras_consecutive_pages++; + ras_reset(ras, index); + ras->ras_consecutive_bytes += PAGE_SIZE; goto out_unlock; } else { - ras->ras_consecutive_pages = 0; + ras->ras_consecutive_bytes = 0; ras->ras_consecutive_requests = 0; if (++ras->ras_consecutive_stride_requests > 1) stride_detect = 1; @@ -974,9 +980,10 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, if (ra_miss) { if (index_in_stride_window(ras, index) && stride_io_mode(ras)) { - if (index != ras->ras_last_readpage + 1) - ras->ras_consecutive_pages = 0; - ras_reset(inode, ras, index); + if (index != (ras->ras_last_read_end >> + PAGE_SHIFT) + 1) + ras->ras_consecutive_bytes = 0; + ras_reset(ras, index); /* If stride-RA hit cache miss, the stride * detector will not be reset to avoid the @@ -986,15 +993,15 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, * read-ahead window. 
*/ if (ras->ras_window_start < - ras->ras_stride_offset) + (ras->ras_stride_offset >> PAGE_SHIFT)) ras_stride_reset(ras); RAS_CDEBUG(ras); } else { /* Reset both stride window and normal RA * window */ - ras_reset(inode, ras, index); - ras->ras_consecutive_pages++; + ras_reset(ras, index); + ras->ras_consecutive_bytes += PAGE_SIZE; ras_stride_reset(ras); goto out_unlock; } @@ -1011,9 +1018,8 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, } } } - ras->ras_consecutive_pages++; - ras->ras_last_readpage = index; - ras_set_start(inode, ras, index); + ras->ras_consecutive_bytes += PAGE_SIZE; + ras_set_start(ras, index); if (stride_io_mode(ras)) { /* Since stride readahead is sensitive to the offset @@ -1022,8 +1028,9 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, */ ras->ras_next_readahead = max(index + 1, ras->ras_next_readahead); - ras->ras_window_start = max(ras->ras_stride_offset, - ras->ras_window_start); + ras->ras_window_start = + max(ras->ras_stride_offset >> PAGE_SHIFT, + ras->ras_window_start); } else { if (ras->ras_next_readahead < ras->ras_window_start) ras->ras_next_readahead = ras->ras_window_start; @@ -1035,13 +1042,14 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, /* Trigger RA in the mmap case where ras_consecutive_requests * is not incremented and thus can't be used to trigger RA */ - if (ras->ras_consecutive_pages >= 4 && flags & LL_RAS_MMAP) { + if (ras->ras_consecutive_bytes >= (4 << PAGE_SHIFT) && + flags & LL_RAS_MMAP) { ras_increase_window(inode, ras, ra); /* * reset consecutive pages so that the readahead window can * grow gradually. 
*/ - ras->ras_consecutive_pages = 0; + ras->ras_consecutive_bytes = 0; goto out_unlock; } @@ -1052,7 +1060,7 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, * reset to make sure next_readahead > stride offset */ ras->ras_next_readahead = max(index, ras->ras_next_readahead); - ras->ras_stride_offset = index; + ras->ras_stride_offset = index << PAGE_SHIFT; ras->ras_window_start = max(index, ras->ras_window_start); } @@ -1066,6 +1074,7 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, out_unlock: RAS_CDEBUG(ras); ras->ras_request_index++; + ras->ras_last_read_end = pos + PAGE_SIZE - 1; spin_unlock(&ras->ras_lock); } From patchwork Thu Feb 27 21:15:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410471 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 51C8D138D for ; Thu, 27 Feb 2020 21:39:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3AB1224690 for ; Thu, 27 Feb 2020 21:39:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3AB1224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1A31834A543; Thu, 27 Feb 2020 13:32:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4ED6421CA27 for ; Thu, 27 Feb 2020 13:20:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E172A9178; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DFAD146D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:47 -0500 Message-Id: <1582838290-17243-480-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 479/622] lustre: osc: prevent use after free X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam Clear aa_oa after it's been freed to prevent use after free. 
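For readers who do not know the idiom, the fix amounts to clearing an owning pointer as soon as its buffer is freed. Below is a minimal userspace sketch of that idea, not the kernel code from this patch. The names demo_args, da_oa, demo_args_release and demo_args_usable are hypothetical, and malloc/free stand in for the kmem cache calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for async args that own a buffer (like aa->aa_oa). */
struct demo_args {
	int *da_oa;	/* owned buffer; NULL once released */
};

/* Free the owned buffer and clear the pointer so later use is detectable. */
static void demo_args_release(struct demo_args *da)
{
	free(da->da_oa);	/* free(NULL) is a no-op, so a repeat call is harmless */
	da->da_oa = NULL;
}

/* Returns nonzero while the buffer may still be dereferenced. */
static int demo_args_usable(const struct demo_args *da)
{
	return da->da_oa != NULL;
}
```

With the pointer cleared, a later stale access shows up as a NULL check failure or an immediate fault instead of silently touching recycled memory.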
WC-bug-id: https://jira.whamcloud.com/browse/LU-12581 Lustre-commit: 61c9f8797771 ("LU-12581 osc: prevent use after free") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/35601 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 75e0823..7ba9ea5 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -748,6 +748,7 @@ static int osc_shrink_grant_interpret(const struct lu_env *env, osc_update_grant(cli, body); out: kmem_cache_free(osc_obdo_kmem, aa->aa_oa); + aa->aa_oa = NULL; return rc; } @@ -2131,6 +2132,7 @@ static int brw_interpret(const struct lu_env *env, cl_object_attr_unlock(obj); } kmem_cache_free(osc_obdo_kmem, aa->aa_oa); + aa->aa_oa = NULL; if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE && rc == 0) osc_inc_unstable_pages(req); From patchwork Thu Feb 27 21:15:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410475 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3A0D17E0 for ; Thu, 27 Feb 2020 21:39:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CC35724690 for ; Thu, 27 Feb 2020 21:39:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CC35724690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C4FF34A56A; Thu, 27 Feb 2020 13:32:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 90DCD220000 for ; Thu, 27 Feb 2020 13:20:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E40739179; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E277447C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:48 -0500 Message-Id: <1582838290-17243-481-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 480/622] lustre: mdc: hold obd while processing changelog X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang During read/write changelog, the corresponding obd_device should be held to protect it from being released by umount. 
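The rule stated above (take a reference before use, drop it afterwards) can be illustrated with a small userspace sketch. This is an assumption-laden illustration, not the obd_device API. The names demo_dev, demo_dev_get and demo_dev_put are hypothetical, and the code is single-threaded, whereas real code would need a lock or atomic counters around the count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device object; dev_refs counts active users. */
struct demo_dev {
	int dev_refs;
	int dev_released;	/* set once the last reference is dropped */
};

/* Take a reference; returns NULL if there is no live device to hold. */
static struct demo_dev *demo_dev_get(struct demo_dev *dev)
{
	if (!dev || dev->dev_released)
		return NULL;
	dev->dev_refs++;
	return dev;
}

/* Drop a reference; the device is released only when the count hits zero. */
static void demo_dev_put(struct demo_dev *dev)
{
	assert(dev->dev_refs > 0);
	if (--dev->dev_refs == 0)
		dev->dev_released = 1;
}
```

In this sketch a reader that holds its own reference keeps the device alive even after the setup reference is dropped (the "umount" case), and release happens only on the final put.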
WC-bug-id: https://jira.whamcloud.com/browse/LU-11626 Lustre-commit: d7bb6647cd4d ("LU-11626 mdc: hold obd while processing changelog") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/35784 Reviewed-by: Andreas Dilger Reviewed-by: Emoly Liu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_changelog.c | 121 ++++++++++++++++++++++++------------------ 1 file changed, 70 insertions(+), 51 deletions(-) diff --git a/fs/lustre/mdc/mdc_changelog.c b/fs/lustre/mdc/mdc_changelog.c index 9af0541..043549d 100644 --- a/fs/lustre/mdc/mdc_changelog.c +++ b/fs/lustre/mdc/mdc_changelog.c @@ -69,8 +69,10 @@ struct chlg_registered_dev { }; struct chlg_reader_state { - /* Device this state is associated with */ - struct chlg_registered_dev *crs_dev; + /* Shortcut to the corresponding OBD device */ + struct obd_device *crs_obd; + /* the corresponding chlg_registered_dev */ + struct chlg_registered_dev *crs_ced; /* Producer thread (if any) */ struct task_struct *crs_prod_task; /* An error occurred that prevents from reading further */ @@ -109,6 +111,41 @@ enum { }; /** + * Deregister a changelog character device whose refcount has reached zero. 
+ */ +static void chlg_dev_clear(struct kref *kref) +{ + struct chlg_registered_dev *entry = container_of(kref, + struct chlg_registered_dev, + ced_refs); + + list_del(&entry->ced_link); + misc_deregister(&entry->ced_misc); + kfree(entry); +} + +static inline struct obd_device *chlg_obd_get(struct chlg_registered_dev *dev) +{ + struct obd_device *obd; + + mutex_lock(&chlg_registered_dev_lock); + if (list_empty(&dev->ced_obds)) + return NULL; + + obd = list_first_entry(&dev->ced_obds, struct obd_device, + u.cli.cl_chg_dev_linkage); + class_incref(obd, "changelog", dev); + mutex_unlock(&chlg_registered_dev_lock); + return obd; +} + +static inline void chlg_obd_put(struct chlg_registered_dev *dev, + struct obd_device *obd) +{ + class_decref(obd, "changelog", dev); +} + +/** * ChangeLog catalog processing callback invoked on each record. * If the current record is eligible to userland delivery, push * it into the crs_rec_queue where the consumer code will fetch it. @@ -142,7 +179,7 @@ static int chlg_read_cat_process_cb(const struct lu_env *env, if (rec->cr_hdr.lrh_type != CHANGELOG_REC) { rc = -EINVAL; CERROR("%s: not a changelog rec %x/%d in llog : rc = %d\n", - crs->crs_dev->ced_name, rec->cr_hdr.lrh_type, + crs->crs_obd->obd_name, rec->cr_hdr.lrh_type, rec->cr.cr_type, rc); return rc; } @@ -193,17 +230,6 @@ static void enq_record_delete(struct chlg_rec_entry *rec) kfree(rec); } -/* - * Find any OBD device associated with this reader - * chlg_registered_dev_lock is held. - */ -static inline struct obd_device *chlg_obd_get(struct chlg_registered_dev *dev) -{ - return list_first_entry_or_null(&dev->ced_obds, - struct obd_device, - u.cli.cl_chg_dev_linkage); -} - /** * Record prefetch thread entry point. Opens the changelog catalog and starts * reading records. 
@@ -215,27 +241,28 @@ static inline struct obd_device *chlg_obd_get(struct chlg_registered_dev *dev) static int chlg_load(void *args) { struct chlg_reader_state *crs = args; + struct chlg_registered_dev *ced = crs->crs_ced; struct obd_device *obd; struct llog_ctxt *ctx = NULL; struct llog_handle *llh = NULL; int rc; - mutex_lock(&chlg_registered_dev_lock); - obd = chlg_obd_get(crs->crs_dev); - if (!obd) { - rc = -ENOENT; - goto err_out; - } + crs->crs_last_catidx = -1; + crs->crs_last_idx = 0; + +again: + obd = chlg_obd_get(ced); + if (!obd) + return -ENODEV; + + crs->crs_obd = obd; + ctx = llog_get_context(obd, LLOG_CHANGELOG_REPL_CTXT); if (!ctx) { rc = -ENOENT; goto err_out; } - crs->crs_last_catidx = -1; - crs->crs_last_idx = 0; - -again: rc = llog_open(NULL, ctx, &llh, NULL, CHANGELOG_CATALOG, LLOG_OPEN_EXISTS); if (rc) { @@ -268,6 +295,8 @@ static int chlg_load(void *args) } if (!kthread_should_stop() && crs->crs_poll) { llog_cat_close(NULL, llh); + llog_ctxt_put(ctx); + class_decref(obd, "changelog", crs); schedule_timeout_interruptible(HZ); goto again; } @@ -275,7 +304,6 @@ static int chlg_load(void *args) crs->crs_eof = true; err_out: - mutex_unlock(&chlg_registered_dev_lock); if (rc < 0) crs->crs_err = rc; @@ -287,6 +315,8 @@ static int chlg_load(void *args) if (ctx) llog_ctxt_put(ctx); + crs->crs_obd = NULL; + chlg_obd_put(ced, obd); wait_event_idle(crs->crs_waitq_prod, kthread_should_stop()); return rc; @@ -454,19 +484,19 @@ static int chlg_clear(struct chlg_reader_state *crs, u32 reader, u64 record) .cs_recno = record, .cs_id = reader }; - int ret; + int rc; - mutex_lock(&chlg_registered_dev_lock); - obd = chlg_obd_get(crs->crs_dev); + obd = chlg_obd_get(crs->crs_ced); if (!obd) - ret = -ENOENT; - else - ret = obd_set_info_async(NULL, obd->obd_self_export, - strlen(KEY_CHANGELOG_CLEAR), - KEY_CHANGELOG_CLEAR, sizeof(cs), - &cs, NULL); - mutex_unlock(&chlg_registered_dev_lock); - return ret; + return -ENODEV; + + rc = obd_set_info_async(NULL, 
obd->obd_self_export, + strlen(KEY_CHANGELOG_CLEAR), + KEY_CHANGELOG_CLEAR, sizeof(cs), + &cs, NULL); + chlg_obd_put(crs->crs_ced, obd); + + return rc; } /** Maximum changelog control command size */ @@ -540,7 +570,8 @@ static int chlg_open(struct inode *inode, struct file *file) if (!crs) return -ENOMEM; - crs->crs_dev = dev; + kref_get(&dev->ced_refs); + crs->crs_ced = dev; crs->crs_err = false; crs->crs_eof = false; @@ -564,6 +595,7 @@ static int chlg_open(struct inode *inode, struct file *file) return 0; err_crs: + kref_put(&dev->ced_refs, chlg_dev_clear); kfree(crs); return rc; } @@ -589,6 +621,7 @@ static int chlg_release(struct inode *inode, struct file *file) list_for_each_entry_safe(rec, tmp, &crs->crs_rec_queue, enq_linkage) enq_record_delete(rec); + kref_put(&crs->crs_ced->ced_refs, chlg_dev_clear); kfree(crs); return rc; } @@ -763,20 +796,6 @@ int mdc_changelog_cdev_init(struct obd_device *obd) } /** - * Deregister a changelog character device whose refcount has reached zero. - */ -static void chlg_dev_clear(struct kref *kref) -{ - struct chlg_registered_dev *entry = container_of(kref, - struct chlg_registered_dev, - ced_refs); - LASSERT(mutex_is_locked(&chlg_registered_dev_lock)); - list_del(&entry->ced_link); - misc_deregister(&entry->ced_misc); - kfree(entry); -} - -/** * Release OBD, decrease reference count of the corresponding changelog device. 
*/ void mdc_changelog_cdev_finish(struct obd_device *obd) From patchwork Thu Feb 27 21:15:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410479 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E07A17E0 for ; Thu, 27 Feb 2020 21:39:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7684F246A1 for ; Thu, 27 Feb 2020 21:39:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7684F246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1FF5F349119; Thu, 27 Feb 2020 13:32:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E735E220000 for ; Thu, 27 Feb 2020 13:20:47 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E6930917A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E52CE468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:49 -0500 Message-Id: <1582838290-17243-482-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 481/622] lnet: change ln_mt_waitq to a completion. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown ln_mt_waitq is only waited on by a call to wait_event_interruptible_timeout(..., false, timeout); As 'false' is never 'true', this will always wait for the full timeout to expire. So the waitq is effectively pointless. To achieve the apparent intent of the waitq, change it to a completion. The completion adds a 'done' flag to a waitq so we can wait until a timeout or until a wakeup is requested. With this, a longer timeout could be used, but that is left to a later patch. WC-bug-id: https://jira.whamcloud.com/browse/LU-12686 Lustre-commit: b81bcc6c6f0c ("LU-12686 lnet: change ln_mt_waitq to a completion.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35874 Reviewed-by: Chris Horn Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 5 +++-- net/lnet/lnet/api-ni.c | 2 +- net/lnet/lnet/lib-move.c | 10 +++++++--- net/lnet/lnet/lib-msg.c | 2 +- net/lnet/lnet/router.c | 4 ++-- 5 files changed, 14 insertions(+), 9 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 22c2bc6..18d4e4e 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -1145,10 +1145,11 @@ struct lnet { */ bool ln_nis_from_mod_params; - /* waitq for the monitor thread. The monitor thread takes care of + /* + * completion for the monitor thread.
The monitor thread takes care of * checking routes, timedout messages and resending messages. */ - wait_queue_head_t ln_mt_waitq; + struct completion ln_mt_wait_complete; /* per-cpt resend queues */ struct list_head **ln_mt_resendqs; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 79deaac..e66d9dc7 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -486,7 +486,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force, spin_lock_init(&the_lnet.ln_eq_wait_lock); spin_lock_init(&the_lnet.ln_msg_resend_lock); init_waitqueue_head(&the_lnet.ln_eq_waitq); - init_waitqueue_head(&the_lnet.ln_mt_waitq); + init_completion(&the_lnet.ln_mt_wait_complete); mutex_init(&the_lnet.ln_lnd_mutex); } diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 322998a..2f31f06 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3276,8 +3276,12 @@ struct lnet_mt_event_info { min((unsigned int)alive_router_check_interval / lnet_current_net_count, lnet_transaction_timeout / 2)); - wait_event_interruptible_timeout(the_lnet.ln_mt_waitq, - false, HZ * interval); + wait_for_completion_interruptible_timeout(&the_lnet.ln_mt_wait_complete, + interval * HZ); + /* Must re-init the completion before testing anything, + * including ln_mt_state. 
+ */ + reinit_completion(&the_lnet.ln_mt_wait_complete); } /* Shutting down */ @@ -3539,7 +3543,7 @@ void lnet_monitor_thr_stop(void) lnet_net_unlock(LNET_LOCK_EX); /* tell the monitor thread that we're shutting down */ - wake_up(&the_lnet.ln_mt_waitq); + complete(&the_lnet.ln_mt_wait_complete); /* block until monitor thread signals that it's done */ wait_for_completion(&the_lnet.ln_mt_signal); diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 5c39ce3..d74ff53 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -640,7 +640,7 @@ list_add_tail(&msg->msg_list, the_lnet.ln_mt_resendqs[msg->msg_tx_cpt]); - wake_up(&the_lnet.ln_mt_waitq); + complete(&the_lnet.ln_mt_wait_complete); } int diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index bc9494d..7246eea 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -674,7 +674,7 @@ static void lnet_shuffle_seed(void) kfree(rnet); /* kick start the monitor thread to handle the added route */ - wake_up(&the_lnet.ln_mt_waitq); + complete(&the_lnet.ln_mt_wait_complete); return rc; } @@ -1419,7 +1419,7 @@ bool lnet_router_checker_active(void) lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_routing = 1; lnet_net_unlock(LNET_LOCK_EX); - wake_up(&the_lnet.ln_mt_waitq); + complete(&the_lnet.ln_mt_wait_complete); return 0; failed: From patchwork Thu Feb 27 21:15:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410889 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 18AE1924 for ; Thu, 27 Feb 2020 21:49:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 017B224690 
for ; Thu, 27 Feb 2020 21:49:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 017B224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4C40034A8D5; Thu, 27 Feb 2020 13:41:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 49D57220000 for ; Thu, 27 Feb 2020 13:20:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E943F917B; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E7F2946A; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:50 -0500 Message-Id: <1582838290-17243-483-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 482/622] lustre: obdclass: align to T10 sector size when generating guard X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Xi , Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Otherwise the client and server would come up with different checksums when their page sizes differ.
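To make the alignment concrete, here is a small userspace sketch of how a buffer can be split so that each chunk ends on a sector boundary, counted from the start of the page rather than from the start of the buffer. It is an illustration under assumptions, not the kernel function. The helper names demo_round_up, demo_chunk_size and demo_count_guards are hypothetical, and demo_round_up uses plain integer arithmetic instead of the kernel's round_up() macro.

```c
#include <assert.h>

/* Round x up to the next multiple of align (align must be > 0). */
static unsigned int demo_round_up(unsigned int x, unsigned int align)
{
	return ((x + align - 1) / align) * align;
}

/*
 * Size of the next chunk starting at in-page position pos: it runs to the
 * next sector boundary or to end, whichever comes first.
 */
static unsigned int demo_chunk_size(unsigned int pos, unsigned int end,
				    unsigned int sector_size)
{
	unsigned int boundary = demo_round_up(pos + 1, sector_size);

	return (boundary < end ? boundary : end) - pos;
}

/* Count the guard tags needed for the byte range [offset, offset + length). */
static unsigned int demo_count_guards(unsigned int offset, unsigned int length,
				      unsigned int sector_size)
{
	unsigned int pos = offset;
	unsigned int end = offset + length;
	unsigned int used = 0;

	assert(sector_size > 0);
	while (pos < end) {
		pos += demo_chunk_size(pos, end, sector_size);
		used++;
	}
	return used;
}
```

For example, with offset 256, length 1024 and 512-byte sectors the sketch yields chunks of 256, 512 and 256 bytes, so three guards. Splitting from the buffer start instead would give two 512-byte chunks that straddle sector boundaries.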
Improve test_810 to verify all available checksum types. WC-bug-id: https://jira.whamcloud.com/browse/LU-11729 Lustre-commit: 98ceaf854bb4 ("LU-11729 obdclass: align to T10 sector size when generating guard") Signed-off-by: Andreas Dilger Signed-off-by: Li Xi Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/34043 Reviewed-by: James Simmons Reviewed-by: Arshad Hussain Signed-off-by: James Simmons --- fs/lustre/obdclass/integrity.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c index 5cb9a25..2d5760d 100644 --- a/fs/lustre/obdclass/integrity.c +++ b/fs/lustre/obdclass/integrity.c @@ -50,26 +50,26 @@ int obd_page_dif_generate_buffer(const char *obd_name, struct page *page, int *used_number, int sector_size, obd_dif_csum_fn *fn) { - unsigned int i; + unsigned int i = offset; + unsigned int end = offset + length; char *data_buf; u16 *guard_buf = guard_start; unsigned int data_size; int used = 0; data_buf = kmap(page) + offset; - for (i = 0; i < length; i += sector_size) { + while (i < end) { if (used >= guard_number) { CERROR("%s: unexpected used guard number of DIF %u/%u, data length %u, sector size %u: rc = %d\n", obd_name, used, guard_number, length, sector_size, -E2BIG); return -E2BIG; } - data_size = length - i; - if (data_size > sector_size) - data_size = sector_size; + data_size = min(round_up(i + 1, sector_size), end) - i; *guard_buf = fn(data_buf, data_size); guard_buf++; data_buf += data_size; + i += data_size; used++; } kunmap(page); From patchwork Thu Feb 27 21:15:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410483 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C04EA17E0 for ; Thu, 27 Feb 2020 21:39:32 +0000 (UTC) 
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A907424690 for ; Thu, 27 Feb 2020 21:39:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A907424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 103E3348953; Thu, 27 Feb 2020 13:32:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8D2E5220000 for ; Thu, 27 Feb 2020 13:20:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EBCF5917C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EAAF146C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:51 -0500 Message-Id: <1582838290-17243-484-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 483/622] lustre: ptlrpc: Hold imp lock for idle reconnect X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Idle reconnect sets import state to IMP_NEW, then releases the import lock before calling ptlrpc_connect_import. This creates a gap where an import in IMP_NEW state is exposed, which can cause new requests to fail with EIO. Hold the lock across the call so as not to expose imports in this state. WC-bug-id: https://jira.whamcloud.com/browse/LU-12559 Lustre-commit: e9472c54ac82 ("LU-12559 ptlrpc: Hold imp lock for idle reconnect") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35530 Reviewed-by: Alex Zhuravlev Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 1 + fs/lustre/ptlrpc/client.c | 13 ++++++------- fs/lustre/ptlrpc/import.c | 19 +++++++++++++++---- 3 files changed, 22 insertions(+), 11 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index aaf5cb8..8dad08e 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -2015,6 +2015,7 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf, * @{ */ int ptlrpc_connect_import(struct obd_import *imp); +int ptlrpc_connect_import_locked(struct obd_import *imp); int ptlrpc_init_import(struct obd_import *imp); int ptlrpc_disconnect_import(struct obd_import *imp, int noclose); int ptlrpc_disconnect_and_idle_import(struct obd_import *imp); diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 478ba85..c359ac0 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -870,7 +870,6 @@ struct ptlrpc_request *__ptlrpc_request_alloc(struct obd_import *imp, const struct req_format *format) { struct ptlrpc_request *request; - int connect = 0; request = __ptlrpc_request_alloc(imp, pool); if (!request) 
@@ -890,17 +889,17 @@ struct ptlrpc_request *__ptlrpc_request_alloc(struct obd_import *imp, if (imp->imp_state == LUSTRE_IMP_IDLE) { imp->imp_generation++; imp->imp_initiated_at = imp->imp_generation; - imp->imp_state = LUSTRE_IMP_NEW; - connect = 1; - } - spin_unlock(&imp->imp_lock); - if (connect) { - rc = ptlrpc_connect_import(imp); + imp->imp_state = LUSTRE_IMP_NEW; + + /* connect_import_locked releases imp_lock */ + rc = ptlrpc_connect_import_locked(imp); if (rc < 0) { ptlrpc_request_free(request); return NULL; } ptlrpc_pinger_add_import(imp); + } else { + spin_unlock(&imp->imp_lock); } } diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index ff1b810..c4a732d 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -611,13 +611,22 @@ static int ptlrpc_first_transno(struct obd_import *imp, u64 *transno) return 0; } +int ptlrpc_connect_import(struct obd_import *imp) +{ + spin_lock(&imp->imp_lock); + return ptlrpc_connect_import_locked(imp); +} + /** * Attempt to (re)connect import @imp. This includes all preparations, * initializing CONNECT RPC request and passing it to ptlrpcd for * actual sending. + * + * Assumes imp->imp_lock is held, and releases it. + * * Returns 0 on success or error code. 
*/ -int ptlrpc_connect_import(struct obd_import *imp) +int ptlrpc_connect_import_locked(struct obd_import *imp) { struct obd_device *obd = imp->imp_obd; int initial_connect = 0; @@ -634,7 +643,8 @@ int ptlrpc_connect_import(struct obd_import *imp) struct ptlrpc_connect_async_args *aa; int rc; - spin_lock(&imp->imp_lock); + assert_spin_locked(&imp->imp_lock); + if (imp->imp_state == LUSTRE_IMP_CLOSED) { spin_unlock(&imp->imp_lock); CERROR("can't connect to a closed import\n"); @@ -1701,12 +1711,13 @@ static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env, connect = 1; } } - spin_unlock(&imp->imp_lock); if (connect) { - rc = ptlrpc_connect_import(imp); + rc = ptlrpc_connect_import_locked(imp); if (rc >= 0) ptlrpc_pinger_add_import(imp); + } else { + spin_unlock(&imp->imp_lock); } return 0; From patchwork Thu Feb 27 21:15:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410487 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC37017E0 for ; Thu, 27 Feb 2020 21:39:37 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D4D91246A1 for ; Thu, 27 Feb 2020 21:39:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D4D91246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E5CA4349A08; Thu, 27 Feb 2020 13:32:16 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E243221FB29 for ; Thu, 27 Feb 2020 13:20:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EEE06917D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ED68F46D; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:52 -0500 Message-Id: <1582838290-17243-485-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 484/622] lustre: osc: glimpse - search for active lock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When there are lock-ahead write locks on a file, the server sends one glimpse AST RPC to each client having such (it may have many) locks. This callback is sent to the lock having the highest offset. Client's glimpse callback goes up to the clio layers and gets the global (not lock-specific) view of size. The clio layers are connected to the extent lock through the l_ast_data (which points to the OSC object). Speculative locks (AGL, lockahead) do not have l_ast_data initialised until an IO happens under the lock. Thus, some speculative locks may not have l_ast_data initialized. 
It is possible for the client to do a write using one lock (changing file size), but for the glimpse AST to be sent to another lock without l_ast_data initialized. Currently, a lock with no l_ast_data set returns ELDLM_NO_LOCK_DATA to the server. In this case, this means we do not return the updated size. The solution is to search the granted lock tree for any lock with initialized l_ast_data (it points to the OSC object which is the same for all the extent locks) and to reach the clio layers for the size through this lock instead. cray-bug-id: LUS-6747 WC-bug-id: https://jira.whamcloud.com/browse/LU-11670 Lustre-commit: b3461d11dcb0 ("LU-11670 osc: glimpse - search for active lock") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/33660 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 17 ++++++++++++++++- fs/lustre/include/obd_support.h | 1 + fs/lustre/ldlm/ldlm_lock.c | 39 ++++++++++++++++++++------------------- fs/lustre/osc/osc_lock.c | 41 ++++++++++++++++++++++++++++++++++++----- 4 files changed, 73 insertions(+), 25 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 4060bb4..f7d2d9c 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -809,6 +809,20 @@ struct ldlm_lock { }; /** + * Describe the overlap between two locks. itree_overlap_cb data. + */ +struct ldlm_match_data { + struct ldlm_lock *lmd_old; + struct ldlm_lock *lmd_lock; + enum ldlm_mode *lmd_mode; + union ldlm_policy_data *lmd_policy; + u64 lmd_flags; + u64 lmd_skip_flags; + int lmd_unref; + bool lmd_has_ast_data; +}; + +/** * LDLM resource description. * Basically, resource is a representation for a single object. * Object has a name which is currently 4 64-bit integers. 
LDLM user is @@ -1163,7 +1177,8 @@ static inline enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, return ldlm_lock_match_with_skip(ns, flags, 0, res_id, type, policy, mode, lh, unref); } - +struct ldlm_lock *search_itree(struct ldlm_resource *res, + struct ldlm_match_data *data); enum ldlm_mode ldlm_revalidate_lock_handle(const struct lustre_handle *lockh, u64 *bits); void ldlm_lock_cancel(struct ldlm_lock *lock); diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 506535b..acfd098 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -330,6 +330,7 @@ #define OBD_FAIL_OSC_DELAY_SETTIME 0x412 #define OBD_FAIL_OSC_CONNECT_GRANT_PARAM 0x413 #define OBD_FAIL_OSC_DELAY_IO 0x414 +#define OBD_FAIL_OSC_NO_SIZE_DATA 0x415 #define OBD_FAIL_PTLRPC 0x500 #define OBD_FAIL_PTLRPC_ACK 0x501 diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index b6c49c5..d14221a 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -1045,19 +1045,6 @@ void ldlm_grant_lock(struct ldlm_lock *lock, struct list_head *work_list) } /** - * Describe the overlap between two locks. itree_overlap_cb data. - */ -struct lock_match_data { - struct ldlm_lock *lmd_old; - struct ldlm_lock *lmd_lock; - enum ldlm_mode *lmd_mode; - union ldlm_policy_data *lmd_policy; - u64 lmd_flags; - u64 lmd_skip_flags; - int lmd_unref; -}; - -/** * Check if the given @lock meets the criteria for a match. * A reference on the lock is taken if matched. 
* @@ -1066,9 +1053,9 @@ struct lock_match_data { */ static bool lock_matches(struct ldlm_lock *lock, void *vdata) { - struct lock_match_data *data = vdata; + struct ldlm_match_data *data = vdata; union ldlm_policy_data *lpol = &lock->l_policy_data; - enum ldlm_mode match; + enum ldlm_mode match = LCK_MINMODE; if (lock == data->lmd_old) return true; @@ -1098,6 +1085,17 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata) if (!(lock->l_req_mode & *data->lmd_mode)) return false; + + /* When we search for ast_data, we are not doing a traditional match, + * so we don't worry about IBITS or extent matching. + */ + if (data->lmd_has_ast_data) { + if (!lock->l_ast_data) + return false; + + goto matched; + } + match = lock->l_req_mode; switch (lock->l_resource->lr_type) { @@ -1138,6 +1136,7 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata) if (data->lmd_skip_flags & lock->l_flags) return false; +matched: if (data->lmd_flags & LDLM_FL_TEST_LOCK) { LDLM_LOCK_GET(lock); ldlm_lock_touch_in_lru(lock); @@ -1159,8 +1158,8 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata) * * Return: a referenced lock or NULL. */ -static struct ldlm_lock *search_itree(struct ldlm_resource *res, - struct lock_match_data *data) +struct ldlm_lock *search_itree(struct ldlm_resource *res, + struct ldlm_match_data *data) { int idx; @@ -1185,6 +1184,7 @@ static struct ldlm_lock *search_itree(struct ldlm_resource *res, return NULL; } +EXPORT_SYMBOL(search_itree); /* * Search for a lock with given properties in a queue. @@ -1195,7 +1195,7 @@ static struct ldlm_lock *search_itree(struct ldlm_resource *res, * Return: a referenced lock or NULL. 
*/ static struct ldlm_lock *search_queue(struct list_head *queue, - struct lock_match_data *data) + struct ldlm_match_data *data) { struct ldlm_lock *lock; @@ -1280,7 +1280,7 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns, enum ldlm_mode mode, struct lustre_handle *lockh, int unref) { - struct lock_match_data data = { + struct ldlm_match_data data = { .lmd_old = NULL, .lmd_lock = NULL, .lmd_mode = &mode, @@ -1288,6 +1288,7 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns, .lmd_flags = flags, .lmd_skip_flags = skip_flags, .lmd_unref = unref, + .lmd_has_ast_data = false, }; struct ldlm_resource *res; struct ldlm_lock *lock; diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index c748e58..dcddf17 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -549,6 +549,10 @@ int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data) struct ost_lvb *lvb; struct req_capsule *cap; struct cl_object *obj = NULL; + struct ldlm_resource *res = dlmlock->l_resource; + struct ldlm_match_data matchdata = { 0 }; + union ldlm_policy_data policy; + enum ldlm_mode mode = LCK_PW | LCK_GROUP | LCK_PR; int result; u16 refcheck; @@ -559,13 +563,40 @@ int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data) result = PTR_ERR(env); goto out; } + policy.l_extent.start = 0; + policy.l_extent.end = LUSTRE_EOF; - lock_res_and_lock(dlmlock); - if (dlmlock->l_ast_data) { - obj = osc2cl(dlmlock->l_ast_data); - cl_object_get(obj); + matchdata.lmd_mode = &mode; + matchdata.lmd_policy = &policy; + matchdata.lmd_flags = LDLM_FL_TEST_LOCK | LDLM_FL_CBPENDING; + matchdata.lmd_unref = 1; + matchdata.lmd_has_ast_data = true; + + LDLM_LOCK_GET(dlmlock); + + /* If any dlmlock has l_ast_data set, we must find it or we risk + * missing a size update done under a different lock. 
+ */ + while (dlmlock) { + lock_res_and_lock(dlmlock); + if (dlmlock->l_ast_data) { + obj = osc2cl(dlmlock->l_ast_data); + cl_object_get(obj); + } + unlock_res_and_lock(dlmlock); + LDLM_LOCK_PUT(dlmlock); + + dlmlock = NULL; + + if (!obj && res->lr_type == LDLM_EXTENT) { + if (OBD_FAIL_CHECK(OBD_FAIL_OSC_NO_SIZE_DATA)) + break; + + lock_res(res); + dlmlock = search_itree(res, &matchdata); + unlock_res(res); + } } - unlock_res_and_lock(dlmlock); if (obj) { /* Do not grab the mutex of cl_lock for glimpse. From patchwork Thu Feb 27 21:15:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410623 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A969C138D for ; Thu, 27 Feb 2020 21:42:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 91FCB246A1 for ; Thu, 27 Feb 2020 21:42:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 91FCB246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 53EE7349EED; Thu, 27 Feb 2020 13:34:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4368221FB29 for ; Thu, 27 Feb 2020 13:20:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F327D917E; Thu, 27 Feb 2020 16:18:18 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F071447C; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:53 -0500 Message-Id: <1582838290-17243-486-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 485/622] lustre: lmv: use lu_tgt_descs to manage tgts X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Like LOD, use lu_tgt_descs to manage tgts, so that they can share tgt management code. TODO: use the same tgt management code for LOV/LFSCK. 
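For readers unfamiliar with the table layout, here is a user-space sketch of a two-level pointer table indexed the way the LTD_TGT() macro in this patch indexes it (index / entries-per-block, then index % entries-per-block). It is an illustration only: all demo_* names and the small sizes are hypothetical, the real code sizes its blocks from PAGE_SIZE, and the NULL check on the block is a simplification that the macro itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes; the patch derives both from PAGE_SIZE. */
#define DEMO_PTRS_PER_BLOCK 8U
#define DEMO_BLOCKS 4U
#define DEMO_CAPACITY (DEMO_BLOCKS * DEMO_PTRS_PER_BLOCK)

struct demo_tgt {
	unsigned int index;
};

/* Second level: one block of target pointers. */
struct demo_block {
	struct demo_tgt *tgt[DEMO_PTRS_PER_BLOCK];
};

/* First level: pointers to blocks, allocated on demand. */
struct demo_table {
	struct demo_block *block[DEMO_BLOCKS];
};

/* Return the target stored at @index, or NULL if the slot is empty. */
static struct demo_tgt *demo_lookup(const struct demo_table *tbl,
				    unsigned int index)
{
	const struct demo_block *blk;

	if (index >= DEMO_CAPACITY)
		return NULL;

	blk = tbl->block[index / DEMO_PTRS_PER_BLOCK];
	return blk ? blk->tgt[index % DEMO_PTRS_PER_BLOCK] : NULL;
}

/* Store @tgt at tgt->index; returns 0 on success, -1 on failure. */
static int demo_add(struct demo_table *tbl, struct demo_tgt *tgt)
{
	unsigned int blk;
	unsigned int slot;

	assert(tbl && tgt);
	if (tgt->index >= DEMO_CAPACITY)
		return -1;

	blk = tgt->index / DEMO_PTRS_PER_BLOCK;
	slot = tgt->index % DEMO_PTRS_PER_BLOCK;

	if (!tbl->block[blk]) {
		tbl->block[blk] = calloc(1, sizeof(*tbl->block[blk]));
		if (!tbl->block[blk])
			return -1;
	}

	if (tbl->block[blk]->tgt[slot])
		return -1;	/* slot already taken */

	tbl->block[blk]->tgt[slot] = tgt;
	return 0;
}

/* Free the blocks; the targets themselves belong to the caller. */
static void demo_fini(struct demo_table *tbl)
{
	unsigned int i;

	for (i = 0; i < DEMO_BLOCKS; i++) {
		free(tbl->block[i]);
		tbl->block[i] = NULL;
	}
}
```

Compared with the single reallocated array that LMV used before, a layout like this never moves existing entries when a higher index is added, which is presumably part of why the shared helper is attractive.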
WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: 59fc1218fccf ("LU-11213 lmv: use lu_tgt_descs to manage tgts") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35218 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 68 +++++ fs/lustre/include/obd.h | 5 +- fs/lustre/lmv/lmv_intent.c | 10 +- fs/lustre/lmv/lmv_internal.h | 86 +++--- fs/lustre/lmv/lmv_obd.c | 540 +++++++++++++++----------------------- fs/lustre/lmv/lmv_qos.c | 57 ++-- fs/lustre/lmv/lproc_lmv.c | 20 +- fs/lustre/obdclass/Makefile | 2 +- fs/lustre/obdclass/lu_tgt_descs.c | 192 ++++++++++++++ 9 files changed, 572 insertions(+), 408 deletions(-) create mode 100644 fs/lustre/obdclass/lu_tgt_descs.c diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index aed0d4b..c30c06d 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1392,6 +1392,38 @@ struct lu_tgt_desc { ltd_connecting:1; /* target is connecting */ }; +/* number of pointers at 1st level */ +#define TGT_PTRS (PAGE_SIZE / sizeof(void *)) +/* number of pointers at 2nd level */ +#define TGT_PTRS_PER_BLOCK (PAGE_SIZE / sizeof(void *)) + +struct lu_tgt_desc_idx { + struct lu_tgt_desc *ldi_tgt[TGT_PTRS_PER_BLOCK]; +}; + +struct lu_tgt_descs { + /* list of known TGTs */ + struct lu_tgt_desc_idx *ltd_tgt_idx[TGT_PTRS]; + /* Size of the lu_tgts array, granted to be a power of 2 */ + u32 ltd_tgts_size; + /* number of registered TGTs */ + u32 ltd_tgtnr; + /* bitmap of TGTs available */ + unsigned long *ltd_tgt_bitmap; + /* TGTs scheduled to be deleted */ + u32 ltd_death_row; + /* Table refcount used for delayed deletion */ + int ltd_refcount; + /* mutex to serialize concurrent updates to the tgt table */ + struct mutex ltd_mutex; + /* read/write semaphore used for array relocation */ + struct rw_semaphore ltd_rw_sem; +}; + +#define LTD_TGT(ltd, index) \ + 
((ltd)->ltd_tgt_idx[(index) / TGT_PTRS_PER_BLOCK] \ + ->ldi_tgt[(index) % TGT_PTRS_PER_BLOCK]) + /* QoS data for LOD/LMV */ struct lu_qos { struct list_head lq_svr_list; /* lu_svr_qos list */ @@ -1412,5 +1444,41 @@ struct lu_qos { int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); u64 lu_prandom_u64_max(u64 ep_ro); +int lu_tgt_descs_init(struct lu_tgt_descs *ltd); +void lu_tgt_descs_fini(struct lu_tgt_descs *ltd); +int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt); +void lu_tgt_descs_del(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt); + +static inline struct lu_tgt_desc *ltd_first_tgt(struct lu_tgt_descs *ltd) +{ + int index; + + index = find_first_bit(ltd->ltd_tgt_bitmap, + ltd->ltd_tgts_size); + return (index < ltd->ltd_tgts_size) ? LTD_TGT(ltd, index) : NULL; +} + +static inline struct lu_tgt_desc *ltd_next_tgt(struct lu_tgt_descs *ltd, + struct lu_tgt_desc *tgt) +{ + int index; + + if (!tgt) + return NULL; + + index = tgt->ltd_index; + LASSERT(index < ltd->ltd_tgts_size); + index = find_next_bit(ltd->ltd_tgt_bitmap, + ltd->ltd_tgts_size, index + 1); + return (index < ltd->ltd_tgts_size) ? 
LTD_TGT(ltd, index) : NULL; +} + +#define ltd_foreach_tgt(ltd, tgt) \ + for (tgt = ltd_first_tgt(ltd); tgt; tgt = ltd_next_tgt(ltd, tgt)) + +#define ltd_foreach_tgt_safe(ltd, tgt, tmp) \ + for (tgt = ltd_first_tgt(ltd), tmp = ltd_next_tgt(ltd, tgt); tgt; \ + tgt = tmp, tmp = ltd_next_tgt(ltd, tgt)) + /** @} lu */ #endif /* __LUSTRE_LU_OBJECT_H */ diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index ef37f78..41431f9 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -424,14 +424,13 @@ struct lmv_obd { spinlock_t lmv_lock; struct lmv_desc desc; - struct mutex lmv_init_mutex; int connected; int max_easize; int max_def_easize; u32 lmv_statfs_start; - u32 tgts_size; /* size of tgts array */ - struct lmv_tgt_desc **tgts; + struct lu_tgt_descs lmv_mdt_descs; + struct obd_connect_data conn_data; struct kobject *lmv_tgts_kobj; void *lmv_cache; diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index f62cd7c..542b16d 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -83,7 +83,7 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it, LASSERT(fid_is_sane(&body->mbo_fid1)); - tgt = lmv_find_target(lmv, &body->mbo_fid1); + tgt = lmv_fid2tgt(lmv, &body->mbo_fid1); if (IS_ERR(tgt)) { rc = PTR_ERR(tgt); goto out; @@ -199,9 +199,9 @@ int lmv_revalidate_slaves(struct obd_export *exp, op_data->op_fid1 = fid; op_data->op_fid2 = fid; - tgt = lmv_get_target(lmv, lsm->lsm_md_oinfo[i].lmo_mds, NULL); - if (IS_ERR(tgt)) { - rc = PTR_ERR(tgt); + tgt = lmv_tgt(lmv, lsm->lsm_md_oinfo[i].lmo_mds); + if (!tgt) { + rc = -ENODEV; goto cleanup; } @@ -349,7 +349,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, if (lmv_dir_striped(op_data->op_mea1)) op_data->op_fid1 = op_data->op_fid2; - tgt = lmv_find_target(lmv, &op_data->op_fid2); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid2); if (IS_ERR(tgt)) return PTR_ERR(tgt); diff --git a/fs/lustre/lmv/lmv_internal.h 
b/fs/lustre/lmv/lmv_internal.h index c673656..e0c3ba0 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -70,57 +70,81 @@ static inline struct obd_device *lmv2obd_dev(struct lmv_obd *lmv) return container_of_safe(lmv, struct obd_device, u.lmv); } -static inline struct lmv_tgt_desc * -lmv_get_target(struct lmv_obd *lmv, u32 mdt_idx, int *index) +static inline struct lu_tgt_desc * +lmv_tgt(struct lmv_obd *lmv, u32 index) { - int i; + return index < lmv->lmv_mdt_descs.ltd_tgts_size ? + LTD_TGT(&lmv->lmv_mdt_descs, index) : NULL; +} - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - if (!lmv->tgts[i]) - continue; +static inline bool +lmv_mdt0_inited(struct lmv_obd *lmv) +{ + return lmv->lmv_mdt_descs.ltd_tgt_bitmap && + test_bit(0, lmv->lmv_mdt_descs.ltd_tgt_bitmap); +} - if (lmv->tgts[i]->ltd_index == mdt_idx) { - if (index) - *index = i; - return lmv->tgts[i]; - } - } +#define lmv_foreach_tgt(lmv, tgt) ltd_foreach_tgt(&(lmv)->lmv_mdt_descs, tgt) + +#define lmv_foreach_tgt_safe(lmv, tgt, tmp) \ + ltd_foreach_tgt_safe(&(lmv)->lmv_mdt_descs, tgt, tmp) + +static inline +struct lu_tgt_desc *lmv_first_connected_tgt(struct lmv_obd *lmv) +{ + struct lu_tgt_desc *tgt; - return ERR_PTR(-ENODEV); + tgt = ltd_first_tgt(&lmv->lmv_mdt_descs); + while (tgt && !tgt->ltd_exp) + tgt = ltd_next_tgt(&lmv->lmv_mdt_descs, tgt); + + return tgt; } -static inline int -lmv_find_target_index(struct lmv_obd *lmv, const struct lu_fid *fid) +static inline +struct lu_tgt_desc *lmv_next_connected_tgt(struct lmv_obd *lmv, + struct lu_tgt_desc *tgt) { - struct lmv_tgt_desc *ltd; - u32 mdt_idx = 0; - int index = 0; + do { + tgt = ltd_next_tgt(&lmv->lmv_mdt_descs, tgt); + } while (tgt && !tgt->ltd_exp); - if (lmv->desc.ld_tgt_count > 1) { - int rc; + return tgt; +} - rc = lmv_fld_lookup(lmv, fid, &mdt_idx); - if (rc < 0) - return rc; - } +#define lmv_foreach_connected_tgt(lmv, tgt) \ + for (tgt = lmv_first_connected_tgt(lmv); tgt; \ + tgt = lmv_next_connected_tgt(lmv, tgt)) 
- ltd = lmv_get_target(lmv, mdt_idx, &index); - if (IS_ERR(ltd)) - return PTR_ERR(ltd); +static inline int +lmv_fid2tgt_index(struct lmv_obd *lmv, const struct lu_fid *fid) +{ + u32 mdt_idx; + int rc; + + if (lmv->desc.ld_tgt_count < 2) + return 0; - return index; + rc = lmv_fld_lookup(lmv, fid, &mdt_idx); + if (rc < 0) + return rc; + + return mdt_idx; } static inline struct lmv_tgt_desc * -lmv_find_target(struct lmv_obd *lmv, const struct lu_fid *fid) +lmv_fid2tgt(struct lmv_obd *lmv, const struct lu_fid *fid) { + struct lu_tgt_desc *tgt; int index; - index = lmv_find_target_index(lmv, fid); + index = lmv_fid2tgt_index(lmv, fid); if (index < 0) return ERR_PTR(index); - return lmv->tgts[index]; + tgt = lmv_tgt(lmv, index); + + return tgt ? tgt : ERR_PTR(-ENODEV); } static inline int lmv_stripe_md_size(int stripe_count) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 26021bb..8d682b4 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -78,28 +78,24 @@ void lmv_activate_target(struct lmv_obd *lmv, struct lmv_tgt_desc *tgt, static int lmv_set_mdc_active(struct lmv_obd *lmv, const struct obd_uuid *uuid, int activate) { - struct lmv_tgt_desc *tgt = NULL; + struct lu_tgt_desc *tgt = NULL; struct obd_device *obd; - u32 i; int rc = 0; CDEBUG(D_INFO, "Searching in lmv %p for uuid %s (activate=%d)\n", lmv, uuid->uuid, activate); spin_lock(&lmv->lmv_lock); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv->tgts[i]; - if (!tgt || !tgt->ltd_exp) - continue; - - CDEBUG(D_INFO, "Target idx %d is %s conn %#llx\n", i, - tgt->ltd_uuid.uuid, tgt->ltd_exp->exp_handle.h_cookie); + lmv_foreach_connected_tgt(lmv, tgt) { + CDEBUG(D_INFO, "Target idx %d is %s conn %#llx\n", + tgt->ltd_index, tgt->ltd_uuid.uuid, + tgt->ltd_exp->exp_handle.h_cookie); if (obd_uuid_equals(uuid, &tgt->ltd_uuid)) break; } - if (i == lmv->desc.ld_tgt_count) { + if (!tgt) { rc = -EINVAL; goto out_lmv_lock; } @@ -112,7 +108,7 @@ static int lmv_set_mdc_active(struct 
lmv_obd *lmv, const struct obd_uuid *uuid, CDEBUG(D_INFO, "Found OBD %s=%s device %d (%p) type %s at LMV idx %d\n", obd->obd_name, obd->obd_uuid.uuid, obd->obd_minor, obd, - obd->obd_type->typ_name, i); + obd->obd_type->typ_name, tgt->ltd_index); LASSERT(strcmp(obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0); if (tgt->ltd_active == activate) { @@ -133,7 +129,7 @@ static int lmv_set_mdc_active(struct lmv_obd *lmv, const struct obd_uuid *uuid, static struct obd_uuid *lmv_get_uuid(struct obd_export *exp) { struct lmv_obd *lmv = &exp->exp_obd->u.lmv; - struct lmv_tgt_desc *tgt = lmv->tgts[0]; + struct lmv_tgt_desc *tgt = lmv_tgt(lmv, 0); return tgt ? obd_get_uuid(tgt->ltd_exp) : NULL; } @@ -235,9 +231,9 @@ static int lmv_init_ea_size(struct obd_export *exp, u32 easize, u32 def_easize) { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - u32 i; - int rc = 0; + struct lmv_tgt_desc *tgt; int change = 0; + int rc = 0; if (lmv->max_easize < easize) { lmv->max_easize = easize; @@ -254,20 +250,14 @@ static int lmv_init_ea_size(struct obd_export *exp, u32 easize, u32 def_easize) if (lmv->connected == 0) return 0; - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - struct lmv_tgt_desc *tgt = lmv->tgts[i]; - - if (!tgt || !tgt->ltd_exp) { - CWARN("%s: NULL export for %d\n", obd->obd_name, i); - continue; - } + lmv_foreach_connected_tgt(lmv, tgt) { if (!tgt->ltd_active) continue; rc = md_init_ea_size(tgt->ltd_exp, easize, def_easize); if (rc) { CERROR("%s: obd_init_ea_size() failed on MDT target %d: rc = %d\n", - obd->obd_name, i, rc); + obd->obd_name, tgt->ltd_index, rc); break; } } @@ -364,15 +354,12 @@ static int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt) return 0; } -static void lmv_del_target(struct lmv_obd *lmv, int index) +static void lmv_del_target(struct lmv_obd *lmv, struct lu_tgt_desc *tgt) { - if (!lmv->tgts[index]) - return; - - lqos_del_tgt(&lmv->lmv_qos, lmv->tgts[index]); - - kfree(lmv->tgts[index]); - 
lmv->tgts[index] = NULL; + LASSERT(tgt); + lqos_del_tgt(&lmv->lmv_qos, tgt); + lu_tgt_descs_del(&lmv->lmv_mdt_descs, tgt); + kfree(tgt); } static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, @@ -381,6 +368,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, struct lmv_obd *lmv = &obd->u.lmv; struct obd_device *mdc_obd; struct lmv_tgt_desc *tgt; + struct lu_tgt_descs *ltd = &lmv->lmv_mdt_descs; int orig_tgt_count = 0; int rc = 0; @@ -394,78 +382,36 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, return -EINVAL; } - mutex_lock(&lmv->lmv_init_mutex); - - if ((index < lmv->tgts_size) && lmv->tgts[index]) { - tgt = lmv->tgts[index]; - CERROR("%s: UUID %s already assigned at LMV target index %d: rc = %d\n", - obd->obd_name, - obd_uuid2str(&tgt->ltd_uuid), index, -EEXIST); - mutex_unlock(&lmv->lmv_init_mutex); - return -EEXIST; - } - - if (index >= lmv->tgts_size) { - /* We need to reallocate the lmv target array. */ - struct lmv_tgt_desc **newtgts, **old = NULL; - u32 newsize = 1; - u32 oldsize = 0; - - while (newsize < index + 1) - newsize <<= 1; - newtgts = kcalloc(newsize, sizeof(*newtgts), GFP_NOFS); - if (!newtgts) { - mutex_unlock(&lmv->lmv_init_mutex); - return -ENOMEM; - } - - if (lmv->tgts_size) { - memcpy(newtgts, lmv->tgts, - sizeof(*newtgts) * lmv->tgts_size); - old = lmv->tgts; - oldsize = lmv->tgts_size; - } - - lmv->tgts = newtgts; - lmv->tgts_size = newsize; - smp_rmb(); - kfree(old); - - CDEBUG(D_CONFIG, "tgts: %p size: %d\n", lmv->tgts, - lmv->tgts_size); - } tgt = kzalloc(sizeof(*tgt), GFP_NOFS); - if (!tgt) { - mutex_unlock(&lmv->lmv_init_mutex); + if (!tgt) return -ENOMEM; - } mutex_init(&tgt->ltd_fid_mutex); tgt->ltd_index = index; tgt->ltd_uuid = *uuidp; tgt->ltd_active = 0; - lmv->tgts[index] = tgt; - if (index >= lmv->desc.ld_tgt_count) { + + mutex_lock(&ltd->ltd_mutex); + rc = lu_tgt_descs_add(ltd, tgt); + if (!rc && index >= lmv->desc.ld_tgt_count) { orig_tgt_count =
lmv->desc.ld_tgt_count; lmv->desc.ld_tgt_count = index + 1; } + mutex_unlock(&ltd->ltd_mutex); - if (!lmv->connected) { + if (rc) + goto out_tgt; + + if (!lmv->connected) /* lmv_check_connect() will connect this target. */ - mutex_unlock(&lmv->lmv_init_mutex); return rc; - } - /* Otherwise let's connect it ourselves */ - mutex_unlock(&lmv->lmv_init_mutex); rc = lmv_connect_mdc(obd, tgt); if (rc) { - spin_lock(&lmv->lmv_lock); - if (lmv->desc.ld_tgt_count == index + 1) - lmv->desc.ld_tgt_count = orig_tgt_count; + mutex_lock(&ltd->ltd_mutex); + lmv->desc.ld_tgt_count = orig_tgt_count; memset(tgt, 0, sizeof(*tgt)); - spin_unlock(&lmv->lmv_lock); + mutex_unlock(&ltd->ltd_mutex); } else { int easize = sizeof(struct lmv_stripe_md) + lmv->desc.ld_tgt_count * sizeof(struct lu_fid); @@ -473,47 +419,46 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, } return rc; + +out_tgt: + kfree(tgt); + return rc; } static int lmv_check_connect(struct obd_device *obd) { struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - u32 i; - int rc; int easize; + int rc; if (lmv->connected) return 0; - mutex_lock(&lmv->lmv_init_mutex); + mutex_lock(&lmv->lmv_mdt_descs.ltd_mutex); if (lmv->connected) { - mutex_unlock(&lmv->lmv_init_mutex); - return 0; + rc = 0; + goto unlock; } if (lmv->desc.ld_tgt_count == 0) { - mutex_unlock(&lmv->lmv_init_mutex); - CERROR("%s: no targets configured.\n", obd->obd_name); - return -EINVAL; + CERROR("%s: no targets configured: rc = -EINVAL\n", + obd->obd_name); + rc = -EINVAL; + goto unlock; } - LASSERT(lmv->tgts); - - if (!lmv->tgts[0]) { - mutex_unlock(&lmv->lmv_init_mutex); - CERROR("%s: no target configured for index 0.\n", + if (!lmv_mdt0_inited(lmv)) { + CERROR("%s: no target configured for index 0: rc = -EINVAL.\n", obd->obd_name); - return -EINVAL; + rc = -EINVAL; + goto unlock; } CDEBUG(D_CONFIG, "Time to connect %s to %s\n", obd->obd_uuid.uuid, obd->obd_name); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt =
lmv->tgts[i]; - if (!tgt) - continue; + lmv_foreach_tgt(lmv, tgt) { rc = lmv_connect_mdc(obd, tgt); if (rc) goto out_disc; @@ -522,29 +467,22 @@ static int lmv_check_connect(struct obd_device *obd) lmv->connected = 1; easize = lmv_mds_md_size(lmv->desc.ld_tgt_count, LMV_MAGIC); lmv_init_ea_size(obd->obd_self_export, easize, 0); - mutex_unlock(&lmv->lmv_init_mutex); - return 0; +unlock: + mutex_unlock(&lmv->lmv_mdt_descs.ltd_mutex); -out_disc: - while (i-- > 0) { - int rc2; + return rc; - tgt = lmv->tgts[i]; - if (!tgt) - continue; +out_disc: + lmv_foreach_tgt(lmv, tgt) { tgt->ltd_active = 0; - if (tgt->ltd_exp) { - --lmv->desc.ld_active_tgt_count; - rc2 = obd_disconnect(tgt->ltd_exp); - if (rc2) { - CERROR("LMV target %s disconnect on MDC idx %d: error %d\n", - tgt->ltd_uuid.uuid, i, rc2); - } - } + if (!tgt->ltd_exp) + continue; + + --lmv->desc.ld_active_tgt_count; + obd_disconnect(tgt->ltd_exp); } - mutex_unlock(&lmv->lmv_init_mutex); - return rc; + goto unlock; } static int lmv_disconnect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt) @@ -591,27 +529,15 @@ static int lmv_disconnect(struct obd_export *exp) { struct obd_device *obd = class_exp2obd(exp); struct lmv_obd *lmv = &obd->u.lmv; + struct lmv_tgt_desc *tgt; int rc; - u32 i; - if (!lmv->tgts) - goto out_local; - - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - if (!lmv->tgts[i] || !lmv->tgts[i]->ltd_exp) - continue; - - lmv_disconnect_mdc(obd, lmv->tgts[i]); - } + lmv_foreach_connected_tgt(lmv, tgt) + lmv_disconnect_mdc(obd, tgt); if (lmv->lmv_tgts_kobj) kobject_put(lmv->lmv_tgts_kobj); -out_local: - /* - * This is the case when no real connection is established by - * lmv_check_connect(). 
- */ if (!lmv->connected) class_export_put(exp); rc = class_disconnect(exp); @@ -631,7 +557,7 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg, int remote_gf_size = 0; int rc; - tgt = lmv_find_target(lmv, &gf->gf_fid); + tgt = lmv_fid2tgt(lmv, &gf->gf_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -696,7 +622,7 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg, goto out_fid2path; } - tgt = lmv_find_target(lmv, &gf->gf_fid); + tgt = lmv_fid2tgt(lmv, &gf->gf_fid); if (IS_ERR(tgt)) { rc = -EINVAL; goto out_fid2path; @@ -719,12 +645,13 @@ static int lmv_hsm_req_count(struct lmv_obd *lmv, const struct hsm_user_request *hur, const struct lmv_tgt_desc *tgt_mds) { - u32 i, nr = 0; struct lmv_tgt_desc *curr_tgt; + u32 i; + int nr = 0; /* count how many requests must be sent to the given target */ for (i = 0; i < hur->hur_request.hr_itemcount; i++) { - curr_tgt = lmv_find_target(lmv, &hur->hur_user_item[i].hui_fid); + curr_tgt = lmv_fid2tgt(lmv, &hur->hur_user_item[i].hui_fid); if (IS_ERR(curr_tgt)) return PTR_ERR(curr_tgt); if (obd_uuid_equals(&curr_tgt->ltd_uuid, &tgt_mds->ltd_uuid)) @@ -736,17 +663,16 @@ static int lmv_hsm_req_count(struct lmv_obd *lmv, static int lmv_hsm_req_build(struct lmv_obd *lmv, struct hsm_user_request *hur_in, const struct lmv_tgt_desc *tgt_mds, - struct hsm_user_request *hur_out) + struct hsm_user_request *hur_out) { - int i, nr_out; + u32 i, nr_out; struct lmv_tgt_desc *curr_tgt; /* build the hsm_user_request for the given target */ hur_out->hur_request = hur_in->hur_request; nr_out = 0; for (i = 0; i < hur_in->hur_request.hr_itemcount; i++) { - curr_tgt = lmv_find_target(lmv, - &hur_in->hur_user_item[i].hui_fid); + curr_tgt = lmv_fid2tgt(lmv, &hur_in->hur_user_item[i].hui_fid); if (IS_ERR(curr_tgt)) return PTR_ERR(curr_tgt); if (obd_uuid_equals(&curr_tgt->ltd_uuid, &tgt_mds->ltd_uuid)) { @@ -767,20 +693,14 @@ static int lmv_hsm_ct_unregister(struct obd_device *obd, unsigned int cmd, void __user *uarg) 
{ struct lmv_obd *lmv = &obd->u.lmv; - u32 i; + struct lu_tgt_desc *tgt; /* unregister request (call from llapi_hsm_copytool_fini) */ - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - struct lmv_tgt_desc *tgt = lmv->tgts[i]; - - if (!tgt || !tgt->ltd_exp) - continue; - + lmv_foreach_connected_tgt(lmv, tgt) /* best effort: try to clean as much as possible * (continue on error) */ - obd_iocontrol(cmd, lmv->tgts[i]->ltd_exp, len, lk, uarg); - } + obd_iocontrol(cmd, tgt->ltd_exp, len, lk, uarg); /* Whatever the result, remove copytool from kuc groups. * Unreached coordinators will get EPIPE on next requests @@ -795,11 +715,12 @@ static int lmv_hsm_ct_register(struct obd_device *obd, unsigned int cmd, { struct lmv_obd *lmv = &obd->u.lmv; struct file *filp; - u32 i, j; - int err; bool any_set = false; struct kkuc_ct_data *kcd; size_t kcd_size; + struct lu_tgt_desc *tgt; + u32 i; + int err; int rc = 0; filp = fget(lk->lk_wfd); @@ -838,26 +759,22 @@ static int lmv_hsm_ct_register(struct obd_device *obd, unsigned int cmd, * In case of failure, unregister from previous MDS, * except if it because of inactive target. 
*/ - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - struct lmv_tgt_desc *tgt = lmv->tgts[i]; - - if (!tgt || !tgt->ltd_exp) - continue; - + lmv_foreach_connected_tgt(lmv, tgt) { err = obd_iocontrol(cmd, tgt->ltd_exp, len, lk, uarg); if (err) { if (tgt->ltd_active) { /* permanent error */ CERROR("error: iocontrol MDC %s on MDTidx %d cmd %x: err = %d\n", - tgt->ltd_uuid.uuid, i, cmd, err); + tgt->ltd_uuid.uuid, tgt->ltd_index, cmd, + err); rc = err; lk->lk_flags |= LK_FLG_STOP; + i = tgt->ltd_index; /* unregister from previous MDS */ - for (j = 0; j < i; j++) { - tgt = lmv->tgts[j]; + lmv_foreach_connected_tgt(lmv, tgt) { + if (tgt->ltd_index >= i) + break; - if (!tgt || !tgt->ltd_exp) - continue; obd_iocontrol(cmd, tgt->ltd_exp, len, lk, uarg); } @@ -891,11 +808,10 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, { struct obd_device *obddev = class_exp2obd(exp); struct lmv_obd *lmv = &obddev->u.lmv; - struct lmv_tgt_desc *tgt = NULL; - u32 i = 0; - int rc = 0; + struct lu_tgt_desc *tgt = NULL; int set = 0; u32 count = lmv->desc.ld_tgt_count; + int rc = 0; if (count == 0) return -ENOTTY; @@ -911,7 +827,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, if (index >= count) return -ENODEV; - tgt = lmv->tgts[index]; + tgt = lmv_tgt(lmv, index); if (!tgt || !tgt->ltd_active) return -ENODATA; @@ -944,14 +860,11 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, if (count <= qctl->qc_idx) return -EINVAL; - tgt = lmv->tgts[qctl->qc_idx]; + tgt = lmv_tgt(lmv, qctl->qc_idx); if (!tgt || !tgt->ltd_exp) return -EINVAL; } else if (qctl->qc_valid == QC_UUID) { - for (i = 0; i < count; i++) { - tgt = lmv->tgts[i]; - if (!tgt) - continue; + lmv_foreach_tgt(lmv, tgt) { if (!obd_uuid_equals(&tgt->ltd_uuid, &qctl->obd_uuid)) continue; @@ -965,11 +878,11 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, return -EINVAL; } - if (i >= count) + if (tgt->ltd_index >= count) return -EAGAIN; LASSERT(tgt && 
tgt->ltd_exp); - oqctl = kzalloc(sizeof(*oqctl), GFP_NOFS); + oqctl = kzalloc(sizeof(*oqctl), GFP_KERNEL); if (!oqctl) return -ENOMEM; @@ -984,11 +897,11 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, break; } case LL_IOC_GET_CONNECT_FLAGS: { - tgt = lmv->tgts[0]; + tgt = lmv_tgt(lmv, 0); + rc = -ENODATA; - if (!tgt || !tgt->ltd_exp) - return -ENODATA; - rc = obd_iocontrol(cmd, tgt->ltd_exp, len, karg, uarg); + if (tgt && tgt->ltd_exp) + rc = obd_iocontrol(cmd, tgt->ltd_exp, len, karg, uarg); break; } case LL_IOC_FID2MDTIDX: { @@ -1015,7 +928,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, case LL_IOC_HSM_ACTION: { struct md_op_data *op_data = karg; - tgt = lmv_find_target(lmv, &op_data->op_fid1); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1028,7 +941,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, case LL_IOC_HSM_PROGRESS: { const struct hsm_progress_kernel *hpk = karg; - tgt = lmv_find_target(lmv, &hpk->hpk_fid); + tgt = lmv_fid2tgt(lmv, &hpk->hpk_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); rc = obd_iocontrol(cmd, tgt->ltd_exp, len, karg, uarg); @@ -1046,22 +959,17 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, * the request. 
*/ if (reqcount == 1 || count == 1) { - tgt = lmv_find_target(lmv, - &hur->hur_user_item[0].hui_fid); + tgt = lmv_fid2tgt(lmv, &hur->hur_user_item[0].hui_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); rc = obd_iocontrol(cmd, tgt->ltd_exp, len, karg, uarg); } else { /* split fid list to their respective MDS */ - for (i = 0; i < count; i++) { + lmv_foreach_connected_tgt(lmv, tgt) { struct hsm_user_request *req; size_t reqlen; int nr, rc1; - tgt = lmv->tgts[i]; - if (!tgt || !tgt->ltd_exp) - continue; - nr = lmv_hsm_req_count(lmv, hur, tgt); if (nr < 0) return nr; @@ -1094,11 +1002,11 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, struct md_op_data *op_data = karg; struct lmv_tgt_desc *tgt1, *tgt2; - tgt1 = lmv_find_target(lmv, &op_data->op_fid1); + tgt1 = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt1)) return PTR_ERR(tgt1); - tgt2 = lmv_find_target(lmv, &op_data->op_fid2); + tgt2 = lmv_fid2tgt(lmv, &op_data->op_fid2); if (IS_ERR(tgt2)) return PTR_ERR(tgt2); @@ -1122,13 +1030,10 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, break; } default: - for (i = 0; i < count; i++) { + lmv_foreach_connected_tgt(lmv, tgt) { struct obd_device *mdc_obd; int err; - tgt = lmv->tgts[i]; - if (!tgt || !tgt->ltd_exp) - continue; /* ll_umount_begin() sets force flag but for lmv, not * mdc. Let's pass it through */ @@ -1139,7 +1044,8 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, if (tgt->ltd_active) { CERROR("%s: error: iocontrol MDC %s on MDTidx %d cmd %x: err = %d\n", lmv2obd_dev(lmv)->obd_name, - tgt->ltd_uuid.uuid, i, cmd, err); + tgt->ltd_uuid.uuid, + tgt->ltd_index, cmd, err); if (!rc) rc = err; } @@ -1207,9 +1113,9 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds) struct lmv_tgt_desc *tgt; int rc; - tgt = lmv_get_target(lmv, mds, NULL); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); + tgt = lmv_tgt(lmv, mds); + if (!tgt) + return -ENODEV; /* * New seq alloc and FLD setup should be atomic. 
Otherwise we may find @@ -1276,11 +1182,6 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) return -EINVAL; } - lmv->tgts_size = 32U; - lmv->tgts = kcalloc(lmv->tgts_size, sizeof(*lmv->tgts), GFP_NOFS); - if (!lmv->tgts) - return -ENOMEM; - obd_str2uuid(&lmv->desc.ld_uuid, desc->ld_uuid.uuid); lmv->desc.ld_tgt_count = 0; lmv->desc.ld_active_tgt_count = 0; @@ -1289,7 +1190,6 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) lmv->max_easize = 0; spin_lock_init(&lmv->lmv_lock); - mutex_init(&lmv->lmv_init_mutex); /* Set up allocation policy (QoS and RR) */ INIT_LIST_HEAD(&lmv->lmv_qos.lq_svr_list); @@ -1321,30 +1221,28 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) rc = fld_client_init(&lmv->lmv_fld, obd->obd_name, LUSTRE_CLI_FLD_HASH_DHT); - if (rc) { + if (rc) CERROR("Can't init FLD, err %d\n", rc); - return rc; - } - return 0; + rc = lu_tgt_descs_init(&lmv->lmv_mdt_descs); + if (rc) + CWARN("%s: error initialize target table: rc = %d\n", + obd->obd_name, rc); + + return rc; } static int lmv_cleanup(struct obd_device *obd) { struct lmv_obd *lmv = &obd->u.lmv; + struct lu_tgt_desc *tgt; + struct lu_tgt_desc *tmp; fld_client_fini(&lmv->lmv_fld); - if (lmv->tgts) { - int i; + lmv_foreach_tgt_safe(lmv, tgt, tmp) + lmv_del_target(lmv, tgt); + lu_tgt_descs_fini(&lmv->lmv_mdt_descs); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - if (!lmv->tgts[i]) - continue; - lmv_del_target(lmv, i); - } - kfree(lmv->tgts); - lmv->tgts_size = 0; - } return 0; } @@ -1423,8 +1321,10 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, struct obd_device *obd = class_exp2obd(exp); struct lmv_obd *lmv = &obd->u.lmv; struct obd_statfs *temp; + struct lu_tgt_desc *tgt; + u32 i; + u32 idx; int rc = 0; - u32 i, idx; temp = kzalloc(sizeof(*temp), GFP_NOFS); if (!temp) @@ -1435,15 +1335,14 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, for (i = 0; i < lmv->desc.ld_tgt_count; 
i++, idx++) { idx = idx % lmv->desc.ld_tgt_count; - if (!lmv->tgts[idx] || !lmv->tgts[idx]->ltd_exp) + tgt = lmv_tgt(lmv, idx); + if (!tgt || !tgt->ltd_exp) continue; - rc = obd_statfs(env, lmv->tgts[idx]->ltd_exp, temp, - max_age, flags); + rc = obd_statfs(env, tgt->ltd_exp, temp, max_age, flags); if (rc) { CERROR("%s: can't stat MDS #%d: rc = %d\n", - lmv->tgts[idx]->ltd_exp->exp_obd->obd_name, i, - rc); + tgt->ltd_exp->exp_obd->obd_name, i, rc); goto out_free_temp; } @@ -1524,8 +1423,12 @@ static int lmv_get_root(struct obd_export *exp, const char *fileset, { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; + struct lu_tgt_desc *tgt = lmv_tgt(lmv, 0); + + if (!tgt) + return -ENODEV; - return md_get_root(lmv->tgts[0]->ltd_exp, fileset, fid); + return md_get_root(tgt->ltd_exp, fileset, fid); } static int lmv_getxattr(struct obd_export *exp, const struct lu_fid *fid, @@ -1536,7 +1439,7 @@ static int lmv_getxattr(struct obd_export *exp, const struct lu_fid *fid, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, fid); + tgt = lmv_fid2tgt(lmv, fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1554,7 +1457,7 @@ static int lmv_setxattr(struct obd_export *exp, const struct lu_fid *fid, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, fid); + tgt = lmv_fid2tgt(lmv, fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1569,7 +1472,7 @@ static int lmv_getattr(struct obd_export *exp, struct md_op_data *op_data, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, &op_data->op_fid1); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1585,7 +1488,7 @@ static int lmv_null_inode(struct obd_export *exp, const struct lu_fid *fid) { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - u32 i; + struct lu_tgt_desc *tgt; CDEBUG(D_INODE, "CBDATA for " DFID "\n", PFID(fid)); @@ -1594,11 +1497,8 
@@ static int lmv_null_inode(struct obd_export *exp, const struct lu_fid *fid) * lookup lock in space of MDT storing direntry and update/open lock in * space of MDT storing inode. */ - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - if (!lmv->tgts[i] || !lmv->tgts[i]->ltd_exp) - continue; - md_null_inode(lmv->tgts[i]->ltd_exp, fid); - } + lmv_foreach_connected_tgt(lmv, tgt) + md_null_inode(tgt->ltd_exp, fid); return 0; } @@ -1610,7 +1510,7 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, &op_data->op_fid1); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1627,7 +1527,7 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, struct lmv_tgt_desc *tgt; if (!lmv_dir_striped(lsm) || !namelen) { - tgt = lmv_find_target(lmv, fid); + tgt = lmv_fid2tgt(lmv, fid); if (IS_ERR(tgt)) return tgt; @@ -1648,11 +1548,11 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, *fid = oinfo->lmo_fid; *mds = oinfo->lmo_mds; - tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL); + tgt = lmv_tgt(lmv, oinfo->lmo_mds); CDEBUG(D_INODE, "locate MDT %u parent " DFID "\n", *mds, PFID(fid)); - return tgt; + return tgt ? 
tgt : ERR_PTR(-ENODEV); } /** @@ -1690,9 +1590,9 @@ struct lmv_tgt_desc * */ if (op_data->op_bias & MDS_CREATE_VOLATILE && (int)op_data->op_mds != -1) { - tgt = lmv_get_target(lmv, op_data->op_mds, NULL); - if (IS_ERR(tgt)) - return tgt; + tgt = lmv_tgt(lmv, op_data->op_mds); + if (!tgt) + return ERR_PTR(-ENODEV); if (lmv_dir_striped(lsm)) { int i; @@ -1715,7 +1615,10 @@ struct lmv_tgt_desc * op_data->op_fid1 = oinfo->lmo_fid; op_data->op_mds = oinfo->lmo_mds; - tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL); + + tgt = lmv_tgt(lmv, oinfo->lmo_mds); + if (!tgt) + tgt = ERR_PTR(-ENODEV); } else if (op_data->op_code == LUSTRE_OPC_MKDIR && lmv_dir_qos_mkdir(op_data->op_default_mea1) && !lmv_dir_striped(lsm)) { @@ -1847,7 +1750,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, * Send the create request to the MDT where the object * will be located */ - tgt = lmv_find_target(lmv, &op_data->op_fid2); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid2); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1881,7 +1784,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, CDEBUG(D_INODE, "ENQUEUE on " DFID "\n", PFID(&op_data->op_fid1)); - tgt = lmv_find_target(lmv, &op_data->op_fid1); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -1958,7 +1861,7 @@ static int lmv_early_cancel(struct obd_export *exp, struct lmv_tgt_desc *tgt, return 0; if (!tgt) { - tgt = lmv_find_target(lmv, fid); + tgt = lmv_fid2tgt(lmv, fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); } @@ -2041,7 +1944,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid()); op_data->op_cap = current_cap(); - parent_tgt = lmv_find_target(lmv, &op_data->op_fid1); + parent_tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(parent_tgt)) return PTR_ERR(parent_tgt); @@ -2068,10 +1971,9 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, /* save it in fid4 
temporarily for early cancel */ op_data->op_fid4 = lsm->lsm_md_oinfo[rc].lmo_fid; - sp_tgt = lmv_get_target(lmv, lsm->lsm_md_oinfo[rc].lmo_mds, - NULL); - if (IS_ERR(sp_tgt)) - return PTR_ERR(sp_tgt); + sp_tgt = lmv_tgt(lmv, lsm->lsm_md_oinfo[rc].lmo_mds); + if (!sp_tgt) + return -ENODEV; /* * if parent is being migrated too, fill op_fid2 with target @@ -2088,17 +1990,15 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, return rc; op_data->op_fid2 = lsm->lsm_md_oinfo[rc].lmo_fid; - tp_tgt = lmv_get_target(lmv, - lsm->lsm_md_oinfo[rc].lmo_mds, - NULL); - if (IS_ERR(tp_tgt)) - return PTR_ERR(tp_tgt); + tp_tgt = lmv_tgt(lmv, lsm->lsm_md_oinfo[rc].lmo_mds); + if (!tp_tgt) + return -ENODEV; } } else { sp_tgt = parent_tgt; } - child_tgt = lmv_find_target(lmv, &op_data->op_fid3); + child_tgt = lmv_fid2tgt(lmv, &op_data->op_fid3); if (IS_ERR(child_tgt)) return PTR_ERR(child_tgt); @@ -2121,7 +2021,7 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, */ if (S_ISDIR(op_data->op_mode) && (exp_connect_flags2(exp) & OBD_CONNECT2_DIR_MIGRATE)) { - tgt = lmv_find_target(lmv, &target_fid); + tgt = lmv_fid2tgt(lmv, &target_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); } else { @@ -2219,7 +2119,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, * then it will send the request to the target parent */ if (fid_is_sane(&op_data->op_fid4)) { - tgt = lmv_find_target(lmv, &op_data->op_fid4); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid4); if (IS_ERR(tgt)) return PTR_ERR(tgt); } else { @@ -2247,7 +2147,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, } if (fid_is_sane(&op_data->op_fid3)) { - src_tgt = lmv_find_target(lmv, &op_data->op_fid3); + src_tgt = lmv_fid2tgt(lmv, &op_data->op_fid3); if (IS_ERR(src_tgt)) return PTR_ERR(src_tgt); @@ -2313,7 +2213,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data, ptlrpc_req_finished(*request); *request = NULL; - tgt = 
lmv_find_target(lmv, &op_data->op_fid4); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid4); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -2344,7 +2244,7 @@ static int lmv_setattr(struct obd_export *exp, struct md_op_data *op_data, op_data->op_xvalid); op_data->op_flags |= MF_MDC_CANCEL_FID1; - tgt = lmv_find_target(lmv, &op_data->op_fid1); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -2358,7 +2258,7 @@ static int lmv_fsync(struct obd_export *exp, const struct lu_fid *fid, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, fid); + tgt = lmv_fid2tgt(lmv, fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -2465,9 +2365,9 @@ static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt, break; } - tgt = lmv_get_target(ctxt->ldc_lmv, oinfo->lmo_mds, NULL); - if (IS_ERR(tgt)) { - rc = PTR_ERR(tgt); + tgt = lmv_tgt(ctxt->ldc_lmv, oinfo->lmo_mds); + if (!tgt) { + rc = -ENODEV; break; } @@ -2516,7 +2416,7 @@ static int lmv_file_resync(struct obd_export *exp, struct md_op_data *data) if (rc != 0) return rc; - tgt = lmv_find_target(lmv, &data->op_fid1); + tgt = lmv_fid2tgt(lmv, &data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -2741,7 +2641,7 @@ static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data, offset, ppage); } - tgt = lmv_find_target(lmv, &op_data->op_fid1); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -2792,7 +2692,7 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data, return PTR_ERR(parent_tgt); if (likely(!fid_is_zero(&op_data->op_fid2))) { - tgt = lmv_find_target(lmv, &op_data->op_fid2); + tgt = lmv_fid2tgt(lmv, &op_data->op_fid2); if (IS_ERR(tgt)) return PTR_ERR(tgt); } else { @@ -2845,7 +2745,7 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data, ptlrpc_req_finished(*request); *request = NULL; - tgt = lmv_find_target(lmv, &op_data->op_fid2); + tgt = lmv_fid2tgt(lmv, 
&op_data->op_fid2); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -2881,6 +2781,7 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp, { struct obd_device *obd; struct lmv_obd *lmv; + struct lu_tgt_desc *tgt; int rc = 0; obd = class_exp2obd(exp); @@ -2892,18 +2793,8 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp, lmv = &obd->u.lmv; if (keylen >= strlen("remote_flag") && !strcmp(key, "remote_flag")) { - int i; - LASSERT(*vallen == sizeof(u32)); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - struct lmv_tgt_desc *tgt = lmv->tgts[i]; - - /* - * All tgts should be connected when this gets called. - */ - if (!tgt || !tgt->ltd_exp) - continue; - + lmv_foreach_connected_tgt(lmv, tgt) { if (!obd_get_info(env, tgt->ltd_exp, keylen, key, vallen, val)) return 0; @@ -2916,8 +2807,11 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp, * Forwarding this request to first MDS, it should know LOV * desc. */ - rc = obd_get_info(env, lmv->tgts[0]->ltd_exp, keylen, key, - vallen, val); + tgt = lmv_tgt(lmv, 0); + if (!tgt) + return -ENODEV; + + rc = obd_get_info(env, tgt->ltd_exp, keylen, key, vallen, val); if (!rc && KEY_IS(KEY_CONN_DATA)) exp->exp_connect_data = *(struct obd_connect_data *)val; return rc; @@ -2937,6 +2831,7 @@ static int lmv_rmfid(struct obd_export *exp, struct fid_array *fa, struct ptlrpc_request_set *set = _set; struct lmv_obd *lmv = &obddev->u.lmv; int tgt_count = lmv->desc.ld_tgt_count; + struct lu_tgt_desc *tgt; struct fid_array *fat, **fas = NULL; int i, rc, **rcs = NULL; @@ -2987,11 +2882,11 @@ static int lmv_rmfid(struct obd_export *exp, struct fid_array *fa, fat->fa_fids[fat->fa_nr++] = fa->fa_fids[i]; } - for (i = 0; i < tgt_count; i++) { - fat = fas[i]; + lmv_foreach_connected_tgt(lmv, tgt) { + fat = fas[tgt->ltd_index]; if (!fat || fat->fa_nr == 0) continue; - rc = md_rmfid(lmv->tgts[i]->ltd_exp, fat, rcs[i], set); + rc = md_rmfid(tgt->ltd_exp, fat, rcs[tgt->ltd_index], set); } rc 
= ptlrpc_set_wait(NULL, set); @@ -3062,14 +2957,9 @@ static int lmv_set_info_async(const struct lu_env *env, struct obd_export *exp, if (KEY_IS(KEY_READ_ONLY) || KEY_IS(KEY_FLUSH_CTX) || KEY_IS(KEY_DEFAULT_EASIZE)) { - int i, err = 0; - - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv->tgts[i]; - - if (!tgt || !tgt->ltd_exp) - continue; + int err = 0; + lmv_foreach_connected_tgt(lmv, tgt) { err = obd_set_info_async(env, tgt->ltd_exp, keylen, key, vallen, val, set); if (err && rc == 0) @@ -3272,16 +3162,14 @@ static int lmv_cancel_unused(struct obd_export *exp, const struct lu_fid *fid, { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - int rc = 0; + struct lu_tgt_desc *tgt; int err; - u32 i; + int rc = 0; LASSERT(fid); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - struct lmv_tgt_desc *tgt = lmv->tgts[i]; - - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + lmv_foreach_connected_tgt(lmv, tgt) { + if (!tgt->ltd_active) continue; err = md_cancel_unused(tgt->ltd_exp, fid, policy, mode, flags, @@ -3297,7 +3185,7 @@ static int lmv_set_lock_data(struct obd_export *exp, void *data, u64 *bits) { struct lmv_obd *lmv = &exp->exp_obd->u.lmv; - struct lmv_tgt_desc *tgt = lmv->tgts[0]; + struct lmv_tgt_desc *tgt = lmv_tgt(lmv, 0); if (!tgt || !tgt->ltd_exp) return -EINVAL; @@ -3315,8 +3203,9 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, u64 flags, struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; enum ldlm_mode rc; - int tgt; - u32 i; + struct lu_tgt_desc *tgt; + int i; + int index; CDEBUG(D_INODE, "Lock match for " DFID "\n", PFID(fid)); @@ -3326,21 +3215,21 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, u64 flags, * space of MDT storing inode. Try the MDT that the FID maps to first, * since this can be easily found, and only try others if that fails. 
*/ - for (i = 0, tgt = lmv_find_target_index(lmv, fid); + for (i = 0, index = lmv_fid2tgt_index(lmv, fid); i < lmv->desc.ld_tgt_count; - i++, tgt = (tgt + 1) % lmv->desc.ld_tgt_count) { - if (tgt < 0) { + i++, index = (index + 1) % lmv->desc.ld_tgt_count) { + if (index < 0) { CDEBUG(D_HA, "%s: " DFID " is inaccessible: rc = %d\n", - obd->obd_name, PFID(fid), tgt); - tgt = 0; + obd->obd_name, PFID(fid), index); + index = 0; } - if (!lmv->tgts[tgt] || !lmv->tgts[tgt]->ltd_exp || - !lmv->tgts[tgt]->ltd_active) + tgt = lmv_tgt(lmv, index); + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) continue; - rc = md_lock_match(lmv->tgts[tgt]->ltd_exp, flags, fid, - type, policy, mode, lockh); + rc = md_lock_match(tgt->ltd_exp, flags, fid, type, policy, mode, + lockh); if (rc) return rc; } @@ -3355,7 +3244,7 @@ static int lmv_get_lustre_md(struct obd_export *exp, struct lustre_md *md) { struct lmv_obd *lmv = &exp->exp_obd->u.lmv; - struct lmv_tgt_desc *tgt = lmv->tgts[0]; + struct lmv_tgt_desc *tgt = lmv_tgt(lmv, 0); if (!tgt || !tgt->ltd_exp) return -EINVAL; @@ -3366,7 +3255,7 @@ static int lmv_free_lustre_md(struct obd_export *exp, struct lustre_md *md) { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - struct lmv_tgt_desc *tgt = lmv->tgts[0]; + struct lmv_tgt_desc *tgt = lmv_tgt(lmv, 0); if (md->default_lmv) { lmv_free_memmd(md->default_lmv); @@ -3389,7 +3278,7 @@ static int lmv_set_open_replay_data(struct obd_export *exp, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, &och->och_fid); + tgt = lmv_fid2tgt(lmv, &och->och_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -3403,7 +3292,7 @@ static int lmv_clear_open_replay_data(struct obd_export *exp, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, &och->och_fid); + tgt = lmv_fid2tgt(lmv, &och->och_fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -3426,7 +3315,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp, 
if (IS_ERR(ptgt)) return PTR_ERR(ptgt); - ctgt = lmv_find_target(lmv, &op_data->op_fid2); + ctgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(ctgt)) return PTR_ERR(ctgt); @@ -3447,7 +3336,7 @@ static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *tgt; - tgt = lmv_find_target(lmv, fid); + tgt = lmv_fid2tgt(lmv, fid); if (IS_ERR(tgt)) return PTR_ERR(tgt); @@ -3482,13 +3371,11 @@ static int lmv_quotactl(struct obd_device *unused, struct obd_export *exp, { struct obd_device *obd = class_exp2obd(exp); struct lmv_obd *lmv = &obd->u.lmv; - struct lmv_tgt_desc *tgt = lmv->tgts[0]; + struct lmv_tgt_desc *tgt = lmv_tgt(lmv, 0); int rc = 0; u64 curspace = 0, curinodes = 0; - u32 i; - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active || - !lmv->desc.ld_tgt_count) { + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) { CERROR("master lmv inactive\n"); return -EIO; } @@ -3496,17 +3383,16 @@ static int lmv_quotactl(struct obd_device *unused, struct obd_export *exp, if (oqctl->qc_cmd != Q_GETOQUOTA) return obd_quotactl(tgt->ltd_exp, oqctl); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + lmv_foreach_connected_tgt(lmv, tgt) { int err; - tgt = lmv->tgts[i]; - - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + if (!tgt->ltd_active) continue; err = obd_quotactl(tgt->ltd_exp, oqctl); if (err) { - CERROR("getquota on mdt %d failed. %d\n", i, err); + CERROR("getquota on mdt %d failed. 
%d\n", + tgt->ltd_index, err); if (!rc) rc = err; } else { diff --git a/fs/lustre/lmv/lmv_qos.c b/fs/lustre/lmv/lmv_qos.c index 85053d2e..0bee7c0 100644 --- a/fs/lustre/lmv/lmv_qos.c +++ b/fs/lustre/lmv/lmv_qos.c @@ -77,7 +77,6 @@ static int lmv_qos_calc_ppts(struct lmv_obd *lmv) u64 ba_max, ba_min, ba; u64 ia_max, ia_min, ia; u32 num_active; - unsigned int i; int prio_wide; time64_t now, age; u32 maxage = lmv->desc.ld_qos_maxage; @@ -114,9 +113,8 @@ static int lmv_qos_calc_ppts(struct lmv_obd *lmv) now = ktime_get_real_seconds(); /* Calculate server penalty per object */ - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv->tgts[i]; - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + lmv_foreach_tgt(lmv, tgt) { + if (!tgt->ltd_exp || !tgt->ltd_active) continue; /* bavail >> 16 to avoid overflow */ @@ -164,9 +162,8 @@ static int lmv_qos_calc_ppts(struct lmv_obd *lmv) * we have to double the MDT penalty */ num_active = 2; - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv->tgts[i]; - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + lmv_foreach_tgt(lmv, tgt) { + if (!tgt->ltd_exp || !tgt->ltd_active) continue; tgt->ltd_qos.ltq_penalty_per_obj <<= 1; @@ -265,7 +262,6 @@ static int lmv_qos_used(struct lmv_obd *lmv, struct lu_tgt_desc *tgt, { struct lu_tgt_qos *ltq; struct lu_svr_qos *svr; - unsigned int i; ltq = &tgt->ltd_qos; LASSERT(ltq); @@ -301,9 +297,8 @@ static int lmv_qos_used(struct lmv_obd *lmv, struct lu_tgt_desc *tgt, *total_wt = 0; /* Decrease all MDT penalties */ - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - ltq = &lmv->tgts[i]->ltd_qos; - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + lmv_foreach_tgt(lmv, tgt) { + if (!tgt->ltd_exp || !tgt->ltd_active) continue; if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj) @@ -311,7 +306,7 @@ static int lmv_qos_used(struct lmv_obd *lmv, struct lu_tgt_desc *tgt, else ltq->ltq_penalty -= ltq->ltq_penalty_per_obj; - lmv_qos_calc_weight(lmv->tgts[i]); + lmv_qos_calc_weight(tgt); /* Recalc the total 
weight of usable osts */ if (ltq->ltq_usable) @@ -319,7 +314,7 @@ static int lmv_qos_used(struct lmv_obd *lmv, struct lu_tgt_desc *tgt, CDEBUG(D_OTHER, "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", - i, ltq->ltq_usable, + tgt->ltd_index, ltq->ltq_usable, tgt_statfs_bavail(tgt) >> 10, ltq->ltq_penalty_per_obj >> 10, ltq->ltq_penalty >> 10, @@ -337,7 +332,6 @@ struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) u64 total_weight = 0; u64 cur_weight = 0; u64 rand; - int i; int rc; if (!lmv_qos_is_usable(lmv)) @@ -356,11 +350,7 @@ struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) goto unlock; } - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv->tgts[i]; - if (!tgt) - continue; - + lmv_foreach_tgt(lmv, tgt) { tgt->ltd_qos.ltq_usable = 0; if (!tgt->ltd_exp || !tgt->ltd_active) continue; @@ -372,10 +362,8 @@ struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) rand = lu_prandom_u64_max(total_weight); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv->tgts[i]; - - if (!tgt || !tgt->ltd_qos.ltq_usable) + lmv_foreach_tgt(lmv, tgt) { + if (!tgt->ltd_qos.ltq_usable) continue; cur_weight += tgt->ltd_qos.ltq_weight; @@ -404,17 +392,18 @@ struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt) spin_lock(&lmv->lmv_qos.lq_rr.lqr_alloc); for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv->tgts[(i + lmv->lmv_qos_rr_index) % - lmv->desc.ld_tgt_count]; - if (tgt && tgt->ltd_exp && tgt->ltd_active) { - *mdt = tgt->ltd_index; - lmv->lmv_qos_rr_index = - (i + lmv->lmv_qos_rr_index + 1) % - lmv->desc.ld_tgt_count; - spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); - - return tgt; - } + tgt = lmv_tgt(lmv, + (i + lmv->lmv_qos_rr_index) % lmv->desc.ld_tgt_count); + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + continue; + + *mdt = tgt->ltd_index; + lmv->lmv_qos_rr_index = + (i + lmv->lmv_qos_rr_index + 1) % + lmv->desc.ld_tgt_count; + 
spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); + + return tgt; } spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c index 659ebeb..af670f8 100644 --- a/fs/lustre/lmv/lproc_lmv.c +++ b/fs/lustre/lmv/lproc_lmv.c @@ -183,14 +183,17 @@ static void *lmv_tgt_seq_start(struct seq_file *p, loff_t *pos) { struct obd_device *dev = p->private; struct lmv_obd *lmv = &dev->u.lmv; + struct lu_tgt_desc *tgt; + + while (*pos < lmv->lmv_mdt_descs.ltd_tgts_size) { + tgt = lmv_tgt(lmv, (u32)*pos); + if (tgt) + return tgt; - while (*pos < lmv->tgts_size) { - if (lmv->tgts[*pos]) - return lmv->tgts[*pos]; ++*pos; } - return NULL; + return NULL; } static void lmv_tgt_seq_stop(struct seq_file *p, void *v) @@ -201,11 +204,14 @@ static void *lmv_tgt_seq_next(struct seq_file *p, void *v, loff_t *pos) { struct obd_device *dev = p->private; struct lmv_obd *lmv = &dev->u.lmv; + struct lu_tgt_desc *tgt; ++*pos; - while (*pos < lmv->tgts_size) { - if (lmv->tgts[*pos]) - return lmv->tgts[*pos]; + while (*pos < lmv->lmv_mdt_descs.ltd_tgts_size) { + tgt = lmv_tgt(lmv, (u32)*pos); + if (tgt) + return tgt; + ++*pos; } diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile index 6d762ed..5718a6d 100644 --- a/fs/lustre/obdclass/Makefile +++ b/fs/lustre/obdclass/Makefile @@ -8,4 +8,4 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \ lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \ obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \ cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \ - jobid.o integrity.o obd_cksum.o lu_qos.o + jobid.o integrity.o obd_cksum.o lu_qos.o lu_tgt_descs.o diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c new file mode 100644 index 0000000..04d6acc --- /dev/null +++ b/fs/lustre/obdclass/lu_tgt_descs.c @@ -0,0 +1,192 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS 
FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * This file is part of Lustre, http://www.lustre.org/ + * + * lustre/obdclass/lu_tgt_descs.c + * + * Lustre target descriptions + * These are the only exported functions, they provide some generic + * infrastructure for target description management used by LOD/LMV + * + */ + +#define DEBUG_SUBSYSTEM S_CLASS + +#include +#include +#include +#include +#include +#include +#include + +/** + * Allocate and initialize target table. + * + * A helper function to initialize the target table and allocate + * a bitmap of the available targets. + * + * @ltd target's table to initialize + * + * Return: 0 on success + * negated errno on error + **/ +int lu_tgt_descs_init(struct lu_tgt_descs *ltd) +{ + mutex_init(&ltd->ltd_mutex); + init_rwsem(&ltd->ltd_rw_sem); + + /* + * the tgt array and bitmap are allocated/grown dynamically as tgts are + * added to the LOD/LMV, see lu_tgt_descs_add() + */ + ltd->ltd_tgt_bitmap = bitmap_zalloc(BITS_PER_LONG, GFP_NOFS); + if (!ltd->ltd_tgt_bitmap) + return -ENOMEM; + + ltd->ltd_tgts_size = BITS_PER_LONG; + ltd->ltd_tgtnr = 0; + + ltd->ltd_death_row = 0; + ltd->ltd_refcount = 0; + + return 0; +} +EXPORT_SYMBOL(lu_tgt_descs_init); + +/** + * Free bitmap and target table pages. 
+ * + * @ltd target table + */ +void lu_tgt_descs_fini(struct lu_tgt_descs *ltd) +{ + int i; + + bitmap_free(ltd->ltd_tgt_bitmap); + for (i = 0; i < TGT_PTRS; i++) + kfree(ltd->ltd_tgt_idx[i]); + ltd->ltd_tgts_size = 0; +} +EXPORT_SYMBOL(lu_tgt_descs_fini); + +/** + * Expand size of target table. + * + * When the target table is full, we have to extend the table. To do so, + * we allocate new memory with some reserve, move data from the old table + * to the new one and release memory consumed by the old table. + * + * @ltd target table + * @newsize new size of the table + * + * Return: 0 on success + * -ENOMEM if reallocation failed + */ +static int lu_tgt_descs_resize(struct lu_tgt_descs *ltd, u32 newsize) +{ + unsigned long *new_bitmap, *old_bitmap = NULL; + + /* someone else has already resized the array */ + if (newsize <= ltd->ltd_tgts_size) + return 0; + + new_bitmap = bitmap_zalloc(newsize, GFP_NOFS); + if (!new_bitmap) + return -ENOMEM; + + if (ltd->ltd_tgts_size > 0) { + /* the bitmap already exists, copy data from old one */ + bitmap_copy(new_bitmap, ltd->ltd_tgt_bitmap, + ltd->ltd_tgts_size); + old_bitmap = ltd->ltd_tgt_bitmap; + } + + ltd->ltd_tgts_size = newsize; + ltd->ltd_tgt_bitmap = new_bitmap; + + bitmap_free(old_bitmap); + + CDEBUG(D_CONFIG, "tgt size: %d\n", ltd->ltd_tgts_size); + + return 0; +} + +/** + * Add new target to target table. + * + * Extend target table if it's full, update target table and bitmap. + * Notice we need to take ltd_rw_sem exclusively before entry to ensure + * atomic switch. 
+ * + * @ltd target table + * @tgt new target desc + * + * Return: 0 on success + * -ENOMEM if reallocation failed + * -EEXIST if target existed + */ +int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt) +{ + u32 index = tgt->ltd_index; + int rc; + + if (index >= ltd->ltd_tgts_size) { + u32 newsize = 1; + + while (newsize < index + 1) + newsize = newsize << 1; + + rc = lu_tgt_descs_resize(ltd, newsize); + if (rc) + return rc; + } else if (test_bit(index, ltd->ltd_tgt_bitmap)) { + return -EEXIST; + } + + if (ltd->ltd_tgt_idx[index / TGT_PTRS_PER_BLOCK] == NULL) { + ltd->ltd_tgt_idx[index / TGT_PTRS_PER_BLOCK] = + kzalloc(sizeof(*ltd->ltd_tgt_idx[0]), GFP_NOFS); + if (ltd->ltd_tgt_idx[index / TGT_PTRS_PER_BLOCK] == NULL) + return -ENOMEM; + } + + LTD_TGT(ltd, tgt->ltd_index) = tgt; + set_bit(tgt->ltd_index, ltd->ltd_tgt_bitmap); + ltd->ltd_tgtnr++; + + return 0; +} +EXPORT_SYMBOL(lu_tgt_descs_add); + +/** + * Delete target from target table + */ +void lu_tgt_descs_del(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt) +{ + LTD_TGT(ltd, tgt->ltd_index) = NULL; + clear_bit(tgt->ltd_index, ltd->ltd_tgt_bitmap); + ltd->ltd_tgtnr--; +} +EXPORT_SYMBOL(lu_tgt_descs_del); From patchwork Thu Feb 27 21:15:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410627 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B5057138D for ; Thu, 27 Feb 2020 21:42:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9BD5624690 for ; Thu, 27 Feb 2020 21:42:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9BD5624690 Authentication-Results: 
mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2BBEA34AC58; Thu, 27 Feb 2020 13:34:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B163348816 for ; Thu, 27 Feb 2020 13:20:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 00971917F; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F37AA468; Thu, 27 Feb 2020 16:18:18 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:54 -0500 Message-Id: <1582838290-17243-487-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 486/622] lustre: lmv: share object alloc QoS code with LMV X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Move object alloc QoS code to obdclass, so that LMV and LOD can share the same code. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-11213 Lustre-commit: d3090bb2b486 ("LU-11213 lod: share object alloc QoS code with LMV") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35219 Reviewed-by: Hongchao Zhang Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 7 + fs/lustre/lmv/Makefile | 2 +- fs/lustre/lmv/lmv_internal.h | 4 - fs/lustre/lmv/lmv_obd.c | 87 +++++++++ fs/lustre/lmv/lmv_qos.c | 411 ------------------------------------------ fs/lustre/obdclass/lu_qos.c | 303 +++++++++++++++++++++++++++++++ 6 files changed, 398 insertions(+), 416 deletions(-) delete mode 100644 fs/lustre/lmv/lmv_qos.c diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index c30c06d..eaf20ea 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1442,6 +1442,13 @@ struct lu_qos { void lu_qos_rr_init(struct lu_qos_rr *lqr); int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); +bool lqos_is_usable(struct lu_qos *qos, u32 active_tgt_nr); +int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd, + u32 active_tgt_nr, u32 maxage, bool is_mdt); +void lqos_calc_weight(struct lu_tgt_desc *tgt); +int lqos_recalc_weight(struct lu_qos *qos, struct lu_tgt_descs *ltd, + struct lu_tgt_desc *tgt, u32 active_tgt_nr, + u64 *total_wt); u64 lu_prandom_u64_max(u64 ep_ro); int lu_tgt_descs_init(struct lu_tgt_descs *ltd); diff --git a/fs/lustre/lmv/Makefile b/fs/lustre/lmv/Makefile index 6f9a19c..ad470bf 100644 --- a/fs/lustre/lmv/Makefile +++ b/fs/lustre/lmv/Makefile @@ -1,4 +1,4 @@ ccflags-y += -I$(srctree)/$(src)/../include obj-$(CONFIG_LUSTRE_FS) += lmv.o -lmv-y := lmv_obd.o lmv_intent.o lmv_fld.o lproc_lmv.o lmv_qos.o +lmv-y := lmv_obd.o lmv_intent.o lmv_fld.o lproc_lmv.o diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index e0c3ba0..d95fa3f 100644 --- 
a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -218,10 +218,6 @@ static inline bool lmv_dir_retry_check_update(struct md_op_data *op_data) struct lmv_tgt_desc *lmv_locate_tgt(struct lmv_obd *lmv, struct md_op_data *op_data); -/* lmv_qos.c */ -struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt); -struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt); - /* lproc_lmv.c */ int lmv_tunables_init(struct obd_device *obd); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 8d682b4..2959b18 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1518,6 +1518,93 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, return md_close(tgt->ltd_exp, op_data, mod, request); } +static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) +{ + struct lu_tgt_desc *tgt; + u64 total_weight = 0; + u64 cur_weight = 0; + u64 rand; + int rc; + + if (!lqos_is_usable(&lmv->lmv_qos, lmv->desc.ld_active_tgt_count)) + return ERR_PTR(-EAGAIN); + + down_write(&lmv->lmv_qos.lq_rw_sem); + + if (!lqos_is_usable(&lmv->lmv_qos, lmv->desc.ld_active_tgt_count)) { + tgt = ERR_PTR(-EAGAIN); + goto unlock; + } + + rc = lqos_calc_penalties(&lmv->lmv_qos, &lmv->lmv_mdt_descs, + lmv->desc.ld_active_tgt_count, + lmv->desc.ld_qos_maxage, true); + if (rc) { + tgt = ERR_PTR(rc); + goto unlock; + } + + lmv_foreach_tgt(lmv, tgt) { + tgt->ltd_qos.ltq_usable = 0; + if (!tgt->ltd_exp || !tgt->ltd_active) + continue; + + tgt->ltd_qos.ltq_usable = 1; + lqos_calc_weight(tgt); + total_weight += tgt->ltd_qos.ltq_weight; + } + + rand = lu_prandom_u64_max(total_weight); + + lmv_foreach_connected_tgt(lmv, tgt) { + if (!tgt->ltd_qos.ltq_usable) + continue; + + cur_weight += tgt->ltd_qos.ltq_weight; + if (cur_weight < rand) + continue; + + *mdt = tgt->ltd_index; + lqos_recalc_weight(&lmv->lmv_qos, &lmv->lmv_mdt_descs, tgt, + lmv->desc.ld_active_tgt_count, + &total_weight); + rc = 0; + goto unlock; + } + 
+ /* no proper target found */ + tgt = ERR_PTR(-EAGAIN); + goto unlock; +unlock: + up_write(&lmv->lmv_qos.lq_rw_sem); + + return tgt; +} + +static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt) +{ + struct lu_tgt_desc *tgt; + int i; + int index; + + spin_lock(&lmv->lmv_qos.lq_rr.lqr_alloc); + for (i = 0; i < lmv->desc.ld_tgt_count; i++) { + index = (i + lmv->lmv_qos_rr_index) % lmv->desc.ld_tgt_count; + tgt = lmv_tgt(lmv, index); + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + continue; + + *mdt = tgt->ltd_index; + lmv->lmv_qos_rr_index = (*mdt + 1) % lmv->desc.ld_tgt_count; + spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); + + return tgt; + } + spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); + + return ERR_PTR(-ENODEV); +} + static struct lmv_tgt_desc * lmv_locate_tgt_by_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm, const char *name, int namelen, struct lu_fid *fid, diff --git a/fs/lustre/lmv/lmv_qos.c b/fs/lustre/lmv/lmv_qos.c deleted file mode 100644 index 0bee7c0..0000000 --- a/fs/lustre/lmv/lmv_qos.c +++ /dev/null @@ -1,411 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * GPL HEADER START - * - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 only, - * as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License version 2 for more details (a copy is included - * in the LICENSE file that accompanied this code). 
- * - * You should have received a copy of the GNU General Public License - * version 2 along with this program; If not, see - * http://www.gnu.org/licenses/gpl-2.0.html - * - * GPL HEADER END - */ -/* - * This file is part of Lustre, http://www.lustre.org/ - * - * lustre/lmv/lmv_qos.c - * - * LMV QoS. - * These are the only exported functions, they provide some generic - * infrastructure for object allocation QoS - * - */ - -#define DEBUG_SUBSYSTEM S_LMV - -#include -#include -#include -#include - -#include "lmv_internal.h" - -static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt) -{ - struct obd_statfs *statfs = &tgt->ltd_statfs; - - return statfs->os_bavail * statfs->os_bsize; -} - -static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt) -{ - return tgt->ltd_statfs.os_ffree; -} - -/** - * Calculate penalties per-tgt and per-server - * - * Re-calculate penalties when the configuration changes, active targets - * change and after statfs refresh (all these are reflected by lq_dirty flag). - * On every MDT and MDS: decay the penalty by half for every 8x the update - * interval that the device has been idle. That gives lots of time for the - * statfs information to be updated (which the penalty is only a proxy for), - * and avoids penalizing MDS/MDTs under light load. - * See lmv_qos_calc_weight() for how penalties are factored into the weight. 
- * - * @lmv LMV device - * - * Return: 0 on success - * -EAGAIN if the number of MDTs isn't enough or all - * MDT spaces are almost the same - */ -static int lmv_qos_calc_ppts(struct lmv_obd *lmv) -{ - struct lu_qos *qos = &lmv->lmv_qos; - struct lu_tgt_desc *tgt; - struct lu_svr_qos *svr; - u64 ba_max, ba_min, ba; - u64 ia_max, ia_min, ia; - u32 num_active; - int prio_wide; - time64_t now, age; - u32 maxage = lmv->desc.ld_qos_maxage; - int rc = 0; - - - if (!qos->lq_dirty) - goto out; - - num_active = lmv->desc.ld_active_tgt_count; - if (num_active < 2) { - rc = -EAGAIN; - goto out; - } - - /* find bavail on each server */ - list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { - svr->lsq_bavail = 0; - svr->lsq_iavail = 0; - } - qos->lq_active_svr_count = 0; - - /* - * How badly user wants to select targets "widely" (not recently chosen - * and not on recent MDS's). As opposed to "freely" (free space avail.) - * 0-256 - */ - prio_wide = 256 - qos->lq_prio_free; - - ba_min = (u64)(-1); - ba_max = 0; - ia_min = (u64)(-1); - ia_max = 0; - now = ktime_get_real_seconds(); - - /* Calculate server penalty per object */ - lmv_foreach_tgt(lmv, tgt) { - if (!tgt->ltd_exp || !tgt->ltd_active) - continue; - - /* bavail >> 16 to avoid overflow */ - ba = tgt_statfs_bavail(tgt) >> 16; - if (!ba) - continue; - - ba_min = min(ba, ba_min); - ba_max = max(ba, ba_max); - - /* iavail >> 8 to avoid overflow */ - ia = tgt_statfs_iavail(tgt) >> 8; - if (!ia) - continue; - - ia_min = min(ia, ia_min); - ia_max = max(ia, ia_max); - - /* Count the number of usable MDS's */ - if (tgt->ltd_qos.ltq_svr->lsq_bavail == 0) - qos->lq_active_svr_count++; - tgt->ltd_qos.ltq_svr->lsq_bavail += ba; - tgt->ltd_qos.ltq_svr->lsq_iavail += ia; - - /* - * per-MDT penalty is - * prio * bavail * iavail / (num_tgt - 1) / 2 - */ - tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia; - do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active - 1); - tgt->ltd_qos.ltq_penalty_per_obj >>= 1; - - age = (now - 
tgt->ltd_qos.ltq_used) >> 3; - if (qos->lq_reset || age > 32 * maxage) - tgt->ltd_qos.ltq_penalty = 0; - else if (age > maxage) - /* Decay tgt penalty. */ - tgt->ltd_qos.ltq_penalty >>= (age / maxage); - } - - num_active = qos->lq_active_svr_count; - if (num_active < 2) { - /* - * If there's only 1 MDS, we can't penalize it, so instead - * we have to double the MDT penalty - */ - num_active = 2; - lmv_foreach_tgt(lmv, tgt) { - if (!tgt->ltd_exp || !tgt->ltd_active) - continue; - - tgt->ltd_qos.ltq_penalty_per_obj <<= 1; - } - } - - /* - * Per-MDS penalty is - * prio * bavail * iavail / server_tgts / (num_svr - 1) / 2 - */ - list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { - ba = svr->lsq_bavail; - ia = svr->lsq_iavail; - svr->lsq_penalty_per_obj = prio_wide * ba * ia; - do_div(ba, svr->lsq_tgt_count * (num_active - 1)); - svr->lsq_penalty_per_obj >>= 1; - - age = (now - svr->lsq_used) >> 3; - if (qos->lq_reset || age > 32 * maxage) - svr->lsq_penalty = 0; - else if (age > maxage) - /* Decay server penalty. */ - svr->lsq_penalty >>= age / maxage; - } - - qos->lq_dirty = 0; - qos->lq_reset = 0; - - /* - * If each MDT has almost same free space, do rr allocation for better - * creation performance - */ - qos->lq_same_space = 0; - if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min && - (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) { - qos->lq_same_space = 1; - /* Reset weights for the next time we enter qos mode */ - qos->lq_reset = 1; - } - rc = 0; - -out: - if (!rc && qos->lq_same_space) - return -EAGAIN; - - return rc; -} - -static inline bool lmv_qos_is_usable(struct lmv_obd *lmv) -{ - if (!lmv->lmv_qos.lq_dirty && lmv->lmv_qos.lq_same_space) - return false; - - if (lmv->desc.ld_active_tgt_count < 2) - return false; - - return true; -} - -/** - * Calculate weight for a given MDT. - * - * The final MDT weight is bavail >> 16 * iavail >> 8 minus the MDT and MDS - * penalties. See lmv_qos_calc_ppts() for how penalties are calculated. 
- * - * \param[in] tgt MDT target descriptor - */ -static void lmv_qos_calc_weight(struct lu_tgt_desc *tgt) -{ - struct lu_tgt_qos *ltq = &tgt->ltd_qos; - u64 temp, temp2; - - temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8); - temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty; - if (temp < temp2) - ltq->ltq_weight = 0; - else - ltq->ltq_weight = temp - temp2; -} - -/** - * Re-calculate weights. - * - * The function is called when some target was used for a new object. In - * this case we should re-calculate all the weights to keep new allocations - * balanced well. - * - * \param[in] lmv LMV device - * \param[in] tgt target where a new object was placed - * \param[out] total_wt new total weight for the pool - * - * \retval 0 - */ -static int lmv_qos_used(struct lmv_obd *lmv, struct lu_tgt_desc *tgt, - u64 *total_wt) -{ - struct lu_tgt_qos *ltq; - struct lu_svr_qos *svr; - - ltq = &tgt->ltd_qos; - LASSERT(ltq); - - /* Don't allocate on this device anymore, until the next alloc_qos */ - ltq->ltq_usable = 0; - - svr = ltq->ltq_svr; - - /* - * Decay old penalty by half (we're adding max penalty, and don't - * want it to run away.) 
- */ - ltq->ltq_penalty >>= 1; - svr->lsq_penalty >>= 1; - - /* mark the MDS and MDT as recently used */ - ltq->ltq_used = svr->lsq_used = ktime_get_real_seconds(); - - /* Set max penalties for this MDT and MDS */ - ltq->ltq_penalty += ltq->ltq_penalty_per_obj * - lmv->desc.ld_active_tgt_count; - svr->lsq_penalty += svr->lsq_penalty_per_obj * - lmv->lmv_qos.lq_active_svr_count; - - /* Decrease all MDS penalties */ - list_for_each_entry(svr, &lmv->lmv_qos.lq_svr_list, lsq_svr_list) { - if (svr->lsq_penalty < svr->lsq_penalty_per_obj) - svr->lsq_penalty = 0; - else - svr->lsq_penalty -= svr->lsq_penalty_per_obj; - } - - *total_wt = 0; - /* Decrease all MDT penalties */ - lmv_foreach_tgt(lmv, tgt) { - if (!tgt->ltd_exp || !tgt->ltd_active) - continue; - - if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj) - ltq->ltq_penalty = 0; - else - ltq->ltq_penalty -= ltq->ltq_penalty_per_obj; - - lmv_qos_calc_weight(tgt); - - /* Recalc the total weight of usable osts */ - if (ltq->ltq_usable) - *total_wt += ltq->ltq_weight; - - CDEBUG(D_OTHER, - "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", - tgt->ltd_index, ltq->ltq_usable, - tgt_statfs_bavail(tgt) >> 10, - ltq->ltq_penalty_per_obj >> 10, - ltq->ltq_penalty >> 10, - ltq->ltq_svr->lsq_penalty_per_obj >> 10, - ltq->ltq_svr->lsq_penalty >> 10, - ltq->ltq_weight >> 10); - } - - return 0; -} - -struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) -{ - struct lu_tgt_desc *tgt; - u64 total_weight = 0; - u64 cur_weight = 0; - u64 rand; - int rc; - - if (!lmv_qos_is_usable(lmv)) - return ERR_PTR(-EAGAIN); - - down_write(&lmv->lmv_qos.lq_rw_sem); - - if (!lmv_qos_is_usable(lmv)) { - tgt = ERR_PTR(-EAGAIN); - goto unlock; - } - - rc = lmv_qos_calc_ppts(lmv); - if (rc) { - tgt = ERR_PTR(rc); - goto unlock; - } - - lmv_foreach_tgt(lmv, tgt) { - tgt->ltd_qos.ltq_usable = 0; - if (!tgt->ltd_exp || !tgt->ltd_active) - continue; - - tgt->ltd_qos.ltq_usable = 1; - 
lmv_qos_calc_weight(tgt); - total_weight += tgt->ltd_qos.ltq_weight; - } - - rand = lu_prandom_u64_max(total_weight); - - lmv_foreach_tgt(lmv, tgt) { - if (!tgt->ltd_qos.ltq_usable) - continue; - - cur_weight += tgt->ltd_qos.ltq_weight; - if (cur_weight < rand) - continue; - - *mdt = tgt->ltd_index; - lmv_qos_used(lmv, tgt, &total_weight); - rc = 0; - goto unlock; - } - - /* no proper target found */ - tgt = ERR_PTR(-EAGAIN); - goto unlock; -unlock: - up_write(&lmv->lmv_qos.lq_rw_sem); - - return tgt; -} - -struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt) -{ - struct lu_tgt_desc *tgt; - int i; - - spin_lock(&lmv->lmv_qos.lq_rr.lqr_alloc); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - tgt = lmv_tgt(lmv, - (i + lmv->lmv_qos_rr_index) % lmv->desc.ld_tgt_count); - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) - continue; - - *mdt = tgt->ltd_index; - lmv->lmv_qos_rr_index = - (i + lmv->lmv_qos_rr_index + 1) % - lmv->desc.ld_tgt_count; - spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); - - return tgt; - } - spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); - - return ERR_PTR(-ENODEV); -} diff --git a/fs/lustre/obdclass/lu_qos.c b/fs/lustre/obdclass/lu_qos.c index d4803e8..e77e81d 100644 --- a/fs/lustre/obdclass/lu_qos.c +++ b/fs/lustre/obdclass/lu_qos.c @@ -207,3 +207,306 @@ u64 lu_prandom_u64_max(u64 ep_ro) return rand; } EXPORT_SYMBOL(lu_prandom_u64_max); + +static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt) +{ + struct obd_statfs *statfs = &tgt->ltd_statfs; + + return statfs->os_bavail * statfs->os_bsize; +} + +static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt) +{ + return tgt->ltd_statfs.os_ffree; +} + +/** + * Calculate penalties per-tgt and per-server + * + * Re-calculate penalties when the configuration changes, active targets + * change and after statfs refresh (all these are reflected by lq_dirty flag). + * On every tgt and server: decay the penalty by half for every 8x the update + * interval that the device has been idle. 
That gives lots of time for the + * statfs information to be updated (which the penalty is only a proxy for), + * and avoids penalizing server/tgt under light load. + * See lqos_calc_weight() for how penalties are factored into the weight. + * + * @qos lu_qos + * @ltd lu_tgt_descs + * @active_tgt_nr active tgt number + * @maxage qos max age + * @is_mdt MDT will count inode usage + * + * Return: 0 on success + * -EAGAIN the number of tgt isn't enough or all + * tgt spaces are almost the same + */ +int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd, + u32 active_tgt_nr, u32 maxage, bool is_mdt) +{ + struct lu_tgt_desc *tgt; + struct lu_svr_qos *svr; + u64 ba_max, ba_min, ba; + u64 ia_max, ia_min, ia = 1; + u32 num_active; + int prio_wide; + time64_t now, age; + int rc; + + if (!qos->lq_dirty) { + rc = 0; + goto out; + } + + num_active = active_tgt_nr - 1; + if (num_active < 1) { + rc = -EAGAIN; + goto out; + } + + /* find bavail on each server */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + svr->lsq_bavail = 0; + /* if inode is not counted, set to 1 to ignore */ + svr->lsq_iavail = is_mdt ? 0 : 1; + } + qos->lq_active_svr_count = 0; + + /* + * How badly user wants to select targets "widely" (not recently chosen + * and not on recent MDS's). As opposed to "freely" (free space avail.) 
+ * 0-256 + */ + prio_wide = 256 - qos->lq_prio_free; + + ba_min = (u64)(-1); + ba_max = 0; + ia_min = (u64)(-1); + ia_max = 0; + now = ktime_get_real_seconds(); + + /* Calculate server penalty per object */ + ltd_foreach_tgt(ltd, tgt) { + if (!tgt->ltd_active) + continue; + + /* when inode is counted, bavail >> 16 to avoid overflow */ + ba = tgt_statfs_bavail(tgt); + if (is_mdt) + ba >>= 16; + else + ba >>= 8; + if (!ba) + continue; + + ba_min = min(ba, ba_min); + ba_max = max(ba, ba_max); + + /* Count the number of usable servers */ + if (tgt->ltd_qos.ltq_svr->lsq_bavail == 0) + qos->lq_active_svr_count++; + tgt->ltd_qos.ltq_svr->lsq_bavail += ba; + + if (is_mdt) { + /* iavail >> 8 to avoid overflow */ + ia = tgt_statfs_iavail(tgt) >> 8; + if (!ia) + continue; + + ia_min = min(ia, ia_min); + ia_max = max(ia, ia_max); + + tgt->ltd_qos.ltq_svr->lsq_iavail += ia; + } + + /* + * per-tgt penalty is + * prio * bavail * iavail / (num_tgt - 1) / 2 + */ + tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia; + do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active); + tgt->ltd_qos.ltq_penalty_per_obj >>= 1; + + age = (now - tgt->ltd_qos.ltq_used) >> 3; + if (qos->lq_reset || age > 32 * maxage) + tgt->ltd_qos.ltq_penalty = 0; + else if (age > maxage) + /* Decay tgt penalty. 
*/ + tgt->ltd_qos.ltq_penalty >>= (age / maxage); + } + + num_active = qos->lq_active_svr_count - 1; + if (num_active < 1) { + /* + * If there's only 1 server, we can't penalize it, so instead + * we have to double the tgt penalty + */ + num_active = 1; + ltd_foreach_tgt(ltd, tgt) { + if (!tgt->ltd_active) + continue; + + tgt->ltd_qos.ltq_penalty_per_obj <<= 1; + } + } + + /* + * Per-server penalty is + * prio * bavail * iavail / server_tgts / (num_svr - 1) / 2 + */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + ba = svr->lsq_bavail; + ia = svr->lsq_iavail; + svr->lsq_penalty_per_obj = prio_wide * ba * ia; + do_div(ba, svr->lsq_tgt_count * num_active); + svr->lsq_penalty_per_obj >>= 1; + + age = (now - svr->lsq_used) >> 3; + if (qos->lq_reset || age > 32 * maxage) + svr->lsq_penalty = 0; + else if (age > maxage) + /* Decay server penalty. */ + svr->lsq_penalty >>= age / maxage; + } + + qos->lq_dirty = 0; + qos->lq_reset = 0; + + /* + * If each tgt has almost same free space, do rr allocation for better + * creation performance + */ + qos->lq_same_space = 0; + if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min && + (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) { + qos->lq_same_space = 1; + /* Reset weights for the next time we enter qos mode */ + qos->lq_reset = 1; + } + rc = 0; + +out: + if (!rc && qos->lq_same_space) + return -EAGAIN; + + return rc; +} +EXPORT_SYMBOL(lqos_calc_penalties); + +bool lqos_is_usable(struct lu_qos *qos, u32 active_tgt_nr) +{ + if (!qos->lq_dirty && qos->lq_same_space) + return false; + + if (active_tgt_nr < 2) + return false; + + return true; +} +EXPORT_SYMBOL(lqos_is_usable); + +/** + * Calculate weight for a given tgt. + * + * The final tgt weight is bavail >> 16 * iavail >> 8 minus the tgt and server + * penalties. See lqos_calc_penalties() for how penalties are calculated. 
+ * + * @tgt target descriptor + */ +void lqos_calc_weight(struct lu_tgt_desc *tgt) +{ + struct lu_tgt_qos *ltq = &tgt->ltd_qos; + u64 temp, temp2; + + temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8); + temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty; + if (temp < temp2) + ltq->ltq_weight = 0; + else + ltq->ltq_weight = temp - temp2; +} +EXPORT_SYMBOL(lqos_calc_weight); + +/** + * Re-calculate weights. + * + * The function is called when some target was used for a new object. In + * this case we should re-calculate all the weights to keep new allocations + * balanced well. + * + * @qos lu_qos + * @ltd lu_tgt_descs + * @tgt target where a new object was placed + * @active_tgt_nr active tgt number + * @total_wt new total weight for the pool + * + * Return: 0 + */ +int lqos_recalc_weight(struct lu_qos *qos, struct lu_tgt_descs *ltd, + struct lu_tgt_desc *tgt, u32 active_tgt_nr, + u64 *total_wt) +{ + struct lu_tgt_qos *ltq; + struct lu_svr_qos *svr; + + ltq = &tgt->ltd_qos; + LASSERT(ltq); + + /* Don't allocate on this device anymore, until the next alloc_qos */ + ltq->ltq_usable = 0; + + svr = ltq->ltq_svr; + + /* + * Decay old penalty by half (we're adding max penalty, and don't + * want it to run away.) 
+ */ + ltq->ltq_penalty >>= 1; + svr->lsq_penalty >>= 1; + + /* mark the server and tgt as recently used */ + ltq->ltq_used = svr->lsq_used = ktime_get_real_seconds(); + + /* Set max penalties for this tgt and server */ + ltq->ltq_penalty += ltq->ltq_penalty_per_obj * active_tgt_nr; + svr->lsq_penalty += svr->lsq_penalty_per_obj * active_tgt_nr; + + /* Decrease all MDS penalties */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + if (svr->lsq_penalty < svr->lsq_penalty_per_obj) + svr->lsq_penalty = 0; + else + svr->lsq_penalty -= svr->lsq_penalty_per_obj; + } + + *total_wt = 0; + /* Decrease all tgt penalties */ + ltd_foreach_tgt(ltd, tgt) { + if (!tgt->ltd_active) + continue; + + if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj) + ltq->ltq_penalty = 0; + else + ltq->ltq_penalty -= ltq->ltq_penalty_per_obj; + + lqos_calc_weight(tgt); + + /* Recalc the total weight of usable osts */ + if (ltq->ltq_usable) + *total_wt += ltq->ltq_weight; + + CDEBUG(D_OTHER, + "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", + tgt->ltd_index, ltq->ltq_usable, + tgt_statfs_bavail(tgt) >> 10, + ltq->ltq_penalty_per_obj >> 10, + ltq->ltq_penalty >> 10, + ltq->ltq_svr->lsq_penalty_per_obj >> 10, + ltq->ltq_svr->lsq_penalty >> 10, + ltq->ltq_weight >> 10); + } + + return 0; +} +EXPORT_SYMBOL(lqos_recalc_weight); From patchwork Thu Feb 27 21:15:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410705 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB376138D for ; Thu, 27 Feb 2020 21:44:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org 
(Postfix) with ESMTPS id D401C24690 for ; Thu, 27 Feb 2020 21:44:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D401C24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4F38534AEF6; Thu, 27 Feb 2020 13:35:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F0CD321FF0D for ; Thu, 27 Feb 2020 13:20:49 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0357C9180; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 022FF46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:55 -0500 Message-Id: <1582838290-17243-488-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 487/622] lustre: import: Fix missing spin_unlock() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown A recent patch moved the spin_unlock() down into each branch of an 'if', but missed the final 'else'. 
Add the spin_unlock in the else. Fixes: 428ed8100580 ("lustre: import: fix race between imp_state & imp_invalid") WC-bug-id: https://jira.whamcloud.com/browse/LU-11542 Lustre-commit: 3dbdd38a6adc ("LU-11542 import: Fix missing spin_unlock()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35999 Reviewed-by: Yang Sheng Reviewed-by: James Simmons Reviewed-by: Wang Shilong Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/pinger.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c index a812942..f584fc6 100644 --- a/fs/lustre/ptlrpc/pinger.c +++ b/fs/lustre/ptlrpc/pinger.c @@ -242,6 +242,8 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp, } else if ((imp->imp_pingable && !suppress) || force_next || force) { spin_unlock(&imp->imp_lock); ptlrpc_ping(imp); + } else { + spin_unlock(&imp->imp_lock); } } From patchwork Thu Feb 27 21:15:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410631 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F43D924 for ; Thu, 27 Feb 2020 21:42:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 17F3B24690 for ; Thu, 27 Feb 2020 21:42:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 17F3B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4DD9334AD0B; Thu, 27 Feb 2020 13:34:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3F7EC348821 for ; Thu, 27 Feb 2020 13:20:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 06C6C9181; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0541C46C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:56 -0500 Message-Id: <1582838290-17243-489-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 488/622] lnet: o2iblnd: Make credits hiw connection aware X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The IBLND_CREDITS_HIGHWATER mark check currently looks only at the global peer credits tunable, ignoring the connection specific queue depth when determining the threshold at which to send a NOOP message to return credits. This is incorrect because while connection queue depth defaults to the same as peer credits, it can be less than that global value for specific connections. So we must check for this case when setting the threshold. 
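The threshold rule described above can be sketched in plain C. This is a minimal userspace illustration, not the o2iblnd code: `credits_highwater()`, `min_u32()`, and the `DEMO_*` constants are hypothetical stand-ins, and the numeric values are assumptions chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the o2iblnd names; values are illustrative. */
#define DEMO_MSG_VERSION_1	0x11
#define DEMO_MSG_VERSION_2	0x12
#define DEMO_HIGHWATER_V1	7

static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Number of outstanding credits at which a NOOP should be sent to return
 * them. Version 1 peers use a fixed mark; otherwise the global tunable is
 * capped by the per-connection queue depth, which may be smaller.
 */
static uint32_t credits_highwater(uint32_t peercredits_hiw, int version,
				  uint16_t queue_depth)
{
	/* assumed precondition: a connection has at least one queue slot */
	assert(queue_depth >= 1);

	if (version == DEMO_MSG_VERSION_1)
		return DEMO_HIGHWATER_V1;

	return min_u32(peercredits_hiw, (uint32_t)queue_depth - 1);
}
```

With a global high-water mark of 7, a connection whose queue depth was negotiated down to 4 gets a threshold of 3 in this sketch, so credits are returned before the smaller queue is exhausted.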
WC-bug-id: https://jira.whamcloud.com/browse/LU-12569 Lustre-commit: 1b87e8f61781 ("LU-12569 o2iblnd: Make credits hiw connection aware") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35578 Reviewed-by: Chris Horn Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 2f2337a..bc79874 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -102,9 +102,11 @@ struct kib_tunables { #define IBLND_CREDITS_MAX ((typeof(((struct kib_msg *)0)->ibm_credits)) - 1) /* when eagerly to return credits */ -#define IBLND_CREDITS_HIGHWATER(t, v) ((v) == IBLND_MSG_VERSION_1 ? \ - IBLND_CREDIT_HIGHWATER_V1 : \ - t->lnd_peercredits_hiw) +#define IBLND_CREDITS_HIGHWATER(t, conn) \ + (((conn)->ibc_version) == IBLND_MSG_VERSION_1 ? 
\ + IBLND_CREDIT_HIGHWATER_V1 : \ + min((t)->lnd_peercredits_hiw, \ + (u32)(conn)->ibc_queue_depth - 1)) # define kiblnd_rdma_create_id(ns, cb, dev, ps, qpt) rdma_create_id(ns, cb, \ dev, ps, \ @@ -791,7 +793,7 @@ struct kib_peer_ni { tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib; if (conn->ibc_outstanding_credits < - IBLND_CREDITS_HIGHWATER(tunables, conn->ibc_version) && + IBLND_CREDITS_HIGHWATER(tunables, conn) && !kiblnd_send_keepalive(conn)) return 0; /* No need to send NOOP */ From patchwork Thu Feb 27 21:15:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410635 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E465517E0 for ; Thu, 27 Feb 2020 21:43:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CC8E224690 for ; Thu, 27 Feb 2020 21:43:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CC8E224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6EA3834ADB5; Thu, 27 Feb 2020 13:34:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 84CA5348821 for ; Thu, 27 Feb 2020 13:20:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0900E9182; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0809C46D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:57 -0500 Message-Id: <1582838290-17243-490-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 489/622] lustre: obdecho: avoid panic with partially object init X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov In some cases (like ENOMEM) the init function may never be called, so any teardown of init-related state should be placed in the object delete handler, not in the object free handler. 
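The split between delete and free described above can be sketched in userspace C. This is only an illustration under assumed names: `struct demo_dev`, `struct demo_obj`, and the `demo_obj_*()` helpers are hypothetical and do not correspond to the Lustre `lu_object` API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device that tracks how many objects were initialized. */
struct demo_dev {
	int nr_objects;
};

/* Hypothetical object; dev and info are set only by demo_obj_init(). */
struct demo_obj {
	struct demo_dev *dev;
	void *info;
};

static struct demo_obj *demo_obj_alloc(void)
{
	/* zeroed, so dev == NULL until init has run */
	return calloc(1, sizeof(struct demo_obj));
}

static int demo_obj_init(struct demo_obj *obj, struct demo_dev *dev)
{
	obj->info = malloc(16);
	if (!obj->info)
		return -1;
	obj->dev = dev;
	dev->nr_objects++;
	return 0;
}

/* Undo init. May run on an object whose init never happened. */
static void demo_obj_delete(struct demo_obj *obj)
{
	if (!obj->dev)
		return;

	assert(obj->dev->nr_objects > 0);
	obj->dev->nr_objects--;
	obj->dev = NULL;
	free(obj->info);
	obj->info = NULL;
}

/* Release only what alloc created; never touches init-time state. */
static void demo_obj_free(struct demo_obj *obj)
{
	free(obj);
}
```

Keeping the free path independent of init-time state means an allocation that fails before init can still be deleted and freed without dereferencing a NULL device pointer.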
WC-bug-id: https://jira.whamcloud.com/browse/LU-12707 Lustre-commit: 1a9ca8417c60 ("LU-12707 obdecho: avoid panic with partially object init") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/35950 Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdecho/echo_client.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 84823ec..172fe11 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -444,10 +444,16 @@ static int echo_object_init(const struct lu_env *env, struct lu_object *obj, return 0; } -static void echo_object_free(const struct lu_env *env, struct lu_object *obj) +static void echo_object_delete(const struct lu_env *env, struct lu_object *obj) { struct echo_object *eco = cl2echo_obj(lu2cl(obj)); - struct echo_client_obd *ec = eco->eo_dev->ed_ec; + struct echo_client_obd *ec; + + /* object delete is called unconditionally - layer init or not */ + if (!eco->eo_dev) + return; + + ec = eco->eo_dev->ed_ec; LASSERT(atomic_read(&eco->eo_npages) == 0); @@ -455,10 +461,16 @@ static void echo_object_free(const struct lu_env *env, struct lu_object *obj) list_del_init(&eco->eo_obj_chain); spin_unlock(&ec->ec_lock); + kfree(eco->eo_oinfo); +} + +static void echo_object_free(const struct lu_env *env, struct lu_object *obj) +{ + struct echo_object *eco = cl2echo_obj(lu2cl(obj)); + lu_object_fini(obj); lu_object_header_fini(obj->lo_header); - kfree(eco->eo_oinfo); kmem_cache_free(echo_object_kmem, eco); } @@ -472,7 +484,7 @@ static int echo_object_print(const struct lu_env *env, void *cookie, static const struct lu_object_operations echo_lu_obj_ops = { .loo_object_init = echo_object_init, - .loo_object_delete = NULL, + .loo_object_delete = echo_object_delete, .loo_object_release = NULL, .loo_object_free = echo_object_free, .loo_object_print = 
echo_object_print, From patchwork Thu Feb 27 21:15:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410709 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE2E1924 for ; Thu, 27 Feb 2020 21:44:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D501824690 for ; Thu, 27 Feb 2020 21:44:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D501824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 38D02348EAC; Thu, 27 Feb 2020 13:35:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C5E03348821 for ; Thu, 27 Feb 2020 13:20:50 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0C13C9183; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0AD2647C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:58 -0500 Message-Id: <1582838290-17243-491-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 490/622] lnet: o2iblnd: cache max_qp_wr X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When creating the device the maximum number of work requests per qp which can be allocated is already known. Cache that internally, and when creating the qp make sure the qp's max_send_wr does not exceed that max. If it does then cap max_send_wr to max_qp_wr. Recalculate the connection's queue depth based on the max_qp_wr. WC-bug-id: https://jira.whamcloud.com/browse/LU-12621 Lustre-commit: 7ee319ed7f9d ("LU-12621 o2iblnd: cache max_qp_wr") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/36073 Reviewed-by: Doug Oucharek Reviewed-by: Olaf Weber Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 42 ++++++++++++++++++++++++---------------- net/lnet/klnds/o2iblnd/o2iblnd.h | 1 + 2 files changed, 26 insertions(+), 17 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 278823f..d4d5d4f 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -656,16 +656,28 @@ static unsigned int kiblnd_send_wrs(struct kib_conn *conn) * One WR for the LNet message * And ibc_max_frags for the transfer WRs */ + int ret; + int multiplier = 1 + conn->ibc_max_frags; enum kib_dev_caps dev_caps = conn->ibc_hdev->ibh_dev->ibd_dev_caps; - unsigned int ret = 1 + conn->ibc_max_frags; /* FastReg needs two extra WRs for map and invalidate */ if (dev_caps & 
IBLND_DEV_CAPS_FASTREG_ENABLED) - ret += 2; + multiplier += 2; /* account for a maximum of ibc_queue_depth in-flight transfers */ - ret *= conn->ibc_queue_depth; - return ret; + ret = multiplier * conn->ibc_queue_depth; + + if (ret > conn->ibc_hdev->ibh_max_qp_wr) { + CDEBUG(D_NET, + "peer_credits %u will result in send work request size %d larger than maximum %d device can handle\n", + conn->ibc_queue_depth, ret, + conn->ibc_hdev->ibh_max_qp_wr); + conn->ibc_queue_depth = + conn->ibc_hdev->ibh_max_qp_wr / multiplier; + } + + /* don't go beyond the maximum the device can handle */ + return min(ret, conn->ibc_hdev->ibh_max_qp_wr); } struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, @@ -814,20 +826,13 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, init_qp_attr->qp_type = IB_QPT_RC; init_qp_attr->send_cq = cq; init_qp_attr->recv_cq = cq; + /* kiblnd_send_wrs() can change the connection's queue depth if + * the maximum work requests for the device is maxed out + */ + init_qp_attr->cap.max_send_wr = kiblnd_send_wrs(conn); + init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn); - conn->ibc_sched = sched; - - do { - init_qp_attr->cap.max_send_wr = kiblnd_send_wrs(conn); - init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn); - - rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr); - if (!rc || conn->ibc_queue_depth < 2) - break; - - conn->ibc_queue_depth--; - } while (rc); - + rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr); if (rc) { CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d, send_sge: %d, recv_sge: %d\n", rc, init_qp_attr->cap.max_send_wr, @@ -837,6 +842,8 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, goto failed_2; } + conn->ibc_sched = sched; + if (conn->ibc_queue_depth != peer_ni->ibp_queue_depth) CWARN("peer %s - queue depth reduced from %u to %u to allow for qp creation\n", libcfs_nid2str(peer_ni->ibp_nid), @@ -2330,6 +2337,7 @@ static int kiblnd_hdev_get_attr(struct 
kib_hca_dev *hdev) } hdev->ibh_mr_size = dev_attr->max_mr_size; + hdev->ibh_max_qp_wr = dev_attr->max_qp_wr; CERROR("Invalid mr size: %#llx\n", hdev->ibh_mr_size); return -EINVAL; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index bc79874..ac91757 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -178,6 +178,7 @@ struct kib_hca_dev { int ibh_page_size; /* page size of current HCA */ u64 ibh_page_mask; /* page mask of current HCA */ u64 ibh_mr_size; /* size of MR */ + int ibh_max_qp_wr; /* maximum work requests size */ struct ib_pd *ibh_pd; /* PD */ struct kib_dev *ibh_dev; /* owner */ atomic_t ibh_ref; /* refcount */ From patchwork Thu Feb 27 21:15:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410639 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6ECAB138D for ; Thu, 27 Feb 2020 21:43:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 577C6246A1 for ; Thu, 27 Feb 2020 21:43:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 577C6246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BDB2334AE10; Thu, 27 Feb 2020 13:34:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2711334882E for ; Thu, 27 Feb 2020 13:20:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0EBAB9184; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0DA1A468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:15:59 -0500 Message-Id: <1582838290-17243-492-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 491/622] lustre: som: integrate LSOM with lfs find X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin The patch integrates LSOM functionality with lfs find so that it is possible to use LSOM functionality directly on the client. The MDS fills in the mbo_size and mbo_blocks fields from the LSOM xattr if the actual size/blocks are not available, and then sets the new OBD_MD_FLLAZYSIZE and OBD_MD_FLLAZYBLOCKS flags in the reply so that the client knows these fields are valid. The lfs find command adds a "-l|--lazy" option to allow the use of LSOM data from the MDS. Add a new version of the ioctl(LL_IOC_MDC_GETINFO) call that also returns valid flags from the MDS RPC to userspace in struct lov_user_mds_data so that it is possible to determine whether the size and blocks are returned by the call.
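As a rough sketch of how a consumer of the returned valid flags might tell strict attributes from lazy ones: the DEMO_* names and the demo_classify*() helpers below are hypothetical and are not part of llapi. The two lazy flag values are copied from the lustre_idl.h hunk in this patch; the values used for the strict size and blocks flags are assumptions made for illustration and should be checked against lustre_idl.h.

```c
#include <stdint.h>

/* Lazy flag values as defined by the lustre_idl.h hunk in this patch. */
#define DEMO_MD_FLLAZYSIZE	0x0400000000000000ULL
#define DEMO_MD_FLLAZYBLOCKS	0x0800000000000000ULL
/* Assumed values for the strict flags (not shown in this patch). */
#define DEMO_MD_FLSIZE		0x00000010ULL
#define DEMO_MD_FLBLOCKS	0x00000020ULL

enum demo_attr_kind {
	DEMO_ATTR_UNKNOWN = 0,	/* neither flag set: do not trust the field */
	DEMO_ATTR_LAZY,		/* value came from the LSOM xattr */
	DEMO_ATTR_STRICT,	/* value is the actual size/blocks */
};

/* Prefer the strict flag; fall back to the lazy flag if only that is set. */
static enum demo_attr_kind demo_classify(uint64_t valid, uint64_t strict_flag,
					 uint64_t lazy_flag)
{
	if (valid & strict_flag)
		return DEMO_ATTR_STRICT;
	if (valid & lazy_flag)
		return DEMO_ATTR_LAZY;
	return DEMO_ATTR_UNKNOWN;
}

static enum demo_attr_kind demo_classify_size(uint64_t valid)
{
	return demo_classify(valid, DEMO_MD_FLSIZE, DEMO_MD_FLLAZYSIZE);
}

static enum demo_attr_kind demo_classify_blocks(uint64_t valid)
{
	return demo_classify(valid, DEMO_MD_FLBLOCKS, DEMO_MD_FLLAZYBLOCKS);
}
```

A tool that accepts lazy data could use the DEMO_ATTR_LAZY result directly, while one that needs exact values would treat it like DEMO_ATTR_UNKNOWN and query the size another way.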
The old LL_IOC_MDC_GETINFO ioctl number is renamed to LL_IOC_MDC_GETINFO_OLD and is binary compatible, but newly-compiled applications will use the new struct lov_user_mds_data. New llapi interfaces llapi_get_lum_file(), llapi_get_lum_dir(), llapi_get_lum_file_fd(), llapi_get_lum_dir_fd() are added to fetch valid stat() attributes and LOV info to the user. WC-bug-id: https://jira.whamcloud.com/browse/LU-11367 Lustre-commit: 11aa7f8704c4 ("LU-11367 som: integrate LSOM with lfs find") Signed-off-by: Qian Yingjin Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/35167 Reviewed-by: Li Xi Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 97 +++++++++++++++++++++++++++++++-- include/uapi/linux/lustre/lustre_idl.h | 3 + include/uapi/linux/lustre/lustre_user.h | 17 +++++- 3 files changed, 108 insertions(+), 9 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 812f535..4dccd24 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1604,16 +1604,24 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) case LL_IOC_LOV_GETSTRIPE: case LL_IOC_LOV_GETSTRIPE_NEW: case LL_IOC_MDC_GETINFO: + case LL_IOC_MDC_GETINFO_OLD: case IOC_MDC_GETFILEINFO: + case IOC_MDC_GETFILEINFO_OLD: case IOC_MDC_GETFILESTRIPE: { struct ptlrpc_request *request = NULL; struct lov_user_md __user *lump; struct lov_mds_md *lmm = NULL; struct mdt_body *body; char *filename = NULL; + lstat_t __user *statp = NULL; + struct statx __user *stxp = NULL; + u64 __user *flagsp = NULL; + u32 __user *lmmsizep = NULL; + struct lu_fid __user *fidp = NULL; int lmmsize; - if (cmd == IOC_MDC_GETFILEINFO || + if (cmd == IOC_MDC_GETFILEINFO_OLD || + cmd == IOC_MDC_GETFILEINFO || cmd == IOC_MDC_GETFILESTRIPE) { filename = ll_getname((const char __user *)arg); if (IS_ERR(filename)) @@ -1635,7 +1643,9 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) } if (rc == -ENODATA && (cmd == IOC_MDC_GETFILEINFO 
|| - cmd == LL_IOC_MDC_GETINFO)) { + cmd == LL_IOC_MDC_GETINFO || + cmd == IOC_MDC_GETFILEINFO_OLD || + cmd == LL_IOC_MDC_GETINFO_OLD)) { lmmsize = 0; rc = 0; } @@ -1647,10 +1657,21 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) cmd == LL_IOC_LOV_GETSTRIPE || cmd == LL_IOC_LOV_GETSTRIPE_NEW) { lump = (struct lov_user_md __user *)arg; + } else if (cmd == IOC_MDC_GETFILEINFO_OLD || + cmd == LL_IOC_MDC_GETINFO_OLD){ + struct lov_user_mds_data_v1 __user *lmdp; + + lmdp = (struct lov_user_mds_data_v1 __user *)arg; + statp = &lmdp->lmd_st; + lump = &lmdp->lmd_lmm; } else { struct lov_user_mds_data __user *lmdp; lmdp = (struct lov_user_mds_data __user *)arg; + fidp = &lmdp->lmd_fid; + stxp = &lmdp->lmd_stx; + flagsp = &lmdp->lmd_flags; + lmmsizep = &lmdp->lmd_lmmsize; lump = &lmdp->lmd_lmm; } @@ -1670,8 +1691,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) rc = -EOVERFLOW; } - if (cmd == IOC_MDC_GETFILEINFO || cmd == LL_IOC_MDC_GETINFO) { - struct lov_user_mds_data __user *lmdp; + if (cmd == IOC_MDC_GETFILEINFO_OLD || + cmd == LL_IOC_MDC_GETINFO_OLD) { lstat_t st = { 0 }; st.st_dev = inode->i_sb->s_dev; @@ -1690,8 +1711,72 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) sbi->ll_flags & LL_SBI_32BIT_API); - lmdp = (struct lov_user_mds_data __user *)arg; - if (copy_to_user(&lmdp->lmd_st, &st, sizeof(st))) { + if (copy_to_user(statp, &st, sizeof(st))) { + rc = -EFAULT; + goto out_req; + } + } else if (cmd == IOC_MDC_GETFILEINFO || + cmd == LL_IOC_MDC_GETINFO) { + struct statx stx = { 0 }; + u64 valid = body->mbo_valid; + + stx.stx_blksize = PAGE_SIZE; + stx.stx_nlink = body->mbo_nlink; + stx.stx_uid = body->mbo_uid; + stx.stx_gid = body->mbo_gid; + stx.stx_mode = body->mbo_mode; + stx.stx_ino = cl_fid_build_ino(&body->mbo_fid1, + sbi->ll_flags & + LL_SBI_32BIT_API); + stx.stx_size = body->mbo_size; + stx.stx_blocks = body->mbo_blocks; + stx.stx_atime.tv_sec = 
body->mbo_atime; + stx.stx_ctime.tv_sec = body->mbo_ctime; + stx.stx_mtime.tv_sec = body->mbo_mtime; + stx.stx_rdev_major = MAJOR(body->mbo_rdev); + stx.stx_rdev_minor = MINOR(body->mbo_rdev); + stx.stx_dev_major = MAJOR(inode->i_sb->s_dev); + stx.stx_dev_minor = MINOR(inode->i_sb->s_dev); + stx.stx_mask |= STATX_BASIC_STATS; + + /* + * For a striped directory, the size and blocks returned + * from MDT are not correct. + * The size and blocks are aggregated by client across + * all stripes. + * Thus for a striped directory, do not return the valid + * FLSIZE and FLBLOCKS flags to the caller. + * However, this would be better decided by the MDS + * instead of the client. + */ + if (cmd == LL_IOC_MDC_GETINFO && + ll_i2info(inode)->lli_lsm_md) + valid &= ~(OBD_MD_FLSIZE | OBD_MD_FLBLOCKS); + + if (flagsp && copy_to_user(flagsp, &valid, + sizeof(*flagsp))) { + rc = -EFAULT; + goto out_req; + } + + if (fidp && copy_to_user(fidp, &body->mbo_fid1, + sizeof(*fidp))) { + rc = -EFAULT; + goto out_req; + } + + if (!(valid & OBD_MD_FLSIZE)) + stx.stx_mask &= ~STATX_SIZE; + if (!(valid & OBD_MD_FLBLOCKS)) + stx.stx_mask &= ~STATX_BLOCKS; + + if (stxp && copy_to_user(stxp, &stx, sizeof(stx))) { + rc = -EFAULT; + goto out_req; + } + + if (lmmsizep && copy_to_user(lmmsizep, &lmmsize, + sizeof(*lmmsizep))) { + rc = -EFAULT; + goto out_req; + } diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 47321ae..d4b29d8 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1211,6 +1211,9 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic) #define OBD_MD_FLPROJID (0x0100000000000000ULL) /* project ID */ #define OBD_MD_SECCTX (0x0200000000000000ULL) /* embed security xattr */ +#define OBD_MD_FLLAZYSIZE (0x0400000000000000ULL) /* Lazy size */ +#define OBD_MD_FLLAZYBLOCKS (0x0800000000000000ULL) /* Lazy blocks */ + #define OBD_MD_FLALLQUOTA (OBD_MD_FLUSRQUOTA | \ OBD_MD_FLGRPQUOTA | \ 
OBD_MD_FLPRJQUOTA) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 695ceb2..06a691b 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -371,8 +371,10 @@ struct ll_ioc_lease_id { #define IOC_MDC_TYPE 'i' #define IOC_MDC_LOOKUP _IOWR(IOC_MDC_TYPE, 20, struct obd_device *) #define IOC_MDC_GETFILESTRIPE _IOWR(IOC_MDC_TYPE, 21, struct lov_user_md *) -#define IOC_MDC_GETFILEINFO _IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data *) -#define LL_IOC_MDC_GETINFO _IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data *) +#define IOC_MDC_GETFILEINFO_OLD _IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data_v1 *) +#define IOC_MDC_GETFILEINFO _IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data) +#define LL_IOC_MDC_GETINFO_OLD _IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data_v1 *) +#define LL_IOC_MDC_GETINFO _IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data) #define MAX_OBD_NAME 128 /* If this changes, a NEW ioctl must be added */ @@ -636,12 +638,21 @@ static inline __u32 lov_user_md_size(__u16 stripes, __u32 lmm_magic) * is possible the application has already #included . 
*/ #ifdef HAVE_LOV_USER_MDS_DATA -#define lov_user_mds_data lov_user_mds_data_v1 +#define lov_user_mds_data lov_user_mds_data_v2 struct lov_user_mds_data_v1 { lstat_t lmd_st; /* MDS stat struct */ struct lov_user_md_v1 lmd_lmm; /* LOV EA V1 user data */ } __packed; +struct lov_user_mds_data_v2 { + struct lu_fid lmd_fid; /* Lustre FID */ + struct statx lmd_stx; /* MDS statx struct */ + __u64 lmd_flags; /* MDS stat flags */ + __u32 lmd_lmmsize; /* LOV EA size */ + __u32 lmd_padding; /* unused */ + struct lov_user_md_v1 lmd_lmm; /* LOV EA user data */ +} __attribute__((packed)); + struct lov_user_mds_data_v3 { lstat_t lmd_st; /* MDS stat struct */ struct lov_user_md_v3 lmd_lmm; /* LOV EA V3 user data */ From patchwork Thu Feb 27 21:16:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C96E17E0 for ; Thu, 27 Feb 2020 21:39:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 259A824690 for ; Thu, 27 Feb 2020 21:39:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 259A824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B6E8F34A619; Thu, 27 Feb 2020 13:32:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8004D348834 for ; Thu, 27 Feb 2020 13:20:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 11B079185; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1061946A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:00 -0500 Message-Id: <1582838290-17243-493-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 492/622] lustre: llite: error handling of ll_och_fill() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam The return error of ll_och_fill() should be handled. 
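The general shape of such a fix can be sketched as follows. Everything here is a hypothetical stand-in (demo_fill(), demo_use() and demo_release() are not Lustre functions); the sketch only assumes a fill helper that can fail and a caller that previously ignored its return value.

```c
#include <errno.h>

/* Hypothetical handle that is only usable once it has been filled. */
struct demo_handle {
	int fd;
	int filled;
};

/* Stand-in for a fill helper that can fail. */
static int demo_fill(struct demo_handle *h, int src_fd)
{
	if (src_fd < 0)
		return -EINVAL;
	h->fd = src_fd;
	h->filled = 1;
	return 0;
}

/* Stand-in for the next step, which requires a filled handle. */
static int demo_use(const struct demo_handle *h)
{
	return h->filled ? 0 : -EBADF;
}

/* Check the fill result and jump to the common exit path on failure. */
static int demo_release(int src_fd, int *cleanups)
{
	struct demo_handle h = { .fd = -1, .filled = 0 };
	int rc;

	rc = demo_fill(&h, src_fd);
	if (rc)
		goto out;	/* do not touch a handle that was never filled */

	rc = demo_use(&h);
out:
	(*cleanups)++;		/* shared cleanup runs on every path */
	return rc;
}
```

The caller now sees the original error code from the fill step instead of whatever a later step would report when handed a half-initialized handle.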
WC-bug-id: https://jira.whamcloud.com/browse/LU-12690 Lustre-commit: 4d6d58575d3d ("LU-12690 llite: error handling of ll_och_fill()") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/35913 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 856aa64..31d7dce 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1091,7 +1091,9 @@ static int ll_lease_och_release(struct inode *inode, struct file *file) goto out_release_it; LASSERT(it_disposition(&it, DISP_ENQ_OPEN_REF)); - ll_och_fill(sbi->ll_md_exp, &it, och); + rc = ll_och_fill(sbi->ll_md_exp, &it, och); + if (rc) + goto out_release_it; if (!it_disposition(&it, DISP_OPEN_LEASE)) /* old server? */ { rc = -EOPNOTSUPP; @@ -2225,7 +2227,9 @@ int ll_release_openhandle(struct inode *inode, struct lookup_intent *it) goto out; } - ll_och_fill(ll_i2sbi(inode)->ll_md_exp, it, och); + rc = ll_och_fill(ll_i2sbi(inode)->ll_md_exp, it, och); + if (rc) + goto out; rc = ll_close_inode_openhandle(inode, och, 0, NULL); out: From patchwork Thu Feb 27 21:16:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410713 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AE3991580 for ; Thu, 27 Feb 2020 21:45:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 96E83246A1 for ; Thu, 27 Feb 2020 21:45:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 96E83246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DA6D534AF44; Thu, 27 Feb 2020 13:35:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C17C634883A for ; Thu, 27 Feb 2020 13:20:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 141D39186; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1319346C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:01 -0500 Message-Id: <1582838290-17243-494-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 493/622] lnet: Don't queue msg when discovery has completed X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn In lnet_initiate_peer_discovery(), it is possible for the peer object to change after the call to lnet_discover_peer_locked(), and it is also possible for the peer to complete discovery between the first call to lnet_peer_is_uptodate() and our placing the lnet_msg onto the peer's lp_dc_pendq. After the call to lnet_discover_peer_locked() check whether the, potentially new, peer object is up to date while holding the lp_lock. If the peer is up to date, then we needn't queue the message. Otherwise, we continue to hold the lock to place the message on the peer's lp_dc_pendq. Cray-bug-id: LUS-7596 WC-bug-id: https://jira.whamcloud.com/browse/LU-12739 Lustre-commit: 4ef62976448d ("LU-12739 lnet: Don't queue msg when discovery has completed") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36139 Reviewed-by: Alexandr Boyko Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + net/lnet/lnet/lib-move.c | 19 +++++++++++++------ net/lnet/lnet/peer.c | 16 +++++++++++++--- 3 files changed, 27 insertions(+), 9 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index f2f5455..db1b7e5 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -876,6 +876,7 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, } bool lnet_peer_is_uptodate(struct lnet_peer *lp); +bool lnet_peer_is_uptodate_locked(struct lnet_peer *lp); bool lnet_is_discovery_disabled(struct lnet_peer *lp); bool lnet_peer_gw_discovery(struct lnet_peer *lp); diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 2f31f06..6da0be4 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1807,15 +1807,21 @@ struct lnet_ni * } /* The peer 
may have changed. */ peer = lpni->lpni_peer_net->lpn_peer; + spin_lock(&peer->lp_lock); + if (lnet_peer_is_uptodate_locked(peer)) { + spin_unlock(&peer->lp_lock); + lnet_peer_ni_decref_locked(lpni); + return 0; + } /* queue message and return */ msg->msg_rtr_nid_param = rtr_nid; msg->msg_sending = 0; msg->msg_txpeer = NULL; - spin_lock(&peer->lp_lock); list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); + primary_nid = peer->lp_primary_nid; spin_unlock(&peer->lp_lock); + lnet_peer_ni_decref_locked(lpni); - primary_nid = peer->lp_primary_nid; CDEBUG(D_NET, "msg %p delayed. %s pending discovery\n", msg, libcfs_nid2str(primary_nid)); @@ -2428,11 +2434,10 @@ struct lnet_ni * */ msg->msg_src_nid_param = src_nid; - /* Now that we have a peer_ni, check if we want to discover - * the peer. Traffic to the LNET_RESERVED_PORTAL should not - * trigger discovery. + /* If necessary, perform discovery on the peer that owns this peer_ni. + * Note, this can result in the ownership of this peer_ni changing + * to another peer object. */ - peer = lpni->lpni_peer_net->lpn_peer; rc = lnet_initiate_peer_discovery(lpni, msg, rtr_nid, cpt); if (rc) { lnet_peer_ni_decref_locked(lpni); @@ -2441,6 +2446,8 @@ struct lnet_ni * } lnet_peer_ni_decref_locked(lpni); + peer = lpni->lpni_peer_net->lpn_peer; + /* Identify the different send cases */ if (src_nid == LNET_NID_ANY) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 088bb62..0d33ade 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1831,6 +1831,17 @@ struct lnet_peer_ni * return rc; } +bool +lnet_peer_is_uptodate(struct lnet_peer *lp) +{ + bool rc; + + spin_lock(&lp->lp_lock); + rc = lnet_peer_is_uptodate_locked(lp); + spin_unlock(&lp->lp_lock); + return rc; +} + /* * Is a peer uptodate from the point of view of discovery? * @@ -1840,11 +1851,11 @@ struct lnet_peer_ni * * Otherwise look at whether the peer needs rediscovering. 
*/ bool -lnet_peer_is_uptodate(struct lnet_peer *lp) +lnet_peer_is_uptodate_locked(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) { bool rc; - spin_lock(&lp->lp_lock); if (lp->lp_state & (LNET_PEER_DISCOVERING | LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH)) { @@ -1861,7 +1872,6 @@ struct lnet_peer_ni * } else { rc = false; } - spin_unlock(&lp->lp_lock); return rc; } From patchwork Thu Feb 27 21:16:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410717 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7FDDB1580 for ; Thu, 27 Feb 2020 21:45:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6843A24690 for ; Thu, 27 Feb 2020 21:45:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6843A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 550FA34AF6C; Thu, 27 Feb 2020 13:36:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 23B91348840 for ; Thu, 27 Feb 2020 13:20:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 16F749187; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id 15E4146D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:02 -0500 Message-Id: <1582838290-17243-495-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 494/622] lnet: Use alternate ping processing for non-mr peers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Router peers without multi-rail capabilities (i.e. older Lustre versions) or router peers that have discovery disabled need to use the alternate ping processing introduced by LU-12422. Otherwise, these peers go through the normal discovery processing, but their remote network interfaces are never added to the peer object. This causes routes through these peers to be considered down when avoid_asym_router_failure is enabled. 
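The decision described above can be illustrated with a small sketch. It is not the body of lnet_is_discovery_disabled(): the DEMO_* state bits, struct demo_peer and demo_use_alternate_ping() are hypothetical, and the sketch only assumes that a gateway carries a per-peer state word alongside a global "discovery disabled" setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bits; the real LNet flag names and values may differ. */
#define DEMO_PEER_MULTI_RAIL	(1u << 0)	/* peer advertises multi-rail */
#define DEMO_PEER_NO_DISCOVERY	(1u << 1)	/* peer has discovery disabled */

struct demo_peer {
	uint32_t state;
};

/*
 * Decide per gateway, not only from the global setting, whether the
 * alternate ping processing should be used for this peer.
 */
static bool demo_use_alternate_ping(const struct demo_peer *gw,
				    bool global_discovery_disabled)
{
	if (global_discovery_disabled)
		return true;
	if (!(gw->state & DEMO_PEER_MULTI_RAIL))
		return true;	/* e.g. an older, non-multi-rail router */
	return (gw->state & DEMO_PEER_NO_DISCOVERY) != 0;
}
```

Checking only the global flag would miss the second and third cases, which is how routes through such gateways could end up being treated as down.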
Cray-bug-id: LUS-7866 WC-bug-id: https://jira.whamcloud.com/browse/LU-12763 Lustre-commit: 010f6b1819b9 ("LU-12763 lnet: Use alternate ping processing for non-mr peers") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36182 Reviewed-by: Alexandr Boyko Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 + net/lnet/lnet/peer.c | 1 + net/lnet/lnet/router.c | 9 ++++++--- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index db1b7e5..56556fd 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -878,6 +878,7 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, bool lnet_peer_is_uptodate(struct lnet_peer *lp); bool lnet_peer_is_uptodate_locked(struct lnet_peer *lp); bool lnet_is_discovery_disabled(struct lnet_peer *lp); +bool lnet_is_discovery_disabled_locked(struct lnet_peer *lp); bool lnet_peer_gw_discovery(struct lnet_peer *lp); static inline bool diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 0d33ade..a067136 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1141,6 +1141,7 @@ struct lnet_peer_ni * bool lnet_is_discovery_disabled_locked(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) { if (lnet_peer_discovery_disabled) return true; diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 7246eea..a5e4af0 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -227,7 +227,7 @@ bool lnet_is_route_alive(struct lnet_route *route) * aliveness information can only be obtained when discovery is * enabled. 
*/ - if (lnet_peer_discovery_disabled) + if (lnet_is_discovery_disabled(gw)) return route->lr_alive; /* check the gateway's interfaces on the route rnet to make sure @@ -316,11 +316,14 @@ bool lnet_is_route_alive(struct lnet_route *route) spin_lock(&lp->lp_lock); lp_state = lp->lp_state; - spin_unlock(&lp->lp_lock); /* only handle replies if discovery is disabled. */ - if (!lnet_peer_discovery_disabled) + if (!lnet_is_discovery_disabled_locked(lp)) { + spin_unlock(&lp->lp_lock); return; + } + + spin_unlock(&lp->lp_lock); if (lp_state & LNET_PEER_PING_FAILED) { CDEBUG(D_NET, From patchwork Thu Feb 27 21:16:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1C4841580 for ; Thu, 27 Feb 2020 21:46:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0419824690 for ; Thu, 27 Feb 2020 21:46:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0419824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ACEF134B252; Thu, 27 Feb 2020 13:36:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6721F348840 for ; Thu, 27 Feb 2020 
13:20:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1990F9188; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1886A47C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:03 -0500 Message-Id: <1582838290-17243-496-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 495/622] lustre: obdclass: qos penalties miscalculated X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao In lqos_calc_penalties(), the penalty_per_obj is miscalculated. 
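As a rough illustration of the corrected arithmetic, the per-target penalty can be modelled in userspace as below. The function name and the plain integer division are assumptions made for the sketch (the kernel code uses do_div()), and the reading of `>> 8` as undoing a fixed-point scale of 256 on prio_wide is likewise an assumption, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the patched per-target calculation:
 *   ((prio_wide * ba * ia) >> 8) / num_active / 2
 * penalty_per_obj() is a hypothetical name used only for this sketch.
 */
static uint64_t penalty_per_obj(uint64_t prio_wide, uint64_t ba,
				uint64_t ia, uint32_t num_active)
{
	uint64_t penalty;

	assert(num_active > 0);

	/* '*' binds tighter than '>>', so the whole product is shifted. */
	penalty = prio_wide * ba * ia >> 8;
	penalty /= num_active;	/* do_div(penalty, num_active) in the kernel */
	penalty >>= 1;

	return penalty;
}
```

Without the shift the result would be 256 times larger, which is the kind of miscalculation the commit message refers to.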
Fixes: e6dd0ec9bcd2 ("lustre: lmv: share object alloc QoS code with LMV") WC-bug-id: https://jira.whamcloud.com/browse/LU-12495 Lustre-commit: 9130d05de4e2 ("LU-12495 obdclass: qos penalties miscalculated") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/36269 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lu_qos.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lustre/obdclass/lu_qos.c b/fs/lustre/obdclass/lu_qos.c index e77e81d..13ab4a7 100644 --- a/fs/lustre/obdclass/lu_qos.c +++ b/fs/lustre/obdclass/lu_qos.c @@ -323,7 +323,7 @@ int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd, * per-tgt penalty is * prio * bavail * iavail / (num_tgt - 1) / 2 */ - tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia; + tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia >> 8; do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active); tgt->ltd_qos.ltq_penalty_per_obj >>= 1; @@ -357,7 +357,7 @@ int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd, list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { ba = svr->lsq_bavail; ia = svr->lsq_iavail; - svr->lsq_penalty_per_obj = prio_wide * ba * ia; + svr->lsq_penalty_per_obj = prio_wide * ba * ia >> 8; do_div(ba, svr->lsq_tgt_count * num_active); svr->lsq_penalty_per_obj >>= 1; From patchwork Thu Feb 27 21:16:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DA08B924 for ; Thu, 27 Feb 2020 21:43:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate 
requested) by mail.kernel.org (Postfix) with ESMTPS id C29B924690 for ; Thu, 27 Feb 2020 21:43:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C29B924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9796E348783; Thu, 27 Feb 2020 13:34:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A7A5B34884A for ; Thu, 27 Feb 2020 13:20:52 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1C5D39189; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1B4AB468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:04 -0500 Message-Id: <1582838290-17243-497-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 496/622] lustre: osc: wrong cache of LVB attrs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman osc object keeps the cache of LVB, obtained on lock enqueue, in lov_oinfo. 
This cache receives all the modifications happening on the client, whereas the original LVB in the locks does not. At the same time, this cache is lost on object destroy, which may happen on a layout change in particular. The ldlm locks are left in the LRU and can be matched by subsequent operations. The first enqueue does not match a lock in the LRU due to @kms_ignore in enqueue_base; however, if the lock is obtained on a small offset while locks on larger offsets still exist in the LRU, the obtained size is cut by the policy region when it is set as the KMS. The second enqueue can then match and add stale data to the oinfo. Thus the OSC cache is left with a small KMS. However, the partial page preparation logic checks the KMS to decide whether to read a page; since the KMS is small, the page is not read and the non-read part of the page is zeroed. With this patch, object destroy detaches the dlm locks from the osc object and offloads the current osc oinfo cache to all the locks, so that it can be reconstructed for the next osc oinfo. Introduce a per-lock flag to track the cached attribute status and drop the re-enqueue after osc object replacement. This patch also fixes the handling of KMS_IGNORE added in LU-11964. It is used only to skip the lock itself in a search; there is no other logic for it and it is not needed for DOM locks at all, since all the relevant semantics are supposed to be provided by the cbpending flag.
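The life cycle of the per-lock flag can be sketched with a small userspace model. This is a simplified illustration, not the kernel code: the struct and function names are hypothetical, locking is omitted, and only the bit position of the flag (bit 59) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FL_LVB_CACHED (1ULL << 59)	/* same bit as LDLM_FL_LVB_CACHED */

struct lvb_model {
	uint64_t size;
	uint64_t blocks;
};

struct lock_model {
	uint64_t flags;
	struct lvb_model lvb;	/* LVB carried by the dlm lock */
	void *ast_data;		/* owning object, NULL once detached */
};

struct object_model {
	struct lvb_model cached_lvb;	/* stands in for oinfo->loi_lvb */
};

/*
 * On grant or match: prime the object cache from the lock only once.
 * Returns true if the cache was updated from the lock.
 */
static bool lock_granted(struct lock_model *lock, struct object_model *obj)
{
	assert(lock && obj);

	lock->ast_data = obj;
	if (lock->flags & FL_LVB_CACHED)
		return false;	/* cache already holds newer client state */

	obj->cached_lvb = lock->lvb;
	lock->flags |= FL_LVB_CACHED;
	return true;
}

/* On object destroy: push the cached attributes back and detach. */
static void object_ast_clear(struct lock_model *lock, struct object_model *obj)
{
	assert(lock && obj);

	if (lock->ast_data != obj)
		return;

	lock->ast_data = NULL;
	lock->lvb = obj->cached_lvb;
	lock->flags &= ~FL_LVB_CACHED;
}
```

In this model a lock that outlives its object carries the latest client-side attributes with it, so the next object that matches the lock starts from current values instead of the stale LVB received at enqueue time.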
WC-bug-id: https://jira.whamcloud.com/browse/LU-12681 Lustre-commit: 8ac020df4592 ("LU-12681 osc: wrong cache of LVB attrs") Signed-off-by: Vitaly Fertman Cray-bug-id: LUS-7731 Reviewed-on: https://review.whamcloud.com/36199 Reviewed-by: Patrick Farrell Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm_flags.h | 8 ++++++ fs/lustre/llite/namei.c | 3 --- fs/lustre/mdc/mdc_dev.c | 47 ++++++++++++++++++++++-------------- fs/lustre/osc/osc_internal.h | 3 +-- fs/lustre/osc/osc_lock.c | 15 ++++++------ fs/lustre/osc/osc_object.c | 24 +++++++++++++++++- fs/lustre/osc/osc_request.c | 15 ++---------- 7 files changed, 70 insertions(+), 45 deletions(-) diff --git a/fs/lustre/include/lustre_dlm_flags.h b/fs/lustre/include/lustre_dlm_flags.h index 3d69c49..06337d3 100644 --- a/fs/lustre/include/lustre_dlm_flags.h +++ b/fs/lustre/include/lustre_dlm_flags.h @@ -399,6 +399,14 @@ #define ldlm_is_ndelay(_l) LDLM_TEST_FLAG((_l), 1ULL << 58) #define ldlm_set_ndelay(_l) LDLM_SET_FLAG((_l), 1ULL << 58) +/** + * LVB from this lock is cached in osc object + */ +#define LDLM_FL_LVB_CACHED 0x0800000000000000ULL /* bit 59 */ +#define ldlm_is_lvb_cached(_l) LDLM_TEST_FLAG((_l), 1ULL << 59) +#define ldlm_set_lvb_cached(_l) LDLM_SET_FLAG((_l), 1ULL << 59) +#define ldlm_clear_lvb_cached(_l) LDLM_CLEAR_FLAG((_l), 1ULL << 59) + /** l_flags bits marked as "ast" bits */ #define LDLM_FL_AST_MASK (LDLM_FL_FLOCK_DEADLOCK |\ LDLM_FL_DISCARD_DATA) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index de01a73..ce72910 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -276,9 +276,6 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) CDEBUG(D_INODE, "cannot flush DoM data " DFID": rc = %d\n", PFID(ll_inode2fid(inode)), rc); - lock_res_and_lock(lock); - ldlm_set_kms_ignore(lock); - unlock_res_and_lock(lock); } if (bits & MDS_INODELOCK_LAYOUT) { diff --git a/fs/lustre/mdc/mdc_dev.c 
b/fs/lustre/mdc/mdc_dev.c index b49509c..d589f97 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -312,7 +312,6 @@ static int mdc_dlm_blocking_ast0(const struct lu_env *env, dlmlock->l_ast_data = NULL; cl_object_get(obj); } - ldlm_set_kms_ignore(dlmlock); unlock_res_and_lock(dlmlock); /* if l_ast_data is NULL, the dlmlock was enqueued by AGL or @@ -432,7 +431,7 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc, } static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, - struct lustre_handle *lockh, bool lvb_update) + struct lustre_handle *lockh) { struct ldlm_lock *dlmlock; @@ -473,10 +472,11 @@ static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, descr->cld_end = CL_PAGE_EOF; /* no lvb update for matched lock */ - if (lvb_update) { + if (!ldlm_is_lvb_cached(dlmlock)) { LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY); mdc_lock_lvb_update(env, cl2osc(oscl->ols_cl.cls_obj), dlmlock, NULL); + ldlm_set_lvb_cached(dlmlock); } } unlock_res_and_lock(dlmlock); @@ -514,7 +514,7 @@ static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh, CDEBUG(D_INODE, "rc %d, err %d\n", rc, errcode); if (rc == 0) - mdc_lock_granted(env, oscl, lockh, errcode == ELDLM_OK); + mdc_lock_granted(env, oscl, lockh); /* Error handling, some errors are tolerable. */ if (oscl->ols_locklessable && rc == -EUSERS) { @@ -685,10 +685,8 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, * LVB information, e.g. canceled locks or locks of just pruned object, * such locks should be skipped. 
*/ - mode = ldlm_lock_match_with_skip(obd->obd_namespace, match_flags, - LDLM_FL_KMS_IGNORE, res_id, - einfo->ei_type, policy, mode, - &lockh, 0); + mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id, + einfo->ei_type, policy, mode, &lockh, 0); if (mode) { struct ldlm_lock *matched; @@ -696,13 +694,6 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, return ELDLM_OK; matched = ldlm_handle2lock(&lockh); - /* this shouldn't happen but this check is kept to make - * related test fail if problem occurs - */ - if (unlikely(ldlm_is_kms_ignore(matched))) { - LDLM_ERROR(matched, "matched lock has KMS ignore flag"); - goto no_match; - } if (OBD_FAIL_CHECK(OBD_FAIL_MDC_GLIMPSE_DDOS)) ldlm_set_kms_ignore(matched); @@ -717,7 +708,6 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, LDLM_LOCK_PUT(matched); return ELDLM_OK; } -no_match: ldlm_lock_decref(&lockh, mode); LDLM_LOCK_PUT(matched); } @@ -1362,9 +1352,30 @@ static int mdc_attr_get(const struct lu_env *env, struct cl_object *obj, static int mdc_object_ast_clear(struct ldlm_lock *lock, void *data) { - if (lock->l_ast_data == data) + struct osc_object *osc = (struct osc_object *)data; + struct ost_lvb *lvb = &lock->l_ost_lvb; + struct lov_oinfo *oinfo; + + if (lock->l_ast_data == data) { lock->l_ast_data = NULL; - ldlm_set_kms_ignore(lock); + + LASSERT(osc); + LASSERT(osc->oo_oinfo); + LASSERT(lvb); + + /* Updates lvb in lock by the cached oinfo */ + oinfo = osc->oo_oinfo; + cl_object_attr_lock(&osc->oo_cl); + memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); + cl_object_attr_unlock(&osc->oo_cl); + + LDLM_DEBUG(lock, + "update lvb size %llu blocks %llu [cma]time: %llu %llu %llu", + lvb->lvb_size, lvb->lvb_blocks, + lvb->lvb_ctime, lvb->lvb_mtime, lvb->lvb_atime); + + ldlm_clear_lvb_cached(lock); + } return LDLM_ITER_CONTINUE; } diff --git a/fs/lustre/osc/osc_internal.h b/fs/lustre/osc/osc_internal.h index 6f71d8d..b3b365a 100644 --- 
a/fs/lustre/osc/osc_internal.h +++ b/fs/lustre/osc/osc_internal.h @@ -54,8 +54,7 @@ int osc_lock_discard_pages(const struct lu_env *env, struct osc_object *osc, int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, u64 *flags, union ldlm_policy_data *policy, - struct ost_lvb *lvb, int kms_valid, - osc_enqueue_upcall_f upcall, + struct ost_lvb *lvb, osc_enqueue_upcall_f upcall, void *cookie, struct ldlm_enqueue_info *einfo, struct ptlrpc_request_set *rqset, int async, bool speculative); diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index dcddf17..02d3436 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -143,9 +143,6 @@ static void osc_lock_build_policy(const struct lu_env *env, * with the DLM lock reply from the server. Copy of osc_update_enqueue() * logic. * - * This can be optimized to not update attributes when lock is a result of a - * local match. - * * Called under lock and resource spin-locks. */ static void osc_lock_lvb_update(const struct lu_env *env, @@ -197,7 +194,7 @@ static void osc_lock_lvb_update(const struct lu_env *env, } static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, - struct lustre_handle *lockh, bool lvb_update) + struct lustre_handle *lockh) { struct ldlm_lock *dlmlock; @@ -240,10 +237,11 @@ static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, descr->cld_gid = ext->gid; /* no lvb update for matched lock */ - if (lvb_update) { + if (!ldlm_is_lvb_cached(dlmlock)) { LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY); osc_lock_lvb_update(env, cl2osc(oscl->ols_cl.cls_obj), dlmlock, NULL); + ldlm_set_lvb_cached(dlmlock); } LINVRNT(osc_lock_invariant(oscl)); } @@ -281,7 +279,7 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh, } if (rc == 0) - osc_lock_granted(env, oscl, lockh, errcode == ELDLM_OK); + osc_lock_granted(env, oscl, lockh); /* Error handling, some errors are tolerable. 
*/ if (oscl->ols_locklessable && rc == -EUSERS) { @@ -338,7 +336,9 @@ static int osc_lock_upcall_speculative(void *cookie, lock_res_and_lock(dlmlock); LASSERT(ldlm_is_granted(dlmlock)); - /* there is no osc_lock associated with speculative lock */ + /* there is no osc_lock associated with speculative lock + * thus no need to set LDLM_FL_LVB_CACHED + */ osc_lock_lvb_update(env, osc, dlmlock, NULL); unlock_res_and_lock(dlmlock); @@ -1022,7 +1022,6 @@ static int osc_lock_enqueue(const struct lu_env *env, } result = osc_enqueue_base(exp, resname, &oscl->ols_flags, policy, &oscl->ols_lvb, - osc->oo_oinfo->loi_kms_valid, upcall, cookie, &oscl->ols_einfo, PTLRPCD_SET, async, oscl->ols_speculative); diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c index fdee8fa..d2206e8 100644 --- a/fs/lustre/osc/osc_object.c +++ b/fs/lustre/osc/osc_object.c @@ -196,8 +196,30 @@ int osc_object_glimpse(const struct lu_env *env, static int osc_object_ast_clear(struct ldlm_lock *lock, void *data) { - if (lock->l_ast_data == data) + struct osc_object *osc = (struct osc_object *)data; + struct ost_lvb *lvb = lock->l_lvb_data; + struct lov_oinfo *oinfo; + + if (lock->l_ast_data == data) { lock->l_ast_data = NULL; + + LASSERT(osc); + LASSERT(osc->oo_oinfo); + LASSERT(lvb); + + /* Updates lvb in lock by the cached oinfo */ + oinfo = osc->oo_oinfo; + cl_object_attr_lock(&osc->oo_cl); + memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); + cl_object_attr_unlock(&osc->oo_cl); + + LDLM_DEBUG(lock, + "update lvb size %llu blocks %llu [cma]time: %llu %llu %llu", + lvb->lvb_size, lvb->lvb_blocks, + lvb->lvb_ctime, lvb->lvb_mtime, lvb->lvb_atime); + + ldlm_clear_lvb_cached(lock); + } return LDLM_ITER_CONTINUE; } diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 7ba9ea5..0e32496 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -2496,9 +2496,8 @@ int osc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req, */ 
int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, u64 *flags, union ldlm_policy_data *policy, - struct ost_lvb *lvb, int kms_valid, - osc_enqueue_upcall_f upcall, void *cookie, - struct ldlm_enqueue_info *einfo, + struct ost_lvb *lvb, osc_enqueue_upcall_f upcall, + void *cookie, struct ldlm_enqueue_info *einfo, struct ptlrpc_request_set *rqset, int async, bool speculative) { @@ -2516,15 +2515,6 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, policy->l_extent.start -= policy->l_extent.start & ~PAGE_MASK; policy->l_extent.end |= ~PAGE_MASK; - /* - * kms is not valid when either object is completely fresh (so that no - * locks are cached), or object was evicted. In the latter case cached - * lock cannot be used, because it would prime inode state with - * potentially stale LVB. - */ - if (!kms_valid) - goto no_match; - /* Next, search for already existing extent locks that will cover us */ /* If we're trying to read, we also search for an existing PW lock. 
The * VFS and page cache already protect us locally, so lots of readers/ @@ -2589,7 +2579,6 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, LDLM_LOCK_PUT(matched); } -no_match: if (*flags & (LDLM_FL_TEST_LOCK | LDLM_FL_MATCH_LOCK)) return -ENOLCK; if (intent) { From patchwork Thu Feb 27 21:16:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B02F192A for ; Thu, 27 Feb 2020 21:39:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 98E6A24690 for ; Thu, 27 Feb 2020 21:39:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 98E6A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A1FFC34A63F; Thu, 27 Feb 2020 13:32:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0A65D348851 for ; Thu, 27 Feb 2020 13:20:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1F628918A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1E2B946A; Thu, 27 Feb 2020 16:18:19 -0500 
(EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:05 -0500 Message-Id: <1582838290-17243-498-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 497/622] lustre: osc: wrong cache of LVB attrs, part2 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman It may happen that the osc oinfo lvb cache has size < kms. This occurs if replies are re-ordered and an older size is applied to the oinfo unconditionally. Another possibility is RA, when osc_match_base() attaches the dlm lock to the osc object but does not cache the lvb. The next layout change then overwrites the lock lvb with the oinfo cache (the previous LUS-7731 fix), presumably with smaller values. Therefore, the next lock re-use may run into a problem with a partial page write that assumes the preliminary read is not needed. Do not let the cached oinfo lvb size become less than kms. Also, cache the lock's lvb in the oinfo on osc_match_base().
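The size/kms invariant described above can be illustrated with a minimal userspace model. The names `attr_model` and `lvb_update` are hypothetical, locking and the other attributes are left out, and the sketch assumes the KMS candidate is the reply size cut at the end of the lock extent, as in the OSC variant of the update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct attr_model {
	uint64_t size;	/* cached object size */
	uint64_t kms;	/* known minimum size */
};

/*
 * Apply a size from a lock reply to the cached attributes.
 * Returns true if the KMS was raised by this update.
 */
static bool lvb_update(struct attr_model *cache, uint64_t lvb_size,
		       uint64_t extent_end)
{
	uint64_t old_kms;
	uint64_t kms_candidate = lvb_size;
	uint64_t new_size = lvb_size;
	bool setkms = false;

	assert(cache);
	old_kms = cache->kms;

	/* The lock only vouches for data up to the end of its extent. */
	if (kms_candidate > extent_end)
		kms_candidate = extent_end + 1;

	if (kms_candidate >= old_kms) {
		cache->kms = kms_candidate;
		setkms = true;
	}

	/* The size should not be less than the kms. */
	if (new_size < old_kms)
		new_size = old_kms;

	cache->size = new_size;
	return setkms;
}
```

With this clamp, a late reply carrying an older, smaller size leaves the KMS untouched and cannot pull the cached size below it.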
WC-bug-id: https://jira.whamcloud.com/browse/LU-12681 Lustre-commit: 40319db5bc64 ("LU-12681 osc: wrong cache of LVB attrs, part2") Signed-off-by: Vitaly Fertman Cray-bug-id: LUS-7731 Reviewed-on: https://review.whamcloud.com/36200 Reviewed-by: Patrick Farrell Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_dev.c | 72 +++++++++++++++++++++++++++----------------- fs/lustre/osc/osc_internal.h | 12 ++++++-- fs/lustre/osc/osc_lock.c | 40 +++++++++++++----------- fs/lustre/osc/osc_object.c | 16 ++++++---- fs/lustre/osc/osc_request.c | 19 +++++++++--- 5 files changed, 100 insertions(+), 59 deletions(-) diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index d589f97..312e527 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -69,21 +69,17 @@ static void mdc_lock_lvb_update(const struct lu_env *env, struct ldlm_lock *dlmlock, struct ost_lvb *lvb); -static int mdc_set_dom_lock_data(const struct lu_env *env, - struct ldlm_lock *lock, void *data) +static int mdc_set_dom_lock_data(struct ldlm_lock *lock, void *data) { - struct osc_object *obj = data; int set = 0; LASSERT(lock); LASSERT(lock->l_glimpse_ast == mdc_ldlm_glimpse_ast); lock_res_and_lock(lock); - if (!lock->l_ast_data) { - lock->l_ast_data = data; - mdc_lock_lvb_update(env, obj, lock, NULL); - } + if (!lock->l_ast_data) + lock->l_ast_data = data; if (lock->l_ast_data == data) set = 1; @@ -93,9 +89,9 @@ static int mdc_set_dom_lock_data(const struct lu_env *env, } int mdc_dom_lock_match(const struct lu_env *env, struct obd_export *exp, - struct ldlm_res_id *res_id, - enum ldlm_type type, union ldlm_policy_data *policy, - enum ldlm_mode mode, u64 *flags, void *data, + struct ldlm_res_id *res_id, enum ldlm_type type, + union ldlm_policy_data *policy, enum ldlm_mode mode, + u64 *flags, struct osc_object *obj, struct lustre_handle *lockh, int unref) { struct obd_device *obd = exp->exp_obd; @@ -107,11 +103,19 @@ int 
mdc_dom_lock_match(const struct lu_env *env, struct obd_export *exp, if (rc == 0 || lflags & LDLM_FL_TEST_LOCK) return rc; - if (data) { + if (obj) { struct ldlm_lock *lock = ldlm_handle2lock(lockh); LASSERT(lock); - if (!mdc_set_dom_lock_data(env, lock, data)) { + if (mdc_set_dom_lock_data(lock, obj)) { + lock_res_and_lock(lock); + if (!ldlm_is_lvb_cached(lock)) { + LASSERT(lock->l_ast_data == obj); + mdc_lock_lvb_update(env, obj, lock, NULL); + ldlm_set_lvb_cached(lock); + } + unlock_res_and_lock(lock); + } else { ldlm_lock_decref(lockh, rc); rc = 0; } @@ -400,6 +404,7 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc, struct cl_attr *attr = &osc_env_info(env)->oti_attr; unsigned int valid = CAT_BLOCKS | CAT_ATIME | CAT_CTIME | CAT_MTIME | CAT_SIZE; + unsigned int setkms = 0; if (!lvb) { LASSERT(dlmlock); @@ -415,17 +420,23 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc, size = lvb->lvb_size; if (size >= oinfo->loi_kms) { - LDLM_DEBUG(dlmlock, - "lock acquired, setting rss=%llu, kms=%llu", - lvb->lvb_size, size); valid |= CAT_KMS; attr->cat_kms = size; - } else { - LDLM_DEBUG(dlmlock, - "lock acquired, setting rss=%llu, leaving kms=%llu", - lvb->lvb_size, oinfo->loi_kms); + setkms = 1; } } + + /* The size should not be less than the kms */ + if (attr->cat_size < oinfo->loi_kms) + attr->cat_size = oinfo->loi_kms; + + LDLM_DEBUG(dlmlock, + "acquired size %llu, setting rss=%llu;%s kms=%llu, end=%llu", + lvb->lvb_size, attr->cat_size, + setkms ? "" : " leaving", + setkms ? attr->cat_kms : oinfo->loi_kms, + dlmlock ? 
dlmlock->l_policy_data.l_extent.end : -1ull); + cl_object_attr_update(env, obj, attr, valid); cl_object_attr_unlock(obj); } @@ -433,6 +444,7 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc, static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, struct lustre_handle *lockh) { + struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj); struct ldlm_lock *dlmlock; dlmlock = ldlm_handle2lock_long(lockh, 0); @@ -474,8 +486,8 @@ static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, /* no lvb update for matched lock */ if (!ldlm_is_lvb_cached(dlmlock)) { LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY); - mdc_lock_lvb_update(env, cl2osc(oscl->ols_cl.cls_obj), - dlmlock, NULL); + LASSERT(osc == dlmlock->l_ast_data); + mdc_lock_lvb_update(env, osc, dlmlock, NULL); ldlm_set_lvb_cached(dlmlock); } } @@ -698,7 +710,7 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, if (OBD_FAIL_CHECK(OBD_FAIL_MDC_GLIMPSE_DDOS)) ldlm_set_kms_ignore(matched); - if (mdc_set_dom_lock_data(env, matched, einfo->ei_cbdata)) { + if (mdc_set_dom_lock_data(matched, einfo->ei_cbdata)) { *flags |= LDLM_FL_LVB_READY; /* We already have a lock, and it's referenced. 
*/ @@ -1365,15 +1377,19 @@ static int mdc_object_ast_clear(struct ldlm_lock *lock, void *data) /* Updates lvb in lock by the cached oinfo */ oinfo = osc->oo_oinfo; - cl_object_attr_lock(&osc->oo_cl); - memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); - cl_object_attr_unlock(&osc->oo_cl); LDLM_DEBUG(lock, - "update lvb size %llu blocks %llu [cma]time: %llu %llu %llu", - lvb->lvb_size, lvb->lvb_blocks, - lvb->lvb_ctime, lvb->lvb_mtime, lvb->lvb_atime); + "update lock size %llu blocks %llu [cma]time: %llu %llu %llu by oinfo size %llu blocks %llu [cma]time %llu %llu %llu", + lvb->lvb_size, + lvb->lvb_blocks, lvb->lvb_ctime, lvb->lvb_mtime, + lvb->lvb_atime, oinfo->loi_lvb.lvb_size, + oinfo->loi_lvb.lvb_blocks, oinfo->loi_lvb.lvb_ctime, + oinfo->loi_lvb.lvb_mtime, oinfo->loi_lvb.lvb_atime); + LASSERT(oinfo->loi_lvb.lvb_size >= oinfo->loi_kms); + cl_object_attr_lock(&osc->oo_cl); + memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); + cl_object_attr_unlock(&osc->oo_cl); ldlm_clear_lvb_cached(lock); } return LDLM_ITER_CONTINUE; diff --git a/fs/lustre/osc/osc_internal.h b/fs/lustre/osc/osc_internal.h index b3b365a..492c60d 100644 --- a/fs/lustre/osc/osc_internal.h +++ b/fs/lustre/osc/osc_internal.h @@ -52,6 +52,11 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext, int osc_lock_discard_pages(const struct lu_env *env, struct osc_object *osc, pgoff_t start, pgoff_t end, bool discard); +void osc_lock_lvb_update(const struct lu_env *env, + struct osc_object *osc, + struct ldlm_lock *dlmlock, + struct ost_lvb *lvb); + int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, u64 *flags, union ldlm_policy_data *policy, struct ost_lvb *lvb, osc_enqueue_upcall_f upcall, @@ -59,9 +64,10 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, struct ptlrpc_request_set *rqset, int async, bool speculative); -int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id, - enum ldlm_type type, union 
ldlm_policy_data *policy, - enum ldlm_mode mode, u64 *flags, void *data, +int osc_match_base(const struct lu_env *env, struct obd_export *exp, + struct ldlm_res_id *res_id, enum ldlm_type type, + union ldlm_policy_data *policy, enum ldlm_mode mode, + u64 *flags, struct osc_object *obj, struct lustre_handle *lockh, int unref); int osc_setattr_async(struct obd_export *exp, struct obdo *oa, diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index 02d3436..ce592d7 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -145,15 +145,16 @@ static void osc_lock_build_policy(const struct lu_env *env, * * Called under lock and resource spin-locks. */ -static void osc_lock_lvb_update(const struct lu_env *env, - struct osc_object *osc, - struct ldlm_lock *dlmlock, - struct ost_lvb *lvb) +void osc_lock_lvb_update(const struct lu_env *env, + struct osc_object *osc, + struct ldlm_lock *dlmlock, + struct ost_lvb *lvb) { struct cl_object *obj = osc2cl(osc); struct lov_oinfo *oinfo = osc->oo_oinfo; struct cl_attr *attr = &osc_env_info(env)->oti_attr; unsigned int valid; + unsigned int setkms = 0; valid = CAT_BLOCKS | CAT_ATIME | CAT_CTIME | CAT_MTIME | CAT_SIZE; if (!lvb) @@ -175,20 +176,24 @@ static void osc_lock_lvb_update(const struct lu_env *env, if (size > dlmlock->l_policy_data.l_extent.end) size = dlmlock->l_policy_data.l_extent.end + 1; if (size >= oinfo->loi_kms) { - LDLM_DEBUG(dlmlock, - "lock acquired, setting rss=%llu, kms=%llu", - lvb->lvb_size, size); valid |= CAT_KMS; attr->cat_kms = size; - } else { - LDLM_DEBUG(dlmlock, - "lock acquired, setting rss=%llu; leaving kms=%llu, end=%llu", - lvb->lvb_size, oinfo->loi_kms, - dlmlock->l_policy_data.l_extent.end); + setkms = 1; } ldlm_lock_allow_match_locked(dlmlock); } + /* The size should not be less than the kms */ + if (attr->cat_size < oinfo->loi_kms) + attr->cat_size = oinfo->loi_kms; + + LDLM_DEBUG(dlmlock, + "acquired size %llu, setting rss=%llu;%s kms=%llu, end=%llu", + lvb->lvb_size, 
attr->cat_size, + setkms ? "" : " leaving", + setkms ? attr->cat_kms : oinfo->loi_kms, + dlmlock ? dlmlock->l_policy_data.l_extent.end : -1ull); + cl_object_attr_update(env, obj, attr, valid); cl_object_attr_unlock(obj); } @@ -196,6 +201,7 @@ static void osc_lock_lvb_update(const struct lu_env *env, static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, struct lustre_handle *lockh) { + struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj); struct ldlm_lock *dlmlock; dlmlock = ldlm_handle2lock_long(lockh, 0); @@ -239,8 +245,8 @@ static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, /* no lvb update for matched lock */ if (!ldlm_is_lvb_cached(dlmlock)) { LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY); - osc_lock_lvb_update(env, cl2osc(oscl->ols_cl.cls_obj), - dlmlock, NULL); + LASSERT(osc == dlmlock->l_ast_data); + osc_lock_lvb_update(env, osc, dlmlock, NULL); ldlm_set_lvb_cached(dlmlock); } LINVRNT(osc_lock_invariant(oscl)); @@ -1271,9 +1277,9 @@ struct ldlm_lock *osc_obj_dlmlock_at_pgoff(const struct lu_env *env, * with a uniq gid and it conflicts with all other lock modes too */ again: - mode = osc_match_base(osc_export(obj), resname, LDLM_EXTENT, policy, - LCK_PR | LCK_PW | LCK_GROUP, &flags, obj, &lockh, - dap_flags & OSC_DAP_FL_CANCELING); + mode = osc_match_base(env, osc_export(obj), resname, LDLM_EXTENT, + policy, LCK_PR | LCK_PW | LCK_GROUP, &flags, + obj, &lockh, dap_flags & OSC_DAP_FL_CANCELING); if (mode != 0) { lock = ldlm_handle2lock(&lockh); /* RACE: the lock is cancelled so let's try again */ diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c index d2206e8..6d24cd3 100644 --- a/fs/lustre/osc/osc_object.c +++ b/fs/lustre/osc/osc_object.c @@ -209,15 +209,19 @@ static int osc_object_ast_clear(struct ldlm_lock *lock, void *data) /* Updates lvb in lock by the cached oinfo */ oinfo = osc->oo_oinfo; - cl_object_attr_lock(&osc->oo_cl); - memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); - 
cl_object_attr_unlock(&osc->oo_cl); LDLM_DEBUG(lock, - "update lvb size %llu blocks %llu [cma]time: %llu %llu %llu", - lvb->lvb_size, lvb->lvb_blocks, - lvb->lvb_ctime, lvb->lvb_mtime, lvb->lvb_atime); + "update lock size %llu blocks %llu [cma]time: %llu %llu %llu by oinfo size %llu blocks %llu [cma]time %llu %llu %llu", + lvb->lvb_size, + lvb->lvb_blocks, lvb->lvb_ctime, lvb->lvb_mtime, + lvb->lvb_atime, oinfo->loi_lvb.lvb_size, + oinfo->loi_lvb.lvb_blocks, oinfo->loi_lvb.lvb_ctime, + oinfo->loi_lvb.lvb_mtime, oinfo->loi_lvb.lvb_atime); + LASSERT(oinfo->loi_lvb.lvb_size >= oinfo->loi_kms); + cl_object_attr_lock(&osc->oo_cl); + memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); + cl_object_attr_unlock(&osc->oo_cl); ldlm_clear_lvb_cached(lock); } return LDLM_ITER_CONTINUE; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 0e32496..95e09ce 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -2643,9 +2643,10 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, return rc; } -int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id, - enum ldlm_type type, union ldlm_policy_data *policy, - enum ldlm_mode mode, u64 *flags, void *data, +int osc_match_base(const struct lu_env *env, struct obd_export *exp, + struct ldlm_res_id *res_id, enum ldlm_type type, + union ldlm_policy_data *policy, enum ldlm_mode mode, + u64 *flags, struct osc_object *obj, struct lustre_handle *lockh, int unref) { struct obd_device *obd = exp->exp_obd; @@ -2674,11 +2675,19 @@ int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id, if (!rc || lflags & LDLM_FL_TEST_LOCK) return rc; - if (data) { + if (obj) { struct ldlm_lock *lock = ldlm_handle2lock(lockh); LASSERT(lock); - if (!osc_set_lock_data(lock, data)) { + if (osc_set_lock_data(lock, obj)) { + lock_res_and_lock(lock); + if (!ldlm_is_lvb_cached(lock)) { + LASSERT(lock->l_ast_data == obj); + osc_lock_lvb_update(env, obj, lock, NULL); + 
ldlm_set_lvb_cached(lock); + } + unlock_res_and_lock(lock); + } else { ldlm_lock_decref(lockh, rc); rc = 0; } From patchwork Thu Feb 27 21:16:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410893 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DBBDF924 for ; Thu, 27 Feb 2020 21:50:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C3CE424690 for ; Thu, 27 Feb 2020 21:50:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C3CE424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3DD51349C15; Thu, 27 Feb 2020 13:41:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 60C5F348851 for ; Thu, 27 Feb 2020 13:20:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 22186918B; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2102846C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:06 -0500 Message-Id: <1582838290-17243-499-git-send-email-jsimmons@infradead.org> 
X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 498/622] lustre: vvp: dirty pages with pagevec X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When doing i/o from multiple writers to a single file, the per-file page cache lock (the mapping lock) becomes a bottleneck. Most current uses are single page at a time. This converts one prominent use, marking page as dirty, to use a pagevec. When many threads are writing to one file, this improves write performance by around 25%. This requires implementing our own version of the set_page_dirty-->__set_page_dirty_nobuffers functions. This was modeled on upstream tip of tree: v5.2-rc4-224-ge01e060fe0 (7/13/2019) The relevant code is unchanged since Linux 4.17, and has changed only minimally since before Linux 2.6. 
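The batching idea described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel or Lustre code: `struct store`, `struct batch`, `BATCH_SIZE` and all function names are hypothetical, and a counter stands in for the per-file mapping lock so that the number of lock acquisitions can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 15	/* illustrative batch capacity (hypothetical) */

struct item {
	int dirty;
};

/* Stand-in for the per-file mapping; "locked" models the mapping lock. */
struct store {
	int locked;
	unsigned long lock_takes;	/* how often the lock was acquired */
	unsigned long dirtied;		/* items newly marked dirty */
};

struct batch {
	size_t nr;
	struct item *items[BATCH_SIZE];
};

static void store_lock(struct store *s)
{
	assert(!s->locked);
	s->locked = 1;
	s->lock_takes++;
}

static void store_unlock(struct store *s)
{
	assert(s->locked);
	s->locked = 0;
}

/* Add one item; returns the number of free slots left (0 means full). */
static size_t batch_add(struct batch *b, struct item *it)
{
	assert(b->nr < BATCH_SIZE);
	b->items[b->nr++] = it;
	return BATCH_SIZE - b->nr;
}

/* Mark every item in the batch dirty under a single lock acquisition. */
static void batch_mark_dirty(struct store *s, struct batch *b)
{
	size_t i;

	store_lock(s);
	for (i = 0; i < b->nr; i++) {
		if (b->items[i]->dirty)
			continue;	/* already dirty, nothing to account */
		b->items[i]->dirty = 1;
		s->dirtied++;
	}
	store_unlock(s);
	b->nr = 0;			/* reinitialise for the next batch */
}

/* Dirty n items, flushing whenever the batch fills, then the remainder. */
static void mark_all_dirty(struct store *s, struct item *items, size_t n)
{
	struct batch b = { 0 };
	size_t i;

	for (i = 0; i < n; i++)
		if (batch_add(&b, &items[i]) == 0)
			batch_mark_dirty(s, &b);
	if (b.nr != 0)
		batch_mark_dirty(s, &b);
}
```

With 40 items and a batch capacity of 15, the sketch takes the lock three times instead of forty, which is the contention saving the commit message is after.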
WC-bug-id: https://jira.whamcloud.com/browse/LU-9920 Lustre-commit: a7299cb012f8 ("LU-9920 vvp: dirty pages with pagevec") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/28711 Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 2 +- fs/lustre/include/lustre_osc.h | 6 +-- fs/lustre/llite/llite_lib.c | 5 +- fs/lustre/llite/vvp_io.c | 102 +++++++++++++++++++++++++++++++++++----- fs/lustre/mdc/mdc_request.c | 7 +-- fs/lustre/obdecho/echo_client.c | 11 ++++- fs/lustre/osc/osc_cache.c | 13 ++++- fs/lustre/osc/osc_io.c | 23 +++++++-- fs/lustre/osc/osc_page.c | 7 ++- mm/page-writeback.c | 1 + 10 files changed, 144 insertions(+), 33 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 4c68d7b..75ece62 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -1458,7 +1458,7 @@ struct cl_io_slice { }; typedef void (*cl_commit_cbt)(const struct lu_env *, struct cl_io *, - struct cl_page *); + struct pagevec *); struct cl_read_ahead { /* diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index de7ccd6..2cd23f2 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -584,9 +584,9 @@ int osc_set_async_flags(struct osc_object *obj, struct osc_page *opg, int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops, struct page *page, loff_t offset); int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, - struct osc_page *ops); -int osc_page_cache_add(const struct lu_env *env, - const struct cl_page_slice *slice, struct cl_io *io); + struct osc_page *ops, cl_commit_cbt cb); +int osc_page_cache_add(const struct lu_env *env, struct osc_page *opg, + struct cl_io *io, cl_commit_cbt cb); int osc_teardown_async_page(const struct lu_env *env, struct osc_object *obj, struct osc_page *ops); int 
osc_flush_async_page(const struct lu_env *env, struct cl_io *io, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index ad7c2e2..5d74f30 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2149,6 +2149,7 @@ void ll_delete_inode(struct inode *inode) struct ll_inode_info *lli = ll_i2info(inode); struct address_space *mapping = &inode->i_data; unsigned long nrpages; + unsigned long flags; if (S_ISREG(inode->i_mode) && lli->lli_clob) { /* It is last chance to write out dirty pages, @@ -2172,9 +2173,9 @@ void ll_delete_inode(struct inode *inode) */ nrpages = mapping->nrpages; if (nrpages) { - xa_lock_irq(&mapping->i_pages); + xa_lock_irqsave(&mapping->i_pages, flags); nrpages = mapping->nrpages; - xa_unlock_irq(&mapping->i_pages); + xa_unlock_irqrestore(&mapping->i_pages, flags); } /* Workaround end */ LASSERTF(nrpages == 0, diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index d0d8b1f..aa8f2e1 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -39,7 +39,8 @@ #define DEBUG_SUBSYSTEM S_LLITE #include - +#include +#include #include "llite_internal.h" #include "vvp_internal.h" @@ -860,19 +861,98 @@ static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io, return bytes > 0 ? bytes : rc; } +/* Taken from kernel set_page_dirty, __set_page_dirty_nobuffers + * Last change to this area: b93b016313b3ba8003c3b8bb71f569af91f19fc7 + * + * Current with Linus tip of tree (7/13/2019): + * v5.2-rc4-224-ge01e060fe0 + * + */ +void vvp_set_pagevec_dirty(struct pagevec *pvec) +{ + struct page *page = pvec->pages[0]; + struct address_space *mapping = page->mapping; + unsigned long flags; + int count = pagevec_count(pvec); + int dirtied = 0; + int i = 0; + + /* From set_page_dirty */ + for (i = 0; i < count; i++) + ClearPageReclaim(pvec->pages[i]); + + LASSERTF(page->mapping, + "mapping must be set. 
page %p, page->private (cl_page) %p", + page, (void *) page->private); + + /* Rest of code derived from __set_page_dirty_nobuffers */ + xa_lock_irqsave(&mapping->i_pages, flags); + + /* Notes on differences with __set_page_dirty_nobuffers: + * 1. We don't need to call page_mapping because we know this is a page + * cache page. + * 2. We have the pages locked, so there is no need for the careful + * mapping/mapping2 dance. + * 3. No mapping is impossible. (Race w/truncate mentioned in + * dirty_nobuffers should be impossible because we hold the page lock.) + * 4. All mappings are the same because i/o is only to one file. + * 5. We invert the lock order on lock_page_memcg(page) and the mapping + * xa_lock, but this is the only function that should use that pair of + * locks and it can't race because Lustre locks pages throughout i/o. + */ + for (i = 0; i < count; i++) { + page = pvec->pages[i]; + lock_page_memcg(page); + if (TestSetPageDirty(page)) { + unlock_page_memcg(page); + continue; + } + LASSERTF(page->mapping == mapping, + "all pages must have the same mapping. 
page %p, mapping %p, first mapping %p\n", + page, page->mapping, mapping); + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); + account_page_dirtied(page, mapping); + __xa_set_mark(&mapping->i_pages, page_index(page), + PAGECACHE_TAG_DIRTY); + dirtied++; + unlock_page_memcg(page); + } + xa_unlock_irqrestore(&mapping->i_pages, flags); + + CDEBUG(D_VFSTRACE, "mapping %p, count %d, dirtied %d\n", mapping, + count, dirtied); + + if (mapping->host && dirtied) { + /* !PageAnon && !swapper_space */ + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + } +} + static void write_commit_callback(const struct lu_env *env, struct cl_io *io, - struct cl_page *page) + struct pagevec *pvec) { - struct page *vmpage = page->cp_vmpage; + struct cl_page *page; + struct page *vmpage; + int count = 0; + int i = 0; - SetPageUptodate(vmpage); - set_page_dirty(vmpage); + count = pagevec_count(pvec); + LASSERT(count > 0); - cl_page_disown(env, io, page); + for (i = 0; i < count; i++) { + vmpage = pvec->pages[i]; + SetPageUptodate(vmpage); + } + + vvp_set_pagevec_dirty(pvec); - /* held in ll_cl_init() */ - lu_ref_del(&page->cp_reference, "cl_io", cl_io_top(io)); - cl_page_put(env, page); + for (i = 0; i < count; i++) { + vmpage = pvec->pages[i]; + page = (struct cl_page *) vmpage->private; + cl_page_disown(env, io, page); + lu_ref_del(&page->cp_reference, "cl_io", cl_io_top(io)); + cl_page_put(env, page); + } } /* make sure the page list is contiguous */ @@ -1128,9 +1208,9 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio) } static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io, - struct cl_page *page) + struct pagevec *pvec) { - set_page_dirty(page->cp_vmpage); + vvp_set_pagevec_dirty(pvec); } static int vvp_io_fault_start(const struct lu_env *env, diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 34cf177..287013f 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1138,16 +1138,17 @@ static 
struct page *mdc_page_locate(struct address_space *mapping, u64 *hash, */ unsigned long offset = hash_x_index(*hash, hash64); struct page *page; + unsigned long flags; int found; - xa_lock_irq(&mapping->i_pages); + xa_lock_irqsave(&mapping->i_pages, flags); found = radix_tree_gang_lookup(&mapping->i_pages, (void **)&page, offset, 1); if (found > 0 && !xa_is_value(page)) { struct lu_dirpage *dp; get_page(page); - xa_unlock_irq(&mapping->i_pages); + xa_unlock_irqrestore(&mapping->i_pages, flags); /* * In contrast to find_lock_page() we are sure that directory * page cannot be truncated (while DLM lock is held) and, @@ -1197,7 +1198,7 @@ static struct page *mdc_page_locate(struct address_space *mapping, u64 *hash, page = ERR_PTR(-EIO); } } else { - xa_unlock_irq(&mapping->i_pages); + xa_unlock_irqrestore(&mapping->i_pages, flags); page = NULL; } return page; diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 172fe11..8e04636 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -998,16 +998,23 @@ static int __cl_echo_cancel(struct lu_env *env, struct echo_device *ed, } static void echo_commit_callback(const struct lu_env *env, struct cl_io *io, - struct cl_page *page) + struct pagevec *pvec) { struct echo_thread_info *info; struct cl_2queue *queue; + int i = 0; info = echo_env_info(env); LASSERT(io == &info->eti_io); queue = &info->eti_queue; - cl_page_list_add(&queue->c2_qout, page); + + for (i = 0; i < pagevec_count(pvec); i++) { + struct page *vmpage = pvec->pages[i]; + struct cl_page *page = (struct cl_page *)vmpage->private; + + cl_page_list_add(&queue->c2_qout, page); + } } static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset, diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 3d47c02..dde03bd 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2303,13 +2303,14 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops, 
EXPORT_SYMBOL(osc_prep_async_page); int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, - struct osc_page *ops) + struct osc_page *ops, cl_commit_cbt cb) { struct osc_io *oio = osc_env_io(env); struct osc_extent *ext = NULL; struct osc_async_page *oap = &ops->ops_oap; struct client_obd *cli = oap->oap_cli; struct osc_object *osc = oap->oap_obj; + struct pagevec *pvec = &osc_env_info(env)->oti_pagevec; pgoff_t index; unsigned int grants = 0, tmp; int brw_flags = OBD_BRW_ASYNC; @@ -2431,7 +2432,15 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, rc = 0; if (grants == 0) { - /* we haven't allocated grant for this page. */ + /* We haven't allocated grant for this page, and we + * must not hold a page lock while we do enter_cache, + * so we must mark dirty & unlock any pages in the + * write commit pagevec. + */ + if (pagevec_count(pvec)) { + cb(env, io, pvec); + pagevec_reinit(pvec); + } rc = osc_enter_cache(env, cli, oap, tmp); if (rc == 0) grants = tmp; diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 8e299d4..f340266 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -40,6 +40,7 @@ #include #include +#include #include "osc_internal.h" @@ -288,6 +289,7 @@ int osc_io_commit_async(const struct lu_env *env, struct cl_page *page; struct cl_page *last_page; struct osc_page *opg; + struct pagevec *pvec = &osc_env_info(env)->oti_pagevec; int result = 0; LASSERT(qin->pl_nr > 0); @@ -306,6 +308,8 @@ int osc_io_commit_async(const struct lu_env *env, } } + pagevec_init(pvec); + while (qin->pl_nr > 0) { struct osc_async_page *oap; @@ -325,7 +329,7 @@ int osc_io_commit_async(const struct lu_env *env, /* The page may be already in dirty cache. 
*/ if (list_empty(&oap->oap_pending_item)) { - result = osc_page_cache_add(env, &opg->ops_cl, io); + result = osc_page_cache_add(env, opg, io, cb); if (result != 0) break; } @@ -335,12 +339,21 @@ int osc_io_commit_async(const struct lu_env *env, cl_page_list_del(env, qin, page); - (*cb)(env, io, page); - /* Can't access page any more. Page can be in transfer and - * complete at any time. - */ + /* if there are no more slots, do the callback & reinit */ + if (pagevec_add(pvec, page->cp_vmpage) == 0) { + (*cb)(env, io, pvec); + pagevec_reinit(pvec); + } } + /* Clean up any partially full pagevecs */ + if (pagevec_count(pvec) != 0) + (*cb)(env, io, pvec); + + /* Can't access these pages any more. Page can be in transfer and + * complete at any time. + */ + /* for sync write, kernel will wait for this page to be flushed before * osc_io_end() is called, so release it earlier. * for mkwrite(), it's known there is no further pages. diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c index 0910f3a..6685968 100644 --- a/fs/lustre/osc/osc_page.c +++ b/fs/lustre/osc/osc_page.c @@ -92,14 +92,13 @@ static void osc_page_transfer_add(const struct lu_env *env, osc_lru_use(osc_cli(obj), opg); } -int osc_page_cache_add(const struct lu_env *env, - const struct cl_page_slice *slice, struct cl_io *io) +int osc_page_cache_add(const struct lu_env *env, struct osc_page *opg, + struct cl_io *io, cl_commit_cbt cb) { - struct osc_page *opg = cl2osc_page(slice); int result; osc_page_transfer_get(opg, "transfer\0cache"); - result = osc_queue_async_io(env, io, opg); + result = osc_queue_async_io(env, io, opg, cb); if (result != 0) osc_page_transfer_put(env, opg); else diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 50055d2..3b5a43d 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2433,6 +2433,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping) mem_cgroup_track_foreign_dirty(page, wb); } } +EXPORT_SYMBOL(account_page_dirtied); 
/* * Helper function for deaccounting dirty page without writeback. From patchwork Thu Feb 27 21:16:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410759 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A5652924 for ; Thu, 27 Feb 2020 21:46:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8DC93246A2 for ; Thu, 27 Feb 2020 21:46:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DC93246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0C38334B17F; Thu, 27 Feb 2020 13:36:40 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B6622348851 for ; Thu, 27 Feb 2020 13:20:53 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2551A918C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 23BE046D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:07 -0500 Message-Id: <1582838290-17243-500-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 499/622] lustre: ptlrpc: resend may corrupt the data X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"
From: Andriy Skulysh

A late resend, if it arrives much later than another modification RPC that has already been handled on this slot, may still be applied and therefore override the last one.

Send RPCs from the client in increasing XID order for each tag, and check this on the server to detect a late resend.

A slot can be reused by a client after a kill while the server continues to rely on it. Add a flag for such obsolete requests; here we trust the client and perform an XID check for all in-progress requests.
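As a rough illustration of the per-tag ordering check described above, the following is a minimal sketch and not the Lustre server logic. The table layout, `MAX_TAGS`, the verdict names and `slot_check()` are all hypothetical; the only assumption taken from the message is that XIDs increase for each tag, so a smaller XID on the same tag must be a late resend.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TAGS 8	/* hypothetical number of modify RPC slots */

/* Highest XID already handled on each tag (slot). */
struct slot_table {
	uint64_t last_xid[MAX_TAGS];
};

enum slot_verdict {
	SLOT_ACCEPT,		/* newer than anything seen on this tag */
	SLOT_RESEND_OF_LAST,	/* same XID as the last handled request */
	SLOT_STALE		/* older XID: a late resend, do not apply */
};

static enum slot_verdict slot_check(struct slot_table *t, unsigned int tag,
				    uint64_t xid)
{
	assert(tag < MAX_TAGS);

	if (xid > t->last_xid[tag]) {
		t->last_xid[tag] = xid;
		return SLOT_ACCEPT;
	}
	if (xid == t->last_xid[tag])
		return SLOT_RESEND_OF_LAST;
	return SLOT_STALE;
}
```

The sketch keeps only one XID per tag, so it cannot tell a reused slot from a misbehaving client; that case is what the extra flag for obsolete requests is meant to cover.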
Cray-bug-id: LUS-6272, LUS-7277, LUS-7339 WC-bug-id: https://jira.whamcloud.com/browse/LU-11444 Lustre-commit: 23773b32bfe1 ("LU-11444 ptlrpc: resend may corrupt the data") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/35114 Reviewed-by: Vitaly Fertman Reviewed-by: Andrew Perepechko Reviewed-by: Alexandr Boyko Reviewed-by: Mike Pershin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_mdc.h | 1 + fs/lustre/include/lustre_net.h | 1 + fs/lustre/llite/llite_lib.c | 4 +++- fs/lustre/obdclass/genops.c | 6 ++++++ fs/lustre/ptlrpc/client.c | 10 ++++++++++ fs/lustre/ptlrpc/service.c | 11 ++++++++--- 6 files changed, 29 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/lustre_mdc.h b/fs/lustre/include/lustre_mdc.h index aecb6ee..f57783d 100644 --- a/fs/lustre/include/lustre_mdc.h +++ b/fs/lustre/include/lustre_mdc.h @@ -70,6 +70,7 @@ static inline void mdc_get_mod_rpc_slot(struct ptlrpc_request *req, opc = lustre_msg_get_opc(req->rq_reqmsg); tag = obd_get_mod_rpc_slot(cli, opc, it); lustre_msg_set_tag(req->rq_reqmsg, tag); + ptlrpc_reassign_next_xid(req); } static inline void mdc_put_mod_rpc_slot(struct ptlrpc_request *req, diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 8dad08e..40c1ae8 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1916,6 +1916,7 @@ void ptlrpc_retain_replayable_request(struct ptlrpc_request *req, u64 ptlrpc_next_xid(void); u64 ptlrpc_sample_next_xid(void); u64 ptlrpc_req_xid(struct ptlrpc_request *request); +void ptlrpc_reassign_next_xid(struct ptlrpc_request *req); /* Set of routines to run a function in ptlrpcd context */ void *ptlrpcd_alloc_work(struct obd_import *imp, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 5d74f30..4580be3 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -240,6 +240,7 @@ static int client_common_fill_super(struct super_block 
*sb, char *md, char *dt) OBD_CONNECT2_FLR | OBD_CONNECT2_LOCK_CONVERT | OBD_CONNECT2_ARCHIVE_ID_ARRAY | + OBD_CONNECT2_INC_XID | OBD_CONNECT2_LSOM | OBD_CONNECT2_ASYNC_DISCARD | OBD_CONNECT2_PCC; @@ -459,7 +460,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) if (data->ocd_version < OBD_OCD_VERSION(2, 12, 50, 0)) data->ocd_connect_flags |= OBD_CONNECT_LOCKAHEAD_OLD; - data->ocd_connect_flags2 = OBD_CONNECT2_LOCKAHEAD; + data->ocd_connect_flags2 = OBD_CONNECT2_LOCKAHEAD | + OBD_CONNECT2_INC_XID; if (!OBD_FAIL_CHECK(OBD_FAIL_OSC_CONNECT_GRANT_PARAM)) data->ocd_connect_flags |= OBD_CONNECT_GRANT_PARAM; diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 49db077..5d4e421 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -1550,6 +1550,12 @@ u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc, LASSERT(!test_and_set_bit(i, cli->cl_mod_tag_bitmap)); spin_unlock(&cli->cl_mod_rpcs_lock); /* tag 0 is reserved for non-modify RPCs */ + + CDEBUG(D_RPCTRACE, + "%s: modify RPC slot %u is allocated opc %u, max %hu\n", + cli->cl_import->imp_obd->obd_name, + i + 1, opc, max); + return i + 1; } spin_unlock(&cli->cl_mod_rpcs_lock); diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index c359ac0..8d874f2 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -717,6 +717,16 @@ static inline void ptlrpc_assign_next_xid(struct ptlrpc_request *req) static atomic64_t ptlrpc_last_xid; +void ptlrpc_reassign_next_xid(struct ptlrpc_request *req) +{ + spin_lock(&req->rq_import->imp_lock); + list_del_init(&req->rq_unreplied_list); + ptlrpc_assign_next_xid_nolock(req); + spin_unlock(&req->rq_import->imp_lock); + DEBUG_REQ(D_RPCTRACE, req, "reassign xid"); +} +EXPORT_SYMBOL(ptlrpc_reassign_next_xid); + int ptlrpc_request_bufs_pack(struct ptlrpc_request *request, u32 version, int opcode, char **bufs, struct ptlrpc_cli_ctx *ctx) diff --git a/fs/lustre/ptlrpc/service.c 
b/fs/lustre/ptlrpc/service.c index c66c690..b2a33a3 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -864,6 +864,13 @@ static void ptlrpc_server_drop_request(struct ptlrpc_request *req) } } +static void ptlrpc_del_exp_list(struct ptlrpc_request *req) +{ + spin_lock(&req->rq_export->exp_rpc_lock); + list_del_init(&req->rq_exp_list); + spin_unlock(&req->rq_export->exp_rpc_lock); +} + /** * to finish a request: stop sending more early replies, and release * the request. @@ -1367,9 +1374,7 @@ static void ptlrpc_server_hpreq_fini(struct ptlrpc_request *req) if (req->rq_ops->hpreq_fini) req->rq_ops->hpreq_fini(req); - spin_lock(&req->rq_export->exp_rpc_lock); - list_del_init(&req->rq_exp_list); - spin_unlock(&req->rq_export->exp_rpc_lock); + ptlrpc_del_exp_list(req); } } From patchwork Thu Feb 27 21:16:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410647 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB91A924 for ; Thu, 27 Feb 2020 21:43:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9423424690 for ; Thu, 27 Feb 2020 21:43:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9423424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 762B1348CE4; Thu, 27 Feb 2020 13:34:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org 
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 59702348862 for ; Thu, 27 Feb 2020 13:20:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2B471918E; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 28E97468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:08 -0500 Message-Id: <1582838290-17243-501-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 500/622] lnet: eliminate uninitialized warning X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"
From: Wang Shilong

lustre-release/net/lnet/lnet/router.c: In function 'lnet_del_route':
include/linux/compiler.h:177:26: error: 'lp' may be used uninitialized in this function [-Werror=maybe-uninitialized]
 case 8: *(__u64 *)res = *(volatile __u64 *)p; break; \
lustre-release/net/lnet/lnet/router.c:754:20: note: 'lp' was declared here
 struct lnet_peer *lp;
lustre-release/net/lnet/lnet/router.c: At top level:
cc1: error: unrecognized command line option '-Wno-stringop-overflow' [-Werror]
cc1: error: unrecognized command line option '-Wno-stringop-truncation' [-Werror]
cc1: error: unrecognized command line option '-Wno-format-truncation' [-Werror]
cc1: all warnings being treated as errors

The code's logic guarantees that @lp and @lpni are initialized at the same time, but let's initialize @lp to make gcc happy.
WC-bug-id: https://jira.whamcloud.com/browse/LU-12764 Lustre-commit: a8fbaa1b998f ("LU-12764 lnet: eliminate uninitialized warning") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/36189 Reviewed-by: Andreas Dilger Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/router.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index a5e4af0..447706d 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -721,7 +721,7 @@ static void lnet_shuffle_seed(void) struct lnet_peer_ni *lpni; struct lnet_route *route; struct list_head zombies; - struct lnet_peer *lp; + struct lnet_peer *lp = NULL; int i = 0; INIT_LIST_HEAD(&rnet_zombies); From patchwork Thu Feb 27 21:16:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410897 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E7521580 for ; Thu, 27 Feb 2020 21:50:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 76D9D246A1 for ; Thu, 27 Feb 2020 21:50:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 76D9D246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A9EF4348B64; Thu, 27 Feb 2020 13:41:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 178E6348851 for ; Thu, 27 Feb 2020 13:20:54 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2B31E918D; Thu, 27 Feb 2020 16:18:19 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2971047C; Thu, 27 Feb 2020 16:18:19 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:16:09 -0500
Message-Id: <1582838290-17243-502-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 501/622] lnet: o2ib: Record rc in debug log on startup failure
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Chris Horn

Since kiblnd_startup() always returns -ENETDOWN on failure, let's record the underlying rc value for the failure case in the debug log.
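For illustration only, here is a minimal userspace sketch of the idea. It assumes a hypothetical demo_startup() function and a demo_last_rc variable standing in for the debug log, neither of which is part of o2iblnd. Every failure path sets rc before jumping to the common label, the label logs the specific rc, and the caller still sees the single generic error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the debug log: remembers the last logged rc. */
int demo_last_rc;

/*
 * Validate an interface name. Each failure path records a specific rc
 * before jumping to the shared label; the label logs that rc and then
 * returns one generic error to the caller.
 */
int demo_startup(const char *ifname, size_t max_name)
{
	int rc;

	assert(max_name > 0);

	if (!ifname) {
		rc = -EINVAL;
		goto failed;
	}

	if (strlen(ifname) >= max_name) {
		rc = -E2BIG;
		goto failed;
	}

	return 0;

failed:
	demo_last_rc = rc;
	fprintf(stderr, "Configuration of device %s failed: rc = %d\n",
		ifname ? ifname : "", rc);
	return -ENETDOWN;
}
```

The caller-visible return value stays stable, while the log keeps the detail needed to tell an over-long name apart from a missing one.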
Cray-bug-id: LUS-7935 WC-bug-id: https://jira.whamcloud.com/browse/LU-12824 Lustre-commit: 99f85541a685 ("LU-12824 o2ib: Record rc in debug log on startup failure") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36325 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index d4d5d4f..d162b0a7 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2848,10 +2848,10 @@ static int kiblnd_dev_start_threads(struct kib_dev *dev, u32 *cpts, int ncpts) static int kiblnd_startup(struct lnet_ni *ni) { - char *ifname; + char *ifname = NULL; struct lnet_inetdev *ifaces = NULL; struct kib_dev *ibdev = NULL; - struct kib_net *net; + struct kib_net *net = NULL; unsigned long flags; int rc; int i; @@ -2866,8 +2866,10 @@ static int kiblnd_startup(struct lnet_ni *ni) net = kzalloc(sizeof(*net), GFP_NOFS); ni->ni_data = net; - if (!net) + if (!net) { + rc = -ENOMEM; goto net_failed; + } net->ibn_incarnation = ktime_get_real_ns() / NSEC_PER_USEC; @@ -2884,6 +2886,7 @@ static int kiblnd_startup(struct lnet_ni *ni) if (ni->ni_interfaces[1]) { CERROR("ko2iblnd: Multiple interfaces not supported\n"); + rc = -EINVAL; goto failed; } @@ -2894,6 +2897,7 @@ static int kiblnd_startup(struct lnet_ni *ni) if (strlen(ifname) >= sizeof(ibdev->ibd_ifname)) { CERROR("IPoIB interface name too long: %s\n", ifname); + rc = -E2BIG; goto failed; } @@ -2968,7 +2972,9 @@ static int kiblnd_startup(struct lnet_ni *ni) net_failed: kiblnd_shutdown(ni); - CDEBUG(D_NET, "%s failed\n", __func__); + CDEBUG(D_NET, "Configuration of device %s failed: rc = %d\n", + ifname ? 
ifname : "", rc); + return -ENETDOWN; } From patchwork Thu Feb 27 21:16:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410651 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 760AC138D for ; Thu, 27 Feb 2020 21:43:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5F00224690 for ; Thu, 27 Feb 2020 21:43:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5F00224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B01A349326; Thu, 27 Feb 2020 13:35:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9BBEE348862 for ; Thu, 27 Feb 2020 13:20:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2D73D918F; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2C48C46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:10 -0500 Message-Id: <1582838290-17243-503-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 502/622] lnet: o2ib: Reintroduce kiblnd_dev_search X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn If we add an interface to multiple nets then we need to re-use the struct ib_dev object for each of the nets. Cray-bug-id: LUS-7935 Fixes: 3aa523159321 ("lnet: consoldate secondary IP address handling") WC-bug-id: https://jira.whamcloud.com/browse/LU-12824 Lustre-commit: e25e45c612a0 ("LU-12824 o2ib: Reintroduce kiblnd_dev_search") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36326 Reviewed-by: James Simmons Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 85 +++++++++++++++++++++++++++++----------- 1 file changed, 63 insertions(+), 22 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index d162b0a7..1cc5358 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2821,7 +2821,8 @@ static int kiblnd_start_schedulers(struct kib_sched_info *sched) return rc; } -static int kiblnd_dev_start_threads(struct kib_dev *dev, u32 *cpts, int ncpts) +static int kiblnd_dev_start_threads(struct kib_dev *dev, bool newdev, u32 *cpts, + int ncpts) { int cpt; int rc; @@ -2833,7 +2834,7 @@ static int kiblnd_dev_start_threads(struct kib_dev *dev, u32 *cpts, int ncpts) cpt = !cpts ? 
i : cpts[i]; sched = kiblnd_data.kib_scheds[cpt]; - if (sched->ibs_nthreads > 0) + if (!newdev && sched->ibs_nthreads > 0) continue; rc = kiblnd_start_schedulers(kiblnd_data.kib_scheds[cpt]); @@ -2846,6 +2847,39 @@ static int kiblnd_dev_start_threads(struct kib_dev *dev, u32 *cpts, int ncpts) return 0; } +static struct kib_dev * +kiblnd_dev_search(char *ifname) +{ + struct kib_dev *alias = NULL; + struct kib_dev *dev; + char *colon; + char *colon2; + + colon = strchr(ifname, ':'); + list_for_each_entry(dev, &kiblnd_data.kib_devs, ibd_list) { + if (strcmp(&dev->ibd_ifname[0], ifname) == 0) + return dev; + + if (alias) + continue; + + colon2 = strchr(dev->ibd_ifname, ':'); + if (colon) + *colon = 0; + if (colon2) + *colon2 = 0; + + if (strcmp(&dev->ibd_ifname[0], ifname) == 0) + alias = dev; + + if (colon) + *colon = ':'; + if (colon2) + *colon2 = ':'; + } + return alias; +} + static int kiblnd_startup(struct lnet_ni *ni) { char *ifname = NULL; @@ -2855,6 +2889,7 @@ static int kiblnd_startup(struct lnet_ni *ni) unsigned long flags; int rc; int i; + bool newdev; LASSERT(ni->ni_net->net_lnd == &the_o2iblnd); @@ -2916,36 +2951,42 @@ static int kiblnd_startup(struct lnet_ni *ni) goto failed; } - ibdev = kzalloc(sizeof(*ibdev), GFP_KERNEL); - if (!ibdev) { - rc = -ENOMEM; - goto failed; - } + ibdev = kiblnd_dev_search(ifname); + newdev = !ibdev; + /* hmm...create kib_dev even for alias */ + if (!ibdev || strcmp(&ibdev->ibd_ifname[0], ifname) != 0) { + ibdev = kzalloc(sizeof(*ibdev), GFP_NOFS); + if (!ibdev) { + rc = -ENOMEM; + goto failed; + } - ibdev->ibd_ifip = ifaces[i].li_ipaddr; - strlcpy(ibdev->ibd_ifname, ifaces[i].li_name, - sizeof(ibdev->ibd_ifname)); - ibdev->ibd_can_failover = !!(ifaces[i].li_flags & IFF_MASTER); + ibdev->ibd_ifip = ifaces[i].li_ipaddr; + strlcpy(ibdev->ibd_ifname, ifaces[i].li_name, + sizeof(ibdev->ibd_ifname)); + ibdev->ibd_can_failover = !!(ifaces[i].li_flags & IFF_MASTER); - INIT_LIST_HEAD(&ibdev->ibd_nets); - 
INIT_LIST_HEAD(&ibdev->ibd_list); /* not yet in kib_devs */ - INIT_LIST_HEAD(&ibdev->ibd_fail_list); + INIT_LIST_HEAD(&ibdev->ibd_nets); + INIT_LIST_HEAD(&ibdev->ibd_list); /* not yet in kib_devs */ + INIT_LIST_HEAD(&ibdev->ibd_fail_list); - /* initialize the device */ - rc = kiblnd_dev_failover(ibdev, ni->ni_net_ns); - if (rc) { - CERROR("ko2iblnd: Can't initialize device: rc = %d\n", rc); - goto failed; - } + /* initialize the device */ + rc = kiblnd_dev_failover(ibdev, ni->ni_net_ns); + if (rc) { + CERROR("ko2iblnd: Can't initialize device: rc = %d\n", + rc); + goto failed; + } - list_add_tail(&ibdev->ibd_list, &kiblnd_data.kib_devs); + list_add_tail(&ibdev->ibd_list, &kiblnd_data.kib_devs); + } net->ibn_dev = ibdev; ni->ni_nid = LNET_MKNID(LNET_NIDNET(ni->ni_nid), ibdev->ibd_ifip); ni->ni_dev_cpt = ifaces[i].li_cpt; - rc = kiblnd_dev_start_threads(ibdev, ni->ni_cpts, ni->ni_ncpts); + rc = kiblnd_dev_start_threads(ibdev, newdev, ni->ni_cpts, ni->ni_ncpts); if (rc) goto failed; From patchwork Thu Feb 27 21:16:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410873 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2FF231580 for ; Thu, 27 Feb 2020 21:49:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 18BDF24690 for ; Thu, 27 Feb 2020 21:49:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 18BDF24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3F26834A55B; Thu, 27 Feb 2020 13:39:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F2216348868 for ; Thu, 27 Feb 2020 13:20:54 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2FFB19190; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2EF9146C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:11 -0500 Message-Id: <1582838290-17243-504-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 503/622] lustre: ptlrpc: fix watchdog ratelimit logic X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The ptlrpc-level watchdog ratelimiting is broken. The kernel prints: mdt00_009: service thread pid 18935 was inactive for 72s. Watchdog stack traces are limited to 3 per 300s, skipping... even though there hasn't been any stack trace printed before. It looks like the __ratelimit() return value is backward from what one would expect from normal English grammar, namely that if __ratelimit() returns true the action should NOT be limited. 
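For illustration only, the convention can be modeled in userspace as follows. This is a simplified sketch with hypothetical names (demo_ratelimit, demo_ratelimit_ok, demo_watchdog_action) and an explicit clock argument; it is not the kernel's implementation. The point it shows is the return value: true means "go ahead", false means "skip".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct ratelimit_state. */
struct demo_ratelimit {
	long interval;	/* window length, in seconds */
	int burst;	/* actions allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* actions allowed so far in this window */
};

/* Returns true when the caller MAY act, false when it must skip. */
bool demo_ratelimit_ok(struct demo_ratelimit *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Start a fresh window once the previous one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	return false;
}

/* Mirrors the corrected check: dump the stack only when NOT limited. */
const char *demo_watchdog_action(struct demo_ratelimit *rs, long now)
{
	return demo_ratelimit_ok(rs, now) ? "dump stack" : "skip";
}
```

With a limit of 3 per 300s, the first three calls in a window take the "dump stack" branch and later calls are skipped until the window rolls over. Negating the return value would invert exactly that behaviour.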
Fix the logic checking the __ratelimit() return value, and add a check in sanity test_422 (which forces a service thread timeout) to ensure that the watchdog sometimes prints a full stack. Fixes: aeaf46886c7b ("lustre: ptlrpc: add watchdog for ptlrpc service threads") WC-bug-id: https://jira.whamcloud.com/browse/LU-12838 Lustre-commit: 594c79f2f855 ("LU-12838 ptlrpc: fix watchdog ratelimit logic") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/36409 Reviewed-by: James Simmons Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index b2a33a3..fe0e108 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -2067,7 +2067,8 @@ static void ptlrpc_watchdog_fire(struct work_struct *w) s64 ms_lapse = ktime_ms_delta(ktime_get(), thread->t_touched); u32 ms_frac = do_div(ms_lapse, MSEC_PER_SEC); - if (!__ratelimit(&watchdog_limit)) { + /* ___ratelimit() returns true if the action is NOT ratelimited */ + if (__ratelimit(&watchdog_limit)) { /* below message is checked in sanity-quota.sh test_6,18 */ LCONSOLE_WARN("%s: service thread pid %u was inactive for %llu.%.03u seconds. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:\n", thread->t_task->comm, thread->t_task->pid, From patchwork Thu Feb 27 21:16:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410899 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8A465924 for ; Thu, 27 Feb 2020 21:50:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 72CB924690 for ; Thu, 27 Feb 2020 21:50:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 72CB924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F0B7F3499AD; Thu, 27 Feb 2020 13:41:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3F02634886C for ; Thu, 27 Feb 2020 13:20:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3321F9191; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 31A6B46D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:12 -0500 Message-Id: <1582838290-17243-505-git-send-email-jsimmons@infradead.org> X-Mailer: 
git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 504/622] lustre: flr: avoid reading unhealthy mirror
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Bobi Jam

* Fix an error in lov_io_mirror_init() which would wait unnecessarily if we're retrying the last mirror of the file.

* In osc_io_iter_init(), check the OSC import status so that the read path can quickly switch to another mirror. sanity-flr test_33b is added to test this case.

* Once all mirrors have been tried, turn off the quick switch so that when all mirrors contain bad OSTs, the read will still try its best to get partial data from a component before trying another mirror. sanity-flr test_33c is added to test this case.

Fixes: 4b102da53ad ("lustre: ptlrpc: idle connections can disconnect")
WC-bug-id: https://jira.whamcloud.com/browse/LU-12328
Lustre-commit: 39da3c06275e ("LU-12328 flr: avoid reading unhealthy mirror")
Signed-off-by: Bobi Jam
Reviewed-on: https://review.whamcloud.com/34952
Reviewed-by: Andreas Dilger
Reviewed-by: Wang Shilong
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/cl_object.h | 8 +++++++-
 fs/lustre/lov/lov_io.c | 25 ++++++++++++++++---------
 fs/lustre/osc/osc_io.c | 16 +++++++++++++++-
 3 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 75ece62..c3376a4 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1906,7 +1906,13 @@ struct cl_io {
 	/**
 	 * Set if IO is triggered by async workqueue readahead.
*/ - ci_async_readahead:1; + ci_async_readahead:1, + /** + * Set if we've tried all mirrors for this read IO, if it's not set, + * the read IO will check to-be-read OSCs' status, and make fast-switch + * another mirror if some of the OSTs are not healthy. + */ + ci_tried_all_mirrors:1; /** * How many times the read has retried before this one. * Set by the top level and consumed by the LOV. diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 56e4a982..971f9ba 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -140,6 +140,7 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio, sub_io->ci_lock_no_expand = io->ci_lock_no_expand; sub_io->ci_ndelay = io->ci_ndelay; sub_io->ci_layout_version = io->ci_layout_version; + sub_io->ci_tried_all_mirrors = io->ci_tried_all_mirrors; rc = cl_io_sub_init(sub->sub_env, sub_io, io->ci_type, sub_obj); if (rc < 0) @@ -395,13 +396,13 @@ static int lov_io_mirror_init(struct lov_io *lio, struct lov_object *obj, found = true; break; } - } - + } /* each component of the mirror */ if (found) { index = (index + i) % comp->lo_mirror_count; break; } - } + } /* each mirror */ + if (i == comp->lo_mirror_count) { CERROR(DFID ": failed to find a component covering I/O region at %llu\n", PFID(lu_object_fid(lov2lu(obj))), lio->lis_pos); @@ -423,16 +424,21 @@ static int lov_io_mirror_init(struct lov_io *lio, struct lov_object *obj, * of this client has been partitioned. We should relinquish CPU for * a while before trying again. 
*/ - ++io->ci_ndelay_tried; - if (io->ci_ndelay && io->ci_ndelay_tried >= comp->lo_mirror_count) { - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout(msecs_to_jiffies(MSEC_PER_SEC)); /* 10ms */ + if (io->ci_ndelay && io->ci_ndelay_tried > 0 && + (io->ci_ndelay_tried % comp->lo_mirror_count == 0)) { + schedule_timeout_interruptible(HZ / 100 + 1); /* 10ms */ if (signal_pending(current)) return -EINTR; - /* reset retry counter */ - io->ci_ndelay_tried = 1; + /** + * we'd set ci_tried_all_mirrors to turn off fast mirror + * switching for read after we've tried all mirrors several + * rounds. + */ + io->ci_tried_all_mirrors = io->ci_ndelay_tried % + (comp->lo_mirror_count * 4) == 0; } + ++io->ci_ndelay_tried; CDEBUG(D_VFSTRACE, "use %sdelayed RPC state for this IO\n", io->ci_ndelay ? "non-" : ""); @@ -668,6 +674,7 @@ static void lov_io_sub_inherit(struct lov_io_sub *sub, struct lov_io *lio, case CIT_READ: case CIT_WRITE: { io->u.ci_wr.wr_sync = cl_io_is_sync_write(parent); + io->ci_tried_all_mirrors = parent->ci_tried_all_mirrors; if (cl_io_is_append(parent)) { io->u.ci_wr.wr_append = 1; } else { diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index f340266..1ff2df2 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -368,6 +368,13 @@ int osc_io_commit_async(const struct lu_env *env, } EXPORT_SYMBOL(osc_io_commit_async); +static bool osc_import_not_healthy(struct obd_import *imp) +{ + return imp->imp_invalid || imp->imp_deactive || + !(imp->imp_state == LUSTRE_IMP_FULL || + imp->imp_state == LUSTRE_IMP_IDLE); +} + int osc_io_iter_init(const struct lu_env *env, const struct cl_io_slice *ios) { struct osc_object *osc = cl2osc(ios->cis_obj); @@ -376,7 +383,14 @@ int osc_io_iter_init(const struct lu_env *env, const struct cl_io_slice *ios) int rc = -EIO; spin_lock(&imp->imp_lock); - if (likely(!imp->imp_invalid)) { + /** + * check whether this OSC device is available for non-delay read, + * fast switching mirror if we haven't tried all 
mirrors. + */ + if (ios->cis_io->ci_type == CIT_READ && ios->cis_io->ci_ndelay && + !ios->cis_io->ci_tried_all_mirrors && osc_import_not_healthy(imp)) { + rc = -EWOULDBLOCK; + } else if (likely(!imp->imp_invalid)) { atomic_inc(&osc->oo_nr_ios); oio->oi_is_active = 1; rc = 0; From patchwork Thu Feb 27 21:16:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410721 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 18C8C924 for ; Thu, 27 Feb 2020 21:45:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0149D24690 for ; Thu, 27 Feb 2020 21:45:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0149D24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0270734AFA1; Thu, 27 Feb 2020 13:36:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 950B53487C4 for ; Thu, 27 Feb 2020 13:20:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 36FB99192; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 34A79496; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: 
James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:16:13 -0500
Message-Id: <1582838290-17243-506-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 505/622] lustre: obdclass: lu_tgt_descs cleanup
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lai Siyao , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Lai Siyao

This patch cleans up the lu_tgt_descs code so that MDT object QoS allocation support can be added more cleanly:

* rename struct ost_pool to lu_tgt_pool.
* put struct lu_qos, lmv_desc/lov_desc and lu_tgt_pool into struct lu_tgt_descs, because it's more natural to manage this data there and fewer arguments need to be passed around in related functions.
* remove lu_tgt_descs.ltd_tgtnr and use lu_tgt_descs.ltd_lov_desc.ld_tgt_count instead, because they are duplicates.
* other cleanups.
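For illustration only, the consolidation described in the list above can be sketched in userspace with cut-down, hypothetical demo_* types. The field names follow the patch, but the struct contents and the demo_tgt_count() helper are assumptions. The descriptor union and the QoS state live inside one container, so a helper needs only the container pointer, and a field-alias macro keeps short names working at call sites.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down, hypothetical stand-ins for lov_desc, lmv_desc and lu_qos. */
struct demo_lov_desc {
	unsigned int ld_tgt_count;
};

struct demo_lmv_desc {
	unsigned int ld_tgt_count;
};

struct demo_qos {
	unsigned int lq_active_svr_count;
};

/* One container holds the descriptor, the QoS state and the target kind. */
struct demo_tgt_descs {
	union {
		struct demo_lov_desc ltd_lov_desc;
		struct demo_lmv_desc ltd_lmv_desc;
	};
	struct demo_qos ltd_qos;
	bool ltd_is_mdt;
};

struct demo_lmv_obd {
	struct demo_tgt_descs lmv_mdt_descs;
};

/* Field alias in the style of the patch's lmv_mdt_count macro. */
#define demo_mdt_count lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count

/* A helper now needs only the container, not separate count/flag args. */
unsigned int demo_tgt_count(const struct demo_tgt_descs *ltd)
{
	assert(ltd);

	return ltd->ltd_is_mdt ? ltd->ltd_lmv_desc.ld_tgt_count
			       : ltd->ltd_lov_desc.ld_tgt_count;
}
```

Keeping a single count inside the descriptor also removes the risk of a separately maintained counter drifting out of sync with it.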
WC-bug-id: https://jira.whamcloud.com/browse/LU-12624 Lustre-commit: 45222b2ef279 ("LU-12624 obdclass: lu_tgt_descs cleanup") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35824 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 81 +++--- fs/lustre/include/obd.h | 7 +- fs/lustre/lmv/lmv_fld.c | 6 +- fs/lustre/lmv/lmv_internal.h | 2 +- fs/lustre/lmv/lmv_obd.c | 118 ++++----- fs/lustre/lmv/lproc_lmv.c | 19 +- fs/lustre/lov/lov_internal.h | 14 +- fs/lustre/lov/lov_pool.c | 10 +- fs/lustre/obdclass/Makefile | 2 +- fs/lustre/obdclass/lu_qos.c | 512 -------------------------------------- fs/lustre/obdclass/lu_tgt_descs.c | 509 ++++++++++++++++++++++++++++++++++++- 11 files changed, 618 insertions(+), 662 deletions(-) delete mode 100644 fs/lustre/obdclass/lu_qos.c diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index eaf20ea..e92f12f 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1322,14 +1322,14 @@ struct lu_kmem_descr { extern u32 lu_context_tags_default; extern u32 lu_session_tags_default; -/* Generic subset of OSTs */ -struct ost_pool { +/* Generic subset of tgts */ +struct lu_tgt_pool { u32 *op_array; /* array of index of * lov_obd->lov_tgts */ - unsigned int op_count; /* number of OSTs in the array */ - unsigned int op_size; /* allocated size of lp_array */ - struct rw_semaphore op_rw_sem; /* to protect ost_pool use */ + unsigned int op_count; /* number of tgts in the array */ + unsigned int op_size; /* allocated size of op_array */ + struct rw_semaphore op_rw_sem; /* to protect lu_tgt_pool use */ }; /* round-robin QoS data for LOD/LMV */ @@ -1338,7 +1338,7 @@ struct lu_qos_rr { u32 lqr_start_idx; /* start index of new inode */ u32 lqr_offset_idx;/* aliasing for start_idx */ int lqr_start_count;/* reseed counter */ - struct ost_pool lqr_pool; /* round-robin optimized list */ + struct 
lu_tgt_pool lqr_pool; /* round-robin optimized list */ unsigned long lqr_dirty:1; /* recalc round-robin list */ }; @@ -1401,13 +1401,30 @@ struct lu_tgt_desc_idx { struct lu_tgt_desc *ldi_tgt[TGT_PTRS_PER_BLOCK]; }; +/* QoS data for LOD/LMV */ +struct lu_qos { + struct list_head lq_svr_list; /* lu_svr_qos list */ + struct rw_semaphore lq_rw_sem; + u32 lq_active_svr_count; + unsigned int lq_prio_free; /* priority for free space */ + unsigned int lq_threshold_rr;/* priority for rr */ + struct lu_qos_rr lq_rr; /* round robin qos data */ + unsigned long lq_dirty:1, /* recalc qos data */ + lq_same_space:1,/* the servers all have approx. + * the same space avail + */ + lq_reset:1; /* zero current penalties */ +}; + struct lu_tgt_descs { + union { + struct lov_desc ltd_lov_desc; + struct lmv_desc ltd_lmv_desc; + }; /* list of known TGTs */ struct lu_tgt_desc_idx *ltd_tgt_idx[TGT_PTRS]; /* Size of the lu_tgts array, granted to be a power of 2 */ u32 ltd_tgts_size; - /* number of registered TGTs */ - u32 ltd_tgtnr; /* bitmap of TGTs available */ unsigned long *ltd_tgt_bitmap; /* TGTs scheduled to be deleted */ @@ -1418,43 +1435,31 @@ struct lu_tgt_descs { struct mutex ltd_mutex; /* read/write semaphore used for array relocation */ struct rw_semaphore ltd_rw_sem; + /* QoS */ + struct lu_qos ltd_qos; + /* all tgts in a packed array */ + struct lu_tgt_pool ltd_tgt_pool; + /* true if tgt is MDT */ + bool ltd_is_mdt; }; #define LTD_TGT(ltd, index) \ - ((ltd)->ltd_tgt_idx[(index) / TGT_PTRS_PER_BLOCK] \ - ->ldi_tgt[(index) % TGT_PTRS_PER_BLOCK]) + (ltd)->ltd_tgt_idx[(index) / TGT_PTRS_PER_BLOCK] \ + ->ldi_tgt[(index) % TGT_PTRS_PER_BLOCK] -/* QoS data for LOD/LMV */ -struct lu_qos { - struct list_head lq_svr_list; /* lu_svr_qos list */ - struct rw_semaphore lq_rw_sem; - u32 lq_active_svr_count; - unsigned int lq_prio_free; /* priority for free space */ - unsigned int lq_threshold_rr;/* priority for rr */ - struct lu_qos_rr lq_rr; /* round robin qos data */ - unsigned long 
lq_dirty:1, /* recalc qos data */ - lq_same_space:1,/* the servers all have approx. - * the same space avail - */ - lq_reset:1; /* zero current penalties */ -}; - -void lu_qos_rr_init(struct lu_qos_rr *lqr); -int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); -int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); -bool lqos_is_usable(struct lu_qos *qos, u32 active_tgt_nr); -int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd, - u32 active_tgt_nr, u32 maxage, bool is_mdt); -void lqos_calc_weight(struct lu_tgt_desc *tgt); -int lqos_recalc_weight(struct lu_qos *qos, struct lu_tgt_descs *ltd, - struct lu_tgt_desc *tgt, u32 active_tgt_nr, - u64 *total_wt); u64 lu_prandom_u64_max(u64 ep_ro); +void lu_qos_rr_init(struct lu_qos_rr *lqr); +int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); +void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt); -int lu_tgt_descs_init(struct lu_tgt_descs *ltd); +int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt); void lu_tgt_descs_fini(struct lu_tgt_descs *ltd); -int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt); -void lu_tgt_descs_del(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt); +int ltd_add_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt); +void ltd_del_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt); +bool ltd_qos_is_usable(struct lu_tgt_descs *ltd); +int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd); +int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt, + u64 *total_wt); static inline struct lu_tgt_desc *ltd_first_tgt(struct lu_tgt_descs *ltd) { diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 41431f9..4ba70c7 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -394,7 +394,7 @@ struct lov_md_tgt_desc { struct lov_obd { struct lov_desc desc; struct lov_tgt_desc **lov_tgts; /* sparse array */ - struct ost_pool lov_packed; /* all OSTs in a packed array */ + struct lu_tgt_pool 
lov_packed; /* all OSTs in a packed array */ struct mutex lov_lock; struct obd_connect_data lov_ocd; atomic_t lov_refcount; @@ -422,7 +422,6 @@ struct lov_obd { struct lmv_obd { struct lu_client_fld lmv_fld; spinlock_t lmv_lock; - struct lmv_desc desc; int connected; int max_easize; @@ -435,10 +434,12 @@ struct lmv_obd { struct kobject *lmv_tgts_kobj; void *lmv_cache; - struct lu_qos lmv_qos; u32 lmv_qos_rr_index; }; +#define lmv_mdt_count lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count +#define lmv_qos lmv_mdt_descs.ltd_qos + struct niobuf_local { u64 lnb_file_offset; u32 lnb_page_offset; diff --git a/fs/lustre/lmv/lmv_fld.c b/fs/lustre/lmv/lmv_fld.c index ef2c866..ea1ef72 100644 --- a/fs/lustre/lmv/lmv_fld.c +++ b/fs/lustre/lmv/lmv_fld.c @@ -75,11 +75,11 @@ int lmv_fld_lookup(struct lmv_obd *lmv, const struct lu_fid *fid, u32 *mds) CDEBUG(D_INODE, "FLD lookup got mds #%x for fid=" DFID "\n", *mds, PFID(fid)); - if (*mds >= lmv->desc.ld_tgt_count) { + if (*mds >= lmv->lmv_mdt_descs.ltd_tgts_size) { rc = -EINVAL; CERROR("%s: FLD lookup got invalid mds #%x (max: %x) for fid=" DFID ": rc = %d\n", - obd->obd_name, *mds, lmv->desc.ld_tgt_count, PFID(fid), - rc); + obd->obd_name, *mds, lmv->lmv_mdt_descs.ltd_tgts_size, + PFID(fid), rc); } return rc; } diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index d95fa3f..70d86676 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -122,7 +122,7 @@ struct lu_tgt_desc *lmv_next_connected_tgt(struct lmv_obd *lmv, u32 mdt_idx; int rc; - if (lmv->desc.ld_tgt_count < 2) + if (lmv->lmv_mdt_count < 2) return 0; rc = lmv_fld_lookup(lmv, fid, &mdt_idx); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 2959b18..84be905 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -64,7 +64,8 @@ void lmv_activate_target(struct lmv_obd *lmv, struct lmv_tgt_desc *tgt, return; tgt->ltd_active = activate; - lmv->desc.ld_active_tgt_count += (activate ? 
1 : -1); + lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count += + (activate ? 1 : -1); tgt->ltd_exp->exp_obd->obd_inactive = !activate; } @@ -330,11 +331,11 @@ static int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt) tgt->ltd_active = 1; tgt->ltd_exp = mdc_exp; - lmv->desc.ld_active_tgt_count++; + lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count++; md_init_ea_size(tgt->ltd_exp, lmv->max_easize, lmv->max_def_easize); - rc = lqos_add_tgt(&lmv->lmv_qos, tgt); + rc = lu_qos_add_tgt(&lmv->lmv_qos, tgt); if (rc) { obd_disconnect(mdc_exp); return rc; @@ -357,8 +358,7 @@ static int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt) static void lmv_del_target(struct lmv_obd *lmv, struct lu_tgt_desc *tgt) { LASSERT(tgt); - lqos_del_tgt(&lmv->lmv_qos, tgt); - lu_tgt_descs_del(&lmv->lmv_mdt_descs, tgt); + ltd_del_tgt(&lmv->lmv_mdt_descs, tgt); kfree(tgt); } @@ -369,7 +369,6 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, struct obd_device *mdc_obd; struct lmv_tgt_desc *tgt; struct lu_tgt_descs *ltd = &lmv->lmv_mdt_descs; - int orig_tgt_count = 0; int rc = 0; CDEBUG(D_CONFIG, "Target uuid: %s. 
index %d\n", uuidp->uuid, index); @@ -392,11 +391,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, tgt->ltd_active = 0; mutex_lock(&ltd->ltd_mutex); - rc = lu_tgt_descs_add(ltd, tgt); - if (!rc && index >= lmv->desc.ld_tgt_count) { - orig_tgt_count = lmv->desc.ld_tgt_count; - lmv->desc.ld_tgt_count = index + 1; - } + rc = ltd_add_tgt(ltd, tgt); mutex_unlock(&ltd->ltd_mutex); if (rc) @@ -407,14 +402,10 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp, return rc; rc = lmv_connect_mdc(obd, tgt); - if (rc) { - mutex_lock(&ltd->ltd_mutex); - lmv->desc.ld_tgt_count = orig_tgt_count; - memset(tgt, 0, sizeof(*tgt)); - mutex_unlock(&ltd->ltd_mutex); - } else { + if (!rc) { int easize = sizeof(struct lmv_stripe_md) + - lmv->desc.ld_tgt_count * sizeof(struct lu_fid); + lmv->lmv_mdt_count * sizeof(struct lu_fid); + lmv_init_ea_size(obd->obd_self_export, easize, 0); } @@ -441,7 +432,7 @@ static int lmv_check_connect(struct obd_device *obd) goto unlock; } - if (lmv->desc.ld_tgt_count == 0) { + if (!lmv->lmv_mdt_count) { CERROR("%s: no targets configured: rc = -EINVAL\n", obd->obd_name); rc = -EINVAL; @@ -465,7 +456,7 @@ static int lmv_check_connect(struct obd_device *obd) } lmv->connected = 1; - easize = lmv_mds_md_size(lmv->desc.ld_tgt_count, LMV_MAGIC); + easize = lmv_mds_md_size(lmv->lmv_mdt_count, LMV_MAGIC); lmv_init_ea_size(obd->obd_self_export, easize, 0); unlock: mutex_unlock(&lmv->lmv_mdt_descs.ltd_mutex); @@ -478,7 +469,7 @@ static int lmv_check_connect(struct obd_device *obd) if (!tgt->ltd_exp) continue; - --lmv->desc.ld_active_tgt_count; + --lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count; obd_disconnect(tgt->ltd_exp); } @@ -810,7 +801,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, struct lmv_obd *lmv = &obddev->u.lmv; struct lu_tgt_desc *tgt = NULL; int set = 0; - u32 count = lmv->desc.ld_tgt_count; + u32 count = lmv->lmv_mdt_count; int rc = 0; if (count == 0) @@ -824,7 +815,8 @@ static 
int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, u32 index; memcpy(&index, data->ioc_inlbuf2, sizeof(u32)); - if (index >= count) + + if (index >= lmv->lmv_mdt_descs.ltd_tgts_size) return -ENODEV; tgt = lmv_tgt(lmv, index); @@ -857,12 +849,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, struct obd_quotactl *oqctl; if (qctl->qc_valid == QC_MDTIDX) { - if (count <= qctl->qc_idx) - return -EINVAL; - tgt = lmv_tgt(lmv, qctl->qc_idx); - if (!tgt || !tgt->ltd_exp) - return -EINVAL; } else if (qctl->qc_valid == QC_UUID) { lmv_foreach_tgt(lmv, tgt) { if (!obd_uuid_equals(&tgt->ltd_uuid, @@ -878,10 +865,9 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, return -EINVAL; } - if (tgt->ltd_index >= count) - return -EAGAIN; + if (!tgt || !tgt->ltd_exp) + return -EINVAL; - LASSERT(tgt && tgt->ltd_exp); oqctl = kzalloc(sizeof(*oqctl), GFP_KERNEL); if (!oqctl) return -ENOMEM; @@ -1069,7 +1055,7 @@ static u32 lmv_placement_policy(struct obd_device *obd, struct lmv_user_md *lum; u32 mdt; - if (lmv->desc.ld_tgt_count == 1) + if (lmv->lmv_mdt_count == 1) return 0; lum = op_data->op_data; @@ -1182,27 +1168,17 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) return -EINVAL; } - obd_str2uuid(&lmv->desc.ld_uuid, desc->ld_uuid.uuid); - lmv->desc.ld_tgt_count = 0; - lmv->desc.ld_active_tgt_count = 0; - lmv->desc.ld_qos_maxage = LMV_DESC_QOS_MAXAGE_DEFAULT; + obd_str2uuid(&lmv->lmv_mdt_descs.ltd_lmv_desc.ld_uuid, + desc->ld_uuid.uuid); + lmv->lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count = 0; + lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count = 0; + lmv->lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage = + LMV_DESC_QOS_MAXAGE_DEFAULT; lmv->max_def_easize = 0; lmv->max_easize = 0; spin_lock_init(&lmv->lmv_lock); - /* Set up allocation policy (QoS and RR) */ - INIT_LIST_HEAD(&lmv->lmv_qos.lq_svr_list); - init_rwsem(&lmv->lmv_qos.lq_rw_sem); - lmv->lmv_qos.lq_dirty = 1; - lmv->lmv_qos.lq_reset = 1; - /* Default priority is 
toward free space balance */ - lmv->lmv_qos.lq_prio_free = 232; - /* Default threshold for rr (roughly 17%) */ - lmv->lmv_qos.lq_threshold_rr = 43; - - lu_qos_rr_init(&lmv->lmv_qos.lq_rr); - /* * initialize rr_index to lower 32bit of netid, so that client * can distribute subdirs evenly from the beginning. @@ -1224,7 +1200,7 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg) if (rc) CERROR("Can't init FLD, err %d\n", rc); - rc = lu_tgt_descs_init(&lmv->lmv_mdt_descs); + rc = lu_tgt_descs_init(&lmv->lmv_mdt_descs, true); if (rc) CWARN("%s: error initialize target table: rc = %d\n", obd->obd_name, rc); @@ -1292,7 +1268,7 @@ static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags) if (flags & OBD_STATFS_FOR_MDT0) return 0; - if (lmv->lmv_statfs_start || lmv->desc.ld_tgt_count == 1) + if (lmv->lmv_statfs_start || lmv->lmv_mdt_count == 1) return lmv->lmv_statfs_start; /* choose initial MDT for this client */ @@ -1306,8 +1282,8 @@ static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags) /* We dont need a full 64-bit modulus, just enough * to distribute the requests across MDTs evenly. 
*/ - lmv->lmv_statfs_start = - (u32)lnet_id.nid % lmv->desc.ld_tgt_count; + lmv->lmv_statfs_start = (u32)lnet_id.nid % + lmv->lmv_mdt_count; break; } } @@ -1333,8 +1309,8 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp, /* distribute statfs among MDTs */ idx = lmv_select_statfs_mdt(lmv, flags); - for (i = 0; i < lmv->desc.ld_tgt_count; i++, idx++) { - idx = idx % lmv->desc.ld_tgt_count; + for (i = 0; i < lmv->lmv_mdt_descs.ltd_tgts_size; i++, idx++) { + idx = idx % lmv->lmv_mdt_descs.ltd_tgts_size; tgt = lmv_tgt(lmv, idx); if (!tgt || !tgt->ltd_exp) continue; @@ -1410,7 +1386,7 @@ int lmv_statfs_check_update(struct obd_device *obd, struct lmv_tgt_desc *tgt) int rc; if (ktime_get_seconds() - tgt->ltd_statfs_age < - obd->u.lmv.desc.ld_qos_maxage) + obd->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage) return 0; rc = obd_statfs_async(tgt->ltd_exp, &oinfo, 0, NULL); @@ -1526,19 +1502,17 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) u64 rand; int rc; - if (!lqos_is_usable(&lmv->lmv_qos, lmv->desc.ld_active_tgt_count)) + if (!ltd_qos_is_usable(&lmv->lmv_mdt_descs)) return ERR_PTR(-EAGAIN); down_write(&lmv->lmv_qos.lq_rw_sem); - if (!lqos_is_usable(&lmv->lmv_qos, lmv->desc.ld_active_tgt_count)) { + if (!ltd_qos_is_usable(&lmv->lmv_mdt_descs)) { tgt = ERR_PTR(-EAGAIN); goto unlock; } - rc = lqos_calc_penalties(&lmv->lmv_qos, &lmv->lmv_mdt_descs, - lmv->desc.ld_active_tgt_count, - lmv->desc.ld_qos_maxage, true); + rc = ltd_qos_penalties_calc(&lmv->lmv_mdt_descs); if (rc) { tgt = ERR_PTR(rc); goto unlock; @@ -1550,7 +1524,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) continue; tgt->ltd_qos.ltq_usable = 1; - lqos_calc_weight(tgt); + lu_tgt_qos_weight_calc(tgt); total_weight += tgt->ltd_qos.ltq_weight; } @@ -1565,9 +1539,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) continue; *mdt = tgt->ltd_index; - lqos_recalc_weight(&lmv->lmv_qos, 
&lmv->lmv_mdt_descs, tgt, - lmv->desc.ld_active_tgt_count, - &total_weight); + ltd_qos_update(&lmv->lmv_mdt_descs, tgt, &total_weight); rc = 0; goto unlock; } @@ -1588,14 +1560,16 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt) int index; spin_lock(&lmv->lmv_qos.lq_rr.lqr_alloc); - for (i = 0; i < lmv->desc.ld_tgt_count; i++) { - index = (i + lmv->lmv_qos_rr_index) % lmv->desc.ld_tgt_count; + for (i = 0; i < lmv->lmv_mdt_descs.ltd_tgts_size; i++) { + index = (i + lmv->lmv_qos_rr_index) % + lmv->lmv_mdt_descs.ltd_tgts_size; tgt = lmv_tgt(lmv, index); if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) continue; *mdt = tgt->ltd_index; - lmv->lmv_qos_rr_index = (*mdt + 1) % lmv->desc.ld_tgt_count; + lmv->lmv_qos_rr_index = (*mdt + 1) % + lmv->lmv_mdt_descs.ltd_tgts_size; spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc); return tgt; @@ -1791,7 +1765,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, struct lmv_tgt_desc *tgt; int rc; - if (!lmv->desc.ld_active_tgt_count) + if (!lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count) return -EIO; if (lmv_dir_bad_hash(op_data->op_mea1)) @@ -2903,7 +2877,7 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp, exp->exp_connect_data = *(struct obd_connect_data *)val; return rc; } else if (KEY_IS(KEY_TGT_COUNT)) { - *((int *)val) = lmv->desc.ld_tgt_count; + *((int *)val) = lmv->lmv_mdt_descs.ltd_tgts_size; return 0; } @@ -2917,7 +2891,7 @@ static int lmv_rmfid(struct obd_export *exp, struct fid_array *fa, struct obd_device *obddev = class_exp2obd(exp); struct ptlrpc_request_set *set = _set; struct lmv_obd *lmv = &obddev->u.lmv; - int tgt_count = lmv->desc.ld_tgt_count; + int tgt_count = lmv->lmv_mdt_count; struct lu_tgt_desc *tgt; struct fid_array *fat, **fas = NULL; int i, rc, **rcs = NULL; @@ -3303,8 +3277,8 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, u64 flags, * since this can be easily found, and only try others if that fails. 
*/ for (i = 0, index = lmv_fid2tgt_index(lmv, fid); - i < lmv->desc.ld_tgt_count; - i++, index = (index + 1) % lmv->desc.ld_tgt_count) { + i < lmv->lmv_mdt_descs.ltd_tgts_size; + i++, index = (index + 1) % lmv->lmv_mdt_descs.ltd_tgts_size) { if (index < 0) { CDEBUG(D_HA, "%s: " DFID " is inaccessible: rc = %d\n", obd->obd_name, PFID(fid), index); diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c index af670f8..79e27b3 100644 --- a/fs/lustre/lmv/lproc_lmv.c +++ b/fs/lustre/lmv/lproc_lmv.c @@ -45,10 +45,8 @@ static ssize_t numobd_show(struct kobject *kobj, struct attribute *attr, { struct obd_device *dev = container_of(kobj, struct obd_device, obd_kset.kobj); - struct lmv_desc *desc; - desc = &dev->u.lmv.desc; - return sprintf(buf, "%u\n", desc->ld_tgt_count); + return sprintf(buf, "%u\n", dev->u.lmv.lmv_mdt_count); } LUSTRE_RO_ATTR(numobd); @@ -57,10 +55,9 @@ static ssize_t activeobd_show(struct kobject *kobj, struct attribute *attr, { struct obd_device *dev = container_of(kobj, struct obd_device, obd_kset.kobj); - struct lmv_desc *desc; - desc = &dev->u.lmv.desc; - return sprintf(buf, "%u\n", desc->ld_active_tgt_count); + return sprintf(buf, "%u\n", + dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count); } LUSTRE_RO_ATTR(activeobd); @@ -69,10 +66,9 @@ static ssize_t desc_uuid_show(struct kobject *kobj, struct attribute *attr, { struct obd_device *dev = container_of(kobj, struct obd_device, obd_kset.kobj); - struct lmv_desc *desc; - desc = &dev->u.lmv.desc; - return sprintf(buf, "%s\n", desc->ld_uuid.uuid); + return sprintf(buf, "%s\n", + dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_uuid.uuid); } LUSTRE_RO_ATTR(desc_uuid); @@ -83,7 +79,8 @@ static ssize_t qos_maxage_show(struct kobject *kobj, struct obd_device *dev = container_of(kobj, struct obd_device, obd_kset.kobj); - return sprintf(buf, "%u\n", dev->u.lmv.desc.ld_qos_maxage); + return sprintf(buf, "%u\n", + dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage); } static ssize_t 
qos_maxage_store(struct kobject *kobj, @@ -100,7 +97,7 @@ static ssize_t qos_maxage_store(struct kobject *kobj, if (rc) return rc; - dev->u.lmv.desc.ld_qos_maxage = val; + dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage = val; return count; } diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h index d235abe..3725d1e 100644 --- a/fs/lustre/lov/lov_internal.h +++ b/fs/lustre/lov/lov_internal.h @@ -221,7 +221,7 @@ struct lsm_operations { struct pool_desc { char pool_name[LOV_MAXPOOLNAME + 1]; - struct ost_pool pool_obds; + struct lu_tgt_pool pool_obds; atomic_t pool_refcount; struct rhash_head pool_hash; /* access by poolname */ union { @@ -322,12 +322,12 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf, #define LOV_MDC_TGT_MAX 256 -/* ost_pool methods */ -int lov_ost_pool_init(struct ost_pool *op, unsigned int count); -int lov_ost_pool_extend(struct ost_pool *op, unsigned int min_count); -int lov_ost_pool_add(struct ost_pool *op, u32 idx, unsigned int min_count); -int lov_ost_pool_remove(struct ost_pool *op, u32 idx); -int lov_ost_pool_free(struct ost_pool *op); +/* lu_tgt_pool methods */ +int lov_ost_pool_init(struct lu_tgt_pool *op, unsigned int count); +int lov_ost_pool_extend(struct lu_tgt_pool *op, unsigned int min_count); +int lov_ost_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count); +int lov_ost_pool_remove(struct lu_tgt_pool *op, u32 idx); +int lov_ost_pool_free(struct lu_tgt_pool *op); /* high level pool methods */ int lov_pool_new(struct obd_device *obd, char *poolname); diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c index a0552fb..9ab81cb 100644 --- a/fs/lustre/lov/lov_pool.c +++ b/fs/lustre/lov/lov_pool.c @@ -231,7 +231,7 @@ static int pool_proc_open(struct inode *inode, struct file *file) }; #define LOV_POOL_INIT_COUNT 2 -int lov_ost_pool_init(struct ost_pool *op, unsigned int count) +int lov_ost_pool_init(struct lu_tgt_pool *op, unsigned int count) { if (count == 0) count = 
LOV_POOL_INIT_COUNT; @@ -249,7 +249,7 @@ int lov_ost_pool_init(struct ost_pool *op, unsigned int count) } /* Caller must hold write op_rwlock */ -int lov_ost_pool_extend(struct ost_pool *op, unsigned int min_count) +int lov_ost_pool_extend(struct lu_tgt_pool *op, unsigned int min_count) { int new_count; u32 *new; @@ -273,7 +273,7 @@ int lov_ost_pool_extend(struct ost_pool *op, unsigned int min_count) return 0; } -int lov_ost_pool_add(struct ost_pool *op, u32 idx, unsigned int min_count) +int lov_ost_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count) { int rc = 0, i; @@ -298,7 +298,7 @@ int lov_ost_pool_add(struct ost_pool *op, u32 idx, unsigned int min_count) return rc; } -int lov_ost_pool_remove(struct ost_pool *op, u32 idx) +int lov_ost_pool_remove(struct lu_tgt_pool *op, u32 idx) { int i; @@ -318,7 +318,7 @@ int lov_ost_pool_remove(struct ost_pool *op, u32 idx) return -EINVAL; } -int lov_ost_pool_free(struct ost_pool *op) +int lov_ost_pool_free(struct lu_tgt_pool *op) { if (op->op_size == 0) return 0; diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile index 5718a6d..9693a5e 100644 --- a/fs/lustre/obdclass/Makefile +++ b/fs/lustre/obdclass/Makefile @@ -8,4 +8,4 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \ lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \ obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \ cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \ - jobid.o integrity.o obd_cksum.o lu_qos.o lu_tgt_descs.o + jobid.o integrity.o obd_cksum.o lu_tgt_descs.o diff --git a/fs/lustre/obdclass/lu_qos.c b/fs/lustre/obdclass/lu_qos.c deleted file mode 100644 index 13ab4a7..0000000 --- a/fs/lustre/obdclass/lu_qos.c +++ /dev/null @@ -1,512 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * GPL HEADER START - * - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
- * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 only, - * as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License version 2 for more details (a copy is included - * in the LICENSE file that accompanied this code). - * - * You should have received a copy of the GNU General Public License - * version 2 along with this program; If not, see - * http://www.gnu.org/licenses/gpl-2.0.html - * - * GPL HEADER END - */ -/* - * This file is part of Lustre, http://www.lustre.org/ - * - * lustre/obdclass/lu_qos.c - * - * Lustre QoS. - * These are the only exported functions, they provide some generic - * infrastructure for object allocation QoS - * - */ - -#define DEBUG_SUBSYSTEM S_CLASS - -#include -#include -#include -#include -#include -#include -#include -#include - -void lu_qos_rr_init(struct lu_qos_rr *lqr) -{ - spin_lock_init(&lqr->lqr_alloc); - lqr->lqr_dirty = 1; -} -EXPORT_SYMBOL(lu_qos_rr_init); - -/** - * Add a new target to Quality of Service (QoS) target table. - * - * Add a new MDT/OST target to the structure representing an OSS. Resort the - * list of known MDSs/OSSs by the number of MDTs/OSTs attached to each MDS/OSS. - * The MDS/OSS list is protected internally and no external locking is required. 
- * - * @qos lu_qos data - * @ltd target description - * - * Return: 0 on success - * -ENOMEM on error - */ -int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd) -{ - struct lu_svr_qos *svr = NULL; - struct lu_svr_qos *tempsvr; - struct obd_export *exp = ltd->ltd_exp; - int found = 0; - u32 id = 0; - int rc = 0; - - down_write(&qos->lq_rw_sem); - /* - * a bit hacky approach to learn NID of corresponding connection - * but there is no official API to access information like this - * with OSD API. - */ - list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { - if (obd_uuid_equals(&svr->lsq_uuid, - &exp->exp_connection->c_remote_uuid)) { - found++; - break; - } - if (svr->lsq_id > id) - id = svr->lsq_id; - } - - if (!found) { - svr = kmalloc(sizeof(*svr), GFP_NOFS); - if (!svr) { - rc = -ENOMEM; - goto out; - } - memcpy(&svr->lsq_uuid, &exp->exp_connection->c_remote_uuid, - sizeof(svr->lsq_uuid)); - ++id; - svr->lsq_id = id; - } else { - /* Assume we have to move this one */ - list_del(&svr->lsq_svr_list); - } - - svr->lsq_tgt_count++; - ltd->ltd_qos.ltq_svr = svr; - - CDEBUG(D_OTHER, "add tgt %s to server %s (%d targets)\n", - obd_uuid2str(&ltd->ltd_uuid), obd_uuid2str(&svr->lsq_uuid), - svr->lsq_tgt_count); - - /* - * Add sorted by # of tgts. Find the first entry that we're - * bigger than... - */ - list_for_each_entry(tempsvr, &qos->lq_svr_list, lsq_svr_list) { - if (svr->lsq_tgt_count > tempsvr->lsq_tgt_count) - break; - } - /* - * ...and add before it. If we're the first or smallest, tempsvr - * points to the list head, and we add to the end. - */ - list_add_tail(&svr->lsq_svr_list, &tempsvr->lsq_svr_list); - - qos->lq_dirty = 1; - qos->lq_rr.lqr_dirty = 1; - -out: - up_write(&qos->lq_rw_sem); - return rc; -} -EXPORT_SYMBOL(lqos_add_tgt); - -/** - * Remove MDT/OST target from QoS table. - * - * Removes given MDT/OST target from QoS table and releases related - * MDS/OSS structure if no target remain on the MDS/OSS. 
- * - * @qos lu_qos data - * @ltd target description - * - * Return: 0 on success - * -ENOENT if no server was found - */ -int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd) -{ - struct lu_svr_qos *svr; - int rc = 0; - - down_write(&qos->lq_rw_sem); - svr = ltd->ltd_qos.ltq_svr; - if (!svr) { - rc = -ENOENT; - goto out; - } - - svr->lsq_tgt_count--; - if (svr->lsq_tgt_count == 0) { - CDEBUG(D_OTHER, "removing server %s\n", - obd_uuid2str(&svr->lsq_uuid)); - list_del(&svr->lsq_svr_list); - ltd->ltd_qos.ltq_svr = NULL; - kfree(svr); - } - - qos->lq_dirty = 1; - qos->lq_rr.lqr_dirty = 1; -out: - up_write(&qos->lq_rw_sem); - return rc; -} -EXPORT_SYMBOL(lqos_del_tgt); - -/** - * lu_prandom_u64_max - returns a pseudo-random u64 number in interval - * [0, ep_ro) - * - * #ep_ro right open interval endpoint - * - * Return: a pseudo-random 64-bit number that is in interval [0, ep_ro). - */ -u64 lu_prandom_u64_max(u64 ep_ro) -{ - u64 rand = 0; - - if (ep_ro) { -#if BITS_PER_LONG == 32 - /* - * If ep_ro > 32-bit, first generate the high - * 32 bits of the random number, then add in the low - * 32 bits (truncated to the upper limit, if needed) - */ - if (ep_ro > 0xffffffffULL) - rand = prandom_u32_max((u32)(ep_ro >> 32)) << 32; - - if (rand == (ep_ro & 0xffffffff00000000ULL)) - rand |= prandom_u32_max((u32)ep_ro); - else - rand |= prandom_u32(); -#else - rand = ((u64)prandom_u32() << 32 | prandom_u32()) % ep_ro; -#endif - } - - return rand; -} -EXPORT_SYMBOL(lu_prandom_u64_max); - -static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt) -{ - struct obd_statfs *statfs = &tgt->ltd_statfs; - - return statfs->os_bavail * statfs->os_bsize; -} - -static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt) -{ - return tgt->ltd_statfs.os_ffree; -} - -/** - * Calculate penalties per-tgt and per-server - * - * Re-calculate penalties when the configuration changes, active targets - * change and after statfs refresh (all these are reflected by lq_dirty flag). 
- * On every tgt and server: decay the penalty by half for every 8x the update - * interval that the device has been idle. That gives lots of time for the - * statfs information to be updated (which the penalty is only a proxy for), - * and avoids penalizing server/tgt under light load. - * See lqos_calc_weight() for how penalties are factored into the weight. - * - * @qos lu_qos - * @ltd lu_tgt_descs - * @active_tgt_nr active tgt number - * @ maxage qos max age - * @is_mdt MDT will count inode usage - * - * Return: 0 on success - * -EAGAIN the number of tgt isn't enough or all - * tgt spaces are almost the same - */ -int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd, - u32 active_tgt_nr, u32 maxage, bool is_mdt) -{ - struct lu_tgt_desc *tgt; - struct lu_svr_qos *svr; - u64 ba_max, ba_min, ba; - u64 ia_max, ia_min, ia = 1; - u32 num_active; - int prio_wide; - time64_t now, age; - int rc; - - if (!qos->lq_dirty) { - rc = 0; - goto out; - } - - num_active = active_tgt_nr - 1; - if (num_active < 1) { - rc = -EAGAIN; - goto out; - } - - /* find bavail on each server */ - list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { - svr->lsq_bavail = 0; - /* if inode is not counted, set to 1 to ignore */ - svr->lsq_iavail = is_mdt ? 0 : 1; - } - qos->lq_active_svr_count = 0; - - /* - * How badly user wants to select targets "widely" (not recently chosen - * and not on recent MDS's). As opposed to "freely" (free space avail.) 
- * 0-256 - */ - prio_wide = 256 - qos->lq_prio_free; - - ba_min = (u64)(-1); - ba_max = 0; - ia_min = (u64)(-1); - ia_max = 0; - now = ktime_get_real_seconds(); - - /* Calculate server penalty per object */ - ltd_foreach_tgt(ltd, tgt) { - if (!tgt->ltd_active) - continue; - - /* when inode is counted, bavail >> 16 to avoid overflow */ - ba = tgt_statfs_bavail(tgt); - if (is_mdt) - ba >>= 16; - else - ba >>= 8; - if (!ba) - continue; - - ba_min = min(ba, ba_min); - ba_max = max(ba, ba_max); - - /* Count the number of usable servers */ - if (tgt->ltd_qos.ltq_svr->lsq_bavail == 0) - qos->lq_active_svr_count++; - tgt->ltd_qos.ltq_svr->lsq_bavail += ba; - - if (is_mdt) { - /* iavail >> 8 to avoid overflow */ - ia = tgt_statfs_iavail(tgt) >> 8; - if (!ia) - continue; - - ia_min = min(ia, ia_min); - ia_max = max(ia, ia_max); - - tgt->ltd_qos.ltq_svr->lsq_iavail += ia; - } - - /* - * per-tgt penalty is - * prio * bavail * iavail / (num_tgt - 1) / 2 - */ - tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia >> 8; - do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active); - tgt->ltd_qos.ltq_penalty_per_obj >>= 1; - - age = (now - tgt->ltd_qos.ltq_used) >> 3; - if (qos->lq_reset || age > 32 * maxage) - tgt->ltd_qos.ltq_penalty = 0; - else if (age > maxage) - /* Decay tgt penalty. 
*/ - tgt->ltd_qos.ltq_penalty >>= (age / maxage); - } - - num_active = qos->lq_active_svr_count - 1; - if (num_active < 1) { - /* - * If there's only 1 server, we can't penalize it, so instead - * we have to double the tgt penalty - */ - num_active = 1; - ltd_foreach_tgt(ltd, tgt) { - if (!tgt->ltd_active) - continue; - - tgt->ltd_qos.ltq_penalty_per_obj <<= 1; - } - } - - /* - * Per-server penalty is - * prio * bavail * iavail / server_tgts / (num_svr - 1) / 2 - */ - list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { - ba = svr->lsq_bavail; - ia = svr->lsq_iavail; - svr->lsq_penalty_per_obj = prio_wide * ba * ia >> 8; - do_div(ba, svr->lsq_tgt_count * num_active); - svr->lsq_penalty_per_obj >>= 1; - - age = (now - svr->lsq_used) >> 3; - if (qos->lq_reset || age > 32 * maxage) - svr->lsq_penalty = 0; - else if (age > maxage) - /* Decay server penalty. */ - svr->lsq_penalty >>= age / maxage; - } - - qos->lq_dirty = 0; - qos->lq_reset = 0; - - /* - * If each tgt has almost same free space, do rr allocation for better - * creation performance - */ - qos->lq_same_space = 0; - if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min && - (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) { - qos->lq_same_space = 1; - /* Reset weights for the next time we enter qos mode */ - qos->lq_reset = 1; - } - rc = 0; - -out: - if (!rc && qos->lq_same_space) - return -EAGAIN; - - return rc; -} -EXPORT_SYMBOL(lqos_calc_penalties); - -bool lqos_is_usable(struct lu_qos *qos, u32 active_tgt_nr) -{ - if (!qos->lq_dirty && qos->lq_same_space) - return false; - - if (active_tgt_nr < 2) - return false; - - return true; -} -EXPORT_SYMBOL(lqos_is_usable); - -/** - * Calculate weight for a given tgt. - * - * The final tgt weight is bavail >> 16 * iavail >> 8 minus the tgt and server - * penalties. See lqos_calc_ppts() for how penalties are calculated. 
- * - * @tgt target descriptor - */ -void lqos_calc_weight(struct lu_tgt_desc *tgt) -{ - struct lu_tgt_qos *ltq = &tgt->ltd_qos; - u64 temp, temp2; - - temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8); - temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty; - if (temp < temp2) - ltq->ltq_weight = 0; - else - ltq->ltq_weight = temp - temp2; -} -EXPORT_SYMBOL(lqos_calc_weight); - -/** - * Re-calculate weights. - * - * The function is called when some target was used for a new object. In - * this case we should re-calculate all the weights to keep new allocations - * balanced well. - * - * @qos lu_qos - * @ltd lu_tgt_descs - * @tgt target where a new object was placed - * @active_tgt_nr active tgt number - * @total_wt new total weight for the pool - * - * Return: 0 - */ -int lqos_recalc_weight(struct lu_qos *qos, struct lu_tgt_descs *ltd, - struct lu_tgt_desc *tgt, u32 active_tgt_nr, - u64 *total_wt) -{ - struct lu_tgt_qos *ltq; - struct lu_svr_qos *svr; - - ltq = &tgt->ltd_qos; - LASSERT(ltq); - - /* Don't allocate on this device anymore, until the next alloc_qos */ - ltq->ltq_usable = 0; - - svr = ltq->ltq_svr; - - /* - * Decay old penalty by half (we're adding max penalty, and don't - * want it to run away.) 
- */ - ltq->ltq_penalty >>= 1; - svr->lsq_penalty >>= 1; - - /* mark the server and tgt as recently used */ - ltq->ltq_used = svr->lsq_used = ktime_get_real_seconds(); - - /* Set max penalties for this tgt and server */ - ltq->ltq_penalty += ltq->ltq_penalty_per_obj * active_tgt_nr; - svr->lsq_penalty += svr->lsq_penalty_per_obj * active_tgt_nr; - - /* Decrease all MDS penalties */ - list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { - if (svr->lsq_penalty < svr->lsq_penalty_per_obj) - svr->lsq_penalty = 0; - else - svr->lsq_penalty -= svr->lsq_penalty_per_obj; - } - - *total_wt = 0; - /* Decrease all tgt penalties */ - ltd_foreach_tgt(ltd, tgt) { - if (!tgt->ltd_active) - continue; - - if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj) - ltq->ltq_penalty = 0; - else - ltq->ltq_penalty -= ltq->ltq_penalty_per_obj; - - lqos_calc_weight(tgt); - - /* Recalc the total weight of usable osts */ - if (ltq->ltq_usable) - *total_wt += ltq->ltq_weight; - - CDEBUG(D_OTHER, - "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", - tgt->ltd_index, ltq->ltq_usable, - tgt_statfs_bavail(tgt) >> 10, - ltq->ltq_penalty_per_obj >> 10, - ltq->ltq_penalty >> 10, - ltq->ltq_svr->lsq_penalty_per_obj >> 10, - ltq->ltq_svr->lsq_penalty >> 10, - ltq->ltq_weight >> 10); - } - - return 0; -} -EXPORT_SYMBOL(lqos_recalc_weight); diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c index 04d6acc..60c50a0 100644 --- a/fs/lustre/obdclass/lu_tgt_descs.c +++ b/fs/lustre/obdclass/lu_tgt_descs.c @@ -35,6 +35,7 @@ #include #include +#include #include #include #include @@ -42,17 +43,221 @@ #include /** + * lu_prandom_u64_max - returns a pseudo-random u64 number in interval + * [0, ep_ro) + * + * @ep_ro right open interval endpoint + * + * Return: a pseudo-random 64-bit number that is in interval [0, ep_ro). 
+ */ +u64 lu_prandom_u64_max(u64 ep_ro) +{ + u64 rand = 0; + + if (ep_ro) { +#if BITS_PER_LONG == 32 + /* + * If ep_ro > 32-bit, first generate the high + * 32 bits of the random number, then add in the low + * 32 bits (truncated to the upper limit, if needed) + */ + if (ep_ro > 0xffffffffULL) + rand = prandom_u32_max((u32)(ep_ro >> 32)) << 32; + + if (rand == (ep_ro & 0xffffffff00000000ULL)) + rand |= prandom_u32_max((u32)ep_ro); + else + rand |= prandom_u32(); +#else + rand = ((u64)prandom_u32() << 32 | prandom_u32()) % ep_ro; +#endif + } + + return rand; +} +EXPORT_SYMBOL(lu_prandom_u64_max); + +void lu_qos_rr_init(struct lu_qos_rr *lqr) +{ + spin_lock_init(&lqr->lqr_alloc); + lqr->lqr_dirty = 1; +} +EXPORT_SYMBOL(lu_qos_rr_init); + +/** + * Add a new target to Quality of Service (QoS) target table. + * + * Add a new MDT/OST target to the structure representing an OSS. Resort the + * list of known MDSs/OSSs by the number of MDTs/OSTs attached to each MDS/OSS. + * The MDS/OSS list is protected internally and no external locking is required. + * + * @qos lu_qos data + * @tgt target description + * + * Return: 0 on success + * -ENOMEM on error + */ +int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *tgt) +{ + struct lu_svr_qos *svr = NULL; + struct lu_svr_qos *tempsvr; + struct obd_export *exp = tgt->ltd_exp; + int found = 0; + u32 id = 0; + int rc = 0; + + /* tgt not connected, this function will be called again later */ + if (!exp) + return 0; + + down_write(&qos->lq_rw_sem); + /* + * a bit hacky approach to learn NID of corresponding connection + * but there is no official API to access information like this + * with OSD API. 
+ */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + if (obd_uuid_equals(&svr->lsq_uuid, + &exp->exp_connection->c_remote_uuid)) { + found++; + break; + } + if (svr->lsq_id > id) + id = svr->lsq_id; + } + + if (!found) { + svr = kzalloc(sizeof(*svr), GFP_NOFS); + if (!svr) { + rc = -ENOMEM; + goto out; + } + memcpy(&svr->lsq_uuid, &exp->exp_connection->c_remote_uuid, + sizeof(svr->lsq_uuid)); + ++id; + svr->lsq_id = id; + } else { + /* Assume we have to move this one */ + list_del(&svr->lsq_svr_list); + } + + svr->lsq_tgt_count++; + tgt->ltd_qos.ltq_svr = svr; + + CDEBUG(D_OTHER, "add tgt %s to server %s (%d targets)\n", + obd_uuid2str(&tgt->ltd_uuid), obd_uuid2str(&svr->lsq_uuid), + svr->lsq_tgt_count); + + /* + * Add sorted by # of tgts. Find the first entry that we're + * bigger than... + */ + list_for_each_entry(tempsvr, &qos->lq_svr_list, lsq_svr_list) { + if (svr->lsq_tgt_count > tempsvr->lsq_tgt_count) + break; + } + /* + * ...and add before it. If we're the first or smallest, tempsvr + * points to the list head, and we add to the end. + */ + list_add_tail(&svr->lsq_svr_list, &tempsvr->lsq_svr_list); + + qos->lq_dirty = 1; + qos->lq_rr.lqr_dirty = 1; + +out: + up_write(&qos->lq_rw_sem); + return rc; +} +EXPORT_SYMBOL(lu_qos_add_tgt); + +/** + * Remove MDT/OST target from QoS table. + * + * Removes given MDT/OST target from QoS table and releases related + * MDS/OSS structure if no target remain on the MDS/OSS. 
+ * + * @qos lu_qos data + * @ltd target description + * + * Return: 0 on success + * -ENOENT if no server was found + */ +static int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd) +{ + struct lu_svr_qos *svr; + int rc = 0; + + down_write(&qos->lq_rw_sem); + svr = ltd->ltd_qos.ltq_svr; + if (!svr) { + rc = -ENOENT; + goto out; + } + + svr->lsq_tgt_count--; + if (svr->lsq_tgt_count == 0) { + CDEBUG(D_OTHER, "removing server %s\n", + obd_uuid2str(&svr->lsq_uuid)); + list_del(&svr->lsq_svr_list); + ltd->ltd_qos.ltq_svr = NULL; + kfree(svr); + } + + qos->lq_dirty = 1; + qos->lq_rr.lqr_dirty = 1; +out: + up_write(&qos->lq_rw_sem); + return rc; +} + +static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt) +{ + struct obd_statfs *statfs = &tgt->ltd_statfs; + + return statfs->os_bavail * statfs->os_bsize; +} + +static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt) +{ + return tgt->ltd_statfs.os_ffree; +} + +/** + * Calculate weight for a given tgt. + * + * The final tgt weight is bavail >> 16 * iavail >> 8 minus the tgt and server + * penalties. See ltd_qos_penalties_calc() for how penalties are calculated. + * + * @tgt target descriptor + */ +void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt) +{ + struct lu_tgt_qos *ltq = &tgt->ltd_qos; + u64 temp, temp2; + + temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8); + temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty; + if (temp < temp2) + ltq->ltq_weight = 0; + else + ltq->ltq_weight = temp - temp2; +} +EXPORT_SYMBOL(lu_tgt_qos_weight_calc); + +/** * Allocate and initialize target table. * * A helper function to initialize the target table and allocate * a bitmap of the available targets. 
 * * @ltd target's table to initialize + * @is_mdt target table for MDTs * * Return: 0 on success * negated errno on error **/ -int lu_tgt_descs_init(struct lu_tgt_descs *ltd) +int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt) { mutex_init(&ltd->ltd_mutex); init_rwsem(&ltd->ltd_rw_sem); @@ -66,11 +271,22 @@ int lu_tgt_descs_init(struct lu_tgt_descs *ltd) return -ENOMEM; ltd->ltd_tgts_size = BITS_PER_LONG; - ltd->ltd_tgtnr = 0; - ltd->ltd_death_row = 0; ltd->ltd_refcount = 0; + /* Set up allocation policy (QoS and RR) */ + INIT_LIST_HEAD(&ltd->ltd_qos.lq_svr_list); + init_rwsem(&ltd->ltd_qos.lq_rw_sem); + ltd->ltd_qos.lq_dirty = 1; + ltd->ltd_qos.lq_reset = 1; + /* Default priority is toward free space balance */ + ltd->ltd_qos.lq_prio_free = 232; + /* Default threshold for rr (roughly 17%) */ + ltd->ltd_qos.lq_threshold_rr = 43; + ltd->ltd_is_mdt = is_mdt; + + lu_qos_rr_init(&ltd->ltd_qos.lq_rr); + return 0; } EXPORT_SYMBOL(lu_tgt_descs_init); @@ -147,7 +363,7 @@ static int lu_tgt_descs_resize(struct lu_tgt_descs *ltd, u32 newsize) * -ENOMEM if reallocation failed * -EEXIST if target existed */ -int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt) +int ltd_add_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt) { u32 index = tgt->ltd_index; int rc; @@ -174,19 +390,294 @@ int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt) LTD_TGT(ltd, tgt->ltd_index) = tgt; set_bit(tgt->ltd_index, ltd->ltd_tgt_bitmap); - ltd->ltd_tgtnr++; + + ltd->ltd_lov_desc.ld_tgt_count++; + if (tgt->ltd_active) + ltd->ltd_lov_desc.ld_active_tgt_count++; return 0; } -EXPORT_SYMBOL(lu_tgt_descs_add); +EXPORT_SYMBOL(ltd_add_tgt); /** * Delete target from target table */ -void lu_tgt_descs_del(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt) +void ltd_del_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt) { + lu_qos_del_tgt(&ltd->ltd_qos, tgt); LTD_TGT(ltd, tgt->ltd_index) = NULL; clear_bit(tgt->ltd_index, ltd->ltd_tgt_bitmap); - ltd->ltd_tgtnr--; +
ltd->ltd_lov_desc.ld_tgt_count--; + if (tgt->ltd_active) + ltd->ltd_lov_desc.ld_active_tgt_count--; +} +EXPORT_SYMBOL(ltd_del_tgt); + +/** + * Whether QoS data is up-to-date and QoS can be applied. + */ +bool ltd_qos_is_usable(struct lu_tgt_descs *ltd) +{ + if (!ltd->ltd_qos.lq_dirty && ltd->ltd_qos.lq_same_space) + return false; + + if (ltd->ltd_lov_desc.ld_active_tgt_count < 2) + return false; + + return true; +} +EXPORT_SYMBOL(ltd_qos_is_usable); + +/** + * Calculate penalties per-tgt and per-server + * + * Re-calculate penalties when the configuration changes, active targets + * change and after statfs refresh (all these are reflected by lq_dirty flag). + * On every tgt and server: decay the penalty by half for every 8x the update + * interval that the device has been idle. That gives lots of time for the + * statfs information to be updated (which the penalty is only a proxy for), + * and avoids penalizing server/tgt under light load. + * See lu_tgt_qos_weight_calc() for how penalties are factored into the weight. + * + * \param[in] ltd lu_tgt_descs + * + * \retval 0 on success + * \retval -EAGAIN the number of tgt isn't enough or all tgt spaces are + * almost the same + */ +int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd) +{ + struct lu_qos *qos = &ltd->ltd_qos; + struct lov_desc *desc = &ltd->ltd_lov_desc; + struct lu_tgt_desc *tgt; + struct lu_svr_qos *svr; + u64 ba_max, ba_min, ba; + u64 ia_max, ia_min, ia = 1; + u32 num_active; + int prio_wide; + time64_t now, age; + int rc; + + if (!qos->lq_dirty) { + rc = 0; + goto out; + } + + num_active = desc->ld_active_tgt_count - 1; + if (num_active < 1) { + rc = -EAGAIN; + goto out; + } + + /* find bavail on each server */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + svr->lsq_bavail = 0; + /* if inode is not counted, set to 1 to ignore */ + svr->lsq_iavail = ltd->ltd_is_mdt ?
0 : 1; + } + qos->lq_active_svr_count = 0; + + /* + * How badly user wants to select targets "widely" (not recently chosen + * and not on recent MDS's). As opposed to "freely" (free space avail.) + * 0-256 + */ + prio_wide = 256 - qos->lq_prio_free; + + ba_min = (u64)(-1); + ba_max = 0; + ia_min = (u64)(-1); + ia_max = 0; + now = ktime_get_real_seconds(); + + /* Calculate server penalty per object */ + ltd_foreach_tgt(ltd, tgt) { + if (!tgt->ltd_active) + continue; + + /* when inode is counted, bavail >> 16 to avoid overflow */ + ba = tgt_statfs_bavail(tgt); + if (ltd->ltd_is_mdt) + ba >>= 16; + else + ba >>= 8; + if (!ba) + continue; + + ba_min = min(ba, ba_min); + ba_max = max(ba, ba_max); + + /* Count the number of usable servers */ + if (tgt->ltd_qos.ltq_svr->lsq_bavail == 0) + qos->lq_active_svr_count++; + tgt->ltd_qos.ltq_svr->lsq_bavail += ba; + + if (ltd->ltd_is_mdt) { + /* iavail >> 8 to avoid overflow */ + ia = tgt_statfs_iavail(tgt) >> 8; + if (!ia) + continue; + + ia_min = min(ia, ia_min); + ia_max = max(ia, ia_max); + + tgt->ltd_qos.ltq_svr->lsq_iavail += ia; + } + + /* + * per-tgt penalty is + * prio * bavail * iavail / (num_tgt - 1) / 2 + */ + tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia; + do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active); + tgt->ltd_qos.ltq_penalty_per_obj >>= 1; + + age = (now - tgt->ltd_qos.ltq_used) >> 3; + if (qos->lq_reset || age > 32 * desc->ld_qos_maxage) + tgt->ltd_qos.ltq_penalty = 0; + else if (age > desc->ld_qos_maxage) + /* Decay tgt penalty. 
*/ + tgt->ltd_qos.ltq_penalty >>= age / desc->ld_qos_maxage; + } + + num_active = qos->lq_active_svr_count - 1; + if (num_active < 1) { + /* + * If there's only 1 server, we can't penalize it, so instead + * we have to double the tgt penalty + */ + num_active = 1; + ltd_foreach_tgt(ltd, tgt) { + if (!tgt->ltd_active) + continue; + + tgt->ltd_qos.ltq_penalty_per_obj <<= 1; + } + } + + /* + * Per-server penalty is + * prio * bavail * iavail / server_tgts / (num_svr - 1) / 2 + */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + ba = svr->lsq_bavail; + ia = svr->lsq_iavail; + svr->lsq_penalty_per_obj = prio_wide * ba * ia; + do_div(svr->lsq_penalty_per_obj, svr->lsq_tgt_count * num_active); + svr->lsq_penalty_per_obj >>= 1; + + age = (now - svr->lsq_used) >> 3; + if (qos->lq_reset || age > 32 * desc->ld_qos_maxage) + svr->lsq_penalty = 0; + else if (age > desc->ld_qos_maxage) + /* Decay server penalty. */ + svr->lsq_penalty >>= age / desc->ld_qos_maxage; + } + + qos->lq_dirty = 0; + qos->lq_reset = 0; + + /* + * If each tgt has almost same free space, do rr allocation for better + * creation performance + */ + qos->lq_same_space = 0; + if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min && + (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) { + qos->lq_same_space = 1; + /* Reset weights for the next time we enter qos mode */ + qos->lq_reset = 1; + } + rc = 0; + +out: + if (!rc && qos->lq_same_space) + return -EAGAIN; + + return rc; +} +EXPORT_SYMBOL(ltd_qos_penalties_calc); + +/** + * Re-calculate penalties and weights of all tgts. + * + * The function is called when some target was used for a new object. In + * this case we should re-calculate all the weights to keep new allocations + * balanced well.
+ * + * \param[in] ltd lu_tgt_descs + * \param[in] tgt recently used tgt + * \param[out] total_wt new total weight for the pool + * + * \retval 0 + */ +int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt, + u64 *total_wt) +{ + struct lu_qos *qos = &ltd->ltd_qos; + struct lu_tgt_qos *ltq; + struct lu_svr_qos *svr; + + ltq = &tgt->ltd_qos; + LASSERT(ltq); + + /* Don't allocate on this device anymore, until the next alloc_qos */ + ltq->ltq_usable = 0; + + svr = ltq->ltq_svr; + + /* + * Decay old penalty by half (we're adding max penalty, and don't + * want it to run away.) + */ + ltq->ltq_penalty >>= 1; + svr->lsq_penalty >>= 1; + + /* mark the server and tgt as recently used */ + ltq->ltq_used = svr->lsq_used = ktime_get_real_seconds(); + + /* Set max penalties for this tgt and server */ + ltq->ltq_penalty += ltq->ltq_penalty_per_obj * + ltd->ltd_lov_desc.ld_active_tgt_count; + svr->lsq_penalty += svr->lsq_penalty_per_obj * + ltd->ltd_lov_desc.ld_active_tgt_count; + + /* Decrease all MDS penalties */ + list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { + if (svr->lsq_penalty < svr->lsq_penalty_per_obj) + svr->lsq_penalty = 0; + else + svr->lsq_penalty -= svr->lsq_penalty_per_obj; + } + + *total_wt = 0; + /* Decrease all tgt penalties */ + ltd_foreach_tgt(ltd, tgt) { + if (!tgt->ltd_active) + continue; + + if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj) + ltq->ltq_penalty = 0; + else + ltq->ltq_penalty -= ltq->ltq_penalty_per_obj; + + lu_tgt_qos_weight_calc(tgt); + + /* Recalc the total weight of usable osts */ + if (ltq->ltq_usable) + *total_wt += ltq->ltq_weight; + + CDEBUG(D_OTHER, + "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", + tgt->ltd_index, ltq->ltq_usable, + tgt_statfs_bavail(tgt) >> 10, + ltq->ltq_penalty_per_obj >> 10, + ltq->ltq_penalty >> 10, + ltq->ltq_svr->lsq_penalty_per_obj >> 10, + ltq->ltq_svr->lsq_penalty >> 10, + ltq->ltq_weight >> 10); + } + + return 0; }
-EXPORT_SYMBOL(lu_tgt_descs_del); +EXPORT_SYMBOL(ltd_qos_update); From patchwork Thu Feb 27 21:16:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410763 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7478D924 for ; Thu, 27 Feb 2020 21:46:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5CA4F24690 for ; Thu, 27 Feb 2020 21:46:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5CA4F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB263349874; Thu, 27 Feb 2020 13:36:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0B39321FCD8 for ; Thu, 27 Feb 2020 13:20:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 38A779193; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 37885468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:14 -0500 Message-Id: <1582838290-17243-507-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 506/622] lustre: ptlrpc: Properly swab ll_fiemap_info_key X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin It was using lustre_swab_fiemap which is incorrect since the structures don't match. Added lustre_swab_fiemap_info_key that swabs embedded obdo and ll_fiemap_info_key structures. WC-bug-id: https://jira.whamcloud.com/browse/LU-11997 Lustre-commit: 2b905746ee3b ("LU-11997 ptlrpc: Properly swab ll_fiemap_info_key") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/36308 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Signed-off-by: James Simmons --- fs/lustre/include/lustre_swab.h | 1 + fs/lustre/ptlrpc/layout.c | 4 ++-- fs/lustre/ptlrpc/pack_generic.c | 17 ++++++++++++++--- 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/lustre/include/lustre_swab.h b/fs/lustre/include/lustre_swab.h index dd3c50c..a5c1de5 100644 --- a/fs/lustre/include/lustre_swab.h +++ b/fs/lustre/include/lustre_swab.h @@ -81,6 +81,7 @@ void lustre_swab_ost_body(struct ost_body *b); void lustre_swab_ost_last_id(u64 *id); void lustre_swab_fiemap(struct fiemap *fiemap); +void lustre_swab_fiemap_info_key(struct ll_fiemap_info_key *fiemap_info); void lustre_swab_lov_user_md_v1(struct lov_user_md_v1 *lum); void lustre_swab_lov_user_md_v3(struct lov_user_md_v3 *lum); void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index dd04eee..06db86d 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -1134,8 
+1134,8 @@ struct req_msg_field RMF_OST_ID = EXPORT_SYMBOL(RMF_OST_ID); struct req_msg_field RMF_FIEMAP_KEY = - DEFINE_MSGF("fiemap", 0, sizeof(struct ll_fiemap_info_key), - lustre_swab_fiemap, NULL); + DEFINE_MSGF("fiemap_key", 0, sizeof(struct ll_fiemap_info_key), + lustre_swab_fiemap_info_key, NULL); EXPORT_SYMBOL(RMF_FIEMAP_KEY); struct req_msg_field RMF_FIEMAP_VAL = diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 9b28624..b569d57 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -1913,21 +1913,32 @@ static void lustre_swab_fiemap_extent(struct fiemap_extent *fm_extent) __swab32s(&fm_extent->fe_device); } -void lustre_swab_fiemap(struct fiemap *fiemap) +static void lustre_swab_fiemap_hdr(struct fiemap *fiemap) { - u32 i; - __swab64s(&fiemap->fm_start); __swab64s(&fiemap->fm_length); __swab32s(&fiemap->fm_flags); __swab32s(&fiemap->fm_mapped_extents); __swab32s(&fiemap->fm_extent_count); __swab32s(&fiemap->fm_reserved); +} + +void lustre_swab_fiemap(struct fiemap *fiemap) +{ + u32 i; + + lustre_swab_fiemap_hdr(fiemap); for (i = 0; i < fiemap->fm_mapped_extents; i++) lustre_swab_fiemap_extent(&fiemap->fm_extents[i]); } +void lustre_swab_fiemap_info_key(struct ll_fiemap_info_key *fiemap_info) +{ + lustre_swab_obdo(&fiemap_info->lfik_oa); + lustre_swab_fiemap_hdr(&fiemap_info->lfik_fiemap); +} + void lustre_swab_mdt_rec_reint (struct mdt_rec_reint *rr) { __swab32s(&rr->rr_opcode); From patchwork Thu Feb 27 21:16:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410911 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4CAAC1580 for ; Thu, 27 Feb 2020 21:50:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with 
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3485D24692 for ; Thu, 27 Feb 2020 21:50:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3485D24692 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3B5223491A5; Thu, 27 Feb 2020 13:43:43 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6423021FCD8 for ; Thu, 27 Feb 2020 13:20:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3B3559194; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3A5A646A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:15 -0500 Message-Id: <1582838290-17243-508-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 507/622] lustre: llite: clear flock when using localflock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger When mounting a client with "-o localflock" or equivalent option in /etc/fstab, it does not clear out the "flock" mount option flag from the superblock. This results in "flock" still being the option used and it displays both options in the /proc/mounts output: 10.0.0.1@o2ib:/lfs on /mnt/lfs type lustre (rw,flock,localflock) Mount a client with both "flock,localflock" as mount options and verify that the "flock" option is cleared by "localflock", and vice versa. Verify that "noflock" clears both options. Remove the "remount_client()" helper in conf-sanity.sh, since this shadows a helper function of the same name in test-framework.sh and is confusing. Instead, use "mount_client()" now that it can accept mount options, and just pass "remount" explicitly in a few places. Fixes: 083c51418b67 ("lustre: llite: enable flock mount option by default") WC-bug-id: https://jira.whamcloud.com/browse/LU-12859 Lustre-commit: 22ee4a1f64ec ("LU-12859 llite: clear flock when using localflock") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/36452 Reviewed-by: Ben Evans Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 4580be3..49490ee 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -823,12 +823,12 @@ static int ll_options(char *options, struct ll_sb_info *sbi) } tmp = ll_set_opt("flock", s1, LL_SBI_FLOCK); if (tmp) { - *flags |= tmp; + *flags = (*flags & ~LL_SBI_LOCALFLOCK) | tmp; goto next; } tmp = ll_set_opt("localflock", s1, LL_SBI_LOCALFLOCK); if (tmp) { - *flags |= tmp; + *flags = (*flags & ~LL_SBI_FLOCK) | tmp; 
goto next; } tmp = ll_set_opt("noflock", s1, @@ -2672,11 +2672,16 @@ int ll_show_options(struct seq_file *seq, struct dentry *dentry) if (sbi->ll_flags & LL_SBI_NOLCK) seq_puts(seq, ",nolock"); + /* "flock" is the default since 2.13, but it wasn't for many years, + * so it is still useful to print this to show it is enabled. + * Start to print "noflock" so it is now clear when flock is disabled. + */ if (sbi->ll_flags & LL_SBI_FLOCK) seq_puts(seq, ",flock"); - - if (sbi->ll_flags & LL_SBI_LOCALFLOCK) + else if (sbi->ll_flags & LL_SBI_LOCALFLOCK) seq_puts(seq, ",localflock"); + else + seq_puts(seq, ",noflock"); if (sbi->ll_flags & LL_SBI_USER_XATTR) seq_puts(seq, ",user_xattr"); From patchwork Thu Feb 27 21:16:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410891 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DA5581580 for ; Thu, 27 Feb 2020 21:50:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C1AD124690 for ; Thu, 27 Feb 2020 21:50:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C1AD124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CC20734AA80; Thu, 27 Feb 2020 13:41:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov 
[160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A62CD34887B for ; Thu, 27 Feb 2020 13:20:56 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3E2D39195; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3D1CB46C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:16 -0500 Message-Id: <1582838290-17243-509-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 508/622] lustre: sec: reserve flags for client side encryption X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson Reserve OBD_CONNECT2_ENC connection flag so that 'encrypt' or 'test_dummy_encryption' client mount options can only be used if server side knows how to handle encrypted object size properly. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12275 Lustre-commit: 4f9632f97011 ("LU-12275 sec: reserve flags for client side encryption") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/36360 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 8 ++++---- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index ca169ec..98d1e3b 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -126,6 +126,7 @@ "pcc", /* 0x1000 */ "plain_layout", /* 0x2000 */ "async_discard", /* 0x4000 */ + "client_encryption", /* 0x8000 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index c0b4ad9..da51dc1 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1160,6 +1160,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_PCC); LASSERTF(OBD_CONNECT2_ASYNC_DISCARD == 0x4000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_ASYNC_DISCARD); + LASSERTF(OBD_CONNECT2_ENCRYPT == 0x8000ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_ENCRYPT); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index d4b29d8..4277ac6 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -813,15 +813,15 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_ASYNC_DISCARD 0x4000ULL /* support async DoM data * discard */ - +#define OBD_CONNECT2_ENCRYPT 0x8000ULL /* client-to-disk encrypt */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same * flag value is not in use on some other branch. 
Please clear any such * changes with senior engineers before starting to use a new flag. Then, * submit a small patch against EVERY branch that ONLY adds the new flag, - * updates obd_connect_names[] for lprocfs_rd_connect_flags(), adds the - * flag to check_obd_connect_data(), and updates wiretests accordingly, so it - * can be approved and landed easily to reserve the flag for future use. + * updates obd_connect_names[], adds the flag to check_obd_connect_data(), + * and updates wiretests accordingly, so it can be approved and landed easily + * to reserve the flag for future use. */ /* The MNE_SWAB flag is overloading the MDS_MDS bit only for the MGS From patchwork Thu Feb 27 21:16:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410767 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 44D121580 for ; Thu, 27 Feb 2020 21:46:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2CA9624690 for ; Thu, 27 Feb 2020 21:46:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2CA9624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3F4EF34B1EB; Thu, 27 Feb 2020 13:36:47 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 07E78348887 for ; Thu, 27 Feb 2020 13:20:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 40E369196; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3FDA846D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:17 -0500 Message-Id: <1582838290-17243-510-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 509/622] lustre: llite: limit max xattr size by kernel value X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Limit the maximum xattr size returned to userspace from the MDS to what the currently-running kernel supports (XATTR_SIZE_MAX=65536 bytes typically). While it is possible a Lustre backing filesystem may store larger xattrs than this, it wouldn't be possible for users to access a larger xattr via kernel xattr interfaces. This fixes interop problems when newer clients and tests are running against older servers: sanity.sh: line 8946: /usr/bin/setfattr: Argument list too long Skip subtests for new features in 2.13 so 2.12 interop testing passes. Fix test-framework.sh::large_xattr_enabled() to return true for ZFS. Fix test-framework.sh::max_xattr_size() to return the actual value returned from the MDS rather than computing it locally. 
Fixes: 4c9f501e6d5 ("lustre: osd: Set max ea size to XATTR_SIZE_MAX") WC-bug-id: https://jira.whamcloud.com/browse/LU-12784 Lustre-commit: 84097792f56c ("LU-12784 llite: limit max xattr size by kernel value") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/36240 Reviewed-by: Wang Shilong Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/llite/lproc_llite.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index c2ec3fb..439c096 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -925,7 +925,9 @@ static ssize_t max_easize_show(struct kobject *kobj, if (rc) return rc; - return sprintf(buf, "%u\n", ealen); + /* Limit xattr size returned to userspace based on kernel maximum */ + return snprintf(buf, PAGE_SIZE, "%u\n", + ealen > XATTR_SIZE_MAX ? XATTR_SIZE_MAX : ealen); } LUSTRE_RO_ATTR(max_easize); @@ -954,7 +956,9 @@ static ssize_t default_easize_show(struct kobject *kobj, if (rc) return rc; - return sprintf(buf, "%u\n", ealen); + /* Limit xattr size returned to userspace based on kernel maximum */ + return snprintf(buf, PAGE_SIZE, "%u\n", + ealen > XATTR_SIZE_MAX ? 
XATTR_SIZE_MAX : ealen); } /** From patchwork Thu Feb 27 21:16:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410779 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20F801580 for ; Thu, 27 Feb 2020 21:46:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 08F3324690 for ; Thu, 27 Feb 2020 21:46:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 08F3324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 13E1134B27B; Thu, 27 Feb 2020 13:36:58 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 49427348887 for ; Thu, 27 Feb 2020 13:20:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 439BD9197; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4281247C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:18 -0500 Message-Id: <1582838290-17243-511-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 510/622] lustre: ptlrpc: return proper error code X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Return a proper error code from ptlrpc_disconnect_prep_req() using ERR_PTR(), as the callers expect. Fixes: 4b102da53ad ("lustre: ptlrpc: idle connections can disconnect") WC-bug-id: https://jira.whamcloud.com/browse/LU-12799 Lustre-commit: 9e2620d75cce ("LU-12799 ptlrpc: return proper error code") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/36282 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Signed-off-by: James Simmons --- fs/lustre/ptlrpc/import.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index c4a732d..76a40be 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1571,7 +1571,7 @@ static struct ptlrpc_request *ptlrpc_disconnect_prep_req(struct obd_import *imp) req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_DISCONNECT, LUSTRE_OBD_VERSION, rq_opc); if (!req) - return NULL; + return ERR_PTR(-ENOMEM); /* We are disconnecting, do not retry a failed DISCONNECT rpc if * it fails.
We can get through the above with a down server From patchwork Thu Feb 27 21:16:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410857 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A4E61580 for ; Thu, 27 Feb 2020 21:48:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E6EB524690 for ; Thu, 27 Feb 2020 21:48:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E6EB524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 529CA34A49A; Thu, 27 Feb 2020 13:39:00 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8AF83348887 for ; Thu, 27 Feb 2020 13:20:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 47AC69198; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 45387468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:19 -0500 Message-Id: <1582838290-17243-512-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 511/622] lnet: fix peer_ni selection X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When selecting a peer-ni we must use the same peer NID through all the messages which belong to the same RPC. This is necessary in order to ensure we do the RDMA over the optimal interface. WC-bug-id: https://jira.whamcloud.com/browse/LU-12893 Lustre-commit: 94ee26738884 ("LU-12893 lnet: fix peer_ni selection") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/36552 Reviewed-by: Chris Horn Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 6da0be4..b8278ad 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1710,8 +1710,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * Local Destination * MR Peer * - * Run the selection algorithm on the peer NIs unless we're sending - * a response, in this case just send to the destination + * Don't run the selection algorithm on the peer NIs. By specifying the + * local NID, we're also saying that we should always use the destination NID + * provided. This handles the case where we should be using the same + * destination NID for the all the messages which belong to the same RPC + * request. 
*/ static int lnet_handle_spec_local_mr_dst(struct lnet_send_data *sd) @@ -1724,16 +1727,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return -EINVAL; } - /* only run the selection algorithm to pick the peer_ni if we're - * sending a GET or a PUT. Responses are sent to the same - * destination NID provided. - */ - if (!(sd->sd_send_case & SND_RESP)) { - sd->sd_best_lpni = - lnet_find_best_lpni_on_net(sd, sd->sd_peer, - sd->sd_best_ni->ni_net->net_id); - } - if (sd->sd_best_lpni && sd->sd_best_lpni->lpni_nid == the_lnet.ln_loni->ni_nid) return lnet_handle_lo_send(sd); From patchwork Thu Feb 27 21:16:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410909 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1A379924 for ; Thu, 27 Feb 2020 21:50:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 02B5C24695 for ; Thu, 27 Feb 2020 21:50:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 02B5C24695 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EA2E9348F44; Thu, 27 Feb 2020 13:43:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 
CDBD4348895 for ; Thu, 27 Feb 2020 13:20:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 494D59199; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 483D146A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:20 -0500 Message-Id: <1582838290-17243-513-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 512/622] lustre: pcc: Auto attach for PCC during IO X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin PCC uses the layout lock to protect cache validity. Currently PCC only supports auto attach at the next open. However, the layout lock can be revoked at any time by LRU or manual lock shrinking, or by a lock conflict callback. For example, the layout lock can be revoked while performing I/O after the file has been opened. When that happens, the cached file is detached involuntarily, and I/O originally directed to PCC is redirected to the OSTs once the data has been restored into the OST objects. This unwanted behavior can be expensive. To avoid the problem, this patch implements auto attach for PCC during I/O as well (not only at open time). For debugging purposes, there are now three auto attach options: - open_attach: auto attach at the next open; - io_attach: auto attach during I/O; - stat_attach: auto attach at the stat() call.
The stat_attach option exists because checking the PCC state via "lfs pcc state" not only opens the file but also calls stat() on it; to verify auto attach during I/O, both open_attach and stat_attach must be disabled. All of these auto attach options are enabled by default. This patch also fixes a bug in auto cache at create time: in current Lustre, the truncate operation revokes the LOOKUP ibits lock and the file's dentry cache is invalidated. A following open with the O_CREAT flag calls into ->atomic_open, where the file is wrongly thought to be newly created and auto caching is attempted. So once the client knows that it is not a DISP_OPEN_CREATE, it should clean up the already created PCC copy. WC-bug-id: https://jira.whamcloud.com/browse/LU-12526 Lustre-commit: a120bb135257 ("LU-12526 pcc: Auto attach for PCC during IO") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/36005 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/namei.c | 43 +++++---- fs/lustre/llite/pcc.c | 157 ++++++++++++++++++++++++++------ fs/lustre/llite/pcc.h | 45 +++++++-- include/uapi/linux/lustre/lustre_idl.h | 1 + include/uapi/linux/lustre/lustre_user.h | 8 ++ 5 files changed, 199 insertions(+), 55 deletions(-) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index ce72910..f4ca16e 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -696,11 +696,6 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, return rc; } -struct pcc_create_attach { - struct pcc_dataset *pca_dataset; - struct dentry *pca_dentry; -}; - static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, struct lookup_intent *it, void **secctx, u32 *secctxlen, @@ -950,8 +945,7 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, u32 secctxlen = 0; struct dentry *de; struct ll_sb_info *sbi; - struct
pcc_create_attach pca = {NULL, NULL}; - struct pcc_dataset *dataset = NULL; + struct pcc_create_attach pca = { NULL, NULL }; int rc = 0; CDEBUG(D_VFSTRACE, @@ -988,6 +982,7 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, if (!filename_is_volatile(dentry->d_name.name, dentry->d_name.len, NULL)) { struct pcc_matcher item; + struct pcc_dataset *dataset; item.pm_uid = from_kuid(&init_user_ns, current_uid()); item.pm_gid = from_kgid(&init_user_ns, current_gid()); @@ -1020,18 +1015,30 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, dput(de); goto out_release; } - if (dataset && dentry->d_inode) { - rc = pcc_inode_create_fini(dataset, - dentry->d_inode, - pca.pca_dentry); - if (rc) { - if (de) - dput(de); - goto out_release; - } + + rc = pcc_inode_create_fini(dentry->d_inode, &pca); + if (rc) { + if (de) + dput(de); + goto out_release; } file->f_mode |= FMODE_CREATED; + } else { + /* Open the file with O_CREAT, but the file already + * existed on MDT. This may happened in the case that + * the LOOKUP ibits lock is revoked and the + * corresponding dentry cache is deleted. + * i.e. In the current Lustre, the truncate operation + * will revoke the LOOKUP ibits lock, and the file + * dentry cache will be invalidated. The following open + * with O_CREAT flag will call into ->atomic_open, the + * file was wrongly though as newly created file and + * try to auto cache the file. So after client knows it + * is not a DISP_OPEN_CREATE, it should cleanup the + * already created PCC copy. 
+ */ + pcc_create_attach_cleanup(dir->i_sb, &pca); } if (d_really_is_positive(dentry) && @@ -1055,11 +1062,11 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, } else { rc = finish_no_open(file, de); } + } else { + pcc_create_attach_cleanup(dir->i_sb, &pca); } out_release: - if (dataset) - pcc_dataset_put(dataset); ll_intent_release(it); kfree(it); diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index c8c2442..b926f87 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -472,12 +472,30 @@ static int pcc_id_parse(struct pcc_cmd *cmd, const char *id) if (id <= 0) return -EINVAL; cmd->u.pccc_add.pccc_roid = id; + } else if (strcmp(key, "auto_attach") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id == 0) + cmd->u.pccc_add.pccc_flags &= ~PCC_DATASET_AUTO_ATTACH; } else if (strcmp(key, "open_attach") == 0) { rc = kstrtoul(val, 10, &id); if (rc) return rc; - if (id > 0) - cmd->u.pccc_add.pccc_flags |= PCC_DATASET_OPEN_ATTACH; + if (id == 0) + cmd->u.pccc_add.pccc_flags &= ~PCC_DATASET_OPEN_ATTACH; + } else if (strcmp(key, "io_attach") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id == 0) + cmd->u.pccc_add.pccc_flags &= ~PCC_DATASET_IO_ATTACH; + } else if (strcmp(key, "stat_attach") == 0) { + rc = kstrtoul(val, 10, &id); + if (rc) + return rc; + if (id == 0) + cmd->u.pccc_add.pccc_flags &= ~PCC_DATASET_STAT_ATTACH; } else if (strcmp(key, "rwpcc") == 0) { rc = kstrtoul(val, 10, &id); if (rc) @@ -504,6 +522,18 @@ static int pcc_id_parse(struct pcc_cmd *cmd, const char *id) char *token; int rc; + switch (cmd->pccc_cmd) { + case PCC_ADD_DATASET: + /* Enable auto attach by default */ + cmd->u.pccc_add.pccc_flags |= PCC_DATASET_AUTO_ATTACH; + break; + case PCC_DEL_DATASET: + case PCC_CLEAR_ALL: + break; + default: + return -EINVAL; + } + val = buffer; while (val && strlen(val) != 0) { token = strsep(&val, " "); @@ -1002,7 +1032,6 @@ static void pcc_inode_init(struct pcc_inode *pcci, struct 
ll_inode_info *lli) { pcci->pcci_lli = lli; lli->lli_pcc_inode = pcci; - lli->lli_pcc_state = PCC_STATE_FL_NONE; atomic_set(&pcci->pcci_refcount, 0); pcci->pcci_type = LU_PCC_NONE; pcci->pcci_layout_gen = CL_LAYOUT_GEN_NONE; @@ -1072,9 +1101,9 @@ void pcc_file_init(struct pcc_file *pccf) pccf->pccf_type = LU_PCC_NONE; } -static inline bool pcc_open_attach_enabled(struct pcc_dataset *dataset) +static inline bool pcc_auto_attach_enabled(struct pcc_dataset *dataset) { - return dataset->pccd_flags & PCC_DATASET_OPEN_ATTACH; + return dataset->pccd_flags & PCC_DATASET_AUTO_ATTACH; } static const char pcc_xattr_layout[] = XATTR_USER_PREFIX "PCC.layout"; @@ -1085,7 +1114,7 @@ static int pcc_layout_xattr_set(struct pcc_inode *pcci, u32 gen) struct ll_inode_info *lli = pcci->pcci_lli; int rc; - if (!(lli->lli_pcc_state & PCC_STATE_FL_OPEN_ATTACH)) + if (!(lli->lli_pcc_state & PCC_STATE_FL_AUTO_ATTACH)) return 0; rc = __vfs_setxattr(pcc_dentry, pcc_dentry->d_inode, pcc_xattr_layout, @@ -1137,6 +1166,8 @@ static void pcc_inode_attach_init(struct pcc_dataset *dataset, struct dentry *dentry, enum lu_pcc_type type) { + struct ll_inode_info *lli = pcci->pcci_lli; + pcci->pcci_path.mnt = mntget(dataset->pccd_path.mnt); pcci->pcci_path.dentry = dentry; LASSERT(atomic_read(&pcci->pcci_refcount) == 0); @@ -1144,11 +1175,12 @@ static void pcc_inode_attach_init(struct pcc_dataset *dataset, pcci->pcci_type = type; pcci->pcci_attr_valid = false; - if (pcc_open_attach_enabled(dataset)) { - struct ll_inode_info *lli = pcci->pcci_lli; - + if (dataset->pccd_flags & PCC_DATASET_OPEN_ATTACH) lli->lli_pcc_state |= PCC_STATE_FL_OPEN_ATTACH; - } + if (dataset->pccd_flags & PCC_DATASET_IO_ATTACH) + lli->lli_pcc_state |= PCC_STATE_FL_IO_ATTACH; + if (dataset->pccd_flags & PCC_DATASET_STAT_ATTACH) + lli->lli_pcc_state |= PCC_STATE_FL_STAT_ATTACH; } static inline void pcc_layout_gen_set(struct pcc_inode *pcci, @@ -1252,7 +1284,7 @@ static int pcc_try_datasets_attach(struct inode *inode, u32 gen, 
down_read(&super->pccs_rw_sem); list_for_each_entry_safe(dataset, tmp, &super->pccs_datasets, pccd_linkage) { - if (!pcc_open_attach_enabled(dataset)) + if (!pcc_auto_attach_enabled(dataset)) continue; rc = pcc_try_dataset_attach(inode, gen, type, dataset, cached); if (rc < 0 || (!rc && *cached)) @@ -1263,13 +1295,15 @@ static int pcc_try_datasets_attach(struct inode *inode, u32 gen, return rc; } -static int pcc_try_open_attach(struct inode *inode, bool *cached) +static int pcc_try_auto_attach(struct inode *inode, bool *cached, bool is_open) { struct pcc_super *super = &ll_i2sbi(inode)->ll_pcc_super; struct cl_layout clt = { .cl_layout_gen = 0, .cl_is_released = false, }; + struct ll_inode_info *lli = ll_i2info(inode); + u32 gen; int rc; /* @@ -1283,13 +1317,25 @@ static int pcc_try_open_attach(struct inode *inode, bool *cached) * obtain valid layout lock from MDT (i.e. the file is being * HSM restoring). */ - if (ll_layout_version_get(ll_i2info(inode)) == CL_LAYOUT_GEN_NONE) - return 0; + if (is_open) { + if (ll_layout_version_get(lli) == CL_LAYOUT_GEN_NONE) + return 0; + } else { + rc = ll_layout_refresh(inode, &gen); + if (rc) + return rc; + } rc = pcc_get_layout_info(inode, &clt); if (rc) return rc; + if (!is_open && gen != clt.cl_layout_gen) { + CDEBUG(D_CACHE, DFID" layout changed from %d to %d.\n", + PFID(ll_inode2fid(inode)), gen, clt.cl_layout_gen); + return -EINVAL; + } + if (clt.cl_is_released) rc = pcc_try_datasets_attach(inode, clt.cl_layout_gen, LU_PCC_READWRITE, cached); @@ -1319,7 +1365,9 @@ int pcc_file_open(struct inode *inode, struct file *file) goto out_unlock; if (!pcci || !pcc_inode_has_layout(pcci)) { - rc = pcc_try_open_attach(inode, &cached); + if (lli->lli_pcc_state & PCC_STATE_FL_OPEN_ATTACH) + rc = pcc_try_auto_attach(inode, &cached, true); + if (rc < 0 || !cached) goto out_unlock; @@ -1379,8 +1427,9 @@ void pcc_file_release(struct inode *inode, struct file *file) pcc_inode_unlock(inode); } -static void pcc_io_init(struct inode *inode, 
bool *cached) +static void pcc_io_init(struct inode *inode, enum pcc_io_type iot, bool *cached) { + struct ll_inode_info *lli = ll_i2info(inode); struct pcc_inode *pcci; pcc_inode_lock(inode); @@ -1391,6 +1440,17 @@ static void pcc_io_init(struct inode *inode, bool *cached) *cached = true; } else { *cached = false; + if ((lli->lli_pcc_state & PCC_STATE_FL_IO_ATTACH && + iot != PIT_GETATTR) || + (iot == PIT_GETATTR && + lli->lli_pcc_state & PCC_STATE_FL_STAT_ATTACH)) { + (void) pcc_try_auto_attach(inode, cached, false); + if (*cached) { + pcci = ll_i2pcci(inode); + LASSERT(atomic_read(&pcci->pcci_refcount) > 0); + atomic_inc(&pcci->pcci_active_ios); + } + } } pcc_inode_unlock(inode); } @@ -1418,7 +1478,7 @@ ssize_t pcc_file_read_iter(struct kiocb *iocb, return 0; } - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_READ, cached); if (!*cached) return 0; @@ -1453,7 +1513,7 @@ ssize_t pcc_file_write_iter(struct kiocb *iocb, return -EAGAIN; } - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_WRITE, cached); if (!*cached) return 0; @@ -1489,7 +1549,7 @@ int pcc_inode_setattr(struct inode *inode, struct iattr *attr, return 0; } - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_SETATTR, cached); if (!*cached) return 0; @@ -1523,7 +1583,7 @@ int pcc_inode_getattr(struct inode *inode, bool *cached) return 0; } - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_GETATTR, cached); if (!*cached) return 0; @@ -1585,7 +1645,7 @@ ssize_t pcc_file_splice_read(struct file *in_file, loff_t *ppos, if (!file_inode(pcc_file)->i_fop->splice_read) return -ENOTSUPP; - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_SPLICE_READ, cached); if (!*cached) return 0; @@ -1610,7 +1670,7 @@ int pcc_fsync(struct file *file, loff_t start, loff_t end, return 0; } - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_FSYNC, cached); if (!*cached) return 0; @@ -1716,7 +1776,7 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, CDEBUG(D_MMAP, "%s: PCC backend
fs not support ->page_mkwrite()\n", ll_i2sbi(inode)->ll_fsname); - pcc_ioctl_detach(inode, PCC_DETACH_OPT_NONE); + pcc_ioctl_detach(inode, PCC_DETACH_OPT_UNCACHE); up_read(&mm->mmap_sem); *cached = true; return VM_FAULT_RETRY | VM_FAULT_NOPAGE; @@ -1724,7 +1784,7 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, /* Pause to allow for a race with concurrent detach */ OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_PCC_MKWRITE_PAUSE, cfs_fail_val); - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_PAGE_MKWRITE, cached); if (!*cached) { /* This happens when the file is detached from PCC after got * the fault page via ->fault() on the inode of the PCC copy. @@ -1757,7 +1817,7 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, */ if (OBD_FAIL_CHECK(OBD_FAIL_LLITE_PCC_DETACH_MKWRITE)) { pcc_io_fini(inode); - pcc_ioctl_detach(inode, PCC_DETACH_OPT_NONE); + pcc_ioctl_detach(inode, PCC_DETACH_OPT_UNCACHE); up_read(&mm->mmap_sem); return VM_FAULT_RETRY | VM_FAULT_NOPAGE; } @@ -1785,7 +1845,7 @@ int pcc_fault(struct vm_area_struct *vma, struct vm_fault *vmf, return 0; } - pcc_io_init(inode, cached); + pcc_io_init(inode, PIT_FAULT, cached); if (!*cached) return 0; @@ -1993,13 +2053,21 @@ int pcc_inode_create(struct super_block *sb, struct pcc_dataset *dataset, return rc; } -int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, - struct dentry *pcc_dentry) +int pcc_inode_create_fini(struct inode *inode, struct pcc_create_attach *pca) { + struct dentry *pcc_dentry = pca->pca_dentry; const struct cred *old_cred; struct pcc_inode *pcci; int rc = 0; + if (!pca->pca_dataset) + return 0; + + if (!inode) + goto out_dataset_put; + + LASSERT(pcc_dentry); + old_cred = override_creds(pcc_super_cred(inode->i_sb)); pcc_inode_lock(inode); LASSERT(!ll_i2pcci(inode)); @@ -2015,7 +2083,8 @@ int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, goto out_put; pcc_inode_init(pcci, ll_i2info(inode)); - 
pcc_inode_attach_init(dataset, pcci, pcc_dentry, LU_PCC_READWRITE); + pcc_inode_attach_init(pca->pca_dataset, pcci, pcc_dentry, + LU_PCC_READWRITE); rc = pcc_layout_xattr_set(pcci, 0); if (rc) { @@ -2038,9 +2107,36 @@ int pcc_inode_create_fini(struct pcc_dataset *dataset, struct inode *inode, pcc_inode_unlock(inode); revert_creds(old_cred); +out_dataset_put: + pcc_dataset_put(pca->pca_dataset); return rc; } +void pcc_create_attach_cleanup(struct super_block *sb, + struct pcc_create_attach *pca) +{ + if (!pca->pca_dataset) + return; + + if (pca->pca_dentry) { + const struct cred *old_cred; + int rc; + + old_cred = override_creds(pcc_super_cred(sb)); + rc = vfs_unlink(pca->pca_dentry->d_parent->d_inode, + pca->pca_dentry, NULL); + if (rc) + CWARN("failed to unlink PCC file %.*s, rc = %d\n", + pca->pca_dentry->d_name.len, + pca->pca_dentry->d_name.name, rc); + /* ignore the unlink failure */ + revert_creds(old_cred); + dput(pca->pca_dentry); + } + + pcc_dataset_put(pca->pca_dataset); +} + static int pcc_filp_write(struct file *filp, const void *buf, ssize_t count, loff_t *offset) { @@ -2202,7 +2298,6 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, old_cred = override_creds(pcc_super_cred(inode->i_sb)); pcc_inode_lock(inode); pcci = ll_i2pcci(inode); - lli->lli_pcc_state &= ~PCC_STATE_FL_ATTACHING; if (rc || lease_broken) { if (attached && pcci) pcc_inode_put(pcci); @@ -2221,6 +2316,7 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, if (rc) goto out_put; + LASSERT(lli->lli_pcc_state & PCC_STATE_FL_ATTACHING); rc = ll_layout_refresh(inode, &gen2); if (!rc) { if (gen2 == gen) { @@ -2240,6 +2336,7 @@ int pcc_readwrite_attach_fini(struct file *file, struct inode *inode, pcc_inode_put(pcci); } out_unlock: + lli->lli_pcc_state &= ~PCC_STATE_FL_ATTACHING; pcc_inode_unlock(inode); revert_creds(old_cred); return rc; diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index c00cb0b..a221ef6 100644 --- 
a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -93,12 +93,19 @@ struct pcc_matcher { enum pcc_dataset_flags { PCC_DATASET_NONE = 0x0, - /* Try auto attach at open, disabled by default */ - PCC_DATASET_OPEN_ATTACH = 0x1, + /* Try auto attach at open, enabled by default */ + PCC_DATASET_OPEN_ATTACH = 0x01, + /* Try auto attach during IO when layout refresh, enabled by default */ + PCC_DATASET_IO_ATTACH = 0x02, + /* Try auto attach at stat */ + PCC_DATASET_STAT_ATTACH = 0x04, + PCC_DATASET_AUTO_ATTACH = PCC_DATASET_OPEN_ATTACH | + PCC_DATASET_IO_ATTACH | + PCC_DATASET_STAT_ATTACH, /* PCC backend is only used for RW-PCC */ - PCC_DATASET_RWPCC = 0x2, + PCC_DATASET_RWPCC = 0x08, /* PCC backend is only used for RO-PCC */ - PCC_DATASET_ROPCC = 0x4, + PCC_DATASET_ROPCC = 0x10, /* PCC backend provides caching services for both RW-PCC and RO-PCC */ PCC_DATASET_PCC_ALL = PCC_DATASET_RWPCC | PCC_DATASET_ROPCC, }; @@ -154,6 +161,25 @@ struct pcc_file { enum lu_pcc_type pccf_type; }; +enum pcc_io_type { + /* read system call */ + PIT_READ = 1, + /* write system call */ + PIT_WRITE, + /* truncate, utime system calls */ + PIT_SETATTR, + /* stat system call */ + PIT_GETATTR, + /* mmap write handling */ + PIT_PAGE_MKWRITE, + /* page fault handling */ + PIT_FAULT, + /* fsync system call handling */ + PIT_FSYNC, + /* splice_read system call */ + PIT_SPLICE_READ +}; + enum pcc_cmd_type { PCC_ADD_DATASET = 0, PCC_DEL_DATASET, @@ -177,6 +203,11 @@ struct pcc_cmd { } u; }; +struct pcc_create_attach { + struct pcc_dataset *pca_dataset; + struct dentry *pca_dentry; +}; + int pcc_super_init(struct pcc_super *super); void pcc_super_fini(struct pcc_super *super); int pcc_cmd_handle(char *buffer, unsigned long count, @@ -212,12 +243,12 @@ int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, bool *cached); int pcc_inode_create(struct super_block *sb, struct pcc_dataset *dataset, struct lu_fid *fid, struct dentry **pcc_dentry); -int pcc_inode_create_fini(struct 
pcc_dataset *dataset, struct inode *inode, - struct dentry *pcc_dentry); +int pcc_inode_create_fini(struct inode *inode, struct pcc_create_attach *pca); +void pcc_create_attach_cleanup(struct super_block *sb, + struct pcc_create_attach *pca); struct pcc_dataset *pcc_dataset_match_get(struct pcc_super *super, struct pcc_matcher *matcher); void pcc_dataset_put(struct pcc_dataset *dataset); void pcc_inode_free(struct inode *inode); void pcc_layout_invalidate(struct inode *inode); - #endif /* LLITE_PCC_H */ diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 4277ac6..a74d979 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1723,6 +1723,7 @@ enum mds_op_bias { MDS_CLOSE_LAYOUT_SPLIT = 1 << 17, MDS_TRUNC_KEEP_LEASE = 1 << 18, MDS_PCC_ATTACH = 1 << 19, + MDS_CLOSE_UPDATE_TIMES = 1 << 20, }; #define MDS_CLOSE_INTENT (MDS_HSM_RELEASE | MDS_CLOSE_LAYOUT_SWAP | \ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 06a691b..2178666 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -2180,6 +2180,14 @@ enum lu_pcc_state_flags { PCC_STATE_FL_ATTACHING = 0x02, /* Allow to auto attach at open */ PCC_STATE_FL_OPEN_ATTACH = 0x04, + /* Allow to auto attach during I/O after layout lock revocation */ + PCC_STATE_FL_IO_ATTACH = 0x08, + /* Allow to auto attach at stat */ + PCC_STATE_FL_STAT_ATTACH = 0x10, + /* Allow to auto attach at the next open or layout refresh */ + PCC_STATE_FL_AUTO_ATTACH = PCC_STATE_FL_OPEN_ATTACH | + PCC_STATE_FL_IO_ATTACH | + PCC_STATE_FL_STAT_ATTACH, }; struct lu_pcc_state { From patchwork Thu Feb 27 21:16:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410913 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9D25924 for ; Thu, 27 Feb 2020 21:50:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A1B2E24692 for ; Thu, 27 Feb 2020 21:50:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A1B2E24692 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 45CCB34989D; Thu, 27 Feb 2020 13:43:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2E84434889B for ; Thu, 27 Feb 2020 13:20:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4C856919A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4B09C46C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:21 -0500 Message-Id: <1582838290-17243-514-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 513/622] lustre: lmv: alloc dir stripes by QoS X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Similar to file OST object allocation, introduce directory stripe allocation by space usage, but they don't share the same code because of the many differences between them: file has mirrors, PFL, object precreation; while for directory, the first stripe is always on the same MDT where its master object is on. The changes include: * add lod_mdt_alloc_qos() to allocate stripes by space/inode usage. * add lod_mdt_alloc_rr() to allocate stripes round-robin. * add lod_mdt_alloc_specific() to allocate stripes in the old way. * add sysfs support for lmv_desc field in LOD structure, and move those remain in procfs to sysfs. This patch also changes LMV QoS code: * mkdir by QoS if user mkdir by command 'lfs mkdir -i -1 ...', or the parent directory default LMV starting MDT index is -1. * with the above change, 'space' hash flag is useless, remove all related code. * previously 'lfs mkdir -i -1' QoS code is in lfs_setdirstripe(), but now it's done in LMV, remove the old code. Update sanity 413a 413b to support QoS mkdir of both plain and striped directories. Update lfs-setdirstripe man to reflect the changes. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12624 Lustre-commit: c1d0a355a6a6 ("LU-12624 lod: alloc dir stripes by QoS") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/35825 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_lmv.h | 12 -- fs/lustre/lmv/lmv_intent.c | 16 +- fs/lustre/lmv/lmv_internal.h | 4 +- fs/lustre/lmv/lmv_obd.c | 279 ++++++++++++++++---------------- fs/lustre/obdclass/lu_tgt_descs.c | 17 +- fs/lustre/ptlrpc/wiretest.c | 1 - include/uapi/linux/lustre/lustre_user.h | 10 +- 7 files changed, 154 insertions(+), 185 deletions(-) diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h index b33a6ed..a538559 100644 --- a/fs/lustre/include/lustre_lmv.h +++ b/fs/lustre/include/lustre_lmv.h @@ -55,12 +55,6 @@ struct lmv_stripe_md { struct lmv_oinfo lsm_md_oinfo[0]; }; -static inline bool lmv_is_known_hash_type(u32 type) -{ - return (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_FNV_1A_64 || - (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_ALL_CHARS; -} - static inline bool lmv_dir_striped(const struct lmv_stripe_md *lsm) { return lsm && lsm->lsm_md_magic == LMV_MAGIC; @@ -89,12 +83,6 @@ static inline bool lmv_dir_bad_hash(const struct lmv_stripe_md *lsm) return !lmv_is_known_hash_type(lsm->lsm_md_hash_type); } -/* NB, this is checking directory default LMV */ -static inline bool lmv_dir_qos_mkdir(const struct lmv_stripe_md *lsm) -{ - return lsm && (lsm->lsm_md_hash_type & LMV_HASH_FLAG_SPACE); -} - static inline bool lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2) { diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 542b16d..ca9bbe8 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -306,22 +306,10 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data, /* * open(O_CREAT | O_EXCL) needs to check * existing name, which should 
be done on both - * old and new layout, to avoid creating new - * file under old layout, check old layout on + * old and new layout, check old layout on * client side. */ - tgt = lmv_locate_tgt(lmv, op_data); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); - - rc = md_getattr_name(tgt->ltd_exp, op_data, - reqp); - if (!rc) { - ptlrpc_req_finished(*reqp); - *reqp = NULL; - return -EEXIST; - } - + rc = lmv_migrate_existence_check(lmv, op_data); if (rc != -ENOENT) return rc; diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index 70d86676..e23eb37 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -49,7 +49,6 @@ int lmv_intent_lock(struct obd_export *exp, struct md_op_data *op_data, u64 extra_lock_flags); int lmv_fld_lookup(struct lmv_obd *lmv, const struct lu_fid *fid, u32 *mds); -int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds); int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp, struct lu_fid *fid, struct md_op_data *op_data); @@ -217,8 +216,9 @@ static inline bool lmv_dir_retry_check_update(struct md_op_data *op_data) struct lmv_tgt_desc *lmv_locate_tgt(struct lmv_obd *lmv, struct md_op_data *op_data); +int lmv_migrate_existence_check(struct lmv_obd *lmv, + struct md_op_data *op_data); /* lproc_lmv.c */ int lmv_tunables_init(struct obd_device *obd); - #endif diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 84be905..e92be25 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1045,106 +1045,36 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp, return rc; } -/** - * This is _inode_ placement policy function (not name). 
- */ -static u32 lmv_placement_policy(struct obd_device *obd, - struct md_op_data *op_data) +int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp, + struct lu_fid *fid, struct md_op_data *op_data) { + struct obd_device *obd = class_exp2obd(exp); struct lmv_obd *lmv = &obd->u.lmv; - struct lmv_user_md *lum; - u32 mdt; - - if (lmv->lmv_mdt_count == 1) - return 0; - - lum = op_data->op_data; - /* - * Choose MDT by - * 1. See if the stripe offset is specified by lum. - * 2. If parent has default LMV, and its hash type is "space", choose - * MDT with QoS. (see lmv_locate_tgt_qos()). - * 3. Then check if default LMV stripe offset is not -1. - * 4. Finally choose MDS by name hash if the parent - * is striped directory. (see lmv_locate_tgt()). - * - * presently explicit MDT location is not supported - * for foreign dirs (as it can't be embedded into free - * format LMV, like with lum_stripe_offset), so we only - * rely on default stripe offset or then name hashing. - */ - if (op_data->op_cli_flags & CLI_SET_MEA && lum && - le32_to_cpu(lum->lum_magic != LMV_MAGIC_FOREIGN) && - le32_to_cpu(lum->lum_stripe_offset) != (u32)-1) { - mdt = le32_to_cpu(lum->lum_stripe_offset); - } else if (op_data->op_code == LUSTRE_OPC_MKDIR && - !lmv_dir_striped(op_data->op_mea1) && - lmv_dir_qos_mkdir(op_data->op_default_mea1)) { - mdt = op_data->op_mds; - } else if (op_data->op_code == LUSTRE_OPC_MKDIR && - op_data->op_default_mea1 && - op_data->op_default_mea1->lsm_md_master_mdt_index != - (u32)-1) { - mdt = op_data->op_default_mea1->lsm_md_master_mdt_index; - op_data->op_mds = mdt; - } else { - mdt = op_data->op_mds; - } - - return mdt; -} - -int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds) -{ struct lmv_tgt_desc *tgt; int rc; - tgt = lmv_tgt(lmv, mds); + LASSERT(op_data); + LASSERT(fid); + + tgt = lmv_tgt(lmv, op_data->op_mds); if (!tgt) return -ENODEV; + if (!tgt->ltd_active || !tgt->ltd_exp) + return -ENODEV; + /* * New seq alloc and FLD setup should be 
atomic. Otherwise we may find * on server that seq in new allocated fid is not yet known. */ mutex_lock(&tgt->ltd_fid_mutex); - - if (tgt->ltd_active == 0 || !tgt->ltd_exp) { - rc = -ENODEV; - goto out; - } - - /* - * Asking underlaying tgt layer to allocate new fid. - */ rc = obd_fid_alloc(NULL, tgt->ltd_exp, fid, NULL); + mutex_unlock(&tgt->ltd_fid_mutex); if (rc > 0) { LASSERT(fid_is_sane(fid)); rc = 0; } -out: - mutex_unlock(&tgt->ltd_fid_mutex); - return rc; -} - -int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp, - struct lu_fid *fid, struct md_op_data *op_data) -{ - struct obd_device *obd = class_exp2obd(exp); - struct lmv_obd *lmv = &obd->u.lmv; - u32 mds; - int rc; - - LASSERT(op_data); - LASSERT(fid); - - mds = lmv_placement_policy(obd, op_data); - - rc = __lmv_fid_alloc(lmv, fid, mds); - if (rc) - CERROR("Can't alloc new fid, rc %d\n", rc); - return rc; } @@ -1624,8 +1554,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt) * which is set outside, and if dir is migrating, 'op_data->op_post_migrate' * indicates whether old or new layout is used to locate. * - * For plain direcotry, normally it will locate MDT by FID, but if this - * directory has default LMV, and its hash type is "space", locate MDT with QoS. + * For plain direcotry, it just locate the MDT of op_data->op_fid1. * * @lmv: LMV device * @op_data: client MD stack parameters, name, namelen @@ -1650,7 +1579,7 @@ struct lmv_tgt_desc * * ct_restore(). 
*/ if (op_data->op_bias & MDS_CREATE_VOLATILE && - (int)op_data->op_mds != -1) { + op_data->op_mds != LMV_OFFSET_DEFAULT) { tgt = lmv_tgt(lmv, op_data->op_mds); if (!tgt) return ERR_PTR(-ENODEV); @@ -1679,30 +1608,7 @@ struct lmv_tgt_desc * tgt = lmv_tgt(lmv, oinfo->lmo_mds); if (!tgt) - tgt = ERR_PTR(-ENODEV); - } else if (op_data->op_code == LUSTRE_OPC_MKDIR && - lmv_dir_qos_mkdir(op_data->op_default_mea1) && - !lmv_dir_striped(lsm)) { - tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds); - if (tgt == ERR_PTR(-EAGAIN)) - tgt = lmv_locate_tgt_rr(lmv, &op_data->op_mds); - /* - * only update statfs when mkdir under dir with "space" hash, - * this means the cached statfs may be stale, and current mkdir - * may not follow QoS accurately, but it's not serious, and it - * avoids periodic statfs when client doesn't mkdir under - * "space" hashed directories. - * - * TODO: after MDT support QoS object allocation, also update - * statfs for 'lfs mkdir -i -1 ...", currently it's done in user - * space. - */ - if (!IS_ERR(tgt)) { - struct obd_device *obd; - - obd = container_of(lmv, struct obd_device, u.lmv); - lmv_statfs_check_update(obd, tgt); - } + return ERR_PTR(-ENODEV); } else { tgt = lmv_locate_tgt_by_name(lmv, op_data->op_mea1, op_data->op_name, op_data->op_namelen, @@ -1755,6 +1661,78 @@ struct lmv_tgt_desc * &op_data->op_mds, true); } +int lmv_migrate_existence_check(struct lmv_obd *lmv, struct md_op_data *op_data) +{ + struct lu_tgt_desc *tgt; + struct ptlrpc_request *request; + int rc; + + LASSERT(lmv_dir_migrating(op_data->op_mea1)); + + tgt = lmv_locate_tgt(lmv, op_data); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + + rc = md_getattr_name(tgt->ltd_exp, op_data, &request); + if (!rc) { + ptlrpc_req_finished(request); + return -EEXIST; + } + + return rc; +} + +/* mkdir by QoS in two cases: + * 1. 'lfs mkdir -i -1' + * 2. 
parent default LMV master_mdt_index is -1 + * + * NB, mkdir by QoS only if parent is not striped, this is to avoid remote + * directories under striped directory. + */ +static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data) +{ + const struct lmv_stripe_md *lsm = op_data->op_default_mea1; + const struct lmv_user_md *lum = op_data->op_data; + + if (op_data->op_code != LUSTRE_OPC_MKDIR) + return false; + + if (lmv_dir_striped(op_data->op_mea1)) + return false; + + if (op_data->op_cli_flags & CLI_SET_MEA && lum && + (le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC || + le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC_SPECIFIC) && + le32_to_cpu(lum->lum_stripe_offset) == LMV_OFFSET_DEFAULT) + return true; + + if (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT) + return true; + + return false; +} + +/* 'lfs mkdir -i ' */ +static inline bool lmv_op_user_specific_mkdir(const struct md_op_data *op_data) +{ + const struct lmv_user_md *lum = op_data->op_data; + + return op_data->op_code == LUSTRE_OPC_MKDIR && + op_data->op_cli_flags & CLI_SET_MEA && lum && + (le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC || + le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC_SPECIFIC) && + le32_to_cpu(lum->lum_stripe_offset) != LMV_OFFSET_DEFAULT; +} + +/* parent default LMV master_mdt_index is not -1. 
*/ +static inline bool +lmv_op_default_specific_mkdir(const struct md_op_data *op_data) +{ + return op_data->op_code == LUSTRE_OPC_MKDIR && + op_data->op_default_mea1 && + op_data->op_default_mea1->lsm_md_master_mdt_index != + LMV_OFFSET_DEFAULT; +} int lmv_create(struct obd_export *exp, struct md_op_data *op_data, const void *data, size_t datalen, umode_t mode, uid_t uid, gid_t gid, kernel_cap_t cap_effective, u64 rdev, @@ -1774,20 +1752,9 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, if (lmv_dir_migrating(op_data->op_mea1)) { /* * if parent is migrating, create() needs to lookup existing - * name, to avoid creating new file under old layout of - * migrating directory, check old layout here. + * name in both old and new layout, check old layout on client. */ - tgt = lmv_locate_tgt(lmv, op_data); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); - - rc = md_getattr_name(tgt->ltd_exp, op_data, request); - if (!rc) { - ptlrpc_req_finished(*request); - *request = NULL; - return -EEXIST; - } - + rc = lmv_migrate_existence_check(lmv, op_data); if (rc != -ENOENT) return rc; @@ -1798,28 +1765,44 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, if (IS_ERR(tgt)) return PTR_ERR(tgt); - CDEBUG(D_INODE, "CREATE name '%.*s' on " DFID " -> mds #%x\n", - (int)op_data->op_namelen, op_data->op_name, - PFID(&op_data->op_fid1), op_data->op_mds); - - rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data); - if (rc) - return rc; - - if (exp_connect_flags(exp) & OBD_CONNECT_DIR_STRIPE) { + if (lmv_op_qos_mkdir(op_data)) { + tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds); + if (tgt == ERR_PTR(-EAGAIN)) + tgt = lmv_locate_tgt_rr(lmv, &op_data->op_mds); /* - * Send the create request to the MDT where the object - * will be located + * only update statfs after QoS mkdir, this means the cached + * statfs may be stale, and current mkdir may not follow QoS + * accurately, but it's not serious, and avoids periodic statfs + * when client doesn't mkdir by 
QoS. */ - tgt = lmv_fid2tgt(lmv, &op_data->op_fid2); - if (IS_ERR(tgt)) - return PTR_ERR(tgt); + if (!IS_ERR(tgt)) + lmv_statfs_check_update(obd, tgt); + } else if (lmv_op_user_specific_mkdir(op_data)) { + struct lmv_user_md *lum = op_data->op_data; - op_data->op_mds = tgt->ltd_index; + op_data->op_mds = le32_to_cpu(lum->lum_stripe_offset); + tgt = lmv_tgt(lmv, op_data->op_mds); + if (!tgt) + return -ENODEV; + } else if (lmv_op_default_specific_mkdir(op_data)) { + op_data->op_mds = + op_data->op_default_mea1->lsm_md_master_mdt_index; + tgt = lmv_tgt(lmv, op_data->op_mds); + if (!tgt) + return -ENODEV; } - CDEBUG(D_INODE, "CREATE obj " DFID " -> mds #%x\n", - PFID(&op_data->op_fid1), op_data->op_mds); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + + rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data); + if (rc) + return rc; + + CDEBUG(D_INODE, "CREATE name '%.*s' "DFID" on " DFID " -> mds #%x\n", + (int)op_data->op_namelen, op_data->op_name, + PFID(&op_data->op_fid2), PFID(&op_data->op_fid1), + op_data->op_mds); op_data->op_flags |= MF_MDC_CANCEL_FID1; rc = md_create(tgt->ltd_exp, op_data, data, datalen, mode, uid, gid, @@ -2063,10 +2046,20 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data, if (IS_ERR(child_tgt)) return PTR_ERR(child_tgt); - if (!S_ISDIR(op_data->op_mode) && tp_tgt) - rc = __lmv_fid_alloc(lmv, &target_fid, tp_tgt->ltd_index); - else - rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data); + /* for directory, migrate to MDT specified by lum_stripe_offset; + * otherwise migrate to the target stripe of parent, but parent + * directory may have finished migration (normally current file too), + * allocate FID on MDT lum_stripe_offset, and server will check + * whether file was migrated already. 
+ */ + if (S_ISDIR(op_data->op_mode) || !tp_tgt) { + struct lmv_user_md *lum = op_data->op_data; + + op_data->op_mds = le32_to_cpu(lum->lum_stripe_offset); + } else { + op_data->op_mds = tp_tgt->ltd_index; + } + rc = lmv_fid_alloc(NULL, exp, &target_fid, op_data); if (rc) return rc; @@ -3071,7 +3064,7 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm, * set default value -1, so lmv_locate_tgt() knows this stripe * target is not initialized. */ - lsm->lsm_md_oinfo[i].lmo_mds = (u32)-1; + lsm->lsm_md_oinfo[i].lmo_mds = LMV_OFFSET_DEFAULT; if (!fid_is_sane(&lsm->lsm_md_oinfo[i].lmo_fid)) continue; diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c index 60c50a0..5a141ce 100644 --- a/fs/lustre/obdclass/lu_tgt_descs.c +++ b/fs/lustre/obdclass/lu_tgt_descs.c @@ -106,10 +106,6 @@ int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *tgt) u32 id = 0; int rc = 0; - /* tgt not connected, this function will be called again later */ - if (!exp) - return 0; - down_write(&qos->lq_rw_sem); /* * a bit hacky approach to learn NID of corresponding connection @@ -528,7 +524,7 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd) * per-tgt penalty is * prio * bavail * iavail / (num_tgt - 1) / 2 */ - tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia; + tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia >> 8; do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active); tgt->ltd_qos.ltq_penalty_per_obj >>= 1; @@ -562,8 +558,9 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd) list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { ba = svr->lsq_bavail; ia = svr->lsq_iavail; - svr->lsq_penalty_per_obj = prio_wide * ba * ia; - do_div(ba, svr->lsq_tgt_count * num_active); + svr->lsq_penalty_per_obj = prio_wide * ba * ia >> 8; + do_div(svr->lsq_penalty_per_obj, + svr->lsq_tgt_count * num_active); svr->lsq_penalty_per_obj >>= 1; age = (now - svr->lsq_used) >> 3; @@ -656,6 +653,7 @@ int ltd_qos_update(struct 
lu_tgt_descs *ltd, struct lu_tgt_desc *tgt, if (!tgt->ltd_active) continue; + ltq = &tgt->ltd_qos; if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj) ltq->ltq_penalty = 0; else @@ -668,9 +666,10 @@ int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt, *total_wt += ltq->ltq_weight; CDEBUG(D_OTHER, - "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", + "recalc tgt %d usable=%d bavail=%llu ffree=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n", tgt->ltd_index, ltq->ltq_usable, - tgt_statfs_bavail(tgt) >> 10, + tgt_statfs_bavail(tgt) >> 16, + tgt_statfs_iavail(tgt) >> 8, ltq->ltq_penalty_per_obj >> 10, ltq->ltq_penalty >> 10, ltq->ltq_svr->lsq_penalty_per_obj >> 10, diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index da51dc1..671878d 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1663,7 +1663,6 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(LMV_MAGIC_V1 != 0x0CD20CD0); BUILD_BUG_ON(LMV_MAGIC_STRIPE != 0x0CD40CD0); BUILD_BUG_ON(LMV_HASH_TYPE_MASK != 0x0000ffff); - BUILD_BUG_ON(LMV_HASH_FLAG_SPACE != 0x08000000); BUILD_BUG_ON(LMV_HASH_FLAG_MIGRATION != 0x80000000); /* Checks for struct obd_statfs */ diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 2178666..b46f52b 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -429,6 +429,7 @@ static inline bool lov_pattern_supported_normal_comp(__u32 pattern) #define LOV_MAXPOOLNAME 15 #define LOV_POOLNAMEF "%.15s" #define LOV_OFFSET_DEFAULT ((__u16)-1) +#define LMV_OFFSET_DEFAULT ((__u32)-1) #define LOV_MIN_STRIPE_BITS 16 /* maximum PAGE_SIZE (ia64), power of 2 */ #define LOV_MIN_STRIPE_SIZE (1 << LOV_MIN_STRIPE_BITS) @@ -687,10 +688,11 @@ enum lmv_hash_type { */ #define LMV_HASH_TYPE_MASK 0x0000ffff -/* once this is set on a plain directory default layout, newly created - * subdirectories will be 
distributed on all MDTs by space usage. - */ -#define LMV_HASH_FLAG_SPACE 0x08000000 +static inline bool lmv_is_known_hash_type(__u32 type) +{ + return (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_FNV_1A_64 || + (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_ALL_CHARS; +} /* The striped directory has ever lost its master LMV EA, then LFSCK * re-generated it. This flag is used to indicate such case. It is an From patchwork Thu Feb 27 21:16:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7742517E0 for ; Thu, 27 Feb 2020 21:39:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 603B224690 for ; Thu, 27 Feb 2020 21:39:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 603B224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EF018349440; Thu, 27 Feb 2020 13:32:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8871934889B for ; Thu, 27 Feb 2020 13:20:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4EE17919B; Thu, 27 Feb 2020 
16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4DD3F46D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:22 -0500 Message-Id: <1582838290-17243-515-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 514/622] lustre: llite: Don't clear d_fsdata in ll_release() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: NeilBrown

The whole point of using call_rcu() is that some code might still be accessing the dentry (e.g. lockless lookup), so the dentry cannot be freed until the end of the grace period. As lockless lookup can access d_fsdata -- ll_dcompare calls d_lustre_invalid() -- we also mustn't clear d_fsdata before the end of the grace period.

We don't need to clear it at all: by the time it is freed, the inode will no longer be accessed.

Fixes: 7126bc2e8d60c ("lustre: switch to use of ->d_init()") Signed-off-by: NeilBrown Reviewed-by: James Simmons --- fs/lustre/llite/dcache.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lustre/llite/dcache.c b/fs/lustre/llite/dcache.c index 2dfe12a..3230d32 100644 --- a/fs/lustre/llite/dcache.c +++ b/fs/lustre/llite/dcache.c @@ -63,7 +63,6 @@ static void ll_release(struct dentry *de) kfree(lld->lld_it); } - de->d_fsdata = NULL; call_rcu(&lld->lld_rcu_head, free_dentry_data); } From patchwork Thu Feb 27 21:16:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410503 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E9C217E0 for ; Thu, 27 Feb 2020 21:40:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 677EB24690 for ; Thu, 27 Feb 2020 21:40:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 677EB24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 469C734A698; Thu, 27 Feb 2020 13:32:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CE8353488A2 for ; Thu, 27 Feb 2020 13:20:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) 
by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 514FD919C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5076F47C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:23 -0500 Message-Id: <1582838290-17243-516-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 515/622] lustre: llite: move agl_thread cleanup out of thread. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: NeilBrown

When we start a thread with kthread_create() and later stop it with kthread_stop(), there is no guarantee that the thread function runs at all. It is therefore not safe to leave cleanup to the thread.

So move the cleanup code to a separate function which stops the thread and then cleans up.

Fixes: c044fb0f835c ("staging: lustre: remove 'ptlrpc_thread usage' for sai_agl_thread") Signed-off-by: NeilBrown Reviewed-by: James Simmons --- fs/lustre/llite/statahead.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 497aba3..1639408 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -915,7 +915,19 @@ static int ll_agl_thread(void *arg) schedule(); __set_current_state(TASK_RUNNING); } + return 0; +} + +static void ll_stop_agl(struct ll_statahead_info *sai) +{ + struct ll_inode_info *plli = ll_i2info(sai->sai_dentry->d_inode); + struct ll_inode_info *clli; + CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n", + sai, (unsigned int)sai->sai_agl_task->pid); + kthread_stop(sai->sai_agl_task); + + sai->sai_agl_task = NULL; spin_lock(&plli->lli_agl_lock); sai->sai_agl_valid = 0; while ((clli = list_first_entry_or_null(&sai->sai_agls, @@ -929,9 +941,8 @@ static int ll_agl_thread(void *arg) } spin_unlock(&plli->lli_agl_lock); CDEBUG(D_READA, "agl thread stopped: sai %p, parent %pd\n", - sai, parent); + sai, sai->sai_dentry); ll_sai_put(sai); - return 0; } /* start agl thread */ @@ -1134,13 +1145,9 @@ static int ll_statahead_thread(void *arg) __set_current_state(TASK_RUNNING); } out: - if (sai->sai_agl_task) { - kthread_stop(sai->sai_agl_task); + if (sai->sai_agl_task) + ll_stop_agl(sai); - CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n", - sai, (unsigned int)sai->sai_agl_task->pid); - sai->sai_agl_task = NULL; - } /* * wait for inflight statahead RPCs to finish, and then we can free sai * safely because statahead RPC will access sai data From patchwork Thu Feb 27 21:16:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410783 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA73A924 for ; Thu, 27 Feb 2020 21:46:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B2E3724690 for ; Thu, 27 Feb 2020 21:46:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B2E3724690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9310634B2C5; Thu, 27 Feb 2020 13:37:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1BCBC3488A2 for ; Thu, 27 Feb 2020 13:20:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5431F919E; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 53298468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:24 -0500 Message-Id: <1582838290-17243-517-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 516/622] lustre/lnet: remove unnecessary use of msecs_to_jiffies() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software 
development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: NeilBrown

msecs_to_jiffies() is useful when you have a number of milliseconds, but when you have a number of seconds,
   sec * HZ
is simpler than
   msecs_to_jiffies(sec * MSEC_PER_SEC)

The same holds for small fractions of a second (e.g. HZ / 4).

So change all calls to msecs_to_jiffies() that reference MSEC_PER_SEC to simple multiplications by HZ.

Signed-off-by: NeilBrown
Reviewed-by: James Simmons
--- fs/lustre/mgc/mgc_request.c | 8 ++++---- fs/lustre/obdclass/integrity.c | 2 +- fs/lustre/osc/osc_request.c | 5 ++--- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 2 +- net/lnet/libcfs/linux-crypto.c | 2 +- net/lnet/lnet/lib-socket.c | 4 ++-- 6 files changed, 11 insertions(+), 12 deletions(-) diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index 5bfa1b7..28064fd 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -555,12 +555,12 @@ static int mgc_requeue_thread(void *data) * caused the lock revocation to finish its setup, plus some * random so everyone doesn't try to reconnect at once. */ - to = msecs_to_jiffies(MGC_TIMEOUT_MIN_SECONDS * MSEC_PER_SEC); - /* rand is centi-seconds */ - to += msecs_to_jiffies(rand * MSEC_PER_SEC / 100); + /* rand is centi-seconds, "to" is in centi-HZ */ + to = MGC_TIMEOUT_MIN_SECONDS * HZ * 100; + to += rand * HZ; wait_event_idle_timeout(rq_waitq, rq_state & (RQ_STOP | RQ_PRECLEANUP), - to); + to/100); /* * iterate & processing through the list. 
for each cld, process diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c index 2d5760d..230e1a5 100644 --- a/fs/lustre/obdclass/integrity.c +++ b/fs/lustre/obdclass/integrity.c @@ -226,7 +226,7 @@ static void obd_t10_performance_test(const char *obd_name, memset(buf, 0xAD, PAGE_SIZE); kunmap(page); - for (start = jiffies, end = start + msecs_to_jiffies(MSEC_PER_SEC / 4), + for (start = jiffies, end = start + HZ / 4, bcount = 0; time_before(jiffies, end) && rc == 0; bcount++) { rc = __obd_t10_performance_test(obd_name, cksum_type, page, buf_len / PAGE_SIZE); diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 95e09ce..9c43756 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -901,9 +901,8 @@ static void osc_grant_work_handler(struct work_struct *data) return; if (next_shrink > ktime_get_seconds()) - schedule_delayed_work(&work, msecs_to_jiffies( - (next_shrink - ktime_get_seconds()) * - MSEC_PER_SEC)); + schedule_delayed_work(&work, + (next_shrink - ktime_get_seconds()) * HZ); else schedule_work(&work.work); } diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 1110553..fcd9db2 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -3550,7 +3550,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, kiblnd_data.kib_peer_hash_size; } - deadline += msecs_to_jiffies(p * MSEC_PER_SEC); + deadline += p * HZ; spin_lock_irqsave(lock, flags); } diff --git a/net/lnet/libcfs/linux-crypto.c b/net/lnet/libcfs/linux-crypto.c index 532fab4..add4e79 100644 --- a/net/lnet/libcfs/linux-crypto.c +++ b/net/lnet/libcfs/linux-crypto.c @@ -346,7 +346,7 @@ static void cfs_crypto_performance_test(enum cfs_crypto_hash_alg hash_alg) memset(buf, 0xAD, PAGE_SIZE); kunmap(page); - for (start = jiffies, end = start + msecs_to_jiffies(MSEC_PER_SEC / 4), + for (start = jiffies, end = start + HZ / 4, bcount = 0; time_before(jiffies, 
end) && err == 0; bcount++) { struct ahash_request *hdesc; int i; diff --git a/net/lnet/lnet/lib-socket.c b/net/lnet/lnet/lib-socket.c index 046bd2d..0c65dc9 100644 --- a/net/lnet/lnet/lib-socket.c +++ b/net/lnet/lnet/lib-socket.c @@ -47,7 +47,7 @@ lnet_sock_write(struct socket *sock, void *buffer, int nob, int timeout) { int rc; - long jiffies_left = timeout * msecs_to_jiffies(MSEC_PER_SEC); + long jiffies_left = timeout * HZ; unsigned long then; struct timeval tv; struct __kernel_sock_timeval ktv; @@ -105,7 +105,7 @@ lnet_sock_read(struct socket *sock, void *buffer, int nob, int timeout) { int rc; - long jiffies_left = timeout * msecs_to_jiffies(MSEC_PER_SEC); + long jiffies_left = timeout * HZ; unsigned long then; struct timeval tv; struct __kernel_sock_timeval ktv; From patchwork Thu Feb 27 21:16:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EAFFB138D for ; Thu, 27 Feb 2020 21:42:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D42DC246A1 for ; Thu, 27 Feb 2020 21:42:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D42DC246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 22A893499F3; Thu, 27 Feb 2020 13:34:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73CE83488AE for ; Thu, 27 Feb 2020 13:20:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 56C3A919F; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 55DCD46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:25 -0500 Message-Id: <1582838290-17243-518-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 517/622] lnet: net_fault: don't pass struct member to do_div() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown do_div() changes its first argument, so passing a struct member is not a good idea unless we really want the struct to change, which we don't in these cases. So copy the value to a local variable and call do_div() on that. 
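For readers unfamiliar with do_div(), the hazard described above can be sketched in userspace. This is only an illustration under stated assumptions: demo_do_div(), struct demo_stat, demo_hit_buggy() and demo_hit_fixed() are hypothetical names, and demo_do_div() is a plain-C stand-in that mimics the "divide in place, return the remainder" behaviour of the kernel macro rather than the macro itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's do_div(): divides *n in place
 * and returns the remainder. Hypothetical helper, illustration only.
 */
static inline uint32_t demo_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Hypothetical counter struct, loosely modelled on fs_count. */
struct demo_stat {
	uint64_t fs_count;
};

/* Buggy pattern: the struct member is overwritten with the quotient. */
static inline int demo_hit_buggy(struct demo_stat *st, uint32_t rate)
{
	return !demo_do_div(&st->fs_count, rate);
}

/* Fixed pattern: divide a local copy, leave the member untouched. */
static inline int demo_hit_fixed(const struct demo_stat *st, uint32_t rate)
{
	uint64_t count = st->fs_count;

	return !demo_do_div(&count, rate);
}
```

Both helpers report whether the counter is a multiple of the rate, but only the fixed one leaves the counter intact for the next call.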
Signed-off-by: NeilBrown Reviewed-by: James Simmons --- net/lnet/lnet/net_fault.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c index 9f78e43..e43b1e1 100644 --- a/net/lnet/lnet/net_fault.c +++ b/net/lnet/lnet/net_fault.c @@ -394,9 +394,11 @@ struct lnet_drop_rule { } } else { /* rate based drop */ - drop = rule->dr_stat.fs_count++ == rule->dr_drop_at; + u64 count; - if (!do_div(rule->dr_stat.fs_count, attr->u.drop.da_rate)) { + drop = rule->dr_stat.fs_count++ == rule->dr_drop_at; + count = rule->dr_stat.fs_count; + if (!do_div(count, attr->u.drop.da_rate)) { rule->dr_drop_at = rule->dr_stat.fs_count + prandom_u32_max(attr->u.drop.da_rate); CDEBUG(D_NET, "Drop Rule %s->%s: next drop: %lu\n", @@ -563,9 +565,12 @@ struct delay_daemon_data { } } else { /* rate based delay */ + u64 count; + delay = rule->dl_stat.fs_count++ == rule->dl_delay_at; + count = rule->dl_stat.fs_count; /* generate the next random rate sequence */ - if (!do_div(rule->dl_stat.fs_count, attr->u.delay.la_rate)) { + if (!do_div(count, attr->u.delay.la_rate)) { rule->dl_delay_at = rule->dl_stat.fs_count + prandom_u32_max(attr->u.delay.la_rate); CDEBUG(D_NET, "Delay Rule %s->%s: next delay: %lu\n", From patchwork Thu Feb 27 21:16:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C512D17E0 for ; Thu, 27 Feb 2020 21:42:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AD5F9246A1 for ; Thu, 27 Feb 2020 21:42:15 +0000 (UTC) DMARC-Filter: OpenDMARC 
Filter v1.3.2 mail.kernel.org AD5F9246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB10A34AB21; Thu, 27 Feb 2020 13:34:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B56943488B1 for ; Thu, 27 Feb 2020 13:20:59 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5A10D91A0; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 589B346C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:26 -0500 Message-Id: <1582838290-17243-519-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 518/622] lustre: obd: discard unused enum X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown The values in this enum are never used - so discard it. 
Signed-off-by: NeilBrown Reviewed-by: James Simmons --- fs/lustre/include/obd.h | 8 -------- 1 file changed, 8 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 4ba70c7..5f5a595 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -133,14 +133,6 @@ struct timeout_item { #define OSC_MAX_DIRTY_MB_MAX 2048 /* arbitrary, but < MAX_LONG bytes */ #define OSC_DEFAULT_RESENDS 10 -/* possible values for fo_sync_lock_cancel */ -enum { - NEVER_SYNC_ON_CANCEL = 0, - BLOCKING_SYNC_ON_CANCEL = 1, - ALWAYS_SYNC_ON_CANCEL = 2, - NUM_SYNC_ON_CANCEL_STATES -}; - enum obd_cl_sem_lock_class { OBD_CLI_SEM_NORMAL, OBD_CLI_SEM_MGC, From patchwork Thu Feb 27 21:16:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410725 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C953A1580 for ; Thu, 27 Feb 2020 21:45:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B1AA924690 for ; Thu, 27 Feb 2020 21:45:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B1AA924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0BC87200D20; Thu, 27 Feb 2020 13:36:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 039703488B3 for ; Thu, 27 Feb 2020 13:21:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5E35891A1; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5B6F446D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:27 -0500 Message-Id: <1582838290-17243-520-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 519/622] lustre: update version to 2.13.50 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" With all of the missing patches from the Lustre 2.13 version merged upstream, it's time to update the upstream client's version. 
Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_ver.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h index 8ceb57d..0f07260 100644 --- a/include/uapi/linux/lustre/lustre_ver.h +++ b/include/uapi/linux/lustre/lustre_ver.h @@ -2,10 +2,10 @@ #define _LUSTRE_VER_H_ #define LUSTRE_MAJOR 2 -#define LUSTRE_MINOR 11 -#define LUSTRE_PATCH 99 +#define LUSTRE_MINOR 13 +#define LUSTRE_PATCH 50 #define LUSTRE_FIX 0 -#define LUSTRE_VERSION_STRING "2.11.99" +#define LUSTRE_VERSION_STRING "2.13.50" #define OBD_OCD_VERSION(major, minor, patch, fix) \ (((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix)) From patchwork Thu Feb 27 21:16:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410507 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 524AD92A for ; Thu, 27 Feb 2020 21:40:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3A89924690 for ; Thu, 27 Feb 2020 21:40:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3A89924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4758D34919C; Thu, 27 Feb 2020 13:32:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 456B13488B8 for ; Thu, 27 Feb 2020 13:21:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 60E0391A2; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5E68247C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:28 -0500 Message-Id: <1582838290-17243-521-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 520/622] lustre: llite: report latency for filesystem ops X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Add the elapsed time of VFS operations to the llite stats counter, instead of just tracking the number of operations, to allow tracking of operation round-trip latency. 
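The accounting change described above can be sketched roughly as follows. This is a userspace illustration, not the llite code: struct demo_lat_counter, demo_lat_tally() and demo_finish_op() are hypothetical names, and timestamps are passed in as plain microsecond values where the kernel code would sample ktime_get() on entry and compute ktime_us_delta() on exit. It also assumes the common case in the patch where latency is recorded only when the operation returns 0; the read and write paths use a "result > 0" check instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-operation latency counter (all times in usec). */
struct demo_lat_counter {
	uint64_t count;
	int64_t sum_us;
	int64_t min_us;
	int64_t max_us;
};

/* Fold one elapsed-time sample into the counter. */
static inline void demo_lat_tally(struct demo_lat_counter *c, int64_t us)
{
	if (c->count == 0 || us < c->min_us)
		c->min_us = us;
	if (c->count == 0 || us > c->max_us)
		c->max_us = us;
	c->sum_us += us;
	c->count++;
}

/*
 * Mirrors the shape used in the patch: the start time is sampled on
 * entry, and the elapsed time is recorded only on success (rc == 0).
 */
static inline int demo_finish_op(struct demo_lat_counter *c, int rc,
				 int64_t start_us, int64_t now_us)
{
	if (!rc)
		demo_lat_tally(c, now_us - start_us);
	return rc;
}
```

Because each sample is an elapsed time rather than a constant 1, the counter still yields the operation count while also exposing average, minimum and maximum latency.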
WC-bug-id: https://jira.whamcloud.com/browse/LU-12631 Lustre-commit: ea58c4cfb0fc ("LU-12631 llite: report latency for filesystem ops") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/36078 Reviewed-by: Li Xi Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 4 +- fs/lustre/llite/dir.c | 4 +- fs/lustre/llite/file.c | 69 ++++++++++++++++++++++++--------- fs/lustre/llite/llite_internal.h | 7 ++-- fs/lustre/llite/llite_lib.c | 15 ++++++-- fs/lustre/llite/llite_mmap.c | 36 ++++++++++++------ fs/lustre/llite/lproc_llite.c | 78 ++++++++++++++++++++------------------ fs/lustre/llite/namei.c | 39 ++++++++++++++----- fs/lustre/llite/pcc.c | 4 +- fs/lustre/llite/pcc.h | 4 +- fs/lustre/llite/super25.c | 1 - fs/lustre/llite/xattr.c | 49 ++++++++++++++---------- 12 files changed, 199 insertions(+), 111 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index fdc1b19..ac62560 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -138,10 +138,10 @@ enum { LPROCFS_CNTR_STDDEV = 0x0004, /* counter data type */ - LPROCFS_TYPE_REGS = 0x0100, + LPROCFS_TYPE_REQS = 0x0100, LPROCFS_TYPE_BYTES = 0x0200, LPROCFS_TYPE_PAGES = 0x0400, - LPROCFS_TYPE_CYCLE = 0x0800, + LPROCFS_TYPE_USEC = 0x0800, }; #define LC_MIN_INIT ((~(u64)0) >> 1) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 4dccd24..c38862e 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -298,6 +298,7 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) bool api32 = ll_need_32bit_api(sbi); struct md_op_data *op_data; struct lu_fid pfid = { 0 }; + ktime_t kstart = ktime_get(); int rc; CDEBUG(D_VFSTRACE, @@ -374,7 +375,8 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) ll_finish_md_op_data(op_data); out: if (!rc) - ll_stats_ops_tally(sbi, LPROC_LL_READDIR, 1); + ll_stats_ops_tally(sbi, 
LPROC_LL_READDIR, + ktime_us_delta(ktime_get(), kstart)); return rc; } diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 31d7dce..92eead1 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -383,13 +383,12 @@ int ll_file_release(struct inode *inode, struct file *file) struct ll_file_data *fd; struct ll_sb_info *sbi = ll_i2sbi(inode); struct ll_inode_info *lli = ll_i2info(inode); + ktime_t kstart = ktime_get(); int rc; CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p)\n", PFID(ll_inode2fid(inode)), inode); - if (!is_root_inode(inode)) - ll_stats_ops_tally(sbi, LPROC_LL_RELEASE, 1); fd = LUSTRE_FPRIVATE(file); LASSERT(fd); @@ -402,7 +401,8 @@ int ll_file_release(struct inode *inode, struct file *file) if (is_root_inode(inode)) { LUSTRE_FPRIVATE(file) = NULL; ll_file_data_put(fd); - return 0; + rc = 0; + goto out; } pcc_file_release(inode, file); @@ -418,6 +418,10 @@ int ll_file_release(struct inode *inode, struct file *file) if (CFS_FAIL_TIMEOUT_MS(OBD_FAIL_PTLRPC_DUMP_LOG, cfs_fail_val)) libcfs_debug_dumplog(); +out: + if (!rc && inode->i_sb->s_root != file_dentry(file)) + ll_stats_ops_tally(sbi, LPROC_LL_RELEASE, + ktime_us_delta(ktime_get(), kstart)); return rc; } @@ -699,6 +703,7 @@ int ll_file_open(struct inode *inode, struct file *file) struct obd_client_handle **och_p = NULL; u64 *och_usecount = NULL; struct ll_file_data *fd; + ktime_t kstart = ktime_get(); int rc = 0; CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p), flags %o\n", @@ -896,7 +901,8 @@ int ll_file_open(struct inode *inode, struct file *file) if (fd) ll_file_data_put(fd); } else { - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_OPEN, 1); + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_OPEN, + ktime_us_delta(ktime_get(), kstart)); } out_nofiledata: @@ -1676,6 +1682,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ssize_t result; u16 refcheck; ssize_t rc2; + ktime_t kstart = ktime_get(); bool cached; if (!iov_iter_count(to)) @@ -1694,7 +1701,7 
@@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) */ result = pcc_file_read_iter(iocb, to, &cached); if (cached) - return result; + goto out; ll_ras_enter(file); @@ -1719,10 +1726,13 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) cl_env_put(env, &refcheck); out: - if (result > 0) + if (result > 0) { ll_rw_stats_tally(ll_i2sbi(file_inode(file)), current->pid, LUSTRE_FPRIVATE(file), iocb->ki_pos, result, READ); + ll_stats_ops_tally(ll_i2sbi(file_inode(file)), LPROC_LL_READ, + ktime_us_delta(ktime_get(), kstart)); + } return result; } @@ -1795,6 +1805,7 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) struct file *file = iocb->ki_filp; u16 refcheck; bool cached; + ktime_t kstart = ktime_get(); int result; if (!iov_iter_count(from)) { @@ -1813,8 +1824,10 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) * from PCC cache automatically. */ result = pcc_file_write_iter(iocb, from, &cached); - if (cached && result != -ENOSPC && result != -EDQUOT) - return result; + if (cached && result != -ENOSPC && result != -EDQUOT) { + rc_normal = result; + goto out; + } /* NB: we can't do direct IO for tiny writes because they use the page * cache, we can't do sync writes because tiny writes can't flush @@ -1855,10 +1868,14 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) cl_env_put(env, &refcheck); out: - if (rc_normal > 0) + if (rc_normal > 0) { ll_rw_stats_tally(ll_i2sbi(file_inode(file)), current->pid, LUSTRE_FPRIVATE(file), iocb->ki_pos, rc_normal, WRITE); + ll_stats_ops_tally(ll_i2sbi(file_inode(file)), LPROC_LL_WRITE, + ktime_us_delta(ktime_get(), kstart)); + } + return rc_normal; } @@ -3850,12 +3867,12 @@ static loff_t ll_file_seek(struct file *file, loff_t offset, int origin) { struct inode *inode = file_inode(file); loff_t retval, eof = 0; + ktime_t kstart = ktime_get(); retval = offset + ((origin == SEEK_END) ? 
i_size_read(inode) : (origin == SEEK_CUR) ? file->f_pos : 0); CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p), to=%llu=%#llx(%d)\n", PFID(ll_inode2fid(inode)), inode, retval, retval, origin); - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_LLSEEK, 1); if (origin == SEEK_END || origin == SEEK_HOLE || origin == SEEK_DATA) { retval = ll_glimpse_size(inode); @@ -3864,8 +3881,12 @@ static loff_t ll_file_seek(struct file *file, loff_t offset, int origin) eof = i_size_read(inode); } - return generic_file_llseek_size(file, offset, origin, - ll_file_maxbytes(inode), eof); + retval = generic_file_llseek_size(file, offset, origin, + ll_file_maxbytes(inode), eof); + if (retval >= 0) + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_LLSEEK, + ktime_us_delta(ktime_get(), kstart)); + return retval; } static int ll_flush(struct file *file, fl_owner_t id) @@ -3948,14 +3969,13 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) struct inode *inode = file_inode(file); struct ll_inode_info *lli = ll_i2info(inode); struct ptlrpc_request *req; + ktime_t kstart = ktime_get(); int rc, err; CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p), start %lld, end %lld, datasync %d\n", PFID(ll_inode2fid(inode)), inode, start, end, datasync); - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1); - rc = file_write_and_wait_range(file, start, end); inode_lock(inode); @@ -4002,6 +4022,10 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) } inode_unlock(inode); + + if (!rc) + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, + ktime_us_delta(ktime_get(), kstart)); return rc; } @@ -4019,6 +4043,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) struct lustre_handle lockh = {0}; union ldlm_policy_data flock = { { 0 } }; int fl_type = file_lock->fl_type; + ktime_t kstart = ktime_get(); u64 flags = 0; int rc; int rc2 = 0; @@ -4026,7 +4051,6 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) CDEBUG(D_VFSTRACE, "VFS 
Op:inode=" DFID " file_lock=%p\n", PFID(ll_inode2fid(inode)), file_lock); - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FLOCK, 1); if (file_lock->fl_flags & FL_FLOCK) LASSERT((cmd == F_SETLKW) || (cmd == F_SETLK)); @@ -4122,6 +4146,9 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) ll_finish_md_op_data(op_data); + if (!rc) + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FLOCK, + ktime_us_delta(ktime_get(), kstart)); return rc; } @@ -4515,10 +4542,9 @@ int ll_getattr(const struct path *path, struct kstat *stat, struct inode *inode = d_inode(path->dentry); struct ll_sb_info *sbi = ll_i2sbi(inode); struct ll_inode_info *lli = ll_i2info(inode); + ktime_t kstart = ktime_get(); int rc; - ll_stats_ops_tally(sbi, LPROC_LL_GETATTR, 1); - rc = ll_inode_revalidate(path->dentry, IT_GETATTR); if (rc < 0) return rc; @@ -4582,6 +4608,9 @@ int ll_getattr(const struct path *path, struct kstat *stat, stat->size = i_size_read(inode); stat->blocks = inode->i_blocks; + ll_stats_ops_tally(sbi, LPROC_LL_GETATTR, + ktime_us_delta(ktime_get(), kstart)); + return 0; } @@ -4634,6 +4663,7 @@ int ll_inode_permission(struct inode *inode, int mask) const struct cred *old_cred = NULL; struct cred *cred = NULL; bool squash_id = false; + ktime_t kstart = ktime_get(); int rc = 0; if (mask & MAY_NOT_BLOCK) @@ -4682,7 +4712,6 @@ int ll_inode_permission(struct inode *inode, int mask) old_cred = override_creds(cred); } - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_INODE_PERM, 1); rc = generic_permission(inode, mask); /* restore current process's credentials and FS capability */ @@ -4691,6 +4720,10 @@ int ll_inode_permission(struct inode *inode, int mask) put_cred(cred); } + if (!rc) + ll_stats_ops_tally(sbi, LPROC_LL_INODE_PERM, + ktime_us_delta(ktime_get(), kstart)); + return rc; } diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index d84f50c..205ea50 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ 
-775,7 +775,7 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock, /* llite/lproc_llite.c */ int ll_debugfs_register_super(struct super_block *sb, const char *name); void ll_debugfs_unregister_super(struct super_block *sb); -void ll_stats_ops_tally(struct ll_sb_info *sbi, int op, int count); +void ll_stats_ops_tally(struct ll_sb_info *sbi, int op, long count); void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid, struct ll_file_data *file, loff_t pos, size_t count, int rw); @@ -783,10 +783,12 @@ void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid, enum { LPROC_LL_READ_BYTES, LPROC_LL_WRITE_BYTES, + LPROC_LL_READ, + LPROC_LL_WRITE, LPROC_LL_IOCTL, LPROC_LL_OPEN, LPROC_LL_RELEASE, - LPROC_LL_MAP, + LPROC_LL_MMAP, LPROC_LL_FAULT, LPROC_LL_MKWRITE, LPROC_LL_LLSEEK, @@ -805,7 +807,6 @@ enum { LPROC_LL_MKNOD, LPROC_LL_RENAME, LPROC_LL_STATFS, - LPROC_LL_ALLOC_INODE, LPROC_LL_SETXATTR, LPROC_LL_GETXATTR, LPROC_LL_GETXATTR_HITS, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 49490ee..84472fb 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1644,6 +1644,7 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, struct inode *inode = d_inode(dentry); struct ll_inode_info *lli = ll_i2info(inode); struct md_op_data *op_data = NULL; + ktime_t kstart = ktime_get(); int rc = 0; CDEBUG(D_VFSTRACE, "%s: setattr inode " DFID "(%p) from %llu to %llu, valid %x, hsm_import %d\n", @@ -1820,8 +1821,10 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, inode_has_no_xattr(inode); } - ll_stats_ops_tally(ll_i2sbi(inode), (attr->ia_valid & ATTR_SIZE) ? - LPROC_LL_TRUNC : LPROC_LL_SETATTR, 1); + if (!rc) + ll_stats_ops_tally(ll_i2sbi(inode), attr->ia_valid & ATTR_SIZE ? 
+ LPROC_LL_TRUNC : LPROC_LL_SETATTR, + ktime_us_delta(ktime_get(), kstart)); return rc; } @@ -1918,10 +1921,10 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) struct super_block *sb = de->d_sb; struct obd_statfs osfs; u64 fsid = huge_encode_dev(sb->s_dev); + ktime_t kstart = ktime_get(); int rc; - CDEBUG(D_VFSTRACE, "VFS Op: at %llu jiffies\n", get_jiffies_64()); - ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STATFS, 1); + CDEBUG(D_VFSTRACE, "VFS Op:sb=%s (%p)\n", sb->s_id, sb); /* Some amount of caching on the client is allowed */ rc = ll_statfs_internal(ll_s2sbi(sb), &osfs, OBD_STATFS_SUM); @@ -1950,6 +1953,10 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) sfs->f_bavail = osfs.os_bavail; sfs->f_fsid.val[0] = (u32)fsid; sfs->f_fsid.val[1] = (u32)(fsid >> 32); + + ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STATFS, + ktime_us_delta(ktime_get(), kstart)); + return 0; } diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 5c13164..b955756e 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -363,13 +363,11 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) bool cached; vm_fault_t result; sigset_t old, new; - - ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), - LPROC_LL_FAULT, 1); + ktime_t kstart = ktime_get(); result = pcc_fault(vma, vmf, &cached); if (cached) - return result; + goto out; /* Only SIGKILL and SIGTERM are allowed for fault/nopage/mkwrite * so that it can be killed by admin but not cause segfault by @@ -407,11 +405,17 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) } sigprocmask(SIG_SETMASK, &old, NULL); - if (vmf->page && result == VM_FAULT_LOCKED) +out: + if (vmf->page && result == VM_FAULT_LOCKED) { ll_rw_stats_tally(ll_i2sbi(file_inode(vma->vm_file)), current->pid, LUSTRE_FPRIVATE(vma->vm_file), cl_offset(NULL, vmf->page->index), PAGE_SIZE, READ); + ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), + LPROC_LL_FAULT, + ktime_us_delta(ktime_get(), kstart)); + } + 
return result; } @@ -424,13 +428,11 @@ static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf) bool cached; int err; vm_fault_t ret; + ktime_t kstart = ktime_get(); - ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), - LPROC_LL_MKWRITE, 1); - - err = pcc_page_mkwrite(vma, vmf, &cached); + ret = pcc_page_mkwrite(vma, vmf, &cached); if (cached) - return err; + goto out; file_update_time(vma->vm_file); do { @@ -465,11 +467,17 @@ static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf) break; } - if (ret == VM_FAULT_LOCKED) +out: + if (ret == VM_FAULT_LOCKED) { ll_rw_stats_tally(ll_i2sbi(file_inode(vma->vm_file)), current->pid, LUSTRE_FPRIVATE(vma->vm_file), cl_offset(NULL, vmf->page->index), PAGE_SIZE, WRITE); + ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)), + LPROC_LL_MKWRITE, + ktime_us_delta(ktime_get(), kstart)); + } + return ret; } @@ -527,6 +535,7 @@ int ll_teardown_mmaps(struct address_space *mapping, u64 first, u64 last) int ll_file_mmap(struct file *file, struct vm_area_struct *vma) { struct inode *inode = file_inode(file); + ktime_t kstart = ktime_get(); bool cached; int rc; @@ -537,7 +546,6 @@ int ll_file_mmap(struct file *file, struct vm_area_struct *vma) if (cached && rc != 0) return rc; - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_MAP, 1); rc = generic_file_mmap(file, vma); if (rc == 0) { vma->vm_ops = &ll_file_vm_ops; @@ -547,5 +555,9 @@ int ll_file_mmap(struct file *file, struct vm_area_struct *vma) rc = ll_glimpse_size(inode); } + if (!rc) + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_MMAP, + ktime_us_delta(ktime_get(), kstart)); + return rc; } diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 439c096..82c5e5c 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1541,54 +1541,58 @@ static void sbi_kobj_release(struct kobject *kobj) .release = sbi_kobj_release, }; +#define LPROCFS_TYPE_LATENCY \ + (LPROCFS_TYPE_USEC | LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV) static const 
struct llite_file_opcode { u32 opcode; u32 type; const char *opname; } llite_opcode_table[LPROC_LL_FILE_OPCODES] = { /* file operation */ - { LPROC_LL_READ_BYTES, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_BYTES, - "read_bytes" }, - { LPROC_LL_WRITE_BYTES, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_BYTES, - "write_bytes" }, - { LPROC_LL_IOCTL, LPROCFS_TYPE_REGS, "ioctl" }, - { LPROC_LL_OPEN, LPROCFS_TYPE_REGS, "open" }, - { LPROC_LL_RELEASE, LPROCFS_TYPE_REGS, "close" }, - { LPROC_LL_MAP, LPROCFS_TYPE_REGS, "mmap" }, - { LPROC_LL_FAULT, LPROCFS_TYPE_REGS, "page_fault" }, - { LPROC_LL_MKWRITE, LPROCFS_TYPE_REGS, "page_mkwrite" }, - { LPROC_LL_LLSEEK, LPROCFS_TYPE_REGS, "seek" }, - { LPROC_LL_FSYNC, LPROCFS_TYPE_REGS, "fsync" }, - { LPROC_LL_READDIR, LPROCFS_TYPE_REGS, "readdir" }, + { LPROC_LL_READ_BYTES, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_BYTES, + "read_bytes" }, + { LPROC_LL_WRITE_BYTES, LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_BYTES, + "write_bytes" }, + { LPROC_LL_READ, LPROCFS_TYPE_LATENCY, "read" }, + { LPROC_LL_WRITE, LPROCFS_TYPE_LATENCY, "write" }, + { LPROC_LL_IOCTL, LPROCFS_TYPE_REQS, "ioctl" }, + { LPROC_LL_OPEN, LPROCFS_TYPE_LATENCY, "open" }, + { LPROC_LL_RELEASE, LPROCFS_TYPE_LATENCY, "close" }, + { LPROC_LL_MMAP, LPROCFS_TYPE_LATENCY, "mmap" }, + { LPROC_LL_FAULT, LPROCFS_TYPE_LATENCY, "page_fault" }, + { LPROC_LL_MKWRITE, LPROCFS_TYPE_LATENCY, "page_mkwrite" }, + { LPROC_LL_LLSEEK, LPROCFS_TYPE_LATENCY, "seek" }, + { LPROC_LL_FSYNC, LPROCFS_TYPE_LATENCY, "fsync" }, + { LPROC_LL_READDIR, LPROCFS_TYPE_LATENCY, "readdir" }, /* inode operation */ - { LPROC_LL_SETATTR, LPROCFS_TYPE_REGS, "setattr" }, - { LPROC_LL_TRUNC, LPROCFS_TYPE_REGS, "truncate" }, - { LPROC_LL_FLOCK, LPROCFS_TYPE_REGS, "flock" }, - { LPROC_LL_GETATTR, LPROCFS_TYPE_REGS, "getattr" }, + { LPROC_LL_SETATTR, LPROCFS_TYPE_LATENCY, "setattr" }, + { LPROC_LL_TRUNC, LPROCFS_TYPE_LATENCY, "truncate" }, + { LPROC_LL_FLOCK, LPROCFS_TYPE_LATENCY, "flock" }, + { LPROC_LL_GETATTR, LPROCFS_TYPE_LATENCY, 
"getattr" }, /* dir inode operation */ - { LPROC_LL_CREATE, LPROCFS_TYPE_REGS, "create" }, - { LPROC_LL_LINK, LPROCFS_TYPE_REGS, "link" }, - { LPROC_LL_UNLINK, LPROCFS_TYPE_REGS, "unlink" }, - { LPROC_LL_SYMLINK, LPROCFS_TYPE_REGS, "symlink" }, - { LPROC_LL_MKDIR, LPROCFS_TYPE_REGS, "mkdir" }, - { LPROC_LL_RMDIR, LPROCFS_TYPE_REGS, "rmdir" }, - { LPROC_LL_MKNOD, LPROCFS_TYPE_REGS, "mknod" }, - { LPROC_LL_RENAME, LPROCFS_TYPE_REGS, "rename" }, + { LPROC_LL_CREATE, LPROCFS_TYPE_LATENCY, "create" }, + { LPROC_LL_LINK, LPROCFS_TYPE_LATENCY, "link" }, + { LPROC_LL_UNLINK, LPROCFS_TYPE_LATENCY, "unlink" }, + { LPROC_LL_SYMLINK, LPROCFS_TYPE_LATENCY, "symlink" }, + { LPROC_LL_MKDIR, LPROCFS_TYPE_LATENCY, "mkdir" }, + { LPROC_LL_RMDIR, LPROCFS_TYPE_LATENCY, "rmdir" }, + { LPROC_LL_MKNOD, LPROCFS_TYPE_LATENCY, "mknod" }, + { LPROC_LL_RENAME, LPROCFS_TYPE_LATENCY, "rename" }, /* special inode operation */ - { LPROC_LL_STATFS, LPROCFS_TYPE_REGS, "statfs" }, - { LPROC_LL_ALLOC_INODE, LPROCFS_TYPE_REGS, "alloc_inode" }, - { LPROC_LL_SETXATTR, LPROCFS_TYPE_REGS, "setxattr" }, - { LPROC_LL_GETXATTR, LPROCFS_TYPE_REGS, "getxattr" }, - { LPROC_LL_GETXATTR_HITS, LPROCFS_TYPE_REGS, "getxattr_hits" }, - { LPROC_LL_LISTXATTR, LPROCFS_TYPE_REGS, "listxattr" }, - { LPROC_LL_REMOVEXATTR, LPROCFS_TYPE_REGS, "removexattr" }, - { LPROC_LL_INODE_PERM, LPROCFS_TYPE_REGS, "inode_permission" }, + { LPROC_LL_STATFS, LPROCFS_TYPE_LATENCY, "statfs" }, + { LPROC_LL_SETXATTR, LPROCFS_TYPE_LATENCY, "setxattr" }, + { LPROC_LL_GETXATTR, LPROCFS_TYPE_LATENCY, "getxattr" }, + { LPROC_LL_GETXATTR_HITS, LPROCFS_TYPE_REQS, "getxattr_hits" }, + { LPROC_LL_LISTXATTR, LPROCFS_TYPE_LATENCY, "listxattr" }, + { LPROC_LL_REMOVEXATTR, LPROCFS_TYPE_LATENCY, "removexattr" }, + { LPROC_LL_INODE_PERM, LPROCFS_TYPE_LATENCY, "inode_permission" }, }; -void ll_stats_ops_tally(struct ll_sb_info *sbi, int op, int count) +void ll_stats_ops_tally(struct ll_sb_info *sbi, int op, long count) { if (!sbi->ll_stats) return; + if 
(sbi->ll_stats_track_type == STATS_TRACK_ALL) lprocfs_counter_add(sbi->ll_stats, op, count); else if (sbi->ll_stats_track_type == STATS_TRACK_PID && @@ -1661,12 +1665,14 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name) u32 type = llite_opcode_table[id].type; void *ptr = NULL; - if (type & LPROCFS_TYPE_REGS) - ptr = "regs"; + if (type & LPROCFS_TYPE_REQS) + ptr = "reqs"; else if (type & LPROCFS_TYPE_BYTES) ptr = "bytes"; else if (type & LPROCFS_TYPE_PAGES) ptr = "pages"; + else if (type & LPROCFS_TYPE_USEC) + ptr = "usec"; lprocfs_counter_init(sbi->ll_stats, llite_opcode_table[id].opcode, (type & LPROCFS_CNTR_AVGMINMAX), diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index f4ca16e..5b9f3a7 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -1322,6 +1322,7 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry, static int ll_mknod(struct inode *dir, struct dentry *dchild, umode_t mode, dev_t rdev) { + ktime_t kstart = ktime_get(); int err; CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir=" DFID "(%p) mode %o dev %x\n", @@ -1353,7 +1354,8 @@ static int ll_mknod(struct inode *dir, struct dentry *dchild, } if (!err) - ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_MKNOD, 1); + ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_MKNOD, + ktime_us_delta(ktime_get(), kstart)); return err; } @@ -1364,6 +1366,7 @@ static int ll_mknod(struct inode *dir, struct dentry *dchild, static int ll_create_nd(struct inode *dir, struct dentry *dentry, umode_t mode, bool want_excl) { + ktime_t kstart = ktime_get(); int rc; CDEBUG(D_VFSTRACE, @@ -1372,11 +1375,13 @@ static int ll_create_nd(struct inode *dir, struct dentry *dentry, rc = ll_mknod(dir, dentry, mode, 0); - ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_CREATE, 1); - CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, unhashed %d\n", dentry, d_unhashed(dentry)); + if (!rc) + ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_CREATE, + ktime_us_delta(ktime_get(), kstart)); + return rc; } @@ -1385,6 
+1390,7 @@ static int ll_unlink(struct inode *dir, struct dentry *dchild) struct ptlrpc_request *request = NULL; struct md_op_data *op_data; struct mdt_body *body; + ktime_t kstart = ktime_get(); int rc; CDEBUG(D_VFSTRACE, "VFS Op:name=%pd,dir=%lu/%u(%p)\n", @@ -1414,7 +1420,8 @@ static int ll_unlink(struct inode *dir, struct dentry *dchild) set_nlink(dchild->d_inode, body->mbo_nlink); ll_update_times(request, dir); - ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_UNLINK, 1); + ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_UNLINK, + ktime_us_delta(ktime_get(), kstart)); out: ptlrpc_req_finished(request); @@ -1423,6 +1430,7 @@ static int ll_unlink(struct inode *dir, struct dentry *dchild) static int ll_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) { + ktime_t kstart = ktime_get(); int err; CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir" DFID "(%p)\n", @@ -1434,13 +1442,15 @@ static int ll_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) err = ll_new_node(dir, dentry, NULL, mode, 0, LUSTRE_OPC_MKDIR); if (!err) - ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_MKDIR, 1); + ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_MKDIR, + ktime_us_delta(ktime_get(), kstart)); return err; } static int ll_rmdir(struct inode *dir, struct dentry *dchild) { + ktime_t kstart = ktime_get(); struct ptlrpc_request *request = NULL; struct md_op_data *op_data; int rc; @@ -1463,7 +1473,8 @@ static int ll_rmdir(struct inode *dir, struct dentry *dchild) ll_finish_md_op_data(op_data); if (rc == 0) { ll_update_times(request, dir); - ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_RMDIR, 1); + ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_RMDIR, + ktime_us_delta(ktime_get(), kstart)); } ptlrpc_req_finished(request); @@ -1473,6 +1484,7 @@ static int ll_rmdir(struct inode *dir, struct dentry *dchild) static int ll_symlink(struct inode *dir, struct dentry *dentry, const char *oldname) { + ktime_t kstart = ktime_get(); int err; CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir=" DFID "(%p),target=%.*s\n", @@ 
-1482,7 +1494,8 @@ static int ll_symlink(struct inode *dir, struct dentry *dentry, 0, LUSTRE_OPC_SYMLINK); if (!err) - ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_SYMLINK, 1); + ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_SYMLINK, + ktime_us_delta(ktime_get(), kstart)); return err; } @@ -1494,6 +1507,7 @@ static int ll_link(struct dentry *old_dentry, struct inode *dir, struct ll_sb_info *sbi = ll_i2sbi(dir); struct ptlrpc_request *request = NULL; struct md_op_data *op_data; + ktime_t kstart = ktime_get(); int err; CDEBUG(D_VFSTRACE, @@ -1513,7 +1527,8 @@ static int ll_link(struct dentry *old_dentry, struct inode *dir, goto out; ll_update_times(request, dir); - ll_stats_ops_tally(sbi, LPROC_LL_LINK, 1); + ll_stats_ops_tally(sbi, LPROC_LL_LINK, + ktime_us_delta(ktime_get(), kstart)); out: ptlrpc_req_finished(request); return err; @@ -1526,6 +1541,7 @@ static int ll_rename(struct inode *src, struct dentry *src_dchild, struct ptlrpc_request *request = NULL; struct ll_sb_info *sbi = ll_i2sbi(src); struct md_op_data *op_data; + ktime_t kstart = ktime_get(); int err; if (flags) @@ -1555,12 +1571,15 @@ static int ll_rename(struct inode *src, struct dentry *src_dchild, if (!err) { ll_update_times(request, src); ll_update_times(request, tgt); - ll_stats_ops_tally(sbi, LPROC_LL_RENAME, 1); } ptlrpc_req_finished(request); - if (!err) + if (!err) { d_move(src_dchild, tgt_dchild); + ll_stats_ops_tally(sbi, LPROC_LL_RENAME, + ktime_us_delta(ktime_get(), kstart)); + } + return err; } diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index b926f87..a40f242 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -1754,8 +1754,8 @@ void pcc_vm_close(struct vm_area_struct *vma) pcc_inode_unlock(inode); } -int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, - bool *cached) +vm_fault_t pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, + bool *cached) { struct page *page = vmf->page; struct mm_struct *mm = vma->vm_mm; diff --git 
a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index a221ef6..ec2e421 100644 --- a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -239,8 +239,8 @@ int pcc_fsync(struct file *file, loff_t start, loff_t end, void pcc_vm_open(struct vm_area_struct *vma); void pcc_vm_close(struct vm_area_struct *vma); int pcc_fault(struct vm_area_struct *mva, struct vm_fault *vmf, bool *cached); -int pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, - bool *cached); +vm_fault_t pcc_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, + bool *cached); int pcc_inode_create(struct super_block *sb, struct pcc_dataset *dataset, struct lu_fid *fid, struct dentry **pcc_dentry); int pcc_inode_create_fini(struct inode *inode, struct pcc_create_attach *pca); diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c index 38d60b0..006be6b 100644 --- a/fs/lustre/llite/super25.c +++ b/fs/lustre/llite/super25.c @@ -50,7 +50,6 @@ static struct inode *ll_alloc_inode(struct super_block *sb) { struct ll_inode_info *lli; - ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_ALLOC_INODE, 1); lli = kmem_cache_zalloc(ll_inode_cachep, GFP_NOFS); if (!lli) return NULL; diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index 4e1ce34..7134f10 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -91,6 +91,7 @@ static int ll_xattr_set_common(const struct xattr_handler *handler, struct ptlrpc_request *req = NULL; const char *pv = value; char *fullname; + ktime_t kstart = ktime_get(); u64 valid; int rc; @@ -98,13 +99,10 @@ static int ll_xattr_set_common(const struct xattr_handler *handler, * unconditionally replaced by "". When removexattr() is * called we get a NULL value and XATTR_REPLACE for flags. 
*/ - if (!value && flags == XATTR_REPLACE) { - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_REMOVEXATTR, 1); + if (!value && flags == XATTR_REPLACE) valid = OBD_MD_FLXATTRRM; - } else { - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_SETXATTR, 1); + else valid = OBD_MD_FLXATTR; - } rc = xattr_type_filter(sbi, handler); if (rc) @@ -153,6 +151,11 @@ static int ll_xattr_set_common(const struct xattr_handler *handler, } ptlrpc_req_finished(req); + + ll_stats_ops_tally(ll_i2sbi(inode), valid == OBD_MD_FLXATTRRM ? + LPROC_LL_REMOVEXATTR : LPROC_LL_SETXATTR, + ktime_us_delta(ktime_get(), kstart)); + return 0; } @@ -294,6 +297,11 @@ static int ll_xattr_set(const struct xattr_handler *handler, const char *name, const void *value, size_t size, int flags) { + ktime_t kstart = ktime_get(); + int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR : + LPROC_LL_SETXATTR; + int rc; + LASSERT(inode); LASSERT(name); @@ -302,18 +310,14 @@ static int ll_xattr_set(const struct xattr_handler *handler, /* lustre/trusted.lov.xxx would be passed through xattr API */ if (!strcmp(name, "lov")) { - int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR : - LPROC_LL_SETXATTR; - - ll_stats_ops_tally(ll_i2sbi(inode), op_type, 1); - - return ll_setstripe_ea(dentry, (struct lov_user_md *)value, + rc = ll_setstripe_ea(dentry, (struct lov_user_md *)value, size); + ll_stats_ops_tally(ll_i2sbi(inode), op_type, + ktime_us_delta(ktime_get(), kstart)); + return rc; } else if (!strcmp(name, "lma") || !strcmp(name, "link")) { - int op_type = flags == XATTR_REPLACE ? 
LPROC_LL_REMOVEXATTR : - LPROC_LL_SETXATTR; - - ll_stats_ops_tally(ll_i2sbi(inode), op_type, 1); + ll_stats_ops_tally(ll_i2sbi(inode), op_type, + ktime_us_delta(ktime_get(), kstart)); return 0; } @@ -402,14 +406,13 @@ static int ll_xattr_get_common(const struct xattr_handler *handler, const char *name, void *buffer, size_t size) { struct ll_sb_info *sbi = ll_i2sbi(inode); + ktime_t kstart = ktime_get(); char *fullname; int rc; CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p)\n", PFID(ll_inode2fid(inode)), inode); - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_GETXATTR, 1); - rc = xattr_type_filter(sbi, handler); if (rc) return rc; @@ -444,6 +447,9 @@ static int ll_xattr_get_common(const struct xattr_handler *handler, rc = ll_xattr_list(inode, fullname, handler->flags, buffer, size, OBD_MD_FLXATTR); kfree(fullname); + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_GETXATTR, + ktime_us_delta(ktime_get(), kstart)); + return rc; } @@ -569,6 +575,7 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) { struct inode *inode = d_inode(dentry); struct ll_sb_info *sbi = ll_i2sbi(inode); + ktime_t kstart = ktime_get(); char *xattr_name; ssize_t rc, rc2; size_t len, rem; @@ -578,8 +585,6 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p)\n", PFID(ll_inode2fid(inode)), inode); - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_LISTXATTR, 1); - rc = ll_xattr_list(inode, NULL, XATTR_OTHER_T, buffer, size, OBD_MD_FLXATTRLS); if (rc < 0) @@ -591,7 +596,7 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) * exists. 
*/ if (!size) - return rc + sizeof(XATTR_LUSTRE_LOV); + goto out; xattr_name = buffer; rem = rc; @@ -625,6 +630,10 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) memcpy(buffer + rc, XATTR_LUSTRE_LOV, sizeof(XATTR_LUSTRE_LOV)); +out: + ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_LISTXATTR, + ktime_us_delta(ktime_get(), kstart)); + return rc + sizeof(XATTR_LUSTRE_LOV); } From patchwork Thu Feb 27 21:16:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410915 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 21A8C17E0 for ; Thu, 27 Feb 2020 21:51:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0A6FD24692 for ; Thu, 27 Feb 2020 21:51:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0A6FD24692 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B41B6348D3B; Thu, 27 Feb 2020 13:43:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B4493488BF for ; Thu, 27 Feb 2020 13:21:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 62E4291A3; Thu, 27 Feb 2020 16:18:19 -0500 
(EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 614CA468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:29 -0500 Message-Id: <1582838290-17243-522-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 521/622] lustre: osc: don't re-enable grant shrink on reconnect X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Zarochentsev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Zarochentsev The client requests grant shrinking support on each reconnect and re-enables the capability even if it was explicitly disabled by lctl set_param.
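The behaviour described above can be modelled with a rough user-space sketch: a reconnect overwrites the negotiated connect flags, while a separate administrative bit (the role played by imp_grant_shrink_disabled in this patch) survives it. This is an illustration under stated assumptions, not the Lustre code; `demo_import`, `demo_reconnect`, `demo_set_grant_shrink`, `demo_grant_shrink_enabled`, and `DEMO_CONNECT_GRANT_SHRINK` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for a negotiated connect flag. */
#define DEMO_CONNECT_GRANT_SHRINK (UINT64_C(1) << 0)

/* Hypothetical, simplified stand-in for an import. */
struct demo_import {
	uint64_t connect_flags;     /* renegotiated on every reconnect */
	bool grant_shrink_disabled; /* administrator's choice, kept across reconnects */
};

/* Model a reconnect: the negotiated flags are replaced wholesale. */
void demo_reconnect(struct demo_import *imp, uint64_t server_flags)
{
	assert(imp != NULL);
	imp->connect_flags = server_flags;
	/* grant_shrink_disabled is deliberately left untouched here. */
}

/* Model the administrator toggling the tunable. */
void demo_set_grant_shrink(struct demo_import *imp, bool enable)
{
	assert(imp != NULL);
	imp->grant_shrink_disabled = !enable;
}

/* Shrinking is active only if it was negotiated and not disabled locally. */
bool demo_grant_shrink_enabled(const struct demo_import *imp)
{
	assert(imp != NULL);
	return (imp->connect_flags & DEMO_CONNECT_GRANT_SHRINK) != 0 &&
	       !imp->grant_shrink_disabled;
}
```

Keeping the administrative choice outside the negotiated flags means a reconnect cannot silently undo it, which appears to be the intent of the change below.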
Cray-bug-id: LUS-7585 WC-bug-id: https://jira.whamcloud.com/browse/LU-12759 Lustre-commit: efa3425c5f5a ("LU-12759 osc: don't re-enable grant shrink on reconnect") Signed-off-by: Alexander Zarochentsev Reviewed-on: https://review.whamcloud.com/36177 Reviewed-by: Andrew Perepechko Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 4 +++- fs/lustre/osc/lproc_osc.c | 32 +++++++++----------------------- fs/lustre/osc/osc_request.c | 4 ++-- 3 files changed, 14 insertions(+), 26 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index c2f98e6..501a896 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -303,7 +303,9 @@ struct obd_import { /* import has tried to connect with server */ imp_connect_tried:1, /* connected but not FULL yet */ - imp_connected:1; + imp_connected:1, + /* grant shrink disabled */ + imp_grant_shrink_disabled:1; u32 imp_connect_op; u32 imp_idle_timeout; diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 8e0088b..2bc7047 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -695,18 +695,17 @@ static ssize_t grant_shrink_show(struct kobject *kobj, struct attribute *attr, { struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); - struct client_obd *cli = &obd->u.cli; - struct obd_connect_data *ocd; + struct obd_import *imp; ssize_t len; len = lprocfs_climp_check(obd); if (len) return len; - ocd = &cli->cl_import->imp_connect_data; - + imp = obd->u.cli.cl_import; len = snprintf(buf, PAGE_SIZE, "%d\n", - !!OCD_HAS_FLAG(ocd, GRANT_SHRINK)); + !imp->imp_grant_shrink_disabled && + OCD_HAS_FLAG(&imp->imp_connect_data, GRANT_SHRINK)); up_read(&obd->u.cli.cl_sem); return len; @@ -717,8 +716,7 @@ static ssize_t grant_shrink_store(struct kobject *kobj, struct attribute *attr, { struct obd_device *dev = container_of(kobj, struct obd_device, 
obd_kset.kobj); - struct client_obd *cli = &dev->u.cli; - struct obd_connect_data *ocd; + struct obd_import *imp; bool val; int rc; @@ -733,22 +731,10 @@ static ssize_t grant_shrink_store(struct kobject *kobj, struct attribute *attr, if (rc) return rc; - ocd = &cli->cl_import->imp_connect_data; - - if (!val) { - if (OCD_HAS_FLAG(ocd, GRANT_SHRINK)) - ocd->ocd_connect_flags &= ~OBD_CONNECT_GRANT_SHRINK; - } else { - /** - * server replied obd_connect_data is always bigger, so - * client's imp_connect_flags_orig are always supported - * by the server - */ - if (!OCD_HAS_FLAG(ocd, GRANT_SHRINK) && - cli->cl_import->imp_connect_flags_orig & - OBD_CONNECT_GRANT_SHRINK) - ocd->ocd_connect_flags |= OBD_CONNECT_GRANT_SHRINK; - } + imp = dev->u.cli.cl_import; + spin_lock(&imp->imp_lock); + imp->imp_grant_shrink_disabled = !val; + spin_unlock(&imp->imp_lock); up_read(&dev->u.cli.cl_sem); diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 9c43756..39cac7d 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -844,8 +844,8 @@ static int osc_should_shrink_grant(struct client_obd *client) if (!client->cl_import) return 0; - if ((client->cl_import->imp_connect_data.ocd_connect_flags & - OBD_CONNECT_GRANT_SHRINK) == 0) + if (!OCD_HAS_FLAG(&client->cl_import->imp_connect_data, GRANT_SHRINK) || + client->cl_import->imp_grant_shrink_disabled) return 0; if (ktime_get_seconds() >= next_shrink - 5) { From patchwork Thu Feb 27 21:16:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410655 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CEA10924 for ; Thu, 27 Feb 2020 21:43:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B6E2124690 for ; Thu, 27 Feb 2020 21:43:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B6E2124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7149B349574; Thu, 27 Feb 2020 13:35:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F0F5E3488C4 for ; Thu, 27 Feb 2020 13:21:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6553F91A4; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 63FFF46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:30 -0500 Message-Id: <1582838290-17243-523-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 522/622] lustre: llite: statfs to use NODELAY with MDS X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev When LL_SBI_LAZYSTATFS is set, pass OBD_STATFS_NODELAY to the MDS statfs call as well; otherwise client umount can get stuck if the MDS is down for some reason. recovery-small/110k simulates this. WC-bug-id: https://jira.whamcloud.com/browse/LU-12809 Lustre-commit: a7ae8da24229 ("LU-12809 llite: statfs to use NODELAY with MDS") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/36297 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 84472fb..1245336 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1869,6 +1869,9 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, max_age = ktime_get_seconds() - sbi->ll_statfs_max_age; + if (sbi->ll_flags & LL_SBI_LAZYSTATFS) + flags |= OBD_STATFS_NODELAY; + rc = obd_statfs(NULL, sbi->ll_md_exp, osfs, max_age, flags); if (rc) return rc; @@ -1882,9 +1885,6 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, if (osfs->os_state & OS_STATE_SUM) goto out; - if (sbi->ll_flags & LL_SBI_LAZYSTATFS) - flags |= OBD_STATFS_NODELAY; - rc = obd_statfs(NULL, sbi->ll_dt_exp, &obd_osfs, max_age, flags); if (rc) { /* Possibly a filesystem with no OSTs. Report MDT totals.
*/ From patchwork Thu Feb 27 21:16:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410901 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E8CE6924 for ; Thu, 27 Feb 2020 21:50:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D03F524690 for ; Thu, 27 Feb 2020 21:50:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D03F524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36A8434A918; Thu, 27 Feb 2020 13:41:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3EEFA3488C4 for ; Thu, 27 Feb 2020 13:21:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 67F6391A5; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 66BCE46C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:31 -0500 Message-Id: <1582838290-17243-524-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 523/622] lustre: ptlrpc: grammar fix. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Zarochentsev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Zarochentsev ptlrpc_invalidate_import() error message grammar fix. Cray-bug-id: LUS-4015 WC-bug-id: https://jira.whamcloud.com/browse/LU-12370 Lustre-commit: 316eddce9382 ("LU-12370 ptlrpc: grammar fix.") Signed-off-by: Alexander Zarochentsev Reviewed-on: https://review.whamcloud.com/36508 Reviewed-by: Andrew Perepechko Reviewed-by: Stephan Thiell Reviewed-by: Andreas Dilger Reviewed-by: Arshad Hussain Reviewed-by: Colin Faber Signed-off-by: James Simmons --- fs/lustre/ptlrpc/import.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 76a40be..813d3c8 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -381,7 +381,7 @@ void ptlrpc_invalidate_import(struct obd_import *imp) "still on delayed list"); } - CERROR("%s: Unregistering RPCs found (%d). Network is sluggish? Waiting them to error out.\n", + CERROR("%s: Unregistering RPCs found (%d). Network is sluggish? 
Waiting for them to error out.\n", cli_tgt, atomic_read(&imp->imp_unregistering)); } From patchwork Thu Feb 27 21:16:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410511 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A0B6138D for ; Thu, 27 Feb 2020 21:40:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1280B24690 for ; Thu, 27 Feb 2020 21:40:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1280B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4953834A706; Thu, 27 Feb 2020 13:32:42 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 81B463488CC for ; Thu, 27 Feb 2020 13:21:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6AE3D91A6; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6976246D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:32 -0500 Message-Id: <1582838290-17243-525-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 
1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 524/622] lustre: lov: check all entries in lov_flush_composite X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Vladimir Saveliev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vladimir Saveliev Check all layout entries for a DoM layout and exit with -ENODATA if none exists. The caller treats that as a valid case caused by a layout change. Define llo_flush methods for all layouts as required by lov_dispatch(). The patch also removes the cl_dom_comp_size field in cl_layout, which was used by the previous ll_dom_lock_cancel() implementation. Run lov_flush_composite() under down_read of lov->lo_type_guard to avoid a race with layout change. Fixes: 865a95df36 ("lustre: llite: improve ll_dom_lock_cancel") WC-bug-id: https://jira.whamcloud.com/browse/LU-12704 Lustre-commit: 44460570fd21 ("LU-12704 lov: check all entries in lov_flush_composite") Signed-off-by: Mikhail Pershin Signed-off-by: Vladimir Saveliev Reviewed-on: https://review.whamcloud.com/36368 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 2 -- fs/lustre/llite/namei.c | 6 ++++++ fs/lustre/lov/lov_object.c | 42 +++++++++++++++++++++++------------------- 3 files changed, 29 insertions(+), 21 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index c3376a4..67731b0 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -287,8 +287,6 @@ struct cl_layout { struct lu_buf cl_buf; /** size of layout in lov_mds_md format.
*/ size_t cl_size; - /** size of DoM component if exists or zero otherwise */ - u64 cl_dom_comp_size; /** Layout generation. */ u32 cl_layout_gen; /** whether layout is a composite one */ diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 5b9f3a7..c87653d 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -198,6 +198,12 @@ static int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock) /* reach MDC layer to flush data under the DoM ldlm lock */ rc = cl_object_flush(env, lli->lli_clob, lock); + if (rc == -ENODATA) { + CDEBUG(D_INODE, "inode "DFID" layout has no DoM stripe\n", + PFID(ll_inode2fid(inode))); + /* most likely result of layout change, do nothing */ + rc = 0; + } cl_env_put(env, &refcheck); return rc; diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 5c4d8f9..f2c7bc2 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -1048,13 +1048,23 @@ static int lov_flush_composite(const struct lu_env *env, struct ldlm_lock *lock) { struct lov_object *lov = cl2lov(obj); - struct lovsub_object *lovsub; + struct lov_layout_entry *lle; + int rc = -ENODATA; - if (!lsme_is_dom(lov->lo_lsm->lsm_entries[0])) - return -EINVAL; + lov_foreach_layout_entry(lov, lle) { + if (!lsme_is_dom(lle->lle_lsme)) + continue; + rc = cl_object_flush(env, lovsub2cl(lle->lle_dom.lo_dom), lock); + break; + } + + return rc; +} - lovsub = lov->u.composite.lo_entries[0].lle_dom.lo_dom; - return cl_object_flush(env, lovsub2cl(lovsub), lock); +static int lov_flush_empty(const struct lu_env *env, struct cl_object *obj, + struct ldlm_lock *lock) +{ + return 0; } const static struct lov_layout_operations lov_dispatch[] = { @@ -1066,7 +1076,8 @@ static int lov_flush_composite(const struct lu_env *env, .llo_page_init = lov_page_init_empty, .llo_lock_init = lov_lock_init_empty, .llo_io_init = lov_io_init_empty, - .llo_getattr = lov_attr_get_empty + .llo_getattr = lov_attr_get_empty, + .llo_flush = 
lov_flush_empty, }, [LLT_RELEASED] = { .llo_init = lov_init_released, @@ -1076,7 +1087,8 @@ static int lov_flush_composite(const struct lu_env *env, .llo_page_init = lov_page_init_empty, .llo_lock_init = lov_lock_init_empty, .llo_io_init = lov_io_init_released, - .llo_getattr = lov_attr_get_empty + .llo_getattr = lov_attr_get_empty, + .llo_flush = lov_flush_empty, }, [LLT_COMP] = { .llo_init = lov_init_composite, @@ -1098,6 +1110,7 @@ static int lov_flush_composite(const struct lu_env *env, .llo_lock_init = lov_lock_init_empty, .llo_io_init = lov_io_init_empty, .llo_getattr = lov_attr_get_empty, + .llo_flush = lov_flush_empty, }, }; @@ -2085,18 +2098,8 @@ static int lov_object_layout_get(const struct lu_env *env, cl->cl_size = lov_comp_md_size(lsm); cl->cl_layout_gen = lsm->lsm_layout_gen; - cl->cl_dom_comp_size = 0; cl->cl_is_released = lsm->lsm_is_released; - if (lsm_is_composite(lsm->lsm_magic)) { - struct lov_stripe_md_entry *lsme = lsm->lsm_entries[0]; - - cl->cl_is_composite = true; - - if (lsme_is_dom(lsme)) - cl->cl_dom_comp_size = lsme->lsme_extent.e_end; - } else { - cl->cl_is_composite = false; - } + cl->cl_is_composite = lsm_is_composite(lsm->lsm_magic); rc = lov_lsm_pack(lsm, buf->lb_buf, buf->lb_len); lov_lsm_put(lsm); @@ -2123,7 +2126,8 @@ static loff_t lov_object_maxbytes(struct cl_object *obj) static int lov_object_flush(const struct lu_env *env, struct cl_object *obj, struct ldlm_lock *lock) { - return LOV_2DISPATCH_NOLOCK(cl2lov(obj), llo_flush, env, obj, lock); + return LOV_2DISPATCH_MAYLOCK(cl2lov(obj), llo_flush, true, env, obj, + lock); } static const struct cl_object_operations lov_ops = { From patchwork Thu Feb 27 21:16:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410773 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 
A4F4017E0 for ; Thu, 27 Feb 2020 21:46:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8DB7B246A1 for ; Thu, 27 Feb 2020 21:46:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DB7B246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CA32234B228; Thu, 27 Feb 2020 13:36:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DC02E3488CC for ; Thu, 27 Feb 2020 13:21:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6E76791A7; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6C45547C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:33 -0500 Message-Id: <1582838290-17243-526-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 525/622] lustre: pcc: Incorrect size after re-attach X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin The following test case results in an incorrect size for the PCC copy: - Attach a file of size s1 (s1 > 0) into PCC; - Detach this file with the --keep option, so that the data is retained on PCC; - Truncate this file locally or on a remote client to a new size s2 (s2 < s1); - Re-attach the file. The size of the PCC copy is still s1. To solve this problem, the PCC copy must be truncated to the same size as the Lustre copy, which will be HSM released later, after the data copy (archive) phase has finished. This patch also adds handling for a pending signal, for the case where the attach process is killed by an administrator. WC-bug-id: https://jira.whamcloud.com/browse/LU-13023 Lustre-commit: 7a810496c2c ("LU-13023 pcc: Incorrect size after re-attach") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/36884 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/pcc.c | 55 ++++++++++++++++++++++++++++++++++----------------- 1 file changed, 37 insertions(+), 18 deletions(-) diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index a40f242..550045b 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -2023,16 +2023,21 @@ static int __pcc_inode_create(struct pcc_dataset *dataset, return rc; } -/* TODO: Set the project ID for PCC copy */ -int pcc_inode_store_ugpid(struct dentry *dentry, kuid_t uid, kgid_t gid) +/* + * Reset uid, gid or size for the PCC copy masked by @valid. + * TODO: Set the project ID for PCC copy. 
+ */ +int pcc_inode_reset_iattr(struct dentry *dentry, unsigned int valid, + kuid_t uid, kgid_t gid, loff_t size) { struct inode *inode = dentry->d_inode; struct iattr attr; int rc; - attr.ia_valid = ATTR_UID | ATTR_GID; + attr.ia_valid = valid; attr.ia_uid = uid; attr.ia_gid = gid; + attr.ia_size = size; inode_lock(inode); rc = notify_change(dentry, &attr, NULL); @@ -2077,8 +2082,8 @@ int pcc_inode_create_fini(struct inode *inode, struct pcc_create_attach *pca) goto out_put; } - rc = pcc_inode_store_ugpid(pcc_dentry, old_cred->suid, - old_cred->sgid); + rc = pcc_inode_reset_iattr(pcc_dentry, ATTR_UID | ATTR_GID, + old_cred->suid, old_cred->sgid, 0); if (rc) goto out_put; @@ -2152,9 +2157,9 @@ static int pcc_filp_write(struct file *filp, const void *buf, ssize_t count, return 0; } -static int pcc_copy_data(struct file *src, struct file *dst) +static ssize_t pcc_copy_data(struct file *src, struct file *dst) { - int rc = 0; + ssize_t rc = 0; ssize_t rc2; loff_t pos, offset = 0; size_t buf_len = 1048576; @@ -2165,6 +2170,10 @@ static int pcc_copy_data(struct file *src, struct file *dst) return -ENOMEM; while (1) { + if (signal_pending(current)) { + rc = -EINTR; + goto out_free; + } pos = offset; rc2 = kernel_read(src, buf, buf_len, &pos); if (rc2 < 0) { @@ -2180,6 +2189,7 @@ static int pcc_copy_data(struct file *src, struct file *dst) offset += rc2; } + rc = offset; out_free: kvfree(buf); return rc; @@ -2219,6 +2229,7 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, struct dentry *dentry; struct file *pcc_filp; struct path path; + ssize_t ret; int rc; rc = pcc_attach_allowed_check(inode); @@ -2232,27 +2243,35 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, old_cred = override_creds(pcc_super_cred(inode->i_sb)); rc = __pcc_inode_create(dataset, &lli->lli_fid, &dentry); - if (rc) { - revert_creds(old_cred); + if (rc) goto out_dataset_put; - } path.mnt = dataset->pccd_path.mnt; path.dentry = dentry; - pcc_filp = dentry_open(&path, 
O_TRUNC | O_WRONLY | O_LARGEFILE, - current_cred()); + pcc_filp = dentry_open(&path, O_WRONLY | O_LARGEFILE, current_cred()); if (IS_ERR_OR_NULL(pcc_filp)) { rc = pcc_filp ? PTR_ERR(pcc_filp) : -EINVAL; - revert_creds(old_cred); goto out_dentry; } - rc = pcc_inode_store_ugpid(dentry, old_cred->uid, old_cred->gid); - revert_creds(old_cred); + rc = pcc_inode_reset_iattr(dentry, ATTR_UID | ATTR_GID, + old_cred->uid, old_cred->gid, 0); if (rc) goto out_fput; - rc = pcc_copy_data(file, pcc_filp); + ret = pcc_copy_data(file, pcc_filp); + if (ret < 0) { + rc = ret; + goto out_fput; + } + + /* + * It must to truncate the PCC copy to the same size of the Lustre + * copy after copy data. Otherwise, it may get wrong file size after + * re-attach a file. See LU-13023 for details. + */ + rc = pcc_inode_reset_iattr(dentry, ATTR_SIZE, KUIDT_INIT(0), + KGIDT_INIT(0), ret); if (rc) goto out_fput; @@ -2276,13 +2295,13 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, fput(pcc_filp); out_dentry: if (rc) { - old_cred = override_creds(pcc_super_cred(inode->i_sb)); (void) pcc_inode_remove(inode, dentry); - revert_creds(old_cred); dput(dentry); } out_dataset_put: pcc_dataset_put(dataset); + revert_creds(old_cred); + return rc; } From patchwork Thu Feb 27 21:16:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410777 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1ED5217E0 for ; Thu, 27 Feb 2020 21:46:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 04B24246A1 for ; Thu, 27 Feb 2020 21:46:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 04B24246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8EC6034A174; Thu, 27 Feb 2020 13:36:54 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3EAED3488CC for ; Thu, 27 Feb 2020 13:21:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 701F391A8; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6F165468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:34 -0500 Message-Id: <1582838290-17243-527-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 526/622] lustre: pcc: auto attach not work after client cache clear X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin When the inode of a PCC cached file in unused state was evicted from icache due to memory pressure or manual icache cleanup (i.e. 
"echo 3 > /proc/sys/vm/drop_caches"), this file will be detached from PCC also, and all PCC state for this file is cleared. In the current design, PCC only tries to auto attache the file once attached into PCC according to the in-memery PCC state. Thus later IO for the file is not directed to PCC and will trigger the data restore. If this is a not desired result for the user, then we need to try to auto attach file that was never attached into PCC or once attached but detached as a result of shrinking its inode from icache. Although the candidates to try auto attach are increased, but only the file in HSM released state (which can directly get from file layout) will be checked. This bug is easy reproduced on rhel8. It seems that the command "echo 3 > /proc/sys/vm/drop_caches" will drop all unused inodes from icache, but it is not true for rhel7. This patch also adds the check for the input parameter @rwid, which should be non zero value and same as the archive ID. WC-bug-id: https://jira.whamcloud.com/browse/LU-13030 Lustre-commit: a5ef2d6e068e ("LU-13030 pcc: auto attach not work after client cache clear") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/36892 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 27 ++++- fs/lustre/llite/llite_lib.c | 2 + fs/lustre/llite/pcc.c | 194 ++++++++++++++++++++++++++------ fs/lustre/llite/pcc.h | 26 +++-- include/uapi/linux/lustre/lustre_user.h | 10 -- 5 files changed, 204 insertions(+), 55 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 205ea50..8e7b949 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -206,7 +206,22 @@ struct ll_inode_info { struct mutex lli_pcc_lock; enum lu_pcc_state_flags lli_pcc_state; - struct pcc_inode *lli_pcc_inode; + /* + * @lli_pcc_generation saves the gobal PCC generation + * when the file was 
successfully attached into PCC. + * The flags of the PCC dataset are saved in + * @lli_pcc_dsflags. + * The gobal PCC generation will be increased when add + * or delete a PCC backend, or change the configuration + * parameters for PCC. + * If @lli_pcc_generation is same as the gobal PCC + * generation, we can use the saved flags of the PCC + * dataset to determine whether need to try auto attach + * safely. + */ + u64 lli_pcc_generation; + enum pcc_dataset_flags lli_pcc_dsflags; + struct pcc_inode *lli_pcc_inode; struct mutex lli_group_mutex; u64 lli_group_users; unsigned long lli_group_gid; @@ -1432,4 +1447,14 @@ int cl_setattr_ost(struct cl_object *obj, const struct iattr *attr, u64 cl_fid_build_ino(const struct lu_fid *fid, bool api32); u32 cl_fid_build_gen(const struct lu_fid *fid); +static inline struct pcc_super *ll_i2pccs(struct inode *inode) +{ + return &ll_i2sbi(inode)->ll_pcc_super; +} + +static inline struct pcc_super *ll_info2pccs(struct ll_inode_info *lli) +{ + return ll_i2pccs(ll_info2i(lli)); +} + #endif /* LLITE_INTERNAL_H */ diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 1245336..c2baf6a 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -983,6 +983,8 @@ void ll_lli_init(struct ll_inode_info *lli) mutex_init(&lli->lli_pcc_lock); lli->lli_pcc_state = PCC_STATE_FL_NONE; lli->lli_pcc_inode = NULL; + lli->lli_pcc_dsflags = PCC_DATASET_NONE; + lli->lli_pcc_generation = 0; mutex_init(&lli->lli_group_mutex); lli->lli_group_users = 0; lli->lli_group_gid = 0; diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index 550045b..a0e31c8 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -126,6 +126,7 @@ int pcc_super_init(struct pcc_super *super) cap_lower(cred->cap_effective, CAP_SYS_RESOURCE); init_rwsem(&super->pccs_rw_sem); INIT_LIST_HEAD(&super->pccs_datasets); + super->pccs_generation = 1; return 0; } @@ -553,6 +554,12 @@ static int pcc_id_parse(struct pcc_cmd *cmd, const char 
*id) */ if ((cmd->u.pccc_add.pccc_flags & PCC_DATASET_PCC_ALL) == 0) cmd->u.pccc_add.pccc_flags |= PCC_DATASET_PCC_ALL; + + /* For RW-PCC, the value of @rwid must be non zero. */ + if (cmd->u.pccc_add.pccc_flags & PCC_DATASET_RWPCC && + cmd->u.pccc_add.pccc_rwid == 0) + return -EINVAL; + break; case PCC_DEL_DATASET: case PCC_CLEAR_ALL: @@ -840,6 +847,7 @@ struct pcc_dataset * if (strcmp(dataset->pccd_pathname, pathname) == 0) { list_del_init(&dataset->pccd_linkage); pcc_dataset_put(dataset); + super->pccs_generation++; rc = 0; break; } @@ -880,6 +888,7 @@ static void pcc_remove_datasets(struct pcc_super *super) list_del(&dataset->pccd_linkage); pcc_dataset_put(dataset); } + super->pccs_generation++; up_write(&super->pccs_rw_sem); } @@ -1101,9 +1110,15 @@ void pcc_file_init(struct pcc_file *pccf) pccf->pccf_type = LU_PCC_NONE; } -static inline bool pcc_auto_attach_enabled(struct pcc_dataset *dataset) +static inline bool pcc_auto_attach_enabled(enum pcc_dataset_flags flags, + enum pcc_io_type iot) { - return dataset->pccd_flags & PCC_DATASET_AUTO_ATTACH; + if (iot == PIT_OPEN) + return flags & PCC_DATASET_OPEN_ATTACH; + if (iot == PIT_GETATTR) + return flags & PCC_DATASET_STAT_ATTACH; + else + return flags & PCC_DATASET_AUTO_ATTACH; } static const char pcc_xattr_layout[] = XATTR_USER_PREFIX "PCC.layout"; @@ -1114,7 +1129,7 @@ static int pcc_layout_xattr_set(struct pcc_inode *pcci, u32 gen) struct ll_inode_info *lli = pcci->pcci_lli; int rc; - if (!(lli->lli_pcc_state & PCC_STATE_FL_AUTO_ATTACH)) + if (!(lli->lli_pcc_dsflags & PCC_DATASET_AUTO_ATTACH)) return 0; rc = __vfs_setxattr(pcc_dentry, pcc_dentry->d_inode, pcc_xattr_layout, @@ -1166,21 +1181,33 @@ static void pcc_inode_attach_init(struct pcc_dataset *dataset, struct dentry *dentry, enum lu_pcc_type type) { - struct ll_inode_info *lli = pcci->pcci_lli; - pcci->pcci_path.mnt = mntget(dataset->pccd_path.mnt); pcci->pcci_path.dentry = dentry; LASSERT(atomic_read(&pcci->pcci_refcount) == 0); 
atomic_set(&pcci->pcci_refcount, 1); pcci->pcci_type = type; pcci->pcci_attr_valid = false; +} - if (dataset->pccd_flags & PCC_DATASET_OPEN_ATTACH) - lli->lli_pcc_state |= PCC_STATE_FL_OPEN_ATTACH; - if (dataset->pccd_flags & PCC_DATASET_IO_ATTACH) - lli->lli_pcc_state |= PCC_STATE_FL_IO_ATTACH; - if (dataset->pccd_flags & PCC_DATASET_STAT_ATTACH) - lli->lli_pcc_state |= PCC_STATE_FL_STAT_ATTACH; +static inline void pcc_inode_dsflags_set(struct ll_inode_info *lli, + struct pcc_dataset *dataset) +{ + lli->lli_pcc_generation = ll_info2pccs(lli)->pccs_generation; + lli->lli_pcc_dsflags = dataset->pccd_flags; +} + +static void pcc_inode_attach_set(struct pcc_super *super, + struct pcc_dataset *dataset, + struct ll_inode_info *lli, + struct pcc_inode *pcci, + struct dentry *dentry, + enum lu_pcc_type type) +{ + pcc_inode_init(pcci, lli); + pcc_inode_attach_init(dataset, pcci, dentry, type); + down_read(&super->pccs_rw_sem); + pcc_inode_dsflags_set(lli, dataset); + up_read(&super->pccs_rw_sem); } static inline void pcc_layout_gen_set(struct pcc_inode *pcci, @@ -1263,6 +1290,7 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen, pcc_inode_get(pcci); pcci->pcci_type = type; } + pcc_inode_dsflags_set(lli, dataset); pcc_layout_gen_set(pcci, gen); *cached = true; } @@ -1274,28 +1302,83 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen, return rc; } -static int pcc_try_datasets_attach(struct inode *inode, u32 gen, - enum lu_pcc_type type, bool *cached) +static int pcc_try_datasets_attach(struct inode *inode, enum pcc_io_type iot, + u32 gen, enum lu_pcc_type type, + bool *cached) { - struct pcc_dataset *dataset, *tmp; struct pcc_super *super = &ll_i2sbi(inode)->ll_pcc_super; + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_dataset *dataset = NULL, *tmp; int rc = 0; down_read(&super->pccs_rw_sem); list_for_each_entry_safe(dataset, tmp, &super->pccs_datasets, pccd_linkage) { - if (!pcc_auto_attach_enabled(dataset)) - continue; + if 
(!pcc_auto_attach_enabled(dataset->pccd_flags, iot)) + break; + rc = pcc_try_dataset_attach(inode, gen, type, dataset, cached); if (rc < 0 || (!rc && *cached)) break; } + + /* + * Update the saved dataset flags for the inode accordingly if failed. + */ + if (!rc && !*cached) { + /* + * Currently auto attach strategy for a PCC backend is + * unchangeable once once it was added into the PCC datasets on + * a client as the support to change auto attach strategy is + * not implemented yet. + */ + /* + * If tried to attach from one PCC backend: + * @lli_pcc_generation > 0: + * 1) The file was once attached into PCC, but now the + * corresponding PCC backend should be removed from the client; + * 2) The layout generation was changed, the data has been + * restored; + * 3) The corresponding PCC copy is not existed on PCC + * @lli_pcc_generation == 0: + * The file is never attached into PCC but in a HSM released + * state, or once attached into PCC but the inode was evicted + * from icache later. + * Set the saved dataset flags with PCC_DATASET_NONE. Then this + * file will skip from the candidates to try auto attach until + * the file is attached ninto PCC again. + * + * If the file was never attached into PCC, or once attached but + * its inode was evicted from icache (lli_pcc_generation == 0), + * set the saved dataset flags with PCC_DATASET_NONE. + * + * If the file was once attached into PCC but the corresponding + * dataset was removed from the client, set the saved dataset + * flags with PCC_DATASET_NONE. + * + * TODO: If the file was once attached into PCC but not try to + * auto attach due to the change of the configuration parameters + * for this dataset (i.e. change from auto attach enabled to + * auto attach disabled for this dataset), update the saved + * dataset flags witha the found one. 
+ */ + lli->lli_pcc_dsflags = PCC_DATASET_NONE; + } up_read(&super->pccs_rw_sem); return rc; } -static int pcc_try_auto_attach(struct inode *inode, bool *cached, bool is_open) +/* + * TODO: For RW-PCC, it is desirable to store HSM info as a layout (LU-10606). + * Thus the client can get archive ID from the layout directly. When try to + * attach the file automatically which is in HSM released state (according to + * LOV_PATTERN_F_RELEASED in the layout), it can determine whether the file is + * valid cached on PCC more precisely according to the @rwid (archive ID) in + * the PCC dataset and the archive ID in HSM attrs. + */ +static int pcc_try_auto_attach(struct inode *inode, bool *cached, + enum pcc_io_type iot) { struct pcc_super *super = &ll_i2sbi(inode)->ll_pcc_super; struct cl_layout clt = { @@ -1317,7 +1400,7 @@ static int pcc_try_auto_attach(struct inode *inode, bool *cached, bool is_open) * obtain valid layout lock from MDT (i.e. the file is being * HSM restoring). */ - if (is_open) { + if (iot == PIT_OPEN) { if (ll_layout_version_get(lli) == CL_LAYOUT_GEN_NONE) return 0; } else { @@ -1330,19 +1413,54 @@ static int pcc_try_auto_attach(struct inode *inode, bool *cached, bool is_open) if (rc) return rc; - if (!is_open && gen != clt.cl_layout_gen) { + if (iot != PIT_OPEN && gen != clt.cl_layout_gen) { CDEBUG(D_CACHE, DFID" layout changed from %d to %d.\n", PFID(ll_inode2fid(inode)), gen, clt.cl_layout_gen); return -EINVAL; } if (clt.cl_is_released) - rc = pcc_try_datasets_attach(inode, clt.cl_layout_gen, + rc = pcc_try_datasets_attach(inode, iot, clt.cl_layout_gen, LU_PCC_READWRITE, cached); return rc; } +static inline bool pcc_may_auto_attach(struct inode *inode, + enum pcc_io_type iot) +{ + struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_super *super = ll_i2pccs(inode); + + /* Known the file was not in any PCC backend. 
*/ + if (lli->lli_pcc_dsflags & PCC_DATASET_NONE) + return false; + + /* + * lli_pcc_generation = 0 means that the file was never attached into + * PCC, or may be once attached into PCC but detached as the inode is + * evicted from icache (i.e. "echo 3 > /proc/sys/vm/drop_caches" or + * icache shrinking due to the memory pressure), which will cause the + * file detach from PCC when releasing the inode from icache. + * In either case, we still try to attach. + */ + /* lli_pcc_generation == 0, or the PCC setting was changed, + * or there is no PCC setup on the client and the try will return + * immediately in pcc_try_auto_attch(). + */ + if (super->pccs_generation != lli->lli_pcc_generation) + return true; + + /* The cached setting @lli_pcc_dsflags is valid */ + if (iot == PIT_OPEN) + return lli->lli_pcc_dsflags & PCC_DATASET_OPEN_ATTACH; + + if (iot == PIT_GETATTR) + return lli->lli_pcc_dsflags & PCC_DATASET_STAT_ATTACH; + + return lli->lli_pcc_dsflags & PCC_DATASET_IO_ATTACH; +} + int pcc_file_open(struct inode *inode, struct file *file) { struct pcc_inode *pcci; @@ -1365,8 +1483,8 @@ int pcc_file_open(struct inode *inode, struct file *file) goto out_unlock; if (!pcci || !pcc_inode_has_layout(pcci)) { - if (lli->lli_pcc_state & PCC_STATE_FL_OPEN_ATTACH) - rc = pcc_try_auto_attach(inode, &cached, true); + if (pcc_may_auto_attach(inode, PIT_OPEN)) + rc = pcc_try_auto_attach(inode, &cached, PIT_OPEN); if (rc < 0 || !cached) goto out_unlock; @@ -1429,7 +1547,6 @@ void pcc_file_release(struct inode *inode, struct file *file) static void pcc_io_init(struct inode *inode, enum pcc_io_type iot, bool *cached) { - struct ll_inode_info *lli = ll_i2info(inode); struct pcc_inode *pcci; pcc_inode_lock(inode); @@ -1440,11 +1557,8 @@ static void pcc_io_init(struct inode *inode, enum pcc_io_type iot, bool *cached) *cached = true; } else { *cached = false; - if ((lli->lli_pcc_state & PCC_STATE_FL_IO_ATTACH && - iot != PIT_GETATTR) || - (iot == PIT_GETATTR && - lli->lli_pcc_state & 
PCC_STATE_FL_STAT_ATTACH)) { - (void) pcc_try_auto_attach(inode, cached, false); + if (pcc_may_auto_attach(inode, iot)) { + (void) pcc_try_auto_attach(inode, cached, iot); if (*cached) { pcci = ll_i2pcci(inode); LASSERT(atomic_read(&pcci->pcci_refcount) > 0); @@ -2061,6 +2175,7 @@ int pcc_inode_create(struct super_block *sb, struct pcc_dataset *dataset, int pcc_inode_create_fini(struct inode *inode, struct pcc_create_attach *pca) { struct dentry *pcc_dentry = pca->pca_dentry; + struct pcc_super *super = ll_i2pccs(inode); const struct cred *old_cred; struct pcc_inode *pcci; int rc = 0; @@ -2073,7 +2188,7 @@ int pcc_inode_create_fini(struct inode *inode, struct pcc_create_attach *pca) LASSERT(pcc_dentry); - old_cred = override_creds(pcc_super_cred(inode->i_sb)); + old_cred = override_creds(super->pccs_cred); pcc_inode_lock(inode); LASSERT(!ll_i2pcci(inode)); pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); @@ -2087,9 +2202,8 @@ int pcc_inode_create_fini(struct inode *inode, struct pcc_create_attach *pca) if (rc) goto out_put; - pcc_inode_init(pcci, ll_i2info(inode)); - pcc_inode_attach_init(pca->pca_dataset, pcci, pcc_dentry, - LU_PCC_READWRITE); + pcc_inode_attach_set(super, pca->pca_dataset, ll_i2info(inode), + pcci, pcc_dentry, LU_PCC_READWRITE); rc = pcc_layout_xattr_set(pcci, 0); if (rc) { @@ -2224,6 +2338,7 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, { struct pcc_dataset *dataset; struct ll_inode_info *lli = ll_i2info(inode); + struct pcc_super *super = ll_i2pccs(inode); struct pcc_inode *pcci; const struct cred *old_cred; struct dentry *dentry; @@ -2241,7 +2356,7 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, if (!dataset) return -ENOENT; - old_cred = override_creds(pcc_super_cred(inode->i_sb)); + old_cred = override_creds(super->pccs_cred); rc = __pcc_inode_create(dataset, &lli->lli_fid, &dentry); if (rc) goto out_dataset_put; @@ -2287,8 +2402,8 @@ int pcc_readwrite_attach(struct file *file, struct inode *inode, 
goto out_unlock; } - pcc_inode_init(pcci, lli); - pcc_inode_attach_init(dataset, pcci, dentry, LU_PCC_READWRITE); + pcc_inode_attach_set(super, dataset, lli, pcci, + dentry, LU_PCC_READWRITE); out_unlock: pcc_inode_unlock(inode); out_fput: @@ -2417,8 +2532,15 @@ int pcc_ioctl_detach(struct inode *inode, u32 opt) LASSERT(atomic_read(&pcci->pcci_refcount) > 0); if (pcci->pcci_type == LU_PCC_READWRITE) { - if (opt == PCC_DETACH_OPT_UNCACHE) + if (opt == PCC_DETACH_OPT_UNCACHE) { hsm_remove = true; + /* + * The file will be removed from PCC, set the flags + * with PCC_DATASET_NONE even the later removal of the + * PCC copy fails. + */ + lli->lli_pcc_dsflags = PCC_DATASET_NONE; + } __pcc_layout_invalidate(pcci); pcc_inode_put(pcci); diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h index ec2e421..60f9bea 100644 --- a/fs/lustre/llite/pcc.h +++ b/fs/lustre/llite/pcc.h @@ -92,20 +92,22 @@ struct pcc_matcher { }; enum pcc_dataset_flags { - PCC_DATASET_NONE = 0x0, + PCC_DATASET_INVALID = 0x0, + /* Indicate that known the file is not in PCC. 
*/ + PCC_DATASET_NONE = 0x1, /* Try auto attach at open, enabled by default */ - PCC_DATASET_OPEN_ATTACH = 0x01, + PCC_DATASET_OPEN_ATTACH = 0x02, /* Try auto attach during IO when layout refresh, enabled by default */ - PCC_DATASET_IO_ATTACH = 0x02, + PCC_DATASET_IO_ATTACH = 0x04, /* Try auto attach at stat */ - PCC_DATASET_STAT_ATTACH = 0x04, + PCC_DATASET_STAT_ATTACH = 0x08, PCC_DATASET_AUTO_ATTACH = PCC_DATASET_OPEN_ATTACH | PCC_DATASET_IO_ATTACH | PCC_DATASET_STAT_ATTACH, /* PCC backend is only used for RW-PCC */ - PCC_DATASET_RWPCC = 0x08, + PCC_DATASET_RWPCC = 0x10, /* PCC backend is only used for RO-PCC */ - PCC_DATASET_ROPCC = 0x10, + PCC_DATASET_ROPCC = 0x20, /* PCC backend provides caching services for both RW-PCC and RO-PCC */ PCC_DATASET_PCC_ALL = PCC_DATASET_RWPCC | PCC_DATASET_ROPCC, }; @@ -114,7 +116,7 @@ struct pcc_dataset { u32 pccd_rwid; /* Archive ID */ u32 pccd_roid; /* Readonly ID */ struct pcc_match_rule pccd_rule; /* Match rule */ - enum pcc_dataset_flags pccd_flags; /* flags of PCC backend */ + enum pcc_dataset_flags pccd_flags; /* Flags of PCC backend */ char pccd_pathname[PATH_MAX]; /* full path */ struct path pccd_path; /* Root path */ struct list_head pccd_linkage; /* Linked to pccs_datasets */ @@ -128,6 +130,12 @@ struct pcc_super { struct list_head pccs_datasets; /* creds of process who forced instantiation of super block */ const struct cred *pccs_cred; + /* + * Gobal PCC Generation: it will be increased once the configuration + * for PCC is changed, i.e. add or delete a PCC backend, modify the + * parameters for PCC. 
+ */ + u64 pccs_generation; }; struct pcc_inode { @@ -177,7 +185,9 @@ enum pcc_io_type { /* fsync system call handling */ PIT_FSYNC, /* splice_read system call */ - PIT_SPLICE_READ + PIT_SPLICE_READ, + /* open system call */ + PIT_OPEN, }; enum pcc_cmd_type { diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index b46f52b..12b1f78 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -2180,16 +2180,6 @@ enum lu_pcc_state_flags { PCC_STATE_FL_ATTR_VALID = 0x01, /* The file is being attached into PCC */ PCC_STATE_FL_ATTACHING = 0x02, - /* Allow to auto attach at open */ - PCC_STATE_FL_OPEN_ATTACH = 0x04, - /* Allow to auto attach during I/O after layout lock revocation */ - PCC_STATE_FL_IO_ATTACH = 0x08, - /* Allow to auto attach at stat */ - PCC_STATE_FL_STAT_ATTACH = 0x10, - /* Allow to auto attach at the next open or layout refresh */ - PCC_STATE_FL_AUTO_ATTACH = PCC_STATE_FL_OPEN_ATTACH | - PCC_STATE_FL_IO_ATTACH | - PCC_STATE_FL_STAT_ATTACH, }; struct lu_pcc_state { From patchwork Thu Feb 27 21:16:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410781 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20EF4924 for ; Thu, 27 Feb 2020 21:46:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 09015246A1 for ; Thu, 27 Feb 2020 21:46:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 09015246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F0CC034B27A; Thu, 27 Feb 2020 13:36:57 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 95E5721FF15 for ; Thu, 27 Feb 2020 13:21:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7328E91A9; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 71DFF46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:35 -0500 Message-Id: <1582838290-17243-528-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 527/622] lustre: pcc: Init saved dataset flags properly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Qian Yingjin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin When a new inode is initialized, the saved dataset flags are wrongly set to PCC_DATASET_NONE, which means that the file is known not to be in any PCC dataset. This patch initializes them to PCC_DATASET_INVALID instead. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-13030 Lustre-commit: e467a421c7aa ("LU-13030 pcc: Init saved dataset flags properly") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/36923 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/llite/pcc.c | 13 +++++-------- 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index c2baf6a..384b55b 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -983,7 +983,7 @@ void ll_lli_init(struct ll_inode_info *lli) mutex_init(&lli->lli_pcc_lock); lli->lli_pcc_state = PCC_STATE_FL_NONE; lli->lli_pcc_inode = NULL; - lli->lli_pcc_dsflags = PCC_DATASET_NONE; + lli->lli_pcc_dsflags = PCC_DATASET_INVALID; lli->lli_pcc_generation = 0; mutex_init(&lli->lli_group_mutex); lli->lli_group_users = 0; diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index a0e31c8..3a2c8f2 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -1346,21 +1346,18 @@ static int pcc_try_datasets_attach(struct inode *inode, enum pcc_io_type iot, * from icache later. * Set the saved dataset flags with PCC_DATASET_NONE. Then this * file will skip from the candidates to try auto attach until - * the file is attached ninto PCC again. + * the file is attached into PCC again. * * If the file was never attached into PCC, or once attached but * its inode was evicted from icache (lli_pcc_generation == 0), + * or the corresponding dataset was removed from the client, * set the saved dataset flags with PCC_DATASET_NONE. * - * If the file was once attached into PCC but the corresponding - * dataset was removed from the client, set the saved dataset - * flags with PCC_DATASET_NONE. - * * TODO: If the file was once attached into PCC but not try to * auto attach due to the change of the configuration parameters * for this dataset (i.e. 
change from auto attach enabled to * auto attach disabled for this dataset), update the saved - * dataset flags witha the found one. + * dataset flags with the found one. */ lli->lli_pcc_dsflags = PCC_DATASET_NONE; } @@ -1437,7 +1434,7 @@ static inline bool pcc_may_auto_attach(struct inode *inode, return false; /* - * lli_pcc_generation = 0 means that the file was never attached into + * lli_pcc_generation == 0 means that the file was never attached into * PCC, or may be once attached into PCC but detached as the inode is * evicted from icache (i.e. "echo 3 > /proc/sys/vm/drop_caches" or * icache shrinking due to the memory pressure), which will cause the @@ -1446,7 +1443,7 @@ static inline bool pcc_may_auto_attach(struct inode *inode, */ /* lli_pcc_generation == 0, or the PCC setting was changed, * or there is no PCC setup on the client and the try will return - * immediately in pcc_try_auto_attch(). + * immediately in pcc_try_auto_attach(). */ if (super->pccs_generation != lli->lli_pcc_generation) return true; From patchwork Thu Feb 27 21:16:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410895 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9CC21924 for ; Thu, 27 Feb 2020 21:50:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8497224690 for ; Thu, 27 Feb 2020 21:50:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8497224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org 
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5AB7834B945; Thu, 27 Feb 2020 13:41:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EC44C3488A0 for ; Thu, 27 Feb 2020 13:21:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7606691AA; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7498946C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:36 -0500 Message-Id: <1582838290-17243-529-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 528/622] lustre: use simple sleep in some cases X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown To match the OpenSFS branch, change schedule_timeout_uninterruptible() to ssleep(). In mdc_request.c, the change to ssleep() lets us remove a wait queue. In seq_client_alloc_meta(), wait 2 seconds before retrying seq_client_rpc() when it returns -EINPROGRESS or -EAGAIN.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10467 Lustre-commit: 077b35568be5 ("LU-10467 lustre: don't use l_wait_event() for simple sleep.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35966 Lustre-commit: d0ca764a1a91 ("LU-10467 lustre: don't use l_wait_event() for poll loops.") Reviewed-on: https://review.whamcloud.com/35968 Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/fid/fid_request.c | 7 +++++++ fs/lustre/llite/llite_lib.c | 3 ++- fs/lustre/lov/lov_request.c | 4 +++- fs/lustre/mdc/mdc_request.c | 5 ++--- fs/lustre/ptlrpc/events.c | 7 +++---- 5 files changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/lustre/fid/fid_request.c b/fs/lustre/fid/fid_request.c index a54d1e5..6cede30 100644 --- a/fs/lustre/fid/fid_request.c +++ b/fs/lustre/fid/fid_request.c @@ -40,6 +40,7 @@ #define DEBUG_SUBSYSTEM S_FID #include +#include #include #include @@ -155,6 +156,12 @@ static int seq_client_alloc_meta(const struct lu_env *env, */ rc = seq_client_rpc(seq, &seq->lcs_space, SEQ_ALLOC_META, "meta"); + if (rc == -EINPROGRESS || rc == -EAGAIN) + /* MDT0 is not ready, let's wait for 2 + * seconds and retry. + */ + ssleep(2); + } while (rc == -EINPROGRESS || rc == -EAGAIN); return rc; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 384b55b..7e128f0 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include #include @@ -2344,7 +2345,7 @@ void ll_umount_begin(struct super_block *sb) * to decrement mnt_cnt and hope to finish it within 10sec. 
*/ while (cnt < 10 && !may_umount(sbi->ll_mnt.mnt)) { - schedule_timeout_uninterruptible(HZ); + ssleep(1); cnt++; } diff --git a/fs/lustre/lov/lov_request.c b/fs/lustre/lov/lov_request.c index added19..d263cec 100644 --- a/fs/lustre/lov/lov_request.c +++ b/fs/lustre/lov/lov_request.c @@ -33,6 +33,8 @@ #define DEBUG_SUBSYSTEM S_LOV +#include + #include #include #include "lov_internal.h" @@ -130,7 +132,7 @@ static int lov_check_and_wait_active(struct lov_obd *lov, int ost_idx) mutex_unlock(&lov->lov_lock); while (cnt < obd_timeout && !lov_check_set(lov, ost_idx)) { - schedule_timeout_uninterruptible(HZ); + ssleep(1); cnt++; } if (tgt->ltd_active) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 287013f..54f6d15 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -1043,13 +1044,11 @@ static int mdc_getpage(struct obd_export *exp, const struct lu_fid *fid, { struct ptlrpc_bulk_desc *desc; struct ptlrpc_request *req; - wait_queue_head_t waitq; int resends = 0; int rc; int i; *request = NULL; - init_waitqueue_head(&waitq); restart_bulk: req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_MDS_READPAGE); @@ -1093,7 +1092,7 @@ static int mdc_getpage(struct obd_export *exp, const struct lu_fid *fid, exp->exp_obd->obd_name, -EIO); return -EIO; } - wait_event_idle_timeout(waitq, 0, resends * HZ); + ssleep(resends); goto restart_bulk; } diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c index e6a49db..ce13aa6 100644 --- a/fs/lustre/ptlrpc/events.c +++ b/fs/lustre/ptlrpc/events.c @@ -34,9 +34,8 @@ #define DEBUG_SUBSYSTEM S_RPC #include -# ifdef __mips64__ -# include -# endif +#include +#include #include #include @@ -522,7 +521,7 @@ static void ptlrpc_ni_fini(void) if (retries != 0) CWARN("Event queue still busy\n"); - schedule_timeout_uninterruptible(2 * HZ); + ssleep(2); break; } } From patchwork Thu Feb 27 21:16:37 2020 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410785 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BBFCB924 for ; Thu, 27 Feb 2020 21:46:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A278F24690 for ; Thu, 27 Feb 2020 21:46:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A278F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6863234B2C0; Thu, 27 Feb 2020 13:37:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4E2D63488E5 for ; Thu, 27 Feb 2020 13:21:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 78BDA91AB; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7758A46D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:37 -0500 Message-Id: <1582838290-17243-530-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 529/622] lustre: lov: use wait_event() in lov_subobject_kill() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown lov_subobject_kill() has an open-coded version of wait_event(). Change it to use the macro. There is no need to take a spinlock just to check whether a variable has changed value. If there were, the first test would be protected too. "lti_waiter" now has no users and can be removed from lov_thread_info. WC-bug-id: https://jira.whamcloud.com/browse/LU-10467 Lustre-commit: c0894d1d32670 ("LU-10467 lov: use wait_event() in lov_subobject_kill()") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/36343 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_cl_internal.h | 1 - fs/lustre/lov/lov_object.c | 24 +----------------------- 2 files changed, 1 insertion(+), 24 deletions(-) diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index 8791e69..e21439d 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -474,7 +474,6 @@ struct lov_thread_info { struct ost_lvb lti_lvb; struct cl_2queue lti_cl2q; struct cl_page_list lti_plist; - wait_queue_entry_t lti_waiter; }; /** diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index f2c7bc2..2a35993 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -287,7 +287,6 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov, struct cl_object *sub; struct lu_site *site; wait_queue_head_t *wq; - wait_queue_entry_t *waiter;
LASSERT(r0->lo_sub[idx] == los); @@ -303,28 +302,7 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov, /* ... wait until it is actually destroyed---sub-object clears its * ->lo_sub[] slot in lovsub_object_free() */ - if (r0->lo_sub[idx] == los) { - waiter = &lov_env_info(env)->lti_waiter; - init_waitqueue_entry(waiter, current); - add_wait_queue(wq, waiter); - set_current_state(TASK_UNINTERRUPTIBLE); - while (1) { - /* this wait-queue is signaled at the end of - * lu_object_free(). - */ - set_current_state(TASK_UNINTERRUPTIBLE); - spin_lock(&r0->lo_sub_lock); - if (r0->lo_sub[idx] == los) { - spin_unlock(&r0->lo_sub_lock); - schedule(); - } else { - spin_unlock(&r0->lo_sub_lock); - set_current_state(TASK_RUNNING); - break; - } - } - remove_wait_queue(wq, waiter); - } + wait_event(*wq, r0->lo_sub[idx] != los); LASSERT(!r0->lo_sub[idx]); } From patchwork Thu Feb 27 21:16:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410527 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C0A5138D for ; Thu, 27 Feb 2020 21:40:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E6E3424690 for ; Thu, 27 Feb 2020 21:40:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E6E3424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 04FA03491CC; Thu, 27 Feb 
2020 13:33:01 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 904E73488E5 for ; Thu, 27 Feb 2020 13:21:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7CBA991AC; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7A16347C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:38 -0500 Message-Id: <1582838290-17243-531-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 530/622] lustre: llite: use wait_event in cl_object_put_last() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown cl_object_put_last() contains an open-coded version of wait_event(). Replace it with the library macro. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10467 Lustre-commit: f963f19c94b5 ("LU-10467 llite: use wait_event in cl_object_put_last()") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/36345 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/lcommon_cl.c | 14 +------------- 1 file changed, 1 insertion(+), 13 deletions(-) diff --git a/fs/lustre/llite/lcommon_cl.c b/fs/lustre/llite/lcommon_cl.c index 3129316..76f76a0 100644 --- a/fs/lustre/llite/lcommon_cl.c +++ b/fs/lustre/llite/lcommon_cl.c @@ -221,7 +221,6 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md) static void cl_object_put_last(struct lu_env *env, struct cl_object *obj) { struct lu_object_header *header = obj->co_lu.lo_header; - wait_queue_entry_t waiter; if (unlikely(atomic_read(&header->loh_ref) != 1)) { struct lu_site *site = obj->co_lu.lo_dev->ld_site; @@ -229,18 +228,7 @@ static void cl_object_put_last(struct lu_env *env, struct cl_object *obj) wq = lu_site_wq_from_fid(site, &header->loh_fid); - init_waitqueue_entry(&waiter, current); - add_wait_queue(wq, &waiter); - - while (1) { - set_current_state(TASK_UNINTERRUPTIBLE); - if (atomic_read(&header->loh_ref) == 1) - break; - schedule(); - } - - set_current_state(TASK_RUNNING); - remove_wait_queue(wq, &waiter); + wait_event(*wq, atomic_read(&header->loh_ref) == 1); } cl_object_put(env, obj); From patchwork Thu Feb 27 21:16:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410787 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 89EB31580 for ; Thu, 27 Feb 2020 21:46:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with 
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 72E34246A1 for ; Thu, 27 Feb 2020 21:46:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 72E34246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3C65334B307; Thu, 27 Feb 2020 13:37:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D35973488E5 for ; Thu, 27 Feb 2020 13:21:03 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7DF4891AD; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7CFCA468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:39 -0500 Message-Id: <1582838290-17243-532-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 531/622] lustre: modules: Use LIST_HEAD for declaring list_heads X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Rather than struct list_head foo = LIST_HEAD_INIT(foo); use LIST_HEAD(foo); This is shorter and more in keeping with upstream style. WC-bug-id: https://jira.whamcloud.com/browse/LU-9679 Lustre-commit: 546993d587c5 ("LU-9679 modules: Use LIST_HEAD for declaring list_heads") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36669 Reviewed-by: James Simmons Reviewed-by: Ben Evans Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_locks.c | 2 +- fs/lustre/mdc/mdc_reint.c | 2 +- fs/lustre/osc/osc_request.c | 2 +- fs/lustre/ptlrpc/pinger.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index b91c162..4d40087 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -581,7 +581,7 @@ static struct ptlrpc_request *mdc_intent_layout_pack(struct obd_export *exp, struct md_op_data *op_data) { struct obd_device *obd = class_exp2obd(exp); - struct list_head cancels = LIST_HEAD_INIT(cancels); + LIST_HEAD(cancels); struct ptlrpc_request *req; struct ldlm_intent *lit; struct layout_intent *layout; diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index d26e27d..0dc0de4 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -470,7 +470,7 @@ int mdc_rename(struct obd_export *exp, struct md_op_data *op_data, int mdc_file_resync(struct obd_export *exp, struct md_op_data *op_data) { - struct list_head cancels = LIST_HEAD_INIT(cancels); + LIST_HEAD(cancels); struct ptlrpc_request *req; struct ldlm_lock *lock; struct mdt_rec_resync *rec; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 39cac7d..d6761dd 100644 --- a/fs/lustre/osc/osc_request.c +++
b/fs/lustre/osc/osc_request.c @@ -3382,7 +3382,7 @@ int osc_cleanup_common(struct obd_device *obd) .quotactl = osc_quotactl, }; -struct list_head osc_shrink_list = LIST_HEAD_INIT(osc_shrink_list); +LIST_HEAD(osc_shrink_list); DEFINE_SPINLOCK(osc_shrink_lock); static struct shrinker osc_cache_shrinker = { diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c index f584fc6..d8f57bb 100644 --- a/fs/lustre/ptlrpc/pinger.c +++ b/fs/lustre/ptlrpc/pinger.c @@ -43,7 +43,7 @@ struct mutex pinger_mutex; static LIST_HEAD(pinger_imports); -static struct list_head timeout_list = LIST_HEAD_INIT(timeout_list); +static LIST_HEAD(timeout_list); struct ptlrpc_request * ptlrpc_prep_ping(struct obd_import *imp) From patchwork Thu Feb 27 21:16:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410907 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6B663924 for ; Thu, 27 Feb 2020 21:50:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 53D7D24690 for ; Thu, 27 Feb 2020 21:50:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 53D7D24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C412634B980; Thu, 27 Feb 2020 13:41:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from 
smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 22EF23488F0 for ; Thu, 27 Feb 2020 13:21:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8152B91AE; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 800FB46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:40 -0500 Message-Id: <1582838290-17243-533-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 532/622] lustre: handle: move refcount into the lustre_handle. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown Most objects with a lustre_handle have a refcount. The exception is mdt_mfd, which uses locking (med_open_lock) to manage its lifetime. The lustre_handles code currently needs a call-out to increment the object's refcount. To simplify things, move the refcount into the lustre_handle (which will be largely ignored by mdt_mfd) and discard the call-out. To avoid warnings when refcount debugging is enabled, the refcount of mdt_mfd is initialized to 1 and decremented after any class_handle2object() call which would have incremented it. In order to preserve the same debug messages, we store an object type name in the portals_handle_ops, and use that in a CDEBUG() when incrementing the ref count.
WC-bug-id: https://jira.whamcloud.com/browse/LU-12542 Lustre-commit: 1d1f6c8908b3 ("LU-12542 handle: move refcount into the lustre_handle.") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/35794 Reviewed-by: Neil Brown Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 6 ------ fs/lustre/include/lustre_export.h | 3 +-- fs/lustre/include/lustre_handles.h | 4 +++- fs/lustre/ldlm/ldlm_lock.c | 36 +++++++++++++++--------------------- fs/lustre/obdclass/genops.c | 25 ++++++++++--------------- fs/lustre/obdclass/lustre_handles.c | 5 ++++- fs/lustre/obdecho/echo_client.c | 2 +- fs/lustre/ptlrpc/service.c | 4 ++-- 8 files changed, 36 insertions(+), 49 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index f7d2d9c..7621d1e 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -598,12 +598,6 @@ struct ldlm_lock { */ struct portals_handle l_handle; /** - * Lock reference count. - * This is how many users have pointers to actual structure, so that - * we do not accidentally free lock structure that is in use. - */ - atomic_t l_refc; - /** * Internal spinlock protects l_resource. We should hold this lock * first before taking res_lock. */ diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h index 967ce37..878dedd 100644 --- a/fs/lustre/include/lustre_export.h +++ b/fs/lustre/include/lustre_export.h @@ -67,12 +67,11 @@ struct obd_export { * what export they are talking to. */ struct portals_handle exp_handle; - refcount_t exp_refcount; /** * Set of counters below is to track where export references are * kept. The exp_rpc_count is used for reconnect handling also, * the cb_count and locks_count are for debug purposes only for now. 
- * The sum of them should be less than exp_refcount by 3 + * The sum of them should be less than exp_handle.h_ref by 3 */ atomic_t exp_rpc_count; /* RPC references */ atomic_t exp_cb_count; /* Commit callback references */ diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h index 0440970..7c93d72 100644 --- a/fs/lustre/include/lustre_handles.h +++ b/fs/lustre/include/lustre_handles.h @@ -46,8 +46,9 @@ #include struct portals_handle_ops { - void (*hop_addref)(void *object); void (*hop_free)(void *object, int size); + /* hop_type is used for some debugging messages */ + char *hop_type; }; /* These handles are most easily used by having them appear at the very top of @@ -66,6 +67,7 @@ struct portals_handle { struct list_head h_link; u64 h_cookie; const struct portals_handle_ops *h_ops; + refcount_t h_ref; /* newly added fields to handle the RCU issue. -jxiong */ struct rcu_head h_rcu; diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index d14221a..62d2c1d 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -148,7 +148,7 @@ const char *ldlm_it2str(enum ldlm_intent_flags it) */ struct ldlm_lock *ldlm_lock_get(struct ldlm_lock *lock) { - atomic_inc(&lock->l_refc); + refcount_inc(&lock->l_handle.h_ref); return lock; } EXPORT_SYMBOL(ldlm_lock_get); @@ -161,8 +161,8 @@ struct ldlm_lock *ldlm_lock_get(struct ldlm_lock *lock) void ldlm_lock_put(struct ldlm_lock *lock) { LASSERT(lock->l_resource != LP_POISON); - LASSERT(atomic_read(&lock->l_refc) > 0); - if (atomic_dec_and_test(&lock->l_refc)) { + LASSERT(refcount_read(&lock->l_handle.h_ref) > 0); + if (refcount_dec_and_test(&lock->l_handle.h_ref)) { struct ldlm_resource *res; LDLM_DEBUG(lock, @@ -358,12 +358,6 @@ void ldlm_lock_destroy_nolock(struct ldlm_lock *lock) } } -/* this is called by portals_handle2object with the handle lock taken */ -static void lock_handle_addref(void *lock) -{ - LDLM_LOCK_GET((struct ldlm_lock *)lock); -} - static void
lock_handle_free(void *lock, int size) { LASSERT(size == sizeof(struct ldlm_lock)); @@ -371,8 +365,8 @@ static void lock_handle_free(void *lock, int size) } static struct portals_handle_ops lock_handle_ops = { - .hop_addref = lock_handle_addref, .hop_free = lock_handle_free, + .hop_type = "ldlm", }; /** @@ -397,7 +391,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource) lock->l_resource = resource; lu_ref_add(&resource->lr_reference, "lock", lock); - atomic_set(&lock->l_refc, 2); + refcount_set(&lock->l_handle.h_ref, 2); INIT_LIST_HEAD(&lock->l_res_link); INIT_LIST_HEAD(&lock->l_lru); INIT_LIST_HEAD(&lock->l_pending_chain); @@ -1896,13 +1890,13 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, &vaf, lock, lock->l_handle.h_cookie, - atomic_read(&lock->l_refc), + refcount_read(&lock->l_handle.h_ref), lock->l_readers, lock->l_writers, ldlm_lockname[lock->l_granted_mode], ldlm_lockname[lock->l_req_mode], lock->l_flags, nid, lock->l_remote_handle.cookie, - exp ? refcount_read(&exp->exp_refcount) : -99, + exp ? refcount_read(&exp->exp_handle.h_ref) : -99, lock->l_pid, lock->l_callback_timeout, lock->l_lvb_type); va_end(args); @@ -1916,7 +1910,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, &vaf, ldlm_lock_to_ns_name(lock), lock, lock->l_handle.h_cookie, - atomic_read(&lock->l_refc), + refcount_read(&lock->l_handle.h_ref), lock->l_readers, lock->l_writers, ldlm_lockname[lock->l_granted_mode], ldlm_lockname[lock->l_req_mode], @@ -1929,7 +1923,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, lock->l_req_extent.end, lock->l_flags, nid, lock->l_remote_handle.cookie, - exp ? refcount_read(&exp->exp_refcount) : -99, + exp ? 
refcount_read(&exp->exp_handle.h_ref) : -99, lock->l_pid, lock->l_callback_timeout, lock->l_lvb_type); break; @@ -1940,7 +1934,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, &vaf, ldlm_lock_to_ns_name(lock), lock, lock->l_handle.h_cookie, - atomic_read(&lock->l_refc), + refcount_read(&lock->l_handle.h_ref), lock->l_readers, lock->l_writers, ldlm_lockname[lock->l_granted_mode], ldlm_lockname[lock->l_req_mode], @@ -1952,7 +1946,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, lock->l_policy_data.l_flock.end, lock->l_flags, nid, lock->l_remote_handle.cookie, - exp ? refcount_read(&exp->exp_refcount) : -99, + exp ? refcount_read(&exp->exp_handle.h_ref) : -99, lock->l_pid, lock->l_callback_timeout); break; @@ -1962,7 +1956,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, &vaf, ldlm_lock_to_ns_name(lock), lock, lock->l_handle.h_cookie, - atomic_read(&lock->l_refc), + refcount_read(&lock->l_handle.h_ref), lock->l_readers, lock->l_writers, ldlm_lockname[lock->l_granted_mode], ldlm_lockname[lock->l_req_mode], @@ -1972,7 +1966,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, ldlm_typename[resource->lr_type], lock->l_flags, nid, lock->l_remote_handle.cookie, - exp ? refcount_read(&exp->exp_refcount) : -99, + exp ? refcount_read(&exp->exp_handle.h_ref) : -99, lock->l_pid, lock->l_callback_timeout, lock->l_lvb_type); break; @@ -1983,7 +1977,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, &vaf, ldlm_lock_to_ns_name(lock), lock, lock->l_handle.h_cookie, - atomic_read(&lock->l_refc), + refcount_read(&lock->l_handle.h_ref), lock->l_readers, lock->l_writers, ldlm_lockname[lock->l_granted_mode], ldlm_lockname[lock->l_req_mode], @@ -1992,7 +1986,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock, ldlm_typename[resource->lr_type], lock->l_flags, nid, lock->l_remote_handle.cookie, - exp ? refcount_read(&exp->exp_refcount) : -99, + exp ? 
refcount_read(&exp->exp_handle.h_ref) : -99, lock->l_pid, lock->l_callback_timeout, lock->l_lvb_type); break; diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 5d4e421..7f841d5 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -708,7 +708,7 @@ static void class_export_destroy(struct obd_export *exp) { struct obd_device *obd = exp->exp_obd; - LASSERT(refcount_read(&exp->exp_refcount) == 0); + LASSERT(refcount_read(&exp->exp_handle.h_ref) == 0); LASSERT(obd); CDEBUG(D_IOCTL, "destroying export %p/%s for %s\n", exp, @@ -732,33 +732,28 @@ static void class_export_destroy(struct obd_export *exp) OBD_FREE_RCU(exp, sizeof(*exp), &exp->exp_handle); } -static void export_handle_addref(void *export) -{ - class_export_get(export); -} - static struct portals_handle_ops export_handle_ops = { - .hop_addref = export_handle_addref, .hop_free = NULL, + .hop_type = "export", }; struct obd_export *class_export_get(struct obd_export *exp) { - refcount_inc(&exp->exp_refcount); - CDEBUG(D_INFO, "GETting export %p : new refcount %d\n", exp, - refcount_read(&exp->exp_refcount)); + refcount_inc(&exp->exp_handle.h_ref); + CDEBUG(D_INFO, "GET export %p refcount=%d\n", exp, + refcount_read(&exp->exp_handle.h_ref)); return exp; } EXPORT_SYMBOL(class_export_get); void class_export_put(struct obd_export *exp) { - LASSERT(refcount_read(&exp->exp_refcount) > 0); - LASSERT(refcount_read(&exp->exp_refcount) < LI_POISON); + LASSERT(refcount_read(&exp->exp_handle.h_ref) > 0); + LASSERT(refcount_read(&exp->exp_handle.h_ref) < LI_POISON); CDEBUG(D_INFO, "PUTting export %p : new refcount %d\n", exp, - refcount_read(&exp->exp_refcount) - 1); + refcount_read(&exp->exp_handle.h_ref) - 1); - if (refcount_dec_and_test(&exp->exp_refcount)) { + if (refcount_dec_and_test(&exp->exp_handle.h_ref)) { struct obd_device *obd = exp->exp_obd; CDEBUG(D_IOCTL, "final put %p/%s\n", @@ -809,7 +804,7 @@ static struct obd_export *__class_new_export(struct obd_device 
*obd, export->exp_conn_cnt = 0; /* 2 = class_handle_hash + last */ - refcount_set(&export->exp_refcount, 2); + refcount_set(&export->exp_handle.h_ref, 2); atomic_set(&export->exp_rpc_count, 0); atomic_set(&export->exp_cb_count, 0); atomic_set(&export->exp_locks_count, 0); diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c index 7fa3ef6..95a34db 100644 --- a/fs/lustre/obdclass/lustre_handles.c +++ b/fs/lustre/obdclass/lustre_handles.c @@ -152,7 +152,10 @@ void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops) spin_lock(&h->h_lock); if (likely(h->h_in != 0)) { - h->h_ops->hop_addref(h); + refcount_inc(&h->h_ref); + CDEBUG(D_INFO, "GET %s %p refcount=%d\n", + h->h_ops->hop_type, h, + refcount_read(&h->h_ref)); retval = h; } spin_unlock(&h->h_lock); diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index 8e04636..c473f547 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1669,7 +1669,7 @@ static int echo_client_cleanup(struct obd_device *obddev) lu_session_tags_clear(ECHO_SES_TAG & ~LCT_SESSION); lu_context_tags_clear(ECHO_DT_CTX_TAG); - LASSERT(refcount_read(&ec->ec_exp->exp_refcount) > 0); + LASSERT(refcount_read(&ec->ec_exp->exp_handle.h_ref) > 0); rc = obd_disconnect(ec->ec_exp); if (rc != 0) CERROR("fail to disconnect device: %d\n", rc); diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index fe0e108..c874487 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1768,7 +1768,7 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, (request->rq_export ? (char *)request->rq_export->exp_client_uuid.uuid : "0"), (request->rq_export ? 
- refcount_read(&request->rq_export->exp_refcount) : -99), + refcount_read(&request->rq_export->exp_handle.h_ref) : -99), lustre_msg_get_status(request->rq_reqmsg), request->rq_xid, libcfs_id2str(request->rq_peer), lustre_msg_get_opc(request->rq_reqmsg), @@ -1809,7 +1809,7 @@ static int ptlrpc_server_handle_request(struct ptlrpc_service_part *svcpt, (request->rq_export ? (char *)request->rq_export->exp_client_uuid.uuid : "0"), (request->rq_export ? - refcount_read(&request->rq_export->exp_refcount) : -99), + refcount_read(&request->rq_export->exp_handle.h_ref) : -99), lustre_msg_get_status(request->rq_reqmsg), request->rq_xid, libcfs_id2str(request->rq_peer), From patchwork Thu Feb 27 21:16:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410917 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A13181580 for ; Thu, 27 Feb 2020 21:51:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8647F24692 for ; Thu, 27 Feb 2020 21:51:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8647F24692 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EA59A34A4EF; Thu, 27 Feb 2020 13:43:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8CDD621FD10 for ; Thu, 27 Feb 2020 13:21:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 847F791AF; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 82F4D46C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:41 -0500 Message-Id: <1582838290-17243-534-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 533/622] lustre: llite: support page unaligned stride readahead X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel"

From: Wang Shilong

Currently, Lustre works well for aligned I/O, but performance is poor for unaligned stride reads, and some effort is needed to improve this.

One of the main problems with the current stride read code is that it is based on the page index, so when reads are not page aligned, stride read detection does not work well.

To support page-unaligned stride reads, change the detection from page index to byte offset, so that stride read pattern detection works and we avoid issuing many small-page RPCs and resetting the readahead window.

At the same time, keep as much performance as possible for the existing cases and make sure there are no obvious regressions for aligned-stride and sequential reads.
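
The difference between index-based and byte-based detection described above can be sketched in plain C. This is only an illustration under stated assumptions: 4 KiB pages, and the hypothetical names `demo_read`, `demo_is_byte_stride()` and `demo_is_page_stride()`, none of which exist in llite. The page-based check is a deliberately simplified stand-in for the old detector, not a copy of it.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical record of one read request. */
struct demo_read {
	unsigned long pos;   /* byte offset of the read */
	unsigned long count; /* length of the read in bytes */
};

/* Stride check in bytes: every read has the same length and the same
 * non-zero byte gap to the end of the previous read.
 */
bool demo_is_byte_stride(const struct demo_read *r, size_t n)
{
	unsigned long gap = 0;
	size_t i;

	if (n < 3)
		return false;
	for (i = 1; i < n; i++) {
		unsigned long prev_end = r[i - 1].pos + r[i - 1].count;

		if (r[i].count != r[0].count || r[i].pos <= prev_end)
			return false;
		if (i == 1)
			gap = r[i].pos - prev_end;
		else if (r[i].pos - prev_end != gap)
			return false;
	}
	return true;
}

/* The same check after rounding each read to the pages it touches. */
bool demo_is_page_stride(const struct demo_read *r, size_t n)
{
	unsigned long pages0 = 0, gap = 0, prev_last = 0;
	size_t i;

	if (n < 3)
		return false;
	for (i = 0; i < n; i++) {
		unsigned long first, last;

		if (r[i].count == 0)
			return false;
		first = r[i].pos >> DEMO_PAGE_SHIFT;
		last = (r[i].pos + r[i].count - 1) >> DEMO_PAGE_SHIFT;

		if (i == 0) {
			pages0 = last - first + 1;
		} else {
			if (last - first + 1 != pages0 || first <= prev_last)
				return false;
			if (i == 1)
				gap = first - prev_last - 1;
			else if (first - prev_last - 1 != gap)
				return false;
		}
		prev_last = last;
	}
	return true;
}
```

With 43 KiB reads issued every 86 KiB (the shape of the iozone run quoted in this message), the byte gap is constant, but the reads touch 11 or 12 pages depending on where they start, so the page-based check in this sketch does not see a regular pattern.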
Benchmark numbers:

iozone -w -c -i 5 -t1 -j 2 -s 1G -r 43k -F /mnt/lustre/data

  Patched              Unpatched
  1386630.75 kB/sec    152002.50 kB/sec

Throughput improved by more than 800% (about 9x).

Benchmarked with IOR from ihara:

             FPP Read (MB/sec)   SSF Read (MB/sec)
  Unpatched  44,636              7,731
  Patched    44,318              20,745

SSF read throughput improved by about 2.7x for the ior_hard_read workload, while FPP read throughput is essentially unchanged.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12518
Lustre-commit: 91d264551508 ("LU-12518 llite: support page unaligned stride readahead")
Signed-off-by: Wang Shilong
Reviewed-on: https://review.whamcloud.com/35437
Reviewed-by: Andreas Dilger
Reviewed-by: Li Xi
Signed-off-by: James Simmons
--- fs/lustre/llite/file.c | 2 +- fs/lustre/llite/llite_internal.h | 11 +- fs/lustre/llite/rw.c | 388 ++++++++++++++++++++++----------------- 3 files changed, 228 insertions(+), 173 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 92eead1..d196da8 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1703,7 +1703,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (cached) goto out; - ll_ras_enter(file); + ll_ras_enter(file, iocb->ki_pos, iov_iter_count(to)); result = ll_do_fast_read(iocb, to); if (result < 0 || iov_iter_count(to) == 0) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 8e7b949..fe9d568 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -654,11 +654,6 @@ struct ll_readahead_state { */ unsigned long ras_requests; /* - * Page index with respect to the current request, these value - * will not be accurate when dealing with reads issued via mmap. - */ - unsigned long ras_request_index; - /* * The following 3 items are used for detecting the stride I/O * mode. 
* In stride I/O mode, @@ -681,6 +676,10 @@ struct ll_readahead_state { unsigned long ras_consecutive_stride_requests; /* index of the last page that async readahead starts */ pgoff_t ras_async_last_readpage; + /* whether we should increase readahead window */ + bool ras_need_increase_window; + /* whether ra miss check should be skipped */ + bool ras_no_miss_check; }; struct ll_readahead_work { @@ -778,7 +777,7 @@ static inline bool ll_sbi_has_file_heat(struct ll_sb_info *sbi) return !!(sbi->ll_flags & LL_SBI_FILE_HEAT); } -void ll_ras_enter(struct file *f); +void ll_ras_enter(struct file *f, unsigned long pos, unsigned long count); /* llite/lcommon_misc.c */ int cl_ocd_update(struct obd_device *host, struct obd_device *watched, diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 38f7aa2c..bf91ae1 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -131,12 +131,11 @@ void ll_ra_stats_inc(struct inode *inode, enum ra_stat which) #define RAS_CDEBUG(ras) \ CDEBUG(D_READA, \ - "lre %lu cr %lu cb %lu ws %lu wl %lu nra %lu rpc %lu r %lu ri %lu csr %lu sf %lu sb %lu sl %lu lr %lu\n", \ + "lre %lu cr %lu cb %lu ws %lu wl %lu nra %lu rpc %lu r %lu csr %lu sf %lu sb %lu sl %lu lr %lu\n", \ ras->ras_last_read_end, ras->ras_consecutive_requests, \ ras->ras_consecutive_bytes, ras->ras_window_start, \ ras->ras_window_len, ras->ras_next_readahead, \ - ras->ras_rpc_size, \ - ras->ras_requests, ras->ras_request_index, \ + ras->ras_rpc_size, ras->ras_requests, \ ras->ras_consecutive_stride_requests, ras->ras_stride_offset, \ ras->ras_stride_bytes, ras->ras_stride_length, \ ras->ras_async_last_readpage) @@ -154,18 +153,6 @@ static int pos_in_window(unsigned long pos, unsigned long point, return start <= pos && pos <= end; } -void ll_ras_enter(struct file *f) -{ - struct ll_file_data *fd = LUSTRE_FPRIVATE(f); - struct ll_readahead_state *ras = &fd->fd_ras; - - spin_lock(&ras->ras_lock); - ras->ras_requests++; - ras->ras_request_index = 0; - 
ras->ras_consecutive_requests++; - spin_unlock(&ras->ras_lock); -} - /** * Initiates read-ahead of a page with given index. * @@ -311,15 +298,23 @@ static inline int stride_io_mode(struct ll_readahead_state *ras) static int ria_page_count(struct ra_io_arg *ria) { - u64 length = ria->ria_end >= ria->ria_start ? - ria->ria_end - ria->ria_start + 1 : 0; - unsigned int bytes_count; - + u64 length_bytes = ria->ria_end >= ria->ria_start ? + (ria->ria_end - ria->ria_start + 1) << PAGE_SHIFT : 0; + unsigned int bytes_count, pg_count; + + if (ria->ria_length > ria->ria_bytes && ria->ria_bytes && + (ria->ria_length % PAGE_SIZE || ria->ria_bytes % PAGE_SIZE || + ria->ria_stoff % PAGE_SIZE)) { + /* Over-estimate un-aligned page stride read */ + pg_count = ((ria->ria_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT) + 1; + pg_count *= length_bytes / ria->ria_length + 1; + + return pg_count; + } bytes_count = stride_byte_count(ria->ria_stoff, ria->ria_length, ria->ria_bytes, ria->ria_start, - length << PAGE_SHIFT); + length_bytes); return (bytes_count + PAGE_SIZE - 1) >> PAGE_SHIFT; - } static unsigned long ras_align(struct ll_readahead_state *ras, @@ -333,16 +328,28 @@ static unsigned long ras_align(struct ll_readahead_state *ras, } /*Check whether the index is in the defined ra-window */ -static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) +static bool ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) { + unsigned long pos = idx << PAGE_SHIFT; + unsigned long offset; + /* If ria_length == ria_pages, it means non-stride I/O mode, * idx should always inside read-ahead window in this case * For stride I/O mode, just check whether the idx is inside * the ria_pages. 
*/ - return ria->ria_length == 0 || ria->ria_length == ria->ria_bytes || - (idx >= ria->ria_stoff && (idx - ria->ria_stoff) % - ria->ria_length < ria->ria_bytes); + if (ria->ria_length == 0 || ria->ria_length == ria->ria_bytes) + return true; + + if (pos >= ria->ria_stoff) { + offset = (pos - ria->ria_stoff) % ria->ria_length; + if (offset < ria->ria_bytes || + (ria->ria_length - offset) < PAGE_SIZE) + return true; + } else if (pos + PAGE_SIZE > ria->ria_stoff) + return true; + + return false; } static unsigned long @@ -351,7 +358,6 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) struct ra_io_arg *ria, pgoff_t *ra_end) { struct cl_read_ahead ra = { 0 }; - bool stride_ria; pgoff_t page_idx; int count = 0; int rc; @@ -359,7 +365,6 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) LASSERT(ria); RIA_DEBUG(ria); - stride_ria = ria->ria_length > ria->ria_bytes && ria->ria_bytes > 0; for (page_idx = ria->ria_start; page_idx <= ria->ria_end && ria->ria_reserved > 0; page_idx++) { if (ras_inside_ra_window(page_idx, ria)) { @@ -417,7 +422,7 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) ria->ria_reserved--; count++; } - } else if (stride_ria) { + } else if (stride_io_mode(ras)) { /* If it is not in the read-ahead window, and it is * read-ahead mode, then check whether it should skip * the stride gap. @@ -428,7 +433,8 @@ static int ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) offset = (pos - ria->ria_stoff) % ria->ria_length; if (offset >= ria->ria_bytes) { pos += (ria->ria_length - offset); - page_idx = (pos >> PAGE_SHIFT) - 1; + if ((pos >> PAGE_SHIFT) >= page_idx + 1) + page_idx = (pos >> PAGE_SHIFT) - 1; CDEBUG(D_READA, "Stride: jump %lu pages to %lu\n", ria->ria_length - offset, page_idx); @@ -775,11 +781,10 @@ void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras) * Check whether the read request is in the stride window. 
* If it is in the stride window, return true, otherwise return false. */ -static bool index_in_stride_window(struct ll_readahead_state *ras, - pgoff_t index) +static bool read_in_stride_window(struct ll_readahead_state *ras, + unsigned long pos, unsigned long count) { unsigned long stride_gap; - unsigned long pos = index << PAGE_SHIFT; if (ras->ras_stride_length == 0 || ras->ras_stride_bytes == 0 || ras->ras_stride_bytes == ras->ras_stride_length) @@ -789,12 +794,13 @@ static bool index_in_stride_window(struct ll_readahead_state *ras, /* If it is contiguous read */ if (stride_gap == 0) - return ras->ras_consecutive_bytes + PAGE_SIZE <= + return ras->ras_consecutive_bytes + count <= ras->ras_stride_bytes; /* Otherwise check the stride by itself */ return (ras->ras_stride_length - ras->ras_stride_bytes) == stride_gap && - ras->ras_consecutive_bytes == ras->ras_stride_bytes; + ras->ras_consecutive_bytes == ras->ras_stride_bytes && + count <= ras->ras_stride_bytes; } static void ras_init_stride_detector(struct ll_readahead_state *ras, @@ -802,13 +808,6 @@ static void ras_init_stride_detector(struct ll_readahead_state *ras, { unsigned long stride_gap = pos - ras->ras_last_read_end - 1; - if ((stride_gap != 0 || ras->ras_consecutive_stride_requests == 0) && - !stride_io_mode(ras)) { - ras->ras_stride_bytes = ras->ras_consecutive_bytes; - ras->ras_stride_length = ras->ras_consecutive_bytes + - stride_gap; - } - LASSERT(ras->ras_request_index == 0); LASSERT(ras->ras_consecutive_stride_requests == 0); if (pos <= ras->ras_last_read_end) { @@ -819,6 +818,8 @@ static void ras_init_stride_detector(struct ll_readahead_state *ras, ras->ras_stride_bytes = ras->ras_consecutive_bytes; ras->ras_stride_length = stride_gap + ras->ras_consecutive_bytes; + ras->ras_consecutive_stride_requests++; + ras->ras_stride_offset = pos; RAS_CDEBUG(ras); } @@ -895,49 +896,97 @@ static void ras_increase_window(struct inode *inode, } } -static void ras_update(struct ll_sb_info *sbi, struct inode 
*inode, - struct ll_readahead_state *ras, unsigned long index, - enum ras_update_flags flags) +/** + * Seek within 8 pages are considered as sequential read for now. + */ +static inline bool is_loose_seq_read(struct ll_readahead_state *ras, + unsigned long pos) { - struct ll_ra_info *ra = &sbi->ll_ra_info; - int zero = 0, stride_detect = 0, ra_miss = 0; - unsigned long pos = index << PAGE_SHIFT; - bool hit = flags & LL_RAS_HIT; - - spin_lock(&ras->ras_lock); - - if (!hit) - CDEBUG(D_READA, DFID " pages at %lu miss.\n", - PFID(ll_inode2fid(inode)), index); + return pos_in_window(pos, ras->ras_last_read_end, + 8 << PAGE_SHIFT, 8 << PAGE_SHIFT); +} - ll_ra_stats_inc_sbi(sbi, hit ? RA_STAT_HIT : RA_STAT_MISS); +static void ras_detect_read_pattern(struct ll_readahead_state *ras, + struct ll_sb_info *sbi, + unsigned long pos, unsigned long count, + bool mmap) +{ + bool stride_detect = false; + unsigned long index = pos >> PAGE_SHIFT; - /* reset the read-ahead window in two cases. First when the app seeks - * or reads to some other part of the file. Secondly if we get a - * read-ahead miss that we think we've previously issued. This can - * be a symptom of there being so many read-ahead pages that the VM is - * reclaiming it before we get to it. + /* + * Reset the read-ahead window in two cases. First when the app seeks + * or reads to some other part of the file. Secondly if we get a + * read-ahead miss that we think we've previously issued. This can + * be a symptom of there being so many read-ahead pages that the VM + * is reclaiming it before we get to it. 
*/ - if (!pos_in_window(pos, ras->ras_last_read_end, - 8 << PAGE_SHIFT, 8 << PAGE_SHIFT)) { - zero = 1; + if (!is_loose_seq_read(ras, pos)) { + /* Check whether it is in stride I/O mode */ + if (!read_in_stride_window(ras, pos, count)) { + if (ras->ras_consecutive_stride_requests == 0) + ras_init_stride_detector(ras, pos, count); + else + ras_stride_reset(ras); + ras->ras_consecutive_bytes = 0; + ras_reset(ras, index); + } else { + ras->ras_consecutive_bytes = 0; + ras->ras_consecutive_requests = 0; + if (++ras->ras_consecutive_stride_requests > 1) + stride_detect = true; + RAS_CDEBUG(ras); + } ll_ra_stats_inc_sbi(sbi, RA_STAT_DISTANT_READPAGE); - } else if (!hit && ras->ras_window_len && - index < ras->ras_next_readahead && - pos_in_window(index, ras->ras_window_start, 0, - ras->ras_window_len)) { - ra_miss = 1; - ll_ra_stats_inc_sbi(sbi, RA_STAT_MISS_IN_WINDOW); + } else if (stride_io_mode(ras)) { + /* + * If this is contiguous read but in stride I/O mode + * currently, check whether stride step still is valid, + * if invalid, it will reset the stride ra window to + * be zero. 
+ */ + if (!read_in_stride_window(ras, pos, count)) { + ras_stride_reset(ras); + ras->ras_window_len = 0; + ras->ras_next_readahead = index; + } } - /* On the second access to a file smaller than the tunable + ras->ras_consecutive_bytes += count; + if (mmap) { + unsigned int idx = (ras->ras_consecutive_bytes >> PAGE_SHIFT); + + if ((idx >= 4 && idx % 4 == 0) || stride_detect) + ras->ras_need_increase_window = true; + } else if ((ras->ras_consecutive_requests > 1 || stride_detect)) { + ras->ras_need_increase_window = true; + } + + ras->ras_last_read_end = pos + count - 1; +} + +void ll_ras_enter(struct file *f, unsigned long pos, unsigned long count) +{ + struct ll_file_data *fd = LUSTRE_FPRIVATE(f); + struct ll_readahead_state *ras = &fd->fd_ras; + struct inode *inode = file_inode(f); + unsigned long index = pos >> PAGE_SHIFT; + struct ll_sb_info *sbi = ll_i2sbi(inode); + + spin_lock(&ras->ras_lock); + ras->ras_requests++; + ras->ras_consecutive_requests++; + ras->ras_need_increase_window = false; + ras->ras_no_miss_check = false; + /* + * On the second access to a file smaller than the tunable * ra_max_read_ahead_whole_pages trigger RA on all pages in the * file up to ra_max_pages_per_file. This is simply a best effort - * and only occurs once per open file. Normal RA behavior is reverted - * to for subsequent IO. The mmap case does not increment - * ras_requests and thus can never trigger this behavior. + * and only occurs once per open file. Normal RA behavior is reverted + * to for subsequent IO. 
*/ - if (ras->ras_requests >= 2 && !ras->ras_request_index) { + if (ras->ras_requests >= 2) { + struct ll_ra_info *ra = &sbi->ll_ra_info; u64 kms_pages; kms_pages = (i_size_read(inode) + PAGE_SIZE - 1) >> @@ -952,73 +1001,111 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, ras->ras_window_start = 0; ras->ras_next_readahead = index + 1; ras->ras_window_len = min(ra->ra_max_pages_per_file, - ra->ra_max_read_ahead_whole_pages); + ra->ra_max_read_ahead_whole_pages); + ras->ras_no_miss_check = true; goto out_unlock; } } - if (zero) { - /* check whether it is in stride I/O mode*/ - if (!index_in_stride_window(ras, index)) { - if (ras->ras_consecutive_stride_requests == 0 && - ras->ras_request_index == 0) { - ras_init_stride_detector(ras, pos, PAGE_SIZE); - ras->ras_consecutive_stride_requests++; - } else { - ras_stride_reset(ras); - } + ras_detect_read_pattern(ras, sbi, pos, count, false); +out_unlock: + spin_unlock(&ras->ras_lock); +} + +static bool index_in_stride_window(struct ll_readahead_state *ras, + unsigned int index) +{ + unsigned long pos = index << PAGE_SHIFT; + unsigned long offset; + + if (ras->ras_stride_length == 0 || ras->ras_stride_bytes == 0 || + ras->ras_stride_bytes == ras->ras_stride_length) + return false; + + if (pos >= ras->ras_stride_offset) { + offset = (pos - ras->ras_stride_offset) % + ras->ras_stride_length; + if (offset < ras->ras_stride_bytes || + ras->ras_stride_length - offset < PAGE_SIZE) + return true; + } else if (ras->ras_stride_offset - pos < PAGE_SIZE) { + return true; + } + + return false; +} + +/* + * ll_ras_enter() is used to detect read pattern according to + * pos and count. 
+ * + * ras_update() is used to detect cache miss and + * reset window or increase window accordingly + */ +static void ras_update(struct ll_sb_info *sbi, struct inode *inode, + struct ll_readahead_state *ras, unsigned long index, + enum ras_update_flags flags) +{ + struct ll_ra_info *ra = &sbi->ll_ra_info; + bool hit = flags & LL_RAS_HIT; + + spin_lock(&ras->ras_lock); + + if (!hit) + CDEBUG(D_READA, DFID " pages at %lu miss.\n", + PFID(ll_inode2fid(inode)), index); + ll_ra_stats_inc_sbi(sbi, hit ? RA_STAT_HIT : RA_STAT_MISS); + + /* + * The readahead window has been expanded to cover whole + * file size, we don't care whether ra miss happen or not. + * Because we will read whole file to page cache even if + * some pages missed. + */ + if (ras->ras_no_miss_check) + goto out_unlock; + + if (flags & LL_RAS_MMAP) + ras_detect_read_pattern(ras, sbi, index << PAGE_SHIFT, + PAGE_SIZE, true); + + if (!hit && ras->ras_window_len && + index < ras->ras_next_readahead && + pos_in_window(index, ras->ras_window_start, 0, + ras->ras_window_len)) { + ll_ra_stats_inc_sbi(sbi, RA_STAT_MISS_IN_WINDOW); + ras->ras_need_increase_window = false; + + if (index_in_stride_window(ras, index) && + stride_io_mode(ras)) { + /* + * if (index != ras->ras_last_readpage + 1) + * ras->ras_consecutive_pages = 0; + */ ras_reset(ras, index); - ras->ras_consecutive_bytes += PAGE_SIZE; - goto out_unlock; - } else { - ras->ras_consecutive_bytes = 0; - ras->ras_consecutive_requests = 0; - if (++ras->ras_consecutive_stride_requests > 1) - stride_detect = 1; - RAS_CDEBUG(ras); - } - } else { - if (ra_miss) { - if (index_in_stride_window(ras, index) && - stride_io_mode(ras)) { - if (index != (ras->ras_last_read_end >> - PAGE_SHIFT) + 1) - ras->ras_consecutive_bytes = 0; - ras_reset(ras, index); - - /* If stride-RA hit cache miss, the stride - * detector will not be reset to avoid the - * overhead of redetecting read-ahead mode, - * but on the condition that the stride window - * is still intersect with 
normal sequential - * read-ahead window. - */ - if (ras->ras_window_start < - (ras->ras_stride_offset >> PAGE_SHIFT)) - ras_stride_reset(ras); - RAS_CDEBUG(ras); - } else { - /* Reset both stride window and normal RA - * window - */ - ras_reset(ras, index); - ras->ras_consecutive_bytes += PAGE_SIZE; - ras_stride_reset(ras); - goto out_unlock; - } - } else if (stride_io_mode(ras)) { - /* If this is contiguous read but in stride I/O mode - * currently, check whether stride step still is valid, - * if invalid, it will reset the stride ra window + /* + * If stride-RA hit cache miss, the stride + * detector will not be reset to avoid the + * overhead of redetecting read-ahead mode, + * but on the condition that the stride window + * is still intersect with normal sequential + * read-ahead window. */ - if (!index_in_stride_window(ras, index)) { - /* Shrink stride read-ahead window to be zero */ + if (ras->ras_window_start < + ras->ras_stride_offset) ras_stride_reset(ras); - ras->ras_window_len = 0; - ras->ras_next_readahead = index; - } + RAS_CDEBUG(ras); + } else { + /* + * Reset both stride window and normal RA + * window. + */ + ras_reset(ras, index); + /* ras->ras_consecutive_pages++; */ + ras->ras_consecutive_bytes = 0; + ras_stride_reset(ras); + goto out_unlock; } } - ras->ras_consecutive_bytes += PAGE_SIZE; ras_set_start(ras, index); if (stride_io_mode(ras)) { @@ -1037,44 +1124,13 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, if (!hit) ras->ras_next_readahead = index + 1; } - RAS_CDEBUG(ras); - /* Trigger RA in the mmap case where ras_consecutive_requests - * is not incremented and thus can't be used to trigger RA - */ - if (ras->ras_consecutive_bytes >= (4 << PAGE_SHIFT) && - flags & LL_RAS_MMAP) { + if (ras->ras_need_increase_window) { ras_increase_window(inode, ras, ra); - /* - * reset consecutive pages so that the readahead window can - * grow gradually. 
- */ - ras->ras_consecutive_bytes = 0; - goto out_unlock; - } - - /* Initially reset the stride window offset to next_readahead*/ - if (ras->ras_consecutive_stride_requests == 2 && stride_detect) { - /** - * Once stride IO mode is detected, next_readahead should be - * reset to make sure next_readahead > stride offset - */ - ras->ras_next_readahead = max(index, ras->ras_next_readahead); - ras->ras_stride_offset = index << PAGE_SHIFT; - ras->ras_window_start = max(index, ras->ras_window_start); + ras->ras_need_increase_window = false; } - /* The initial ras_window_len is set to the request size. To avoid - * uselessly reading and discarding pages for random IO the window is - * only increased once per consecutive request received. - */ - if ((ras->ras_consecutive_requests > 1 || stride_detect) && - !ras->ras_request_index) - ras_increase_window(inode, ras, ra); out_unlock: - RAS_CDEBUG(ras); - ras->ras_request_index++; - ras->ras_last_read_end = pos + PAGE_SIZE - 1; spin_unlock(&ras->ras_lock); } From patchwork Thu Feb 27 21:16:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410531 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7B9CF92A for ; Thu, 27 Feb 2020 21:40:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6446524690 for ; Thu, 27 Feb 2020 21:40:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6446524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4692F34A7F7; Thu, 27 Feb 2020 13:33:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E30E521FD10 for ; Thu, 27 Feb 2020 13:21:04 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 86F3791B0; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 85B9246D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:42 -0500 Message-Id: <1582838290-17243-535-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 534/622] lustre: ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ann Koehler , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ann Koehler Another path through ptl_send_rpc() can cause the assert reported in LU-10643. The assertion in ptlrpc_register_bulk() on !desc->bd_registered fails when an rpc is resent and the first send attempt failed to successfully attach the reply buffer. The bulk error cleanup in ptl_send_rpc() does not reset the bd_registered flag. 
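
The failure mode described above can be modeled in a few lines of plain C. This is a sketch only: `demo_bulk_desc`, `demo_register_bulk()` and `demo_send()` are hypothetical stand-ins, not the ptlrpc structures or functions, and the reply-buffer failure is simulated with a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bulk descriptor. */
struct demo_bulk_desc {
	bool registered;
};

/* Models the precondition: a descriptor must not be registered twice. */
int demo_register_bulk(struct demo_bulk_desc *desc)
{
	assert(!desc->registered);
	desc->registered = true;
	return 0;
}

/* One send attempt. fail_reply_attach simulates the reply buffer
 * failing to attach after the bulk has already been registered.
 */
int demo_send(struct demo_bulk_desc *desc, bool fail_reply_attach)
{
	int rc = demo_register_bulk(desc);

	if (rc)
		return rc;

	if (fail_reply_attach) {
		/* Error cleanup: clear the flag so that a resend can
		 * register the descriptor again. Without this line the
		 * assert above fires on the second attempt.
		 */
		desc->registered = false;
		return -ENOMEM;
	}
	return 0;
}
```

In this model a failed first attempt followed by a resend only succeeds because the cleanup path clears the flag, which is the behavior the patch adds to the real error path.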
Cray-bug-id: LUS-7946 WC-bug-id: https://jira.whamcloud.com/browse/LU-12816 Lustre-commit: e6225c07ce4c ("LU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM") Signed-off-by: Ann Koehler Reviewed-on: https://review.whamcloud.com/36309 Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/niobuf.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index 12a9a5e..fcf7bfa 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -720,6 +720,8 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) * the chance to have long unlink to sluggish net is smaller here. */ ptlrpc_unregister_bulk(request, 0); + if (request->rq_bulk) + request->rq_bulk->bd_registered = 0; out: if (rc == -ENOMEM) { /* From patchwork Thu Feb 27 21:16:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410661 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3EEF7138D for ; Thu, 27 Feb 2020 21:43:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 26F3324690 for ; Thu, 27 Feb 2020 21:43:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 26F3324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 03928348F50; Thu, 27 Feb 
2020 13:35:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 32F5F21FD10 for ; Thu, 27 Feb 2020 13:21:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 89CEA91B1; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8871D468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:43 -0500 Message-Id: <1582838290-17243-536-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 535/622] lustre: osc: allow increasing osc.*.short_io_bytes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The osc.*.short_io_bytes parameter was mixing up the default and maximum parameter values, and did not allow increasing the parameter beyond the default. Allow it to be increased to the maximum value, which depends on the client PAGE_SIZE, and the amount of free space in the maximally-sized OST RPC. Since the maximum size is system dependent, allow some grace when setting the parameter, so that a single tunable parameter can work on a variety of different systems. However, if it is larger than the maximum RDMA size (which is already too large) return an error, as it means something is wrong. 
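
The intended handling of the tunable, as described above, might be sketched like this. It is an illustration under assumptions: the `DEMO_*` limits are made-up values rather than the ones computed from the Lustre headers, `demo_short_io_bytes()` is a hypothetical helper and not the sysfs store function, and clamping to the maximum is one reading of "allow some grace".

```c
#include <errno.h>

/* Illustrative limits only; the real ones depend on PAGE_SIZE and
 * on the free space in a maximally-sized OST RPC.
 */
#define DEMO_MIN_SHORT_IO_BYTES 64UL
#define DEMO_DEF_SHORT_IO_BYTES (16UL * 1024)
#define DEMO_MAX_SHORT_IO_BYTES (64UL * 1024)
#define DEMO_MAX_RDMA_BYTES     (1024UL * 1024)

/* Returns the value that would be stored, or a negative errno. */
long demo_short_io_bytes(unsigned long long val)
{
	/* -1 selects the default value. */
	if (val == (unsigned long long)-1)
		val = DEMO_DEF_SHORT_IO_BYTES;

	/* 0 disables short I/O. Anything below the minimum or above the
	 * maximum RDMA size is treated as an error.
	 */
	if (val && (val < DEMO_MIN_SHORT_IO_BYTES ||
		    val > DEMO_MAX_RDMA_BYTES))
		return -ERANGE;

	/* Grace: a value between the maximum and the RDMA size is
	 * clamped instead of rejected (assumption, see above).
	 */
	if (val > DEMO_MAX_SHORT_IO_BYTES)
		val = DEMO_MAX_SHORT_IO_BYTES;

	return (long)val;
}
```

Separating the default from the maximum is the point of the change: a value above the default but within the maximum is accepted as given, which the previous single limit did not allow.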
Add a test case to exercise the osc.*.short_io_bytes parameter. WC-bug-id: https://jira.whamcloud.com/browse/LU-12910 Lustre-commit: cedc7f361a6e ("LU-12910 osc: allow increasing osc.*.short_io_bytes") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/36587 Reviewed-by: Wang Shilong Reviewed-by: Olaf Faaland-LLNL Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 25 ++++++++++++++----------- fs/lustre/ldlm/ldlm_lib.c | 2 +- fs/lustre/obdclass/lprocfs_status.c | 20 ++++++++++++-------- 3 files changed, 27 insertions(+), 20 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 40c1ae8..87e1d60 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -306,17 +306,19 @@ * DT_MAX_BRW_PAGES * niobuf_remote * * - single object with 16 pages is 512 bytes - * - OST_IO_MAXREQSIZE must be at least 1 page of cookies plus some spillover + * - OST_IO_MAXREQSIZE must be at least 1 niobuf per page of data * - Must be a multiple of 1024 + * - should allow a reasonably large SHORT_IO_BYTES size (64KB) */ #define _OST_MAXREQSIZE_BASE ((unsigned long)(sizeof(struct lustre_msg) + \ - sizeof(struct ptlrpc_body) + \ - sizeof(struct obdo) + \ - sizeof(struct obd_ioobj) + \ - sizeof(struct niobuf_remote))) -#define _OST_MAXREQSIZE_SUM ((unsigned long)(_OST_MAXREQSIZE_BASE + \ - sizeof(struct niobuf_remote) * \ - (DT_MAX_BRW_PAGES - 1))) + /* lm_buflens */ sizeof(u32) * 4 + \ + sizeof(struct ptlrpc_body) +\ + sizeof(struct obdo) + \ + sizeof(struct obd_ioobj) + \ + sizeof(struct niobuf_remote))) +#define _OST_MAXREQSIZE_SUM ((unsigned long)(_OST_MAXREQSIZE_BASE + \ + sizeof(struct niobuf_remote) * \ + DT_MAX_BRW_PAGES)) /** * MDS incoming request with LOV EA @@ -335,14 +337,15 @@ /* Safe estimate of free space in standard RPC, provides upper limit for # of * bytes of i/o to pack in RPC (skipping bulk transfer). 
*/ -#define OST_SHORT_IO_SPACE (OST_IO_MAXREQSIZE - _OST_MAXREQSIZE_BASE) +#define OST_MAX_SHORT_IO_BYTES ((OST_IO_MAXREQSIZE - _OST_MAXREQSIZE_BASE) & \ + PAGE_MASK) /* Actual size used for short i/o buffer. Calculation means this: * At least one page (for large PAGE_SIZE), or 16 KiB, but not more * than the available space aligned to a page boundary. */ -#define OBD_MAX_SHORT_IO_BYTES (min(max(PAGE_SIZE, 16UL * 1024UL), \ - OST_SHORT_IO_SPACE & PAGE_MASK)) +#define OBD_DEF_SHORT_IO_BYTES min(max(PAGE_SIZE, 16UL * 1024UL), \ + OST_MAX_SHORT_IO_BYTES) /* Macro to hide a typecast and BUILD_BUG. */ #define ptlrpc_req_async_args(_var, req) ({ \ diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 127ed32..58919d3 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -381,7 +381,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg) */ cli->cl_max_pages_per_rpc = PTLRPC_MAX_BRW_PAGES; - cli->cl_max_short_io_bytes = OBD_MAX_SHORT_IO_BYTES; + cli->cl_max_short_io_bytes = OBD_DEF_SHORT_IO_BYTES; /* * set cl_chunkbits default value to PAGE_CACHE_SHIFT, diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 98d1e3b..806d6517 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -1894,18 +1894,24 @@ ssize_t short_io_bytes_store(struct kobject *kobj, struct attribute *attr, struct obd_device *dev = container_of(kobj, struct obd_device, obd_kset.kobj); struct client_obd *cli = &dev->u.cli; - u32 val; + unsigned long long val; + char *endp; int rc; rc = lprocfs_climp_check(dev); if (rc) return rc; - rc = kstrtouint(buffer, 0, &val); - if (rc) + val = memparse(buffer, &endp); + if (*endp) { + rc = -EINVAL; goto out; + } + + if (val == -1) + val = OBD_DEF_SHORT_IO_BYTES; - if (val && (val < MIN_SHORT_IO_BYTES || val > OBD_MAX_SHORT_IO_BYTES)) { + if (val && (val < MIN_SHORT_IO_BYTES || val > LNET_MTU)) { rc = -ERANGE; goto out; } @@ -1913,10 
+1919,8 @@ ssize_t short_io_bytes_store(struct kobject *kobj, struct attribute *attr, rc = count; spin_lock(&cli->cl_loi_list_lock); - if (val > (cli->cl_max_pages_per_rpc << PAGE_SHIFT)) - rc = -ERANGE; - else - cli->cl_max_short_io_bytes = val; + cli->cl_max_short_io_bytes = min_t(unsigned long long, + val, OST_MAX_SHORT_IO_BYTES); spin_unlock(&cli->cl_loi_list_lock); out: From patchwork Thu Feb 27 21:16:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410729 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B584924 for ; Thu, 27 Feb 2020 21:45:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 22D2324690 for ; Thu, 27 Feb 2020 21:45:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 22D2324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 565C634AFF7; Thu, 27 Feb 2020 13:36:12 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 893B7348901 for ; Thu, 27 Feb 2020 13:21:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8CBB091B2; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id 8B43E47C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:44 -0500 Message-Id: <1582838290-17243-537-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 536/622] lnet: remove pt_number from lnet_peer_table. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This field is no longer used - except for an ASSERT(). It did have a use once, but that was removed in commit 21602c7db4cf ("staging: lustre: Dynamic LNet Configuration (DLC)") WC-bug-id: https://jira.whamcloud.com/browse/LU-12936 Lustre-commit: e9c9e2103a78 ("LU-12936 lnet: remove pt_number from lnet_peer_table.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36671 Reviewed-by: Chris Horn Reviewed-by: James Simmons Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 3 --- net/lnet/lnet/peer.c | 3 --- 2 files changed, 6 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 18d4e4e..51cc9ce 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -765,7 +765,6 @@ struct lnet_peer_net { * * protected by lnet_net_lock/EX for update * pt_version - * pt_number * pt_hash[...]
* pt_peer_list * pt_peers @@ -778,8 +777,6 @@ struct lnet_peer_net { struct lnet_peer_table { /* /proc validity stamp */ int pt_version; - /* # peers extant */ - atomic_t pt_number; /* peers */ struct list_head pt_peer_list; /* # peers */ diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index a067136..4f0da4b 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -354,8 +354,6 @@ /* decrement the ref count on the peer table */ ptable = the_lnet.ln_peer_tables[lpni->lpni_cpt]; - LASSERT(atomic_read(&ptable->pt_number) > 0); - atomic_dec(&ptable->pt_number); /* * The peer_ni can no longer be found with a lookup. But there @@ -1246,7 +1244,6 @@ struct lnet_peer_net * ptable = the_lnet.ln_peer_tables[lpni->lpni_cpt]; list_add_tail(&lpni->lpni_hashlist, &ptable->pt_hash[hash]); ptable->pt_version++; - atomic_inc(&ptable->pt_number); /* This is the 1st refcount on lpni. */ atomic_inc(&lpni->lpni_refcount); } From patchwork Thu Feb 27 21:16:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410663 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4862D138D for ; Thu, 27 Feb 2020 21:43:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 30A0024690 for ; Thu, 27 Feb 2020 21:43:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 30A0024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CEA9F3492E9; Thu, 27 Feb 2020 13:35:11 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB707348904 for ; Thu, 27 Feb 2020 13:21:05 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8F8E091B3; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8DF4046A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:45 -0500 Message-Id: <1582838290-17243-538-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 537/622] lnet: Optimize check for routing feature flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Check the routing feature flag outside of the loop. 
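As a rough illustration of the optimization, the following user-space sketch tests a gateway-wide flag once before walking the routes instead of once per route. The types, the flag value, and the sketch_* names are hypothetical stand-ins, not the LNet structures, and the per-route checks of the real function are reduced to a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FEAT_RTE_DISABLED 0x1u	/* hypothetical feature bit */

struct sketch_route {
	bool alive;
};

/*
 * Mark the routes served by one gateway alive or down. The gateway's
 * feature flags do not change while its routes are walked, so they are
 * tested once up front rather than inside the loop.
 * Returns the number of routes left alive.
 */
static size_t sketch_update_routes(unsigned int gw_features,
				   struct sketch_route *routes, size_t n)
{
	size_t i, alive = 0;

	assert(routes != NULL || n == 0);

	if (gw_features & SKETCH_FEAT_RTE_DISABLED) {
		/* routing disabled on the gateway: every route goes down */
		for (i = 0; i < n; i++)
			routes[i].alive = false;
		return 0;
	}

	for (i = 0; i < n; i++) {
		/* per-route checks would go here; the sketch marks all alive */
		routes[i].alive = true;
		alive++;
	}
	return alive;
}
```

Besides saving a test per iteration, the early exit keeps the "everything down" case in one place, which is where the patch folds it in with the ping-failed case.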
Cray-bug-id: LUS-7862 WC-bug-id: https://jira.whamcloud.com/browse/LU-12942 Lustre-commit: 7a99dc0b2f27 ("LU-12942 lnet: Optimize check for routing feature flag") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36679 Reviewed-by: Alexandr Boyko Reviewed-by: Alexey Lyashkov Reviewed-by: Neil Brown Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Reviewed-by: James Simmons Signed-off-by: James Simmons --- net/lnet/lnet/router.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 447706d..41d0eb0 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -325,12 +325,14 @@ bool lnet_is_route_alive(struct lnet_route *route) spin_unlock(&lp->lp_lock); - if (lp_state & LNET_PEER_PING_FAILED) { - CDEBUG(D_NET, - "Ping failed with %d. Set routes down for gw %s\n", - lp->lp_ping_error, libcfs_nid2str(lp->lp_primary_nid)); - /* If the ping failed then mark the routes served by this - * peer down + if (lp_state & LNET_PEER_PING_FAILED || + pbuf->pb_info.pi_features & LNET_PING_FEAT_RTE_DISABLED) { + CDEBUG(D_NET, "Set routes down for gw %s because %s %d\n", + libcfs_nid2str(lp->lp_primary_nid), + lp_state & LNET_PEER_PING_FAILED ? 
"ping failed" : + "route feature is disabled", lp->lp_ping_error); + /* If the ping failed or the peer has routing disabled then + * mark the routes served by this peer down */ list_for_each_entry(route, &lp->lp_routes, lr_gwlist) lnet_set_route_aliveness(route, false); @@ -359,13 +361,6 @@ bool lnet_is_route_alive(struct lnet_route *route) route->lr_gateway->lp_primary_nid) continue; - /* gateway has the routing feature disabled */ - if (pbuf->pb_info.pi_features & - LNET_PING_FEAT_RTE_DISABLED) { - lnet_set_route_aliveness(route, false); - continue; - } - llpn = lnet_peer_get_net_locked(lp, route->lr_lnet); if (!llpn) { lnet_set_route_aliveness(route, false); From patchwork Thu Feb 27 21:16:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410903 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 424F41580 for ; Thu, 27 Feb 2020 21:50:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2A18B24690 for ; Thu, 27 Feb 2020 21:50:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2A18B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 804F034A9AA; Thu, 27 Feb 2020 13:41:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 185A2348901 for ; Thu, 27 Feb 2020 13:21:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 920FC91B4; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 90D0346C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:46 -0500 Message-Id: <1582838290-17243-539-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 538/622] lustre: llite: file write pos mimatch X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam In vvp_io_write_start(), data can be written successfully and yet, for some reason (e.g. out of quota), be committed only partially or not at all. The file's write position (kiocb->ki_pos) is then pushed forward incorrectly, and the next iteration of the write loop fails the assertion ASSERTION( io->u.ci_rw.rw_iocb.ki_pos == range->cir_pos ). This patch corrects ki_pos if this scenario happens.
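The correction can be pictured with a small user-space sketch: the position that should be visible after one write iteration is the starting position plus the bytes actually committed, and the current position is rewound whenever it disagrees. The function and parameter names are hypothetical; in the patch the committed byte count is derived from io->ci_nob.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Make *cur_pos agree with what was really committed.
 * start_pos: file position at the start of this write iteration
 * committed: bytes committed during this iteration, which may be fewer
 *            than the bytes the write path reported as written
 * Returns true if *cur_pos had to be corrected.
 */
static bool sketch_fix_write_pos(long long *cur_pos, long long start_pos,
				 size_t committed)
{
	long long expected = start_pos + (long long)committed;

	assert(cur_pos != NULL);

	if (*cur_pos == expected)
		return false;

	/* rewind to the last successfully committed byte */
	*cur_pos = expected;
	return true;
}
```

For example, if an iteration starts at 4096, writes 8192 bytes, and commits only 4096 of them, the position is moved back from 12288 to 8192 so that the next iteration starts where the committed data ends.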
WC-bug-id: https://jira.whamcloud.com/browse/LU-12503 Lustre-commit: 1d2aa1513dc4 ("LU-12503 llite: file write pos mimatch") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/36021 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/vvp_io.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index aa8f2e1..b3f628c 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -1068,9 +1068,12 @@ static int vvp_io_write_start(const struct lu_env *env, struct cl_object *obj = io->ci_obj; struct inode *inode = vvp_object_inode(obj); struct ll_inode_info *lli = ll_i2info(inode); + struct file *file = vio->vui_fd->fd_file; bool lock_inode = !inode_is_locked(inode) && !IS_NOSEC(inode); loff_t pos = io->u.ci_wr.wr.crw_pos; size_t cnt = io->u.ci_wr.wr.crw_count; + size_t nob = io->ci_nob; + size_t written = 0; ssize_t result = 0; down_read(&lli->lli_trunc_sem); @@ -1135,6 +1138,7 @@ static int vvp_io_write_start(const struct lu_env *env, if (unlikely(lock_inode)) inode_unlock(inode); + written = result; if (result > 0 || result == -EIOCBQUEUED) result = generic_write_sync(vio->vui_iocb, result); } @@ -1149,6 +1153,15 @@ static int vvp_io_write_start(const struct lu_env *env, io->ci_nob, result); } } + if (vio->vui_iocb->ki_pos != (pos + io->ci_nob - nob)) { + CDEBUG(D_VFSTRACE, + "%s: write position mismatch: ki_pos %lld vs. 
pos %lld, written %ld, commit %ld rc %ld\n", + file_dentry(file)->d_name.name, + vio->vui_iocb->ki_pos, pos + io->ci_nob - nob, + written, io->ci_nob - nob, result); + /* rewind ki_pos to where it has successfully committed */ + vio->vui_iocb->ki_pos = pos + io->ci_nob - nob; + } if (result > 0) { set_bit(LLIF_DATA_MODIFIED, &(ll_i2info(inode))->lli_flags); From patchwork Thu Feb 27 21:16:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410859 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C518D1580 for ; Thu, 27 Feb 2020 21:48:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AD78924690 for ; Thu, 27 Feb 2020 21:48:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AD78924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5C46034B86F; Thu, 27 Feb 2020 13:39:04 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 599A121FC51 for ; Thu, 27 Feb 2020 13:21:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9640991B5; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id 93AF546D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:47 -0500 Message-Id: <1582838290-17243-540-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 539/622] lustre: ldlm: FLOCK request can be processed twice X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh The original request can be processed after the resent request, so it can create a lock on the MDT without a client lock, or unlock another lock. Make flock enqueue use a modify RPC slot.
Cray-bug-id: LUS-5739 WC-bug-id: https://jira.whamcloud.com/browse/LU-12828 Lustre-commit: 85a12c6c8d7a ("LU-12828 ldlm: FLOCK request can be processed twice") Signed-off-by: Andriy Skulysh Signed-off-by: Vitaly Fertman Reviewed-by: Alexander Boyko Reviewed-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/36340 Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 3 +++ fs/lustre/include/lustre_mdc.h | 25 ----------------------- fs/lustre/include/lustre_net.h | 3 ++- fs/lustre/include/obd_class.h | 6 ++---- fs/lustre/ldlm/ldlm_request.c | 34 ++++++++++++++++++++++++++++--- fs/lustre/mdc/mdc_locks.c | 45 +++++++++++++----------------------------- fs/lustre/mdc/mdc_reint.c | 4 ++-- fs/lustre/mdc/mdc_request.c | 20 +++++++++---------- fs/lustre/obdclass/genops.c | 30 +++++----------------------- fs/lustre/ptlrpc/client.c | 29 +++++++++++++++++++++++++-- 10 files changed, 96 insertions(+), 103 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 7621d1e..31d360e 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -959,6 +959,8 @@ struct ldlm_enqueue_info { void *ei_cbdata; /* whether enqueue slave stripes */ unsigned int ei_enq_slave:1; + /* whether acquire rpc slot */ + unsigned int ei_enq_slot:1; }; extern struct obd_ops ldlm_obd_ops; @@ -1279,6 +1281,7 @@ int ldlm_prep_elc_req(struct obd_export *exp, int version, int opc, int canceloff, struct list_head *cancels, int count); +struct ptlrpc_request *ldlm_enqueue_pack(struct obd_export *exp, int lvb_len); int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, enum ldlm_type type, u8 with_policy, enum ldlm_mode mode, diff --git a/fs/lustre/include/lustre_mdc.h b/fs/lustre/include/lustre_mdc.h index f57783d..d7b6e4a 100644 --- a/fs/lustre/include/lustre_mdc.h +++ b/fs/lustre/include/lustre_mdc.h @@ -60,31 +60,6 @@ struct ptlrpc_request; struct 
obd_device; -static inline void mdc_get_mod_rpc_slot(struct ptlrpc_request *req, - struct lookup_intent *it) -{ - struct client_obd *cli = &req->rq_import->imp_obd->u.cli; - u32 opc; - u16 tag; - - opc = lustre_msg_get_opc(req->rq_reqmsg); - tag = obd_get_mod_rpc_slot(cli, opc, it); - lustre_msg_set_tag(req->rq_reqmsg, tag); - ptlrpc_reassign_next_xid(req); -} - -static inline void mdc_put_mod_rpc_slot(struct ptlrpc_request *req, - struct lookup_intent *it) -{ - struct client_obd *cli = &req->rq_import->imp_obd->u.cli; - u32 opc; - u16 tag; - - opc = lustre_msg_get_opc(req->rq_reqmsg); - tag = lustre_msg_get_tag(req->rq_reqmsg); - obd_put_mod_rpc_slot(cli, opc, it, tag); -} - /** * Update the maximum possible easize. * diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 87e1d60..90a0b01 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1919,7 +1919,8 @@ void ptlrpc_retain_replayable_request(struct ptlrpc_request *req, u64 ptlrpc_next_xid(void); u64 ptlrpc_sample_next_xid(void); u64 ptlrpc_req_xid(struct ptlrpc_request *request); -void ptlrpc_reassign_next_xid(struct ptlrpc_request *req); +void ptlrpc_get_mod_rpc_slot(struct ptlrpc_request *req); +void ptlrpc_put_mod_rpc_slot(struct ptlrpc_request *req); /* Set of routines to run a function in ptlrpcd context */ void *ptlrpcd_alloc_work(struct obd_import *imp, diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index bc01eca..a099768 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -115,10 +115,8 @@ static inline char *obd_import_nid2str(struct obd_import *imp) int obd_set_max_mod_rpcs_in_flight(struct client_obd *cli, u16 max); int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq); -u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc, - struct lookup_intent *it); -void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, - struct lookup_intent *it, u16 tag); +u16 
obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc); +void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, u16 tag); struct llog_handle; struct llog_rec_hdr; diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 20bdba4..6df057d 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -347,6 +347,11 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns, } } +static bool ldlm_request_slot_needed(enum ldlm_type type) +{ + return type == LDLM_FLOCK || type == LDLM_IBITS; +} + /** * Finishing portion of client lock enqueue code. * @@ -365,6 +370,11 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, struct ldlm_reply *reply; int cleanup_phase = 1; + if (ldlm_request_slot_needed(type)) + obd_put_request_slot(&req->rq_import->imp_obd->u.cli); + + ptlrpc_put_mod_rpc_slot(req); + lock = ldlm_handle2lock(lockh); /* ldlm_cli_enqueue is holding a reference on this lock. */ if (!lock) { @@ -662,8 +672,7 @@ int ldlm_prep_enqueue_req(struct obd_export *exp, struct ptlrpc_request *req, } EXPORT_SYMBOL(ldlm_prep_enqueue_req); -static struct ptlrpc_request *ldlm_enqueue_pack(struct obd_export *exp, - int lvb_len) +struct ptlrpc_request *ldlm_enqueue_pack(struct obd_export *exp, int lvb_len) { struct ptlrpc_request *req; int rc; @@ -682,6 +691,7 @@ static struct ptlrpc_request *ldlm_enqueue_pack(struct obd_export *exp, ptlrpc_request_set_replen(req); return req; } +EXPORT_SYMBOL(ldlm_enqueue_pack); /** * Client-side lock enqueue. @@ -814,6 +824,24 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, LDLM_GLIMPSE_ENQUEUE); } + /* It is important to obtain modify RPC slot first (if applicable), so + * that threads that are waiting for a modify RPC slot are not polluting + * our rpcs in flight counter. 
+ */ + if (einfo->ei_enq_slot) + ptlrpc_get_mod_rpc_slot(req); + + if (ldlm_request_slot_needed(einfo->ei_type)) { + rc = obd_get_request_slot(&req->rq_import->imp_obd->u.cli); + if (rc) { + if (einfo->ei_enq_slot) + ptlrpc_put_mod_rpc_slot(req); + failed_lock_cleanup(ns, lock, einfo->ei_mode); + LDLM_LOCK_RELEASE(lock); + goto out; + } + } + if (async) { LASSERT(reqp); return 0; @@ -835,7 +863,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, LDLM_LOCK_RELEASE(lock); else rc = err; - +out: if (!req_passed_in && req) { ptlrpc_req_finished(req); if (reqp) diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 4d40087..60bbae1 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -856,6 +856,16 @@ static int mdc_finish_enqueue(struct obd_export *exp, return rc; } +static inline bool mdc_skip_mod_rpc_slot(const struct lookup_intent *it) +{ + if (it && + (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP || + it->it_op == IT_READDIR || + (it->it_op == IT_LAYOUT && !(it->it_flags & MDS_FMODE_WRITE)))) + return true; + return false; +} + /* We always reserve enough space in the reply packet for a stripe MD, because * we don't know in advance the file type. 
*/ @@ -877,7 +887,7 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, .l_inodebits = { MDS_INODELOCK_XATTR } }; struct obd_device *obddev = class_exp2obd(exp); - struct ptlrpc_request *req = NULL; + struct ptlrpc_request *req; u64 flags, saved_flags = extra_lock_flags; struct ldlm_res_id res_id; int generation, resends = 0; @@ -920,6 +930,7 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, LASSERTF(einfo->ei_type == LDLM_FLOCK, "lock type %d\n", einfo->ei_type); res_id.name[3] = LDLM_FLOCK; + req = ldlm_enqueue_pack(exp, 0); } else if (it->it_op & IT_OPEN) { req = mdc_intent_open_pack(exp, it, op_data, acl_bufsize); } else if (it->it_op & (IT_GETATTR | IT_LOOKUP)) { @@ -947,21 +958,7 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, req->rq_sent = ktime_get_real_seconds() + resends; } - /* It is important to obtain modify RPC slot first (if applicable), so - * that threads that are waiting for a modify RPC slot are not polluting - * our rpcs in flight counter. - * We do not do flock request limiting, though - */ - if (it) { - mdc_get_mod_rpc_slot(req, it); - rc = obd_get_request_slot(&obddev->u.cli); - if (rc != 0) { - mdc_put_mod_rpc_slot(req, it); - mdc_clear_replay_flag(req, 0); - ptlrpc_req_finished(req); - return rc; - } - } + einfo->ei_enq_slot = !mdc_skip_mod_rpc_slot(it); /* With Data-on-MDT the glimpse callback is needed too. 
* It is set here in advance but not in mdc_finish_enqueue() @@ -987,12 +984,10 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, (einfo->ei_type == LDLM_FLOCK) && (einfo->ei_mode == LCK_NL)) goto resend; + ptlrpc_req_finished(req); return rc; } - obd_put_request_slot(&obddev->u.cli); - mdc_put_mod_rpc_slot(req, it); - if (rc < 0) { CDEBUG(D_INFO, "%s: ldlm_cli_enqueue " DFID ":" DFID "=%s failed: rc = %d\n", @@ -1343,16 +1338,12 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env, struct ldlm_enqueue_info *einfo = &minfo->mi_einfo; struct lookup_intent *it; struct lustre_handle *lockh; - struct obd_device *obddev; struct ldlm_reply *lockrep; u64 flags = LDLM_FL_HAS_INTENT; it = &minfo->mi_it; lockh = &minfo->mi_lockh; - obddev = class_exp2obd(exp); - - obd_put_request_slot(&obddev->u.cli); if (OBD_FAIL_CHECK(OBD_FAIL_MDC_GETATTR_ENQUEUE)) rc = -ETIMEDOUT; @@ -1387,7 +1378,6 @@ int mdc_intent_getattr_async(struct obd_export *exp, struct lookup_intent *it = &minfo->mi_it; struct ptlrpc_request *req; struct mdc_getattr_args *ga; - struct obd_device *obddev = class_exp2obd(exp); struct ldlm_res_id res_id; union ldlm_policy_data policy = { .l_inodebits = { MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE } @@ -1409,12 +1399,6 @@ int mdc_intent_getattr_async(struct obd_export *exp, if (IS_ERR(req)) return PTR_ERR(req); - rc = obd_get_request_slot(&obddev->u.cli); - if (rc != 0) { - ptlrpc_req_finished(req); - return rc; - } - /* With Data-on-MDT the glimpse callback is needed too. * It is set here in advance but not in mdc_finish_enqueue() * to avoid possible races. 
It is safe to have glimpse handler @@ -1426,7 +1410,6 @@ int mdc_intent_getattr_async(struct obd_export *exp, rc = ldlm_cli_enqueue(exp, &req, &minfo->mi_einfo, &res_id, &policy, &flags, NULL, 0, LVB_T_NONE, &minfo->mi_lockh, 1); if (rc < 0) { - obd_put_request_slot(&obddev->u.cli); ptlrpc_req_finished(req); return rc; } diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 0dc0de4..dade5686 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -47,9 +47,9 @@ static int mdc_reint(struct ptlrpc_request *request, int level) request->rq_send_state = level; - mdc_get_mod_rpc_slot(request, NULL); + ptlrpc_get_mod_rpc_slot(request); rc = ptlrpc_queue_wait(request); - mdc_put_mod_rpc_slot(request, NULL); + ptlrpc_put_mod_rpc_slot(request); if (rc) CDEBUG(D_INFO, "error in handling %d\n", rc); else if (!req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY)) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 54f6d15..8569858 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -412,12 +412,12 @@ static int mdc_xattr_common(struct obd_export *exp, /* make rpc */ if (opcode == MDS_REINT) - mdc_get_mod_rpc_slot(req, NULL); + ptlrpc_get_mod_rpc_slot(req); rc = ptlrpc_queue_wait(req); if (opcode == MDS_REINT) - mdc_put_mod_rpc_slot(req, NULL); + ptlrpc_put_mod_rpc_slot(req); if (rc) ptlrpc_req_finished(req); @@ -990,9 +990,9 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data, ptlrpc_request_set_replen(req); - mdc_get_mod_rpc_slot(req, NULL); + ptlrpc_get_mod_rpc_slot(req); rc = ptlrpc_queue_wait(req); - mdc_put_mod_rpc_slot(req, NULL); + ptlrpc_put_mod_rpc_slot(req); if (!req->rq_repmsg) { CDEBUG(D_RPCTRACE, "request %p failed to send: rc = %d\n", req, @@ -1779,9 +1779,9 @@ static int mdc_ioc_hsm_progress(struct obd_export *exp, ptlrpc_request_set_replen(req); - mdc_get_mod_rpc_slot(req, NULL); + ptlrpc_get_mod_rpc_slot(req); rc = ptlrpc_queue_wait(req); - 
mdc_put_mod_rpc_slot(req, NULL); + ptlrpc_put_mod_rpc_slot(req); out: ptlrpc_req_finished(req); return rc; @@ -1984,9 +1984,9 @@ static int mdc_ioc_hsm_state_set(struct obd_export *exp, ptlrpc_request_set_replen(req); - mdc_get_mod_rpc_slot(req, NULL); + ptlrpc_get_mod_rpc_slot(req); rc = ptlrpc_queue_wait(req); - mdc_put_mod_rpc_slot(req, NULL); + ptlrpc_put_mod_rpc_slot(req); out: ptlrpc_req_finished(req); return rc; @@ -2049,9 +2049,9 @@ static int mdc_ioc_hsm_request(struct obd_export *exp, ptlrpc_request_set_replen(req); - mdc_get_mod_rpc_slot(req, NULL); + ptlrpc_get_mod_rpc_slot(req); rc = ptlrpc_queue_wait(req); - mdc_put_mod_rpc_slot(req, NULL); + ptlrpc_put_mod_rpc_slot(req); out: ptlrpc_req_finished(req); return rc; diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 7f841d5..bceb055 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -1495,36 +1495,18 @@ static inline bool obd_mod_rpc_slot_avail(struct client_obd *cli, return avail; } -static inline bool obd_skip_mod_rpc_slot(const struct lookup_intent *it) -{ - if (it && - (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP || - it->it_op == IT_READDIR || - (it->it_op == IT_LAYOUT && !(it->it_flags & MDS_FMODE_WRITE)))) - return true; - return false; -} - /* Get a modify RPC slot from the obd client @cli according - * to the kind of operation @opc that is going to be sent - * and the intent @it of the operation if it applies. + * to the kind of operation @opc that is going to be sent. * If the maximum number of modify RPCs in flight is reached * the thread is put to sleep. * Returns the tag to be set in the request message. Tag 0 * is reserved for non-modifying requests. 
*/ -u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc, - struct lookup_intent *it) +u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc) { bool close_req = false; u16 i, max; - /* read-only metadata RPCs don't consume a slot on MDT - * for reply reconstruction - */ - if (obd_skip_mod_rpc_slot(it)) - return 0; - if (opc == MDS_CLOSE) close_req = true; @@ -1567,15 +1549,13 @@ u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc, /* * Put a modify RPC slot from the obd client @cli according - * to the kind of operation @opc that has been sent and the - * intent @it of the operation if it applies. + * to the kind of operation @opc that has been sent. */ -void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, - struct lookup_intent *it, u16 tag) +void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, u16 tag) { bool close_req = false; - if (obd_skip_mod_rpc_slot(it)) + if (tag == 0) return; if (opc == MDS_CLOSE) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 8d874f2..632ddf1 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -717,7 +717,7 @@ static inline void ptlrpc_assign_next_xid(struct ptlrpc_request *req) static atomic64_t ptlrpc_last_xid; -void ptlrpc_reassign_next_xid(struct ptlrpc_request *req) +static void ptlrpc_reassign_next_xid(struct ptlrpc_request *req) { spin_lock(&req->rq_import->imp_lock); list_del_init(&req->rq_unreplied_list); @@ -725,7 +725,32 @@ void ptlrpc_reassign_next_xid(struct ptlrpc_request *req) spin_unlock(&req->rq_import->imp_lock); DEBUG_REQ(D_RPCTRACE, req, "reassign xid"); } -EXPORT_SYMBOL(ptlrpc_reassign_next_xid); + +void ptlrpc_get_mod_rpc_slot(struct ptlrpc_request *req) +{ + struct client_obd *cli = &req->rq_import->imp_obd->u.cli; + u32 opc; + u16 tag; + + opc = lustre_msg_get_opc(req->rq_reqmsg); + tag = obd_get_mod_rpc_slot(cli, opc); + lustre_msg_set_tag(req->rq_reqmsg, tag); + ptlrpc_reassign_next_xid(req); +} +EXPORT_SYMBOL(ptlrpc_get_mod_rpc_slot); + 
+void ptlrpc_put_mod_rpc_slot(struct ptlrpc_request *req) +{ + u16 tag = lustre_msg_get_tag(req->rq_reqmsg); + + if (tag != 0) { + struct client_obd *cli = &req->rq_import->imp_obd->u.cli; + u32 opc = lustre_msg_get_opc(req->rq_reqmsg); + + obd_put_mod_rpc_slot(cli, opc, tag); + } +} +EXPORT_SYMBOL(ptlrpc_put_mod_rpc_slot); int ptlrpc_request_bufs_pack(struct ptlrpc_request *request, u32 version, int opcode, char **bufs, From patchwork Thu Feb 27 21:16:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410609 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 27FF9138D for ; Thu, 27 Feb 2020 21:42:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 10A1C24690 for ; Thu, 27 Feb 2020 21:42:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 10A1C24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BE8E434AB4F; Thu, 27 Feb 2020 13:34:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AEC2621FC51 for ; Thu, 27 Feb 2020 13:21:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 97A3E91B6; Thu, 27 
Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9677B468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:48 -0500 Message-Id: <1582838290-17243-541-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 540/622] lnet: timers: correctly offset mod_timer. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" During a high-level code review of the Lustre time code it was discovered that some of the mod_timer() calls were not adding the current jiffies value to the timeout after converting it from seconds to jiffies. Add the proper offset.
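As an illustration of the rule described above, here is a minimal userspace sketch, not the kernel code itself. It assumes a fixed tick rate (MODEL_HZ) and uses hypothetical helper names (model_seconds_to_jiffies, model_timer_expiry) that do not exist in the kernel. The point it models is that mod_timer() takes an absolute expiry in jiffies, so a delay expressed in seconds has to be scaled to ticks and then offset by the current jiffies value.

```c
/*
 * Userspace model only. MODEL_HZ and both helpers are hypothetical names
 * chosen for this sketch; the kernel's HZ is a build-time constant.
 */
#define MODEL_HZ 250UL

/* Scale a relative delay in seconds to a number of ticks. */
static unsigned long model_seconds_to_jiffies(unsigned long secs)
{
	return secs * MODEL_HZ;
}

/*
 * Compute the absolute expiry that a timer call would need:
 * the current tick count plus the delay converted to ticks.
 * Passing only the delay (without now_jiffies) would name a
 * point in the past on any system that has been up for a while.
 */
static unsigned long model_timer_expiry(unsigned long now_jiffies,
					unsigned long delay_secs)
{
	return now_jiffies + model_seconds_to_jiffies(delay_secs);
}
```

In this model the delay argument is relative; a deadline stored as an absolute time in seconds would first need the current time in seconds subtracted before it is scaled.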
Fixes: 5109c2502543 ("staging: lustre: lnet: move ping and delay injection to time64_t") WC-bug-id: https://jira.whamcloud.com/browse/LU-12931 Lustre-commit: e150810faa5 ("LU-12931 timers: correctly offset mod_timer.") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/36688 Reviewed-by: Neil Brown Reviewed-by: Alex Zhuravlev Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/net_fault.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c index e43b1e1..8408e93 100644 --- a/net/lnet/lnet/net_fault.c +++ b/net/lnet/lnet/net_fault.c @@ -487,7 +487,7 @@ struct lnet_delay_rule { /** baseline to caculate dl_delay_time */ time64_t dl_time_base; /** jiffies to send the next delayed message */ - unsigned long dl_msg_send; + time64_t dl_msg_send; /** delayed message list */ struct list_head dl_msg_list; /** statistic of delayed messages */ @@ -592,7 +592,7 @@ struct delay_daemon_data { msg->msg_delay_send = ktime_get_seconds() + attr->u.delay.la_latency; if (rule->dl_msg_send == -1) { rule->dl_msg_send = msg->msg_delay_send; - mod_timer(&rule->dl_timer, rule->dl_msg_send); + mod_timer(&rule->dl_timer, jiffies + rule->dl_msg_send * HZ); } spin_unlock(&rule->dl_lock); @@ -664,7 +664,7 @@ struct delay_daemon_data { msg = list_first_entry(&rule->dl_msg_list, struct lnet_msg, msg_list); rule->dl_msg_send = msg->msg_delay_send; - mod_timer(&rule->dl_timer, rule->dl_msg_send); + mod_timer(&rule->dl_timer, jiffies + rule->dl_msg_send * HZ); } spin_unlock(&rule->dl_lock); } From patchwork Thu Feb 27 21:16:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410667 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 13E50924 for ; 
Thu, 27 Feb 2020 21:43:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F072F24690 for ; Thu, 27 Feb 2020 21:43:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F072F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6E366348E26; Thu, 27 Feb 2020 13:35:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EFA3F21FC51 for ; Thu, 27 Feb 2020 13:21:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9AA6091B7; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9970C47C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:49 -0500 Message-Id: <1582838290-17243-542-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 541/622] lustre: ptlrpc: update wiretest for new values X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Update the wiretest.c file to fix issues with some #defines that were changed to named enums. There is no need to wire check the POSIX ACL structures if CONFIG_FS_POSIX_ACL is disabled. Fixes: cd7fd3b2e230 ("lustre: obd: add rmfid support") Fixes: c52da9b97ee0 ("lustre: introduce CONFIG_LUSTRE_FS_POSIX_ACL") Fixes: 0b75bfcd14ac ("lustre: uapi: Add nonrotational flag to statfs") WC-bug-id: https://jira.whamcloud.com/browse/LU-12937 Lustre-commit: bc2e23e1cd80 ("LU-12937 utils: update wirecheck for new values") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/36706 Reviewed-by: Artem Blagodarenko Reviewed-by: Arshad Hussain Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 671878d..9fc7a5b 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1748,19 +1748,19 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct obd_statfs, os_spare9)); LASSERTF((int)sizeof(((struct obd_statfs *)0)->os_spare9) == 4, "found %lld\n", (long long)(int)sizeof(((struct obd_statfs *)0)->os_spare9)); - LASSERTF(OS_STATE_DEGRADED == 0x1, "found %lld\n", + LASSERTF(OS_STATE_DEGRADED == 0x00000001UL, "found %lld\n", (long long)OS_STATE_DEGRADED); - LASSERTF(OS_STATE_READONLY == 0x2, "found %lld\n", + LASSERTF(OS_STATE_READONLY == 0x00000002UL, "found %lld\n", (long long)OS_STATE_READONLY); - LASSERTF(OS_STATE_NOPRECREATE == 0x4, "found %lld\n", + LASSERTF(OS_STATE_NOPRECREATE == 0x00000004UL, "found %lld\n", (long long)OS_STATE_NOPRECREATE); - LASSERTF(OS_STATE_ENOSPC == 0x20, "found %lld\n", + LASSERTF(OS_STATE_ENOSPC ==
0x00000020UL, "found %lld\n", (long long)OS_STATE_ENOSPC); - LASSERTF(OS_STATE_ENOINO == 0x40, "found %lld\n", + LASSERTF(OS_STATE_ENOINO == 0x00000040UL, "found %lld\n", (long long)OS_STATE_ENOINO); - LASSERTF(OS_STATE_SUM == 0x100, "found %lld\n", + LASSERTF(OS_STATE_SUM == 0x00000100UL, "found %lld\n", (long long)OS_STATE_SUM); - LASSERTF(OS_STATE_NONROT == 0x200, "found %lld\n", + LASSERTF(OS_STATE_NONROT == 0x00000200UL, "found %lld\n", (long long)OS_STATE_NONROT); /* Checks for struct obd_ioobj */ @@ -2178,19 +2178,19 @@ void lustre_assert_wire_constants(void) LUSTRE_DIRECTIO_FL); LASSERTF(LUSTRE_INLINE_DATA_FL == 0x10000000, "found 0x%.8x\n", LUSTRE_INLINE_DATA_FL); - LASSERTF(MDS_INODELOCK_LOOKUP == 0x000001, "found 0x%.8x\n", + LASSERTF(MDS_INODELOCK_LOOKUP == 0x00000001UL, "found 0x%.8x\n", MDS_INODELOCK_LOOKUP); - LASSERTF(MDS_INODELOCK_UPDATE == 0x000002, "found 0x%.8x\n", + LASSERTF(MDS_INODELOCK_UPDATE == 0x00000002UL, "found 0x%.8x\n", MDS_INODELOCK_UPDATE); - LASSERTF(MDS_INODELOCK_OPEN == 0x000004, "found 0x%.8x\n", + LASSERTF(MDS_INODELOCK_OPEN == 0x00000004UL, "found 0x%.8x\n", MDS_INODELOCK_OPEN); - LASSERTF(MDS_INODELOCK_LAYOUT == 0x000008, "found 0x%.8x\n", + LASSERTF(MDS_INODELOCK_LAYOUT == 0x00000008UL, "found 0x%.8x\n", MDS_INODELOCK_LAYOUT); - LASSERTF(MDS_INODELOCK_PERM == 0x000010, "found 0x%.8x\n", + LASSERTF(MDS_INODELOCK_PERM == 0x00000010UL, "found 0x%.8x\n", MDS_INODELOCK_PERM); - LASSERTF(MDS_INODELOCK_XATTR == 0x000020, "found 0x%.8x\n", + LASSERTF(MDS_INODELOCK_XATTR == 0x00000020UL, "found 0x%.8x\n", MDS_INODELOCK_XATTR); - LASSERTF(MDS_INODELOCK_DOM == 0x000040, "found 0x%.8x\n", + LASSERTF(MDS_INODELOCK_DOM == 0x00000040UL, "found 0x%.8x\n", MDS_INODELOCK_DOM); /* Checks for struct mdt_ioepoch */ @@ -4176,6 +4176,7 @@ void lustre_assert_wire_constants(void) BUILD_BUG_ON(FIEMAP_EXTENT_NO_DIRECT != 0x40000000); BUILD_BUG_ON(FIEMAP_EXTENT_NET != 0x80000000); +#ifdef CONFIG_FS_POSIX_ACL /* Checks for type posix_acl_xattr_entry */ 
LASSERTF((int)sizeof(struct posix_acl_xattr_entry) == 8, "found %lld\n", (long long)(int)sizeof(struct posix_acl_xattr_entry)); @@ -4199,6 +4200,7 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct posix_acl_xattr_header, a_version)); LASSERTF((int)sizeof(((struct posix_acl_xattr_header *)0)->a_version) == 4, "found %lld\n", (long long)(int)sizeof(((struct posix_acl_xattr_header *)0)->a_version)); +#endif /* CONFIG_FS_POSIX_ACL */ /* Checks for struct link_ea_header */ LASSERTF((int)sizeof(struct link_ea_header) == 24, "found %lld\n", From patchwork Thu Feb 27 21:16:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410671 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F03E1138D for ; Thu, 27 Feb 2020 21:44:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D90BA24690 for ; Thu, 27 Feb 2020 21:44:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D90BA24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0879A349809; Thu, 27 Feb 2020 13:35:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 539A721FC51 for ; Thu, 27 Feb 2020 13:21:07 -0800 
(PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9D5A591B8; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9C39046A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:50 -0500 Message-Id: <1582838290-17243-543-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 542/622] lustre: ptlrpc: do lu_env_refill for any new request X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin Perform lu_env_refill() prior to handling any new request. This was already done on the server side by tgt_request_handle() and is now moved to ptlrpc_main() so that it works for any handler as well, e.g.
ldlm_cancel_handler() WC-bug-id: https://jira.whamcloud.com/browse/LU-12741 Lustre-commit: 3f304b75d24a ("LU-12741 ptlrpc: do lu_env_refill for new request") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/36714 Reviewed-by: Alex Zhuravlev Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index c874487..f65d5c5 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -2281,6 +2281,12 @@ static int ptlrpc_main(void *arg) ptlrpc_start_thread(svcpt, 0); } + /* reset le_ses to initial state */ + env->le_ses = NULL; + /* Refill the context before execution to make sure + * all thread keys are allocated + */ + lu_env_refill(env); /* Process all incoming reqs before handling any */ if (ptlrpc_server_request_incoming(svcpt)) { lu_context_enter(&env->le_ctx); From patchwork Thu Feb 27 21:16:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410791 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E55F0924 for ; Thu, 27 Feb 2020 21:46:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CE17B24690 for ; Thu, 27 Feb 2020 21:46:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE17B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost 
[IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 48EBA34A1CD; Thu, 27 Feb 2020 13:37:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 97A3E21FC51 for ; Thu, 27 Feb 2020 13:21:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A376191B9; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A030646C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:51 -0500 Message-Id: <1582838290-17243-544-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 543/622] lustre: obd: perform proper division X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Lustre stats have two fields, lc_sum and lc_count, which are both s64, so using do_div(), whose first argument must be a u64, is wrong. Use div64_s64() instead.
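To make the signedness concern concrete, here is a small userspace sketch under stated assumptions; it is a model, not the kernel helpers. The function names model_unsigned_average and model_signed_average are hypothetical. The first mimics the shape of the removed pattern (dividend treated as unsigned 64-bit, divisor truncated to 32 bits, which is how do_div() is documented to treat its arguments); the second mimics a signed 64-by-64 division in the spirit of div64_s64().

```c
#include <stdint.h>

/*
 * Model of the old pattern: the sum is reinterpreted as unsigned and the
 * count is truncated to 32 bits before dividing. A negative sum, or a
 * count that does not fit in 32 bits, gives a meaningless average.
 */
static int64_t model_unsigned_average(int64_t sum, int64_t count)
{
	uint32_t divisor = (uint32_t)count;	/* upper 32 bits are lost */

	if (divisor == 0)
		return 0;
	return (int64_t)((uint64_t)sum / divisor);
}

/*
 * Model of the new pattern: plain signed 64-bit division, with the same
 * zero-count guard the patched code keeps. The count is assumed to be a
 * non-negative event counter, so INT64_MIN / -1 cannot occur here.
 */
static int64_t model_signed_average(int64_t sum, int64_t count)
{
	if (count == 0)
		return 0;
	return sum / count;
}
```

For non-negative sums and small counts both models agree, which is presumably why the old code appeared to work; they diverge as soon as the sum is negative or the count exceeds 32 bits.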
WC-bug-id: https://jira.whamcloud.com/browse/LU-6174 Lustre-commit: e8f793f620f4 ("LU-6174 obd: perform proper division") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/36751 Reviewed-by: Shaun Tancheff Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 23 +++++------------------ 1 file changed, 5 insertions(+), 18 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 806d6517..893f06d 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -799,15 +799,10 @@ int lprocfs_rd_import(struct seq_file *m, void *data) header = &obd->obd_svc_stats->ls_cnt_header[PTLRPC_REQWAIT_CNTR]; lprocfs_stats_collect(obd->obd_svc_stats, PTLRPC_REQWAIT_CNTR, &ret); - if (ret.lc_count != 0) { - /* first argument to do_div MUST be u64 */ - u64 sum = ret.lc_sum; - - do_div(sum, ret.lc_count); - ret.lc_sum = sum; - } else { + if (ret.lc_count != 0) + ret.lc_sum = div64_s64(ret.lc_sum, ret.lc_count); + else ret.lc_sum = 0; - } seq_printf(m, " rpcs:\n" " inflight: %u\n" @@ -848,11 +843,7 @@ int lprocfs_rd_import(struct seq_file *m, void *data) PTLRPC_LAST_CNTR + BRW_READ_BYTES + rw, &ret); if (ret.lc_sum > 0 && ret.lc_count > 0) { - /* first argument to do_div MUST be u64 */ - u64 sum = ret.lc_sum; - - do_div(sum, ret.lc_count); - ret.lc_sum = sum; + ret.lc_sum = div64_s64(ret.lc_sum, ret.lc_count); seq_printf(m, " %s_data_averages:\n" " bytes_per_rpc: %llu\n", @@ -864,11 +855,7 @@ int lprocfs_rd_import(struct seq_file *m, void *data) header = &obd->obd_svc_stats->ls_cnt_header[j]; lprocfs_stats_collect(obd->obd_svc_stats, j, &ret); if (ret.lc_sum > 0 && ret.lc_count != 0) { - /* first argument to do_div MUST be u64 */ - u64 sum = ret.lc_sum; - - do_div(sum, ret.lc_count); - ret.lc_sum = sum; + ret.lc_sum = div64_s64(ret.lc_sum, ret.lc_count); seq_printf(m, " %s_per_rpc: %llu\n", header->lc_units, 
ret.lc_sum); From patchwork Thu Feb 27 21:16:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410675 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1B7A9924 for ; Thu, 27 Feb 2020 21:44:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0479224690 for ; Thu, 27 Feb 2020 21:44:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0479224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 03B60349B9E; Thu, 27 Feb 2020 13:35:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DA5B221FC51 for ; Thu, 27 Feb 2020 13:21:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A8B8591BA; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A2DD5468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:52 -0500 Message-Id: <1582838290-17243-545-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 544/622] lustre: uapi: introduce OBD_CONNECT2_CRUSH X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Introduce a new connect flag OBD_CONNECT2_CRUSH to indicate whether client or server supports new directory hash type 'crush'. WC-bug-id: https://jira.whamcloud.com/browse/LU-11025 Lustre-commit: dbafa9df0f8f ("LU-11025 uapi: introduce OBD_CONNECT2_CRUSH") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/36774 Reviewed-by: Andreas Dilger Reviewed-by: Olaf Faaland-LLNL Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 2 +- fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 2 ++ 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 893f06d..9772194 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -124,7 +124,7 @@ "selinux_policy", /* 0x400 */ "lsom", /* 0x800 */ "pcc", /* 0x1000 */ - "plain_layout", /* 0x2000 */ + "crush", /* 0x2000 */ "async_discard", /* 0x4000 */ "client_encryption", /* 0x8000 */ NULL diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 9fc7a5b..6c66815 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1158,6 +1158,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_LSOM); LASSERTF(OBD_CONNECT2_PCC == 0x1000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_PCC); + LASSERTF(OBD_CONNECT2_CRUSH == 0x2000ULL, "found 0x%.16llxULL\n", + 
OBD_CONNECT2_CRUSH); LASSERTF(OBD_CONNECT2_ASYNC_DISCARD == 0x4000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_ASYNC_DISCARD); LASSERTF(OBD_CONNECT2_ENCRYPT == 0x8000ULL, "found 0x%.16llxULL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index a74d979..a69d49a 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -810,6 +810,8 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */ #define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */ #define OBD_CONNECT2_PCC 0x1000ULL /* Persistent Client Cache */ +#define OBD_CONNECT2_CRUSH 0x2000ULL /* crush hash striped directory + */ #define OBD_CONNECT2_ASYNC_DISCARD 0x4000ULL /* support async DoM data * discard */ From patchwork Thu Feb 27 21:16:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410679 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D094A17E0 for ; Thu, 27 Feb 2020 21:44:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B86DB24690 for ; Thu, 27 Feb 2020 21:44:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B86DB24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9295A349F23; Thu, 27 Feb 2020 13:35:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org 
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2711921FF40 for ; Thu, 27 Feb 2020 13:21:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AC2DB91BB; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A5AB146D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:53 -0500 Message-Id: <1582838290-17243-546-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 545/622] lnet: Wait for single discovery attempt of routers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Historically, check_routers_before_use would cause LNet initialization to pause until all routers had been ping'd once. This behavior was changed in commit fe17e9b8370affe063769b880f02b9190584baaa from LU-11298. Now, LNet will wait indefinitely until discovery completes on all routers. This is problematic, because if even one router is down then LNet will stall forever. Introduce a new lnet_peer state to indicate whether a router has been discovered (either successfully or not) to restore the historic behavior. 
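The state change described above can be sketched in a few lines of userspace C. This is only a model under assumptions: struct router_model and both functions are hypothetical, and only the two flag names and bit positions are taken from this patch. It shows the idea that a router counts as "attempted" once discovery has finished, whatever the outcome, so a check over all routers no longer blocks on a router that is down.

```c
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1UL << (n))

/* Flag names and bit positions as they appear in this patch. */
#define LNET_PEER_RTR_DISCOVERY		BIT(16)	/* discovery in progress */
#define LNET_PEER_RTR_DISCOVERED	BIT(17)	/* discovery attempted */

/* Hypothetical stand-in for the router peer; only the state word is modeled. */
struct router_model {
	unsigned long state;
};

/* Called when a discovery attempt ends, whether it succeeded or failed. */
static void router_discovery_done(struct router_model *rtr)
{
	rtr->state &= ~LNET_PEER_RTR_DISCOVERY;
	rtr->state |= LNET_PEER_RTR_DISCOVERED;
}

/* True once every router has been through at least one discovery attempt. */
static bool all_routers_attempted(const struct router_model *rtrs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(rtrs[i].state & LNET_PEER_RTR_DISCOVERED))
			return false;
	}
	return true;
}
```

Locking is left out of the model; the real code updates the state under the peer's lock.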
Fixes fe17e9b8370a ("LU-11298 lnet: use peer for gateway") Cray-bug-id: LUS-8184 WC-bug-id: https://jira.whamcloud.com/browse/LU-13001 Lustre-commit: d45a032d9a5c ("LU-13001 lnet: Wait for single discovery attempt of routers") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36820 Reviewed-by: Amir Shehata Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 2 ++ net/lnet/lnet/router.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 51cc9ce..4b110eb 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -732,6 +732,8 @@ struct lnet_peer { /* gw undergoing alive discovery */ #define LNET_PEER_RTR_DISCOVERY BIT(16) +/* gw has undergone discovery (does not indicate success or failure) */ +#define LNET_PEER_RTR_DISCOVERED BIT(17) struct lnet_peer_net { /* chain on lp_peer_nets */ diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 41d0eb0..71ba951 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -408,6 +408,7 @@ bool lnet_is_route_alive(struct lnet_route *route) spin_lock(&lp->lp_lock); lp->lp_state &= ~LNET_PEER_RTR_DISCOVERY; + lp->lp_state |= LNET_PEER_RTR_DISCOVERED; spin_unlock(&lp->lp_lock); /* Router discovery successful? 
All peer information would've been @@ -882,7 +883,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) { spin_lock(&rtr->lp_lock); - if (!(rtr->lp_state & LNET_PEER_DISCOVERED)) { + if (!(rtr->lp_state & LNET_PEER_RTR_DISCOVERED)) { all_known = 0; spin_unlock(&rtr->lp_lock); break; From patchwork Thu Feb 27 21:16:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410733 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 92CCD924 for ; Thu, 27 Feb 2020 21:45:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7B94F24690 for ; Thu, 27 Feb 2020 21:45:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7B94F24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E953434B028; Thu, 27 Feb 2020 13:36:15 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 68260348927 for ; Thu, 27 Feb 2020 13:21:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AF95691BC; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id AB9D346A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:54 -0500 Message-Id: <1582838290-17243-547-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 546/622] lustre: mgc: config lock leak X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov Regression introduced by "LU-580: update mgc llog process code". That change takes an additional cld reference for the lock, but the lock cancel was forgotten during normal shutdown, so the lock keeps the cld on the list for a long time. Any config modification then needs to cancel each lock separately. 
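As a rough userspace model of the leak and the fix (a sketch only: the model_* names are hypothetical, and the real code goes through the ldlm handle, addref and blocking AST machinery rather than calling the cancel path directly), remembering the lock handle in the cld lets the stop path cancel a lock that is still held, which in turn drops the reference taken on behalf of that lock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for lustre_handle and config_llog_data. */
struct model_handle {
	uint64_t cookie;	/* 0 means "no lock remembered" */
};

struct model_cld {
	int refcount;
	bool stopping;
	struct model_handle lockh;
};

static bool model_handle_is_used(const struct model_handle *h)
{
	return h->cookie != 0;
}

/* Lock granted: take a cld reference for the lock, remember its handle. */
static void model_lock_granted(struct model_cld *cld, uint64_t cookie)
{
	assert(!model_handle_is_used(&cld->lockh));
	cld->refcount++;
	cld->lockh.cookie = cookie;
}

/* Cancel callback: forget the handle and drop the lock's reference. */
static void model_lock_cancelled(struct model_cld *cld)
{
	cld->lockh.cookie = 0;
	cld->refcount--;
}

/* Stop path: cancel a still-held lock instead of leaving it behind. */
static void model_mark_stop(struct model_cld *cld)
{
	cld->stopping = true;
	if (model_handle_is_used(&cld->lockh))
		model_lock_cancelled(cld);
}
```

Calling model_mark_stop() a second time is harmless in this model because the cookie has already been cleared.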
Cray-bugid: LUS-6253 Fixes: d7e09d0397e8 ("LU-580: update mgc llog process code") WC-bug-id: https://jira.whamcloud.com/browse/LU-11185 Lustre-commit: 0ad54d597773 ("LU-11185 mgc: config lock leak") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/32890 Reviewed-by: Alexandr Boyko Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 1 + fs/lustre/ldlm/ldlm_lock.c | 3 +++ fs/lustre/mgc/mgc_request.c | 57 ++++++++++++++++++++++++++----------------- 3 files changed, 39 insertions(+), 22 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index a099768..85fe129 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -197,6 +197,7 @@ int class_config_parse_llog(const struct lu_env *env, struct llog_ctxt *ctxt, /* list of active configuration logs */ struct config_llog_data { struct ldlm_res_id cld_resid; + struct lustre_handle cld_lockh; struct config_llog_instance cld_cfg; struct list_head cld_list_chain; /* on config_llog_list */ atomic_t cld_refcount; diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 62d2c1d..2471e30 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -512,6 +512,9 @@ struct ldlm_lock *__ldlm_handle2lock(const struct lustre_handle *handle, LASSERT(handle); + if (!lustre_handle_is_used(handle)) + return NULL; + lock = class_handle2object(handle->cookie, &lock_handle_ops); if (!lock) return NULL; diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index 28064fd..b2c296e 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -122,7 +122,7 @@ static int mgc_logname2resid(char *logname, struct ldlm_res_id *res_id, static int config_log_get(struct config_llog_data *cld) { atomic_inc(&cld->cld_refcount); - CDEBUG(D_INFO, "log %s refs %d\n", cld->cld_logname, + CDEBUG(D_INFO, "log %s (%p) refs %d\n", cld->cld_logname, 
cld, atomic_read(&cld->cld_refcount)); return 0; } @@ -135,7 +135,7 @@ static void config_log_put(struct config_llog_data *cld) if (!cld) return; - CDEBUG(D_INFO, "log %s refs %d\n", cld->cld_logname, + CDEBUG(D_INFO, "log %s(%p) refs %d\n", cld->cld_logname, cld, atomic_read(&cld->cld_refcount)); LASSERT(atomic_read(&cld->cld_refcount) > 0); @@ -379,16 +379,26 @@ struct config_llog_data *do_config_log_add(struct obd_device *obd, return ERR_PTR(rc); } -static inline void config_mark_cld_stop(struct config_llog_data *cld) -{ - if (!cld) - return; +DEFINE_MUTEX(llog_process_lock); - mutex_lock(&cld->cld_lock); +static inline void config_mark_cld_stop_nolock(struct config_llog_data *cld) +{ spin_lock(&config_list_lock); cld->cld_stopping = 1; spin_unlock(&config_list_lock); - mutex_unlock(&cld->cld_lock); + + CDEBUG(D_INFO, "lockh %#llx\n", cld->cld_lockh.cookie); + if (!ldlm_lock_addref_try(&cld->cld_lockh, LCK_CR)) + ldlm_lock_decref_and_cancel(&cld->cld_lockh, LCK_CR); +} + +static inline void config_mark_cld_stop(struct config_llog_data *cld) +{ + if (cld) { + mutex_lock(&cld->cld_lock); + config_mark_cld_stop_nolock(cld); + mutex_unlock(&cld->cld_lock); + } } /** Stop watching for updates on this log. 
@@ -420,10 +430,6 @@ static int config_log_end(char *logname, struct config_llog_instance *cfg) return rc; } - spin_lock(&config_list_lock); - cld->cld_stopping = 1; - spin_unlock(&config_list_lock); - cld_recover = cld->cld_recover; cld->cld_recover = NULL; @@ -431,21 +437,22 @@ static int config_log_end(char *logname, struct config_llog_instance *cfg) cld->cld_params = NULL; cld_sptlrpc = cld->cld_sptlrpc; cld->cld_sptlrpc = NULL; + + config_mark_cld_stop_nolock(cld); mutex_unlock(&cld->cld_lock); config_mark_cld_stop(cld_recover); - config_log_put(cld_recover); - config_mark_cld_stop(cld_params); - config_log_put(cld_params); + config_mark_cld_stop(cld_sptlrpc); + config_log_put(cld_params); + config_log_put(cld_recover); config_log_put(cld_sptlrpc); /* drop the ref from the find */ config_log_put(cld); /* drop the start ref */ config_log_put(cld); - CDEBUG(D_MGC, "end config log %s (%d)\n", logname ? logname : "client", rc); return rc; @@ -627,9 +634,14 @@ static void mgc_requeue_add(struct config_llog_data *cld) cld->cld_stopping, rq_state); LASSERT(atomic_read(&cld->cld_refcount) > 0); + /* lets cancel an existent lock to mark cld as "lostlock" */ + CDEBUG(D_INFO, "lockh %#llx\n", cld->cld_lockh.cookie); + if (!ldlm_lock_addref_try(&cld->cld_lockh, LCK_CR)) + ldlm_lock_decref_and_cancel(&cld->cld_lockh, LCK_CR); + mutex_lock(&cld->cld_lock); spin_lock(&config_list_lock); - if (!(rq_state & RQ_STOP) && !cld->cld_stopping && !cld->cld_lostlock) { + if (!(rq_state & RQ_STOP) && !cld->cld_stopping) { cld->cld_lostlock = 1; rq_state |= RQ_NOW; wakeup = true; @@ -803,6 +815,7 @@ static int mgc_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, LASSERT(atomic_read(&cld->cld_refcount) > 0); lock->l_ast_data = NULL; + cld->cld_lockh.cookie = 0; /* Are we done with this log? 
*/ if (cld->cld_stopping) { CDEBUG(D_MGC, "log %s: stopping, won't requeue\n", @@ -1616,9 +1629,12 @@ int mgc_process_log(struct obd_device *mgc, struct config_llog_data *cld) /* Get the cld, it will be released in mgc_blocking_ast. */ config_log_get(cld); rc = ldlm_lock_set_data(&lockh, (void *)cld); + LASSERT(!lustre_handle_is_used(&cld->cld_lockh)); LASSERT(rc == 0); + cld->cld_lockh = lockh; } else { CDEBUG(D_MGC, "Can't get cfg lock: %d\n", rcl); + cld->cld_lockh.cookie = 0; if (rcl == -ESHUTDOWN && atomic_read(&mgc->u.cli.cl_mgc_refcount) > 0 && !retry) { @@ -1673,9 +1689,6 @@ int mgc_process_log(struct obd_device *mgc, struct config_llog_data *cld) CERROR("%s: recover log %s failed: rc = %d not fatal.\n", mgc->obd_name, cld->cld_logname, rc); rc = 0; - spin_lock(&config_list_lock); - cld->cld_lostlock = 1; - spin_unlock(&config_list_lock); } } } else { @@ -1685,12 +1698,12 @@ int mgc_process_log(struct obd_device *mgc, struct config_llog_data *cld) CDEBUG(D_MGC, "%s: configuration from log '%s' %sed (%d).\n", mgc->obd_name, cld->cld_logname, rc ? 
"fail" : "succeed", rc); - mutex_unlock(&cld->cld_lock); - /* Now drop the lock so MGS can revoke it */ if (!rcl) ldlm_lock_decref(&lockh, LCK_CR); + mutex_unlock(&cld->cld_lock); + return rc; } From patchwork Thu Feb 27 21:16:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410905 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1DB97924 for ; Thu, 27 Feb 2020 21:50:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 05DB124690 for ; Thu, 27 Feb 2020 21:50:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 05DB124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2707E34B9B1; Thu, 27 Feb 2020 13:41:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BD6A2348929 for ; Thu, 27 Feb 2020 13:21:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B143D91BD; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ADF1D47C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 
16:16:55 -0500 Message-Id: <1582838290-17243-548-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 547/622] lnet: check if current->nsproxy is NULL before using X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sonia Sharma A crash has been seen at a few sites in the call rdma_create_id(current->nsproxy->net_ns, cb, dev, ps, qpt). The issue has been traced to the first parameter of this function, current->nsproxy->net_ns. Evaluating this expression can hit a NULL pointer, resulting in a "kernel NULL pointer dereference" crash. Handle the NULL case gracefully by adding a check and using init_net if current->nsproxy or current->nsproxy->net_ns is NULL. 
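The fallback can be sketched in userspace C as below. This is an illustration under assumptions, not the kernel code: the model_* types are hypothetical stand-ins for struct net, struct nsproxy and the task, and model_init_net only plays the role of init_net.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; all names are hypothetical. */
struct model_net {
	int id;
};

struct model_nsproxy {
	struct model_net *net_ns;
};

struct model_task {
	struct model_nsproxy *nsproxy;
};

/* Plays the role of the kernel's init_net. */
static struct model_net model_init_net = { 0 };

/* Return the task's network namespace, or the initial one if any
 * link in the chain is NULL.
 */
static struct model_net *model_task_net_ns(const struct model_task *task)
{
	struct model_net *ns = &model_init_net;

	if (task && task->nsproxy && task->nsproxy->net_ns)
		ns = task->nsproxy->net_ns;

	assert(ns);	/* callers never see NULL */
	return ns;
}
```

Funnelling every caller through one helper of this shape keeps the NULL checks in a single place; the patch itself open-codes the check at each call site.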
WC-bug-id: https://jira.whamcloud.com/browse/LU-11385 Lustre-commit: ef1783e282f6 ("LU-11385 lnet: check if current->nsproxy is NULL before using") Signed-off-by: Sonia Sharma Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/34577 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Sebastien Buisson Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.h | 6 +++--- net/lnet/lnet/acceptor.c | 7 ++++--- net/lnet/lnet/config.c | 9 ++++++--- net/lnet/lnet/lib-move.c | 4 ++-- 4 files changed, 15 insertions(+), 11 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index ac91757..2169fdd 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -108,9 +108,9 @@ struct kib_tunables { min((t)->lnd_peercredits_hiw, \ (u32)(conn)->ibc_queue_depth - 1)) -# define kiblnd_rdma_create_id(ns, cb, dev, ps, qpt) rdma_create_id(ns, cb, \ - dev, ps, \ - qpt) +# define kiblnd_rdma_create_id(ns, cb, dev, ps, qpt) \ + rdma_create_id((ns) ? (ns) : &init_net, cb, dev, ps, qpt) + /* 2 OOB shall suffice for 1 keepalive and 1 returning credits */ #define IBLND_OOB_CAPABLE(v) ((v) != IBLND_MSG_VERSION_1) #define IBLND_OOB_MSGS(v) (IBLND_OOB_CAPABLE(v) ? 
2 : 0) diff --git a/net/lnet/lnet/acceptor.c b/net/lnet/lnet/acceptor.c index 23b5bf0..acd1d75 100644 --- a/net/lnet/lnet/acceptor.c +++ b/net/lnet/lnet/acceptor.c @@ -458,14 +458,15 @@ if (!lnet_count_acceptor_nets()) /* not required */ return 0; - - lnet_acceptor_state.pta_ns = current->nsproxy->net_ns; + if (current->nsproxy && current->nsproxy->net_ns) + lnet_acceptor_state.pta_ns = current->nsproxy->net_ns; + else + lnet_acceptor_state.pta_ns = &init_net; task = kthread_run(lnet_acceptor, (void *)(uintptr_t)secure, "acceptor_%03ld", secure); if (IS_ERR(task)) { rc2 = PTR_ERR(task); CERROR("Can't start acceptor thread: %ld\n", rc2); - return -ESRCH; } diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 2c8edcd..f521b0b 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -464,10 +464,10 @@ struct lnet_net * ni->ni_nid = LNET_MKNID(net->net_id, 0); /* Store net namespace in which current ni is being created */ - if (current->nsproxy->net_ns) + if (current->nsproxy && current->nsproxy->net_ns) ni->ni_net_ns = get_net(current->nsproxy->net_ns); else - ni->ni_net_ns = NULL; + ni->ni_net_ns = get_net(&init_net); ni->ni_state = LNET_NI_STATE_INIT; list_add_tail(&ni->ni_netlist, &net->net_ni_added); @@ -1642,7 +1642,10 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns) int rc; int i; - nip = lnet_inet_enumerate(&ifaces, current->nsproxy->net_ns); + if (current->nsproxy && current->nsproxy->net_ns) + nip = lnet_inet_enumerate(&ifaces, current->nsproxy->net_ns); + else + nip = lnet_inet_enumerate(&ifaces, &init_net); if (nip < 0) { if (nip != -ENOENT) { LCONSOLE_ERROR_MSG(0x117, diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index b8278ad..ca0009c 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -4826,9 +4826,9 @@ struct lnet_msg * * If not, assign order above 0xffff0000, * to make this ni not a priority. 
*/ - if (!net_eq(ni->ni_net_ns, current->nsproxy->net_ns)) + if (current->nsproxy && + !net_eq(ni->ni_net_ns, current->nsproxy->net_ns)) order += 0xffff0000; - if (srcnidp) *srcnidp = ni->ni_nid; if (orderp) From patchwork Thu Feb 27 21:16:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410789 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 86A91924 for ; Thu, 27 Feb 2020 21:46:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6F54D24690 for ; Thu, 27 Feb 2020 21:46:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6F54D24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6AEBC34B309; Thu, 27 Feb 2020 13:37:05 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2254E21FD16 for ; Thu, 27 Feb 2020 13:21:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B2F0591BE; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B1A9446C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 
Feb 2020 16:16:56 -0500 Message-Id: <1582838290-17243-549-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 548/622] lustre: ptlrpc: always reset generation for idle reconnect X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong Idle reconnection is a common case and such reconnections are mostly quick, so always reset the generation in this case. Otherwise, applications can fail merely because of the idle reconnection feature. WC-bug-id: https://jira.whamcloud.com/browse/LU-12378 Lustre-commit: 94fbe511ba96 ("LU-12378 ptlrpc: always reset generation for idle reconnect") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/35052 Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/import.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 813d3c8..028dd65 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1674,7 +1674,8 @@ static void ptlrpc_reset_reqs_generation(struct obd_import *imp) rq_list) { spin_lock(&old->rq_lock); if (old->rq_import_generation == imp->imp_generation - 1 && - !old->rq_no_resend) + ((imp->imp_initiated_at == imp->imp_generation) || + !old->rq_no_resend)) old->rq_import_generation = imp->imp_generation; spin_unlock(&old->rq_lock); } From patchwork Thu Feb 27 21:16:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410795 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A57971580 for ; Thu, 27 Feb 2020 21:47:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8E25424690 for ; Thu, 27 Feb 2020 21:47:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8E25424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 27C2B34A1FC; Thu, 27 Feb 2020 13:37:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6426421FD16 for ; Thu, 27 Feb 2020 13:21:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B7D6C91BF; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B6E60468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:57 -0500 Message-Id: <1582838290-17243-550-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 
549/622] lustre: obdclass: Allow read-ahead for write requests X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown cl_io_read_ahead asserts that read-ahead can only happen due to CIT_READ or CIT_FAULT requests. Since LU-9618, we expect CIT_WRITE requests to also sometimes trigger read-ahead. So the LINVRNT() needs to be extended to acknowledge that. WC-bug-id: https://jira.whamcloud.com/browse/LU-12718 Lustre-commit: 514bd936d061 ("LU-12718 obdclass: Allow read-ahead for write requests") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36000 Reviewed-by: Shilong Wang Reviewed-by: Patrick Farrell Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/cl_io.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index 14849ed..3bc9097 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -554,7 +554,9 @@ int cl_io_read_ahead(const struct lu_env *env, struct cl_io *io, const struct cl_io_slice *scan; int result = 0; - LINVRNT(io->ci_type == CIT_READ || io->ci_type == CIT_FAULT); + LINVRNT(io->ci_type == CIT_READ || + io->ci_type == CIT_FAULT || + io->ci_type == CIT_WRITE); LINVRNT(cl_io_invariant(io)); list_for_each_entry(scan, &io->ci_layers, cis_linkage) { From patchwork Thu Feb 27 21:16:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410797 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 
8CA891580 for ; Thu, 27 Feb 2020 21:47:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7586624690 for ; Thu, 27 Feb 2020 21:47:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7586624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EB83E34B3B2; Thu, 27 Feb 2020 13:37:16 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A6EE021FD16 for ; Thu, 27 Feb 2020 13:21:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BB5F091C0; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B9C6A46D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:58 -0500 Message-Id: <1582838290-17243-551-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 550/622] lustre: ldlm: separate buckets from ldlm hash table X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown ldlm maintains a per-namespace hashtable of resources. With these hash tables it stores per-bucket 'struct adaptive_timeout' structures. Presumably having a single struct for the whole table results in too much contention, while having one per resource results in very little adaptation. A future patch will change ldlm to use rhashtable, which does not support per-bucket data, so we need to manage the data separately. There is no need for the multiple adaptive_timeout structures to align with the hash chains, and trying to do this has resulted in a rather complex hash function. The purpose of ldlm_res_hop_fid_hash() appears to be to keep resources with the same fid in the same hash bucket, so they use the same adaptive timeout. However, it fails at doing this because it puts the fid-specific bits in the wrong part of the hash. If that is not the purpose, then I can see no point to the complexity. This patch creates a completely separate array of adaptive timeouts (and other less interesting data) and uses a hash of the fid to index that, meaning that a simple hash can be used for the hash table. In the previous code, two namespaces use the same value for nsd_all_bits and nsd_bkt_bits. This results in zero bits being used to choose a bucket, so there is only one bucket. This looks odd and would confuse hash_32(), so I've adjusted the numbers so there is always at least 1 bit (2 buckets). 
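A minimal userspace sketch of the separate bucket array follows. It is an illustration under assumptions, not the ldlm code: the model_* names are hypothetical, calloc() stands in for the kernel allocator, and model_hash_32() is written in the style of the kernel's multiplicative hash_32(). The right shift by (32 - bits) is one reason zero bucket bits would be a problem, since shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiplicative hash in the style of hash_32(); bits must be 1..32. */
static uint32_t model_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * UINT32_C(0x61C88647)) >> (32 - bits);
}

/* Hypothetical per-bucket data, kept apart from the hash table. */
struct model_bucket {
	unsigned int at_estimate;
};

struct model_ns {
	unsigned int bucket_bits;
	struct model_bucket *buckets;
};

/* Allocate 2^(all_bits - bkt_bits) buckets, as the patch computes it. */
static int model_ns_init(struct model_ns *ns, unsigned int all_bits,
			 unsigned int bkt_bits)
{
	assert(all_bits > bkt_bits);
	ns->bucket_bits = all_bits - bkt_bits;
	ns->buckets = calloc((size_t)1 << ns->bucket_bits,
			     sizeof(ns->buckets[0]));
	return ns->buckets ? 0 : -1;
}

/* Pick the bucket from a hash of the fid, independent of the table. */
static struct model_bucket *model_ns_bucket(struct model_ns *ns,
					    uint32_t fid_hash)
{
	return &ns->buckets[model_hash_32(fid_hash, ns->bucket_bits)];
}

static void model_ns_fini(struct model_ns *ns)
{
	free(ns->buckets);
	ns->buckets = NULL;
}
```

With the adjusted MGC and MGT values (nsd_all_bits 4, nsd_bkt_bits 3) this gives 1 bucket bit, that is 2 buckets; the MDC values (16 and 11) give 5 bits, that is 32 buckets.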
WC-bug-id: https://jira.whamcloud.com/browse/LU-8130 Lustre-commit: d234e2cf5f55 ("LU-8130 ldlm: separate buckets from ldlm hash table") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/36218 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 2 ++ fs/lustre/ldlm/ldlm_resource.c | 56 ++++++++++++++++++------------------------ 2 files changed, 26 insertions(+), 32 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 31d360e..cc4b8b0 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -364,6 +364,8 @@ struct ldlm_namespace { /** Resource hash table for namespace. */ struct cfs_hash *ns_rs_hash; + struct ldlm_ns_bucket *ns_rs_buckets; + unsigned int ns_bucket_bits; /** serialize */ spinlock_t ns_lock; diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 14e03bc..65ff32c 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -452,10 +452,9 @@ static unsigned int ldlm_res_hop_hash(struct cfs_hash *hs, return val & mask; } -static unsigned int ldlm_res_hop_fid_hash(struct cfs_hash *hs, - const void *key, unsigned int mask) +static unsigned int ldlm_res_hop_fid_hash(const struct ldlm_res_id *id, + unsigned int bits) { - const struct ldlm_res_id *id = key; struct lu_fid fid; u32 hash; u32 val; @@ -468,18 +467,11 @@ static unsigned int ldlm_res_hop_fid_hash(struct cfs_hash *hs, hash += (hash >> 4) + (hash << 12); /* mixing oid and seq */ if (id->name[LUSTRE_RES_ID_HSH_OFF] != 0) { val = id->name[LUSTRE_RES_ID_HSH_OFF]; - hash += (val >> 5) + (val << 11); } else { val = fid_oid(&fid); } - hash = hash_long(hash, hs->hs_bkt_bits); - /* give me another random factor */ - hash -= hash_long((unsigned long)hs, val % 11 + 3); - - hash <<= hs->hs_cur_bits - hs->hs_bkt_bits; - hash |= ldlm_res_hop_hash(hs, key, CFS_HASH_NBKT(hs) - 1); - - return 
hash & mask; + hash += (val >> 5) + (val << 11); + return hash_32(hash, bits); } static void *ldlm_res_hop_key(struct hlist_node *hnode) @@ -531,16 +523,6 @@ static void ldlm_res_hop_put(struct cfs_hash *hs, struct hlist_node *hnode) .hs_put = ldlm_res_hop_put }; -static struct cfs_hash_ops ldlm_ns_fid_hash_ops = { - .hs_hash = ldlm_res_hop_fid_hash, - .hs_key = ldlm_res_hop_key, - .hs_keycmp = ldlm_res_hop_keycmp, - .hs_keycpy = NULL, - .hs_object = ldlm_res_hop_object, - .hs_get = ldlm_res_hop_get_locked, - .hs_put = ldlm_res_hop_put -}; - struct ldlm_ns_hash_def { enum ldlm_ns_type nsd_type; /** hash bucket bits */ @@ -556,13 +538,13 @@ struct ldlm_ns_hash_def { .nsd_type = LDLM_NS_TYPE_MDC, .nsd_bkt_bits = 11, .nsd_all_bits = 16, - .nsd_hops = &ldlm_ns_fid_hash_ops, + .nsd_hops = &ldlm_ns_hash_ops, }, { .nsd_type = LDLM_NS_TYPE_MDT, .nsd_bkt_bits = 14, .nsd_all_bits = 21, - .nsd_hops = &ldlm_ns_fid_hash_ops, + .nsd_hops = &ldlm_ns_hash_ops, }, { .nsd_type = LDLM_NS_TYPE_OSC, @@ -578,13 +560,13 @@ struct ldlm_ns_hash_def { }, { .nsd_type = LDLM_NS_TYPE_MGC, - .nsd_bkt_bits = 4, + .nsd_bkt_bits = 3, .nsd_all_bits = 4, .nsd_hops = &ldlm_ns_hash_ops, }, { .nsd_type = LDLM_NS_TYPE_MGT, - .nsd_bkt_bits = 4, + .nsd_bkt_bits = 3, .nsd_all_bits = 4, .nsd_hops = &ldlm_ns_hash_ops, }, @@ -613,9 +595,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, enum ldlm_ns_type ns_type) { struct ldlm_namespace *ns = NULL; - struct ldlm_ns_bucket *nsb; struct ldlm_ns_hash_def *nsd; - struct cfs_hash_bd bd; int idx; int rc; @@ -644,7 +624,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, ns->ns_rs_hash = cfs_hash_create(name, nsd->nsd_all_bits, nsd->nsd_all_bits, - nsd->nsd_bkt_bits, sizeof(*nsb), + nsd->nsd_bkt_bits, 0, CFS_HASH_MIN_THETA, CFS_HASH_MAX_THETA, nsd->nsd_hops, @@ -655,8 +635,16 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, if (!ns->ns_rs_hash) goto out_ns; - 
cfs_hash_for_each_bucket(ns->ns_rs_hash, &bd, idx) { - nsb = cfs_hash_bd_extra_get(ns->ns_rs_hash, &bd); + ns->ns_bucket_bits = nsd->nsd_all_bits - nsd->nsd_bkt_bits; + ns->ns_rs_buckets = kvmalloc(BIT(ns->ns_bucket_bits) * + sizeof(ns->ns_rs_buckets[0]), + GFP_KERNEL); + if (!ns->ns_rs_buckets) + goto out_hash; + + for (idx = 0; idx < (1 << ns->ns_bucket_bits); idx++) { + struct ldlm_ns_bucket *nsb = &ns->ns_rs_buckets[idx]; + at_init(&nsb->nsb_at_estimate, ldlm_enqueue_min, 0); nsb->nsb_namespace = ns; } @@ -711,6 +699,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, ldlm_namespace_sysfs_unregister(ns); ldlm_namespace_cleanup(ns, 0); out_hash: + kvfree(ns->ns_rs_buckets); kfree(ns->ns_name); cfs_hash_putref(ns->ns_rs_hash); out_ns: @@ -973,6 +962,7 @@ void ldlm_namespace_free_post(struct ldlm_namespace *ns) ldlm_namespace_debugfs_unregister(ns); ldlm_namespace_sysfs_unregister(ns); cfs_hash_putref(ns->ns_rs_hash); + kvfree(ns->ns_rs_buckets); kfree(ns->ns_name); /* Namespace @ns should be not on list at this time, otherwise * this will cause issues related to using freed @ns in poold @@ -1087,6 +1077,7 @@ struct ldlm_resource * struct cfs_hash_bd bd; u64 version; int ns_refcount = 0; + int hash; LASSERT(!parent); LASSERT(ns->ns_rs_hash); @@ -1111,7 +1102,8 @@ struct ldlm_resource * if (!res) return ERR_PTR(-ENOMEM); - res->lr_ns_bucket = cfs_hash_bd_extra_get(ns->ns_rs_hash, &bd); + hash = ldlm_res_hop_fid_hash(name, ns->ns_bucket_bits); + res->lr_ns_bucket = &ns->ns_rs_buckets[hash]; res->lr_name = *name; res->lr_type = type; From patchwork Thu Feb 27 21:16:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410683 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E1385924 for ; Thu, 27 Feb 2020 21:44:21 +0000 
(UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C8ADE24690 for ; Thu, 27 Feb 2020 21:44:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C8ADE24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F4713489D7; Thu, 27 Feb 2020 13:35:30 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0866F348938 for ; Thu, 27 Feb 2020 13:21:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BE6E991C1; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BCE2D46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:16:59 -0500 Message-Id: <1582838290-17243-552-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 551/622] lustre: llite: don't cache MDS_OPEN_LOCK for volatile files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The kernel's knfsd constantly opens and closes files for each access, which can result in a continuous stream of open+close RPCs being sent to the MDS. To avoid this, Lustre created a special flag, lld_nfs_dentry, which enables caching of the MDS_OPEN_LOCK on the client. The fhandles API also uses the same exportfs layer as NFS, which indirectly ends up caching the MDS_OPEN_LOCK as well. This is okay for normal files except for Lustre's special volatile files that are used for HSM restore. It is expected on the last close of a Lustre volatile file that it is no longer accessible. To ensure this behavior is kept, don't cache MDS_OPEN_LOCK for volatile files. WC-bug-id: https://jira.whamcloud.com/browse/LU-8585 Lustre-commit: 6a3a842add0e ("LU-8585 llite: don't cache MDS_OPEN_LOCK for volatile files") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/36641 Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Quentin Bouget Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index d196da8..a3c36a7 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -798,6 +798,7 @@ int ll_file_open(struct inode *inode, struct file *file) } else { LASSERT(*och_usecount == 0); if (!it->it_disposition) { + struct dentry *dentry = file_dentry(file); struct ll_dentry_data *ldd; /* We cannot just request lock handle now, new ELC code @@ -822,10 +823,13 @@ int ll_file_open(struct inode *inode, struct file *file) * lookup path only, since ll_iget_for_nfs always calls * ll_d_init().
*/ - ldd = ll_d2d(file->f_path.dentry); + ldd = ll_d2d(dentry); if (ldd && ldd->lld_nfs_dentry) { ldd->lld_nfs_dentry = 0; - it->it_flags |= MDS_OPEN_LOCK; + if (!filename_is_volatile(dentry->d_name.name, + dentry->d_name.len, + NULL)) + it->it_flags |= MDS_OPEN_LOCK; } /* @@ -833,8 +837,7 @@ int ll_file_open(struct inode *inode, struct file *file) * to get file with different fid. */ it->it_flags |= MDS_OPEN_BY_FID; - rc = ll_intent_file_open(file->f_path.dentry, - NULL, 0, it); + rc = ll_intent_file_open(dentry, NULL, 0, it); if (rc) goto out_openerr; From patchwork Thu Feb 27 21:17:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410687 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AFA43924 for ; Thu, 27 Feb 2020 21:44:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 983DC24690 for ; Thu, 27 Feb 2020 21:44:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 983DC24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4F6FA34A0FE; Thu, 27 Feb 2020 13:35:34 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 49A9B34893F for ; Thu, 27 Feb 2020 13:21:10 -0800 (PST) 
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C0FB791C2; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BF97246C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:00 -0500 Message-Id: <1582838290-17243-553-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 552/622] lnet: discard lnd_refcount X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown The lnd_refcount in 'struct lnet_lnd' is never tested (except in an LASSERT()), so it cannot be needed. Let's remove it. Each individual lnd keeps track of how many lnet_ni are registered for that lnd, e.g. ksocklnd has a counter in ksnd_nnets and o2iblnd has a linked list in kib_devs. They hold a reference on the module while there are registered devices, and the lnd is only freed (and the lnd_refcount checked) when the module is unloaded. This confirms that lnd_refcount adds no value.
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: 606299929509 ("LU-12678 lnet: discard lnd_refcount") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36829 Reviewed-by: Serguei Smirnov Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 1 - net/lnet/lnet/api-ni.c | 18 ------------------ 2 files changed, 19 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 4b110eb..e105308 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -246,7 +246,6 @@ struct lnet_test_peer { struct lnet_lnd { /* fields managed by portals */ struct list_head lnd_list; /* stash in the LND table */ - int lnd_refcount; /* # active instances */ /* fields initialised by the LND */ u32 lnd_type; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index e66d9dc7..6c913b5 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -758,7 +758,6 @@ static void lnet_assert_wire_constants(void) LASSERT(!lnet_find_lnd_by_type(lnd->lnd_type)); list_add_tail(&lnd->lnd_list, &the_lnet.ln_lnds); - lnd->lnd_refcount = 0; CDEBUG(D_NET, "%s LND registered\n", libcfs_lnd2str(lnd->lnd_type)); @@ -772,7 +771,6 @@ static void lnet_assert_wire_constants(void) mutex_lock(&the_lnet.ln_lnd_mutex); LASSERT(lnet_find_lnd_by_type(lnd->lnd_type) == lnd); - LASSERT(!lnd->lnd_refcount); list_del(&lnd->lnd_list); CDEBUG(D_NET, "%s LND unregistered\n", libcfs_lnd2str(lnd->lnd_type)); @@ -2045,15 +2043,6 @@ static void lnet_push_target_fini(void) /* Do peer table cleanup for this net */ lnet_peer_tables_cleanup(net); - lnet_net_lock(LNET_LOCK_EX); - /* - * decrement ref count on lnd only when the entire network goes - * away - */ - net->net_lnd->lnd_refcount--; - - lnet_net_unlock(LNET_LOCK_EX); - lnet_net_free(net); } @@ -2134,9 +2123,6 @@ static void lnet_push_target_fini(void) 
if (rc) { LCONSOLE_ERROR_MSG(0x105, "Error %d starting up LNI %s\n", rc, libcfs_lnd2str(net->net_lnd->lnd_type)); - lnet_net_lock(LNET_LOCK_EX); - net->net_lnd->lnd_refcount--; - lnet_net_unlock(LNET_LOCK_EX); goto failed0; } @@ -2247,10 +2233,6 @@ static void lnet_push_target_fini(void) } } - lnet_net_lock(LNET_LOCK_EX); - lnd->lnd_refcount++; - lnet_net_unlock(LNET_LOCK_EX); - net->net_lnd = lnd; mutex_unlock(&the_lnet.ln_lnd_mutex); From patchwork Thu Feb 27 21:17:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36075924 for ; Thu, 27 Feb 2020 21:47:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1EC7A24690 for ; Thu, 27 Feb 2020 21:47:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1EC7A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 788E634B3BB; Thu, 27 Feb 2020 13:37:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9F5BE348942 for ; Thu, 27 Feb 2020 13:21:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 
C4B8191C3; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C27BB468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:01 -0500 Message-Id: <1582838290-17243-554-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 553/622] lnet: socklnd: rename struct ksock_peer to struct ksock_peer_ni X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: James Simmons , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" In the OpenSFS tree, when typedefs were removed from the socklnd driver, all ksock peers were renamed to struct ksock_peer_ni. This didn't happen for the Linux client, so let's bring both trees in sync.
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 93090d9b8250 ("LU-6142 socklnd: remove typedefs from ksocklnd") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/28275 Reviewed-by: Dmitry Eremin Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 78 +++++++++++++++++----------------- net/lnet/klnds/socklnd/socklnd.h | 38 ++++++++--------- net/lnet/klnds/socklnd/socklnd_cb.c | 24 +++++------ net/lnet/klnds/socklnd/socklnd_proto.c | 4 +- 4 files changed, 72 insertions(+), 72 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index e2a9819..79068f3 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -99,12 +99,12 @@ } static int -ksocknal_create_peer(struct ksock_peer **peerp, struct lnet_ni *ni, +ksocknal_create_peer(struct ksock_peer_ni **peerp, struct lnet_ni *ni, struct lnet_process_id id) { int cpt = lnet_cpt_of_nid(id.nid, ni); struct ksock_net *net = ni->ni_data; - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; LASSERT(id.nid != LNET_NID_ANY); LASSERT(id.pid != LNET_PID_ANY); @@ -148,7 +148,7 @@ } void -ksocknal_destroy_peer(struct ksock_peer *peer_ni) +ksocknal_destroy_peer(struct ksock_peer_ni *peer_ni) { struct ksock_net *net = peer_ni->ksnp_ni->ni_data; @@ -175,11 +175,11 @@ spin_unlock_bh(&net->ksnn_lock); } -struct ksock_peer * +struct ksock_peer_ni * ksocknal_find_peer_locked(struct lnet_ni *ni, struct lnet_process_id id) { struct list_head *peer_list = ksocknal_nid2peerlist(id.nid); - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; list_for_each_entry(peer_ni, peer_list, ksnp_list) { LASSERT(!peer_ni->ksnp_closing); @@ -199,10 +199,10 @@ struct ksock_peer * return NULL; } -struct ksock_peer * +struct ksock_peer_ni * ksocknal_find_peer(struct lnet_ni *ni, struct lnet_process_id id) { - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; 
read_lock(&ksocknal_data.ksnd_global_lock); peer_ni = ksocknal_find_peer_locked(ni, id); @@ -214,7 +214,7 @@ struct ksock_peer * } static void -ksocknal_unlink_peer_locked(struct ksock_peer *peer_ni) +ksocknal_unlink_peer_locked(struct ksock_peer_ni *peer_ni) { int i; u32 ip; @@ -250,7 +250,7 @@ struct ksock_peer * struct lnet_process_id *id, u32 *myip, u32 *peer_ip, int *port, int *conn_count, int *share_count) { - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; struct ksock_route *route; int i; int j; @@ -318,7 +318,7 @@ struct ksock_peer * ksocknal_associate_route_conn_locked(struct ksock_route *route, struct ksock_conn *conn) { - struct ksock_peer *peer_ni = route->ksnr_peer; + struct ksock_peer_ni *peer_ni = route->ksnr_peer; int type = conn->ksnc_type; struct ksock_interface *iface; @@ -362,7 +362,7 @@ struct ksock_peer * } static void -ksocknal_add_route_locked(struct ksock_peer *peer_ni, struct ksock_route *route) +ksocknal_add_route_locked(struct ksock_peer_ni *peer_ni, struct ksock_route *route) { struct ksock_conn *conn; struct ksock_route *route2; @@ -400,7 +400,7 @@ struct ksock_peer * static void ksocknal_del_route_locked(struct ksock_route *route) { - struct ksock_peer *peer_ni = route->ksnr_peer; + struct ksock_peer_ni *peer_ni = route->ksnr_peer; struct ksock_interface *iface; struct ksock_conn *conn; struct list_head *ctmp; @@ -443,8 +443,8 @@ struct ksock_peer * ksocknal_add_peer(struct lnet_ni *ni, struct lnet_process_id id, u32 ipaddr, int port) { - struct ksock_peer *peer_ni; - struct ksock_peer *peer2; + struct ksock_peer_ni *peer_ni; + struct ksock_peer_ni *peer2; struct ksock_route *route; struct ksock_route *route2; int rc; @@ -497,7 +497,7 @@ struct ksock_peer * } static void -ksocknal_del_peer_locked(struct ksock_peer *peer_ni, u32 ip) +ksocknal_del_peer_locked(struct ksock_peer_ni *peer_ni, u32 ip) { struct ksock_conn *conn; struct ksock_route *route; @@ -556,8 +556,8 @@ struct ksock_peer * ksocknal_del_peer(struct lnet_ni 
*ni, struct lnet_process_id id, u32 ip) { LIST_HEAD(zombies); - struct ksock_peer *pnxt; - struct ksock_peer *peer_ni; + struct ksock_peer_ni *pnxt; + struct ksock_peer_ni *peer_ni; int lo; int hi; int i; @@ -615,7 +615,7 @@ struct ksock_peer * static struct ksock_conn * ksocknal_get_conn_by_idx(struct lnet_ni *ni, int index) { - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; struct ksock_conn *conn; int i; @@ -729,7 +729,7 @@ struct ksock_peer * } static int -ksocknal_select_ips(struct ksock_peer *peer_ni, u32 *peerips, int n_peerips) +ksocknal_select_ips(struct ksock_peer_ni *peer_ni, u32 *peerips, int n_peerips) { rwlock_t *global_lock = &ksocknal_data.ksnd_global_lock; struct ksock_net *net = peer_ni->ksnp_ni->ni_data; @@ -844,7 +844,7 @@ struct ksock_peer * } static void -ksocknal_create_routes(struct ksock_peer *peer_ni, int port, +ksocknal_create_routes(struct ksock_peer_ni *peer_ni, int port, u32 *peer_ipaddrs, int npeer_ipaddrs) { struct ksock_route *newroute = NULL; @@ -984,7 +984,7 @@ struct ksock_peer * } static int -ksocknal_connecting(struct ksock_peer *peer_ni, u32 ipaddr) +ksocknal_connecting(struct ksock_peer_ni *peer_ni, u32 ipaddr) { struct ksock_route *route; @@ -1005,8 +1005,8 @@ struct ksock_peer * u64 incarnation; struct ksock_conn *conn; struct ksock_conn *conn2; - struct ksock_peer *peer_ni = NULL; - struct ksock_peer *peer2; + struct ksock_peer_ni *peer_ni = NULL; + struct ksock_peer_ni *peer2; struct ksock_sched *sched; struct ksock_hello_msg *hello; int cpt; @@ -1422,7 +1422,7 @@ struct ksock_peer * * connection for the reaper to terminate. 
* Caller holds ksnd_global_lock exclusively in irq context */ - struct ksock_peer *peer_ni = conn->ksnc_peer; + struct ksock_peer_ni *peer_ni = conn->ksnc_peer; struct ksock_route *route; struct ksock_conn *conn2; @@ -1495,7 +1495,7 @@ struct ksock_peer * } void -ksocknal_peer_failed(struct ksock_peer *peer_ni) +ksocknal_peer_failed(struct ksock_peer_ni *peer_ni) { int notify = 0; time64_t last_alive = 0; @@ -1525,7 +1525,7 @@ struct ksock_peer * void ksocknal_finalize_zcreq(struct ksock_conn *conn) { - struct ksock_peer *peer_ni = conn->ksnc_peer; + struct ksock_peer_ni *peer_ni = conn->ksnc_peer; struct ksock_tx *tx; struct ksock_tx *tmp; LIST_HEAD(zlist); @@ -1569,7 +1569,7 @@ struct ksock_peer * * ksnc_refcount will eventually hit zero, and then the reaper will * destroy it. */ - struct ksock_peer *peer_ni = conn->ksnc_peer; + struct ksock_peer_ni *peer_ni = conn->ksnc_peer; struct ksock_sched *sched = conn->ksnc_scheduler; int failed = 0; @@ -1703,7 +1703,7 @@ struct ksock_peer * } int -ksocknal_close_peer_conns_locked(struct ksock_peer *peer_ni, +ksocknal_close_peer_conns_locked(struct ksock_peer_ni *peer_ni, u32 ipaddr, int why) { struct ksock_conn *conn; @@ -1726,7 +1726,7 @@ struct ksock_peer * int ksocknal_close_conn_and_siblings(struct ksock_conn *conn, int why) { - struct ksock_peer *peer_ni = conn->ksnc_peer; + struct ksock_peer_ni *peer_ni = conn->ksnc_peer; u32 ipaddr = conn->ksnc_ipaddr; int count; @@ -1742,8 +1742,8 @@ struct ksock_peer * int ksocknal_close_matching_conns(struct lnet_process_id id, u32 ipaddr) { - struct ksock_peer *peer_ni; - struct ksock_peer *pnxt; + struct ksock_peer_ni *peer_ni; + struct ksock_peer_ni *pnxt; int lo; int hi; int i; @@ -1816,7 +1816,7 @@ struct ksock_peer * int connect = 1; time64_t last_alive = 0; time64_t now = ktime_get_seconds(); - struct ksock_peer *peer_ni = NULL; + struct ksock_peer_ni *peer_ni = NULL; rwlock_t *glock = &ksocknal_data.ksnd_global_lock; struct lnet_process_id id = { .nid = nid, @@ -1872,7 
+1872,7 @@ struct ksock_peer * } static void -ksocknal_push_peer(struct ksock_peer *peer_ni) +ksocknal_push_peer(struct ksock_peer_ni *peer_ni) { int index; int i; @@ -1921,7 +1921,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) int peer_off; /* searching offset in peer_ni hash table */ for (peer_off = 0; ; peer_off++) { - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; int i = 0; read_lock(&ksocknal_data.ksnd_global_lock); @@ -1958,7 +1958,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) int rc; int i; int j; - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; struct ksock_route *route; if (!ipaddress || !netmask) @@ -2014,7 +2014,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) } static void -ksocknal_peer_del_interface_locked(struct ksock_peer *peer_ni, u32 ipaddr) +ksocknal_peer_del_interface_locked(struct ksock_peer_ni *peer_ni, u32 ipaddr) { struct list_head *tmp; struct list_head *nxt; @@ -2059,8 +2059,8 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) { struct ksock_net *net = ni->ni_data; int rc = -ENOENT; - struct ksock_peer *nxt; - struct ksock_peer *peer_ni; + struct ksock_peer_ni *nxt; + struct ksock_peer_ni *peer_ni; u32 this_ip; int i; int j; @@ -2457,7 +2457,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) static void ksocknal_debug_peerhash(struct lnet_ni *ni) { - struct ksock_peer *peer_ni = NULL; + struct ksock_peer_ni *peer_ni = NULL; int i; read_lock(&ksocknal_data.ksnd_global_lock); diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index efdd02e..1e10663 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -262,7 +262,7 @@ struct ksock_nal_data { * what the header matched or whether the message needs forwarding. 
*/ struct ksock_conn; /* forward ref */ -struct ksock_peer; /* forward ref */ +struct ksock_peer_ni; /* forward ref */ struct ksock_route; /* forward ref */ struct ksock_proto; /* forward ref */ @@ -311,7 +311,7 @@ struct ksock_tx { /* transmit packet */ #define SOCKNAL_RX_SLOP 6 /* skipping body */ struct ksock_conn { - struct ksock_peer *ksnc_peer; /* owning peer_ni */ + struct ksock_peer_ni *ksnc_peer; /* owning peer_ni */ struct ksock_route *ksnc_route; /* owning route */ struct list_head ksnc_list; /* stash on peer_ni's conn list */ struct socket *ksnc_sock; /* actual socket */ @@ -383,7 +383,7 @@ struct ksock_conn { struct ksock_route { struct list_head ksnr_list; /* chain on peer_ni route list */ struct list_head ksnr_connd_list; /* chain on ksnr_connd_routes */ - struct ksock_peer *ksnr_peer; /* owning peer_ni */ + struct ksock_peer_ni *ksnr_peer; /* owning peer_ni */ atomic_t ksnr_refcount; /* # users */ time64_t ksnr_timeout; /* when (in secs) reconnection * can happen next @@ -408,7 +408,7 @@ struct ksock_route { #define SOCKNAL_KEEPALIVE_PING 1 /* cookie for keepalive ping */ -struct ksock_peer { +struct ksock_peer_ni { struct list_head ksnp_list; /* stash on global peer_ni list */ time64_t ksnp_last_alive; /* when (in seconds) I was last * alive @@ -607,16 +607,16 @@ struct ksock_proto { } static inline void -ksocknal_peer_addref(struct ksock_peer *peer_ni) +ksocknal_peer_addref(struct ksock_peer_ni *peer_ni) { LASSERT(atomic_read(&peer_ni->ksnp_refcount) > 0); atomic_inc(&peer_ni->ksnp_refcount); } -void ksocknal_destroy_peer(struct ksock_peer *peer_ni); +void ksocknal_destroy_peer(struct ksock_peer_ni *peer_ni); static inline void -ksocknal_peer_decref(struct ksock_peer *peer_ni) +ksocknal_peer_decref(struct ksock_peer_ni *peer_ni) { LASSERT(atomic_read(&peer_ni->ksnp_refcount) > 0); if (atomic_dec_and_test(&peer_ni->ksnp_refcount)) @@ -633,21 +633,21 @@ int ksocknal_recv(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg, int 
ksocknal_add_peer(struct lnet_ni *ni, struct lnet_process_id id, u32 ip, int port); -struct ksock_peer *ksocknal_find_peer_locked(struct lnet_ni *ni, - struct lnet_process_id id); -struct ksock_peer *ksocknal_find_peer(struct lnet_ni *ni, - struct lnet_process_id id); -void ksocknal_peer_failed(struct ksock_peer *peer_ni); +struct ksock_peer_ni *ksocknal_find_peer_locked(struct lnet_ni *ni, + struct lnet_process_id id); +struct ksock_peer_ni *ksocknal_find_peer(struct lnet_ni *ni, + struct lnet_process_id id); +void ksocknal_peer_failed(struct ksock_peer_ni *peer_ni); int ksocknal_create_conn(struct lnet_ni *ni, struct ksock_route *route, struct socket *sock, int type); void ksocknal_close_conn_locked(struct ksock_conn *conn, int why); void ksocknal_terminate_conn(struct ksock_conn *conn); void ksocknal_destroy_conn(struct ksock_conn *conn); -int ksocknal_close_peer_conns_locked(struct ksock_peer *peer_ni, +int ksocknal_close_peer_conns_locked(struct ksock_peer_ni *peer_ni, u32 ipaddr, int why); int ksocknal_close_conn_and_siblings(struct ksock_conn *conn, int why); int ksocknal_close_matching_conns(struct lnet_process_id id, u32 ipaddr); -struct ksock_conn *ksocknal_find_conn_locked(struct ksock_peer *peer_ni, +struct ksock_conn *ksocknal_find_conn_locked(struct ksock_peer_ni *peer_ni, struct ksock_tx *tx, int nonblk); int ksocknal_launch_packet(struct lnet_ni *ni, struct ksock_tx *tx, @@ -662,11 +662,11 @@ int ksocknal_launch_packet(struct lnet_ni *ni, struct ksock_tx *tx, void ksocknal_query(struct lnet_ni *ni, lnet_nid_t nid, time64_t *when); int ksocknal_thread_start(int (*fn)(void *arg), void *arg, char *name); void ksocknal_thread_fini(void); -void ksocknal_launch_all_connections_locked(struct ksock_peer *peer_ni); -struct ksock_route *ksocknal_find_connectable_route_locked( - struct ksock_peer *peer_ni); -struct ksock_route *ksocknal_find_connecting_route_locked( - struct ksock_peer *peer_ni); +void ksocknal_launch_all_connections_locked(struct 
ksock_peer_ni *peer_ni); +struct ksock_route * +ksocknal_find_connectable_route_locked(struct ksock_peer_ni *peer_ni); +struct ksock_route * +ksocknal_find_connecting_route_locked(struct ksock_peer_ni *peer_ni); int ksocknal_new_packet(struct ksock_conn *conn, int skip); int ksocknal_scheduler(void *arg); int ksocknal_connd(void *arg); diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 0132727..2b93331 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -394,7 +394,7 @@ struct ksock_tx * ksocknal_check_zc_req(struct ksock_tx *tx) { struct ksock_conn *conn = tx->tx_conn; - struct ksock_peer *peer_ni = conn->ksnc_peer; + struct ksock_peer_ni *peer_ni = conn->ksnc_peer; /* * Set tx_msg.ksm_zc_cookies[0] to a unique non-zero cookie and add tx @@ -440,7 +440,7 @@ struct ksock_tx * static void ksocknal_uncheck_zc_req(struct ksock_tx *tx) { - struct ksock_peer *peer_ni = tx->tx_conn->ksnc_peer; + struct ksock_peer_ni *peer_ni = tx->tx_conn->ksnc_peer; LASSERT(tx->tx_msg.ksm_type != KSOCK_MSG_NOOP); LASSERT(tx->tx_zc_capable); @@ -581,7 +581,7 @@ struct ksock_tx * } void -ksocknal_launch_all_connections_locked(struct ksock_peer *peer_ni) +ksocknal_launch_all_connections_locked(struct ksock_peer_ni *peer_ni) { struct ksock_route *route; @@ -597,7 +597,7 @@ struct ksock_tx * } struct ksock_conn * -ksocknal_find_conn_locked(struct ksock_peer *peer_ni, struct ksock_tx *tx, +ksocknal_find_conn_locked(struct ksock_peer_ni *peer_ni, struct ksock_tx *tx, int nonblk) { struct ksock_conn *c; @@ -763,7 +763,7 @@ struct ksock_conn * } struct ksock_route * -ksocknal_find_connectable_route_locked(struct ksock_peer *peer_ni) +ksocknal_find_connectable_route_locked(struct ksock_peer_ni *peer_ni) { time64_t now = ktime_get_seconds(); struct ksock_route *route; @@ -797,7 +797,7 @@ struct ksock_route * } struct ksock_route * -ksocknal_find_connecting_route_locked(struct ksock_peer *peer_ni) 
+ksocknal_find_connecting_route_locked(struct ksock_peer_ni *peer_ni) { struct ksock_route *route; @@ -815,7 +815,7 @@ struct ksock_route * ksocknal_launch_packet(struct lnet_ni *ni, struct ksock_tx *tx, struct lnet_process_id id) { - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; struct ksock_conn *conn; rwlock_t *g_lock; int retry; @@ -1806,7 +1806,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) ksocknal_connect(struct ksock_route *route) { LIST_HEAD(zombies); - struct ksock_peer *peer_ni = route->ksnr_peer; + struct ksock_peer_ni *peer_ni = route->ksnr_peer; int type; int wanted; struct socket *sock; @@ -2213,7 +2213,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) } static struct ksock_conn * -ksocknal_find_timed_out_conn(struct ksock_peer *peer_ni) +ksocknal_find_timed_out_conn(struct ksock_peer_ni *peer_ni) { /* We're called with a shared lock on ksnd_global_lock */ struct ksock_conn *conn; @@ -2296,7 +2296,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) } static inline void -ksocknal_flush_stale_txs(struct ksock_peer *peer_ni) +ksocknal_flush_stale_txs(struct ksock_peer_ni *peer_ni) { struct ksock_tx *tx; LIST_HEAD(stale_txs); @@ -2322,7 +2322,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) } static int -ksocknal_send_keepalive_locked(struct ksock_peer *peer_ni) +ksocknal_send_keepalive_locked(struct ksock_peer_ni *peer_ni) __must_hold(&ksocknal_data.ksnd_global_lock) { struct ksock_sched *sched; @@ -2388,7 +2388,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) ksocknal_check_peer_timeouts(int idx) { struct list_head *peers = &ksocknal_data.ksnd_peers[idx]; - struct ksock_peer *peer_ni; + struct ksock_peer_ni *peer_ni; struct ksock_conn *conn; struct ksock_tx *tx; diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c index 64c0c74..c6ea302 100644 --- a/net/lnet/klnds/socklnd/socklnd_proto.c +++ b/net/lnet/klnds/socklnd/socklnd_proto.c @@ -367,7 +367,7 @@ 
static int ksocknal_handle_zcreq(struct ksock_conn *c, u64 cookie, int remote) { - struct ksock_peer *peer_ni = c->ksnc_peer; + struct ksock_peer_ni *peer_ni = c->ksnc_peer; struct ksock_conn *conn; struct ksock_tx *tx; int rc; @@ -411,7 +411,7 @@ static int ksocknal_handle_zcack(struct ksock_conn *conn, u64 cookie1, u64 cookie2) { - struct ksock_peer *peer_ni = conn->ksnc_peer; + struct ksock_peer_ni *peer_ni = conn->ksnc_peer; struct ksock_tx *tx; struct ksock_tx *tmp; LIST_HEAD(zlist); From patchwork Thu Feb 27 21:17:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410801 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4BD661580 for ; Thu, 27 Feb 2020 21:47:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3237824690 for ; Thu, 27 Feb 2020 21:47:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3237824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1C84E34B3E2; Thu, 27 Feb 2020 13:37:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 00D81348942 for ; Thu, 27 Feb 2020 13:21:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C724591C4; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C550247C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:02 -0500 Message-Id: <1582838290-17243-555-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 554/622] lnet: change ksocknal_create_peer() to return pointer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown ksocknal_create_peer() currently returns an error status, and if that is 0, a pointer is stored in a by-reference argument. The preferred pattern in the kernel is to return the pointer, or the error code encoded with ERR_PTR(). 
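As a rough illustration of that pattern, the sketch below is a userspace approximation and not the socklnd code. ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented locally as stand-ins for the kernel helpers from <linux/err.h>, and struct demo_peer, demo_peer_create() and demo_peer_add() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace stand-ins for the kernel's <linux/err.h> helpers.
 * Assumption: the top MAX_ERRNO addresses are never valid pointers,
 * so a negative errno can be carried in the pointer value itself.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object; it only stands in for struct ksock_peer_ni. */
struct demo_peer {
	int id;
};

/* Return the new object on success, or an errno encoded with ERR_PTR(). */
static struct demo_peer *demo_peer_create(int id, int shutting_down)
{
	struct demo_peer *peer;

	assert(id >= 0);	/* plays the role of the LASSERT() checks */

	if (shutting_down)
		return ERR_PTR(-ESHUTDOWN);

	peer = calloc(1, sizeof(*peer));
	if (!peer)
		return ERR_PTR(-ENOMEM);

	peer->id = id;
	return peer;
}

/* Caller side: one return value carries either the object or the error. */
static int demo_peer_add(int id, int shutting_down)
{
	struct demo_peer *peer = demo_peer_create(id, shutting_down);

	if (IS_ERR(peer))
		return (int)PTR_ERR(peer);

	free(peer);
	return 0;
}
```

Compared with the by-reference form, the caller no longer needs a separate rc variable just to learn whether the pointer is usable, which is why the patch can drop "int rc" from ksocknal_add_peer().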
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: 049683bc0fc0 ("LU-12678 lnet: change ksocknal_create_peer() to return pointer") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36833 Reviewed-by: Chris Horn Reviewed-by: James Simmons Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 79068f3..3e69d9c 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -98,9 +98,8 @@ kfree(route); } -static int -ksocknal_create_peer(struct ksock_peer_ni **peerp, struct lnet_ni *ni, - struct lnet_process_id id) +static struct ksock_peer_ni * +ksocknal_create_peer(struct lnet_ni *ni, struct lnet_process_id id) { int cpt = lnet_cpt_of_nid(id.nid, ni); struct ksock_net *net = ni->ni_data; @@ -112,7 +111,7 @@ peer_ni = kzalloc_cpt(sizeof(*peer_ni), GFP_NOFS, cpt); if (!peer_ni) - return -ENOMEM; + return ERR_PTR(-ENOMEM); peer_ni->ksnp_ni = ni; peer_ni->ksnp_id = id; @@ -136,15 +135,14 @@ kfree(peer_ni); CERROR("Can't create peer_ni: network shutdown\n"); - return -ESHUTDOWN; + return ERR_PTR(-ESHUTDOWN); } net->ksnn_npeers++; spin_unlock_bh(&net->ksnn_lock); - *peerp = peer_ni; - return 0; + return peer_ni; } void @@ -447,16 +445,15 @@ struct ksock_peer_ni * struct ksock_peer_ni *peer2; struct ksock_route *route; struct ksock_route *route2; - int rc; if (id.nid == LNET_NID_ANY || id.pid == LNET_PID_ANY) return -EINVAL; /* Have a brand new peer_ni ready... 
*/ - rc = ksocknal_create_peer(&peer_ni, ni, id); - if (rc) - return rc; + peer_ni = ksocknal_create_peer(ni, id); + if (IS_ERR(peer_ni)) + return PTR_ERR(peer_ni); route = ksocknal_create_route(ipaddr, port); if (!route) { @@ -1114,9 +1111,11 @@ struct ksock_peer_ni * ksocknal_peer_addref(peer_ni); write_lock_bh(global_lock); } else { - rc = ksocknal_create_peer(&peer_ni, ni, peerid); - if (rc) + peer_ni = ksocknal_create_peer(ni, peerid); + if (IS_ERR(peer_ni)) { + rc = PTR_ERR(peer_ni); goto failed_1; + } write_lock_bh(global_lock); From patchwork Thu Feb 27 21:17:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410737 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E609C924 for ; Thu, 27 Feb 2020 21:45:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CEA94246A2 for ; Thu, 27 Feb 2020 21:45:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CEA94246A2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88B4D34B055; Thu, 27 Feb 2020 13:36:19 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 44EE234894C for ; Thu, 27 Feb 2020 13:21:11 -0800 (PST) Received: from 
star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C985C91C5; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C867546D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:03 -0500 Message-Id: <1582838290-17243-556-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 555/622] lnet: discard ksnn_lock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This lock in 'struct ksock_net' is being taken in places where it isn't needed, so it is worth cleaning up. It isn't needed when checking if ksnn_npeers has reached 0 yet, as at that point in the code, the value can only decrement to zero and then stay there. It is only needed: - to ensure concurrent updates to ksnn_npeers don't race, and - to ensure that no more peers are added after the net is shutdown. The first is best achieved using atomic_t. The second is more easily achieved by replacing the ksnn_shutdown flag with a large negative bias on ksnn_npeers, and using atomic_inc_unless_negative(). So change ksnn_npeers to atomic_t and discard ksnn_lock and ksnn_shutdown. 
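The biased-counter idea above can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: inc_unless_negative() is a stand-in written here for the kernel's atomic_inc_unless_negative(), SHUTDOWN_BIAS mirrors the SOCKNAL_SHUTDOWN_BIAS value introduced by this patch, and struct net_state and the net_*() helpers are hypothetical names, not socklnd functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same value the patch uses for SOCKNAL_SHUTDOWN_BIAS. */
#define SHUTDOWN_BIAS (INT_MIN + 1)

/* Hypothetical stand-in for the ksnn_npeers field of struct ksock_net. */
struct net_state {
	atomic_int npeers;
};

/* Userspace stand-in for atomic_inc_unless_negative(): increment only
 * while the value is >= 0, and report whether the increment happened.
 */
static bool inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	while (old >= 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Adding a peer fails once shutdown has pushed the counter negative. */
static bool net_add_peer(struct net_state *net)
{
	return inc_unless_negative(&net->npeers);
}

static void net_del_peer(struct net_state *net)
{
	atomic_fetch_sub(&net->npeers, 1);
}

/* One atomic add both records "shutting down" and keeps the peer count. */
static void net_begin_shutdown(struct net_state *net)
{
	atomic_fetch_add(&net->npeers, SHUTDOWN_BIAS);
}

/* Only meaningful after net_begin_shutdown(): removes the bias again. */
static int net_peers_remaining(struct net_state *net)
{
	int v = atomic_load(&net->npeers);

	assert(v < 0);
	return v - SHUTDOWN_BIAS;
}
```

Because the flag and the count share one word, no lock is needed to keep "is the net shutting down?" and "how many peers are left?" consistent with each other.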
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: fb983bbebf81 ("LU-12678 lnet: discard ksnn_lock") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36834 Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 46 +++++++++++++--------------------------- net/lnet/klnds/socklnd/socklnd.h | 9 +++++--- 2 files changed, 21 insertions(+), 34 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 3e69d9c..1d0bedb 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -109,9 +109,16 @@ LASSERT(id.pid != LNET_PID_ANY); LASSERT(!in_interrupt()); + if (!atomic_inc_unless_negative(&net->ksnn_npeers)) { + CERROR("Can't create peer_ni: network shutdown\n"); + return ERR_PTR(-ESHUTDOWN); + } + peer_ni = kzalloc_cpt(sizeof(*peer_ni), GFP_NOFS, cpt); - if (!peer_ni) + if (!peer_ni) { + atomic_dec(&net->ksnn_npeers); return ERR_PTR(-ENOMEM); + } peer_ni->ksnp_ni = ni; peer_ni->ksnp_id = id; @@ -128,20 +135,6 @@ INIT_LIST_HEAD(&peer_ni->ksnp_zc_req_list); spin_lock_init(&peer_ni->ksnp_lock); - spin_lock_bh(&net->ksnn_lock); - - if (net->ksnn_shutdown) { - spin_unlock_bh(&net->ksnn_lock); - - kfree(peer_ni); - CERROR("Can't create peer_ni: network shutdown\n"); - return ERR_PTR(-ESHUTDOWN); - } - - net->ksnn_npeers++; - - spin_unlock_bh(&net->ksnn_lock); - return peer_ni; } @@ -168,9 +161,7 @@ * do with this peer_ni has been cleaned up when its refcount drops to * zero. 
*/ - spin_lock_bh(&net->ksnn_lock); - net->ksnn_npeers--; - spin_unlock_bh(&net->ksnn_lock); + atomic_dec(&net->ksnn_npeers); } struct ksock_peer_ni * @@ -464,7 +455,7 @@ struct ksock_peer_ni * write_lock_bh(&ksocknal_data.ksnd_global_lock); /* always called with a ref on ni, so shutdown can't have started */ - LASSERT(!((struct ksock_net *)ni->ni_data)->ksnn_shutdown); + LASSERT(atomic_read(&((struct ksock_net *)ni->ni_data)->ksnn_npeers) >= 0); peer2 = ksocknal_find_peer_locked(ni, id); if (peer2) { @@ -1120,7 +1111,7 @@ struct ksock_peer_ni * write_lock_bh(global_lock); /* called with a ref on ni, so shutdown can't have started */ - LASSERT(!((struct ksock_net *)ni->ni_data)->ksnn_shutdown); + LASSERT(atomic_read(&((struct ksock_net *)ni->ni_data)->ksnn_npeers) >= 0); peer2 = ksocknal_find_peer_locked(ni, peerid); if (!peer2) { @@ -2516,30 +2507,24 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) LASSERT(ksocknal_data.ksnd_init == SOCKNAL_INIT_ALL); LASSERT(ksocknal_data.ksnd_nnets > 0); - spin_lock_bh(&net->ksnn_lock); - net->ksnn_shutdown = 1; /* prevent new peers */ - spin_unlock_bh(&net->ksnn_lock); + /* prevent new peers */ + atomic_add(SOCKNAL_SHUTDOWN_BIAS, &net->ksnn_npeers); /* Delete all peers */ ksocknal_del_peer(ni, anyid, 0); /* Wait for all peer_ni state to clean up */ i = 2; - spin_lock_bh(&net->ksnn_lock); - while (net->ksnn_npeers) { - spin_unlock_bh(&net->ksnn_lock); - + while (atomic_read(&net->ksnn_npeers) > SOCKNAL_SHUTDOWN_BIAS) { i++; CDEBUG(((i & (-i)) == i) ? D_WARNING : D_NET, /* power of 2? 
*/ "waiting for %d peers to disconnect\n", - net->ksnn_npeers); + atomic_read(&net->ksnn_npeers) - SOCKNAL_SHUTDOWN_BIAS); schedule_timeout_uninterruptible(HZ); ksocknal_debug_peerhash(ni); - spin_lock_bh(&net->ksnn_lock); } - spin_unlock_bh(&net->ksnn_lock); for (i = 0; i < net->ksnn_ninterfaces; i++) { LASSERT(!net->ksnn_interfaces[i].ksni_npeers); @@ -2691,7 +2676,6 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) if (!net) goto fail_0; - spin_lock_init(&net->ksnn_lock); net->ksnn_incarnation = ktime_get_real_ns(); ni->ni_data = net; net_tunables = &ni->ni_net->net_tunables; diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 1e10663..832bc08 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -166,14 +166,17 @@ struct ksock_tunables { struct ksock_net { u64 ksnn_incarnation; /* my epoch */ - spinlock_t ksnn_lock; /* serialise */ struct list_head ksnn_list; /* chain on global list */ - int ksnn_npeers; /* # peers */ - int ksnn_shutdown; /* shutting down? */ + atomic_t ksnn_npeers; /* # peers */ int ksnn_ninterfaces; /* IP interfaces */ struct ksock_interface ksnn_interfaces[LNET_INTERFACES_NUM]; }; +/* When the ksock_net is shut down, this bias is added to + * ksnn_npeers, which prevents new peers from being added. 
+ */ +#define SOCKNAL_SHUTDOWN_BIAS (INT_MIN + 1) + /** connd timeout */ #define SOCKNAL_CONND_TIMEOUT 120 /** reserved thread for accepting & creating new connd */ From patchwork Thu Feb 27 21:17:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410691 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E482138D for ; Thu, 27 Feb 2020 21:44:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6613624690 for ; Thu, 27 Feb 2020 21:44:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6613624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2D78421F4FA; Thu, 27 Feb 2020 13:35:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9BBDF348951 for ; Thu, 27 Feb 2020 13:21:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CC39E91C6; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CB3AA46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:04 -0500 Message-Id: 
<1582838290-17243-557-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 556/622] lnet: discard LNetMEInsert X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This function is unused and has never been used. It is not used by cray-dvs - the other user of LNet. So discard it. Lustre-commit: bd5e458cc5fc ("LU-12678 lnet: discard LNetMEInsert") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36858 Reviewed-by: James Simmons Reviewed-by: Serguei Smirnov Reviewed-by: Amir Shehata Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/api.h | 10 +----- net/lnet/lnet/lib-me.c | 91 ++---------------------------------------------- 2 files changed, 3 insertions(+), 98 deletions(-) diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h index 4b152c8..ac602fc 100644 --- a/include/linux/lnet/api.h +++ b/include/linux/lnet/api.h @@ -91,7 +91,7 @@ * and a set of match criteria. The match criteria can be used to reject * incoming requests based on process ID or the match bits provided in the * request. MEs can be dynamically inserted into a match list by LNetMEAttach() - * and LNetMEInsert(), and removed from its list by LNetMEUnlink(). + * and removed from its list by LNetMEUnlink(). 
* @{ */ int LNetMEAttach(unsigned int portal, @@ -102,14 +102,6 @@ int LNetMEAttach(unsigned int portal, enum lnet_ins_pos pos_in, struct lnet_handle_me *handle_out); -int LNetMEInsert(struct lnet_handle_me current_in, - struct lnet_process_id match_id_in, - u64 match_bits_in, - u64 ignore_bits_in, - enum lnet_unlink unlink_in, - enum lnet_ins_pos position_in, - struct lnet_handle_me *handle_out); - int LNetMEUnlink(struct lnet_handle_me current_in); /** @} lnet_me */ diff --git a/net/lnet/lnet/lib-me.c b/net/lnet/lnet/lib-me.c index 4fe6991..47cf498 100644 --- a/net/lnet/lnet/lib-me.c +++ b/net/lnet/lnet/lib-me.c @@ -63,8 +63,8 @@ * appended to the match list. Allowed constants: LNET_INS_BEFORE, * LNET_INS_AFTER. * @handle On successful returns, a handle to the newly created ME object - * is saved here. This handle can be used later in LNetMEInsert(), - * LNetMEUnlink(), or LNetMDAttach() functions. + * is saved here. This handle can be used later in LNetMEUnlink(), + * or LNetMDAttach() functions. * * Return: 0 On success. * -EINVAL If @portal is invalid. @@ -125,93 +125,6 @@ EXPORT_SYMBOL(LNetMEAttach); /** - * Create and a match entry and insert it before or after the ME pointed to by - * @current_meh. The new ME is empty, i.e. not associated with a memory - * descriptor. LNetMDAttach() can be used to attach a MD to an empty ME. - * - * This function is identical to LNetMEAttach() except for the position - * where the new ME is inserted. - * - * @current_meh A handle for a ME. The new ME will be inserted - * immediately before or immediately after this ME. - * @match_id See the discussion for LNetMEAttach(). - * @match_bits - * @ignore_bits - * @unlink - * @pos - * @handle - * - * Return: 0 On success. - * -ENOMEM If new ME object cannot be allocated. - * -ENOENT If @current_meh does not point to a valid match entry. 
- */ -int -LNetMEInsert(struct lnet_handle_me current_meh, - struct lnet_process_id match_id, - u64 match_bits, u64 ignore_bits, - enum lnet_unlink unlink, enum lnet_ins_pos pos, - struct lnet_handle_me *handle) -{ - struct lnet_me *current_me; - struct lnet_me *new_me; - struct lnet_portal *ptl; - int cpt; - - LASSERT(the_lnet.ln_refcount > 0); - - if (pos == LNET_INS_LOCAL) - return -EPERM; - - new_me = kzalloc(sizeof(*new_me), GFP_NOFS); - if (!new_me) - return -ENOMEM; - - cpt = lnet_cpt_of_cookie(current_meh.cookie); - - lnet_res_lock(cpt); - - current_me = lnet_handle2me(&current_meh); - if (!current_me) { - kfree(new_me); - - lnet_res_unlock(cpt); - return -ENOENT; - } - - LASSERT(current_me->me_portal < the_lnet.ln_nportals); - - ptl = the_lnet.ln_portals[current_me->me_portal]; - if (lnet_ptl_is_unique(ptl)) { - /* nosense to insertion on unique portal */ - kfree(new_me); - lnet_res_unlock(cpt); - return -EPERM; - } - - new_me->me_pos = current_me->me_pos; - new_me->me_portal = current_me->me_portal; - new_me->me_match_id = match_id; - new_me->me_match_bits = match_bits; - new_me->me_ignore_bits = ignore_bits; - new_me->me_unlink = unlink; - new_me->me_md = NULL; - - lnet_res_lh_initialize(the_lnet.ln_me_containers[cpt], &new_me->me_lh); - - if (pos == LNET_INS_AFTER) - list_add(&new_me->me_list, &current_me->me_list); - else - list_add_tail(&new_me->me_list, &current_me->me_list); - - lnet_me2handle(handle, new_me); - - lnet_res_unlock(cpt); - - return 0; -} -EXPORT_SYMBOL(LNetMEInsert); - -/** * Unlink a match entry from its match list. * * This operation also releases any resources associated with the ME. 
If a From patchwork Thu Feb 27 21:17:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410515 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 693B892A for ; Thu, 27 Feb 2020 21:40:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5214524690 for ; Thu, 27 Feb 2020 21:40:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5214524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B51734A738; Thu, 27 Feb 2020 13:32:46 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F176D348951 for ; Thu, 27 Feb 2020 13:21:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CEFC691C7; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CE02946C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:05 -0500 Message-Id: <1582838290-17243-558-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> 
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 557/622] lustre: lmv: fix to return correct MDT count X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong @ltd_tgts_size could be larger than actual MDT count, as we preallocate ltd_tgts and resize it if necessary. Fix it to use @ld_tgt_count instead. WC-bug-id: https://jira.whamcloud.com/browse/LU-12951 Lustre-commit: 3aa8826aabc7 ("LU-12951 lmv: fix to return correct MDT count") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/36713 Reviewed-by: Lai Siyao Reviewed-by: Olaf Faaland-LLNL Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index e92be25..ee52bba 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -2870,7 +2870,7 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp, exp->exp_connect_data = *(struct obd_connect_data *)val; return rc; } else if (KEY_IS(KEY_TGT_COUNT)) { - *((int *)val) = lmv->lmv_mdt_descs.ltd_tgts_size; + *((int *)val) = lmv->lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count; return 0; } From patchwork Thu Feb 27 21:17:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410805 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6DA61924 for ; Thu, 27 Feb 2020 21:47:20 +0000 (UTC) Received: from 
pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 565E624690 for ; Thu, 27 Feb 2020 21:47:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 565E624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CF27434B40F; Thu, 27 Feb 2020 13:37:23 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3EB40348951 for ; Thu, 27 Feb 2020 13:21:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D1C2A91C8; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D0BA5468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:06 -0500 Message-Id: <1582838290-17243-559-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 558/622] lustre: obdclass: remove assertion for imp_refcount X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang After calling obd_zombie_import_add(), obd_import could be freed by obd_zombie before we check imp_refcount with LASSERT_ATOMIC_GE_LT. It's a use after free and could crash the box. WC-bug-id: https://jira.whamcloud.com/browse/LU-12965 Lustre-commit: dd71e74fecf4 ("LU-12965 obdclass: remove assertion for imp_refcount") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/36743 Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/genops.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index bceb055..a31e9ce 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -945,9 +945,6 @@ void class_import_put(struct obd_import *imp) CDEBUG(D_INFO, "final put import %p\n", imp); obd_zombie_import_add(imp); } - - /* catch possible import put race */ - LASSERT_ATOMIC_GE_LT(&imp->imp_refcount, 0, LI_POISON); } EXPORT_SYMBOL(class_import_put); From patchwork Thu Feb 27 21:17:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410535 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC4F2138D for ; Thu, 27 Feb 2020 21:40:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D4DDB24690 for ; Thu, 27 Feb 2020 21:40:44 +0000 (UTC) DMARC-Filter: OpenDMARC 
Filter v1.3.2 mail.kernel.org D4DDB24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1D58A34A82E; Thu, 27 Feb 2020 13:33:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 80DE134895D for ; Thu, 27 Feb 2020 13:21:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D49E391CA; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D390847C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:07 -0500 Message-Id: <1582838290-17243-560-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 559/622] lnet: Prefer route specified by rtr_nid X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Restore an optimization that was initially added under LU-11413. For routed REPLY and ACK we should preferably use the same router from which the GET/PUT was received. 
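As a rough illustration of the preference described above, the selection can be reduced to the sketch below. It is deliberately simplified and rests on assumptions: struct gateway and pick_gateway() are hypothetical, NID_ANY is a stand-in for LNET_NID_ANY, and "lowest sequence number wins" is a placeholder for the real route selection in lib-move.c, which weighs more than a single counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for LNET_NID_ANY: "no particular router requested". */
#define NID_ANY UINT64_MAX

/* Hypothetical, heavily reduced view of a gateway. */
struct gateway {
	uint64_t nid;     /* address of the router */
	unsigned int seq; /* how recently it was used; lower is preferred */
};

/* If the caller names a router (a REPLY or ACK going back the way the
 * request came), use exactly that gateway. Otherwise fall back to the
 * least recently used one. Returns NULL when nothing suitable exists.
 */
static const struct gateway *
pick_gateway(const struct gateway *gws, size_t n, uint64_t rtr_nid)
{
	const struct gateway *best = NULL;
	size_t i;

	assert(gws || n == 0);

	if (rtr_nid != NID_ANY) {
		for (i = 0; i < n; i++)
			if (gws[i].nid == rtr_nid)
				return &gws[i];
		/* The patch reports -EHOSTUNREACH in the equivalent case. */
		return NULL;
	}

	for (i = 0; i < n; i++)
		if (!best || gws[i].seq < best->seq)
			best = &gws[i];
	return best;
}
```

Note that in this sketch a named router that cannot be found is an error rather than a cue to fall back to general selection, which matches the early return added to the routed-send path in the diff below.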
Cray-bug-id: LUS-8008 WC-bug-id: https://jira.whamcloud.com/browse/LU-12646 Lustre-commit: ca8958189198 ("LU-12646 lnet: Prefer route specified by rtr_nid") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/35737 Reviewed-by: Alexandr Boyko Reviewed-by: Shaun Tancheff Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 131 +++++++++++++++++++++++++++-------------------- net/lnet/lnet/lib-msg.c | 4 -- 2 files changed, 76 insertions(+), 59 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index ca0009c..6a2833c 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1330,7 +1330,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, static struct lnet_route * lnet_find_route_locked(struct lnet_net *net, u32 remote_net, - lnet_nid_t rtr_nid, struct lnet_route **prev_route, + struct lnet_route **prev_route, struct lnet_peer_ni **gwni) { struct lnet_peer_ni *best_gw_ni = NULL; @@ -1342,10 +1342,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_peer *lp; int rc; - /* - * If @rtr_nid is not LNET_NID_ANY, return the gateway with - * rtr_nid nid, otherwise find the best gateway I can use - */ rnet = lnet_find_rnet_locked(remote_net); if (!rnet) return NULL; @@ -1652,13 +1648,14 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, rc = lnet_post_send_locked(msg, 0); if (!rc) - CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s try# %d\n", + CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) %s : %s try# %d\n", libcfs_nid2str(msg->msg_hdr.src_nid), libcfs_nid2str(msg->msg_txni->ni_nid), libcfs_nid2str(sd->sd_src_nid), libcfs_nid2str(msg->msg_hdr.dest_nid), libcfs_nid2str(sd->sd_dst_nid), libcfs_nid2str(msg->msg_txpeer->lpni_nid), + libcfs_nid2str(sd->sd_rtr_nid), lnet_msgtyp2str(msg->msg_type), msg->msg_retry_count); return rc; @@ -1829,70 +1826,91 @@ struct lnet_ni 
* struct lnet_peer **gw_peer) { int rc; + u32 local_lnet; struct lnet_peer *gw; struct lnet_peer *lp; struct lnet_peer_net *lpn; struct lnet_peer_net *best_lpn = NULL; struct lnet_remotenet *rnet; - struct lnet_route *best_route; - struct lnet_route *last_route; + struct lnet_route *best_route = NULL; + struct lnet_route *last_route = NULL; struct lnet_peer_ni *lpni = NULL; struct lnet_peer_ni *gwni = NULL; lnet_nid_t src_nid = sd->sd_src_nid; - /* we've already looked up the initial lpni using dst_nid */ - lpni = sd->sd_best_lpni; - /* the peer tree must be in existence */ - LASSERT(lpni && lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer); - lp = lpni->lpni_peer_net->lpn_peer; + /* If a router nid was specified then we are replying to a GET or + * sending an ACK. In this case we use the gateway associated with the + * specified router nid. + */ + if (sd->sd_rtr_nid != LNET_NID_ANY) { + gwni = lnet_find_peer_ni_locked(sd->sd_rtr_nid); + if (!gwni) { + CERROR("No peer NI for gateway %s\n", + libcfs_nid2str(sd->sd_rtr_nid)); + return -EHOSTUNREACH; + } + gw = gwni->lpni_peer_net->lpn_peer; + lnet_peer_ni_decref_locked(gwni); + local_lnet = LNET_NIDNET(sd->sd_rtr_nid); + } else { + /* we've already looked up the initial lpni using dst_nid */ + lpni = sd->sd_best_lpni; + /* the peer tree must be in existence */ + LASSERT(lpni && lpni->lpni_peer_net && + lpni->lpni_peer_net->lpn_peer); + lp = lpni->lpni_peer_net->lpn_peer; + + list_for_each_entry(lpn, &lp->lp_peer_nets, lpn_peer_nets) { + /* is this remote network reachable? */ + rnet = lnet_find_rnet_locked(lpn->lpn_net_id); + if (!rnet) + continue; - list_for_each_entry(lpn, &lp->lp_peer_nets, lpn_peer_nets) { - /* is this remote network reachable? 
*/ - rnet = lnet_find_rnet_locked(lpn->lpn_net_id); - if (!rnet) - continue; + if (!best_lpn) + best_lpn = lpn; + + if (best_lpn->lpn_seq <= lpn->lpn_seq) + continue; - if (!best_lpn) best_lpn = lpn; + } - if (best_lpn->lpn_seq <= lpn->lpn_seq) - continue; + if (!best_lpn) { + CERROR("peer %s has no available nets\n", + libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } - best_lpn = lpn; - } + sd->sd_best_lpni = lnet_find_best_lpni_on_net(sd, lp, + best_lpn->lpn_net_id); + if (!sd->sd_best_lpni) { + CERROR("peer %s down\n", + libcfs_nid2str(sd->sd_dst_nid)); + return -EHOSTUNREACH; + } - if (!best_lpn) { - CERROR("peer %s has no available nets\n", - libcfs_nid2str(sd->sd_dst_nid)); - return -EHOSTUNREACH; - } + best_route = lnet_find_route_locked(NULL, best_lpn->lpn_net_id, + &last_route, &gwni); + if (!best_route) { + CERROR("no route to %s from %s\n", + libcfs_nid2str(dst_nid), + libcfs_nid2str(src_nid)); + return -EHOSTUNREACH; + } - sd->sd_best_lpni = lnet_find_best_lpni_on_net(sd, lp, - best_lpn->lpn_net_id); - if (!sd->sd_best_lpni) { - CERROR("peer %s down\n", libcfs_nid2str(sd->sd_dst_nid)); - return -EHOSTUNREACH; - } + if (!gwni) { + CERROR("Internal Error. Route expected to %s from %s\n", + libcfs_nid2str(dst_nid), + libcfs_nid2str(src_nid)); + return -EFAULT; + } - best_route = lnet_find_route_locked(NULL, best_lpn->lpn_net_id, - sd->sd_rtr_nid, &last_route, - &gwni); - if (!best_route) { - CERROR("no route to %s from %s\n", - libcfs_nid2str(dst_nid), libcfs_nid2str(src_nid)); - return -EHOSTUNREACH; - } + gw = best_route->lr_gateway; + LASSERT(gw == gwni->lpni_peer_net->lpn_peer); + local_lnet = best_route->lr_lnet; - if (!gwni) { - CERROR("Internal Error. Route expected to %s from %s\n", - libcfs_nid2str(dst_nid), - libcfs_nid2str(src_nid)); - return -EFAULT; } - gw = best_route->lr_gateway; - LASSERT(gw == gwni->lpni_peer_net->lpn_peer); - /* Discover this gateway if it hasn't already been discovered. 
* This means we might delay the message until discovery has * completed @@ -1906,14 +1924,15 @@ struct lnet_ni * if (!sd->sd_best_ni) { struct lnet_peer_net *lpeer; - lpeer = lnet_peer_get_net_locked(gw, best_route->lr_lnet); + lpeer = lnet_peer_get_net_locked(gw, local_lnet); sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, gw, lpeer, sd->sd_md_cpt, true); } + if (!sd->sd_best_ni) { CERROR("Internal Error. Expected local ni on %s but non found :%s\n", - libcfs_net2str(best_route->lr_lnet), + libcfs_net2str(local_lnet), libcfs_nid2str(sd->sd_src_nid)); return -EFAULT; } @@ -1924,9 +1943,11 @@ struct lnet_ni * /* increment the sequence numbers since now we're sure we're * going to use this path */ - LASSERT(best_route && last_route); - best_route->lr_seq = last_route->lr_seq + 1; - best_lpn->lpn_seq++; + if (sd->sd_rtr_nid == LNET_NID_ANY) { + LASSERT(best_route && last_route); + best_route->lr_seq = last_route->lr_seq + 1; + best_lpn->lpn_seq++; + } return 0; } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index d74ff53..86ac692 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -397,10 +397,6 @@ msg->msg_hdr.msg.ack.match_bits = msg->msg_ev.match_bits; msg->msg_hdr.msg.ack.mlength = cpu_to_le32(msg->msg_ev.mlength); - /* - * NB: we probably want to use NID of msg::msg_from as 3rd - * parameter (router NID) if it's routed message - */ rc = lnet_send(msg->msg_ev.target.nid, msg, msg->msg_from); lnet_net_lock(cpt); From patchwork Thu Feb 27 21:17:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410539 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57EC2924 for ; Thu, 27 Feb 2020 21:40:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with 
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 407F024690 for ; Thu, 27 Feb 2020 21:40:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 407F024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D08A934A852; Thu, 27 Feb 2020 13:33:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D970C34895D for ; Thu, 27 Feb 2020 13:21:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D943291CB; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D66E946D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:08 -0500 Message-Id: <1582838290-17243-561-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 560/622] lustre: all: prefer sizeof(*var) for alloc X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown The construct var = kzalloc(sizeof(*var), GFP...) is more obviously correct than var = kzalloc(sizeof(struct something), GFP...); and is preferred. So convert allocations and frees that use sizeof(struct..) to use one of the simpler constructs. For cfs_percpt_alloc() allocations, we are allocating an array of pointers, so sizeof(*var[0]) is best. WC-bug-id: https://jira.whamcloud.com/browse/LU-9679 Lustre-commit: 11f2c86650fd ("LU-9679 all: prefer sizeof(*var) for ALLOC/FREE") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36661 Reviewed-by: Alex Zhuravlev Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 2 +- fs/lustre/obdecho/echo_client.c | 2 +- net/lnet/klnds/o2iblnd/o2iblnd.c | 9 +++++---- net/lnet/libcfs/libcfs_lock.c | 2 +- net/lnet/lnet/api-ni.c | 7 +++---- net/lnet/lnet/lib-eq.c | 2 +- net/lnet/lnet/lib-ptl.c | 2 +- net/lnet/lnet/router.c | 2 +- net/lnet/selftest/rpc.c | 2 +- 9 files changed, 15 insertions(+), 15 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 9772194..4fc35c5 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -1137,7 +1137,7 @@ struct lprocfs_stats *lprocfs_alloc_stats(unsigned int num, /* alloc num of counter headers */ stats->ls_cnt_header = kvmalloc_array(stats->ls_num, - sizeof(struct lprocfs_counter_header), + sizeof(*stats->ls_cnt_header), GFP_KERNEL | __GFP_ZERO); if (!stats->ls_cnt_header) goto fail; diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index c473f547..84dea56 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1367,7 +1367,7 @@ static int
echo_client_prep_commit(const struct lu_env *env, npages = batch >> PAGE_SHIFT; tot_pages = count >> PAGE_SHIFT; - lnb = kvmalloc_array(npages, sizeof(struct niobuf_local), + lnb = kvmalloc_array(npages, sizeof(*lnb), GFP_NOFS | __GFP_ZERO); if (!lnb) { ret = -ENOMEM; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 1cc5358..04e121b 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -852,7 +852,8 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, kfree(init_qp_attr); - conn->ibc_rxs = kzalloc_cpt(IBLND_RX_MSGS(conn) * sizeof(struct kib_rx), + conn->ibc_rxs = kzalloc_cpt(IBLND_RX_MSGS(conn) * + sizeof(*conn->ibc_rxs), GFP_NOFS, cpt); if (!conn->ibc_rxs) { CERROR("Cannot allocate RX buffers\n"); @@ -2119,7 +2120,7 @@ static int kiblnd_create_tx_pool(struct kib_poolset *ps, int size, return -ENOMEM; } - tpo->tpo_tx_descs = kzalloc_cpt(size * sizeof(struct kib_tx), + tpo->tpo_tx_descs = kzalloc_cpt(size * sizeof(*tpo->tpo_tx_descs), GFP_NOFS, ps->ps_cpt); if (!tpo->tpo_tx_descs) { CERROR("Can't allocate %d tx descriptors\n", size); @@ -2251,7 +2252,7 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni, * number of CPTs that exist, i.e net->ibn_fmr_ps[cpt]. */ net->ibn_fmr_ps = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(struct kib_fmr_poolset)); + sizeof(*net->ibn_fmr_ps[0])); if (!net->ibn_fmr_ps) { CERROR("Failed to allocate FMR pool array\n"); rc = -ENOMEM; @@ -2278,7 +2279,7 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni, * number of CPTs that exist, i.e net->ibn_tx_ps[cpt]. 
*/ net->ibn_tx_ps = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(struct kib_tx_poolset)); + sizeof(*net->ibn_tx_ps[0])); if (!net->ibn_tx_ps) { CERROR("Failed to allocate tx pool array\n"); rc = -ENOMEM; diff --git a/net/lnet/libcfs/libcfs_lock.c b/net/lnet/libcfs/libcfs_lock.c index 3d5157f..313aa95 100644 --- a/net/lnet/libcfs/libcfs_lock.c +++ b/net/lnet/libcfs/libcfs_lock.c @@ -66,7 +66,7 @@ struct cfs_percpt_lock * return NULL; pcl->pcl_cptab = cptab; - pcl->pcl_locks = cfs_percpt_alloc(cptab, sizeof(*lock)); + pcl->pcl_locks = cfs_percpt_alloc(cptab, sizeof(*pcl->pcl_locks[0])); if (!pcl->pcl_locks) { kfree(pcl); return NULL; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 6c913b5..0020ffd 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -970,7 +970,7 @@ static void lnet_assert_wire_constants(void) int rc; int i; - recs = cfs_percpt_alloc(lnet_cpt_table(), sizeof(*rec)); + recs = cfs_percpt_alloc(lnet_cpt_table(), sizeof(*recs[0])); if (!recs) { CERROR("Failed to allocate %s resource containers\n", lnet_res_type2str(type)); @@ -1033,8 +1033,7 @@ struct list_head ** struct list_head *q; int i; - qs = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(struct list_head)); + qs = cfs_percpt_alloc(lnet_cpt_table(), sizeof(*qs[0])); if (!qs) { CERROR("Failed to allocate queues\n"); return NULL; @@ -1096,7 +1095,7 @@ struct list_head ** the_lnet.ln_interface_cookie = ktime_get_real_ns(); the_lnet.ln_counters = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(struct lnet_counters)); + sizeof(*the_lnet.ln_counters[0])); if (!the_lnet.ln_counters) { CERROR("Failed to allocate counters for LNet\n"); rc = -ENOMEM; diff --git a/net/lnet/lnet/lib-eq.c b/net/lnet/lnet/lib-eq.c index 01b8ee3..25af2bd 100644 --- a/net/lnet/lnet/lib-eq.c +++ b/net/lnet/lnet/lib-eq.c @@ -95,7 +95,7 @@ return -ENOMEM; if (count) { - eq->eq_events = kvmalloc_array(count, sizeof(struct lnet_event), + eq->eq_events = kvmalloc_array(count, sizeof(*eq->eq_events), GFP_KERNEL 
| __GFP_ZERO); if (!eq->eq_events) goto failed; diff --git a/net/lnet/lnet/lib-ptl.c b/net/lnet/lnet/lib-ptl.c index bb92f37..ae38bc3 100644 --- a/net/lnet/lnet/lib-ptl.c +++ b/net/lnet/lnet/lib-ptl.c @@ -793,7 +793,7 @@ struct list_head * int j; ptl->ptl_mtables = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(struct lnet_match_table)); + sizeof(*ptl->ptl_mtables[0])); if (!ptl->ptl_mtables) { CERROR("Failed to create match table for portal %d\n", index); return -ENOMEM; diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 71ba951..b8f7aba0 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1386,7 +1386,7 @@ bool lnet_router_checker_active(void) the_lnet.ln_rtrpools = cfs_percpt_alloc(lnet_cpt_table(), LNET_NRBPOOLS * - sizeof(struct lnet_rtrbufpool)); + sizeof(*the_lnet.ln_rtrpools[0])); if (!the_lnet.ln_rtrpools) { LCONSOLE_ERROR_MSG(0x10c, "Failed to initialize router buffe pool\n"); diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index 4645f04..7a8226c 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -256,7 +256,7 @@ struct srpc_bulk * svc->sv_shuttingdown = 0; svc->sv_cpt_data = cfs_percpt_alloc(lnet_cpt_table(), - sizeof(**svc->sv_cpt_data)); + sizeof(*svc->sv_cpt_data[0])); if (!svc->sv_cpt_data) return -ENOMEM; From patchwork Thu Feb 27 21:17:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410613 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 084E017E0 for ; Thu, 27 Feb 2020 21:42:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E4E11246A8 for ; Thu, 27 Feb 2020 21:42:26 
+0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E4E11246A8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6634034AB7F; Thu, 27 Feb 2020 13:34:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3E9A2201341 for ; Thu, 27 Feb 2020 13:21:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DAAD591CC; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D940A46A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:09 -0500 Message-Id: <1582838290-17243-562-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 561/622] lustre: handle: discard OBD_FREE_RCU X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown OBD_FREE_RCU and the hop_free call-back together form an overly complex mechanism equivalent to kfree_rcu() or call_rcu(...). Discard them and use the simpler approach. 
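As a rough userspace model of the call_rcu() side of this change (a sketch only, not kernel code): every fake_* name below is hypothetical, and fake_call_rcu() runs the callback immediately instead of waiting for a grace period. The point it illustrates is that the callback receives a pointer to the embedded head and recovers the enclosing object with a container_of-style macro, so the handle needs neither a stored size nor a type-specific free hook.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
struct fake_rcu_head {
	void (*func)(struct fake_rcu_head *head);
};

/* Same idea as the kernel's container_of(), built on offsetof(). */
#define fake_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_handle {
	unsigned long long h_cookie;
	struct fake_rcu_head h_rcu;
};

struct fake_lock {
	int l_refc;
	struct fake_handle l_handle;
};

/* Counts callbacks that ran, so the behaviour can be observed. */
static int fake_locks_freed;

/* Callback: map the embedded head back to its lock, then free the lock. */
static void fake_lock_free(struct fake_rcu_head *rcu)
{
	struct fake_lock *lock = fake_container_of(rcu, struct fake_lock,
						   l_handle.h_rcu);

	free(lock);
	fake_locks_freed++;
}

/*
 * Stand-in for call_rcu(). A real kernel defers func until a grace
 * period has elapsed; this model calls it straight away.
 */
static void fake_call_rcu(struct fake_rcu_head *head,
			  void (*func)(struct fake_rcu_head *head))
{
	head->func = func;
	head->func(head);
}
```

A caller would allocate a struct fake_lock and hand &lock->l_handle.h_rcu to fake_call_rcu() together with fake_lock_free. When the object comes from a plain kmalloc-style allocation and needs no extra teardown, the commit message's other option, kfree_rcu(), avoids writing a callback at all.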
This removes the only use for the field h_size, so discard that too. WC-bug-id: https://jira.whamcloud.com/browse/LU-12542 Lustre-commit: 48830f888b6 ("LU-12542 handle: discard OBD_FREE_RCU") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/35797 Reviewed-by: Neil Brown Reviewed-by: Mike Pershin Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Petros Koutoupis Signed-off-by: James Simmons --- fs/lustre/include/lustre_handles.h | 3 --- fs/lustre/include/obd_support.h | 10 ---------- fs/lustre/ldlm/ldlm_lock.c | 16 ++++++++-------- fs/lustre/obdclass/genops.c | 3 +-- fs/lustre/obdclass/lustre_handles.c | 15 --------------- 5 files changed, 9 insertions(+), 38 deletions(-) diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h index 7c93d72..8f733fd 100644 --- a/fs/lustre/include/lustre_handles.h +++ b/fs/lustre/include/lustre_handles.h @@ -46,7 +46,6 @@ #include struct portals_handle_ops { - void (*hop_free)(void *object, int size); /* hop_type is used for some debugging messages */ char *hop_type; }; @@ -72,7 +71,6 @@ struct portals_handle { /* newly added fields to handle the RCU issue. 
-jxiong */ struct rcu_head h_rcu; spinlock_t h_lock; - unsigned int h_size:31; unsigned int h_in:1; }; @@ -83,7 +81,6 @@ void class_handle_hash(struct portals_handle *, const struct portals_handle_ops *ops); void class_handle_unhash(struct portals_handle *); void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops); -void class_handle_free_cb(struct rcu_head *rcu); int class_handle_init(void); void class_handle_cleanup(void); diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index acfd098..5969b6b 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -533,16 +533,6 @@ #define POISON_PAGE(page, val) do { } while (0) #endif -#define OBD_FREE_RCU(ptr, size, handle) \ -do { \ - struct portals_handle *__h = (handle); \ - \ - __h->h_cookie = (unsigned long)(ptr); \ - __h->h_size = (size); \ - call_rcu(&__h->h_rcu, class_handle_free_cb); \ - POISON_PTR(ptr); \ -} while (0) - #define KEY_IS(str) \ (keylen >= (sizeof(str) - 1) && \ memcmp(key, str, (sizeof(str) - 1)) == 0) diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 2471e30..61bf028 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -153,6 +153,13 @@ struct ldlm_lock *ldlm_lock_get(struct ldlm_lock *lock) } EXPORT_SYMBOL(ldlm_lock_get); +static void lock_handle_free(struct rcu_head *rcu) +{ + struct ldlm_lock *lock = container_of(rcu, struct ldlm_lock, + l_handle.h_rcu); + kmem_cache_free(ldlm_lock_slab, lock); +} + /** * Release lock reference. 
* @@ -186,7 +193,7 @@ void ldlm_lock_put(struct ldlm_lock *lock) kvfree(lock->l_lvb_data); lu_ref_fini(&lock->l_reference); - OBD_FREE_RCU(lock, sizeof(*lock), &lock->l_handle); + call_rcu(&lock->l_handle.h_rcu, lock_handle_free); } } EXPORT_SYMBOL(ldlm_lock_put); @@ -358,14 +365,7 @@ void ldlm_lock_destroy_nolock(struct ldlm_lock *lock) } } -static void lock_handle_free(void *lock, int size) -{ - LASSERT(size == sizeof(struct ldlm_lock)); - kmem_cache_free(ldlm_lock_slab, lock); -} - static struct portals_handle_ops lock_handle_ops = { - .hop_free = lock_handle_free, .hop_type = "ldlm", }; diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index a31e9ce..15bea0d 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -729,11 +729,10 @@ static void class_export_destroy(struct obd_export *exp) if (exp != obd->obd_self_export) class_decref(obd, "export", exp); - OBD_FREE_RCU(exp, sizeof(*exp), &exp->exp_handle); + kfree_rcu(exp, exp_handle.h_rcu); } static struct portals_handle_ops export_handle_ops = { - .hop_free = NULL, .hop_type = "export", }; diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c index 95a34db..99c68fe 100644 --- a/fs/lustre/obdclass/lustre_handles.c +++ b/fs/lustre/obdclass/lustre_handles.c @@ -167,21 +167,6 @@ void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops) } EXPORT_SYMBOL(class_handle2object); -void class_handle_free_cb(struct rcu_head *rcu) -{ - struct portals_handle *h; - void *ptr; - - h = container_of(rcu, struct portals_handle, h_rcu); - ptr = (void *)(unsigned long)h->h_cookie; - - if (h->h_ops->hop_free) - h->h_ops->hop_free(ptr, h->h_size); - else - kfree(ptr); -} -EXPORT_SYMBOL(class_handle_free_cb); - int class_handle_init(void) { struct handle_bucket *bucket; From patchwork Thu Feb 27 21:17:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons 
X-Patchwork-Id: 11410793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0311617E0 for ; Thu, 27 Feb 2020 21:46:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DFA1F246A1 for ; Thu, 27 Feb 2020 21:46:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DFA1F246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 69DA534B351; Thu, 27 Feb 2020 13:37:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 94C97348969 for ; Thu, 27 Feb 2020 13:21:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id DE58591CD; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DC14446C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:10 -0500 Message-Id: <1582838290-17243-563-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 562/622] lnet: use list_move where appropriate. 
X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown There are several places in lustre where "list_del" (or occasionally "list_del_init") is followed by "list_add" or "list_add_tail" which moves the object to a different list. These can be combined into "list_move" or "list_move_tail". WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: 590089790fee ("LU-12678 lnet: use list_move where appropriate.") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/36339 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Petros Koutoupis Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 10 ++++------ net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 ++---- net/lnet/klnds/socklnd/socklnd.c | 3 +-- net/lnet/klnds/socklnd/socklnd_cb.c | 3 +-- net/lnet/klnds/socklnd/socklnd_proto.c | 3 +-- net/lnet/lnet/config.c | 3 +-- net/lnet/lnet/lib-move.c | 9 +++------ net/lnet/selftest/console.c | 6 ++---- 8 files changed, 15 insertions(+), 28 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 04e121b..37d8235 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -1565,11 +1565,10 @@ static void kiblnd_fail_fmr_poolset(struct kib_fmr_poolset *fps, struct kib_fmr_pool, fpo_list)) != NULL) { fpo->fpo_failed = 1; - list_del(&fpo->fpo_list); if (!fpo->fpo_map_count) - list_add(&fpo->fpo_list, zombies); + list_move(&fpo->fpo_list, zombies); else - list_add(&fpo->fpo_list, &fps->fps_failed_pool_list); + list_move(&fpo->fpo_list, &fps->fps_failed_pool_list); } spin_unlock(&fps->fps_lock); @@ -1887,11 +1886,10 @@ static void 
kiblnd_fail_poolset(struct kib_poolset *ps, struct list_head *zombie struct kib_pool, po_list)) == NULL) { po->po_failed = 1; - list_del(&po->po_list); if (!po->po_allocated) - list_add(&po->po_list, zombies); + list_move(&po->po_list, zombies); else - list_add(&po->po_list, &ps->ps_failed_pool_list); + list_move(&po->po_list, &ps->ps_failed_pool_list); } spin_unlock(&ps->ps_lock); } diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index fcd9db2..f769a45 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -986,8 +986,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, (tx = list_first_entry_or_null( &conn->ibc_tx_queue_rsrvd, struct kib_tx, tx_list)) != NULL) { - list_del(&tx->tx_list); - list_add_tail(&tx->tx_list, &conn->ibc_tx_queue); + list_move_tail(&tx->tx_list, &conn->ibc_tx_queue); conn->ibc_reserved_credits--; } @@ -2118,8 +2117,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, */ if (!tx->tx_sending) { tx->tx_queued = 0; - list_del(&tx->tx_list); - list_add(&tx->tx_list, &zombies); + list_move(&tx->tx_list, &zombies); } } diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 1d0bedb..593c205 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1537,8 +1537,7 @@ struct ksock_peer_ni * tx->tx_msg.ksm_zc_cookies[0] = 0; tx->tx_zc_aborted = 1; /* mark it as not-acked */ - list_del(&tx->tx_zc_list); - list_add(&tx->tx_zc_list, &zlist); + list_move(&tx->tx_zc_list, &zlist); } spin_unlock(&peer_ni->ksnp_lock); diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 2b93331..996b231 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -2312,8 +2312,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_TIMEOUT; - list_del(&tx->tx_list); - list_add_tail(&tx->tx_list, 
&stale_txs); + list_move_tail(&tx->tx_list, &stale_txs); } write_unlock_bh(&ksocknal_data.ksnd_global_lock); diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c index c6ea302..887ed2d 100644 --- a/net/lnet/klnds/socklnd/socklnd_proto.c +++ b/net/lnet/klnds/socklnd/socklnd_proto.c @@ -437,8 +437,7 @@ if (c == cookie1 || c == cookie2 || (cookie1 < c && c < cookie2)) { tx->tx_msg.ksm_zc_cookies[0] = 0; - list_del(&tx->tx_zc_list); - list_add(&tx->tx_zc_list, &zlist); + list_move(&tx->tx_zc_list, &zlist); if (!--count) break; diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index f521b0b..8994882 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -1533,8 +1533,7 @@ struct lnet_ni * list_for_each_safe(t, t2, &current_nets) { tb = list_entry(t, struct lnet_text_buf, ltb_list); - list_del(&tb->ltb_list); - list_add_tail(&tb->ltb_list, &matched_nets); + list_move_tail(&tb->ltb_list, &matched_nets); len += snprintf(networks + len, sizeof(networks) - len, "%s%s", !len ? "" : ",", diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 6a2833c..da73009 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -195,8 +195,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (!tp->tp_threshold || /* needs culling anyway */ nid == LNET_NID_ANY || /* removing all entries */ tp->tp_nid == nid) { /* matched this one */ - list_del(&tp->tp_list); - list_add(&tp->tp_list, &cull); + list_move(&tp->tp_list, &cull); } } @@ -236,8 +235,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, * since we may be at interrupt priority on * incoming messages.
*/ - list_del(&tp->tp_list); - list_add(&tp->tp_list, &cull); + list_move(&tp->tp_list, &cull); } continue; } @@ -251,8 +249,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (outgoing && !tp->tp_threshold) { /* see above */ - list_del(&tp->tp_list); - list_add(&tp->tp_list, &cull); + list_move(&tp->tp_list, &cull); } } break; diff --git a/net/lnet/selftest/console.c b/net/lnet/selftest/console.c index abc342c..9f32c1f 100644 --- a/net/lnet/selftest/console.c +++ b/net/lnet/selftest/console.c @@ -316,12 +316,10 @@ static void lstcon_group_ndlink_release(struct lstcon_group *, unsigned int idx = LNET_NIDADDR(ndl->ndl_node->nd_id.nid) % LST_NODE_HASHSIZE; - list_del(&ndl->ndl_hlink); - list_del(&ndl->ndl_link); old->grp_nnode--; - list_add_tail(&ndl->ndl_hlink, &new->grp_ndl_hash[idx]); - list_add_tail(&ndl->ndl_link, &new->grp_ndl_list); + list_move_tail(&ndl->ndl_hlink, &new->grp_ndl_hash[idx]); + list_move_tail(&ndl->ndl_link, &new->grp_ndl_list); new->grp_nnode++; } From patchwork Thu Feb 27 21:17:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410809 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8C1381580 for ; Thu, 27 Feb 2020 21:47:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7496024690 for ; Thu, 27 Feb 2020 21:47:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7496024690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: 
from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B30BA34B448; Thu, 27 Feb 2020 13:37:27 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EC9F2348972 for ; Thu, 27 Feb 2020 13:21:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E010E91CE; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DEC22468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:11 -0500 Message-Id: <1582838290-17243-564-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 563/622] lnet: libcfs: provide an scnprintf and start using it X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff snprintf() returns the number of chars that would be needed to hold the complete result, which may be larger than the buffer size. scnprintf() differs in that its return value is the number of chars actually written (not including the terminating null). Correct the few patterns where the return from snprintf() is used and expected not to exceed the passed buffer size.
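The return-value difference matters most for the append pattern this patch touches, where a running offset is advanced by each call's return value. The following is a userspace sketch rather than kernel code: bounded_printf() is a hypothetical stand-in that mimics the scnprintf() return convention on top of standard vsnprintf(), and append_twice() is an illustrative helper that does not appear in the patch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf().
 * Returns the number of characters actually stored in buf,
 * excluding the terminating NUL.
 */
static int bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;		/* output error: report nothing written */
	if ((size_t)n >= size)
		return (int)(size - 1);	/* truncated: only size - 1 chars fit */
	return n;
}

/*
 * Append two fragments using the "len += ...(buf + len, size - len, ...)"
 * idiom and return the final offset, which can never exceed size - 1.
 */
static size_t append_twice(char *buf, size_t size)
{
	size_t len = 0;

	len += bounded_printf(buf + len, size - len, "%s", "0123456789");
	len += bounded_printf(buf + len, size - len, "%s", "abc");
	return len;
}
```

With an 8-byte buffer, append_twice() stops at offset 7 and leaves the buffer NUL-terminated. A plain snprintf() of the same ten-character string into that buffer reports 10, so a running offset built from its return value would already point past the end of the buffer before the second call.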
Cray-bug-id: LUS-7999 WC-bug-id: https://jira.whamcloud.com/browse/LU-12861 Lustre-commit: 998a494fa9a4 ("LU-12861 libcfs: provide an scnprintf and start using it") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/36453 Reviewed-by: Sebastien Buisson Reviewed-by: James Simmons Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/lproc_osc.c | 6 +-- net/lnet/lnet/config.c | 6 +-- net/lnet/lnet/router_proc.c | 128 ++++++++++++++++++++++---------------------- 3 files changed, 70 insertions(+), 70 deletions(-) diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 2bc7047..d545d1b 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -703,9 +703,9 @@ static ssize_t grant_shrink_show(struct kobject *kobj, struct attribute *attr, return len; imp = obd->u.cli.cl_import; - len = snprintf(buf, PAGE_SIZE, "%d\n", - !imp->imp_grant_shrink_disabled && - OCD_HAS_FLAG(&imp->imp_connect_data, GRANT_SHRINK)); + len = scnprintf(buf, PAGE_SIZE, "%d\n", + !imp->imp_grant_shrink_disabled && + OCD_HAS_FLAG(&imp->imp_connect_data, GRANT_SHRINK)); up_read(&obd->u.cli.cl_sem); return len; diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 8994882..f50df88 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -1535,9 +1535,9 @@ struct lnet_ni * list_move_tail(&tb->ltb_list, &matched_nets); - len += snprintf(networks + len, sizeof(networks) - len, - "%s%s", !len ? "" : ",", - tb->ltb_text); + len += scnprintf(networks + len, sizeof(networks) - len, + "%s%s", !len ? 
"" : ",", + tb->ltb_text); if (len >= sizeof(networks)) { CERROR("Too many matched networks\n"); diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index 2e9342c..180bbde 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -105,16 +105,16 @@ static int proc_lnet_stats(struct ctl_table *table, int write, lnet_counters_get(ctrs); common = ctrs->lct_common; - len = snprintf(tmpstr, tmpsiz, - "%u %u %u %u %u %u %u %llu %llu %llu %llu", - common.lcc_msgs_alloc, common.lcc_msgs_max, - common.lcc_errors, - common.lcc_send_count, common.lcc_recv_count, - common.lcc_route_count, common.lcc_drop_count, - common.lcc_send_length, common.lcc_recv_length, - common.lcc_route_length, common.lcc_drop_length); - - if (pos >= min_t(int, len, strlen(tmpstr))) + len = scnprintf(tmpstr, tmpsiz, + "%u %u %u %u %u %u %u %llu %llu %llu %llu", + common.lcc_msgs_alloc, common.lcc_msgs_max, + common.lcc_errors, + common.lcc_send_count, common.lcc_recv_count, + common.lcc_route_count, common.lcc_drop_count, + common.lcc_send_length, common.lcc_recv_length, + common.lcc_route_length, common.lcc_drop_length); + + if (pos >= len) rc = 0; else rc = cfs_trace_copyout_string(buffer, nob, @@ -153,12 +153,12 @@ static int proc_lnet_routes(struct ctl_table *table, int write, s = tmpstr; /* points to current position in tmpstr[] */ if (!*ppos) { - s += snprintf(s, tmpstr + tmpsiz - s, "Routing %s\n", - the_lnet.ln_routing ? "enabled" : "disabled"); + s += scnprintf(s, tmpstr + tmpsiz - s, "Routing %s\n", + the_lnet.ln_routing ? 
"enabled" : "disabled"); LASSERT(tmpstr + tmpsiz - s > 0); - s += snprintf(s, tmpstr + tmpsiz - s, "%-8s %4s %8s %7s %s\n", - "net", "hops", "priority", "state", "router"); + s += scnprintf(s, tmpstr + tmpsiz - s, "%-8s %4s %8s %7s %s\n", + "net", "hops", "priority", "state", "router"); LASSERT(tmpstr + tmpsiz - s > 0); lnet_net_lock(0); @@ -217,12 +217,12 @@ static int proc_lnet_routes(struct ctl_table *table, int write, unsigned int priority = route->lr_priority; int alive = lnet_is_route_alive(route); - s += snprintf(s, tmpstr + tmpsiz - s, - "%-8s %4d %8u %7s %s\n", - libcfs_net2str(net), hops, - priority, - alive ? "up" : "down", - libcfs_nid2str(route->lr_nid)); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%-8s %4d %8u %7s %s\n", + libcfs_net2str(net), hops, + priority, + alive ? "up" : "down", + libcfs_nid2str(route->lr_nid)); LASSERT(tmpstr + tmpsiz - s > 0); } @@ -276,9 +276,9 @@ static int proc_lnet_routers(struct ctl_table *table, int write, s = tmpstr; /* points to current position in tmpstr[] */ if (!*ppos) { - s += snprintf(s, tmpstr + tmpsiz - s, - "%-4s %7s %5s %s\n", - "ref", "rtr_ref", "alive", "router"); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%-4s %7s %5s %s\n", + "ref", "rtr_ref", "alive", "router"); LASSERT(tmpstr + tmpsiz - s > 0); lnet_net_lock(0); @@ -320,11 +320,11 @@ static int proc_lnet_routers(struct ctl_table *table, int write, int nrtrrefs = peer->lp_rtr_refcount; int alive = lnet_is_gateway_alive(peer); - s += snprintf(s, tmpstr + tmpsiz - s, - "%-4d %7d %5s %s\n", - nrefs, nrtrrefs, - alive ? "up" : "down", - libcfs_nid2str(nid)); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%-4d %7d %5s %s\n", + nrefs, nrtrrefs, + alive ? 
"up" : "down", + libcfs_nid2str(nid)); } lnet_net_unlock(0); @@ -411,10 +411,10 @@ static int proc_lnet_peers(struct ctl_table *table, int write, s = tmpstr; /* points to current position in tmpstr[] */ if (!*ppos) { - s += snprintf(s, tmpstr + tmpsiz - s, - "%-24s %4s %5s %5s %5s %5s %5s %5s %5s %s\n", - "nid", "refs", "state", "last", "max", - "rtr", "min", "tx", "min", "queue"); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%-24s %4s %5s %5s %5s %5s %5s %5s %5s %s\n", + "nid", "refs", "state", "last", "max", + "rtr", "min", "tx", "min", "queue"); LASSERT(tmpstr + tmpsiz - s > 0); hoff++; @@ -498,11 +498,11 @@ static int proc_lnet_peers(struct ctl_table *table, int write, lnet_net_unlock(cpt); - s += snprintf(s, tmpstr + tmpsiz - s, - "%-24s %4d %5s %5lld %5d %5d %5d %5d %5d %d\n", - libcfs_nid2str(nid), nrefs, aliveness, - lastalive, maxcr, rtrcr, minrtrcr, txcr, - mintxcr, txqnob); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%-24s %4d %5s %5lld %5d %5d %5d %5d %5d %d\n", + libcfs_nid2str(nid), nrefs, aliveness, + lastalive, maxcr, rtrcr, minrtrcr, txcr, + mintxcr, txqnob); LASSERT(tmpstr + tmpsiz - s > 0); } else { /* peer is NULL */ @@ -560,9 +560,9 @@ static int proc_lnet_buffers(struct ctl_table *table, int write, s = tmpstr; /* points to current position in tmpstr[] */ - s += snprintf(s, tmpstr + tmpsiz - s, - "%5s %5s %7s %7s\n", - "pages", "count", "credits", "min"); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%5s %5s %7s %7s\n", + "pages", "count", "credits", "min"); LASSERT(tmpstr + tmpsiz - s > 0); if (!the_lnet.ln_rtrpools) @@ -573,12 +573,12 @@ static int proc_lnet_buffers(struct ctl_table *table, int write, lnet_net_lock(LNET_LOCK_EX); cfs_percpt_for_each(rbp, i, the_lnet.ln_rtrpools) { - s += snprintf(s, tmpstr + tmpsiz - s, - "%5d %5d %7d %7d\n", - rbp[idx].rbp_npages, - rbp[idx].rbp_nbuffers, - rbp[idx].rbp_credits, - rbp[idx].rbp_mincredits); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%5d %5d %7d %7d\n", + rbp[idx].rbp_npages, + 
rbp[idx].rbp_nbuffers, + rbp[idx].rbp_credits, + rbp[idx].rbp_mincredits); LASSERT(tmpstr + tmpsiz - s > 0); } lnet_net_unlock(LNET_LOCK_EX); @@ -652,10 +652,10 @@ static int proc_lnet_nis(struct ctl_table *table, int write, s = tmpstr; /* points to current position in tmpstr[] */ if (!*ppos) { - s += snprintf(s, tmpstr + tmpsiz - s, - "%-24s %6s %5s %4s %4s %4s %5s %5s %5s\n", - "nid", "status", "alive", "refs", "peer", - "rtr", "max", "tx", "min"); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%-24s %6s %5s %4s %4s %4s %5s %5s %5s\n", + "nid", "status", "alive", "refs", "peer", + "rtr", "max", "tx", "min"); LASSERT(tmpstr + tmpsiz - s > 0); } else { struct lnet_ni *ni = NULL; @@ -705,15 +705,15 @@ static int proc_lnet_nis(struct ctl_table *table, int write, if (i) lnet_net_lock(i); - s += snprintf(s, tmpstr + tmpsiz - s, - "%-24s %6s %5lld %4d %4d %4d %5d %5d %5d\n", - libcfs_nid2str(ni->ni_nid), stat, - last_alive, *ni->ni_refs[i], - ni->ni_net->net_tunables.lct_peer_tx_credits, - ni->ni_net->net_tunables.lct_peer_rtr_credits, - tq->tq_credits_max, - tq->tq_credits, - tq->tq_credits_min); + s += scnprintf(s, tmpstr + tmpsiz - s, + "%-24s %6s %5lld %4d %4d %4d %5d %5d %5d\n", + libcfs_nid2str(ni->ni_nid), stat, + last_alive, *ni->ni_refs[i], + ni->ni_net->net_tunables.lct_peer_tx_credits, + ni->ni_net->net_tunables.lct_peer_rtr_credits, + tq->tq_credits_max, + tq->tq_credits, + tq->tq_credits_min); if (i) lnet_net_unlock(i); } @@ -803,11 +803,11 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write, LASSERT(portal_rotors[i].pr_value == portal_rotor); lnet_res_unlock(0); - rc = snprintf(buf, buf_len, - "{\n\tportals: all\n" - "\trotor: %s\n\tdescription: %s\n}", - portal_rotors[i].pr_name, - portal_rotors[i].pr_desc); + rc = scnprintf(buf, buf_len, + "{\n\tportals: all\n" + "\trotor: %s\n\tdescription: %s\n}", + portal_rotors[i].pr_name, + portal_rotors[i].pr_desc); if (pos >= min_t(int, rc, buf_len)) { rc = 0; From patchwork Thu Feb 27 21:17:12 
2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410741 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ACDB7924 for ; Thu, 27 Feb 2020 21:45:42 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9594F246A1 for ; Thu, 27 Feb 2020 21:45:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9594F246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DEC7A34B083; Thu, 27 Feb 2020 13:36:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4E67E21FF31 for ; Thu, 27 Feb 2020 13:21:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E33C691CF; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E1A7C47C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:12 -0500 Message-Id: <1582838290-17243-565-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 564/622] lustre: llite: fetch default layout for a directory X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Jian Yu For a directory that does not have a trusted.lov xattr, the current "lfs getstripe" only prints the stripe_count, stripe_size, and stripe_index that are fetched from the /sys/fs/lustre/lov values. It doesn't show the actual default layout that will be used when new files are created in that directory. This patch fixes the above issue in ll_dir_getstripe_default() by fetching the layout from the root FID after ll_dir_get_default_layout() returns -ENODATA for a directory that does not have a trusted.lov xattr.
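The fallback order described above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the kernel code: struct dir, fetch_layout(), root_layout and getstripe_default() are hypothetical stand-ins for the inode, ll_dir_get_default_layout(), the layout stored on the root FID and ll_dir_getstripe_default(), and layouts are reduced to plain strings.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory inode; not a Lustre type. */
struct dir {
	bool is_root;            /* true for the filesystem root */
	const char *own_layout;  /* NULL models a missing trusted.lov xattr */
};

/* Hypothetical stand-in for the default layout kept on the root FID. */
static const char *root_layout = "root-default";

/* Models ll_dir_get_default_layout(): -ENODATA when no layout is stored. */
static int fetch_layout(const char *stored, const char **out)
{
	if (!stored)
		return -ENODATA;
	*out = stored;
	return 0;
}

/*
 * Models the fallback: ask the directory first and, only on -ENODATA
 * from a non-root directory, ask the root instead.
 */
static int getstripe_default(const struct dir *d, const char **out)
{
	int rc = fetch_layout(d->own_layout, out);

	if (rc == -ENODATA && !d->is_root)
		rc = fetch_layout(root_layout, out);
	return rc;
}
```

The sketch leaves out the LMV cases (OBD_MD_MEA and OBD_MD_DEFAULT_MEA), for which the patch does not fall back to the root, and it does not model the second ptlrpc request that callers have to release.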
WC-bug-id: https://jira.whamcloud.com/browse/LU-11656 Lustre-commit: 3e8fa8a7396c ("LU-11656 llite: fetch default layout for a directory") Signed-off-by: Jian Yu Reviewed-on: https://review.whamcloud.com/36609 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 102 ++++++++++++++++++++++++++++----- fs/lustre/llite/llite_internal.h | 9 ++- fs/lustre/llite/xattr.c | 7 ++- include/uapi/linux/lustre/lustre_fid.h | 7 +++ 4 files changed, 107 insertions(+), 18 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index c38862e..b1ec905 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -635,16 +635,10 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump, return rc; } -/** - * This function will be used to get default LOV/LMV/Default LMV - * @valid will be used to indicate which stripe it will retrieve - * OBD_MD_MEA LMV stripe EA - * OBD_MD_DEFAULT_MEA Default LMV stripe EA - * otherwise Default LOV EA. 
- * Each time, it can only retrieve 1 stripe EA - **/ -int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, - struct ptlrpc_request **request, u64 valid) +static int ll_dir_get_default_layout(struct inode *inode, void **plmm, + int *plmm_size, + struct ptlrpc_request **request, u64 valid, + enum get_default_layout_type type) { struct ll_sb_info *sbi = ll_i2sbi(inode); struct mdt_body *body; @@ -652,6 +646,7 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, struct ptlrpc_request *req = NULL; int rc, lmmsize; struct md_op_data *op_data; + struct lu_fid fid; rc = ll_get_max_mdsize(sbi, &lmmsize); if (rc) @@ -664,11 +659,19 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, return PTR_ERR(op_data); op_data->op_valid = valid | OBD_MD_FLEASIZE | OBD_MD_FLDIREA; + + if (type == GET_DEFAULT_LAYOUT_ROOT) { + lu_root_fid(&op_data->op_fid1); + fid = op_data->op_fid1; + } else { + fid = *ll_inode2fid(inode); + } + rc = md_getattr(sbi->ll_md_exp, op_data, &req); ll_finish_md_op_data(op_data); if (rc < 0) { CDEBUG(D_INFO, "md_getattr failed on inode " DFID ": rc %d\n", - PFID(ll_inode2fid(inode)), rc); + PFID(&fid), rc); goto out; } @@ -730,6 +733,70 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, return rc; } +/** + * This function will be used to get default LOV/LMV/Default LMV + * @valid will be used to indicate which stripe it will retrieve. + * If the directory does not have its own default layout, then the + * function will request the default layout from root FID. + * OBD_MD_MEA LMV stripe EA + * OBD_MD_DEFAULT_MEA Default LMV stripe EA + * otherwise Default LOV EA. 
+ * Each time, it can only retrieve 1 stripe EA + */ +int ll_dir_getstripe_default(struct inode *inode, void **plmm, int *plmm_size, + struct ptlrpc_request **request, + struct ptlrpc_request **root_request, + u64 valid) +{ + struct ptlrpc_request *req = NULL; + struct ptlrpc_request *root_req = NULL; + struct lov_mds_md *lmm = NULL; + int lmm_size = 0; + int rc = 0; + + rc = ll_dir_get_default_layout(inode, (void **)&lmm, &lmm_size, + &req, valid, 0); + if (rc == -ENODATA && !fid_is_root(ll_inode2fid(inode)) && + !(valid & (OBD_MD_MEA|OBD_MD_DEFAULT_MEA)) && root_request) + rc = ll_dir_get_default_layout(inode, (void **)&lmm, &lmm_size, + &root_req, valid, + GET_DEFAULT_LAYOUT_ROOT); + + *plmm = lmm; + *plmm_size = lmm_size; + *request = req; + if (root_request) + *root_request = root_req; + + return rc; +} + +/** + * This function will be used to get default LOV/LMV/Default LMV + * @valid will be used to indicate which stripe it will retrieve + * OBD_MD_MEA LMV stripe EA + * OBD_MD_DEFAULT_MEA Default LMV stripe EA + * otherwise Default LOV EA. 
+ * Each time, it can only retrieve 1 stripe EA + */ +int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, + struct ptlrpc_request **request, u64 valid) +{ + struct ptlrpc_request *req = NULL; + struct lov_mds_md *lmm = NULL; + int lmm_size = 0; + int rc = 0; + + rc = ll_dir_get_default_layout(inode, (void **)&lmm, &lmm_size, + &req, valid, 0); + + *plmm = lmm; + *plmm_size = lmm_size; + *request = req; + + return rc; +} + int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi, const struct lu_fid *fid) { struct md_op_data *op_data; @@ -1465,6 +1532,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct lmv_user_md __user *ulmv; struct lmv_user_md lum; struct ptlrpc_request *request = NULL; + struct ptlrpc_request *root_request = NULL; struct lmv_user_md *tmp = NULL; union lmv_mds_md *lmm = NULL; u64 valid = 0; @@ -1493,8 +1561,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) else return -EINVAL; - rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize, &request, - valid); + rc = ll_dir_getstripe_default(inode, (void **)&lmm, &lmmsize, + &request, &root_request, valid); if (rc) goto finish_req; @@ -1595,6 +1663,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) kfree(tmp); finish_req: ptlrpc_req_finished(request); + ptlrpc_req_finished(root_request); return rc; } case LL_IOC_RMFID: @@ -1611,6 +1680,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) case IOC_MDC_GETFILEINFO_OLD: case IOC_MDC_GETFILESTRIPE: { struct ptlrpc_request *request = NULL; + struct ptlrpc_request *root_request = NULL; struct lov_user_md __user *lump; struct lov_mds_md *lmm = NULL; struct mdt_body *body; @@ -1632,8 +1702,9 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) rc = ll_lov_getstripe_ea_info(inode, filename, &lmm, &lmmsize, &request); } else { - rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize, 
- &request, 0); + rc = ll_dir_getstripe_default(inode, (void **)&lmm, + &lmmsize, &request, + &root_request, 0); } if (request) { @@ -1786,6 +1857,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) out_req: ptlrpc_req_finished(request); + ptlrpc_req_finished(root_request); if (filename) ll_putname(filename); return rc; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index fe9d568..def4df0 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -841,6 +841,10 @@ struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data, u64 offset); void ll_release_page(struct inode *inode, struct page *page, bool remove); +enum get_default_layout_type { + GET_DEFAULT_LAYOUT_ROOT = 1, +}; + /* llite/namei.c */ extern const struct inode_operations ll_special_inode_operations; @@ -911,7 +915,10 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename, struct ptlrpc_request **request); int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump, int set_default); -int ll_dir_getstripe(struct inode *inode, void **lmmp, int *lmm_size, +int ll_dir_getstripe_default(struct inode *inode, void **lmmp, + int *lmm_size, struct ptlrpc_request **request, + struct ptlrpc_request **root_request, u64 valid); +int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, struct ptlrpc_request **request, u64 valid); int ll_fsync(struct file *file, loff_t start, loff_t end, int data); int ll_merge_attr(const struct lu_env *env, struct inode *inode); diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index 7134f10..e76d2c3 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -522,11 +522,12 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) return rc; } else if (S_ISDIR(inode->i_mode)) { struct ptlrpc_request *req = NULL; + struct ptlrpc_request *root_req = NULL; struct lov_mds_md *lmm = NULL; 
int lmm_size = 0; - rc = ll_dir_getstripe(inode, (void **)&lmm, &lmm_size, - &req, 0); + rc = ll_dir_getstripe_default(inode, (void **)&lmm, &lmm_size, + &req, &root_req, 0); if (rc < 0) goto out_req; @@ -545,6 +546,8 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size) out_req: if (req) ptlrpc_req_finished(req); + if (root_req) + ptlrpc_req_finished(root_req); return rc; } else { diff --git a/include/uapi/linux/lustre/lustre_fid.h b/include/uapi/linux/lustre/lustre_fid.h index 79574c0..d6e59cc 100644 --- a/include/uapi/linux/lustre/lustre_fid.h +++ b/include/uapi/linux/lustre/lustre_fid.h @@ -135,6 +135,13 @@ static inline bool fid_is_mdt0(const struct lu_fid *fid) return fid_seq_is_mdt0(fid_seq(fid)); } +static inline void lu_root_fid(struct lu_fid *fid) +{ + fid->f_seq = FID_SEQ_ROOT; + fid->f_oid = FID_OID_ROOT; + fid->f_ver = 0; +} + /** * Check if a fid is igif or not. * From patchwork Thu Feb 27 21:17:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410695 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 423DE924 for ; Thu, 27 Feb 2020 21:44:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2987124690 for ; Thu, 27 Feb 2020 21:44:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2987124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) 
with ESMTP id E2B9634AE7D; Thu, 27 Feb 2020 13:35:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A4BA021FD2E for ; Thu, 27 Feb 2020 13:21:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E5A1591D0; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E459B46D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:13 -0500 Message-Id: <1582838290-17243-566-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 565/622] lnet: fix rspt counter X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov rsp entries must be freed via the lnet_rspt_free() function to avoid a counter leak. Also handle a failed (NULL) allocation properly.
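The pairing this patch enforces can be illustrated with a small user-space model. It is a sketch, not the LNet code: rst_alloc, struct rsp_tracker, rspt_alloc() and rspt_free() are hypothetical names, locking is left out, and it assumes the free helper decrements the same counter the allocator increments.

```c
#include <stdlib.h>

/* Hypothetical counter standing in for lch_rst_alloc; not the kernel field. */
static int rst_alloc;

struct rsp_tracker {
	int deadline;	/* placeholder payload */
};

/* Count the allocation only when it actually succeeded. */
static struct rsp_tracker *rspt_alloc(void)
{
	struct rsp_tracker *rspt = calloc(1, sizeof(*rspt));

	if (rspt)
		rst_alloc++;
	return rspt;
}

/*
 * Every release goes through this helper so the counter drops exactly
 * once per tracker. The NULL guard is an assumption of this sketch.
 */
static void rspt_free(struct rsp_tracker *rspt)
{
	if (!rspt)
		return;
	free(rspt);
	rst_alloc--;
}
```

Releasing a tracker with a bare free() instead of rspt_free() would leave rst_alloc permanently too high, which is the kind of leak the kfree() to lnet_rspt_free() conversions address.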
Cray-bug-id: LUS-8189 WC-bug-id: https://jira.whamcloud.com/browse/LU-12991 Lustre-commit: 027a4722b26d ("LU-12991 lnet: fix rspt counter") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/36895 Reviewed-by: Alexandr Boyko Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 8 +++++--- net/lnet/lnet/lib-move.c | 6 +++--- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 56556fd..3b597e3 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -438,9 +438,11 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, struct lnet_rsp_tracker *rspt; rspt = kzalloc(sizeof(*rspt), GFP_NOFS); - lnet_net_lock(cpt); - the_lnet.ln_counters[cpt]->lct_health.lch_rst_alloc++; - lnet_net_unlock(cpt); + if (rspt) { + lnet_net_lock(cpt); + the_lnet.ln_counters[cpt]->lct_health.lch_rst_alloc++; + lnet_net_unlock(cpt); + } return rspt; } diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index da73009..73f9d20 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -4390,7 +4390,7 @@ void lnet_monitor_thr_stop(void) /* we already have an rspt attached to the md, so we'll * update the deadline on that one. 
*/ - kfree(rspt); + lnet_rspt_free(rspt, cpt); new_entry = false; } else { /* new md */ @@ -4511,7 +4511,7 @@ void lnet_monitor_thr_stop(void) md->md_me->me_portal); lnet_res_unlock(cpt); - kfree(rspt); + lnet_rspt_free(rspt, cpt); kfree(msg); return -ENOENT; } @@ -4745,7 +4745,7 @@ struct lnet_msg * lnet_res_unlock(cpt); kfree(msg); - kfree(rspt); + lnet_rspt_free(rspt, cpt); return -ENOENT; } From patchwork Thu Feb 27 21:17:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410615 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5533C17E0 for ; Thu, 27 Feb 2020 21:42:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3DEC7246A1 for ; Thu, 27 Feb 2020 21:42:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3DEC7246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EEE7234ABC4; Thu, 27 Feb 2020 13:34:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E75FA34897A for ; Thu, 27 Feb 2020 13:21:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id E85AF91D1; Thu, 27 Feb 2020 16:18:19 -0500 
(EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E71D446A; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:14 -0500 Message-Id: <1582838290-17243-567-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 566/622] lustre: ldlm: add a counter to the per-namespace data X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown When we change the resource hash to rhashtable we won't have a per-bucket counter. We could use the nelems global counter, but ldlm_resource goes to some trouble to avoid having any table-wide atomics, and hopefully rhashtable will grow the ability to disable the global counter in the near future. Having a counter we control makes it easier to manage the back-reference to the namespace when there is anything in the hash table. So add a counter to the ldlm_ns_bucket. 
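How such a per-bucket counter can drive the namespace back-reference is sketched below with C11 atomics. This is a user-space model under stated assumptions: struct ns_model, struct bucket_model and the bucket_*() helpers are hypothetical, and the namespace reference count is a plain int because the model is single-threaded.

```c
#include <stdatomic.h>

/* Hypothetical user-space model; the names only mirror the patch. */
struct ns_model {
	int refcount;		/* back-references held on behalf of buckets */
};

struct bucket_model {
	struct ns_model *ns;
	atomic_int count;	/* models nsb_count */
};

static void bucket_init(struct bucket_model *b, struct ns_model *ns)
{
	b->ns = ns;
	atomic_init(&b->count, 0);
}

/* The first resource added to a bucket takes a namespace reference. */
static void bucket_add(struct bucket_model *b)
{
	/* atomic_fetch_add() returns the old value: 0 means 0 -> 1. */
	if (atomic_fetch_add(&b->count, 1) == 0)
		b->ns->refcount++;
}

/* The last resource leaving a bucket drops that reference again. */
static void bucket_del(struct bucket_model *b)
{
	/* An old value of 1 means the counter just reached zero. */
	if (atomic_fetch_sub(&b->count, 1) == 1)
		b->ns->refcount--;
}
```

The two comparisons correspond to the kernel's atomic_inc_return() == 1 and atomic_dec_and_test() idioms; the C11 calls return the previous value, so the tests are shifted by one.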
WC-bug-id: https://jira.whamcloud.com/browse/LU-8130 Lustre-commit: f9314d6e9259e6c7 ("LU-8130 ldlm: add a counter to the per-namespace data") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/36219 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 2 ++ fs/lustre/ldlm/ldlm_resource.c | 10 +++++----- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index cc4b8b0..9ca79f4 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -306,6 +306,8 @@ struct ldlm_ns_bucket { * fact the network or overall system load is at fault */ struct adaptive_timeout nsb_at_estimate; + /* counter of entries in this bucket */ + atomic_t nsb_count; }; enum { diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index 65ff32c..d009d5d 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -133,12 +133,11 @@ static ssize_t resource_count_show(struct kobject *kobj, struct attribute *attr, struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace, ns_kobj); u64 res = 0; - struct cfs_hash_bd bd; int i; /* result is not strictly consistent */ - cfs_hash_for_each_bucket(ns->ns_rs_hash, &bd, i) - res += cfs_hash_bd_count_get(&bd); + for (i = 0; i < (1 << ns->ns_bucket_bits); i++) + res += atomic_read(&ns->ns_rs_buckets[i].nsb_count); return sprintf(buf, "%lld\n", res); } LUSTRE_RO_ATTR(resource_count); @@ -647,6 +646,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, at_init(&nsb->nsb_at_estimate, ldlm_enqueue_min, 0); nsb->nsb_namespace = ns; + atomic_set(&nsb->nsb_count, 0); } ns->ns_obd = obd; @@ -1126,7 +1126,7 @@ struct ldlm_resource * } /* We won! Let's add the resource. 
*/ cfs_hash_bd_add_locked(ns->ns_rs_hash, &bd, &res->lr_hash); - if (cfs_hash_bd_count_get(&bd) == 1) + if (atomic_inc_return(&res->lr_ns_bucket->nsb_count) == 1) ns_refcount = ldlm_namespace_get_return(ns); cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 1); @@ -1170,7 +1170,7 @@ static void __ldlm_resource_putref_final(struct cfs_hash_bd *bd, cfs_hash_bd_unlock(ns->ns_rs_hash, bd, 1); if (ns->ns_lvbo && ns->ns_lvbo->lvbo_free) ns->ns_lvbo->lvbo_free(res); - if (cfs_hash_bd_count_get(bd) == 0) + if (atomic_dec_and_test(&nsb->nsb_count)) ldlm_namespace_put(ns); if (res->lr_itree) kmem_cache_free(ldlm_interval_tree_slab, res->lr_itree); From patchwork Thu Feb 27 21:17:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410699 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0CD6F138D for ; Thu, 27 Feb 2020 21:44:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E9B3D24690 for ; Thu, 27 Feb 2020 21:44:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E9B3D24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9AA5934AEB7; Thu, 27 Feb 2020 13:35:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id 4AF89348984 for ; Thu, 27 Feb 2020 13:21:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EB43891D2; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E9E7C46C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:15 -0500 Message-Id: <1582838290-17243-568-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 567/622] lnet: Add peer level aliveness information X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Keep track of the aliveness of a peer so that we can optimize for situations where an LNet router hasn't responded to a ping. In this situation we consider all routes down, and we needn't spend time inspecting each route, or inspecting all of the router's local and remote interfaces in order to determine the router's aliveness. 
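The short-circuit can be sketched in user space as follows. This is a simplified model, not the LNet code: struct gw_model and gw_is_alive() are hypothetical, the gateway is reduced to a single network with a flat array of interface states, and ni_checked exists only to make the saved work visible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a gateway peer; not the LNet structures. */
struct gw_model {
	bool alive;		/* cached peer-level aliveness (models lp_alive) */
	const bool *ni_alive;	/* per-interface status */
	size_t ni_count;
	size_t ni_checked;	/* interfaces inspected by the last check */
};

static bool gw_is_alive(struct gw_model *gw)
{
	size_t i;

	gw->ni_checked = 0;

	/* Fast path: the router did not answer the ping, skip the scan. */
	if (!gw->alive)
		return false;

	/* Slow path: the network is alive if at least one interface is up. */
	for (i = 0; i < gw->ni_count; i++) {
		gw->ni_checked++;
		if (gw->ni_alive[i])
			return true;
	}
	return false;
}
```

With the cached flag cleared the function returns without touching any interface, which is the saving the commit message describes; the patch additionally requires every peer net of the gateway to be alive, which this one-network model does not show.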
Cray-bug-id: LUS-7860 WC-bug-id: https://jira.whamcloud.com/browse/LU-12941 Lustre-commit: ebc9835a971f ("LU-12941 lnet: Add peer level aliveness information") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36678 Reviewed-by: Neil Brown Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 3 +++ net/lnet/lnet/peer.c | 4 ++++ net/lnet/lnet/router.c | 52 ++++++++++++++++++++++++------------------ 3 files changed, 37 insertions(+), 22 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index e105308..02ac5df 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -672,6 +672,9 @@ struct lnet_peer { /* tasks waiting on discovery of this peer */ wait_queue_head_t lp_dc_waitq; + + /* cached peer aliveness */ + bool lp_alive; }; /* diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 4f0da4b..b168c97 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -216,6 +216,10 @@ init_waitqueue_head(&lp->lp_dc_waitq); spin_lock_init(&lp->lp_lock); lp->lp_primary_nid = nid; + if (lnet_peers_start_down()) + lp->lp_alive = false; + else + lp->lp_alive = true; /* all peers created on a router should have health on * if it's not already on. diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index b8f7aba0..7ba406a 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -179,7 +179,9 @@ static int rtr_sensitivity_set(const char *val, return check_routers_before_use; } -/* A net is alive if at least one gateway NI on the network is alive. */ +/* The peer_net of a gateway is alive if at least one of the peer_ni's on + * that peer_net is alive. 
+ */ static bool lnet_is_gateway_net_alive(struct lnet_peer_net *lpn) { @@ -200,6 +202,9 @@ bool lnet_is_gateway_alive(struct lnet_peer *gw) { struct lnet_peer_net *lpn; + if (!gw->lp_alive) + return false; + list_for_each_entry(lpn, &gw->lp_peer_nets, lpn_peer_nets) { if (!lnet_is_gateway_net_alive(lpn)) return false; @@ -219,7 +224,10 @@ bool lnet_is_route_alive(struct lnet_route *route) struct lnet_peer *gw = route->lr_gateway; struct lnet_peer_net *llpn; struct lnet_peer_net *rlpn; - bool route_alive; + + /* If the gateway is down then all routes are considered down */ + if (!gw->lp_alive) + return false; /* if discovery is disabled then rely on the cached aliveness * information. This is handicapped information which we log when @@ -230,36 +238,34 @@ bool lnet_is_route_alive(struct lnet_route *route) if (lnet_is_discovery_disabled(gw)) return route->lr_alive; - /* check the gateway's interfaces on the route rnet to make sure - * that the gateway is viable. - */ + /* check the gateway's interfaces on the local network */ llpn = lnet_peer_get_net_locked(gw, route->lr_lnet); if (!llpn) return false; - route_alive = lnet_is_gateway_net_alive(llpn); + if (!lnet_is_gateway_net_alive(llpn)) + return false; if (avoid_asym_router_failure) { + /* Check the gateway's interfaces on the remote network */ rlpn = lnet_peer_get_net_locked(gw, route->lr_net); if (!rlpn) return false; - route_alive = route_alive && - lnet_is_gateway_net_alive(rlpn); + if (!lnet_is_gateway_net_alive(rlpn)) + return false; } - if (!route_alive) - return route_alive; - spin_lock(&gw->lp_lock); if (!(gw->lp_state & LNET_PEER_ROUTER_ENABLED)) { + spin_unlock(&gw->lp_lock); if (gw->lp_rtr_refcount > 0) CERROR("peer %s is being used as a gateway but routing feature is not turned on\n", libcfs_nid2str(gw->lp_primary_nid)); - route_alive = false; + return false; } spin_unlock(&gw->lp_lock); - return route_alive; + return true; } void @@ -409,21 +415,22 @@ bool lnet_is_route_alive(struct lnet_route 
*route) spin_lock(&lp->lp_lock); lp->lp_state &= ~LNET_PEER_RTR_DISCOVERY; lp->lp_state |= LNET_PEER_RTR_DISCOVERED; + lp->lp_alive = lp->lp_dc_error == 0; spin_unlock(&lp->lp_lock); /* Router discovery successful? All peer information would've been * updated already. No need to do any more processing */ - if (!lp->lp_dc_error) + if (lp->lp_alive) return; - /* discovery failed? then we need to set the status of each lpni - * to DOWN. It will be updated the next time we discover the - * router. For router peer NIs not on local networks, we never send - * messages directly to them, so their health will always remain - * at maximum. We can only tell if they are up or down from the - * status returned in the PING response. If we fail to get that - * status in our scheduled router discovery, then we'll assume - * it's down until we're told otherwise. + + /* We do not send messages directly to the remote interfaces + * of an LNet router. As such, we rely on the PING response + * to determine the up/down status of these interfaces. If + * a PING response is not receieved, or some other problem with + * discovery occurs that prevents us from getting this status, + * we assume all interfaces are down until we're able to + * determine otherwise. 
*/ CDEBUG(D_NET, "%s: Router discovery failed %d\n", libcfs_nid2str(lp->lp_primary_nid), lp->lp_dc_error); @@ -1629,6 +1636,7 @@ bool lnet_router_checker_active(void) lnet_peer_ni_decref_locked(lpni); if (lpni && lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer) { lp = lpni->lpni_peer_net->lpn_peer; + lp->lp_alive = alive; list_for_each_entry(route, &lp->lp_routes, lr_gwlist) lnet_set_route_aliveness(route, alive); } From patchwork Thu Feb 27 21:17:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410861 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A476924 for ; Thu, 27 Feb 2020 21:48:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 32D6324690 for ; Thu, 27 Feb 2020 21:48:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 32D6324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 904F434A4C5; Thu, 27 Feb 2020 13:39:07 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A1EA434898A for ; Thu, 27 Feb 2020 13:21:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id EDBEA91D3; Thu, 27 
Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EC9EF468; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:16 -0500 Message-Id: <1582838290-17243-569-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 568/622] lnet: always check return of try_module_get() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown try_module_get() can fail, so the return value should be checked. If we *know* that we already hold a reference, __module_get() should be used instead. 
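The distinction described above can be modeled in userspace. What follows is a minimal sketch, not kernel code: struct fake_module, fake_try_get(), fake_get() and fake_startup() are hypothetical stand-ins for a module, try_module_get(), __module_get() and a startup routine, and the "going" flag only loosely models a module whose unload has begun.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a module's reference state. */
struct fake_module {
	int refcnt;	/* references currently held */
	bool going;	/* true once unload has started */
};

/* Models try_module_get(): it can fail, so callers must check the result. */
bool fake_try_get(struct fake_module *m)
{
	assert(m);
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models __module_get(): only valid when a reference is already held. */
void fake_get(struct fake_module *m)
{
	assert(m && m->refcnt > 0);
	m->refcnt++;
}

/* Startup path in the style of the patch: bail out if the get fails. */
int fake_startup(struct fake_module *m)
{
	if (!fake_try_get(m))
		return -1;	/* the patch jumps to its "failed" label here */
	/* ... initialisation that relies on the reference ... */
	return 0;
}
```

In this model a caller that ignores the result of fake_try_get() would carry on without holding a reference, which is the situation the patch guards against.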
WC-bug-id: https://jira.whamcloud.com/browse/LU-9679 Lustre-commit: a1282a0d8a53 ("LU-9679 lnet: always check return of try_module_get()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36854 Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 4 +++- net/lnet/klnds/socklnd/socklnd.c | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 37d8235..f6db2c7 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2693,7 +2693,9 @@ static int kiblnd_base_startup(struct net *ns) LASSERT(kiblnd_data.kib_init == IBLND_INIT_NOTHING); - try_module_get(THIS_MODULE); + if (!try_module_get(THIS_MODULE)) + goto failed; + /* zero pointers, flags etc */ memset(&kiblnd_data, 0, sizeof(kiblnd_data)); diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 593c205..9a19a3f 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -2357,7 +2357,8 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) /* flag lists/ptrs/locks initialised */ ksocknal_data.ksnd_init = SOCKNAL_INIT_DATA; - try_module_get(THIS_MODULE); + if (!try_module_get(THIS_MODULE)) + goto failed; /* Create a scheduler block per available CPT */ ksocknal_data.ksnd_schedulers = cfs_percpt_alloc(lnet_cpt_table(), From patchwork Thu Feb 27 21:17:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410803 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 145F21580 for ; Thu, 27 Feb 2020 21:47:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) 
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F129324690 for ; Thu, 27 Feb 2020 21:47:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F129324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CBBAA34A1EA; Thu, 27 Feb 2020 13:37:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E786B34898F for ; Thu, 27 Feb 2020 13:21:15 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F07AB91D4; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EF5EA47C; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:17 -0500 Message-Id: <1582838290-17243-570-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 569/622] lustre: obdclass: don't skip records for wrapped catalog X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko osp_sync_thread() uses opd_sync_last_catalog_idx as the starting point for catalog processing. llog_cat_process_cb() also uses it to skip records during processing. When the catalog has wrapped, processing starts from the second part of the catalog and then moves on to the first part, so the first part is skipped in llog_cat_process_cb() based on lpd_startcat. osp_sync_thread() restarts its processing loop from opd_sync_last_catalog_idx. For a wrapped catalog it increments the last index, and llog_process_thread() increments it once more. As a result, some catalog records are skipped and never processed. The patch fixes these issues. It also adds sanity tests 135 and 136 as regression tests. WC-bug-id: https://jira.whamcloud.com/browse/LU-13069 Lustre-commit: cc1092291932 ("LU-13069 obdclass: don't skip records for wrapped catalog") Signed-off-by: Alexander Boyko Cray-bug-id: LUS-8053,LUS-8236 Reviewed-on: https://review.whamcloud.com/36996 Reviewed-by: Andriy Skulysh Reviewed-by: Alexander Zarochentsev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 2 ++ fs/lustre/obdclass/llog.c | 9 +++++++++ fs/lustre/obdclass/llog_cat.c | 1 + 3 files changed, 12 insertions(+) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 5969b6b..a26ac76 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -447,6 +447,8 @@ /* was OBD_FAIL_LLOG_CATINFO_NET 0x1309 until 2.3 */ #define OBD_FAIL_MDS_SYNC_CAPA_SL 0x1310 #define OBD_FAIL_SEQ_ALLOC 0x1311 +#define OBD_FAIL_PLAIN_RECORDS 0x1319 +#define OBD_FAIL_CATALOG_FULL_CHECK 0x131a #define OBD_FAIL_LLITE 0x1400 #define OBD_FAIL_LLITE_FAULT_TRUNC_RACE 0x1401 diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c index
4e9fd17..620ebc6 100644 --- a/fs/lustre/obdclass/llog.c +++ b/fs/lustre/obdclass/llog.c @@ -453,6 +453,8 @@ int llog_process_or_fork(const struct lu_env *env, llog_cb_t cb, void *data, void *catdata, bool fork) { struct llog_process_info *lpi; + struct llog_process_data *d = data; + struct llog_process_cat_data *cd = catdata; int rc; lpi = kzalloc(sizeof(*lpi), GFP_KERNEL); @@ -463,6 +465,13 @@ int llog_process_or_fork(const struct lu_env *env, lpi->lpi_cbdata = data; lpi->lpi_catdata = catdata; + CDEBUG(D_OTHER, + "Processing " DFID " flags 0x%03x startcat %d startidx %d first_idx %d last_idx %d\n", + PFID(&loghandle->lgh_id.lgl_oi.oi_fid), + loghandle->lgh_hdr->llh_flags, d ? d->lpd_startcat : -1, + d ? d->lpd_startidx : -1, cd ? cd->lpcd_first_idx : -1, + cd ? cd->lpcd_last_idx : -1); + if (fork) { struct task_struct *task; diff --git a/fs/lustre/obdclass/llog_cat.c b/fs/lustre/obdclass/llog_cat.c index 30b0ac5..75226f4 100644 --- a/fs/lustre/obdclass/llog_cat.c +++ b/fs/lustre/obdclass/llog_cat.c @@ -244,6 +244,7 @@ static int llog_cat_process_or_fork(const struct lu_env *env, * catalog bottom. 
*/ startcat = 0; + d.lpd_startcat = 0; if (rc != 0) return rc; } From patchwork Thu Feb 27 21:17:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410703 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 06F14924 for ; Thu, 27 Feb 2020 21:44:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E30BD24690 for ; Thu, 27 Feb 2020 21:44:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E30BD24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7B54234AED9; Thu, 27 Feb 2020 13:35:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 48983348995 for ; Thu, 27 Feb 2020 13:21:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id F383E91D5; Thu, 27 Feb 2020 16:18:19 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F20CB46D; Thu, 27 Feb 2020 16:18:19 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:18 -0500 Message-Id: <1582838290-17243-571-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 570/622] lnet: Refactor lnet_find_best_lpni_on_net X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Replace lnet_send_data argument. WC-bug-id: https://jira.whamcloud.com/browse/LU-12756 Lustre-commit: 80edb2ad72ba ("LU-12756 lnet: Refactor lnet_find_best_lpni_on_net") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36534 Reviewed-by: Alexandr Boyko Reviewed-by: Alexey Lyashkov Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 73f9d20..c8266f0 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1247,8 +1247,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, /* Prerequisite: the best_ni should already be set in the sd */ static inline struct lnet_peer_ni * -lnet_find_best_lpni_on_net(struct lnet_send_data *sd, struct lnet_peer *peer, - u32 net_id) +lnet_find_best_lpni_on_net(struct lnet_ni *lni, lnet_nid_t dst_nid, + struct lnet_peer *peer, u32 net_id) { struct lnet_peer_net *peer_net; @@ -1264,8 +1264,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return NULL; } - return lnet_select_peer_ni(sd->sd_best_ni, sd->sd_dst_nid, - peer, peer_net); + return lnet_select_peer_ni(lni, dst_nid, peer, peer_net); } static int @@ -1278,13 +1277,12 @@ void lnet_usr_translate_stats(struct 
lnet_ioctl_element_msg_stats *msg_stats, struct lnet_peer *lp2 = r2->lr_gateway; struct lnet_peer_ni *lpni1; struct lnet_peer_ni *lpni2; - struct lnet_send_data sd; int rc; - sd.sd_best_ni = NULL; - sd.sd_dst_nid = LNET_NID_ANY; - lpni1 = lnet_find_best_lpni_on_net(&sd, lp1, r1->lr_lnet); - lpni2 = lnet_find_best_lpni_on_net(&sd, lp2, r2->lr_lnet); + lpni1 = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, lp1, + r1->lr_lnet); + lpni2 = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, lp2, + r2->lr_lnet); LASSERT(lpni1 && lpni2); if (r1->lr_priority < r2->lr_priority) { @@ -1878,7 +1876,9 @@ struct lnet_ni * return -EHOSTUNREACH; } - sd->sd_best_lpni = lnet_find_best_lpni_on_net(sd, lp, + sd->sd_best_lpni = lnet_find_best_lpni_on_net(sd->sd_best_ni, + sd->sd_dst_nid, + lp, best_lpn->lpn_net_id); if (!sd->sd_best_lpni) { CERROR("peer %s down\n", @@ -2191,7 +2191,8 @@ struct lnet_ni * lnet_msg_discovery(sd->sd_msg)); if (sd->sd_best_ni) { sd->sd_best_lpni = - lnet_find_best_lpni_on_net(sd, sd->sd_peer, + lnet_find_best_lpni_on_net(sd->sd_best_ni, sd->sd_dst_nid, + sd->sd_peer, sd->sd_best_ni->ni_net->net_id); /* if we're successful in selecting a peer_ni on the local From patchwork Thu Feb 27 21:17:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410519 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 67F81138D for ; Thu, 27 Feb 2020 21:40:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 50A3324690 for ; Thu, 27 Feb 2020 21:40:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 50A3324690 Authentication-Results: mail.kernel.org; 
dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EA811349B47; Thu, 27 Feb 2020 13:32:50 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8BB24348995 for ; Thu, 27 Feb 2020 13:21:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 028E891D6; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 00BE946A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:19 -0500 Message-Id: <1582838290-17243-572-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 571/622] lnet: Avoid comparing route to itself X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The first iteration of the route selection loop compares the first route in the list with itself. 
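The loop shape this describes can be sketched in plain C. This is an illustration only, not LNet code: struct fake_route, route_better(), pick_best() and the compare_calls counter are hypothetical names, and treating a lower priority value as preferred is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a route; only a priority is modeled. */
struct fake_route {
	int priority;	/* assumption: a lower value is preferred */
};

int compare_calls;	/* counts comparisons, for illustration only */

/* Returns nonzero if a should be preferred over b. */
int route_better(const struct fake_route *a, const struct fake_route *b)
{
	compare_calls++;
	return a->priority < b->priority;
}

/* Picks the preferred route, or NULL if the array is empty. */
const struct fake_route *pick_best(const struct fake_route *routes, size_t n)
{
	const struct fake_route *best = NULL;
	size_t i;

	assert(routes || n == 0);
	for (i = 0; i < n; i++) {
		if (!best) {
			/* First candidate: seed the best and skip the compare. */
			best = &routes[i];
			continue;
		}
		if (route_better(&routes[i], best))
			best = &routes[i];
	}
	return best;
}
```

Without the continue, the first pass would fall through and compare the seed entry with itself, which costs one comparison and can never change the result.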
WC-bug-id: https://jira.whamcloud.com/browse/LU-12756 Lustre-commit: 2b8d9d12d182 ("LU-12756 lnet: Avoid comparing route to itself") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36535 Reviewed-by: Alexandr Boyko Reviewed-by: Alexey Lyashkov Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index c8266f0..45975d6 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1354,6 +1354,12 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, best_route = route; last_route = route; lp_best = lp; + best_gw_ni = lnet_find_best_lpni_on_net(NULL, + LNET_NID_ANY, + route->lr_gateway, + route->lr_lnet); + LASSERT(best_gw_ni); + continue; } /* no protection on below fields, but it's harmless */ From patchwork Thu Feb 27 21:17:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410813 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D0B50924 for ; Thu, 27 Feb 2020 21:47:30 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B92E424690 for ; Thu, 27 Feb 2020 21:47:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B92E424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id 6DCD634B47C; Thu, 27 Feb 2020 13:37:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CDE4934899B for ; Thu, 27 Feb 2020 13:21:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 04EA791D7; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 038F8496; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:20 -0500 Message-Id: <1582838290-17243-573-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 572/622] lustre: sysfs: use string helper like functions for sysfs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" For a very long time the Linux kernel has supported the function memparse() that allowed the passing in of memory sizes with the suffix set of K, M, G, T, P, E. Lustre adopted this approach with its proc / sysfs implementation. The difference being that lustre expanded this functionality to allow sizes with a fractional component, such as 1.5G for example. The code used to parse for the numerical value is heavily tied into the debugfs seq_file handling and stomps on the passed in buffer which you can't do with sysfs files. 
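To make the fractional-suffix behaviour concrete, here is a minimal userspace sketch. It is neither memparse() nor the Lustre parser: parse_frac_size() is a hypothetical helper that accepts a decimal number with up to three fractional digits and one optional uppercase suffix from K, M, G, T, P, E, treats the suffixes as powers of 1024, and never writes to its input string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: parse "<digits>[.<digits>][K|M|G|T|P|E]" into bytes.
 * Returns 0 on success, -1 on malformed input or 64-bit overflow.
 */
int parse_frac_size(const char *s, uint64_t *out)
{
	static const char suffixes[] = "KMGTPE";
	uint64_t whole = 0, frac = 0, mult = 1, frac_part;
	int frac_digits = 0;
	const char *p;

	assert(s && out);

	/* whole part: at least one digit is required */
	if (!isdigit((unsigned char)*s))
		return -1;
	for (; isdigit((unsigned char)*s); s++) {
		if (whole > (UINT64_MAX - 9) / 10)
			return -1;
		whole = whole * 10 + (uint64_t)(*s - '0');
	}

	/* optional fraction: keep three digits, ignore any further ones */
	if (*s == '.') {
		s++;
		if (!isdigit((unsigned char)*s))
			return -1;
		for (; isdigit((unsigned char)*s); s++) {
			if (frac_digits < 3) {
				frac = frac * 10 + (uint64_t)(*s - '0');
				frac_digits++;
			}
		}
	}
	/* scale the fraction to thousandths, e.g. ".5" becomes 500 */
	for (; frac_digits < 3; frac_digits++)
		frac *= 10;

	/* optional single-letter suffix: K is 1024^1, M is 1024^2, ... */
	if (*s != '\0') {
		p = strchr(suffixes, *s);
		if (!p || s[1] != '\0')
			return -1;
		mult <<= 10 * (unsigned int)(p - suffixes + 1);
	}

	/* a fraction of a single byte makes no sense */
	if (mult == 1 && frac != 0)
		return -1;
	if (whole > UINT64_MAX / mult)
		return -1;

	/* floor(frac * mult / 1000), split so the product cannot overflow */
	frac_part = (mult / 1000) * frac + (mult % 1000) * frac / 1000;
	if (whole * mult > UINT64_MAX - frac_part)
		return -1;

	*out = whole * mult + frac_part;
	return 0;
}
```

Under these rules "1.5G" parses to 1610612736 bytes, and because the helper takes a const pointer it can be applied directly to a buffer it does not own.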
Similar functionality to what Lustre does today exists in newer Linux kernels in the form of string helpers. Currently the string helpers only convert a numerical value to a human-readable format. A new function, string_to_size(), was created that takes a string and turns it into a numerical value. This enables the use of string helper suffixes (e.g. MiB, kB) with the Lustre tunables, and we can now support base-10 units (e.g. MB, kB) as well. String helper suffixes are already used for debugfs files, so I expect this to be adopted over time, and the use of string_to_size() should be encouraged for newer Lustre sysfs files. At the same time we want to preserve the original behavior of using the suffix set of K, M, G, T, P, E. To do this we create the function sysfs_memparse(), which supports the new string helper suffixes as well as the older set of suffixes. The new code is also much simpler than the current code. WC-bug-id: https://jira.whamcloud.com/browse/LU-9091 Lustre-commit: d9e0c9f346d0 ("LU-9091 sysfs: use string helper like functions for sysfs") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/35658 Reviewed-by: Shaun Tancheff Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin --- fs/lustre/include/lprocfs_status.h | 4 + fs/lustre/lov/lproc_lov.c | 4 +- fs/lustre/mdc/lproc_mdc.c | 27 +++--- fs/lustre/obdclass/class_obd.c | 61 ++++++++++++ fs/lustre/obdclass/lprocfs_status.c | 179 ++++++++++++++++++++++++++++++++++++ fs/lustre/osc/lproc_osc.c | 27 ++---- 6 files changed, 271 insertions(+), 31 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index ac62560..22d7741 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -484,6 +485,9 @@ int lprocfs_write_u64_helper(const char __user *buffer, int lprocfs_write_frac_u64_helper(const char __user *buffer, unsigned
long count, u64 *val, int mult); +int string_to_size(u64 *size, const char *buffer, size_t count); +int sysfs_memparse(const char *buffer, size_t count, u64 *val, + const char *defunit); char *lprocfs_find_named_value(const char *buffer, const char *name, size_t *count); void lprocfs_oh_tally(struct obd_histogram *oh, unsigned int value); diff --git a/fs/lustre/lov/lproc_lov.c b/fs/lustre/lov/lproc_lov.c index c528a8b..37ef084 100644 --- a/fs/lustre/lov/lproc_lov.c +++ b/fs/lustre/lov/lproc_lov.c @@ -57,8 +57,8 @@ static ssize_t stripesize_store(struct kobject *kobj, struct attribute *attr, u64 val; int rc; - rc = kstrtoull(buf, 10, &val); - if (rc) + rc = sysfs_memparse(buf, count, &val, "B"); + if (rc < 0) return rc; lov_fix_desc_stripe_size(&val); diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 454b69d..c438198 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -61,12 +61,19 @@ static ssize_t mdc_max_dirty_mb_seq_write(struct file *file, struct seq_file *sfl = file->private_data; struct obd_device *dev = sfl->private; struct client_obd *cli = &dev->u.cli; - __s64 pages_number; + char kernbuf[22] = ""; + u64 pages_number; int rc; - rc = lprocfs_write_frac_u64_helper(buffer, count, &pages_number, - 1 << (20 - PAGE_SHIFT)); - if (rc) + if (count >= sizeof(kernbuf)) + return -EINVAL; + + if (copy_from_user(kernbuf, buffer, count)) + return -EFAULT; + kernbuf[count] = 0; + + rc = sysfs_memparse(kernbuf, count, &pages_number, "MiB"); + if (rc < 0) return rc; /* MB -> pages */ @@ -111,6 +118,7 @@ static int mdc_cached_mb_seq_show(struct seq_file *m, void *v) struct obd_device *dev = sfl->private; struct client_obd *cli = &dev->u.cli; u64 pages_number; + const char *tmp; long rc; char kernbuf[128]; @@ -121,18 +129,13 @@ static int mdc_cached_mb_seq_show(struct seq_file *m, void *v) return -EFAULT; kernbuf[count] = 0; - buffer += lprocfs_find_named_value(kernbuf, "used_mb:", &count) - - kernbuf; - rc = 
lprocfs_write_frac_u64_helper(buffer, count, &pages_number, - 1 << (20 - PAGE_SHIFT)); - if (rc) + tmp = lprocfs_find_named_value(kernbuf, "used_mb:", &count); + rc = sysfs_memparse(tmp, count, &pages_number, "MiB"); + if (rc < 0) return rc; pages_number >>= PAGE_SHIFT; - if (pages_number < 0) - return -ERANGE; - rc = atomic_long_read(&cli->cl_lru_in_list) - pages_number; if (rc > 0) { struct lu_env *env; diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c index 0718fdb..d462317 100644 --- a/fs/lustre/obdclass/class_obd.c +++ b/fs/lustre/obdclass/class_obd.c @@ -524,6 +524,20 @@ static long obd_class_ioctl(struct file *filp, unsigned int cmd, .fops = &obd_psdev_fops, }; +#define test_string_to_size_one(value, result, def_unit) \ +({ \ + u64 __size; \ + int __ret; \ + \ + BUILD_BUG_ON(strlen(value) >= 23); \ + __ret = sysfs_memparse((value), (result), &__size, \ + (def_unit)); \ + if (__ret == 0 && (u64)result != __size) \ + CERROR("string_helper: size %llu != result %llu\n", \ + __size, (u64)result); \ + __ret; \ +}) + static int obd_init_checks(void) { u64 u64val, div64val; @@ -590,6 +604,53 @@ static int obd_init_checks(void) ret = -EINVAL; } + /* invalid string */ + ret = test_string_to_size_one("256B34", 256, "B"); + if (ret == 0) + CERROR("string_helpers: format should be number then units\n"); + ret = test_string_to_size_one("132OpQ", 132, "B"); + if (ret == 0) + CERROR("string_helpers: invalid units should be rejected\n"); + ret = 0; + + /* small values */ + test_string_to_size_one("0B", 0, "B"); + ret = test_string_to_size_one("1.82B", 1, "B"); + if (ret == 0) + CERROR("string_helpers: number string with 'B' and '.' 
should be invalid\n"); + ret = 0; + test_string_to_size_one("512B", 512, "B"); + test_string_to_size_one("1.067kB", 1067, "B"); + test_string_to_size_one("1.042KiB", 1067, "B"); + + /* Lustre special handling */ + test_string_to_size_one("16", 16777216, "MiB"); + test_string_to_size_one("65536", 65536, "B"); + test_string_to_size_one("128K", 131072, "B"); + test_string_to_size_one("1M", 1048576, "B"); + test_string_to_size_one("256.5G", 275414777856ULL, "GiB"); + + /* normal values */ + test_string_to_size_one("8.39MB", 8390000, "MiB"); + test_string_to_size_one("8.00MiB", 8388608, "MiB"); + test_string_to_size_one("256GB", 256000000, "GiB"); + test_string_to_size_one("238.731 GiB", 256335459385ULL, "GiB"); + + /* huge values */ + test_string_to_size_one("0.4TB", 400000000000ULL, "TiB"); + test_string_to_size_one("12.5TiB", 13743895347200ULL, "TiB"); + test_string_to_size_one("2PB", 2000000000000000ULL, "PiB"); + test_string_to_size_one("16PiB", 18014398509481984ULL, "PiB"); + + /* huge values should overflow */ + ret = test_string_to_size_one("1000EiB", 0, "EiB"); + if (ret != -EOVERFLOW) + CERROR("string_helpers: Failed to detect overflow\n"); + ret = test_string_to_size_one("1000EB", 0, "EiB"); + if (ret != -EOVERFLOW) + CERROR("string_helpers: Failed to detect overflow\n"); + ret = 0; + return ret; } diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 4fc35c5..325005d 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -217,6 +217,185 @@ static void obd_connect_data_seqprint(struct seq_file *m, ocd->ocd_maxmodrpcs); } +/** + * string_to_size - convert ASCII string representing a numerical + * value with optional units to 64-bit binary value + * + * @size: The numerical value extract out of @buffer + * @buffer: passed in string to parse + * @count: length of the @buffer + * + * This function returns a 64-bit binary value if @buffer contains a valid + * numerical string. 
The string is parsed to 3 significant figures after + * the decimal point. Support the string containing an optional units at + * the end which can be base 2 or base 10 in value. If no units are given + * the string is assumed to just a numerical value. + * + * Returns: @count if the string is successfully parsed, + * -errno on invalid input strings. Error values: + * + * - ``-EINVAL``: @buffer is not a proper numerical string + * - ``-EOVERFLOW``: results does not fit into 64 bits. + * - ``-E2BIG ``: @buffer is not large + */ +int string_to_size(u64 *size, const char *buffer, size_t count) +{ + /* For string_get_size() it can support values above exabytes, + * (ZiB, YiB) due to breaking the return value into a size and + * bulk size to avoid 64 bit overflow. We don't break the size + * up into block size units so we don't support ZiB or YiB. + */ + static const char *const units_10[] = { + "kB", "MB", "GB", "TB", "PB", "EB" + }; + static const char *const units_2[] = { + "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" + }; + static const char *const *const units_str[] = { + [STRING_UNITS_2] = units_2, + [STRING_UNITS_10] = units_10, + }; + static const unsigned int coeff[] = { + [STRING_UNITS_10] = 1000, + [STRING_UNITS_2] = 1024, + }; + enum string_size_units unit; + u64 whole, blk_size = 1; + char kernbuf[22], *end; + size_t len = count; + int rc; + int i; + + if (count >= sizeof(kernbuf)) + return -E2BIG; + + *size = 0; + /* 'iB' is used for based 2 numbers. If @buffer contains only a 'B' + * or only numbers then we treat it as a direct number which doesn't + * matter if its STRING_UNITS_2 or STRING_UNIT_10. + */ + unit = strstr(buffer, "iB") ? STRING_UNITS_2 : STRING_UNITS_10; + i = unit == STRING_UNITS_2 ? 
ARRAY_SIZE(units_2) - 1 : + ARRAY_SIZE(units_10) - 1; + do { + end = strstr(buffer, units_str[unit][i]); + if (end) { + for (; i >= 0; i--) + blk_size *= coeff[unit]; + len -= strlen(end); + break; + } + } while (i--); + + /* as 'B' is a substring of all units, we need to handle it + * separately. + */ + if (!end) { + /* 'B' is only acceptable letter at this point */ + end = strchr(buffer, 'B'); + if (end) { + len -= strlen(end); + + if (count - len > 2 || + (count - len == 2 && strcmp(end, "B\n") != 0)) + return -EINVAL; + } + /* kstrtoull will error out if it has non digits */ + goto numbers_only; + } + + end = strchr(buffer, '.'); + if (end) { + /* need to limit 3 decimal places */ + char rem[4] = "000"; + u64 frac = 0; + size_t off; + + len = end - buffer; + end++; + + /* limit to 3 decimal points */ + off = min_t(size_t, 3, strspn(end, "0123456789")); + /* need to limit frac_d to a u32 */ + memcpy(rem, end, off); + rc = kstrtoull(rem, 10, &frac); + if (rc) + return rc; + + if (fls64(frac) + fls64(blk_size) - 1 > 64) + return -EOVERFLOW; + + frac *= blk_size; + do_div(frac, 1000); + *size += frac; + } +numbers_only: + snprintf(kernbuf, sizeof(kernbuf), "%.*s", (int)len, buffer); + rc = kstrtoull(kernbuf, 10, &whole); + if (rc) + return rc; + + if (whole != 0 && fls64(whole) + fls64(blk_size) - 1 > 64) + return -EOVERFLOW; + + *size += whole * blk_size; + + return count; +} +EXPORT_SYMBOL(string_to_size); + +/** + * sysfs_memparse - parse a ASCII string to 64-bit binary value, + * with optional units + * + * @buffer: kernel pointer to input string + * @count: number of bytes in the input @buffer + * @val: (output) binary value returned to caller + * @defunit: default unit suffix to use if none is provided + * + * Parses a string into a number. The number stored at @buffer is + * potentially suffixed with K, M, G, T, P, E. Besides these other + * valid suffix units are shown in the string_to_size() function. 
+ * If the string lacks a suffix then the defunit is used. The defunit + * should be given as a binary unit (e.g. MiB) as that is the standard + * for tunables in Lustre. If no unit suffix is given (e.g. 'G'), then + * it is assumed to be in binary units. + * + * Returns: 0 on success or -errno on failure. + */ +int sysfs_memparse(const char *buffer, size_t count, u64 *val, + const char *defunit) +{ + char param[23]; + int rc; + + if (count >= sizeof(param)) + return -E2BIG; + + count = strlen(buffer); + if (count && buffer[count - 1] == '\n') + count--; + + if (!count) + return -EINVAL; + + if (isalpha(buffer[count - 1])) { + if (buffer[count - 1] != 'B') { + scnprintf(param, sizeof(param), "%.*siB", + (int)count, buffer); + } else { + memcpy(param, buffer, count + 1); + } + } else { + scnprintf(param, sizeof(param), "%.*s%s", (int)count, + buffer, defunit); + } + + rc = string_to_size(val, param, strlen(param)); + return rc < 0 ? rc : 0; +} +EXPORT_SYMBOL(sysfs_memparse); + int lprocfs_read_frac_helper(char *buffer, unsigned long count, long val, int mult) { diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index d545d1b..5cf2148 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -203,10 +203,10 @@ static ssize_t osc_cached_mb_seq_write(struct file *file, struct seq_file *m = file->private_data; struct obd_device *dev = m->private; struct client_obd *cli = &dev->u.cli; - long pages_number, rc; + u64 pages_number; + const char *tmp; + long rc; char kernbuf[128]; - int mult; - u64 val; if (count >= sizeof(kernbuf)) return -EINVAL; @@ -215,19 +215,12 @@ static ssize_t osc_cached_mb_seq_write(struct file *file, return -EFAULT; kernbuf[count] = 0; - mult = 1 << (20 - PAGE_SHIFT); - buffer += lprocfs_find_named_value(kernbuf, "used_mb:", &count) - - kernbuf; - rc = lprocfs_write_frac_u64_helper(buffer, count, &val, mult); - if (rc) + tmp = lprocfs_find_named_value(kernbuf, "used_mb:", &count); + rc = sysfs_memparse(tmp, count, 
&pages_number, "MiB"); + if (rc < 0) return rc; - if (val > LONG_MAX) - return -ERANGE; - pages_number = (long)val; - - if (pages_number < 0) - return -ERANGE; + pages_number >>= PAGE_SHIFT; rc = atomic_long_read(&cli->cl_lru_in_list) - pages_number; if (rc > 0) { @@ -277,11 +270,11 @@ static ssize_t cur_grant_bytes_store(struct kobject *kobj, struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); struct client_obd *cli = &obd->u.cli; + u64 val; int rc; - unsigned long long val; - rc = kstrtoull(buffer, 10, &val); - if (rc) + rc = sysfs_memparse(buffer, count, &val, "MiB"); + if (rc < 0) return rc; /* this is only for shrinking grant */ From patchwork Thu Feb 27 21:17:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410621 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C66E138D for ; Thu, 27 Feb 2020 21:42:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E8AB3246A1 for ; Thu, 27 Feb 2020 21:42:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E8AB3246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BCE65348D7E; Thu, 27 Feb 2020 13:34:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 326313489A1 for ; Thu, 27 Feb 2020 13:21:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0872391D8; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0666A468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:21 -0500 Message-Id: <1582838290-17243-574-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 573/622] lustre: rename ops to owner X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown Now that portals_handle_ops contains only a char*, it is functioning primarily to identify the owner of each handle. So change the name to h_owner, and the type to const char*. Note: this h_owner is now quite different from the similar h_owner in the server code. When server code is merged the "med" pointer will be stored in the "mfd" and validated separately. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12542 Lustre-commit: 1a9aafbf6317 ("LU-12542 handle: rename ops to owner") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/35798 Reviewed-by: Shaun Tancheff Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_handles.h | 12 +++--------- fs/lustre/ldlm/ldlm_lock.c | 8 +++----- fs/lustre/obdclass/genops.c | 10 +++------- fs/lustre/obdclass/lustre_handles.c | 15 +++++++-------- 4 files changed, 16 insertions(+), 29 deletions(-) diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h index 8f733fd..55f9a09 100644 --- a/fs/lustre/include/lustre_handles.h +++ b/fs/lustre/include/lustre_handles.h @@ -45,11 +45,6 @@ #include #include -struct portals_handle_ops { - /* hop_type is used for some debugging messages */ - char *hop_type; -}; - /* These handles are most easily used by having them appear at the very top of * whatever object that you want to make handles for. ie: * @@ -65,7 +60,7 @@ struct portals_handle_ops { struct portals_handle { struct list_head h_link; u64 h_cookie; - const struct portals_handle_ops *h_ops; + const char *h_owner; refcount_t h_ref; /* newly added fields to handle the RCU issue. 
-jxiong */ @@ -77,10 +72,9 @@ struct portals_handle { /* handles.c */ /* Add a handle to the hash table */ -void class_handle_hash(struct portals_handle *, - const struct portals_handle_ops *ops); +void class_handle_hash(struct portals_handle *, const char *h_owner); void class_handle_unhash(struct portals_handle *); -void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops); +void *class_handle2object(u64 cookie, const char *h_owner); int class_handle_init(void); void class_handle_cleanup(void); diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 61bf028..2c19636 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -365,9 +365,7 @@ void ldlm_lock_destroy_nolock(struct ldlm_lock *lock) } } -static struct portals_handle_ops lock_handle_ops = { - .hop_type = "ldlm", -}; +static const char lock_handle_owner[] = "ldlm"; /** * @@ -407,7 +405,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource) lprocfs_counter_incr(ldlm_res_to_ns(resource)->ns_stats, LDLM_NSS_LOCKS); INIT_LIST_HEAD(&lock->l_handle.h_link); - class_handle_hash(&lock->l_handle, &lock_handle_ops); + class_handle_hash(&lock->l_handle, lock_handle_owner); lu_ref_init(&lock->l_reference); lu_ref_add(&lock->l_reference, "hash", lock); @@ -515,7 +513,7 @@ struct ldlm_lock *__ldlm_handle2lock(const struct lustre_handle *handle, if (!lustre_handle_is_used(handle)) return NULL; - lock = class_handle2object(handle->cookie, &lock_handle_ops); + lock = class_handle2object(handle->cookie, lock_handle_owner); if (!lock) return NULL; diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 15bea0d..0fbe03e 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -662,7 +662,7 @@ int obd_init_caches(void) return -ENOMEM; } -static struct portals_handle_ops export_handle_ops; +static const char export_handle_owner[] = "export"; /* map connection to client */ struct obd_export 
*class_conn2export(struct lustre_handle *conn) @@ -680,7 +680,7 @@ struct obd_export *class_conn2export(struct lustre_handle *conn) } CDEBUG(D_INFO, "looking for export cookie %#llx\n", conn->cookie); - export = class_handle2object(conn->cookie, &export_handle_ops); + export = class_handle2object(conn->cookie, export_handle_owner); return export; } EXPORT_SYMBOL(class_conn2export); @@ -732,10 +732,6 @@ static void class_export_destroy(struct obd_export *exp) kfree_rcu(exp, exp_handle.h_rcu); } -static struct portals_handle_ops export_handle_ops = { - .hop_type = "export", -}; - struct obd_export *class_export_get(struct obd_export *exp) { refcount_inc(&exp->exp_handle.h_ref); @@ -819,7 +815,7 @@ static struct obd_export *__class_new_export(struct obd_device *obd, INIT_LIST_HEAD(&export->exp_req_replay_queue); INIT_LIST_HEAD_RCU(&export->exp_handle.h_link); INIT_LIST_HEAD(&export->exp_hp_rpcs); - class_handle_hash(&export->exp_handle, &export_handle_ops); + class_handle_hash(&export->exp_handle, export_handle_owner); spin_lock_init(&export->exp_lock); spin_lock_init(&export->exp_rpc_lock); spin_lock_init(&export->exp_bl_list_lock); diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c index 99c68fe..6989a60 100644 --- a/fs/lustre/obdclass/lustre_handles.c +++ b/fs/lustre/obdclass/lustre_handles.c @@ -58,8 +58,7 @@ * Generate a unique 64bit cookie (hash) for a handle and insert it into * global (per-node) hash-table. 
*/ -void class_handle_hash(struct portals_handle *h, - const struct portals_handle_ops *ops) +void class_handle_hash(struct portals_handle *h, const char *owner) { struct handle_bucket *bucket; @@ -85,7 +84,7 @@ void class_handle_hash(struct portals_handle *h, h->h_cookie = handle_base; spin_unlock(&handle_base_lock); - h->h_ops = ops; + h->h_owner = owner; spin_lock_init(&h->h_lock); bucket = &handle_hash[h->h_cookie & HANDLE_HASH_MASK]; @@ -132,7 +131,7 @@ void class_handle_unhash(struct portals_handle *h) } EXPORT_SYMBOL(class_handle_unhash); -void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops) +void *class_handle2object(u64 cookie, const char *owner) { struct handle_bucket *bucket; struct portals_handle *h; @@ -147,14 +146,14 @@ void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops) rcu_read_lock(); list_for_each_entry_rcu(h, &bucket->head, h_link) { - if (h->h_cookie != cookie || h->h_ops != ops) + if (h->h_cookie != cookie || h->h_owner != owner) continue; spin_lock(&h->h_lock); if (likely(h->h_in != 0)) { refcount_inc(&h->h_ref); CDEBUG(D_INFO, "GET %s %p refcount=%d\n", - h->h_ops->hop_type, h, + h->h_owner, h, refcount_read(&h->h_ref)); retval = h; } @@ -201,8 +200,8 @@ static int cleanup_all_handles(void) spin_lock(&handle_hash[i].lock); list_for_each_entry_rcu(h, &handle_hash[i].head, h_link) { - CERROR("force clean handle %#llx addr %p ops %p\n", - h->h_cookie, h, h->h_ops); + CERROR("force clean handle %#llx addr %p owner %p\n", + h->h_cookie, h, h->h_owner); class_handle_unhash_nolock(h); rc++; From patchwork Thu Feb 27 21:17:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410745 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 23D3A1580 for ; Thu, 27 Feb 2020 21:45:48 +0000 (UTC) 
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 06E0B24690 for ; Thu, 27 Feb 2020 21:45:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 06E0B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B165034B0B9; Thu, 27 Feb 2020 13:36:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8A85D3489A1 for ; Thu, 27 Feb 2020 13:21:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0AF8191D9; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0950346C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:22 -0500 Message-Id: <1582838290-17243-575-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 574/622] lustre: ldlm: simplify ldlm_ns_hash_defs[] X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown As the ldlm_ns_types are dense, we can use the type as the index to the array, rather than searching through the array for a match. We can also discard nsd_hops as all hash tables now use the same hops. This makes the table smaller and the code simpler. WC-bug-id: https://jira.whamcloud.com/browse/LU-8130 Lustre-commit: 416142145c9d ("LU-8130 ldlm: simplify ldlm_ns_hash_defs[]") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/36220 Reviewed-by: Andreas Dilger Reviewed-by: Mike Pershin Reviewed-by: Neil Brown Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_resource.c | 62 ++++++++++++++---------------------------- 1 file changed, 20 insertions(+), 42 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index d009d5d..9b24be7 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -522,55 +522,35 @@ static void ldlm_res_hop_put(struct cfs_hash *hs, struct hlist_node *hnode) .hs_put = ldlm_res_hop_put }; -struct ldlm_ns_hash_def { - enum ldlm_ns_type nsd_type; +static struct { /** hash bucket bits */ unsigned int nsd_bkt_bits; /** hash bits */ unsigned int nsd_all_bits; - /** hash operations */ - struct cfs_hash_ops *nsd_hops; -}; - -static struct ldlm_ns_hash_def ldlm_ns_hash_defs[] = { - { - .nsd_type = LDLM_NS_TYPE_MDC, +} ldlm_ns_hash_defs[] = { + [LDLM_NS_TYPE_MDC] = { .nsd_bkt_bits = 11, .nsd_all_bits = 16, - .nsd_hops = &ldlm_ns_hash_ops, }, - { - .nsd_type = LDLM_NS_TYPE_MDT, + [LDLM_NS_TYPE_MDT] = { .nsd_bkt_bits = 14, .nsd_all_bits = 21, - .nsd_hops = &ldlm_ns_hash_ops, }, - { - .nsd_type = LDLM_NS_TYPE_OSC, + [LDLM_NS_TYPE_OSC] = { .nsd_bkt_bits = 8, .nsd_all_bits = 12, - .nsd_hops = &ldlm_ns_hash_ops, }, - { - .nsd_type = LDLM_NS_TYPE_OST, + [LDLM_NS_TYPE_OST] = { 
.nsd_bkt_bits = 11, .nsd_all_bits = 17, - .nsd_hops = &ldlm_ns_hash_ops, }, - { - .nsd_type = LDLM_NS_TYPE_MGC, + [LDLM_NS_TYPE_MGC] = { .nsd_bkt_bits = 3, .nsd_all_bits = 4, - .nsd_hops = &ldlm_ns_hash_ops, }, - { - .nsd_type = LDLM_NS_TYPE_MGT, + [LDLM_NS_TYPE_MGT] = { .nsd_bkt_bits = 3, .nsd_all_bits = 4, - .nsd_hops = &ldlm_ns_hash_ops, - }, - { - .nsd_type = LDLM_NS_TYPE_UNKNOWN, }, }; @@ -594,7 +574,6 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, enum ldlm_ns_type ns_type) { struct ldlm_namespace *ns = NULL; - struct ldlm_ns_hash_def *nsd; int idx; int rc; @@ -606,15 +585,10 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, return NULL; } - for (idx = 0; ; idx++) { - nsd = &ldlm_ns_hash_defs[idx]; - if (nsd->nsd_type == LDLM_NS_TYPE_UNKNOWN) { - CERROR("Unknown type %d for ns %s\n", ns_type, name); - goto out_ref; - } - - if (nsd->nsd_type == ns_type) - break; + if (ns_type >= ARRAY_SIZE(ldlm_ns_hash_defs) || + ldlm_ns_hash_defs[ns_type].nsd_bkt_bits == 0) { + CERROR("Unknown type %d for ns %s\n", ns_type, name); + goto out_ref; } ns = kzalloc(sizeof(*ns), GFP_NOFS); @@ -622,11 +596,13 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, goto out_ref; ns->ns_rs_hash = cfs_hash_create(name, - nsd->nsd_all_bits, nsd->nsd_all_bits, - nsd->nsd_bkt_bits, 0, + ldlm_ns_hash_defs[ns_type].nsd_all_bits, + ldlm_ns_hash_defs[ns_type].nsd_all_bits, + ldlm_ns_hash_defs[ns_type].nsd_bkt_bits, + 0, CFS_HASH_MIN_THETA, CFS_HASH_MAX_THETA, - nsd->nsd_hops, + &ldlm_ns_hash_ops, CFS_HASH_DEPTH | CFS_HASH_BIGNAME | CFS_HASH_SPIN_BKTLOCK | @@ -634,7 +610,9 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name, if (!ns->ns_rs_hash) goto out_ns; - ns->ns_bucket_bits = nsd->nsd_all_bits - nsd->nsd_bkt_bits; + ns->ns_bucket_bits = ldlm_ns_hash_defs[ns_type].nsd_all_bits - + ldlm_ns_hash_defs[ns_type].nsd_bkt_bits; + ns->ns_rs_buckets = 
kvmalloc(BIT(ns->ns_bucket_bits) * sizeof(ns->ns_rs_buckets[0]), GFP_KERNEL); From patchwork Thu Feb 27 21:17:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410749 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B941924 for ; Thu, 27 Feb 2020 21:45:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 341D724690 for ; Thu, 27 Feb 2020 21:45:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 341D724690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E63A434B0E8; Thu, 27 Feb 2020 13:36:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E00B83489A1 for ; Thu, 27 Feb 2020 13:21:17 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 0D5C991DA; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0C2F646D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:23 -0500 Message-Id: <1582838290-17243-576-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 575/622] lnet: prepare to make lnet_lnd const. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Preferred practice is for structs containing function pointers to be 'const'. Such structs are generally tempting attack vectors, and making them const allows linux to place them in read-only memory, thus reducing the attack surface. 'struct lnet_lnd' is mostly function pointers, but contains one writable field - a list_head. Rather than keeping registered lnds in a linked-list, we can place them in an array indexed by type - type numbers are at most 15 so this is not a burden. With these changes, no part of an lnet_lnd is ever modified. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: 87a6bd0766da ("LU-12678 lnet: prepare to make lnet_lnd const.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36830 Reviewed-by: James Simmons Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 6 ++---- include/uapi/linux/lnet/nidstr.h | 2 ++ net/lnet/lnet/api-ni.c | 29 +++++++++++++++-------------- net/lnet/lnet/lo.c | 1 - 4 files changed, 19 insertions(+), 19 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 02ac5df..99ed87a 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -46,6 +46,7 @@ #include #include #include +#include /* Max payload size */ #define LNET_MAX_PAYLOAD LNET_MTU @@ -244,9 +245,6 @@ struct lnet_test_peer { struct lnet_ni; /* forward ref */ struct lnet_lnd { - /* fields managed by portals */ - struct list_head lnd_list; /* stash in the LND table */ - /* fields initialised by the LND */ u32 lnd_type; @@ -1133,7 +1131,7 @@ struct lnet { /* uniquely identifies this ni in this epoch */ u64 ln_interface_cookie; /* registered LNDs */ - struct list_head ln_lnds; + struct lnet_lnd *ln_lnds[NUM_LNDS]; /* test protocol compatibility flags */ int ln_testprotocompat; diff --git a/include/uapi/linux/lnet/nidstr.h b/include/uapi/linux/lnet/nidstr.h index 43ec232..958ca8d 100644 --- a/include/uapi/linux/lnet/nidstr.h +++ b/include/uapi/linux/lnet/nidstr.h @@ -53,6 +53,8 @@ enum { /*MXLND = 12, removed v2_7_50_0-34-g8be9e41 */ GNILND = 13, GNIIPLND = 14, + + NUM_LNDS }; struct list_head; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 0020ffd..cd95bdd 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -734,12 +734,12 @@ static void lnet_assert_wire_constants(void) struct lnet_lnd *lnd; /* holding lnd mutex */ - list_for_each_entry(lnd, &the_lnet.ln_lnds, lnd_list) { - if (lnd->lnd_type == 
type) - return lnd; - } + if (type >= NUM_LNDS) + return NULL; + lnd = the_lnet.ln_lnds[type]; + LASSERT(!lnd || lnd->lnd_type == type); - return NULL; + return lnd; } unsigned int @@ -757,7 +757,7 @@ static void lnet_assert_wire_constants(void) LASSERT(libcfs_isknown_lnd(lnd->lnd_type)); LASSERT(!lnet_find_lnd_by_type(lnd->lnd_type)); - list_add_tail(&lnd->lnd_list, &the_lnet.ln_lnds); + the_lnet.ln_lnds[lnd->lnd_type] = lnd; CDEBUG(D_NET, "%s LND registered\n", libcfs_lnd2str(lnd->lnd_type)); @@ -772,7 +772,7 @@ static void lnet_assert_wire_constants(void) LASSERT(lnet_find_lnd_by_type(lnd->lnd_type) == lnd); - list_del(&lnd->lnd_list); + the_lnet.ln_lnds[lnd->lnd_type] = NULL; CDEBUG(D_NET, "%s LND unregistered\n", libcfs_lnd2str(lnd->lnd_type)); mutex_unlock(&the_lnet.ln_lnd_mutex); @@ -2429,7 +2429,6 @@ int lnet_lib_init(void) } the_lnet.ln_refcount = 0; - INIT_LIST_HEAD(&the_lnet.ln_lnds); INIT_LIST_HEAD(&the_lnet.ln_net_zombie); INIT_LIST_HEAD(&the_lnet.ln_msg_resend); @@ -2459,16 +2458,18 @@ int lnet_lib_init(void) * * \pre lnet_lib_init() called with success. * \pre All LNet users called LNetNIFini() for matching LNetNIInit() calls. + * + * As this happens at module-unload, all lnds must already be unloaded, + * so they must already be unregistered. 
*/ void lnet_lib_exit(void) { - struct lnet_lnd *lnd; - LASSERT(!the_lnet.ln_refcount); + int i; - while ((lnd = list_first_entry_or_null(&the_lnet.ln_lnds, - struct lnet_lnd, - lnd_list)) != NULL) - lnet_unregister_lnd(lnd); + LASSERT(!the_lnet.ln_refcount); + lnet_unregister_lnd(&the_lolnd); + for (i = 0; i < NUM_LNDS; i++) + LASSERT(!the_lnet.ln_lnds[i]); lnet_destroy_locks(); } diff --git a/net/lnet/lnet/lo.c b/net/lnet/lnet/lo.c index 350495f..c19a5b5 100644 --- a/net/lnet/lnet/lo.c +++ b/net/lnet/lnet/lo.c @@ -93,7 +93,6 @@ } struct lnet_lnd the_lolnd = { - .lnd_list = LIST_HEAD_INIT(the_lolnd.lnd_list), .lnd_type = LOLND, .lnd_startup = lolnd_startup, .lnd_shutdown = lolnd_shutdown, From patchwork Thu Feb 27 21:17:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410707 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 14DFA138D for ; Thu, 27 Feb 2020 21:44:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F17C324690 for ; Thu, 27 Feb 2020 21:44:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F17C324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E951B34AF13; Thu, 27 Feb 2020 13:35:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov 
(smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 42C103489AD for ; Thu, 27 Feb 2020 13:21:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1229191DB; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0EF5847C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:24 -0500 Message-Id: <1582838290-17243-577-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 576/622] lnet: discard struct ksock_peer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown struct ksock_peer is declared in a forward-ref, but never defined or used. Let's remove it, and change some spaces to TABs while we are there. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: 179d50565e0b ("LU-12678 lnet: discard struct ksock_peer") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36835 Reviewed-by: Serguei Smirnov Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 832bc08..2d4e8d59 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -264,10 +264,9 @@ struct ksock_nal_data { * received into either struct iovec or struct bio_vec fragments, depending on * what the header matched or whether the message needs forwarding. */ -struct ksock_conn; /* forward ref */ -struct ksock_peer_ni; /* forward ref */ -struct ksock_route; /* forward ref */ -struct ksock_proto; /* forward ref */ +struct ksock_conn; /* forward ref */ +struct ksock_route; /* forward ref */ +struct ksock_proto; /* forward ref */ struct ksock_tx { /* transmit packet */ struct list_head tx_list; /* queue on conn for transmission etc From patchwork Thu Feb 27 21:17:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410817 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 55EC61580 for ; Thu, 27 Feb 2020 21:47:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3E89B24690 for ; Thu, 27 Feb 2020 21:47:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3E89B24690 Authentication-Results: mail.kernel.org; dmarc=none 
(p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1672434B4AD; Thu, 27 Feb 2020 13:37:35 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 849C33489B0 for ; Thu, 27 Feb 2020 13:21:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 136BC91DC; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 11B2446A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:25 -0500 Message-Id: <1582838290-17243-578-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 577/622] lnet: Avoid extra lnet_remotenet lookup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn We can keep track of the lnet_remotenet object associated with the "best" lnet_peer_net, and pass that lnet_remotenet directly to lnet_find_route_locked(). 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12756 Lustre-commit: 3812c54b9ca3 ("LU-12756 lnet: Avoid extra lnet_remotenet lookup") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36536 Reviewed-by: Alexandr Boyko Reviewed-by: Alexey Lyashkov Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 45975d6..03d629d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1324,23 +1324,18 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } static struct lnet_route * -lnet_find_route_locked(struct lnet_net *net, u32 remote_net, +lnet_find_route_locked(struct lnet_remotenet *rnet, struct lnet_route **prev_route, struct lnet_peer_ni **gwni) { struct lnet_peer_ni *best_gw_ni = NULL; struct lnet_route *best_route; struct lnet_route *last_route; - struct lnet_remotenet *rnet; struct lnet_peer *lp_best; struct lnet_route *route; struct lnet_peer *lp; int rc; - rnet = lnet_find_rnet_locked(remote_net); - if (!rnet) - return NULL; - lp_best = NULL; best_route = NULL; last_route = NULL; @@ -1832,7 +1827,7 @@ struct lnet_ni * struct lnet_peer *lp; struct lnet_peer_net *lpn; struct lnet_peer_net *best_lpn = NULL; - struct lnet_remotenet *rnet; + struct lnet_remotenet *rnet, *best_rnet = NULL; struct lnet_route *best_route = NULL; struct lnet_route *last_route = NULL; struct lnet_peer_ni *lpni = NULL; @@ -1867,13 +1862,16 @@ struct lnet_ni * if (!rnet) continue; - if (!best_lpn) + if (!best_lpn) { best_lpn = lpn; + best_rnet = rnet; + } if (best_lpn->lpn_seq <= lpn->lpn_seq) continue; best_lpn = lpn; + best_rnet = rnet; } if (!best_lpn) { @@ -1892,8 +1890,8 @@ struct lnet_ni * return -EHOSTUNREACH; } - best_route = lnet_find_route_locked(NULL, best_lpn->lpn_net_id, - &last_route, &gwni); + best_route = 
lnet_find_route_locked(best_rnet, &last_route, + &gwni); if (!best_route) { CERROR("no route to %s from %s\n", libcfs_nid2str(dst_nid), From patchwork Thu Feb 27 21:17:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410625 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A95C4924 for ; Thu, 27 Feb 2020 21:42:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9111324690 for ; Thu, 27 Feb 2020 21:42:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9111324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D0FB349EF5; Thu, 27 Feb 2020 13:34:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C99F03489B0 for ; Thu, 27 Feb 2020 13:21:18 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1580091DD; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 14788468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:26 -0500 Message-Id: 
<1582838290-17243-579-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 578/622] lnet: Remove unused vars in lnet_find_route_locked X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The lp and lp_best variables are not needed in lnet_find_route_locked(). WC-bug-id: https://jira.whamcloud.com/browse/LU-12756 Lustre-commit: b129f7b1f76a ("LU-12756 lnet: Remove unused vars in lnet_find_route_locked") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36620 Reviewed-by: Alexandr Boyko Reviewed-by: Alexey Lyashkov Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 03d629d..b7990c9 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1331,24 +1331,18 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_peer_ni *best_gw_ni = NULL; struct lnet_route *best_route; struct lnet_route *last_route; - struct lnet_peer *lp_best; struct lnet_route *route; - struct lnet_peer *lp; int rc; - lp_best = NULL; best_route = NULL; last_route = NULL; list_for_each_entry(route, &rnet->lrn_routes, lr_list) { - lp = route->lr_gateway; - if (!lnet_is_route_alive(route)) continue; - if (!lp_best) { + if (!best_route) { best_route = route; last_route = route; - lp_best = lp; best_gw_ni = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, 
route->lr_gateway, @@ -1366,7 +1360,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, continue; best_route = route; - lp_best = lp; } *prev_route = last_route; From patchwork Thu Feb 27 21:17:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410543 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E9CC138D for ; Thu, 27 Feb 2020 21:40:55 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 775CF24690 for ; Thu, 27 Feb 2020 21:40:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 775CF24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A57C734A87A; Thu, 27 Feb 2020 13:33:18 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1617D3489B9 for ; Thu, 27 Feb 2020 13:21:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 186BE91DE; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1740846C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 
16:17:27 -0500 Message-Id: <1582838290-17243-580-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 579/622] lnet: Refactor lnet_compare_routes X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Restrict lnet_compare_routes() to only comparing the lnet_route objects passed as arguments. This saves us from doing unnecessary calls to lnet_find_best_lpni_on_net(). Rename lnet_compare_peers to lnet_compare_gw_lpnis to better reflect what is done by this routine. WC-bug-id: https://jira.whamcloud.com/browse/LU-12756 Lustre-commit: e02287b4ef6a ("LU-12756 lnet: Refactor lnet_compare_routes") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36621 Reviewed-by: Alexandr Boyko Reviewed-by: Alexey Lyashkov Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 77 +++++++++++++++++++----------------------------- 1 file changed, 31 insertions(+), 46 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index b7990c9..269b2d5 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1137,7 +1137,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } static int -lnet_compare_peers(struct lnet_peer_ni *p1, struct lnet_peer_ni *p2) +lnet_compare_gw_lpnis(struct lnet_peer_ni *p1, struct lnet_peer_ni *p2) { if (p1->lpni_txqnob < p2->lpni_txqnob) return 1; @@ -1267,60 +1267,26 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, 
return lnet_select_peer_ni(lni, dst_nid, peer, peer_net); } +/* Compare route priorities and hop counts */ static int -lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2, - struct lnet_peer_ni **best_lpni) +lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2) { int r1_hops = (r1->lr_hops == LNET_UNDEFINED_HOPS) ? 1 : r1->lr_hops; int r2_hops = (r2->lr_hops == LNET_UNDEFINED_HOPS) ? 1 : r2->lr_hops; - struct lnet_peer *lp1 = r1->lr_gateway; - struct lnet_peer *lp2 = r2->lr_gateway; - struct lnet_peer_ni *lpni1; - struct lnet_peer_ni *lpni2; - int rc; - - lpni1 = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, lp1, - r1->lr_lnet); - lpni2 = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, lp2, - r2->lr_lnet); - LASSERT(lpni1 && lpni2); - if (r1->lr_priority < r2->lr_priority) { - *best_lpni = lpni1; + if (r1->lr_priority < r2->lr_priority) return 1; - } - if (r1->lr_priority > r2->lr_priority) { - *best_lpni = lpni2; + if (r1->lr_priority > r2->lr_priority) return -1; - } - if (r1_hops < r2_hops) { - *best_lpni = lpni1; + if (r1_hops < r2_hops) return 1; - } - if (r1_hops > r2_hops) { - *best_lpni = lpni2; + if (r1_hops > r2_hops) return -1; - } - - rc = lnet_compare_peers(lpni1, lpni2); - if (rc == 1) { - *best_lpni = lpni1; - return rc; - } else if (rc == -1) { - *best_lpni = lpni2; - return rc; - } - - if (r1->lr_seq - r2->lr_seq <= 0) { - *best_lpni = lpni1; - return 1; - } - *best_lpni = lpni2; - return -1; + return 0; } static struct lnet_route * @@ -1328,7 +1294,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_route **prev_route, struct lnet_peer_ni **gwni) { - struct lnet_peer_ni *best_gw_ni = NULL; + struct lnet_peer_ni *lpni, *best_gw_ni = NULL; struct lnet_route *best_route; struct lnet_route *last_route; struct lnet_route *route; @@ -1355,11 +1321,30 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (last_route->lr_seq - route->lr_seq < 0) last_route = 
route; - rc = lnet_compare_routes(route, best_route, &best_gw_ni); - if (rc < 0) + rc = lnet_compare_routes(route, best_route); + if (rc == -1) + continue; + + lpni = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, + route->lr_gateway, + route->lr_lnet); + LASSERT(lpni); + + if (rc == 1) { + best_route = route; + best_gw_ni = lpni; + continue; + } + + rc = lnet_compare_gw_lpnis(lpni, best_gw_ni); + if (rc == -1) continue; - best_route = route; + if (rc == 1 || route->lr_seq <= best_route->lr_seq) { + best_route = route; + best_gw_ni = lpni; + continue; + } } *prev_route = last_route; From patchwork Thu Feb 27 21:17:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410863 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9734A1580 for ; Thu, 27 Feb 2020 21:48:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7FA5B24690 for ; Thu, 27 Feb 2020 21:48:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7FA5B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DFF3934A6A2; Thu, 27 Feb 2020 13:39:10 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 775AC3489B9 for ; Thu, 27 
Feb 2020 13:21:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1B31991DF; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 19FFA46D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:28 -0500 Message-Id: <1582838290-17243-581-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 580/622] lustre: u_object: factor out extra per-bucket data X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown The hash tables managed by lu_object store some extra information in each bucket in the hash table. This prevents the use of resizeable hash tables, so lu_site_init() goes to some trouble to try to guess a good hash size. There is no real need for the extra data to be closely associated with hash buckets. There is a small advantage as both the hash bucket and the extra information can then be protected by the same lock, but as these locks have low contention, that should rarely be noticed. The extra data is updated frequently and accessed rarely, such as an lru list and a wait_queue head. There could just be a single copy of this data for the whole array, but on a many-cpu machine, that could become a contention bottleneck. So it makes sense to keep multiple shards and combine them only when needed. It does not make sense to have many more copies than there are CPUs.
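The "multiple shards, combined only when needed" idea can be sketched in userspace C as follows. This is a minimal illustration and not the lu_object code: struct shard, struct site, NSHARDS and the site_*() helpers are names made up here, and the per-shard lock that real code would need is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define NSHARDS 8	/* hypothetical shard count */

/* Masking with (NSHARDS - 1) only works for a power of two. */
static_assert((NSHARDS & (NSHARDS - 1)) == 0, "NSHARDS must be 2^n");

/* Frequently updated, rarely read per-shard data. A real shard would
 * also carry the lock that protects it.
 */
struct shard {
	long lru_len;
};

struct site {
	struct shard shards[NSHARDS];
};

/* Pick the shard for a key; updates touch only this one shard. */
static struct shard *site_shard(struct site *s, unsigned int key)
{
	return &s->shards[key & (NSHARDS - 1)];
}

static void site_lru_add(struct site *s, unsigned int key)
{
	site_shard(s, key)->lru_len++;
}

/* The rare read path walks every shard and combines the results. */
static long site_lru_total(const struct site *s)
{
	long total = 0;
	size_t i;

	for (i = 0; i < NSHARDS; i++)
		total += s->shards[i].lru_len;
	return total;
}
```

Updates stay cheap because each one touches a single shard; only the infrequent reader pays for visiting all of them.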
This patch takes the extra data out of the hash table buckets and creates a separate array, which never has more entries than twice the number of possible cpus. As this extra data contains a wait_queue_head, which contains a spinlock, that lock is used to protect the other data (counter and lru list). The code currently uses a very simple hash to choose a hash-table bucket: (fid_seq(fid) + fid_oid(fid)) & (CFS_HASH_NBKT(hs) - 1) There is no documented reason for this and I cannot see any value in not using a general hash function. We can use hash_32() and hash_64() on the fid value with a random seed created for each lu_site. The hash_*() functions were picked over the jhash() functions since they perform much better. The lock ordering requires that a hash-table lock cannot be taken while an extra-data lock is held. This means that in lu_site_purge_objects() we must first remove objects from the lru (with the extra information locked) and then remove each one from the hash table. To ensure the object is not found between these two steps, the LU_OBJECT_HEARD_BANSHEE flag is set. As the extra info is now separate from the hash buckets, we cannot report statistics from both at the same time. I think the lru statistics are probably more useful than the hash-table statistics, so I have preserved the former and discarded the latter. When the hashtable becomes resizeable, those statistics will be irrelevant. As the lru and the hash table are now managed by different locks we need to be careful to prevent htable_lookup() finding an object that lu_site_purge_objects() is purging. To help with this we introduce a new lu_object flag to say that an object is being purged. Once set, the object will be quickly removed from the hash table, and is already removed from the lru.
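For readers who want to experiment outside the kernel, the seeded bucket selection described above can be approximated in userspace C. This is a sketch under assumptions: mul_hash_32() and mul_hash_64() are stand-ins written for this example in the style of the kernel's hash_32() and hash_64() with 32 result bits, the golden-ratio constants are the values as recalled from include/linux/hash.h and should be checked against the tree, and struct fid, fid_hash() and bkt_index() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Multipliers as recalled from include/linux/hash.h (assumption). */
#define GOLDEN_RATIO_32 0x61C88647u
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Simplified stand-in for struct lu_fid. */
struct fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Multiplicative hash keeping all 32 bits of the product. */
static uint32_t mul_hash_32(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}

/* 64-bit multiplicative hash keeping the top 32 bits. */
static uint32_t mul_hash_64(uint64_t val)
{
	return (uint32_t)((val * GOLDEN_RATIO_64) >> 32);
}

/* Mix the object id with the per-site seed, then fold in the
 * sequence number.
 */
static uint32_t fid_hash(const struct fid *fid, uint32_t seed)
{
	seed = mul_hash_32(seed ^ fid->f_oid);
	seed ^= mul_hash_64(fid->f_seq);
	return seed;
}

/* Reduce the hash to an index; bkt_cnt must be a power of two. */
static unsigned int bkt_index(const struct fid *fid, uint32_t seed,
			      unsigned int bkt_cnt)
{
	assert(bkt_cnt && (bkt_cnt & (bkt_cnt - 1)) == 0);
	return fid_hash(fid, seed) & (bkt_cnt - 1);
}
```

Because the index is taken with a mask, the bucket count has to be a power of two, which is presumably why the patch rounds its count with roundup_pow_of_two().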
WC-bug-id: https://jira.whamcloud.com/browse/LU-8130 Lustre-commit: e6f7f8a7b349 ("LU-8130 lu_object: factor out extra per-bucket data") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/36216 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 13 +++- fs/lustre/obdclass/lu_object.c | 167 +++++++++++++++++++++++++---------------- 2 files changed, 113 insertions(+), 67 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index e92f12f..4608937 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -463,7 +463,12 @@ enum lu_object_header_flags { * Object is initialized, when object is found in cache, it may not be * initialized yet, the object allocator will initialize it. */ - LU_OBJECT_INITED = 2 + LU_OBJECT_INITED = 2, + /** + * Object is being purged, so mustn't be returned by + * htable_lookup() + */ + LU_OBJECT_PURGING = 3, }; enum lu_object_header_attr { @@ -553,6 +558,12 @@ struct lu_site { * objects hash table */ struct cfs_hash *ls_obj_hash; + /* + * buckets for summary data + */ + struct lu_site_bkt_data *ls_bkts; + int ls_bkt_cnt; + u32 ls_bkt_seed; /** * index of bucket on hash table while purging */ diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index 38c04c7..7ea9948 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -43,6 +43,7 @@ #include #include +#include /* hash_long() */ #include @@ -58,11 +59,10 @@ struct lu_site_bkt_data { /** * LRU list, updated on each access to object. Protected by - * bucket lock of lu_site::ls_obj_hash. + * lsb_waitq.lock. * * "Cold" end of LRU is lu_site::ls_lru.next. Accessed object are - * moved to the lu_site::ls_lru.prev (this is due to the non-existence - * of list_for_each_entry_safe_reverse()). 
+ * moved to the lu_site::ls_lru.prev */ struct list_head lsb_lru; /** @@ -92,9 +92,11 @@ enum { #define LU_SITE_BITS_MAX 24 #define LU_SITE_BITS_MAX_CL 19 /** - * total 256 buckets, we don't want too many buckets because: - * - consume too much memory + * Max 256 buckets, we don't want too many buckets because: + * - consume too much memory (currently max 16K) * - avoid unbalanced LRU list + * With few cpus there is little gain from extra buckets, so + * we treat this as a maximum in lu_site_init(). */ #define LU_SITE_BKT_BITS 8 @@ -109,14 +111,27 @@ enum { static void lu_object_free(const struct lu_env *env, struct lu_object *o); static u32 ls_stats_read(struct lprocfs_stats *stats, int idx); +static u32 lu_fid_hash(const void *data, u32 seed) +{ + const struct lu_fid *fid = data; + + seed = hash_32(seed ^ fid->f_oid, 32); + seed ^= hash_64(fid->f_seq, 32); + return seed; +} + +static inline int lu_bkt_hash(struct lu_site *s, const struct lu_fid *fid) +{ + return lu_fid_hash(fid, s->ls_bkt_seed) & + (s->ls_bkt_cnt - 1); +} + wait_queue_head_t * lu_site_wq_from_fid(struct lu_site *site, struct lu_fid *fid) { - struct cfs_hash_bd bd; struct lu_site_bkt_data *bkt; - cfs_hash_bd_get(site->ls_obj_hash, fid, &bd); - bkt = cfs_hash_bd_extra_get(site->ls_obj_hash, &bd); + bkt = &site->ls_bkts[lu_bkt_hash(site, fid)]; return &bkt->lsb_waitq; } EXPORT_SYMBOL(lu_site_wq_from_fid); @@ -155,7 +170,6 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) } cfs_hash_bd_get(site->ls_obj_hash, &top->loh_fid, &bd); - bkt = cfs_hash_bd_extra_get(site->ls_obj_hash, &bd); is_dying = lu_object_is_dying(top); if (!cfs_hash_bd_dec_and_lock(site->ls_obj_hash, &bd, &top->loh_ref)) { @@ -169,6 +183,7 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) * somebody may be waiting for this, currently only * used for cl_object, see cl_object_put_last(). 
*/ + bkt = &site->ls_bkts[lu_bkt_hash(site, &top->loh_fid)]; wake_up_all(&bkt->lsb_waitq); } return; @@ -183,6 +198,9 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) o->lo_ops->loo_object_release(env, o); } + bkt = &site->ls_bkts[lu_bkt_hash(site, &top->loh_fid)]; + spin_lock(&bkt->lsb_waitq.lock); + /* don't use local 'is_dying' here because if was taken without lock * but here we need the latest actual value of it so check lu_object * directly here. @@ -190,6 +208,7 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) if (!lu_object_is_dying(top)) { LASSERT(list_empty(&top->loh_lru)); list_add_tail(&top->loh_lru, &bkt->lsb_lru); + spin_unlock(&bkt->lsb_waitq.lock); percpu_counter_inc(&site->ls_lru_len_counter); CDEBUG(D_INODE, "Add %p/%p to site lru. hash: %p, bkt: %p\n", orig, top, site->ls_obj_hash, bkt); @@ -199,22 +218,19 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o) /* * If object is dying (will not be cached), then removed it - * from hash table and LRU. + * from hash table (it is already not on the LRU). * - * This is done with hash table and LRU lists locked. As the only + * This is done with hash table lists locked. As the only * way to acquire first reference to previously unreferenced - * object is through hash-table lookup (lu_object_find()), - * or LRU scanning (lu_site_purge()), that are done under hash-table - * and LRU lock, no race with concurrent object lookup is possible - * and we can safely destroy object below. + * object is through hash-table lookup (lu_object_find()) + * which is done under hash-table, no race with concurrent + * object lookup is possible and we can safely destroy object below. */ if (!test_and_set_bit(LU_OBJECT_UNHASHED, &top->loh_flags)) cfs_hash_bd_del_locked(site->ls_obj_hash, &bd, &top->loh_hash); + spin_unlock(&bkt->lsb_waitq.lock); cfs_hash_bd_unlock(site->ls_obj_hash, &bd, 1); - /* - * Object was already removed from hash and lru above, can - * kill it. 
- */ + /* Object was already removed from hash above, can kill it. */ lu_object_free(env, orig); } EXPORT_SYMBOL(lu_object_put); @@ -238,8 +254,10 @@ void lu_object_unhash(const struct lu_env *env, struct lu_object *o) if (!list_empty(&top->loh_lru)) { struct lu_site_bkt_data *bkt; + bkt = &site->ls_bkts[lu_bkt_hash(site, &top->loh_fid)]; + spin_lock(&bkt->lsb_waitq.lock); list_del_init(&top->loh_lru); - bkt = cfs_hash_bd_extra_get(obj_hash, &bd); + spin_unlock(&bkt->lsb_waitq.lock); percpu_counter_dec(&site->ls_lru_len_counter); } cfs_hash_bd_del_locked(obj_hash, &bd, &top->loh_hash); @@ -390,8 +408,6 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, struct lu_object_header *h; struct lu_object_header *temp; struct lu_site_bkt_data *bkt; - struct cfs_hash_bd bd; - struct cfs_hash_bd bd2; struct list_head dispose; int did_sth; unsigned int start = 0; @@ -409,7 +425,7 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, */ if (nr != ~0) start = s->ls_purge_start; - bnr = (nr == ~0) ? -1 : nr / (int)CFS_HASH_NBKT(s->ls_obj_hash) + 1; + bnr = (nr == ~0) ? 
-1 : nr / s->ls_bkt_cnt + 1; again: /* * It doesn't make any sense to make purge threads parallel, that can @@ -421,21 +437,21 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, goto out; did_sth = 0; - cfs_hash_for_each_bucket(s->ls_obj_hash, &bd, i) { - if (i < start) - continue; + for (i = start; i < s->ls_bkt_cnt ; i++) { count = bnr; - cfs_hash_bd_lock(s->ls_obj_hash, &bd, 1); - bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, &bd); + bkt = &s->ls_bkts[i]; + spin_lock(&bkt->lsb_waitq.lock); list_for_each_entry_safe(h, temp, &bkt->lsb_lru, loh_lru) { LASSERT(atomic_read(&h->loh_ref) == 0); - cfs_hash_bd_get(s->ls_obj_hash, &h->loh_fid, &bd2); - LASSERT(bd.bd_bucket == bd2.bd_bucket); + LINVRNT(lu_bkt_hash(s, &h->loh_fid) == i); - cfs_hash_bd_del_locked(s->ls_obj_hash, - &bd2, &h->loh_hash); + /* Cannot remove from hash under current spinlock, + * so set flag to stop object from being found + * by htable_lookup(). + */ + set_bit(LU_OBJECT_PURGING, &h->loh_flags); list_move(&h->loh_lru, &dispose); percpu_counter_dec(&s->ls_lru_len_counter); if (did_sth == 0) @@ -447,14 +463,16 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, if (count > 0 && --count == 0) break; } - cfs_hash_bd_unlock(s->ls_obj_hash, &bd, 1); + spin_unlock(&bkt->lsb_waitq.lock); cond_resched(); /* * Free everything on the dispose list. This is safe against * races due to the reasons described in lu_object_put(). 
*/ - while ((h = list_first_entry_or_null( - &dispose, struct lu_object_header, loh_lru)) != NULL) { + while ((h = list_first_entry_or_null(&dispose, + struct lu_object_header, + loh_lru)) != NULL) { + cfs_hash_del(s->ls_obj_hash, &h->loh_fid, &h->loh_hash); list_del_init(&h->loh_lru); lu_object_free(env, lu_object_top(h)); lprocfs_counter_incr(s->ls_stats, LU_SS_LRU_PURGED); @@ -470,7 +488,7 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, goto again; } /* race on s->ls_purge_start, but nobody cares */ - s->ls_purge_start = i % CFS_HASH_NBKT(s->ls_obj_hash); + s->ls_purge_start = i % (s->ls_bkt_cnt - 1); out: return nr; } @@ -631,12 +649,29 @@ static struct lu_object *htable_lookup(struct lu_site *s, } h = container_of(hnode, struct lu_object_header, loh_hash); - cfs_hash_get(s->ls_obj_hash, hnode); - lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT); if (!list_empty(&h->loh_lru)) { + struct lu_site_bkt_data *bkt; + + bkt = &s->ls_bkts[lu_bkt_hash(s, &h->loh_fid)]; + spin_lock(&bkt->lsb_waitq.lock); + /* Might have just been moved to the dispose list, in which + * case LU_OBJECT_PURGING will be set. In that case, + * delete it from the hash table immediately. + * When lu_site_purge_objects() tried, it will find it + * isn't there, which is harmless. 
+ */ + if (test_bit(LU_OBJECT_PURGING, &h->loh_flags)) { + spin_unlock(&bkt->lsb_waitq.lock); + cfs_hash_bd_del_locked(s->ls_obj_hash, bd, hnode); + lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_MISS); + return ERR_PTR(-ENOENT); + } list_del_init(&h->loh_lru); + spin_unlock(&bkt->lsb_waitq.lock); percpu_counter_dec(&s->ls_lru_len_counter); } + cfs_hash_get(s->ls_obj_hash, hnode); + lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT); return lu_object_top(h); } @@ -721,8 +756,8 @@ struct lu_object *lu_object_find_at(const struct lu_env *env, if (unlikely(OBD_FAIL_PRECHECK(OBD_FAIL_OBD_ZERO_NLINK_RACE))) lu_site_purge(env, s, -1); + bkt = &s->ls_bkts[lu_bkt_hash(s, f)]; cfs_hash_bd_get(hs, f, &bd); - bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, &bd); if (!(conf && conf->loc_flags & LOC_F_NEW)) { cfs_hash_bd_lock(hs, &bd, 1); o = htable_lookup(s, &bd, f, &version); @@ -1029,7 +1064,6 @@ static void lu_dev_add_linkage(struct lu_site *s, struct lu_device *d) int lu_site_init(struct lu_site *s, struct lu_device *top) { struct lu_site_bkt_data *bkt; - struct cfs_hash_bd bd; unsigned long bits; unsigned long i; char name[16]; @@ -1046,7 +1080,7 @@ int lu_site_init(struct lu_site *s, struct lu_device *top) for (bits = lu_htable_order(top); bits >= LU_SITE_BITS_MIN; bits--) { s->ls_obj_hash = cfs_hash_create(name, bits, bits, bits - LU_SITE_BKT_BITS, - sizeof(*bkt), 0, 0, + 0, 0, 0, &lu_site_hash_ops, CFS_HASH_SPIN_BKTLOCK | CFS_HASH_NO_ITEMREF | @@ -1062,16 +1096,31 @@ int lu_site_init(struct lu_site *s, struct lu_device *top) return -ENOMEM; } - cfs_hash_for_each_bucket(s->ls_obj_hash, &bd, i) { - bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, &bd); + s->ls_bkt_seed = prandom_u32(); + s->ls_bkt_cnt = max_t(long, 1 << LU_SITE_BKT_BITS, + 2 * num_possible_cpus()); + s->ls_bkt_cnt = roundup_pow_of_two(s->ls_bkt_cnt); + s->ls_bkts = kvmalloc_array(s->ls_bkt_cnt, sizeof(*bkt), + GFP_KERNEL | __GFP_ZERO); + if (!s->ls_bkts) { + cfs_hash_putref(s->ls_obj_hash); + s->ls_obj_hash = 
NULL; + s->ls_bkts = NULL; + return -ENOMEM; + } + + for (i = 0; i < s->ls_bkt_cnt; i++) { + bkt = &s->ls_bkts[i]; INIT_LIST_HEAD(&bkt->lsb_lru); init_waitqueue_head(&bkt->lsb_waitq); } s->ls_stats = lprocfs_alloc_stats(LU_SS_LAST_STAT, 0); if (!s->ls_stats) { + kvfree(s->ls_bkts); cfs_hash_putref(s->ls_obj_hash); s->ls_obj_hash = NULL; + s->ls_bkts = NULL; return -ENOMEM; } @@ -1119,6 +1168,8 @@ void lu_site_fini(struct lu_site *s) s->ls_obj_hash = NULL; } + kvfree(s->ls_bkts); + if (s->ls_top_dev) { s->ls_top_dev->ld_site = NULL; lu_ref_del(&s->ls_top_dev->ld_reference, "site-top", s); @@ -1878,37 +1929,21 @@ struct lu_site_stats { }; static void lu_site_stats_get(const struct lu_site *s, - struct lu_site_stats *stats, int populated) + struct lu_site_stats *stats) { - struct cfs_hash *hs = s->ls_obj_hash; - struct cfs_hash_bd bd; - unsigned int i; + int cnt = cfs_hash_size_get(s->ls_obj_hash); /* * percpu_counter_sum_positive() won't accept a const pointer * as it does modify the struct by taking a spinlock */ struct lu_site *s2 = (struct lu_site *)s; - stats->lss_busy += cfs_hash_size_get(hs) - + stats->lss_busy += cnt - percpu_counter_sum_positive(&s2->ls_lru_len_counter); - cfs_hash_for_each_bucket(hs, &bd, i) { - struct hlist_head *hhead; - cfs_hash_bd_lock(hs, &bd, 1); - stats->lss_total += cfs_hash_bd_count_get(&bd); - stats->lss_max_search = max((int)stats->lss_max_search, - cfs_hash_bd_depmax_get(&bd)); - if (!populated) { - cfs_hash_bd_unlock(hs, &bd, 1); - continue; - } - - cfs_hash_bd_for_each_hlist(hs, &bd, hhead) { - if (!hlist_empty(hhead)) - stats->lss_populated++; - } - cfs_hash_bd_unlock(hs, &bd, 1); - } + stats->lss_total += cnt; + stats->lss_max_search = 0; + stats->lss_populated = 0; } /* @@ -2201,7 +2236,7 @@ int lu_site_stats_print(const struct lu_site *s, struct seq_file *m) struct lu_site_stats stats; memset(&stats, 0, sizeof(stats)); - lu_site_stats_get(s, &stats, 1); + lu_site_stats_get(s, &stats); seq_printf(m, "%d/%d %d/%ld %d %d %d %d 
%d %d %d\n", stats.lss_busy, From patchwork Thu Feb 27 21:17:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410629 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CDB9E17E0 for ; Thu, 27 Feb 2020 21:42:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B6669246A1 for ; Thu, 27 Feb 2020 21:42:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B6669246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 67DF434AC62; Thu, 27 Feb 2020 13:34:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CFED23489B9 for ; Thu, 27 Feb 2020 13:21:19 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 1E449A140; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1CC09496; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:29 -0500 Message-Id: <1582838290-17243-582-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 581/622] lustre: llite: replace lli_trunc_sem X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown lli_trunc_sem can lead to a deadlock. vvp_io_read_start takes lli_trunc_sem, and can take mmap sem in the direct i/o case, via generic_file_read_iter->ll_direct_IO->get_user_pages_unlocked vvp_io_fault_start is called with mmap_sem held (taken in the kernel page fault code), and takes lli_trunc_sem. These aren't necessarily the same mmap_sem, but can be if you mmap a lustre file, then read into that mapped memory from the file. These are both 'down_read' calls on lli_trunc_sem so they don't directly conflict, but if vvp_io_setattr_start() is called to truncate the file between these, it does 'down_write' on lli_trunc_sem. As semaphores are queued, this down_write blocks subsequent reads. This means if the page fault has taken the mmap_sem, but not yet the lli_trunc_sem in vvp_io_fault_start, it will wait behind the lli_trunc_sem down_write from vvp_io_setattr_start. At the same time, vvp_io_read_start is holding the lli_trunc_sem and waiting for the mmap_sem, which will not be released because vvp_io_fault_start cannot get the lli_trunc_sem because the setattr 'down_write' operation is queued in front of it. Solve this by replacing with a hand-coded semaphore, using atomic counters and wait_var_event(). This allows a special down_read_nowait which ignores waiting down_write operations. 
This combined with waking up all waiters at once guarantees that down_read_nowait can always 'join' another down_read, guaranteeing our ability to take the semaphore twice for read and avoiding the deadlock. I'd like there to be a better way to fix this, but I haven't found it yet. WC-bug-id: https://jira.whamcloud.com/browse/LU-12460 Lustre-commit: e5914a61ac77 ("LU-12460 llite: replace lli_trunc_sem") Signed-off-by: NeilBrown Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35271 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 93 +++++++++++++++++++++++++++++++++++++++- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/llite/vvp_io.c | 14 +++--- 3 files changed, 100 insertions(+), 9 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index def4df0..b7b418f 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -105,6 +105,16 @@ enum ll_file_flags { LLIF_PROJECT_INHERIT = 3, }; +/* See comment on trunc_sem_down_read_nowait */ +struct ll_trunc_sem { + /* when positive, this is a count of readers, when -1, it indicates + * the semaphore is held for write, and 0 is unlocked + */ + atomic_t ll_trunc_readers; + /* this tracks a count of waiting writers */ + atomic_t ll_trunc_waiters; +}; + struct ll_inode_info { u32 lli_inode_magic; @@ -178,7 +188,7 @@ struct ll_inode_info { struct { struct mutex lli_size_mutex; char *lli_symlink_name; - struct rw_semaphore lli_trunc_sem; + struct ll_trunc_sem lli_trunc_sem; struct range_lock_tree lli_write_tree; struct rw_semaphore lli_glimpse_sem; @@ -253,6 +263,87 @@ struct ll_inode_info { struct list_head lli_xattrs;/* ll_xattr_entry->xe_list */ }; +static inline void ll_trunc_sem_init(struct ll_trunc_sem *sem) +{ + atomic_set(&sem->ll_trunc_readers, 0); + atomic_set(&sem->ll_trunc_waiters, 0); +} + +/* This version of down read ignores waiting 
writers, meaning if the semaphore + * is already held for read, this down_read will 'join' that reader and also + * take the semaphore. + * + * This lets us avoid an unusual deadlock. + * + * We must take lli_trunc_sem in read mode on entry in to various i/o paths + * in Lustre, in order to exclude truncates. Some of these paths then need to + * take the mmap_sem, while still holding the trunc_sem. The problem is that + * page faults hold the mmap_sem when calling in to Lustre, and then must also + * take the trunc_sem to exclude truncate. + * + * This means the locking order for trunc_sem and mmap_sem is sometimes AB, + * sometimes BA. This is almost OK because in both cases, we take the trunc + * sem for read, so it doesn't block. + * + * However, if a write mode user (truncate, a setattr op) arrives in the + * middle of this, the second reader on the truncate_sem will wait behind that + * writer. + * + * So we have, on our truncate sem, in order (where 'reader' and 'writer' refer + * to the mode in which they take the semaphore): + * reader (holding mmap_sem, needs truncate_sem) + * writer + * reader (holding truncate sem, waiting for mmap_sem) + * + * And so the readers deadlock. + * + * The solution is this modified semaphore, where this down_read ignores + * waiting write operations, and all waiters are woken up at once, so readers + * using down_read_nowait cannot get stuck behind waiting writers, regardless + * of the order they arrived in. + * + * down_read_nowait is only used in the page fault case, where we already hold + * the mmap_sem. This is because otherwise repeated read and write operations + * (which take the truncate sem) could prevent a truncate from ever starting. + * This could still happen with page faults, but without an even more complex + * mechanism, this is unavoidable. 
+ * + * LU-12460 + */ +static inline void trunc_sem_down_read_nowait(struct ll_trunc_sem *sem) +{ + wait_var_event(&sem->ll_trunc_readers, + atomic_inc_unless_negative(&sem->ll_trunc_readers)); +} + +static inline void trunc_sem_down_read(struct ll_trunc_sem *sem) +{ + wait_var_event(&sem->ll_trunc_readers, + atomic_read(&sem->ll_trunc_waiters) == 0 && + atomic_inc_unless_negative(&sem->ll_trunc_readers)); +} + +static inline void trunc_sem_up_read(struct ll_trunc_sem *sem) +{ + if (atomic_dec_return(&sem->ll_trunc_readers) == 0 && + atomic_read(&sem->ll_trunc_waiters)) + wake_up_var(&sem->ll_trunc_readers); +} + +static inline void trunc_sem_down_write(struct ll_trunc_sem *sem) +{ + atomic_inc(&sem->ll_trunc_waiters); + wait_var_event(&sem->ll_trunc_readers, + atomic_cmpxchg(&sem->ll_trunc_readers, 0, -1) == 0); + atomic_dec(&sem->ll_trunc_waiters); +} + +static inline void trunc_sem_up_write(struct ll_trunc_sem *sem) +{ + atomic_set(&sem->ll_trunc_readers, 0); + wake_up_var(&sem->ll_trunc_readers); +} + static inline u32 ll_layout_version_get(struct ll_inode_info *lli) { u32 gen; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 7e128f0..f083a90 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -971,7 +971,7 @@ void ll_lli_init(struct ll_inode_info *lli) } else { mutex_init(&lli->lli_size_mutex); lli->lli_symlink_name = NULL; - init_rwsem(&lli->lli_trunc_sem); + ll_trunc_sem_init(&lli->lli_trunc_sem); range_lock_tree_init(&lli->lli_write_tree); init_rwsem(&lli->lli_glimpse_sem); lli->lli_glimpse_time = ktime_set(0, 0); diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index b3f628c..259b14a 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -682,7 +682,7 @@ static int vvp_io_setattr_start(const struct lu_env *env, struct ll_inode_info *lli = ll_i2info(inode); if (cl_io_is_trunc(io)) { - down_write(&lli->lli_trunc_sem); + trunc_sem_down_write(&lli->lli_trunc_sem); 
inode_lock(inode); inode_dio_wait(inode); } else { @@ -708,7 +708,7 @@ static void vvp_io_setattr_end(const struct lu_env *env, */ vvp_do_vmtruncate(inode, io->u.ci_setattr.sa_attr.lvb_size); inode_unlock(inode); - up_write(&lli->lli_trunc_sem); + trunc_sem_up_write(&lli->lli_trunc_sem); } else { inode_unlock(inode); } @@ -747,7 +747,7 @@ static int vvp_io_read_start(const struct lu_env *env, CDEBUG(D_VFSTRACE, "read: -> [%lli, %lli)\n", pos, pos + cnt); - down_read(&lli->lli_trunc_sem); + trunc_sem_down_read(&lli->lli_trunc_sem); if (io->ci_async_readahead) { file_accessed(file); @@ -1076,7 +1076,7 @@ static int vvp_io_write_start(const struct lu_env *env, size_t written = 0; ssize_t result = 0; - down_read(&lli->lli_trunc_sem); + trunc_sem_down_read(&lli->lli_trunc_sem); if (!can_populate_pages(env, io, inode)) return 0; @@ -1178,7 +1178,7 @@ static void vvp_io_rw_end(const struct lu_env *env, struct inode *inode = vvp_object_inode(ios->cis_obj); struct ll_inode_info *lli = ll_i2info(inode); - up_read(&lli->lli_trunc_sem); + trunc_sem_up_read(&lli->lli_trunc_sem); } static int vvp_io_kernel_fault(struct vvp_fault_io *cfio) @@ -1243,7 +1243,7 @@ static int vvp_io_fault_start(const struct lu_env *env, loff_t size; pgoff_t last_index; - down_read(&lli->lli_trunc_sem); + trunc_sem_down_read_nowait(&lli->lli_trunc_sem); /* offset of the last byte on the page */ offset = cl_offset(obj, fio->ft_index + 1) - 1; @@ -1400,7 +1400,7 @@ static void vvp_io_fault_end(const struct lu_env *env, CLOBINVRNT(env, ios->cis_io->ci_obj, vvp_object_invariant(ios->cis_io->ci_obj)); - up_read(&lli->lli_trunc_sem); + trunc_sem_up_read(&lli->lli_trunc_sem); } static int vvp_io_fsync_start(const struct lu_env *env, From patchwork Thu Feb 27 21:17:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410547 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org 
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 32984138D for ; Thu, 27 Feb 2020 21:41:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1B62324690 for ; Thu, 27 Feb 2020 21:41:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1B62324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 43996349962; Thu, 27 Feb 2020 13:33:22 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 340983489CC for ; Thu, 27 Feb 2020 13:21:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 20B86A141; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 1F9E546A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:30 -0500 Message-Id: <1582838290-17243-583-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 582/622] lnet: Fix source specified route selection X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre 
software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn If lnet_send() is called with a specific src_nid, but rtr_nid == LNET_NID_ANY and the message needs to be routed, then we need to ensure that the lnet_peer_ni of our next hop is on the same network as the lnet_ni associated with the src_nid. Otherwise we may end up choosing an lnet_peer_ni that cannot be reached from the specified source. WC-bug-id: https://jira.whamcloud.com/browse/LU-12919 Lustre-commit: f0aa632d4255 ("LU-12919 lnet: Fix source specified route selection") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36622 Reviewed-by: Alexandr Boyko Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 41 +++++++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 12 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 269b2d5..ca292a6 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1290,7 +1290,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, } static struct lnet_route * -lnet_find_route_locked(struct lnet_remotenet *rnet, +lnet_find_route_locked(struct lnet_remotenet *rnet, u32 src_net, struct lnet_route **prev_route, struct lnet_peer_ni **gwni) { @@ -1299,6 +1299,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_route *last_route; struct lnet_route *route; int rc; + u32 restrict_net; + u32 any_net = LNET_NIDNET(LNET_NID_ANY); best_route = NULL; last_route = NULL; @@ -1306,14 +1308,23 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (!lnet_is_route_alive(route)) continue; + /* If the src_net is specified then we need to find an lpni + * on that network + */ + restrict_net 
= src_net == any_net ? route->lr_lnet : src_net; if (!best_route) { - best_route = route; - last_route = route; - best_gw_ni = lnet_find_best_lpni_on_net(NULL, - LNET_NID_ANY, - route->lr_gateway, - route->lr_lnet); - LASSERT(best_gw_ni); + lpni = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, + route->lr_gateway, + restrict_net); + if (lpni) { + best_route = route; + last_route = route; + best_gw_ni = lpni; + } else { + CERROR("Gateway %s does not have a peer NI on net %s\n", + libcfs_nid2str(route->lr_gateway->lp_primary_nid), + libcfs_net2str(restrict_net)); + } continue; } @@ -1327,8 +1338,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, lpni = lnet_find_best_lpni_on_net(NULL, LNET_NID_ANY, route->lr_gateway, - route->lr_lnet); - LASSERT(lpni); + restrict_net); + if (!lpni) { + CERROR("Gateway %s does not have a peer NI on net %s\n", + libcfs_nid2str(route->lr_gateway->lp_primary_nid), + libcfs_net2str(restrict_net)); + continue; + } if (rc == 1) { best_route = route; @@ -1868,8 +1884,9 @@ struct lnet_ni * return -EHOSTUNREACH; } - best_route = lnet_find_route_locked(best_rnet, &last_route, - &gwni); + best_route = lnet_find_route_locked(best_rnet, + LNET_NIDNET(src_nid), + &last_route, &gwni); if (!best_route) { CERROR("no route to %s from %s\n", libcfs_nid2str(dst_nid), From patchwork Thu Feb 27 21:17:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410819 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A7F8924 for ; Thu, 27 Feb 2020 21:47:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3350A24690 for ; 
Thu, 27 Feb 2020 21:47:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3350A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9E02734B4C7; Thu, 27 Feb 2020 13:37:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8A1773489D0 for ; Thu, 27 Feb 2020 13:21:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 23C8FA142; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2262E468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:31 -0500 Message-Id: <1582838290-17243-584-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 583/622] lustre: uapi: turn struct lustre_nfs_fid to userland fhandle X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Quentin Bouget Rename struct lustre_nfs_fid to struct lustre_file_handle and move it to UAPI header lustre_user.h so we can use it with the fhandle API such as name_to_handle_at(). WC-bug-id: https://jira.whamcloud.com/browse/LU-12806 Lustre-commit: 7ff384eee194 ("LU-12806 llapi: use name_to_handle_at in llapi_fd2fid") Signed-off-by: Quentin Bouget Reviewed-on: https://review.whamcloud.com/36292 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_nfs.c | 23 +++++++++-------------- include/uapi/linux/lustre/lustre_user.h | 6 ++++++ 2 files changed, 15 insertions(+), 14 deletions(-) diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c index 2ac5ad9..a57ab51 100644 --- a/fs/lustre/llite/llite_nfs.c +++ b/fs/lustre/llite/llite_nfs.c @@ -110,11 +110,6 @@ struct inode *search_inode_for_lustre(struct super_block *sb, return inode; } -struct lustre_nfs_fid { - struct lu_fid lnf_child; - struct lu_fid lnf_parent; -}; - static struct dentry * ll_iget_for_nfs(struct super_block *sb, struct lu_fid *fid, struct lu_fid *parent) @@ -177,8 +172,8 @@ struct lustre_nfs_fid { static int ll_encode_fh(struct inode *inode, u32 *fh, int *plen, struct inode *parent) { - int fileid_len = sizeof(struct lustre_nfs_fid) / 4; - struct lustre_nfs_fid *nfs_fid = (void *)fh; + int fileid_len = sizeof(struct lustre_file_handle) / 4; + struct lustre_file_handle *lfh = (void *)fh; CDEBUG(D_INFO, "%s: encoding for (" DFID ") maxlen=%d minlen=%d\n", ll_i2sbi(inode)->ll_fsname, @@ -189,11 +184,11 @@ static int ll_encode_fh(struct inode *inode, u32 *fh, int *plen, return FILEID_INVALID; } - nfs_fid->lnf_child = *ll_inode2fid(inode); + lfh->lfh_child = *ll_inode2fid(inode); if (parent) - nfs_fid->lnf_parent = 
*ll_inode2fid(parent); + lfh->lfh_parent = *ll_inode2fid(parent); else - fid_zero(&nfs_fid->lnf_parent); + fid_zero(&lfh->lfh_parent); *plen = fileid_len; return FILEID_LUSTRE; @@ -264,23 +259,23 @@ static int ll_get_name(struct dentry *dentry, char *name, static struct dentry *ll_fh_to_dentry(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { - struct lustre_nfs_fid *nfs_fid = (struct lustre_nfs_fid *)fid; + struct lustre_file_handle *lfh = (struct lustre_file_handle *)fid; if (fh_type != FILEID_LUSTRE) return ERR_PTR(-EPROTO); - return ll_iget_for_nfs(sb, &nfs_fid->lnf_child, &nfs_fid->lnf_parent); + return ll_iget_for_nfs(sb, &lfh->lfh_child, &lfh->lfh_parent); } static struct dentry *ll_fh_to_parent(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { - struct lustre_nfs_fid *nfs_fid = (struct lustre_nfs_fid *)fid; + struct lustre_file_handle *lfh = (struct lustre_file_handle *)fid; if (fh_type != FILEID_LUSTRE) return ERR_PTR(-EPROTO); - return ll_iget_for_nfs(sb, &nfs_fid->lnf_parent, NULL); + return ll_iget_for_nfs(sb, &lfh->lfh_parent, NULL); } int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 12b1f78..1c36114 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -164,6 +164,12 @@ static inline bool fid_is_zero(const struct lu_fid *fid) return !fid->f_seq && !fid->f_oid; } +/* The data name_to_handle_at() places in a struct file_handle (at f_handle) */ +struct lustre_file_handle { + struct lu_fid lfh_child; + struct lu_fid lfh_parent; +}; + struct ost_layout { __u32 ol_stripe_size; __u32 ol_stripe_count; From patchwork Thu Feb 27 21:17:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410807 Return-Path: Received: from mail.kernel.org 
(pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 25B0A924 for ; Thu, 27 Feb 2020 21:47:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 08D5B24690 for ; Thu, 27 Feb 2020 21:47:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 08D5B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BA2D634B420; Thu, 27 Feb 2020 13:37:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E28DF3489D0 for ; Thu, 27 Feb 2020 13:21:20 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 26459A143; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 251D046C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:32 -0500 Message-Id: <1582838290-17243-585-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 584/622] lustre: uapi: LU-12521 llapi: add separate fsname and instance API X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger In Lustre, the kernel-internal cfg instance is represented by a 16-character numeric value that is very similar to, but is not, a UUID. This value is exposed to user land because it is used to generate the sysfs directory tree that represents virtual devices. Expose this fixed length, as LUSTRE_MAXINSTANCE, for kernel and user land use. WC-bug-id: https://jira.whamcloud.com/browse/LU-12521 Lustre-commit: 00d14521ca1c ("LU-12521 llapi: add separate fsname and instance API") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/35451 Reviewed-by: Olaf Faaland-LLNL Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obd_config.c | 2 +- include/uapi/linux/lustre/lustre_user.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c index 97cb8c1..0ccdf5f 100644 --- a/fs/lustre/obdclass/obd_config.c +++ b/fs/lustre/obdclass/obd_config.c @@ -1374,7 +1374,7 @@ int class_config_llog_handler(const struct lu_env *env, lcfg->lcfg_command != LCFG_SPTLRPC_CONF && LUSTRE_CFG_BUFLEN(lcfg, 0) > 0) { inst_len = LUSTRE_CFG_BUFLEN(lcfg, 0) + - sizeof(clli->cfg_instance) * 2 + 4; + LUSTRE_MAXINSTANCE + 4; inst_name = kasprintf(GFP_NOFS, "%s-%px", lustre_cfg_string(lcfg, 0), clli->cfg_instance); diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 1c36114..08589e6 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -829,6 +829,7 @@ static inline char *obd_uuid2str(const struct obd_uuid *uuid) } #define LUSTRE_MAXFSNAME 8 +#define LUSTRE_MAXINSTANCE 16 /* Extract fsname from uuid (or target name) of a
target * e.g. (myfs-OST0007_UUID -> myfs) From patchwork Thu Feb 27 21:17:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410633 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0A9517E0 for ; Thu, 27 Feb 2020 21:42:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A974E24690 for ; Thu, 27 Feb 2020 21:42:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A974E24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C7B1834AD20; Thu, 27 Feb 2020 13:34:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2F2913489D7 for ; Thu, 27 Feb 2020 13:21:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 28EB9A144; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 27D1746D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:33 -0500 Message-Id: <1582838290-17243-586-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 585/622] lnet: socklnd: initialize the_ksocklnd at compile-time. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown All other lnds initialize this struct at compile-time. It is best for socklnd to do so too. WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: b30930a242c6 ("LU-12678 socklnd: initialize the_ksocklnd at compile-time.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36831 Reviewed-by: Serguei Smirnov Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 9a19a3f..016e005 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -2804,6 +2804,18 @@ static void __exit ksocklnd_exit(void) lnet_unregister_lnd(&the_ksocklnd); } +static struct lnet_lnd the_ksocklnd = { + .lnd_type = SOCKLND, + .lnd_startup = ksocknal_startup, + .lnd_shutdown = ksocknal_shutdown, + .lnd_ctl = ksocknal_ctl, + .lnd_send = ksocknal_send, + .lnd_recv = ksocknal_recv, + .lnd_notify_peer_down = ksocknal_notify_gw_down, + .lnd_query = ksocknal_query, + .lnd_accept = ksocknal_accept, +}; + static int __init ksocklnd_init(void) { int rc; @@ -2812,17 +2824,6 @@ static int __init ksocklnd_init(void) BUILD_BUG_ON(SOCKLND_CONN_NTYPES > 4); BUILD_BUG_ON(SOCKLND_CONN_ACK != SOCKLND_CONN_BULK_IN); - /* initialize the_ksocklnd */ - 
the_ksocklnd.lnd_type = SOCKLND; - the_ksocklnd.lnd_startup = ksocknal_startup; - the_ksocklnd.lnd_shutdown = ksocknal_shutdown; - the_ksocklnd.lnd_ctl = ksocknal_ctl; - the_ksocklnd.lnd_send = ksocknal_send; - the_ksocklnd.lnd_recv = ksocknal_recv; - the_ksocklnd.lnd_notify_peer_down = ksocknal_notify_gw_down; - the_ksocklnd.lnd_query = ksocknal_query; - the_ksocklnd.lnd_accept = ksocknal_accept; - rc = ksocknal_tunables_init(); if (rc) return rc; From patchwork Thu Feb 27 21:17:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410551 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 88CF8924 for ; Thu, 27 Feb 2020 21:41:06 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7166824690 for ; Thu, 27 Feb 2020 21:41:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7166824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2A6B034A8D3; Thu, 27 Feb 2020 13:33:26 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 703063489DD for ; Thu, 27 Feb 2020 13:21:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with 
ESMTP id 2BAD8A145; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2A94C47C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:34 -0500 Message-Id: <1582838290-17243-587-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 586/622] lnet: remove locking protection ln_testprotocompat X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown lnet_net_lock(LNET_LOCK_EX) is a heavy-weight lock that is not necessary here. The bits in this field are only set rarely - via an ioctl - and the pattern for reading and clearing them exactly matches test_and_clear_bit(). So change the field to "unsigned long" (so test_and_clear_bit() can be used), and use test_and_clear_bit(), discarding all other locking. 
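For illustration only, and not part of this patch: a minimal userspace sketch of the single-shot read-and-clear pattern described above. It assumes C11 atomics as a stand-in for the kernel's test_and_clear_bit(); the names sketch_test_and_clear_bit() and sketch_bump_version() are hypothetical and do not exist in LNet.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the kernel's test_and_clear_bit():
 * atomically clear bit 'nr' in '*addr' and report whether it was set.
 */
static bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask;

	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << nr;

	/* atomic_fetch_and() returns the value held before the AND */
	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/*
 * Hypothetical single-shot check: bump the version only for the first
 * caller that observes bit 0 set. No external lock is needed because
 * the test and the clear happen in one atomic step.
 */
static int sketch_bump_version(atomic_ulong *flags, int version)
{
	if (sketch_test_and_clear_bit(0, flags))
		version++;

	return version;
}
```

With the flags word initialised to 0x3, the first call to sketch_bump_version() returns the incremented version and clears bit 0, later calls leave the version unchanged, and bit 1 stays set until something tests and clears it.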
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: 624364420970 ("LU-12678 lnet: remove locking protection ln_testprotocompat") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36856 Reviewed-by: Alexey Lyashkov Reviewed-by: Chris Horn Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 2 +- net/lnet/klnds/socklnd/socklnd_proto.c | 17 ++++------------- net/lnet/lnet/acceptor.c | 11 +++-------- net/lnet/lnet/api-ni.c | 2 -- 4 files changed, 8 insertions(+), 24 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 99ed87a..9055da9 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -1134,7 +1134,7 @@ struct lnet { struct lnet_lnd *ln_lnds[NUM_LNDS]; /* test protocol compatibility flags */ - int ln_testprotocompat; + unsigned long ln_testprotocompat; /* * 0 - load the NIs from the mod params diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c index 887ed2d..195c44f 100644 --- a/net/lnet/klnds/socklnd/socklnd_proto.c +++ b/net/lnet/klnds/socklnd/socklnd_proto.c @@ -484,16 +484,11 @@ if (the_lnet.ln_testprotocompat) { /* single-shot proto check */ - lnet_net_lock(LNET_LOCK_EX); - if (the_lnet.ln_testprotocompat & 1) { + if (test_and_clear_bit(0, &the_lnet.ln_testprotocompat)) hmv->version_major++; /* just different! 
*/ - the_lnet.ln_testprotocompat &= ~1; - } - if (the_lnet.ln_testprotocompat & 2) { + + if (test_and_clear_bit(1, &the_lnet.ln_testprotocompat)) hmv->magic = LNET_PROTO_MAGIC; - the_lnet.ln_testprotocompat &= ~2; - } - lnet_net_unlock(LNET_LOCK_EX); } hdr->src_nid = cpu_to_le64(hello->kshm_src_nid); @@ -541,12 +536,8 @@ if (the_lnet.ln_testprotocompat) { /* single-shot proto check */ - lnet_net_lock(LNET_LOCK_EX); - if (the_lnet.ln_testprotocompat & 1) { + if (test_and_clear_bit(0, &the_lnet.ln_testprotocompat)) hello->kshm_version++; /* just different! */ - the_lnet.ln_testprotocompat &= ~1; - } - lnet_net_unlock(LNET_LOCK_EX); } rc = lnet_sock_write(sock, hello, offsetof(struct ksock_hello_msg, kshm_ips), diff --git a/net/lnet/lnet/acceptor.c b/net/lnet/lnet/acceptor.c index acd1d75..c6a1835 100644 --- a/net/lnet/lnet/acceptor.c +++ b/net/lnet/lnet/acceptor.c @@ -174,16 +174,11 @@ if (the_lnet.ln_testprotocompat) { /* single-shot proto check */ - lnet_net_lock(LNET_LOCK_EX); - if (the_lnet.ln_testprotocompat & 4) { + if (test_and_clear_bit(2, &the_lnet.ln_testprotocompat)) cr.acr_version++; - the_lnet.ln_testprotocompat &= ~4; - } - if (the_lnet.ln_testprotocompat & 8) { + + if (test_and_clear_bit(3, &the_lnet.ln_testprotocompat)) cr.acr_magic = LNET_PROTO_MAGIC; - the_lnet.ln_testprotocompat &= ~8; - } - lnet_net_unlock(LNET_LOCK_EX); } rc = lnet_sock_write(sock, &cr, sizeof(cr), accept_timeout); diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index cd95bdd..0ca8bef 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3842,9 +3842,7 @@ u32 lnet_get_dlc_seq_locked(void) return 0; case IOC_LIBCFS_TESTPROTOCOMPAT: - lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_testprotocompat = data->ioc_flags; - lnet_net_unlock(LNET_LOCK_EX); return 0; case IOC_LIBCFS_LNET_FAULT: From patchwork Thu Feb 27 21:17:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons 
X-Patchwork-Id: 11410821 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8545A924 for ; Thu, 27 Feb 2020 21:47:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6DE1A24690 for ; Thu, 27 Feb 2020 21:47:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6DE1A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E084F34A1EC; Thu, 27 Feb 2020 13:37:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D0D5F21FD3A for ; Thu, 27 Feb 2020 13:21:21 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 2E842A146; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2D65446A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:35 -0500 Message-Id: <1582838290-17243-588-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 587/622] lustre: ptlrpc: suppress connection restored message 
X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Suppress the "Connection restored" message if the connection is being restored after an idle disconnect. Fixes: 4b102da53ad ("lustre: ptlrpc: idle connections can disconnect") WC-bug-id: https://jira.whamcloud.com/browse/LU-13098 Lustre-commit: 7aa58847b94d ("LU-13098 ptlrpc: supress connection restored message") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/37086 Reviewed-by: Andreas Dilger Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 8 ++++++-- fs/lustre/ptlrpc/import.c | 25 ++++++++++++++++--------- 2 files changed, 22 insertions(+), 11 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index 501a896..5d548a6 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -304,8 +304,12 @@ struct obd_import { imp_connect_tried:1, /* connected but not FULL yet */ imp_connected:1, - /* grant shrink disabled */ - imp_grant_shrink_disabled:1; + /* grant shrink disabled */ + imp_grant_shrink_disabled:1, + /* to suppress LCONSOLE() at + * conn.restore + */ + imp_was_idle:1; u32 imp_connect_op; u32 imp_idle_timeout; diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 028dd65..23dac39 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1519,21 +1519,22 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp) import_set_state(imp, LUSTRE_IMP_RECOVER); if (imp->imp_state == LUSTRE_IMP_RECOVER) { - CDEBUG(D_HA, "reconnected to %s@%s\n", - obd2cli_tgt(imp->imp_obd), - imp->imp_connection->c_remote_uuid.uuid); + struct ptlrpc_connection *conn = imp->imp_connection; rc =
ptlrpc_resend(imp); if (rc) goto out; ptlrpc_activate_import(imp, true); - deuuidify(obd2cli_tgt(imp->imp_obd), NULL, - &target_start, &target_len); - LCONSOLE_INFO("%s: Connection restored to %.*s (at %s)\n", - imp->imp_obd->obd_name, - target_len, target_start, - obd_import_nid2str(imp)); + CDEBUG_LIMIT(imp->imp_was_idle ? + imp->imp_idle_debug : D_CONSOLE, + "%s: Connection restored to %s (at %s)\n", + imp->imp_obd->obd_name, + obd_uuid2str(&conn->c_remote_uuid), + obd_import_nid2str(imp)); + spin_lock(&imp->imp_lock); + imp->imp_was_idle = 0; + spin_unlock(&imp->imp_lock); } if (imp->imp_state == LUSTRE_IMP_FULL) { @@ -1749,6 +1750,12 @@ int ptlrpc_disconnect_and_idle_import(struct obd_import *imp) CDEBUG_LIMIT(imp->imp_idle_debug, "%s: disconnect after %llus idle\n", imp->imp_obd->obd_name, ktime_get_real_seconds() - imp->imp_last_reply_time); + + /* don't make noise at reconnection */ + spin_lock(&imp->imp_lock); + imp->imp_was_idle = 1; + spin_unlock(&imp->imp_lock); + req->rq_interpret_reply = ptlrpc_disconnect_idle_interpret; ptlrpcd_add_req(req); From patchwork Thu Feb 27 21:17:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410555 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D563C138D for ; Thu, 27 Feb 2020 21:41:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BE15124690 for ; Thu, 27 Feb 2020 21:41:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BE15124690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F365534A905; Thu, 27 Feb 2020 13:33:29 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 32FFF21FD3C for ; Thu, 27 Feb 2020 13:21:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 31BF1A147; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 303FA468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:36 -0500 Message-Id: <1582838290-17243-589-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 588/622] lustre: llite: fix deadlock in ll_update_lsm_md() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Deadlock may happen in the following scenario: a lookup process called ll_update_lsm_md(), found that lli->lli_lsm_md was NULL, and then called down_write(&lli->lli_lsm_sem),
but another lookup process initialized lli->lli_lsm_md after this check and before the write lock was taken, so the first lookup process called up_read(&lli->lli_lsm_sem) and returned. The write lock is therefore never released, which causes subsequent lookups to deadlock.

Rearrange the code to simplify the locking:
1. take read lock.
2. if lsm was initialized and unchanged, release read lock and return.
3. otherwise release read lock and take write lock.
4. free current lsm and initialize with new lsm.
5. release write lock.
6. initialize stripes with read lock.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13121 Lustre-commit: 3746550282c8 ("LU-13121 llite: fix deadlock in ll_update_lsm_md()") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/37182 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 107 +++++++++++++++++++++----------------------- 1 file changed, 50 insertions(+), 57 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index f083a90..1a8a5ec 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1401,6 +1401,7 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) { struct ll_inode_info *lli = ll_i2info(inode); struct lmv_stripe_md *lsm = md->lmv; + struct cl_attr *attr; int rc = 0; LASSERT(S_ISDIR(inode->i_mode)); @@ -1422,74 +1423,66 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) * normally dir layout doesn't change, only take read lock to check * that to avoid blocking other MD operations. */ - if (lli->lli_lsm_md) - down_read(&lli->lli_lsm_sem); - else - down_write(&lli->lli_lsm_sem); + down_read(&lli->lli_lsm_sem); - /* - * if dir layout mismatch, check whether version is increased, which - * means layout is changed, this happens in dir migration and lfsck.
+ /* some current lookup initialized lsm, and unchanged */ + if (lli->lli_lsm_md && lsm_md_eq(lli->lli_lsm_md, lsm)) + goto unlock; + + /* if dir layout doesn't match, check whether version is increased, + * which means layout is changed, this happens in dir split/merge and + * lfsck. * * foreign LMV should not change. */ - if (lli->lli_lsm_md && !lsm_md_eq(lli->lli_lsm_md, lsm)) { - if (lmv_dir_striped(lli->lli_lsm_md) && - lsm->lsm_md_layout_version <= - lli->lli_lsm_md->lsm_md_layout_version) { - CERROR("%s: " DFID " dir layout mismatch:\n", - ll_i2sbi(inode)->ll_fsname, - PFID(&lli->lli_fid)); - lsm_md_dump(D_ERROR, lli->lli_lsm_md); - lsm_md_dump(D_ERROR, lsm); - rc = -EINVAL; - goto unlock; - } - - /* layout changed, switch to write lock */ - up_read(&lli->lli_lsm_sem); - down_write(&lli->lli_lsm_sem); - ll_dir_clear_lsm_md(inode); + if (lli->lli_lsm_md && lmv_dir_striped(lli->lli_lsm_md) && + lsm->lsm_md_layout_version <= + lli->lli_lsm_md->lsm_md_layout_version) { + CERROR("%s: " DFID " dir layout mismatch:\n", + ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid)); + lsm_md_dump(D_ERROR, lli->lli_lsm_md); + lsm_md_dump(D_ERROR, lsm); + rc = -EINVAL; + goto unlock; } - /* set directory layout */ - if (!lli->lli_lsm_md) { - struct cl_attr *attr; + up_read(&lli->lli_lsm_sem); + down_write(&lli->lli_lsm_sem); + /* clear existing lsm */ + if (lli->lli_lsm_md) { + lmv_free_memmd(lli->lli_lsm_md); + lli->lli_lsm_md = NULL; + } - rc = ll_init_lsm_md(inode, md); - up_write(&lli->lli_lsm_sem); - if (rc) - return rc; + rc = ll_init_lsm_md(inode, md); + up_write(&lli->lli_lsm_sem); - /* - * set lsm_md to NULL, so the following free lustre_md - * will not free this lsm - */ - md->lmv = NULL; + if (rc) + return rc; - /* - * md_merge_attr() may take long, since lsm is already set, - * switch to read lock. - */ - down_read(&lli->lli_lsm_sem); + /* set md->lmv to NULL, so the following free lustre_md will not free + * this lsm. 
+ */ + md->lmv = NULL; - if (!lmv_dir_striped(lli->lli_lsm_md)) - goto unlock; + /* md_merge_attr() may take long, since lsm is already set, switch to + * read lock. + */ + down_read(&lli->lli_lsm_sem); - attr = kzalloc(sizeof(*attr), GFP_NOFS); - if (!attr) { - rc = -ENOMEM; - goto unlock; - } + if (!lmv_dir_striped(lli->lli_lsm_md)) + goto unlock; - /* validate the lsm */ - rc = md_merge_attr(ll_i2mdexp(inode), lsm, attr, - ll_md_blocking_ast); - if (rc) { - kfree(attr); - goto unlock; - } + attr = kzalloc(sizeof(*attr), GFP_NOFS); + if (!attr) { + rc = -ENOMEM; + goto unlock; + } + /* validate the lsm */ + rc = md_merge_attr(ll_i2mdexp(inode), lli->lli_lsm_md, attr, + ll_md_blocking_ast); + if (!rc) { if (md->body->mbo_valid & OBD_MD_FLNLINK) md->body->mbo_nlink = attr->cat_nlink; if (md->body->mbo_valid & OBD_MD_FLSIZE) @@ -1500,9 +1493,9 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md) md->body->mbo_ctime = attr->cat_ctime; if (md->body->mbo_valid & OBD_MD_FLMTIME) md->body->mbo_mtime = attr->cat_mtime; - - kfree(attr); } + + kfree(attr); unlock: up_read(&lli->lli_lsm_sem); From patchwork Thu Feb 27 21:17:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410823 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0EFF81580 for ; Thu, 27 Feb 2020 21:47:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EBA3324690 for ; Thu, 27 Feb 2020 21:47:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EBA3324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 899EA34A22A; Thu, 27 Feb 2020 13:37:45 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8916B21FD68 for ; Thu, 27 Feb 2020 13:21:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 34A8BA148; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3340246C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:37 -0500 Message-Id: <1582838290-17243-590-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 589/622] lustre: ldlm: fix lock convert races X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman The blocking cb may be triggered in parallel, so the convert logic of the DOM lock must be prepared for the cancel_bits to have already been zeroed by the first executor.
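
The bit checks that ldlm_cli_inodebits_convert() performs on cancel_bits in the diff of this patch can be sketched as plain arithmetic, including the case where cancel_bits has already been zeroed. This is only an illustration under stated assumptions: enum convert_action and decide_convert() are hypothetical names, and the real function additionally checks lock flags, runs under the resource lock, and talks to the server.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome codes for this sketch only. */
enum convert_action {
	CONVERT_FULL_CANCEL,	/* fall back to cancelling the whole lock */
	CONVERT_NOTHING_TO_DO,	/* no held bit is affected */
	CONVERT_DROP_BITS,	/* downgrade the lock to *new_bits */
};

/*
 * Decide what a convert attempt should do, given the inode bits the
 * lock currently holds and the aggregated bits requested for cancel.
 */
static enum convert_action decide_convert(uint64_t bits, uint64_t cancel_bits,
					  uint64_t *new_bits)
{
	*new_bits = bits & ~cancel_bits;

	/* zero cancel_bits: the caller needs a full cancel */
	if (cancel_bits == 0)
		return CONVERT_FULL_CANCEL;
	/* every held bit would be dropped: proceed with cancel */
	if (*new_bits == 0)
		return CONVERT_FULL_CANCEL;
	/* nothing held is being dropped: treat as a successful convert */
	if (*new_bits == bits)
		return CONVERT_NOTHING_TO_DO;
	return CONVERT_DROP_BITS;
}
```

For example, with bits 0x7 held and 0x2 requested for cancel, the sketch returns CONVERT_DROP_BITS with new_bits set to 0x5.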
As there may be several parallel executors of the blocking cb and several conversion callers, each requesting different inode bits, set up the following logic:
- the lock keeps the aggregated set of bits requested for cancelling by different parties, where 0 means the whole lock is to be cancelled, and where the CBPENDING flag means there is a cancelling job pending;
- once the job is completed, the cancel_bits are zeroed and the CBPENDING flag is dropped, meaning the next request will be a part of the next job;
- once a local lock is converted, its state is changed appropriately and no cleanup is left for the interpret time, as the lock is ready for the next usage;
- as the lock is unlocked in the process of conversion and more bits may appear, check for that and repeat appropriately;
- allow only one conversion executor to work at a time; the others wait, similar to ldlm_cli_cancel();
- there are others who may want to cancel unused locks (cancel_lru, cancel_resource_local); consider CANCELING as a request to cancel the full lock independently of the cancel_bits;

Some cleanups are done:
- move the cache drop logic from the BLOCKING part of the blocking cb to the CANCELING part;
- remove the convert RPC interpret, as the lock cleanups are already done in advance; the convert RPC is re-sendable and an error means there is a serious network problem;

WC-bug-id: https://jira.whamcloud.com/browse/LU-11276 Lustre-commit: 6c0b676e4124 ("LU-11276 ldlm: fix lock convert races") Signed-off-by: Vitaly Fertman Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/36466 Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 9 +- fs/lustre/ldlm/ldlm_inodebits.c | 143 +++++++++++++++-------------- fs/lustre/ldlm/ldlm_internal.h | 12 +++ fs/lustre/ldlm/ldlm_lockd.c | 73 ++++++++++----- fs/lustre/ldlm/ldlm_request.c | 198 +++++++--------------------------------- fs/lustre/llite/namei.c | 59 +++++++----- 6 files changed, 210
insertions(+), 284 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 9ca79f4..42c1806 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -545,7 +545,6 @@ enum ldlm_cancel_flags { LCF_BL_AST = 0x4, /* Cancel locks marked as LDLM_FL_BL_AST * in the same RPC */ - LCF_CONVERT = 0x8, /* Try to convert IBITS lock before cancel */ }; struct ldlm_flock { @@ -1291,7 +1290,9 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, enum ldlm_mode mode, u64 *flags, void *lvb, u32 lvb_len, const struct lustre_handle *lockh, int rc); -int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags); +int ldlm_cli_convert_req(struct ldlm_lock *lock, u32 *flags, u64 new_bits); +int ldlm_cli_convert(struct ldlm_lock *lock, + enum ldlm_cancel_flags cancel_flags); int ldlm_cli_update_pool(struct ptlrpc_request *req); int ldlm_cli_cancel(const struct lustre_handle *lockh, enum ldlm_cancel_flags cancel_flags); @@ -1317,8 +1318,8 @@ int ldlm_cli_cancel_list(struct list_head *head, int count, /** @} ldlm_cli_api */ int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop); -int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits); -int ldlm_cli_dropbits_list(struct list_head *converts, u64 drop_bits); +int ldlm_cli_inodebits_convert(struct ldlm_lock *lock, + enum ldlm_cancel_flags cancel_flags); /* mds/handler.c */ /* This has to be here because recursive inclusion sucks. 
*/ diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c index 9cf3c5f..2288eb5 100644 --- a/fs/lustre/ldlm/ldlm_inodebits.c +++ b/fs/lustre/ldlm/ldlm_inodebits.c @@ -98,92 +98,101 @@ int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop) EXPORT_SYMBOL(ldlm_inodebits_drop); /* convert single lock */ -int ldlm_cli_dropbits(struct ldlm_lock *lock, u64 drop_bits) +int ldlm_cli_inodebits_convert(struct ldlm_lock *lock, + enum ldlm_cancel_flags cancel_flags) { - struct lustre_handle lockh; + struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); + struct ldlm_lock_desc ld = { { 0 } }; + u64 drop_bits, new_bits; u32 flags = 0; int rc; - LASSERT(drop_bits); - LASSERT(!lock->l_readers && !lock->l_writers); - - LDLM_DEBUG(lock, "client lock convert START"); + check_res_locked(lock->l_resource); - ldlm_lock2handle(lock, &lockh); - lock_res_and_lock(lock); - /* check if all bits are blocked */ - if (!(lock->l_policy_data.l_inodebits.bits & ~drop_bits)) { - unlock_res_and_lock(lock); - /* return error to continue with cancel */ - rc = -EINVAL; - goto exit; + /* Lock is being converted already */ + if (ldlm_is_converting(lock)) { + if (!(cancel_flags & LCF_ASYNC)) { + unlock_res_and_lock(lock); + wait_event_idle(lock->l_waitq, + is_lock_converted(lock)); + lock_res_and_lock(lock); + } + return 0; } - /* check if no common bits, consider this as successful convert */ - if (!(lock->l_policy_data.l_inodebits.bits & drop_bits)) { - unlock_res_and_lock(lock); - rc = 0; - goto exit; - } + /* lru_cancel may happen in parallel and call ldlm_cli_cancel_list() + * independently. 
+ */ + if (ldlm_is_canceling(lock)) + return -EINVAL; - /* check if there is race with cancel */ - if (ldlm_is_canceling(lock) || ldlm_is_cancel(lock)) { - unlock_res_and_lock(lock); - rc = -EINVAL; - goto exit; - } + /* no need in only local convert */ + if (lock->l_flags & (LDLM_FL_LOCAL_ONLY | LDLM_FL_CANCEL_ON_BLOCK)) + return -EINVAL; - /* clear cbpending flag early, it is safe to match lock right after - * client convert because it is downgrade always. + drop_bits = lock->l_policy_data.l_inodebits.cancel_bits; + /* no cancel bits - means that caller needs full cancel */ + if (drop_bits == 0) + return -EINVAL; + + new_bits = lock->l_policy_data.l_inodebits.bits & ~drop_bits; + /* check if all lock bits are dropped, proceed with cancel */ + if (!new_bits) + return -EINVAL; + + /* check if no dropped bits, consider this as successful convert */ - ldlm_clear_cbpending(lock); - ldlm_clear_bl_ast(lock); + if (lock->l_policy_data.l_inodebits.bits == new_bits) + return 0; - /* If lock is being converted already, check drop bits first */ - if (ldlm_is_converting(lock)) { - /* raced lock convert, lock inodebits are remaining bits - * so check if they are conflicting with new convert or not. - */ - if (!(lock->l_policy_data.l_inodebits.bits & drop_bits)) { - unlock_res_and_lock(lock); - rc = 0; - goto exit; - } - /* Otherwise drop new conflicting bits in new convert */ - } ldlm_set_converting(lock); - /* from all bits of blocking lock leave only conflicting */ - drop_bits &= lock->l_policy_data.l_inodebits.bits; - /* save them in cancel_bits, so l_blocking_ast will know - * which bits from the current lock were dropped. - */ - lock->l_policy_data.l_inodebits.cancel_bits = drop_bits; - /* Finally clear these bits in lock ibits */ - ldlm_inodebits_drop(lock, drop_bits); - unlock_res_and_lock(lock); /* Finally call cancel callback for remaining bits only. 
* It is important to have converting flag during that * so blocking_ast callback can distinguish convert from * cancels. */ - if (lock->l_blocking_ast) - lock->l_blocking_ast(lock, NULL, lock->l_ast_data, - LDLM_CB_CANCELING); - + ld.l_policy_data.l_inodebits.cancel_bits = drop_bits; + unlock_res_and_lock(lock); + lock->l_blocking_ast(lock, &ld, lock->l_ast_data, LDLM_CB_CANCELING); /* now notify server about convert */ - rc = ldlm_cli_convert(lock, &flags); - if (rc) { - lock_res_and_lock(lock); - if (ldlm_is_converting(lock)) { - ldlm_clear_converting(lock); - ldlm_set_cbpending(lock); - ldlm_set_bl_ast(lock); - } - unlock_res_and_lock(lock); - goto exit; + rc = ldlm_cli_convert_req(lock, &flags, new_bits); + lock_res_and_lock(lock); + if (rc) + goto full_cancel; + + /* Finally clear these bits in lock ibits */ + ldlm_inodebits_drop(lock, drop_bits); + + /* Being locked again check if lock was canceled, it is important + * to do and don't drop cbpending below + */ + if (ldlm_is_canceling(lock)) { + rc = -EINVAL; + goto full_cancel; + } + + /* also check again if more bits to be cancelled appeared */ + if (drop_bits != lock->l_policy_data.l_inodebits.cancel_bits) { + rc = -EAGAIN; + goto clear_converting; } -exit: - LDLM_DEBUG(lock, "client lock convert END"); + /* clear cbpending flag early, it is safe to match lock right after + * client convert because it is downgrade always. + */ + ldlm_clear_cbpending(lock); + ldlm_clear_bl_ast(lock); + spin_lock(&ns->ns_lock); + if (list_empty(&lock->l_lru)) + ldlm_lock_add_to_lru_nolock(lock); + spin_unlock(&ns->ns_lock); + + /* the job is done, zero the cancel_bits. If more conflicts appear, + * it will result in another cycle of ldlm_cli_inodebits_convert(). 
+ */ +full_cancel: + lock->l_policy_data.l_inodebits.cancel_bits = 0; +clear_converting: + ldlm_clear_converting(lock); return rc; } diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h index 336d9b7..996c0fb 100644 --- a/fs/lustre/ldlm/ldlm_internal.h +++ b/fs/lustre/ldlm/ldlm_internal.h @@ -171,6 +171,7 @@ int ldlm_bl_to_thread_list(struct ldlm_namespace *ns, void ldlm_handle_bl_callback(struct ldlm_namespace *ns, struct ldlm_lock_desc *ld, struct ldlm_lock *lock); +void ldlm_bl_desc2lock(const struct ldlm_lock_desc *ld, struct ldlm_lock *lock); extern struct kmem_cache *ldlm_resource_slab; extern struct kset *ldlm_ns_kset; @@ -330,6 +331,17 @@ static inline bool is_bl_done(struct ldlm_lock *lock) return bl_done; } +static inline bool is_lock_converted(struct ldlm_lock *lock) +{ + bool ret = 0; + + lock_res_and_lock(lock); + ret = (lock->l_policy_data.l_inodebits.cancel_bits == 0); + unlock_res_and_lock(lock); + + return ret; +} + typedef void (*ldlm_policy_wire_to_local_t)(const union ldlm_wire_policy_data *, union ldlm_policy_data *); diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 79dab6e..32b7be1 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -73,7 +73,6 @@ struct ldlm_cb_async_args { /* LDLM state */ static struct ldlm_state *ldlm_state; - struct ldlm_bl_pool { spinlock_t blp_lock; @@ -111,21 +110,15 @@ struct ldlm_bl_work_item { }; /** - * Callback handler for receiving incoming blocking ASTs. - * - * This can only happen on client side. + * Server may pass additional information about blocking lock. + * For IBITS locks it is conflicting bits which can be used for + * lock convert instead of cancel. 
*/ -void ldlm_handle_bl_callback(struct ldlm_namespace *ns, - struct ldlm_lock_desc *ld, struct ldlm_lock *lock) +void ldlm_bl_desc2lock(const struct ldlm_lock_desc *ld, struct ldlm_lock *lock) { - int do_ast; - - LDLM_DEBUG(lock, "client blocking AST callback handler"); - - lock_res_and_lock(lock); - - /* set bits to cancel for this lock for possible lock convert */ - if (lock->l_resource->lr_type == LDLM_IBITS) { + check_res_locked(lock->l_resource); + if (ld && + (lock->l_resource->lr_type == LDLM_IBITS)) { /* * Lock description contains policy of blocking lock, and its * cancel_bits is used to pass conflicting bits. NOTE: ld can @@ -137,18 +130,41 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, * cookie, never use cancel bits from different resource, full * cancel is to be used. */ - if (ld && ld->l_policy_data.l_inodebits.bits && + if (ld->l_policy_data.l_inodebits.cancel_bits && ldlm_res_eq(&ld->l_resource.lr_name, - &lock->l_resource->lr_name)) - lock->l_policy_data.l_inodebits.cancel_bits = + &lock->l_resource->lr_name) && + !(ldlm_is_cbpending(lock) && + lock->l_policy_data.l_inodebits.cancel_bits == 0)) { + /* always combine conflicting ibits */ + lock->l_policy_data.l_inodebits.cancel_bits |= ld->l_policy_data.l_inodebits.cancel_bits; - /* - * If there is no valid ld and lock is cbpending already - * then cancel_bits should be kept, otherwise it is zeroed. - */ - else if (!ldlm_is_cbpending(lock)) + } else { + /* If cancel_bits are not obtained or + * if the lock is already CBPENDING and + * has no cancel_bits set + * - the full lock is to be cancelled + */ lock->l_policy_data.l_inodebits.cancel_bits = 0; + } } +} + +/** + * Callback handler for receiving incoming blocking ASTs. + * + * This can only happen on client side. 
+ */ +void ldlm_handle_bl_callback(struct ldlm_namespace *ns, + struct ldlm_lock_desc *ld, struct ldlm_lock *lock) +{ + int do_ast; + + LDLM_DEBUG(lock, "client blocking AST callback handler"); + + lock_res_and_lock(lock); + + /* get extra information from desc if any */ + ldlm_bl_desc2lock(ld, lock); ldlm_set_cbpending(lock); do_ast = !lock->l_readers && !lock->l_writers; @@ -269,6 +285,7 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, * Let ldlm_cancel_lru() be fast. */ ldlm_lock_remove_from_lru(lock); + ldlm_bl_desc2lock(&dlm_req->lock_desc, lock); lock->l_flags |= LDLM_FL_CBPENDING | LDLM_FL_BL_AST; LDLM_DEBUG(lock, "completion AST includes blocking AST"); } @@ -318,6 +335,7 @@ static void ldlm_handle_gl_callback(struct ptlrpc_request *req, struct ldlm_request *dlm_req, struct ldlm_lock *lock) { + struct ldlm_lock_desc *ld = &dlm_req->lock_desc; int rc = -ENXIO; LDLM_DEBUG(lock, "client glimpse AST callback handler"); @@ -339,8 +357,15 @@ static void ldlm_handle_gl_callback(struct ptlrpc_request *req, ktime_add(lock->l_last_used, ktime_set(ns->ns_dirty_age_limit, 0)))) { unlock_res_and_lock(lock); - if (ldlm_bl_to_thread_lock(ns, NULL, lock)) - ldlm_handle_bl_callback(ns, NULL, lock); + + /* For MDS glimpse it is always DOM lock, set corresponding + * cancel_bits to perform lock convert if needed + */ + if (lock->l_resource->lr_type == LDLM_IBITS) + ld->l_policy_data.l_inodebits.cancel_bits = + MDS_INODELOCK_DOM; + if (ldlm_bl_to_thread_lock(ns, ld, lock)) + ldlm_handle_bl_callback(ns, ld, lock); return; } diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 6df057d..7eba8d2 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -489,6 +489,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, if ((*flags) & LDLM_FL_AST_SENT) { lock_res_and_lock(lock); + ldlm_bl_desc2lock(&reply->lock_desc, lock); lock->l_flags |= LDLM_FL_CBPENDING | LDLM_FL_BL_AST; 
unlock_res_and_lock(lock); LDLM_DEBUG(lock, "enqueue reply includes blocking AST"); @@ -875,129 +876,6 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, EXPORT_SYMBOL(ldlm_cli_enqueue); /** - * Client-side lock convert reply handling. - * - * Finish client lock converting, checks for concurrent converts - * and clear 'converting' flag so lock can be placed back into LRU. - */ -static int lock_convert_interpret(const struct lu_env *env, - struct ptlrpc_request *req, - void *args, int rc) -{ - struct ldlm_async_args *aa = args; - struct ldlm_lock *lock; - struct ldlm_reply *reply; - - lock = ldlm_handle2lock(&aa->lock_handle); - if (!lock) { - LDLM_DEBUG_NOLOCK("convert ACK for unknown local cookie %#llx", - aa->lock_handle.cookie); - return -ESTALE; - } - - LDLM_DEBUG(lock, "CONVERTED lock:"); - - if (rc != ELDLM_OK) - goto out; - - reply = req_capsule_server_get(&req->rq_pill, &RMF_DLM_REP); - if (!reply) { - rc = -EPROTO; - goto out; - } - - if (reply->lock_handle.cookie != aa->lock_handle.cookie) { - LDLM_ERROR(lock, - "convert ACK with wrong lock cookie %#llx but cookie %#llx from server %s id %s\n", - aa->lock_handle.cookie, reply->lock_handle.cookie, - req->rq_export->exp_client_uuid.uuid, - libcfs_id2str(req->rq_peer)); - rc = ELDLM_NO_LOCK_DATA; - goto out; - } - - lock_res_and_lock(lock); - /* - * Lock convert is sent for any new bits to drop, the converting flag - * is dropped when ibits on server are the same as on client. Meanwhile - * that can be so that more later convert will be replied first with - * and clear converting flag, so in case of such race just exit here. - * if lock has no converting bits then. 
- */ - if (!ldlm_is_converting(lock)) { - LDLM_DEBUG(lock, - "convert ACK for lock without converting flag, reply ibits %#llx", - reply->lock_desc.l_policy_data.l_inodebits.bits); - } else if (reply->lock_desc.l_policy_data.l_inodebits.bits != - lock->l_policy_data.l_inodebits.bits) { - /* - * Compare server returned lock ibits and local lock ibits - * if they are the same we consider conversion is done, - * otherwise we have more converts inflight and keep - * converting flag. - */ - LDLM_DEBUG(lock, "convert ACK with ibits %#llx\n", - reply->lock_desc.l_policy_data.l_inodebits.bits); - } else { - ldlm_clear_converting(lock); - - /* - * Concurrent BL AST may arrive and cause another convert - * or cancel so just do nothing here if bl_ast is set, - * finish with convert otherwise. - */ - if (!ldlm_is_bl_ast(lock)) { - struct ldlm_namespace *ns = ldlm_lock_to_ns(lock); - - /* - * Drop cancel_bits since there are no more converts - * and put lock into LRU if it is still not used and - * is not there yet. - */ - lock->l_policy_data.l_inodebits.cancel_bits = 0; - if (!lock->l_readers && !lock->l_writers && - !ldlm_is_canceling(lock)) { - spin_lock(&ns->ns_lock); - /* there is check for list_empty() inside */ - ldlm_lock_remove_from_lru_nolock(lock); - ldlm_lock_add_to_lru_nolock(lock); - spin_unlock(&ns->ns_lock); - } - } - } - unlock_res_and_lock(lock); -out: - if (rc) { - int flag; - - lock_res_and_lock(lock); - if (ldlm_is_converting(lock)) { - ldlm_clear_converting(lock); - ldlm_set_cbpending(lock); - ldlm_set_bl_ast(lock); - lock->l_policy_data.l_inodebits.cancel_bits = 0; - } - unlock_res_and_lock(lock); - - /* - * fallback to normal lock cancel. 
If rc means there is no - * valid lock on server, do only local cancel - */ - if (rc == ELDLM_NO_LOCK_DATA) - flag = LCF_LOCAL; - else - flag = LCF_ASYNC; - - rc = ldlm_cli_cancel(&aa->lock_handle, flag); - if (rc < 0) - LDLM_DEBUG(lock, "failed to cancel lock: rc = %d\n", - rc); - } - LDLM_LOCK_PUT(lock); - return rc; -} - -/** * Client-side IBITS lock convert. * * Inform server that lock has been converted instead of canceling. @@ -1009,17 +887,13 @@ static int lock_convert_interpret(const struct lu_env *env, * is made asynchronous. * */ -int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) +int ldlm_cli_convert_req(struct ldlm_lock *lock, u32 *flags, u64 new_bits) { struct ldlm_request *body; struct ptlrpc_request *req; - struct ldlm_async_args *aa; struct obd_export *exp = lock->l_conn_export; - if (!exp) { - LDLM_ERROR(lock, "convert must not be called on local locks."); - return -EINVAL; - } + LASSERT(exp); /* * this is better to check earlier and it is done so already, @@ -1050,8 +924,7 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) body->lock_desc.l_req_mode = lock->l_req_mode; body->lock_desc.l_granted_mode = lock->l_granted_mode; - body->lock_desc.l_policy_data.l_inodebits.bits = - lock->l_policy_data.l_inodebits.bits; + body->lock_desc.l_policy_data.l_inodebits.bits = new_bits; body->lock_desc.l_policy_data.l_inodebits.cancel_bits = 0; body->lock_flags = ldlm_flags_to_wire(*flags); @@ -1071,10 +944,6 @@ int ldlm_cli_convert(struct ldlm_lock *lock, u32 *flags) lprocfs_counter_incr(exp->exp_obd->obd_svc_stats, LDLM_CONVERT - LDLM_FIRST_OPC); - aa = ptlrpc_req_async_args(aa, req); - ldlm_lock2handle(lock, &aa->lock_handle); - req->rq_interpret_reply = lock_convert_interpret; - ptlrpcd_add_req(req); return 0; } @@ -1301,6 +1170,27 @@ int ldlm_cli_update_pool(struct ptlrpc_request *req) return 0; } +int ldlm_cli_convert(struct ldlm_lock *lock, + enum ldlm_cancel_flags cancel_flags) +{ + int rc = -EINVAL; + + LASSERT(!lock->l_readers && 
!lock->l_writers); + LDLM_DEBUG(lock, "client lock convert START"); + + if (lock->l_resource->lr_type == LDLM_IBITS) { + lock_res_and_lock(lock); + do { + rc = ldlm_cli_inodebits_convert(lock, cancel_flags); + } while (rc == -EAGAIN); + unlock_res_and_lock(lock); + } + + LDLM_DEBUG(lock, "client lock convert END"); + return rc; +} +EXPORT_SYMBOL(ldlm_cli_convert); + /** * Client side lock cancel. * @@ -1323,20 +1213,9 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, return 0; } - /* Convert lock bits instead of cancel for IBITS locks */ - if (cancel_flags & LCF_CONVERT) { - LASSERT(lock->l_resource->lr_type == LDLM_IBITS); - LASSERT(lock->l_policy_data.l_inodebits.cancel_bits != 0); - - rc = ldlm_cli_dropbits(lock, - lock->l_policy_data.l_inodebits.cancel_bits); - if (rc == 0) { - LDLM_LOCK_RELEASE(lock); - return 0; - } - } - lock_res_and_lock(lock); + LASSERT(!ldlm_is_converting(lock)); + /* Lock is being canceled and the caller doesn't want to wait */ if (ldlm_is_canceling(lock)) { unlock_res_and_lock(lock); @@ -1348,16 +1227,6 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, return 0; } - /* - * Lock is being converted, cancel it immediately. - * When convert will end, it releases lock and it will be gone. - */ - if (ldlm_is_converting(lock)) { - /* set back flags removed by convert */ - ldlm_set_cbpending(lock); - ldlm_set_bl_ast(lock); - } - ldlm_set_canceling(lock); unlock_res_and_lock(lock); @@ -1723,8 +1592,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, /* No locks which got blocking requests. */ LASSERT(!ldlm_is_bl_ast(lock)); - if (!ldlm_is_canceling(lock) && - !ldlm_is_converting(lock)) + if (!ldlm_is_canceling(lock)) break; /* @@ -1782,7 +1650,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns, lock_res_and_lock(lock); /* Check flags again under the lock. 
*/ - if (ldlm_is_canceling(lock) || ldlm_is_converting(lock) || + if (ldlm_is_canceling(lock) || (ldlm_lock_remove_from_lru_check(lock, last_use) == 0)) { /* * Another thread is removing lock from LRU, or @@ -1908,11 +1776,10 @@ int ldlm_cancel_resource_local(struct ldlm_resource *res, continue; /* - * If somebody is already doing CANCEL, or blocking AST came, - * skip this lock. + * If somebody is already doing CANCEL, or blocking AST came + * then skip this lock. */ - if (ldlm_is_bl_ast(lock) || ldlm_is_canceling(lock) || - ldlm_is_converting(lock)) + if (ldlm_is_bl_ast(lock) || ldlm_is_canceling(lock)) continue; if (lockmode_compat(lock->l_granted_mode, mode)) @@ -1938,7 +1805,6 @@ int ldlm_cancel_resource_local(struct ldlm_resource *res, /* See CBPENDING comment in ldlm_cancel_lru */ lock->l_flags |= LDLM_FL_CBPENDING | LDLM_FL_CANCELING | lock_flags; - LASSERT(list_empty(&lock->l_bl_ast)); list_add(&lock->l_bl_ast, cancels); LDLM_LOCK_GET(lock); diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index c87653d..13c1cf9 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -431,11 +431,10 @@ int ll_md_need_convert(struct ldlm_lock *lock) return !!(bits); } -int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, +int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *ld, void *data, int flag) { struct lustre_handle lockh; - u64 bits = lock->l_policy_data.l_inodebits.bits; int rc; switch (flag) { @@ -443,17 +442,21 @@ int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, { u64 cancel_flags = LCF_ASYNC; - if (ll_md_need_convert(lock)) { - cancel_flags |= LCF_CONVERT; - /* For lock convert some cancel actions may require - * this lock with non-dropped canceled bits, e.g. page - * flush for DOM lock. So call ll_lock_cancel_bits() - * here while canceled bits are still set. 
- */ - bits = lock->l_policy_data.l_inodebits.cancel_bits; - if (bits & MDS_INODELOCK_DOM) - ll_lock_cancel_bits(lock, MDS_INODELOCK_DOM); + /* if lock convert is not needed then still have to + * pass lock via ldlm_cli_convert() to keep all states + * correct, set cancel_bits to full lock bits to cause + * full cancel to happen. + */ + if (!ll_md_need_convert(lock)) { + lock_res_and_lock(lock); + lock->l_policy_data.l_inodebits.cancel_bits = + lock->l_policy_data.l_inodebits.bits; + unlock_res_and_lock(lock); } + rc = ldlm_cli_convert(lock, cancel_flags); + if (!rc) + return 0; + /* continue with cancel otherwise */ ldlm_lock2handle(lock, &lockh); rc = ldlm_cli_cancel(&lockh, cancel_flags); if (rc < 0) { @@ -463,24 +466,34 @@ int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, break; } case LDLM_CB_CANCELING: + { + u64 to_cancel = lock->l_policy_data.l_inodebits.bits; + /* Nothing to do for non-granted locks */ if (!ldlm_is_granted(lock)) break; - if (ldlm_is_converting(lock)) { - /* this is called on already converted lock, so - * ibits has remained bits only and cancel_bits - * are bits that were dropped. - * Note that DOM lock is handled prior lock convert - * and is excluded here. + /* If 'ld' is supplied then bits to be cancelled are passed + * implicitly by lock converting and cancel_bits from 'ld' + * should be used. Otherwise full cancel is being performed + * and lock inodebits are used. + * + * Note: we cannot rely on cancel_bits in lock itself at this + * moment because they can be changed by concurrent thread, + * so ldlm_cli_inodebits_convert() pass cancel bits implicitly + * in 'ld' parameter. 
+ */ + if (ld) { + /* partial bits cancel allowed only during convert */ + LASSERT(ldlm_is_converting(lock)); + /* mask cancel bits by lock bits so only no any unused + * bits are passed to ll_lock_cancel_bits() */ - bits = lock->l_policy_data.l_inodebits.cancel_bits & - ~MDS_INODELOCK_DOM; - } else { - LASSERT(ldlm_is_canceling(lock)); + to_cancel &= ld->l_policy_data.l_inodebits.cancel_bits; } - ll_lock_cancel_bits(lock, bits); + ll_lock_cancel_bits(lock, to_cancel); break; + } default: LBUG(); } From patchwork Thu Feb 27 21:17:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410753 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7FE74924 for ; Thu, 27 Feb 2020 21:45:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 689B424690 for ; Thu, 27 Feb 2020 21:45:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 689B424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1706C34B122; Thu, 27 Feb 2020 13:36:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DFF8921FD68 for ; Thu, 27 Feb 2020 13:21:22 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 38385A149; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3616946D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:38 -0500 Message-Id: <1582838290-17243-591-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 590/622] lustre: ldlm: signal vs CP callback race X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh In case of an interrupted wait for a CP AST, failed_lock_cleanup() sets LDLM_FL_LOCAL_ONLY, so the client wouldn't cancel the lock on CP AST.
A lock isn't canceled on the server on reception Cray-bug-id: LUS-2021 WC-bug-id: https://jira.whamcloud.com/browse/LU-7791 Lustre-commit: 7fff052c930d ("LU-7791 ldlm: signal vs CP callback race") Signed-off-by: Andriy Skulysh Reviewed-by: Alexander Boyko Reviewed-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/19898 Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 1 + fs/lustre/ldlm/ldlm_lockd.c | 51 +++++++++++++++++++++++++---------------- fs/lustre/ldlm/ldlm_request.c | 3 +++ 3 files changed, 35 insertions(+), 20 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index a26ac76..7dfef0f 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -302,6 +302,7 @@ #define OBD_FAIL_LDLM_CP_CB_WAIT3 0x321 #define OBD_FAIL_LDLM_CP_CB_WAIT4 0x322 #define OBD_FAIL_LDLM_CP_CB_WAIT5 0x323 +#define OBD_FAIL_LDLM_PAUSE_CANCEL_LOCAL 0x329 #define OBD_FAIL_LDLM_GRANT_CHECK 0x32a #define OBD_FAIL_LDLM_LOCAL_CANCEL_PAUSE 0x32c diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 32b7be1..b252fef 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -187,15 +187,29 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns, LDLM_LOCK_RELEASE(lock); } +static int ldlm_callback_reply(struct ptlrpc_request *req, int rc) +{ + if (req->rq_no_reply) + return 0; + + req->rq_status = rc; + if (!req->rq_packed_final) { + rc = lustre_pack_reply(req, 1, NULL, NULL); + if (rc) + return rc; + } + return ptlrpc_reply(req); +} + /* * Callback handler for receiving incoming completion ASTs. * * This only can happen on client side. 
*/ -static void ldlm_handle_cp_callback(struct ptlrpc_request *req, - struct ldlm_namespace *ns, - struct ldlm_request *dlm_req, - struct ldlm_lock *lock) +static int ldlm_handle_cp_callback(struct ptlrpc_request *req, + struct ldlm_namespace *ns, + struct ldlm_request *dlm_req, + struct ldlm_lock *lock) { int lvb_len; LIST_HEAD(ast_list); @@ -206,6 +220,8 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_CANCEL_BL_CB_RACE)) { long to = HZ; + ldlm_callback_reply(req, 0); + while (to > 0) { schedule_timeout_interruptible(to); if (ldlm_is_granted(lock) || @@ -250,6 +266,12 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, lock_res_and_lock(lock); } + if (ldlm_is_failed(lock)) { + unlock_res_and_lock(lock); + LDLM_LOCK_RELEASE(lock); + return -EINVAL; + } + if (ldlm_is_destroyed(lock) || ldlm_is_granted(lock)) { /* bug 11300: the lock has already been granted */ @@ -321,6 +343,8 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req, wake_up(&lock->l_waitq); } LDLM_LOCK_RELEASE(lock); + + return 0; } /** @@ -373,20 +397,6 @@ static void ldlm_handle_gl_callback(struct ptlrpc_request *req, LDLM_LOCK_RELEASE(lock); } -static int ldlm_callback_reply(struct ptlrpc_request *req, int rc) -{ - if (req->rq_no_reply) - return 0; - - req->rq_status = rc; - if (!req->rq_packed_final) { - rc = lustre_pack_reply(req, 1, NULL, NULL); - if (rc) - return rc; - } - return ptlrpc_reply(req); -} - static int __ldlm_bl_to_thread(struct ldlm_bl_work_item *blwi, enum ldlm_cancel_flags cancel_flags) { @@ -714,8 +724,9 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) case LDLM_CP_CALLBACK: CDEBUG(D_INODE, "completion ast\n"); req_capsule_extend(&req->rq_pill, &RQF_LDLM_CP_CALLBACK); - ldlm_callback_reply(req, 0); - ldlm_handle_cp_callback(req, ns, dlm_req, lock); + rc = ldlm_handle_cp_callback(req, ns, dlm_req, lock); + if (!OBD_FAIL_CHECK(OBD_FAIL_LDLM_CANCEL_BL_CB_RACE)) + ldlm_callback_reply(req, 
rc); break; case LDLM_GL_CALLBACK: CDEBUG(D_INODE, "glimpse ast\n"); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 7eba8d2..fcb2af5 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -964,6 +964,9 @@ static u64 ldlm_cli_cancel_local(struct ldlm_lock *lock) bool local_only; LDLM_DEBUG(lock, "client-side cancel"); + OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_PAUSE_CANCEL_LOCAL, + cfs_fail_val); + /* Set this flag to prevent others from getting new references*/ lock_res_and_lock(lock); ldlm_set_cbpending(lock); From patchwork Thu Feb 27 21:17:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410811 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 64AA61580 for ; Thu, 27 Feb 2020 21:47:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4D38C24690 for ; Thu, 27 Feb 2020 21:47:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4D38C24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 80FB034B452; Thu, 27 Feb 2020 13:37:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 44DCA3489F1 for ; Thu, 27 Feb 2020 13:21:23 
-0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3B33FA14A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3916046A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:39 -0500 Message-Id: <1582838290-17243-592-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 591/622] lustre: uapi: properly pack data structures X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Linux UAPI headers use the gcc attribute __packed__ to ensure that the data structures are the exact same size on all platforms. This comes at the cost of potential misaligned accesses to these data structures, which at best cost performance and at worst cause a bus error on some platforms. To detect potential misaligned access, starting with gcc version 9 a new compile flag was introduced, which is now impacting builds with Lustre. Examining the build failures shows most of the problems are due to packed data structures in the Lustre UAPI header containing unpacked data structure fields. Packing those missed structures resolved many of the build issues. The second problem is that the lustre utilities tend to cast some of their UAPI data structures. A good example is struct lov_user_md being cast to struct lov_user_md_v3.
To ensure this is properly handled with packed data structures we need to use the __may_alias__ compiler attribute. The one exception is struct statx, which is defined outside of Lustre and is unpacked. This requires extra special handling in userland code due to the described issues in this comment. The Lustre UAPI headers currently used __packed to avoid checkpatch errors due to Lustre being in the staging tree. Now that the Lustre UAPI headers are in the proper place, update the UAPI headers to use __attribute__((packed)) over __packed. WC-bug-id: https://jira.whamcloud.com/browse/LU-12822 Lustre-commit: 4751e4a95197 ("LU-12822 uapi: properly pack data structures") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/36798 Reviewed-by: Andreas Dilger Reviewed-by: Quentin Bouget Reviewed-by: Oleg Drokin --- include/uapi/linux/lustre/lustre_idl.h | 54 ++++++++++++++++----------------- include/uapi/linux/lustre/lustre_user.h | 42 ++++++++++++------------- 2 files changed, 48 insertions(+), 48 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index a69d49a..19ac0cb 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -2426,7 +2426,7 @@ enum llog_ctxt_id { struct llog_logid { struct ost_id lgl_oi; __u32 lgl_ogen; -} __packed; +} __attribute__((packed)); /** Records written to the CATALOGS list */ #define CATLIST "CATALOGS" @@ -2435,7 +2435,7 @@ struct llog_catid { __u32 lci_padding1; __u32 lci_padding2; __u32 lci_padding3; -} __packed; +} __attribute__((packed)); /* Log data record types - there is no specific reason that these need to * be related to the RPC opcodes, but no reason not to (may be handy later?)
@@ -2477,12 +2477,12 @@ struct llog_rec_hdr { __u32 lrh_index; __u32 lrh_type; __u32 lrh_id; -}; +} __attribute__((packed)); struct llog_rec_tail { __u32 lrt_len; __u32 lrt_index; -}; +} __attribute__((packed)); /* Where data follow just after header */ #define REC_DATA(ptr) \ @@ -2499,7 +2499,7 @@ struct llog_logid_rec { __u64 lid_padding2; __u64 lid_padding3; struct llog_rec_tail lid_tail; -} __packed; +} __attribute__((packed)); struct llog_unlink_rec { struct llog_rec_hdr lur_hdr; @@ -2507,7 +2507,7 @@ struct llog_unlink_rec { __u32 lur_oseq; __u32 lur_count; struct llog_rec_tail lur_tail; -} __packed; +} __attribute__((packed)); struct llog_unlink64_rec { struct llog_rec_hdr lur_hdr; @@ -2517,7 +2517,7 @@ struct llog_unlink64_rec { __u64 lur_padding2; __u64 lur_padding3; struct llog_rec_tail lur_tail; -} __packed; +} __attribute__((packed)); struct llog_setattr64_rec { struct llog_rec_hdr lsr_hdr; @@ -2528,7 +2528,7 @@ struct llog_setattr64_rec { __u32 lsr_gid_h; __u64 lsr_valid; struct llog_rec_tail lsr_tail; -} __packed; +} __attribute__((packed)); struct llog_size_change_rec { struct llog_rec_hdr lsc_hdr; @@ -2538,7 +2538,7 @@ struct llog_size_change_rec { __u64 lsc_padding2; __u64 lsc_padding3; struct llog_rec_tail lsc_tail; -} __packed; +} __attribute__((packed)); /* changelog llog name, needed by client replicators */ #define CHANGELOG_CATALOG "changelog_catalog" @@ -2546,14 +2546,14 @@ struct llog_size_change_rec { struct changelog_setinfo { __u64 cs_recno; __u32 cs_id; -} __packed; +} __attribute__((packed)); /** changelog record */ struct llog_changelog_rec { struct llog_rec_hdr cr_hdr; struct changelog_rec cr; /**< Variable length field */ struct llog_rec_tail cr_do_not_use; /**< for_sizezof_only */ -} __packed; +} __attribute__((packed)); struct llog_changelog_user_rec { struct llog_rec_hdr cur_hdr; @@ -2561,7 +2561,7 @@ struct llog_changelog_user_rec { __u32 cur_padding; __u64 cur_endrec; struct llog_rec_tail cur_tail; -} __packed; +} 
__attribute__((packed)); enum agent_req_status { ARS_WAITING, @@ -2602,13 +2602,13 @@ struct llog_agent_req_rec { __u64 arr_req_change; /**< req. status change time */ struct hsm_action_item arr_hai; /**< req. to the agent */ struct llog_rec_tail arr_tail; /**< record tail for_sizezof_only */ -} __packed; +} __attribute__((packed)); /* Old llog gen for compatibility */ struct llog_gen { __u64 mnt_cnt; __u64 conn_cnt; -} __packed; +} __attribute__((packed)); struct llog_gen_rec { struct llog_rec_hdr lgr_hdr; @@ -2679,7 +2679,7 @@ struct llog_log_hdr { */ __u32 llh_bitmap[LLOG_BITMAP_BYTES / sizeof(__u32)]; struct llog_rec_tail llh_tail; -} __packed; +} __attribute__((packed)); #undef LLOG_HEADER_SIZE #undef LLOG_BITMAP_BYTES @@ -2701,7 +2701,7 @@ struct llog_cookie { __u32 lgc_subsys; __u32 lgc_index; __u32 lgc_padding; -} __packed; +} __attribute__((packed)); /** llog protocol */ enum llogd_rpc_ops { @@ -2726,13 +2726,13 @@ struct llogd_body { __u32 lgd_saved_index; __u32 lgd_len; __u64 lgd_cur_offset; -} __packed; +} __attribute__((packed)); struct llogd_conn_body { struct llog_gen lgdc_gen; struct llog_logid lgdc_logid; __u32 lgdc_ctxt_idx; -} __packed; +} __attribute__((packed)); /* Note: 64-bit types are 64-bit aligned in structure */ struct obdo { @@ -2832,7 +2832,7 @@ struct lustre_capa { /* FIXME: y2038 time_t overflow: */ __u32 lc_expiry; /** expiry time (sec) */ __u8 lc_hmac[CAPA_HMAC_MAX_LEN]; /** HMAC */ -} __packed; +} __attribute__((packed)); /** lustre_capa::lc_opc */ enum { @@ -2864,7 +2864,7 @@ struct lustre_capa_key { __u32 lk_keyid; /**< key# */ __u32 lk_padding; __u8 lk_key[CAPA_HMAC_KEY_MAX_LEN]; /**< key */ -} __packed; +} __attribute__((packed)); /** The link ea holds 1 @link_ea_entry for each hardlink */ #define LINK_EA_MAGIC 0x11EAF1DFUL @@ -2884,7 +2884,7 @@ struct link_ea_entry { unsigned char lee_reclen[2]; unsigned char lee_parent_fid[sizeof(struct lu_fid)]; char lee_name[0]; -} __packed; +} __attribute__((packed)); /** fid2path 
request/reply structure */ struct getinfo_fid2path { @@ -2896,7 +2896,7 @@ struct getinfo_fid2path { char gf_path[0]; struct lu_fid gf_root_fid[0]; }; -} __packed; +} __attribute__((packed)); /** path2parent request/reply structures */ struct getparent { @@ -2904,7 +2904,7 @@ struct getparent { __u32 gp_linkno; /**< hardlink number */ __u32 gp_name_size; /**< size of the name field */ char gp_name[0]; /**< zero-terminated link name */ -} __packed; +} __attribute__((packed)); enum layout_intent_opc { LAYOUT_INTENT_ACCESS = 0, /** generic access */ @@ -2921,7 +2921,7 @@ struct layout_intent { __u32 li_opc; /* intent operation for enqueue, read, write etc */ __u32 li_flags; struct lu_extent li_extent; -} __packed; +} __attribute__((packed)); /** * On the wire version of hsm_progress structure. @@ -2939,20 +2939,20 @@ struct hsm_progress_kernel { /* Additional fields */ __u64 hpk_data_version; __u64 hpk_padding2; -} __packed; +} __attribute__((packed)); /** layout swap request structure * fid1 and fid2 are in mdt_body */ struct mdc_swap_layouts { __u64 msl_flags; -} __packed; +} __attribute__((packed)); #define INLINE_RESYNC_ARRAY_SIZE 15 struct close_data_resync_done { __u32 resync_count; __u32 resync_ids_inline[INLINE_RESYNC_ARRAY_SIZE]; -}; +} __attribute__((packed)); struct close_data { struct lustre_handle cd_handle; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 08589e6..5c21f34 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -157,7 +157,7 @@ struct lu_fid { * used. **/ __u32 f_ver; -}; +} __attribute__((packed)); static inline bool fid_is_zero(const struct lu_fid *fid) { @@ -176,7 +176,7 @@ struct ost_layout { __u64 ol_comp_start; __u64 ol_comp_end; __u32 ol_comp_id; -} __packed; +} __attribute__((packed)); /* Userspace should treat lu_fid as opaque, and only use the following methods * to print or parse them. Other functions (e.g. 
compare, swab) could be moved @@ -245,7 +245,7 @@ struct ost_id { } oi; struct lu_fid oi_fid; }; -}; +} __attribute__((packed)); #define DOSTID "%#llx:%llu" #define POSTID(oi) ostid_seq(oi), ostid_id(oi) @@ -462,7 +462,7 @@ struct lov_user_ost_data_v1 { /* per-stripe data structure */ struct ost_id l_ost_oi; /* OST object ID */ __u32 l_ost_gen; /* generation of this OST index */ __u32 l_ost_idx; /* OST index in LOV */ -} __packed; +} __attribute__((packed)); #define lov_user_md lov_user_md_v1 struct lov_user_md_v1 { /* LOV EA user data (host-endian) */ @@ -480,7 +480,7 @@ struct lov_user_md_v1 { /* LOV EA user data (host-endian) */ */ }; struct lov_user_ost_data_v1 lmm_objects[0]; /* per-stripe data */ -} __attribute__((packed, __may_alias__)); +} __attribute__((packed, __may_alias__)); struct lov_user_md_v3 { /* LOV EA user data (host-endian) */ __u32 lmm_magic; /* magic number = LOV_USER_MAGIC_V3 */ @@ -498,7 +498,7 @@ struct lov_user_md_v3 { /* LOV EA user data (host-endian) */ }; char lmm_pool_name[LOV_MAXPOOLNAME + 1]; /* pool name */ struct lov_user_ost_data_v1 lmm_objects[0]; /* per-stripe data */ -} __packed; +} __attribute__((packed, __may_alias__)); struct lov_foreign_md { __u32 lfm_magic; /* magic number = LOV_MAGIC_FOREIGN */ @@ -506,7 +506,7 @@ struct lov_foreign_md { __u32 lfm_type; /* type, see LU_FOREIGN_TYPE_ */ __u32 lfm_flags; /* flags, type specific */ char lfm_value[]; -}; +} __attribute__((packed)); #define foreign_size(lfm) (((struct lov_foreign_md *)lfm)->lfm_length + \ offsetof(struct lov_foreign_md, lfm_value)) @@ -518,7 +518,7 @@ struct lov_foreign_md { struct lu_extent { __u64 e_start; __u64 e_end; -}; +} __attribute__((packed)); #define DEXT "[%#llx, %#llx)" #define PEXT(ext) (unsigned long long)(ext)->e_start, (unsigned long long)(ext)->e_end @@ -583,7 +583,7 @@ struct lov_comp_md_entry_v1 { __u32 lcme_layout_gen; __u64 lcme_timestamp; /* snapshot time if applicable*/ __u32 lcme_padding_1; -} __packed; +} __attribute__((packed)); 
#define SEQ_ID_MAX 0x0000FFFF #define SEQ_ID_MASK SEQ_ID_MAX @@ -626,7 +626,7 @@ struct lov_comp_md_v1 { __u16 lcm_padding1[3]; __u64 lcm_padding2; struct lov_comp_md_entry_v1 lcm_entries[0]; -} __packed; +} __attribute__((packed)); static inline __u32 lov_user_md_size(__u16 stripes, __u32 lmm_magic) { @@ -649,7 +649,7 @@ static inline __u32 lov_user_md_size(__u16 stripes, __u32 lmm_magic) struct lov_user_mds_data_v1 { lstat_t lmd_st; /* MDS stat struct */ struct lov_user_md_v1 lmd_lmm; /* LOV EA V1 user data */ -} __packed; +} __attribute__((packed)); struct lov_user_mds_data_v2 { struct lu_fid lmd_fid; /* Lustre FID */ @@ -663,14 +663,14 @@ struct lov_user_mds_data_v2 { struct lov_user_mds_data_v3 { lstat_t lmd_st; /* MDS stat struct */ struct lov_user_md_v3 lmd_lmm; /* LOV EA V3 user data */ -} __packed; +} __attribute__((packed)); #endif struct lmv_user_mds_data { struct lu_fid lum_fid; __u32 lum_padding; __u32 lum_mds; -}; +} __attribute__((packed, __may_alias__)); enum lmv_hash_type { LMV_HASH_TYPE_UNKNOWN = 0, /* 0 is reserved for testing purpose */ @@ -743,7 +743,7 @@ struct lmv_user_md_v1 { __u32 lum_padding3; char lum_pool_name[LOV_MAXPOOLNAME + 1]; struct lmv_user_mds_data lum_objects[0]; -} __packed; +} __attribute__((packed)); static inline __u32 lmv_foreign_to_md_stripes(__u32 size) { @@ -1315,8 +1315,8 @@ struct changelog_rec { struct lu_fid cr_tfid; /**< target fid */ __u32 cr_markerflags; /**< CL_MARK flags */ }; - struct lu_fid cr_pfid; /**< parent fid */ -} __packed; + struct lu_fid cr_pfid; /**< parent fid */ +} __attribute__((packed)); /* Changelog extension for RENAME. */ struct changelog_ext_rename { @@ -1758,7 +1758,7 @@ enum hsm_states { struct hsm_extent { __u64 offset; __u64 length; -} __packed; +} __attribute__((packed)); /** * Current HSM states of a Lustre file. 
@@ -1842,7 +1842,7 @@ struct hsm_request { struct hsm_user_item { struct lu_fid hui_fid; struct hsm_extent hui_extent; -} __packed; +} __attribute__((packed)); struct hsm_user_request { struct hsm_request hur_request; @@ -1850,7 +1850,7 @@ struct hsm_user_request { /* extra data blob at end of struct (after all * hur_user_items), only use helpers to access it */ -} __packed; +} __attribute__((packed)); /** Return pointer to data field in a hsm user request */ static inline void *hur_data(struct hsm_user_request *hur) @@ -1916,7 +1916,7 @@ struct hsm_action_item { __u64 hai_cookie; /* action cookie from coordinator */ __u64 hai_gid; /* grouplock id */ char hai_data[0]; /* variable length */ -} __packed; +} __attribute__((packed)); /* * helper function which print in hexa the first bytes of @@ -1960,7 +1960,7 @@ struct hsm_action_list { /* struct hsm_action_item[hal_count] follows, aligned on 8-byte * boundaries. See hai_first */ -} __packed; +} __attribute__((packed)); /* Return pointer to first hai in action list */ static inline struct hsm_action_item *hai_first(struct hsm_action_list *hal) From patchwork Thu Feb 27 21:17:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410825 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 809D81580 for ; Thu, 27 Feb 2020 21:47:46 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6608B24690 for ; Thu, 27 Feb 2020 21:47:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6608B24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5695D34B504; Thu, 27 Feb 2020 13:37:49 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9AB063489F2 for ; Thu, 27 Feb 2020 13:21:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 3E625A14B; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3BDD247C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:40 -0500 Message-Id: <1582838290-17243-593-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 592/622] lnet: peer lookup handle shutdown X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata When LNet is shutting down, looking up peer_nis shouldn't assert but return NULL. 
Callers handle NULL return WC-bug-id: https://jira.whamcloud.com/browse/LU-13049 Lustre-commit: f46b22aa6a28 ("LU-13049 lnet: peer lookup handle shutdown") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/36925 Reviewed-by: Serguei Smirnov Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index b168c97..f987fff 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -647,7 +647,8 @@ void lnet_peer_uninit(void) struct list_head *peers; struct lnet_peer_ni *lp; - LASSERT(the_lnet.ln_state == LNET_STATE_RUNNING); + if (the_lnet.ln_state != LNET_STATE_RUNNING) + return NULL; peers = &ptable->pt_hash[lnet_nid2peerhash(nid)]; list_for_each_entry(lp, peers, lpni_hashlist) { From patchwork Thu Feb 27 21:17:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410559 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B835924 for ; Thu, 27 Feb 2020 21:41:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 244EF24690 for ; Thu, 27 Feb 2020 21:41:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 244EF24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AAFE9349B66; Thu, 27 Feb 2020 
13:33:33 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DC2243489F6 for ; Thu, 27 Feb 2020 13:21:23 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 40A50A14C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3EF38468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:41 -0500 Message-Id: <1582838290-17243-594-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 593/622] lnet: lnet response entries leak X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov LNetPut() is called with the ACK flag, but LNetMDUnlink() is issued before the ACK arrives. This can happen because of a timeout or because the application calls it (ldiskfs commit for difficult replies on the MDT). The MD is freed but the response tracker (rsp) is not detached, because the ACK does not hold a reference to the MD between the time the request is sent and the time the ACK arrives. The monitor thread detects this situation and moves the RSP entry to the zombie list, where it is never freed because no message is processed once the MD is gone. Remove the response tracking when nobody wants the reply, i.e. when LNetMDUnlink() has been called.
Cray-bug-id: LUS-8188 WC-bug-id: https://jira.whamcloud.com/browse/LU-12991 Lustre-commit: b7035222bd64 ("LU-12991 lnet: lnet response entries leak") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/36896 Reviewed-by: Amir Shehata Reviewed-by: Chris Horn Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 ++ net/lnet/lnet/lib-md.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 3b597e3..bf357b0 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -157,6 +157,8 @@ static inline int lnet_md_unlinkable(struct lnet_libmd *md) { unsigned int size; + LASSERTF(md->md_rspt_ptr == NULL, "md %p rsp %p\n", md, md->md_rspt_ptr); + if ((md->md_options & LNET_MD_KIOV) != 0) size = offsetof(struct lnet_libmd, md_iov.kiov[md->md_niov]); else diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c index 4a70c76..5ee43c2 100644 --- a/net/lnet/lnet/lib-md.c +++ b/net/lnet/lnet/lib-md.c @@ -548,6 +548,9 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset) lnet_eq_enqueue_event(md->md_eq, &ev); } + if (md->md_rspt_ptr) + lnet_detach_rsp_tracker(md, cpt); + lnet_md_unlink(md); lnet_res_unlock(cpt); From patchwork Thu Feb 27 21:17:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410827 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4527924 for ; Thu, 27 Feb 2020 21:47:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9CE0624690 for ; Thu, 27 Feb 2020 21:47:48 
+0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9CE0624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CC33934A21C; Thu, 27 Feb 2020 13:37:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 29C043489F6 for ; Thu, 27 Feb 2020 13:21:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 43132A14D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 41EB346C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:42 -0500 Message-Id: <1582838290-17243-595-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 594/622] lustre: lmv: disable statahead for remote objects X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vladimir Saveliev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vladimir Saveliev Statahead for remote objects is supposed to be disabled by LU-11681 lmv: disable remote file statahead. 
However, due to a typo it is not, and statahead for remote objects is accompanied by warnings like: ll_set_inode()) Can not initialize inode .. without object type.. ll_prep_inode()) new_inode -fatal: rc -12 Fix the typo. Test to illustrate the issue is added. Fixes: 6dd8b9909e79 ("lustre: lmv: disable remote file statahead") WC-bug-id: https://jira.whamcloud.com/browse/LU-13099 Lustre-commit: 68330379b01c ("LU-13099 lmv: disable statahead for remote objects") Signed-off-by: Vladimir Saveliev Cray-bug-id: LUS-8262 Reviewed-on: https://review.whamcloud.com/37089 Reviewed-by: Andreas Dilger Reviewed-by: Olaf Faaland-LLNL Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index ee52bba..cead3a1 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -3369,7 +3369,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp, if (IS_ERR(ptgt)) return PTR_ERR(ptgt); - ctgt = lmv_fid2tgt(lmv, &op_data->op_fid1); + ctgt = lmv_fid2tgt(lmv, &op_data->op_fid2); if (IS_ERR(ctgt)) return PTR_ERR(ctgt); From patchwork Thu Feb 27 21:17:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410865 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6905C1580 for ; Thu, 27 Feb 2020 21:48:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 51E5324690 for ; Thu, 27 Feb 2020 21:48:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 51E5324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none
dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 549D634B89A; Thu, 27 Feb 2020 13:39:14 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6D7CD3489F6 for ; Thu, 27 Feb 2020 13:21:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4617CA14E; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 44F0346D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:43 -0500 Message-Id: <1582838290-17243-596-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 595/622] lustre: llite: eviction during ll_open_cleanup() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh On error ll_open_cleanup() is called while intent lock remains pinned. So eviction can happen while close request waits for a mod rpc slot. 
Release intent lock before ll_open_cleanup() Cray-bug-id: LUS-8055 WC-bug-id: https://jira.whamcloud.com/browse/LU-13101 Lustre-commit: 6d5d7c6bdb4f ("LU-13101 llite: eviction during ll_open_cleanup()") Signed-off-by: Andriy Skulysh Reviewed-by: Alexander Boyko Reviewed-by: Andrew Perepechko Reviewed-by: Vitaly Fertman Reviewed-on: https://review.whamcloud.com/37096 Reviewed-by: Alexandr Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 4 +++- fs/lustre/llite/namei.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 1a8a5ec..33ab3f7 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2507,8 +2507,10 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, /* cleanup will be done if necessary */ md_free_lustre_md(sbi->ll_md_exp, &md); - if (rc != 0 && it && it->it_op & IT_OPEN) + if (rc != 0 && it && it->it_op & IT_OPEN) { + ll_intent_drop_lock(it); ll_open_cleanup(sb ? 
sb : (*inode)->i_sb, req); + } return rc; } diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 13c1cf9..89317db 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -709,8 +709,10 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, } out: - if (rc != 0 && it->it_op & IT_OPEN) + if (rc != 0 && it->it_op & IT_OPEN) { + ll_intent_drop_lock(it); ll_open_cleanup((*de)->d_sb, request); + } return rc; } From patchwork Thu Feb 27 21:17:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410565 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9CB88924 for ; Thu, 27 Feb 2020 21:41:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8548924690 for ; Thu, 27 Feb 2020 21:41:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8548924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D3FA9348C1C; Thu, 27 Feb 2020 13:33:37 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B38803489F6 for ; Thu, 27 Feb 2020 13:21:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP 
id 48C1EA14F; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 47BC346A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:44 -0500 Message-Id: <1582838290-17243-597-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 596/622] lustre: ptlrpc: show target name in req_history X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Currently the req_history tracing shows the "self" NID as the second field. However, this is not very useful since there may be a number of different targets on the same server, and since the logs are all collected directly on the server we already know the local NID. Instead of printing the "self" NID, store the target name as the second field, if that is available, so that we can determine which target the RPC was intended for. This makes it easier to debug problems with bad clients and isolate traffic for a specific target. 
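The fallback described above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the sketch_* structs are hypothetical stand-ins for the request, export and obd device structures, and the "self" NID is printed as a plain hex integer rather than through libcfs_nid2str_r().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; only the fields needed for the sketch. */
struct sketch_obd {
	const char *obd_name;		/* target name */
};

struct sketch_export {
	struct sketch_obd *exp_obd;	/* may be NULL */
};

struct sketch_req {
	struct sketch_export *rq_export;	/* may be NULL */
	unsigned long long rq_self;		/* assumed: NID as an integer */
};

/*
 * Pick the second field of a history record: the target name when the
 * request has an export with an obd attached, otherwise the "self" NID
 * formatted into the caller-supplied buffer.
 */
static const char *sketch_history_label(const struct sketch_req *req,
					char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (req->rq_export && req->rq_export->exp_obd)
		return req->rq_export->exp_obd->obd_name;

	snprintf(buf, len, "%#llx", req->rq_self);
	return buf;
}
```

Keeping the NID as a fallback means a record is still identifiable for requests that arrive before an export is attached.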
WC-bug-id: https://jira.whamcloud.com/browse/LU-11644 Lustre-commit: 83b6c6608e94 ("LU-11644 ptlrpc: show target name in req_history") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/37193 Reviewed-by: Mike Pershin Reviewed-by: Nathaniel Clark Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/lproc_ptlrpc.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index d52a08a..f34aec3 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -956,7 +956,6 @@ static int ptlrpc_lprocfs_svc_req_history_show(struct seq_file *s, void *iter) req = srhi->srhi_req; - libcfs_nid2str_r(req->rq_self, nidstr, sizeof(nidstr)); arrival.tv_sec = req->rq_arrival_time.tv_sec; arrival.tv_nsec = req->rq_arrival_time.tv_nsec; sent.tv_sec = req->rq_sent; @@ -970,8 +969,13 @@ static int ptlrpc_lprocfs_svc_req_history_show(struct seq_file *s, void *iter) * parser. Currently I only print stuff here I know is OK * to look at coz it was set up in request_in_callback()!!! */ - seq_printf(s, "%lld:%s:%s:x%llu:%d:%s:%lld.%06lld:%lld.%06llds(%+lld.0s) ", - req->rq_history_seq, nidstr, + seq_printf(s, + "%lld:%s:%s:x%llu:%d:%s:%lld.%06lld:%lld.%06llds(%+lld.0s) ", + req->rq_history_seq, + req->rq_export && req->rq_export->exp_obd ? 
+ req->rq_export->exp_obd->obd_name : + libcfs_nid2str_r(req->rq_self, nidstr, + sizeof(nidstr)), libcfs_id2str(req->rq_peer), req->rq_xid, req->rq_reqlen, ptlrpc_rqphase2str(req), (s64)req->rq_arrival_time.tv_sec, From patchwork Thu Feb 27 21:17:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410829 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B74C61580 for ; Thu, 27 Feb 2020 21:47:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9F8DF24690 for ; Thu, 27 Feb 2020 21:47:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9F8DF24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4B47E34B52A; Thu, 27 Feb 2020 13:37:56 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 01EBA3489F6 for ; Thu, 27 Feb 2020 13:21:24 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4B91CA150; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4A69847C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: 
Thu, 27 Feb 2020 16:17:45 -0500 Message-Id: <1582838290-17243-598-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 597/622] lustre: dom: check read-on-open buffer presents in reply X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin ll_dom_finish_open() uses req_capsule_has_field() wrongly: it checks only the format, not whether the buffer is present in the reply, which causes unneeded console errors about a missing buffer later in req_capsule_server_get(). This patch replaces it with req_capsule_field_present() to check whether the server packed that field in the reply, and to properly skip responses from an old server. WC-bug-id: https://jira.whamcloud.com/browse/LU-13136 Lustre-commit: 58bea527100b ("LU-13136 dom: check read-on-open buffer presents in reply") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/37249 Reviewed-by: Andreas Dilger Reviewed-by: John L.
Hammond Reviewed-by: Stephane Thiell Signed-off-by: James Simmons --- fs/lustre/include/lustre_req_layout.h | 3 +++ fs/lustre/llite/file.c | 4 ++-- fs/lustre/ptlrpc/layout.c | 7 ++++--- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index feb5e77..ea6baef 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -112,6 +112,9 @@ u32 req_capsule_fmt_size(u32 magic, const struct req_format *fmt, int req_capsule_has_field(const struct req_capsule *pill, const struct req_msg_field *field, enum req_location loc); +int req_capsule_field_present(const struct req_capsule *pill, + const struct req_msg_field *field, + enum req_location loc); void req_capsule_shrink(struct req_capsule *pill, const struct req_msg_field *field, u32 newlen, enum req_location loc); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a3c36a7..c7233bf 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -459,8 +459,8 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, if (!obj) return; - if (!req_capsule_has_field(&req->rq_pill, &RMF_NIOBUF_INLINE, - RCL_SERVER)) + if (!req_capsule_field_present(&req->rq_pill, &RMF_NIOBUF_INLINE, + RCL_SERVER)) return; rnb = req_capsule_server_get(&req->rq_pill, &RMF_NIOBUF_INLINE); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 06db86d..4213fb2 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -2268,9 +2268,9 @@ int req_capsule_has_field(const struct req_capsule *pill, * Returns a non-zero value if the given @field is present in the given * @pill's PTLRPC request or reply (@loc), else it returns 0. 
*/ -static int req_capsule_field_present(const struct req_capsule *pill, - const struct req_msg_field *field, - enum req_location loc) +int req_capsule_field_present(const struct req_capsule *pill, + const struct req_msg_field *field, + enum req_location loc) { u32 offset; @@ -2280,6 +2280,7 @@ static int req_capsule_field_present(const struct req_capsule *pill, offset = __req_capsule_offset(pill, field, loc); return lustre_msg_bufcount(__req_msg(pill, loc)) > offset; } +EXPORT_SYMBOL(req_capsule_field_present); /** * This function shrinks the size of the _buffer_ of the @pill's PTLRPC From patchwork Thu Feb 27 21:17:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410867 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9EC01924 for ; Thu, 27 Feb 2020 21:48:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8736A24690 for ; Thu, 27 Feb 2020 21:48:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8736A24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 999FC349EAD; Thu, 27 Feb 2020 13:39:17 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 57A943489F6 for ; Thu, 
27 Feb 2020 13:21:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4EA15A151; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4D605468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:46 -0500 Message-Id: <1582838290-17243-599-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 598/622] lustre: llite: proper names/types for offset/pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Use loff_t for file offsets and pgoff_t for page index values instead of unsigned long, so that it is possible to distinguish what type of value is being used in the byte-granular readahead code. Otherwise, it is difficult to determine what units "start" or "end" in a given function are in. Rename variables that reference page index values with an "_idx" suffix to make this clear when reading the code. Similarly, use "bytes" or "pages" for variable names instead of "count" or "len". Fix stride_page_count() to properly use loff_t for the byte_count, which might otherwise overflow for large strides. Cast pgoff_t vars to loff_t before PAGE_SIZE shift to avoid overflow. Use shift and mask with PAGE_SIZE and PAGE_MASK instead of mod/div. Use proper 64-bit division functions for the loff_t types when calculating stride, since they are not guaranteed to be within 4GB. 
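The cast-before-shift and shift/mask points above can be illustrated with a small user-space sketch. The SKETCH_* macros and sketch_* types are assumptions standing in for PAGE_SHIFT, PAGE_SIZE, PAGE_MASK, pgoff_t and loff_t, with the page index deliberately made 32 bits wide to mimic a 32-bit kernel and a 4 KiB page assumed; none of this is code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

typedef uint32_t sketch_pgoff_t;	/* page index, 32-bit as on a 32-bit kernel */
typedef int64_t sketch_loff_t;		/* byte offset, always 64-bit */

/* Wrong: the shift is done in 32 bits, so the high bits are lost
 * before the value is widened to a byte offset.
 */
static sketch_loff_t idx_to_bytes_unsafe(sketch_pgoff_t idx)
{
	sketch_pgoff_t bytes32 = idx << SKETCH_PAGE_SHIFT;

	return (sketch_loff_t)bytes32;
}

/* Right: widen to the byte-offset type first, then shift. */
static sketch_loff_t idx_to_bytes(sketch_pgoff_t idx)
{
	return (sketch_loff_t)idx << SKETCH_PAGE_SHIFT;
}

/* Shift instead of dividing by the page size. */
static sketch_pgoff_t bytes_to_idx(sketch_loff_t pos)
{
	assert(pos >= 0);
	return (sketch_pgoff_t)(pos >> SKETCH_PAGE_SHIFT);
}

/* Mask instead of taking the remainder modulo the page size. */
static int is_page_aligned(sketch_loff_t pos)
{
	return (pos & ~SKETCH_PAGE_MASK) == 0;
}
```

With these assumptions, page index 0x100000 corresponds to byte offset 4 GiB; the unsafe variant wraps to 0, which is the kind of overflow the casts are meant to avoid.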
Remove unused "remainder" argument from ras_align() function. Fixes: 91d264551508 ("LU-12518 llite: support page unaligned stride readahead") WC-bug-id: https://jira.whamcloud.com/browse/LU-12518 Lustre-commit: 83d8dd1d7c30 ("LU-12518 llite: proper names/types for offset/pages") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/37248 Reviewed-by: Wang Shilong Reviewed-by: Gu Zheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 10 +- fs/lustre/llite/file.c | 6 +- fs/lustre/llite/llite_internal.h | 49 +++-- fs/lustre/llite/rw.c | 455 ++++++++++++++++++++------------------- fs/lustre/llite/vvp_internal.h | 4 +- fs/lustre/llite/vvp_io.c | 18 +- fs/lustre/lov/lov_io.c | 21 +- fs/lustre/mdc/mdc_dev.c | 4 +- fs/lustre/obdclass/integrity.c | 2 +- fs/lustre/osc/osc_cache.c | 2 +- fs/lustre/osc/osc_io.c | 8 +- 11 files changed, 294 insertions(+), 285 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 67731b0..aa54537 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -1464,14 +1464,14 @@ struct cl_read_ahead { * This is determined DLM lock coverage, RPC and stripe boundary. * cra_end is included. */ - pgoff_t cra_end; + pgoff_t cra_end_idx; /* optimal RPC size for this read, by pages */ - unsigned long cra_rpc_size; - /* - * Release callback. If readahead holds resources underneath, this + unsigned long cra_rpc_pages; + /* Release callback. If readahead holds resources underneath, this * function should be called to release it. 
*/ - void (*cra_release)(const struct lu_env *env, void *cbdata); + void (*cra_release)(const struct lu_env *env, + void *cbdata); /* Callback data for cra_release routine */ void *cra_cbdata; /* whether lock is in contention */ diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index c7233bf..097dbeb 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -472,7 +472,7 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, * client PAGE_SIZE to be used on that client, if server's PAGE_SIZE is * smaller then offset may be not aligned and that data is just ignored. */ - if (rnb->rnb_offset % PAGE_SIZE) + if (rnb->rnb_offset & ~PAGE_MASK) return; /* Server returns whole file or just file tail if it fills in reply @@ -492,9 +492,9 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req, data = (char *)rnb + sizeof(*rnb); lnb.lnb_file_offset = rnb->rnb_offset; - start = lnb.lnb_file_offset / PAGE_SIZE; + start = lnb.lnb_file_offset >> PAGE_SHIFT; index = 0; - LASSERT(lnb.lnb_file_offset % PAGE_SIZE == 0); + LASSERT((lnb.lnb_file_offset & ~PAGE_MASK) == 0); lnb.lnb_page_offset = 0; do { lnb.lnb_data = data + (index << PAGE_SHIFT); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index b7b418f..55d451fe 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -464,22 +464,22 @@ struct ll_ra_info { * counted by page index. 
*/ struct ra_io_arg { - pgoff_t ria_start; /* start offset of read-ahead*/ - pgoff_t ria_end; /* end offset of read-ahead*/ + pgoff_t ria_start_idx; /* start offset of read-ahead*/ + pgoff_t ria_end_idx; /* end offset of read-ahead*/ unsigned long ria_reserved; /* reserved pages for read-ahead */ - pgoff_t ria_end_min; /* minimum end to cover current read */ + pgoff_t ria_end_idx_min;/* minimum end to cover current read */ bool ria_eof; /* reach end of file */ - /* If stride read pattern is detected, ria_stoff means where - * stride read is started. Note: for normal read-ahead, the + /* If stride read pattern is detected, ria_stoff is the byte offset + * where stride read is started. Note: for normal read-ahead, the * value here is meaningless, and also it will not be accessed */ - unsigned long ria_stoff; + loff_t ria_stoff; /* ria_length and ria_bytes are the length and pages length in the * stride I/O mode. And they will also be used to check whether * it is stride I/O read-ahead in the read-ahead pages */ - unsigned long ria_length; - unsigned long ria_bytes; + loff_t ria_length; + loff_t ria_bytes; }; /* LL_HIST_MAX=32 causes an overflow */ @@ -697,9 +697,9 @@ struct ll_sb_info { * per file-descriptor read-ahead data. */ struct ll_readahead_state { - spinlock_t ras_lock; + spinlock_t ras_lock; /* End byte that read(2) try to read. */ - unsigned long ras_last_read_end; + loff_t ras_last_read_end_bytes; /* * number of bytes read after last read-ahead window reset. As window * is reset on each seek, this is effectively a number of consecutive @@ -710,7 +710,7 @@ struct ll_readahead_state { * case, it probably doesn't make sense to expand window to * PTLRPC_MAX_BRW_PAGES on the third access. 
*/ - unsigned long ras_consecutive_bytes; + loff_t ras_consecutive_bytes; /* * number of read requests after the last read-ahead window reset * As window is reset on each seek, this is effectively the number @@ -724,12 +724,13 @@ struct ll_readahead_state { * expanded to PTLRPC_MAX_BRW_PAGES. Afterwards, window is enlarged by * PTLRPC_MAX_BRW_PAGES chunks up to ->ra_max_pages. */ - pgoff_t ras_window_start, ras_window_len; + pgoff_t ras_window_start_idx; + pgoff_t ras_window_pages; /* - * Optimal RPC size. It decides how many pages will be sent - * for each read-ahead. + * Optimal RPC size in pages. + * It decides how many pages will be sent for each read-ahead. */ - unsigned long ras_rpc_size; + unsigned long ras_rpc_pages; /* * Where next read-ahead should start at. This lies within read-ahead * window. Read-ahead window is read in pieces rather than at once @@ -737,7 +738,7 @@ struct ll_readahead_state { * ->ra_max_pages (see ll_ra_count_get()), 2. client cannot read pages * not covered by DLM lock. */ - pgoff_t ras_next_readahead; + pgoff_t ras_next_readahead_idx; /* * Total number of ll_file_read requests issued, reads originating * due to mmap are not counted in this total. This value is used to @@ -755,9 +756,9 @@ struct ll_readahead_state { * ras_stride_bytes = stride_bytes; * Note: all these three items are counted by bytes. */ - unsigned long ras_stride_length; - unsigned long ras_stride_bytes; - unsigned long ras_stride_offset; + loff_t ras_stride_length; + loff_t ras_stride_bytes; + loff_t ras_stride_offset; /* * number of consecutive stride request count, and it is similar as * ras_consecutive_requests, but used for stride I/O mode. 
@@ -766,7 +767,7 @@ struct ll_readahead_state { */ unsigned long ras_consecutive_stride_requests; /* index of the last page that async readahead starts */ - pgoff_t ras_async_last_readpage; + pgoff_t ras_async_last_readpage_idx; /* whether we should increase readahead window */ bool ras_need_increase_window; /* whether ra miss check should be skipped */ @@ -776,10 +777,8 @@ struct ll_readahead_state { struct ll_readahead_work { /** File to readahead */ struct file *lrw_file; - /** Start bytes */ - unsigned long lrw_start; - /** End bytes */ - unsigned long lrw_end; + pgoff_t lrw_start_idx; + pgoff_t lrw_end_idx; /* async worker to handler read */ struct work_struct lrw_readahead_work; @@ -868,7 +867,7 @@ static inline bool ll_sbi_has_file_heat(struct ll_sb_info *sbi) return !!(sbi->ll_flags & LL_SBI_FILE_HEAT); } -void ll_ras_enter(struct file *f, unsigned long pos, unsigned long count); +void ll_ras_enter(struct file *f, loff_t pos, size_t count); /* llite/lcommon_misc.c */ int cl_ocd_update(struct obd_device *host, struct obd_device *watched, diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index bf91ae1..9509023 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -80,7 +80,8 @@ */ static unsigned long ll_ra_count_get(struct ll_sb_info *sbi, struct ra_io_arg *ria, - unsigned long pages, unsigned long min) + unsigned long pages, + unsigned long pages_min) { struct ll_ra_info *ra = &sbi->ll_ra_info; long ret; @@ -101,19 +102,19 @@ static unsigned long ll_ra_count_get(struct ll_sb_info *sbi, } out: - if (ret < min) { + if (ret < pages_min) { /* override ra limit for maximum performance */ - atomic_add(min - ret, &ra->ra_cur_pages); - ret = min; + atomic_add(pages_min - ret, &ra->ra_cur_pages); + ret = pages_min; } return ret; } -void ll_ra_count_put(struct ll_sb_info *sbi, unsigned long len) +void ll_ra_count_put(struct ll_sb_info *sbi, unsigned long pages) { struct ll_ra_info *ra = &sbi->ll_ra_info; - atomic_sub(len, &ra->ra_cur_pages); + 
atomic_sub(pages, &ra->ra_cur_pages); } static void ll_ra_stats_inc_sbi(struct ll_sb_info *sbi, enum ra_stat which) @@ -131,19 +132,20 @@ void ll_ra_stats_inc(struct inode *inode, enum ra_stat which) #define RAS_CDEBUG(ras) \ CDEBUG(D_READA, \ - "lre %lu cr %lu cb %lu ws %lu wl %lu nra %lu rpc %lu r %lu csr %lu sf %lu sb %lu sl %lu lr %lu\n", \ - ras->ras_last_read_end, ras->ras_consecutive_requests, \ - ras->ras_consecutive_bytes, ras->ras_window_start, \ - ras->ras_window_len, ras->ras_next_readahead, \ - ras->ras_rpc_size, ras->ras_requests, \ + "lre %llu cr %lu cb %llu wsi %lu wp %lu nra %lu rpc %lu r %lu csr %lu so %llu sb %llu sl %llu lr %lu\n", \ + ras->ras_last_read_end_bytes, ras->ras_consecutive_requests, \ + ras->ras_consecutive_bytes, ras->ras_window_start_idx, \ + ras->ras_window_pages, ras->ras_next_readahead_idx, \ + ras->ras_rpc_pages, ras->ras_requests, \ ras->ras_consecutive_stride_requests, ras->ras_stride_offset, \ ras->ras_stride_bytes, ras->ras_stride_length, \ - ras->ras_async_last_readpage) + ras->ras_async_last_readpage_idx) -static int pos_in_window(unsigned long pos, unsigned long point, - unsigned long before, unsigned long after) +static bool pos_in_window(loff_t pos, loff_t point, + unsigned long before, unsigned long after) { - unsigned long start = point - before, end = point + after; + loff_t start = point - before; + loff_t end = point + after; if (start > point) start = 0; @@ -228,9 +230,9 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io, return rc; } -#define RIA_DEBUG(ria) \ - CDEBUG(D_READA, "rs %lu re %lu ro %lu rl %lu rb %lu\n", \ - ria->ria_start, ria->ria_end, ria->ria_stoff, \ +#define RIA_DEBUG(ria) \ + CDEBUG(D_READA, "rs %lu re %lu ro %llu rl %llu rb %llu\n", \ + ria->ria_start_idx, ria->ria_end_idx, ria->ria_stoff, \ ria->ria_length, ria->ria_bytes) static inline int stride_io_mode(struct ll_readahead_state *ras) @@ -238,7 +240,7 @@ static inline int stride_io_mode(struct ll_readahead_state 
*ras) return ras->ras_consecutive_stride_requests > 1; } -/* The function calculates how much pages will be read in +/* The function calculates how many bytes will be read in * [off, off + length], in such stride IO area, * stride_offset = st_off, stride_length = st_len, * stride_bytes = st_bytes @@ -256,31 +258,29 @@ static inline int stride_io_mode(struct ll_readahead_state *ras) * = |<----->| + |-------------------------------------| + |---| * start_left st_bytes * i end_left */ -static unsigned long -stride_byte_count(unsigned long st_off, unsigned long st_len, - unsigned long st_bytes, unsigned long off, - unsigned long length) +static loff_t stride_byte_count(loff_t st_off, loff_t st_len, loff_t st_bytes, + loff_t off, loff_t length) { u64 start = off > st_off ? off - st_off : 0; u64 end = off + length > st_off ? off + length - st_off : 0; - unsigned long start_left = 0; - unsigned long end_left = 0; - unsigned long bytes_count; + u64 start_left; + u64 end_left; + u64 bytes_count; if (st_len == 0 || length == 0 || end == 0) return length; - start_left = do_div(start, st_len); + start = div64_u64_rem(start, st_len, &start_left); if (start_left < st_bytes) start_left = st_bytes - start_left; else start_left = 0; - end_left = do_div(end, st_len); + end = div64_u64_rem(end, st_len, &end_left); if (end_left > st_bytes) end_left = st_bytes; - CDEBUG(D_READA, "start %llu, end %llu start_left %lu end_left %lu\n", + CDEBUG(D_READA, "start %llu, end %llu start_left %llu end_left %llu\n", start, end, start_left, end_left); if (start == end) @@ -290,48 +290,45 @@ static inline int stride_io_mode(struct ll_readahead_state *ras) st_bytes * (end - start - 1) + end_left; CDEBUG(D_READA, - "st_off %lu, st_len %lu st_bytes %lu off %lu length %lu bytescount %lu\n", + "st_off %llu, st_len %llu st_bytes %llu off %llu length %llu bytescount %llu\n", st_off, st_len, st_bytes, off, length, bytes_count); return bytes_count; } -static int ria_page_count(struct ra_io_arg *ria) +static 
unsigned long ria_page_count(struct ra_io_arg *ria) { - u64 length_bytes = ria->ria_end >= ria->ria_start ? - (ria->ria_end - ria->ria_start + 1) << PAGE_SHIFT : 0; - unsigned int bytes_count, pg_count; + loff_t length_bytes = ria->ria_end_idx >= ria->ria_start_idx ? + (loff_t)(ria->ria_end_idx - + ria->ria_start_idx + 1) << PAGE_SHIFT : 0; + loff_t bytes_count; if (ria->ria_length > ria->ria_bytes && ria->ria_bytes && - (ria->ria_length % PAGE_SIZE || ria->ria_bytes % PAGE_SIZE || - ria->ria_stoff % PAGE_SIZE)) { + (ria->ria_length & ~PAGE_MASK || ria->ria_bytes & ~PAGE_MASK || + ria->ria_stoff & ~PAGE_MASK)) { /* Over-estimate un-aligned page stride read */ - pg_count = ((ria->ria_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT) + 1; - pg_count *= length_bytes / ria->ria_length + 1; + unsigned long pg_count = ((ria->ria_bytes + + PAGE_SIZE - 1) >> PAGE_SHIFT) + 1; + pg_count *= length_bytes / ria->ria_length + 1; return pg_count; } bytes_count = stride_byte_count(ria->ria_stoff, ria->ria_length, - ria->ria_bytes, ria->ria_start, - length_bytes); + ria->ria_bytes, + (loff_t)ria->ria_start_idx << PAGE_SHIFT, + length_bytes); return (bytes_count + PAGE_SIZE - 1) >> PAGE_SHIFT; } -static unsigned long ras_align(struct ll_readahead_state *ras, - pgoff_t index, unsigned long *remainder) +static pgoff_t ras_align(struct ll_readahead_state *ras, pgoff_t index) { - unsigned long rem = index % ras->ras_rpc_size; - - if (remainder) - *remainder = rem; - return index - rem; + return index - (index % ras->ras_rpc_pages); } -/*Check whether the index is in the defined ra-window */ -static bool ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) +/* Check whether the index is in the defined ra-window */ +static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria) { - unsigned long pos = idx << PAGE_SHIFT; - unsigned long offset; + loff_t pos = (loff_t)idx << PAGE_SHIFT; /* If ria_length == ria_pages, it means non-stride I/O mode, * idx should always inside read-ahead
window in this case @@ -342,12 +339,16 @@ static bool ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) return true; if (pos >= ria->ria_stoff) { - offset = (pos - ria->ria_stoff) % ria->ria_length; + u64 offset; + + div64_u64_rem(pos - ria->ria_stoff, ria->ria_length, &offset); + if (offset < ria->ria_bytes || (ria->ria_length - offset) < PAGE_SIZE) return true; - } else if (pos + PAGE_SIZE > ria->ria_stoff) + } else if (pos + PAGE_SIZE > ria->ria_stoff) { return true; + } return false; } @@ -365,11 +366,12 @@ static bool ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) LASSERT(ria); RIA_DEBUG(ria); - for (page_idx = ria->ria_start; - page_idx <= ria->ria_end && ria->ria_reserved > 0; page_idx++) { + for (page_idx = ria->ria_start_idx; + page_idx <= ria->ria_end_idx && ria->ria_reserved > 0; + page_idx++) { if (ras_inside_ra_window(page_idx, ria)) { - if (!ra.cra_end || ra.cra_end < page_idx) { - unsigned long end; + if (!ra.cra_end_idx || ra.cra_end_idx < page_idx) { + pgoff_t end_idx; cl_read_ahead_release(env, &ra); @@ -377,37 +379,40 @@ static bool ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) if (rc < 0) break; - /* Do not shrink the ria_end at any case until + /* Do not shrink ria_end_idx at any case until * the minimum end of current read is covered. - * And only shrink the ria_end if the matched + * And only shrink ria_end_idx if the matched * LDLM lock doesn't cover more. 
*/ - if (page_idx > ra.cra_end || + if (page_idx > ra.cra_end_idx || (ra.cra_contention && - page_idx > ria->ria_end_min)) { - ria->ria_end = ra.cra_end; + page_idx > ria->ria_end_idx_min)) { + ria->ria_end_idx = ra.cra_end_idx; break; } CDEBUG(D_READA, "idx: %lu, ra: %lu, rpc: %lu\n", - page_idx, ra.cra_end, ra.cra_rpc_size); - LASSERTF(ra.cra_end >= page_idx, + page_idx, ra.cra_end_idx, + ra.cra_rpc_pages); + LASSERTF(ra.cra_end_idx >= page_idx, "object: %p, indcies %lu / %lu\n", - io->ci_obj, ra.cra_end, page_idx); + io->ci_obj, ra.cra_end_idx, page_idx); /* * update read ahead RPC size. * NB: it's racy but doesn't matter */ - if (ras->ras_rpc_size != ra.cra_rpc_size && - ra.cra_rpc_size > 0) - ras->ras_rpc_size = ra.cra_rpc_size; + if (ras->ras_rpc_pages != ra.cra_rpc_pages && + ra.cra_rpc_pages > 0) + ras->ras_rpc_pages = ra.cra_rpc_pages; /* trim it to align with optimal RPC size */ - end = ras_align(ras, ria->ria_end + 1, NULL); - if (end > 0 && !ria->ria_eof) - ria->ria_end = end - 1; - if (ria->ria_end < ria->ria_end_min) - ria->ria_end = ria->ria_end_min; + end_idx = ras_align(ras, ria->ria_end_idx + 1); + if (end_idx > 0 && !ria->ria_eof) + ria->ria_end_idx = end_idx - 1; + if (ria->ria_end_idx < ria->ria_end_idx_min) + ria->ria_end_idx = ria->ria_end_idx_min; } + if (page_idx > ria->ria_end_idx) + break; /* If the page is inside the read-ahead window */ rc = ll_read_ahead_page(env, io, queue, page_idx); @@ -427,16 +432,17 @@ static bool ras_inside_ra_window(unsigned long idx, struct ra_io_arg *ria) * read-ahead mode, then check whether it should skip * the stride gap. 
*/ - unsigned long offset; - unsigned long pos = page_idx << PAGE_SHIFT; + loff_t pos = (loff_t)page_idx << PAGE_SHIFT; + u64 offset; - offset = (pos - ria->ria_stoff) % ria->ria_length; + div64_u64_rem(pos - ria->ria_stoff, ria->ria_length, + &offset); if (offset >= ria->ria_bytes) { pos += (ria->ria_length - offset); if ((pos >> PAGE_SHIFT) >= page_idx + 1) page_idx = (pos >> PAGE_SHIFT) - 1; CDEBUG(D_READA, - "Stride: jump %lu pages to %lu\n", + "Stride: jump %llu pages to %lu\n", ria->ria_length - offset, page_idx); continue; } @@ -495,12 +501,12 @@ static void ll_readahead_handle_work(struct work_struct *wq) struct ll_readahead_state *ras; struct cl_io *io; struct cl_2queue *queue; - pgoff_t ra_end = 0; - unsigned long len, mlen = 0; + pgoff_t ra_end_idx = 0; + unsigned long pages, pages_min = 0; struct file *file; u64 kms; int rc; - unsigned long end_index; + pgoff_t eof_index; work = container_of(wq, struct ll_readahead_work, lrw_readahead_work); @@ -531,30 +537,30 @@ static void ll_readahead_handle_work(struct work_struct *wq) ria = &ll_env_info(env)->lti_ria; memset(ria, 0, sizeof(*ria)); - ria->ria_start = work->lrw_start; + ria->ria_start_idx = work->lrw_start_idx; /* Truncate RA window to end of file */ - end_index = (unsigned long)((kms - 1) >> PAGE_SHIFT); - if (end_index <= work->lrw_end) { - work->lrw_end = end_index; + eof_index = (pgoff_t)(kms - 1) >> PAGE_SHIFT; + if (eof_index <= work->lrw_end_idx) { + work->lrw_end_idx = eof_index; ria->ria_eof = true; } - if (work->lrw_end <= work->lrw_start) { + if (work->lrw_end_idx <= work->lrw_start_idx) { rc = 0; goto out_put_env; } - ria->ria_end = work->lrw_end; - len = ria->ria_end - ria->ria_start + 1; + ria->ria_end_idx = work->lrw_end_idx; + pages = ria->ria_end_idx - ria->ria_start_idx + 1; ria->ria_reserved = ll_ra_count_get(ll_i2sbi(inode), ria, - ria_page_count(ria), mlen); + ria_page_count(ria), pages_min); CDEBUG(D_READA, "async reserved pages: %lu/%lu/%lu, ra_cur %d, ra_max %lu\n", - 
ria->ria_reserved, len, mlen, + ria->ria_reserved, pages, pages_min, atomic_read(&ll_i2sbi(inode)->ll_ra_info.ra_cur_pages), ll_i2sbi(inode)->ll_ra_info.ra_max_pages); - if (ria->ria_reserved < len) { + if (ria->ria_reserved < pages) { ll_ra_stats_inc(inode, RA_STAT_MAX_IN_FLIGHT); if (PAGES_TO_MiB(ria->ria_reserved) < 1) { ll_ra_count_put(ll_i2sbi(inode), ria->ria_reserved); @@ -563,7 +569,7 @@ static void ll_readahead_handle_work(struct work_struct *wq) } } - rc = cl_io_rw_init(env, io, CIT_READ, ria->ria_start, len); + rc = cl_io_rw_init(env, io, CIT_READ, ria->ria_start_idx, pages); if (rc) goto out_put_env; @@ -577,7 +583,8 @@ static void ll_readahead_handle_work(struct work_struct *wq) queue = &io->ci_queue; cl_2queue_init(queue); - rc = ll_read_ahead_pages(env, io, &queue->c2_qin, ras, ria, &ra_end); + rc = ll_read_ahead_pages(env, io, &queue->c2_qin, ras, ria, + &ra_end_idx); if (ria->ria_reserved != 0) ll_ra_count_put(ll_i2sbi(inode), ria->ria_reserved); if (queue->c2_qin.pl_nr > 0) { @@ -587,10 +594,10 @@ static void ll_readahead_handle_work(struct work_struct *wq) if (rc == 0) task_io_account_read(PAGE_SIZE * count); } - if (ria->ria_end == ra_end && ra_end == (kms >> PAGE_SHIFT)) + if (ria->ria_end_idx == ra_end_idx && ra_end_idx == (kms >> PAGE_SHIFT)) ll_ra_stats_inc(inode, RA_STAT_EOF); - if (ra_end != ria->ria_end) + if (ra_end_idx != ria->ria_end_idx) ll_ra_stats_inc(inode, RA_STAT_FAILED_REACH_END); /* TODO: discard all pages until page reinit route is implemented */ @@ -606,7 +613,7 @@ static void ll_readahead_handle_work(struct work_struct *wq) out_put_env: cl_env_put(env, &refcheck); out_free_work: - if (ra_end > 0) + if (ra_end_idx > 0) ll_ra_stats_inc_sbi(ll_i2sbi(inode), RA_STAT_ASYNC); ll_readahead_work_free(work); } @@ -618,8 +625,8 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, { struct vvp_io *vio = vvp_env_io(env); struct ll_thread_info *lti = ll_env_info(env); - unsigned long len, mlen = 0; - pgoff_t ra_end = 0, 
start = 0, end = 0; + unsigned long pages, pages_min = 0; + pgoff_t ra_end_idx = 0, start_idx = 0, end_idx = 0; struct inode *inode; struct ra_io_arg *ria = &lti->lti_ria; struct cl_object *clob; @@ -642,39 +649,38 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, spin_lock(&ras->ras_lock); /** - * Note: other thread might rollback the ras_next_readahead, + * Note: other thread might rollback the ras_next_readahead_idx, * if it can not get the full size of prepared pages, see the * end of this function. For stride read ahead, it needs to * make sure the offset is no less than ras_stride_offset, * so that stride read ahead can work correctly. */ if (stride_io_mode(ras)) - start = max(ras->ras_next_readahead, - ras->ras_stride_offset >> PAGE_SHIFT); + start_idx = max_t(pgoff_t, ras->ras_next_readahead_idx, + ras->ras_stride_offset >> PAGE_SHIFT); else - start = ras->ras_next_readahead; + start_idx = ras->ras_next_readahead_idx; - if (ras->ras_window_len > 0) - end = ras->ras_window_start + ras->ras_window_len - 1; + if (ras->ras_window_pages > 0) + end_idx = ras->ras_window_start_idx + ras->ras_window_pages - 1; /* Enlarge the RA window to encompass the full read */ if (vio->vui_ra_valid && - end < vio->vui_ra_start + vio->vui_ra_count - 1) - end = vio->vui_ra_start + vio->vui_ra_count - 1; + end_idx < vio->vui_ra_start_idx + vio->vui_ra_pages - 1) + end_idx = vio->vui_ra_start_idx + vio->vui_ra_pages - 1; - if (end) { - unsigned long end_index; + if (end_idx) { + pgoff_t eof_index; /* Truncate RA window to end of file */ - end_index = (unsigned long)((kms - 1) >> PAGE_SHIFT); - if (end_index <= end) { - end = end_index; + eof_index = (pgoff_t)((kms - 1) >> PAGE_SHIFT); + if (eof_index <= end_idx) { + end_idx = eof_index; ria->ria_eof = true; } } - - ria->ria_start = start; - ria->ria_end = end; + ria->ria_start_idx = start_idx; + ria->ria_end_idx = end_idx; /* If stride I/O mode is detected, get stride window*/ if (stride_io_mode(ras)) {
ria->ria_stoff = ras->ras_stride_offset; @@ -683,12 +689,12 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, } spin_unlock(&ras->ras_lock); - if (end == 0) { + if (end_idx == 0) { ll_ra_stats_inc(inode, RA_STAT_ZERO_WINDOW); return 0; } - len = ria_page_count(ria); - if (len == 0) { + pages = ria_page_count(ria); + if (pages == 0) { ll_ra_stats_inc(inode, RA_STAT_ZERO_WINDOW); return 0; } @@ -696,45 +702,48 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, RAS_CDEBUG(ras); CDEBUG(D_READA, DFID ": ria: %lu/%lu, bead: %lu/%lu, hit: %d\n", PFID(lu_object_fid(&clob->co_lu)), - ria->ria_start, ria->ria_end, - vio->vui_ra_valid ? vio->vui_ra_start : 0, - vio->vui_ra_valid ? vio->vui_ra_count : 0, + ria->ria_start_idx, ria->ria_end_idx, + vio->vui_ra_valid ? vio->vui_ra_start_idx : 0, + vio->vui_ra_valid ? vio->vui_ra_pages : 0, hit); /* at least to extend the readahead window to cover current read */ if (!hit && vio->vui_ra_valid && - vio->vui_ra_start + vio->vui_ra_count > ria->ria_start) - ria->ria_end_min = vio->vui_ra_start + vio->vui_ra_count - 1; + vio->vui_ra_start_idx + vio->vui_ra_pages > ria->ria_start_idx) + ria->ria_end_idx_min = + vio->vui_ra_start_idx + vio->vui_ra_pages - 1; - ria->ria_reserved = ll_ra_count_get(ll_i2sbi(inode), ria, len, mlen); - if (ria->ria_reserved < len) + ria->ria_reserved = ll_ra_count_get(ll_i2sbi(inode), ria, pages, + pages_min); + if (ria->ria_reserved < pages) ll_ra_stats_inc(inode, RA_STAT_MAX_IN_FLIGHT); - CDEBUG(D_READA, "reserved pages %lu/%lu/%lu, ra_cur %d, ra_max %lu\n", - ria->ria_reserved, len, mlen, + CDEBUG(D_READA, "reserved pages: %lu/%lu/%lu, ra_cur %d, ra_max %lu\n", + ria->ria_reserved, pages, pages_min, atomic_read(&ll_i2sbi(inode)->ll_ra_info.ra_cur_pages), ll_i2sbi(inode)->ll_ra_info.ra_max_pages); - ret = ll_read_ahead_pages(env, io, queue, ras, ria, &ra_end); + ret = ll_read_ahead_pages(env, io, queue, ras, ria, &ra_end_idx); if (ria->ria_reserved) 
ll_ra_count_put(ll_i2sbi(inode), ria->ria_reserved); - if (ra_end == end && ra_end == (kms >> PAGE_SHIFT)) + if (ra_end_idx == end_idx && ra_end_idx == (kms >> PAGE_SHIFT)) ll_ra_stats_inc(inode, RA_STAT_EOF); - CDEBUG(D_READA, "ra_end = %lu end = %lu stride end = %lu pages = %d\n", - ra_end, end, ria->ria_end, ret); + CDEBUG(D_READA, + "ra_end_idx = %lu end_idx = %lu stride end = %lu pages = %d\n", + ra_end_idx, end_idx, ria->ria_end_idx, ret); - if (ra_end != end) + if (ra_end_idx != end_idx) ll_ra_stats_inc(inode, RA_STAT_FAILED_REACH_END); - if (ra_end > 0) { + if (ra_end_idx > 0) { /* update the ras so that the next read-ahead tries from * where we left off. */ spin_lock(&ras->ras_lock); - ras->ras_next_readahead = ra_end + 1; + ras->ras_next_readahead_idx = ra_end_idx + 1; spin_unlock(&ras->ras_lock); RAS_CDEBUG(ras); } @@ -744,7 +753,7 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, static void ras_set_start(struct ll_readahead_state *ras, pgoff_t index) { - ras->ras_window_start = ras_align(ras, index, NULL); + ras->ras_window_start_idx = ras_align(ras, index); } /* called with the ras_lock held or from places where it doesn't matter */ @@ -752,9 +761,9 @@ static void ras_reset(struct ll_readahead_state *ras, pgoff_t index) { ras->ras_consecutive_requests = 0; ras->ras_consecutive_bytes = 0; - ras->ras_window_len = 0; + ras->ras_window_pages = 0; ras_set_start(ras, index); - ras->ras_next_readahead = max(ras->ras_window_start, index + 1); + ras->ras_next_readahead_idx = max(ras->ras_window_start_idx, index + 1); RAS_CDEBUG(ras); } @@ -771,9 +780,9 @@ static void ras_stride_reset(struct ll_readahead_state *ras) void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras) { spin_lock_init(&ras->ras_lock); - ras->ras_rpc_size = PTLRPC_MAX_BRW_PAGES; + ras->ras_rpc_pages = PTLRPC_MAX_BRW_PAGES; ras_reset(ras, 0); - ras->ras_last_read_end = 0; + ras->ras_last_read_end_bytes = 0; ras->ras_requests = 0; } @@ -782,15 +791,15 @@ 
void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras) * If it is in the stride window, return true, otherwise return false. */ static bool read_in_stride_window(struct ll_readahead_state *ras, - unsigned long pos, unsigned long count) + loff_t pos, loff_t count) { - unsigned long stride_gap; + loff_t stride_gap; if (ras->ras_stride_length == 0 || ras->ras_stride_bytes == 0 || ras->ras_stride_bytes == ras->ras_stride_length) return false; - stride_gap = pos - ras->ras_last_read_end - 1; + stride_gap = pos - ras->ras_last_read_end_bytes - 1; /* If it is contiguous read */ if (stride_gap == 0) @@ -804,13 +813,13 @@ static bool read_in_stride_window(struct ll_readahead_state *ras, } static void ras_init_stride_detector(struct ll_readahead_state *ras, - unsigned long pos, unsigned long count) + loff_t pos, loff_t count) { - unsigned long stride_gap = pos - ras->ras_last_read_end - 1; + loff_t stride_gap = pos - ras->ras_last_read_end_bytes - 1; LASSERT(ras->ras_consecutive_stride_requests == 0); - if (pos <= ras->ras_last_read_end) { + if (pos <= ras->ras_last_read_end_bytes) { /*Reset stride window for forward read*/ ras_stride_reset(ras); return; @@ -828,47 +837,50 @@ static void ras_init_stride_detector(struct ll_readahead_state *ras, * stride I/O pattern */ static void ras_stride_increase_window(struct ll_readahead_state *ras, - struct ll_ra_info *ra, - unsigned long inc_len) + struct ll_ra_info *ra, loff_t inc_bytes) { - unsigned long left, step, window_len; - unsigned long stride_len; - unsigned long end = ras->ras_window_start + ras->ras_window_len; + loff_t window_bytes, stride_bytes; + u64 left_bytes; + u64 step; + loff_t end; + + /* temporarily store in page units to reduce LASSERT() cost below */ + end = ras->ras_window_start_idx + ras->ras_window_pages; LASSERT(ras->ras_stride_length > 0); LASSERTF(end >= (ras->ras_stride_offset >> PAGE_SHIFT), - "window_start %lu, window_len %lu stride_offset %lu\n", - ras->ras_window_start, 
ras->ras_window_len, + "window_start_idx %lu, window_pages %lu stride_offset %llu\n", + ras->ras_window_start_idx, ras->ras_window_pages, ras->ras_stride_offset); end <<= PAGE_SHIFT; - if (end < ras->ras_stride_offset) - stride_len = 0; + if (end <= ras->ras_stride_offset) + stride_bytes = 0; else - stride_len = end - ras->ras_stride_offset; + stride_bytes = end - ras->ras_stride_offset; - left = stride_len % ras->ras_stride_length; - window_len = (ras->ras_window_len << PAGE_SHIFT) - left; + div64_u64_rem(stride_bytes, ras->ras_stride_length, &left_bytes); + window_bytes = ((loff_t)ras->ras_window_pages << PAGE_SHIFT) - + left_bytes; - if (left < ras->ras_stride_bytes) - left += inc_len; + if (left_bytes < ras->ras_stride_bytes) + left_bytes += inc_bytes; else - left = ras->ras_stride_bytes + inc_len; + left_bytes = ras->ras_stride_bytes + inc_bytes; LASSERT(ras->ras_stride_bytes != 0); - step = left / ras->ras_stride_bytes; - left %= ras->ras_stride_bytes; + step = div64_u64_rem(left_bytes, ras->ras_stride_bytes, &left_bytes); - window_len += step * ras->ras_stride_length + left; + window_bytes += step * ras->ras_stride_length + left_bytes; if (DIV_ROUND_UP(stride_byte_count(ras->ras_stride_offset, ras->ras_stride_length, ras->ras_stride_bytes, ras->ras_stride_offset, - window_len), PAGE_SIZE) + window_bytes), PAGE_SIZE) <= ra->ra_max_pages_per_file) - ras->ras_window_len = (window_len >> PAGE_SHIFT); + ras->ras_window_pages = (window_bytes >> PAGE_SHIFT); RAS_CDEBUG(ras); } @@ -883,36 +895,34 @@ static void ras_increase_window(struct inode *inode, */ if (stride_io_mode(ras)) { ras_stride_increase_window(ras, ra, - ras->ras_rpc_size << PAGE_SHIFT); + (loff_t)ras->ras_rpc_pages << PAGE_SHIFT); } else { - unsigned long wlen; + pgoff_t window_pages; - wlen = min(ras->ras_window_len + ras->ras_rpc_size, - ra->ra_max_pages_per_file); - if (wlen < ras->ras_rpc_size) - ras->ras_window_len = wlen; + window_pages = min(ras->ras_window_pages + ras->ras_rpc_pages, + 
ra->ra_max_pages_per_file); + if (window_pages < ras->ras_rpc_pages) + ras->ras_window_pages = window_pages; else - ras->ras_window_len = ras_align(ras, wlen, NULL); + ras->ras_window_pages = ras_align(ras, window_pages); } } /** * Seek within 8 pages are considered as sequential read for now. */ -static inline bool is_loose_seq_read(struct ll_readahead_state *ras, - unsigned long pos) +static inline bool is_loose_seq_read(struct ll_readahead_state *ras, loff_t pos) { - return pos_in_window(pos, ras->ras_last_read_end, - 8 << PAGE_SHIFT, 8 << PAGE_SHIFT); + return pos_in_window(pos, ras->ras_last_read_end_bytes, + 8UL << PAGE_SHIFT, 8UL << PAGE_SHIFT); } static void ras_detect_read_pattern(struct ll_readahead_state *ras, struct ll_sb_info *sbi, - unsigned long pos, unsigned long count, - bool mmap) + loff_t pos, size_t count, bool mmap) { bool stride_detect = false; - unsigned long index = pos >> PAGE_SHIFT; + pgoff_t index = pos >> PAGE_SHIFT; /* * Reset the read-ahead window in two cases. 
First when the app seeks @@ -947,25 +957,25 @@ static void ras_detect_read_pattern(struct ll_readahead_state *ras, */ if (!read_in_stride_window(ras, pos, count)) { ras_stride_reset(ras); - ras->ras_window_len = 0; - ras->ras_next_readahead = index; + ras->ras_window_pages = 0; + ras->ras_next_readahead_idx = index; } } ras->ras_consecutive_bytes += count; if (mmap) { - unsigned int idx = (ras->ras_consecutive_bytes >> PAGE_SHIFT); + pgoff_t idx = ras->ras_consecutive_bytes >> PAGE_SHIFT; - if ((idx >= 4 && idx % 4 == 0) || stride_detect) + if ((idx >= 4 && (idx & 3UL) == 0) || stride_detect) ras->ras_need_increase_window = true; } else if ((ras->ras_consecutive_requests > 1 || stride_detect)) { ras->ras_need_increase_window = true; } - ras->ras_last_read_end = pos + count - 1; + ras->ras_last_read_end_bytes = pos + count - 1; } -void ll_ras_enter(struct file *f, unsigned long pos, unsigned long count) +void ll_ras_enter(struct file *f, loff_t pos, size_t count) { struct ll_file_data *fd = LUSTRE_FPRIVATE(f); struct ll_readahead_state *ras = &fd->fd_ras; @@ -998,10 +1008,10 @@ void ll_ras_enter(struct file *f, unsigned long pos, unsigned long count) if (kms_pages && kms_pages <= ra->ra_max_read_ahead_whole_pages) { - ras->ras_window_start = 0; - ras->ras_next_readahead = index + 1; - ras->ras_window_len = min(ra->ra_max_pages_per_file, - ra->ra_max_read_ahead_whole_pages); + ras->ras_window_start_idx = 0; + ras->ras_next_readahead_idx = index + 1; + ras->ras_window_pages = min(ra->ra_max_pages_per_file, + ra->ra_max_read_ahead_whole_pages); ras->ras_no_miss_check = true; goto out_unlock; } @@ -1012,18 +1022,19 @@ void ll_ras_enter(struct file *f, unsigned long pos, unsigned long count) } static bool index_in_stride_window(struct ll_readahead_state *ras, - unsigned int index) + pgoff_t index) { - unsigned long pos = index << PAGE_SHIFT; - unsigned long offset; + loff_t pos = (loff_t)index << PAGE_SHIFT; if (ras->ras_stride_length == 0 || ras->ras_stride_bytes == 0 
|| ras->ras_stride_bytes == ras->ras_stride_length) return false; if (pos >= ras->ras_stride_offset) { - offset = (pos - ras->ras_stride_offset) % - ras->ras_stride_length; + u64 offset; + + div64_u64_rem(pos - ras->ras_stride_offset, + ras->ras_stride_length, &offset); if (offset < ras->ras_stride_bytes || ras->ras_stride_length - offset < PAGE_SIZE) return true; @@ -1035,14 +1046,13 @@ static bool index_in_stride_window(struct ll_readahead_state *ras, } /* - * ll_ras_enter() is used to detect read pattern according to - * pos and count. + * ll_ras_enter() is used to detect read pattern according to pos and count. * * ras_update() is used to detect cache miss and * reset window or increase window accordingly */ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, - struct ll_readahead_state *ras, unsigned long index, + struct ll_readahead_state *ras, pgoff_t index, enum ras_update_flags flags) { struct ll_ra_info *ra = &sbi->ll_ra_info; @@ -1065,13 +1075,13 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, goto out_unlock; if (flags & LL_RAS_MMAP) - ras_detect_read_pattern(ras, sbi, index << PAGE_SHIFT, + ras_detect_read_pattern(ras, sbi, (loff_t)index << PAGE_SHIFT, PAGE_SIZE, true); - if (!hit && ras->ras_window_len && - index < ras->ras_next_readahead && - pos_in_window(index, ras->ras_window_start, 0, - ras->ras_window_len)) { + if (!hit && ras->ras_window_pages && + index < ras->ras_next_readahead_idx && + pos_in_window(index, ras->ras_window_start_idx, 0, + ras->ras_window_pages)) { ll_ra_stats_inc_sbi(sbi, RA_STAT_MISS_IN_WINDOW); ras->ras_need_increase_window = false; @@ -1090,8 +1100,7 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, * is still intersect with normal sequential * read-ahead window. 
*/ - if (ras->ras_window_start < - ras->ras_stride_offset) + if (ras->ras_window_start_idx < ras->ras_stride_offset) ras_stride_reset(ras); RAS_CDEBUG(ras); } else { @@ -1111,18 +1120,18 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, if (stride_io_mode(ras)) { /* Since stride readahead is sensitive to the offset * of read-ahead, so we use original offset here, - * instead of ras_window_start, which is RPC aligned + * instead of ras_window_start_idx, which is RPC aligned. */ - ras->ras_next_readahead = max(index + 1, - ras->ras_next_readahead); - ras->ras_window_start = - max(ras->ras_stride_offset >> PAGE_SHIFT, - ras->ras_window_start); + ras->ras_next_readahead_idx = max(index + 1, + ras->ras_next_readahead_idx); + ras->ras_window_start_idx = + max_t(pgoff_t, ras->ras_window_start_idx, + ras->ras_stride_offset >> PAGE_SHIFT); } else { - if (ras->ras_next_readahead < ras->ras_window_start) - ras->ras_next_readahead = ras->ras_window_start; + if (ras->ras_next_readahead_idx < ras->ras_window_start_idx) + ras->ras_next_readahead_idx = ras->ras_window_start_idx; if (!hit) - ras->ras_next_readahead = index + 1; + ras->ras_next_readahead_idx = index + 1; } if (ras->ras_need_increase_window) { @@ -1241,7 +1250,7 @@ int ll_writepages(struct address_space *mapping, struct writeback_control *wbc) int result; if (wbc->range_cyclic) { - start = mapping->writeback_index << PAGE_SHIFT; + start = (loff_t)mapping->writeback_index << PAGE_SHIFT; end = OBD_OBJECT_EOF; } else { start = wbc->range_start; @@ -1429,8 +1438,8 @@ static int kickoff_async_readahead(struct file *file, unsigned long pages) struct ll_readahead_state *ras = &fd->fd_ras; struct ll_ra_info *ra = &sbi->ll_ra_info; unsigned long throttle; - unsigned long start = ras_align(ras, ras->ras_next_readahead, NULL); - unsigned long end = start + pages - 1; + pgoff_t start_idx = ras_align(ras, ras->ras_next_readahead_idx); + pgoff_t end_idx = start_idx + pages - 1; throttle = 
min(ra->ra_async_pages_per_file_threshold, ra->ra_max_pages_per_file); @@ -1440,24 +1449,24 @@ static int kickoff_async_readahead(struct file *file, unsigned long pages) * we do async readahead, allowing the user thread to do fast i/o. */ if (stride_io_mode(ras) || !throttle || - ras->ras_window_len < throttle) + ras->ras_window_pages < throttle) return 0; if ((atomic_read(&ra->ra_cur_pages) + pages) > ra->ra_max_pages) return 0; - if (ras->ras_async_last_readpage == start) + if (ras->ras_async_last_readpage_idx == start_idx) return 1; /* ll_readahead_work_free() free it */ lrw = kzalloc(sizeof(*lrw), GFP_NOFS); if (lrw) { lrw->lrw_file = get_file(file); - lrw->lrw_start = start; - lrw->lrw_end = end; + lrw->lrw_start_idx = start_idx; + lrw->lrw_end_idx = end_idx; spin_lock(&ras->ras_lock); - ras->ras_next_readahead = end + 1; - ras->ras_async_last_readpage = start; + ras->ras_next_readahead_idx = end_idx + 1; + ras->ras_async_last_readpage_idx = start_idx; spin_unlock(&ras->ras_lock); ll_readahead_work_add(inode, lrw); } else { @@ -1489,7 +1498,7 @@ int ll_readpage(struct file *file, struct page *vmpage) struct lu_env *local_env = NULL; struct inode *inode = file_inode(file); unsigned long fast_read_pages = - max(RA_REMAIN_WINDOW_MIN, ras->ras_rpc_size); + max(RA_REMAIN_WINDOW_MIN, ras->ras_rpc_pages); struct vvp_page *vpg; result = -ENODATA; @@ -1526,8 +1535,8 @@ int ll_readpage(struct file *file, struct page *vmpage) * the case, we can't do fast IO because we will need * a cl_io to issue the RPC. 
*/ - if (ras->ras_window_start + ras->ras_window_len < - ras->ras_next_readahead + fast_read_pages || + if (ras->ras_window_start_idx + ras->ras_window_pages < + ras->ras_next_readahead_idx + fast_read_pages || kickoff_async_readahead(file, fast_read_pages) > 0) result = 0; } diff --git a/fs/lustre/llite/vvp_internal.h b/fs/lustre/llite/vvp_internal.h index 1cc152f..0382b79 100644 --- a/fs/lustre/llite/vvp_internal.h +++ b/fs/lustre/llite/vvp_internal.h @@ -103,8 +103,8 @@ struct vvp_io { struct kiocb *vui_iocb; /* Readahead state. */ - pgoff_t vui_ra_start; - pgoff_t vui_ra_count; + pgoff_t vui_ra_start_idx; + pgoff_t vui_ra_pages; /* Set when vui_ra_{start,count} have been initialized. */ bool vui_ra_valid; }; diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 259b14a..cf116be 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -739,8 +739,8 @@ static int vvp_io_read_start(const struct lu_env *env, struct file *file = vio->vui_fd->fd_file; int result; loff_t pos = io->u.ci_rd.rd.crw_pos; - long cnt = io->u.ci_rd.rd.crw_count; - long tot = vio->vui_tot_count; + size_t cnt = io->u.ci_rd.rd.crw_count; + size_t tot = vio->vui_tot_count; int exceed = 0; CLOBINVRNT(env, obj, vvp_object_invariant(obj)); @@ -776,16 +776,16 @@ static int vvp_io_read_start(const struct lu_env *env, /* initialize read-ahead window once per syscall */ if (!vio->vui_ra_valid) { vio->vui_ra_valid = true; - vio->vui_ra_start = cl_index(obj, pos); - vio->vui_ra_count = cl_index(obj, tot + PAGE_SIZE - 1); + vio->vui_ra_start_idx = cl_index(obj, pos); + vio->vui_ra_pages = cl_index(obj, tot + PAGE_SIZE - 1); /* If both start and end are unaligned, we read one more page * than the index math suggests. 
*/ - if (pos % PAGE_SIZE != 0 && (pos + tot) % PAGE_SIZE != 0) - vio->vui_ra_count++; + if ((pos & ~PAGE_MASK) != 0 && ((pos + tot) & ~PAGE_MASK) != 0) + vio->vui_ra_pages++; - CDEBUG(D_READA, "tot %ld, ra_start %lu, ra_count %lu\n", tot, - vio->vui_ra_start, vio->vui_ra_count); + CDEBUG(D_READA, "tot %zu, ra_start %lu, ra_count %lu\n", + tot, vio->vui_ra_start_idx, vio->vui_ra_pages); } /* BUG: 5972 */ @@ -1424,7 +1424,7 @@ static int vvp_io_read_ahead(const struct lu_env *env, struct vvp_io *vio = cl2vvp_io(env, ios); if (unlikely(vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) { - ra->cra_end = CL_PAGE_EOF; + ra->cra_end_idx = CL_PAGE_EOF; result = 1; /* no need to call down */ } } diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 971f9ba..019e986 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -1014,7 +1014,8 @@ static int lov_io_read_ahead(const struct lu_env *env, ra); CDEBUG(D_READA, DFID " cra_end = %lu, stripes = %d, rc = %d\n", - PFID(lu_object_fid(lov2lu(loo))), ra->cra_end, r0->lo_nr, rc); + PFID(lu_object_fid(lov2lu(loo))), ra->cra_end_idx, + r0->lo_nr, rc); if (rc) return rc; @@ -1027,15 +1028,15 @@ static int lov_io_read_ahead(const struct lu_env *env, */ /* cra_end is stripe level, convert it into file level */ - ra_end = ra->cra_end; + ra_end = ra->cra_end_idx; if (ra_end != CL_PAGE_EOF) - ra->cra_end = lov_stripe_pgoff(loo->lo_lsm, index, - ra_end, stripe); + ra->cra_end_idx = lov_stripe_pgoff(loo->lo_lsm, index, + ra_end, stripe); /* boundary of current component */ ra_end = cl_index(obj, (loff_t)lov_io_extent(lio, index)->e_end); - if (ra_end != CL_PAGE_EOF && ra->cra_end >= ra_end) - ra->cra_end = ra_end - 1; + if (ra_end != CL_PAGE_EOF && ra->cra_end_idx >= ra_end) + ra->cra_end_idx = ra_end - 1; if (r0->lo_nr == 1) /* single stripe file */ return 0; @@ -1043,13 +1044,13 @@ static int lov_io_read_ahead(const struct lu_env *env, pps = lov_lse(loo, index)->lsme_stripe_size >> PAGE_SHIFT; CDEBUG(D_READA, - DFID 
" max_index = %lu, pps = %u, index = %u, stripe_size = %u, stripe no = %u, start index = %lu\n", - PFID(lu_object_fid(lov2lu(loo))), ra->cra_end, pps, index, + DFID " max_index = %lu, pps = %u, index = %d, stripe_size = %u, stripe no = %u, start index = %lu\n", + PFID(lu_object_fid(lov2lu(loo))), ra->cra_end_idx, pps, index, lov_lse(loo, index)->lsme_stripe_size, stripe, start); /* never exceed the end of the stripe */ - ra->cra_end = min_t(pgoff_t, - ra->cra_end, start + pps - start % pps - 1); + ra->cra_end_idx = min_t(pgoff_t, ra->cra_end_idx, + start + pps - start % pps - 1); return 0; } diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 312e527..496491f 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -1099,8 +1099,8 @@ static int mdc_io_read_ahead(const struct lu_env *env, ldlm_lock_decref(&lockh, dlmlock->l_req_mode); } - ra->cra_rpc_size = osc_cli(osc)->cl_max_pages_per_rpc; - ra->cra_end = CL_PAGE_EOF; + ra->cra_rpc_pages = osc_cli(osc)->cl_max_pages_per_rpc; + ra->cra_end_idx = CL_PAGE_EOF; ra->cra_release = osc_read_ahead_release; ra->cra_cbdata = dlmlock; diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c index 230e1a5..cbb91ed 100644 --- a/fs/lustre/obdclass/integrity.c +++ b/fs/lustre/obdclass/integrity.c @@ -229,7 +229,7 @@ static void obd_t10_performance_test(const char *obd_name, for (start = jiffies, end = start + HZ / 4, bcount = 0; time_before(jiffies, end) && rc == 0; bcount++) { rc = __obd_t10_performance_test(obd_name, cksum_type, page, - buf_len / PAGE_SIZE); + buf_len >> PAGE_SHIFT); if (rc) break; } diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index dde03bd..7a8dbfc 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -1349,7 +1349,7 @@ static int osc_refresh_count(const struct lu_env *env, return 0; else if (cl_offset(obj, index + 1) > kms) /* catch sub-page write at end of file */ - return kms % PAGE_SIZE; + return kms & 
~PAGE_MASK; else return PAGE_SIZE; } diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 1ff2df2..f26c95d 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -88,12 +88,12 @@ static int osc_io_read_ahead(const struct lu_env *env, ldlm_lock_decref(&lockh, dlmlock->l_req_mode); } - ra->cra_rpc_size = osc_cli(osc)->cl_max_pages_per_rpc; - ra->cra_end = cl_index(osc2cl(osc), - dlmlock->l_policy_data.l_extent.end); + ra->cra_rpc_pages = osc_cli(osc)->cl_max_pages_per_rpc; + ra->cra_end_idx = cl_index(osc2cl(osc), + dlmlock->l_policy_data.l_extent.end); ra->cra_release = osc_read_ahead_release; ra->cra_cbdata = dlmlock; - if (ra->cra_end != CL_PAGE_EOF) + if (ra->cra_end_idx != CL_PAGE_EOF) ra->cra_contention = true; result = 0; } From patchwork Thu Feb 27 21:17:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410831 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0E561580 for ; Thu, 27 Feb 2020 21:47:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A97AB24690 for ; Thu, 27 Feb 2020 21:47:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A97AB24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E928C34B521; Thu, 27 Feb 2020 13:37:59 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B14093489F6 for ; Thu, 27 Feb 2020 13:21:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 51EA8A152; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 501BA46C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:47 -0500 Message-Id: <1582838290-17243-600-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 599/622] lustre: llite: Accept EBUSY for page unaligned read X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When doing unaligned strided reads, it's possible for the first and last page of a stride to be read by another thread on the same node, resulting in EBUSY. This can also happen for sequential reads: for example, when several MPI processes split one large file at offsets that are not page aligned and each process reads its part sequentially. We shouldn't stop readahead in these cases.
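The policy described above can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the llite code: readahead_stride(), read_page_fn, mask_reader() and MAX_BUSY_PER_STRIDE are hypothetical names, and the page source is a toy bitmask rather than ll_read_ahead_page(). The loop keeps going past -EBUSY pages and gives up only once more than two pages in the same stride were busy; any other error stops it immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-page readahead call: returns 0 on
 * success, -EBUSY if another reader holds the page, or another negative
 * errno on a hard failure.
 */
typedef int (*read_page_fn)(unsigned long page_idx, void *arg);

/* Assumed limit: tolerate the (possibly shared) first and last page. */
#define MAX_BUSY_PER_STRIDE 2

/* Toy page source for demonstration: bit i of the mask set means page i
 * is busy.
 */
static int mask_reader(unsigned long page_idx, void *arg)
{
	unsigned long busy_mask = *(unsigned long *)arg;

	return ((busy_mask >> page_idx) & 1UL) ? -EBUSY : 0;
}

/* Walk pages [start, end] of one stride. Returns the number of pages
 * actually read; *ra_end is set to the last page index the loop accepted.
 */
static unsigned long readahead_stride(unsigned long start, unsigned long end,
				      read_page_fn read_page, void *arg,
				      unsigned long *ra_end)
{
	unsigned long page_idx, count = 0;
	int busy_page_count = 0; /* per stride, so it starts at zero here */

	assert(read_page && ra_end);
	for (page_idx = start; page_idx <= end; page_idx++) {
		int rc = read_page(page_idx, arg);

		/* Hard errors still stop readahead. */
		if (rc < 0 && rc != -EBUSY)
			break;
		/* Busy pages are skipped until there are too many of them. */
		if (rc == -EBUSY && ++busy_page_count > MAX_BUSY_PER_STRIDE)
			break;
		*ra_end = page_idx;
		/* Only count pages that were really read. */
		if (rc == 0)
			count++;
	}
	return count;
}
```

With pages 0 and 7 of an eight-page stride busy, the sketch reads the six pages in between and still advances to page 7; with the first three pages busy it stops at the third.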
WC-bug-id: https://jira.whamcloud.com/browse/LU-12518 Lustre-commit: b9c155065d2c ("LU-12518 llite: Accept EBUSY for page unaligned read") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/35457 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/rw.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 9509023..1b5260d 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -360,7 +360,8 @@ static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria) { struct cl_read_ahead ra = { 0 }; pgoff_t page_idx; - int count = 0; + /* busy page count is per stride */ + int count = 0, busy_page_count = 0; int rc; LASSERT(ria); @@ -416,8 +417,21 @@ static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria) /* If the page is inside the read-ahead window */ rc = ll_read_ahead_page(env, io, queue, page_idx); - if (rc < 0) + if (rc < 0 && rc != -EBUSY) break; + if (rc == -EBUSY) { + busy_page_count++; + CDEBUG(D_READA, + "skip busy page: %lu\n", page_idx); + /* For page unaligned readahead the first + * last pages of each region can be read by + * another reader on the same node, and so + * may be busy. So only stop for > 2 busy + * pages. 
+ */ + if (busy_page_count > 2) + break; + } *ra_end = page_idx; /* Only subtract from reserve & count the page if we @@ -441,6 +455,7 @@ static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria) pos += (ria->ria_length - offset); if ((pos >> PAGE_SHIFT) >= page_idx + 1) page_idx = (pos >> PAGE_SHIFT) - 1; + busy_page_count = 0; CDEBUG(D_READA, "Stride: jump %llu pages to %lu\n", ria->ria_length - offset, page_idx); From patchwork Thu Feb 27 21:17:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410835 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8450C924 for ; Thu, 27 Feb 2020 21:47:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6CDEA24690 for ; Thu, 27 Feb 2020 21:47:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6CDEA24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 82BD834B560; Thu, 27 Feb 2020 13:38:06 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 552063489F6 for ; Thu, 27 Feb 2020 13:21:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 57998A154; 
Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 55F0246A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:48 -0500 Message-Id: <1582838290-17243-601-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 600/622] lustre: handle: remove locking from class_handle2object() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown There is limited value in this locking and test on h_in. If the lookup could have run in parallel with class_handle_unhash_nolock() and seen "h_in == 0", then it could equally well have run moments earlier and not seen it - no locking would prevent that, so the caller must be prepared to have an object returned which has already been unhashed by the time it sees the object. In other words, any interlock between unhash and lookup must be provided at a higher level than where this code is trying to handle it. The locking *does* prevent the refcount from being incremented if the object has already been removed from the list. As the final reference is always dropped after that removal, it indirectly stops the refcount from being incremented after the final reference is dropped. This can be more directly achieved by using refcount_inc_not_zero(). So remove the locking, and replace it with refcount_inc_not_zero().
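The "increment unless already zero" idea can be illustrated outside the kernel with C11 atomics. This is a rough sketch, not the kernel's refcount_t implementation (which, among other things, also guards against overflow); ref_inc_not_zero() and ref_dec_and_test() are hypothetical user-space names chosen for this example. The point is that a lookup racing with the final put either takes a reference while the count is still non-zero, or fails cleanly, without needing a lock around the test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still alive (count != 0).
 * Returns true if the count was incremented.
 */
static bool ref_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	assert(ref);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the final one. */
static bool ref_dec_and_test(atomic_uint *ref)
{
	return atomic_fetch_sub(ref, 1) == 1;
}
```

Once ref_dec_and_test() has returned true, every later ref_inc_not_zero() call fails, which is the property the removed lock and h_in test were providing indirectly.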
WC-bug-id: https://jira.whamcloud.com/browse/LU-12542 Lustre-commit: e2458a94a6a2 ("LU-12542 handle: remove locking from class_handle2object()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35861 Reviewed-by: Mike Pershin Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lustre_handles.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c index 6989a60..acee2db 100644 --- a/fs/lustre/obdclass/lustre_handles.c +++ b/fs/lustre/obdclass/lustre_handles.c @@ -149,15 +149,12 @@ void *class_handle2object(u64 cookie, const char *owner) if (h->h_cookie != cookie || h->h_owner != owner) continue; - spin_lock(&h->h_lock); - if (likely(h->h_in != 0)) { - refcount_inc(&h->h_ref); + if (refcount_inc_not_zero(&h->h_ref)) { CDEBUG(D_INFO, "GET %s %p refcount=%d\n", h->h_owner, h, refcount_read(&h->h_ref)); retval = h; } - spin_unlock(&h->h_lock); break; } rcu_read_unlock(); From patchwork Thu Feb 27 21:17:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410833 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B20C7924 for ; Thu, 27 Feb 2020 21:47:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9A85624690 for ; Thu, 27 Feb 2020 21:47:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9A85624690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0F84B34B554; Thu, 27 Feb 2020 13:38:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F2D7A3489F6 for ; Thu, 27 Feb 2020 13:21:25 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 57902A153; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5689746D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:49 -0500 Message-Id: <1582838290-17243-602-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 601/622] lustre: handle: use hlist for hash lists. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown hlist_head/hlist_node is the preferred data structure for hash tables. Not only does it make the 'head' smaller, but it also provides hlist_unhashed() which can be used to check if an object is in the list. This means that we don't need h_in any more.
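Why an hlist node can answer "am I hashed?" on its own can be seen in a small user-space model. The sketch below is an assumption-laden imitation of the layout (a single-pointer head, and nodes that carry a back-pointer to whatever points at them); hnode, hhead and the helper names are hypothetical and carry none of the kernel's RCU handling. An initialised or deleted node has a NULL back-pointer, so no separate h_in style flag is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The head is a single pointer; a doubly linked list head needs two. */
struct hhead {
	struct hnode *first;
};

struct hnode {
	struct hnode *next;
	struct hnode **pprev; /* address of the pointer that points at us */
};

static void hnode_init(struct hnode *n)
{
	n->next = NULL;
	n->pprev = NULL;
}

/* True if the node is not on any list. */
static bool hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

/* Insert n at the front of the bucket. */
static void hhead_add(struct hnode *n, struct hhead *h)
{
	assert(hnode_unhashed(n));
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n and reinitialise it so hnode_unhashed() becomes true again. */
static void hnode_del_init(struct hnode *n)
{
	if (hnode_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	hnode_init(n);
}
```

Deleting with the reinitialising variant is what makes the later "already removed?" check reliable, which matches the patch's use of hlist_del_init_rcu() rather than a plain delete.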
WC-bug-id: https://jira.whamcloud.com/browse/LU-12542 Lustre-commit: 9c9ea6584cfb ("LU-12542 handle: use hlist for hash lists.") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/35862 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_handles.h | 3 +-- fs/lustre/ldlm/ldlm_lock.c | 2 +- fs/lustre/obdclass/genops.c | 2 +- fs/lustre/obdclass/lustre_handles.c | 20 +++++++++----------- 4 files changed, 12 insertions(+), 15 deletions(-) diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h index 55f9a09..afdade7 100644 --- a/fs/lustre/include/lustre_handles.h +++ b/fs/lustre/include/lustre_handles.h @@ -58,7 +58,7 @@ * to compute the start of the structure based on the handle field. */ struct portals_handle { - struct list_head h_link; + struct hlist_node h_link; u64 h_cookie; const char *h_owner; refcount_t h_ref; @@ -66,7 +66,6 @@ struct portals_handle { /* newly added fields to handle the RCU issue. 
-jxiong */ struct rcu_head h_rcu; spinlock_t h_lock; - unsigned int h_in:1; }; /* handles.c */ diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 2c19636..396bf53 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -404,7 +404,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource) lprocfs_counter_incr(ldlm_res_to_ns(resource)->ns_stats, LDLM_NSS_LOCKS); - INIT_LIST_HEAD(&lock->l_handle.h_link); + INIT_HLIST_NODE(&lock->l_handle.h_link); class_handle_hash(&lock->l_handle, lock_handle_owner); lu_ref_init(&lock->l_reference); diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 0fbe03e..146e735 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -813,7 +813,7 @@ static struct obd_export *__class_new_export(struct obd_device *obd, spin_lock_init(&export->exp_uncommitted_replies_lock); INIT_LIST_HEAD(&export->exp_uncommitted_replies); INIT_LIST_HEAD(&export->exp_req_replay_queue); - INIT_LIST_HEAD_RCU(&export->exp_handle.h_link); + INIT_HLIST_NODE(&export->exp_handle.h_link); INIT_LIST_HEAD(&export->exp_hp_rpcs); class_handle_hash(&export->exp_handle, export_handle_owner); spin_lock_init(&export->exp_lock); diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c index acee2db..0048036 100644 --- a/fs/lustre/obdclass/lustre_handles.c +++ b/fs/lustre/obdclass/lustre_handles.c @@ -48,7 +48,7 @@ static struct handle_bucket { spinlock_t lock; - struct list_head head; + struct hlist_head head; } *handle_hash; #define HANDLE_HASH_SIZE (1 << 16) @@ -63,7 +63,7 @@ void class_handle_hash(struct portals_handle *h, const char *owner) struct handle_bucket *bucket; LASSERT(h); - LASSERT(list_empty(&h->h_link)); + LASSERT(hlist_unhashed(&h->h_link)); /* * This is fast, but simplistic cookie generation algorithm, it will @@ -89,8 +89,7 @@ void class_handle_hash(struct portals_handle *h, const char *owner) bucket = &handle_hash[h->h_cookie 
& HANDLE_HASH_MASK]; spin_lock(&bucket->lock); - list_add_rcu(&h->h_link, &bucket->head); - h->h_in = 1; + hlist_add_head_rcu(&h->h_link, &bucket->head); spin_unlock(&bucket->lock); CDEBUG(D_INFO, "added object %p with handle %#llx to hash\n", @@ -100,7 +99,7 @@ void class_handle_hash(struct portals_handle *h, const char *owner) static void class_handle_unhash_nolock(struct portals_handle *h) { - if (list_empty(&h->h_link)) { + if (hlist_unhashed(&h->h_link)) { CERROR("removing an already-removed handle (%#llx)\n", h->h_cookie); return; @@ -110,13 +109,12 @@ static void class_handle_unhash_nolock(struct portals_handle *h) h, h->h_cookie); spin_lock(&h->h_lock); - if (h->h_in == 0) { + if (hlist_unhashed(&h->h_link)) { spin_unlock(&h->h_lock); return; } - h->h_in = 0; + hlist_del_init_rcu(&h->h_link); spin_unlock(&h->h_lock); - list_del_rcu(&h->h_link); } void class_handle_unhash(struct portals_handle *h) @@ -145,7 +143,7 @@ void *class_handle2object(u64 cookie, const char *owner) bucket = handle_hash + (cookie & HANDLE_HASH_MASK); rcu_read_lock(); - list_for_each_entry_rcu(h, &bucket->head, h_link) { + hlist_for_each_entry_rcu(h, &bucket->head, h_link) { if (h->h_cookie != cookie || h->h_owner != owner) continue; @@ -177,7 +175,7 @@ int class_handle_init(void) spin_lock_init(&handle_base_lock); for (bucket = handle_hash + HANDLE_HASH_SIZE - 1; bucket >= handle_hash; bucket--) { - INIT_LIST_HEAD(&bucket->head); + INIT_HLIST_HEAD(&bucket->head); spin_lock_init(&bucket->lock); } @@ -196,7 +194,7 @@ static int cleanup_all_handles(void) struct portals_handle *h; spin_lock(&handle_hash[i].lock); - list_for_each_entry_rcu(h, &handle_hash[i].head, h_link) { + hlist_for_each_entry_rcu(h, &handle_hash[i].head, h_link) { CERROR("force clean handle %#llx addr %p owner %p\n", h->h_cookie, h, h->h_owner); From patchwork Thu Feb 27 21:17:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons 
X-Patchwork-Id: 11410567 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AC41817E0 for ; Thu, 27 Feb 2020 21:41:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 95112246A1 for ; Thu, 27 Feb 2020 21:41:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 95112246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2971034A98A; Thu, 27 Feb 2020 13:33:41 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 98DB63489F6 for ; Thu, 27 Feb 2020 13:21:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 59F42A155; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5895447C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:50 -0500 Message-Id: <1582838290-17243-603-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 602/622] lustre: obdclass: convert waiting in cl_sync_io_wait(). 
X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This function will *always* wait until ->csi_sync_nr reaches zero. The effect of the timeout is: 1/ to report an error if the count doesn't reach zero in the given time 2/ to return -ETIMEDOUT instead of csi_sync_rc if the timeout was exceeded. So we rearrange the code to make that more obvious. A small extra change is that we now call wait_event_idle() again even if there was a timeout and the first wait succeeded. This will simply test csi_sync_nr again and not actually wait. We could protect it with 'rc != 0 || timeout == 0' but there seems no point. WC-bug-id: https://jira.whamcloud.com/browse/LU-10467 Lustre-commit: d6ce546eb7e2 ("LU-10467 obdclass: convert waiting in cl_sync_io_wait().") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36102 Reviewed-by: Bobi Jam Reviewed-by: Wang Shilong Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/cl_io.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index 3bc9097..e11f9fe 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -1054,27 +1054,24 @@ void cl_sync_io_init_notify(struct cl_sync_io *anchor, int nr, int cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor, long timeout) { - int rc = 1; + int rc = 0; LASSERT(timeout >= 0); - if (timeout == 0) - wait_event_idle(anchor->csi_waitq, - atomic_read(&anchor->csi_sync_nr) == 0); - else - rc = wait_event_idle_timeout(anchor->csi_waitq, - atomic_read(&anchor->csi_sync_nr) == 0, - timeout * HZ); - if (rc == 0) { +
if (timeout > 0 && + wait_event_idle_timeout(anchor->csi_waitq, + atomic_read(&anchor->csi_sync_nr) == 0, + timeout * HZ) == 0) { rc = -ETIMEDOUT; CERROR("IO failed: %d, still wait for %d remaining entries\n", rc, atomic_read(&anchor->csi_sync_nr)); + } - wait_event_idle(anchor->csi_waitq, - atomic_read(&anchor->csi_sync_nr) == 0); - } else { + wait_event_idle(anchor->csi_waitq, + atomic_read(&anchor->csi_sync_nr) == 0); + if (!rc) rc = anchor->csi_sync_rc; - } + /* We take the lock to ensure that cl_sync_io_note() has finished */ spin_lock(&anchor->csi_waitq.lock); LASSERT(atomic_read(&anchor->csi_sync_nr) == 0); From patchwork Thu Feb 27 21:17:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410837 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 73ABB17E0 for ; Thu, 27 Feb 2020 21:48:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5C5CE24690 for ; Thu, 27 Feb 2020 21:48:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5C5CE24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E740B34B303; Thu, 27 Feb 2020 13:38:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with 
ESMTP id DBD1F3489F6 for ; Thu, 27 Feb 2020 13:21:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5C4B0A156; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5B604468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:51 -0500 Message-Id: <1582838290-17243-604-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 603/622] lnet: modules: use list_move were appropriate. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Rather than list_del(&foo); list_add(&foo, &bar); use list_move(&foo, &bar); Similarly for list_add_tail and list_move_tail. In lnet_attach_rsp_tracker, local_rspt already has a suitably initialised ->rspt_on_list, so the new_entry variable can be discarded. 
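To illustrate why the new_entry flag becomes unnecessary, here is a minimal userspace sketch of the list operations involved. It is not the kernel's <linux/list.h>: the *_sketch names and list_length() are hypothetical stand-ins written only for this illustration. What it tries to show is that unlinking a self-linked (initialised but unqueued) entry is a harmless no-op, so one move-to-tail call can serve both a fresh tracker and one that is already queued.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make a node point at itself: an empty head or an unqueued entry. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Unlink an entry and leave it self-linked. A no-op if already self-linked. */
static void list_del_init_sketch(struct list_head *e)
{
	assert(e && e->next && e->prev);
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Append an (unlinked) entry just before head, i.e. at the tail. */
static void list_add_tail_sketch(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Move an entry to the tail of head, wherever it currently is. */
static void list_move_tail_sketch(struct list_head *e, struct list_head *head)
{
	list_del_init_sketch(e);
	list_add_tail_sketch(e, head);
}

/* Count the entries on a list (hypothetical helper for checking results). */
static size_t list_length(const struct list_head *head)
{
	size_t n = 0;
	const struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Calling list_move_tail_sketch() on an entry that was only initialised queues it; calling it again on the same entry re-queues it at the tail without duplicating it, which is the behaviour the patch relies on.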
WC-bug-id: https://jira.whamcloud.com/browse/LU-9679 Lustre-commit: 7525fd36a266 ("LU-9679 modules: use list_move were appropriate.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36670 Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index ca292a6..47d5389 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -4374,7 +4374,6 @@ void lnet_monitor_thr_stop(void) struct lnet_libmd *md, struct lnet_handle_md mdh) { s64 timeout_ns; - bool new_entry = true; struct lnet_rsp_tracker *local_rspt; /* MD has a refcount taken by message so it's not going away. @@ -4391,7 +4390,6 @@ void lnet_monitor_thr_stop(void) * update the deadline on that one. */ lnet_rspt_free(rspt, cpt); - new_entry = false; } else { /* new md */ rspt->rspt_mdh = mdh; @@ -4406,9 +4404,7 @@ void lnet_monitor_thr_stop(void) * list in order to expire all the older entries first. 
*/ lnet_net_lock(cpt); - if (!new_entry && !list_empty(&local_rspt->rspt_on_list)) - list_del_init(&local_rspt->rspt_on_list); - list_add_tail(&local_rspt->rspt_on_list, the_lnet.ln_mt_rstq[cpt]); + list_move_tail(&local_rspt->rspt_on_list, the_lnet.ln_mt_rstq[cpt]); lnet_net_unlock(cpt); lnet_res_unlock(cpt); } From patchwork Thu Feb 27 21:17:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410573 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F27A138D for ; Thu, 27 Feb 2020 21:41:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 864A224690 for ; Thu, 27 Feb 2020 21:41:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 864A224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DFC3834A9B8; Thu, 27 Feb 2020 13:33:44 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 285143489F6 for ; Thu, 27 Feb 2020 13:21:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 5F909A157; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5E15B46C; Thu, 27 
Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:52 -0500 Message-Id: <1582838290-17243-605-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 604/622] lnet: fix small race in unloading klnd modules. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Reference counting of klnd modules is handled by the module itself. Currently, it is possible for a module to be completely unloaded between the time when the module called module_put(), and when it subsequently returns from the function that makes that call. During this time there may be one or two instructions to execute, and if the module is unmapped before they are executed, an exception will result. The module unload will call lnet_unregister_lnd() which takes the_lnet.ln_lnd_mutex, so module unload cannot complete while that is held. lnd_startup is called with this mutex held to avoid any races, but lnd_shutdown is not. Adding that protection will close the race. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: c087091cd901 ("LU-12678 lnet: fix small race in unloading klnd modules.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36853 Reviewed-by: Serguei Smirnov Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 0ca8bef..5df39aa 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1983,7 +1983,14 @@ static void lnet_push_target_fini(void) islo = ni->ni_net->net_lnd->lnd_type == LOLND; LASSERT(!in_interrupt()); + /* Holding the mutex makes it safe for lnd_shutdown + * to call module_put(). Module unload cannot finish + * until lnet_unregister_lnd() completes, and that + * requires the mutex. + */ + mutex_lock(&the_lnet.ln_lnd_mutex); net->net_lnd->lnd_shutdown(ni); + mutex_unlock(&the_lnet.ln_lnd_mutex); if (!islo) CDEBUG(D_LNI, "Removed LNI %s\n", From patchwork Thu Feb 27 21:17:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410577 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 447BA17E0 for ; Thu, 27 Feb 2020 21:41:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2D0AB246A1 for ; Thu, 27 Feb 2020 21:41:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2D0AB246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org 
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7F26A34A9E3; Thu, 27 Feb 2020 13:33:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6BE6321FC75 for ; Thu, 27 Feb 2020 13:21:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 62409A158; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 60FE3496; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:53 -0500 Message-Id: <1582838290-17243-606-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 605/622] lnet: me: discard struct lnet_handle_me X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown The Portals API uses a cookie 'handle' to identify an ME. This is appropriate for a user-space API for objects maintained by the kernel, but it brings no value when the API client and implementation are both in the kernel, as is the case with Lustre and LNet. Instead of using a 'handle', a pointer to the 'struct lnet_me' can be used. 
This object is not reference counted and is always freed correctly, so there can be no case where the cookie becomes invalid while it is still held - as can be seen by the fact that the return value from LNetMEUnlink() is never used except to assert that it is zero. So use 'struct lnet_me *' directly instead of having indirection through a 'struct lnet_handle_me'. Also: - change LNetMEUnlink() to return void as it cannot fail now. - have LNetMEAttach() return the pointer, using ERR_PTR() to return errors. - discard ln_me_containers and don't store the me therein. - store an explicit 'cpt' in each me, as we no longer store one implicitly via the cookie. WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: ceeeae4271fd ("LU-12678 lnet: me: discard struct lnet_handle_me") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36859 Reviewed-by: Serguei Smirnov Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/niobuf.c | 45 ++++++++++++++++----------------- include/linux/lnet/api.h | 20 +++++++-------- include/linux/lnet/lib-lnet.h | 22 ----------------- include/linux/lnet/lib-types.h | 4 +-- include/uapi/linux/lnet/lnet-types.h | 4 --- net/lnet/lnet/api-ni.c | 46 ++++++++++++---------------------- net/lnet/lnet/lib-md.c | 16 +++++------- net/lnet/lnet/lib-me.c | 48 +++++++++++------------------------- net/lnet/selftest/rpc.c | 14 +++++------ 9 files changed, 76 insertions(+), 143 deletions(-) diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index fcf7bfa..26a1f97 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -118,11 +118,10 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) struct ptlrpc_bulk_desc *desc = req->rq_bulk; struct lnet_process_id peer; int rc = 0; - int rc2; int posted_md; int total_md; u64 mbits; - struct lnet_handle_me me_h; + struct lnet_me *me; struct lnet_md md; if (OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_GET_NET)) @@ 
-183,8 +182,9 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_ATTACH)) { rc = -ENOMEM; } else { - rc = LNetMEAttach(desc->bd_portal, peer, mbits, 0, - LNET_UNLINK, LNET_INS_AFTER, &me_h); + me = LNetMEAttach(desc->bd_portal, peer, mbits, 0, + LNET_UNLINK, LNET_INS_AFTER); + rc = PTR_ERR_OR_ZERO(me); } if (rc != 0) { CERROR("%s: LNetMEAttach failed x%llu/%d: rc = %d\n", @@ -194,14 +194,13 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) } /* About to let the network at it... */ - rc = LNetMDAttach(me_h, md, LNET_UNLINK, + rc = LNetMDAttach(me, md, LNET_UNLINK, &desc->bd_mds[posted_md]); if (rc != 0) { CERROR("%s: LNetMDAttach failed x%llu/%d: rc = %d\n", desc->bd_import->imp_obd->obd_name, mbits, posted_md, rc); - rc2 = LNetMEUnlink(me_h); - LASSERT(rc2 == 0); + LNetMEUnlink(me); break; } } @@ -479,11 +478,10 @@ int ptlrpc_error(struct ptlrpc_request *req) int ptl_send_rpc(struct ptlrpc_request *request, int noreply) { int rc; - int rc2; unsigned int mpflag = 0; struct lnet_handle_md bulk_cookie; struct ptlrpc_connection *connection; - struct lnet_handle_me reply_me_h; + struct lnet_me *reply_me; struct lnet_md reply_md; struct obd_import *imp = request->rq_import; struct obd_device *obd = imp->imp_obd; @@ -611,10 +609,11 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) request->rq_repmsg = NULL; } - rc = LNetMEAttach(request->rq_reply_portal,/*XXX FIXME bug 249*/ - connection->c_peer, request->rq_xid, 0, - LNET_UNLINK, LNET_INS_AFTER, &reply_me_h); - if (rc != 0) { + reply_me = LNetMEAttach(request->rq_reply_portal, + connection->c_peer, request->rq_xid, 0, + LNET_UNLINK, LNET_INS_AFTER); + if (IS_ERR(reply_me)) { + rc = PTR_ERR(reply_me); CERROR("LNetMEAttach failed: %d\n", rc); LASSERT(rc == -ENOMEM); rc = -ENOMEM; @@ -652,7 +651,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) /* We must see the unlink callback to set rq_reply_unlinked, * so we can't auto-unlink */ 
- rc = LNetMDAttach(reply_me_h, reply_md, LNET_RETAIN, + rc = LNetMDAttach(reply_me, reply_md, LNET_RETAIN, &request->rq_reply_md_h); if (rc != 0) { CERROR("LNetMDAttach failed: %d\n", rc); @@ -710,8 +709,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) * nobody apart from the PUT's target has the right nid+XID to * access the reply buffer. */ - rc2 = LNetMEUnlink(reply_me_h); - LASSERT(rc2 == 0); + LNetMEUnlink(reply_me); /* UNLINKED callback called synchronously */ LASSERT(!request->rq_receiving_reply); @@ -750,7 +748,7 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) }; int rc; struct lnet_md md; - struct lnet_handle_me me_h; + struct lnet_me *me; CDEBUG(D_NET, "LNetMEAttach: portal %d\n", service->srv_req_portal); @@ -762,12 +760,12 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) * which means buffer can only be attached on local CPT, and LND * threads can find it by grabbing a local lock */ - rc = LNetMEAttach(service->srv_req_portal, + me = LNetMEAttach(service->srv_req_portal, match_id, 0, ~0, LNET_UNLINK, rqbd->rqbd_svcpt->scp_cpt >= 0 ? 
- LNET_INS_LOCAL : LNET_INS_AFTER, &me_h); - if (rc != 0) { - CERROR("LNetMEAttach failed: %d\n", rc); + LNET_INS_LOCAL : LNET_INS_AFTER); + if (IS_ERR(me)) { + CERROR("LNetMEAttach failed: %ld\n", PTR_ERR(me)); return -ENOMEM; } @@ -782,14 +780,13 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) md.user_ptr = &rqbd->rqbd_cbid; md.eq_handle = ptlrpc_eq_h; - rc = LNetMDAttach(me_h, md, LNET_UNLINK, &rqbd->rqbd_md_h); + rc = LNetMDAttach(me, md, LNET_UNLINK, &rqbd->rqbd_md_h); if (rc == 0) return 0; CERROR("LNetMDAttach failed: %d;\n", rc); LASSERT(rc == -ENOMEM); - rc = LNetMEUnlink(me_h); - LASSERT(rc == 0); + LNetMEUnlink(me); rqbd->rqbd_refcount = 0; return -ENOMEM; diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h index ac602fc..f9f6860 100644 --- a/include/linux/lnet/api.h +++ b/include/linux/lnet/api.h @@ -94,15 +94,15 @@ * and removed from its list by LNetMEUnlink(). * @{ */ -int LNetMEAttach(unsigned int portal, - struct lnet_process_id match_id_in, - u64 match_bits_in, - u64 ignore_bits_in, - enum lnet_unlink unlink_in, - enum lnet_ins_pos pos_in, - struct lnet_handle_me *handle_out); - -int LNetMEUnlink(struct lnet_handle_me current_in); +struct lnet_me * +LNetMEAttach(unsigned int portal, + struct lnet_process_id match_id_in, + u64 match_bits_in, + u64 ignore_bits_in, + enum lnet_unlink unlink_in, + enum lnet_ins_pos pos_in); + +void LNetMEUnlink(struct lnet_me *current_in); /** @} lnet_me */ /** \defgroup lnet_md Memory descriptors @@ -118,7 +118,7 @@ int LNetMEAttach(unsigned int portal, * associated with a MD: LNetMDUnlink(). 
* @{ */ -int LNetMDAttach(struct lnet_handle_me current_in, +int LNetMDAttach(struct lnet_me *current_in, struct lnet_md md_in, enum lnet_unlink unlink_in, struct lnet_handle_md *md_handle_out); diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index bf357b0..a8051fe 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -326,28 +326,6 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, } static inline void -lnet_me2handle(struct lnet_handle_me *handle, struct lnet_me *me) -{ - handle->cookie = me->me_lh.lh_cookie; -} - -static inline struct lnet_me * -lnet_handle2me(struct lnet_handle_me *handle) -{ - /* ALWAYS called with resource lock held */ - struct lnet_libhandle *lh; - int cpt; - - cpt = lnet_cpt_of_cookie(handle->cookie); - lh = lnet_res_lh_lookup(the_lnet.ln_me_containers[cpt], - handle->cookie); - if (!lh) - return NULL; - - return lh_entry(lh, struct lnet_me, me_lh); -} - -static inline void lnet_peer_net_addref_locked(struct lnet_peer_net *lpn) { atomic_inc(&lpn->lpn_refcount); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 9055da9..3345940 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -192,7 +192,7 @@ struct lnet_eq { struct lnet_me { struct list_head me_list; - struct lnet_libhandle me_lh; + int me_cpt; struct lnet_process_id me_match_id; unsigned int me_portal; unsigned int me_pos; /* hash offset in mt_hash */ @@ -1027,8 +1027,6 @@ struct lnet { int ln_nportals; /* the vector of portals */ struct lnet_portal **ln_portals; - /* percpt ME containers */ - struct lnet_res_container **ln_me_containers; /* percpt MD container */ struct lnet_res_container **ln_md_containers; diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h index cf263b9..118340f 100644 --- a/include/uapi/linux/lnet/lnet-types.h +++ b/include/uapi/linux/lnet/lnet-types.h @@ -374,10 +374,6 @@ static inline int 
LNetMDHandleIsInvalid(struct lnet_handle_md h) return (LNET_WIRE_HANDLE_COOKIE_NONE == h.cookie); } -struct lnet_handle_me { - u64 cookie; -}; - /** * Global process ID. */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 5df39aa..852bb0c 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1115,14 +1115,6 @@ struct list_head ** if (rc) goto failed; - recs = lnet_res_containers_create(LNET_COOKIE_TYPE_ME); - if (!recs) { - rc = -ENOMEM; - goto failed; - } - - the_lnet.ln_me_containers = recs; - recs = lnet_res_containers_create(LNET_COOKIE_TYPE_MD); if (!recs) { rc = -ENOMEM; @@ -1185,11 +1177,6 @@ struct list_head ** the_lnet.ln_md_containers = NULL; } - if (the_lnet.ln_me_containers) { - lnet_res_containers_destroy(the_lnet.ln_me_containers); - the_lnet.ln_me_containers = NULL; - } - lnet_res_container_cleanup(&the_lnet.ln_eq_container); lnet_msg_containers_destroy(); @@ -1594,7 +1581,7 @@ struct lnet_ping_buffer * .nid = LNET_NID_ANY, .pid = LNET_PID_ANY }; - struct lnet_handle_me me_handle; + struct lnet_me *me; struct lnet_md md = { NULL }; int rc, rc2; @@ -1614,11 +1601,11 @@ struct lnet_ping_buffer * } /* Ping target ME/MD */ - rc = LNetMEAttach(LNET_RESERVED_PORTAL, id, + me = LNetMEAttach(LNET_RESERVED_PORTAL, id, LNET_PROTO_PING_MATCHBITS, 0, - LNET_UNLINK, LNET_INS_AFTER, - &me_handle); - if (rc) { + LNET_UNLINK, LNET_INS_AFTER); + if (IS_ERR(me)) { + rc = PTR_ERR(me); CERROR("Can't create ping target ME: %d\n", rc); goto fail_decref_ping_buffer; } @@ -1633,7 +1620,7 @@ struct lnet_ping_buffer * md.eq_handle = the_lnet.ln_ping_target_eq; md.user_ptr = *ppbuf; - rc = LNetMDAttach(me_handle, md, LNET_RETAIN, ping_mdh); + rc = LNetMDAttach(me, md, LNET_RETAIN, ping_mdh); if (rc) { CERROR("Can't attach ping target MD: %d\n", rc); goto fail_unlink_ping_me; @@ -1643,8 +1630,7 @@ struct lnet_ping_buffer * return 0; fail_unlink_ping_me: - rc2 = LNetMEUnlink(me_handle); - LASSERT(!rc2); + LNetMEUnlink(me); 
fail_decref_ping_buffer: LASSERT(lnet_ping_buffer_numref(*ppbuf) == 1); lnet_ping_buffer_decref(*ppbuf); @@ -1773,7 +1759,7 @@ int lnet_push_target_resize(void) .pid = LNET_PID_ANY }; struct lnet_md md = { NULL }; - struct lnet_handle_me meh; + struct lnet_me *me; struct lnet_handle_md mdh; struct lnet_handle_md old_mdh; struct lnet_ping_buffer *pbuf; @@ -1792,11 +1778,11 @@ int lnet_push_target_resize(void) goto fail_return; } - rc = LNetMEAttach(LNET_RESERVED_PORTAL, id, + me = LNetMEAttach(LNET_RESERVED_PORTAL, id, LNET_PROTO_PING_MATCHBITS, 0, - LNET_UNLINK, LNET_INS_AFTER, - &meh); - if (rc) { + LNET_UNLINK, LNET_INS_AFTER); + if (IS_ERR(me)) { + rc = PTR_ERR(me); CERROR("Can't create push target ME: %d\n", rc); goto fail_decref_pbuf; } @@ -1811,10 +1797,10 @@ int lnet_push_target_resize(void) md.user_ptr = pbuf; md.eq_handle = the_lnet.ln_push_target_eq; - rc = LNetMDAttach(meh, md, LNET_RETAIN, &mdh); + rc = LNetMDAttach(me, md, LNET_RETAIN, &mdh); if (rc) { CERROR("Can't attach push MD: %d\n", rc); - goto fail_unlink_meh; + goto fail_unlink_me; } lnet_ping_buffer_addref(pbuf); @@ -1837,8 +1823,8 @@ int lnet_push_target_resize(void) return 0; -fail_unlink_meh: - LNetMEUnlink(meh); +fail_unlink_me: + LNetMEUnlink(me); fail_decref_pbuf: lnet_ping_buffer_decref(pbuf); fail_return: diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c index 5ee43c2..4dae58f 100644 --- a/net/lnet/lnet/lib-md.c +++ b/net/lnet/lnet/lib-md.c @@ -337,7 +337,7 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset) /** * Create a memory descriptor and attach it to a ME * - * @meh A handle for a ME to associate the new MD with. + * @me An ME to associate the new MD with. * @umd Provides initial values for the user-visible parts of a MD. * Other than its use for initialization, there is no linkage * between this structure and the MD maintained by the LNet. @@ -354,19 +354,18 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset) * Return: 0 on success. 
* -EINVAL If @umd is not valid. * -ENOMEM If new MD cannot be allocated. - * -ENOENT Either @meh or @umd.eq_handle does not point to a + * -ENOENT Either @me or @umd.eq_handle does not point to a * valid object. Note that it's OK to supply a NULL @umd.eq_handle * by calling LNetInvalidateHandle() on it. - * -EBUSY if the ME pointed to by @meh is already associated with + * -EBUSY if the ME pointed to by @me is already associated with * a MD. */ int -LNetMDAttach(struct lnet_handle_me meh, struct lnet_md umd, +LNetMDAttach(struct lnet_me *me, struct lnet_md umd, enum lnet_unlink unlink, struct lnet_handle_md *handle) { LIST_HEAD(matches); LIST_HEAD(drops); - struct lnet_me *me; struct lnet_libmd *md; int cpt; int rc; @@ -389,14 +388,11 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset) if (rc) goto out_free; - cpt = lnet_cpt_of_cookie(meh.cookie); + cpt = me->me_cpt; lnet_res_lock(cpt); - me = lnet_handle2me(&meh); - if (!me) - rc = -ENOENT; - else if (me->me_md) + if (me->me_md) rc = -EBUSY; else rc = lnet_md_link(md, umd.eq_handle, cpt); diff --git a/net/lnet/lnet/lib-me.c b/net/lnet/lnet/lib-me.c index 47cf498..d17f41d 100644 --- a/net/lnet/lnet/lib-me.c +++ b/net/lnet/lnet/lib-me.c @@ -62,20 +62,16 @@ * @pos Indicates whether the new ME should be prepended or * appended to the match list. Allowed constants: LNET_INS_BEFORE, * LNET_INS_AFTER. - * @handle On successful returns, a handle to the newly created ME object - * is saved here. This handle can be used later in LNetMEUnlink(), - * or LNetMDAttach() functions. * - * Return: 0 On success. - * -EINVAL If @portal is invalid. - * -ENOMEM If new ME object cannot be allocated. + * Return: A pointer to the newly created ME on success. + * ERR_PTR(-EINVAL) If @portal is invalid. + * ERR_PTR(-ENOMEM) If new ME object cannot be allocated. 
*/ -int +struct lnet_me * LNetMEAttach(unsigned int portal, struct lnet_process_id match_id, u64 match_bits, u64 ignore_bits, - enum lnet_unlink unlink, enum lnet_ins_pos pos, - struct lnet_handle_me *handle) + enum lnet_unlink unlink, enum lnet_ins_pos pos) { struct lnet_match_table *mtable; struct lnet_me *me; @@ -84,16 +80,16 @@ LASSERT(the_lnet.ln_refcount > 0); if ((int)portal >= the_lnet.ln_nportals) - return -EINVAL; + return ERR_PTR(-EINVAL); mtable = lnet_mt_of_attach(portal, match_id, match_bits, ignore_bits, pos); if (!mtable) /* can't match portal type */ - return -EPERM; + return ERR_PTR(-EPERM); me = kmem_cache_alloc(lnet_mes_cachep, GFP_NOFS | __GFP_ZERO); if (!me) - return -ENOMEM; + return ERR_PTR(-ENOMEM); lnet_res_lock(mtable->mt_cpt); @@ -104,8 +100,8 @@ me->me_unlink = unlink; me->me_md = NULL; - lnet_res_lh_initialize(the_lnet.ln_me_containers[mtable->mt_cpt], - &me->me_lh); + me->me_cpt = mtable->mt_cpt; + if (ignore_bits) head = &mtable->mt_mhash[LNET_MT_HASH_IGNORE]; else @@ -117,10 +113,8 @@ else list_add(&me->me_list, head); - lnet_me2handle(handle, me); - lnet_res_unlock(mtable->mt_cpt); - return 0; + return me; } EXPORT_SYMBOL(LNetMEAttach); @@ -132,32 +126,22 @@ * and an unlink event will be generated. It is an error to use the ME handle * after calling LNetMEUnlink(). * - * @meh A handle for the ME to be unlinked. - * - * Return 0 On success. - * -ENOENT If @meh does not point to a valid ME. + * @me The ME to be unlinked. * * \see LNetMDUnlink() for the discussion on delivering unlink event. 
*/ -int -LNetMEUnlink(struct lnet_handle_me meh) +void +LNetMEUnlink(struct lnet_me *me) { - struct lnet_me *me; struct lnet_libmd *md; struct lnet_event ev; int cpt; LASSERT(the_lnet.ln_refcount > 0); - cpt = lnet_cpt_of_cookie(meh.cookie); + cpt = me->me_cpt; lnet_res_lock(cpt); - me = lnet_handle2me(&meh); - if (!me) { - lnet_res_unlock(cpt); - return -ENOENT; - } - md = me->me_md; if (md) { md->md_flags |= LNET_MD_FLAG_ABORTED; @@ -170,7 +154,6 @@ lnet_me_unlink(me); lnet_res_unlock(cpt); - return 0; } EXPORT_SYMBOL(LNetMEUnlink); @@ -188,6 +171,5 @@ lnet_md_unlink(md); } - lnet_res_lh_invalidate(&me->me_lh); kfree(me); } diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index 7a8226c..531377d 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -360,11 +360,12 @@ struct srpc_bulk * { int rc; struct lnet_md md; - struct lnet_handle_me meh; + struct lnet_me *me; - rc = LNetMEAttach(portal, peer, matchbits, 0, LNET_UNLINK, - local ? LNET_INS_LOCAL : LNET_INS_AFTER, &meh); - if (rc) { + me = LNetMEAttach(portal, peer, matchbits, 0, LNET_UNLINK, + local ? 
LNET_INS_LOCAL : LNET_INS_AFTER); + if (IS_ERR(me)) { + rc = PTR_ERR(me); CERROR("LNetMEAttach failed: %d\n", rc); LASSERT(rc == -ENOMEM); return -ENOMEM; @@ -377,13 +378,12 @@ struct srpc_bulk * md.options = options; md.eq_handle = srpc_data.rpc_lnet_eq; - rc = LNetMDAttach(meh, md, LNET_UNLINK, mdh); + rc = LNetMDAttach(me, md, LNET_UNLINK, mdh); if (rc) { CERROR("LNetMDAttach failed: %d\n", rc); LASSERT(rc == -ENOMEM); - rc = LNetMEUnlink(meh); - LASSERT(!rc); + LNetMEUnlink(me); return -ENOMEM; } From patchwork Thu Feb 27 21:17:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410869 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B819924 for ; Thu, 27 Feb 2020 21:48:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7414824690 for ; Thu, 27 Feb 2020 21:48:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7414824690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4761034A543; Thu, 27 Feb 2020 13:39:21 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C13BA21FC75 for ; Thu, 27 Feb 2020 13:21:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov 
[160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 64BD8A159; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 63B9746A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:54 -0500 Message-Id: <1582838290-17243-607-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 606/622] lnet: avoid extra memory consumption X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov Use slab allocation for the lnet_rsp_tracker and lnet_msg structs to avoid memory fragmentation. 
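As a rough userspace analogy for the idea of a dedicated per-type cache, the sketch below keeps freed objects of one fixed size on a free list and hands them back out, zeroed, on the next allocation. It is only an illustration under stated assumptions: struct obj_cache and the obj_cache_*() functions are hypothetical names invented here, and none of this is the kernel's kmem_cache implementation or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size object cache; a userspace analogy only. */
struct obj_cache {
	size_t obj_size;	/* size of every object, at least sizeof(void *) */
	void *free_list;	/* singly linked stack of recycled objects */
	size_t nr_recycled;	/* allocations that were served from free_list */
};

static void obj_cache_init(struct obj_cache *c, size_t obj_size)
{
	assert(c);
	/* Each free object stores the next pointer in its own memory. */
	c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
	c->free_list = NULL;
	c->nr_recycled = 0;
}

/* Return a zeroed object, reusing a previously freed one when possible. */
static void *obj_cache_zalloc(struct obj_cache *c)
{
	void *obj = c->free_list;

	if (obj) {
		/* Pop: the next pointer lives at the start of the object. */
		memcpy(&c->free_list, obj, sizeof(void *));
		c->nr_recycled++;
	} else {
		obj = malloc(c->obj_size);
		if (!obj)
			return NULL;
	}
	memset(obj, 0, c->obj_size);
	return obj;
}

/* Give an object back to the cache instead of to the general allocator. */
static void obj_cache_free(struct obj_cache *c, void *obj)
{
	memcpy(obj, &c->free_list, sizeof(void *));
	c->free_list = obj;
}

/* Release every cached object back to the general allocator. */
static void obj_cache_destroy(struct obj_cache *c)
{
	while (c->free_list) {
		void *obj = c->free_list;

		memcpy(&c->free_list, obj, sizeof(void *));
		free(obj);
	}
}
```

Because every object in such a cache has the same size, a freed slot can always satisfy the next request for that type, which is the property a per-type cache offers over a general-purpose allocator.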
Cray-bug-id: LUS-8190 WC-bug-id: https://jira.whamcloud.com/browse/LU-13036 Lustre-commit: a3ce59ae2c62 ("LU-13036 lnet: avoid extra memory consumption") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/36897 Reviewed-by: Alexandr Boyko Reviewed-by: Alexander Zarochentsev Reviewed-by: Chris Horn Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 13 ++++++++++--- net/lnet/lnet/api-ni.c | 28 ++++++++++++++++++++++++---- net/lnet/lnet/lib-move.c | 11 ++++++----- 3 files changed, 40 insertions(+), 12 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index a8051fe..de0cef0 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -83,13 +83,17 @@ /* default timeout */ #define DEFAULT_PEER_TIMEOUT 180 +#define LNET_LND_DEFAULT_TIMEOUT 5 + +bool lnet_is_route_alive(struct lnet_route *route); #define LNET_SMALL_MD_SIZE offsetof(struct lnet_libmd, md_iov.iov[1]) extern struct kmem_cache *lnet_mes_cachep; /* MEs kmem_cache */ extern struct kmem_cache *lnet_small_mds_cachep; /* <= LNET_SMALL_MD_SIZE bytes * MDs kmem_cache */ -#define LNET_LND_DEFAULT_TIMEOUT 5 +extern struct kmem_cache *lnet_rspt_cachep; +extern struct kmem_cache *lnet_msg_cachep; bool lnet_is_route_alive(struct lnet_route *route); bool lnet_is_gateway_alive(struct lnet_peer *gw); @@ -417,19 +421,22 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec, { struct lnet_rsp_tracker *rspt; - rspt = kzalloc(sizeof(*rspt), GFP_NOFS); + rspt = kmem_cache_zalloc(lnet_rspt_cachep, GFP_NOFS); if (rspt) { lnet_net_lock(cpt); the_lnet.ln_counters[cpt]->lct_health.lch_rst_alloc++; lnet_net_unlock(cpt); } + CDEBUG(D_MALLOC, "rspt alloc %p\n", rspt); return rspt; } static inline void lnet_rspt_free(struct lnet_rsp_tracker *rspt, int cpt) { - kfree(rspt); + CDEBUG(D_MALLOC, "rspt free %p\n", rspt); + + kmem_cache_free(lnet_rspt_cachep, rspt); lnet_net_lock(cpt); 
the_lnet.ln_counters[cpt]->lct_health.lch_rst_alloc--; lnet_net_unlock(cpt); diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 852bb0c..b9c38f3 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -494,9 +494,11 @@ static int lnet_discover(struct lnet_process_id id, u32 force, struct kmem_cache *lnet_small_mds_cachep; /* <= LNET_SMALL_MD_SIZE bytes * MDs kmem_cache */ +struct kmem_cache *lnet_rspt_cachep; /* response tracker cache */ +struct kmem_cache *lnet_msg_cachep; static int -lnet_descriptor_setup(void) +lnet_slab_setup(void) { /* create specific kmem_cache for MEs and small MDs (i.e., originally * allocated in kmem_cache). @@ -512,12 +514,30 @@ static int lnet_discover(struct lnet_process_id id, u32 force, if (!lnet_small_mds_cachep) return -ENOMEM; + lnet_rspt_cachep = kmem_cache_create("lnet_rspt", + sizeof(struct lnet_rsp_tracker), + 0, 0, NULL); + if (!lnet_rspt_cachep) + return -ENOMEM; + + lnet_msg_cachep = kmem_cache_create("lnet_msg", + sizeof(struct lnet_msg), + 0, 0, NULL); + if (!lnet_msg_cachep) + return -ENOMEM; + return 0; } static void -lnet_descriptor_cleanup(void) +lnet_slab_cleanup(void) { + kmem_cache_destroy(lnet_msg_cachep); + lnet_msg_cachep = NULL; + + kmem_cache_destroy(lnet_rspt_cachep); + lnet_rspt_cachep = NULL; + kmem_cache_destroy(lnet_small_mds_cachep); lnet_small_mds_cachep = NULL; @@ -1081,7 +1101,7 @@ struct list_head ** LNetInvalidateEQHandle(&the_lnet.ln_mt_eqh); init_completion(&the_lnet.ln_started); - rc = lnet_descriptor_setup(); + rc = lnet_slab_setup(); if (rc != 0) goto failed; @@ -1188,7 +1208,7 @@ struct list_head ** the_lnet.ln_counters = NULL; } lnet_destroy_remote_nets_table(); - lnet_descriptor_cleanup(); + lnet_slab_cleanup(); return 0; } diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 47d5389..cd36d52 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -4186,7 +4186,7 @@ void lnet_monitor_thr_stop(void) } } - msg = kzalloc(sizeof(*msg), 
GFP_NOFS); + msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS); if (!msg) { CERROR("%s, src %s: Dropping %s (out of memory)\n", libcfs_nid2str(from_nid), libcfs_nid2str(src_nid), @@ -4194,7 +4194,7 @@ void lnet_monitor_thr_stop(void) goto drop; } - /* msg zeroed by kzalloc() + /* msg zeroed by kmem_cache_zalloc(). * i.e. flags all clear, pointers NULL etc */ msg->msg_type = type; @@ -4475,7 +4475,7 @@ void lnet_monitor_thr_stop(void) return -EIO; } - msg = kzalloc(sizeof(*msg), GFP_NOFS); + msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS); if (!msg) { CERROR("Dropping PUT to %s: ENOMEM on struct lnet_msg\n", libcfs_id2str(target)); @@ -4571,7 +4571,7 @@ struct lnet_msg * * CAVEAT EMPTOR: 'getmsg' is the original GET, which is freed when * lnet_finalize() is called on it, so the LND must call this first */ - struct lnet_msg *msg = kzalloc(sizeof(*msg), GFP_NOFS); + struct lnet_msg *msg; struct lnet_libmd *getmd = getmsg->msg_md; struct lnet_process_id peer_id = getmsg->msg_target; int cpt; @@ -4579,6 +4579,7 @@ struct lnet_msg * LASSERT(!getmsg->msg_target_is_router); LASSERT(!getmsg->msg_routing); + msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS); if (!msg) { CERROR("%s: Dropping REPLY from %s: can't allocate msg\n", libcfs_nid2str(ni->ni_nid), libcfs_id2str(peer_id)); @@ -4708,7 +4709,7 @@ struct lnet_msg * return -EIO; } - msg = kzalloc(sizeof(*msg), GFP_NOFS); + msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS); if (!msg) { CERROR("Dropping GET to %s: ENOMEM on struct lnet_msg\n", libcfs_id2str(target)); From patchwork Thu Feb 27 21:17:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410581 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A86D924 for ; Thu, 27 Feb 2020 21:41:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 82290246A1 for ; Thu, 27 Feb 2020 21:41:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 82290246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2248534AA0E; Thu, 27 Feb 2020 13:33:52 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2326021FD8C for ; Thu, 27 Feb 2020 13:21:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 69276A15A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6679846D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:55 -0500 Message-Id: <1582838290-17243-608-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 607/622] lustre: uapi: remove unused LUSTRE_DIRECTIO_FL X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger The LUSTRE_DIRECTIO_FL was added based on the upstream FS_DIRECTIO_FL flag in the hopes that it might be useful, but it has since been removed from the upstream in kernel commit v4.4-rc4-22-g68ce7bfcd995 and replaced by FS_VERITY_FL using the same value in kernel commit v5.3-rc2-4-gfe9918d3b228, which we are much more likely to use. Since LUSTRE_DIRECTIO_FL was unused, there is no risk to remove it. WC-bug-id: https://jira.whamcloud.com/browse/LU-13164 Lustre-commit: ff168481a1b2 ("LU-13164 uapi: remove unused LUSTRE_DIRECTIO_FL") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/37295 Reviewed-by: Shaun Tancheff Reviewed-by: Arshad Hussain Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 2 -- include/uapi/linux/lustre/lustre_idl.h | 1 - 2 files changed, 3 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 6c66815..96f327f 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -2176,8 +2176,6 @@ void lustre_assert_wire_constants(void) LUSTRE_DIRSYNC_FL); LASSERTF(LUSTRE_TOPDIR_FL == 0x00020000, "found 0x%.8x\n", LUSTRE_TOPDIR_FL); - LASSERTF(LUSTRE_DIRECTIO_FL == 0x00100000, "found 0x%.8x\n", - LUSTRE_DIRECTIO_FL); LASSERTF(LUSTRE_INLINE_DATA_FL == 0x10000000, "found 0x%.8x\n", LUSTRE_INLINE_DATA_FL); LASSERTF(MDS_INODELOCK_LOOKUP == 0x00000001UL, "found 0x%.8x\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 19ac0cb..df2e34b 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1543,7 +1543,6 @@ enum { #define LUSTRE_INDEX_FL 0x00001000 /* hash-indexed directory */ #define LUSTRE_DIRSYNC_FL 0x00010000 /* 
dirsync behaviour (dir only) */ #define LUSTRE_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ -#define LUSTRE_DIRECTIO_FL 0x00100000 /* Use direct i/o */ #define LUSTRE_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */ #define LUSTRE_PROJINHERIT_FL 0x20000000 /* Create with parents projid */ From patchwork Thu Feb 27 21:17:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410757 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C9BBE924 for ; Thu, 27 Feb 2020 21:46:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B274424690 for ; Thu, 27 Feb 2020 21:46:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B274424690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CE30F34B153; Thu, 27 Feb 2020 13:36:36 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6865B348A1A for ; Thu, 27 Feb 2020 13:21:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6AE52A15B; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 694E1468; Thu, 27 Feb 2020 
16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:56 -0500 Message-Id: <1582838290-17243-609-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 608/622] lustre: lustre: Reserve OST_FALLOCATE(fallocate) opcode X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Swapnil Pimpale , Arshad Hussain , Li Xi , Abrarahmed Momin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Swapnil Pimpale A new RPC, OST_FALLOCATE, has been added for space preallocation. This patch reserves the OST_FALLOCATE opcode for the fallocate syscall. Reserving the opcode upfront ensures consistency and avoids protocol interoperability issues in the future. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-3606 Lustre-commit: 46a11df089c9 ("LU-3606 lustre: Reserve OST_FALLOCATE(fallocate) opcode") Signed-off-by: Swapnil Pimpale Signed-off-by: Li Xi Signed-off-by: Abrarahmed Momin Signed-off-by: Arshad Hussain Reviewed-on: https://review.whamcloud.com/37277 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Signed-off-by: James Simmons --- fs/lustre/ptlrpc/lproc_ptlrpc.c | 3 ++- fs/lustre/ptlrpc/wiretest.c | 4 +++- include/uapi/linux/lustre/lustre_idl.h | 2 ++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index f34aec3..fc7aa3e 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -67,6 +67,7 @@ { OST_QUOTACTL, "ost_quotactl" }, { OST_QUOTA_ADJUST_QUNIT, "ost_quota_adjust_qunit" }, { OST_LADVISE, "ost_ladvise" }, + { OST_FALLOCATE, "ost_fallocate"}, { MDS_GETATTR, "mds_getattr" }, { MDS_GETATTR_NAME, "mds_getattr_lock" }, { MDS_CLOSE, "mds_close" }, @@ -115,7 +116,7 @@ { 401, /* was OBD_LOG_CANCEL */ "llog_cancel" }, { 402, /* was OBD_QC_CALLBACK */ "obd_quota_callback" }, { OBD_IDX_READ, "dt_index_read" }, - { LLOG_ORIGIN_HANDLE_CREATE, "llog_origin_handle_open" }, + { LLOG_ORIGIN_HANDLE_CREATE, "llog_origin_handle_open" }, { LLOG_ORIGIN_HANDLE_NEXT_BLOCK, "llog_origin_handle_next_block" }, { LLOG_ORIGIN_HANDLE_READ_HEADER, "llog_origin_handle_read_header" }, { 504, /*LLOG_ORIGIN_HANDLE_WRITE_REC*/ "llog_origin_handle_write_rec" }, diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 96f327f..d94d2d9 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -106,7 +106,9 @@ void lustre_assert_wire_constants(void) (long long)OST_QUOTA_ADJUST_QUNIT); LASSERTF(OST_LADVISE == 21, "found %lld\n", (long long)OST_LADVISE); - LASSERTF(OST_LAST_OPC == 22, "found %lld\n", + LASSERTF(OST_FALLOCATE == 22, "found %lld\n", + (long long)OST_FALLOCATE); + LASSERTF(OST_LAST_OPC == 
23, "found %lld\n", (long long)OST_LAST_OPC); LASSERTF(OBD_OBJECT_EOF == 0xffffffffffffffffULL, "found 0x%.16llxULL\n", OBD_OBJECT_EOF); diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index df2e34b..12ab369 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -956,6 +956,7 @@ enum ost_cmd { OST_QUOTACTL = 19, OST_QUOTA_ADJUST_QUNIT = 20, /* not used since 2.4 */ OST_LADVISE = 21, + OST_FALLOCATE = 22, OST_LAST_OPC /* must be < 33 to avoid MDS_GETATTR */ }; #define OST_FIRST_OPC OST_REPLY @@ -2789,6 +2790,7 @@ struct obdo { #define o_dropped o_misc #define o_cksum o_nlink #define o_grant_used o_data_version +#define o_falloc_mode o_nlink /* request structure for OST's */ struct ost_body { From patchwork Thu Feb 27 21:17:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410839 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 514401580 for ; Thu, 27 Feb 2020 21:48:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 39B9524690 for ; Thu, 27 Feb 2020 21:48:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 39B9524690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3CA5A34B583; Thu, 27 Feb 2020 13:38:13 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C02F6348A1F for ; Thu, 27 Feb 2020 13:21:28 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6D4D9A15C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6C0E946C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:57 -0500 Message-Id: <1582838290-17243-610-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 609/622] lnet: libcfs: Cleanup use of bare printk X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff Some users of printk( "fmt" can be converted to pr_level("fmt" equivalents WC-bug-id: https://jira.whamcloud.com/browse/LU-12861 Lustre-commit: b4c8a5180dec ("LU-12861 libcfs: Cleanup use of bare printk") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/37046 Reviewed-by: Ben Evans Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/debug.c | 2 +- net/lnet/libcfs/module.c | 4 ++-- net/lnet/libcfs/tracefile.c | 43 +++++++++++++++++++++++++------------------ 3 files changed, 28 insertions(+), 21 deletions(-) diff --git a/net/lnet/libcfs/debug.c b/net/lnet/libcfs/debug.c index c6b92df..d7747e7 100644 --- a/net/lnet/libcfs/debug.c +++ b/net/lnet/libcfs/debug.c @@ -418,7 +418,7 @@ void libcfs_debug_dumplog(void) "libcfs_debug_dumper"); set_current_state(TASK_INTERRUPTIBLE); if (IS_ERR(dumper)) - pr_err("LustreError: cannot start log dump thread: %ld\n", + pr_err("LustreError: cannot start log dump thread: rc = %ld\n", PTR_ERR(dumper)); else schedule(); diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c index 20d4302..a53efcc 100644 --- a/net/lnet/libcfs/module.c +++ b/net/lnet/libcfs/module.c @@ -720,7 +720,7 @@ int libcfs_setup(void) rc = libcfs_debug_init(5 * 1024 * 1024); if (rc < 0) { - pr_err("LustreError: libcfs_debug_init: %d\n", rc); + pr_err("LustreError: libcfs_debug_init: rc = %d\n", rc); goto err; } @@ -794,7 +794,7 @@ static void libcfs_exit(void) /* the below message is checked in test-framework.sh check_mem_leak() */ rc = libcfs_debug_cleanup(); if (rc) - pr_err("LustreError: libcfs_debug_cleanup: %d\n", rc); + pr_err("LustreError: libcfs_debug_cleanup: rc = %d\n", rc); } MODULE_AUTHOR("OpenSFS, Inc. 
"); diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c index bda3523..1eb5397 100644 --- a/net/lnet/libcfs/tracefile.c +++ b/net/lnet/libcfs/tracefile.c @@ -332,7 +332,8 @@ static struct cfs_trace_page *cfs_trace_get_tage(struct cfs_trace_cpu_data *tcd, * from here: this will lead to infinite recursion. */ if (len > PAGE_SIZE) { - pr_err("cowardly refusing to write %lu bytes in a page\n", len); + pr_err("LustreError: cowardly refusing to write %lu bytes in a page\n", + len); return NULL; } @@ -477,7 +478,8 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata, max_nob = PAGE_SIZE - tage->used - known_size; if (max_nob <= 0) { - pr_emerg("negative max_nob: %d\n", max_nob); + pr_emerg("LustreError: negative max_nob: %d\n", + max_nob); mask |= D_ERROR; cfs_trace_put_tcd(tcd); tcd = NULL; @@ -499,10 +501,15 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata, break; } - if (*(string_buf + needed - 1) != '\n') - pr_info("format at %s:%d:%s doesn't end in newline\n", file, - msgdata->msg_line, msgdata->msg_fn); - + if (*(string_buf + needed - 1) != '\n') { + pr_info("Lustre: format at %s:%d:%s doesn't end in newline\n", + file, msgdata->msg_line, msgdata->msg_fn); + } else if (mask & D_TTY) { + /* TTY needs '\r\n' to move carriage to leftmost position */ + if (needed < 2 || *(string_buf + needed - 2) != '\r') + pr_info("Lustre: format at %s:%d:%s doesn't end in '\\r\\n'\n", + file, msgdata->msg_line, msgdata->msg_fn); + } header.ph_len = known_size + needed; debug_buf = (char *)page_address(tage->page) + tage->used; @@ -816,7 +823,7 @@ int cfs_tracefile_dump_all_pages(char *filename) if (IS_ERR(filp)) { rc = PTR_ERR(filp); filp = NULL; - pr_err("LustreError: can't open %s for dump: rc %d\n", + pr_err("LustreError: can't open %s for dump: rc = %d\n", filename, rc); goto out; } @@ -839,8 +846,8 @@ int cfs_tracefile_dump_all_pages(char *filename) kunmap(tage->page); if (rc != (int)tage->used) { - pr_warn("wanted to write %u but wrote 
%d\n", tage->used, - rc); + pr_warn("Lustre: wanted to write %u but wrote %d\n", + tage->used, rc); put_pages_back(&pc); __LASSERT(list_empty(&pc.pc_pages)); break; @@ -851,7 +858,7 @@ int cfs_tracefile_dump_all_pages(char *filename) rc = vfs_fsync(filp, 1); if (rc) - pr_err("sync returns %d\n", rc); + pr_err("LustreError: sync returns: rc = %d\n", rc); close: filp_close(filp, NULL); out: @@ -985,7 +992,7 @@ int cfs_trace_daemon_command(char *str) } else { strcpy(cfs_tracefile, str); - pr_info("debug daemon will attempt to start writing to %s (%lukB max)\n", + pr_info("Lustre: debug daemon will attempt to start writing to %s (%lukB max)\n", cfs_tracefile, (long)(cfs_tracefile_size >> 10)); @@ -1100,8 +1107,8 @@ static int tracefiled(void *arg) if (IS_ERR(filp)) { rc = PTR_ERR(filp); filp = NULL; - pr_warn("couldn't open %s: %d\n", cfs_tracefile, - rc); + pr_warn("Lustre: couldn't open %s: rc = %d\n", + cfs_tracefile, rc); } } up_read(&cfs_tracefile_sem); @@ -1126,7 +1133,7 @@ static int tracefiled(void *arg) kunmap(tage->page); if (rc != (int)tage->used) { - pr_warn("wanted to write %u but wrote %d\n", + pr_warn("Lustre: wanted to write %u but wrote %d\n", tage->used, rc); put_pages_back(&pc); __LASSERT(list_empty(&pc.pc_pages)); @@ -1139,8 +1146,8 @@ static int tracefiled(void *arg) if (!list_empty(&pc.pc_pages)) { int i; - pr_alert("trace pages aren't empty\n"); - pr_err("total cpus(%d): ", num_possible_cpus()); + pr_alert("Lustre: trace pages aren't empty\n"); + pr_err("Lustre: total cpus(%d): ", num_possible_cpus()); for (i = 0; i < num_possible_cpus(); i++) if (cpu_online(i)) pr_cont("%d(on) ", i); @@ -1151,9 +1158,9 @@ static int tracefiled(void *arg) i = 0; list_for_each_entry_safe(tage, tmp, &pc.pc_pages, linkage) - pr_err("page %d belongs to cpu %d\n", + pr_err("Lustre: page %d belongs to cpu %d\n", ++i, tage->cpu); - pr_err("There are %d pages unwritten\n", i); + pr_err("Lustre: There are %d pages unwritten\n", i); } __LASSERT(list_empty(&pc.pc_pages)); 
end_loop: From patchwork Thu Feb 27 21:17:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410585 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C907017E0 for ; Thu, 27 Feb 2020 21:41:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B1CA6246A1 for ; Thu, 27 Feb 2020 21:41:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B1CA6246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E654034AA3F; Thu, 27 Feb 2020 13:33:55 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 234F6348A1A for ; Thu, 27 Feb 2020 13:21:29 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 71646A15D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6EDDD47C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:17:58 -0500 Message-Id: <1582838290-17243-611-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 610/622] lnet: Do not assume peers are MR capable X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn If a peer has discovery disabled then it will not consolidate peer NI information. This means we need to use a consistent source NI when sending to it, just like we do for non-MR peers. A comment in lnet_discovery_event_reply() indicates that this was a known issue, but the situation is not handled properly. Do not assume peers are multi-rail capable when peer objects are allocated and initialized. Do not mark a peer as multi-rail capable unless all of the following conditions are satisfied: 1. The peer has the MR feature flag set. 2. The peer has discovery enabled. 3. We have discovery enabled locally. Note: 1, 2, and 3 above are implemented in the code for lnet_discovery_event_reply(), but code earlier in the function breaks this behavior. Remove the offending code. Update sanity-lnet tests 100 and 101 to reflect the fact that peers added via the traffic path no longer have multi-rail by default. 
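The three conditions listed above can be sketched as a standalone userspace function. This is only an illustration, not the code in peer.c: FEAT_MULTI_RAIL and FEAT_DISCOVERY are hypothetical stand-ins for LNET_PING_FEAT_MULTI_RAIL and LNET_PING_FEAT_DISCOVERY with made-up values, peer_is_mr_capable() and its self-check are invented for the example, and the handling of DLC-configured peers is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits standing in for LNET_PING_FEAT_MULTI_RAIL
 * and LNET_PING_FEAT_DISCOVERY; the values are made up for this sketch.
 */
#define FEAT_MULTI_RAIL	(1u << 0)
#define FEAT_DISCOVERY	(1u << 1)

/* Return true only if all three conditions hold (hypothetical helper). */
bool peer_is_mr_capable(unsigned int peer_features,
			bool local_discovery_disabled)
{
	if (!(peer_features & FEAT_MULTI_RAIL))
		return false;	/* 1. peer lacks the MR feature flag */
	if (!(peer_features & FEAT_DISCOVERY))
		return false;	/* 2. peer has discovery disabled */
	if (local_discovery_disabled)
		return false;	/* 3. discovery is disabled locally */
	return true;
}

/* Small self-check that exercises each condition in turn. */
void peer_is_mr_capable_selfcheck(void)
{
	assert(peer_is_mr_capable(FEAT_MULTI_RAIL | FEAT_DISCOVERY, false));
	assert(!peer_is_mr_capable(FEAT_DISCOVERY, false));
	assert(!peer_is_mr_capable(FEAT_MULTI_RAIL, false));
	assert(!peer_is_mr_capable(FEAT_MULTI_RAIL | FEAT_DISCOVERY, true));
}
```

In this sketch a peer that fails any one of the checks is treated like a non-MR peer, which matches the intent described above of using a consistent source NI when sending to it.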
Cray-bug-id: LUS-7918 WC-bug-id: https://jira.whamcloud.com/browse/LU-12889 Lustre-commit: 3c580c93b8d3 ("LU-12889 lnet: Do not assume peers are MR capable") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/36512 Reviewed-by: Amir Shehata Reviewed-by: Serguei Smirnov Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 45 ++++++++++++++++----------------------------- 1 file changed, 16 insertions(+), 29 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index f987fff..0d7fbd4 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1520,10 +1520,7 @@ struct lnet_peer_net * struct lnet_peer *lp; struct lnet_peer_net *lpn; struct lnet_peer_ni *lpni; - /* Assume peer is Multi-Rail capable and let discovery find out - * otherwise. - */ - unsigned int flags = LNET_PEER_MULTI_RAIL; + unsigned int flags = 0; int rc = 0; if (nid == LNET_NID_ANY) { @@ -2298,20 +2295,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) } /* - * Only enable the multi-rail feature on the peer if both sides of - * the connection have discovery on - */ - if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) { - CDEBUG(D_NET, "Peer %s has Multi-Rail feature enabled\n", - libcfs_nid2str(lp->lp_primary_nid)); - lp->lp_state |= LNET_PEER_MULTI_RAIL; - } else { - CDEBUG(D_NET, "Peer %s has Multi-Rail feature disabled\n", - libcfs_nid2str(lp->lp_primary_nid)); - lp->lp_state &= ~LNET_PEER_MULTI_RAIL; - } - - /* The peer may have discovery disabled at its end. Set + * The peer may have discovery disabled at its end. Set * NO_DISCOVERY as appropriate. 
*/ if ((pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) && @@ -2332,21 +2316,24 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) */ if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) { if (lp->lp_state & LNET_PEER_MULTI_RAIL) { - /* Everything's fine */ + CDEBUG(D_NET, "peer %s(%p) is MR\n", + libcfs_nid2str(lp->lp_primary_nid), lp); } else if (lp->lp_state & LNET_PEER_CONFIGURED) { CWARN("Reply says %s is Multi-Rail, DLC says not\n", libcfs_nid2str(lp->lp_primary_nid)); + } else if (lnet_peer_discovery_disabled) { + CDEBUG(D_NET, + "peer %s(%p) not MR: DD disabled locally\n", + libcfs_nid2str(lp->lp_primary_nid), lp); + } else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) { + CDEBUG(D_NET, + "peer %s(%p) not MR: DD disabled remotely\n", + libcfs_nid2str(lp->lp_primary_nid), lp); } else { - /* if discovery is disabled then we don't want to - * update the state of the peer. All we'll do is - * update the peer_nis which were reported back in - * the initial ping - */ - - if (!lnet_is_discovery_disabled_locked(lp)) { - lp->lp_state |= LNET_PEER_MULTI_RAIL; - lnet_peer_clr_non_mr_pref_nids(lp); - } + CDEBUG(D_NET, "peer %s(%p) is MR capable\n", + libcfs_nid2str(lp->lp_primary_nid), lp); + lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); } } else if (lp->lp_state & LNET_PEER_MULTI_RAIL) { if (lp->lp_state & LNET_PEER_CONFIGURED) {

From patchwork Thu Feb 27 21:17:59 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410815
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:17:59 -0500
Message-Id: <1582838290-17243-612-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 611/622] lnet: socklnd: convert peers hash table to hashtable.h
Cc: Lustre Development List

From: Mr NeilBrown

Using a hashtable.h hashtable, rather than bespoke code, has several advantages:
- the table is comprised of hlist_head, rather than list_head, so it consumes less memory (though we need to make it a little bigger as it must be a power-of-2)
- there are existing macros for easily walking the whole table
- it uses a "real" hash function rather than "mod a prime number".

In some ways, rhashtable might be even better, but it can change the ordering of objects in the table at arbitrary moments, and that could hurt the user-space API. It also does not support the partitioned walking that ksocknal_check_peer_timeouts() depends on.

Note that new peers are inserted at the top of a hash chain, rather than appended at the end. I don't think that should be a problem.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: dbbcf61d2bdc ("LU-12678 socklnd: convert peers hash table to hashtable.h")
Signed-off-by: Mr NeilBrown
Reviewed-on: https://review.whamcloud.com/36837
Reviewed-by: James Simmons
Reviewed-by: Serguei Smirnov
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/klnds/socklnd/socklnd.c | 299 ++++++++++++++++--------------------
 net/lnet/klnds/socklnd/socklnd.h | 18 +--
 net/lnet/klnds/socklnd/socklnd_cb.c | 8 +-
 3 files changed, 140 insertions(+), 185 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 016e005..7abb75a 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -167,10 +167,10 @@ struct ksock_peer_ni * ksocknal_find_peer_locked(struct lnet_ni *ni, struct lnet_process_id id) { - struct list_head *peer_list = ksocknal_nid2peerlist(id.nid); struct ksock_peer_ni *peer_ni; - list_for_each_entry(peer_ni, peer_list, ksnp_list) { +
hash_for_each_possible(ksocknal_data.ksnd_peers, peer_ni, + ksnp_list, id.nid) { LASSERT(!peer_ni->ksnp_closing); if (peer_ni->ksnp_ni != ni) @@ -229,7 +229,7 @@ struct ksock_peer_ni * LASSERT(list_empty(&peer_ni->ksnp_routes)); LASSERT(!peer_ni->ksnp_closing); peer_ni->ksnp_closing = 1; - list_del(&peer_ni->ksnp_list); + hlist_del(&peer_ni->ksnp_list); /* lose peerlist's ref */ ksocknal_peer_decref(peer_ni); } @@ -247,55 +247,52 @@ struct ksock_peer_ni * read_lock(&ksocknal_data.ksnd_global_lock); - for (i = 0; i < ksocknal_data.ksnd_peer_hash_size; i++) { - list_for_each_entry(peer_ni, &ksocknal_data.ksnd_peers[i], - ksnp_list) { - if (peer_ni->ksnp_ni != ni) - continue; + hash_for_each(ksocknal_data.ksnd_peers, i, peer_ni, ksnp_list) { + if (peer_ni->ksnp_ni != ni) + continue; - if (!peer_ni->ksnp_n_passive_ips && - list_empty(&peer_ni->ksnp_routes)) { - if (index-- > 0) - continue; + if (!peer_ni->ksnp_n_passive_ips && + list_empty(&peer_ni->ksnp_routes)) { + if (index-- > 0) + continue; - *id = peer_ni->ksnp_id; - *myip = 0; - *peer_ip = 0; - *port = 0; - *conn_count = 0; - *share_count = 0; - rc = 0; - goto out; - } + *id = peer_ni->ksnp_id; + *myip = 0; + *peer_ip = 0; + *port = 0; + *conn_count = 0; + *share_count = 0; + rc = 0; + goto out; + } - for (j = 0; j < peer_ni->ksnp_n_passive_ips; j++) { - if (index-- > 0) - continue; + for (j = 0; j < peer_ni->ksnp_n_passive_ips; j++) { + if (index-- > 0) + continue; - *id = peer_ni->ksnp_id; - *myip = peer_ni->ksnp_passive_ips[j]; - *peer_ip = 0; - *port = 0; - *conn_count = 0; - *share_count = 0; - rc = 0; - goto out; - } + *id = peer_ni->ksnp_id; + *myip = peer_ni->ksnp_passive_ips[j]; + *peer_ip = 0; + *port = 0; + *conn_count = 0; + *share_count = 0; + rc = 0; + goto out; + } - list_for_each_entry(route, &peer_ni->ksnp_routes, - ksnr_list) { - if (index-- > 0) - continue; + list_for_each_entry(route, &peer_ni->ksnp_routes, + ksnr_list) { + if (index-- > 0) + continue; - *id = peer_ni->ksnp_id; - *myip = 
route->ksnr_myipaddr; - *peer_ip = route->ksnr_ipaddr; - *port = route->ksnr_port; - *conn_count = route->ksnr_conn_count; - *share_count = route->ksnr_share_count; - rc = 0; - goto out; - } + *id = peer_ni->ksnp_id; + *myip = route->ksnr_myipaddr; + *peer_ip = route->ksnr_ipaddr; + *port = route->ksnr_port; + *conn_count = route->ksnr_conn_count; + *share_count = route->ksnr_share_count; + rc = 0; + goto out; } } out: @@ -463,8 +460,7 @@ struct ksock_peer_ni * peer_ni = peer2; } else { /* peer_ni table takes my ref on peer_ni */ - list_add_tail(&peer_ni->ksnp_list, - ksocknal_nid2peerlist(id.nid)); + hash_add(ksocknal_data.ksnd_peers, &peer_ni->ksnp_list, id.nid); } list_for_each_entry(route2, &peer_ni->ksnp_routes, ksnr_list) { @@ -544,7 +540,7 @@ struct ksock_peer_ni * ksocknal_del_peer(struct lnet_ni *ni, struct lnet_process_id id, u32 ip) { LIST_HEAD(zombies); - struct ksock_peer_ni *pnxt; + struct hlist_node *pnxt; struct ksock_peer_ni *peer_ni; int lo; int hi; @@ -554,17 +550,17 @@ struct ksock_peer_ni * write_lock_bh(&ksocknal_data.ksnd_global_lock); if (id.nid != LNET_NID_ANY) { - lo = (int)(ksocknal_nid2peerlist(id.nid) - ksocknal_data.ksnd_peers); - hi = (int)(ksocknal_nid2peerlist(id.nid) - ksocknal_data.ksnd_peers); + lo = hash_min(id.nid, HASH_BITS(ksocknal_data.ksnd_peers)); + hi = lo; } else { lo = 0; - hi = ksocknal_data.ksnd_peer_hash_size - 1; + hi = HASH_SIZE(ksocknal_data.ksnd_peers) - 1; } for (i = lo; i <= hi; i++) { - list_for_each_entry_safe(peer_ni, pnxt, - &ksocknal_data.ksnd_peers[i], - ksnp_list) { + hlist_for_each_entry_safe(peer_ni, pnxt, + &ksocknal_data.ksnd_peers[i], + ksnp_list) { if (peer_ni->ksnp_ni != ni) continue; @@ -609,23 +605,20 @@ struct ksock_peer_ni * read_lock(&ksocknal_data.ksnd_global_lock); - for (i = 0; i < ksocknal_data.ksnd_peer_hash_size; i++) { - list_for_each_entry(peer_ni, &ksocknal_data.ksnd_peers[i], - ksnp_list) { - LASSERT(!peer_ni->ksnp_closing); + hash_for_each(ksocknal_data.ksnd_peers, i, peer_ni, 
ksnp_list) { + LASSERT(!peer_ni->ksnp_closing); + + if (peer_ni->ksnp_ni != ni) + continue; - if (peer_ni->ksnp_ni != ni) + list_for_each_entry(conn, &peer_ni->ksnp_conns, + ksnc_list) { + if (index-- > 0) continue; - list_for_each_entry(conn, &peer_ni->ksnp_conns, - ksnc_list) { - if (index-- > 0) - continue; - - ksocknal_conn_addref(conn); - read_unlock(&ksocknal_data.ksnd_global_lock); - return conn; - } + ksocknal_conn_addref(conn); + read_unlock(&ksocknal_data.ksnd_global_lock); + return conn; } } @@ -1119,8 +1112,8 @@ struct ksock_peer_ni * * NB this puts an "empty" peer_ni in the peer * table (which takes my ref) */ - list_add_tail(&peer_ni->ksnp_list, - ksocknal_nid2peerlist(peerid.nid)); + hash_add(ksocknal_data.ksnd_peers, + &peer_ni->ksnp_list, peerid.nid); } else { ksocknal_peer_decref(peer_ni); peer_ni = peer2; @@ -1732,7 +1725,7 @@ struct ksock_peer_ni * ksocknal_close_matching_conns(struct lnet_process_id id, u32 ipaddr) { struct ksock_peer_ni *peer_ni; - struct ksock_peer_ni *pnxt; + struct hlist_node *pnxt; int lo; int hi; int i; @@ -1741,17 +1734,17 @@ struct ksock_peer_ni * write_lock_bh(&ksocknal_data.ksnd_global_lock); if (id.nid != LNET_NID_ANY) { - lo = (int)(ksocknal_nid2peerlist(id.nid) - ksocknal_data.ksnd_peers); - hi = (int)(ksocknal_nid2peerlist(id.nid) - ksocknal_data.ksnd_peers); + lo = hash_min(id.nid, HASH_BITS(ksocknal_data.ksnd_peers)); + hi = lo; } else { lo = 0; - hi = ksocknal_data.ksnd_peer_hash_size - 1; + hi = HASH_SIZE(ksocknal_data.ksnd_peers) - 1; } for (i = lo; i <= hi; i++) { - list_for_each_entry_safe(peer_ni, pnxt, - &ksocknal_data.ksnd_peers[i], - ksnp_list) { + hlist_for_each_entry_safe(peer_ni, pnxt, + &ksocknal_data.ksnd_peers[i], + ksnp_list) { if (!((id.nid == LNET_NID_ANY || id.nid == peer_ni->ksnp_id.nid) && (id.pid == LNET_PID_ANY || @@ -1769,10 +1762,7 @@ struct ksock_peer_ni * if (id.nid == LNET_NID_ANY || id.pid == LNET_PID_ANY || !ipaddr) return 0; - if (!count) - return -ENOENT; - else - return 0; + 
return count ? 0 : -ENOENT; } void @@ -1892,21 +1882,20 @@ struct ksock_peer_ni * static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) { - struct list_head *start; - struct list_head *end; - struct list_head *tmp; + int lo; + int hi; + int bkt; int rc = -ENOENT; - unsigned int hsize = ksocknal_data.ksnd_peer_hash_size; - if (id.nid == LNET_NID_ANY) { - start = &ksocknal_data.ksnd_peers[0]; - end = &ksocknal_data.ksnd_peers[hsize - 1]; + if (id.nid != LNET_NID_ANY) { + lo = hash_min(id.nid, HASH_BITS(ksocknal_data.ksnd_peers)); + hi = lo; } else { - start = ksocknal_nid2peerlist(id.nid); - end = ksocknal_nid2peerlist(id.nid); + lo = 0; + hi = HASH_SIZE(ksocknal_data.ksnd_peers) - 1; } - for (tmp = start; tmp <= end; tmp++) { + for (bkt = lo; bkt <= hi; bkt++) { int peer_off; /* searching offset in peer_ni hash table */ for (peer_off = 0; ; peer_off++) { @@ -1914,7 +1903,9 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) int i = 0; read_lock(&ksocknal_data.ksnd_global_lock); - list_for_each_entry(peer_ni, tmp, ksnp_list) { + hlist_for_each_entry(peer_ni, + &ksocknal_data.ksnd_peers[bkt], + ksnp_list) { if (!((id.nid == LNET_NID_ANY || id.nid == peer_ni->ksnp_id.nid) && (id.pid == LNET_PID_ANY || @@ -1969,24 +1960,15 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) iface->ksni_nroutes = 0; iface->ksni_npeers = 0; - for (i = 0; i < ksocknal_data.ksnd_peer_hash_size; i++) { - list_for_each_entry(peer_ni, - &ksocknal_data.ksnd_peers[i], - ksnp_list) { - - for (j = 0; - j < peer_ni->ksnp_n_passive_ips; - j++) - if (peer_ni->ksnp_passive_ips[j] == - ipaddress) - iface->ksni_npeers++; - - list_for_each_entry(route, - &peer_ni->ksnp_routes, - ksnr_list) { - if (route->ksnr_myipaddr == ipaddress) - iface->ksni_nroutes++; - } + hash_for_each(ksocknal_data.ksnd_peers, i, peer_ni, ksnp_list) { + for (j = 0; j < peer_ni->ksnp_n_passive_ips; j++) + if (peer_ni->ksnp_passive_ips[j] == ipaddress) + 
iface->ksni_npeers++; + + list_for_each_entry(route, &peer_ni->ksnp_routes, + ksnr_list) { + if (route->ksnr_myipaddr == ipaddress) + iface->ksni_nroutes++; } } @@ -2048,7 +2030,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) { struct ksock_net *net = ni->ni_data; int rc = -ENOENT; - struct ksock_peer_ni *nxt; + struct hlist_node *nxt; struct ksock_peer_ni *peer_ni; u32 this_ip; int i; @@ -2070,16 +2052,12 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) net->ksnn_ninterfaces--; - for (j = 0; j < ksocknal_data.ksnd_peer_hash_size; j++) { - list_for_each_entry_safe(peer_ni, nxt, - &ksocknal_data.ksnd_peers[j], - ksnp_list) { - if (peer_ni->ksnp_ni != ni) - continue; + hash_for_each_safe(ksocknal_data.ksnd_peers, j, + nxt, peer_ni, ksnp_list) { + if (peer_ni->ksnp_ni != ni) + continue; - ksocknal_peer_del_interface_locked(peer_ni, - this_ip); - } + ksocknal_peer_del_interface_locked(peer_ni, this_ip); } } @@ -2224,8 +2202,6 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) if (ksocknal_data.ksnd_schedulers) cfs_percpt_free(ksocknal_data.ksnd_schedulers); - kvfree(ksocknal_data.ksnd_peers); - spin_lock(&ksocknal_data.ksnd_tx_lock); if (!list_empty(&ksocknal_data.ksnd_idle_noop_txs)) { @@ -2250,6 +2226,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) ksocknal_base_shutdown(void) { struct ksock_sched *sched; + struct ksock_peer_ni *peer_ni; int i; LASSERT(!ksocknal_data.ksnd_nnets); @@ -2260,9 +2237,8 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) /* fall through */ case SOCKNAL_INIT_ALL: case SOCKNAL_INIT_DATA: - LASSERT(ksocknal_data.ksnd_peers); - for (i = 0; i < ksocknal_data.ksnd_peer_hash_size; i++) - LASSERT(list_empty(&ksocknal_data.ksnd_peers[i])); + hash_for_each(ksocknal_data.ksnd_peers, i, peer_ni, ksnp_list) + LASSERT(0); LASSERT(list_empty(&ksocknal_data.ksnd_nets)); LASSERT(list_empty(&ksocknal_data.ksnd_enomem_conns)); 
@@ -2326,15 +2302,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) memset(&ksocknal_data, 0, sizeof(ksocknal_data)); /* zero pointers */ - ksocknal_data.ksnd_peer_hash_size = SOCKNAL_PEER_HASH_SIZE; - ksocknal_data.ksnd_peers = kvmalloc_array(ksocknal_data.ksnd_peer_hash_size, - sizeof(struct list_head), - GFP_KERNEL); - if (!ksocknal_data.ksnd_peers) - return -ENOMEM; - - for (i = 0; i < ksocknal_data.ksnd_peer_hash_size; i++) - INIT_LIST_HEAD(&ksocknal_data.ksnd_peers[i]); + hash_init(ksocknal_data.ksnd_peers); rwlock_init(&ksocknal_data.ksnd_global_lock); INIT_LIST_HEAD(&ksocknal_data.ksnd_nets); @@ -2452,43 +2420,38 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) read_lock(&ksocknal_data.ksnd_global_lock); - for (i = 0; i < ksocknal_data.ksnd_peer_hash_size; i++) { - list_for_each_entry(peer_ni, &ksocknal_data.ksnd_peers[i], - ksnp_list) { - struct ksock_route *route; - struct ksock_conn *conn; - - if (peer_ni->ksnp_ni != ni) - continue; + hash_for_each(ksocknal_data.ksnd_peers, i, peer_ni, ksnp_list) { + struct ksock_route *route; + struct ksock_conn *conn; - CWARN("Active peer_ni on shutdown: %s, ref %d, closing %d, accepting %d, err %d, zcookie %llu, txq %d, zc_req %d\n", - libcfs_id2str(peer_ni->ksnp_id), - atomic_read(&peer_ni->ksnp_refcount), - peer_ni->ksnp_closing, - peer_ni->ksnp_accepting, peer_ni->ksnp_error, - peer_ni->ksnp_zc_next_cookie, - !list_empty(&peer_ni->ksnp_tx_queue), - !list_empty(&peer_ni->ksnp_zc_req_list)); + if (peer_ni->ksnp_ni != ni) + continue; - list_for_each_entry(route, &peer_ni->ksnp_routes, - ksnr_list) { - CWARN("Route: ref %d, schd %d, conn %d, cnted %d, del %d\n", - atomic_read(&route->ksnr_refcount), - route->ksnr_scheduled, - route->ksnr_connecting, - route->ksnr_connected, - route->ksnr_deleted); - } + CWARN("Active peer_ni on shutdown: %s, ref %d, closing %d, accepting %d, err %d, zcookie %llu, txq %d, zc_req %d\n", + libcfs_id2str(peer_ni->ksnp_id), + 
atomic_read(&peer_ni->ksnp_refcount), + peer_ni->ksnp_closing, + peer_ni->ksnp_accepting, peer_ni->ksnp_error, + peer_ni->ksnp_zc_next_cookie, + !list_empty(&peer_ni->ksnp_tx_queue), + !list_empty(&peer_ni->ksnp_zc_req_list)); + + list_for_each_entry(route, &peer_ni->ksnp_routes, ksnr_list) { + CWARN("Route: ref %d, schd %d, conn %d, cnted %d, del %d\n", + atomic_read(&route->ksnr_refcount), + route->ksnr_scheduled, + route->ksnr_connecting, + route->ksnr_connected, + route->ksnr_deleted); + } - list_for_each_entry(conn, &peer_ni->ksnp_conns, - ksnc_list) { - CWARN("Conn: ref %d, sref %d, t %d, c %d\n", - atomic_read(&conn->ksnc_conn_refcount), - atomic_read(&conn->ksnc_sock_refcount), - conn->ksnc_type, conn->ksnc_closing); - } - goto done; + list_for_each_entry(conn, &peer_ni->ksnp_conns, ksnc_list) { + CWARN("Conn: ref %d, sref %d, t %d, c %d\n", + atomic_read(&conn->ksnc_conn_refcount), + atomic_read(&conn->ksnc_sock_refcount), + conn->ksnc_type, conn->ksnc_closing); } + goto done; } done: read_unlock(&ksocknal_data.ksnd_global_lock); diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 2d4e8d59..9ebb959 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -43,7 +43,7 @@ #include #include #include -#include +#include #include #include @@ -54,7 +54,7 @@ #define SOCKNAL_NSCHEDS 3 #define SOCKNAL_NSCHEDS_HIGH (SOCKNAL_NSCHEDS << 1) -#define SOCKNAL_PEER_HASH_SIZE 101 /* # peer_ni lists */ +#define SOCKNAL_PEER_HASH_BITS 7 /* # log2 of # of peer_ni lists */ #define SOCKNAL_RESCHED 100 /* # scheduler loops before reschedule */ #define SOCKNAL_INSANITY_RECONN 5000 /* connd is trying on reconn infinitely */ #define SOCKNAL_ENOMEM_RETRY 1 /* seconds between retries */ @@ -190,10 +190,10 @@ struct ksock_nal_data { rwlock_t ksnd_global_lock; /* stabilize * peer_ni/conn ops */ - struct list_head *ksnd_peers; /* hash table of all my + DECLARE_HASHTABLE(ksnd_peers, SOCKNAL_PEER_HASH_BITS); + /* hash table 
of all my * known peers */ - int ksnd_peer_hash_size; /* size of ksnd_peers */ int ksnd_nthreads; /* # live threads */ int ksnd_shuttingdown; /* tell threads to exit @@ -411,7 +411,7 @@ struct ksock_route { #define SOCKNAL_KEEPALIVE_PING 1 /* cookie for keepalive ping */ struct ksock_peer_ni { - struct list_head ksnp_list; /* stash on global peer_ni list */ + struct hlist_node ksnp_list; /* on global peer_nis hash table */ time64_t ksnp_last_alive; /* when (in seconds) I was last * alive */ @@ -519,14 +519,6 @@ struct ksock_proto { (1 << SOCKLND_CONN_BULK_OUT)); } -static inline struct list_head * -ksocknal_nid2peerlist(lnet_nid_t nid) -{ - unsigned int hash = ((unsigned int)nid) % ksocknal_data.ksnd_peer_hash_size; - - return &ksocknal_data.ksnd_peers[hash]; -} - static inline void ksocknal_conn_addref(struct ksock_conn *conn) { diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index 996b231..fb933e3 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -2386,7 +2386,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) static void ksocknal_check_peer_timeouts(int idx) { - struct list_head *peers = &ksocknal_data.ksnd_peers[idx]; + struct hlist_head *peers = &ksocknal_data.ksnd_peers[idx]; struct ksock_peer_ni *peer_ni; struct ksock_conn *conn; struct ksock_tx *tx; @@ -2399,7 +2399,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) */ read_lock(&ksocknal_data.ksnd_global_lock); - list_for_each_entry(peer_ni, peers, ksnp_list) { + hlist_for_each_entry(peer_ni, peers, ksnp_list) { struct ksock_tx *tx_stale; time64_t deadline = 0; int resid = 0; @@ -2564,7 +2564,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) while ((timeout = deadline - ktime_get_seconds()) <= 0) { const int n = 4; const int p = 1; - int chunk = ksocknal_data.ksnd_peer_hash_size; + int chunk = HASH_SIZE(ksocknal_data.ksnd_peers); unsigned int lnd_timeout; /* @@ -2585,7 +2585,7 @@ void 
ksocknal_write_callback(struct ksock_conn *conn) for (i = 0; i < chunk; i++) { ksocknal_check_peer_timeouts(peer_index); peer_index = (peer_index + 1) % - ksocknal_data.ksnd_peer_hash_size; + HASH_SIZE(ksocknal_data.ksnd_peers); } deadline += p;

From patchwork Thu Feb 27 21:18:00 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410589
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:18:00 -0500
Message-Id: <1582838290-17243-613-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 612/622] lustre: llite: Update mdc and lite stats on open|creat
Cc: Lustre Development List

From: Olaf Faaland

Increment "create" counter in mdc//md_stats, and "mknod" counter in llite/stats when an open with the CREAT flag results in a newly created file.

The mknod counter is chosen for consistency with patch http://review.whamcloud.com/20246 "LU-8150 mdt: Track open+create as mknod" but the mdc counter set does not include mknod.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11114
Lustre-commit: 4b8518ee4fa5 ("LU-11114 llite: Update mdc and lite stats on open|creat")
Signed-off-by: Olaf Faaland
Reviewed-on: https://review.whamcloud.com/36948
Reviewed-by: Andreas Dilger
Reviewed-by: Emoly Liu
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/llite/namei.c | 12 ++++++++++--
 fs/lustre/mdc/mdc_locks.c | 6 ++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 89317db..cf2a77f 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -605,7 +605,8 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de) static int ll_lookup_it_finish(struct ptlrpc_request *request, struct lookup_intent *it, struct inode *parent, struct dentry **de, - void *secctx, u32 secctxlen) + void *secctx, u32 secctxlen, + ktime_t kstart) { struct inode *inode = NULL; u64 bits = 0; @@ -708,6 +709,11 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, } } + if (it_disposition(it, DISP_OPEN_CREATE)) { + ll_stats_ops_tally(ll_i2sbi(parent), LPROC_LL_MKNOD, + ktime_us_delta(ktime_get(), kstart)); + } + out: if (rc != 0 && it->it_op & IT_OPEN) { ll_intent_drop_lock(it); @@ -722,6 +728,7 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, u32 *secctxlen, struct pcc_create_attach *pca) { + ktime_t kstart = ktime_get(); struct lookup_intent lookup_it = { .it_op = IT_LOOKUP }; struct dentry *save = dentry, *retval; struct ptlrpc_request *req = NULL; @@ -887,7 +894,8 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, ll_unlock_md_op_lsm(op_data); rc = ll_lookup_it_finish(req, it, parent, &dentry, secctx ? *secctx : NULL, - secctxlen ? *secctxlen : 0); + secctxlen ?
*secctxlen : 0, + kstart); if (rc != 0) { ll_intent_release(it); retval = ERR_PTR(rc); diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 60bbae1..b252605 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -733,6 +733,12 @@ static int mdc_finish_enqueue(struct obd_export *exp, mdc_set_open_replay_data(NULL, NULL, it); } + if (it_disposition(it, DISP_OPEN_CREATE) && + !it_open_error(DISP_OPEN_CREATE, it)) { + lprocfs_counter_incr(exp->exp_obd->obd_md_stats, + LPROC_MD_CREATE); + } + if (body->mbo_valid & (OBD_MD_FLDIREA | OBD_MD_FLEASIZE)) { void *eadata;

From patchwork Thu Feb 27 21:18:01 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410761
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:18:01 -0500
Message-Id: <1582838290-17243-614-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 613/622] lustre: osc: glimpse and lock cancel race
Cc: Alexander Zarochentsev, Lustre Development List

From: Alexander Zarochentsev

osc_dlm_blocking_ast0 clears l_ast_data before writing file data to OST and opens a race window. Neither a glimpse AST nor ldlm_cb_interpret can find correct file attributes at that moment.
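
The ordering change described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the Lustre code: struct object, struct dlm_lock, glimpse_lookup(), cancel_lock() and probe_during_flush() are hypothetical names, a pthread mutex stands in for lock_res_and_lock(), and the object reference counting done by cl_object_get() is left out. The point is that the pointer is read before the flush but cleared only after it, so a lookup that runs during the flush still finds the object.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cl_object and the DLM lock. */
struct object {
	long size;
};

struct dlm_lock {
	pthread_mutex_t mtx;      /* plays the role of lock_res_and_lock() */
	struct object *ast_data;  /* plays the role of l_ast_data */
};

/* Glimpse path: find the object behind the lock, or NULL if detached. */
static struct object *glimpse_lookup(struct dlm_lock *lk)
{
	struct object *obj;

	pthread_mutex_lock(&lk->mtx);
	obj = lk->ast_data;
	pthread_mutex_unlock(&lk->mtx);
	return obj;
}

/* Callback type for the (possibly slow) flush step. */
typedef void (*flush_fn)(struct dlm_lock *lk, struct object *obj, void *arg);

/* Cancel path: flush first, detach the object only afterwards. */
static void cancel_lock(struct dlm_lock *lk, flush_fn flush, void *arg)
{
	struct object *obj;

	pthread_mutex_lock(&lk->mtx);
	obj = lk->ast_data;       /* read the pointer, but do not clear it yet */
	pthread_mutex_unlock(&lk->mtx);
	if (!obj)
		return;

	flush(lk, obj, arg);      /* a glimpse may run concurrently here */

	pthread_mutex_lock(&lk->mtx);
	lk->ast_data = NULL;      /* detach only after the flush has finished */
	pthread_mutex_unlock(&lk->mtx);
}

/* Helper: records what a glimpse would see in the middle of the flush. */
static void probe_during_flush(struct dlm_lock *lk, struct object *obj,
			       void *arg)
{
	(void)obj;
	*(struct object **)arg = glimpse_lookup(lk);
}
```

Passing probe_during_flush as the flush callback simulates a glimpse arriving inside the window: with this ordering it sees the object, and only a lookup after cancel_lock() returns sees NULL.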

Cray-bug-id: LUS-8344
WC-bug-id: https://jira.whamcloud.com/browse/LU-13128
Lustre-commit: 7c99f67d9d39 ("LU-13128 osc: glimpse and lock cancel race")
Signed-off-by: Alexander Zarochentsev
Reviewed-on: https://review.whamcloud.com/37215
Reviewed-by: Andrew Perepechko
Reviewed-by: Andriy Skulysh
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/obd_support.h | 1 +
 fs/lustre/mdc/mdc_dev.c | 2 +-
 fs/lustre/osc/osc_lock.c | 8 ++++++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index 7dfef0f..f7fed0e 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -332,6 +332,7 @@ #define OBD_FAIL_OSC_CONNECT_GRANT_PARAM 0x413 #define OBD_FAIL_OSC_DELAY_IO 0x414 #define OBD_FAIL_OSC_NO_SIZE_DATA 0x415 +#define OBD_FAIL_OSC_DELAY_CANCEL 0x416 #define OBD_FAIL_PTLRPC 0x500 #define OBD_FAIL_PTLRPC_ACK 0x501 diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 496491f..5a6be44 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -313,7 +313,6 @@ static int mdc_dlm_blocking_ast0(const struct lu_env *env, if (dlmlock->l_ast_data) { obj = osc2cl(dlmlock->l_ast_data); - dlmlock->l_ast_data = NULL; cl_object_get(obj); } unlock_res_and_lock(dlmlock); @@ -332,6 +331,7 @@ static int mdc_dlm_blocking_ast0(const struct lu_env *env, */ /* losing a lock, update kms */ lock_res_and_lock(dlmlock); + dlmlock->l_ast_data = NULL; cl_object_attr_lock(obj); attr->cat_kms = 0; cl_object_attr_update(env, obj, attr, CAT_KMS); diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index ce592d7..3bb5bbd 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -419,13 +419,13 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, if (dlmlock->l_ast_data) { obj = osc2cl(dlmlock->l_ast_data); - dlmlock->l_ast_data = NULL; - cl_object_get(obj); } unlock_res_and_lock(dlmlock); +
OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_DELAY_CANCEL, 5); + /* if l_ast_data is NULL, the dlmlock was enqueued by AGL or * the object has been destroyed. */ @@ -442,6 +442,10 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, /* losing a lock, update kms */ lock_res_and_lock(dlmlock); + /* clearing l_ast_data after flushing data, + * to let glimpse ast find the lock and the object + */ + dlmlock->l_ast_data = NULL; cl_object_attr_lock(obj); /* Must get the value under the lock to avoid race. */ old_kms = cl2osc(obj)->oo_oinfo->loi_kms;

From patchwork Thu Feb 27 21:18:02 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410841
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:18:02 -0500
Message-Id: <1582838290-17243-615-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 614/622] lustre: llog: keep llog handle alive until last reference
Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin

The llog handle keeps the related dt_object pinned until the llog_close() call, while the llog handle can still have other users that took it via llog_cat_id2handle().

This patch changes llog_handle_put() to call lop_close() when the last reference is dropped, so llog_osd_close() puts the dt_object only when the llog_handle has no more references. llog_handle_get() checks and reports if the llog_handle has a zero reference count.

The patch also modifies the checks for destroyed llogs: the llog handle has a new lgh_destroyed flag which is set when the llog is destroyed, and llog_osd_exist() checks both dt_object_exist() and the lgh_destroyed flag, so destroyed llogs are considered non-existent too. Previously it used the lu_object_is_dying() check, which is not reliable because it only means that the object is not to be kept in the cache.
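
The get/put behaviour described above can be sketched in user space with C11 atomics. This is a rough illustration under stated assumptions rather than the kernel implementation: struct handle, handle_init(), handle_get() and handle_put() are hypothetical names, the compare-and-swap loop approximates what refcount_inc_not_zero() provides in the kernel, and the close hook is reduced to a counter. Unlike the patch, the sketch does not free the handle on the last put, so its state can still be inspected afterwards.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct llog_handle. */
struct handle {
	atomic_uint refcount;
	bool destroyed;    /* mirrors the idea of lgh_destroyed */
	int close_calls;   /* counts how many times the close hook ran */
};

/* A new handle starts with one reference, owned by the opener. */
static void handle_init(struct handle *h)
{
	atomic_init(&h->refcount, 1);
	h->destroyed = false;
	h->close_calls = 0;
}

/* Take a reference unless the count has already dropped to zero. */
static struct handle *handle_get(struct handle *h)
{
	unsigned int old = atomic_load(&h->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&h->refcount, &old, old + 1))
			return h;
	}
	return NULL;
}

/*
 * Drop a reference. Only the caller that drops the last reference runs
 * the close hook; the function returns true in that case.
 */
static bool handle_put(struct handle *h)
{
	if (atomic_fetch_sub(&h->refcount, 1) == 1) {
		h->close_calls++;   /* stands in for lop_close() */
		return true;
	}
	return false;
}
```

With this shape, closing a handle is just a put: a second user that obtained the handle through handle_get() keeps the underlying object pinned until it drops its own reference, and a get attempted after the final put reports failure instead of reviving the handle.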
WC-bug-id: https://jira.whamcloud.com/browse/LU-10198 Lustre-commit: d6bd5e9cc49b ("LU-10198 llog: keep llog handle alive until last reference") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/37367 Reviewed-by: Andreas Dilger Reviewed-by: Alexandr Boyko Reviewed-by: Alex Zhuravlev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_log.h | 3 ++- fs/lustre/obdclass/llog.c | 49 +++++++++++++++++++------------------- fs/lustre/obdclass/llog_cat.c | 19 +++++++++------ fs/lustre/obdclass/llog_internal.h | 4 ++-- 4 files changed, 40 insertions(+), 35 deletions(-) diff --git a/fs/lustre/include/lustre_log.h b/fs/lustre/include/lustre_log.h index 9c784ac..6995414 100644 --- a/fs/lustre/include/lustre_log.h +++ b/fs/lustre/include/lustre_log.h @@ -226,7 +226,8 @@ struct llog_handle { char *lgh_name; void *private_data; struct llog_operations *lgh_logops; - struct kref lgh_refcount; + refcount_t lgh_refcount; + bool lgh_destroyed; }; #define LLOG_CTXT_FLAG_UNINITIALIZED 0x00000001 diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c index 620ebc6..5d828bd 100644 --- a/fs/lustre/obdclass/llog.c +++ b/fs/lustre/obdclass/llog.c @@ -65,7 +65,7 @@ static struct llog_handle *llog_alloc_handle(void) init_rwsem(&loghandle->lgh_lock); INIT_LIST_HEAD(&loghandle->u.phd.phd_entry); - kref_init(&loghandle->lgh_refcount); + refcount_set(&loghandle->lgh_refcount, 1); return loghandle; } @@ -73,11 +73,8 @@ static struct llog_handle *llog_alloc_handle(void) /* * Free llog handle and header data if exists. 
Used in llog_close() only */ -static void llog_free_handle(struct kref *kref) +static void llog_free_handle(struct llog_handle *loghandle) { - struct llog_handle *loghandle = container_of(kref, struct llog_handle, - lgh_refcount); - /* failed llog_init_handle */ if (!loghandle->lgh_hdr) goto out; @@ -91,15 +88,30 @@ static void llog_free_handle(struct kref *kref) kfree(loghandle); } -void llog_handle_get(struct llog_handle *loghandle) +struct llog_handle *llog_handle_get(struct llog_handle *loghandle) { - kref_get(&loghandle->lgh_refcount); + if (refcount_inc_not_zero(&loghandle->lgh_refcount)) + return loghandle; + return NULL; } -void llog_handle_put(struct llog_handle *loghandle) +int llog_handle_put(const struct lu_env *env, struct llog_handle *loghandle) { - LASSERT(kref_read(&loghandle->lgh_refcount) > 0); - kref_put(&loghandle->lgh_refcount, llog_free_handle); + int rc = 0; + + if (refcount_dec_and_test(&loghandle->lgh_refcount)) { + struct llog_operations *lop; + + rc = llog_handle2ops(loghandle, &lop); + if (!rc) { + if (lop->lop_close) + rc = lop->lop_close(env, loghandle); + else + rc = -EOPNOTSUPP; + } + llog_free_handle(loghandle); + } + return rc; } static int llog_read_header(const struct lu_env *env, @@ -541,7 +553,7 @@ int llog_open(const struct lu_env *env, struct llog_ctxt *ctxt, revert_creds(old_cred); if (rc) { - llog_free_handle(&(*lgh)->lgh_refcount); + llog_free_handle(*lgh); *lgh = NULL; } return rc; @@ -550,19 +562,6 @@ int llog_open(const struct lu_env *env, struct llog_ctxt *ctxt, int llog_close(const struct lu_env *env, struct llog_handle *loghandle) { - struct llog_operations *lop; - int rc; - - rc = llog_handle2ops(loghandle, &lop); - if (rc) - goto out; - if (!lop->lop_close) { - rc = -EOPNOTSUPP; - goto out; - } - rc = lop->lop_close(env, loghandle); -out: - llog_handle_put(loghandle); - return rc; + return llog_handle_put(env, loghandle); } EXPORT_SYMBOL(llog_close); diff --git a/fs/lustre/obdclass/llog_cat.c 
b/fs/lustre/obdclass/llog_cat.c index 75226f4..46636f8 100644 --- a/fs/lustre/obdclass/llog_cat.c +++ b/fs/lustre/obdclass/llog_cat.c @@ -85,10 +85,16 @@ static int llog_cat_id2handle(const struct lu_env *env, cgl->lgl_ogen, logid->lgl_ogen); continue; } + *res = llog_handle_get(loghandle); + if (!*res) { + CERROR("%s: log "DFID" refcount is zero!\n", + loghandle->lgh_ctxt->loc_obd->obd_name, + PFID(&logid->lgl_oi.oi_fid)); + continue; + } loghandle->u.phd.phd_cat_handle = cathandle; up_write(&cathandle->lgh_lock); - rc = 0; - goto out; + return rc; } } up_write(&cathandle->lgh_lock); @@ -105,10 +111,12 @@ static int llog_cat_id2handle(const struct lu_env *env, rc = llog_init_handle(env, loghandle, fmt | LLOG_F_IS_PLAIN, NULL); if (rc < 0) { llog_close(env, loghandle); - loghandle = NULL; + *res = NULL; return rc; } + *res = llog_handle_get(loghandle); + LASSERT(*res); down_write(&cathandle->lgh_lock); list_add_tail(&loghandle->u.phd.phd_entry, &cathandle->u.chd.chd_head); up_write(&cathandle->lgh_lock); @@ -117,9 +125,6 @@ static int llog_cat_id2handle(const struct lu_env *env, loghandle->u.phd.phd_cookie.lgc_lgl = cathandle->lgh_id; loghandle->u.phd.phd_cookie.lgc_index = loghandle->lgh_hdr->llh_cat_idx; -out: - llog_handle_get(loghandle); - *res = loghandle; return 0; } @@ -204,7 +209,7 @@ static int llog_cat_process_cb(const struct lu_env *env, } out: - llog_handle_put(llh); + llog_handle_put(env, llh); return rc; } diff --git a/fs/lustre/obdclass/llog_internal.h b/fs/lustre/obdclass/llog_internal.h index 365bac9..0376656 100644 --- a/fs/lustre/obdclass/llog_internal.h +++ b/fs/lustre/obdclass/llog_internal.h @@ -61,8 +61,8 @@ struct llog_thread_info { int llog_info_init(void); void llog_info_fini(void); -void llog_handle_get(struct llog_handle *loghandle); -void llog_handle_put(struct llog_handle *loghandle); +struct llog_handle *llog_handle_get(struct llog_handle *loghandle); +int llog_handle_put(const struct lu_env *env, struct llog_handle *loghandle); int 
class_config_dump_handler(const struct lu_env *env, struct llog_handle *handle, struct llog_rec_hdr *rec, void *data); From patchwork Thu Feb 27 21:18:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410843 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6DAC11580 for ; Thu, 27 Feb 2020 21:48:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5659224690 for ; Thu, 27 Feb 2020 21:48:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5659224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28FF534A360; Thu, 27 Feb 2020 13:38:20 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D9E0C348A38 for ; Thu, 27 Feb 2020 13:21:30 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7ED17A162; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7D32F47C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:03 -0500 Message-Id: 
<1582838290-17243-616-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 615/622] lnet: handling device failure by IB event handler X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Tatsushi Takamura , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Tatsushi Takamura The following IB events cannot be handled by QP event handler - IB_EVENT_DEVICE_FATAL - IB_EVENT_PORT_ERR - IB_EVENT_PORT_ACTIVE IB event handler handles device errors such as hardware errors and link down. WC-bug-id: https://jira.whamcloud.com/browse/LU-12287 Lustre-commit: c6e4c21c4f8b ("LU-12287 lnet: handling device failure by IB event handler") Signed-off-by: Tatsushi Takamura Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/35037 Reviewed-by: Chris Horn Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 100 +++++++++++++++++++++++++++++++++++++++ net/lnet/klnds/o2iblnd/o2iblnd.h | 8 ++++ 2 files changed, 108 insertions(+) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index f6db2c7..7bf2883 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2306,9 +2306,93 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni, return rc; } +static int kiblnd_port_get_attr(struct kib_hca_dev *hdev) +{ + struct ib_port_attr *port_attr; + int rc; + unsigned long flags; + rwlock_t *g_lock = &kiblnd_data.kib_global_lock; + + port_attr = kzalloc(sizeof(*port_attr), GFP_NOFS); + if (!port_attr) { 
+ CDEBUG(D_NETERROR, "Out of memory\n"); + return -ENOMEM; + } + + rc = ib_query_port(hdev->ibh_ibdev, hdev->ibh_port, port_attr); + + write_lock_irqsave(g_lock, flags); + + if (rc == 0) + hdev->ibh_state = port_attr->state == IB_PORT_ACTIVE + ? IBLND_DEV_PORT_ACTIVE + : IBLND_DEV_PORT_DOWN; + + write_unlock_irqrestore(g_lock, flags); + kfree(port_attr); + + if (rc != 0) { + CDEBUG(D_NETERROR, "Failed to query IB port: %d\n", rc); + return rc; + } + return 0; +} + +static inline void +kiblnd_set_ni_fatal_on(struct kib_hca_dev *hdev, int val) +{ + struct kib_net *net; + + /* for health check */ + list_for_each_entry(net, &hdev->ibh_dev->ibd_nets, ibn_list) { + if (val) + CDEBUG(D_NETERROR, "Fatal device error for NI %s\n", + libcfs_nid2str(net->ibn_ni->ni_nid)); + atomic_set(&net->ibn_ni->ni_fatal_error_on, val); + } +} + +void +kiblnd_event_handler(struct ib_event_handler *handler, struct ib_event *event) +{ + rwlock_t *g_lock = &kiblnd_data.kib_global_lock; + struct kib_hca_dev *hdev; + unsigned long flags; + + hdev = container_of(handler, struct kib_hca_dev, ibh_event_handler); + + write_lock_irqsave(g_lock, flags); + + switch (event->event) { + case IB_EVENT_DEVICE_FATAL: + CDEBUG(D_NET, "IB device fatal\n"); + hdev->ibh_state = IBLND_DEV_FATAL; + kiblnd_set_ni_fatal_on(hdev, 1); + break; + case IB_EVENT_PORT_ACTIVE: + CDEBUG(D_NET, "IB port active\n"); + if (event->element.port_num == hdev->ibh_port) { + hdev->ibh_state = IBLND_DEV_PORT_ACTIVE; + kiblnd_set_ni_fatal_on(hdev, 0); + } + break; + case IB_EVENT_PORT_ERR: + CDEBUG(D_NET, "IB port err\n"); + if (event->element.port_num == hdev->ibh_port) { + hdev->ibh_state = IBLND_DEV_PORT_DOWN; + kiblnd_set_ni_fatal_on(hdev, 1); + } + break; + default: + break; + } + write_unlock_irqrestore(g_lock, flags); +} + static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev) { struct ib_device_attr *dev_attr = &hdev->ibh_ibdev->attrs; + int rc2 = 0; /* * It's safe to assume a HCA can handle a page size @@ -2338,12 
+2422,19 @@ static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev) hdev->ibh_mr_size = dev_attr->max_mr_size; hdev->ibh_max_qp_wr = dev_attr->max_qp_wr; + rc2 = kiblnd_port_get_attr(hdev); + if (rc2 != 0) + return rc2; + CERROR("Invalid mr size: %#llx\n", hdev->ibh_mr_size); return -EINVAL; } void kiblnd_hdev_destroy(struct kib_hca_dev *hdev) { + if (hdev->ibh_event_handler.device) + ib_unregister_event_handler(&hdev->ibh_event_handler); + if (hdev->ibh_pd) ib_dealloc_pd(hdev->ibh_pd); @@ -2491,6 +2582,7 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns) hdev->ibh_dev = dev; hdev->ibh_cmid = cmid; hdev->ibh_ibdev = cmid->device; + hdev->ibh_port = cmid->port_num; pd = ib_alloc_pd(cmid->device, 0); if (IS_ERR(pd)) { @@ -2513,6 +2605,10 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns) goto out; } + INIT_IB_EVENT_HANDLER(&hdev->ibh_event_handler, + hdev->ibh_ibdev, kiblnd_event_handler); + ib_register_event_handler(&hdev->ibh_event_handler); + write_lock_irqsave(&kiblnd_data.kib_global_lock, flags); swap(dev->ibd_hdev, hdev); /* take over the refcount */ @@ -2907,6 +3003,7 @@ static int kiblnd_startup(struct lnet_ni *ni) goto net_failed; } + net->ibn_ni = ni; net->ibn_incarnation = ktime_get_real_ns() / NSEC_PER_USEC; rc = kiblnd_tunables_setup(ni); @@ -3000,6 +3097,9 @@ static int kiblnd_startup(struct lnet_ni *ni) write_lock_irqsave(&kiblnd_data.kib_global_lock, flags); ibdev->ibd_nnets++; list_add_tail(&net->ibn_list, &ibdev->ibd_nets); + /* for health check */ + if (ibdev->ibd_hdev->ibh_state == IBLND_DEV_PORT_DOWN) + kiblnd_set_ni_fatal_on(ibdev->ibd_hdev, 1); write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); net->ibn_init = IBLND_INIT_ALL; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 2169fdd..8aa79d5 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -180,6 +180,13 @@ struct kib_hca_dev { u64 ibh_mr_size; /* size of MR */ int ibh_max_qp_wr; 
/* maximum work requests size */ struct ib_pd *ibh_pd; /* PD */ + u8 ibh_port; /* port number */ + struct ib_event_handler + ibh_event_handler; /* IB event handler */ + int ibh_state; /* device status */ +#define IBLND_DEV_PORT_DOWN 0 +#define IBLND_DEV_PORT_ACTIVE 1 +#define IBLND_DEV_FATAL 2 struct kib_dev *ibh_dev; /* owner */ atomic_t ibh_ref; /* refcount */ }; @@ -309,6 +316,7 @@ struct kib_net { struct kib_fmr_poolset **ibn_fmr_ps; /* fmr pool-set */ struct kib_dev *ibn_dev; /* underlying IB device */ + struct lnet_ni *ibn_ni; /* LNet interface */ }; #define KIB_THREAD_SHIFT 16 From patchwork Thu Feb 27 21:18:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410845 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 647061580 for ; Thu, 27 Feb 2020 21:48:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4CA9924690 for ; Thu, 27 Feb 2020 21:48:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4CA9924690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 17DFC34B78B; Thu, 27 Feb 2020 13:38:25 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3BF0A348A3D for ; Thu, 27 
Feb 2020 13:21:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 81189A163; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8003D46A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:04 -0500 Message-Id: <1582838290-17243-617-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 616/622] lustre: ptlrpc: simplify wait_event handling in unregister functions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown We can simplify the wait_event_idle_timeout() handling in both ptlrpc_unregister_bulk() and ptlrpc_unregister_reply() by changing the timeout to a countdown. Fewer variables are needed on the stack.
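The countdown idea can be sketched outside the kernel as follows. This is an illustration under stated assumptions: wait_slice_fn is a hypothetical stand-in for one wait_event_idle_timeout(..., HZ) call (0 means the one-second slice expired, non-zero means the condition became true), and countdown_wait() and ready_after() are names invented for the sketch. The loop guard is written as seconds > 0, so a positive remainder after the loop means the condition was met before the budget ran out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for one wait_event_idle_timeout(..., HZ) call:
 * returns 0 if the one-second slice expired, non-zero if the condition
 * became true during the slice.
 */
typedef long (*wait_slice_fn)(void *arg);

/*
 * Wait in one-second slices for at most max_seconds. Returns true if the
 * condition became true before the budget was used up. No separate rc or
 * counter variable is needed: the remaining budget carries the result.
 */
static bool countdown_wait(wait_slice_fn wait_slice, void *arg,
			   int max_seconds)
{
	int seconds = max_seconds;

	while (seconds > 0 && wait_slice(arg) == 0)
		seconds -= 1;

	return seconds > 0;
}

/* Test helper: the condition becomes true once *arg slices have expired. */
static long ready_after(void *arg)
{
	int *slices_left = arg;

	if (*slices_left > 0) {
		(*slices_left)--;
		return 0;	/* this slice timed out */
	}
	return 1;		/* condition became true */
}
```

A caller would loop on countdown_wait() and log a warning each time it returns false, which matches the surrounding for (;;) structure of the unregister functions.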
WC-bug-id: https://jira.whamcloud.com/browse/LU-10467 Lustre-commit: 5e30a2c06176f50f ("LU-10467 lustre: convert most users of LWI_TIMEOUT_INTERVAL()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35973 Reviewed-by: James Simmons Reviewed-by: Petros Koutoupis Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 31 ++++++++++--------------------- fs/lustre/ptlrpc/niobuf.c | 30 ++++++++++++++---------------- 2 files changed, 24 insertions(+), 37 deletions(-) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 632ddf1..1714e66 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -2570,9 +2570,6 @@ u64 ptlrpc_req_xid(struct ptlrpc_request *request) */ static int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async) { - int rc; - wait_queue_head_t *wq; - /* Might sleep. */ LASSERT(!in_interrupt()); @@ -2599,29 +2596,21 @@ static int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async) if (async) return 0; - /* - * We have to wait_event_idle_timeout() whatever the result, to get - * a chance to run reply_in_callback(), and to make sure we've - * unlinked before returning a req to the pool. - */ - if (request->rq_set) - wq = &request->rq_set->set_waitq; - else - wq = &request->rq_reply_waitq; - for (;;) { + wait_queue_head_t *wq = (request->rq_set) ? 
+ &request->rq_set->set_waitq : + &request->rq_reply_waitq; + int seconds = LONG_UNLINK; /* * Network access will complete in finite time but the HUGE * timeout lets us CWARN for visibility of sluggish NALs */ - int cnt = 0; - - while (cnt < LONG_UNLINK && - (rc = wait_event_idle_timeout(*wq, - !ptlrpc_client_recv_or_unlink(request), - HZ)) == 0) - cnt += 1; - if (rc > 0) { + while (seconds > 0 && + (wait_event_idle_timeout(*wq, + !ptlrpc_client_recv_or_unlink(request), + HZ)) == 0) + seconds -= 1; + if (seconds > 0) { ptlrpc_rqphase_move(request, request->rq_next_phase); return 1; } diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index 26a1f97..ab2753a 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -244,8 +244,6 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) int ptlrpc_unregister_bulk(struct ptlrpc_request *req, int async) { struct ptlrpc_bulk_desc *desc = req->rq_bulk; - wait_queue_head_t *wq; - int rc; LASSERT(!in_interrupt()); /* might sleep */ @@ -276,23 +274,23 @@ int ptlrpc_unregister_bulk(struct ptlrpc_request *req, int async) if (async) return 0; - if (req->rq_set) - wq = &req->rq_set->set_waitq; - else - wq = &req->rq_reply_waitq; - for (;;) { - /* Network access will complete in finite time but the HUGE + /* The wq argument is ignored by user-space wait_event macros */ + wait_queue_head_t *wq = (req->rq_set != NULL) ?
+ &req->rq_set->set_waitq : + &req->rq_reply_waitq; + /* + * Network access will complete in finite time but the HUGE * timeout lets us CWARN for visibility of sluggish LNDs */ - int cnt = 0; - - while (cnt < LONG_UNLINK && - (rc = wait_event_idle_timeout(*wq, - !ptlrpc_client_bulk_active(req), - HZ)) == 0) - cnt += 1; - if (rc > 0) { + int seconds = LONG_UNLINK; + + while (seconds > 0 && + wait_event_idle_timeout(*wq, + !ptlrpc_client_bulk_active(req), + HZ) == 0) + seconds -= 1; + if (seconds > 0) { ptlrpc_rqphase_move(req, req->rq_next_phase); return 1; } From patchwork Thu Feb 27 21:18:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410871 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 052B71580 for ; Thu, 27 Feb 2020 21:49:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E169E24690 for ; Thu, 27 Feb 2020 21:49:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E169E24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB40934A56B; Thu, 27 Feb 2020 13:39:24 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9615D348A3F for ; Thu, 27 Feb 2020
13:21:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8583AA164; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 82EA346D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:05 -0500 Message-Id: <1582838290-17243-618-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 617/622] lustre: ptlrpc: use l_wait_event_abortable in ptlrpcd_add_req() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Using wait_event_idle() ignores signals, which is not what we want in ptlrpcd_add_req(). Change it to l_wait_event_abortable().
WC-bug-id: https://jira.whamcloud.com/browse/LU-10467 Lustre-commit: ca6c35cab141 ("LU-10467 lustre: convert users of back_to_sleep()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/35980 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/ptlrpcd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lustre/ptlrpc/ptlrpcd.c b/fs/lustre/ptlrpc/ptlrpcd.c index 1a1fa05..533f592 100644 --- a/fs/lustre/ptlrpc/ptlrpcd.c +++ b/fs/lustre/ptlrpc/ptlrpcd.c @@ -235,8 +235,8 @@ void ptlrpcd_add_req(struct ptlrpc_request *req) if (wait_event_idle_timeout(req->rq_set_waitq, !req->rq_set, 5 * HZ) == 0) - wait_event_idle(req->rq_set_waitq, - !req->rq_set); + l_wait_event_abortable(req->rq_set_waitq, + !req->rq_set); } else if (req->rq_set) { /* * If we have a valid "rq_set", just reuse it to avoid double From patchwork Thu Feb 27 21:18:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410847 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CAEB924 for ; Thu, 27 Feb 2020 21:48:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6540224690 for ; Thu, 27 Feb 2020 21:48:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6540224690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A236834B7A3; Thu, 27 Feb 2020 13:38:28 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB101348A3D for ; Thu, 27 Feb 2020 13:21:31 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 86E54A165; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 85D1E468; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:06 -0500 Message-Id: <1582838290-17243-619-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 618/622] lnet: use LIST_HEAD() for local lists. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown When declaring a local list head, instead of struct list_head list; INIT_LIST_HEAD(&list); use LIST_HEAD(list); which does both steps. 
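The equivalence of the two forms can be sketched in user-space C. The definitions below are minimal copies written for illustration and modelled on the kernel's list.h macros, not the kernel header itself; both_forms_start_empty() is a hypothetical helper added only to compare the results.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal user-space model of the kernel's doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* One-step form: declare the head and point it at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Two-step form: initialise a head that was declared separately. */
static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical helper: both forms yield an empty, self-linked head. */
static bool both_forms_start_empty(void)
{
	struct list_head two_step;
	LIST_HEAD(one_step);

	INIT_LIST_HEAD(&two_step);
	return list_empty(&two_step) && list_empty(&one_step);
}
```

The one-step form also removes the window in which a local head is declared but not yet initialised, which is what makes the conversion safe to apply mechanically.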
WC-bug-id: https://jira.whamcloud.com/browse/LU-9679 Lustre-commit: 135b5c0009e5 ("LU-9679 lnet: use LIST_HEAD() for local lists.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36954 Reviewed-by: James Simmons Reviewed-by: Shaun Tancheff Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 7 ++----- net/lnet/klnds/socklnd/socklnd_cb.c | 3 +-- net/lnet/lnet/api-ni.c | 20 +++++--------------- net/lnet/lnet/config.c | 28 ++++++++-------------------- net/lnet/lnet/lib-move.c | 30 ++++++++---------------------- net/lnet/lnet/net_fault.c | 14 ++++---------- net/lnet/lnet/peer.c | 8 ++------ net/lnet/lnet/router.c | 15 ++++----------- net/lnet/selftest/console.c | 4 +--- 9 files changed, 35 insertions(+), 94 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index f769a45..67780d0 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1384,11 +1384,9 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, { rwlock_t *glock = &kiblnd_data.kib_global_lock; char *reason = NULL; - struct list_head txs; + LIST_HEAD(txs); unsigned long flags; - INIT_LIST_HEAD(&txs); - write_lock_irqsave(glock, flags); if (!peer_ni->ibp_reconnecting) { if (peer_ni->ibp_accepting) @@ -2218,7 +2216,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, { struct kib_peer_ni *peer_ni = conn->ibc_peer; struct kib_tx *tx; - struct list_head txs; + LIST_HEAD(txs); unsigned long flags; int active; @@ -2277,7 +2275,6 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, } /* grab pending txs while I have the lock */ - INIT_LIST_HEAD(&txs); list_splice_init(&peer_ni->ibp_tx_queue, &txs); if (!kiblnd_peer_active(peer_ni) || /* peer_ni has been deleted */ diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index fb933e3..66b0ac7 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ 
b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -2491,14 +2491,13 @@ void ksocknal_write_callback(struct ksock_conn *conn) wait_queue_entry_t wait; struct ksock_conn *conn; struct ksock_sched *sched; - struct list_head enomem_conns; + LIST_HEAD(enomem_conns); int nenomem_conns; time64_t timeout; int i; int peer_index = 0; time64_t deadline = ktime_get_seconds(); - INIT_LIST_HEAD(&enomem_conns); init_waitqueue_entry(&wait, current); spin_lock_bh(&ksocknal_data.ksnd_reaper_lock); diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index b9c38f3..8f59266 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -2062,11 +2062,9 @@ static void lnet_push_target_fini(void) lnet_shutdown_lndnets(void) { struct lnet_net *net; - struct list_head resend; + LIST_HEAD(resend); struct lnet_msg *msg, *tmp; - INIT_LIST_HEAD(&resend); - /* NB called holding the global mutex */ /* All quiet on the API front */ @@ -2202,7 +2200,7 @@ static void lnet_push_target_fini(void) { struct lnet_ni *ni; struct lnet_net *net_l = NULL; - struct list_head local_ni_list; + LIST_HEAD(local_ni_list); int ni_count = 0; u32 lnd_type; struct lnet_lnd *lnd; @@ -2214,8 +2212,6 @@ static void lnet_push_target_fini(void) int peerrtrcredits = net->net_tunables.lct_peer_rtr_credits; - INIT_LIST_HEAD(&local_ni_list); - /* * make sure that this net is unique. 
If it isn't then * we are adding interfaces to an already existing network, and @@ -2509,11 +2505,9 @@ void lnet_lib_exit(void) int ni_count; struct lnet_ping_buffer *pbuf; struct lnet_handle_md ping_mdh; - struct list_head net_head; + LIST_HEAD(net_head); struct lnet_net *net; - INIT_LIST_HEAD(&net_head); - mutex_lock(&the_lnet.ln_api_mutex); CDEBUG(D_OTHER, "refs %d\n", the_lnet.ln_refcount); @@ -3098,9 +3092,7 @@ static int lnet_handle_legacy_ip2nets(char *ip2nets, struct lnet_net *net; char *nets; int rc; - struct list_head net_head; - - INIT_LIST_HEAD(&net_head); + LIST_HEAD(net_head); rc = lnet_parse_ip2nets(&nets, ip2nets); if (rc < 0) @@ -3282,13 +3274,11 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) lnet_dyn_add_net(struct lnet_ioctl_config_data *conf) { struct lnet_net *net; - struct list_head net_head; + LIST_HEAD(net_head); int rc; struct lnet_ioctl_config_lnd_tunables tun; char *nets = conf->cfg_config_u.cfg_net.net_intf; - INIT_LIST_HEAD(&net_head); - /* Create a net/ni structures for the network string */ rc = lnet_parse_networks(&net_head, nets, use_tcp_bonding); if (rc <= 0) diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index f50df88..9d3813c 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -889,14 +889,12 @@ struct lnet_ni * static int lnet_str2tbs_sep(struct list_head *tbs, char *str) { - struct list_head pending; + LIST_HEAD(pending); char *sep; int nob; int i; struct lnet_text_buf *ltb; - INIT_LIST_HEAD(&pending); - /* Split 'str' into separate commands */ for (;;) { /* skip leading whitespace */ @@ -973,7 +971,7 @@ struct lnet_ni * lnet_str2tbs_expand(struct list_head *tbs, char *str) { char num[16]; - struct list_head pending; + LIST_HEAD(pending); char *sep; char *sep2; char *parsed; @@ -985,8 +983,6 @@ struct lnet_ni * int nob; int scanned; - INIT_LIST_HEAD(&pending); - sep = strchr(str, '['); if (!sep) /* nothing to expand */ return 0; @@ -1097,8 +1093,8 @@ struct lnet_ni * { /* static scratch 
buffer OK (single threaded) */ static char cmd[LNET_SINGLE_TEXTBUF_NOB]; - struct list_head nets; - struct list_head gateways; + LIST_HEAD(nets); + LIST_HEAD(gateways); struct list_head *tmp1; struct list_head *tmp2; u32 net; @@ -1114,9 +1110,6 @@ struct lnet_ni * int got_hops = 0; unsigned int priority = 0; - INIT_LIST_HEAD(&gateways); - INIT_LIST_HEAD(&nets); - /* save a copy of the string for error messages */ strncpy(cmd, str, sizeof(cmd)); cmd[sizeof(cmd) - 1] = '\0'; @@ -1260,13 +1253,11 @@ struct lnet_ni * int lnet_parse_routes(char *routes, int *im_a_router) { - struct list_head tbs; + LIST_HEAD(tbs); int rc = 0; *im_a_router = 0; - INIT_LIST_HEAD(&tbs); - if (lnet_str2tbs_sep(&tbs, routes) < 0) { CERROR("Error parsing routes\n"); rc = -EINVAL; @@ -1453,9 +1444,9 @@ struct lnet_ni * { static char networks[LNET_SINGLE_TEXTBUF_NOB]; static char source[LNET_SINGLE_TEXTBUF_NOB]; - struct list_head raw_entries; - struct list_head matched_nets; - struct list_head current_nets; + LIST_HEAD(raw_entries); + LIST_HEAD(matched_nets); + LIST_HEAD(current_nets); struct list_head *t; struct list_head *t2; struct lnet_text_buf *tb; @@ -1467,15 +1458,12 @@ struct lnet_ni * int dup; int rc; - INIT_LIST_HEAD(&raw_entries); if (lnet_str2tbs_sep(&raw_entries, ip2nets) < 0) { CERROR("Error parsing ip2nets\n"); LASSERT(!lnet_tbnob); return -EINVAL; } - INIT_LIST_HEAD(&matched_nets); - INIT_LIST_HEAD(&current_nets); networks[0] = 0; count = 0; len = 0; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index cd36d52..cd7ac7f 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -166,7 +166,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_test_peer *tp; struct list_head *el; struct list_head *next; - struct list_head cull; + LIST_HEAD(cull); /* NB: use lnet_net_lock(0) to serialize operations on test peers */ if (threshold) { @@ -184,9 +184,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats
*msg_stats, return 0; } - /* removing entries */ - INIT_LIST_HEAD(&cull); - lnet_net_lock(0); list_for_each_safe(el, next, &the_lnet.ln_test_peers) { @@ -216,11 +213,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, struct lnet_test_peer *tp; struct list_head *el; struct list_head *next; - struct list_head cull; + LIST_HEAD(cull); int fail = 0; - INIT_LIST_HEAD(&cull); - /* NB: use lnet_net_lock(0) to serialize operations on test peers */ lnet_net_lock(0); @@ -2620,7 +2615,6 @@ struct lnet_mt_event_info { lnet_finalize_expired_responses(void) { struct lnet_libmd *md; - struct list_head local_queue; struct lnet_rsp_tracker *rspt, *tmp; ktime_t now; int i; @@ -2629,7 +2623,7 @@ struct lnet_mt_event_info { return; cfs_cpt_for_each(i, lnet_cpt_table()) { - INIT_LIST_HEAD(&local_queue); + LIST_HEAD(local_queue); lnet_net_lock(i); if (!the_lnet.ln_mt_rstq[i]) { @@ -2856,8 +2850,8 @@ struct lnet_mt_event_info { lnet_recover_local_nis(void) { struct lnet_mt_event_info *ev_info; - struct list_head processed_list; - struct list_head local_queue; + LIST_HEAD(processed_list); + LIST_HEAD(local_queue); struct lnet_handle_md mdh; struct lnet_ni *tmp; struct lnet_ni *ni; @@ -2865,9 +2859,6 @@ struct lnet_mt_event_info { int healthv; int rc; - INIT_LIST_HEAD(&local_queue); - INIT_LIST_HEAD(&processed_list); - /* splice the recovery queue on a local queue. We will iterate * through the local queue and update it as needed. 
Once we're * done with the traversal, we'll splice the local queue back on @@ -3091,11 +3082,9 @@ struct lnet_mt_event_info { lnet_clean_resendqs(void) { struct lnet_msg *msg, *tmp; - struct list_head msgs; + LIST_HEAD(msgs); int i; - INIT_LIST_HEAD(&msgs); - cfs_cpt_for_each(i, lnet_cpt_table()) { lnet_net_lock(i); list_splice_init(the_lnet.ln_mt_resendqs[i], &msgs); @@ -3114,8 +3103,8 @@ struct lnet_mt_event_info { lnet_recover_peer_nis(void) { struct lnet_mt_event_info *ev_info; - struct list_head processed_list; - struct list_head local_queue; + LIST_HEAD(processed_list); + LIST_HEAD(local_queue); struct lnet_handle_md mdh; struct lnet_peer_ni *lpni; struct lnet_peer_ni *tmp; @@ -3123,9 +3112,6 @@ struct lnet_mt_event_info { int healthv; int rc; - INIT_LIST_HEAD(&local_queue); - INIT_LIST_HEAD(&processed_list); - /* Always use cpt 0 for locking across all interactions with * ln_mt_peerNIRecovq */ diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c index 8408e93..515aa05 100644 --- a/net/lnet/lnet/net_fault.c +++ b/net/lnet/lnet/net_fault.c @@ -201,11 +201,9 @@ struct lnet_drop_rule { { struct lnet_drop_rule *rule; struct lnet_drop_rule *tmp; - struct list_head zombies; + LIST_HEAD(zombies); int n = 0; - INIT_LIST_HEAD(&zombies); - lnet_net_lock(LNET_LOCK_EX); list_for_each_entry_safe(rule, tmp, &the_lnet.ln_drop_rules, dr_link) { if (rule->dr_attr.fa_src != src && src) @@ -725,9 +723,8 @@ struct delay_daemon_data { lnet_delay_rule_check(void) { struct lnet_delay_rule *rule; - struct list_head msgs; + LIST_HEAD(msgs); - INIT_LIST_HEAD(&msgs); while (1) { if (list_empty(&delay_dd.dd_sched_rules)) break; @@ -886,14 +883,11 @@ struct delay_daemon_data { { struct lnet_delay_rule *rule; struct lnet_delay_rule *tmp; - struct list_head rule_list; - struct list_head msg_list; + LIST_HEAD(rule_list); + LIST_HEAD(msg_list); int n = 0; bool cleanup; - INIT_LIST_HEAD(&rule_list); - INIT_LIST_HEAD(&msg_list); - if (shutdown) { src = 0; dst = 0; diff --git 
a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 0d7fbd4..b76ff94 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1912,9 +1912,7 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp) { struct lnet_msg *msg, *tmp; int rc = 0; - struct list_head pending_msgs; - - INIT_LIST_HEAD(&pending_msgs); + LIST_HEAD(pending_msgs); CDEBUG(D_NET, "Discovery complete. Dequeue peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); @@ -3238,11 +3236,9 @@ static int lnet_peer_discovery_wait_for_work(void) static void lnet_resend_msgs(void) { struct lnet_msg *msg, *tmp; - struct list_head resend; + LIST_HEAD(resend); int rc; - INIT_LIST_HEAD(&resend); - spin_lock(&the_lnet.ln_msg_resend_lock); list_splice(&the_lnet.ln_msg_resend, &resend); spin_unlock(&the_lnet.ln_msg_resend_lock); diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 7ba406a..69df212 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -717,19 +717,16 @@ static void lnet_shuffle_seed(void) int lnet_del_route(u32 net, lnet_nid_t gw_nid) { - struct list_head rnet_zombies; + LIST_HEAD(rnet_zombies); struct lnet_remotenet *rnet; struct lnet_remotenet *tmp; struct list_head *rn_list; struct lnet_peer_ni *lpni; struct lnet_route *route; - struct list_head zombies; + LIST_HEAD(zombies); struct lnet_peer *lp = NULL; int i = 0; - INIT_LIST_HEAD(&rnet_zombies); - INIT_LIST_HEAD(&zombies); - CDEBUG(D_NET, "Del route: net %s : gw %s\n", libcfs_net2str(net), libcfs_nid2str(gw_nid)); @@ -1152,14 +1149,12 @@ bool lnet_router_checker_active(void) lnet_rtrpool_free_bufs(struct lnet_rtrbufpool *rbp, int cpt) { int npages = rbp->rbp_npages; - struct list_head tmp; + LIST_HEAD(tmp); struct lnet_rtrbuf *rb; if (!rbp->rbp_nbuffers) /* not initialized or already freed */ return; - INIT_LIST_HEAD(&tmp); - lnet_net_lock(cpt); list_splice_init(&rbp->rbp_msgs, &tmp); lnet_drop_routed_msgs_locked(&tmp, cpt); @@ -1181,7 +1176,7 @@ bool lnet_router_checker_active(void) static int 
lnet_rtrpool_adjust_bufs(struct lnet_rtrbufpool *rbp, int nbufs, int cpt) { - struct list_head rb_list; + LIST_HEAD(rb_list); struct lnet_rtrbuf *rb; int num_rb; int num_buffers = 0; @@ -1213,8 +1208,6 @@ bool lnet_router_checker_active(void) rbp->rbp_req_nbuffers = nbufs; lnet_net_unlock(cpt); - INIT_LIST_HEAD(&rb_list); - /* * allocate the buffers on a local list first. If all buffers are * allocated successfully then join this list to the rbp buffer diff --git a/net/lnet/selftest/console.c b/net/lnet/selftest/console.c index 9f32c1f..cc2c61d 100644 --- a/net/lnet/selftest/console.c +++ b/net/lnet/selftest/console.c @@ -1484,12 +1484,10 @@ static void lstcon_group_ndlink_release(struct lstcon_group *, lstcon_ndlist_stat(struct list_head *ndlist, int timeout, struct list_head __user *result_up) { - struct list_head head; + LIST_HEAD(head); struct lstcon_rpc_trans *trans; int rc; - INIT_LIST_HEAD(&head); - rc = lstcon_rpc_trans_ndlist(ndlist, &head, LST_TRANS_STATQRY, NULL, NULL, &trans); if (rc) { From patchwork Thu Feb 27 21:18:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410849 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EF481580 for ; Thu, 27 Feb 2020 21:48:18 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7779324690 for ; Thu, 27 Feb 2020 21:48:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7779324690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AD2AB34B7BB; Thu, 27 Feb 2020 13:38:31 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3CCB1348A3D for ; Thu, 27 Feb 2020 13:21:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8A0C0A166; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 88B0746C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:07 -0500 Message-Id: <1582838290-17243-620-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 619/622] lustre: lustre: use LIST_HEAD() for local lists. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown When declaring a local list head, instead of struct list_head list; INIT_LIST_HEAD(&list); use LIST_HEAD(list); which does both steps. 
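For illustration only, the equivalence of the two forms can be sketched in userspace C. The definitions below are simplified stand-ins for the kernel's <linux/list.h> (the real INIT_LIST_HEAD() uses WRITE_ONCE()), and the two *_is_empty() helpers are hypothetical names that are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for <linux/list.h>; not the kernel code. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Old style: declare first, initialise in a separate statement. */
int old_style_is_empty(void)
{
	struct list_head list;

	INIT_LIST_HEAD(&list);
	return list_empty(&list) && list.prev == &list;
}

/* New style: LIST_HEAD() declares and initialises in one step. */
int new_style_is_empty(void)
{
	LIST_HEAD(list);

	return list_empty(&list) && list.prev == &list;
}
```

Both helpers see an empty list whose next and prev point back at the head. Since LIST_HEAD() is a declaration, a list that has to start empty on every loop iteration needs to be declared inside the loop body.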
WC-bug-id: https://jira.whamcloud.com/browse/LU-9679 Lustre-commit: 0098396983e1 ("LU-9679 lustre: use LIST_HEAD() for local lists.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/36955 Reviewed-by: Shaun Tancheff Reviewed-by: Andreas Dilger Reviewed-by: Arshad Hussain Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 3 +-- fs/lustre/obdclass/lu_object.c | 6 ++---- fs/lustre/obdclass/obd_mount.c | 3 +-- fs/lustre/ptlrpc/client.c | 3 +-- fs/lustre/ptlrpc/service.c | 3 +-- 5 files changed, 6 insertions(+), 12 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 325005d..b19a1bd 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -1885,7 +1885,7 @@ int lprocfs_wr_nosquash_nids(const char __user *buffer, unsigned long count, struct root_squash_info *squash, char *name) { char *kernbuf = NULL, *errmsg; - struct list_head tmp; + LIST_HEAD(tmp); int len = count; int rc; @@ -1924,7 +1924,6 @@ int lprocfs_wr_nosquash_nids(const char __user *buffer, unsigned long count, return count; } - INIT_LIST_HEAD(&tmp); if (cfs_parse_nidlist(kernbuf, count, &tmp) <= 0) { errmsg = "can't parse"; rc = -EINVAL; diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index 7ea9948..e328f89 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -361,7 +361,7 @@ static void lu_object_free(const struct lu_env *env, struct lu_object *o) struct lu_site *site; struct lu_object *scan; struct list_head *layers; - struct list_head splice; + LIST_HEAD(splice); site = o->lo_dev->ld_site; layers = &o->lo_header->loh_layers; @@ -380,7 +380,6 @@ static void lu_object_free(const struct lu_env *env, struct lu_object *o) * necessary, because lu_object_header is freed together with the * top-level slice. 
*/ - INIT_LIST_HEAD(&splice); list_splice_init(layers, &splice); while (!list_empty(&splice)) { /* @@ -408,7 +407,7 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, struct lu_object_header *h; struct lu_object_header *temp; struct lu_site_bkt_data *bkt; - struct list_head dispose; + LIST_HEAD(dispose); int did_sth; unsigned int start = 0; int count; @@ -418,7 +417,6 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s, if (OBD_FAIL_CHECK(OBD_FAIL_OBD_NO_LRU)) return 0; - INIT_LIST_HEAD(&dispose); /* * Under LRU list lock, scan LRU list and move unreferenced objects to * the dispose list, removing them from LRU and hash table. diff --git a/fs/lustre/obdclass/obd_mount.c b/fs/lustre/obdclass/obd_mount.c index 31f2f5b..206edde 100644 --- a/fs/lustre/obdclass/obd_mount.c +++ b/fs/lustre/obdclass/obd_mount.c @@ -982,7 +982,7 @@ static bool lmd_find_delimiter(char *buf, char **endh) */ static int lmd_parse_nidlist(char *buf, char **endh) { - struct list_head nidlist; + LIST_HEAD(nidlist); char *endp = buf; int rc = 0; char tmp; @@ -1000,7 +1000,6 @@ static int lmd_parse_nidlist(char *buf, char **endh) tmp = *endp; *endp = '\0'; - INIT_LIST_HEAD(&nidlist); if (cfs_parse_nidlist(buf, strlen(buf), &nidlist) <= 0) rc = 1; cfs_free_nidlist(&nidlist); diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 1714e66..424819e 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1715,13 +1715,12 @@ static inline int ptlrpc_set_producer(struct ptlrpc_request_set *set) int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) { struct ptlrpc_request *req, *next; - struct list_head comp_reqs; + LIST_HEAD(comp_reqs); int force_timer_recalc = 0; if (atomic_read(&set->set_remaining) == 0) return 1; - INIT_LIST_HEAD(&comp_reqs); list_for_each_entry_safe(req, next, &set->set_requests, rq_set_chain) { struct obd_import *imp = req->rq_import; int unregistered = 0; diff --git 
a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index f65d5c5..b10c61b 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1211,7 +1211,7 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) { struct ptlrpc_at_array *array = &svcpt->scp_at_array; struct ptlrpc_request *rq, *n; - struct list_head work_list; + LIST_HEAD(work_list); u32 index, count; time64_t deadline; time64_t now = ktime_get_real_seconds(); @@ -1244,7 +1244,6 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) * We're close to a timeout, and we don't know how much longer the * server will take. Send early replies to everyone expiring soon. */ - INIT_LIST_HEAD(&work_list); deadline = -1; div_u64_rem(array->paa_deadline, array->paa_size, &index); count = array->paa_count; From patchwork Thu Feb 27 21:18:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 998361580 for ; Thu, 27 Feb 2020 21:48:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 81D7C24690 for ; Thu, 27 Feb 2020 21:48:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 81D7C24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CA39034A3ED; Thu, 27 Feb 2020 13:38:34 -0800 (PST) X-Original-To: 
lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9278721FDDE for ; Thu, 27 Feb 2020 13:21:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8CFA0A167; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8B6E047C; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:08 -0500 Message-Id: <1582838290-17243-621-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 620/622] lustre: handle: discard h_lock. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: NeilBrown The h_lock spinlock is now only taken while bucket->lock is held. As a handle is associated with precisely one bucket, this means that h_lock can never be contended, so it isn't needed. So discard h_lock. Also discard an increasingly irrelevant comment in the declaration of struct portals_handle. 
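A minimal userspace sketch of the locking argument, assuming pthread mutexes in place of kernel spinlocks and a plain flag in place of the hlist linkage. The struct and function names here are hypothetical and only mirror the shape of the handle code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hash bucket: one lock covers its handles. */
struct bucket {
	pthread_mutex_t lock;
	int nr_hashed;
};

/* Hypothetical stand-in for a handle: it belongs to exactly one bucket
 * and carries no lock of its own.
 */
struct handle {
	struct bucket *bucket;
	bool hashed;
};

void handle_hash(struct handle *h, struct bucket *b)
{
	h->bucket = b;
	pthread_mutex_lock(&b->lock);
	h->hashed = true;
	b->nr_hashed++;
	pthread_mutex_unlock(&b->lock);
}

/* Caller must hold h->bucket->lock. Every path that touches h->hashed
 * already holds that lock, so a per-handle lock would never be contended.
 */
static void handle_unhash_nolock(struct handle *h)
{
	if (!h->hashed)		/* a repeated unhash is a no-op */
		return;
	h->hashed = false;
	h->bucket->nr_hashed--;
}

void handle_unhash(struct handle *h)
{
	pthread_mutex_lock(&h->bucket->lock);
	handle_unhash_nolock(h);
	pthread_mutex_unlock(&h->bucket->lock);
}
```

The repeated-unhash guard in the sketch corresponds to the check that hlist_del_init_rcu() performs itself, which is why the explicit hlist_unhashed() test can go away together with the lock.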
WC-bug-id: https://jira.whamcloud.com/browse/LU-12542 Lustre-commit: 6acafe7ac4ef ("LU-12542 handle: discard h_lock.") Signed-off-by: NeilBrown Reviewed-on: https://review.whamcloud.com/35863 Reviewed-by: Neil Brown Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_handles.h | 3 --- fs/lustre/obdclass/lustre_handles.c | 7 ------- 2 files changed, 10 deletions(-) diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h index afdade7..9dbe7c9 100644 --- a/fs/lustre/include/lustre_handles.h +++ b/fs/lustre/include/lustre_handles.h @@ -62,10 +62,7 @@ struct portals_handle { u64 h_cookie; const char *h_owner; refcount_t h_ref; - - /* newly added fields to handle the RCU issue. -jxiong */ struct rcu_head h_rcu; - spinlock_t h_lock; }; /* handles.c */ diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c index 0048036..7ecd15ad3 100644 --- a/fs/lustre/obdclass/lustre_handles.c +++ b/fs/lustre/obdclass/lustre_handles.c @@ -85,7 +85,6 @@ void class_handle_hash(struct portals_handle *h, const char *owner) spin_unlock(&handle_base_lock); h->h_owner = owner; - spin_lock_init(&h->h_lock); bucket = &handle_hash[h->h_cookie & HANDLE_HASH_MASK]; spin_lock(&bucket->lock); @@ -108,13 +107,7 @@ static void class_handle_unhash_nolock(struct portals_handle *h) CDEBUG(D_INFO, "removing object %p with handle %#llx from hash\n", h, h->h_cookie); - spin_lock(&h->h_lock); - if (hlist_unhashed(&h->h_link)) { - spin_unlock(&h->h_lock); - return; - } hlist_del_init_rcu(&h->h_link); - spin_unlock(&h->h_lock); } void class_handle_unhash(struct portals_handle *h) From patchwork Thu Feb 27 21:18:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410853 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2C89A924 for ; Thu, 27 Feb 2020 21:48:24 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 149CE24690 for ; Thu, 27 Feb 2020 21:48:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 149CE24690 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 868C534B7E4; Thu, 27 Feb 2020 13:38:38 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D30E921FDDE for ; Thu, 27 Feb 2020 13:21:32 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 8FD78A168; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8E3EF46A; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:09 -0500 Message-Id: <1582838290-17243-622-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 621/622] lnet: remove lnd_query interface. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown The ->lnd_query interface is completely unused, and has been since commit 8e498d3f23ea ("LU-11300 lnet: peer aliveness") So remove all mention of it. Fixes: 5cdf0e31a7a9 ("lnet: peer aliveness") WC-bug-id: https://jira.whamcloud.com/browse/LU-11300 Lustre-commit: 0d816af574b7 ("LU-11300 lnet: remove lnd_query interface.") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/37337 Reviewed-by: Chris Horn Reviewed-by: Serguei Smirnov Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 3 -- net/lnet/klnds/o2iblnd/o2iblnd.c | 32 ------------------- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 2 +- net/lnet/klnds/socklnd/socklnd.c | 62 ------------------------------------- net/lnet/lnet/api-ni.c | 3 -- 5 files changed, 1 insertion(+), 101 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 3345940..e885131 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -298,9 +298,6 @@ struct lnet_lnd { /* notification of peer down */ void (*lnd_notify_peer_down)(lnet_nid_t peer); - /* query of peer aliveness */ - void (*lnd_query)(struct lnet_ni *ni, lnet_nid_t peer, time64_t *when); - /* accept a new connection */ int (*lnd_accept)(struct lnet_ni *ni, struct socket *sock); }; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 7bf2883..196ea4d 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -1128,37 +1128,6 @@ static int kiblnd_ctl(struct lnet_ni *ni, unsigned int cmd, void *arg) return rc; } -static void kiblnd_query(struct lnet_ni *ni, lnet_nid_t nid, time64_t *when) -{ - time64_t last_alive = 0; - time64_t now = ktime_get_seconds(); - rwlock_t 
*glock = &kiblnd_data.kib_global_lock; - struct kib_peer_ni *peer_ni; - unsigned long flags; - - read_lock_irqsave(glock, flags); - - peer_ni = kiblnd_find_peer_locked(ni, nid); - if (peer_ni) - last_alive = peer_ni->ibp_last_alive; - - read_unlock_irqrestore(glock, flags); - - if (last_alive) - *when = last_alive; - - /* - * peer_ni is not persistent in hash, trigger peer_ni creation - * and connection establishment with a NULL tx - */ - if (!peer_ni) - kiblnd_launch_tx(ni, NULL, nid); - - CDEBUG(D_NET, "peer_ni %s %p, alive %lld secs ago\n", - libcfs_nid2str(nid), peer_ni, - last_alive ? now - last_alive : -1); -} - static void kiblnd_free_pages(struct kib_pages *p) { int npages = p->ibp_npages; @@ -3125,7 +3094,6 @@ static int kiblnd_startup(struct lnet_ni *ni) .lnd_startup = kiblnd_startup, .lnd_shutdown = kiblnd_shutdown, .lnd_ctl = kiblnd_ctl, - .lnd_query = kiblnd_query, .lnd_send = kiblnd_send, .lnd_recv = kiblnd_recv, }; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 67780d0..087657c 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -2684,7 +2684,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, * attempts (active or passive) are in progress * NB: reconnect is still needed even when ibp_tx_queue is * empty if ibp_version != version because reconnect may be - * initiated by kiblnd_query() + * initiated. 
*/ reconnect = (!list_empty(&peer_ni->ibp_tx_queue) || peer_ni->ibp_version != version) && diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 7abb75a..d967958 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1789,67 +1789,6 @@ struct ksock_peer_ni * */ } -void -ksocknal_query(struct lnet_ni *ni, lnet_nid_t nid, time64_t *when) -{ - int connect = 1; - time64_t last_alive = 0; - time64_t now = ktime_get_seconds(); - struct ksock_peer_ni *peer_ni = NULL; - rwlock_t *glock = &ksocknal_data.ksnd_global_lock; - struct lnet_process_id id = { - .nid = nid, - .pid = LNET_PID_LUSTRE, - }; - - read_lock(glock); - - peer_ni = ksocknal_find_peer_locked(ni, id); - if (peer_ni) { - struct ksock_conn *conn; - int bufnob; - - list_for_each_entry(conn, &peer_ni->ksnp_conns, ksnc_list) { - bufnob = conn->ksnc_sock->sk->sk_wmem_queued; - - if (bufnob < conn->ksnc_tx_bufnob) { - /* something got ACKed */ - conn->ksnc_tx_deadline = ktime_get_seconds() + - lnet_get_lnd_timeout(); - peer_ni->ksnp_last_alive = now; - conn->ksnc_tx_bufnob = bufnob; - } - } - - last_alive = peer_ni->ksnp_last_alive; - if (!ksocknal_find_connectable_route_locked(peer_ni)) - connect = 0; - } - - read_unlock(glock); - - if (last_alive) - *when = last_alive * HZ; - - CDEBUG(D_NET, "peer_ni %s %p, alive %lld secs ago, connect %d\n", - libcfs_nid2str(nid), peer_ni, - last_alive ? 
now - last_alive : -1, - connect); - - if (!connect) - return; - - ksocknal_add_peer(ni, id, LNET_NIDADDR(nid), lnet_acceptor_port()); - - write_lock_bh(glock); - - peer_ni = ksocknal_find_peer_locked(ni, id); - if (peer_ni) - ksocknal_launch_all_connections_locked(peer_ni); - - write_unlock_bh(glock); -} - static void ksocknal_push_peer(struct ksock_peer_ni *peer_ni) { @@ -2775,7 +2714,6 @@ static void __exit ksocklnd_exit(void) .lnd_send = ksocknal_send, .lnd_recv = ksocknal_recv, .lnd_notify_peer_down = ksocknal_notify_gw_down, - .lnd_query = ksocknal_query, .lnd_accept = ksocknal_accept, }; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 8f59266..ea23471 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -2304,9 +2304,6 @@ static void lnet_push_target_fini(void) if (rc < 0) goto failed1; - LASSERT(ni->ni_net->net_tunables.lct_peer_timeout <= 0 || - ni->ni_net->net_lnd->lnd_query); - lnet_ni_addref(ni); list_add_tail(&ni->ni_netlist, &local_ni_list); From patchwork Thu Feb 27 21:18:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410591 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CDD1B924 for ; Thu, 27 Feb 2020 21:41:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B69F5246A1 for ; Thu, 27 Feb 2020 21:41:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B69F5246A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from 
pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6D20D349DF5; Thu, 27 Feb 2020 13:34:03 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 35FC021FDD4 for ; Thu, 27 Feb 2020 13:21:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 91E00A169; Thu, 27 Feb 2020 16:18:20 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 90E6D46D; Thu, 27 Feb 2020 16:18:20 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:18:10 -0500 Message-Id: <1582838290-17243-623-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 622/622] lnet: use conservative health timeouts X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Use more conservative lnet_transaction_timeout and lnet_retry_count values by default. Currently with timeout=10 and retry=3 there is only a 3s window for the RPC to be sent before it is timed out. This has caused fault injection rather than fault tolerance. Increase the default timeout to 50s with retry=2, which is hopefully long enough to cover virtually all uses, but still allows LNet Health to be enabled by default and resend before Lustre times out itself. 
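As a rough worked example of the numbers above: the sketch assumes the per-attempt window is the transaction timeout divided by the retry count using integer division. That rule is an assumption chosen because it reproduces the 3s figure quoted here; the exact computation is not part of this patch, and per_attempt_window() is a hypothetical helper.

```c
#include <assert.h>

/* Assumed rule (not taken from this patch): each send attempt gets
 * transaction_timeout / retry_count seconds, or the whole timeout
 * when retries are disabled.
 */
unsigned int per_attempt_window(unsigned int transaction_timeout,
				unsigned int retry_count)
{
	if (retry_count == 0)
		return transaction_timeout;
	return transaction_timeout / retry_count;
}
```

Under that assumption the old defaults (timeout=10, retry=3) leave 3 seconds per attempt, while the new defaults (timeout=50, retry=2) leave 25 seconds.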
Fixes: d24c948e4467 ("lnet: setup health timeout defaults") WC-bug-id: https://jira.whamcloud.com/browse/LU-13145 Lustre-commit: 361e9eaef13c ("LU-13145 lnet: use conservative health timeouts") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/37430 Reviewed-by: Serguei Smirnov Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index ea23471..10ade73 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -141,7 +141,7 @@ static int recovery_interval_set(const char *val, "Set to 1 to drop asymmetrical route messages."); #define LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT 50 -#define LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT 10 +#define LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT 50 unsigned int lnet_transaction_timeout = LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT; static int transaction_to_set(const char *val, const struct kernel_param *kp); @@ -156,7 +156,7 @@ static int recovery_interval_set(const char *val, MODULE_PARM_DESC(lnet_transaction_timeout, "Maximum number of seconds to wait for a peer response."); -#define LNET_RETRY_COUNT_HEALTH_DEFAULT 3 +#define LNET_RETRY_COUNT_HEALTH_DEFAULT 2 unsigned int lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT; static int retry_count_set(const char *val, const struct kernel_param *kp); static struct kernel_param_ops param_ops_retry_count = {