From patchwork Tue Sep 6 01:55:14 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966722
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:14 -0400
Message-Id: <1662429337-18737-2-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 01/24] lustre: sec: new connect flag for name encryption
Cc: Lustre Development List

From: Sebastien Buisson

Introduce the OBD_CONNECT2_ENCRYPT_NAME connection flag for
compatibility with older versions that do not support name encryption.
When the server side does not advertise this flag, the client is forced
to use null encryption for file names, and it needs to use the old
xattr to store the encryption context.
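[Editorial note: for orientation, a minimal userspace sketch of the
negotiation described above. It mirrors the obd_connect_has_name_enc()
and xattr_for_enc() helpers added in the diff below, but the struct
layout, the OBD_CONNECT_FLAGS2 value and the xattr strings are
simplified stand-ins, not the real Lustre definitions.]

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OBD_CONNECT_FLAGS2        0x8000000000000000ULL /* illustrative */
#define OBD_CONNECT2_ENCRYPT_NAME 0x8000000ULL          /* from the diff */

struct connect_data {
	uint64_t ocd_connect_flags;
	uint64_t ocd_connect_flags2;
};

/* Name encryption is usable only if the server set both flag words. */
static bool has_name_enc(const struct connect_data *ocd)
{
	return (ocd->ocd_connect_flags & OBD_CONNECT_FLAGS2) &&
	       (ocd->ocd_connect_flags2 & OBD_CONNECT2_ENCRYPT_NAME);
}

/* Choose where the encryption context xattr lives; the real names are
 * LL_XATTR_NAME_ENCRYPTION_CONTEXT and ..._OLD, values elided here. */
static const char *enc_ctx_xattr(const struct connect_data *ocd)
{
	return has_name_enc(ocd) ? "new.enc.ctx.xattr" : "old.enc.ctx.xattr";
}

int main(void)
{
	struct connect_data old_server = { OBD_CONNECT_FLAGS2, 0 };

	/* old server: null name encryption, old xattr for the context */
	printf("name enc: %d, xattr: %s\n",
	       has_name_enc(&old_server), enc_ctx_xattr(&old_server));
	return 0;
}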
Fixes: 860818695d ("lustre: sec: filename encryption - digest support") WC-bug-id: https://jira.whamcloud.com/browse/LU-15922 Lustre-commit: 71d63602c57e6d0cb ("LU-15922 sec: new connect flag for name encryption") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/47574 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_crypto.h | 2 ++ fs/lustre/llite/crypto.c | 19 ++++++++++++++++--- fs/lustre/llite/llite_internal.h | 30 +++++++++++++++++++++++++++--- fs/lustre/llite/llite_lib.c | 14 +++++++++++++- fs/lustre/llite/namei.c | 2 +- fs/lustre/llite/statahead.c | 2 +- fs/lustre/llite/symlink.c | 2 +- fs/lustre/llite/xattr_cache.c | 8 ++++---- fs/lustre/obdclass/lprocfs_status.c | 1 + fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 11 files changed, 69 insertions(+), 14 deletions(-) diff --git a/fs/lustre/include/lustre_crypto.h b/fs/lustre/include/lustre_crypto.h index c31cc1e..2252798 100644 --- a/fs/lustre/include/lustre_crypto.h +++ b/fs/lustre/include/lustre_crypto.h @@ -44,6 +44,8 @@ int ll_set_encflags(struct inode *inode, void *encctx, u32 encctxlen, bool ll_sb_has_test_dummy_encryption(struct super_block *sb); bool ll_sbi_has_encrypt(struct ll_sb_info *sbi); void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set); +bool ll_sbi_has_name_encrypt(struct ll_sb_info *sbi); +void ll_sbi_set_name_encrypt(struct ll_sb_info *sbi, bool set); #else static inline int ll_set_encflags(struct inode *inode, void *encctx, u32 encctxlen, bool preload) diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c index ad045c3..d6750fb 100644 --- a/fs/lustre/llite/crypto.c +++ b/fs/lustre/llite/crypto.c @@ -36,7 +36,7 @@ static int ll_get_context(struct inode *inode, void *ctx, size_t len) /* Get enc context xattr directly instead of going through the VFS, * as there is no xattr handler for "encryption.". */ - rc = ll_xattr_list(inode, LL_XATTR_NAME_ENCRYPTION_CONTEXT, + rc = ll_xattr_list(inode, xattr_for_enc(inode), XATTR_ENCRYPTION_T, ctx, len, OBD_MD_FLXATTR); /* used as encryption unit size */ @@ -59,7 +59,7 @@ int ll_set_encflags(struct inode *inode, void *encctx, u32 encctxlen, if (encctx && encctxlen) rc = ll_xattr_cache_insert(inode, - LL_XATTR_NAME_ENCRYPTION_CONTEXT, + xattr_for_enc(inode), encctx, encctxlen); if (rc) return rc; @@ -108,7 +108,7 @@ static int ll_set_context(struct inode *inode, const void *ctx, size_t len, * through the VFS, as there is no xattr handler for "encryption.". 
*/ rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode), - OBD_MD_FLXATTR, LL_XATTR_NAME_ENCRYPTION_CONTEXT, + OBD_MD_FLXATTR, xattr_for_enc(inode), ctx, len, XATTR_CREATE, ll_i2suppgid(inode), &req); if (rc) return rc; @@ -173,6 +173,19 @@ void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set) } } +bool ll_sbi_has_name_encrypt(struct ll_sb_info *sbi) +{ + return test_bit(LL_SBI_ENCRYPT_NAME, sbi->ll_flags); +} + +void ll_sbi_set_name_encrypt(struct ll_sb_info *sbi, bool set) +{ + if (set) + set_bit(LL_SBI_ENCRYPT_NAME, sbi->ll_flags); + else + clear_bit(LL_SBI_ENCRYPT_NAME, sbi->ll_flags); +} + static bool ll_empty_dir(struct inode *inode) { /* used by fscrypt_ioctl_set_policy(), because a policy can only be set diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 2139f88..b018515 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -468,11 +468,28 @@ static inline bool obd_connect_has_enc(struct obd_connect_data *data) static inline void obd_connect_set_enc(struct obd_connect_data *data) { -#ifdef HAVE_LUSTRE_CRYPTO +#ifdef CONFIG_FS_ENCRYPTION data->ocd_connect_flags2 |= OBD_CONNECT2_ENCRYPT; #endif } +static inline bool obd_connect_has_name_enc(struct obd_connect_data *data) +{ +#ifdef CONFIG_FS_ENCRYPTION + return data->ocd_connect_flags & OBD_CONNECT_FLAGS2 && + data->ocd_connect_flags2 & OBD_CONNECT2_ENCRYPT_NAME; +#else + return false; +#endif +} + +static inline void obd_connect_set_name_enc(struct obd_connect_data *data) +{ +#ifdef CONFIG_FS_ENCRYPTION + data->ocd_connect_flags2 |= OBD_CONNECT2_ENCRYPT_NAME; +#endif +} + /* * Locking to guarantee consistency of non-atomic updates to long long i_size, * consistency between file size and KMS. @@ -639,6 +656,7 @@ enum ll_sbi_flags { LL_SBI_TINY_WRITE, /* tiny write support */ LL_SBI_FILE_HEAT, /* file heat support */ LL_SBI_PARALLEL_DIO, /* parallel (async) O_DIRECT RPCs */ + LL_SBI_ENCRYPT_NAME, /* name encryption */ LL_SBI_NUM_FLAGS }; @@ -1727,7 +1745,6 @@ static inline struct pcc_super *ll_info2pccs(struct ll_inode_info *lli) } /* crypto.c */ -#ifdef CONFIG_FS_ENCRYPTION /* The digested form is made of a FID (16 bytes) followed by the second-to-last * ciphertext block (16 bytes), so a total length of 32 bytes. * That way, fscrypt does not compute a digested form of this digest. 
@@ -1737,6 +1754,14 @@ struct ll_digest_filename { char ldf_excerpt[FS_CRYPTO_BLOCK_SIZE]; }; +static inline char *xattr_for_enc(struct inode *inode) +{ + if (ll_sbi_has_name_encrypt(ll_i2sbi(inode))) + return LL_XATTR_NAME_ENCRYPTION_CONTEXT; + + return LL_XATTR_NAME_ENCRYPTION_CONTEXT_OLD; +} +#ifdef CONFIG_FS_ENCRYPTION int ll_setup_filename(struct inode *dir, const struct qstr *iname, int lookup, struct fscrypt_name *fname, struct lu_fid *fid); @@ -1752,7 +1777,6 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname, { return fscrypt_setup_filename(dir, iname, lookup, fname); } - int ll_fname_disk_to_usr(struct inode *inode, u32 hash, u32 minor_hash, struct fscrypt_str *iname, struct fscrypt_str *oname) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index dee2e51..5daced0 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -173,6 +173,7 @@ static struct ll_sb_info *ll_init_sbi(void) set_bit(LL_SBI_TINY_WRITE, sbi->ll_flags); set_bit(LL_SBI_PARALLEL_DIO, sbi->ll_flags); ll_sbi_set_encrypt(sbi, true); + ll_sbi_set_name_encrypt(sbi, true); /* root squash */ sbi->ll_squash.rsi_uid = 0; @@ -349,8 +350,10 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_connect_flags &= ~OBD_CONNECT_PINGLESS; obd_connect_set_secctx(data); - if (ll_sbi_has_encrypt(sbi)) + if (ll_sbi_has_encrypt(sbi)) { + obd_connect_set_name_enc(data); obd_connect_set_enc(data); + } #if defined(CONFIG_SECURITY) data->ocd_connect_flags2 |= OBD_CONNECT2_SELINUX_POLICY; @@ -482,6 +485,14 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) ll_sbi_set_encrypt(sbi, false); } + if (ll_sbi_has_name_encrypt(sbi) && !obd_connect_has_name_enc(data)) { + if (ll_sb_has_test_dummy_encryption(sb)) + LCONSOLE_WARN("%s: server %s does not support name encryption, not using it.\n", + sbi->ll_fsname, + sbi->ll_md_exp->exp_obd->obd_name); + ll_sbi_set_name_encrypt(sbi, false); + } + if (data->ocd_ibits_known & MDS_INODELOCK_XATTR) { if (!(data->ocd_connect_flags & OBD_CONNECT_MAX_EASIZE)) { LCONSOLE_INFO("%s: disabling xattr cache due to unknown maximum xattr size.\n", @@ -928,6 +939,7 @@ void ll_kill_super(struct super_block *sb) {LL_SBI_TINY_WRITE, "tiny_write"}, {LL_SBI_FILE_HEAT, "file_heat"}, {LL_SBI_PARALLEL_DIO, "parallel_dio"}, + {LL_SBI_ENCRYPT_NAME, "name_encrypt"}, }; int ll_sbi_flags_seq_show(struct seq_file *m, void *v) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 2215dd8..a08b1c1 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -668,7 +668,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, "server returned encryption ctx for " DFID "\n", PFID(ll_inode2fid(inode))); rc = ll_xattr_cache_insert(inode, - LL_XATTR_NAME_ENCRYPTION_CONTEXT, + xattr_for_enc(inode), encctx, encctxlen); if (rc) { CWARN("%s: cannot set enc ctx for " DFID ": rc = %d\n", diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 3043a51..c6779eb 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -682,7 +682,7 @@ static void sa_instantiate(struct ll_statahead_info *sai, "server returned encryption ctx for "DFID"\n", PFID(ll_inode2fid(child))); rc = ll_xattr_cache_insert(child, - LL_XATTR_NAME_ENCRYPTION_CONTEXT, + xattr_for_enc(child), encctx, encctxlen); if (rc) CWARN("%s: cannot set enc ctx for "DFID": rc = %d\n", diff --git a/fs/lustre/llite/symlink.c b/fs/lustre/llite/symlink.c index 8ea16bb..35a8574 100644 --- 
a/fs/lustre/llite/symlink.c +++ b/fs/lustre/llite/symlink.c @@ -38,7 +38,7 @@ #include "llite_internal.h" /* Must be called with lli_size_mutex locked */ -/* HAVE_IOP_GET_LINK is defined from kernel 4.5, whereas +/* iop->get_link is defined from kernel 4.5, whereas * IS_ENCRYPTED is brought by kernel 4.14. * So there is no need to handle encryption case otherwise. */ diff --git a/fs/lustre/llite/xattr_cache.c b/fs/lustre/llite/xattr_cache.c index 7e5b807..ae59806 100644 --- a/fs/lustre/llite/xattr_cache.c +++ b/fs/lustre/llite/xattr_cache.c @@ -109,7 +109,8 @@ static int ll_xattr_cache_add(struct list_head *cache, struct ll_xattr_entry *xattr; if (ll_xattr_cache_find(cache, xattr_name, &xattr) == 0) { - if (!strcmp(xattr_name, LL_XATTR_NAME_ENCRYPTION_CONTEXT)) + if (!strcmp(xattr_name, LL_XATTR_NAME_ENCRYPTION_CONTEXT) || + !strcmp(xattr_name, LL_XATTR_NAME_ENCRYPTION_CONTEXT_OLD)) /* it means enc ctx was already in cache, * ignore error as it cannot be modified */ @@ -288,8 +289,7 @@ int ll_xattr_cache_empty(struct inode *inode) goto out_empty; list_for_each_entry_safe(entry, n, &lli->lli_xattrs, xe_list) { - if (strcmp(entry->xe_name, - LL_XATTR_NAME_ENCRYPTION_CONTEXT) == 0) + if (strcmp(entry->xe_name, xattr_for_enc(inode)) == 0) continue; CDEBUG(D_CACHE, "delete: %s\n", entry->xe_name); @@ -534,7 +534,7 @@ int ll_xattr_cache_get(struct inode *inode, const char *name, char *buffer, * cache if we are just interested in encryption context. */ if ((valid & OBD_MD_FLXATTRLS || - strcmp(name, LL_XATTR_NAME_ENCRYPTION_CONTEXT) != 0) && + strcmp(name, xattr_for_enc(inode)) != 0) && !ll_xattr_cache_filled(lli)) { up_read(&lli->lli_xattrs_list_rwsem); rc = ll_xattr_cache_refill(inode); diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 335e748..d80e7bd 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -137,6 +137,7 @@ "mne_nid_type", /* 0x1000000 */ "lock_contend", /* 0x2000000 */ "atomic_open_lock", /* 0x4000000 */ + "name_encryption", /* 0x8000000 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 60a7fd0..66c7c17 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1241,6 +1241,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_PCCRO); LASSERTF(OBD_CONNECT2_ATOMIC_OPEN_LOCK == 0x4000000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_ATOMIC_OPEN_LOCK); + LASSERTF(OBD_CONNECT2_ENCRYPT_NAME == 0x8000000ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_ENCRYPT_NAME); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 37db3ee..319dc81d 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -809,6 +809,7 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_BATCH_RPC 0x400000ULL /* Multi-RPC batch request */ #define OBD_CONNECT2_PCCRO 0x800000ULL /* Read-only PCC */ #define OBD_CONNECT2_ATOMIC_OPEN_LOCK 0x4000000ULL/* request lock on 1st open */ +#define OBD_CONNECT2_ENCRYPT_NAME 0x8000000ULL /* name encrypt */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same * flag value is not in use on some other branch. 
Please clear any such

From patchwork Tue Sep 6 01:55:15 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966725
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:15 -0400
Message-Id: <1662429337-18737-3-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 02/24] lustre: lmv: always space-balance r-r directories
Cc: Lai Siyao, Lustre Development List

From: Lai Siyao

If the MDT free space is imbalanced, use QOS space balancing for
round-robin subdirectory creation, regardless of the depth of the
directory tree. Otherwise, new subdirectories created in parents with
round-robin default layout may suddenly become "sticky" on the parent
MDT and upset the space balancing and load distribution.
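[Editorial note: the depth-scaled "stay local" threshold applied in the
diff below can be checked standalone. A small plain-C sketch, not
Lustre code; a parent MDT keeps new subdirectories only while its free
space is at least avg * 16 / (depth + 10), and the printed percentages
reproduce the comment table in the patch.]

#include <stdio.h>

int main(void)
{
	int depth;

	for (depth = 0; depth <= 15; depth += 3) {
		/* threshold as a percentage of average MDT free space */
		double pct = 100.0 * 16 / (depth + 10);

		printf("depth=%2d -> stay-local threshold %.0f%% of average\n",
		       depth, pct);
	}
	return 0;
}

[Output matches the patch comment: 160% at depth 0 down to 64% at depth 15.]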
Fixes: a8948860e4 ("lustre: lmv: improve MDT QOS space balance")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15850
Signed-off-by: Lai Siyao
Reviewed-on: https://review.whamcloud.com/47578
Reviewed-by: Andreas Dilger
Reviewed-by: Hongchao Zhang
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/lmv/lmv_obd.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 6c0eb03..0988b1a 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -55,6 +55,7 @@
 #include "lmv_internal.h"
 
 static int lmv_check_connect(struct obd_device *obd);
+static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data);
 
 void lmv_activate_target(struct lmv_obd *lmv, struct lmv_tgt_desc *tgt,
			 int activate)
@@ -1446,8 +1447,8 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
	return md_close(tgt->ltd_exp, op_data, mod, request);
 }
 
-static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 mdt,
-					      unsigned short dir_depth)
+static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv,
+					      struct md_op_data *op_data)
 {
	struct lu_tgt_desc *tgt, *cur = NULL;
	u64 total_avail = 0;
@@ -1481,23 +1482,31 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 mdt,
		tgt->ltd_qos.ltq_usable = 1;
		lu_tgt_qos_weight_calc(tgt);
-		if (tgt->ltd_index == mdt)
+		if (tgt->ltd_index == op_data->op_mds)
			cur = tgt;
		total_avail += tgt->ltd_qos.ltq_avail;
		total_weight += tgt->ltd_qos.ltq_weight;
		total_usable++;
	}
 
-	/* if current MDT has above-average space, within range of the QOS
-	 * threshold, stay on the same MDT to avoid creating needless remote
-	 * MDT directories. It's more likely for low level directories
-	 * "16 / (dir_depth + 10)" is the factor to make it more unlikely for
-	 * top level directories, while more likely for low levels.
+	/* If current MDT has above-average space and dir is not already using
+	 * round-robin to spread across more MDTs, stay on the parent MDT
+	 * to avoid creating needless remote MDT directories. Remote dirs
+	 * close to the root balance space more effectively than bottom dirs,
+	 * so prefer to create remote dirs at top level of directory tree.
+	 * "16 / (dir_depth + 10)" is the factor to make it less likely
+	 * for top-level directories to stay local unless they have more than
+	 * average free space, while deep dirs prefer local until more full.
+	 * depth=0 -> 160%, depth=3 -> 123%, depth=6 -> 100%,
+	 * depth=9 -> 84%, depth=12 -> 73%, depth=15 -> 64%
	 */
-	rand = total_avail * 16 / (total_usable * (dir_depth + 10));
-	if (cur && cur->ltd_qos.ltq_avail >= rand) {
-		tgt = cur;
-		goto unlock;
+	if (!lmv_op_default_rr_mkdir(op_data)) {
+		rand = total_avail * 16 /
+		       (total_usable * (op_data->op_dir_depth + 10));
+		if (cur && cur->ltd_qos.ltq_avail >= rand) {
+			tgt = cur;
+			goto unlock;
+		}
	}
 
	rand = lu_prandom_u64_max(total_weight);
@@ -1836,9 +1845,6 @@ static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data)
 {
	const struct lmv_stripe_md *lsm = op_data->op_default_mea1;
 
-	if (!lmv_op_default_qos_mkdir(op_data))
-		return false;
-
	return (op_data->op_flags & MF_RR_MKDIR) ||
	       (lsm && lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE) ||
	       fid_is_root(&op_data->op_fid1);
@@ -1873,7 +1879,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_by_space(struct lmv_obd *lmv,
 {
	struct lmv_tgt_desc *tmp = tgt;
 
-	tgt = lmv_locate_tgt_qos(lmv, op_data->op_mds, op_data->op_dir_depth);
+	tgt = lmv_locate_tgt_qos(lmv, op_data);
	if (tgt == ERR_PTR(-EAGAIN)) {
		if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) &&
		    !lmv_op_default_rr_mkdir(op_data) &&

From patchwork Tue Sep 6 01:55:16 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966727
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:16 -0400
Message-Id: <1662429337-18737-4-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 03/24] lustre: ldlm: rid of obsolete param of ldlm_resource_get()
Cc: Lustre Development List

From: Bobi Jam

The second parameter @parent of ldlm_resource_get() is obsolete; just
remove it.
WC-bug-id: https://jira.whamcloud.com/browse/LU-8238 Lustre-commit: bab7c8998a6539c0d ("LU-8238 ldlm: rid of obsolete param of ldlm_resource_get()") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/20631 Reviewed-by: James Simmons Reviewed-by: Arshad Hussain Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 1 - fs/lustre/ldlm/ldlm_lock.c | 6 +++--- fs/lustre/ldlm/ldlm_request.c | 4 ++-- fs/lustre/ldlm/ldlm_resource.c | 6 ++---- fs/lustre/mdc/mdc_dev.c | 5 +++-- fs/lustre/mdc/mdc_locks.c | 2 +- fs/lustre/mdc/mdc_reint.c | 2 +- fs/lustre/osc/osc_object.c | 6 +++--- fs/lustre/osc/osc_request.c | 2 +- 9 files changed, 16 insertions(+), 18 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 9286bec..6053e01 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -1287,7 +1287,6 @@ static inline void ldlm_svc_get_eopc(const struct ldlm_request *dlm_req, /* resource.c - internal */ struct ldlm_resource *ldlm_resource_get(struct ldlm_namespace *ns, - struct ldlm_resource *parent, const struct ldlm_res_id *, enum ldlm_type type, int create); struct ldlm_resource *ldlm_resource_getref(struct ldlm_resource *res); diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 4225c0b..39ab2a0 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -447,7 +447,7 @@ int ldlm_lock_change_resource(struct ldlm_namespace *ns, struct ldlm_lock *lock, type = oldres->lr_type; unlock_res_and_lock(lock); - newres = ldlm_resource_get(ns, NULL, new_resid, type, 1); + newres = ldlm_resource_get(ns, new_resid, type, 1); if (IS_ERR(newres)) return PTR_ERR(newres); @@ -1308,7 +1308,7 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns, *data.lmd_mode = data.lmd_old->l_req_mode; } - res = ldlm_resource_get(ns, NULL, res_id, type, 0); + res = ldlm_resource_get(ns, res_id, type, 0); if (IS_ERR(res)) { LASSERT(!data.lmd_old); return 0; @@ -1537,7 +1537,7 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns, struct ldlm_resource *res; int rc; - res = ldlm_resource_get(ns, NULL, res_id, type, 1); + res = ldlm_resource_get(ns, res_id, type, 1); if (IS_ERR(res)) return ERR_CAST(res); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 4ba64b1..f59778e 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1887,7 +1887,7 @@ int ldlm_cli_cancel_unused_resource(struct ldlm_namespace *ns, int count; int rc; - res = ldlm_resource_get(ns, NULL, res_id, 0, 0); + res = ldlm_resource_get(ns, res_id, 0, 0); if (IS_ERR(res)) { /* This is not a problem. 
*/ CDEBUG(D_INFO, "No resource %llu\n", res_id->name[0]); @@ -2039,7 +2039,7 @@ int ldlm_resource_iterate(struct ldlm_namespace *ns, LASSERTF(ns, "must pass in namespace\n"); - res = ldlm_resource_get(ns, NULL, res_id, 0, 0); + res = ldlm_resource_get(ns, res_id, 0, 0); if (IS_ERR(res)) return 0; diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index d4b6e41..a189dbd 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -1105,9 +1105,8 @@ static struct ldlm_resource *ldlm_resource_new(enum ldlm_type ldlm_type) * Returns: referenced, unlocked ldlm_resource or NULL */ struct ldlm_resource * -ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent, - const struct ldlm_res_id *name, enum ldlm_type type, - int create) +ldlm_resource_get(struct ldlm_namespace *ns, const struct ldlm_res_id *name, + enum ldlm_type type, int create) { struct hlist_node *hnode; struct ldlm_resource *res = NULL; @@ -1116,7 +1115,6 @@ struct ldlm_resource * int ns_refcount = 0; int hash; - LASSERT(!parent); LASSERT(ns->ns_rs_hash); LASSERT(name->name[0] != 0); diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index de67720..fd0e362 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -998,13 +998,14 @@ static int mdc_get_lock_handle(const struct lu_env *env, struct osc_object *osc, OSC_DAP_FL_TEST_LOCK | OSC_DAP_FL_CANCELING); if (!lock) { + struct ldlm_namespace *ns; struct ldlm_resource *res; struct ldlm_res_id *resname; resname = &osc_env_info(env)->oti_resname; fid_build_reg_res_name(lu_object_fid(osc2lu(osc)), resname); - res = ldlm_resource_get(osc_export(osc)->exp_obd->obd_namespace, - NULL, resname, LDLM_IBITS, 0); + ns = osc_export(osc)->exp_obd->obd_namespace; + res = ldlm_resource_get(ns, resname, LDLM_IBITS, 0); ldlm_resource_dump(D_ERROR, res); libcfs_debug_dumpstack(NULL); rc = -ENOENT; diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 2a9b9a8..ae55cc3 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -177,7 +177,7 @@ int mdc_null_inode(struct obd_export *exp, fid_build_reg_res_name(fid, &res_id); - res = ldlm_resource_get(ns, NULL, &res_id, 0, 0); + res = ldlm_resource_get(ns, &res_id, 0, 0); if (IS_ERR(res)) return 0; diff --git a/fs/lustre/mdc/mdc_reint.c b/fs/lustre/mdc/mdc_reint.c index 3f4e28a..4d33655 100644 --- a/fs/lustre/mdc/mdc_reint.c +++ b/fs/lustre/mdc/mdc_reint.c @@ -81,7 +81,7 @@ int mdc_resource_get_unused_res(struct obd_export *exp, if (exp_connect_cancelset(exp) && !ns_connect_cancelset(ns)) return 0; - res = ldlm_resource_get(ns, NULL, res_id, 0, 0); + res = ldlm_resource_get(ns, res_id, 0, 0); if (IS_ERR(res)) return 0; LDLM_RESOURCE_ADDREF(res); diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c index 517ce5c..efb0533 100644 --- a/fs/lustre/osc/osc_object.c +++ b/fs/lustre/osc/osc_object.c @@ -389,6 +389,7 @@ static void osc_req_attr_set(const struct lu_env *env, struct cl_object *obj, lock = osc_dlmlock_at_pgoff(env, cl2osc(obj), osc_index(opg), OSC_DAP_FL_TEST_LOCK | OSC_DAP_FL_CANCELING); if (!lock && !opg->ops_srvlock) { + struct ldlm_namespace *ns; struct ldlm_resource *res; struct ldlm_res_id *resname; @@ -397,9 +398,8 @@ static void osc_req_attr_set(const struct lu_env *env, struct cl_object *obj, resname = &osc_env_info(env)->oti_resname; ostid_build_res_name(&oinfo->loi_oi, resname); - res = ldlm_resource_get( - osc_export(cl2osc(obj))->exp_obd->obd_namespace, - NULL, resname, LDLM_EXTENT, 0); + ns = 
osc_export(cl2osc(obj))->exp_obd->obd_namespace;
+		res = ldlm_resource_get(ns, resname, LDLM_EXTENT, 0);
		ldlm_resource_dump(D_ERROR, res);
		libcfs_debug_dumpstack(NULL);
		LBUG();
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 21e036e..5a4db29 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -587,7 +587,7 @@ static int osc_resource_get_unused(struct obd_export *exp, struct obdo *oa,
		return 0;
 
	ostid_build_res_name(&oa->o_oi, &res_id);
-	res = ldlm_resource_get(ns, NULL, &res_id, 0, 0);
+	res = ldlm_resource_get(ns, &res_id, 0, 0);
	if (IS_ERR(res))
		return 0;

From patchwork Tue Sep 6 01:55:17 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966724
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:17 -0400
Message-Id: <1662429337-18737-5-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 04/24] lustre: llite: fully disable readahead in kernel I/O path
Cc: Lustre Development List

From: Qian Yingjin

In newer kernels (RHEL 9 or Ubuntu 22.04), the readahead path may be
out of the control of the Lustre CLIO engine:

generic_file_read_iter()
 ->filemap_read()
  ->filemap_get_pages()
   ->page_cache_sync_readahead()
    ->page_cache_sync_ra()

void page_cache_sync_ra()
{
	if (!ractl->ra->ra_pages || blk_cgroup_congested()) {
		if (!ractl->file)
			return;
		req_count = 1;
		do_forced_ra = true;
	}

	/* be dumb */
	if (do_forced_ra) {
		force_page_cache_ra(ractl, req_count);
		return;
	}
	...
}

From the kernel readahead code: even if read-ahead is disabled (via
@ra_pages == 0), it still issues this request as read-ahead, because we
will need it to satisfy the requested range.
The forced read-ahead will do the right thing and limit the read to
just the requested range, which we will set to 1 page for this case.
Thus read-ahead in the kernel I/O path cannot be avoided entirely just
by setting @ra_pages to 0. To fully disable read-ahead in the Linux
kernel I/O path, we also need to set @io_pages to 0, which makes
@force_page_cache_ra() clamp the I/O range to 0:

void force_page_cache_ra()
{
	...
	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
	nr_to_read = min_t(unsigned long, nr_to_read, max_pages);
	while (nr_to_read) {
		...
	}
	...
}

After setting bdi->io_pages to 0, sanity test 101j passes.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16019
Lustre-commit: f0cf7fd3cccb2313f ("LU-16019 llite: fully disable readahead in kernel I/O path")
Signed-off-by: Qian Yingjin
Reviewed-on: https://review.whamcloud.com/47993
Reviewed-by: Andreas Dilger
Reviewed-by: Li Xi
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/llite/llite_lib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 5daced0..5931258 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1261,6 +1261,7 @@ int ll_fill_super(struct super_block *sb)
 
	/* disable kernel readahead */
	sb->s_bdi->ra_pages = 0;
+	sb->s_bdi->io_pages = 0;
 
	/* Call ll_debugsfs_register_super() before lustre_process_log()
	 * so that "llite.*.*" params can be processed correctly.
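[Editorial note: a small userspace model, not kernel code, of the clamp
quoted above. The effective window in force_page_cache_ra() is bounded
by max(bdi->io_pages, ra->ra_pages), so zeroing ra_pages alone still
lets the forced one-page read through, while zeroing both collapses the
window to 0.]

#include <stdio.h>

static unsigned long effective_ra(unsigned long nr_to_read,
				  unsigned long io_pages,
				  unsigned long ra_pages)
{
	/* mirrors: max_pages = max_t(...); nr_to_read = min_t(...); */
	unsigned long max_pages = io_pages > ra_pages ? io_pages : ra_pages;

	return nr_to_read < max_pages ? nr_to_read : max_pages;
}

int main(void)
{
	printf("ra_pages=0, io_pages=256: reads %lu page(s)\n",
	       effective_ra(1, 256, 0));
	printf("ra_pages=0, io_pages=0:   reads %lu page(s)\n",
	       effective_ra(1, 0, 0));
	return 0;
}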
From patchwork Tue Sep 6 01:55:18 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966723
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:18 -0400
Message-Id: <1662429337-18737-6-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 05/24] lustre: llite: use fatal_signal_pending in range_lock
Cc: Lustre Development List

From: Qian Yingjin

FIO io_uring tests failed when one file was shared by two FIO processes
under newer kernels. After analysis, we found that range_lock() returns
-ERESTARTSYS when there is a pending signal on the current process
during Lustre I/O. This causes -EINTR to be returned to the
application. We solve this bug by replacing @signal_pending(current)
with @fatal_signal_pending(current) in range_lock(), so that
range_lock() only returns -ERESTARTSYS when the current process has a
fatal signal pending, such as SIGKILL.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15994
Lustre-commit: 4c5b0b0967f052af3 ("LU-15994 llite: use fatal_signal_pending in range_lock")
Signed-off-by: Qian Yingjin
Reviewed-on: https://review.whamcloud.com/48106
Reviewed-by: Andreas Dilger
Reviewed-by: Bobi Jam
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/obdclass/range_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/obdclass/range_lock.c b/fs/lustre/obdclass/range_lock.c
index 2af6385..9731e57 100644
--- a/fs/lustre/obdclass/range_lock.c
+++ b/fs/lustre/obdclass/range_lock.c
@@ -159,7 +159,7 @@ int range_lock(struct range_lock_tree *tree, struct range_lock *lock)
		spin_unlock(&tree->rlt_lock);
 
		schedule();
 
-		if (signal_pending(current)) {
+		if (fatal_signal_pending(current)) {
			range_unlock(tree, lock);
			rc = -ERESTARTSYS;
			goto out;
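[Editorial note: a toy model of the semantic change, plain C, not
kernel code; the task flags are stand-ins for the kernel's signal
bookkeeping. With signal_pending() any caught signal aborts the wait
with -ERESTARTSYS, while fatal_signal_pending() aborts only for
unhandled SIGKILL-class signals, which is why FIO/io_uring signal
traffic no longer surfaces as -EINTR.]

#include <stdbool.h>
#include <stdio.h>

#define ERESTARTSYS 512 /* kernel-internal value, shown for flavor */

struct task { bool any_signal; bool fatal_signal; };

/* returns 0 when the lock wait may continue to completion */
static int range_lock_wait(const struct task *t, bool only_fatal)
{
	bool interrupted = only_fatal ? t->fatal_signal : t->any_signal;

	return interrupted ? -ERESTARTSYS : 0;
}

int main(void)
{
	/* a non-fatal signal is pending, e.g. one FIO handles itself */
	struct task t = { .any_signal = true, .fatal_signal = false };

	printf("signal_pending model:       %d\n", range_lock_wait(&t, false));
	printf("fatal_signal_pending model: %d\n", range_lock_wait(&t, true));
	return 0;
}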
From patchwork Tue Sep 6 01:55:19 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966729
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:19 -0400
Message-Id: <1662429337-18737-7-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 06/24] lustre: update version to 2.15.51
Cc: Lustre Development List

From: Oleg Drokin

New tag 2.15.51

Signed-off-by: Oleg Drokin
Signed-off-by: James Simmons
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index bcee87c..7e15b81 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 15
-#define LUSTRE_PATCH 50
+#define LUSTRE_PATCH 51
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.15.50"
+#define LUSTRE_VERSION_STRING "2.15.51"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix) \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

From patchwork Tue Sep 6 01:55:20 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966738
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:20 -0400
Message-Id: <1662429337-18737-8-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 07/24] lustre: llite: simplify callback handling for async getattr
Cc: Lustre Development List

From: Qian Yingjin

This patch prepares the inode and sets the lock data directly in the
interpret callback of the intent async getattr RPC request (in ptlrpcd
context), simplifying the old implementation that deferred this work to
the statahead thread.
If the statahead entry is a striped directory, it may generate new RPCs
in the ptlrpcd interpret context to obtain the attributes for the
slaves of the striped directory:

  @ll_prep_inode()->@lmv_revalidate_slaves()

This is dangerous and may result in deadlock in the ptlrpcd interpret
context, so we use a work queue to handle these extra RPCs.

Add sanity test 123d to verify that it works correctly.

According to the benchmark results, the workload "ls -l" on a large
directory containing 1M files (47001 bytes each), run on a client with
no caching on either server or client, shows the measured elapsed time:

- w/o patch: 180 seconds
- w/  patch: 181 seconds

There is no obvious performance regression.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14139
Lustre-commit: 509d7305ce8a01351 ("LU-14139 llite: simplify callback handling for async getattr")
Signed-off-by: Qian Yingjin
Reviewed-on: https://review.whamcloud.com/45648
Reviewed-by: Lai Siyao
Reviewed-by: Andreas Dilger
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_intent.h |   2 +
 fs/lustre/include/obd.h           |  27 +--
 fs/lustre/include/obd_class.h     |   4 +-
 fs/lustre/llite/llite_internal.h  |  17 +-
 fs/lustre/llite/llite_lib.c       |   8 +
 fs/lustre/llite/statahead.c       | 380 +++++++++++++++++++-------------------
 fs/lustre/lmv/lmv_obd.c           |   6 +-
 fs/lustre/mdc/mdc_internal.h      |   3 +-
 fs/lustre/mdc/mdc_locks.c         |  30 +--
 9 files changed, 241 insertions(+), 236 deletions(-)

diff --git a/fs/lustre/include/lustre_intent.h b/fs/lustre/include/lustre_intent.h
index e7d81f6..298270b 100644
--- a/fs/lustre/include/lustre_intent.h
+++ b/fs/lustre/include/lustre_intent.h
@@ -50,6 +50,8 @@ struct lookup_intent {
	u64 it_remote_lock_handle;
	struct ptlrpc_request *it_request;
	unsigned int it_lock_set:1;
+	unsigned int it_extra_rpc_check:1;
+	unsigned int it_extra_rpc_need:1;
 };
 
 static inline int it_disposition(struct lookup_intent *it, int flag)
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index c5e2a24..c452da7 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -834,18 +834,19 @@ struct md_readdir_info {
	int mr_partial_readdir_rc;
 };
 
-struct md_enqueue_info;
-/* metadata stat-ahead */
-
-struct md_enqueue_info {
-	struct md_op_data mi_data;
-	struct lookup_intent mi_it;
-	struct lustre_handle mi_lockh;
-	struct inode *mi_dir;
-	struct ldlm_enqueue_info mi_einfo;
-	int (*mi_cb)(struct ptlrpc_request *req,
-		     struct md_enqueue_info *minfo, int rc);
-	void *mi_cbdata;
+struct md_op_item;
+typedef int (*md_op_item_cb_t)(struct req_capsule *pill,
+			       struct md_op_item *item,
+			       int rc);
+
+struct md_op_item {
+	struct md_op_data mop_data;
+	struct lookup_intent mop_it;
+	struct lustre_handle mop_lockh;
+	struct ldlm_enqueue_info mop_einfo;
+	md_op_item_cb_t mop_cb;
+	void *mop_cbdata;
+	struct inode *mop_dir;
 };
 
 struct obd_ops {
@@ -1078,7 +1079,7 @@ struct md_ops {
			     struct lu_fid *fid);
 
	int (*intent_getattr_async)(struct obd_export *exp,
-				    struct md_enqueue_info *minfo);
+				    struct md_op_item *item);
 
	int (*revalidate_lock)(struct obd_export *, struct lookup_intent *,
			       struct lu_fid *, u64 *bits);
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index f603140..80ff4e8 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1593,7 +1593,7 @@ static inline int md_init_ea_size(struct obd_export *exp, u32 easize,
 }
 
 static inline int md_intent_getattr_async(struct obd_export *exp,
-					  struct md_enqueue_info *minfo)
+					  struct md_op_item *item)
 {
	int rc;
 
@@ -1604,7 +1604,7 @@ static
inline int md_intent_getattr_async(struct obd_export *exp, lprocfs_counter_incr(exp->exp_obd->obd_md_stats, LPROC_MD_INTENT_GETATTR_ASYNC); - return MDP(exp->exp_obd, intent_getattr_async)(exp, minfo); + return MDP(exp->exp_obd, intent_getattr_async)(exp, item); } static inline int md_revalidate_lock(struct obd_export *exp, diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index b018515..e7f703a 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -1504,17 +1504,12 @@ struct ll_statahead_info { * is not a hidden one */ unsigned int sai_skip_hidden;/* skipped hidden dentry count */ - unsigned int sai_ls_all:1, /* "ls -al", do stat-ahead for - * hidden entries - */ - sai_in_readpage:1;/* statahead in readdir() */ + unsigned int sai_ls_all:1; /* "ls -al", do stat-ahead for + * hidden entries + */ wait_queue_head_t sai_waitq; /* stat-ahead wait queue */ struct task_struct *sai_task; /* stat-ahead thread */ struct task_struct *sai_agl_task; /* AGL thread */ - struct list_head sai_interim_entries; /* entries which got async - * stat reply, but not - * instantiated - */ struct list_head sai_entries; /* completed entries */ struct list_head sai_agls; /* AGLs to be sent */ struct list_head sai_cache[LL_SA_CACHE_SIZE]; @@ -1522,6 +1517,12 @@ struct ll_statahead_info { atomic_t sai_cache_count; /* entry count in cache */ }; +struct ll_interpret_work { + struct work_struct lpw_work; + struct md_op_item *lpw_item; + struct req_capsule *lpw_pill; +}; + int ll_revalidate_statahead(struct inode *dir, struct dentry **dentry, bool unplug); int ll_start_statahead(struct inode *dir, struct dentry *dentry, bool agl); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 5931258..0bffe5e 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -3080,6 +3080,14 @@ int ll_prep_inode(struct inode **inode, struct req_capsule *pill, if (rc) goto out; + if (S_ISDIR(md.body->mbo_mode) && md.lmv && lmv_dir_striped(md.lmv) && + it && it->it_extra_rpc_check) { + /* TODO: Check @lsm unchanged via @lsm_md_eq. */ + it->it_extra_rpc_need = 1; + rc = -EAGAIN; + goto out; + } + /* * clear default_lmv only if intent_getattr reply doesn't contain it. * but it needs to be done after iget, check this early because diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index c6779eb..5662f44 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -56,13 +56,12 @@ enum se_stat { /* * sa_entry is not refcounted: statahead thread allocates it and do async stat, - * and in async stat callback ll_statahead_interpret() will add it into - * sai_interim_entries, later statahead thread will call sa_handle_callback() to - * instantiate entry and move it into sai_entries, and then only scanner process - * can access and free it. + * and in async stat callback ll_statahead_interpret() will prepare the inode + * and set lock data in the ptlrpcd context. Then the scanner process will be + * woken up if this entry is the waiting one, can access and free it. 
*/ struct sa_entry { - /* link into sai_interim_entries or sai_entries */ + /* link into sai_entries */ struct list_head se_list; /* link into sai hash table locally */ struct list_head se_hash; @@ -74,10 +73,6 @@ struct sa_entry { enum se_stat se_state; /* entry size, contains name */ int se_size; - /* pointer to async getattr enqueue info */ - struct md_enqueue_info *se_minfo; - /* pointer to the async getattr request */ - struct ptlrpc_request *se_req; /* pointer to the target inode */ struct inode *se_inode; /* entry name */ @@ -137,12 +132,6 @@ static inline int sa_sent_full(struct ll_statahead_info *sai) return atomic_read(&sai->sai_cache_count) >= sai->sai_max; } -/* got async stat replies */ -static inline int sa_has_callback(struct ll_statahead_info *sai) -{ - return !list_empty(&sai->sai_interim_entries); -} - static inline int agl_list_empty(struct ll_statahead_info *sai) { return list_empty(&sai->sai_agls); @@ -328,61 +317,61 @@ static void sa_free(struct ll_statahead_info *sai, struct sa_entry *entry) } /* finish async stat RPC arguments */ -static void sa_fini_data(struct md_enqueue_info *minfo) +static void sa_fini_data(struct md_op_item *item) { - struct md_op_data *op_data = &minfo->mi_data; + struct md_op_data *op_data = &item->mop_data; if (op_data->op_flags & MF_OPNAME_KMALLOCED) /* allocated via ll_setup_filename called from sa_prep_data */ kfree(op_data->op_name); - ll_unlock_md_op_lsm(&minfo->mi_data); - iput(minfo->mi_dir); - kfree(minfo); + ll_unlock_md_op_lsm(op_data); + iput(item->mop_dir); + kfree(item); } -static int ll_statahead_interpret(struct ptlrpc_request *req, - struct md_enqueue_info *minfo, int rc); +static int ll_statahead_interpret(struct req_capsule *pill, + struct md_op_item *item, int rc); /* * prepare arguments for async stat RPC. 
*/ -static struct md_enqueue_info * +static struct md_op_item * sa_prep_data(struct inode *dir, struct inode *child, struct sa_entry *entry) { - struct md_enqueue_info *minfo; + struct md_op_item *item; struct ldlm_enqueue_info *einfo; - struct md_op_data *op_data; + struct md_op_data *op_data; - minfo = kzalloc(sizeof(*minfo), GFP_NOFS); - if (!minfo) + item = kzalloc(sizeof(*item), GFP_NOFS); + if (!item) return ERR_PTR(-ENOMEM); - op_data = ll_prep_md_op_data(&minfo->mi_data, dir, child, + op_data = ll_prep_md_op_data(&item->mop_data, dir, child, entry->se_qstr.name, entry->se_qstr.len, 0, LUSTRE_OPC_ANY, NULL); if (IS_ERR(op_data)) { - kfree(minfo); - return (struct md_enqueue_info *)op_data; + kfree(item); + return (struct md_op_item *)op_data; } if (!child) op_data->op_fid2 = entry->se_fid; - minfo->mi_it.it_op = IT_GETATTR; - minfo->mi_dir = igrab(dir); - minfo->mi_cb = ll_statahead_interpret; - minfo->mi_cbdata = entry; - - einfo = &minfo->mi_einfo; - einfo->ei_type = LDLM_IBITS; - einfo->ei_mode = it_to_lock_mode(&minfo->mi_it); - einfo->ei_cb_bl = ll_md_blocking_ast; - einfo->ei_cb_cp = ldlm_completion_ast; - einfo->ei_cb_gl = NULL; + item->mop_it.it_op = IT_GETATTR; + item->mop_dir = igrab(dir); + item->mop_cb = ll_statahead_interpret; + item->mop_cbdata = entry; + + einfo = &item->mop_einfo; + einfo->ei_type = LDLM_IBITS; + einfo->ei_mode = it_to_lock_mode(&item->mop_it); + einfo->ei_cb_bl = ll_md_blocking_ast; + einfo->ei_cb_cp = ldlm_completion_ast; + einfo->ei_cb_gl = NULL; einfo->ei_cbdata = NULL; einfo->ei_req_slot = 1; - return minfo; + return item; } /* @@ -393,22 +382,8 @@ static int ll_statahead_interpret(struct ptlrpc_request *req, sa_make_ready(struct ll_statahead_info *sai, struct sa_entry *entry, int ret) { struct ll_inode_info *lli = ll_i2info(sai->sai_dentry->d_inode); - struct md_enqueue_info *minfo = entry->se_minfo; - struct ptlrpc_request *req = entry->se_req; bool wakeup; - /* release resources used in RPC */ - if (minfo) { - entry->se_minfo = NULL; - ll_intent_release(&minfo->mi_it); - sa_fini_data(minfo); - } - - if (req) { - entry->se_req = NULL; - ptlrpc_req_finished(req); - } - spin_lock(&lli->lli_sa_lock); wakeup = __sa_make_ready(sai, entry, ret); spin_unlock(&lli->lli_sa_lock); @@ -465,7 +440,6 @@ static struct ll_statahead_info *ll_sai_alloc(struct dentry *dentry) sai->sai_index = 1; init_waitqueue_head(&sai->sai_waitq); - INIT_LIST_HEAD(&sai->sai_interim_entries); INIT_LIST_HEAD(&sai->sai_entries); INIT_LIST_HEAD(&sai->sai_agls); @@ -528,7 +502,6 @@ static void ll_sai_put(struct ll_statahead_info *sai) LASSERT(sai->sai_task == NULL); LASSERT(sai->sai_agl_task == NULL); LASSERT(sai->sai_sent == sai->sai_replied); - LASSERT(!sa_has_callback(sai)); list_for_each_entry_safe(entry, next, &sai->sai_entries, se_list) @@ -618,52 +591,18 @@ static void ll_agl_trigger(struct inode *inode, struct ll_statahead_info *sai) iput(inode); } -/* - * prepare inode for sa entry, add it into agl list, now sa_entry is ready - * to be used by scanner process. 
- */ -static void sa_instantiate(struct ll_statahead_info *sai, - struct sa_entry *entry) +static int ll_statahead_interpret_common(struct inode *dir, + struct ll_statahead_info *sai, + struct req_capsule *pill, + struct lookup_intent *it, + struct sa_entry *entry, + struct mdt_body *body) { - struct inode *dir = sai->sai_dentry->d_inode; struct inode *child; - struct md_enqueue_info *minfo; - struct lookup_intent *it; - struct ptlrpc_request *req; - struct mdt_body *body; - int rc = 0; - - LASSERT(entry->se_handle != 0); - - minfo = entry->se_minfo; - it = &minfo->mi_it; - req = entry->se_req; - body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY); - if (!body) { - rc = -EFAULT; - goto out; - } + int rc; child = entry->se_inode; - /* revalidate; unlinked and re-created with the same name */ - if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->mbo_fid1))) { - if (child) { - entry->se_inode = NULL; - iput(child); - } - /* The mdt_body is invalid. Skip this entry */ - rc = -EAGAIN; - goto out; - } - - it->it_lock_handle = entry->se_handle; - rc = md_revalidate_lock(ll_i2mdexp(dir), it, ll_inode2fid(dir), NULL); - if (rc != 1) { - rc = -EAGAIN; - goto out; - } - - rc = ll_prep_inode(&child, &req->rq_pill, dir->i_sb, it); + rc = ll_prep_inode(&child, pill, dir->i_sb, it); if (rc) goto out; @@ -671,10 +610,8 @@ static void sa_instantiate(struct ll_statahead_info *sai, * inode now to save an extra getxattr. */ if (body->mbo_valid & OBD_MD_ENCCTX) { - void *encctx = req_capsule_server_get(&req->rq_pill, - &RMF_FILE_ENCCTX); - u32 encctxlen = req_capsule_get_size(&req->rq_pill, - &RMF_FILE_ENCCTX, + void *encctx = req_capsule_server_get(pill, &RMF_FILE_ENCCTX); + u32 encctxlen = req_capsule_get_size(pill, &RMF_FILE_ENCCTX, RCL_SERVER); if (encctxlen) { @@ -691,7 +628,7 @@ static void sa_instantiate(struct ll_statahead_info *sai, } } - CDEBUG(D_READA, "%s: setting %.*s" DFID " l_data to inode %p\n", + CDEBUG(D_READA, "%s: setting %.*s"DFID" l_data to inode %p\n", ll_i2sbi(dir)->ll_fsname, entry->se_qstr.len, entry->se_qstr.name, PFID(ll_inode2fid(child)), child); ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, child, it, NULL); @@ -700,51 +637,100 @@ static void sa_instantiate(struct ll_statahead_info *sai, if (agl_should_run(sai, child)) ll_agl_add(sai, child, entry->se_index); - out: + return rc; +} + +static void ll_statahead_interpret_fini(struct ll_inode_info *lli, + struct ll_statahead_info *sai, + struct md_op_item *item, + struct sa_entry *entry, + struct ptlrpc_request *req, + int rc) +{ /* - * sa_make_ready() will drop ldlm ibits lock refcount by calling + * First it will drop ldlm ibits lock refcount by calling * ll_intent_drop_lock() in spite of failures. Do not worry about * calling ll_intent_drop_lock() more than once. 
*/ + ll_intent_release(&item->mop_it); + sa_fini_data(item); + if (req) + ptlrpc_req_finished(req); sa_make_ready(sai, entry, rc); + + spin_lock(&lli->lli_sa_lock); + sai->sai_replied++; + spin_unlock(&lli->lli_sa_lock); } -/* once there are async stat replies, instantiate sa_entry from replies */ -static void sa_handle_callback(struct ll_statahead_info *sai) +static void ll_statahead_interpret_work(struct work_struct *data) { - struct ll_inode_info *lli; + struct ll_interpret_work *work = container_of(data, + struct ll_interpret_work, + lpw_work); + struct md_op_item *item = work->lpw_item; + struct req_capsule *pill = work->lpw_pill; + struct inode *dir = item->mop_dir; + struct ll_inode_info *lli = ll_i2info(dir); + struct ll_statahead_info *sai = lli->lli_sai; + struct lookup_intent *it; + struct sa_entry *entry; + struct mdt_body *body; + struct inode *child; + int rc; - lli = ll_i2info(sai->sai_dentry->d_inode); + entry = (struct sa_entry *)item->mop_cbdata; + LASSERT(entry->se_handle != 0); - spin_lock(&lli->lli_sa_lock); - while (sa_has_callback(sai)) { - struct sa_entry *entry; + it = &item->mop_it; + body = req_capsule_server_get(pill, &RMF_MDT_BODY); + if (!body) { + rc = -EFAULT; + goto out; + } - entry = list_first_entry(&sai->sai_interim_entries, - struct sa_entry, se_list); - list_del_init(&entry->se_list); - spin_unlock(&lli->lli_sa_lock); + child = entry->se_inode; + /* revalidate; unlinked and re-created with the same name */ + if (unlikely(!lu_fid_eq(&item->mop_data.op_fid2, &body->mbo_fid1))) { + if (child) { + entry->se_inode = NULL; + iput(child); + } + /* The mdt_body is invalid. Skip this entry */ + rc = -EAGAIN; + goto out; + } - sa_instantiate(sai, entry); - spin_lock(&lli->lli_sa_lock); + it->it_lock_handle = entry->se_handle; + rc = md_revalidate_lock(ll_i2mdexp(dir), it, ll_inode2fid(dir), NULL); + if (rc != 1) { + rc = -EAGAIN; + goto out; } - spin_unlock(&lli->lli_sa_lock); + + LASSERT(it->it_extra_rpc_check == 0); + rc = ll_statahead_interpret_common(dir, sai, pill, it, entry, body); +out: + ll_statahead_interpret_fini(lli, sai, item, entry, pill->rc_req, rc); + kfree(work); } /* - * callback for async stat RPC, because this is called in ptlrpcd context, we - * only put sa_entry in sai_interim_entries, and wake up statahead thread to - * really prepare inode and instantiate sa_entry later. + * Callback for async stat RPC, this is called in ptlrpcd context. It prepares + * the inode and set lock data directly in the ptlrpcd context. It will wake up + * the directory listing process if the dentry is the waiting one. 
*/ -static int ll_statahead_interpret(struct ptlrpc_request *req, - struct md_enqueue_info *minfo, int rc) +static int ll_statahead_interpret(struct req_capsule *pill, + struct md_op_item *item, int rc) { - struct lookup_intent *it = &minfo->mi_it; - struct inode *dir = minfo->mi_dir; + struct lookup_intent *it = &item->mop_it; + struct inode *dir = item->mop_dir; struct ll_inode_info *lli = ll_i2info(dir); struct ll_statahead_info *sai = lli->lli_sai; - struct sa_entry *entry = (struct sa_entry *)minfo->mi_cbdata; + struct sa_entry *entry = (struct sa_entry *)item->mop_cbdata; + struct mdt_body *body; + struct inode *child; u64 handle = 0; if (it_disposition(it, DISP_LOOKUP_NEG)) @@ -760,10 +746,37 @@ static int ll_statahead_interpret(struct ptlrpc_request *req, CDEBUG(D_READA, "sa_entry %.*s rc %d\n", entry->se_qstr.len, entry->se_qstr.name, rc); - if (rc) { - ll_intent_release(it); - sa_fini_data(minfo); - } else { + if (rc) + goto out; + + body = req_capsule_server_get(pill, &RMF_MDT_BODY); + if (!body) { + rc = -EFAULT; + goto out; + } + + child = entry->se_inode; + /* revalidate; unlinked and re-created with the same name */ + if (unlikely(!lu_fid_eq(&item->mop_data.op_fid2, &body->mbo_fid1))) { + if (child) { + entry->se_inode = NULL; + iput(child); + } + /* The mdt_body is invalid. Skip this entry */ + rc = -EAGAIN; + goto out; + } + + entry->se_handle = it->it_lock_handle; + /* + * In ptlrpcd context, it is not allowed to generate new RPCs + * especially for striped directories. + */ + it->it_extra_rpc_check = 1; + rc = ll_statahead_interpret_common(dir, sai, pill, it, entry, body); + if (rc == -EAGAIN && it->it_extra_rpc_need) { + struct ll_interpret_work *work; + /* * release ibits lock ASAP to avoid deadlock when statahead * thread enqueues lock on parent in readdir and another @@ -772,53 +785,53 @@ static int ll_statahead_interpret(struct ptlrpc_request *req, */ handle = it->it_lock_handle; ll_intent_drop_lock(it); - ll_unlock_md_op_lsm(&minfo->mi_data); - } - - spin_lock(&lli->lli_sa_lock); - if (rc) { - if (__sa_make_ready(sai, entry, rc)) - wake_up(&sai->sai_waitq); - } else { - int first = 0; + ll_unlock_md_op_lsm(&item->mop_data); + it->it_extra_rpc_check = 0; + it->it_extra_rpc_need = 0; - entry->se_minfo = minfo; - entry->se_req = ptlrpc_request_addref(req); /* - * Release the async ibits lock ASAP to avoid deadlock - * when statahead thread tries to enqueue lock on parent - * for readpage and other tries to enqueue lock on child - * with parent's lock held, for example: unlink. + * If the stat-ahead entry is a striped directory, there are two + * solutions: + * 1. It can drop the result, let the scanning process do stat() + * on the striped directory in synchronous way. By this way, it + * can avoid to generate new RPCs to obtain the attributes for + * slaves of the striped directory in the ptlrpcd context as it + * is dangerous of blocking in ptlrpcd thread. + * 2. Use work queue or the separate statahead thread to handle + * the extra RPCs (@ll_prep_inode->@lmv_revalidate_slaves). + * Here we adopt the second solution. 
*/ - entry->se_handle = handle; - if (!sa_has_callback(sai)) - first = 1; - - list_add_tail(&entry->se_list, &sai->sai_interim_entries); - - if (first && sai->sai_task) - wake_up_process(sai->sai_task); + work = kmalloc(sizeof(*work), GFP_ATOMIC); + if (!work) { + rc = -ENOMEM; + goto out; + } + INIT_WORK(&work->lpw_work, ll_statahead_interpret_work); + work->lpw_item = item; + work->lpw_pill = pill; + ptlrpc_request_addref(pill->rc_req); + schedule_work(&work->lpw_work); + return 0; } - sai->sai_replied++; - - spin_unlock(&lli->lli_sa_lock); +out: + ll_statahead_interpret_fini(lli, sai, item, entry, NULL, rc); return rc; } /* async stat for file not found in dcache */ static int sa_lookup(struct inode *dir, struct sa_entry *entry) { - struct md_enqueue_info *minfo; + struct md_op_item *item; int rc; - minfo = sa_prep_data(dir, NULL, entry); - if (IS_ERR(minfo)) - return PTR_ERR(minfo); + item = sa_prep_data(dir, NULL, entry); + if (IS_ERR(item)) + return PTR_ERR(item); - rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo); + rc = md_intent_getattr_async(ll_i2mdexp(dir), item); if (rc) - sa_fini_data(minfo); + sa_fini_data(item); return rc; } @@ -838,7 +851,7 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry, .it_op = IT_GETATTR, .it_lock_handle = 0 }; - struct md_enqueue_info *minfo; + struct md_op_item *item; int rc; if (unlikely(!inode)) @@ -847,9 +860,9 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry, if (d_mountpoint(dentry)) return 1; - minfo = sa_prep_data(dir, inode, entry); - if (IS_ERR(minfo)) - return PTR_ERR(minfo); + item = sa_prep_data(dir, inode, entry); + if (IS_ERR(item)) + return PTR_ERR(item); entry->se_inode = igrab(inode); rc = md_revalidate_lock(ll_i2mdexp(dir), &it, ll_inode2fid(inode), @@ -857,15 +870,15 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry, if (rc == 1) { entry->se_handle = it.it_lock_handle; ll_intent_release(&it); - sa_fini_data(minfo); + sa_fini_data(item); return 1; } - rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo); + rc = md_intent_getattr_async(ll_i2mdexp(dir), item); if (rc) { entry->se_inode = NULL; iput(inode); - sa_fini_data(minfo); + sa_fini_data(item); } return rc; @@ -1040,10 +1053,8 @@ static int ll_statahead_thread(void *arg) break; } - sai->sai_in_readpage = 1; page = ll_get_dir_page(dir, op_data, pos, NULL); ll_unlock_md_op_lsm(op_data); - sai->sai_in_readpage = 0; if (IS_ERR(page)) { rc = PTR_ERR(page); CDEBUG(D_READA, @@ -1108,11 +1119,6 @@ static int ll_statahead_thread(void *arg) while (({set_current_state(TASK_IDLE); sai->sai_task; })) { - if (sa_has_callback(sai)) { - __set_current_state(TASK_RUNNING); - sa_handle_callback(sai); - } - spin_lock(&lli->lli_agl_lock); while (sa_sent_full(sai) && !agl_list_empty(sai)) { @@ -1191,16 +1197,11 @@ static int ll_statahead_thread(void *arg) /* * statahead is finished, but statahead entries need to be cached, wait - * for file release to stop me. + * for file release closedir() call to stop me. 
*/ while (({set_current_state(TASK_IDLE); sai->sai_task; })) { - if (sa_has_callback(sai)) { - __set_current_state(TASK_RUNNING); - sa_handle_callback(sai); - } else { - schedule(); - } + schedule(); } __set_current_state(TASK_RUNNING); out: @@ -1215,9 +1216,6 @@ static int ll_statahead_thread(void *arg) msleep(125); } - /* release resources held by statahead RPCs */ - sa_handle_callback(sai); - CDEBUG(D_READA, "statahead thread stopped: sai %p, parent %pd\n", sai, parent); @@ -1502,10 +1500,6 @@ static int revalidate_statahead_dentry(struct inode *dir, goto out_unplug; } - /* if statahead is busy in readdir, help it do post-work */ - if (!sa_ready(entry) && sai->sai_in_readpage) - sa_handle_callback(sai); - if (!sa_ready(entry)) { spin_lock(&lli->lli_sa_lock); sai->sai_index_wait = entry->se_index; diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 0988b1a..e10d1bf 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -3626,9 +3626,9 @@ static int lmv_clear_open_replay_data(struct obd_export *exp, } static int lmv_intent_getattr_async(struct obd_export *exp, - struct md_enqueue_info *minfo) + struct md_op_item *item) { - struct md_op_data *op_data = &minfo->mi_data; + struct md_op_data *op_data = &item->mop_data; struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *ptgt = NULL; @@ -3652,7 +3652,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp, if (ctgt != ptgt) return -EREMOTE; - return md_intent_getattr_async(ptgt->ltd_exp, minfo); + return md_intent_getattr_async(ptgt->ltd_exp, item); } static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index fab40bd..2416607 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -130,8 +130,7 @@ int mdc_cancel_unused(struct obd_export *exp, const struct lu_fid *fid, int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, struct lu_fid *fid, u64 *bits); -int mdc_intent_getattr_async(struct obd_export *exp, - struct md_enqueue_info *minfo); +int mdc_intent_getattr_async(struct obd_export *exp, struct md_op_item *item); enum ldlm_mode mdc_lock_match(struct obd_export *exp, u64 flags, const struct lu_fid *fid, enum ldlm_type type, diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index ae55cc3..31c5bc0 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -49,7 +49,7 @@ struct mdc_getattr_args { struct obd_export *ga_exp; - struct md_enqueue_info *ga_minfo; + struct md_op_item *ga_item; }; int it_open_error(int phase, struct lookup_intent *it) @@ -1365,10 +1365,10 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env, { struct mdc_getattr_args *ga = args; struct obd_export *exp = ga->ga_exp; - struct md_enqueue_info *minfo = ga->ga_minfo; - struct ldlm_enqueue_info *einfo = &minfo->mi_einfo; - struct lookup_intent *it = &minfo->mi_it; - struct lustre_handle *lockh = &minfo->mi_lockh; + struct md_op_item *item = ga->ga_item; + struct ldlm_enqueue_info *einfo = &item->mop_einfo; + struct lookup_intent *it = &item->mop_it; + struct lustre_handle *lockh = &item->mop_lockh; struct ldlm_reply *lockrep; u64 flags = LDLM_FL_HAS_INTENT; @@ -1393,18 +1393,18 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env, if (rc) goto out; - rc = mdc_finish_intent_lock(exp, req, &minfo->mi_data, it, lockh); + rc = mdc_finish_intent_lock(exp, req, &item->mop_data, it, 
lockh); out: - minfo->mi_cb(req, minfo, rc); + item->mop_cb(&req->rq_pill, item, rc); return 0; } int mdc_intent_getattr_async(struct obd_export *exp, - struct md_enqueue_info *minfo) + struct md_op_item *item) { - struct md_op_data *op_data = &minfo->mi_data; - struct lookup_intent *it = &minfo->mi_it; + struct md_op_data *op_data = &item->mop_data; + struct lookup_intent *it = &item->mop_it; struct ptlrpc_request *req; struct mdc_getattr_args *ga; struct ldlm_res_id res_id; @@ -1433,11 +1433,11 @@ int mdc_intent_getattr_async(struct obd_export *exp, * to avoid possible races. It is safe to have glimpse handler * for non-DOM locks and costs nothing. */ - if (!minfo->mi_einfo.ei_cb_gl) - minfo->mi_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast; + if (!item->mop_einfo.ei_cb_gl) + item->mop_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast; - rc = ldlm_cli_enqueue(exp, &req, &minfo->mi_einfo, &res_id, &policy, - &flags, NULL, 0, LVB_T_NONE, &minfo->mi_lockh, 1); + rc = ldlm_cli_enqueue(exp, &req, &item->mop_einfo, &res_id, &policy, + &flags, NULL, 0, LVB_T_NONE, &item->mop_lockh, 1); if (rc < 0) { ptlrpc_req_finished(req); return rc; @@ -1445,7 +1445,7 @@ int mdc_intent_getattr_async(struct obd_export *exp, ga = ptlrpc_req_async_args(ga, req); ga->ga_exp = exp; - ga->ga_minfo = minfo; + ga->ga_item = item; req->rq_interpret_reply = mdc_intent_getattr_async_interpret; ptlrpcd_add_req(req); From patchwork Tue Sep 6 01:55:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966726 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0AE30ECAAA1 for ; Tue, 6 Sep 2022 01:55:55 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lG2z2Sz1y7N; Mon, 5 Sep 2022 18:55:54 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7l65y5mz1y6k for ; Mon, 5 Sep 2022 18:55:46 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id B986E100B001; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B065F58994; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:21 -0400 Message-Id: <1662429337-18737-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/24] lustre: statahead: add total hit/miss count stats X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin In this patch, it adds total hit/miss count stats for statahead. These statistics are updated when the statahead thread terminated. This patch also adds support to clear all statahead stats: $LCTL set_param llite.*.statahead_stats=0 WC-bug-id: https://jira.whamcloud.com/browse/LU-14139 Lustre-commit: b9167201a00e38ce8 ("LU-14139 statahead: add total hit/miss count stats") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/46309 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 2 ++ fs/lustre/llite/llite_lib.c | 2 ++ fs/lustre/llite/lproc_llite.c | 26 +++++++++++++++++++++++--- fs/lustre/llite/statahead.c | 3 +++ 4 files changed, 30 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index e7f703a..227b944 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -749,6 +749,8 @@ struct ll_sb_info { * count */ atomic_t ll_agl_total; /* AGL thread started count */ + atomic_t ll_sa_hit_total; /* total hit count */ + atomic_t ll_sa_miss_total; /* total miss count */ dev_t ll_sdev_orig; /* save s_dev before assign for * clustered nfs diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 0bffe5e..191a83c 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -168,6 +168,8 @@ static struct ll_sb_info *ll_init_sbi(void) atomic_set(&sbi->ll_sa_wrong, 0); atomic_set(&sbi->ll_sa_running, 0); atomic_set(&sbi->ll_agl_total, 0); + atomic_set(&sbi->ll_sa_hit_total, 0); + atomic_set(&sbi->ll_sa_miss_total, 0); set_bit(LL_SBI_AGL_ENABLED, sbi->ll_flags); set_bit(LL_SBI_FAST_READ, sbi->ll_flags); set_bit(LL_SBI_TINY_WRITE, sbi->ll_flags); diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 095b696..1391828 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -843,14 +843,34 @@ static int ll_statahead_stats_seq_show(struct seq_file *m, void *v) seq_printf(m, "statahead total: %u\n" "statahead wrong: %u\n" - "agl total: %u\n", + "agl total: %u\n" + "hit_total: %u\n" + "miss_total: %u\n", atomic_read(&sbi->ll_sa_total), atomic_read(&sbi->ll_sa_wrong), - atomic_read(&sbi->ll_agl_total)); + atomic_read(&sbi->ll_agl_total), + atomic_read(&sbi->ll_sa_hit_total), + atomic_read(&sbi->ll_sa_miss_total)); return 0; } -LDEBUGFS_SEQ_FOPS_RO(ll_statahead_stats); +static ssize_t ll_statahead_stats_seq_write(struct file *file, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct seq_file *m = file->private_data; + struct super_block *sb = m->private; + struct ll_sb_info *sbi = ll_s2sbi(sb); + + atomic_set(&sbi->ll_sa_total, 0); + atomic_set(&sbi->ll_sa_wrong, 0); + atomic_set(&sbi->ll_agl_total, 0); + atomic_set(&sbi->ll_sa_hit_total, 0); + atomic_set(&sbi->ll_sa_miss_total, 0); + + return count; +} +LDEBUGFS_SEQ_FOPS(ll_statahead_stats); static ssize_t lazystatfs_show(struct kobject *kobj, struct attribute *attr, diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 5662f44..1f1fafd 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -1224,6 +1224,9 @@ static int ll_statahead_thread(void *arg) spin_unlock(&lli->lli_sa_lock); wake_up(&sai->sai_waitq); 
+ atomic_add(sai->sai_hit, &sbi->ll_sa_hit_total); + atomic_add(sai->sai_miss, &sbi->ll_sa_miss_total); + ll_sai_put(sai); return rc; From patchwork Tue Sep 6 01:55:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966742 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C326EECAAA1 for ; Tue, 6 Sep 2022 01:56:32 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7m03XTVz1yD4; Mon, 5 Sep 2022 18:56:32 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7l74RJnz1y76 for ; Mon, 5 Sep 2022 18:55:47 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id BA210100B003; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B43D7589A2; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:22 -0400 Message-Id: <1662429337-18737-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/24] lnet: o2iblnd: Salt comp_vector X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ian Ziemba , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Ian Ziemba If conns_per_peer is greater than 1, all the connections targeting the same peer are assigned the same comp_vector. This results in multiple IB CQs targeting the same peer to be serialized on a single comp_vector. Help spread out the IB CQ work to multiple cores by salting comp_vector based on number of connections. 1 client to 1 server LST 1M write results with 4 conns_per_peer and RXE configured to spread out work based on comp_vector. 
Before: 1377.92 MB/s After: 3828.48 MB/s HPE-bug-id: LUS-11043 WC-bug-id: https://jira.whamcloud.com/browse/LU-16078 Lustre-commit: 1ef1fa06b20c424f5 ("LU-16078 o2iblnd: Salt comp_vector") Signed-off-by: Ian Ziemba Reviewed-on: https://review.whamcloud.com/48148 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 14 +++++++++++--- net/lnet/klnds/o2iblnd/o2iblnd.h | 2 ++ 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index ea28c65..c713528 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -338,6 +338,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp, peer_ni->ibp_queue_depth = ni->ni_net->net_tunables.lct_peer_tx_credits; peer_ni->ibp_queue_depth_mod = 0; /* try to use the default */ kref_init(&peer_ni->ibp_kref); + atomic_set(&peer_ni->ibp_nconns, 0); INIT_HLIST_NODE(&peer_ni->ibp_list); INIT_LIST_HEAD(&peer_ni->ibp_conns); @@ -569,7 +570,7 @@ static int kiblnd_get_completion_vector(struct kib_conn *conn, int cpt) int vectors; int off; int i; - lnet_nid_t nid = conn->ibc_peer->ibp_nid; + lnet_nid_t ibp_nid; vectors = conn->ibc_cmid->device->num_comp_vectors; if (vectors <= 1) @@ -579,8 +580,13 @@ static int kiblnd_get_completion_vector(struct kib_conn *conn, int cpt) if (!mask) return 0; - /* hash NID to CPU id in this partition... */ - off = do_div(nid, cpumask_weight(*mask)); + /* hash NID to CPU id in this partition... when targeting a single peer + * with multiple QPs, to engage more cores in CQ processing to a single + * peer, use ibp_nconns to salt the value the comp_vector value + */ + ibp_nid = conn->ibc_peer->ibp_nid + + atomic_read(&conn->ibc_peer->ibp_nconns); + off = do_div(ibp_nid, cpumask_weight(*mask)); for_each_cpu(i, *mask) { if (!off--) return i % vectors; @@ -889,6 +895,7 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, conn->ibc_state = state; /* 1 more conn */ + atomic_inc(&peer_ni->ibp_nconns); atomic_inc(&net->ibn_nconns); return conn; @@ -954,6 +961,7 @@ void kiblnd_destroy_conn(struct kib_conn *conn) kiblnd_peer_decref(peer_ni); rdma_destroy_id(cmid); + atomic_dec(&peer_ni->ibp_nconns); atomic_dec(&net->ibn_nconns); } } diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 0066e85..56d486f 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -522,6 +522,8 @@ struct kib_peer_ni { u16 ibp_queue_depth; /* reduced value which allows conn to be created if max fails */ u16 ibp_queue_depth_mod; + /* Number of connections allocated. 
*/ + atomic_t ibp_nconns; }; extern struct kib_data kiblnd_data; From patchwork Tue Sep 6 01:55:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966733 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 18223ECAAD3 for ; Tue, 6 Sep 2022 01:56:10 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lY4nDlz1yBy; Mon, 5 Sep 2022 18:56:09 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7l82q4Fz1y2G for ; Mon, 5 Sep 2022 18:55:48 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id BDC3E100B004; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B96D458999; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:23 -0400 Message-Id: <1662429337-18737-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/24] lnet: selftest: use preallocate bulk for server X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov Server side want to have a preallocate bulk to avoid large lock contention on the page cache. Without it LST limited with 35Gb/s speed with 3 rail host (HDR each) due large CPU usage. Preallocate bulks increase a memory consumption for small bulk, but performance improved dramatically up to 74Gb/s with very low cpu usage. 
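To make the memory trade-off concrete, here is a rough sketch of the sizing cap applied in brw_init_test_service() below. The arithmetic assumes 4 KiB pages and LNET_MAX_IOV == 256 (i.e. one preallocated bulk spans up to 1 MiB); treat those values as illustrative assumptions, not guaranteed constants:

	/* illustrative sketch of the cap, mirroring the hunk below */
	unsigned long cache_size = totalram_pages() >> 1; /* half of RAM, in pages */

	/* each workitem preallocates a LNET_MAX_IOV-page bulk, so this is
	 * the most workitems whose bulks still fit in that half of memory
	 */
	cache_size /= LNET_MAX_IOV;

	brw_test_service.sv_wi_total = brw_srv_workitems;
	if (brw_test_service.sv_wi_total > cache_size)
		brw_test_service.sv_wi_total = cache_size;

For example, a 128 GiB node has 33554432 4 KiB pages, so the cap works out to (33554432 / 2) / 256 = 65536 workitems, bounding the preallocated bulk cache at half of memory no matter how many service RPCs are configured.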
WC-bug-id: https://jira.whamcloud.com/browse/LU-16011 Lustre-commit: 2447564e120cf6226 ("LU-16011 lnet: use preallocate bulk for server") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/47952 Reviewed-by: Chris Horn Reviewed-by: Andrew Perepechko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/selftest/brw_test.c | 67 ++++++++++++++++++++++++++++++------------- net/lnet/selftest/framework.c | 18 +++++------- net/lnet/selftest/rpc.c | 51 +++++++++++++++++++++----------- net/lnet/selftest/selftest.h | 15 ++++++---- 4 files changed, 99 insertions(+), 52 deletions(-) diff --git a/net/lnet/selftest/brw_test.c b/net/lnet/selftest/brw_test.c index 87ad765..a00b731 100644 --- a/net/lnet/selftest/brw_test.c +++ b/net/lnet/selftest/brw_test.c @@ -124,11 +124,12 @@ list_for_each_entry(tsu, &tsi->tsi_units, tsu_list) { bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid, NULL), - off, npg, len, opc == LST_BRW_READ); + npg); if (!bulk) { brw_client_fini(tsi); return -ENOMEM; } + srpc_init_bulk(bulk, off, npg, len, opc == LST_BRW_READ); tsu->tsu_private = bulk; } @@ -389,8 +390,6 @@ static int brw_inject_one_error(void) CDEBUG(D_NET, "Transferred %d pages bulk data %s %s\n", blk->bk_niov, blk->bk_sink ? "from" : "to", libcfs_id2str(rpc->srpc_peer)); - - sfw_free_pages(rpc); } static int @@ -438,7 +437,6 @@ static int brw_inject_one_error(void) struct srpc_brw_reply *reply = &replymsg->msg_body.brw_reply; struct srpc_brw_reqst *reqst = &reqstmsg->msg_body.brw_reqst; int npg; - int rc; LASSERT(sv->sv_id == SRPC_SERVICE_BRW); @@ -489,11 +487,8 @@ static int brw_inject_one_error(void) return 0; } - rc = sfw_alloc_pages(rpc, rpc->srpc_scd->scd_cpt, npg, - reqst->brw_len, - reqst->brw_rw == LST_BRW_WRITE); - if (rc) - return rc; + srpc_init_bulk(rpc->srpc_bulk, 0, npg, reqst->brw_len, + reqst->brw_rw == LST_BRW_WRITE); if (reqst->brw_rw == LST_BRW_READ) brw_fill_bulk(rpc->srpc_bulk, reqst->brw_flags, BRW_MAGIC); @@ -503,23 +498,55 @@ static int brw_inject_one_error(void) return 0; } -struct sfw_test_client_ops brw_test_client; +static int +brw_srpc_init(struct srpc_server_rpc *rpc, int cpt) +{ + /* just alloc a maximal size - actual values will be adjusted later */ + rpc->srpc_bulk = srpc_alloc_bulk(cpt, LNET_MAX_IOV); + if (!rpc->srpc_bulk) + return -ENOMEM; + + srpc_init_bulk(rpc->srpc_bulk, 0, LNET_MAX_IOV, 0, 0); + + return 0; +} -void brw_init_test_client(void) +static void +brw_srpc_fini(struct srpc_server_rpc *rpc) { - brw_test_client.tso_init = brw_client_init; - brw_test_client.tso_fini = brw_client_fini; - brw_test_client.tso_prep_rpc = brw_client_prep_rpc; - brw_test_client.tso_done_rpc = brw_client_done_rpc; + /* server RPC have just MAX_IOV size */ + srpc_init_bulk(rpc->srpc_bulk, 0, LNET_MAX_IOV, 0, 0); + + srpc_free_bulk(rpc->srpc_bulk); + rpc->srpc_bulk = NULL; +} + +struct sfw_test_client_ops brw_test_client = { + .tso_init = brw_client_init, + .tso_fini = brw_client_fini, + .tso_prep_rpc = brw_client_prep_rpc, + .tso_done_rpc = brw_client_done_rpc, }; -struct srpc_service brw_test_service; +struct srpc_service brw_test_service = { + .sv_id = SRPC_SERVICE_BRW, + .sv_name = "brw_test", + .sv_handler = brw_server_handle, + .sv_bulk_ready = brw_bulk_ready, + + .sv_srpc_init = brw_srpc_init, + .sv_srpc_fini = brw_srpc_fini, +}; void brw_init_test_service(void) { - brw_test_service.sv_id = SRPC_SERVICE_BRW; - brw_test_service.sv_name = "brw_test"; - brw_test_service.sv_handler = brw_server_handle; - brw_test_service.sv_bulk_ready = brw_bulk_ready; + 
unsigned long cache_size = totalram_pages() >> 1; + + /* brw prealloc cache should don't eat more than half memory */ + cache_size /= LNET_MAX_IOV; + brw_test_service.sv_wi_total = brw_srv_workitems; + + if (brw_test_service.sv_wi_total > cache_size) + brw_test_service.sv_wi_total = cache_size; } diff --git a/net/lnet/selftest/framework.c b/net/lnet/selftest/framework.c index e84904e..121bdf0 100644 --- a/net/lnet/selftest/framework.c +++ b/net/lnet/selftest/framework.c @@ -290,8 +290,10 @@ swi_state2str(rpc->srpc_wi.swi_state), status); - if (rpc->srpc_bulk) - sfw_free_pages(rpc); + if (rpc->srpc_bulk) { + srpc_free_bulk(rpc->srpc_bulk); + rpc->srpc_bulk = NULL; + } } static void @@ -1088,13 +1090,6 @@ return -ENOENT; } -void -sfw_free_pages(struct srpc_server_rpc *rpc) -{ - srpc_free_bulk(rpc->srpc_bulk); - rpc->srpc_bulk = NULL; -} - int sfw_alloc_pages(struct srpc_server_rpc *rpc, int cpt, int npages, int len, int sink) @@ -1102,10 +1097,12 @@ LASSERT(!rpc->srpc_bulk); LASSERT(npages > 0 && npages <= LNET_MAX_IOV); - rpc->srpc_bulk = srpc_alloc_bulk(cpt, 0, npages, len, sink); + rpc->srpc_bulk = srpc_alloc_bulk(cpt, npages); if (!rpc->srpc_bulk) return -ENOMEM; + srpc_init_bulk(rpc->srpc_bulk, 0, npages, len, sink); + return 0; } @@ -1629,7 +1626,6 @@ struct srpc_client_rpc * INIT_LIST_HEAD(&sfw_data.fw_zombie_rpcs); INIT_LIST_HEAD(&sfw_data.fw_zombie_sessions); - brw_init_test_client(); brw_init_test_service(); rc = sfw_register_test(&brw_test_service, &brw_test_client); LASSERT(!rc); diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index c376019..b9d8211 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -109,14 +109,12 @@ void srpc_get_counters(struct srpc_counters *cnt) } static int -srpc_add_bulk_page(struct srpc_bulk *bk, struct page *pg, int i, int off, - int nob) +srpc_init_bulk_page(struct srpc_bulk *bk, int i, int off, int nob) { LASSERT(off < PAGE_SIZE); LASSERT(nob > 0 && nob <= PAGE_SIZE); bk->bk_iovs[i].bv_offset = off; - bk->bk_iovs[i].bv_page = pg; bk->bk_iovs[i].bv_len = nob; return nob; } @@ -140,9 +138,7 @@ void srpc_get_counters(struct srpc_counters *cnt) kfree(bk); } -struct srpc_bulk * -srpc_alloc_bulk(int cpt, unsigned int bulk_off, unsigned int bulk_npg, - unsigned int bulk_len, int sink) +struct srpc_bulk *srpc_alloc_bulk(int cpt, unsigned int bulk_npg) { struct srpc_bulk *bk; int i; @@ -157,13 +153,10 @@ struct srpc_bulk * } memset(bk, 0, offsetof(struct srpc_bulk, bk_iovs[bulk_npg])); - bk->bk_sink = sink; - bk->bk_len = bulk_len; bk->bk_niov = bulk_npg; for (i = 0; i < bulk_npg; i++) { struct page *pg; - int nob; pg = alloc_pages_node(cfs_cpt_spread_node(lnet_cpt_table(), cpt), @@ -173,15 +166,37 @@ struct srpc_bulk * srpc_free_bulk(bk); return NULL; } + bk->bk_iovs[i].bv_page = pg; + } + + return bk; +} + +void +srpc_init_bulk(struct srpc_bulk *bk, unsigned int bulk_off, + unsigned int bulk_npg, unsigned int bulk_len, int sink) +{ + int i; + + LASSERT(bk); + LASSERT(bulk_npg > 0 && bulk_npg <= LNET_MAX_IOV); + + bk->bk_sink = sink; + bk->bk_len = bulk_len; + bk->bk_niov = bulk_npg; + + for (i = 0; i < bulk_npg && bulk_len > 0; i++) { + int nob; + + LASSERT(bk->bk_iovs[i].bv_page); nob = min_t(unsigned int, bulk_off + bulk_len, PAGE_SIZE) - bulk_off; - srpc_add_bulk_page(bk, pg, i, bulk_off, nob); + + srpc_init_bulk_page(bk, i, bulk_off, nob); bulk_len -= nob; bulk_off = 0; } - - return bk; } static inline u64 @@ -195,7 +210,6 @@ struct srpc_bulk * struct srpc_service_cd *scd, struct srpc_buffer *buffer) { - memset(rpc, 0, 
sizeof(*rpc)); swi_init_workitem(&rpc->srpc_wi, srpc_handle_rpc, srpc_serv_is_framework(scd->scd_svc) ? lst_serial_wq : lst_test_wq[scd->scd_cpt]); @@ -207,6 +221,9 @@ struct srpc_bulk * rpc->srpc_peer = buffer->buf_peer; rpc->srpc_self = buffer->buf_self; LNetInvalidateMDHandle(&rpc->srpc_replymdh); + + rpc->srpc_aborted = 0; + rpc->srpc_status = 0; } static void @@ -244,6 +261,8 @@ struct srpc_bulk * struct srpc_server_rpc, srpc_list)) != NULL) { list_del(&rpc->srpc_list); + if (svc->sv_srpc_fini) + svc->sv_srpc_fini(rpc); kfree(rpc); } } @@ -314,7 +333,8 @@ struct srpc_bulk * for (j = 0; j < nrpcs; j++) { rpc = kzalloc_cpt(sizeof(*rpc), GFP_NOFS, i); - if (!rpc) { + if (!rpc || + (svc->sv_srpc_init && svc->sv_srpc_init(rpc, i))) { srpc_service_fini(svc); return -ENOMEM; } @@ -946,8 +966,7 @@ struct srpc_bulk * atomic_inc(&RPC_STAT32(SRPC_RPC_DROP)); if (rpc->srpc_done) - (*rpc->srpc_done) (rpc); - LASSERT(!rpc->srpc_bulk); + (*rpc->srpc_done)(rpc); spin_lock(&scd->scd_lock); diff --git a/net/lnet/selftest/selftest.h b/net/lnet/selftest/selftest.h index 223a432..8ae258d 100644 --- a/net/lnet/selftest/selftest.h +++ b/net/lnet/selftest/selftest.h @@ -316,6 +316,12 @@ struct srpc_service { */ int (*sv_handler)(struct srpc_server_rpc *); int (*sv_bulk_ready)(struct srpc_server_rpc *, int); + + /** Service side srpc constructor/destructor. + * used for the bulk preallocation as usual. + */ + int (*sv_srpc_init)(struct srpc_server_rpc *rpc, int cpt); + void (*sv_srpc_fini)(struct srpc_server_rpc *rpc); }; struct sfw_session { @@ -424,7 +430,6 @@ int sfw_create_test_rpc(struct sfw_test_unit *tsu, void sfw_post_rpc(struct srpc_client_rpc *rpc); void sfw_client_rpc_done(struct srpc_client_rpc *rpc); void sfw_unpack_message(struct srpc_msg *msg); -void sfw_free_pages(struct srpc_server_rpc *rpc); void sfw_add_bulk_page(struct srpc_bulk *bk, struct page *pg, int i); int sfw_alloc_pages(struct srpc_server_rpc *rpc, int cpt, int npages, int len, int sink); @@ -439,9 +444,10 @@ struct srpc_client_rpc * void srpc_post_rpc(struct srpc_client_rpc *rpc); void srpc_abort_rpc(struct srpc_client_rpc *rpc, int why); void srpc_free_bulk(struct srpc_bulk *bk); -struct srpc_bulk *srpc_alloc_bulk(int cpt, unsigned int off, - unsigned int bulk_npg, unsigned int bulk_len, - int sink); +struct srpc_bulk *srpc_alloc_bulk(int cpt, unsigned int bulk_npg); +void srpc_init_bulk(struct srpc_bulk *bk, unsigned int off, + unsigned int bulk_npg, unsigned int bulk_len, int sink); + void srpc_send_rpc(struct swi_workitem *wi); int srpc_send_reply(struct srpc_server_rpc *rpc); int srpc_add_service(struct srpc_service *sv); @@ -605,7 +611,6 @@ struct srpc_bulk *srpc_alloc_bulk(int cpt, unsigned int off, } extern struct sfw_test_client_ops brw_test_client; -void brw_init_test_client(void); extern struct srpc_service brw_test_service; void brw_init_test_service(void); From patchwork Tue Sep 6 01:55:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966731 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A72D5ECAAA1 for ; Tue, 6 Sep 2022 01:56:05 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com 
(localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lT2Xp9z1y2N; Mon, 5 Sep 2022 18:56:05 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7l91K1Dz1y7Z for ; Mon, 5 Sep 2022 18:55:49 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C2005100B005; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BDA16589A0; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:24 -0400 Message-Id: <1662429337-18737-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/24] lnet: change ni_status in lnet_ni to u32* X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown struct lnet_ni.ni_status points to a 'struct lnet_ni_status', but only the ns_status field of that structure is ever accessed. Change ni_status to point directly to just the ns_status field. This will provide flexibility for introducing a variant for 'struct lnet_ni_status' which holds a large-address nid. WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: 9e2df7e5cc5fca3c6 ("LU-10391 lnet: change ni_status in lnet_ni to u32*") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/44626 Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 8 ++++---- include/linux/lnet/lib-types.h | 2 +- net/lnet/lnet/api-ni.c | 2 +- net/lnet/lnet/router.c | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 57c8dc2..2900c05 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -108,11 +108,11 @@ { bool update = false; - if (ni->ni_status && ni->ni_status->ns_status != status) { + if (ni->ni_status && *ni->ni_status != status) { CDEBUG(D_NET, "ni %s status changed from %#x to %#x\n", libcfs_nidstr(&ni->ni_nid), - ni->ni_status->ns_status, status); - ni->ni_status->ns_status = status; + *ni->ni_status, status); + *ni->ni_status = status; update = true; } @@ -128,7 +128,7 @@ else if (atomic_read(&ni->ni_fatal_error_on)) return LNET_NI_STATUS_DOWN; else if (ni->ni_status) - return ni->ni_status->ns_status; + return *ni->ni_status; else return LNET_NI_STATUS_UP; } diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 09b9d8e..2266d1b 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -500,7 +500,7 @@ struct lnet_ni { struct lnet_net *ni_net; /* my health status */ - struct lnet_ni_status *ni_status; + u32 *ni_status; /* NI FSM. 
Protected by lnet_ni_lock() */ enum lnet_ni_state ni_state; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 7c94d16..0449136 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1967,7 +1967,7 @@ struct lnet_ping_buffer * lnet_ni_lock(ni); ns->ns_status = lnet_ni_get_status_locked(ni); - ni->ni_status = ns; + ni->ni_status = &ns->ns_status; lnet_ni_unlock(ni); i++; diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index b684243..98707e9 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1082,7 +1082,7 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg) */ if (atomic_read(&ni->ni_fatal_error_on) && ni->ni_status && - ni->ni_status->ns_status != LNET_NI_STATUS_DOWN && + *ni->ni_status != LNET_NI_STATUS_DOWN && lnet_ni_set_status(ni, LNET_NI_STATUS_DOWN)) push = true; } From patchwork Tue Sep 6 01:55:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966728 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A158BECAAD3 for ; Tue, 6 Sep 2022 01:55:59 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lL6wkVz1yBf; Mon, 5 Sep 2022 18:55:58 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7l96fnPz1y80 for ; Mon, 5 Sep 2022 18:55:49 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C4CAA100B006; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C0E2D589A1; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:25 -0400 Message-Id: <1662429337-18737-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/24] lustre: llite: Rework upper/lower DIO/AIO X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell One of the patches for LU-13799, "Implement lower/upper aio" (https://review.whamcloud.com/44209/) created a complicated setup where the cl_dio_aio struct was used both for the top level DIO or AIO and for the lower level sub I/Os (corresponding to stripes). This is quite complicated and hard to follow, so this rewrites these two uses to be separate structs. This incidentally fixes at least one possible memory leak, but is mostly a cleanup. 
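The split boils down to a parent/child completion pattern: the top-level cl_dio_aio owns the kiocb and a cl_sync_io anchor initialized with one reference for the submitter, each per-stripe cl_sub_dio adds one more reference, and whichever context drops the last reference (often a ptlrpcd thread) completes and frees the top-level I/O. A minimal sketch of that pattern, using illustrative names rather than the real Lustre symbols:

	#include <linux/atomic.h>

	struct top_dio {
		atomic_t ref;	/* starts at 1: the submitter's own reference */
		void (*complete)(struct top_dio *tdio, int rc);
	};

	/* taken once per sub-I/O before it is submitted */
	static void sub_dio_get(struct top_dio *tdio)
	{
		atomic_inc(&tdio->ref);
	}

	/* called from each sub-I/O's end_io, and once by the submitter to
	 * drop its own reference; the final put runs the completion, so it
	 * may execute in a daemon context rather than the caller's
	 */
	static void sub_dio_put(struct top_dio *tdio, int rc)
	{
		if (atomic_dec_and_test(&tdio->ref))
			tdio->complete(tdio, rc);
	}

This is also why the cda_no_aio_free toggling could be dropped: after the rework, whether to free is decided per struct type in cl_sync_io_note() itself, instead of through flags shared between the upper and lower I/O.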
Fixes: c51105d64c ("lustre: llite: Implement lower/upper aio") WC-bug-id: https://jira.whamcloud.com/browse/LU-15811 Lustre-commit: 51c18539338f1a23f ("LU-15811 llite: Rework upper/lower DIO/AIO") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/47187 Reviewed-by: Andreas Dilger Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 41 ++++++++----- fs/lustre/llite/file.c | 37 ++++++----- fs/lustre/llite/rw26.c | 83 +++++++++++++------------ fs/lustre/obdclass/cl_internal.h | 1 + fs/lustre/obdclass/cl_io.c | 129 ++++++++++++++++++++++----------------- fs/lustre/obdclass/cl_object.c | 6 ++ 6 files changed, 175 insertions(+), 122 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index c717d03..0f28cfe 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -1788,8 +1788,8 @@ struct cl_io { enum cl_io_state ci_state; /** main object this io is against. Immutable after creation. */ struct cl_object *ci_obj; - /** one AIO request might be split in cl_io_loop */ - struct cl_dio_aio *ci_aio; + /** top level dio_aio */ + struct cl_dio_aio *ci_dio_aio; /** * Upper layer io, of which this io is a part of. Immutable after * creation. @@ -2532,11 +2532,12 @@ void cl_req_attr_set(const struct lu_env *env, struct cl_object *obj, struct cl_sync_io; struct cl_dio_aio; +struct cl_sub_dio; typedef void (cl_sync_io_end_t)(const struct lu_env *, struct cl_sync_io *); -void cl_sync_io_init_notify(struct cl_sync_io *anchor, int nr, - struct cl_dio_aio *aio, cl_sync_io_end_t *end); +void cl_sync_io_init_notify(struct cl_sync_io *anchor, int nr, void *dio_aio, + cl_sync_io_end_t *end); int cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor, long timeout); @@ -2544,9 +2545,12 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, int ioret); int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor, long timeout, int ioret); -struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj, - struct cl_dio_aio *ll_aio); -void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio); +struct cl_dio_aio *cl_dio_aio_alloc(struct kiocb *iocb, struct cl_object *obj, + bool is_aio); +struct cl_sub_dio *cl_sub_dio_alloc(struct cl_dio_aio *ll_aio, bool nofree); +void cl_dio_aio_free(const struct lu_env *env, struct cl_dio_aio *aio, + bool always_free); +void cl_sub_dio_free(struct cl_sub_dio *sdio, bool nofree); static inline void cl_sync_io_init(struct cl_sync_io *anchor, int nr) { @@ -2568,8 +2572,8 @@ struct cl_sync_io { wait_queue_head_t csi_waitq; /** callback to invoke when this IO is finished */ cl_sync_io_end_t *csi_end_io; - /** aio private data */ - struct cl_dio_aio *csi_aio; + /* private pointer for an associated DIO/AIO */ + void *csi_dio_aio; }; @@ -2587,17 +2591,26 @@ struct ll_dio_pages { loff_t ldp_file_offset; }; -/* To support Direct AIO */ +/* Top level struct used for AIO and DIO */ struct cl_dio_aio { struct cl_sync_io cda_sync; - struct cl_page_list cda_pages; struct cl_object *cda_obj; struct kiocb *cda_iocb; ssize_t cda_bytes; - struct cl_dio_aio *cda_ll_aio; - struct ll_dio_pages cda_dio_pages; unsigned int cda_no_aio_complete:1, - cda_no_aio_free:1; + cda_no_sub_free:1; +}; + +/* Sub-dio used for splitting DIO (and AIO, because AIO is DIO) according to + * the layout/striping, so we can do parallel submit of DIO RPCs + */ +struct cl_sub_dio { + struct cl_sync_io csd_sync; + 
struct cl_page_list csd_pages; + ssize_t csd_bytes; + struct cl_dio_aio *csd_ll_aio; + struct ll_dio_pages csd_dio_pages; + unsigned int csd_no_free:1; }; void ll_release_user_pages(struct page **pages, int npages); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index ac20d05..92e450f 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1664,7 +1664,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, unsigned int dio_lock = 0; bool is_aio = false; bool is_parallel_dio = false; - struct cl_dio_aio *ci_aio = NULL; + struct cl_dio_aio *ci_dio_aio = NULL; size_t per_bytes; bool partial_io = false; size_t max_io_pages, max_cached_pages; @@ -1694,9 +1694,10 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, if (!ll_sbi_has_parallel_dio(sbi)) is_parallel_dio = false; - ci_aio = cl_aio_alloc(args->u.normal.via_iocb, - ll_i2info(inode)->lli_clob, NULL); - if (!ci_aio) { + ci_dio_aio = cl_dio_aio_alloc(args->u.normal.via_iocb, + ll_i2info(inode)->lli_clob, + is_aio); + if (!ci_dio_aio) { rc = -ENOMEM; goto out; } @@ -1715,7 +1716,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, partial_io = per_bytes < count; io = vvp_env_thread_io(env); ll_io_init(io, file, iot == CIT_WRITE, args); - io->ci_aio = ci_aio; + io->ci_dio_aio = ci_dio_aio; io->ci_dio_lock = dio_lock; io->ci_ndelay_tried = retried; io->ci_parallel_dio = is_parallel_dio; @@ -1762,12 +1763,8 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, rc = io->ci_result; } - /* N/B: parallel DIO may be disabled during i/o submission; - * if that occurs, async RPCs are resolved before we get here, and this - * wait call completes immediately. - */ if (is_parallel_dio) { - struct cl_sync_io *anchor = &io->ci_aio->cda_sync; + struct cl_sync_io *anchor = &io->ci_dio_aio->cda_sync; /* for dio, EIOCBQUEUED is an implementation detail, * and we don't return it to userspace @@ -1775,6 +1772,11 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, if (rc == -EIOCBQUEUED) rc = 0; + /* N/B: parallel DIO may be disabled during i/o submission; + * if that occurs, I/O shifts to sync, so it's all resolved + * before we get here, and this wait call completes + * immediately. + */ rc2 = cl_sync_io_wait_recycle(env, anchor, 0, 0); if (rc2 < 0) rc = rc2; @@ -1838,24 +1840,29 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, goto restart; } - if (io->ci_aio) { + if (io->ci_dio_aio) { /* * VFS will call aio_complete() if no -EIOCBQUEUED * is returned for AIO, so we can not call aio_complete() * in our end_io(). + * + * NB: This is safe because the atomic_dec_and_lock in + * cl_sync_io_init has implicit memory barriers, so this will + * be seen by whichever thread completes the DIO/AIO, even if + * it's not this one */ if (rc != -EIOCBQUEUED) - io->ci_aio->cda_no_aio_complete = 1; + io->ci_dio_aio->cda_no_aio_complete = 1; /** * Drop one extra reference so that end_io() could be * called for this IO context, we could call it after * we make sure all AIO requests have been proceed. */ - cl_sync_io_note(env, &io->ci_aio->cda_sync, + cl_sync_io_note(env, &io->ci_dio_aio->cda_sync, rc == -EIOCBQUEUED ? 
0 : rc); if (!is_aio) { - cl_aio_free(env, io->ci_aio); - io->ci_aio = NULL; + cl_dio_aio_free(env, io->ci_dio_aio, true); + io->ci_dio_aio = NULL; } } diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index 7147f0f..0f9ab68 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -202,13 +202,13 @@ static unsigned long ll_iov_iter_alignment(struct iov_iter *i) static int ll_direct_rw_pages(const struct lu_env *env, struct cl_io *io, size_t size, - int rw, struct inode *inode, struct cl_dio_aio *aio) + int rw, struct inode *inode, struct cl_sub_dio *sdio) { - struct ll_dio_pages *pv = &aio->cda_dio_pages; + struct ll_dio_pages *pv = &sdio->csd_dio_pages; struct cl_page *page; struct cl_2queue *queue = &io->ci_queue; struct cl_object *obj = io->ci_obj; - struct cl_sync_io *anchor = &aio->cda_sync; + struct cl_sync_io *anchor = &sdio->csd_sync; loff_t offset = pv->ldp_file_offset; int io_pages = 0; size_t page_size = cl_page_size(obj); @@ -268,7 +268,7 @@ static unsigned long ll_iov_iter_alignment(struct iov_iter *i) smp_mb(); rc = cl_io_submit_rw(env, io, iot, queue); if (rc == 0) { - cl_page_list_splice(&queue->c2_qout, &aio->cda_pages); + cl_page_list_splice(&queue->c2_qout, &sdio->csd_pages); } else { atomic_add(-queue->c2_qin.pl_nr, &anchor->csi_sync_nr); @@ -307,13 +307,15 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) struct cl_io *io; struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; - struct cl_dio_aio *ll_aio; - struct cl_dio_aio *ldp_aio; + struct cl_dio_aio *ll_dio_aio; + struct cl_sub_dio *ldp_aio; size_t count = iov_iter_count(iter); ssize_t tot_bytes = 0, result = 0; loff_t file_offset = iocb->ki_pos; int rw = iov_iter_rw(iter); + bool sync_submit = false; struct vvp_io *vio; + ssize_t rc2; /* Check EOF by ourselves */ if (rw == READ && file_offset >= i_size_read(inode)) @@ -343,9 +345,22 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) io = lcc->lcc_io; LASSERT(io); - ll_aio = io->ci_aio; - LASSERT(ll_aio); - LASSERT(ll_aio->cda_iocb == iocb); + ll_dio_aio = io->ci_dio_aio; + LASSERT(ll_dio_aio); + LASSERT(ll_dio_aio->cda_iocb == iocb); + + /* We cannot do parallel submission of sub-I/Os - for AIO or regular + * DIO - unless lockless because it causes us to release the lock + * early. + * + * There are also several circumstances in which we must disable + * parallel DIO, so we check if it is enabled. + * + * The check for "is_sync_kiocb" excludes AIO, which does not need to + * be disabled in these situations. 
+ */ + if (io->ci_dio_lock || (is_sync_kiocb(iocb) && !io->ci_parallel_dio)) + sync_submit = true; while (iov_iter_count(iter)) { struct ll_dio_pages *pvec; @@ -360,22 +375,24 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) count = i_size_read(inode) - file_offset; } - /* this aio is freed on completion from cl_sync_io_note, so we - * do not need to directly free the memory here + /* if we are doing sync_submit, then we free this below, + * otherwise it is freed on the final call to cl_sync_io_note + * (either in this function or from a ptlrpcd daemon) */ - ldp_aio = cl_aio_alloc(iocb, ll_i2info(inode)->lli_clob, - ll_aio); + ldp_aio = cl_sub_dio_alloc(ll_dio_aio, sync_submit); if (!ldp_aio) { result = -ENOMEM; goto out; } - pvec = &ldp_aio->cda_dio_pages; + pvec = &ldp_aio->csd_dio_pages; result = ll_get_user_pages(rw, iter, &pages, &pvec->ldp_count, count); if (unlikely(result <= 0)) { - cl_sync_io_note(env, &ldp_aio->cda_sync, result); + cl_sync_io_note(env, &ldp_aio->csd_sync, result); + if (sync_submit) + cl_sub_dio_free(ldp_aio, true); goto out; } @@ -388,8 +405,15 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) /* We've submitted pages and can now remove the extra * reference for that */ - cl_sync_io_note(env, &ldp_aio->cda_sync, result); - + cl_sync_io_note(env, &ldp_aio->csd_sync, result); + + if (sync_submit) { + rc2 = cl_sync_io_wait(env, &ldp_aio->csd_sync, + 0); + if (result == 0 && rc2) + result = rc2; + cl_sub_dio_free(ldp_aio, true); + } if (unlikely(result < 0)) goto out; @@ -399,35 +423,18 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) } out: - ll_aio->cda_bytes += tot_bytes; + ll_dio_aio->cda_bytes += tot_bytes; if (rw == WRITE) vio->u.readwrite.vui_written += tot_bytes; else vio->u.readwrite.vui_read += tot_bytes; - /* We cannot do async submission - for AIO or regular DIO - unless - * lockless because it causes us to release the lock early. - * - * There are also several circumstances in which we must disable - * parallel DIO, so we check if it is enabled. - * - * The check for "is_sync_kiocb" excludes AIO, which does not need to - * be disabled in these situations. 
+ /* AIO is not supported on pipes, so we cannot return EIOCBQEUED like + * we normally would for both DIO and AIO here */ - if (io->ci_dio_lock || (is_sync_kiocb(iocb) && !io->ci_parallel_dio)) { - ssize_t rc2; - - /* Wait here rather than doing async submission */ - rc2 = cl_sync_io_wait_recycle(env, &ll_aio->cda_sync, 0, 0); - if (result == 0 && rc2) - result = rc2; - - if (result == 0) - result = tot_bytes; - } else if (result == 0) { + if (result == 0 && !iov_iter_is_pipe(iter)) result = -EIOCBQUEUED; - } return result; } diff --git a/fs/lustre/obdclass/cl_internal.h b/fs/lustre/obdclass/cl_internal.h index db9dd98..eb3d81a 100644 --- a/fs/lustre/obdclass/cl_internal.h +++ b/fs/lustre/obdclass/cl_internal.h @@ -47,6 +47,7 @@ struct cl_thread_info { }; extern struct kmem_cache *cl_dio_aio_kmem; +extern struct kmem_cache *cl_sub_dio_kmem; extern struct kmem_cache *cl_page_kmem_array[16]; extern unsigned short cl_page_kmem_size_array[16]; diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index c388700..06b9eb8 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -1072,14 +1072,14 @@ void cl_req_attr_set(const struct lu_env *env, struct cl_object *obj, * anchor->csi_waitq.lock */ void cl_sync_io_init_notify(struct cl_sync_io *anchor, int nr, - struct cl_dio_aio *aio, cl_sync_io_end_t *end) + void *dio_aio, cl_sync_io_end_t *end) { memset(anchor, 0, sizeof(*anchor)); init_waitqueue_head(&anchor->csi_waitq); atomic_set(&anchor->csi_sync_nr, nr); anchor->csi_sync_rc = 0; anchor->csi_end_io = end; - anchor->csi_aio = aio; + anchor->csi_dio_aio = dio_aio; } EXPORT_SYMBOL(cl_sync_io_init_notify); @@ -1117,32 +1117,37 @@ int cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor, } EXPORT_SYMBOL(cl_sync_io_wait); -static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor) +static void cl_dio_aio_end(const struct lu_env *env, struct cl_sync_io *anchor) { struct cl_dio_aio *aio = container_of(anchor, typeof(*aio), cda_sync); ssize_t ret = anchor->csi_sync_rc; + if (!aio->cda_no_aio_complete) { + aio->cda_iocb->ki_complete(aio->cda_iocb, ret ?: aio->cda_bytes, + 0); + } +} + +static void cl_sub_dio_end(const struct lu_env *env, struct cl_sync_io *anchor) +{ + struct cl_sub_dio *sdio = container_of(anchor, typeof(*sdio), csd_sync); + ssize_t ret = anchor->csi_sync_rc; + /* release pages */ - while (aio->cda_pages.pl_nr > 0) { - struct cl_page *page = cl_page_list_first(&aio->cda_pages); + while (sdio->csd_pages.pl_nr > 0) { + struct cl_page *page = cl_page_list_first(&sdio->csd_pages); cl_page_delete(env, page); - cl_page_list_del(env, &aio->cda_pages, page); + cl_page_list_del(env, &sdio->csd_pages, page); } - if (!aio->cda_no_aio_complete) - aio->cda_iocb->ki_complete(aio->cda_iocb, - ret ?: aio->cda_bytes, 0); - - if (aio->cda_ll_aio) { - ll_release_user_pages(aio->cda_dio_pages.ldp_pages, - aio->cda_dio_pages.ldp_count); - cl_sync_io_note(env, &aio->cda_ll_aio->cda_sync, ret); - } + ll_release_user_pages(sdio->csd_dio_pages.ldp_pages, + sdio->csd_dio_pages.ldp_count); + cl_sync_io_note(env, &sdio->csd_ll_aio->cda_sync, ret); } -struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj, - struct cl_dio_aio *ll_aio) +struct cl_dio_aio *cl_dio_aio_alloc(struct kiocb *iocb, struct cl_object *obj, + bool is_aio) { struct cl_dio_aio *aio; @@ -1152,46 +1157,63 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj, * Hold one ref so that it won't be released until * every pages is added. 
*/ - cl_sync_io_init_notify(&aio->cda_sync, 1, aio, cl_aio_end); - cl_page_list_init(&aio->cda_pages); + cl_sync_io_init_notify(&aio->cda_sync, 1, aio, cl_dio_aio_end); aio->cda_iocb = iocb; - if (is_sync_kiocb(iocb) || ll_aio) - aio->cda_no_aio_complete = 1; - else - aio->cda_no_aio_complete = 0; - /* in the case of a lower level aio struct (ll_aio is set), or - * true AIO (!is_sync_kiocb()), the memory is freed by - * the daemons calling cl_sync_io_note, because they are the - * last users of the aio struct + aio->cda_no_aio_complete = !is_aio; + /* if this is true AIO, the memory is freed by the last call + * to cl_sync_io_note (when all the I/O is complete), because + * no one is waiting (in the kernel) for this to complete * * in other cases, the last user is cl_sync_io_wait, and in - * that case, the caller frees the aio struct after that call - * completes + * that case, the caller frees the struct after that call */ - if (ll_aio || !is_sync_kiocb(iocb)) - aio->cda_no_aio_free = 0; - else - aio->cda_no_aio_free = 1; + aio->cda_no_sub_free = !is_aio; cl_object_get(obj); aio->cda_obj = obj; - aio->cda_ll_aio = ll_aio; - - if (ll_aio) - atomic_add(1, &ll_aio->cda_sync.csi_sync_nr); } return aio; } -EXPORT_SYMBOL(cl_aio_alloc); +EXPORT_SYMBOL(cl_dio_aio_alloc); -void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio) +struct cl_sub_dio *cl_sub_dio_alloc(struct cl_dio_aio *ll_aio, bool nofree) { - if (aio) { + struct cl_sub_dio *sdio; + + sdio = kmem_cache_zalloc(cl_sub_dio_kmem, GFP_NOFS); + if (sdio) { + /* + * Hold one ref so that it won't be released until + * every page is added. + */ + cl_sync_io_init_notify(&sdio->csd_sync, 1, sdio, + cl_sub_dio_end); + cl_page_list_init(&sdio->csd_pages); + + sdio->csd_ll_aio = ll_aio; + atomic_add(1, &ll_aio->cda_sync.csi_sync_nr); + sdio->csd_no_free = nofree; + } + return sdio; +} +EXPORT_SYMBOL(cl_sub_dio_alloc); + +void cl_dio_aio_free(const struct lu_env *env, struct cl_dio_aio *aio, + bool always_free) +{ + if (aio && (!aio->cda_no_sub_free || always_free)) { cl_object_put(env, aio->cda_obj); kmem_cache_free(cl_dio_aio_kmem, aio); } } -EXPORT_SYMBOL(cl_aio_free); +EXPORT_SYMBOL(cl_dio_aio_free); + +void cl_sub_dio_free(struct cl_sub_dio *sdio, bool always_free) +{ + if (sdio && (!sdio->csd_no_free || always_free)) + kmem_cache_free(cl_sub_dio_kmem, sdio); +} +EXPORT_SYMBOL(cl_sub_dio_free); /* * ll_release_user_pages - tear down page struct array @@ -1225,7 +1247,7 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, LASSERT(atomic_read(&anchor->csi_sync_nr) > 0); if (atomic_dec_and_lock(&anchor->csi_sync_nr, &anchor->csi_waitq.lock)) { - struct cl_dio_aio *aio = NULL; + void *dio_aio = NULL; cl_sync_io_end_t *end_io = anchor->csi_end_io; @@ -1238,29 +1260,28 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, if (end_io) end_io(env, anchor); - aio = anchor->csi_aio; + dio_aio = anchor->csi_dio_aio; spin_unlock(&anchor->csi_waitq.lock); - if (aio && !aio->cda_no_aio_free) - cl_aio_free(env, aio); + if (dio_aio) { + if (end_io == cl_dio_aio_end) + cl_dio_aio_free(env, + (struct cl_dio_aio *) dio_aio, + false); + else if (end_io == cl_sub_dio_end) + cl_sub_dio_free((struct cl_sub_dio *) dio_aio, + false); + } } } EXPORT_SYMBOL(cl_sync_io_note); - int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor, long timeout, int ioret) { - bool no_aio_free = anchor->csi_aio->cda_no_aio_free; int rc = 0; - /* for true AIO, the daemons running cl_sync_io_note would
normally - * free the aio struct, but if we're waiting on it, we need them to not - * do that. This ensures the aio is not freed when we drop the - * reference count to zero in cl_sync_io_note below - */ - anchor->csi_aio->cda_no_aio_free = 1; /* * @anchor was inited as 1 to prevent end_io to be * called before we add all pages for IO, so drop @@ -1280,8 +1301,6 @@ int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor, */ atomic_add(1, &anchor->csi_sync_nr); - anchor->csi_aio->cda_no_aio_free = no_aio_free; - return rc; } EXPORT_SYMBOL(cl_sync_io_wait_recycle); diff --git a/fs/lustre/obdclass/cl_object.c b/fs/lustre/obdclass/cl_object.c index 6f87160..28bf1e4 100644 --- a/fs/lustre/obdclass/cl_object.c +++ b/fs/lustre/obdclass/cl_object.c @@ -57,6 +57,7 @@ static struct kmem_cache *cl_env_kmem; struct kmem_cache *cl_dio_aio_kmem; +struct kmem_cache *cl_sub_dio_kmem; struct kmem_cache *cl_page_kmem_array[16]; unsigned short cl_page_kmem_size_array[16]; @@ -989,6 +990,11 @@ struct cl_thread_info *cl_env_info(const struct lu_env *env) .ckd_size = sizeof(struct cl_dio_aio) }, { + .ckd_cache = &cl_sub_dio_kmem, + .ckd_name = "cl_sub_dio_kmem", + .ckd_size = sizeof(struct cl_sub_dio) + }, + { .ckd_cache = NULL } }; From patchwork Tue Sep 6 01:55:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966745 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D9A6CECAAD3 for ; Tue, 6 Sep 2022 01:56:52 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7mN3xjrz1y6T; Mon, 5 Sep 2022 18:56:52 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lB55gbz1y6Q for ; Mon, 5 Sep 2022 18:55:50 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C92F8100B007; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C48C737C; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:26 -0400 Message-Id: <1662429337-18737-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 13/24] lustre: sec: use enc pool for bounce pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson Take pages from the enc pool so that they can be used for encryption, instead of letting fscrypt allocate a bounce page for every call to the encryption primitives. Pages are taken from the enc pool a whole array at a time. This requires modifying the fscrypt API, so that new functions fscrypt_encrypt_block() and fscrypt_decrypt_block() are exported. These functions take a destination page parameter. Using the enc pool for bounce pages is a worthwhile performance win. Here are the performance penalties incurred by encryption, without this patch and with this patch:
||=====================|=====================|=====================||
||                     | Performance penalty | Performance penalty ||
||                     | without patch       | with patch          ||
||=====================|=====================|=====================||
|| Bandwidth – write   | 30%-35%             | 5%-10% large IOs    ||
||                     |                     | 15% small IOs       ||
||---------------------|---------------------|---------------------||
|| Bandwidth – read    | 20%                 | less than 10%       ||
||---------------------|---------------------|---------------------||
|| Metadata            | N/A                 | 5%                  ||
|| creat,stat,remove   |                     |                     ||
||=====================|=====================|=====================||
WC-bug-id: https://jira.whamcloud.com/browse/LU-15003 Lustre-commit: f3fe144b8572e9e75b ("LU-15003 sec: use enc pool for bounce pages") Signed-off-by: Sebastien Buisson Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/47149 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin --- fs/crypto/crypto.c | 34 ++-- fs/lustre/include/lustre_sec.h | 3 + fs/lustre/llite/dir.c | 5 +- fs/lustre/osc/osc_request.c | 134 +++++++++++- fs/lustre/ptlrpc/sec_bulk.c | 452 +++++++++++++++++++++++++++++++++++++++-- include/linux/fscrypt.h | 49 ++++- 6 files changed, 632 insertions(+), 45 deletions(-) diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 92123257..de3e040 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c @@ -202,9 +202,10 @@ struct page *fscrypt_encrypt_pagecache_blocks(struct page *page, EXPORT_SYMBOL(fscrypt_encrypt_pagecache_blocks); /** - * fscrypt_encrypt_block_inplace() - Encrypt a filesystem block in-place + * fscrypt_encrypt_block() - Cache an encrypted filesystem block in a page * @inode: The inode to which this block belongs - * @page: The page containing the block to encrypt + * @src: The page containing the block to encrypt + * @dst: The page which will contain the encrypted data * @len: Size of block to encrypt. Doesn't need to be a multiple of the * fs block size, but must be a multiple of FS_CRYPTO_BLOCK_SIZE. * @offs: Byte offset within @page at which the block to encrypt begins @@ -215,17 +216,18 @@ struct page *fscrypt_encrypt_pagecache_blocks(struct page *page, * Encrypt a possibly-compressed filesystem block that is located in an * arbitrary page, not necessarily in the original pagecache page. The @inode * and @lblk_num must be specified, as they can't be determined from @page. + * The encrypted data will be stored in @dst.
* * Return: 0 on success; -errno on failure */ -int fscrypt_encrypt_block_inplace(const struct inode *inode, struct page *page, - unsigned int len, unsigned int offs, - u64 lblk_num, gfp_t gfp_flags) +int fscrypt_encrypt_block(const struct inode *inode, struct page *src, + struct page *dst, unsigned int len, unsigned int offs, + u64 lblk_num, gfp_t gfp_flags) { - return fscrypt_crypt_block(inode, FS_ENCRYPT, lblk_num, page, page, + return fscrypt_crypt_block(inode, FS_ENCRYPT, lblk_num, src, dst, len, offs, gfp_flags); } -EXPORT_SYMBOL(fscrypt_encrypt_block_inplace); +EXPORT_SYMBOL(fscrypt_encrypt_block); /** * fscrypt_decrypt_pagecache_blocks() - Decrypt filesystem blocks in a @@ -272,9 +274,10 @@ int fscrypt_decrypt_pagecache_blocks(struct page *page, unsigned int len, EXPORT_SYMBOL(fscrypt_decrypt_pagecache_blocks); /** - * fscrypt_decrypt_block_inplace() - Decrypt a filesystem block in-place + * fscrypt_decrypt_block() - Cache a decrypted filesystem block in a page * @inode: The inode to which this block belongs - * @page: The page containing the block to decrypt + * @src: The page containing the block to decrypt + * @dst: The page which will contain the plain data * @len: Size of block to decrypt. Doesn't need to be a multiple of the * fs block size, but must be a multiple of FS_CRYPTO_BLOCK_SIZE. * @offs: Byte offset within @page at which the block to decrypt begins @@ -284,17 +287,18 @@ int fscrypt_decrypt_pagecache_blocks(struct page *page, unsigned int len, * Decrypt a possibly-compressed filesystem block that is located in an * arbitrary page, not necessarily in the original pagecache page. The @inode * and @lblk_num must be specified, as they can't be determined from @page. + * The decrypted data will be stored in @dst. * * Return: 0 on success; -errno on failure */ -int fscrypt_decrypt_block_inplace(const struct inode *inode, struct page *page, - unsigned int len, unsigned int offs, - u64 lblk_num) +int fscrypt_decrypt_block(const struct inode *inode, struct page *src, + struct page *dst, unsigned int len, unsigned int offs, + u64 lblk_num, gfp_t gfp_flags) { - return fscrypt_crypt_block(inode, FS_DECRYPT, lblk_num, page, page, - len, offs, GFP_NOFS); + return fscrypt_crypt_block(inode, FS_DECRYPT, lblk_num, src, dst, + len, offs, gfp_flags); } -EXPORT_SYMBOL(fscrypt_decrypt_block_inplace); +EXPORT_SYMBOL(fscrypt_decrypt_block); /** * fscrypt_initialize() - allocate major buffers for fs encryption.
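In non-diff form, the relationship between the new primitives and the retained in-place wrappers is easy to see. The helper below is illustrative only (encrypt_one_block() is not part of the patch); it uses just the signatures introduced above and assumes CONFIG_FS_ENCRYPTION is enabled:

/* Illustrative helper, not part of the patch: encrypt one block from
 * @src into a caller-supplied bounce page @dst (e.g. one taken from
 * the enc pool), falling back to the old in-place behaviour when the
 * caller has no destination page.
 */
static int encrypt_one_block(const struct inode *inode, struct page *src,
			     struct page *dst, unsigned int blocksize,
			     unsigned int offs, u64 lblk_num)
{
	if (dst)
		return fscrypt_encrypt_block(inode, src, dst, blocksize,
					     offs, lblk_num, GFP_NOFS);

	/* no destination page: encrypt in place, as before */
	return fscrypt_encrypt_block_inplace(inode, src, blocksize,
					     offs, lblk_num);
}

osc_encrypt_pagecache_blocks() further down follows the same pattern at page granularity, except that its NULL-@dstpage fallback is fscrypt_encrypt_pagecache_blocks(), which keeps the old behaviour of letting fscrypt allocate the bounce page.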
diff --git a/fs/lustre/include/lustre_sec.h b/fs/lustre/include/lustre_sec.h index e8410e1..7c3c12a 100644 --- a/fs/lustre/include/lustre_sec.h +++ b/fs/lustre/include/lustre_sec.h @@ -1048,6 +1048,9 @@ int sptlrpc_target_export_check(struct obd_export *exp, struct ptlrpc_request *req); /* bulk security api */ +int sptlrpc_enc_pool_add_user(void); +int sptlrpc_enc_pool_get_pages_array(struct page **pa, unsigned int count); +void sptlrpc_enc_pool_put_pages_array(struct page **pa, unsigned int count); void sptlrpc_enc_pool_put_pages(struct ptlrpc_bulk_desc *desc); int get_free_pages_in_pool(void); int pool_is_at_full_capacity(void); diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index aea15f5..bffd34c 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -2292,7 +2292,10 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) case FS_IOC_ADD_ENCRYPTION_KEY: if (!ll_sbi_has_encrypt(ll_i2sbi(inode))) return -EOPNOTSUPP; - return fscrypt_ioctl_add_key(file, (void __user *)arg); + rc = fscrypt_ioctl_add_key(file, (void __user *)arg); + if (!rc) + sptlrpc_enc_pool_add_user(); + return rc; case FS_IOC_REMOVE_ENCRYPTION_KEY: if (!ll_sbi_has_encrypt(ll_i2sbi(inode))) return -EOPNOTSUPP; diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 5a4db29..d66185b 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1378,13 +1378,109 @@ static int osc_checksum_bulk_rw(const char *obd_name, return rc; } +/** + * osc_encrypt_pagecache_blocks() - overlay to fscrypt_encrypt_pagecache_blocks + * @srcpage: The locked pagecache page containing the block(s) to encrypt + * @dstpage: The page to put encryption result + * @len: Total size of the block(s) to encrypt. Must be a nonzero + * multiple of the filesystem's block size. + * @offs: Byte offset within @page of the first block to encrypt. Must be + * a multiple of the filesystem's block size. + * @gfp_flags: Memory allocation flags + * + * This overlay function is necessary to be able to provide our own bounce page. + */ +static struct page *osc_encrypt_pagecache_blocks(struct page *srcpage, + struct page *dstpage, + unsigned int len, + unsigned int offs, + gfp_t gfp_flags) +{ + const struct inode *inode = srcpage->mapping->host; + const unsigned int blockbits = inode->i_blkbits; + const unsigned int blocksize = 1 << blockbits; + u64 lblk_num = ((u64)srcpage->index << (PAGE_SHIFT - blockbits)) + + (offs >> blockbits); + unsigned int i; + int err; + + if (unlikely(!dstpage)) + return fscrypt_encrypt_pagecache_blocks(srcpage, len, offs, + gfp_flags); + + if (WARN_ON_ONCE(!PageLocked(srcpage))) + return ERR_PTR(-EINVAL); + + if (WARN_ON_ONCE(len <= 0 || !IS_ALIGNED(len | offs, blocksize))) + return ERR_PTR(-EINVAL); + + /* Set PagePrivate2 for disambiguation in + * osc_finalize_bounce_page(). + * It means cipher page was not allocated by llcrypt. + */ + SetPagePrivate2(dstpage); + + for (i = offs; i < offs + len; i += blocksize, lblk_num++) { + err = fscrypt_encrypt_block(inode, srcpage, dstpage, blocksize, + i, lblk_num, gfp_flags); + if (err) + return ERR_PTR(err); + } + SetPagePrivate(dstpage); + set_page_private(dstpage, (unsigned long)srcpage); + return dstpage; +} + +/** + * osc_finalize_bounce_page() - overlay to llcrypt_finalize_bounce_page + * + * This overlay function is necessary to handle bounce pages + * allocated by ourselves. 
+ */ +static inline void osc_finalize_bounce_page(struct page **pagep) +{ + struct page *page = *pagep; + + /* PagePrivate2 was set in osc_encrypt_pagecache_blocks + * to indicate the cipher page was allocated by ourselves. + * So we must not free it via fscrypt. + */ + if (unlikely(!page || !PagePrivate2(page))) + return fscrypt_finalize_bounce_page(pagep); + + if (fscrypt_is_bounce_page(page)) { + *pagep = fscrypt_pagecache_page(page); + ClearPagePrivate2(page); + set_page_private(page, (unsigned long)NULL); + ClearPagePrivate(page); + } +} + static inline void osc_release_bounce_pages(struct brw_page **pga, u32 page_count) { #ifdef CONFIG_FS_ENCRYPTION - int i; + struct page **pa = NULL; + int i, j = 0; + + if (PageChecked(pga[0]->pg)) { + pa = kvmalloc_array(page_count, sizeof(*pa), + GFP_KERNEL | __GFP_ZERO); + if (!pa) + return; + } for (i = 0; i < page_count; i++) { + /* Bounce pages used by osc_encrypt_pagecache_blocks() + * called from osc_brw_prep_request() + * are identified thanks to the PageChecked flag. + */ + if (PageChecked(pga[i]->pg)) { + if (pa) + pa[j++] = pga[i]->pg; + osc_finalize_bounce_page(&pga[i]->pg); + } + /* Bounce pages allocated by a call to * fscrypt_encrypt_pagecache_blocks() in osc_brw_prep_request() * are identified thanks to the PageChecked flag. @@ -1394,6 +1490,11 @@ static inline void osc_release_bounce_pages(struct brw_page **pga, pga[i]->count -= pga[i]->bp_count_diff; pga[i]->off += pga[i]->bp_off_diff; } + + if (pa) { + sptlrpc_enc_pool_put_pages_array(pa, j); + kvfree(pa); + } #endif } @@ -1445,6 +1546,22 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, if (opc == OST_WRITE && inode && IS_ENCRYPTED(inode) && fscrypt_has_encryption_key(inode)) { + struct page **pa = NULL; + + pa = kvmalloc_array(page_count, sizeof(*pa), + GFP_KERNEL | __GFP_ZERO); + if (!pa) { + ptlrpc_request_free(req); + return -ENOMEM; + } + + rc = sptlrpc_enc_pool_get_pages_array(pa, page_count); + if (rc) { + CDEBUG(D_SEC, "failed to allocate from enc pool: %d\n", + rc); + ptlrpc_request_free(req); + return rc; + } + for (i = 0; i < page_count; i++) { struct brw_page *brwpg = pga[i]; struct page *data_page = NULL; @@ -1474,9 +1591,10 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, brwpg->pg->index = clpage->cp_page_index; } data_page = - fscrypt_encrypt_pagecache_blocks(brwpg->pg, - nunits, 0, - GFP_NOFS); + osc_encrypt_pagecache_blocks(brwpg->pg, + pa ? pa[i] : NULL, + nunits, 0, + GFP_NOFS); if (directio) { brwpg->pg->mapping = map_orig; brwpg->pg->index = index_orig; @@ -1490,6 +1608,11 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, rc = 0; goto retry_encrypt; } + if (pa) { + sptlrpc_enc_pool_put_pages_array(pa + i, + page_count - i); + kvfree(pa); + } ptlrpc_request_free(req); return rc; } @@ -1515,6 +1638,9 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, brwpg->bp_off_diff = brwpg->off & ~PAGE_MASK; brwpg->off = brwpg->off & PAGE_MASK; } + + if (pa) + kvfree(pa); } else if (opc == OST_WRITE && inode && IS_ENCRYPTED(inode)) { struct osc_async_page *oap = brw_page2oap(pga[0]); struct cl_page *clpage = oap2cl_page(oap); diff --git a/fs/lustre/ptlrpc/sec_bulk.c b/fs/lustre/ptlrpc/sec_bulk.c index b6ae77b..5dad83f 100644 --- a/fs/lustre/ptlrpc/sec_bulk.c +++ b/fs/lustre/ptlrpc/sec_bulk.c @@ -297,14 +297,190 @@ static unsigned long enc_pools_cleanup(struct page ***pools, int npools) return cleaned; } +/* + * merge the @npools pools pointed to by @pools, which contain @npages + * new pages in total, into the current pools.
+ * + * we have options to avoid most memory copy with some tricks. but we choose + * the simplest way to avoid complexity. It's not frequently called. + */ +static void enc_pools_insert(struct page ***pools, int npools, int npages) +{ + int freeslot; + int op_idx, np_idx, og_idx, ng_idx; + int cur_npools, end_npools; + + LASSERT(npages > 0); + LASSERT(page_pools.epp_total_pages+npages <= page_pools.epp_max_pages); + LASSERT(npages_to_npools(npages) == npools); + LASSERT(page_pools.epp_growing); + + spin_lock(&page_pools.epp_lock); + + /* + * (1) fill all the free slots of current pools. + */ + /* + * free slots are those left by rent pages, and the extra ones with + * index >= total_pages, locate at the tail of last pool. + */ + freeslot = page_pools.epp_total_pages % PAGES_PER_POOL; + if (freeslot != 0) + freeslot = PAGES_PER_POOL - freeslot; + freeslot += page_pools.epp_total_pages - page_pools.epp_free_pages; + + op_idx = page_pools.epp_free_pages / PAGES_PER_POOL; + og_idx = page_pools.epp_free_pages % PAGES_PER_POOL; + np_idx = npools - 1; + ng_idx = (npages - 1) % PAGES_PER_POOL; + + while (freeslot) { + LASSERT(!page_pools.epp_pools[op_idx][og_idx]); + LASSERT(!pools[np_idx][ng_idx]); + + page_pools.epp_pools[op_idx][og_idx] = pools[np_idx][ng_idx]; + pools[np_idx][ng_idx] = NULL; + + freeslot--; + + if (++og_idx == PAGES_PER_POOL) { + op_idx++; + og_idx = 0; + } + if (--ng_idx < 0) { + if (np_idx == 0) + break; + np_idx--; + ng_idx = PAGES_PER_POOL - 1; + } + } + + /* + * (2) add pools if needed. + */ + cur_npools = (page_pools.epp_total_pages + PAGES_PER_POOL - 1) / + PAGES_PER_POOL; + end_npools = (page_pools.epp_total_pages + npages + + PAGES_PER_POOL - 1) / PAGES_PER_POOL; + LASSERT(end_npools <= page_pools.epp_max_pools); + + np_idx = 0; + while (cur_npools < end_npools) { + LASSERT(page_pools.epp_pools[cur_npools] == NULL); + LASSERT(np_idx < npools); + LASSERT(pools[np_idx] != NULL); + + page_pools.epp_pools[cur_npools++] = pools[np_idx]; + pools[np_idx++] = NULL; + } + + page_pools.epp_total_pages += npages; + page_pools.epp_free_pages += npages; + page_pools.epp_st_lowfree = page_pools.epp_free_pages; + + if (page_pools.epp_total_pages > page_pools.epp_st_max_pages) + page_pools.epp_st_max_pages = page_pools.epp_total_pages; + + CDEBUG(D_SEC, "add %d pages to total %lu\n", npages, + page_pools.epp_total_pages); + + spin_unlock(&page_pools.epp_lock); +} + +static int enc_pools_add_pages(int npages) +{ + static DEFINE_MUTEX(add_pages_mutex); + struct page ***pools; + int npools, alloced = 0; + int i, j, rc = -ENOMEM; + + if (npages < PTLRPC_MAX_BRW_PAGES) + npages = PTLRPC_MAX_BRW_PAGES; + + mutex_lock(&add_pages_mutex); + + if (npages + page_pools.epp_total_pages > page_pools.epp_max_pages) + npages = page_pools.epp_max_pages - page_pools.epp_total_pages; + LASSERT(npages > 0); + + page_pools.epp_st_grows++; + + npools = npages_to_npools(npages); + + pools = kvmalloc_array(npools, sizeof(*pools), + GFP_KERNEL | __GFP_ZERO); + if (!pools) + goto out; + + for (i = 0; i < npools; i++) { + pools[i] = kzalloc(PAGE_SIZE, GFP_NOFS); + if (!pools[i]) + goto out_pools; + + for (j = 0; j < PAGES_PER_POOL && alloced < npages; j++) { + pools[i][j] = alloc_page(GFP_NOFS | + __GFP_HIGHMEM); + if (!pools[i][j]) + goto out_pools; + + alloced++; + } + } + LASSERT(alloced == npages); + + enc_pools_insert(pools, npools, npages); + CDEBUG(D_SEC, "added %d pages into pools\n", npages); + rc = 0; + +out_pools: + enc_pools_cleanup(pools, npools); + kvfree(pools); +out: + if (rc) { + 
page_pools.epp_st_grow_fails++; + CERROR("Failed to allocate %d enc pages\n", npages); + } + + mutex_unlock(&add_pages_mutex); + return rc; +} + static inline void enc_pools_wakeup(void) { assert_spin_locked(&page_pools.epp_lock); - if (unlikely(page_pools.epp_waitqlen)) { - LASSERT(waitqueue_active(&page_pools.epp_waitq)); + /* waitqueue_active */ + if (unlikely(waitqueue_active(&page_pools.epp_waitq))) wake_up(&page_pools.epp_waitq); - } +} + +static int enc_pools_should_grow(int page_needed, time64_t now) +{ + /* + * don't grow if someone else is growing the pools right now, + * or the pools has reached its full capacity + */ + if (page_pools.epp_growing || + page_pools.epp_total_pages == page_pools.epp_max_pages) + return 0; + + /* if total pages is not enough, we need to grow */ + if (page_pools.epp_total_pages < page_needed) + return 1; + + /* + * we wanted to return 0 here if there was a shrink just + * happened a moment ago, but this may cause deadlock if both + * client and ost live on single node. + */ + + /* + * here we perhaps need consider other factors like wait queue + * length, idle index, etc. ? + */ + + /* grow the pools in any other cases */ + return 1; } /* @@ -323,49 +499,287 @@ int pool_is_at_full_capacity(void) return (page_pools.epp_total_pages == page_pools.epp_max_pages); } -void sptlrpc_enc_pool_put_pages(struct ptlrpc_bulk_desc *desc) +static inline struct page **page_from_bulkdesc(void *array, int index) { + struct ptlrpc_bulk_desc *desc = (struct ptlrpc_bulk_desc *)array; + + return &desc->bd_enc_vec[index].bv_page; +} + +static inline struct page **page_from_pagearray(void *array, int index) +{ + struct page **pa = (struct page **)array; + + return &pa[index]; +} + +/* + * we allocate the requested pages atomically. + */ +static inline int __sptlrpc_enc_pool_get_pages(void *array, unsigned int count, + struct page **(*page_from)(void *, int)) +{ + wait_queue_entry_t waitlink; + unsigned long this_idle = -1; + u64 tick_ns = 0; + time64_t now; int p_idx, g_idx; - int i; + int i, rc = 0; - if (!desc->bd_enc_vec) - return; + if (!array || count <= 0 || count > page_pools.epp_max_pages) + return -EINVAL; + + spin_lock(&page_pools.epp_lock); + + page_pools.epp_st_access++; +again: + if (unlikely(page_pools.epp_free_pages < count)) { + if (tick_ns == 0) + tick_ns = ktime_get_ns(); + + now = ktime_get_real_seconds(); + + page_pools.epp_st_missings++; + page_pools.epp_pages_short += count; + + if (enc_pools_should_grow(count, now)) { + page_pools.epp_growing = 1; + + spin_unlock(&page_pools.epp_lock); + enc_pools_add_pages(page_pools.epp_pages_short / 2); + spin_lock(&page_pools.epp_lock); + + page_pools.epp_growing = 0; + + enc_pools_wakeup(); + } else { + if (page_pools.epp_growing) { + if (++page_pools.epp_waitqlen > + page_pools.epp_st_max_wqlen) + page_pools.epp_st_max_wqlen = + page_pools.epp_waitqlen; + + set_current_state(TASK_UNINTERRUPTIBLE); + init_wait(&waitlink); + add_wait_queue(&page_pools.epp_waitq, + &waitlink); + + spin_unlock(&page_pools.epp_lock); + schedule(); + remove_wait_queue(&page_pools.epp_waitq, + &waitlink); + spin_lock(&page_pools.epp_lock); + page_pools.epp_waitqlen--; + } else { + /* + * ptlrpcd thread should not sleep in that case, + * or deadlock may occur! + * Instead, return -ENOMEM so that upper layers + * will put request back in queue. 
+ */ + page_pools.epp_st_outofmem++; + rc = -ENOMEM; + goto out_unlock; + } + } + + if (page_pools.epp_pages_short < count) { + rc = -EPROTO; + goto out_unlock; + } + page_pools.epp_pages_short -= count; + + this_idle = 0; + goto again; + } + + /* record max wait time */ + if (unlikely(tick_ns)) { + ktime_t tick = ktime_sub_ns(ktime_get(), tick_ns); + + if (ktime_after(tick, page_pools.epp_st_max_wait)) + page_pools.epp_st_max_wait = tick; + } + + /* proceed with rest of allocation */ + page_pools.epp_free_pages -= count; + + p_idx = page_pools.epp_free_pages / PAGES_PER_POOL; + g_idx = page_pools.epp_free_pages % PAGES_PER_POOL; + + for (i = 0; i < count; i++) { + struct page **pagep = page_from(array, i); + + if (!page_pools.epp_pools[p_idx][g_idx]) { + rc = -EPROTO; + goto out_unlock; + } + *pagep = page_pools.epp_pools[p_idx][g_idx]; + page_pools.epp_pools[p_idx][g_idx] = NULL; + + if (++g_idx == PAGES_PER_POOL) { + p_idx++; + g_idx = 0; + } + } + + if (page_pools.epp_free_pages < page_pools.epp_st_lowfree) + page_pools.epp_st_lowfree = page_pools.epp_free_pages; + + /* + * new idle index = (old * weight + new) / (weight + 1) + */ + if (this_idle == -1) { + this_idle = page_pools.epp_free_pages * IDLE_IDX_MAX / + page_pools.epp_total_pages; + } + page_pools.epp_idle_idx = (page_pools.epp_idle_idx * IDLE_IDX_WEIGHT + + this_idle) / (IDLE_IDX_WEIGHT + 1); + + page_pools.epp_last_access = ktime_get_seconds(); + +out_unlock: + spin_unlock(&page_pools.epp_lock); + return rc; +} + +int sptlrpc_enc_pool_get_pages(struct ptlrpc_bulk_desc *desc) +{ + int rc; LASSERT(desc->bd_iov_count > 0); + LASSERT(desc->bd_iov_count <= page_pools.epp_max_pages); + + /* resent bulk, enc iov might have been allocated previously */ + if (desc->bd_enc_vec) + return 0; + + desc->bd_enc_vec = kvmalloc_array(desc->bd_iov_count, + sizeof(*desc->bd_enc_vec), + GFP_KERNEL | __GFP_ZERO); + if (!desc->bd_enc_vec) + return -ENOMEM; + + rc = __sptlrpc_enc_pool_get_pages((void *)desc, desc->bd_iov_count, + page_from_bulkdesc); + if (rc) { + kvfree(desc->bd_enc_vec); + desc->bd_enc_vec = NULL; + } + return rc; +} +EXPORT_SYMBOL(sptlrpc_enc_pool_get_pages); + +int sptlrpc_enc_pool_get_pages_array(struct page **pa, unsigned int count) +{ + return __sptlrpc_enc_pool_get_pages((void *)pa, count, + page_from_pagearray); +} +EXPORT_SYMBOL(sptlrpc_enc_pool_get_pages_array); + +static int __sptlrpc_enc_pool_put_pages(void *array, unsigned int count, + struct page **(*page_from)(void *, int)) +{ + int p_idx, g_idx; + int i, rc = 0; + + if (!array || count <= 0) + return -EINVAL; spin_lock(&page_pools.epp_lock); p_idx = page_pools.epp_free_pages / PAGES_PER_POOL; g_idx = page_pools.epp_free_pages % PAGES_PER_POOL; - LASSERT(page_pools.epp_free_pages + desc->bd_iov_count <= - page_pools.epp_total_pages); - LASSERT(page_pools.epp_pools[p_idx]); + if (page_pools.epp_free_pages + count > page_pools.epp_total_pages) { + rc = -EPROTO; + goto out_unlock; + } + if (!page_pools.epp_pools[p_idx]) { + rc = -EPROTO; + goto out_unlock; + } - for (i = 0; i < desc->bd_iov_count; i++) { - LASSERT(desc->bd_enc_vec[i].bv_page); - LASSERT(g_idx != 0 || page_pools.epp_pools[p_idx]); - LASSERT(!page_pools.epp_pools[p_idx][g_idx]); + for (i = 0; i < count; i++) { + struct page **pagep = page_from(array, i); - page_pools.epp_pools[p_idx][g_idx] = - desc->bd_enc_vec[i].bv_page; + if (!*pagep || + page_pools.epp_pools[p_idx][g_idx]) { + rc = -EPROTO; + goto out_unlock; + } + page_pools.epp_pools[p_idx][g_idx] = *pagep; if (++g_idx == PAGES_PER_POOL) { 
p_idx++; g_idx = 0; } } - page_pools.epp_free_pages += desc->bd_iov_count; - + page_pools.epp_free_pages += count; enc_pools_wakeup(); +out_unlock: spin_unlock(&page_pools.epp_lock); + return rc; +} + +void sptlrpc_enc_pool_put_pages(struct ptlrpc_bulk_desc *desc) +{ + int rc; + + if (!desc->bd_enc_vec) + return; + + rc = __sptlrpc_enc_pool_put_pages((void *)desc, desc->bd_iov_count, + page_from_bulkdesc); + if (rc) + CDEBUG(D_SEC, "error putting pages in enc pool: %d\n", rc); - kfree(desc->bd_enc_vec); + kvfree(desc->bd_enc_vec); desc->bd_enc_vec = NULL; } +void sptlrpc_enc_pool_put_pages_array(struct page **pa, unsigned int count) +{ + int rc; + + rc = __sptlrpc_enc_pool_put_pages((void *)pa, count, + page_from_pagearray); + if (rc) + CDEBUG(D_SEC, "error putting pages in enc pool: %d\n", rc); +} +EXPORT_SYMBOL(sptlrpc_enc_pool_put_pages_array); + +/* + * we don't do much stuff for add_user/del_user anymore, except adding some + * initial pages in add_user() if current pools are empty, rest would be + * handled by the pools's self-adaption. + */ +int sptlrpc_enc_pool_add_user(void) +{ + int need_grow = 0; + + spin_lock(&page_pools.epp_lock); + if (page_pools.epp_growing == 0 && page_pools.epp_total_pages == 0) { + page_pools.epp_growing = 1; + need_grow = 1; + } + spin_unlock(&page_pools.epp_lock); + + if (need_grow) { + enc_pools_add_pages(PTLRPC_MAX_BRW_PAGES + + PTLRPC_MAX_BRW_PAGES); + + spin_lock(&page_pools.epp_lock); + page_pools.epp_growing = 0; + enc_pools_wakeup(); + spin_unlock(&page_pools.epp_lock); + } + return 0; +} +EXPORT_SYMBOL(sptlrpc_enc_pool_add_user); + static inline void enc_pools_alloc(void) { LASSERT(page_pools.epp_max_pools); diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 991ff85..be0490f 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -128,15 +128,35 @@ struct page *fscrypt_encrypt_pagecache_blocks(struct page *page, unsigned int len, unsigned int offs, gfp_t gfp_flags); -int fscrypt_encrypt_block_inplace(const struct inode *inode, struct page *page, - unsigned int len, unsigned int offs, - u64 lblk_num, gfp_t gfp_flags); +int fscrypt_encrypt_block(const struct inode *inode, struct page *src, + struct page *dst, unsigned int len, + unsigned int offs, u64 lblk_num, gfp_t gfp_flags); + +static inline int fscrypt_encrypt_block_inplace(const struct inode *inode, + struct page *page, + unsigned int len, + unsigned int offs, + u64 lblk_num) +{ + return fscrypt_encrypt_block(inode, page, page, len, offs, lblk_num, + GFP_NOFS); +} int fscrypt_decrypt_pagecache_blocks(struct page *page, unsigned int len, unsigned int offs); -int fscrypt_decrypt_block_inplace(const struct inode *inode, struct page *page, - unsigned int len, unsigned int offs, - u64 lblk_num); + +int fscrypt_decrypt_block(const struct inode *inode, struct page *src, + struct page *dst, unsigned int len, + unsigned int offs, u64 lblk_num, gfp_t gfp_flags); + +static inline int fscrypt_decrypt_block_inplace(const struct inode *inode, + struct page *page, + unsigned int len, unsigned int offs, + u64 lblk_num) +{ + return fscrypt_decrypt_block(inode, page, page, len, offs, lblk_num, + GFP_NOFS); +} static inline bool fscrypt_is_bounce_page(struct page *page) { @@ -272,6 +292,15 @@ static inline struct page *fscrypt_encrypt_pagecache_blocks(struct page *page, return ERR_PTR(-EOPNOTSUPP); } +static inline int fscrypt_encrypt_block(const struct inode *inode, + struct page *src, struct page *dst, + unsigned int len, + unsigned int offs, u64 lblk_num, + gfp_t gfp_flags) 
+{ + return -EOPNOTSUPP; +} + static inline int fscrypt_encrypt_block_inplace(const struct inode *inode, struct page *page, unsigned int len, @@ -296,6 +325,14 @@ static inline int fscrypt_decrypt_block_inplace(const struct inode *inode, return -EOPNOTSUPP; } +static inline int fscrypt_decrypt_block(const struct inode *inode, + struct page *src, struct page *dst, + unsigned int len, + unsigned int offs, u64 lblk_num, + gfp_t gfp_flags) +{ + return -EOPNOTSUPP; +} + static inline bool fscrypt_is_bounce_page(struct page *page) { return false; From patchwork Tue Sep 6 01:55:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966730 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 80E89C6FA86 for ; Tue, 6 Sep 2022 01:56:03 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lR18bVz1yBR; Mon, 5 Sep 2022 18:56:03 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lD1tWvz1y6t for ; Mon, 5 Sep 2022 18:55:52 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id CEB17100B009; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C8A3358992; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:27 -0400 Message-Id: <1662429337-18737-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/24] lustre: llite: Unify range unlock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Correct parallel_dio condition and unify range unlock code block.
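Out of diff form, the reworked tail of the IO path in fs/lustre/llite/file.c reduces to roughly the following (a condensed sketch based on the diff below; variable setup and the read/write split are omitted):

	rc = cl_io_loop(env, io);

	/* parallel DIO, but not AIO: wait for all sub-IOs to complete */
	if (io->ci_dio_aio && !is_aio) {
		rc2 = cl_sync_io_wait_recycle(env, &io->ci_dio_aio->cda_sync,
					      0, 0);
		if (rc2 < 0)
			rc = rc2;
	}

	/* the range lock is now dropped in exactly one place, on every path */
	if (range_locked) {
		range_unlock(&lli->lli_write_tree, &range);
		range_locked = false;
	}

Previously the unlock was duplicated in the sync-DIO branch and the parallel-DIO branch, with the parallel-DIO condition deciding which copy ran.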
WC-bug-id: https://jira.whamcloud.com/browse/LU-15811 Lustre-commit: 36c34af60767bd5da ("LU-15811 llite: Unify range unlock") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/48000 Reviewed-by: Andreas Dilger Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 92e450f..8152821 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1750,20 +1750,12 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, ll_cl_add(inode, env, io, LCC_RW); rc = cl_io_loop(env, io); ll_cl_remove(inode, env); - - if (range_locked && !is_parallel_dio) { - CDEBUG(D_VFSTRACE, "Range unlock [%llu, %llu]\n", - range.rl_start, - range.rl_last); - range_unlock(&lli->lli_write_tree, &range); - range_locked = false; - } } else { /* cl_io_rw_init() handled IO */ rc = io->ci_result; } - if (is_parallel_dio) { + if (io->ci_dio_aio && !is_aio) { struct cl_sync_io *anchor = &io->ci_dio_aio->cda_sync; /* for dio, EIOCBQUEUED is an implementation detail, @@ -1780,11 +1772,14 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, rc2 = cl_sync_io_wait_recycle(env, anchor, 0, 0); if (rc2 < 0) rc = rc2; + } - if (range_locked) { - range_unlock(&lli->lli_write_tree, &range); - range_locked = false; - } + if (range_locked) { + CDEBUG(D_VFSTRACE, "Range unlock [%llu, %llu]\n", + range.rl_start, + range.rl_last); + range_unlock(&lli->lli_write_tree, &range); + range_locked = false; + } /* From patchwork Tue Sep 6 01:55:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966739 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5DEBAECAAD3 for ; Tue, 6 Sep 2022 01:56:26 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lt0SYjz1yBf; Mon, 5 Sep 2022 18:56:26 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lC3cW3z1y5s for ; Mon, 5 Sep 2022 18:55:51 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id CE9FE100B008; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CC5E958994; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:28 -0400 Message-Id: <1662429337-18737-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/24] lustre: llite: Refactor DIO/AIO free code X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing
Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Refactor the DIO/AIO free code and add some asserts. This removes a potential use-after-free in the freeing code. WC-bug-id: https://jira.whamcloud.com/browse/LU-15811 Lustre-commit: f1c8ac1156ebea2b8 ("LU-15811 llite: Refactor DIO/AIO free code") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/48115 Reviewed-by: Andreas Dilger Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 11 +++++----- fs/lustre/llite/file.c | 3 ++- fs/lustre/llite/rw26.c | 9 ++++++--- fs/lustre/obdclass/cl_io.c | 47 ++++++++++++++++++++++++++----------------- 4 files changed, 41 insertions(+), 29 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 0f28cfe..3253f1c 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -2547,10 +2547,9 @@ int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor, long timeout, int ioret); struct cl_dio_aio *cl_dio_aio_alloc(struct kiocb *iocb, struct cl_object *obj, bool is_aio); -struct cl_sub_dio *cl_sub_dio_alloc(struct cl_dio_aio *ll_aio, bool nofree); -void cl_dio_aio_free(const struct lu_env *env, struct cl_dio_aio *aio, - bool always_free); -void cl_sub_dio_free(struct cl_sub_dio *sdio, bool nofree); +struct cl_sub_dio *cl_sub_dio_alloc(struct cl_dio_aio *ll_aio, bool sync); +void cl_dio_aio_free(const struct lu_env *env, struct cl_dio_aio *aio); +void cl_sub_dio_free(struct cl_sub_dio *sdio); static inline void cl_sync_io_init(struct cl_sync_io *anchor, int nr) { @@ -2598,7 +2597,7 @@ struct cl_dio_aio { struct kiocb *cda_iocb; ssize_t cda_bytes; unsigned int cda_no_aio_complete:1, - cda_no_sub_free:1; + cda_creator_free:1; }; /* Sub-dio used for splitting DIO (and AIO, because AIO is DIO) according to @@ -2610,7 +2609,7 @@ struct cl_sub_dio { ssize_t csd_bytes; struct cl_dio_aio *csd_ll_aio; struct ll_dio_pages csd_dio_pages; - unsigned int csd_no_free:1; + unsigned int csd_creator_free:1; }; void ll_release_user_pages(struct page **pages, int npages); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 8152821..115ee69 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1856,7 +1856,8 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, cl_sync_io_note(env, &io->ci_dio_aio->cda_sync, rc == -EIOCBQUEUED ? 
0 : rc); if (!is_aio) { - cl_dio_aio_free(env, io->ci_dio_aio, true); + LASSERT(io->ci_dio_aio->cda_creator_free); + cl_dio_aio_free(env, io->ci_dio_aio); io->ci_dio_aio = NULL; } } diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index 0f9ab68..4f2e68e 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -391,8 +391,10 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) &pvec->ldp_count, count); if (unlikely(result <= 0)) { cl_sync_io_note(env, &ldp_aio->csd_sync, result); - if (sync_submit) - cl_sub_dio_free(ldp_aio, true); + if (sync_submit) { + LASSERT(ldp_aio->csd_creator_free); + cl_sub_dio_free(ldp_aio); + } goto out; } @@ -412,7 +414,8 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) 0); if (result == 0 && rc2) result = rc2; - cl_sub_dio_free(ldp_aio, true); + LASSERT(ldp_aio->csd_creator_free); + cl_sub_dio_free(ldp_aio); } if (unlikely(result < 0)) goto out; diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index 06b9eb8..ee82260 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -1165,9 +1165,9 @@ struct cl_dio_aio *cl_dio_aio_alloc(struct kiocb *iocb, struct cl_object *obj, * no one is waiting (in the kernel) for this to complete * * in other cases, the last user is cl_sync_io_wait, and in - * that case, the caller frees the struct after that call + * that case, the creator frees the struct after that call */ - aio->cda_no_sub_free = !is_aio; + aio->cda_creator_free = !is_aio; cl_object_get(obj); aio->cda_obj = obj; @@ -1176,7 +1176,7 @@ struct cl_dio_aio *cl_dio_aio_alloc(struct kiocb *iocb, struct cl_object *obj, } EXPORT_SYMBOL(cl_dio_aio_alloc); -struct cl_sub_dio *cl_sub_dio_alloc(struct cl_dio_aio *ll_aio, bool nofree) +struct cl_sub_dio *cl_sub_dio_alloc(struct cl_dio_aio *ll_aio, bool sync) { struct cl_sub_dio *sdio; @@ -1192,25 +1192,24 @@ struct cl_sub_dio *cl_sub_dio_alloc(struct cl_dio_aio *ll_aio, bool nofree) sdio->csd_ll_aio = ll_aio; atomic_add(1, &ll_aio->cda_sync.csi_sync_nr); - sdio->csd_no_free = nofree; + sdio->csd_creator_free = sync; } return sdio; } EXPORT_SYMBOL(cl_sub_dio_alloc); -void cl_dio_aio_free(const struct lu_env *env, struct cl_dio_aio *aio, - bool always_free) +void cl_dio_aio_free(const struct lu_env *env, struct cl_dio_aio *aio) { - if (aio && (!aio->cda_no_sub_free || always_free)) { + if (aio) { cl_object_put(env, aio->cda_obj); kmem_cache_free(cl_dio_aio_kmem, aio); } } EXPORT_SYMBOL(cl_dio_aio_free); -void cl_sub_dio_free(struct cl_sub_dio *sdio, bool always_free) +void cl_sub_dio_free(struct cl_sub_dio *sdio) { - if (sdio && (!sdio->csd_no_free || always_free)) + if (sdio) kmem_cache_free(cl_sub_dio_kmem, sdio); } EXPORT_SYMBOL(cl_sub_dio_free); @@ -1247,7 +1246,10 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, LASSERT(atomic_read(&anchor->csi_sync_nr) > 0); if (atomic_dec_and_lock(&anchor->csi_sync_nr, &anchor->csi_waitq.lock)) { - void *dio_aio = NULL; + struct cl_sub_dio *sub_dio_aio = NULL; + struct cl_dio_aio *dio_aio = NULL; + void *csi_dio_aio = NULL; + bool creator_free = true; cl_sync_io_end_t *end_io = anchor->csi_end_io; @@ -1260,18 +1262,25 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, if (end_io) end_io(env, anchor); - dio_aio = anchor->csi_dio_aio; + csi_dio_aio = anchor->csi_dio_aio; + sub_dio_aio = csi_dio_aio; + dio_aio = csi_dio_aio; + + if (csi_dio_aio && end_io == cl_dio_aio_end) + creator_free = dio_aio->cda_creator_free; + else if 
(csi_dio_aio && end_io == cl_sub_dio_end) + creator_free = sub_dio_aio->csd_creator_free; spin_unlock(&anchor->csi_waitq.lock); - if (dio_aio) { - if (end_io == cl_dio_aio_end) - cl_dio_aio_free(env, - (struct cl_dio_aio *) dio_aio, - false); - else if (end_io == cl_sub_dio_end) - cl_sub_dio_free((struct cl_sub_dio *) dio_aio, - false); + if (csi_dio_aio) { + if (end_io == cl_dio_aio_end) { + if (!creator_free) + cl_dio_aio_free(env, dio_aio); + } else if (end_io == cl_sub_dio_end) { + if (!creator_free) + cl_sub_dio_free(sub_dio_aio); + } } } } From patchwork Tue Sep 6 01:55:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966732 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3BE4EC6FA86 for ; Tue, 6 Sep 2022 01:56:10 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lY56mVz1y6k; Mon, 5 Sep 2022 18:56:09 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lD6nksz1y2G for ; Mon, 5 Sep 2022 18:55:52 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id D85B8100B02F; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D0E5658999; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:29 -0400 Message-Id: <1662429337-18737-17-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 16/24] lnet: Use fatal NI if none other available X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Serguei Smirnov Allow NI in fatal state to be selected for sending if there are no NIs in non-fatal state. 
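The new preference order can be summarized as a standalone comparison. This is illustrative only; struct ni_cand and cand_is_better() are simplified stand-ins for the fields compared in the selection loop in lib-move.c below, not LNet API:

/* Sketch of the NI preference order after this change; assumes the
 * fields shown mirror ni_fatal_error_on, the NI health value and the
 * available tx credits used by the real selection loop.
 */
struct ni_cand {
	bool	fatal;		/* ni_fatal_error_on is set */
	int	healthv;	/* NI health value */
	int	credits;	/* available tx credits */
};

static bool cand_is_better(const struct ni_cand *a, const struct ni_cand *b)
{
	/* an NI with a fatal error is kept only as a last resort:
	 * it loses any comparison against a non-fatal NI
	 */
	if (a->fatal != b->fatal)
		return !a->fatal;
	if (a->healthv != b->healthv)
		return a->healthv > b->healthv;
	/* then selection priority, distance, credits, round-robin */
	return a->credits > b->credits;
}

Before this change an NI in a fatal state was skipped unconditionally, so a node whose interfaces were all in a fatal state had nothing to send on.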
HPE-bug-id: LUS-11019 WC-bug-id: https://jira.whamcloud.com/browse/LU-14955 Lustre-commit: ff3322fd0c77a8042 ("LU-14955 lnet: Use fatal NI if none other available") Signed-off-by: Serguei Smirnov Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/44746 Reviewed-by: Cyril Bordage Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 6ad0963..3b20a1b7 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1449,6 +1449,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, int best_healthv; u32 best_sel_prio; unsigned int best_dev_prio; + int best_ni_fatal; unsigned int dev_idx = UINT_MAX; bool gpu = md ? (md->md_flags & LNET_MD_FLAG_GPU) : false; @@ -1470,6 +1471,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, best_dev_prio = UINT_MAX; best_credits = INT_MIN; best_healthv = 0; + best_ni_fatal = true; } else { best_dev_prio = lnet_dev_prio_of_md(best_ni, dev_idx); shortest_distance = cfs_cpt_distance(lnet_cpt_table(), md_cpt, @@ -1477,6 +1479,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, best_credits = atomic_read(&best_ni->ni_tx_credits); best_healthv = atomic_read(&best_ni->ni_healthv); best_sel_prio = best_ni->ni_sel_priority; + best_ni_fatal = atomic_read(&best_ni->ni_fatal_error_on); } while ((ni = lnet_get_next_ni_locked(local_net, ni))) { @@ -1510,7 +1513,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (!gpu && distance < lnet_numa_range) distance = lnet_numa_range; - /* * Select on health, selection policy, direct dma prio, + /** Select on health, selection policy, direct dma prio, * shorter distance, available credits, then round-robin. */ if (ni_fatal) @@ -1518,16 +1521,24 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, if (best_ni) CDEBUG(D_NET, - "compare ni %s [c:%d, d:%d, s:%d, p:%u, g:%u, h:%d] with best_ni %s [c:%d, d:%d, s:%d, p:%u, g:%u, h:%d]\n", - libcfs_nidstr(&ni->ni_nid), ni_credits, distance, + "compare ni %s [f:%s, c:%d, d:%d, s:%d, p:%u, g:%u, h:%d] with best_ni %s [f:%s, c:%d, d:%d, s:%d, p:%u, g:%u, h:%d]\n", + libcfs_nidstr(&ni->ni_nid), + ni_fatal ? "y" : "n", ni_credits, distance, ni->ni_seq, ni_sel_prio, ni_dev_prio, ni_healthv, - (best_ni) ? libcfs_nidstr(&best_ni->ni_nid) - : "not selected", best_credits, shortest_distance, + (best_ni) ? libcfs_nidstr(&best_ni->ni_nid) : + "not selected", + best_ni_fatal ? "y" : "n", best_credits, + shortest_distance, (best_ni) ? 
best_ni->ni_seq : 0, best_sel_prio, best_dev_prio, best_healthv); else goto select_ni; + if (ni_fatal && !best_ni_fatal) + continue; + else if (!ni_fatal && best_ni_fatal) + goto select_ni; + if (ni_healthv < best_healthv) continue; else if (ni_healthv > best_healthv) @@ -1563,6 +1574,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, best_healthv = ni_healthv; best_ni = ni; best_credits = ni_credits; + best_ni_fatal = ni_fatal; } CDEBUG(D_NET, "selected best_ni %s\n", From patchwork Tue Sep 6 01:55:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966740 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9E139ECAAD3 for ; Tue, 6 Sep 2022 01:56:29 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lx2Lwyz1y52; Mon, 5 Sep 2022 18:56:29 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lF58GJz1y6h for ; Mon, 5 Sep 2022 18:55:53 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id D916D100B030; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D5109589A0; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:30 -0400 Message-Id: <1662429337-18737-18-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 17/24] lnet: LNet peer aliveness broken X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The peer health feature used on LNet routers is intended to detect if a peer is dead or alive by keeping track of the last time it received a message from the peer. If the last alive value is outside of a configurable interval then the peer is considered dead and the router will drop messages to that peer rather than attempt to send to it. This feature no longer works as intended because even if the last alive value is outside the interval the router will still consider the peer NI to be alive if the health value of the NI and the cached status both indicate the peer NI is alive. So even if a router has not received any messages from the client in days, as long as the router thinks the peer's interfaces are healthy then it will consider the peer alive. 
This doesn't make any sense as peers are supposed to regularly ping the router, and if they don't do so then they should not be considered alive. Fix the issue by relying solely on the last alive value to determine peer aliveness. Do not consider the health value or cached status when determining whether to drop the message. lnet_peer_alive_locked() has single caller that only checks whether zero was returned. We can convert lnet_peer_alive_locked() to return bool rather than int. Rename lnet_peer_alive_locked() to lnet_check_message_drop() to better reflect the purpose of the function. The return value is inverted to reflect the name change. WC-bug-id: https://jira.whamcloud.com/browse/LU-15595 Lustre-commit: caf6095ade66f70d4 ("LU-15595 lnet: LNet peer aliveness broken") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/46623 Reviewed-by: Serguei Smirnov Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 51 +++++++++++++++--------------------------------- 1 file changed, 16 insertions(+), 35 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 3b20a1b7..ec8be8f 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -572,55 +572,37 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return rc; } +/* returns true if this message should be dropped */ static bool -lnet_is_peer_deadline_passed(struct lnet_peer_ni *lpni, time64_t now) +lnet_check_message_drop(struct lnet_ni *ni, struct lnet_peer_ni *lpni, + struct lnet_msg *msg) { - time64_t deadline; - - deadline = lpni->lpni_last_alive + - lpni->lpni_net->net_tunables.lct_peer_timeout; - - /* assume peer_ni is alive as long as we're within the configured - * peer timeout - */ - if (deadline > now) + if (msg->msg_target.pid & LNET_PID_USERFLAG) return false; - return true; -} - -/* - * NB: returns 1 when alive, 0 when dead, negative when error; - * may drop the lnet_net_lock - */ -static int -lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer_ni *lpni, - struct lnet_msg *msg) -{ - time64_t now = ktime_get_seconds(); - if (!lnet_peer_aliveness_enabled(lpni)) - return -ENODEV; + return false; - /* - * If we're resending a message, let's attempt to send it even if + /* If we're resending a message, let's attempt to send it even if * the peer is down to fulfill our resend quota on the message */ if (msg->msg_retry_count > 0) - return 1; + return false; /* try and send recovery messages irregardless */ if (msg->msg_recovery) - return 1; + return false; /* always send any responses */ if (lnet_msg_is_response(msg)) - return 1; - - if (!lnet_is_peer_deadline_passed(lpni, now)) - return true; + return false; - return lnet_is_peer_ni_alive(lpni); + /* assume peer_ni is alive as long as we're within the configured + * peer timeout + */ + return ktime_get_seconds() >= + (lpni->lpni_last_alive + + lpni->lpni_net->net_tunables.lct_peer_timeout); } /** @@ -653,8 +635,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, LASSERT(!nid_same(&lp->lpni_nid, &the_lnet.ln_loni->ni_nid)); /* NB 'lp' is always the next hop */ - if (!(msg->msg_target.pid & LNET_PID_USERFLAG) && - !lnet_peer_alive_locked(ni, lp, msg)) { + if (lnet_check_message_drop(ni, lp, msg)) { the_lnet.ln_counters[cpt]->lct_common.lcc_drop_count++; the_lnet.ln_counters[cpt]->lct_common.lcc_drop_length += msg->msg_len; From patchwork Tue Sep 6 01:55:31 2022 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966736 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2FA7EECAAA1 for ; Tue, 6 Sep 2022 01:56:20 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7ll5t26z1y2M; Mon, 5 Sep 2022 18:56:19 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lG3P2Wz1y6h for ; Mon, 5 Sep 2022 18:55:54 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id DAB49100B031; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D8C6137C; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:31 -0400 Message-Id: <1662429337-18737-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/24] lnet: Correct net selection for router ping X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn lnet_find_best_ni_on_local_net() contains logic for restricting the NI selection to a net specified by lnet_peer::lp_disc_net_id. The purpose of this is to ensure that LNet peers ping every interface on a router at a regular interval as part of the LNet router health feature. However, this logic is flawed because lnet_msg_discovery() is used to determine whether the message being sent is a discovery message, but that function actually determines whether a given message can _trigger_ discovery. Introduce a new function, lnet_msg_is_ping(), which determines whether a given lnet_msg is a GET on the LNET_RESERVED_PORTAL. Modify lnet_find_best_ni_on_local_net() to restrict NI selection to lp_disc_net_id iff: 1. lp_disc_net_id is non-zero 2. The peer has the LNET_PEER_RTR_DISCOVERY flag set. 3. 
lnet_msg_is_ping() returns true HPE-bug-id: LUS-11017 WC-bug-id: https://jira.whamcloud.com/browse/LU-15929 Lustre-commit: 2431e099b143a4c7e ("LU-15929 lnet: Correct net selection for router ping") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/47527 Reviewed-by: Frank Sehr Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index ec8be8f..3c9602e 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -1577,7 +1577,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return false; } -/* +/* Can the specified message trigger peer discovery? + * * Traffic to the LNET_RESERVED_PORTAL may not trigger peer discovery, * because such traffic is required to perform discovery. We therefore * exclude all GET and PUT on that portal. We also exclude all ACK and @@ -1591,6 +1592,18 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats, return !(lnet_reserved_msg(msg) || lnet_msg_is_response(msg)); } +/* Is the specified message an LNet ping? + */ +static bool +lnet_msg_is_ping(struct lnet_msg *msg) +{ + if (msg->msg_type == LNET_MSG_GET && + msg->msg_hdr.msg.get.ptl_index == LNET_RESERVED_PORTAL) + return true; + + return false; +} + #define SRC_SPEC 0x0001 #define SRC_ANY 0x0002 #define LOCAL_DST 0x0004 @@ -2228,10 +2241,14 @@ struct lnet_ni * u32 best_net_sel_prio = LNET_MAX_SELECTION_PRIORITY; u32 net_sel_prio; - /* if this is a discovery message and lp_disc_net_id is - * specified then use that net to send the discovery on. + /* If lp_disc_net_id is set, this peer is a router undergoing + * discovery, and this message is an LNet ping, then this may be a + * discovery message and we need to select an NI on the peer net + * specified by lp_disc_net_id */ - if (discovery && peer->lp_disc_net_id) { + if (peer->lp_disc_net_id && + (peer->lp_state & LNET_PEER_RTR_DISCOVERY) && + lnet_msg_is_ping(msg)) { best_lpn = lnet_peer_get_net_locked(peer, peer->lp_disc_net_id); if (best_lpn && lnet_get_net_locked(best_lpn->lpn_net_id)) goto select_best_ni; From patchwork Tue Sep 6 01:55:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966734 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A6AAEECAAA1 for ; Tue, 6 Sep 2022 01:56:13 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7ld27CQz1yBj; Mon, 5 Sep 2022 18:56:13 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lH1mQVz1y7H for ; Mon, 5 Sep 2022 18:55:55 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E0BA2100B033; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by 
star.ccs.ornl.gov (Postfix, from userid 2004) id DCAE558992; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:32 -0400 Message-Id: <1662429337-18737-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 19/24] lnet: Remove duplicate checks for peer sensitivity X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Callers of lnet_inc_lpni_healthv_locked() and lnet_dec_healthv_locked() currently check whether the parent peer has a peer specific sensitivity defined. To remove this code duplication, this logic is rolled into lnet_inc_lpni_healthv_locked() and lnet_dec_lpni_healthv_locked(). The latter is a new wrapper around lnet_dec_healthv_locked(). lnet_dec_healthv_locked() is changed to return a bool indicating whether the health value was actually modified so that the peer net health is only updated when the peer NI health actually changes. HPE-bug-id: LUS-11018 WC-bug-id: https://jira.whamcloud.com/browse/LU-15930 Lustre-commit: 84b1ca8618129d4e3 ("LU-15930 lnet: Remove duplicate checks for peer sensitivity") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/46626 Reviewed-by: Cyril Bordage Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 44 +++++++++++++++++++++++++++++++++++++++---- net/lnet/lnet/lib-msg.c | 37 ++---------------------------------- net/lnet/lnet/router.c | 9 +-------- 3 files changed, 43 insertions(+), 47 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 2900c05..1d9b8c7 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -1108,13 +1108,49 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, return mod; } +static bool +lnet_dec_healthv_locked(atomic_t *healthv, int sensitivity) +{ + int h = atomic_read(healthv); + + if (h == 0) + return false; + + if (h < sensitivity) + h = 0; + else + h -= sensitivity; + + return (atomic_xchg(healthv, h) != h); +} + static inline void -lnet_inc_lpni_healthv_locked(struct lnet_peer_ni *lpni, int value) +lnet_dec_lpni_healthv_locked(struct lnet_peer_ni *lpni) { - /* only adjust the net health if the lpni health value changed */ - if (lnet_atomic_add_unless_max(&lpni->lpni_healthv, value, - LNET_MAX_HEALTH_VALUE)) + /* If there is a health sensitivity in the peer then use that + * instead of the globally set one. + * only adjust the net health if the lpni health value changed + */ + if (lnet_dec_healthv_locked(&lpni->lpni_healthv, + lpni->lpni_peer_net->lpn_peer->lp_health_sensitivity ? : + lnet_health_sensitivity)) { lnet_update_peer_net_healthv(lpni); + } +} + +static inline void +lnet_inc_lpni_healthv_locked(struct lnet_peer_ni *lpni) +{ + /* If there is a health sensitivity in the peer then use that + * instead of the globally set one. 
+ * only adjust the net health if the lpni health value changed + */ + if (lnet_atomic_add_unless_max(&lpni->lpni_healthv, + lpni->lpni_peer_net->lpn_peer->lp_health_sensitivity ? : + lnet_health_sensitivity, + LNET_MAX_HEALTH_VALUE)) { + lnet_update_peer_net_healthv(lpni); + } } static inline void diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 95695b2..3b1f6a3 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -443,19 +443,6 @@ return 0; } -static void -lnet_dec_healthv_locked(atomic_t *healthv, int sensitivity) -{ - int h = atomic_read(healthv); - - if (h < sensitivity) { - atomic_set(healthv, 0); - } else { - h -= sensitivity; - atomic_set(healthv, h); - } -} - /* must hold net_lock/0 */ void lnet_ni_add_to_recoveryq_locked(struct lnet_ni *ni, @@ -505,20 +492,7 @@ void lnet_handle_remote_failure_locked(struct lnet_peer_ni *lpni) { - u32 sensitivity = lnet_health_sensitivity; - u32 lp_sensitivity; - - /* If there is a health sensitivity in the peer then use that - * instead of the globally set one. - */ - lp_sensitivity = lpni->lpni_peer_net->lpn_peer->lp_health_sensitivity; - if (lp_sensitivity) - sensitivity = lp_sensitivity; - - lnet_dec_healthv_locked(&lpni->lpni_healthv, sensitivity); - - /* update the peer_net's health value */ - lnet_update_peer_net_healthv(lpni); + lnet_dec_lpni_healthv_locked(lpni); /* add the peer NI to the recovery queue if it's not already there * and it's health value is actually below the maximum. It's @@ -914,14 +888,7 @@ lnet_set_lpni_healthv_locked(lpni, LNET_MAX_HEALTH_VALUE); } else { - struct lnet_peer *lpn_peer; - u32 sensitivity; - - lpn_peer = lpni->lpni_peer_net->lpn_peer; - sensitivity = lpn_peer->lp_health_sensitivity ? - lpn_peer->lp_health_sensitivity : - lnet_health_sensitivity; - lnet_inc_lpni_healthv_locked(lpni, sensitivity); + lnet_inc_lpni_healthv_locked(lpni); /* This peer NI may have previously aged out * of recovery. Now that we've received a * message from it, we can continue recovery diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index 98707e9..146647c 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1761,14 +1761,7 @@ bool lnet_router_checker_active(void) lnet_set_lpni_healthv_locked(lpni, LNET_MAX_HEALTH_VALUE); } else { - struct lnet_peer *lpn_peer; - u32 sensitivity; - - lpn_peer = lpni->lpni_peer_net->lpn_peer; - sensitivity = lpn_peer->lp_health_sensitivity; - lnet_inc_lpni_healthv_locked(lpni, - (sensitivity) ? 
sensitivity : - lnet_health_sensitivity); + lnet_inc_lpni_healthv_locked(lpni); } } else if (reset) { lpni->lpni_ns_status = LNET_NI_STATUS_DOWN; From patchwork Tue Sep 6 01:55:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966735 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F25EFC6FA89 for ; Tue, 6 Sep 2022 01:56:13 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7ld3JsXz1y6D; Mon, 5 Sep 2022 18:56:13 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lJ05y6z1y68 for ; Mon, 5 Sep 2022 18:55:56 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E168C100B034; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E065958994; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:33 -0400 Message-Id: <1662429337-18737-21-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 20/24] lustre: obdclass: use consistent stats units X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Use consistent stats units, since some were "usec" and others "usecs". Most stats already use LPROCFS_TYPE_* to encode the stats type, so use this to provide units for those stats, and only explicitly provide strings for the few stats that don't match the commonly-used units. This also reduces the number of repeated static strings in the modules.
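With this change the unit string falls out of the counter config bits directly. An abridged sketch of the mapping, mirroring the lprocfs_counter_config_units() helper added in the diff below (the helper name here is shortened and some cases are omitted):

static const char *units_of(enum lprocfs_counter_config config)
{
	switch (config & LPROCFS_TYPE_MASK) {
	case LPROCFS_TYPE_BYTES:
		return "bytes";
	case LPROCFS_TYPE_PAGES:
		return "pages";
	case LPROCFS_TYPE_LOCKS:
		return "locks";
	case LPROCFS_TYPE_SECS:
		return "secs";
	case LPROCFS_TYPE_USECS:
		return "usecs";
	default:
		return "reqs";	/* LPROCFS_TYPE_REQS is 0, the default */
	}
}

So a call such as lprocfs_counter_init(stats, idx, LPROCFS_TYPE_LATENCY, "fsync") gets "usecs" automatically, and only outliers like "bufs" or "lock.secs" still pass an explicit string through lprocfs_counter_init_units().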
WC-bug-id: https://jira.whamcloud.com/browse/LU-15642 Lustre-commit: b515c6ec2ab84598c ("LU-15642 obdclass: use consistent stats units") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/46833 Reviewed-by: Jian Yu Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lprocfs_status.h | 27 +++++++++++------- fs/lustre/ldlm/ldlm_pool.c | 34 ++++++++++++++-------- fs/lustre/ldlm/ldlm_resource.c | 3 +- fs/lustre/llite/lproc_llite.c | 31 ++++++++------------ fs/lustre/obdclass/lprocfs_status.c | 56 +++++++++++++++++++++++++++++++++---- fs/lustre/obdclass/lu_object.c | 17 ++++------- fs/lustre/ptlrpc/lproc_ptlrpc.c | 32 ++++++++++----------- 7 files changed, 122 insertions(+), 78 deletions(-) diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h index 3e86e8e..5cea77d 100644 --- a/fs/lustre/include/lprocfs_status.h +++ b/fs/lustre/include/lprocfs_status.h @@ -126,18 +126,22 @@ struct rename_stats { * multiply per counter increment. */ -enum { +enum lprocfs_counter_config { LPROCFS_CNTR_EXTERNALLOCK = 0x0001, LPROCFS_CNTR_AVGMINMAX = 0x0002, LPROCFS_CNTR_STDDEV = 0x0004, - /* counter data type */ - LPROCFS_TYPE_REQS = 0x0100, + /* counter unit type */ + LPROCFS_TYPE_REQS = 0x0000, /* default if config = 0 */ LPROCFS_TYPE_BYTES = 0x0200, LPROCFS_TYPE_PAGES = 0x0400, - LPROCFS_TYPE_USEC = 0x0800, + LPROCFS_TYPE_LOCKS = 0x0500, + LPROCFS_TYPE_LOCKSPS = 0x0600, + LPROCFS_TYPE_SECS = 0x0700, + LPROCFS_TYPE_USECS = 0x0800, + LPROCFS_TYPE_MASK = 0x0f00, - LPROCFS_TYPE_LATENCY = LPROCFS_TYPE_USEC | + LPROCFS_TYPE_LATENCY = LPROCFS_TYPE_USECS | LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV, LPROCFS_TYPE_BYTES_FULL = LPROCFS_TYPE_BYTES | @@ -148,9 +152,9 @@ enum { #define LC_MIN_INIT ((~(u64)0) >> 1) struct lprocfs_counter_header { - unsigned int lc_config; - const char *lc_name; /* must be static */ - const char *lc_units; /* must be static */ + enum lprocfs_counter_config lc_config; + const char *lc_name; /* must be static */ + const char *lc_units; /* must be static */ }; struct lprocfs_counter { @@ -435,8 +439,11 @@ int ldebugfs_alloc_md_stats(struct obd_device *obd, unsigned int num_private_stats); void ldebugfs_free_md_stats(struct obd_device *obd); void lprocfs_counter_init(struct lprocfs_stats *stats, int index, - unsigned int conf, const char *name, - const char *units); + enum lprocfs_counter_config config, + const char *name); +void lprocfs_counter_init_units(struct lprocfs_stats *stats, int index, + enum lprocfs_counter_config config, + const char *name, const char *units); extern const struct file_operations lprocfs_stats_seq_fops; /* lprocfs_status.c */ diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c index 155b585..8c6491d 100644 --- a/fs/lustre/ldlm/ldlm_pool.c +++ b/fs/lustre/ldlm/ldlm_pool.c @@ -605,27 +605,37 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl) } lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANTED_STAT, - LPROCFS_CNTR_AVGMINMAX, "granted", "locks"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKS, + "granted"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANT_STAT, - LPROCFS_CNTR_AVGMINMAX, "grant", "locks"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKS, + "grant"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_CANCEL_STAT, - LPROCFS_CNTR_AVGMINMAX, "cancel", "locks"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKS, + "cancel"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANT_RATE_STAT, - LPROCFS_CNTR_AVGMINMAX, "grant_rate", "locks/s"); + 
LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKSPS, + "grant_rate"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_CANCEL_RATE_STAT, - LPROCFS_CNTR_AVGMINMAX, "cancel_rate", "locks/s"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKSPS, + "cancel_rate"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANT_PLAN_STAT, - LPROCFS_CNTR_AVGMINMAX, "grant_plan", "locks/s"); - lprocfs_counter_init(pl->pl_stats, LDLM_POOL_SLV_STAT, - LPROCFS_CNTR_AVGMINMAX, "slv", "slv"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKSPS, + "grant_plan"); + lprocfs_counter_init_units(pl->pl_stats, LDLM_POOL_SLV_STAT, + LPROCFS_CNTR_AVGMINMAX, "slv", "lock.secs"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_SHRINK_REQTD_STAT, - LPROCFS_CNTR_AVGMINMAX, "shrink_request", "locks"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKS, + "shrink_request"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_SHRINK_FREED_STAT, - LPROCFS_CNTR_AVGMINMAX, "shrink_freed", "locks"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKS, + "shrink_freed"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_RECALC_STAT, - LPROCFS_CNTR_AVGMINMAX, "recalc_freed", "locks"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKS, + "recalc_freed"); lprocfs_counter_init(pl->pl_stats, LDLM_POOL_TIMING_STAT, - LPROCFS_CNTR_AVGMINMAX, "recalc_timing", "sec"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_SECS, + "recalc_timing"); debugfs_create_file("stats", 0644, pl->pl_debugfs_entry, pl->pl_stats, &lprocfs_stats_seq_fops); diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c index a189dbd..866f31d 100644 --- a/fs/lustre/ldlm/ldlm_resource.c +++ b/fs/lustre/ldlm/ldlm_resource.c @@ -455,7 +455,8 @@ static int ldlm_namespace_sysfs_register(struct ldlm_namespace *ns) } lprocfs_counter_init(ns->ns_stats, LDLM_NSS_LOCKS, - LPROCFS_CNTR_AVGMINMAX, "locks", "locks"); + LPROCFS_CNTR_AVGMINMAX | LPROCFS_TYPE_LOCKS, + "locks"); return err; } diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 1391828..a57d7bb 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -36,6 +36,7 @@ #include #include "llite_internal.h" +#include "lprocfs_status.h" #include "vvp_internal.h" static struct kobject *llite_kobj; @@ -1772,9 +1773,9 @@ static void sbi_kobj_release(struct kobject *kobj) }; static const struct llite_file_opcode { - u32 opcode; - u32 type; - const char *opname; + u32 lfo_opcode; + enum lprocfs_counter_config lfo_config; + const char *lfo_opname; } llite_opcode_table[LPROC_LL_FILE_OPCODES] = { /* file operation */ { LPROC_LL_READ_BYTES, LPROCFS_TYPE_BYTES_FULL, "read_bytes" }, @@ -1790,8 +1791,7 @@ static void sbi_kobj_release(struct kobject *kobj) { LPROC_LL_LLSEEK, LPROCFS_TYPE_LATENCY, "seek" }, { LPROC_LL_FSYNC, LPROCFS_TYPE_LATENCY, "fsync" }, { LPROC_LL_READDIR, LPROCFS_TYPE_LATENCY, "readdir" }, - { LPROC_LL_INODE_OCOUNT, LPROCFS_TYPE_REQS | - LPROCFS_CNTR_AVGMINMAX | + { LPROC_LL_INODE_OCOUNT, LPROCFS_TYPE_REQS | LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV, "opencount" }, { LPROC_LL_INODE_OPCLTM, LPROCFS_TYPE_LATENCY, "openclosetime" }, /* inode operation */ @@ -1893,20 +1893,11 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name) } /* do counter init */ - for (id = 0; id < LPROC_LL_FILE_OPCODES; id++) { - u32 type = llite_opcode_table[id].type; - void *ptr = "unknown"; - - if (type & LPROCFS_TYPE_REQS) - ptr = "reqs"; - else if (type & LPROCFS_TYPE_BYTES) - ptr = "bytes"; - else if (type & LPROCFS_TYPE_USEC) - ptr = "usec"; + for (id = 0; id < LPROC_LL_FILE_OPCODES; id++) 
lprocfs_counter_init(sbi->ll_stats, - llite_opcode_table[id].opcode, type, - llite_opcode_table[id].opname, ptr); - } + llite_opcode_table[id].lfo_opcode, + llite_opcode_table[id].lfo_config, + llite_opcode_table[id].lfo_opname); debugfs_create_file("stats", 0644, sbi->ll_debugfs_entry, sbi->ll_stats, &lprocfs_stats_seq_fops); @@ -1919,8 +1910,8 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name) } for (id = 0; id < ARRAY_SIZE(ra_stat_string); id++) - lprocfs_counter_init(sbi->ll_ra_stats, id, 0, - ra_stat_string[id], "pages"); + lprocfs_counter_init(sbi->ll_ra_stats, id, LPROCFS_TYPE_PAGES, + ra_stat_string[id]); debugfs_create_file("read_ahead_stats", 0644, sbi->ll_debugfs_entry, sbi->ll_ra_stats, &lprocfs_stats_seq_fops); diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index d80e7bd..7ce20b6 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -1448,9 +1448,41 @@ static int lprocfs_stats_seq_open(struct inode *inode, struct file *file) }; EXPORT_SYMBOL_GPL(lprocfs_stats_seq_fops); -void lprocfs_counter_init(struct lprocfs_stats *stats, int index, - unsigned int conf, const char *name, - const char *units) +static const char *lprocfs_counter_config_units(const char *name, + enum lprocfs_counter_config config) +{ + const char *units; + + switch (config & LPROCFS_TYPE_MASK) { + default: + units = "reqs"; + break; + case LPROCFS_TYPE_BYTES: + units = "bytes"; + break; + case LPROCFS_TYPE_PAGES: + units = "pages"; + break; + case LPROCFS_TYPE_LOCKS: + units = "locks"; + break; + case LPROCFS_TYPE_LOCKSPS: + units = "locks/s"; + break; + case LPROCFS_TYPE_SECS: + units = "secs"; + break; + case LPROCFS_TYPE_USECS: + units = "usecs"; + break; + } + + return units; +} + +void lprocfs_counter_init_units(struct lprocfs_stats *stats, int index, + enum lprocfs_counter_config config, + const char *name, const char *units) { struct lprocfs_counter_header *header; struct lprocfs_counter *percpu_cntr; @@ -1462,7 +1494,7 @@ void lprocfs_counter_init(struct lprocfs_stats *stats, int index, LASSERTF(header, "Failed to allocate stats header:[%d]%s/%s\n", index, name, units); - header->lc_config = conf; + header->lc_config = config; header->lc_name = name; header->lc_units = units; @@ -1481,6 +1513,15 @@ void lprocfs_counter_init(struct lprocfs_stats *stats, int index, } lprocfs_stats_unlock(stats, LPROCFS_GET_NUM_CPU, &flags); } +EXPORT_SYMBOL(lprocfs_counter_init_units); + +void lprocfs_counter_init(struct lprocfs_stats *stats, int index, + enum lprocfs_counter_config config, + const char *name) +{ + lprocfs_counter_init_units(stats, index, config, name, + lprocfs_counter_config_units(name, config)); +} EXPORT_SYMBOL(lprocfs_counter_init); static const char * const mps_stats[] = { @@ -1524,7 +1565,8 @@ int ldebugfs_alloc_md_stats(struct obd_device *obd, return -ENOMEM; for (i = 0; i < ARRAY_SIZE(mps_stats); i++) { - lprocfs_counter_init(stats, i, 0, mps_stats[i], "reqs"); + lprocfs_counter_init(stats, i, LPROCFS_TYPE_REQS, + mps_stats[i]); if (!stats->ls_cnt_header[i].lc_name) { CERROR("Missing md_stat initializer md_op operation at offset %d. Aborting.\n", i); @@ -1577,7 +1619,9 @@ s64 lprocfs_read_helper(struct lprocfs_counter *lc, ret = lc->lc_max; break; case LPROCFS_FIELDS_FLAGS_AVG: - ret = (lc->lc_max - lc->lc_min) / 2; + ret = div64_u64((flags & LPROCFS_STATS_FLAG_IRQ_SAFE ? 
+ lc->lc_sum_irq : 0) + lc->lc_sum, + lc->lc_count); break; case LPROCFS_FIELDS_FLAGS_SUMSQUARE: ret = lc->lc_sumsquare; diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c index 25b47d8..7ecd0c4 100644 --- a/fs/lustre/obdclass/lu_object.c +++ b/fs/lustre/obdclass/lu_object.c @@ -1099,18 +1099,13 @@ int lu_site_init(struct lu_site *s, struct lu_device *top) return -ENOMEM; } - lprocfs_counter_init(s->ls_stats, LU_SS_CREATED, - 0, "created", "created"); - lprocfs_counter_init(s->ls_stats, LU_SS_CACHE_HIT, - 0, "cache_hit", "cache_hit"); - lprocfs_counter_init(s->ls_stats, LU_SS_CACHE_MISS, - 0, "cache_miss", "cache_miss"); - lprocfs_counter_init(s->ls_stats, LU_SS_CACHE_RACE, - 0, "cache_race", "cache_race"); + lprocfs_counter_init(s->ls_stats, LU_SS_CREATED, 0, "created"); + lprocfs_counter_init(s->ls_stats, LU_SS_CACHE_HIT, 0, "cache_hit"); + lprocfs_counter_init(s->ls_stats, LU_SS_CACHE_MISS, 0, "cache_miss"); + lprocfs_counter_init(s->ls_stats, LU_SS_CACHE_RACE, 0, "cache_race"); lprocfs_counter_init(s->ls_stats, LU_SS_CACHE_DEATH_RACE, - 0, "cache_death_race", "cache_death_race"); - lprocfs_counter_init(s->ls_stats, LU_SS_LRU_PURGED, - 0, "lru_purged", "lru_purged"); + 0, "cache_death_race"); + lprocfs_counter_init(s->ls_stats, LU_SS_LRU_PURGED, 0, "lru_purged"); INIT_LIST_HEAD(&s->ls_linkage); s->ls_top_dev = top; diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index b2daf1f..52010cb 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -191,9 +191,9 @@ static const char *ll_eopcode2str(u32 opcode) { struct dentry *svc_debugfs_entry; struct lprocfs_stats *svc_stats; + enum lprocfs_counter_config config = LPROCFS_CNTR_AVGMINMAX | + LPROCFS_CNTR_STDDEV; int i; - unsigned int svc_counter_config = LPROCFS_CNTR_AVGMINMAX | - LPROCFS_CNTR_STDDEV; LASSERT(!*debugfs_root_ret); LASSERT(!*stats_ret); @@ -209,37 +209,33 @@ static const char *ll_eopcode2str(u32 opcode) svc_debugfs_entry = root; lprocfs_counter_init(svc_stats, PTLRPC_REQWAIT_CNTR, - svc_counter_config, "req_waittime", "usec"); + config | LPROCFS_TYPE_USECS, "req_waittime"); lprocfs_counter_init(svc_stats, PTLRPC_REQQDEPTH_CNTR, - svc_counter_config, "req_qdepth", "reqs"); + config | LPROCFS_TYPE_REQS, "req_qdepth"); lprocfs_counter_init(svc_stats, PTLRPC_REQACTIVE_CNTR, - svc_counter_config, "req_active", "reqs"); + config | LPROCFS_TYPE_REQS, "req_active"); lprocfs_counter_init(svc_stats, PTLRPC_TIMEOUT, - svc_counter_config, "req_timeout", "sec"); - lprocfs_counter_init(svc_stats, PTLRPC_REQBUF_AVAIL_CNTR, - svc_counter_config, "reqbuf_avail", "bufs"); + config | LPROCFS_TYPE_SECS, "req_timeout"); + lprocfs_counter_init_units(svc_stats, PTLRPC_REQBUF_AVAIL_CNTR, + config, "reqbuf_avail", "bufs"); for (i = 0; i < EXTRA_LAST_OPC; i++) { - char *units; + enum lprocfs_counter_config extra_type = LPROCFS_TYPE_REQS; switch (i) { case BRW_WRITE_BYTES: case BRW_READ_BYTES: - units = "bytes"; - break; - default: - units = "reqs"; + extra_type = LPROCFS_TYPE_BYTES; break; } lprocfs_counter_init(svc_stats, PTLRPC_LAST_CNTR + i, - svc_counter_config, - ll_eopcode2str(i), units); + config | extra_type, ll_eopcode2str(i)); } for (i = 0; i < LUSTRE_MAX_OPCODES; i++) { u32 opcode = ll_rpc_opcode_table[i].opcode; - lprocfs_counter_init(svc_stats, - EXTRA_MAX_OPCODES + i, svc_counter_config, - ll_opcode2str(opcode), "usec"); + lprocfs_counter_init(svc_stats, EXTRA_MAX_OPCODES + i, + config | LPROCFS_TYPE_USECS, + ll_opcode2str(opcode)); } 
debugfs_create_file("stats", 0644, svc_debugfs_entry, svc_stats, From patchwork Tue Sep 6 01:55:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966743 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1C08FECAAA1 for ; Tue, 6 Sep 2022 01:56:36 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7m361Chz1yCj; Mon, 5 Sep 2022 18:56:35 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lJ5YHxz1y6M for ; Mon, 5 Sep 2022 18:55:56 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E69CD100B035; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E5ACC37C; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:34 -0400 Message-Id: <1662429337-18737-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 21/24] lnet: Memory leak on adding existing interface X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Frank Sehr , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Frank Sehr In the function lnet_dyn_add_ni an lnet_ni structure is allocated. In case of an error the function returns without freeing the memory of the structure. Added handling of possible lnet_net structure memory leaks. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-16081 Lustre-commit: 26beb8664f4533a6e ("LU-16081 lnet: Memory leak on adding existing interface") Signed-off-by: Frank Sehr Reviewed-on: https://review.whamcloud.com/48173 Reviewed-by: Chris Horn Reviewed-by: Serguei Smirnov Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 0449136..9bf2860 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3551,25 +3551,34 @@ int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf) return -ENOMEM; for (i = 0; i < conf->lic_ncpts; i++) { - if (conf->lic_cpts[i] >= LNET_CPT_NUMBER) + if (conf->lic_cpts[i] >= LNET_CPT_NUMBER) { + lnet_net_free(net); return -EINVAL; + } } ni = lnet_ni_alloc_w_cpt_array(net, conf->lic_cpts, conf->lic_ncpts, conf->lic_ni_intf); - if (!ni) + if (!ni) { + lnet_net_free(net); return -ENOMEM; + } lnet_set_tune_defaults(tun); mutex_lock(&the_lnet.ln_api_mutex); - if (the_lnet.ln_state != LNET_STATE_RUNNING) + if (the_lnet.ln_state != LNET_STATE_RUNNING) { + lnet_net_free(net); rc = -ESHUTDOWN; - else + } else { rc = lnet_add_net_common(net, tun); + } mutex_unlock(&the_lnet.ln_api_mutex); + if (rc) + lnet_ni_free(ni); + return rc; } From patchwork Tue Sep 6 01:55:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966741 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2FBEBECAAD3 for ; Tue, 6 Sep 2022 01:56:32 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lz6PWsz1yBh; Mon, 5 Sep 2022 18:56:31 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lK3DtFz1y6G for ; Mon, 5 Sep 2022 18:55:57 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id EAF10100B036; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E97CA58999; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:35 -0400 Message-Id: <1662429337-18737-23-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 22/24] lustre: sec: fix detection of SELinux enforcement X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson For newer kernels, for which selinux_is_enabled() does not exist anymore, the only way to find out if SELinux is enforced when initializing the security context is to fetch the length of the security attribute name. If it is 0, we conclude SELinux is disabled. WC-bug-id: https://jira.whamcloud.com/browse/LU-16012 Lustre-commit: 155cbc22ba4f758cf ("LU-16012 sec: fix detection of SELinux enforcement") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/48049 Reviewed-by: Jian Yu Reviewed-by: Yingjin Qian Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 3 ++- fs/lustre/llite/llite_internal.h | 3 ++- fs/lustre/llite/namei.c | 6 ++++-- fs/lustre/llite/xattr_security.c | 12 +++++++++++- 4 files changed, 19 insertions(+), 5 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index bffd34c..9e7812d 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -513,7 +513,8 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, * to determine the security context for the file. So our fake * dentry should be real enough for this purpose. */ - err = ll_dentry_init_security(&dentry, mode, &dentry.d_name, + err = ll_dentry_init_security(parent, + &dentry, mode, &dentry.d_name, &op_data->op_file_secctx_name, &op_data->op_file_secctx, &op_data->op_file_secctx_size); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 227b944..6d85b96 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -447,7 +447,8 @@ static inline void obd_connect_set_secctx(struct obd_connect_data *data) #endif } -int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name, +int ll_dentry_init_security(struct inode *parent, struct dentry *dentry, + int mode, struct qstr *name, const char **secctx_name, void **secctx, u32 *secctx_size); int ll_inode_init_security(struct dentry *dentry, struct inode *inode, diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index a08b1c1..d382554 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -891,7 +891,8 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry, if (it->it_op & IT_CREAT && test_bit(LL_SBI_FILE_SECCTX, ll_i2sbi(parent)->ll_flags)) { - rc = ll_dentry_init_security(dentry, it->it_create_mode, + rc = ll_dentry_init_security(parent, + dentry, it->it_create_mode, &dentry->d_name, &op_data->op_file_secctx_name, &op_data->op_file_secctx, @@ -1570,7 +1571,8 @@ static int ll_new_node(struct inode *dir, struct dentry *dchild, ll_qos_mkdir_prep(op_data, dir); if (test_bit(LL_SBI_FILE_SECCTX, sbi->ll_flags)) { - err = ll_dentry_init_security(dchild, mode, &dchild->d_name, + err = ll_dentry_init_security(dir, + dchild, mode, &dchild->d_name, &op_data->op_file_secctx_name, &op_data->op_file_secctx, &op_data->op_file_secctx_size); diff --git a/fs/lustre/llite/xattr_security.c b/fs/lustre/llite/xattr_security.c index f14021d..39229d3 100644 --- a/fs/lustre/llite/xattr_security.c +++ b/fs/lustre/llite/xattr_security.c @@ -38,7 +38,8 @@ /* * Check for LL_SBI_FILE_SECCTX before calling. 
*/ -int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name, +int ll_dentry_init_security(struct inode *parent, struct dentry *dentry, + int mode, struct qstr *name, const char **secctx_name, void **secctx, u32 *secctx_size) { @@ -58,6 +59,15 @@ int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name, * from SELinux. */ + /* fetch length of security xattr name */ + rc = security_inode_listsecurity(parent, NULL, 0); + /* xattr name length == 0 means SELinux is disabled */ + if (rc == 0) + return 0; + /* we support SELinux only */ + if (rc != strlen(XATTR_NAME_SELINUX) + 1) + return -EOPNOTSUPP; + rc = security_dentry_init_security(dentry, mode, name, secctx, secctx_size); /* Usually, security_dentry_init_security() returns -EOPNOTSUPP when From patchwork Tue Sep 6 01:55:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12966737 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DF3A0ECAAD3 for ; Tue, 6 Sep 2022 01:56:22 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MM7lp4qJTz1y57; Mon, 5 Sep 2022 18:56:22 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MM7lL1Tk4z1yBr for ; Mon, 5 Sep 2022 18:55:58 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id F1134100B053; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EE69158992; Mon, 5 Sep 2022 21:55:39 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 5 Sep 2022 21:55:36 -0400 Message-Id: <1662429337-18737-24-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 23/24] lustre: idl: add checks for OBD_CONNECT flags X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Make it harder to accidentally declare OBD_CONNECT flags without properly defining their names. Otherwise, this can cause serious compatibility problems if two features are using the same flag. Make it clear whom to contact when reserving a new feature flag. 
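One way to enforce this pairing is a compile-time cross-check between the flag definitions and the obd_connect_names[] table. The excerpt below does not show the actual check the patch adds, so the following is only a hypothetical sketch; the *_FLAG_COUNT macros are invented for illustration:

/* Hypothetical: fail the build if a flag bit gains no matching name. */
#define OBD_CONNECT_FLAG_COUNT	64	/* bits used in ocd_connect_flags */
#define OBD_CONNECT2_FLAG_COUNT	28	/* bits used in ocd_connect_flags2 */

static inline void check_obd_connect_names(void)
{
	/* +1 for the NULL terminator at the end of the table */
	BUILD_BUG_ON(ARRAY_SIZE(obd_connect_names) !=
		     OBD_CONNECT_FLAG_COUNT + OBD_CONNECT2_FLAG_COUNT + 1);
}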
WC-bug-id: https://jira.whamcloud.com/browse/LU-1904 Lustre-commit: d851381ea69472448 ("LU-1904 idl: add checks for OBD_CONNECT flags") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/48053 Reviewed-by: Lai Siyao Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lprocfs_status.c | 189 +++++++++++---------- include/uapi/linux/lustre/lustre_idl.h | 291 +++++++++++++++------------------ 2 files changed, 227 insertions(+), 253 deletions(-) diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 7ce20b6..64d7cc48c 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -43,101 +43,100 @@ #include #include -static const char * const obd_connect_names[] = { - /* flags names */ - "read_only", - "lov_index", - "connect_from_mds", - "write_grant", - "server_lock", - "version", - "request_portal", - "acl", - "xattr", - "create_on_write", - "truncate_lock", - "initial_transno", - "inode_bit_locks", - "join_file(obsolete)", - "getattr_by_fid", - "no_oh_for_devices", - "remote_client", - "remote_client_by_force", - "max_byte_per_rpc", - "64bit_qdata", - "mds_capability", - "oss_capability", - "early_lock_cancel", - "som", - "adaptive_timeouts", - "lru_resize", - "mds_mds_connection", - "real_conn", - "change_qunit_size", - "alt_checksum_algorithm", - "fid_is_enabled", - "version_recovery", - "pools", - "grant_shrink", - "skip_orphan", - "large_ea", - "full20", - "layout_lock", - "64bithash", - "object_max_bytes", - "imp_recov", - "jobstats", - "umask", - "einprogress", - "grant_param", - "flock_owner", - "lvb_type", - "nanoseconds_times", - "lightweight_conn", - "short_io", - "pingless", - "flock_deadlock", - "disp_stripe", - "open_by_fid", - "lfsck", - "unknown", - "unlink_close", - "multi_mod_rpcs", - "dir_stripe", - "subtree", - "lockahead", - "bulk_mbits", - "compact_obdo", - "second_flags", - /* flags2 names */ - "file_secctx", /* 0x01 */ - "lockaheadv2", /* 0x02 */ - "dir_migrate", /* 0x04 */ - "sum_statfs", /* 0x08 */ - "overstriping", /* 0x10 */ - "flr", /* 0x20 */ - "wbc", /* 0x40 */ - "lock_convert", /* 0x80 */ - "archive_id_array", /* 0x100 */ - "increasing_xid", /* 0x200 */ - "selinux_policy", /* 0x400 */ - "lsom", /* 0x800 */ - "pcc", /* 0x1000 */ - "crush", /* 0x2000 */ - "async_discard", /* 0x4000 */ - "client_encryption", /* 0x8000 */ - "fidmap", /* 0x10000 */ - "getattr_pfid", /* 0x20000 */ - "lseek", /* 0x40000 */ - "dom_lvb", /* 0x80000 */ - "reply_mbits", /* 0x100000 */ - "mode_convert", /* 0x200000 */ - "batch_rpc", /* 0x400000 */ - "pcc_ro", /* 0x800000 */ - "mne_nid_type", /* 0x1000000 */ - "lock_contend", /* 0x2000000 */ - "atomic_open_lock", /* 0x4000000 */ - "name_encryption", /* 0x8000000 */ +static const char *const obd_connect_names[] = { + "read_only", /* 0x01 */ + "lov_index", /* 0x02 */ + "connect_from_mds", /* 0x03 */ + "write_grant", /* 0x04 */ + "server_lock", /* 0x10 */ + "version", /* 0x20 */ + "request_portal", /* 0x40 */ + "acl", /* 0x80 */ + "xattr", /* 0x100 */ + "create_on_write", /* 0x200 */ + "truncate_lock", /* 0x400 */ + "initial_transno", /* 0x800 */ + "inode_bit_locks", /* 0x1000 */ + "barrier", /* 0x2000 */ + "getattr_by_fid", /* 0x4000 */ + "no_oh_for_devices", /* 0x8000 */ + "remote_client", /* 0x10000 */ + "remote_client_by_force", /* 0x20000 */ + "max_byte_per_rpc", /* 0x40000 */ + "64bit_qdata", /* 0x80000 */ + "mds_capability", /* 0x100000 */ + "oss_capability", /* 0x200000 */ + 
"early_lock_cancel", /* 0x400000 */ + "som", /* 0x800000 */ + "adaptive_timeouts", /* 0x1000000 */ + "lru_resize", /* 0x2000000 */ + "mds_mds_connection", /* 0x4000000 */ + "real_conn", /* 0x8000000 */ + "change_qunit_size", /* 0x10000000 */ + "alt_checksum_algorithm", /* 0x20000000 */ + "fid_is_enabled", /* 0x40000000 */ + "version_recovery", /* 0x80000000 */ + "pools", /* 0x100000000 */ + "grant_shrink", /* 0x200000000 */ + "skip_orphan", /* 0x400000000 */ + "large_ea", /* 0x800000000 */ + "full20", /* 0x1000000000 */ + "layout_lock", /* 0x2000000000 */ + "64bithash", /* 0x4000000000 */ + "object_max_bytes", /* 0x8000000000 */ + "imp_recov", /* 0x10000000000 */ + "jobstats", /* 0x20000000000 */ + "umask", /* 0x40000000000 */ + "einprogress", /* 0x80000000000 */ + "grant_param", /* 0x100000000000 */ + "flock_owner", /* 0x200000000000 */ + "lvb_type", /* 0x400000000000 */ + "nanoseconds_times", /* 0x800000000000 */ + "lightweight_conn", /* 0x1000000000000 */ + "short_io", /* 0x2000000000000 */ + "pingless", /* 0x4000000000000 */ + "flock_deadlock", /* 0x8000000000000 */ + "disp_stripe", /* 0x10000000000000 */ + "open_by_fid", /* 0x20000000000000 */ + "lfsck", /* 0x40000000000000 */ + "unknown", /* 0x80000000000000 */ + "unlink_close", /* 0x100000000000000 */ + "multi_mod_rpcs", /* 0x200000000000000 */ + "dir_stripe", /* 0x400000000000000 */ + "subtree", /* 0x800000000000000 */ + "lockahead", /* 0x1000000000000000 */ + "bulk_mbits", /* 0x2000000000000000 */ + "compact_obdo", /* 0x4000000000000000 */ + "second_flags", /* 0x8000000000000000 */ + /* ocd_connect_flags2 names */ + "file_secctx", /* 0x01 */ + "lockaheadv2", /* 0x02 */ + "dir_migrate", /* 0x04 */ + "sum_statfs", /* 0x08 */ + "overstriping", /* 0x10 */ + "flr", /* 0x20 */ + "wbc", /* 0x40 */ + "lock_convert", /* 0x80 */ + "archive_id_array", /* 0x100 */ + "increasing_xid", /* 0x200 */ + "selinux_policy", /* 0x400 */ + "lsom", /* 0x800 */ + "pcc", /* 0x1000 */ + "crush", /* 0x2000 */ + "async_discard", /* 0x4000 */ + "client_encryption", /* 0x8000 */ + "fidmap", /* 0x10000 */ + "getattr_pfid", /* 0x20000 */ + "lseek", /* 0x40000 */ + "dom_lvb", /* 0x80000 */ + "reply_mbits", /* 0x100000 */ + "mode_convert", /* 0x200000 */ + "batch_rpc", /* 0x400000 */ + "pcc_ro", /* 0x800000 */ + "mne_nid_type", /* 0x1000000 */ + "lock_contend", /* 0x2000000 */ + "atomic_open_lock", /* 0x4000000 */ + "name_encryption", /* 0x8000000 */ NULL }; diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 319dc81d..475151c 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -681,147 +681,122 @@ struct ptlrpc_body_v2 { #define MSG_PTLRPC_HEADER_OFF 31 /* Connect flags */ -#define OBD_CONNECT_RDONLY 0x1ULL /*client has read-only access*/ -#define OBD_CONNECT_INDEX 0x2ULL /*connect specific LOV idx */ -#define OBD_CONNECT_MDS 0x4ULL /*connect from MDT to OST */ -#define OBD_CONNECT_GRANT 0x8ULL /*OSC gets grant at connect */ -#define OBD_CONNECT_SRVLOCK 0x10ULL /*server takes locks for cli */ -#define OBD_CONNECT_VERSION 0x20ULL /*Lustre versions in ocd */ -#define OBD_CONNECT_REQPORTAL 0x40ULL /*Separate non-IO req portal */ -#define OBD_CONNECT_ACL 0x80ULL /*access control lists */ -#define OBD_CONNECT_XATTR 0x100ULL /*client use extended attr */ -#define OBD_CONNECT_LARGE_ACL 0x200ULL /* more than 32 ACL entries */ -/* was OBD_CONNECT_TRUNCLOCK 0x400ULL *locks on server for punch */ +#define OBD_CONNECT_RDONLY 0x1ULL /* client is read-only */ +#define 
OBD_CONNECT_INDEX 0x2ULL /* connect to LOV idx */ +#define OBD_CONNECT_MDS 0x4ULL /* connect MDT to OST */ +#define OBD_CONNECT_GRANT 0x8ULL /* fetch grant connect */ +#define OBD_CONNECT_SRVLOCK 0x10ULL /* server lock for RPC */ +#define OBD_CONNECT_VERSION 0x20ULL /* versions in OCD */ +#define OBD_CONNECT_REQPORTAL 0x40ULL /* non-IO portal */ +#define OBD_CONNECT_ACL 0x80ULL /* access control list */ +#define OBD_CONNECT_XATTR 0x100ULL /* extended attributes */ +#define OBD_CONNECT_LARGE_ACL 0x200ULL /* over 32 ACL entries */ +/* was OBD_CONNECT_TRUNCLOCK 0x400ULL * server locks punch */ /* temporary reuse until 2.21.53 to indicate pre-2.15 client, see LU-15478 */ -#define OBD_CONNECT_OLD_FALLOC 0x400ULL /* missing o_valid flags */ -#define OBD_CONNECT_TRANSNO 0x800ULL /*replay sends init transno */ -#define OBD_CONNECT_IBITS 0x1000ULL /* not checked in 2.11+ */ -#define OBD_CONNECT_BARRIER 0x2000ULL /* write barrier. Resevered to - * avoid use on client. - */ -#define OBD_CONNECT_ATTRFID 0x4000ULL /*Server can GetAttr By Fid*/ -#define OBD_CONNECT_NODEVOH 0x8000ULL /*No open hndl on specl nodes*/ -#define OBD_CONNECT_RMT_CLIENT 0x10000ULL /* Remote client, never used - * in production. Removed in - * 2.9. Keep this flag to - * avoid reuse. - */ -#define OBD_CONNECT_RMT_CLIENT_FORCE 0x20000ULL /* Remote client by force, - * never used in production. - * Removed in 2.9. Keep this - * flag to avoid reuse - */ -#define OBD_CONNECT_BRW_SIZE 0x40000ULL /*Max bytes per rpc */ -#define OBD_CONNECT_QUOTA64 0x80000ULL /*Not used since 2.4 */ -#define OBD_CONNECT_MDS_CAPA 0x100000ULL /*MDS capability */ -#define OBD_CONNECT_OSS_CAPA 0x200000ULL /*OSS capability */ -#define OBD_CONNECT_CANCELSET 0x400000ULL /*Early batched cancels. */ -#define OBD_CONNECT_SOM 0x800000ULL /*Size on MDS */ -#define OBD_CONNECT_AT 0x1000000ULL /*client uses AT */ -#define OBD_CONNECT_LRU_RESIZE 0x2000000ULL /*LRU resize feature. 
 */
-#define OBD_CONNECT_MDS_MDS 0x4000000ULL /*MDS-MDS connection */
-#define OBD_CONNECT_REAL 0x8000000ULL /* obsolete since 2.8 */
-#define OBD_CONNECT_CHANGE_QS 0x10000000ULL /*Not used since 2.4 */
-#define OBD_CONNECT_CKSUM 0x20000000ULL /*support several cksum algos*/
-#define OBD_CONNECT_FID 0x40000000ULL /*FID is supported by server */
-#define OBD_CONNECT_VBR 0x80000000ULL /*version based recovery */
-#define OBD_CONNECT_LOV_V3 0x100000000ULL /*client supports LOV v3 EA */
-#define OBD_CONNECT_GRANT_SHRINK 0x200000000ULL /* support grant shrink */
-#define OBD_CONNECT_SKIP_ORPHAN 0x400000000ULL /* don't reuse orphan objids */
-#define OBD_CONNECT_MAX_EASIZE 0x800000000ULL /* preserved for large EA */
-#define OBD_CONNECT_FULL20 0x1000000000ULL /* it is 2.0 client */
-#define OBD_CONNECT_LAYOUTLOCK 0x2000000000ULL /* client uses layout lock */
-#define OBD_CONNECT_64BITHASH 0x4000000000ULL /* client supports 64-bits
-        * directory hash
-        */
-#define OBD_CONNECT_MAXBYTES 0x8000000000ULL /* max stripe size */
-#define OBD_CONNECT_IMP_RECOV 0x10000000000ULL /* imp recovery support */
-#define OBD_CONNECT_JOBSTATS 0x20000000000ULL /* jobid in ptlrpc_body */
-#define OBD_CONNECT_UMASK 0x40000000000ULL /* create uses client umask */
-#define OBD_CONNECT_EINPROGRESS 0x80000000000ULL /* client handles -EINPROGRESS
-        * RPC error properly
-        */
-#define OBD_CONNECT_GRANT_PARAM 0x100000000000ULL/* extra grant params used for
-        * finer space reservation
-        */
-#define OBD_CONNECT_FLOCK_OWNER 0x200000000000ULL /* for the fixed 1.8
-        * policy and 2.x server
-        */
-#define OBD_CONNECT_LVB_TYPE 0x400000000000ULL /* variable type of LVB */
-#define OBD_CONNECT_NANOSEC_TIME 0x800000000000ULL /* nanosecond timestamps */
-#define OBD_CONNECT_LIGHTWEIGHT 0x1000000000000ULL/* lightweight connection */
-#define OBD_CONNECT_SHORTIO 0x2000000000000ULL/* short io */
-#define OBD_CONNECT_PINGLESS 0x4000000000000ULL/* pings not required */
-#define OBD_CONNECT_FLOCK_DEAD 0x8000000000000ULL/* flock deadlock detection */
-#define OBD_CONNECT_DISP_STRIPE 0x10000000000000ULL/*create stripe disposition*/
-#define OBD_CONNECT_OPEN_BY_FID 0x20000000000000ULL /* open by fid won't pack
-        * name in request
-        */
-#define OBD_CONNECT_LFSCK 0x40000000000000ULL/* support online LFSCK */
-#define OBD_CONNECT_UNLINK_CLOSE 0x100000000000000ULL/* close file in unlink */
-#define OBD_CONNECT_MULTIMODRPCS 0x200000000000000ULL /* support multiple modify
-        * RPCs in parallel
-        */
-#define OBD_CONNECT_DIR_STRIPE 0x400000000000000ULL/* striped DNE dir */
-#define OBD_CONNECT_SUBTREE 0x800000000000000ULL /* fileset mount */
+#define OBD_CONNECT_OLD_FALLOC 0x400ULL /* no o_valid flags */
+#define OBD_CONNECT_TRANSNO 0x800ULL /* replay send transno */
+#define OBD_CONNECT_IBITS 0x1000ULL /* not checked 2.11+ */
+#define OBD_CONNECT_BARRIER 0x2000ULL /* write barrier */
+#define OBD_CONNECT_ATTRFID 0x4000ULL /* Server GetAttr FID */
+#define OBD_CONNECT_NODEVOH 0x8000ULL /* No open handle spec */
+#define OBD_CONNECT_RMT_CLIENT 0x10000ULL /* Never used, gone 2.9*/
+#define OBD_CONNECT_RMT_CLIENT_FORCE 0x20000ULL /* Never used, gone 2.9*/
+#define OBD_CONNECT_BRW_SIZE 0x40000ULL /* Max bytes per rpc */
+#define OBD_CONNECT_QUOTA64 0x80000ULL /* Unused since 2.4 */
+#define OBD_CONNECT_MDS_CAPA 0x100000ULL /* Unused since 2.7 */
+#define OBD_CONNECT_OSS_CAPA 0x200000ULL /* Unused since 2.7 */
+#define OBD_CONNECT_CANCELSET 0x400000ULL /* Early batch cancel */
+#define OBD_CONNECT_SOM 0x800000ULL /* Unused since 2.7 */
+#define OBD_CONNECT_AT 0x1000000ULL /* client uses AT */
+#define OBD_CONNECT_LRU_RESIZE 0x2000000ULL /* LRU resize feature */
+#define OBD_CONNECT_MDS_MDS 0x4000000ULL /* MDS-MDS connection */
+#define OBD_CONNECT_REAL 0x8000000ULL /* Unused since 2.8 */
+#define OBD_CONNECT_CHANGE_QS 0x10000000ULL /* Unused since 2.4 */
+#define OBD_CONNECT_CKSUM 0x20000000ULL /* cksum algo choice */
+#define OBD_CONNECT_FID 0x40000000ULL /* server handles FIDs */
+#define OBD_CONNECT_VBR 0x80000000ULL /* version based recov */
+#define OBD_CONNECT_LOV_V3 0x100000000ULL /* client LOV v3 EA */
+#define OBD_CONNECT_GRANT_SHRINK 0x200000000ULL /* handle grant shrink */
+#define OBD_CONNECT_SKIP_ORPHAN 0x400000000ULL /* no orph objid reuse */
+#define OBD_CONNECT_MAX_EASIZE 0x800000000ULL /* EA size in reply */
+#define OBD_CONNECT_FULL20 0x1000000000ULL /* it is 2.0 client */
+#define OBD_CONNECT_LAYOUTLOCK 0x2000000000ULL /* client layout lock */
+#define OBD_CONNECT_64BITHASH 0x4000000000ULL /* 64-bits dir hash */
+#define OBD_CONNECT_MAXBYTES 0x8000000000ULL /* max stripe size */
+#define OBD_CONNECT_IMP_RECOV 0x10000000000ULL /* imp recov support */
+#define OBD_CONNECT_JOBSTATS 0x20000000000ULL /* ptlrpc_body jobid */
+#define OBD_CONNECT_UMASK 0x40000000000ULL /* create client umask */
+#define OBD_CONNECT_EINPROGRESS 0x80000000000ULL /* client -EINPROGRESS
+        * RPC error handling
+        */
+#define OBD_CONNECT_GRANT_PARAM 0x100000000000ULL /* extra grant params for
+        * space reservation
+        */
+#define OBD_CONNECT_FLOCK_OWNER 0x200000000000ULL /* unused since 2.0 */
+#define OBD_CONNECT_LVB_TYPE 0x400000000000ULL /* variable LVB type */
+#define OBD_CONNECT_NANOSEC_TIME 0x800000000000ULL /* nanosec timestamp */
+#define OBD_CONNECT_LIGHTWEIGHT 0x1000000000000ULL /* lightweight connect */
+#define OBD_CONNECT_SHORTIO 0x2000000000000ULL /* short io */
+#define OBD_CONNECT_PINGLESS 0x4000000000000ULL /* pings not required */
+#define OBD_CONNECT_FLOCK_DEAD 0x8000000000000ULL /* flk deadlock detect */
+#define OBD_CONNECT_DISP_STRIPE 0x10000000000000ULL /* create stripe disp */
+#define OBD_CONNECT_OPEN_BY_FID 0x20000000000000ULL /* open by FID won't pack
+        * name in request
+        */
+#define OBD_CONNECT_LFSCK 0x40000000000000ULL /* allow online LFSCK */
+#define OBD_CONNECT_UNLINK_CLOSE 0x100000000000000ULL /* unlink closes file */
+#define OBD_CONNECT_MULTIMODRPCS 0x200000000000000ULL /* allow multiple change
+        * RPCs in parallel
+        */
+#define OBD_CONNECT_DIR_STRIPE 0x400000000000000ULL /* striped DNE dir */
+#define OBD_CONNECT_SUBTREE 0x800000000000000ULL /* fileset mount */
 /* was OBD_CONNECT_LOCKAHEAD_OLD 0x1000000000000000ULL old lockahead 2.12-2.13*/
-
-/** bulk matchbits is sent within ptlrpc_body */
-#define OBD_CONNECT_BULK_MBITS 0x2000000000000000ULL
+#define OBD_CONNECT_BULK_MBITS 0x2000000000000000ULL /* ptlrpc_body matchbit*/
 #define OBD_CONNECT_OBDOPACK 0x4000000000000000ULL /* compact OUT obdo */
 #define OBD_CONNECT_FLAGS2 0x8000000000000000ULL /* second flags word */
 /* ocd_connect_flags2 flags */
-#define OBD_CONNECT2_FILE_SECCTX 0x1ULL /* set file security
-        * context at create
-        */
-#define OBD_CONNECT2_LOCKAHEAD 0x2ULL /* ladvise lockahead
-        * v2
-        */
-#define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir
-        */
-#define OBD_CONNECT2_SUM_STATFS 0x8ULL /* MDT return aggregated stats */
-#define OBD_CONNECT2_OVERSTRIPING 0x10ULL /* OST overstriping support */
-#define OBD_CONNECT2_FLR 0x20ULL /* FLR support */
-#define OBD_CONNECT2_WBC_INTENTS 0x40ULL /* create/unlink/... intents
-        * for wbc, also operations
-        * under client-held parent
-        * locks
-        */
-#define OBD_CONNECT2_LOCK_CONVERT 0x80ULL /* IBITS lock convert support */
-#define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* store HSM archive_id in array */
-#define OBD_CONNECT2_INC_XID 0x200ULL /* Increasing xid */
-#define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* has client SELinux policy */
-#define OBD_CONNECT2_LSOM 0x800ULL /* LSOM support */
-#define OBD_CONNECT2_PCC 0x1000ULL /* Persistent Client Cache */
-#define OBD_CONNECT2_CRUSH 0x2000ULL /* crush hash striped directory
-        */
-#define OBD_CONNECT2_ASYNC_DISCARD 0x4000ULL /* support async DoM data
-        * discard
-        */
-#define OBD_CONNECT2_ENCRYPT 0x8000ULL /* client-to-disk encrypt */
-#define OBD_CONNECT2_FIDMAP 0x10000ULL /* FID map */
-#define OBD_CONNECT2_GETATTR_PFID 0x20000ULL /* pack parent FID in getattr */
-#define OBD_CONNECT2_LSEEK 0x40000ULL /* SEEK_HOLE/DATA RPC */
-#define OBD_CONNECT2_DOM_LVB 0x80000ULL /* pack DOM glimpse data in LVB */
-#define OBD_CONNECT2_REP_MBITS 0x100000ULL /* match reply by mbits, not xid */
-#define OBD_CONNECT2_REP_MBITS 0x100000ULL /* match reply mbits not xid*/
-#define OBD_CONNECT2_MODE_CONVERT 0x200000ULL /* LDLM mode convert */
-#define OBD_CONNECT2_BATCH_RPC 0x400000ULL /* Multi-RPC batch request */
-#define OBD_CONNECT2_PCCRO 0x800000ULL /* Read-only PCC */
-#define OBD_CONNECT2_ATOMIC_OPEN_LOCK 0x4000000ULL/* request lock on 1st open */
-#define OBD_CONNECT2_ENCRYPT_NAME 0x8000000ULL /* name encrypt */
-/* XXX README XXX:
- * Please DO NOT add flag values here before first ensuring that this same
- * flag value is not in use on some other branch. Please clear any such
- * changes with senior engineers before starting to use a new flag. Then,
- * submit a small patch against EVERY branch that ONLY adds the new flag,
+#define OBD_CONNECT2_FILE_SECCTX 0x1ULL /* security context */
+#define OBD_CONNECT2_LOCKAHEAD 0x2ULL /* ladvise lockahead */
+#define OBD_CONNECT2_DIR_MIGRATE 0x4ULL /* migrate striped dir */
+#define OBD_CONNECT2_SUM_STATFS 0x8ULL /* MDT aggregate statfs*/
+#define OBD_CONNECT2_OVERSTRIPING 0x10ULL /* OST overstriping */
+#define OBD_CONNECT2_FLR 0x20ULL /* FLR mirror handling */
+#define OBD_CONNECT2_WBC_INTENTS 0x40ULL /* MDS wb cache intent */
+#define OBD_CONNECT2_LOCK_CONVERT 0x80ULL /* IBITS lock convert */
+#define OBD_CONNECT2_ARCHIVE_ID_ARRAY 0x100ULL /* HSM archive_id array*/
+#define OBD_CONNECT2_INC_XID 0x200ULL /* Increasing xid */
+#define OBD_CONNECT2_SELINUX_POLICY 0x400ULL /* cli SELinux policy */
+#define OBD_CONNECT2_LSOM 0x800ULL /* Lazy Size on MDT */
+#define OBD_CONNECT2_PCC 0x1000ULL /* Persist Client Cache*/
+#define OBD_CONNECT2_CRUSH 0x2000ULL /* CRUSH dir hash */
+#define OBD_CONNECT2_ASYNC_DISCARD 0x4000ULL /* async DoM discard */
+#define OBD_CONNECT2_ENCRYPT 0x8000ULL /* client disk encrypt */
+#define OBD_CONNECT2_FIDMAP 0x10000ULL /* MDT migrate FID map */
+#define OBD_CONNECT2_GETATTR_PFID 0x20000ULL /* parent FID getattr */
+#define OBD_CONNECT2_LSEEK 0x40000ULL /* SEEK_HOLE/DATA RPC */
+#define OBD_CONNECT2_DOM_LVB 0x80000ULL /* DoM glimpse in LVB */
+#define OBD_CONNECT2_REP_MBITS 0x100000ULL /* reply mbits, not XID*/
+#define OBD_CONNECT2_MODE_CONVERT 0x200000ULL /* LDLM mode convert */
+#define OBD_CONNECT2_BATCH_RPC 0x400000ULL /* Multi-op batch RPCs */
+#define OBD_CONNECT2_PCCRO 0x800000ULL /* PCC read-only */
+#define OBD_CONNECT2_MNE_TYPE 0x1000000ULL /* mne_nid_type IPv6 */
+#define OBD_CONNECT2_LOCK_CONTENTION 0x2000000ULL /* contention detect */
+#define OBD_CONNECT2_ATOMIC_OPEN_LOCK 0x4000000ULL /* lock on first open */
+#define OBD_CONNECT2_ENCRYPT_NAME 0x8000000ULL /* name encrypt */
+/* XXX README XXX README XXX README XXX README XXX README XXX README XXX
+ * Please DO NOT add OBD_CONNECT flags before first ensuring that this value
+ * is not in use by some other branch/patch. Email adilger@whamcloud.com
+ * to reserve the new OBD_CONNECT value for use by your feature. Then, submit
+ * a small patch against master and LTS branches that ONLY adds the new flag,
  * updates obd_connect_names[], adds the flag to check_obd_connect_data(),
  * and updates wiretests accordingly, so it can be approved and landed easily
- * to reserve the flag for future use.
+ * to reserve the flag for future use by your feature (submitted separately).
  */
-#define OCD_HAS_FLAG(ocd, flg) \
-        (!!((ocd)->ocd_connect_flags & OBD_CONNECT_##flg))
+#define OCD_HAS_FLAG(ocd, flag) \
+        (!!((ocd)->ocd_connect_flags & OBD_CONNECT_##flag))
+#define OCD_HAS_FLAG2(ocd, flag2) (OCD_HAS_FLAG(ocd, FLAGS2) && \
+        !!((ocd)->ocd_connect_flags2 & OBD_CONNECT2_##flag2))

 /* Features required for this version of the client to work with server */
 #define CLIENT_CONNECT_MDT_REQD (OBD_CONNECT_FID | \
@@ -855,32 +830,32 @@ struct obd_connect_data {
  * any field after ocd_maxbytes on the receiver without a valid flag
  * may result in out-of-bound memory access and kernel oops.
  */
-        __u16 ocd_maxmodrpcs; /* Maximum modify RPCs in parallel */
-        __u16 padding0; /* added 2.1.0. also fix lustre_swab_connect */
-        __u32 padding1; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 ocd_connect_flags2;
-        __u64 padding3; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 padding4; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 padding5; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 padding6; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 padding7; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 padding8; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 padding9; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 paddingA; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 paddingB; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 paddingC; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 paddingD; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 paddingE; /* added 2.1.0. also fix lustre_swab_connect */
-        __u64 paddingF; /* added 2.1.0. also fix lustre_swab_connect */
-};
-
-/* XXX README XXX:
- * Please DO NOT use any fields here before first ensuring that this same
- * field is not in use on some other branch. Please clear any such changes
- * with senior engineers before starting to use a new field. Then, submit
- * a small patch against EVERY branch that ONLY adds the new field along with
- * the matching OBD_CONNECT flag, so that can be approved and landed easily to
- * reserve the flag for future use.
+        __u16 ocd_maxmodrpcs; /* Maximum modify RPCs in parallel */
+        __u16 padding0; /* READ BELOW! also fix lustre_swab_connect */
+        __u32 padding1; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 ocd_connect_flags2; /* OBD_CONNECT2_* per above */
+        __u64 padding3; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 padding4; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 padding5; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 padding6; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 padding7; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 padding8; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 padding9; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 paddingA; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 paddingB; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 paddingC; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 paddingD; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 paddingE; /* READ BELOW! also fix lustre_swab_connect */
+        __u64 paddingF; /* READ BELOW! also fix lustre_swab_connect */
+};
+/* XXX README XXX README XXX README XXX README XXX README XXX README XXX
+ * Please DO NOT use any fields before first ensuring that this field is
+ * not in use by some other branch/patch. Email adilger@whamcloud.com to
+ * reserve the new obd_connect_data field for use by your feature. Then, submit
+ * a small patch against master and LTS branches that ONLY adds the new field,
+ * updates lustre_swab_connect(), along with the matching OBD_CONNECT flag,
+ * and updates wiretests accordingly, so it can be approved and landed easily
+ * to reserve the field for future use by your feature (submitted separately).
  */

 /*
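The OCD_HAS_FLAG2() wrapper added above is worth a second look: because
ocd_connect_flags2 is only meaningful once OBD_CONNECT_FLAGS2 has been
negotiated (per the struct comment about fields after ocd_maxbytes), any
test of the second flags word must check the first word too. A minimal
sketch of a caller for the new name-encryption flag; the helper name
server_supports_name_encryption() is made up for this example, and the real
client-side handling lives in the llite files touched by this patch:

/*
 * Illustrative sketch only: test the negotiated connect flags with the
 * OCD_HAS_FLAG2() macro introduced above.
 */
static bool server_supports_name_encryption(const struct obd_connect_data *ocd)
{
	/* OCD_HAS_FLAG2() tests OBD_CONNECT_FLAGS2 before reading
	 * ocd_connect_flags2, so it stays safe against older peers
	 * that never fill in the second flags word.
	 */
	return OCD_HAS_FLAG2(ocd, ENCRYPT_NAME);
}

Negotiating capabilities this way keeps the wire protocol self-describing:
each peer advertises what it supports at connect time, and behaviour follows
the intersection of the two flag sets rather than a version-number check.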
From patchwork Tue Sep 6 01:55:37 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12966744
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Mon, 5 Sep 2022 21:55:37 -0400
Message-Id: <1662429337-18737-25-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
References: <1662429337-18737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 24/24] lustre: llite: fix stat attributes_mask
Cc: Lustre Development List

From: Sebastien Buisson

Fix the stat attributes_mask so that STATX_ATTR_ENCRYPTED is reported
whenever the client is built with CONFIG_FS_ENCRYPTION. Also fix sanityn
test_106c to expect at least 0x30 (STATX_ATTR_IMMUTABLE |
STATX_ATTR_APPEND) in attributes_mask.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16085
Lustre-commit: 0e48653c27eacad29 ("LU-16085 llite: fix stat attributes_mask")
Signed-off-by: Sebastien Buisson
Reviewed-on: https://review.whamcloud.com/48208
Reviewed-by: Andreas Dilger
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/llite/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 115ee69..5394cce 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5268,6 +5268,9 @@ int ll_getattr_dentry(struct dentry *de, struct kstat *stat, u32 request_mask,
 	}

 	stat->attributes_mask = STATX_ATTR_IMMUTABLE | STATX_ATTR_APPEND;
+#ifdef CONFIG_FS_ENCRYPTION
+	stat->attributes_mask |= STATX_ATTR_ENCRYPTED;
+#endif
 	stat->attributes |= ll_inode_to_ext_flags(inode->i_flags);
 	stat->result_mask &= request_mask;
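The effect of this change is visible from user space through statx(2): on
kernels built with CONFIG_FS_ENCRYPTION, stx_attributes_mask now includes
STATX_ATTR_ENCRYPTED, so a caller can distinguish a file that is genuinely
unencrypted from a filesystem that simply never reports the bit. The 0x30
value mentioned in the test description is STATX_ATTR_IMMUTABLE (0x10) |
STATX_ATTR_APPEND (0x20). A minimal stand-alone sketch, not part of the
patch, assuming glibc 2.28+ for the statx() wrapper:

/* Sketch: query a file's encryption state via statx(2). */
#define _GNU_SOURCE
#include <fcntl.h>    /* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h> /* statx(), struct statx, STATX_* */

int main(int argc, char **argv)
{
	struct statx stx;

	if (argc < 2)
		return 1;
	if (statx(AT_FDCWD, argv[1], 0, STATX_BASIC_STATS, &stx)) {
		perror("statx");
		return 1;
	}
	/* Only trust an attribute bit the filesystem declares in the mask. */
	if (stx.stx_attributes_mask & STATX_ATTR_ENCRYPTED)
		printf("%s is %sencrypted\n", argv[1],
		       (stx.stx_attributes & STATX_ATTR_ENCRYPTED) ? "" : "not ");
	else
		printf("%s: encryption state not reported\n", argv[1]);
	return 0;
}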